Building an EFFICIENT Sudoku Solver

Building an EFFICIENT Sudoku Solver - java

Yes, I know this is nothing new and there are many questions already out there (it even has its own tag), but I'd like to create a Sudoku Solver in Java solely for the purpose of training myself to write code that is more efficient.
Probably the easiest way to do this in a program is have a ton of for loops parse through each column and row, collect the possible values of each cell, then weed out the cells with only one possibility (whether they contain only 1 number, or they're the only cell in their row/column that contains this number) until you have a solved puzzle. Of course, a sheer thought of the action should raise a red flag in every programmer's mind.
What I'm looking for is the methodology to go about solving this sucker in the most efficient way possible (please try not to include too much code - I want to figure that part out, myself).
I want to avoid mathematical algorithms if at all possible - those would be too easy and 100% not my work.
If someone could provide a step-by-step, efficient thought process for solving a Sudoku puzzle (whether by a human or computer), I would be most happy :). I'm looking for something that's vague (so it's a challenge), but informative enough (so I'm not totally lost) to get me started.
Many thanks,
Justian Meyer
EDIT:
Looking at my code, I got to thinking: what would be some of the possibilities for storing these solving states (i.e. the Sudoku grid). 2D Arrays and 3D Arrays come to mind. Which might be best? 2D might be easier to manage from the surface, but 3D Arrays would provide the "box"/"cage" number as well.
EDIT:
Nevermind. I'm gonna go with a 3D array.

It depends on how you define efficient.
You can use a brute force method, which searches through each column and row, collects the possible values of each cell, then weeds out the cells with only one possibility.
If you have cells remaining with more than one possibility, save the puzzle state, pick the cell with the fewest possibilities, pick one of the possibilities, and attempt to solve the puzzle. If the possibility you picked leads to a puzzle contradiction, restore the saved puzzle state, go back to the cell and choose a different possibility. If none of the possibilities in the cell you picked solves the puzzle, pick the next cell with the fewest possibilities. Cycle through the remaining possibilities and cells until you've solved the puzzle.
Attempt to solve the puzzle means searching through each column and row, collecting the possible values of each cell, then weeding out the cells with only one possibility. When all of the cells are weeded out, you've solved the puzzle.
You can use a logical / mathematical method, where your code tries different strategies until the puzzle is solved. Search Google with "sudoku strategies" to see the different strategies. Using logical / mathematical methods, your code can "explain" how the puzzle was solved.

When I made mine, I thought I could solve every board using a set of rules without doing any backtracking. This proved impossible as even puzzles targeting human players potentially require making a few hypothesis.
So I starting with implementing the basic "rules" for solving a puzzle, trying to find the next rule to implement that would allow the resolution of where it stopped last time. In the end, I was forced to add a brute forcing recursive algorithm, but most puzzles are actually solved without using that.
I wrote a blog post about my sudoku solver. Just read through the "The algorithm" section and you'll get a pretty good idea how I went about it.
http://www.byteauthor.com/2010/08/sudoku-solver/

Should anyone need a reference Android implementation, I wrote a solution that uses the algorithm from the post above.
Full open-source code here: https://github.com/bizz84/SudokuSolver
Additionally, this solution loads Sudoku Puzzles in JSON format from a web server and posts back the results.

You should think about reducing the Sudoku Problem to a SATisfiability problem.
This method will avoid you to think too mathematically but more logically about the AI.
The goal step by step is basically :
* Find all the constraints that a Sudoku has. (line, column, box).
* Write these constraints as boolean constraints.
* Put all these constraints in a Boolean Satisfiability Problem.
* Run a SAT solver (or write your own ;) ) on this problem.
* Transform the SAT solution into the solution of the initial Sudoku.
It has been done by Ivor Spence by using SAT4J and you can find the Java Applet of his work here : http://www.cs.qub.ac.uk/~I.Spence/SuDoku/SuDoku.html.
You can also download directly the Java code from SAT4J website, to see how it look like : http://sat4j.org/products.php#sudoku.
And finally, the big advantage of this method is : You can solve N*N Sudokus, and not only the typical 9*9, which is I think, much more challenging for an AI :).

Related

How to find path trough blocks in a grid

I've got a grid 9x9 squares. Those squares are written in array of integer.
so every integer represents 1 block. Integer provides enough bits for X,Y positions, if this block is accessible and other data. I have a problem: What would be the most effective way to get boolean value, if we can get from the center of this grid to some random point. So for example i provide X and Y positions for block where i want to go, and a method will return boolean value, if there is some way through ACCESSIBLE blocks. I made a simple picture for this. Thanks for answers.

What you're describing here is a well-known problem in computer science called pathfinding and not trivial to solve efficiently.
However, there are several algorithms to solve this, like A* and Dijkstra, which are probably the way to go for large, complicated maps.
If all of your problems are as small and simple as your posted example, you could also try to work with a simpler solution, like a brute force graph search, (as suggested by Codor).

The problem can be solved by depth-first search, where the specific implementation greatly depends on the representation of the grid, which is to be interpreted as a graph.

Formulate sudoku puzzle given the solutions

I just recently did a sudoku solver in java using backtracking.
Is it possible, given the solutions, to formulate a problem or puzzle?
EDIT
to formulate the original puzzle, is there some way to achieve this?
and an additional question,
given the puzzle and the solutions.
If I am able to solve the puzzle using the solutions (result is puzzles)
and at the same time able to solve the solutions using the puzzle (result is solutions)
Which has the greater number?
the puzzles? or the solutions?

It is possible to formulate one of the multiple possible original states.
Start with the final solution (all numbers are present)
Remove one number (chosen randomly or not)
Check if this number can be found back given the current state of the board (you already have a solver, it should be easy)
If this number can be calculated, everything is OK. Go back to 2.
If this number cannot be found back, put it back where it was. Go back to 2.
If no more numbers can be removed, you have reached one of the original states of the puzzle.
If you chose the numbers you remove randomly (step 2), you can execute this several times, and get different starting points that lead to the same final puzzle.

Creating a puzzle from a solution is very simple. You just apply the solving steps backwards.
For example, if a line contains 8 digits, you can fill in the 9th. If you do that backwards, if a line contains 9 digits, you can remove one. This will yield a very boring puzzle, but still a valid one (a valid puzzle is a puzzle with only one solution).
The more complicated the steps you make are, the more difficult the puzzle will be. In fact, brute forcing the puzzle is the most difficult strategy, executing it backwards probably boils down to randomly removing a digit, and brute force check if there is still only 1 unique solution. Notice you do not have to solve the entire puzzle: It's enough to prove there is only 1 way to add the removed digit back into the puzzle.
As for the second part of your question: This feels a bit like a mathematical question, but let me answer:
A good puzzle only has 1 solution. Since there are multiple puzzles that yield the same solution (like, there are 81 ways to fill in 80 of the 81 squares, all yielding the same solution from a different puzzle), you could say there are many more puzzles than solutions.
If you also allow puzzles with multiple solutions, it changes. For every puzzle, there must be one or more solutions, but all those solutions also belong to that puzzle, so the number of solutions to puzzles is equal to the number of puzzles to solutions. Invalid puzzles do not change this: Since they belong to 0 solutions, you need no additional puzzles belonging to those solutions.
Ps. It's also trivial to create puzzles if they do not need to be uniquely solvable: Just randomly remove some digits and you're done.

If I am able to solve the puzzle using the solutions (result is puzzles)
and at the same time able to solve the solutions using the puzzle (result is solutions)
Which has the greater number?
the puzzles? or the solutions?
There is no well-defined answer.
Each solution has exactly 281 corresponding puzzles, including the trivial one. Not all of these have a unique solution, but many do. Therefore, if the chosen set of solutions contains only one element, then the maximal set of puzzles that jointly correspond to the solution set is vastly larger.
On the other hand, the completely blank puzzle affords 6670903752021072936960 solutions. There are many pairs of those that share only the blank grid as a common puzzle. Thus, if the chosen set of puzzles includes only the blank grid then the set of corresponding solutions is vastly larger.

Detecting if multiple bent lines intersect

I'm in the process of making a swimlane diagram but can't come up with a good algorithm to automatically lay out the lines that connect the nodes in the diagram. What I essentially want is this.
However, I don't have any protection against lines overlapping or intersecting right now and it sometimes gets very messy.
Does anyone know a way to detect if a line will intersect ANY of the lines already drawn?
One idea that I've come up with is to store the paths in an array or table and check the entire table every time a new line is slated to be drawn but that does not seem efficient.
I'm doing this in javascript and java through the use of GWT so maybe there is an easy way to solve this using one of the tools provided by these languages?

If your real issue is to minimize the line intersections, there are several algorithms that try to accomplish this in diagrams. Check this link for example, there are also more algorithms that are used in auto routing for electric design automation that are also used in this kind of diagrams, like Lee algorithm, or A* Algorithm.
I don't know if the tools that you are using have enough flexibility to implement this kind of algorithms, usually you need to implement your own heuristic according to the specific type of diagram, but I hope that this links are enough to give you good ideas.
Minimizing the line intersections in a graph is a difficult NP-Hard problem, check this link (about the crossing number) for more information.
Good luck.

The King's Maze

I'm practicing up for a programming competition, and i'm going over some difficult problems that I wasn't able to answer in the past. One of them was the King's Maze. Essentially you are given an NxN array of numbers -50<x<50 that represent "tokens". You have to start at position 1,1(i assume that's 0,0 in array indexes) and finish at N,N. You have to pick up tokens on cells you visit, and you can't step on a cell with no token (represented by a 0). If you get surrounded by 0's you lose. If there is no solution to a maze you output "No solution". Else, you output the highest possible number you can get from adding up the tokens you pick up.
I have no idea how to solve this problem. I figured you could write a maze algorithm to solve it, but that takes time, and in programming contests you are only given two hours to solve multiple problems. I'm guessing there's some sort of pattern i'm missing. Anyone know how I should approach this?
Also, it might help to mention that this problem is meant for high school students.

This type of problem is typically solved using dynamic programming or memoization.
Basically you formulate a recursive solution, and solve it bottom up while remembering and reusing previously computed results.

The simple approach (i.e. simplest to code) is try all the possible paths - try each first step; for each first step try each second step; for each first step/second step combination try each third step; and so on. However depending on how big the maze is this may take too long to run (or it may not).
Your next step is to think about how you can do this faster. The first step is usually to eliminate moves that you know can't lead to a finish, or can't lead to a finish with higher points than the one you already have. Since this is practice for a competition we'll leave you to do this work yourself.

Think "graph" algorithms: The Algorithm Design Manual

Boosting my GA with Neural Networks and/or Reinforcement Learning

As I have mentioned in previous questions I am writing a maze solving application to help me learn about more theoretical CS subjects, after some trouble I've got a Genetic Algorithm working that can evolve a set of rules (handled by boolean values) in order to find a good solution through a maze.
That being said, the GA alone is okay, but I'd like to beef it up with a Neural Network, even though I have no real working knowledge of Neural Networks (no formal theoretical CS education). After doing a bit of reading on the subject I found that a Neural Network could be used to train a genome in order to improve results. Let's say I have a genome (group of genes), such as
1 0 0 1 0 1 0 1 0 1 1 1 0 0...
How could I use a Neural Network (I'm assuming MLP?) to train and improve my genome?
In addition to this as I know nothing about Neural Networks I've been looking into implementing some form of Reinforcement Learning, using my maze matrix (2 dimensional array), although I'm a bit stuck on what the following algorithm wants from me:
(from http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/Q-Learning-Algorithm.htm)
1. Set parameter , and environment reward matrix R
2. Initialize matrix Q as zero matrix
3. For each episode:
* Select random initial state
* Do while not reach goal state
o Select one among all possible actions for the current state
o Using this possible action, consider to go to the next state
o Get maximum Q value of this next state based on all possible actions
o Compute
o Set the next state as the current state
End Do
End For
The big problem for me is implementing a reward matrix R and what a Q matrix exactly is, and getting the Q value. I use a multi-dimensional array for my maze and enum states for every move. How would this be used in a Q-Learning algorithm?
If someone could help out by explaining what I would need to do to implement the following, preferably in Java although C# would be nice too, possibly with some source code examples it'd be appreciated.

As noted in some comments, your question indeed involves a large set of background knowledge and topics that hardly can be eloquently covered on stackoverflow. However, what we can try here is suggest approaches to get around your problem.
First of all: what does your GA do? I see a set of binary values; what are they? I see them as either:
bad: a sequence of 'turn right' and 'turn left' instructions. Why is this bad? Because you're basically doing a random, brute-force attempt at solving your problem. You're not evolving a genotype: you're refining random guesses.
better: every gene (location in the genome) represents a feature that will be expressed in the phenotype. There should not be a 1-to-1 mapping between genome and phenotype!
Let me give you an example: in our brain there are 10^13ish neurons. But we have only around 10^9 genes (yes, it's not an exact value, bear with me for a second). What does this tell us? That our genotype does not encode every neuron. Our genome encodes the proteins that will then go and make the components of our body.
Hence, evolution works on the genotype directly by selecting features of the phenotype. If I were to have 6 fingers on each hand and if that would made me a better programmer, making me have more kids because I'm more successful in life, well, my genotype would then be selected by evolution because it contains the capability to give me a more fit body (yes, there is a pun there, given the average geekiness-to-reproducibily ratio of most people around here).
Now, think about your GA: what is that you are trying to accomplish? Are you sure that evolving rules would help? In other words -- how would you perform in a maze? What is the most successful thing that can help you: having a different body, or having a memory of the right path to get out? Perhaps you might want to reconsider your genotype and have it encode memorization abilities. Maybe encode in the genotype how much data can be stored, and how fast can your agents access it -- then measure fitness in terms of how fast they get out of the maze.
Another (weaker) approach could be to encode the rules that your agent uses to decide where to go. The take-home message is, encode features that, once expressed, can be selected by fitness.
Now, to the neural network issue. One thing to remember is that NNs are filters. They receive an input. perform operations on it, and return an output. What is this output? Maybe you just need to discriminate a true/false condition; for example, once you feed a maze map to a NN, it can tell you if you can get out from the maze or not. How would you do such a thing? You will need to encode the data properly.
This is the key point about NNs: your input data must be encoded properly. Usually people normalize it, maybe scale it, perhaps you can apply a sigma function to it to avoid values that are too large or too small; those are details that deal with error measures and performance. What you need to understand now is what a NN is, and what you cannot use it for.
To your problem now. You mentioned you want to use NNs as well: what about,
using a neural network to guide the agent, and
using a genetic algorithm to evolve the neural network parameters?
Rephrased like so:
let's suppose you have a robot: your NN is controlling the left and right wheel, and as input it receives the distance of the next wall and how much it has traveled so far (it's just an example)
you start by generating a random genotype
make the genotype into a phenotype: the first gene is the network sensitivity; the second gene encodes the learning ratio; the third gene.. so on and so forth
now that you have a neural network, run the simulation
see how it performs
generate a second random genotype, evolve second NN
see how this second individual performs
get the best individual, then either mutate its genotype or recombinate it with the loser
repeat
there is an excellent reading on the matter here: Inman Harvey Microbial GA.
I hope I did you some insight on such issues. NNs and GA are no silver bullet to solve all problems. In some they can do very much, in others they are just the wrong tool. It's (still!) up to us to get the best one, and to do so we must understand them well.
Have fun in it! It's great to know such things, makes everyday life a bit more entertaining :)

There is probably no 'maze gene' to find,
genetic algorithms are trying to setup a vector of properties and a 'filtering system' to decide by some kind of 'surival of the fittest' algorithm to find out which set of properties would do the best job.
The easiest way to find a way out of a maze is to move always left (or right) along a wall.
The Q-Algorithm seems to have a problem with local maxima this was workaround as I remember by kicking (adding random values to the matrix) if the results didn't improve.
EDIT: As mentioned above a backtracking algorithm suits this task better than GA or NN.
How to combine both algorithm is described here NeuroGen descibes how GA is used for training a NN.

Try using the free open source NerounDotNet C# library for your Neural networks instead of implementing it.
For Reinforcement Learning library, I am currently looking for one, especially for Dot NET framework..

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.