Sudoku Simulated Annealing


Is Simulated Annealing a good algorithm for generating and solving Sudoku problems? Why or why not?
I implemented a Sudoku game with backtracking, but now I want to do it with a local search kind of algorithm in Java. However, I have no idea where to start. Is there any available library I could use?

Sudoku is a problem with breadcrumbs in it that lead to the optimal solution (even in the hardest cases). What I mean by that is that the constraints confine it so much that the combinatorial explosion of the search space isn't too big (relatively speaking, of course): the way to proceed is clear. Other examples of such problems are the Einstein/Zebra puzzle, the SendMoreMoney puzzle and n-queens. Those are perfect backtracking cases. While Simulated Annealing does kind of work on those, it's not the correct tool for the job (backtracking is). On the other hand, Simulated Annealing (and other metaheuristics) excel at realistic problems such as course timetabling, employee rostering, vehicle routing (VRP), ...
An available library you could use is OptaPlanner (Java, open source): Someone already wrote a Sudoku solver for it 2 years ago, for an older version of OptaPlanner (then it was still called Drools Planner). By default it looks like he configured Tabu Search, but it's a 2 line change to switch that to Simulated Annealing.

To answer your 1st question, on the Simulated Annealing algorithm:
Pros + Cons of Simulated Annealing
Good: Quickly finds a minimum
Bad: May not find global minimum (best solution)
Increasing the temperature makes it slower, but makes it less likely that we get stuck in a local minimum
Source: cs.mercer.edu
As for your 2nd question on solving algorithms in Java, see here for full source code with walk through. Hope this helps!
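To make the trade-off above concrete, here is a minimal Simulated Annealing sketch for Sudoku in plain Java, without any library. It is only an illustration under stated assumptions: the class name, the cost function (missing digits per row and column), the swap move inside a 3x3 block and the cooling numbers are all choices made for this example and are not tuned, and it is not how OptaPlanner does it. Blank cells in the puzzle are assumed to be 0, and the givens are assumed to contain no duplicates within a block.

```java
import java.util.Random;

class SudokuAnnealingSketch {
    // Cost = digits missing from each row and each column; 0 means solved,
    // because every 3x3 block holds 1..9 by construction (see fillBlocks).
    static int cost(int[][] g) {
        int c = 0;
        for (int i = 0; i < 9; i++) {
            boolean[] row = new boolean[10], col = new boolean[10];
            int distinctRow = 0, distinctCol = 0;
            for (int j = 0; j < 9; j++) {
                if (!row[g[i][j]]) { row[g[i][j]] = true; distinctRow++; }
                if (!col[g[j][i]]) { col[g[j][i]] = true; distinctCol++; }
            }
            c += (9 - distinctRow) + (9 - distinctCol);
        }
        return c;
    }

    // Metropolis rule: improvements are always accepted, worse moves are
    // accepted with a probability that grows with the temperature.
    static double acceptanceProbability(int delta, double temperature) {
        return delta <= 0 ? 1.0 : Math.exp(-delta / temperature);
    }

    // Start state: keep the givens and fill each block with its missing digits.
    static int[][] fillBlocks(int[][] puzzle) {
        int[][] g = new int[9][9];
        for (int b = 0; b < 9; b++) {
            boolean[] used = new boolean[10];
            for (int k = 0; k < 9; k++) used[puzzle[b / 3 * 3 + k / 3][b % 3 * 3 + k % 3]] = true;
            int next = 1;
            for (int k = 0; k < 9; k++) {
                int r = b / 3 * 3 + k / 3, c = b % 3 * 3 + k % 3;
                if (puzzle[r][c] != 0) { g[r][c] = puzzle[r][c]; continue; }
                while (used[next]) next++;
                g[r][c] = next++;
            }
        }
        return g;
    }

    // Repeatedly swap two non-given cells inside one block and apply the rule above.
    // May return an unsolved grid: check cost(result) == 0.
    static int[][] anneal(int[][] puzzle, long seed, int maxSteps) {
        Random rnd = new Random(seed);
        int[][] g = fillBlocks(puzzle);
        int current = cost(g);
        double temperature = 1.0;                       // assumed, untuned start value
        for (int step = 0; step < maxSteps && current > 0; step++) {
            int b = rnd.nextInt(9), k1 = rnd.nextInt(9), k2 = rnd.nextInt(9);
            int r1 = b / 3 * 3 + k1 / 3, c1 = b % 3 * 3 + k1 % 3;
            int r2 = b / 3 * 3 + k2 / 3, c2 = b % 3 * 3 + k2 % 3;
            if (k1 == k2 || puzzle[r1][c1] != 0 || puzzle[r2][c2] != 0) continue;
            int tmp = g[r1][c1]; g[r1][c1] = g[r2][c2]; g[r2][c2] = tmp;
            int candidate = cost(g);
            if (rnd.nextDouble() < acceptanceProbability(candidate - current, temperature)) {
                current = candidate;                    // accept the move
            } else {                                    // reject: undo the swap
                tmp = g[r1][c1]; g[r1][c1] = g[r2][c2]; g[r2][c2] = tmp;
            }
            temperature = Math.max(0.05, temperature * 0.9999);  // assumed cooling schedule
        }
        return g;
    }
}
```

Calling `SudokuAnnealingSketch.anneal(puzzle, seed, maxSteps)` returns a grid that may or may not be solved, which is exactly the "may not find the global minimum" caveat: the caller has to check `cost(result) == 0` and possibly restart with another seed.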

Related

Existing Algorithm for Scheduling Problems?

Let's say I want to build a function that would properly schedule three bus drivers to drive in a week with the following constraints:
Each driver must not drive more than five times per week
There must be two drivers driving every day
They will rest one day each week (will not clash with other drivers' rest day)
What kind of algorithm would be used to solve a problem like this?
I looked through several sites and I found these:
1) Backtracking algorithm (brute force)
2) Genetic algorithm
3) Constraint programming
Frankly, these are all a "culture shock" for me as I have never learnt any kind of linear programming in the past. There are three things I want to know:
1) Which algorithm will best suit the case scenario above?
2) What would be the simplest algorithm to solve this problem?
3) Please suggest any other algorithms I can look into to solve the above problem.

1) I agree brute force is bad.
2) Your problem is an integer problem. Such problems can still be solved with linear programming techniques, though.
3) You can distinguish two different approaches: heuristics and exact approaches.
Heuristics provide good solutions in reasonable computation time. They are used when there are strict requirements on the computation time or when the problem is too hard to compute an optimal solution for. A genetic algorithm is a heuristic.
As your problem is comparatively simple, you would probably go with an exact approach.
4) The standard way to solve this exactly is to embed a linear program in a branch-and-bound search tree. There is a lot of literature on it. The procedure can be outlined as follows:
Step 1: Solve the linear program with the simplex algorithm.
Step 2: Find a fractional variable to branch on, e.g. x=1.5.
Step 3: Create two new nodes and add the constraints x<=1 and x>=2 respectively.
Step 4: Go into one node (selected by some strategy).
Step 5: Go back to step 1.
Additionally, at every node in the tree, after step 1, the algorithm checks whether the node can be pruned. That means it stops searching 'deeper' from this node on, because
a) the problem has become infeasible,
b) a better solution already exists, or
c) an integer solution has been found. The objective value of this solution is used to decide case b.
The procedure finishes when all nodes are pruned.
Luckily, as Nicolas stated, there are free implementations that do just this. All you have to do is to create your model. Code its objective and constraints in some tool and let it solve.
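The branch-and-bound outline above can be illustrated on a much smaller problem. The sketch below is not the scheduling model and does not call a simplex solver: it swaps in the 0/1 knapsack problem, whose linear relaxation can be solved greedily (take items by value/weight ratio, the last one fractionally). It also branches on the items in a fixed order rather than picking a fractional variable. The class and method names are made up for this example, and weights are assumed to be positive.

```java
import java.util.Arrays;

class KnapsackBranchAndBound {
    static int best;   // value of the best integer solution found so far (the incumbent)

    static int solve(int[] value, int[] weight, int capacity) {
        int n = value.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Sort items by value/weight ratio, highest first, so the relaxation is greedy.
        Arrays.sort(order, (a, b) ->
            Long.compare((long) value[b] * weight[a], (long) value[a] * weight[b]));
        int[] v = new int[n], w = new int[n];
        for (int i = 0; i < n; i++) { v[i] = value[order[i]]; w[i] = weight[order[i]]; }
        best = 0;
        branch(0, 0, capacity, v, w);
        return best;
    }

    // Upper bound from the relaxed problem: items may be taken fractionally.
    static double bound(int i, int valueSoFar, int room, int[] v, int[] w) {
        double b = valueSoFar;
        for (; i < v.length && w[i] <= room; i++) { room -= w[i]; b += v[i]; }
        if (i < v.length) b += (double) v[i] * room / w[i];
        return b;
    }

    static void branch(int i, int valueSoFar, int room, int[] v, int[] w) {
        if (room < 0) return;                                  // a) infeasible: prune
        if (valueSoFar > best) best = valueSoFar;              // c) integer solution: update incumbent
        if (i == v.length) return;
        if (bound(i, valueSoFar, room, v, w) <= best) return;  // b) cannot beat the incumbent: prune
        branch(i + 1, valueSoFar + v[i], room - w[i], v, w);   // node with x_i = 1
        branch(i + 1, valueSoFar, room, v, w);                 // node with x_i = 0
    }
}
```

For example, `KnapsackBranchAndBound.solve(new int[]{60, 100, 120}, new int[]{10, 20, 30}, 50)` returns 220. A real mixed-integer solver follows the same pattern but obtains the bound from a general LP solve at every node.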

First of all, this is a discrete optimization problem, so linear programming is probably not a good idea (since it is meant for continuous optimization). You can still solve this using linear programming (it will become an integer or mixed-integer program), but that is exponentially hard (if your input size is small then it is OK).
Now back to the comparison:
Brute force : worst.
Genetic: Cannot guarantee optimality. The algorithm may not be able to solve the problem.
Constraint programming: definitely the best in this case (and in many discrete optimization problems). There is a super efficient implementation of it in the IBM ILOG CPLEX solver (but it is not free; it is free for academia or for testing, though).
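To make the constraints of the three-driver question concrete, here is a small Java sketch. It is not a constraint-programming solver and does not use CPLEX: it is plain backtracking over "which driver rests on which day", with one pruning check for the five-drives limit. The class name and the encoding (one resting driver per day, so exactly two drive and rest days never clash) are assumptions made for this example.

```java
class DriverRoster {
    static final int DRIVERS = 3, DAYS = 7, MAX_DRIVES = 5;

    // Returns, for each day, the index of the driver who rests that day,
    // or null if no roster satisfies the constraints.
    static int[] solve() {
        int[] resting = new int[DAYS];
        return assign(0, resting, new int[DRIVERS]) ? resting : null;
    }

    static boolean assign(int day, int[] resting, int[] drives) {
        // All days filled. Nobody drives more than MAX_DRIVES (< DAYS) times,
        // so every driver has at least one rest day.
        if (day == DAYS) return true;
        for (int off = 0; off < DRIVERS; off++) {
            // Prune: the two drivers who would work today must stay within the limit.
            boolean feasible = true;
            for (int d = 0; d < DRIVERS; d++) {
                if (d != off && drives[d] + 1 > MAX_DRIVES) feasible = false;
            }
            if (!feasible) continue;
            for (int d = 0; d < DRIVERS; d++) if (d != off) drives[d]++;
            resting[day] = off;
            if (assign(day + 1, resting, drives)) return true;
            for (int d = 0; d < DRIVERS; d++) if (d != off) drives[d]--;  // undo and try the next driver
        }
        return false;
    }
}
```

With only 3^7 = 2187 possible assignments, this finishes instantly. The dedicated techniques discussed above start to matter when the number of drivers, days and rules grows.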

ML technique for classification with probability estimates

I want to implement an OCR system. I need my program to not make any mistakes on the letters it does choose to recognize. It doesn't matter if it cannot recognize a lot of them (i.e. high precision even with low recall is okay).
Can someone help me choose a suitable ML algorithm for this? I've been looking around and found some confusing things. For example, I found contradictory statements about SVMs. In the scikit-learn docs, it was mentioned that we cannot get probability estimates for SVMs, whereas I found another post that says it is possible to do this in WEKA.
Anyway, I am looking for a machine learning algorithm that best suits this purpose. It would be great if you could suggest a library for the algorithm as well. I prefer Python-based solutions, but I am OK with working in Java as well.

It is possible to get probability estimates from SVMs in scikit-learn by simply setting probability=True when constructing the SVC object. The docs only warn that the probability estimates might not be very good.
The quintessential probabilistic classifier is logistic regression, so you might give that a try. Note that LR is a linear model though, unlike SVMs which can learn complicated non-linear decision boundaries by using kernels.

I've seen people using neural networks with good results, but that was already a few years ago. I asked an expert colleague and he said that nowadays people use things like nearest-neighbor classifiers.
I don't know scikit or WEKA, but any half-decent classification package should have at least k-nearest neighbors implemented. Or you can implement it yourself, it's ridiculously easy. Give that one a try: it will probably have lower precision than you want, however you can make a slight modification where instead of taking a simple majority vote (i.e. the most frequent class among the neighbors wins) you require larger consensus among the neighbors to assign a class (for example, at least 50% of neighbors must be of the same class). The larger the consensus you require, the larger your precision will be, at the expense of recall.
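The consensus idea described above could look roughly like the following Java sketch. It is an illustration only: the class name, the squared Euclidean distance and the `minConsensus` parameter are assumptions made for this example, and a real OCR system would work on image features rather than the toy vectors used here. Returning `null` means "refuse to classify".

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

class ConsensusKnn {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return sum;
    }

    // Returns the most frequent label among the k nearest training points,
    // or null when fewer than minConsensus (a fraction, e.g. 0.9) of them agree.
    static String classify(double[][] trainX, String[] trainY, double[] x,
                           int k, double minConsensus) {
        Integer[] idx = new Integer[trainX.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Sort training indices by distance to the query point.
        Arrays.sort(idx, Comparator.comparingDouble(i -> squaredDistance(trainX[i], x)));
        int n = Math.min(k, idx.length);
        Map<String, Integer> votes = new HashMap<>();
        String bestLabel = null;
        int bestVotes = 0;
        for (int j = 0; j < n; j++) {
            String label = trainY[idx[j]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestVotes) { bestVotes = count; bestLabel = label; }
        }
        return bestVotes >= minConsensus * n ? bestLabel : null;  // abstain if consensus is too weak
    }
}
```

Raising `minConsensus` makes the classifier abstain more often, which is the precision-for-recall trade described above.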

Learn backtracking algorithm

I want to learn the backtracking algorithm. Can someone please teach me some of it? I tried learning from some websites, but it didn't work. So can someone please teach me. Thank you!

Though language agnostic, this tutorial is nice and presents several examples that might provide the necessary intuition.
That said, the idea behind backtracking is not difficult to grasp at all. A backtracking algorithm essentially explores all the solution space just like when performing a brute force, except (and this makes it more efficient) it backtracks from a partial solution as soon as it realizes that it is not feasible.
An example
Consider this partial solution for the well known eight queens problem.
The queens in the first four columns have already been positioned, but the last one is in an invalid square. A brute force solution would continue placing queens for the rest of the columns, oblivious to the fact that regardless of how this partial solution is augmented the result will be invalid.
The backtracking algorithm will be "smarter": it will realize the fourth queen is incorrectly placed and "go back" to considering other squares for it.
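As a minimal sketch of that idea in Java (the class and method names are made up for this example), the following counts the solutions of the n-queens problem. A queen is only placed on a square that does not conflict with the queens already on the board, so an invalid partial solution is abandoned immediately instead of being extended.

```java
class EightQueens {
    // Counts all ways to place n non-attacking queens, one per column.
    static int countSolutions(int n) {
        return place(0, new int[n]);
    }

    // rowOf[c] is the row of the queen already placed in column c (for c < col).
    static int place(int col, int[] rowOf) {
        if (col == rowOf.length) return 1;          // every column has a queen: one full solution
        int count = 0;
        for (int row = 0; row < rowOf.length; row++) {
            if (isSafe(col, row, rowOf)) {          // otherwise skip this square: this is the pruning
                rowOf[col] = row;
                count += place(col + 1, rowOf);     // extend the partial solution
            }
        }
        return count;                               // returning here is the "going back" step
    }

    static boolean isSafe(int col, int row, int[] rowOf) {
        for (int c = 0; c < col; c++) {
            boolean sameRow = rowOf[c] == row;
            boolean sameDiagonal = Math.abs(rowOf[c] - row) == col - c;
            if (sameRow || sameDiagonal) return false;
        }
        return true;
    }
}
```

`EightQueens.countSolutions(8)` returns 92, the known number of solutions for the eight queens problem.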

Fundamentals of Computer Algorithms contains a nice chapter on backtracking. But you have not specified how familiar you are with formal algorithm texts and data structures. You may have some problems reading this book if you are not familiar with basic algorithmic concepts like complexity analysis, or don't know what a tree is. In that case you will need to read the book from the beginning; jumping directly to the backtracking chapter will not be much help.

Random maps/graphs and OSM

Just wondering if you have any suggestions here. I need a lot of sample maps/graphs to test my shortest path search solution (I was told I should have >100 of them). My code is supposed to work in a simulator, which uses OpenStreetMap maps of urban settings, limiting the total number of junctions to a few thousand. The problem is, there are only two or three maps provided with the simulator. The way I see it, I have a few choices here:
1) Write my own random graph generator. Possibly lots of work (do you think? I've never done it before) and reinventing the wheel.
2) Use an off-the-shelf solution. I'm not aware of any that would generate map-like graphs for me (well, at least I didn't find one in JUNG :-) )
3) Grab them from OSM in some automated way. I don't really intend to go and pick out 100+ urban maps myself that would satisfy the <15000 nodes requirement. I don't think that would be easy to automate either, though.
I would assume that 3 would be tough to do. Any advice on some off-the-shelf solution? or comments about writing my own? I'm not an experienced programmer by any measure, but given a few days.

The first thought:
You have a known problem and you need to test its solution. Generate lots of test data, find the correct solutions with a verified algorithm, then run your algorithm against the generated data set and compare the results. (Or just download a verified implementation of Dijkstra's algorithm; I assume that implementing this algorithm is your task.)
The second thought:
A randomly generated data set is not the best way to test algorithms. You need to think about the cases in which your algorithm can fail and create corresponding tests. For example: a graph with 1 node, a graph with cycles, a linear graph, i.e. N1---N2---N3-...-Nn, and a complete graph with the maximum number of nodes. I think if you create these 4 tests and 2-3 small random tests, it'll be enough to be sure that your algorithm is implemented correctly.
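Both thoughts can be combined in a small Java test harness, sketched below. Everything in it is an assumption made for illustration: the class name, the adjacency-matrix representation with 0 meaning "no edge", the edge weights 1..9, and the use of a simple O(n^2) Dijkstra as the reference implementation. The generated graphs are random connected graphs, not realistic street maps.

```java
import java.util.Arrays;
import java.util.Random;

class ShortestPathTestData {
    // Random connected undirected graph: a random spanning tree plus extra edges.
    static int[][] randomGraph(int n, int extraEdges, long seed) {
        Random rnd = new Random(seed);
        int[][] w = new int[n][n];
        for (int v = 1; v < n; v++) {               // attach each node to an earlier one
            int u = rnd.nextInt(v);
            w[u][v] = w[v][u] = 1 + rnd.nextInt(9);
        }
        for (int e = 0; e < extraEdges; e++) {      // extra edges create cycles
            int u = rnd.nextInt(n), v = rnd.nextInt(n);
            if (u != v) w[u][v] = w[v][u] = 1 + rnd.nextInt(9);
        }
        return w;
    }

    // Edge case from the list above: N1---N2---...---Nn with unit weights.
    static int[][] lineGraph(int n) {
        int[][] w = new int[n][n];
        for (int i = 0; i + 1 < n; i++) w[i][i + 1] = w[i + 1][i] = 1;
        return w;
    }

    // Reference implementation to compare your own algorithm against.
    static int[] dijkstra(int[][] w, int source) {
        int n = w.length;
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        boolean[] done = new boolean[n];
        dist[source] = 0;
        for (int iter = 0; iter < n; iter++) {
            int u = -1;                             // closest node that is not finished yet
            for (int v = 0; v < n; v++) {
                if (!done[v] && (u == -1 || dist[v] < dist[u])) u = v;
            }
            if (dist[u] == Integer.MAX_VALUE) break;  // the rest is unreachable
            done[u] = true;
            for (int v = 0; v < n; v++) {
                if (w[u][v] > 0 && dist[u] + w[u][v] < dist[v]) dist[v] = dist[u] + w[u][v];
            }
        }
        return dist;
    }
}
```

A test run would then loop over many seeds, call `randomGraph`, and compare the output of the algorithm under test with `dijkstra` for each graph, in addition to the hand-written edge cases.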

What do I need to know about dynamic programming?

I've started solving UVa problems again as a way to pass the time (I'm going to the army in 6 weeks). I love writing Java, but I end up using C / C++. That's not because IO is faster, data doesn't need boxing, there is more memory, or unsigned types are available; it's because algorithm efficiency is what counts.
In short, I am slowly constructing a how-to/article/code base for different categories of efficient algorithms, and DP is next.
Quoting Mark Twain: It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so.
I need assistance in building a priority list of the must-have efficient algorithms.

This MIT lecture is a good introduction to dynamic programming if you're already familiar with algorithms.

The Wikipedia article on Dynamic Programming has a section entitled "Algorithms that use dynamic programming" with many examples.
Here is another good list of practice problems in dynamic programming.
Since you referenced the UVa problem list, you should definitely take a look at Problem 103 - Stacking Boxes. The problem lends itself well to a solution using a Longest Increasing Subsequence algorithm.
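For reference, a minimal Java sketch of the classic O(n^2) dynamic programming solution to Longest Increasing Subsequence on plain integers is shown below (the class and method names are made up for this example). It is not a full solution to Stacking Boxes, which would additionally need the boxes to be sorted and compared dimension by dimension.

```java
class LongestIncreasingSubsequence {
    // Returns the length of the longest strictly increasing subsequence of a.
    static int length(int[] a) {
        // best[i] = length of the longest increasing subsequence that ends at a[i]
        int[] best = new int[a.length];
        int answer = 0;
        for (int i = 0; i < a.length; i++) {
            best[i] = 1;                            // a[i] on its own
            for (int j = 0; j < i; j++) {
                // Reuse the already computed subproblem best[j] if a[i] can extend it.
                if (a[j] < a[i] && best[j] + 1 > best[i]) best[i] = best[j] + 1;
            }
            answer = Math.max(answer, best[i]);
        }
        return answer;
    }
}
```

For the input `{10, 9, 2, 5, 3, 7, 101, 18}` the method returns 4 (for example 2, 5, 7, 101). Reusing `best[j]` instead of recomputing it is what makes this dynamic programming.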
