I am implementing an Android application that works with different geographic coordinates, and I need to solve a problem similar to the traveling salesman problem.
I found an implementation of the algorithm at http://www.theprojectspot.com/tutorial-post/simulated-annealing-algorithm-for-beginners/6.
I adjusted the code to what I need, and it produces theoretically optimal results. I noticed, however, that each execution produces a different result.
I went back to the original code and found that even the original gives results that vary from run to run.
I don't understand. Shouldn't the result be unique? After all, we are looking for the shortest path ... perhaps some small variation would be expected, but each execution differs from the previous one by several units.
How could I adjust the algorithm to produce the same result in every run? Has anyone worked with this?
That's the price you pay for an algorithm like this one: the results obtained might very well be different every time. The algorithm does not "find the shortest path," which is a computationally intractable problem ("travelling salesman"). Instead, it seeks to quickly find a solution that is "short enough." Whether or not it actually does so depends very much on the data ... and, to a non-trivial degree, on random chance.
And, since the algorithm is comparatively fast, sometimes you do run it several times in a row in order to gauge the variability of the solutions obtained. If (say) three runs each produce results that are "close enough" to one another, there's a good chance that the result is reliable. But if the standard deviation is very large, the algorithm might not be giving you a good answer. (Bear in mind that sometimes the solution will be wrong.)
So to speak: "you get what you pay for, but you don't pay much for it, and of course that is the point."
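If you do want repeatable runs, the usual approach is to seed the random number generator and, as described above, run the solver a few times to see how much the answers spread. A minimal sketch in Java, where solveTsp is a hypothetical stand-in for the tutorial's annealing loop:

    import java.util.Random;

    public class AnnealingVariability {

        // Hypothetical stand-in for the annealing solver: it should return the
        // best tour length found, drawing all random choices from the given rng
        // (never from Math.random(), which cannot be seeded).
        static double solveTsp(double[][] dist, Random rng) {
            // ... simulated annealing loop goes here ...
            return rng.nextDouble(); // placeholder result
        }

        public static void main(String[] args) {
            double[][] dist = new double[20][20]; // placeholder distance matrix

            int runs = 5;
            double sum = 0, sumSq = 0;
            for (int i = 0; i < runs; i++) {
                // A fixed seed per run makes that run repeatable; pass the same
                // seed every time if you want identical results across runs.
                double best = solveTsp(dist, new Random(1000 + i));
                sum += best;
                sumSq += best * best;
            }
            double mean = sum / runs;
            double stdDev = Math.sqrt(sumSq / runs - mean * mean);
            System.out.printf("mean tour length %.2f, std dev %.2f%n", mean, stdDev);
        }
    }

If a few runs land close together, the answer is probably good enough; a large standard deviation is a sign you should tune the cooling schedule or run longer.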
Related
Since we expect a feasible solution from a genetic algorithm, will a genetic algorithm produce different output every time with the same input set?
It depends.
GAs will take different routes through the solution space; and
GAs are not guaranteed to converge
GAs are typically used on complex problems with large solution spaces, with convoluted payoffs and local minima. In such problems you would not expect identical outputs at the end of a GA run.
On the other hand, GAs may be applied to large problems that have a single correct answer. In such situations, the population may converge.
A good GA implementation will support reproducible results. This applies to all metaheuristics (not just GAs). Reproducible means that two runs on the same input will find the same set of new best solutions, in the same order. Depending on the actual CPU time given to the process, the number of iterations might differ between runs, so they might not end with the same best solution.
Internally, reproducibility implies that:
everything uses a single seeded Random instance.
even parallel implementations yield reproducible results (=> no work stealing)
...
During development, reproducibility is worth its weight in gold to find, diagnose, debug and fix bugs.
In production, a few companies turn it off (to take advantage of performance gains such as work stealing), but most enterprises still like to leave it on.
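For illustration, a minimal sketch of the "one seeded Random instance" rule in Java (the class and method names here are invented; the point is that every operator draws from the same Random):

    import java.util.Random;

    public class ReproducibleSolver {
        private final Random random;

        public ReproducibleSolver(long seed) {
            this.random = new Random(seed); // the only Random in the whole solver
        }

        // Every operator takes its randomness from that single instance,
        // never from Math.random() or a Random it creates itself.
        int[] swapMutation(int[] genes) {
            int[] copy = genes.clone();
            int i = random.nextInt(copy.length);
            int j = random.nextInt(copy.length);
            int tmp = copy[i];
            copy[i] = copy[j];
            copy[j] = tmp;
            return copy;
        }

        boolean acceptWorse(double fitnessLoss, double temperature) {
            return random.nextDouble() < Math.exp(-fitnessLoss / temperature);
        }
    }

Two solvers constructed with the same seed and fed the same input will then report the same sequence of new best solutions.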
I have a few questions about my genetic algorithm and GAs overall.
I have created a GA that, given points on a curve, tries to figure out what function produced that curve.
An example is the following:
Points
{{-2, 4},{-1, 1},{0, 0},{1, 1},{2, 4}}
Function
x^2
Sometimes I will give it points for which it never produces a function, and sometimes it does produce one. It can even depend on how deep the initial trees are.
Some questions:
Why does the tree depth matter in trying to evaluate the points and produce a satisfactory function?
Why do I sometimes get premature convergence, with the GA never breaking out of the cycle?
What can I do to prevent a premature convergence?
What about annealing? How can I use it?
Can you take a quick look at my code and tell me if anything is obviously wrong with it? (This is test code, I need to do some code clean up.)
https://github.com/kevkid/GeneticAlgorithmTest
Source: http://www.gp-field-guide.org.uk/
EDIT:
Looks like Thomas's suggestions worked well: I get very fast results and less premature convergence. I feel like increasing the gene pool gives better results, but I am not exactly sure whether it is actually getting better with every generation or whether the randomness simply allows it to find a correct solution.
EDIT 2:
Following Thomas's suggestions I was able to get it to work properly. It seems I had an issue with selecting survivors and with expanding my gene pool. I also recently added constants to my GA test, if anyone else wants to look at it.
In order to avoid premature convergence you can also use multiple subpopulations. Each subpopulation evolves independently, and at the end of each generation you can exchange some individuals between subpopulations.
I did an implementation with multiple subpopulations for a Genetic Programming variant: http://www.mepx.org/source_code.html
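A rough sketch of the migration step (often called an island model); the individual type is left generic since it depends on your representation:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    class IslandModel {
        // Each sub-population ("island") evolves independently; at the end of a
        // generation a few randomly chosen individuals move to the next island
        // in a ring, which keeps diversity up without merging the populations.
        static <T> void migrate(List<List<T>> islands, int migrantsPerIsland, Random rng) {
            List<List<T>> outgoing = new ArrayList<>();
            for (List<T> island : islands) {
                List<T> migrants = new ArrayList<>();
                for (int m = 0; m < migrantsPerIsland && !island.isEmpty(); m++) {
                    migrants.add(island.remove(rng.nextInt(island.size())));
                }
                outgoing.add(migrants);
            }
            for (int i = 0; i < islands.size(); i++) {
                islands.get((i + 1) % islands.size()).addAll(outgoing.get(i));
            }
        }
    }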
I don't have the time to dig into your code but I'll try to answer from what I remember on GAs:
Sometimes I will give it points for which it never produces a function, and sometimes it does produce one. It can even depend on how deep the initial trees are.
I'm not sure what the question is here, but if you need a result you could try selecting the function that provides the least distance to the given points (this could be the sum, the mean, the number of points, etc., depending on your needs).
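For instance, a sum-of-squared-errors fitness over the given points could look like this (CandidateFunction is a hypothetical stand-in for however you evaluate an evolved expression tree):

    // Hypothetical interface for an evolved expression tree.
    interface CandidateFunction {
        double eval(double x);
    }

    class Fitness {
        // Lower is better: summed squared error against the target points,
        // where each point is {x, y}.
        static double sumSquaredError(CandidateFunction f, double[][] points) {
            double error = 0;
            for (double[] p : points) {
                double diff = f.eval(p[0]) - p[1];
                error += diff * diff;
            }
            return error;
        }
    }

With the points from the question, a candidate equivalent to x^2 would score 0, and every other candidate can still be ranked even when no exact function exists.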
Why does the tree depth matter in trying to evaluate the points and produce a satisfactory function?
I'm not sure what tree depth you mean but it could affect two things:
accuracy: i.e. the higher the depth, the more accurate the solution might be, or the more possibilities for mutations there are
performance: depending on what tree you mean a higher depth might increase performance (allowing for more educated guesses on the function) or decrease it (requiring more solutions to be generated and compared).
Why do I sometimes get premature convergence, with the GA never breaking out of the cycle?
That might be due to too little mutation. If you have a set of solutions that all converge around a local optimum, slight mutations alone might not move the resulting solutions far enough away from that local optimum to break out.
What can I do to prevent a premature convergence?
You could allow for bigger mutations, e.g. when solutions start to converge. Alternatively you could throw entirely new solutions into the mix (think of it as "immigration").
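The "immigration" idea can be as simple as the sketch below; the Supplier producing fresh random individuals is whatever you already use to build the initial population:

    import java.util.Comparator;
    import java.util.List;
    import java.util.function.Supplier;

    class Immigration {
        // Replace the worst fraction of the population with brand-new random
        // individuals whenever the population starts looking too uniform.
        static <T> void injectImmigrants(List<T> population,
                                         Comparator<T> bestFirst,
                                         Supplier<T> randomIndividual,
                                         double fraction) {
            population.sort(bestFirst);
            int replace = (int) (population.size() * fraction);
            for (int i = 0; i < replace; i++) {
                population.set(population.size() - 1 - i, randomIndividual.get());
            }
        }
    }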
What about annealing? How can I use it?
Annealing could be used to gradually improve your solutions once they start to converge on a certain point/optimum, i.e. you'd improve the solutions in a more controlled way than "random" mutations.
You can also use it to break out of a local optimum, depending on how those optima are distributed. For example, you could run your GA until solutions start to converge, then use annealing and/or larger mutations and/or completely new solutions (you could generate several sets of solutions with different approaches and compare them at the end), create your new population, and, if the convergence is broken, start a new GA iteration. If the solutions still converge at the same optimum, you can stop, since no bigger improvement is to be expected.
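A sketch of how a temperature could steer the GA's mutation step; the names here (mutationStep, coolingRate) are made up for illustration:

    class AnnealedMutation {
        // Temperature starts high and decays each generation; the mutation
        // operator uses it to scale how far a mutation may move a solution.
        private double temperature = 1.0;
        private final double coolingRate = 0.95;

        double mutationStep(double maxStep) {
            return maxStep * temperature; // large early on, small once converging
        }

        // Optionally accept a slightly worse offspring with a probability that
        // shrinks as the temperature drops, which can help hop out of a local optimum.
        boolean acceptWorse(double fitnessLoss, java.util.Random rng) {
            return rng.nextDouble() < Math.exp(-fitnessLoss / temperature);
        }

        void coolDown() {
            temperature *= coolingRate;
        }
    }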
Besides all that, heuristic algorithms may still hit a local optimum but that's the tradeoff they provide: performance vs. accuracy.
I am creating a website (my academic project) on which a user can upload program files (.cs, .php, .java); the site then compiles the program and reports its time and space complexity automatically. Is this possible? How can we calculate the complexity of a program? Is there any Java code for finding the complexity of a program, or can we get it from the compiler itself?
Determining the time and space complexity of a program is a hard problem. As the comments have pointed out, it is not even possible in general to determine whether a program will terminate. (This is known as the Halting Problem.)
To make a start with your project, I would advise looking into Cyclomatic Complexity, which is calculated, for example, by the GMetrics project.
This will get you started in your exploration of the subject matter.
Assuming you know what kind of input to give the program, you can estimate the complexity through successive iterations of actually running the program. However, general case static analysis is impossible, due to the Halting Problem.
If you can run the application multiple times on input sets of various sizes, you can develop an approximation.
In the classic case of sorting numbers, you can have the application sort a list of 2 numbers, then 4, 8, 16, 32, etc... and essentially graph the memory and time requirements for each run. Basic curve fitting will show you the growth in complexity.
Note that this is not rigorously accurate, as certain algorithms may have points at which their performance changes radically. Such a system may also get fooled by the differences between growths that "look" similar, but have vastly different properties, such as asymptotic and logarithmic curves, or exponential and polynomial curves.
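As a concrete example of the sorting case, a crude measurement loop looks like this (timings will be noisy; in practice you would warm up the JVM and average several runs per size):

    import java.util.Arrays;
    import java.util.Random;

    public class GrowthProbe {
        public static void main(String[] args) {
            Random rng = new Random(1);
            for (int n = 2; n <= (1 << 20); n *= 2) {
                int[] data = rng.ints(n).toArray();
                long start = System.nanoTime();
                Arrays.sort(data);
                long elapsedNanos = System.nanoTime() - start;
                // Plot n against elapsedNanos (and memory, if measured) and fit
                // a curve; roughly n*log(n) growth is expected for this sort.
                System.out.println(n + "\t" + elapsedNanos);
            }
        }
    }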
I've run the implementation available at http://www.apl.jhu.edu/~hall/java/NQueens.java, which solves the N-queens problem with O(n) time complexity. It's amazingly fast and finds one solution without searching. However, I'm not really clear about the logic behind it.
Why do they split the problem into three cases: odd, even (but not of the form 6k), and even (but not of the form 6k+2)?
Can anyone check the code and explain the logic in more detail for me (logic only)?
They split the problem because neither construction covers all cases. Probably if you try to prove that they work in the bad cases, you'll find that a certain number is not a unit modulo n. This is a pretty typical state of affairs when constructing constrained combinatorial objects. For example, there exist Steiner triple systems of orders 6k+1 and 6k+3, but the two residues mod 6 require different constructions.
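If you want to convince yourself that each branch of the construction really works, a small checker for permutation-style placements (queens[row] = column of the queen in that row) is enough; this only verifies a placement, it is not the construction itself:

    class NQueensCheck {
        // Returns true if no two queens share a column or a diagonal.
        // (Rows are distinct by construction: one queen per row.)
        static boolean isValid(int[] queens) {
            for (int i = 0; i < queens.length; i++) {
                for (int j = i + 1; j < queens.length; j++) {
                    if (queens[i] == queens[j]) return false;                   // same column
                    if (Math.abs(queens[i] - queens[j]) == j - i) return false; // same diagonal
                }
            }
            return true;
        }
    }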
There are many tools for code quality. But sometimes you need to gain performance even when the code does not conform to code-quality rules. Is there an open source tool for this?
Thanks.
There's no tool for that, but you can try out jVisualVM.
http://download.oracle.com/javase/6/docs/technotes/tools/share/jvisualvm.html
It usually comes with your JDK, e.g. C:\Program Files\Java\jdk1.6.0_21\bin.
No tool is going to tell you performance and quality. Both are hard to measure.
You can certainly use something like FindBugs or IntelliJ's Inspector to examine your code, but they'll just look for rule violations. I'm not aware of a tool that will point out when I've written code that performs badly. How will a Java code inspector know that your database has no indexes?
I can't answer you regarding code quality. Others can. But when you need to gain performance, I would rather tell you how to do it than tell you what tools to use.
There are tools, but more important than tools is understanding what you're doing.
The most important is to understand that measuring doesn't tell you what to fix to get higher performance; it only tells you how much improvement you got.
The way to improve performance is to find activities, whatever they are, that account for a significant fraction of time and can be improved.
Measuring is not finding.
Example:
I can manually sample the state of a program, several times, and see that much of the time it is doing container class manipulations, like fetching elements, testing for end conditions, etc.
(That's the finding part.)
This can be happening in many different places in the code, so no particular routine appears to be causing a large fraction of time to be spent.
There is no particular hotspot or obvious bottleneck.
There is no "bad algorithm" or "slow routine", the kinds of thing people say they look for.
Nevertheless, I can see in those few samples that it is doing container class operations, and I can see exactly where.
If I can replace those container class operations with something else that accomplishes the same purpose, I can save time.
How much time? Up to roughly the fraction of time I saw those operations happening, and that can be quite large.
The real payoff for doing this is there can be multiple issues.
Suppose issue A costs 40% of the time, B costs 20%, and C costs 10%,
and the total time is, say, 10 seconds.
You go after A, the most obvious one.
Fixing it reduces time to about 6 seconds. (Speedup 10/6 = 1.67).
Then problem B takes a larger percent of time (2/6 = .33) so it is easier to find with samples.
Fixing it reduces time to 4 seconds (Speedup 6/4 = 1.5)
Then C is (1/4 = 25%) and is much easier to find than before.
Removing it reduces time to 3 seconds (Speedup 4/3 = 1.33).
The total speedup factor is 10/3 = 3.33.
You can look at it as the compounded product of each speedup: 10/6 * 6/4 * 4/3 = 10/3.
Now I'm dealing in numbers here, but none of these had to be measurements of time spent in localized pieces of code.
They were just rough estimates gotten from describing what was happening in a small number of detailed samples of what the program was doing.
The samples aren't really concerned with measuring.
They are concerned with exposing the problems.
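If you want to automate the "pause and look" part slightly, even something this crude works in Java (jstack or a debugger's pause button does the same job by hand):

    public class CrudeSampler {
        // Periodically prints the stack of one target thread. Reading a handful
        // of these dumps is the "finding" described above: if most of them show
        // the same kind of work, that work accounts for most of the time.
        static void sample(Thread target, int samples, long intervalMillis)
                throws InterruptedException {
            for (int i = 0; i < samples; i++) {
                Thread.sleep(intervalMillis);
                System.out.println("--- sample " + i + " ---");
                for (StackTraceElement frame : target.getStackTrace()) {
                    System.out.println("    " + frame);
                }
            }
        }
    }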