Will a genetic algorithm provide different output every time? - java

Since we expect a feasible solution from a genetic algorithm, will it produce different output every time it is run on the same input set?

It depends:
GAs will take different routes through the solution space; and
GAs are not guaranteed to converge.
GAs are typically used on complex problems with large solution spaces, convoluted payoffs, and local minima. In such problems you would not expect identical outputs at the end of a GA run.
On the other hand, GAs may be applied to large problems that have a single correct answer. In such situations, the population may converge.

A good GA implementation will support reproducible results. This applies to all metaheuristics (not just GAs). Reproducible means that repeating the same run finds the same set of new-best-solution events in the same order. Depending on the actual CPU time given to the process, the number of iterations might differ, so two runs might still not end with the same best solution.
Internally, reproducible results imply that:
everything uses one seeded Random instance;
even parallel implementations yield reproducible results (=> no work stealing);
...
During development, reproducibility is worth its weight in gold for finding, diagnosing, and fixing bugs.
In production, a few companies turn it off (to take advantage of performance gains such as work stealing), but most enterprises still like to leave it on.
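As a minimal illustration of the "one seeded Random instance" rule, a hedged sketch in Java (the class and method names are illustrative, not taken from any particular GA library):

```java
import java.util.Random;

public class ReproducibleSolver {

    private final Random random;

    // The same seed always produces the same sequence of random draws,
    // so two runs on identical input with the same iteration count find
    // the same new-best-solution events in the same order.
    public ReproducibleSolver(long seed) {
        this.random = new Random(seed);
    }

    // Every stochastic decision (selection, crossover point, mutation)
    // must go through this single instance; creating extra Random objects
    // or calling Math.random() breaks reproducibility.
    public int pickCrossoverPoint(int chromosomeLength) {
        return random.nextInt(chromosomeLength);
    }

    public boolean shouldMutate(double mutationRate) {
        return random.nextDouble() < mutationRate;
    }
}
```

Two solvers constructed with the same seed and driven through the same sequence of calls will make identical decisions.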

Related

Shortest path using Simulated Annealing in an Android app

I am implementing an Android application that works with a set of geographic coordinates, and I need to solve a problem similar to the traveling salesman problem.
I found an implementation of the algorithm at http://www.theprojectspot.com/tutorial-post/simulated-annealing-algorithm-for-beginners/6.
I adjusted the code to what I need, and in theory it produces good results. I noticed, however, that each execution produces a different result.
I went back to the original code and found that even there, the results disagree between runs.
I do not understand. Shouldn't the result be unique? After all, we are looking for the shortest path ... perhaps some small variation is expected, but each execution differs by several units from the previous one.
How could I adjust the algorithm to produce the same result in all runs? Has anyone worked with this?
That's the price you pay for an algorithm like this one: the results obtained might very well be different every time. The algorithm does not "find the shortest path," which is a computationally intractable problem ("travelling salesman"). Instead, it seeks to quickly find a solution that is "short enough." Whether or not it actually does so depends very much on the data ... and, to a non-trivial degree, on random chance.
And, since the algorithm is comparatively fast, sometimes you do run it several times in a row in order to gauge the variability of the solutions obtained. If (say) three runs each produce results that are "close enough" to one another, there's a good chance that the result is reliable. But if the standard deviation is very large, the algorithm might not be giving you a good answer. (Bear in mind that sometimes the solution will be wrong.)
So to speak: "you get what you pay for, but you don't pay much for it, and of course that is the point."
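If you want to do that repeated-runs check programmatically, here is a minimal sketch; solveOnce() is a placeholder for one full run of the simulated-annealing code from the tutorial, returning the best tour distance it found:

```java
import java.util.ArrayList;
import java.util.List;

public class VariabilityCheck {

    // Placeholder for one complete simulated-annealing run; in the linked
    // tutorial this would be the final best tour distance.
    static double solveOnce() {
        return Math.random() * 100; // hypothetical result
    }

    public static void main(String[] args) {
        int runs = 10;
        List<Double> results = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            results.add(solveOnce());
        }

        double mean = results.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        double variance = results.stream()
                .mapToDouble(d -> (d - mean) * (d - mean))
                .average().orElse(0);
        double stdDev = Math.sqrt(variance);

        // A small standard deviation relative to the mean suggests the runs
        // agree; a large one suggests the answer is not reliable.
        System.out.printf("mean = %.2f, stddev = %.2f%n", mean, stdDev);
    }
}
```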

ArrayList or Multiple LinkedHashMap

I have an ArrayList of a custom object A. I need to retrieve 2 variables from A based on certain conditions. Should I simply use a for loop to retrieve the data from the list each time, or create 2 LinkedHashMaps and store the required variables in them as key/value pairs for faster access later? Which is more efficient? Does the faster lookup justify creating 2 additional map objects?
The list will contain about 100-150 objects, and so will the two maps.
It will be used by concurrent users on a daily basis.
Asking about "efficiency" is like asking about "beauty". What is "efficiency"? I argue that efficiency is what gets the code out soonest without bugs or other misbehavior. What's most efficient in terms of software costs is what saves programmer time, both for initial development and maintenance. In the time it took you to find "answers" on SO, you could have had a correct implementation coded and correct, and still had time to test your alternatives rigorously under controlled conditions to see which made any difference in the program's operation.
If you save 10 ms of program run time at the cost of horridly complex, over-engineered code that is rife with bugs and stupidly difficult to refactor or fix, is that "efficient"?
Furthermore, as phrased, the question is useless on SO. You provided no definition of "efficient" from your context. You provided no information on how the structures in question fit into your project architecture, or the extent of their use, or the size of the problem, or anything else relevant to any definition of "efficiency".
Even if you had, we'd have no more ability to answer such a question than if you asked a roomful of lawyers, "Should I sue so-and-so for what they did?" It all depends. You need advice, if you need advice at all, that is very specific to your situation and the exact circumstances of your development environment and process, your runtime environment, your team, the project goals, budget, and other relevant data.
If you are interested in runtime "efficiency", do the following. Precisely define what exactly you mean by "efficient", including an answer to "how 'efficient' is 'efficient' enough?", and including criteria to measure such "efficiency". Once you have such a precise and (dis)provable definition, then set up a rigorous test protocol to compare the alternatives in your context, and actually measure "efficiency".
When defining "efficiency", make sure that what you define matters. It makes no difference to be "efficient" in an area that has very low project cost or impact, and ignore an area that has huge cost or impact.
Don't expect any meaningful answer for your situation here on SO.
Use a LinkedHashMap, because it is made for key/value pairs (which matches your requirement) and because the data will grow in a production environment.
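For what it's worth, the two options look roughly like this (the class A, its fields, and the key type are hypothetical stand-ins for your actual object); at 100-150 elements the measurable difference is usually negligible, which is the point made above. A LinkedHashMap would work the same way as the HashMap here and additionally preserve insertion order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class A {
    String key;     // hypothetical field you search on
    String value1;  // the two variables you want to retrieve
    String value2;
}

public class LookupOptions {

    // Option 1: linear scan over the list, O(n) per lookup.
    static A findByLoop(List<A> list, String key) {
        for (A a : list) {
            if (a.key.equals(key)) {
                return a;
            }
        }
        return null;
    }

    // Option 2: build a map once, then each lookup is O(1) on average.
    static Map<String, A> buildIndex(List<A> list) {
        Map<String, A> index = new HashMap<>();
        for (A a : list) {
            index.put(a.key, a);
        }
        return index;
    }
}
```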

How to decide what the relevant set is in a precision and recall computation?

Two of the best-known measurements for an information retrieval system are its precision and recall. For both, we need to know the total set of relevant documents and compare it with the documents that the system has returned. My question is: how can we find this superset of relevant documents in the following scenario?
Consider an academic search engine whose job is to accept the full title of an academic paper and, based on some algorithms, return a list of relevant papers. To judge whether the system is accurate, we wish to calculate its precision and recall. But we do not know how to produce the set of relevant papers (the ones the search engine should return for each user's query) that is needed to compute precision and recall.
Most relevant-document sets used for evaluating a system are built by showing documents to a user (a real person) and collecting their judgments.
Non-human evaluation:
You could probably come up with a "fake" evaluation in your particular instance. I would expect that the highest ranked paper for the paper "Variations in relevance judgments and the measurement of retrieval effectiveness" [1] would be that paper itself. So you could take your data and create an automatic evaluation. It wouldn't tell you if your system was really finding new things (which you care about), but it would tell you if your system was terrible or not.
e.g. If you were at a McDonald's, and you asked a mapping system where the nearest McDonald's was, and it didn't find the one you were at, you would know this to be some kind of system failure.
For a real evaluation:
You come up with a set of queries, and you judge the top K results from your systems for each of them. In practice, you can't look at all of the millions of papers for each query -- so you approximate the true recall set by the set of relevant documents you currently know about. This is why it's important to have some diversity in the systems you're pooling. Relevance is tricky; people disagree a lot on what documents are relevant to a query.
In your case: people will disagree about what papers are relevant to another paper. But that's largely okay, since they will mostly agree on the obvious ones.
Disagreement is okay if you're comparing systems:
This paradigm only makes sense if you're comparing different information retrieval systems. It doesn't help you understand how good a single system is, but it helps you determine if one system is better than the other reliably [1].
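Once you have judged the pooled results, computing precision and recall for a single query is simple set arithmetic; a small sketch with hypothetical document ids:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PrecisionRecall {

    public static void main(String[] args) {
        // Documents the system returned for one query (hypothetical ids).
        Set<String> retrieved = new HashSet<>(Arrays.asList("d1", "d2", "d3", "d4"));
        // Documents judged relevant for that query (hypothetical ids).
        Set<String> relevant = new HashSet<>(Arrays.asList("d2", "d3", "d7"));

        // Relevant documents that were actually retrieved.
        Set<String> hits = new HashSet<>(retrieved);
        hits.retainAll(relevant);

        double precision = (double) hits.size() / retrieved.size(); // 2/4 = 0.50
        double recall = (double) hits.size() / relevant.size();     // 2/3 ~ 0.67

        System.out.printf("precision = %.2f, recall = %.2f%n", precision, recall);
    }
}
```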
[1] Voorhees, Ellen M. "Variations in relevance judgments and the measurement of retrieval effectiveness." Information processing & management 36.5 (2000): 697-716.

Genetic Algorithm - convergence

I have a few questions about my genetic algorithm and GAs overall.
I have created a GA that when given points to a curve it tries to figure out what function produced this curve.
An example is the following
Points
{{-2, 4},{-1, 1},{0, 0},{1, 1},{2, 4}}
Function
x^2
Sometimes I give it points for which it never produces a function, and sometimes it does produce one. Whether it succeeds can even depend on how deep the initial trees are.
Some questions:
Why does the tree depth matter in trying to evaluate the points and produce a satisfactory function?
Why do I sometimes get premature convergence and the GA never breaks out of the cycle?
What can I do to prevent a premature convergence?
What about annealing? How can I use it?
Can you take a quick look at my code and tell me if anything is obviously wrong with it? (This is test code, I need to do some code clean up.)
https://github.com/kevkid/GeneticAlgorithmTest
Source: http://www.gp-field-guide.org.uk/
EDIT:
Looks like Thomas's suggestions worked well: I get results very fast, and less premature convergence. I feel like increasing the gene pool gives better results, but I am not exactly sure whether it is actually improving over each generation or whether the randomness alone allows it to find a correct solution.
EDIT 2:
Following Thomas's suggestions I was able to get it to work properly; it seems I had an issue with selecting survivors and with expanding my gene pool. I also recently added constants to my GA, in case anyone else wants to look at it.
In order to avoid premature convergence you can also use multiple sub-populations. Each sub-population evolves independently, and at the end of each generation you can exchange some individuals between sub-populations.
I did an implementation with multiple sub-populations for a Genetic Programming variant: http://www.mepx.org/source_code.html
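As a rough, hedged sketch of the migration step in such a scheme (the individual representation, ring topology, and migration count are illustrative choices, not taken from the linked implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class IslandMigration {

    // At the end of a generation, move a few individuals from each
    // sub-population to its neighbour (ring topology), so good genetic
    // material spreads without the whole population converging at once.
    static void migrate(List<List<double[]>> islands, int migrantsPerIsland, Random random) {
        int n = islands.size();
        List<List<double[]>> outgoing = new ArrayList<>();

        // First pick the emigrants from every island...
        for (List<double[]> island : islands) {
            List<double[]> batch = new ArrayList<>();
            for (int m = 0; m < migrantsPerIsland && !island.isEmpty(); m++) {
                batch.add(island.remove(random.nextInt(island.size())));
            }
            outgoing.add(batch);
        }

        // ...then hand each batch to the next island in the ring.
        for (int i = 0; i < n; i++) {
            islands.get((i + 1) % n).addAll(outgoing.get(i));
        }
    }
}
```

A real implementation might emigrate the best individuals rather than random ones; the ring topology is only one of several common choices.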
I don't have the time to dig into your code but I'll try to answer from what I remember on GAs:
Sometimes I give it points for which it never produces a function, and sometimes it does produce one. Whether it succeeds can even depend on how deep the initial trees are.
I'm not sure what the question is here, but if you need a result you could select the function that has the smallest distance to the given points (the distance could be a sum, a mean, a count of matching points, etc., depending on your needs).
Why does the tree depth matter in trying to evaluate the points and produce a satisfactory function?
I'm not sure what tree depth you mean but it could affect two things:
accuracy: i.e. the higher the depth, the more accurate the solution might be, or the more possibilities for mutation it offers
performance: depending on what tree you mean a higher depth might increase performance (allowing for more educated guesses on the function) or decrease it (requiring more solutions to be generated and compared).
Why do I sometimes get premature convergence and the GA never breaks out of the cycle?
That might be due to too little mutation. If you have a set of solutions that all cluster around a local optimum, slight mutations might not move the resulting solutions far enough away from that optimum to break out.
What can I do to prevent a premature convergence?
You could allow for bigger mutations, e.g. when solutions start to converge. Alternatively you could throw entirely new solutions into the mix (think of it as "immigration").
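A hedged sketch of that immigration idea (the individual representation, the moment you trigger it, and the replacement policy are all placeholders):

```java
import java.util.List;
import java.util.Random;

public class Immigration {

    // When the population has converged (e.g. best and average fitness are
    // nearly equal), replace a fraction of it with brand-new random
    // individuals to restore diversity.
    static void injectImmigrants(List<double[]> population,
                                 double fraction,
                                 int geneCount,
                                 Random random) {
        int immigrants = (int) (population.size() * fraction);
        for (int i = 0; i < immigrants; i++) {
            int slot = random.nextInt(population.size());
            population.set(slot, randomIndividual(geneCount, random));
        }
    }

    static double[] randomIndividual(int geneCount, Random random) {
        double[] genes = new double[geneCount];
        for (int g = 0; g < geneCount; g++) {
            genes[g] = random.nextDouble() * 2 - 1; // hypothetical gene range [-1, 1]
        }
        return genes;
    }
}
```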
What about annealing? How can I use it?
Annealing could be used to gradually improve your solutions once they start to converge on a certain point/optimum, i.e. you'd improve the solutions in a more controlled way than "random" mutations.
You can also use it to break out of a local optimum depending on how those are distributed. As an example, you could use your GA until solutions start to converge, then use annealing and/or larger mutations and/or completely new solutions (you could generate several sets of solutions with different approaches and compare them at the end), create your new population and if the convergence is broken, start a new iteration with the GA. If the solutions still converge at the same optimum then you could stop since no bigger improvement is to be expected.
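One way to make the mutation step "more controlled" in that annealing spirit is to tie its size to a temperature that cools each generation; a small sketch with an illustrative cooling schedule, not taken from the linked code:

```java
import java.util.Random;

public class AnnealedMutation {

    private double temperature = 1.0;          // start hot: large mutation steps
    private final double coolingRate = 0.95;   // hypothetical cooling schedule

    // Early on, a mutation can move a gene a long way (exploration);
    // as the temperature drops, steps become small refinements
    // (exploitation), similar in spirit to simulated annealing.
    double mutateGene(double gene, Random random) {
        double step = (random.nextDouble() * 2 - 1) * temperature;
        return gene + step;
    }

    // Call once per generation.
    void cool() {
        temperature *= coolingRate;
    }
}
```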
Besides all that, heuristic algorithms may still hit a local optimum but that's the tradeoff they provide: performance vs. accuracy.

Where to code this heuristic?

I want to ask a complex question.
I have to code a heuristic for my thesis. I need the following:
Evaluate some integral functions
Minimize functions over an interval
Do this thousands and thousands of times.
So I need a fast programming language to do these jobs. Which language do you suggest? I started with Java, but computing the integrals became a problem, and I'm not sure about its speed.
Connecting Java to other software such as MATLAB may be a good idea. Since I'm not sure, I would like your opinions.
Thanks!
C, Java, ... are all Turing-complete languages. They can compute the same functions with the same precision.
If you want to achieve performance goals, use C, which is a compiled, high-performance language. It can decrease your computation time by avoiding the method calls and high-level features present in an interpreted language like Java.
Anyway, remember that your implementation may impact performance more than your choice of language, because as the input size grows it is the computational complexity that matters ( http://en.wikipedia.org/wiki/Computational_complexity_theory ).
It's not the programming language, it's probably your algorithm. Determine the big-O complexity of your algorithm. If you use loops inside loops where you could instead look items up by hash in a Map, your algorithm can be made n times faster.
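To make that concrete, a hedged sketch with made-up types: matching ids with nested loops costs O(n*m), while building a HashMap index first makes each lookup O(1) on average.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinExample {

    static class Customer {
        final String id;
        final String name;
        Customer(String id, String name) { this.id = id; this.name = name; }
    }

    // O(n * m): for every wanted id, scan the whole customer list.
    static List<String> namesByLoop(List<String> wantedIds, List<Customer> customers) {
        List<String> names = new ArrayList<>();
        for (String id : wantedIds) {
            for (Customer c : customers) {
                if (c.id.equals(id)) {
                    names.add(c.name);
                }
            }
        }
        return names;
    }

    // O(n + m): index the customers once, then each lookup is O(1) on average.
    static List<String> namesByMap(List<String> wantedIds, List<Customer> customers) {
        Map<String, Customer> byId = new HashMap<>();
        for (Customer c : customers) {
            byId.put(c.id, c);
        }
        List<String> names = new ArrayList<>();
        for (String id : wantedIds) {
            Customer c = byId.get(id);
            if (c != null) {
                names.add(c.name);
            }
        }
        return names;
    }
}
```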
Note: Modern JVMs (JDK 1.5 or 1.6) compile just-in-time to native code (as in: not interpreted) for a specific OS, OS version, and hardware architecture. You could try the -server flag to JIT even more aggressively (at the cost of an even longer start-up time).
Do this thousands and thousands of times.
Are you sure it's not more, something like 10^1000 instead? Try calculating accurately how many times you need to run that loop; it might surprise you. The type of problems on which heuristics are used tends to have a really big search space.
Before you start switching languages, I'd first try to do the following things:
Find the best available algorithms.
Find available implementations of those algorithms usable from your language.
There are, for example, scientific libraries for Java. Try to use these libraries (see the sketch after this answer).
If they are not fast enough, investigate whether there is anything to be done about it. Is your problem more specific than what the library assumes? Are you able to improve the algorithm based on that knowledge?
What is it that takes so much time/memory? Is it really related to your language? Be careful not to measure JVM start-up time instead of the time spent on the actual calculation.
Only then would I consider switching languages. But don't expect it to be easy to beat optimized third-party Java libraries in C.
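As an example of such a library, here is a hedged sketch using Apache Commons Math 3 (assuming the commons-math3 artifact is on the classpath; the integrand, interval, and tolerances are placeholders, and the exact API can differ between versions):

```java
import org.apache.commons.math3.analysis.UnivariateFunction;
import org.apache.commons.math3.analysis.integration.SimpsonIntegrator;
import org.apache.commons.math3.optim.MaxEval;
import org.apache.commons.math3.optim.nonlinear.scalar.GoalType;
import org.apache.commons.math3.optim.univariate.BrentOptimizer;
import org.apache.commons.math3.optim.univariate.SearchInterval;
import org.apache.commons.math3.optim.univariate.UnivariateObjectiveFunction;
import org.apache.commons.math3.optim.univariate.UnivariatePointValuePair;

public class NumericsSketch {

    public static void main(String[] args) {
        // Placeholder objective: f(x) = x^2 - 4x + 5, minimum at x = 2.
        UnivariateFunction f = x -> x * x - 4 * x + 5;

        // Numerical integration of f over [0, 5].
        double integral = new SimpsonIntegrator().integrate(10_000, f, 0, 5);

        // Minimization of f over the same interval.
        UnivariatePointValuePair min = new BrentOptimizer(1e-10, 1e-12)
                .optimize(new MaxEval(1_000),
                          new UnivariateObjectiveFunction(f),
                          GoalType.MINIMIZE,
                          new SearchInterval(0, 5));

        System.out.printf("integral = %.6f, minimum at x = %.6f%n",
                integral, min.getPoint());
    }
}
```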
Order of the algorithm
Typically, switching languages only reduces the time required by a constant factor. Let's say you can double the speed using C; but if your algorithm is O(n^2), it will take four times as long when you double the data, no matter the language.
And the JVM can optimize a lot of things, with good results.
Some possible optimizations in Java
If you have methods that are called many times, make them final, and do the same for entire classes. The compiler then knows it can inline the method code, avoiding the creation of method-call stack frames for those calls.
