I am creating a website (my academic project) where a user can upload program files (.cs, .php, .java); the site should then compile the program and automatically report its time and space complexity. Is this possible? How can the complexity of a program be calculated? Is there any Java code for finding the complexity of a program, or can we get this from the compiler itself?
Determining the time and space complexity of a program is a hard problem. As others have pointed out, it is not even possible in general to decide whether a program will terminate (this is known as the Halting Problem).
To make a start on your project, I would advise looking into Cyclomatic Complexity, which is calculated, for example, by the GMetrics project.
This will get you started in your exploration of the subject matter.
Assuming you know what kind of input to give the program, you can estimate its complexity by actually running it over successively larger inputs. General static analysis, however, is impossible because of the Halting Problem.
If you can run the application multiple times on input sets of various sizes, you can develop an approximation.
In the classic case of sorting numbers, you can have the application sort a list of 2 numbers, then 4, 8, 16, 32, etc... and essentially graph the memory and time requirements for each run. Basic curve fitting will show you the growth in complexity.
Note that this is not rigorously accurate, as certain algorithms may have points at which their performance changes radically. Such a system can also be fooled by growth curves that "look" similar but have vastly different properties, such as curves approaching an asymptote versus logarithmic curves, or exponential versus polynomial curves.
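A minimal sketch of that doubling experiment, using Arrays.sort as a stand-in for the uploaded program (in a real project you would invoke the user's code instead, warm up the JVM first, and average several runs per size):

import java.util.Arrays;
import java.util.Random;

public class ComplexityProbe {
    public static void main(String[] args) {
        Random rnd = new Random();
        // Double the input size each round and record how long the "black box" takes.
        for (int n = 1_000; n <= 1_024_000; n *= 2) {
            int[] data = rnd.ints(n).toArray();
            long start = System.nanoTime();
            Arrays.sort(data);                       // stand-in for the program under test
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("n = %7d   time = %d ms%n", n, elapsedMs);
        }
    }
}

Fitting a curve to the (n, time) pairs then gives you an empirical estimate of the growth rate.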
I am implementing an Android application that works with geographic coordinates, and I need to solve a problem similar to the traveling salesman problem.
I found an implementation of the algorithm at http://www.theprojectspot.com/tutorial-post/simulated-annealing-algorithm-for-beginners/6.
I adjusted the code to what I need, and it produces results close to the theoretical optimum. I noticed, however, that each execution produces a different result.
I went back to the original code and found that even there the results disagree between runs.
I don't understand. Shouldn't the result be unique? After all, we are looking for the shortest path... perhaps some small variation is expected, but each execution differs from the previous one by several units.
How could I adjust the algorithm to produce the same result in all runs? Has anyone worked with this?
That's the price you pay for an algorithm like this one: the results obtained might very well be different every time. The algorithm does not "find the shortest path," which is a computationally intractable problem ("travelling salesman"). Instead, it seeks to quickly find a solution that is "short enough." Whether or not it actually does so depends very much on the data ... and, to a non-trivial degree, on random chance.
And, since the algorithm is comparatively fast, sometimes you do run it several times in a row in order to gauge the variability of the solutions obtained. If (say) three runs each produce results that are "close enough" to one another, there's a good chance that the result is reliable. But if the standard deviation is very large, the algorithm might not be giving you a good answer. (Bear in mind that sometimes the solution will be wrong.)
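As a rough sketch of that "run it several times" idea, assuming a hypothetical solveOnce() method that performs one full simulated-annealing pass and returns the length of the best tour it found:

import java.util.Arrays;

// solveOnce() is hypothetical: one complete annealing run returning its best tour length.
double[] lengths = new double[10];
for (int i = 0; i < lengths.length; i++) {
    lengths[i] = solveOnce();
}
double mean = Arrays.stream(lengths).average().orElse(Double.NaN);
double variance = Arrays.stream(lengths)
        .map(x -> (x - mean) * (x - mean))
        .average().orElse(Double.NaN);
System.out.printf("mean = %.2f   stddev = %.2f%n", mean, Math.sqrt(variance));

A small standard deviation relative to the mean suggests the answers are stable; a large one suggests you need more iterations or a slower cooling schedule. You would normally keep the shortest tour of the batch as your answer.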
So to speak: "you get what you pay for, but you don't pay much for it, and of course that is the point."
I have a few questions about my genetic algorithm and GAs overall.
I have created a GA that, when given points on a curve, tries to figure out what function produced that curve.
An example is the following
Points
{{-2, 4},{-1, 1},{0, 0},{1, 1},{2, 4}}
Function
x^2
Sometimes I give it points for which it will never find a function, and sometimes it does find one. It can even depend on how deep the initial trees are.
Some questions:
Why does the tree depth matter in trying to evaluate the points and produce a satisfactory function?
Why do I sometimes get premature convergence, so the GA never breaks out of the cycle?
What can I do to prevent a premature convergence?
What about annealing? How can I use it?
Can you take a quick look at my code and tell me if anything is obviously wrong with it? (This is test code, I need to do some code clean up.)
https://github.com/kevkid/GeneticAlgorithmTest
Source: http://www.gp-field-guide.org.uk/
EDIT:
Looks like Thomas's suggestions worked well: I get very fast results and less premature convergence. I feel like increasing the gene pool gives better results, but I am not exactly sure whether it is actually getting better with every generation or whether the randomness simply allows it to find a correct solution.
EDIT 2:
Following Thomas's suggestions I was able to get it to work properly; it seems I had an issue with selecting survivors and with expanding my gene pool. I also recently added constants to my GA test, if anyone else wants to look at it.
In order to avoid premature convergence you can also use multiple subpopulations. Each subpopulation evolves independently, and at the end of each generation you exchange some individuals between subpopulations.
I did an implementation with multiple subpopulations for a Genetic Programming variant: http://www.mepx.org/source_code.html
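A minimal sketch of the idea (Individual, randomPopulation, evolveOneGeneration and the upper-case constants are hypothetical placeholders for whatever your GA already uses):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

List<List<Individual>> islands = new ArrayList<>();
for (int i = 0; i < NUM_ISLANDS; i++) {
    islands.add(randomPopulation(POP_SIZE));       // independent starting populations
}
Random rnd = new Random();
for (int gen = 0; gen < MAX_GENERATIONS; gen++) {
    for (List<Individual> island : islands) {
        evolveOneGeneration(island);               // selection, crossover, mutation per island
    }
    // Migration: each island sends one random individual to the next island (ring topology).
    for (int i = 0; i < islands.size(); i++) {
        List<Individual> from = islands.get(i);
        List<Individual> to   = islands.get((i + 1) % islands.size());
        Individual migrant = from.get(rnd.nextInt(from.size()));
        to.set(rnd.nextInt(to.size()), migrant);   // migrant replaces a random resident
    }
}

Because each island explores the search space independently, one island getting stuck in a local optimum does not drag the whole run down with it.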
I don't have the time to dig into your code but I'll try to answer from what I remember on GAs:
Sometimes I give it points for which it will never find a function, and sometimes it does find one. It can even depend on how deep the initial trees are.
I'm not sure what the question is here, but if you need a result you could try selecting the function that provides the least distance to the given points (the distance could be a sum, a mean, a count of matched points, etc., depending on your needs).
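For example, a sum-of-squared-errors fitness might look like the sketch below (ExpressionTree and evaluate() are hypothetical stand-ins for however your candidates are represented and evaluated; lower fitness is better):

// points[i][0] is x, points[i][1] is the expected y for that x.
double fitness(double[][] points, ExpressionTree candidate) {
    double error = 0.0;
    for (double[] p : points) {
        double predicted = evaluate(candidate, p[0]);   // evaluate the candidate at x
        double diff = predicted - p[1];
        error += diff * diff;                           // accumulate squared error
    }
    return error;
}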
Why does the tree depth matter in trying to evaluate the points and produce a satisfactory function?
I'm not sure what tree depth you mean but it could affect two things:
accuracy: i.e. the higher the depth, the more accurate the solution might be, or the more possibilities for mutation there are
performance: depending on which tree you mean, a higher depth might increase performance (allowing more educated guesses about the function) or decrease it (requiring more solutions to be generated and compared).
Why do I sometimes get premature convergence, so the GA never breaks out of the cycle?
That might be due to too little mutation. If you have a set of solutions that all converge around a local optimum, slight mutations alone might not move the resulting solutions far enough away from that local optimum to break out.
What can I do to prevent a premature convergence?
You could allow for bigger mutations, e.g. when solutions start to converge. Alternatively, you could throw entirely new solutions into the mix (think of it as "immigration").
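A sketch of the "immigration" idea, assuming the population is kept sorted best-first and randomIndividual() builds a fresh random candidate:

// Replace the worst few individuals with brand-new random ones each generation.
void injectImmigrants(List<Individual> population, int immigrants) {
    int size = population.size();
    for (int i = 0; i < immigrants; i++) {
        population.set(size - 1 - i, randomIndividual());   // overwrite from the bottom up
    }
}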
What about annealing? How can I use it?
Annealing could be used to gradually improve your solutions once they start to converge on a certain point/optimum, i.e. you'd improve the solutions in a more controlled way than with "random" mutations.
You can also use it to break out of a local optimum, depending on how those optima are distributed. For example, you could run your GA until the solutions start to converge, then use annealing and/or larger mutations and/or completely new solutions (you could generate several sets of solutions with different approaches and compare them at the end), create your new population, and, if the convergence is broken, start a new GA iteration. If the solutions still converge at the same optimum, you can stop, since no bigger improvement is to be expected.
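One simple way to borrow the annealing idea is to tie the mutation rate to a "temperature" that cools over the generations, so early search is broad and late search is fine-grained. The exponential schedule and constants below are just one common choice, not something taken from your code:

// Mutation rate decays from MUTATION_START toward MUTATION_MIN as generations pass.
double mutationRate(int generation) {
    final double MUTATION_START = 0.30;
    final double MUTATION_MIN   = 0.01;
    final double DECAY          = 0.01;   // larger value = faster cooling
    double rate = MUTATION_START * Math.exp(-DECAY * generation);
    return Math.max(rate, MUTATION_MIN);
}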
Besides all that, heuristic algorithms may still get stuck in a local optimum, but that's the trade-off they offer: performance vs. accuracy.
I just realized that I have no idea how to tell whether a piece of Java code is efficient from a computational point of view. Reading various source code, I sometimes feel the code I'm reading is highly inefficient, and other times I feel the opposite.
Could you list the basic one-line rules to respect and why they are so important?
edit - My question is about Java as implemented on the JVM, so things like allocation, String management, exception handling, thread synchronization and so on.
Thanks in advance
p.s. don't take the "one-line" literally pls
Basic one-line rule? Okay, here you go:
Avoid unnecessary computations.
How do you do it? Sorry, no one-line answer to that. :(
Well, people spend years in college learning about algorithms and data structures in computer science for a reason... you might want to take a course on algorithms and data structures sometime.
I'm not sure what you mean by "from a computational point of view" (it seems to imply algorithmic issues), but assuming you mean practical tricks closer to profiling, try these:
Run the program, then suddenly pause it, and see where it paused. Do this a few times; wherever it stops most often is a bottleneck, and how often it stops there indicates how bad a bottleneck it is.
Avoid boxing/unboxing (converting between int and Integer, etc.); in particular avoid Integer[], List<Integer>, and other structures that internally store arrays of boxed primitives
Factor out common code (sometimes a speed issue, sometimes readability)
Avoid building Strings with concatenation in loops; use StringBuilder/StringBuffer instead (see the sketch after this list). In short, avoid creating and/or copying data when it isn't needed.
I'll add to this if other things come to mind.
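To illustrate the String point above, assuming strings is the List<String> being joined:

// Quadratic: every += copies everything built so far into a brand-new String.
String slow = "";
for (String s : strings) {
    slow += s;
}

// Roughly linear: StringBuilder appends into a growing internal buffer.
StringBuilder sb = new StringBuilder();
for (String s : strings) {
    sb.append(s);
}
String fast = sb.toString();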
Use profiling. Look at JProfiler or any other profiler.
I'll second Mherdad's answer in that there definitely are no "basic one-line rules."
Regarding the answers that suggest using profiling tools: profiling isn't really useful until you understand algorithmic time complexity and big-O notation. From Wikipedia's article on Big O notation:
In mathematics, computer science, and related fields, big-O notation describes the limiting behavior of the function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. Big O notation characterizes functions according to their growth rates: different functions with the same growth rate may be represented using the same O notation.
The idea behind big-O notation is that it gives you a feel for how input size affects execution time for a given algorithm. For instance, consider the following two methods:
void linearFoo(List<String> strings) {
    for (String s : strings) {
        doSomethingWithString(s);
    }
}

void quadraticFoo(List<String> strings) {
    for (String s : strings) {
        for (String s1 : strings) {
            doSomethingWithTwoStrings(s, s1);
        }
    }
}
linearFoo is said to be O(n), meaning that its running time grows linearly with the input size n (i.e. strings.size()). quadraticFoo is said to be O(n^2), meaning that the time it takes to execute grows with the square of strings.size().
Once you have a feel for the algorithmic time complexity of your program, profiling tools become useful. For instance, if profiling shows that a method typically takes 1 ms for some input size n, then if that method is O(n), doubling the input size should roughly double the time to 2 ms; if it is O(n^2), doubling the input size should roughly quadruple the time to 4 ms.
Take a look at the book Effective Java by Joshua Bloch if you really want a list of rules to follow in Java. The book offers guidelines not just for performance but also on the proper way of programming in Java.
You can use jconsole to monitor your application for deadlocks, memory leaks, threads, and heap usage. In short, you can see your application's performance in graphs.
I want to ask a complex question.
I have to code a heuristic for my thesis. I need the following:
Evaluate some integral functions
Minimize functions over an interval
Do this thousands and thousands of times.
So I need a fast programming language for these jobs. Which language do you suggest? I started with Java, but computing the integrals became a problem, and I'm not sure about its speed.
Connecting Java to other software such as MATLAB may be a good idea. Since I'm not sure, I'd like to hear your opinions.
Thanks!
C, Java, ... are all Turing-complete languages: they can compute the same functions with the same precision.
If you want to achieve performance goals, use C, which is a compiled, high-performance language. It can decrease your computation time by avoiding the method-call overhead and high-level features of a language like Java.
Anyway, remember that your implementation may affect performance more than the language you choose, because as the input size grows it is the computational complexity that dominates (http://en.wikipedia.org/wiki/Computational_complexity_theory).
It's not the programming language, it's probably your algorithm. Determine the big-O complexity of your algorithm. If you use loops within loops where you could use a lookup in a hash-based Map instead, your algorithm can be made many times faster.
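As an example of the "hash instead of nested loops" point, here is finding the elements two lists have in common, once the naive way and once with a HashSet (first and second are assumed to be List<String>, with the usual java.util imports):

// O(n*m): compare every element of one list with every element of the other.
List<String> common = new ArrayList<>();
for (String a : first) {
    for (String b : second) {
        if (a.equals(b)) { common.add(a); break; }
    }
}

// O(n + m): build a HashSet once, then each lookup is (amortized) constant time.
Set<String> lookup = new HashSet<>(second);
List<String> commonFast = new ArrayList<>();
for (String a : first) {
    if (lookup.contains(a)) {
        commonFast.add(a);
    }
}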
Note: modern JVMs (JDK 1.5 or 1.6) compile just-in-time to native code (i.e. not interpreted) for the specific OS, OS version and hardware architecture. You could try the -server flag to make the JIT even more aggressive (at the cost of an even longer initialization time).
Do this thousands and thousands of times.
Are you sure it's not more, something like 10^1000 instead? Try to calculate accurately how many times you need to run that loop; it might surprise you. The kinds of problems heuristics are used for tend to have really big search spaces.
Before you start switching languages, I'd first try to do the following things:
Find the best available algorithms.
Find available implementations of those algorithms usable from your language.
There are e.g. scientific libraries for Java. Try to use these libraries.
If they are not fast enough, investigate whether there is anything to be done about it. Is your problem more specific than what the library assumes? Can you improve the algorithm based on that knowledge?
What is it that takes so much time/memory? Is this really related to your language? Take care not to measure JVM start-up time instead of the time spent on the actual calculation.
Then I'd consider switching languages. But don't expect it to be easy to beat optimized third-party Java libraries in C.
Order of the algorithm
Typically, switching languages only reduces the running time by a constant factor. Let's say you can double the speed using C; but if your algorithm is O(n^2), it will take four times as long when you double the data, no matter the language.
And the JVM can optimize a lot of things getting good results.
Some possible optimizations in Java
If you have methods that are called a lot, declare them final, and likewise for entire classes. The compiler then knows the method cannot be overridden, which makes it easier for the JIT to inline the method body and avoid the overhead of a method call.
For a university project, we had to implement a few different algorithms to calculate the equivalence classes of a set of elements, given a collection of relations between those elements.
We were instructed to implement, among others, the Union-Find algorithm and its optimizations (union by depth, union by size). By accident (while doing something I thought was necessary for the correctness of the algorithm) I discovered another way to optimize the algorithm.
It isn't as fast as Union By Depth, but close. I couldn't for the life of me figure out why it was as fast as it was, so I consulted one of the teaching assistants who couldn't figure it out either.
The project was in Java, and the data structures I used were based on simple arrays of Integers (the object, not the primitive int).
Later, at the project's evaluation, I was told that it probably had something to do with "Java caching", but I can't find anything online about how caching would affect this.
What would be the best way, without calculating the complexity of the algorithm, to prove or disprove that my optimization is only this fast because of the way Java does things? Implementing it in another (lower-level?) language? But who's to say that language won't do the same thing?
I hope I made myself clear,
thanks
The only way is to prove the worst-case (average case, etc) complexity of the algorithm.
Because if you don't, it might just be a consequence of a combination of
The particular data
The size of the data
Some aspect of the hardware
Some aspect of the language implementation
etc.
It is generally very difficult to perform such a task on modern VMs! As you hint, they do all sorts of things behind your back: method calls get inlined, objects are reused, and so on. A prime example is how trivial loops get compiled away entirely when they obviously do nothing but count, or how a function call in functional programming gets inlined or tail-call optimized.
Furthermore, you have the difficulty of proving your point in general, on any data set. An O(n^2) algorithm can easily be much faster than a seemingly faster, say O(n log n), algorithm. Two examples:
Bubble sort is faster than quicksort at sorting a nearly sorted collection.
Quicksort, in the general case, is of course faster.
Generally, big-O notation purposely ignores constants, which in a practical situation can mean life or death for your implementation, and those constants may be what hit you here. So in practice, 0.00001 * n^2 (say, the running time of your algorithm) is faster than 1000000 * n log n for all but astronomically large n: at n = 10^6, the first is about 10^7 while the second is on the order of 10^13.
So reasoning about this is hard, given the limited information you provide.
It is likely that either the compiler or the JVM found an optimisation for your code. You could try reading the bytecode that is output by the javac compiler, and disabling runtime JIT compilation with the -Djava.compiler=NONE option.
If you have access to the source code -- and the JDK source code is available, I believe -- then you can trawl through it to find the relevant implementation details.