I was wondering how I can estimate the total running time of a Java program on a specific machine before the program ends. I need to know how long it will take so I can report progress based on that.
FYI, the main algorithm of my program has O(n^3) time complexity. Suppose n = 100000: how long would it take to run this program on my machine (dual Intel Xeon e2650)?
Regards.
In theory, 1 GHz of computational power should result in about 1 billion simple operations per second. However, finding the number of simple operations is not always easy. Even if you know the time complexity of a given algorithm, this is not enough: you also need to know the constant factor. It is possible to have a linear algorithm that takes several seconds to compute something for an input of size 10000 (and some algorithms like this exist, such as RMQ with linear pre-computation time).
What you do know, however, is that something of O(n^3) will need to perform on the order of 100000^3 = 10^15 operations. At roughly 10^9 simple operations per second, that is about 10^6 seconds (more than 11 days) for a constant factor of 1. So unless your constant is very small (which is highly improbable), this computation will take a lot of time.
I believe @ArturMalinowski's proposal is the right way to approach your problem: benchmark the performance of your algorithm for some sequence of input sizes known beforehand, e.g. {32,64,128,...} or, as he proposes, {1,10,100,...}. This way you will be able to determine the constant factor with relatively good precision.
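As a rough sketch of that benchmarking idea (not a definitive method), the Java snippet below times a stand-in O(n^3) routine for a few small sizes and extrapolates to n = 100000 by assuming time grows as c * n^3. The class name, the cubicWork placeholder and the chosen sizes are all illustrative assumptions; the real algorithm would go in place of cubicWork.

```java
class RuntimeEstimate {
    // Placeholder O(n^3) workload (hypothetical); replace with the real algorithm.
    static long cubicWork(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    acc += (i ^ j) + k;
        return acc;
    }

    // Scales a measured time from size measuredN to size targetN,
    // assuming time ~ c * n^exponent.
    static double extrapolateSeconds(double measuredSeconds, long measuredN,
                                     long targetN, int exponent) {
        return measuredSeconds * Math.pow((double) targetN / measuredN, exponent);
    }

    public static void main(String[] args) {
        long sink = 0;          // keeps the results alive so the work is not optimized away
        double lastSeconds = 0;
        int lastN = 0;
        for (int n = 64; n <= 512; n *= 2) {
            long start = System.nanoTime();
            sink += cubicWork(n);
            lastSeconds = (System.nanoTime() - start) / 1e9;
            lastN = n;
            System.out.printf("n=%d took %.4f s%n", n, lastSeconds);
        }
        // Extrapolate from the largest (and therefore least noisy) measurement.
        double estimate = extrapolateSeconds(lastSeconds, lastN, 100_000, 3);
        System.out.printf("estimated time for n=100000: %.0f s (sink=%d)%n", estimate, sink);
    }
}
```

The same estimate can drive a progress display: once the constant is known, the fraction of the estimated total that has elapsed is a reasonable progress figure. Small sizes are affected by JIT warm-up and caching, so the extrapolation is only a ballpark.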
I have been developing with Java for some time now, and always strive to do things in the most efficient way. Until now I have mostly been trying to reduce the number of lines of code I have. But when starting to work with 2D rendering, it is more about how long it takes to compute a certain piece of code, as it is called many times a second.
My question:
Is there some way to measure how long it takes to compute a certain piece of code in Eclipse, Java, ... ?
First, some nitpicking. You title this question ...
Putting a number to the efficiency of an algorithm
There is no practical quantifiable measure of "efficiency" for an algorithm. Efficiency (as normally conceived) is a measure of "something" relative to an ideal / perfect; e.g. a hypothetical 100% efficient steam engine would convert all of the energy in the coal being burned into useful "work". But for software, there is no ideal to measure against. (Or if there is, we can't be sure that it is the ideal.) Hence "efficiency" is the wrong term.
What you actually mean is a measure of the "performance" of ...
Algorithms are an abstract concept, and their performance cannot be measured.
What you actually want is a measure of the performance of a specific implementation of an algorithm; i.e. some actual code.
So how do you quantify performance?
Well, ultimately there is only one sound way to quantify performance. You measure it, empirically. (How you do that ... and the limitations ... are a matter I will come to.)
But what about theoretical approaches?
A common theoretical approach is to analyse the algorithm to give you a measure of computational complexity. The classic measure is Big-O Complexity. This is a very useful measure, but unfortunately Big-O Complexity does not actually measure performance at all. Rather, it is a way of characterizing the behaviour of an algorithm as the problem size scales up.
To illustrate, consider these algorithms for adding N numbers together:
int sum(int[] input) {
    int sum = 0;
    for (int i = 0; i < input.length; i++) {
        sum += input[i];
    }
    return sum;
}

int sum(int[] input) {
    int tmp = p(1000); // calculates the 1000th prime number (result is never used)
    int sum = 0;
    for (int i = 0; i < input.length; i++) {
        sum += input[i];
    }
    return sum;
}
We can prove that both versions of sum have a complexity of O(N), according to the accepted mathematical definitions. Yet it is obvious that the first one will be faster than the second one ... because the second one does a large (and pointless) calculation as well.
In short: Big-O Complexity is NOT a measure of Performance.
What about theoretical measures of Performance?
Well, as far as I'm aware, there are none that really work. The problem is that real performance (as in time taken to complete) depends on various complicated things in the compilation of code to executables AND the way that real execution platforms (hardware) behave. It is too complicated to do a theoretical analysis that will reliably predict actual performance.
So how do you measure performance?
The naive answer is to benchmark like this:
Take a clock measurement
Run the code
Take a second clock measurement
Subtract the first measurement from the second ... and that is your answer.
But it doesn't work. Or more precisely, the answer you get may be wildly different from the performance that the code exhibits when you use it in a real world context.
Why?
There may be other things happening on the machine ... or that have happened ... that influence the code's execution time. Another program might be running. You may have files pre-loaded into the file system cache. You may get hit by CPU clock scaling ... or a burst of network traffic.
Compilers and compiler flags can often make a lot of difference to how fast a piece of code runs.
The choice of inputs can often make a big difference.
If the compiler is smart, it might deduce that some or all of your benchmarked code does nothing "useful" (in the context) ... and optimize it away entirely.
And for languages like Java and C#, there are other important issues:
Implementations of these languages typically do a lot of work during startup to load and link the code.
Implementations of these languages are typically JIT compiled. This means that the language runtime system does the final translation of the code (e.g. bytecodes) to native code at runtime. The performance characteristics of your code after JIT compilation change drastically, and the time taken to do the compilation may be significant ... and may distort your time measurements.
Implementations of these languages typically rely on a garbage collected heap for memory management. The performance of a heap is often uneven, especially at startup.
These things (and possibly others) contribute to something that we call (in Java) JVM warmup overheads; particularly JIT compilation. If you don't take account of these overheads in your methodology, then your results are liable to be distorted.
So what is the RIGHT way to measure performance of Java code?
It is complicated, but the general principle is to run the benchmark code lots of times in the same JVM instance, measuring each iteration. The first few measurements (during JVM warmup) should be discarded, and the remaining measurements should be averaged.
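To make the principle concrete, here is a minimal hand-rolled sketch, assuming a task that returns a long so its result can be consumed. The class name SimpleBench, the blackhole field and the iteration counts are illustrative assumptions, and this is not a substitute for a proper framework.

```java
import java.util.function.LongSupplier;

class SimpleBench {
    // Written to at the end so the JIT cannot treat the benchmarked work as dead code.
    static volatile long blackhole;

    // Runs the task repeatedly in the same JVM, discards the warm-up iterations,
    // and returns the mean time per call (in nanoseconds) over the measured ones.
    static double averageNanos(LongSupplier task, int warmupIterations, int measuredIterations) {
        long sink = 0;
        for (int i = 0; i < warmupIterations; i++) {
            sink += task.getAsLong();          // timings deliberately ignored
        }
        long totalNanos = 0;
        for (int i = 0; i < measuredIterations; i++) {
            long start = System.nanoTime();
            sink += task.getAsLong();
            totalNanos += System.nanoTime() - start;
        }
        blackhole = sink;
        return (double) totalNanos / measuredIterations;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        double nanos = averageNanos(() -> {
            long sum = 0;
            for (int v : data) sum += v;
            return sum;
        }, 20, 100);
        System.out.printf("mean time per call: %.0f ns%n", nanos);
    }
}
```

How many warm-up iterations are enough depends on the code and the JVM, so the value 20 here is a guess rather than a rule.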
These days, the recommended way to do Java benchmarking is to use a reputable benchmarking framework. The two prime candidates are Caliper and Oracle's jmh tool.
And what are the limitations of performance measurements that you mentioned?
Well I have alluded to them above.
Performance measurements can be distorted by various environmental factors on the execution platform.
Performance can be dependent on inputs ... and this may not be revealed by simple measurement.
Performance (e.g. of C / C++ code) can be dependent on the compiler and compiler switches.
Performance can be dependent on hardware; e.g. processor speed, number of cores, memory architecture, and so on.
These factors can make it difficult to make general statements about the performance of a specific piece of code, and to make general comparisons between alternative versions of the same code. As a rule, we can only make limited statements like "on system X, with compiler Y and input set Z the performance measures are P, Q, R".
The number of lines of code has very little correlation with the execution speed of a program.
Your program will look completely different after it's processed by the compiler. In general, large compilers perform many optimizations, such as loop unrolling, getting rid of variables that are not used, getting rid of dead code, and hundreds more.
So instead of trying to "squeeze" the last bit of performance/memory out of your program by using short instead of int, char[] instead of String, or whichever method you think will "optimize" your program (premature optimization), just write it using objects or types that make sense to you, so it will be easier to maintain. Your compiler, interpreter, or VM should take care of the rest. Only if it doesn't should you start looking for bottlenecks and start playing with hacks.
So what makes programs fast then? Algorithmic efficiency (at least it tends to make the biggest difference if the algorithm/data structure was not designed right). This is what computer scientists study.
Let's say you're given 2 data structures. An array, and a singly linked list.
An array stores things in a block, one after the other.
+-+-+-+-+-+-+-+
|1|3|2|7|4|6|1|
+-+-+-+-+-+-+-+
To retrieve the element at index 3, you simply just go to the 4th square and retrieve it. You know where it is because you know it's 3 after the first square.
A singly linked list will store things in a node, which may not be stored contiguously in memory, but each node will have a tag (pointer, reference) on it telling you where the next item in the list is.
+-+    +-+    +-+    +-+    +-+    +-+    +-+
|1| -> |3| -> |2| -> |7| -> |4| -> |6| -> |1|
+-+    +-+    +-+    +-+    +-+    +-+    +-+
To retrieve the element at index 3, you have to start with the first node (index 0), then follow its link to the node at index 1, then to index 2, and finally you arrive at index 3. All because you don't know where the nodes are, so you have to follow a path to them.
Now say you have an Array and an SLL, both containing the same data, with the length n, which one would be faster? Depends on how you use it.
Let's say you do a lot of insertions at the end of the list. The algorithms (pseudocode) would be:
Array:
    array[list.length] = element to add
    increment length field
SLL:
    currentNode = first element of SLL
    while currentNode has next node:
        currentNode = currentNode's next element
    currentNode's next element = new Node(element to add)
    increment length field
As you can see, in the array algorithm, it doesn't matter what the size of the array is. It always takes a constant number of operations. Let's say locating a[list.length] takes 1 operation. Assigning to it is another operation, and incrementing the length field and writing it back to memory is 2 more operations. It would take 4 operations every time. But if you look at the SLL algorithm, it would take at least list.length operations just to find the last element in the list. In other words, the time it takes to add an element to the end of an SLL increases linearly as the size of the SLL increases, t(n) = n, whereas for the array it's more like t(n) = 4.
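The two pseudocode routines can be sketched in Java as follows. This is an illustration only: the class and method names are made up here, the array is assumed to have spare capacity, and the list is assumed to track only its head node.

```java
class AppendCost {
    // Minimal singly linked list node (illustrative).
    static final class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    // Appends to an array with spare capacity and returns the new length.
    // The cost does not depend on how many elements are already stored.
    static int appendToArray(int[] array, int length, int element) {
        array[length] = element;
        return length + 1;
    }

    // Appends to a list that only knows its head and returns how many
    // links had to be followed to reach the last node.
    static int appendToList(Node head, int element) {
        int hops = 0;
        Node current = head;
        while (current.next != null) {
            current = current.next;
            hops++;
        }
        current.next = new Node(element);
        return hops;
    }

    public static void main(String[] args) {
        int[] array = new int[16];
        int length = appendToArray(array, 0, 0);
        Node head = new Node(0);
        for (int i = 1; i < 8; i++) {
            length = appendToArray(array, length, i);
            int hops = appendToList(head, i);
            System.out.println("append #" + i + ": list followed " + hops + " links, array followed none");
        }
    }
}
```

Keeping a reference to the last node would make the list append constant time as well, which is the kind of data structure design decision that matters far more than line count.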
I suggest reading the free book written by my data structures professor. It even has working code in C++ and Java.
Generally speaking, speed vs. lines of code is not an effective measure of performance, since speed depends heavily on your hardware and your compiler. There is something called Big O notation, which gives one a picture of how the running time of an algorithm grows as the size of the input increases.
For example, if your algorithm is O(n), then the time it takes to run scales linearly with the input size. If your algorithm is O(1), then the time it takes to run is constant.
I found this particular way of measuring performance useful because you learn that it's not really lines of code that affect speed; it's your code's design that affects speed. Code that handles the problem in a more efficient way can be faster than code that uses a less efficient method in 1/10 of the lines.
I have read about time complexities only in theory. Is there any way to calculate them in a program? Not by assumptions like 'n' or anything, but by actual values.
For example, calculating the time complexities of merge sort and quick sort:
Merge Sort= O(nlogn);// any case
Quick Sort= O(n^2);// worst case(when pivot is largest or smallest value)
There is a huge difference between nlogn and n^2 mathematically.
So I tried this in my program:
public static void main(String[] args)
{
    long t1 = System.nanoTime();
    // code of program..
    long t2 = System.nanoTime();
    long timeTaken = t2 - t1; // elapsed time in nanoseconds
}
The answer I get for both algorithms, in fact for any algorithm I tried, is mostly 20.
Is System.nanoTime() not precise enough, or should I use a slower system? Or is there any other way?
Is there any way to calculate them in a program? Not by assumptions like 'n' or anything but by actual values.
I think you misunderstand what complexity is. It is not a value. It is not even a series of values. It is a formula. If you get rid of the N it is meaningless as a complexity measure (except in the case of O(1) ... obviously).
Setting that issue to one side, it would be theoretically possible to automate the rigorous analysis of complexity. However, this is a hard problem: automated theorem proving is difficult ... especially if there is no human being in the loop to "guide" the process. And the undecidability of the Halting Problem implies that there cannot be an automated theorem prover that can prove the complexity of an arbitrary program. (Certainly there cannot be a complexity prover that works for all programs that may or may not terminate ...)
But there is one way to calculate a performance measure for a program with a given set of input. You just run it! And indeed, you do a series of runs, graphing performance against some problem size measure (i.e. an N) ... and make an educated guess at a formula that relates the performance and the N measure. Or you could attempt to fit the measurements to a formula.
However ...
it is only a guess, and
this approach is not always going to work.
For example, if you tried this on classic Quicksort, you would most likely conclude that the complexity is O(NlogN) and miss the important caveat that there is a "worst case" where it is O(N^2). Another example is where the observable performance characteristics change as the problem size gets big.
In short, this approach is liable to give you unreliable answers.
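The Quicksort caveat can be seen by counting comparisons rather than timing. The sketch below assumes a textbook Quicksort that always picks the first element as pivot (one of several possible variants); the class name and the input size are illustrative. On already sorted input this variant performs n(n-1)/2 comparisons, while a shuffled input of the same size needs far fewer, so measurements on random data alone would not reveal the quadratic case.

```java
import java.util.Random;

class QuicksortCases {
    static long comparisons;

    // Textbook quicksort using the first element as pivot (Lomuto-style partition).
    static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[lo];
        int boundary = lo; // last index holding a value smaller than the pivot
        for (int i = lo + 1; i <= hi; i++) {
            comparisons++;
            if (a[i] < pivot) {
                boundary++;
                int tmp = a[i]; a[i] = a[boundary]; a[boundary] = tmp;
            }
        }
        int tmp = a[lo]; a[lo] = a[boundary]; a[boundary] = tmp; // pivot into final place
        quicksort(a, lo, boundary - 1);
        quicksort(a, boundary + 1, hi);
    }

    // Sorts the array in place and returns the number of element comparisons made.
    static long countComparisons(int[] input) {
        comparisons = 0;
        quicksort(input, 0, input.length - 1);
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 2000;
        int[] sorted = new int[n];
        int[] shuffled = new int[n];
        for (int i = 0; i < n; i++) { sorted[i] = i; shuffled[i] = i; }
        Random random = new Random(1);
        for (int i = n - 1; i > 0; i--) { // Fisher-Yates shuffle
            int j = random.nextInt(i + 1);
            int tmp = shuffled[i]; shuffled[i] = shuffled[j]; shuffled[j] = tmp;
        }
        System.out.println("shuffled input: " + countComparisons(shuffled) + " comparisons");
        System.out.println("sorted input:   " + countComparisons(sorted) + " comparisons");
    }
}
```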
Well, in practice, with some assumptions about the program, you might be able to run your program on a large number of test cases (measuring the time each takes), use curve fitting to estimate the growth rate and hence the complexity of the program, and use statistical hypothesis testing to estimate the probability that you are correct.
However, this cannot be done in ALL cases. In fact, you cannot even have an algorithm that tells, for each program, whether it is going to halt or not (i.e. run an infinite loop). This is known as the Halting Problem, which is proven to be unsolvable.
Micro benchmarks like this are inherently flawed, and you're never going to get brilliantly accurate readings using them - especially not in the nanoseconds range. The JIT needs time to "warm up" to your code, during which time it will optimise itself around what code is being called.
If you must go down this route, then you need a big test set for your algorithm that'll take seconds to run rather than nanoseconds, and preferably a "warm up" period in there as well - then you might see some differences close to what you're expecting. You're never going to just be able to take those timings though and calculate the time complexity from them directly - you'd need to run many cases with different sizes and then plot a graph as to the time taken for each input size. Even that approach won't gain you brilliantly accurate results, but it would be enough to give an idea.
Your question might be related to "Can a program calculate the complexity of an algorithm?" and "Program/algorithm to find the time complexity of any given program". I think you could write a program that counts while or for loops and checks whether they are nested, but I can't figure out how you would calculate the complexity of some recursive functions.
The microbenchmark which you wrote is incorrect. When you want to gather some time metrics of your code for further optimization, JFF, etc. use JMH. This will help you a lot.
When we say that an algorithm exhibits O(nlogn) complexity, we're saying that n log n is an asymptotic upper bound for that algorithm. That is, for sufficiently large values of n, the algorithm's running time grows no faster than a constant multiple of n log n. We're not saying that for n inputs there will definitely be n log n executions, simply that this is the class of functions your algorithm belongs to.
By taking time intervals on your system, you're actually exposing yourself to the various variables involved in the computer system. That is, you're dealing with system latency, wire resistance, CPU speed, RAM usage... etc etc. All of these things will have a measurable effect on your outcome. That is why we use asymptotics to compute the time complexity of an algorithm.
One way to check the time complexity is to run both algorithms on different sizes of n and check the ratio between the running times of each run. From this ratio you can estimate the time complexity.
For example:
If time complexity is O(n) then the ratio will be n1/n2
If time complexity is O(n^2) the ratio will be (n1/n2)^2
If time complexity is O(log(n)) the ratio will be log(n1)/log(n2)
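For polynomial growth the ratio idea can be turned around to estimate the exponent directly: if time behaves like c * n^k, then t2/t1 = (n2/n1)^k, so k = log(t2/t1) / log(n2/n1). A small sketch, where the class name is made up and the timings are hypothetical numbers rather than real measurements:

```java
class GrowthRatio {
    // For time ~ c * n^k: t2/t1 = (n2/n1)^k, hence k = log(t2/t1) / log(n2/n1).
    static double estimateExponent(long n1, double t1, long n2, double t2) {
        return Math.log(t2 / t1) / Math.log((double) n2 / n1);
    }

    public static void main(String[] args) {
        // Hypothetical timings in seconds, chosen for illustration only.
        System.out.println("doubling n doubles the time:    k = "
                + estimateExponent(1000, 0.5, 2000, 1.0));
        System.out.println("doubling n quadruples the time: k = "
                + estimateExponent(1000, 0.5, 2000, 2.0));
    }
}
```

Logarithmic factors do not show up cleanly this way: an O(n log n) algorithm will typically yield an estimate slightly above 1, so the result is a hint rather than a proof.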
I've done some work on my graduation project and implemented several molecule structures + force calculation under Lennard-Jones potential and Coulomb potential + inter-molecular bonding (as in the picture)
(http://img17.imageshack.us/img17/3133/simulasyon.png)
All done with Verlet algorithm in a single thread.
The problem is: I am using a "calculations table" array for quick answers to x^(3.5), x^(1.4), (1/x), ... because it is very slow to compute these with the native methods of Java. Array access time is really high, so I tried "unsafe()" methods and it is still very slow (only a 10% performance gain).
Tried IntBuffer and DoubleBuffer and still no good.
Program calculates O(n) bond calculations, O(nlog(n)) Lennard-Jones (+ extra Pauli exclusion principle) and O(nlog(n)) Coulomb force calculations.
Poor speed at 1500+ particles (and 7000+ bonds) .
I already checked where the speed bottleneck is (it is Lennard-Jones + Coulomb). It takes 4 milliseconds for one time-step calculation at 1500 particles. I need it to be 1 millisecond.
If only I could use arrays as fast as in any other language (safe or not).
I also tried replacing divisions with multiplications, and using hashmaps and lists (same performance as arrays).
Do you know any other way of decreasing the time for calculation per timestep?
Thank you.
Computer: 2.0 GHz single-core intel, 1.2GB RAM, windows-XP SP-3 and Eclipse Indigo.
Instead of using lookup tables, try using Chebyshev polynomials. Remember that you can compute x^k for an integer k in only about log2(k) multiplications.
It may seem like a lot of operations, but the fact that it can be done without hitting memory (and therefore without affecting the cache) can make it significantly faster than a lookup table.
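The exponentiation remark can be illustrated with exponentiation by squaring. Note that this sketch covers only the integer-power part, not a Chebyshev approximation; the class and method names are made up, and splitting x^3.5 into x^3 * sqrt(x) is one possible way to handle a half-integer power, assuming x >= 0.

```java
class FastPow {
    // Computes x^k for a non-negative integer k using about log2(k) squarings.
    static double powInt(double x, int k) {
        if (k < 0) throw new IllegalArgumentException("k must be non-negative");
        double result = 1.0;
        double base = x;
        while (k > 0) {
            if ((k & 1) == 1) result *= base; // this bit of k contributes base^(2^i)
            base *= base;
            k >>= 1;
        }
        return result;
    }

    // x^3.5 = x^3 * sqrt(x), valid for x >= 0.
    static double pow3_5(double x) {
        return powInt(x, 3) * Math.sqrt(x);
    }

    public static void main(String[] args) {
        System.out.println(powInt(2.0, 10)); // 2^10
        System.out.println(pow3_5(4.0));     // 4^3 * sqrt(4)
    }
}
```

Whether this beats a lookup table on a given machine still has to be measured.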
In my current project, I am measuring the complexity of algorithms written in Java. I work with asymptotic complexity (the expected result) and I want to validate the expectation by comparing it with the actual number of operations. Incrementing a counter per operation seems a bit clumsy to me. Is there any better approach to measuring operational complexity?
Thanks
Edit: more info
The algorithms might run on different machines
Some parts of divide and conquer algorithms might be precached, hence it is probable, that the procedure will be faster than expected
Also it is important for me to find out the multiplicative constant (or the additive one), which is not taken in consideration in asymptotic complexity
Is there particular reason not to just measure the CPU time? The time utility or a profiler will get you the numbers. Just run each algorithm with a sufficient range of inputs and capture the cpu time (not wall clock time) spent.
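From inside a Java program, one way to read CPU time rather than wall clock time is the standard ThreadMXBean interface. The following is a minimal sketch: the class name CpuTimer and the sample workload are illustrative, and it only accounts for work done on the calling thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class CpuTimer {
    // Returns the CPU time (in nanoseconds) consumed by the current thread while
    // running the task, or -1 if the JVM does not support per-thread CPU time.
    static long cpuNanos(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            return -1;
        }
        long start = bean.getCurrentThreadCpuTime();
        task.run();
        return bean.getCurrentThreadCpuTime() - start;
    }

    public static void main(String[] args) {
        long[] result = new long[1];
        long nanos = cpuNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) sum += i % 7;
            result[0] = sum; // store the result so the loop is not dead code
        });
        System.out.println("CPU time: " + nanos + " ns (result " + result[0] + ")");
    }
}
```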
On an actual computer you want to measure execution time, e.g. with System.currentTimeMillis(). Vary the N parameter and get solid statistics. Do an error estimate. Your basic least squares will be fine.
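A basic least-squares fit could look like the sketch below. It assumes the timings follow a power law t = c * n^k, so that log(t) is linear in log(n), and it fits a straight line to the logged data. The class name and the synthetic data are illustrative, and no error estimate is computed here.

```java
class PowerLawFit {
    // Ordinary least-squares fit of log(t) = log(c) + k * log(n).
    // Returns { k, c }.
    static double[] fit(double[] n, double[] t) {
        int m = n.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < m; i++) {
            double x = Math.log(n[i]);
            double y = Math.log(t[i]);
            sx += x;
            sy += y;
            sxx += x * x;
            sxy += x * y;
        }
        double k = (m * sxy - sx * sy) / (m * sxx - sx * sx); // slope
        double logC = (sy - k * sx) / m;                      // intercept
        return new double[] { k, Math.exp(logC) };
    }

    public static void main(String[] args) {
        // Synthetic "measurements" generated from t = 3e-9 * n^2; not real timings.
        double[] n = { 1000, 2000, 4000, 8000 };
        double[] t = new double[n.length];
        for (int i = 0; i < n.length; i++) t[i] = 3e-9 * n[i] * n[i];
        double[] result = fit(n, t);
        System.out.printf("exponent ~ %.3f, constant ~ %.3e%n", result[0], result[1]);
    }
}
```

The fitted constant is the multiplicative factor that asymptotic analysis leaves out, but it is specific to the machine and JVM the measurements were taken on.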
Counting operations on an algorithm is fine, but it has limited use. Different processors do stuff at different speeds. Counting the number of expressions or statements executed by your implemented algorithm is close to useless. For the abstract algorithm you can use such counts to make comparisons when tweaking; for your implementation this is no longer the case, because compiler/JIT/CPU tricks will dominate.
Asymptotic behavior should be very close to the calculated/expected if you do good measurements.
ByCounter can be used to instrument Java (across multiple classes) and count the number of bytecodes executed by the JVM at runtime.
I'm trying to do a typical "A/B testing" like approach on two different implementations of a real-life algorithm, using the same data-set in both cases. The algorithm is deterministic in terms of execution, so I really expect the results to be repeatable.
On the Core 2 Duo, this is also the case. Using just the linux "time" command I'll get variations in execution time around 0.1% (over 10 runs).
On the i7 I will get all sorts of variations, and I can easily have 30% variations up and down from the average. I assume this is due to the various CPU optimizations that the i7 does (dynamic overclocking etc), but it really makes it hard to do this kind of testing. Is there any other way to determine which of 2 algorithms is "best", any other sensible metrics I can use ?
Edit: The algorithm does not sustain for very long and this is actually the real-life scenario I'm trying to benchmark. So running repeatedly is not really an option as such.
See if you can turn off the dynamic over-clocking in your BIOS. Also, ditch all possible other processes running when doing the benchmarking.
Well you could use O-notation principles in determining the performance of algorithms. This will determine the theoretical speed of an algorithm.
http://en.wikipedia.org/wiki/Big_O_notation
If you absolutely must know the real-life speed of the algorithm, then of course you must benchmark it on a system. But using the O-notation you can see past all that and focus only on the factors/variables that are important.
You didn't indicate how you're benchmarking. You might want to read this if you haven't yet: How do I write a correct micro-benchmark in Java?
If you're running a sustained test I doubt dynamic clocking is causing your variations. It should stay at the maximum turbo speed. If you're running it too long perhaps it's going down one multiplier for heat. Although I doubt that, unless you're over-clocking and are near the thermal envelope.
Hyper-Threading might be playing a role. You could disable that in your BIOS and see if it makes a difference in your numbers.
On linux you can lock the CPU speed to stop clock speed variation. ;)
You need to make the benchmark as realistic as possible. For example, if you run an algorithm flat out and take an average, you might get very different results from performing the same task every 10 ms. I have seen 2x to 10x variation (between flat out and relatively low load), even with a locked clock speed.