I'm trying to create a benchmark test with java. Currently I have the following simple method:
public static long runTest(int times){
    long start = System.nanoTime();
    String str = "str";
    for(int i = 0; i < times; i++){
        str = "str" + i;
    }
    return System.nanoTime() - start;
}
I currently run this loop several times inside another loop that itself runs several times, and I record the min/max/average time the method takes. Then I start some activity on another thread and test again. Basically I just want to get consistent results... It seems pretty consistent if I have the runTest loop 10 million times:
Number of times ran: 5
The max time was: 1231419504 (102.85% of the average)
The min time was: 1177508466 (98.35% of the average)
The average time was: 1197291937
The difference between the max and min is: 4.58%
Activated thread activity.
Number of times ran: 5
The max time was: 3872724739 (100.82% of the average)
The min time was: 3804827995 (99.05% of the average)
The average time was: 3841216849
The difference between the max and min is: 1.78%
Running with thread activity took 320.83% as much time as running without.
But that seems a bit much and takes some time... if I try a lower number (100,000) in the runTest loop, the results start to become very inconsistent:
Number of times ran: 5
The max time was: 34726168 (143.01% of the average)
The min time was: 20889055 (86.02% of the average)
The average time was: 24283026
The difference between the max and min is: 66.24%
Activated thread activity.
Number of times ran: 5
The max time was: 143950627 (148.83% of the average)
The min time was: 64780554 (66.98% of the average)
The average time was: 96719589
The difference between the max and min is: 122.21%
Running with thread activity took 398.3% as much time as running without.
Is there a way that I can do a benchmark like this that is both consistent and efficient/fast?
By the way, I'm not testing the code that sits between the start and end times. I'm testing the CPU load, in a way (hence starting some thread activity and retesting). So I think what I'm looking for is something to substitute for the code I have in "runTest" that will yield quicker and more consistent results.
Thanks
In short:
(Micro-)benchmarking is very complex, so use a tool like the Benchmarking framework http://www.ellipticgroup.com/misc/projectLibrary.zip - and still be skeptical about the results ("Put micro-trust in a micro-benchmark", Dr. Cliff Click).
In detail:
There are a lot of factors that can strongly influence the results:
The accuracy and precision of System.nanoTime: in the worst case it is no better than that of System.currentTimeMillis.
code warmup and class loading
mixed mode: JVMs JIT-compile code (see Edwin Buck's answer) only after a block has been called sufficiently often (by default roughly 1,500 times for the client compiler or 10,000 times for the server compiler)
dynamic optimizations: deoptimization, on-stack replacement, dead code elimination (you should use the result you computed in your loop, e.g. print it)
resource reclamation: garbage collection (see Michael Borgwardt's answer) and object finalization
caching: I/O and CPU
your operating system on the whole: screen saver, power management, other processes (indexer, virus scan, ...)
Brent Boyer's article "Robust Java benchmarking, Part 1: Issues" ( http://www.ibm.com/developerworks/java/library/j-benchmark1/index.html) is a good description of all those issues and whether/what you can do against them (e.g. use JVM options or call ProcessIdleTask beforehand).
You won't be able to eliminate all these factors, so doing statistics is a good idea. But:
instead of computing the difference between the max and min, you should put in the effort to compute the standard deviation (the result set {1, 1000 times 2, 3} has a very different spread from {501 times 1, 501 times 3} even though the extremes are the same); see the sketch after this answer.
The reliability is taken into account by producing confidence intervals (e.g. via bootstrapping).
The above mentioned Benchmark framework ( http://www.ellipticgroup.com/misc/projectLibrary.zip) uses these techniques. You can read about them in Brent Boyer's article "Robust Java benchmarking, Part 2: Statistics and solutions" ( https://www.ibm.com/developerworks/java/library/j-benchmark2/).
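As a rough illustration of the standard-deviation point above, a minimal sketch (the helper name stdDev and the choice of sample variance are mine, not part of the framework):

public static double stdDev(long[] times) {
    // mean of the measured run times (nanoseconds)
    double mean = 0;
    for (long t : times) mean += t;
    mean /= times.length;

    // sum of squared deviations from the mean
    double sumSq = 0;
    for (long t : times) {
        double d = t - mean;
        sumSq += d * d;
    }
    // sample standard deviation (divide by n - 1); use times.length for the population version
    return Math.sqrt(sumSq / (times.length - 1));
}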
Your code ends up testing mainly garbage collection performance because appending to a String in a loop ends up creating and immediately discarding a large number of increasingly large String objects.
This is something that inherently leads to wildly varying measurements and is influenced strongly by multi-thread activity.
I suggest you do something else in your loop that has more predictable performance, like mathematical calculations.
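For example, a variant of runTest that does arithmetic instead of string allocation might look like the following sketch (the particular math is arbitrary; printing the accumulated result keeps the JIT from eliminating the loop as dead code):

public static long runMathTest(int times) {
    long start = System.nanoTime();
    double acc = 0;
    for (int i = 1; i <= times; i++) {
        acc += Math.sqrt(i) * Math.sin(i);   // CPU-bound work, no object allocation
    }
    long elapsed = System.nanoTime() - start;
    System.out.println("checksum = " + acc); // use the result so it is not optimised away
    return elapsed;
}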
In the 10 million times run, odds are good the HotSpot compiler detected a "heavily used" piece of code and compiled it into machine native code.
JVM bytecode is interpreted, which leaves it susceptible to more interruptions from other background processes occurring in the JVM (like garbage collection).
Generally speaking, these kinds of benchmarks are rife with assumptions that don't hold. You cannot believe that a micro-benchmark really proves what it set out to prove without evidence that the measured time reflects only your task and not your task plus some background tasks. If you don't attempt to control for background tasks, the measurement is much less useful.
Related
How can I achieve this without using very large input arrays? I am measuring the running time of different algorithms, and for an array of 20 elements I get very similar (in fact identical) values. I tried dividing the total time by 1000000000 to get rid of the scientific notation, and then used 16 "mirrors" where I copied the input array and executed the sort again on each copy, but it is still the same for heap sort and quick sort. Any ideas without needing to write redundant lines?
Sample output:
Random array:
MergeSort:
Total time 14.333066343496
QuickSort:
Total time 14.3330663435256
HeapSort:
Total time 14.3330663435256
If you need code snippets just notify me.
To your direct question, use System.nanoTime() for more granular timestamps.
To your underlying question of how to get better benchmarks, you should run the benchmark repeatedly and on larger data sets. A benchmark that takes ~14ms to execute is going to be very noisy, even with a more precise clock. See also How do I write a correct micro-benchmark in Java?
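As a rough sketch of that approach (the array size, repetition count, and use of Arrays.sort as the algorithm under test are all placeholder choices):

import java.util.Arrays;
import java.util.Random;

public class SortTiming {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        int n = 1_000_000;        // large enough that a single run takes measurable time
        int repetitions = 20;     // average over several runs to smooth out noise

        long total = 0;
        for (int r = 0; r < repetitions; r++) {
            int[] data = rnd.ints(n).toArray();     // fresh random input for each run
            long start = System.nanoTime();
            Arrays.sort(data);                      // replace with the sort being measured
            total += System.nanoTime() - start;
        }
        System.out.printf("average: %.3f ms%n", total / repetitions / 1e6);
    }
}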
You can't improve the granularity of this method. According to Java SE documentation:
Returns the current time in milliseconds. Note that while the unit of
time of the return value is a millisecond, the granularity of the
value depends on the underlying operating system and may be larger.
For example, many operating systems measure time in units of tens of
milliseconds.
(source)
As others have stated, for elapsed-time measurements, public static long nanoTime() gives you more precision, though not necessarily better resolution:
This method provides nanosecond precision, but not necessarily
nanosecond resolution.
(source)
If an algorithm with O(n^2) average case time complexity takes 10 seconds to execute for an input size of 1000 elements, how long will it take to run when the input size is 10,000 elements?
It can't be answered exactly, and anybody who gives a single definitive number is wrong, because time complexity is independent of the underlying machine architecture. That is why we ignore the machine-dependent constants.
Each platform has its own overhead of doing certain operations. So, again, the answer is not possible to say.
While it is not possible to give a specific number which applies to all machines, you can estimate that an O(n^2) algorithm should take around 100x as long for a 10x increase in n (a rough way to check this empirically is sketched after the qualifications below).
This comes with two important qualifications
the time taken could be much less than this, as the first test could include a significant amount of warm-up; i.e. you might find it only takes twice as long. That would be surprising, but it is possible. In Java, warm-up is a significant factor in short tests, and it is not uncommon to see a test of 10K take far less time if you run it again. In fact, a test of 10K run multiple times could start to show times faster than the 1K run once, as the JIT kicks in (especially if the code is easily optimised)
the time taken could be much higher, especially if some resource threshold is crossed, e.g. 1000 elements fit in cache but 10000 elements only fit in main memory; similarly, 1000 elements could fit in memory but 10000 must be paged from disk. This could result in it taking much longer.
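A sketch of how one might check the rough 100x estimate empirically (bubble sort is just a convenient quadratic workload here; the warm-up call and the sizes are arbitrary choices):

import java.util.Random;

public class QuadraticScaling {
    // A deliberately simple O(n^2) workload.
    static void bubbleSort(int[] a) {
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a.length - 1 - i; j++)
                if (a[j] > a[j + 1]) { int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t; }
    }

    static long time(int n) {
        int[] a = new Random(1).ints(n).toArray();
        long start = System.nanoTime();
        bubbleSort(a);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        time(10_000);                 // warm-up run, result discarded
        long small = time(1_000);
        long large = time(10_000);
        System.out.printf("ratio: %.1fx (roughly 100x expected)%n", (double) large / small);
    }
}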
I am exploring OpenJDK JMH for benchmarking my code. As I understand it, JMH by default forks multiple JVMs in order to defend the test from previously collected "profiles", which is explained very well in this sample code.
However, my question is what impact it will have on the results if I execute using the following two approaches:
1) 1 fork, 100 iterations
2) 10 forks, 10 iterations each
And which approach will give the more accurate result?
It depends. Multiple forks are needed to estimate run-to-run variance, see JMHSample_13_RunToRun. Therefore, a single fork is definitely worse. Then, if you ask which split is better -- fewer forks with more iterations, or more forks with fewer iterations -- that again depends on which is the bigger concern: run-to-run variance or in-run variance.
It depends on how much the results vary per fork vs. per iteration, which is workload specific.
If you want a rigorous statistical approach to figuring out this tradeoff, check out "Rigorous Benchmarking in Reasonable Time" (Kalibera, Jones). Equation 3 gives the optimal counts per level (in your case, these would be number of forks to run and number of iterations per fork) by using the observed variances between forks and between iterations.
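For reference, both knobs can be set per benchmark with JMH annotations; a minimal sketch (the benchmark body is just a placeholder):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 10)   // measured iterations per fork
@Fork(10)                       // number of separate JVM forks
@State(Scope.Thread)
public class MyBenchmark {
    int x = 42;

    @Benchmark
    public double placeholder() {
        return Math.sqrt(x);    // placeholder workload; returning the value prevents dead-code elimination
    }
}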
I have developed an image processing algorithm in core Java (without using any third-party API). Now I have to calculate the execution time of that algorithm; for this I have used System.currentTimeMillis() like this:
public class MyAlgo {

    public MyAlgo(String imagePath){
        long stTime = System.currentTimeMillis();
        // ..........................
        // My Algorithm
        // ..........................
        long endTime = System.currentTimeMillis();
        System.out.println("Time ==> " + (endTime - stTime));
    }

    public static void main(String args[]){
        new MyAlgo("d:\\myImage.bmp");
    }
}
But the problem is that each time I run this program I get a different execution time. Can anyone please suggest how I can do this reliably?
If you don't want to use external profiling libraries just wrap your algorithm in a for() loop that executes it 1000 times and divide the total time by 1000. The result will be much more accurate since all the other tasks/processes will even out.
Note: the overall measured time will reflect the expected time for the algorithm to finish, not just the time its code instructions alone require.
For example, if your algorithm uses a lot of memory and on average the Java VM invokes the garbage collector twice per execution of the algorithm, then you should also take the garbage collector's time into account.
Averaging over a for() loop captures exactly that, so you will get good results.
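A sketch of that wrapper, assuming the algorithm is pulled out of the constructor into a method (the method name runAlgorithm and the run count are illustrative, not from the original code):

public class MyAlgoBenchmark {
    public static void main(String[] args) {
        int runs = 1000;
        long start = System.currentTimeMillis();
        for (int i = 0; i < runs; i++) {
            runAlgorithm("d:\\myImage.bmp");          // hypothetical method holding the algorithm
        }
        long total = System.currentTimeMillis() - start;
        System.out.println("Average time ==> " + (total / (double) runs) + " ms");
    }

    private static void runAlgorithm(String imagePath) {
        // ... the image processing algorithm goes here ...
    }
}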
You cannot get a reliable result from one execution alone; Java (well, JVMs) does runtime optimizations, plus there are other processes competing for CPU time/resource access. Also, are you sure your algorithm runs in constant time whatever the inputs?
Your best bet to have a calculation as reliable as possible is to use a library dedicated to performance measurements; one of them is caliper.
Set up a benchmark with different inputs/outputs etc and run it.
You need to apply some statistical analysis over multiple executions of your algorithm. For instance, execute it 1000 times and analyze min, max and average time.
Multiple executions in different scenarios might provide insights too, for instance on different hardware or with images of different resolutions.
I suppose your algorithm can be divided into multiple steps. You can monitor the steps independently to understand the impact of each one.
Marvin Image Processing Framework, for instance, provides methods to monitor and analyze the time and the number of executions of each algorithm step.
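If you don't want to pull in a framework, timing the stages separately can be as simple as this sketch (the stage names are hypothetical placeholders):

public class StepTiming {
    public static void main(String[] args) {
        long t0 = System.nanoTime();
        loadImage();        // hypothetical step 1
        long t1 = System.nanoTime();
        applyFilter();      // hypothetical step 2
        long t2 = System.nanoTime();
        writeResult();      // hypothetical step 3
        long t3 = System.nanoTime();

        System.out.printf("load:   %.2f ms%n", (t1 - t0) / 1e6);
        System.out.printf("filter: %.2f ms%n", (t2 - t1) / 1e6);
        System.out.printf("write:  %.2f ms%n", (t3 - t2) / 1e6);
    }

    // Placeholder stages; substitute the real algorithm steps.
    static void loadImage()   { }
    static void applyFilter() { }
    static void writeResult() { }
}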
First of all I have to admit that these are very basic and primitive questions... I want to demonstrate different algorithms in Java for sorting and searching, and to get a value for the runtime. There are issues I cannot solve:
there's HotSpot compiling, which is a runtime optimization I need to deactivate (I guess).
How do I get time-values (seconds) for runtimes? Starting a timer before the execution and stopping it afterwards... seems a little primitive. And the timer-object itself consumes runtime... I need to avoid that.
Is there anything in the Java API one could utilize to solve these problems?
Thanks,
Klaus
You can disable HotSpot with -Xint on the command line, at the cost of an order-of-magnitude decrease in performance. However, why don't you want to measure real-world performance? Different things can become bottlenecks when the code is compiled.
Generally for microbenchmarks:
use System.nanoTime to get a time measurement at the start and end
run for a reasonable length of time
do the measurement a number of times over (there's some "warm up")
don't interleave measurements of different algorithms
don't do any I/O in the measured segment
use the result (HotSpot can completely optimise away trivial operations)
do it in a real world situation (or as close as possible)
remember dual core is the norm, and more cores will become normal
Use the -Xint JVM flag. Other options can be seen here.
Use the ThreadMXBean API to get CPU/User times for your thread. An example can be seen here.
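A minimal sketch of the ThreadMXBean approach (the busy-work loop is just a placeholder workload):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeExample {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            System.out.println("CPU time measurement is not supported on this JVM");
            return;
        }

        long cpuStart  = bean.getCurrentThreadCpuTime();   // CPU time in nanoseconds
        long userStart = bean.getCurrentThreadUserTime();  // user-mode time in nanoseconds

        double acc = 0;
        for (int i = 0; i < 10_000_000; i++) acc += Math.sqrt(i); // placeholder workload

        long cpu  = bean.getCurrentThreadCpuTime()  - cpuStart;
        long user = bean.getCurrentThreadUserTime() - userStart;
        System.out.printf("cpu: %.2f ms, user: %.2f ms (checksum %f)%n", cpu / 1e6, user / 1e6, acc);
    }
}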
Using System.nanoTime() twice consumes less than 1 micro-second. I suggest you run any benchmark for a number of seconds and take an average, so a micro-second error won't be significant.
Overall, I would suggest not making things more complicated than you need it to be.
To have a built-in warm-up I often ignore the first 10%-20% of iterations. Something like this (the negative loop start provides the warm-up iterations, which are excluded from the timing):
long start = 0;
int count = 1000;                             // number of measured iterations (illustrative value)
for (int i = -count / 5; i < count; i++) {    // negative i values are warm-up iterations
    if (i == 0) start = System.nanoTime();    // start timing only once the warm-up is done
    // do tested code
}
long time = System.nanoTime() - start;
long average = time / count;
System.out.printf("Average time was %,d micro-seconds%n", average / 1000);