I'm trying to time the performance of my program using System.currentTimeMillis() (or alternatively System.nanoTime()), and I've noticed that every time I run it, it gives a different result for the time it took to finish the task.
Even the straightforward test:
long totalTime;
long startTime;
long endTime;

startTime = System.currentTimeMillis();
for (int i = 0; i < 1000000000; i++)
{
    for (int j = 0; j < 1000000000; j++)
    {
    }
}
endTime = System.currentTimeMillis();
totalTime = endTime - startTime;
System.out.println("Time: " + totalTime);
produces all sorts of different outputs, from 0 to 200. Can anyone say what I'm doing wrong or suggest an alternative solution?
The loop doesn't do anything, so you are timing how long it takes to detect the loop is pointless.
Timing the loops more accurately won't help, you need to do something slightly useful to get repeatable results.
I suggest you try -server if you are running on 32-bit Windows.
A billion billion clock cycles would take about 10 years, so it's not really iterating that many times.
This is exactly the expected behavior -- it's supposed to get faster as you rerun the timing. As you rerun a method many times, the JIT devotes more effort to compiling it to native code and optimizing it; I would expect that after running this code for long enough, the JIT would eliminate the loop entirely, since it doesn't actually do anything.
The best and simplest way to get precise benchmarks on Java code is to use a tool like Caliper that "warms up" the JIT to encourage it to optimize your code fully.
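If you don't want to pull in a framework, a hand-rolled warm-up sketch along these lines shows the general idea (the class name, work() method, and iteration counts are illustrative, not from the original answer):

public class WarmupBenchmark {

    // Placeholder workload; returning a value gives the JIT something
    // it cannot simply throw away.
    static long work() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0;

        // Warm-up: call the method enough times for the JIT to compile
        // and optimize it (roughly the default HotSpot compile threshold).
        for (int i = 0; i < 10_000; i++) {
            sink += work();
        }

        // Timed runs.
        int runs = 100;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += work();
        }
        long end = System.nanoTime();

        System.out.println("Average per call: " + (end - start) / (double) runs + " ns");
        // Printing the accumulated result keeps the work from being optimized away.
        System.out.println("Checksum: " + sink);
    }
}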
Related
My program iterates over given data, and I have observed strange behavior: the first few samples the algorithm processes show slow run times, but the subsequent samples and iterations run at an almost consistent (and noticeably lower) run time.
Why is this so? I even tried calling the function once outside of the iterating loop as a warm-up call, hoping that if the JVM was optimizing the code, it would do so during that warm-up call.
// warm up function call
warpInfo = warp.getDTW(testSet.get(startIndex), trainSet.get(0), distFn, windowSize);

this.startTime = System.currentTimeMillis();
for (int i = startIndex; i < endIndex; i++) {
    for (int j = 0; j < trainSet.size(); j++) {
        train = trainSet.get(j);
        instStartTime = System.currentTimeMillis();
        warpInfo = warp.getDTW(test, train, distFn, windowSize);
        if (warpInfo.getWarpDistance() < bestDist) {
            bestDist = warpInfo.getWarpDistance();
            classPredicted = train.getTSClass();
        }
        instEndTime = System.currentTimeMillis();
        instProcessingTime = instEndTime - instStartTime;
        // record timing and results here
    }
    // record other information here
}
After searching, I have come across a few links (an SO answer to a similar question and this Java performance comparison) which mention the option of specifying -XX:CompileThreshold=1 when running the program, to tell the JVM to compile the code after that many function calls. Also see the Oracle page.
After performing a test run with my program, I can state that it solves the problem.
**EDIT:** Based on Zarev's comments, I'd say I was wrong to post this as an answer. The CompileThreshold flag only forces an earlier compilation of the code, so it is not what causes the larger run time for the first iteration; in reality it may cause the program to be compiled without regard for the details the JIT would otherwise gather.
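For reference, the flag from those links is passed on the java command line when launching the program (the class name below is just a placeholder):

java -XX:CompileThreshold=1 MyBenchmarkMain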
In Java I want to measure the time for:
1000 integer comparisons (the "<" operator),
1000 integer additions (a + b, each time for different a and b),
other simple operations.
I know I can do it in the following way:
Random rand = new Random();
long elapsedTime = 0;
for (int i = 0; i < 1000; i++) {
    int a = Integer.MIN_VALUE + rand.nextInt(Integer.MAX_VALUE);
    int b = Integer.MIN_VALUE + rand.nextInt(Integer.MAX_VALUE);
    long start = System.currentTimeMillis();
    if (a < b) {}
    long stop = System.currentTimeMillis();
    elapsedTime += (stop - start);
}
System.out.println(elapsedTime);
I know this question may seem somewhat unclear.
How do those values depend on my processor (i.e. what is the relation between the time for those operations and my processor) and on the JVM? Any suggestions?
I'm looking for accessible reading material...
How do those values depend on my processor (i.e. what is the relation between the time for those operations and my processor) and on the JVM? Any suggestions?
It is not dependent on your processor, at least not directly.
Normally, when you run code enough times, the JVM will compile it to native code. When it does this, it removes code which doesn't do anything, so what you would actually be measuring here is the time it takes to perform a System.currentTimeMillis() call, which is typically about 0.00003 ms. This means you will get 0 about 99.997% of the time and see a 1 very rarely.
I say normally, but in this case your code won't be compiled to native code, as the default threshold is 10,000 iterations; i.e. you would be testing how long it takes the interpreter to execute the byte code. This is much slower, but still a fraction of a millisecond, i.e. you have a higher chance of seeing a 1, but it is still unlikely.
If you want to learn more about low-level benchmarking in Java, I suggest you look at JMH and the author's blog http://shipilev.net/
If you want to see what machine code is generated from Java code, I suggest you try JITWatch.
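As a rough, hypothetical sketch (not from the original answer), a JMH benchmark for a single int comparison could look like this; the class, method, and field names are placeholders, and it assumes the JMH annotation classes are on the classpath:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ComparisonBenchmark {

    int a = 12345;
    int b = 54321;

    // Returning the result prevents the JIT from eliminating the comparison
    // as dead code; JMH consumes the return value for you.
    @Benchmark
    public boolean lessThan() {
        return a < b;
    }
}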
Today I made a simple test to compare the speed of Java and C: a simple loop that increments an integer i from 0 to two billion.
I really expected the C version to be faster than Java. I was surprised by the outcome:
the time it takes in seconds for java: approx. 1.8 seconds
the time it takes in seconds for c: approx. 3.6 seconds.
I DO NOT think that Java is a faster language at all, but I also DO NOT understand why the loop runs twice as fast in Java as in C in my simple programs.
Did I make a crucial mistake in the program? Or is the MinGW compiler badly configured or something?
public class Jrand {

    public static void main(String[] args) {

        long startTime = System.currentTimeMillis();

        int i;
        for (i = 0; i < 2000000000; i++) {
            // Do nothing!
        }

        long endTime = System.currentTimeMillis();
        float totalTime = (endTime - startTime);
        System.out.println("time: " + totalTime / 1000);
    }
}
THE C-PROGRAM
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main() {
    clock_t startTime;
    startTime = clock();

    int i;
    for (i = 0; i <= 2000000000; i++) {
        // Do nothing
    }

    clock_t endTime;
    endTime = clock();

    float totalTime = endTime - startTime;
    printf("%f", totalTime / 1000);
    return 0;
}
Rebuild your C version with any optimization level other than -O0 (e.g. -O2) and you will find it runs in 0 seconds. So the Java version takes about 1.8 seconds to do nothing, and the C version takes 0.0 seconds (really, around 0.00005 seconds) to do nothing.
Java is more aggressive at eliminating code which doesn't do anything. It is less likely to assume the developer knows what they are doing. You are not timing the loop but how long it takes Java to detect and eliminate the loop.
In short, Java is often faster at doing nothing useful.
Also, you may find that if you optimise the C code and remove debugging information, the C compiler will do the same thing, most likely in even less time.
If you want to benchmark this, then instead of doing nothing, do something useful on each iteration, like calculating something. For example, accumulate a counter in some other variable and make sure you use it at the end (by printing it, for example), so that it will not be optimized out.
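A minimal sketch of that suggestion in Java (class and variable names are illustrative, not from the original answer):

public class CountingLoop {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();

        long count = 0;
        for (int i = 0; i < 2_000_000_000; i++) {
            count += i; // real work the optimizer cannot simply discard
        }

        long end = System.currentTimeMillis();

        // Printing the result keeps the loop from being eliminated entirely.
        System.out.println("count = " + count);
        System.out.println("time: " + (end - start) / 1000.0 + " s");
    }
}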
Alternate simple tests could be accessing an array linearly (reading only), copying elements from one array to another (read + write), or doing some operations on the data. Some of these cases are interesting because they open up several very simple compiler optimizations that you can later see in the resulting binary/bytecode, such as loop unrolling, register allocation, and maybe even more complicated things like vectorization or code motion. Java, on the other hand, may employ some nastier tricks such as JIT compilation (dynamically recompiling on the fly).
The scope of compiler optimization is huge; you've just encountered the most basic one - eliminating useless code :)
I have one problem that I can't explain. Here is the code in main function:
String numberStr = "3151312423412354315";
System.out.println(numberStr + "\n");
System.out.println("Lehman method: ");
long beginTime = System.currentTimeMillis();
System.out.println(Lehman.getFullFactorization(numberStr));
long finishTime = System.currentTimeMillis();
System.out.println((finishTime-beginTime)/1000. + " sec.");
System.out.println();
System.out.println("Lehman method: ");
beginTime = System.currentTimeMillis();
System.out.println(Lehman.getFullFactorization(numberStr));
finishTime = System.currentTimeMillis();
System.out.println((finishTime-beginTime)/1000. + " sec.");
In case it is relevant: the method Lehman.getFullFactorization(...) returns an ArrayList of prime divisors in String format.
Here is the output:
3151312423412354315
Lehman method:
[5, 67, 24473, 384378815693]
0.149 sec.
Lehman method:
[5, 67, 24473, 384378815693]
0.016 sec.
I was surprised when I saw it. Why is the second execution of the same method so much faster than the first? At first, I thought the first run was including the time to start the JVM and load its resources, but that's impossible, because the JVM obviously starts before the main method is executed.
In some cases, Java's JIT compiler (see http://java.sun.com/developer/onlineTraining/Programming/JDCBook/perf2.html#jit) kicks in on the first execution of a method and optimizes that method's code. That is supposed to make all subsequent executions faster. I think this might be what happens in your case.
Try doing it more than 10,000 times and it will be much faster. This is because the code first has to be loaded (expensive), then runs in interpreted mode (OK speed), and is finally compiled to native code (much faster).
Can you try this?
int runs = 100 * 1000;
for (int i = -20000 /* warm-up */; i < runs; i++) {
    if (i == 0)
        beginTime = System.nanoTime();
    Lehman.getFullFactorization(numberStr);
}
finishTime = System.nanoTime();
System.out.println("Average time was " + (finishTime - beginTime) / 1e9 / runs + " sec.");
I suppose the JVM has cached the results (maybe partially) of the first calculation, so you observe a faster second calculation. JIT in action.
There are two things that make the second run faster.
The first time, the class containing the method must be loaded. The second time, it is already in memory.
Most importantly, the JIT optimizes code that is often executed: during the first call, the JVM starts by interpreting the byte code and then compiles it into machine code and continues the execution. The second time, the code is already compiled.
That's why micro-benchmarks in Java are often hard to validate.
My guess is that it's saved in the CPU's L1/L2 cache for optimization.
Or Java doesn't have to interpret it again and fetches it from memory as already-compiled code.
I have a very long string with the pattern </value> at the very end. I am trying to test the performance of some function calls, so I made the following test to try to find the answer... but I think I might be using nanoTime incorrectly? Because the result doesn't make sense no matter how I swap the order around...
long start, end;
start = System.nanoTime();
StringUtils.indexOf(s, "</value>");
end = System.nanoTime();
System.out.println(end - start);
start = System.nanoTime();
s.indexOf("</value>");
end = System.nanoTime();
System.out.println(end - start);
start = System.nanoTime();
sb.indexOf("</value>");
end = System.nanoTime();
System.out.println(end - start);
I get the following:
163566 // StringUtils
395227 // String
30797 // StringBuilder
165619 // StringBuilder
359639 // String
32850 // StringUtils
No matter which order I swap them around, the numbers will always be somewhat the same... What's the deal here?
From java.sun.com website's FAQ:
Using System.nanoTime() between various points in the code to perform elapsed time measurements should always be accurate.
Also:
http://download.oracle.com/javase/1.5.0/docs/api/java/lang/System.html#nanoTime()
The differences between the two runs are on the order of microseconds, and that is expected. There are many things going on on your machine which make the execution environment never quite the same between two runs of your application. That is why you get that difference.
EDIT: Java API says:
This method provides nanosecond precision, but not necessarily
nanosecond accuracy.
Most likely there are memory initialization issues or other things happening at the JVM's startup that are skewing your numbers. You should use a bigger sample to get more accurate numbers. Play around with the order, run it multiple times, etc.
It is more than likely that the methods you are checking use some common code behind the scenes. But the JIT will do its work only after about 10,000 invocations. Hence, this could be why your first two examples always seem to be slower.
Quick fix: just do the 3 method calls before the first measurement, on a long enough string.
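A rough sketch of that idea, reusing the variables from the question (the warm-up count is arbitrary but chosen to exceed the JIT threshold mentioned above):

// Warm up all three code paths first so each is already JIT-compiled
// before anything is timed.
for (int i = 0; i < 20000; i++) {
    StringUtils.indexOf(s, "</value>");
    s.indexOf("</value>");
    sb.indexOf("</value>");
}

// ...then run the original timing code from the question.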