This question already has answers here:
How do I write a correct micro-benchmark in Java?
(11 answers)
Closed 9 years ago.
Today I made a simple test to compare the speed between java and c - a simple loop that makes an integer "i" increment from 0 to two billion.
I really expected c-language to be faster than java. I was surprised of the outcome:
the time it takes in seconds for java: approx. 1.8 seconds
the time it takes in seconds for c: approx. 3.6 seconds.
I DO NOT think that java is a faster language at all, but I DO NOT either understand why the loop is twice as fast as c in my simple programs?
Did I made a crucial misstake in the program? Or is the compiler of MinGW badly configurated or something?
public class Jrand {
public static void main (String[] args) {
long startTime = System.currentTimeMillis();
int i;
for (i = 0; i < 2000000000; i++) {
// Do nothing!
}
long endTime = System.currentTimeMillis();
float totalTime = (endTime - startTime);
System.out.println("time: " + totalTime/1000);
}
}
THE C-PROGRAM
#include<stdio.h>
#include<stdlib.h>
#include <time.h>
int main () {
clock_t startTime;
startTime = clock();
int i;
for (i = 0; i <= 2000000000; i++) {
// Do nothing
}
clock_t endTime;
endTime = clock();
float totalTime = endTime - startTime;
printf("%f", totalTime/1000);
return 0;
}
Rebuild your C version with any optimization level other than -O0 (e.g. -O2) and you will find it runs in 0 seconds. So the Java version takes 1.6 seconds to do nothing, and the C version takes 0.0 seconds (really, around 0.00005 seconds) to do nothing.
Java is more aggressive at eliminating code which doesn't do anything. It is less likely to assume the developer knows what they are doing. You are not timing the loop but how long it takes java to detect and eliminate the loop.
In short, Java is often faster at doing nothing useful.
Also you may find that if you optimise the C code and remove debugging information it will do the same thing, most likely shorter.
If you want to benchmark this, instead of doing nothing, try to something useful like calculating something on each iterations. For e.g. count the loops in some other variable, and make sure you use it at the end (by printing it for e.g), so that it will not be optimized out.
Alternate simple tests could be accessing an array linearly (reading only), copying elements from one array to another (read+write), or doing some operations on the data. Some of these cases might be interesting as they open several very simple compiler optimizations that you can later see in the result binary/bytecode, such as loop unrolling, register allocation, and maybe even more complicated stuff like vectorization or code motion. Java on the other may employ some nastier tricks such as jitting (dynamically recompiling on the fly)
The scope of compiler optimization is huge, you've just encountered the most basic one - eliminating useless code :)
Related
I've a requirement to capture the execution time of some code in iterations. I've decided to use a Map<Integer,Long> for capturing this data where Integer(key) is the iteration number and Long(value) is the time consumed by that iteration in milliseconds.
I've written the below java code to compute the time taken for each iteration. I want to ensure that the time taken by all iterations is zero before invoking actual code. Surprisingly, the below code behaves differently for every execution.
Sometimes, I get the desired output(zero millisecond for all iterations), but at times I do get positive and even negative values for some random iterations.
I've tried replacing System.currentTimeMillis(); with below code:
new java.util.Date().getTime();
System.nanoTime();
org.apache.commons.lang.time.StopWatch
but still no luck.
Any suggestions as why some iterations take additional time and how to eliminate it?
package com.stackoverflow.programmer;
import java.util.HashMap;
import java.util.Map;
public class TestTimeConsumption {
public static void main(String[] args) {
Integer totalIterations = 100000;
Integer nonZeroMilliSecondsCounter = 0;
Map<Integer, Long> timeTakenMap = new HashMap<>();
for (Integer iteration = 1; iteration <= totalIterations; iteration++) {
timeTakenMap.put(iteration, getTimeConsumed(iteration));
if (timeTakenMap.get(iteration) != 0) {
nonZeroMilliSecondsCounter++;
System.out.format("Iteration %6d has taken %d millisecond(s).\n", iteration,
timeTakenMap.get(iteration));
}
}
System.out.format("Total non zero entries : %d", nonZeroMilliSecondsCounter);
}
private static Long getTimeConsumed(Integer iteration) {
long startTime = System.currentTimeMillis();
// Execute code for which execution time needs to be captured
long endTime = System.currentTimeMillis();
return (endTime - startTime);
}
}
Here's the sample output from 5 different executions of the same code:
Execution #1 (NOT OK)
Iteration 42970 has taken 1 millisecond(s).
Total non zero entries : 1
Execution #2 (OK)
Total non zero entries : 0
Execution #3 (OK)
Total non zero entries : 0
Execution #4 (NOT OK)
Iteration 65769 has taken -1 millisecond(s).
Total non zero entries : 1
Execution #5 (NOT OK)
Iteration 424 has taken 1 millisecond(s).
Iteration 33053 has taken 1 millisecond(s).
Iteration 76755 has taken -1 millisecond(s).
Total non zero entries : 3
I am looking for a Java based solution that ensures that all
iterations consume zero milliseconds consistently. I prefer to
accomplish this using pure Java code without using a profiler.
Note: I was also able to accomplish this through C code.
Your HashMap performance may be dropping if it is resizing. The default capacity is 16 which you are exceeding. If you know the expected capacity up front, create the HashMap with the appropriate size taking into account the default load factor of 0.75
If you rerun iterations without defining a new map and the Integer key does not start again from zero, you will need to resize the map taking into account the total of all possible iterations.
int capacity = (int) ((100000/0.75)+1);
Map<Integer, Long> timeTakenMap = new HashMap<>(capacity);
As you are starting to learn here, writing microbenchmarks in Java is not as easy as one would first assume. Everybody gets bitten at some point, even the hardened performance experts who have been doing it for years.
A lot is going on within the JVM and the OS that skews the results, such as GC, hotspot on the fly optimisations, recompilations, clock corrections, thread contention/scheduling, memory contention and cache misses. To name just a few. And sadly these skews are not consistent, and they can very easily dominate a microbenchmark.
To answer your immediate question of why the timings can some times go negative, it is because currentTimeMillis is designed to capture wall clock time and not elapsed time. No wall clock is accurate on a computer and there are times when the clock will be adjusted.. very possibly backwards. More detail on Java's clocks can be read on the following Oracle Blog Inside the Oracle Hotspot VM clocks.
Further details and support of nanoTime verses currentTimeMillis can be read here.
Before continuing with your own benchmark, I strongly recommend that you read how do I write a currect micro benchmark in java. The quick synopses is to 1) warm up the JVM before taking results, 2) jump through hoops to avoid dead code elimination, 3) ensure that nothing else is running on the same machine but accept that there will be thread scheduling going on.. you may even want to pin threads to cores, depends on how far you want to take this, 4) use a framework specifically designed for microbenchmarking such as JMH or for quick light weight spikes JUnitMosaic gives good results.
I'm not sure if I understand your question.
You're trying to execute a certain set of statements S, and expect the execution time to be zero. You then test this premise by executing it a number of times and verifying the result.
That is a strange expectation to have: anything consumes some time, and possibly even more. Hence, although it would be possible to test successfully, that does not prove that no time has been used, since your program is save_time();execute(S);compare_time(). Even if execute(S) is nothing, your timing is discrete, and as such, it is possible that the 'tick' of your wallclock just happens to happen just between save_time and compare_time, leading to some time having been visibly past.
As such, I'd expect your C program to behave exactly the same. Have you run that multiple times? What happens when you increase the iterations to over millions? If it still does not occur, then apparently your C compiler has optimized the code in such a way that no time is measured, and apparently, Java doesn't.
Or am I understanding you wrong?
You hint it right... System.currentTimeMillis(); is the way to go in this case.
There is no warranty that increasing the value of the integer object i represent either a millisecond or a Cycle-Time in no system...
you should take the System.currentTimeMillis() and calculated the elapsed time
Example:
public static void main(String[] args) {
long lapsedTime = System.currentTimeMillis();
doFoo();
lapsedTime -= System.currentTimeMillis();
System.out.println("Time:" + -lapsedTime);
}
I am also not sure exactly, You're trying to execute a certain code, and try to get the execution for each iteration of execution.
I hope I understand correct, if that so than i would suggest please use
System.nanoTime() instead of System.currentTimeMillis(); because if your statement of block has very small enough you always get Zero in Millisecond.
Simple Ex could be:
public static void main(String[] args) {
long lapsedTime = System.nanoTime();
//do your stuff here.
lapsedTime -= System.nanoTime();
System.out.println("Time Taken" + -lapsedTime);
}
If System.nanoTime() and System.currentTimeMillis(); are nothing much difference. But its just how much accurate result you need and some time difference in millisecond you may get Zero in case if you your set of statement are not more in each iteration.
In Java I want to measure time for
1000 integer comparisons ("<" operator),
1000 integer additions (a+b
each case for different a and b),
another simple operations.
I know I can do it in the following way:
Random rand = new Random();
long elapsedTime = 0;
for (int i = 0; i < 1000; i++) {
int a = Integer.MIN_VALUE + rand.nextInt(Integer.MAX_VALUE);
int b = Integer.MIN_VALUE + rand.nextInt(Integer.MAX_VALUE);
long start = System.currentTimeMillis();
if (a < b) {}
long stop = System.currentTimeMillis();
elapsedTime += (start - stop);
}
System.out.println(elapsedTime);
I know that this question may seem somehow not clear.
How those values depend on my processor (i.e. relation between time for those operations and my processor) and JVM? Any suggestions?
I'm looking for understandable readings...
How those values depend on my processor (i.e. relation between time for those operations and my processor) and JVM? Any suggestions?
It is not dependant on your processor, at least not directly.
Normally, when you run code enough, it will compile it to native code. When it does this, it removes code which doesn't do anything, so what you will be doing here is measuring the time it takes to perform a System.currentMillis(), which is typically about 0.00003 ms. This means you will get 0 99.997% of the time and see a 1 very rarely.
I say normally, but in this case your code won't be compiled to native code, as the default threshold is 10,000 iterations. I.e. you would be testing how long it takes the interpretor to execute the byte code. This is much slower, but would still be a fraction of a milli-second. i.e. you have higher chance seeing a 1 but still unlikely.
If you want to learn more about low level benchmarking in Java, I suggest you read JMH and the Author's blog http://shipilev.net/
If you want to see what machine code is generated from Java code I suggest you try JITWatch
I'm trying to time the performance of my program by using System.currentTimeMillis() (or alternatively System.nanoTime()) and I've noticed that every time I run it - it gives a different result for time it took to finish the task.
Even the straightforward test:
long totalTime;
long startTime;
long endTime;
startTime = System.currentTimeMillis();
for (int i = 0; i < 1000000000; i++)
{
for (int j = 0; j < 1000000000; j++)
{
}
}
endTime = System.currentTimeMillis();
totalTime = endTime-startTime;
System.out.println("Time: " + totalTime);
produces all sorts of different outputs, from 0 to 200. Can anyone say what I'm doing wrong or suggest an alternative solution?
The loop doesn't do anything, so you are timing how long it takes to detect the loop is pointless.
Timing the loops more accurately won't help, you need to do something slightly useful to get repeatable results.
I suggest you try -server if you are running on 32-bit windows.
A billion billion clock cycles takes about 10 years so its not really iterating that many times.
This is exactly the expected behavior -- it's supposed to get faster as you rerun the timing. As you rerun a method many times, the JIT devotes more effort to compiling it to native code and optimizing it; I would expect that after running this code for long enough, the JIT would eliminate the loop entirely, since it doesn't actually do anything.
The best and simplest way to get precise benchmarks on Java code is to use a tool like Caliper that "warms up" the JIT to encourage it to optimize your code fully.
this question is just speculative.
I have the following implementation in C++:
using namespace std;
void testvector(int x)
{
vector<string> v;
char aux[20];
int a = x * 2000;
int z = a + 2000;
string s("X-");
for (int i = a; i < z; i++)
{
sprintf(aux, "%d", i);
v.push_back(s + aux);
}
}
int main()
{
for (int i = 0; i < 10000; i++)
{
if (i % 1000 == 0) cout << i << endl;
testvector(i);
}
}
In my box, this program gets executed in approx. 12 seconds; amazingly, I have a similar implementation in Java [using String and ArrayList] and it runs lot faster than my C++ application (approx. 2 seconds).
I know the Java HotSpot performs a lot of optimizations when translating to native, but I think if such performance can be done in Java, it could be implemented in C++ too...
So, what do you think that should be modified in the program above or, I dunno, in the libraries used or in the memory allocator to reach similar performances in this stuff? (writing actual code of these things can be very long, so, discussing about it would be great)...
Thank you.
You have to be careful with performance tests because it's very easy to deceive yourself or not compare like with like.
However, I've seen similar results comparing C# with C++, and there are a number of well-known blog posts about the astonishment of native coders when confronted with this kind of evidence. Basically a good modern generational compacting GC is very much more optimised for lots of small allocations.
In C++'s default allocator, every block is treated the same, and so are averagely expensive to allocate and free. In a generational GC, all blocks are very, very cheap to allocate (nearly as cheap as stack allocation) and if they turn out to be short-lived then they are also very cheap to clean up.
This is why the "fast performance" of C++ compared with more modern languages is - for the most part - mythical. You have to hand tune your C++ program out of all recognition before it can compete with the performance of an equivalent naively written C# or Java program.
All your program does is print the numbers 0..9000 in steps of 1000. The calls to testvector() do nothing and can be eliminated. I suspect that your JVM notices this, and is essentially optimising the whole function away.
You can achieve a similar effect in your C++ version by just commenting out the call to testvector()!
Well, this is a pretty useless test that only measures allocation of small objects.
That said, simple changes made me get the running time down from about 15 secs to about 4 secs. New version:
typedef vector<string, boost::pool_allocator<string> > str_vector;
void testvector(int x, str_vector::iterator it, str_vector::iterator end)
{
char aux[25] = "X-";
int a = x * 2000;
for (; it != end; ++a)
{
sprintf(aux+2, "%d", a);
*it++ = aux;
}
}
int main(int argc, char** argv)
{
str_vector v(2000);
for (int i = 0; i < 10000; i++)
{
if (i % 1000 == 0) cout << i << endl;
testvector(i, v.begin(), v.begin()+2000);
}
return 0;
}
real 0m4.089s
user 0m3.686s
sys 0m0.000s
Java version has the times:
real 0m2.923s
user 0m2.490s
sys 0m0.063s
(This is my direct java port of your original program, except it passes the ArrayList as a parameter to cut down on useless allocations).
So, to sum up, small allocations are faster on java, and memory management is a bit more hassle in C++. But we knew that already :)
Hotspot optimises hot spots in code. Typically, anything that gets executed 10000 times it tries to optimise.
For this code, after 5 iterations it will try and optimise the inner loop adding the strings to the vector. The optimisation it will do more than likely will include escape analyi o the variables in the method. A the vector is a local variable and never escapes local context, it is very likely that it will remove all of the code in the method and turn it into a no op. To test this, try returning the results from the method. Even then, be careful to do something meaningful with the result - just getting it's length for example can be optimised as horpsot can see the result is alway the same a s the number of iterations in the loop.
All of this points to the key benefit of a dynamic compiler like hotspot - using runtime analysis you can optimise what is actually being done at runtime and get rid of redundant code. After all, it doesn't matter how efficient your custom C++ memory allocator is - not executing any code is always going to be faster.
In my box, this program gets executed in approx. 12 seconds; amazingly, I have a similar implementation in Java [using String and ArrayList] and it runs lot faster than my C++ application (approx. 2 seconds).
I cannot reproduce that result.
To account for the optimization mentioned by Alex, I’ve modified the codes so that both the Java and the C++ code printed the last result of the v vector at the end of the testvector method.
Now, the C++ code (compiled with -O3) runs about as fast as yours (12 sec). The Java code (straightforward, uses ArrayList instead of Vector although I doubt that this would impact the performance, thanks to escape analysis) takes about twice that time.
I did not do a lot of testing so this result is by no means significant. It just shows how easy it is to get these tests completely wrong, and how little single tests can say about real performance.
Just for the record, the tests were run on the following configuration:
$ uname -ms
Darwin i386
$ java -version
java version "1.6.0_15"
Java(TM) SE Runtime Environment (build 1.6.0_15-b03-226)
Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02-92, mixed mode)
$ g++ --version
i686-apple-darwin9-g++-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5490)
It should help if you use Vector::reserve to reserve space for z elements in v before the loop (however the same thing should also speed up the java equivalent of this code).
To suggest why the performance both C++ and java differ it would essential to see source for both, I can see a number of performance issues in the C++, for some it would be useful to see if you were doing the same in the java (e.g. flushing the output stream via std::endl, do you call System.out.flush() or just append a '\n', if the later then you've just given the java a distinct advantage)?
What are you actually trying to measure here? Putting ints into a vector?
You can start by pre-allocating space into the vector with the know size of the vector:
instead of:
void testvector(int x)
{
vector<string> v;
int a = x * 2000;
int z = a + 2000;
string s("X-");
for (int i = a; i < z; i++)
v.push_back(i);
}
try:
void testvector(int x)
{
int a = x * 2000;
int z = a + 2000;
string s("X-");
vector<string> v(z);
for (int i = a; i < z; i++)
v.push_back(i);
}
In your inner loop, you are pushing ints into a string vector. If you just single-step that at the machine-code level, I'll bet you find that a lot of that time goes into allocating and formatting the strings, and then some time goes into the pushback (not to mention deallocation when you release the vector).
This could easily vary between run-time-library implementations, based on the developer's sense of what people would reasonably want to do.
One of my friends showed me something he had done, and I was at a serious loss to explain how this could have happened: he was using a System.nanotime to time something, and it gave the user an update every second to tell how much time had elapsed (it Thread.sleep(1000) for that part), and it took seemingly forever (something that was waiting for 10 seconds took roughly 3 minutes to finish). We tried using millitime in order to see how much time had elapsed: it printed how much nanotime had elapsed every second, and we saw that for every second, the nanotime was moving by roughly 40-50 milliseconds every second.
I checked for bugs relating to System.nanotime and Java, but it seemed the only things I could find involved the nanotime suddenly greatly increasing and then stopping. I also browsed this blog entry based on something I read in a different question, but that didn't have anything that may cause it.
Obviously this could be worked around for this situation by just using the millitime instead; there are lots of workarounds to this, but what I'm curious about is if there's anything other than a hardware issue with the system clock or at least whatever the most accurate clock the CPU has (since that's what System.nanotime seems to use) that could cause it to run consistently slow like this?
long initialNano = System.nanoTime();
long initialMili = System.currentTimeMillis();
//Obviously the code isn't actually doing a while(true),
//but it illustrates the point
while(true) {
Thread.sleep(1000);
long currentNano = System.nanoTime();
long currentMili = System.currentTimeMillis();
double secondsNano = ((double) (currentNano - initialNano))/1000000000D;
double secondsMili = ((double) (currentMili - initialMili))/1000D;
System.out.println(secondsNano);
System.out.println(secondsMili);
}
secondsNano will print something along the lines of 0.04, whereas secondsMili will print something very close to 1.
It looks like a bug along this line has been reported at Sun's bug database, but they closed it as a duplicate, but their link doesn't go to an existing bug. It seems to be very system-specific, so I'm getting more and more sure this is a hardware issue.
... he was using a System.nanotime to cause the program to wait before doing something, and ...
Can you show us some code that demonstrates exactly what he was doing? Was it some strange kind of busy loop, like this:
long t = System.nanoTime() + 1000000000L;
while (System.nanoTime() < t) { /* do nothing */ }
If yes, then that's not the right way to make your program pause for a while. Use Thread.sleep(...) instead to make the program wait for a specified number of milliseconds.
You do realise that the loop you are using doesn't take exactly 1 second to run? Firstly Thread.sleep() isn't guaranteed to be accurate, and the rest of the code in the loop does take some time to execute (Both nanoTime() and currentTimeMillis() actually can be quite slow depending on the underlying implementation). Secondly, System.currentTimeMillis() is not guaranteed to be accurate either (it only updates every 50ms on some operating system and hardware combinations). You also mention it being inaccurate to 40-50ms above and then go on to say 0.004s which is actually only 4ms.
I would recommend you change your System.out.println() to be:
System.out.println(secondsNano - secondsMili);
This way, you'll be able to see how much the two clocks differ on a second-by-second basis. I left it running for about 12 hours on my laptop and it was out by 1.46 seconds (fast, not slow). This shows that there is some drift in the two clocks.
I would think that the currentTimeMillis() method provides a more accurate time over a large period of time, yet nanoTime() has a greater resolution and is good for timing code or providing sub-millisecond timing over short time periods.
I've experienced the same problem. Except in my case, it is more pronounced.
With this simple program:
public class test {
public static void main(String[] args) {
while (true) {
try {
Thread.sleep(1000);
}
catch (InterruptedException e) {
}
OStream.out("\t" + System.currentTimeMillis() + "\t" + nanoTimeMillis());
}
}
static long nanoTimeMillis() {
return Math.round(System.nanoTime() / 1000000.0);
}
}
I get the following results:
13:05:16:380 main: 1288199116375 61530042
13:05:16:764 main: 1288199117375 61530438
13:05:17:134 main: 1288199118375 61530808
13:05:17:510 main: 1288199119375 61531183
13:05:17:886 main: 1288199120375 61531559
The nanoTime is showing only ~400ms elapsed for each second.