Java: Issue with capturing execution time per iteration in a Map

I've a requirement to capture the execution time of some code in iterations. I've decided to use a Map<Integer,Long> for capturing this data where Integer(key) is the iteration number and Long(value) is the time consumed by that iteration in milliseconds.
I've written the Java code below to compute the time taken for each iteration. I want to ensure that the time taken by all iterations is zero before invoking the actual code. Surprisingly, the code below behaves differently on every execution.
Sometimes I get the desired output (zero milliseconds for all iterations), but at times I get positive and even negative values for some random iterations.
I've tried replacing System.currentTimeMillis(); with below code:
new java.util.Date().getTime();
System.nanoTime();
org.apache.commons.lang.time.StopWatch
but still no luck.
Any suggestions as why some iterations take additional time and how to eliminate it?
package com.stackoverflow.programmer;
import java.util.HashMap;
import java.util.Map;
public class TestTimeConsumption {
public static void main(String[] args) {
Integer totalIterations = 100000;
Integer nonZeroMilliSecondsCounter = 0;
Map<Integer, Long> timeTakenMap = new HashMap<>();
for (Integer iteration = 1; iteration <= totalIterations; iteration++) {
timeTakenMap.put(iteration, getTimeConsumed(iteration));
if (timeTakenMap.get(iteration) != 0) {
nonZeroMilliSecondsCounter++;
System.out.format("Iteration %6d has taken %d millisecond(s).\n", iteration,
timeTakenMap.get(iteration));
}
}
System.out.format("Total non zero entries : %d", nonZeroMilliSecondsCounter);
}
private static Long getTimeConsumed(Integer iteration) {
long startTime = System.currentTimeMillis();
// Execute code for which execution time needs to be captured
long endTime = System.currentTimeMillis();
return (endTime - startTime);
}
}
Here's the sample output from 5 different executions of the same code:
Execution #1 (NOT OK)
Iteration 42970 has taken 1 millisecond(s).
Total non zero entries : 1
Execution #2 (OK)
Total non zero entries : 0
Execution #3 (OK)
Total non zero entries : 0
Execution #4 (NOT OK)
Iteration 65769 has taken -1 millisecond(s).
Total non zero entries : 1
Execution #5 (NOT OK)
Iteration 424 has taken 1 millisecond(s).
Iteration 33053 has taken 1 millisecond(s).
Iteration 76755 has taken -1 millisecond(s).
Total non zero entries : 3
I am looking for a Java based solution that ensures that all
iterations consume zero milliseconds consistently. I prefer to
accomplish this using pure Java code without using a profiler.
Note: I was also able to accomplish this through C code.

Your HashMap performance may be dropping if it is resizing. The default capacity is 16, which you are exceeding. If you know the expected capacity up front, create the HashMap with the appropriate size, taking into account the default load factor of 0.75.
If you rerun iterations without defining a new map and the Integer key does not start again from zero, you will need to resize the map taking into account the total of all possible iterations.
int capacity = (int) ((100000/0.75)+1);
Map<Integer, Long> timeTakenMap = new HashMap<>(capacity);

As you are starting to learn here, writing microbenchmarks in Java is not as easy as one would first assume. Everybody gets bitten at some point, even the hardened performance experts who have been doing it for years.
A lot is going on within the JVM and the OS that skews the results, such as GC, hotspot on the fly optimisations, recompilations, clock corrections, thread contention/scheduling, memory contention and cache misses. To name just a few. And sadly these skews are not consistent, and they can very easily dominate a microbenchmark.
To answer your immediate question of why the timings can sometimes go negative: currentTimeMillis is designed to capture wall-clock time, not elapsed time. No wall clock on a computer is perfectly accurate, and there are times when the clock will be adjusted, quite possibly backwards. More detail on Java's clocks can be found in the Oracle blog post Inside the Oracle Hotspot VM clocks.
Further details on, and support for, nanoTime versus currentTimeMillis can be read here.
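To make the wall-clock versus elapsed-time distinction concrete, here is a minimal sketch of measuring a duration with System.nanoTime(). The class name ElapsedTimeSketch and the helper elapsedNanos are made up for illustration. The sketch assumes a mainstream JVM where nanoTime is backed by a clock that is not stepped when the system time is corrected.

```java
import java.util.concurrent.TimeUnit;

class ElapsedTimeSketch {

    /** Runs the task once and returns the elapsed time in nanoseconds. */
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();   // unrelated to wall-clock time; only differences are meaningful
        task.run();
        return System.nanoTime() - start; // always subtract, never compare absolute values
    }

    public static void main(String[] args) {
        long nanos = elapsedNanos(() -> { /* code under test goes here */ });
        System.out.println("Elapsed: " + nanos + " ns ("
                + TimeUnit.NANOSECONDS.toMillis(nanos) + " ms)");
    }
}
```

An empty task will usually report a few dozen to a few hundred nanoseconds rather than zero, which is the cost of reading the clock itself.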
Before continuing with your own benchmark, I strongly recommend that you read How do I write a correct micro-benchmark in Java. The quick synopsis is to 1) warm up the JVM before taking results, 2) jump through hoops to avoid dead code elimination, 3) ensure that nothing else is running on the same machine, but accept that there will still be thread scheduling going on (you may even want to pin threads to cores, depending on how far you want to take this), and 4) use a framework specifically designed for microbenchmarking such as JMH, or for quick lightweight spikes, JUnitMosaic gives good results.
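Points 1 and 2 of the synopsis can be sketched by hand as follows. This is only an illustration, not a replacement for JMH: the class WarmupSketch, the workload work(), and the iteration counts are all assumptions chosen for the example.

```java
class WarmupSketch {

    // Results are written to a volatile field so the JIT cannot discard them as dead code.
    static volatile long sink;

    /** Hypothetical workload standing in for the code under test. */
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    /** Returns the average time per call in nanoseconds over the given number of calls. */
    static double averageNanos(int calls, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink = work(n);               // publish every result
        }
        return (System.nanoTime() - start) / (double) calls;
    }

    public static void main(String[] args) {
        averageNanos(20_000, 1_000);      // warm-up pass, result discarded
        double avg = averageNanos(20_000, 1_000);
        System.out.printf("Average: %.1f ns per call%n", avg);
    }
}
```

Even with the warm-up pass and the sink, the number still varies from run to run for the reasons listed above, which is why a dedicated framework is the safer choice.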

I'm not sure if I understand your question.
You're trying to execute a certain set of statements S, and expect the execution time to be zero. You then test this premise by executing it a number of times and verifying the result.
That is a strange expectation to have: everything takes some time, possibly more than you expect. So even if the test happens to pass, that does not prove that no time was used, since your program is effectively save_time(); execute(S); compare_time(). Even if execute(S) does nothing, your timing is discrete, so it is possible that the 'tick' of your wall clock falls exactly between save_time and compare_time, making it look as though some time has passed.
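The discrete 'tick' can be observed directly. The sketch below (the class and method names are invented for this example) spins until System.currentTimeMillis() reports a different value and prints the size of that step. Any measurement that straddles such a step reports a non-zero time even if the measured code does nothing.

```java
class ClockGranularitySketch {

    /** Spins until currentTimeMillis() changes and returns the size of that step in ms. */
    static long observedTickMillis() {
        long first = System.currentTimeMillis();
        long next;
        do {
            next = System.currentTimeMillis();
        } while (next == first);          // busy-wait until the reported value moves
        return next - first;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("Clock stepped by " + observedTickMillis() + " ms");
        }
    }
}
```

The step size depends on the operating system and hardware, and it can be negative if the clock is corrected backwards while the loop is spinning.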
As such, I'd expect your C program to behave exactly the same. Have you run that multiple times? What happens when you increase the iterations into the millions? If it still does not occur, then apparently your C compiler has optimized the code in such a way that no time is measured, and apparently Java doesn't.
Or am I understanding you wrong?

Your hunch is right: System.currentTimeMillis() is the way to go in this case.
There is no guarantee that incrementing the integer loop counter corresponds to either a millisecond or a CPU cycle time on any system.
You should take System.currentTimeMillis() before and after the code and calculate the elapsed time.
Example:
public static void main(String[] args) {
long lapsedTime = System.currentTimeMillis();
doFoo();
lapsedTime -= System.currentTimeMillis();
System.out.println("Time:" + -lapsedTime);
}

I am also not sure I understand exactly. You're trying to execute some code and get the execution time for each iteration.
If I understand correctly, then I would suggest using
System.nanoTime() instead of System.currentTimeMillis(), because if your block of statements is small enough you will always get zero in milliseconds.
A simple example could be:
public static void main(String[] args) {
long lapsedTime = System.nanoTime();
//do your stuff here.
lapsedTime -= System.nanoTime();
System.out.println("Time Taken" + -lapsedTime);
}
System.nanoTime() and System.currentTimeMillis() are not much different to use. It just depends on how accurate a result you need: with millisecond resolution you may get zero if each iteration does not execute many statements.

Related

java for loop performance difference

I am running the simple program below. I know this is not the best way to measure performance, but the results are surprising to me, hence I wanted to post the question here.
public class findFirstTest {
public static void main(String[] args) {
for(int q=0;q<10;q++) {
long start2 = System.currentTimeMillis();
int k = 0;
for (int j = 0; j < 5000000; j++) {
if (j > 4500000) {
k = j;
break;
}
}
System.out.println("for value " + k + " with time " + (System.currentTimeMillis() - start2));
}
}
}
The results look like this after running the code multiple times:
for value 4500001 with time 3
for value 4500001 with time 25 ( surprised as it took 25 ms in 2nd iteration)
for value 4500001 with time 0
for value 4500001 with time 0
for value 4500001 with time 0
for value 4500001 with time 0
for value 4500001 with time 0
for value 4500001 with time 0
for value 4500001 with time 0
for value 4500001 with time 0
So I don't understand why the 2nd iteration took 25 ms when the 1st took 3 ms and the later ones 0 ms, and also why it is always the 2nd iteration whenever I run the code.
If I move the start and end time measurement outside the outer for loop, the result I get is:
for value 4500001 with time 10
In first iteration, the code is running interpreted.
In second iteration, JIT kicks in, slowing it down a bit while it compiles to native code.
In remaining iterations, native code runs very fast.
Because your winamp needed to decode another few frames of your mp3 to queue it into the sound output buffers. Or because the phase of the moon changed a bit and your dynamic background needed changing, or because someone in east Croydon farted and your computer is subscribed to the 'smells from London' twitter feed. Who knows?
This isn't how you performance test. Your CPU is not such a simple machine after all; it has many cores, and each core has pipelines and multiple hierarchies of caches. Any given core can only interact with one of its caches, and because of this, if a core runs an instruction that operates on memory which is not currently in cache, then the core will shut down for a while: It sends to the memory controller a request to load the page of memory with the memory you need to access into a given cachepage, and will then wait until it is there; this can take many, many cycles.
On the other end you have an OS that is juggling hundreds of thousands of processes and threads, many of them internal to the kernel, pre-empting like there is no tomorrow, and trying to give extra precedence to processes that are time sensitive, such as the aforementioned winamp which must get a chance to decode some more mp3 frames before the sound buffer is fully exhausted, or you'd notice skipping. This is non-trivial: On ye olde windows you just couldn't get this done, which is why ye olde winamp was a magical marvel of engineering, more or less hacking into windows to ensure it got the priority it needed. Those days are long gone, but if you remember them, well, draw the conclusion that this isn't trivial, and thus, OSes do pre-empt with prejudice all the time these days.
A third significant factor is the JVM itself which is doing all sorts of borderline voodoo magic, as it has both a hotspot engine (which is doing bookkeeping on your code so that it can eventually conclude that it is worth spending considerable CPU resources to analyse the heck out of a method to rewrite it in optimized machinecode because that method seems to be taking a lot of CPU time), and a garbage collector.
The solution is to forget entirely about trying to measure time using such mere banalities as measuring currentTimeMillis or nanoTime and writing a few loops. It's just way too complicated for that to actually work.
No. Use JMH.

Problem in using System.currentTimeMillis(); in Java

I tried to observe the time taken by different inputs in the calculation of the nth Fibonacci number, but the output is <50ms for the first input and 0 ms for the rest
Why so?
import java.io.*;
import java.util.*;
class fib{
long fibo(int s){
if(s==1 ||s==2)
return 1;
else return fibo(s-1)+(s-2);
}
}
class fibrec{
public static void main(String args[]) throws java.io.IOException{
BufferedWriter wr=new BufferedWriter(new FileWriter("C:/Users/91887/desktop/books/java/foo3.txt"));
fib f=new fib();
Random rand=new Random();
int input[]=new int[10];
for(int p=0;p<10;p++){
long st=System.currentTimeMillis();
int i=rand.nextInt(12000);
wr.write("Input : "+i+"\nOutput : "+f.fibo(i)+"\n");
long et=System.currentTimeMillis();
wr.write("Time taken = "+(et-st)+"ms\n\n");
System.out.println(st+"\t"+et+"\t"+(et-st));
}
wr.close();
}
}
The granularity of the millisecond clock is at best one millisecond [1].
But apparently, the execution times for your loop iterations are less than one millisecond. Sub-millisecond time intervals cannot be measured accurately using System.currentTimeMillis(). That is why you are getting zeros.
The first measurement being 35 milliseconds is due to JVM warmup effects. These may include:
time taken to load and initialize library code [2],
time taken to JIT compile code, and
time taken up with a (possible) GC during or after loading and JIT compilation.
Secondly, I notice that your time measurement includes the time taken to print the number. You should move that after the second call to get the clock value because it could be significant.
Finally, if you want to get reproducible results, you should explicitly seed Random yourself rather than relying on the OS to give you a random seed. And I'm not convinced that you should be benchmarking a Fibonacci algorithm with random inputs anyway ...
[1] - The numbers in your output suggest that it is actually 1 millisecond ...
[2] - For example, the initialization and construction of a Random instance entails an OS call to get some "entropy" to seed the random number generator. That should be fast, but there are circumstances where it may not be. In your case, this happens before you start measuring time ...
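The two practical suggestions in this answer, seeding Random explicitly and keeping output outside the timed region, might look like the sketch below. The seed 42, the class name, and the iterative fib() are assumptions for illustration. fib() is a stand-in workload, not the asker's recursive method.

```java
import java.util.Random;

class SeededTimingSketch {

    /** Iterative Fibonacci, a hypothetical stand-in for the computation being timed. */
    static long fib(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            long next = a + b;   // overflows for large n; only the timing matters here
            a = b;
            b = next;
        }
        return a;
    }

    public static void main(String[] args) {
        Random rand = new Random(42L);             // fixed seed: same inputs on every run
        for (int p = 0; p < 10; p++) {
            int input = 1 + rand.nextInt(12000);   // chosen before the clock starts
            long start = System.nanoTime();
            long result = fib(input);
            long elapsed = System.nanoTime() - start;
            // Formatting and I/O happen after the second clock read, outside the timed region.
            System.out.println("Input: " + input + "  Output: " + result
                    + "  Time: " + elapsed + " ns");
        }
    }
}
```

The first iteration will typically still be the slowest because of the warmup effects described above.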
The code between the two calls to System.currentTimeMillis() is executing too fast (after the first iteration) for any difference to be captured. You'd be able to see a difference if you were using System.nanoTime().
As for why the first iteration is slower than the subsequent ones, that would be because Java uses a Just In Time (JIT) compiler to optimise code at runtime.

What class is loaded when creating a thread with a lambda expression? [duplicate]

I have this following code
import java.util.stream.IntStream;
public class BenchMark {
public static void main(String args[]) {
doLinear();
doLinear();
doLinear();
doLinear();
}
private static void doParallel() {
IntStream range = IntStream.range(1, 6).parallel();
long startTime = System.nanoTime();
int reduce = range
.reduce((a, item) -> a * item).getAsInt();
long endTime = System.nanoTime();
System.out.println("parallel: " +reduce + " -- Time: " + (endTime - startTime));
}
private static void doLinear() {
IntStream range = IntStream.range(1, 6);
long startTime = System.nanoTime();
int reduce = range
.reduce((a, item) -> a * item).getAsInt();
long endTime = System.nanoTime();
System.out.println("linear: " +reduce + " -- Time: " + (endTime - startTime));
}
}
I was trying to benchmark streams but came across the execution time steadily decreasing when calling the same function again and again.
Output:
linear: 120 -- Time: 57008226
linear: 120 -- Time: 23202
linear: 120 -- Time: 17192
linear: 120 -- Time: 17802
Process finished with exit code 0
There is a huge difference between first and second execution time.
I'm sure the JVM might be doing some tricks behind the scenes, but can anybody help me understand what's really going on there?
Is there any way to avoid this optimization so I can benchmark true execution time?
I'm sure the JVM might be doing some tricks behind the scenes, but can anybody help me understand what's really going on there?
The massive latency of the first invocation is due to the initialization of the complete lambda runtime subsystem. You pay this only once for the whole application.
The first time your code reaches any given lambda expression, you pay for the linkage of that lambda (initialization of the invokedynamic call site).
After some iterations you'll see additional speedup due to the JIT compiler optimizing your reduction code.
Is there any way to avoid this optimization so I can benchmark true execution time?
You are asking for a contradiction here: the "true" execution time is the one you get after warmup, when all optimizations have been applied. This is the runtime an actual application would experience. The latency of the first few runs is not relevant to the wider picture, unless you are interested in single-shot performance.
For the sake of exploration you can see how your code behaves with JIT compilation disabled: pass -Xint to the java command. There are many more flags which disable various aspects of optimization.
UPDATE: Refer to @Marko's answer for an explanation of the initial latency due to lambda linkage.
The higher execution time for the first call is probably a result of the JIT effect. In short, the JIT compilation of the bytecode into native machine code occurs the first time your method is called. The JVM then attempts further optimization by identifying frequently called (hot) methods and regenerating their code for higher performance.
Is there any way to avoid this optimization so I can benchmark true execution time?
You can certainly account for the JVM's initial warm-up by excluding the first few results. Then increase the number of repeated calls to your method in a loop of tens of thousands of iterations, and average the results.
There are a few more options that you might want to consider adding to your execution to help reduce noise, as discussed in this post. There are also some good tips in this post.
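As a rough sketch of that advice, the code below runs the reduction from the question a number of times without timing it, then times a larger batch and reports the mean. The class name, the helper names, and the counts of 10,000 and 50,000 are arbitrary choices for illustration.

```java
import java.util.stream.IntStream;

class DiscardWarmupSketch {

    static volatile int sink;   // keeps each result observable so the calls are not optimised away

    /** The reduction from the question: product of 1..5. */
    static int reduceOnce() {
        return IntStream.range(1, 6).reduce((a, item) -> a * item).getAsInt();
    }

    /** Discards `warmup` calls, then times `measured` calls and returns the mean ns per call. */
    static double meanNanos(int warmup, int measured) {
        for (int i = 0; i < warmup; i++) {
            sink = reduceOnce();              // warm-up calls, not timed
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            sink = reduceOnce();
        }
        return (System.nanoTime() - start) / (double) measured;
    }

    public static void main(String[] args) {
        System.out.printf("linear: %.1f ns per call%n", meanNanos(10_000, 50_000));
    }
}
```

This averages away the one-off lambda linkage and JIT costs, so it answers "how fast is it once warm", not "how long does the first call take".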
true execution time
There's no such thing as "true execution time". If you need to solve this task only once, the true execution time would be the time of the first test (along with the time to start up the JVM itself). In general, the time spent executing a given piece of code depends on many things:
Whether this piece of code is interpreted, JIT-compiled by C1 or C2 compiler. Note that there are not just three options. If you call one method from another, one of them might be interpreted and another might be C2-compiled.
For C2 compiler: how this code was executed previously, so what's in branch and type profile. The polluted type profile can drastically reduce the performance.
Garbage collector state: whether it interrupts the execution or not
Compilation queue: whether JIT-compiler compiles other code simultaneously (which may slow down the execution of current code)
The memory layout: how objects are located in memory, and how many cache lines must be loaded to access all the necessary data.
CPU branch predictor state which depends on the previous code execution and may increase or decrease number of branch mispredictions.
And so on and so forth. So even if you measure something in an isolated benchmark, this does not mean that the speed of the same code in production will be the same. It may differ by an order of magnitude. So before measuring something you should ask yourself why you want to measure this thing. Usually you don't care how long some part of your program takes to execute. What you usually care about is the latency and the throughput of the whole program. So profile the whole program and optimize the slowest parts. Probably the thing you are measuring is not the slowest.
The Java VM loads a class into memory the first time the class is used.
So the difference between 1st and 2nd run may be caused by class loading.

Time taken to execute a java method is zero?

I am reading the system time just before the method is invoked and immediately after method returns and taking the time difference, which will give the time taken by a method for execution.
Code snippet
long start = System.currentTimeMillis ();
method ();
long end = System.currentTimeMillis ();
System.out.println ("Time taken for execution is " + (end - start));
The strange thing is that the output is 0. How is this possible?
Chances are it's taking a shorter time than the fairly coarse-grained system clock. (For example, you may find that System.currentTimeMillis() only changes every 10 or 15 milliseconds.)
System.currentTimeMillis is good for finding out the current time, but it's not fine-grained enough for measuring short durations. Instead, you should use System.nanoTime() which uses a high-resolution timer. nanoTime() is not suitable for finding the current time - but it's designed for measuring durations.
Think of it as being the difference between a wall clock and a stopwatch.
use nanoTime()
Because it took less than 1 millisecond?
If you want to get a more meaningful metric, I would suggest calling your method in a loop 1000000 times, timing that, and then dividing by 1000000.
Of course, even then, that might not be representative; the effects on the cache will be different, etc.
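The loop-and-divide suggestion can be sketched as follows. The class BatchTimingSketch and the tiny method(int) are hypothetical stand-ins for the asker's method(), and the volatile sink is an added precaution so the JIT does not remove the calls.

```java
class BatchTimingSketch {

    static volatile int sink;   // keeps the results observable so the loop is not optimised away

    /** Hypothetical short method standing in for method() from the question. */
    static int method(int x) {
        return Integer.bitCount(x) + 1;
    }

    /** Times `iterations` calls as one batch and returns the mean time per call in ns. */
    static double nanosPerCall(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = method(i);
        }
        long total = System.nanoTime() - start;
        return total / (double) iterations;
    }

    public static void main(String[] args) {
        System.out.printf("%.2f ns per call%n", nanosPerCall(1_000_000));
    }
}
```

Timing the whole batch means the clock is read only twice, so its resolution and overhead are spread across a million calls instead of dominating a single one.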

System.nanotime running slow?

One of my friends showed me something he had done, and I was at a serious loss to explain how this could have happened: he was using a System.nanotime to time something, and it gave the user an update every second to tell how much time had elapsed (it Thread.sleep(1000) for that part), and it took seemingly forever (something that was waiting for 10 seconds took roughly 3 minutes to finish). We tried using millitime in order to see how much time had elapsed: it printed how much nanotime had elapsed every second, and we saw that for every second, the nanotime was moving by roughly 40-50 milliseconds every second.
I checked for bugs relating to System.nanotime and Java, but it seemed the only things I could find involved the nanotime suddenly greatly increasing and then stopping. I also browsed this blog entry based on something I read in a different question, but that didn't have anything that may cause it.
Obviously this could be worked around for this situation by just using the millitime instead; there are lots of workarounds to this, but what I'm curious about is if there's anything other than a hardware issue with the system clock or at least whatever the most accurate clock the CPU has (since that's what System.nanotime seems to use) that could cause it to run consistently slow like this?
long initialNano = System.nanoTime();
long initialMili = System.currentTimeMillis();
//Obviously the code isn't actually doing a while(true),
//but it illustrates the point
while(true) {
Thread.sleep(1000);
long currentNano = System.nanoTime();
long currentMili = System.currentTimeMillis();
double secondsNano = ((double) (currentNano - initialNano))/1000000000D;
double secondsMili = ((double) (currentMili - initialMili))/1000D;
System.out.println(secondsNano);
System.out.println(secondsMili);
}
secondsNano will print something along the lines of 0.04, whereas secondsMili will print something very close to 1.
It looks like a bug along this line has been reported at Sun's bug database, but they closed it as a duplicate, but their link doesn't go to an existing bug. It seems to be very system-specific, so I'm getting more and more sure this is a hardware issue.
... he was using a System.nanotime to cause the program to wait before doing something, and ...
Can you show us some code that demonstrates exactly what he was doing? Was it some strange kind of busy loop, like this:
long t = System.nanoTime() + 1000000000L;
while (System.nanoTime() < t) { /* do nothing */ }
If yes, then that's not the right way to make your program pause for a while. Use Thread.sleep(...) instead to make the program wait for a specified number of milliseconds.
You do realise that the loop you are using doesn't take exactly 1 second to run? Firstly Thread.sleep() isn't guaranteed to be accurate, and the rest of the code in the loop does take some time to execute (Both nanoTime() and currentTimeMillis() actually can be quite slow depending on the underlying implementation). Secondly, System.currentTimeMillis() is not guaranteed to be accurate either (it only updates every 50ms on some operating system and hardware combinations). You also mention it being inaccurate to 40-50ms above and then go on to say 0.004s which is actually only 4ms.
I would recommend you change your System.out.println() to be:
System.out.println(secondsNano - secondsMili);
This way, you'll be able to see how much the two clocks differ on a second-by-second basis. I left it running for about 12 hours on my laptop and it was out by 1.46 seconds (fast, not slow). This shows that there is some drift in the two clocks.
I would think that the currentTimeMillis() method provides a more accurate time over a large period of time, yet nanoTime() has a greater resolution and is good for timing code or providing sub-millisecond timing over short time periods.
I've experienced the same problem. Except in my case, it is more pronounced.
With this simple program:
public class test {
public static void main(String[] args) {
while (true) {
try {
Thread.sleep(1000);
}
catch (InterruptedException e) {
}
OStream.out("\t" + System.currentTimeMillis() + "\t" + nanoTimeMillis());
}
}
static long nanoTimeMillis() {
return Math.round(System.nanoTime() / 1000000.0);
}
}
I get the following results:
13:05:16:380 main: 1288199116375 61530042
13:05:16:764 main: 1288199117375 61530438
13:05:17:134 main: 1288199118375 61530808
13:05:17:510 main: 1288199119375 61531183
13:05:17:886 main: 1288199120375 61531559
The nanoTime is showing only ~400ms elapsed for each second.
