In Java I want to measure the time for:
1000 integer comparisons (the "<" operator),
1000 integer additions (a + b,
each time with different a and b),
other simple operations.
I know I can do it in the following way:
Random rand = new Random();
long elapsedTime = 0;
for (int i = 0; i < 1000; i++) {
    int a = Integer.MIN_VALUE + rand.nextInt(Integer.MAX_VALUE);
    int b = Integer.MIN_VALUE + rand.nextInt(Integer.MAX_VALUE);
    long start = System.currentTimeMillis();
    if (a < b) {}
    long stop = System.currentTimeMillis();
    elapsedTime += (stop - start);   // stop - start, not start - stop
}
System.out.println(elapsedTime);
I know that this question may seem somewhat unclear.
How do those values depend on my processor (i.e. what is the relation between the time for those operations and my processor) and on the JVM? Any suggestions?
I'm looking for understandable readings...
How do those values depend on my processor (i.e. what is the relation between the time for those operations and my processor) and on the JVM? Any suggestions?
It is not dependent on your processor, at least not directly.
Normally, when you run code enough, it will be compiled to native code. When that happens, the JIT removes code which doesn't do anything, so what you would really be measuring here is the time it takes to perform a System.currentTimeMillis() call, which is typically about 0.00003 ms. This means you will get 0 99.997% of the time and see a 1 very rarely.
I say normally, but in this case your code won't be compiled to native code, as the default threshold is 10,000 iterations. I.e. you would be testing how long it takes the interpreter to execute the byte code. This is much slower, but would still be a fraction of a millisecond, i.e. you have a higher chance of seeing a 1, but it is still unlikely.
If you want to learn more about low-level benchmarking in Java, I suggest you read about JMH and the author's blog http://shipilev.net/
If you want to see what machine code is generated from Java code, I suggest you try JITWatch.
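If you do want a rough hand-rolled number, time the whole batch rather than each comparison, and make sure the result of the comparisons is actually used so the loop cannot be eliminated. A minimal sketch (not a substitute for JMH):
Random rand = new Random();
int[] as = new int[1000], bs = new int[1000];
for (int i = 0; i < 1000; i++) {
    as[i] = rand.nextInt();
    bs[i] = rand.nextInt();
}
int hits = 0;                            // consuming the result defeats dead-code elimination
long start = System.nanoTime();
for (int i = 0; i < 1000; i++) {
    if (as[i] < bs[i]) hits++;
}
long elapsed = System.nanoTime() - start;
System.out.println(elapsed + " ns for 1000 comparisons (hits=" + hits + ")");
Even then, a single run mostly measures the interpreter and the timer itself; you would need to repeat the whole loop many thousands of times before the numbers mean anything.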
Related
I am running the simple program below. I know this is not the best way to measure performance, but the results are surprising to me, hence I wanted to post the question here.
public class findFirstTest {
    public static void main(String[] args) {
        for (int q = 0; q < 10; q++) {
            long start2 = System.currentTimeMillis();
            int k = 0;
            for (int j = 0; j < 5000000; j++) {
                if (j > 4500000) {
                    k = j;
                    break;
                }
            }
            System.out.println("for value " + k + " with time " + (System.currentTimeMillis() - start2));
        }
    }
}
The results below are from running the code multiple times.
for value 4500001 with time 3
for value 4500001 with time 25 ( surprised as it took 25 ms in 2nd iteration)
for value 4500001 with time 0
for value 4500001 with time 0
for value 4500001 with time 0
for value 4500001 with time 0
for value 4500001 with time 0
for value 4500001 with time 0
for value 4500001 with time 0
for value 4500001 with time 0
So I am not understanding why the 2nd iteration took 25 ms while the 1st took 3 ms and the later ones 0 ms, and also why it is always the 2nd iteration whenever I run the code.
If I move the start and end time printing outside of the outer for loop, then the result I get is:
for value 4500001 with time 10
In the first iteration, the code runs interpreted.
In the second iteration, the JIT kicks in, slowing it down a bit while it compiles the code to native code.
In the remaining iterations, the native code runs very fast.
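You can watch this happen by running the same class with the standard HotSpot flag -XX:+PrintCompilation (the exact output format varies between JVM versions):
java -XX:+PrintCompilation findFirstTest
Lines mentioning findFirstTest::main (including an on-stack-replacement compilation of the loop, marked with %) typically show up around the iterations where the timings change.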
Because your winamp needed to decode another few frames of your mp3 to queue it into the sound output buffers. Or because the phase of the moon changed a bit and your dynamic background needed changing, or because someone in east Croydon farted and your computer is subscribed to the 'smells from London' twitter feed. Who knows?
This isn't how you performance test. Your CPU is not such a simple machine after all; it has many cores, and each core has pipelines and multiple hierarchies of caches. Any given core can only interact with one of its caches, and because of this, if a core runs an instruction that operates on memory which is not currently in cache, then the core will stall for a while: it sends the memory controller a request to load the page of memory you need to access into a given cache page, and then waits until it is there; this can take many, many cycles.
On the other end you have an OS that is juggling hundreds of thousands of processes and threads, many of them internal to the kernel, pre-empting like there is no tomorrow, and trying to give extra precedence to processes that are time sensitive, such as the aforementioned winamp, which must get a chance to decode some more mp3 frames before the sound buffer is fully exhausted, or you'd notice skipping. This is non-trivial: on ye olde windows you just couldn't get this done, which is why ye olde winamp was a magical marvel of engineering, more or less hacking into windows to ensure it got the priority it needed. Those days are long gone, but if you remember them, well, draw the conclusion that this isn't trivial, and thus, OSes pre-empt with prejudice all the time these days.
A third significant factor is the JVM itself, which is doing all sorts of borderline voodoo magic, as it has both a hotspot engine (which is doing bookkeeping on your code so that it can eventually conclude that it is worth spending considerable CPU resources to analyse the heck out of a method and rewrite it in optimized machine code, because that method seems to be taking a lot of CPU time) and a garbage collector.
The solution is to forget entirely about trying to measure time using such mere banalities as measuring currentTimeMillis or nanoTime and writing a few loops. It's just way too complicated for that to actually work.
No. Use JMH.
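A minimal JMH version of the loop above might look like this (a sketch only; it needs the jmh-core and jmh-generator-annprocess dependencies, and the class name is made up for illustration):
import org.openjdk.jmh.annotations.Benchmark;

public class FindFirstBenchmark {

    @Benchmark
    public int findFirst() {
        // same loop as in the question; returning the result
        // stops the JIT from eliminating it as dead code
        for (int j = 0; j < 5000000; j++) {
            if (j > 4500000) {
                return j;
            }
        }
        return -1;
    }
}
Run it through the JMH runner (for example via the JMH Maven archetype, or org.openjdk.jmh.Main) and it reports a properly warmed-up average time per call; JMH takes care of warm-up, forking and dead-code elimination, which is exactly the stuff that makes hand-rolled loops lie.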
I've a requirement to capture the execution time of some code in iterations. I've decided to use a Map<Integer,Long> for capturing this data where Integer(key) is the iteration number and Long(value) is the time consumed by that iteration in milliseconds.
I've written the Java code below to compute the time taken for each iteration. I want to ensure that the time taken by all iterations is zero before invoking the actual code. Surprisingly, the code behaves differently on every execution.
Sometimes, I get the desired output(zero millisecond for all iterations), but at times I do get positive and even negative values for some random iterations.
I've tried replacing System.currentTimeMillis() with the following:
new java.util.Date().getTime();
System.nanoTime();
org.apache.commons.lang.time.StopWatch
but still no luck.
Any suggestions as why some iterations take additional time and how to eliminate it?
package com.stackoverflow.programmer;

import java.util.HashMap;
import java.util.Map;

public class TestTimeConsumption {

    public static void main(String[] args) {
        Integer totalIterations = 100000;
        Integer nonZeroMilliSecondsCounter = 0;
        Map<Integer, Long> timeTakenMap = new HashMap<>();
        for (Integer iteration = 1; iteration <= totalIterations; iteration++) {
            timeTakenMap.put(iteration, getTimeConsumed(iteration));
            if (timeTakenMap.get(iteration) != 0) {
                nonZeroMilliSecondsCounter++;
                System.out.format("Iteration %6d has taken %d millisecond(s).\n", iteration,
                        timeTakenMap.get(iteration));
            }
        }
        System.out.format("Total non zero entries : %d", nonZeroMilliSecondsCounter);
    }

    private static Long getTimeConsumed(Integer iteration) {
        long startTime = System.currentTimeMillis();
        // Execute code for which execution time needs to be captured
        long endTime = System.currentTimeMillis();
        return (endTime - startTime);
    }
}
Here's the sample output from 5 different executions of the same code:
Execution #1 (NOT OK)
Iteration 42970 has taken 1 millisecond(s).
Total non zero entries : 1
Execution #2 (OK)
Total non zero entries : 0
Execution #3 (OK)
Total non zero entries : 0
Execution #4 (NOT OK)
Iteration 65769 has taken -1 millisecond(s).
Total non zero entries : 1
Execution #5 (NOT OK)
Iteration 424 has taken 1 millisecond(s).
Iteration 33053 has taken 1 millisecond(s).
Iteration 76755 has taken -1 millisecond(s).
Total non zero entries : 3
I am looking for a Java based solution that ensures that all
iterations consume zero milliseconds consistently. I prefer to
accomplish this using pure Java code without using a profiler.
Note: I was also able to accomplish this through C code.
Your HashMap performance may be dropping if it is resizing. The default capacity is 16, which you are exceeding. If you know the expected capacity up front, create the HashMap with the appropriate size, taking into account the default load factor of 0.75.
If you rerun iterations without creating a new map, and the Integer keys do not start again from zero, you will need to size the map for the total of all possible iterations.
int capacity = (int) ((100000/0.75)+1);
Map<Integer, Long> timeTakenMap = new HashMap<>(capacity);
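If the iteration numbers are dense and known up front, you could also skip the HashMap (and the Integer/Long boxing) entirely and record the timings in a plain array, for example:
long[] timeTaken = new long[totalIterations + 1];   // index = iteration number, slot 0 unused
for (int iteration = 1; iteration <= totalIterations; iteration++) {
    timeTaken[iteration] = getTimeConsumed(iteration);
}
(totalIterations and getTimeConsumed are the ones from your code; this is just a sketch of the data-structure change.)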
As you are starting to learn here, writing microbenchmarks in Java is not as easy as one would first assume. Everybody gets bitten at some point, even the hardened performance experts who have been doing it for years.
A lot is going on within the JVM and the OS that skews the results, such as GC, hotspot on the fly optimisations, recompilations, clock corrections, thread contention/scheduling, memory contention and cache misses. To name just a few. And sadly these skews are not consistent, and they can very easily dominate a microbenchmark.
To answer your immediate question of why the timings can sometimes go negative: it is because currentTimeMillis is designed to capture wall-clock time and not elapsed time. No wall clock is accurate on a computer and there are times when the clock will be adjusted, very possibly backwards. More detail on Java's clocks can be read in the Oracle blog post Inside the Oracle Hotspot VM clocks.
Further details and support for nanoTime versus currentTimeMillis can be read here.
Before continuing with your own benchmark, I strongly recommend that you read How do I write a correct micro-benchmark in Java?. The quick synopsis is: 1) warm up the JVM before taking results, 2) jump through hoops to avoid dead code elimination, 3) ensure that nothing else is running on the same machine, but accept that there will be thread scheduling going on (you may even want to pin threads to cores, depending on how far you want to take this), 4) use a framework specifically designed for microbenchmarking, such as JMH, or for quick lightweight spikes JUnitMosaic gives good results.
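To illustrate points 1 and 2 without a framework, a hand-rolled harness normally runs the code under test many times before it starts timing, and folds every result into a value that is printed at the end (a rough sketch; codeUnderTest() is a placeholder and the 10,000 counts are purely illustrative):
long sink = 0;
for (int i = 0; i < 10000; i++) {        // 1) warm-up: give the JIT a chance to compile
    sink += codeUnderTest();
}
long start = System.nanoTime();
for (int i = 0; i < 10000; i++) {        // timed runs
    sink += codeUnderTest();
}
long elapsed = System.nanoTime() - start;
System.out.println("avg ns/call: " + (elapsed / 10000.0) + " (sink=" + sink + ")");  // 2) printing sink defeats dead-code elimination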
I'm not sure if I understand your question.
You're trying to execute a certain set of statements S, and expect the execution time to be zero. You then test this premise by executing it a number of times and verifying the result.
That is a strange expectation to have: anything consumes some time. Hence, although the test could pass, that does not prove that no time has been used, since your program is save_time(); execute(S); compare_time(). Even if execute(S) does nothing, your timing is discrete, and as such it is possible that the 'tick' of your wall clock happens to fall just between save_time and compare_time, leading to some time having visibly passed.
As such, I'd expect your C program to behave exactly the same. Have you run that multiple times? What happens when you increase the iterations to over millions? If it still does not occur, then apparently your C compiler has optimized the code in such a way that no time is measured, and apparently, Java doesn't.
Or am I understanding you wrong?
You hint at it yourself... System.currentTimeMillis() is the way to go in this case.
There is no guarantee that incrementing an integer i corresponds to a millisecond, or to one clock cycle, on any system...
You should take System.currentTimeMillis() readings and calculate the elapsed time.
Example:
public static void main(String[] args) {
    long startTime = System.currentTimeMillis();
    doFoo();
    long elapsedTime = System.currentTimeMillis() - startTime;
    System.out.println("Time: " + elapsedTime);
}
I am also not sure I understand exactly: you're trying to execute a certain piece of code and get the execution time for each iteration.
If I have understood correctly, then I would suggest using System.nanoTime() instead of System.currentTimeMillis(), because if your block of statements is small enough you will always get zero milliseconds.
A simple example could be:
public static void main(String[] args) {
    long startTime = System.nanoTime();
    // do your stuff here.
    long elapsedTime = System.nanoTime() - startTime;
    System.out.println("Time taken: " + elapsedTime);
}
There is not much difference between System.nanoTime() and System.currentTimeMillis(); it is just a question of how accurate a result you need. With millisecond resolution you may get zero if the set of statements in each iteration is small.
I am just curious how we can know how many CPU clock cycles a certain piece of Java code needs, just by looking at it.
For example:
public class Factorial
{
    public static void main(String[] args)
    {
        final int NUM_FACTS = 100;
        for (int i = 0; i < NUM_FACTS; i++)
            System.out.println(i + "! is " + factorial(i));
    }

    public static int factorial(int n)
    {
        int result = 1;
        for (int i = 2; i <= n; i++)
            result *= i;
        return result;
    }
}
I am just curious how we can know how many CPU clock cycles a certain piece of Java code needs, just by looking at it.
If you are talking about real hardware clock cycles, the answer is "You can't know"1.
The reason that it is so hard is that a program goes through a number of complicated (and largely opaque) transformations before and during execution:
The source code is compiled to bytecodes ahead of time. This depends on the bytecode compiler used.
The bytecodes are JIT compiled to native code, at some time during the execution. This depends on the JIT compiler in the execution platform AND on the execution behavior of the application.
The number of clock cycles taken to execute a given instruction sequence depends on native code, the CPU model including such things as memory cache sizes and ... the application's memory access patterns.
On top of that, the JVM has various sources of "under the hood" non-determinism, and various launch-time tuning parameters that influence the behavior ... and cycle counts.
But fortunately, there are practical ways to examine software performance that don't depend on measuring hardware clock cycles. You can:
measure or estimate native instructions executed,
measure or estimate bytecodes executed,
estimate Java-level operations or statements executed, or
run the code and measure the time taken.
The last two are generally the most practical.
1 - ... except by running the application / JVM on an accurate hardware-level simulator for your exact hardware configuration and getting the simulator to count the clock cycles. And to be honest, I don't know if simulators that operate to that level actually exist. If they do, they are most likely proprietary to Intel, AMD and so on.
I don't think you'd be able to know the clock cycles.
But you could measure the CPU time it took to run the code.
You'd need to use the java.lang.management API.
Take a look here:
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/ThreadMXBean.html
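For example, a sketch of measuring the CPU time used by the current thread (the work() call is a placeholder for your own code):
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeDemo {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            System.out.println("CPU time measurement not supported on this JVM");
            return;
        }
        long start = bean.getCurrentThreadCpuTime();   // nanoseconds of CPU time
        work();
        long cpuNanos = bean.getCurrentThreadCpuTime() - start;
        System.out.println("CPU time used: " + cpuNanos + " ns");
    }

    private static void work() {
        // placeholder for the code you want to measure
    }
}
Note this gives CPU time, not clock cycles; the JVM cannot tell you cycle counts.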
This question already has answers here:
How do I write a correct micro-benchmark in Java?
Today I made a simple test to compare the speed of Java and C: a simple loop that makes an integer i increment from 0 to two billion.
I really expected the C language to be faster than Java. I was surprised by the outcome:
the time it takes in seconds for Java: approx. 1.8 seconds
the time it takes in seconds for C: approx. 3.6 seconds.
I DO NOT think that Java is a faster language at all, but I DO NOT understand either why the loop is twice as fast in my simple Java program as in the C one.
Did I make a crucial mistake in the program? Or is the MinGW compiler badly configured, or something?
public class Jrand {
    public static void main(String[] args) {
        long startTime = System.currentTimeMillis();
        int i;
        for (i = 0; i < 2000000000; i++) {
            // Do nothing!
        }
        long endTime = System.currentTimeMillis();
        float totalTime = (endTime - startTime);
        System.out.println("time: " + totalTime / 1000);
    }
}
THE C-PROGRAM
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main() {
    clock_t startTime;
    startTime = clock();
    int i;
    for (i = 0; i <= 2000000000; i++) {
        // Do nothing
    }
    clock_t endTime;
    endTime = clock();
    float totalTime = endTime - startTime;
    printf("%f", totalTime / 1000);
    return 0;
}
Rebuild your C version with any optimization level other than -O0 (e.g. -O2) and you will find it runs in 0 seconds. So the Java version takes 1.6 seconds to do nothing, and the C version takes 0.0 seconds (really, around 0.00005 seconds) to do nothing.
Java is more aggressive at eliminating code which doesn't do anything. It is less likely to assume the developer knows what they are doing. You are not timing the loop but how long it takes Java to detect and eliminate the loop.
In short, Java is often faster at doing nothing useful.
Also, you may find that if you optimise the C code and remove debugging information it will do the same thing, most likely in even less time.
If you want to benchmark this, instead of doing nothing, try doing something useful on each iteration. For example, count the loops in some other variable and make sure you use it at the end (by printing it, for example), so that it will not be optimized out (a sketch is given at the end of this answer).
Alternative simple tests could be accessing an array linearly (reading only), copying elements from one array to another (read + write), or doing some operations on the data. Some of these cases might be interesting as they open up several very simple compiler optimizations that you can later see in the resulting binary/bytecode, such as loop unrolling, register allocation, and maybe even more complicated things like vectorization or code motion. Java, on the other hand, may employ some nastier tricks such as JIT compilation (dynamically recompiling on the fly).
The scope of compiler optimization is huge; you've just encountered the most basic one: eliminating useless code :)
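As a concrete sketch of the first suggestion, a version of the Java loop whose result is observable, so neither compiler can simply delete it, could look like this (the C program should get the same change for a fair comparison):
long startTime = System.currentTimeMillis();
long sum = 0;
for (int i = 0; i < 2000000000; i++) {
    sum += i;                                   // the loop now produces a value...
}
long endTime = System.currentTimeMillis();
System.out.println("time: " + (endTime - startTime) / 1000.0 + "s, sum=" + sum);   // ...which is used here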
I'm trying to time the performance of my program by using System.currentTimeMillis() (or alternatively System.nanoTime()) and I've noticed that every time I run it - it gives a different result for time it took to finish the task.
Even the straightforward test:
long totalTime;
long startTime;
long endTime;

startTime = System.currentTimeMillis();
for (int i = 0; i < 1000000000; i++)
{
    for (int j = 0; j < 1000000000; j++)
    {
    }
}
endTime = System.currentTimeMillis();
totalTime = endTime - startTime;
System.out.println("Time: " + totalTime);
produces all sorts of different outputs, from 0 to 200. Can anyone say what I'm doing wrong or suggest an alternative solution?
The loop doesn't do anything, so you are timing how long it takes to detect the loop is pointless.
Timing the loops more accurately won't help, you need to do something slightly useful to get repeatable results.
I suggest you try -server if you are running on 32-bit windows.
A billion billion clock cycles takes about 10 years, so it's not really iterating that many times.
This is exactly the expected behavior -- it's supposed to get faster as you rerun the timing. As you rerun a method many times, the JIT devotes more effort to compiling it to native code and optimizing it; I would expect that after running this code for long enough, the JIT would eliminate the loop entirely, since it doesn't actually do anything.
The best and simplest way to get precise benchmarks on Java code is to use a tool like Caliper that "warms up" the JIT to encourage it to optimize your code fully.