This question already has answers here:
How do I write a correct micro-benchmark in Java?
(11 answers)
Closed 4 years ago.
I am totally lost for words; I have no idea what's going on...
//import java.util.Random;

public class TestInProgress {

    public static void main(String[] args) {
        long start, end;
        long average;
        // Random random = new Random();
        for (int j = 0; j < 5; j++) {
            average = 0L;
            for (int i = 0; i < 100; i++) {
                start = System.nanoTime();
                double temp = fastComputeSigm(0.0);
                end = System.nanoTime();
                average += (end - start);
            }
            average = average / 100L;
            System.out.println("AVERAGE FASTCOMPUTE: " + average);

            average = 0L;
            for (int i = 0; i < 100; i++) {
                start = System.nanoTime();
                double temp = Math.exp(0.0) / (1.0 + Math.exp(0.0));
                end = System.nanoTime();
                average += (end - start);
            }
            average = average / 100L;
            System.out.println("AVERAGE SIGMOID: " + average);
        }
    }

    static double fastComputeSigm(double value) {
        return 0.0;
    }
}
The most surprising part is the output:
AVERAGE FASTCOMPUTE: 98
AVERAGE SIGMOID: 38625
AVERAGE FASTCOMPUTE: 106
AVERAGE SIGMOID: 65
AVERAGE FASTCOMPUTE: 299
AVERAGE SIGMOID: 201
AVERAGE FASTCOMPUTE: 36
AVERAGE SIGMOID: 65
AVERAGE FASTCOMPUTE: 53
AVERAGE SIGMOID: 57
See that? Then I uncommented Random (and its import).
What would you expect? Output:
AVERAGE FASTCOMPUTE: 90
AVERAGE SIGMOID: 324
AVERAGE FASTCOMPUTE: 131
AVERAGE SIGMOID: 73
AVERAGE FASTCOMPUTE: 217
AVERAGE SIGMOID: 36
AVERAGE FASTCOMPUTE: 53
AVERAGE SIGMOID: 12
AVERAGE FASTCOMPUTE: 53
AVERAGE SIGMOID: 69
I tried this on Eclipse Oxygen.3a Release (4.7.3a).
I'm using Java 10.0.2.
My computer has an Intel Core i5-7300HQ
with these caches: L1 256 KB, L2 1024 KB, L3 6 MB.
I run on Windows 10.
Note: I tried changing the outer loop bound to j < 1, 2, 3, ... several times; it doesn't help, the first SIGMOID measurement always takes an absurdly long time.
Edit: I tried launching it from the Windows command line (javac and java); same result.
In general, the comments pointing you at the trickiness of micro-benchmarking are correct, and you should be using JMH. In this particular case, it looks like your timing/benchmarking code introduces latency that wasn't there in the first place.
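For reference, a minimal JMH benchmark of the sigmoid expression could look like the sketch below (the class and field names are made up, and you would run it through the usual JMH harness; JMH takes care of warmup, forking, and keeping results alive so they are not dead-code eliminated):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class SigmoidBenchmark {
    // Non-constant field so the JIT cannot fold the expression away.
    double x = 0.0;

    @Benchmark
    public double sigmoid() {
        // Returning the value keeps it alive for JMH's consumer.
        return Math.exp(x) / (1.0 + Math.exp(x));
    }
}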
Here is a compacted version of your code that demonstrates that the issue is triggered not by the use of Random but by the use of System.nanoTime() (which the Random constructor calls internally):
public static void main(String[] args) {
    long start, end, average;
    double temp = 0;
    // start = System.nanoTime();
    average = 0L;
    for (int i = 0; i < 100; i++) {
        start = System.nanoTime();
        temp += Math.exp(0.0) / (1.0 + Math.exp(0.0));
        end = System.nanoTime();
        average += (end - start);
        // System.out.println(i + " " + (end - start));
    }
    System.out.println("AVERAGE: " + (average / 100));
    System.out.println("temp: " + temp);
}
When I uncomment start = System.nanoTime() I observe about a 100x speedup. At this point it is important to remember that the units here are nanoseconds, so we are very sensitive to whatever the runtime is doing in the background. If you uncomment System.out.println(i + " " + (end - start)); you will see an occasional hiccup that is responsible for the entire slowdown. While it may be interesting to chase the reason for that hiccup, the following version of the code indicates that it is due to the measurement rather than the core functionality, so you may want to make sure that this is something you want to spend your own time on [1]:
public static void main(String[] args) {
    double temp = 0;
    long start = System.nanoTime();
    for (int i = 0; i < 100; i++) {
        temp += Math.exp(0.0) / (1.0 + Math.exp(0.0));
    }
    System.out.println("AVERAGE: " + ((System.nanoTime() - start) / 100));
    System.out.println("temp: " + temp);
}
The AVERAGE values here are similar to the previous version with start = System.nanoTime() uncommented.
[1] If you are interested in digging deeper into this particular behavior, try timing your test while running with the -Xint VM option, which disables the JIT compiler.
Related
What is the difference between these two? They both give me execution times with slightly different values, and I can tell they are written differently. But since the outcome is nearly identical, what does code 2 do that code 1 does not? What actually is the function of sum, and is it even executed?
Code 1:
for (int i = 0; i < 10; i++)
{
    long n0 = System.nanoTime();
    long n1 = System.nanoTime();
    System.out.println(" resolution " + (n1 - n0) + " nanoseconds");
}
Code 2:
int[] given = {1,2,3,4,5,6,7,8,9,0};
int sum = 0;
for (int i = 0; i < 10; i++)
{
    long t0 = System.nanoTime();
    sum += given[i];
    long t1 = System.nanoTime();
    System.out.println(" resolution " + (t1 - t0) + " nanoseconds");
}
It is simply code to try out System.nanoTime(). Doing something or nothing between the two calls does not make a discernible difference. The resolution of the clock is about 100 ns.
As sum is not printed, the compiler might have optimized the code by removing just that extra code.
Furthermore, it seems that a nanoTime call alone already takes ~100 ns.
Note
The code is primarily written for one's own curiosity.
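If you want to observe the clock's granularity directly, here is a small probe (my own sketch, not from the original code) that spins until the value returned by nanoTime changes:

public class NanoTimeProbe {
    public static void main(String[] args) {
        // Estimate the granularity of System.nanoTime() by spinning
        // until the returned value changes.
        for (int i = 0; i < 10; i++) {
            long t0 = System.nanoTime();
            long t1;
            do {
                t1 = System.nanoTime();
            } while (t1 == t0);
            System.out.println("observed tick: " + (t1 - t0) + " ns");
        }
    }
}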
While reading Effective Java, Item 45, I came across the three different simple loops shown in the code below, with timing. The first and the third are preferred. But when I time them,
for N = 1,000,000:
length of the list: 1000000
sum is: 499999500000
fast method time: 25
sum is: 499999500000
slower method time: 5
sum is: 499999500000
range method time: 21
the second method is actually faster. I think this may be because the Java compiler is smart enough to substitute a.size() with the actual number.
However, when N grows, the second method indeed becomes slower. I am aware this experiment is naive and machine specific. I wonder if there is an explanation for the second method outperforming the other two when N is small. (The program has been run multiple times.)
length of the list: 10000000
sum is: 49999995000000
fast method time: 44
sum is: 49999995000000
slower method time: 48
sum is: 49999995000000
range method time: 37
The code:
public static void main(String[] args) {
    // test the speed of initializing 1 million elements,
    // timing two different loops.
    // int N = 10000000;
    int N = 1000000;
    List<Integer> a = new ArrayList<Integer>();
    for (int i = 0; i < N; ++i) {
        a.add(i);
    }
    System.out.println("length of the list: " + a.size());

    long t1 = System.currentTimeMillis();
    long sum = 0;
    for (int i = 0, n = a.size(); i < n; ++i) {
        sum += a.get(i);
    }
    long t2 = System.currentTimeMillis();
    System.out.println("sum is: " + sum);
    System.out.println("fast method time: " + (t2 - t1));

    t1 = System.currentTimeMillis();
    sum = 0;
    for (int i = 0; i < a.size(); ++i) {
        sum += a.get(i);
    }
    t2 = System.currentTimeMillis();
    System.out.println("sum is: " + sum);
    System.out.println("slower method time: " + (t2 - t1));

    t1 = System.currentTimeMillis();
    sum = 0;
    for (int i : a) {
        sum += i;
    }
    t2 = System.currentTimeMillis();
    System.out.println("sum is: " + sum);
    System.out.println("range method time: " + (t2 - t1));
}
I was indeed getting the same results as you:
length of the list: 1000000
sum is: 499999500000
fast method time: 32
sum is: 499999500000
slower method time: 12
sum is: 499999500000
range method time: 24
So I used javap -c to disassemble the bytecode, and I saw that javac was not making any kind of optimization based on N being small; indeed, no optimization was being done.
So, I tried exchanging the order of the first two statements, and here is the result:
length of the list: 1000000
sum is: 499999500000
slower method time: 30
sum is: 499999500000
fast method time: 8
sum is: 499999500000
range method time: 25
So the difference between those two methods is not in the methods themselves; whichever one runs first will be the slower one.
As for why that happens, it is still puzzling me (maybe deferred loading of some class? Hot-code native compilation?)
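One way to test those hypotheses is to run each loop once, untimed, before measuring anything; if the ordering effect disappears, warmup (class loading plus JIT compilation) was the culprit. A rough sketch, with the three loops from the question wrapped in helper methods whose names I made up:

import java.util.ArrayList;
import java.util.List;

public class WarmupTest {
    static long sumIndexedCached(List<Integer> a) {
        long sum = 0;
        for (int i = 0, n = a.size(); i < n; ++i) sum += a.get(i);
        return sum;
    }

    static long sumIndexed(List<Integer> a) {
        long sum = 0;
        for (int i = 0; i < a.size(); ++i) sum += a.get(i);
        return sum;
    }

    static long sumForEach(List<Integer> a) {
        long sum = 0;
        for (int i : a) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) a.add(i);

        // Untimed warmup pass: class loading and JIT compilation
        // happen here instead of inside the first measurement.
        long sink = sumIndexedCached(a) + sumIndexed(a) + sumForEach(a);

        long t1 = System.currentTimeMillis();
        sink += sumIndexedCached(a);
        long t2 = System.currentTimeMillis();
        System.out.println("fast method time: " + (t2 - t1));

        t1 = System.currentTimeMillis();
        sink += sumIndexed(a);
        t2 = System.currentTimeMillis();
        System.out.println("slower method time: " + (t2 - t1));

        t1 = System.currentTimeMillis();
        sink += sumForEach(a);
        t2 = System.currentTimeMillis();
        System.out.println("range method time: " + (t2 - t1));

        // Print the accumulated sum so none of the loops can be
        // optimized away as dead code.
        System.out.println("sink: " + sink);
    }
}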
See edits below
There is no casting going on in the termination check. I would have thought that < and ++ would be as fast with longs as with ints on a 64-bit machine. But I guess not?
int: 65 milliseconds:
public void testWTF() throws Exception {
    int runs = 10;
    long hs = 0;
    long timeSum = 0;
    for (int run = 0; run < runs; run++) {
        int term = Integer.MAX_VALUE;
        long start = System.currentTimeMillis();
        // ***** loop to be tested ******
        for (int i = 0; i < term; i++) {
            hs++;
        }
        timeSum += (System.currentTimeMillis() - start);
        System.out.println("hs = " + hs);
        hs = 0;
    }
    System.out.println("timeSum = " + timeSum);
    System.out.println("avg time = " + (timeSum / runs) + " for " + runs + " runs");
    System.out.println("hs = " + hs);
}
long: 1445 milliseconds
public void testWTF() throws Exception {
    int runs = 10;
    long hs = 0;
    long timeSum = 0;
    for (int run = 0; run < runs; run++) {
        long term = Integer.MAX_VALUE;
        long start = System.currentTimeMillis();
        // ***** loop to be tested ******
        for (long i = 0; i < term; i++) {
            hs++;
        }
        timeSum += (System.currentTimeMillis() - start);
        System.out.println("hs = " + hs);
        hs = 0;
    }
    System.out.println("timeSum = " + timeSum);
    System.out.println("avg time = " + (timeSum / runs) + " for " + runs + " runs");
    System.out.println("hs = " + hs);
}
Hardware: 64-bit Xeon running 64-bit Windows 7.
Edit: I updated this to do several iterations. For 1 million runs of the int version, the average time is 65 milliseconds. The long version takes too long for 1 million runs, or even 1000 or 100; for 10 runs, the average time is 1447 milliseconds.
Also, I'm using hs outside the loop so that the loop does not get JITed away.
This is a very bad/unreliable/unrealistic way of doing benchmarks, since the JIT isn't really given a chance to do much optimization -- you only run the benchmarks once, and you measure the first run.
Basically, Java's JIT will optimize your code significantly more once it sees your code getting used extensively. In a real program, the JIT will be optimizing any critical loops, so if you want a benchmark that mimics the real world, you have to convince the JIT to kick in.
The simplest way to get an accurate benchmark in Java is to use a tool like Caliper, which knows how to properly warm up the JIT and take reliable measurements, and then see if your results become more consistent.
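Caliper has largely been superseded by JMH since then. For illustration, a rough JMH sketch of the int-versus-long loop comparison could look like this (class and method names are made up, and the bound is a @Param rather than the question's Integer.MAX_VALUE so runs stay short and the JIT cannot treat the bound as a compile-time constant):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Thread)
public class LoopBenchmark {
    // Loop bound supplied at run time so it is not a constant.
    @Param({"1000000"})
    int term;

    @Benchmark
    public long intLoop() {
        long hs = 0;
        for (int i = 0; i < term; i++) hs++;
        return hs; // returned so the loop is not dead code
    }

    @Benchmark
    public long longLoop() {
        long hs = 0;
        for (long i = 0; i < term; i++) hs++;
        return hs;
    }
}

Even here, the JIT may reduce such trivial counting loops to a constant, which is one more reason hand-rolled timings of empty loops are misleading.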
The SO community was right: profiling your code before you ask performance questions makes more sense than my approach of randomly guessing :-) I profiled my code (very intensive math) and found that over 70% of my time is apparently spent in a part I didn't think was a source of slowdown: rounding decimals.
static double roundTwoDecimals(double d) {
    DecimalFormat twoDForm = new DecimalFormat("#.###");
    return Double.valueOf(twoDForm.format(d));
}
My problem is that I get decimal numbers that are normally .01, .02, etc., but sometimes I get something like .070000000001 (I really only care about the 0.07, but floating-point precision causes my other formulas that use the result to fail). I simply want the first 3 decimals to avoid this problem.
So is there a better/faster way to do this?
The standard way to round (positive) numbers would be something like this:
double rounded = Math.floor(1000 * doubleVal + 0.5) / 1000;
Example 1: floor(1000 * .1234 + 0.5) / 1000 = floor(123.9)/1000 = 0.123
Example 2: floor(1000 * .5678 + 0.5) / 1000 = floor(568.3)/1000 = 0.568
But as @nuakh commented, you'll always be plagued by rounding errors to some extent. If you want exactly 3 decimal places, your best bet is to convert to thousandths (that is, multiply everything by 1000) and use an integral data type (int, long, etc.)
In that case, you'd skip the final division by 1000 and use the integral values 123 and 568 for your calculations. If you want the results in the form of percentages, you'd divide by 10 for display:
123 → 12.3%
568 → 56.8%
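For example, a small sketch of that approach (the names are mine, and it assumes positive values):

public class Thousandths {
    // Convert a double to integral thousandths once, at the boundary,
    // then do all further arithmetic in long so no new floating-point
    // error can creep in. (Assumes positive inputs.)
    static long toThousandths(double d) {
        return (long) Math.floor(1000 * d + 0.5);
    }

    public static void main(String[] args) {
        long a = toThousandths(0.070000000001); // 70
        long b = toThousandths(0.1234);         // 123
        long sum = a + b;                       // exact integer arithmetic: 193
        // Divide by 10 only for display as a percentage.
        System.out.println(sum + " thousandths = " + (sum / 10.0) + "%");
        // prints: 193 thousandths = 19.3%
    }
}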
Using a cast is faster than using floor or round. I suspect a cast is more heavily optimised by the HotSpot compiler.
public class Main {
    public static final int ITERS = 1000 * 1000;

    public static void main(String... args) {
        for (int i = 0; i < 3; i++) {
            perfRoundTo3();
            perfCastRoundTo3();
        }
    }

    private static double perfRoundTo3() {
        double sum = 0.0;
        long start = 0;
        for (int i = -20000; i < ITERS; i++) {
            if (i == 0) start = System.nanoTime();
            sum += roundTo3(i * 1e-4);
        }
        long time = System.nanoTime() - start;
        System.out.printf("Took %,d ns per round%n", time / ITERS);
        return sum;
    }

    private static double perfCastRoundTo3() {
        double sum = 0.0;
        long start = 0;
        for (int i = -20000; i < ITERS; i++) {
            if (i == 0) start = System.nanoTime();
            sum += castRoundTo3(i * 1e-4);
        }
        long time = System.nanoTime() - start;
        System.out.printf("Took %,d ns per cast round%n", time / ITERS);
        return sum;
    }

    public static double roundTo3(double d) {
        return Math.round(d * 1000 + 0.5) / 1000.0;
    }

    public static double castRoundTo3(double d) {
        return (long) (d * 1000 + 0.5) / 1000.0;
    }
}
prints
Took 22 ns per round
Took 9 ns per cast round
Took 23 ns per round
Took 6 ns per cast round
Took 20 ns per round
Took 6 ns per cast round
Note: as of Java 7, floor(x + 0.5) and round(x) don't do quite the same thing; see Why does Math.round(0.49999999999999994) return 1?
This will round correctly to within the representation error. That is, while the result is not exact (a decimal such as 0.001 is not represented exactly), toString() will correct for this. It's only when you convert to BigDecimal or perform an arithmetic operation that you will see the representation error.
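To see what that means, compare toString() with the exact value that the BigDecimal(double) constructor exposes (a short demo of the point above, not part of the original answer):

import java.math.BigDecimal;

public class RepresentationDemo {
    public static void main(String[] args) {
        double d = 0.001;
        // toString() prints the shortest decimal that maps back to the
        // same double, hiding the representation error:
        System.out.println(d); // 0.001
        // The BigDecimal(double) constructor shows the exact binary value:
        System.out.println(new BigDecimal(d));
        // prints 0.001000000000000000020816681711721685...
    }
}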
I was running some performance tests on an algorithm and noticed something weird. Maybe I am missing something here.
I first measure the time in milliseconds:
long startTime = System.currentTimeMillis();
x.sort(sortStringInput);
long endTime = System.currentTimeMillis();
and then in nanoseconds:
long startTime = System.nanoTime();
x.sort(sortStringInput);
long endTime = System.nanoTime();
The results are 437 ms and 26366 ns.
I am calling the same method, so how is it possible to get a result in ns that is way smaller than the one in ms? I know that 1 ms is 1,000,000 ns, so 26366 ns is even smaller than 1 ms...
Thanks,
Are you sorting the same list twice? The second call will be extremely fast if the list is already sorted.
Depending on what platform you're on, System.nanoTime() itself can be very slow. You're better off running your benchmark multiple times and measuring the overall duration in milliseconds.
I suggest you let the test run for at least 2 seconds before counting any result, and then run it for at least another 2 seconds and take the average. The following code prints:
Average sort time 116 ms.
Average sort time 117100526 ns.
Average sort time 116 ms.
Average sort time 116530255 ns.
Average sort time 117 ms.
Average sort time 116905977 ns.
Code
public static void main(String... args) throws IOException {
    String[] strings = new String[100 * 1000];
    for (int i = 0; i < strings.length; i++)
        strings[i] = "" + Math.random();

    int runTimeMS = 2000;
    for (int i = 0; i <= 3; i++) {
        {
            long start = System.currentTimeMillis();
            int count = 0;
            do {
                Arrays.sort(strings.clone());
                count++;
            } while (System.currentTimeMillis() - start < runTimeMS);
            long time = System.currentTimeMillis() - start;
            if (i > 0) System.out.println("Average sort time " + time / count + " ms.");
        }
        {
            long start = System.nanoTime();
            int count = 0;
            do {
                Arrays.sort(strings.clone());
                count++;
            } while (System.nanoTime() - start < runTimeMS * 1000L * 1000L);
            long time = System.nanoTime() - start;
            if (i > 0) System.out.println("Average sort time " + time / count + " ns.");
        }
    }
}