Why is printing "B" dramatically slower than printing "#"?

I generated two matrices of 1000 x 1000:
First Matrix: O and #.
Second Matrix: O and B.
Using the following code, the first matrix took 8.52 seconds to complete:
Random r = new Random();
for (int i = 0; i < 1000; i++) {
    for (int j = 0; j < 1000; j++) {
        if (r.nextInt(4) == 0) {
            System.out.print("O");
        } else {
            System.out.print("#");
        }
    }
    System.out.println("");
}
With this code, the second matrix took 259.152 seconds to complete:
Random r = new Random();
for (int i = 0; i < 1000; i++) {
    for (int j = 0; j < 1000; j++) {
        if (r.nextInt(4) == 0) {
            System.out.print("O");
        } else {
            System.out.print("B"); // only line changed
        }
    }
    System.out.println("");
}
What is the reason behind the dramatically different run times?
As suggested in the comments, printing only System.out.print("#"); takes 7.8871 seconds, whereas System.out.print("B"); was still printing when I stopped waiting.
As others pointed out that both versions run normally for them, I tried Ideone.com, for instance, and there both pieces of code execute at the same speed.
Test Conditions:
I ran this test from Netbeans 7.2, with the output going to its console
I used System.nanoTime() for measurements

Pure speculation is that you're using a terminal that attempts to do word-wrapping rather than character-wrapping, and treats B as a word character but # as a non-word character. So when it reaches the end of a line and searches for a place to break the line, it sees a # almost immediately and happily breaks there; whereas with the B, it has to keep searching for longer, and may have more text to wrap (which may be expensive on some terminals, e.g., outputting backspaces, then outputting spaces to overwrite the letters being wrapped).
But that's pure speculation.
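One way to test that hypothesis (my sketch, not from the original post) is to take the terminal's per-character processing out of the loop: buffer the whole matrix and write it in one go. If the slowdown disappears, console rendering rather than Java is the bottleneck:
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.util.Random;

public class MatrixPrint { // hypothetical test class
    public static void main(String[] args) throws Exception {
        Random r = new Random();
        // A large buffer means the console (and any wrapping logic it runs)
        // is touched once per flush instead of once per character.
        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(System.out), 1 << 20);
        for (int i = 0; i < 1000; i++) {
            for (int j = 0; j < 1000; j++) {
                out.write(r.nextInt(4) == 0 ? "O" : "B");
            }
            out.newLine();
        }
        out.flush(); // push the buffered matrix to the terminal
    }
}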

I performed tests on Eclipse vs Netbeans 8.0.2, both with Java version 1.8;
I used System.nanoTime() for measurements.
Eclipse:
I got the same time in both cases - around 1.564 seconds.
Netbeans:
Using "#": 1.536 seconds
Using "B": 44.164 seconds
So, it looks like Netbeans has poor performance when printing to its console.
After more research I realized that the problem is Netbeans' line-wrapping at the maximum width of its buffer (it is not restricted to the System.out.println command), demonstrated by this code:
for (int i = 0; i < 1000; i++) {
    long t1 = System.nanoTime();
    System.out.print("BBB......BBB"); // <- contains 1000 "B" characters
    long t2 = System.nanoTime();
    System.out.println(t2 - t1);
    System.out.println("");
}
The time results are less than 1 millisecond for every iteration, except every fifth iteration, when the time is around 225 milliseconds. Something like this (in nanoseconds):
BBB...31744
BBB...31744
BBB...31744
BBB...31744
BBB...226365807
BBB...31744
BBB...31744
BBB...31744
BBB...31744
BBB...226365807
...
And so on.
Summary:
Eclipse works perfectly with "B"
Netbeans has a line-wrapping problem, which should be solvable (since the problem does not occur in Eclipse) without adding a space after the B ("B ").
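If you are stuck with such a console, one workaround (my sketch, not part of the original answer) is to assemble each row in a StringBuilder and print it with a single call, so the console handles one write per row instead of one per character:
Random r = new Random();
StringBuilder row = new StringBuilder(1000);
for (int i = 0; i < 1000; i++) {
    row.setLength(0); // reuse the builder for each row
    for (int j = 0; j < 1000; j++) {
        row.append(r.nextInt(4) == 0 ? 'O' : 'B');
    }
    System.out.println(row); // one console write per row
}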

Yes, the culprit is definitely word-wrapping. When I tested your two programs, NetBeans IDE 8.2 gave me the following result.
First Matrix: O and # = 6.03 seconds
Second Matrix: O and B = 50.97 seconds
Looking at your code closely, you used a line break at the end of the outer loop, but no line break inside the inner loop, so the inner loop prints one 1000-character word. That causes the word-wrapping problem. If we print a non-word character " " after each B, the program takes only 5.35 seconds to run. And if we insert a line break in the inner loop after every 100 or every 50 values, it takes only 8.56 and 7.05 seconds respectively.
Random r = new Random();
for (int i = 0; i < 1000; i++) {
    for (int j = 0; j < 1000; j++) {
        if (r.nextInt(4) == 0) {
            System.out.print("O");
        } else {
            System.out.print("B");
        }
        if (j % 100 == 0) { // adding a line break in the inner loop
            System.out.println();
        }
    }
    System.out.println("");
}
Another option is to change the settings of the NetBeans IDE. Go to the NetBeans Tools menu and click Options, then click Editor, go to the Formatting tab, and select Anywhere in the Line Wrap option. The program then takes almost 6.24% less time to run.

Related

Time how long a function runs (short duration)

I'm relatively new to Java programming, and I'm running into an issue calculating the amount of time it takes for a function to run.
First some background - I've got a lot of experience with Python, and I'm trying to recreate the functionality of the Jupyter Notebook/Lab %%timeit function, if you're familiar with that. Here's a pic of it in action (sorry, not enough karma to embed yet):
Snip of Jupyter %%timeit
What it does is run the contents of the cell (in this case a recursive function) either 1k, 10k, or 100k times, and give you the average run time of the function, and the standard deviation.
My first implementation (using the same recursive function) used System.nanoTime():
public static void main(String[] args) {
    long t1, t2, diff;
    long[] times = new long[1000];
    int t;
    for (int i = 0; i < 1000; i++) {
        t1 = System.nanoTime();
        t = triangle(20);
        t2 = System.nanoTime();
        diff = t2 - t1;
        System.out.println(diff);
        times[i] = diff;
    }
    long total = 0;
    for (int j = 0; j < times.length; j++) {
        total += times[j];
    }
    System.out.println("Mean = " + total / 1000.0);
}
But the mean is wildly thrown off -- for some reason, the first iteration of the function (on many runs) takes upwards of a million nanoseconds:
Pic of initial terminal output
Every iteration after the first dozen or so takes either 395 nanos or 0 -- so there could be a problem there too... not sure what's going on!
Also -- the code of the recursive function I'm timing:
static int triangle(int n) {
    if (n == 1) {
        return n;
    } else {
        return n + triangle(n - 1);
    }
}
Initially I had the line n = Math.abs(n) on the first line of the function, but then I removed it because... meh. I'm the only one using this.
I tried a number of different suggestions brought up in this SO post, but they each have their own problems... which I can go into if you need.
Anyway, thank you in advance for your help and expertise!
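For what it's worth, the usual first step here is to warm the JIT up before measuring and to report the median rather than the mean, since the median ignores the compilation spike in the first iterations. A rough sketch of that idea (my code, not a polished benchmark; triangle() is the function above):
public static void main(String[] args) {
    int sink = 0;
    // Warm-up: give the JIT time to compile triangle() before measuring
    for (int i = 0; i < 10_000; i++) sink += triangle(20);
    long[] times = new long[1000];
    for (int i = 0; i < times.length; i++) {
        long t1 = System.nanoTime();
        sink += triangle(20);
        long t2 = System.nanoTime();
        times[i] = t2 - t1;
    }
    java.util.Arrays.sort(times);
    // The median is robust against the few slow, JIT/GC-affected iterations
    System.out.println("Median = " + times[times.length / 2] + " ns (sink=" + sink + ")");
}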

Why is this C++ code execution so slow compared to java?

I recently wrote a computation-intensive algorithm in Java, and then translated it to C++. To my surprise the C++ executed considerably slower. I have now written a much shorter Java test program, and a corresponding C++ program - see below. My original code featured a lot of array access, as does the test code. The C++ takes 5.5 times longer to execute (see comment at end of each program).
Conclusions after the first 21 comments below...
Test code:
g++ -o ...: Java 5.5 times faster
g++ -O3 -o ...: Java 2.9 times faster
g++ -fprofile-generate -march=native -O3 -o ... (run, then g++ -fprofile-use etc.): Java 1.07 times faster
My original project (much more complex than test code):
Java 1.8 times faster
C++ 1.9 times faster
C++ 2 times faster
Software environment:
Ubuntu 16.04 (64 bit).
Netbeans 8.2 / jdk 8u121 (java code executed inside netbeans)
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Compilation: g++ -o cpp_test cpp_test.cpp
Java code:
public class JavaTest {
    public static void main(String[] args) {
        final int ARRAY_LENGTH = 100;
        final int FINISH_TRIGGER = 100000000;
        int[] intArray = new int[ARRAY_LENGTH];
        for (int i = 0; i < ARRAY_LENGTH; i++) intArray[i] = 1;
        int i = 0;
        boolean finished = false;
        long loopCount = 0;
        System.out.println("Start");
        long startTime = System.nanoTime();
        while (!finished) {
            loopCount++;
            intArray[i]++;
            if (intArray[i] >= FINISH_TRIGGER) finished = true;
            else if (i < (ARRAY_LENGTH - 1)) i++;
            else i = 0;
        }
        System.out.println("Finish: " + loopCount + " loops; " +
                ((System.nanoTime() - startTime) / 1e9) + " secs");
        // 5 executions in range 5.98 - 6.17 secs (each 9999999801 loops)
    }
}
C++ code:
// cpp_test.cpp:
#include <iostream>
#include <sys/time.h>
#include <time.h> // for clock_gettime / timespec
int main() {
    const int ARRAY_LENGTH = 100;
    const int FINISH_TRIGGER = 100000000;
    int *intArray = new int[ARRAY_LENGTH];
    for (int i = 0; i < ARRAY_LENGTH; i++) intArray[i] = 1;
    int i = 0;
    bool finished = false;
    long long loopCount = 0;
    std::cout << "Start\n";
    timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    long long startTime = (1000000000 * ts.tv_sec) + ts.tv_nsec;
    while (!finished) {
        loopCount++;
        intArray[i]++;
        if (intArray[i] >= FINISH_TRIGGER) finished = true;
        else if (i < (ARRAY_LENGTH - 1)) i++;
        else i = 0;
    }
    clock_gettime(CLOCK_REALTIME, &ts);
    double elapsedTime =
        ((1000000000 * ts.tv_sec) + ts.tv_nsec - startTime) / 1e9;
    std::cout << "Finish: " << loopCount << " loops; ";
    std::cout << elapsedTime << " secs\n";
    // 5 executions in range 33.07 - 33.45 secs (each 9999999801 loops)
}
The only time I could get the C++ program to outperform Java was when using profiling information. This shows that there's something in the runtime information (that Java gets by default) that allows for faster execution.
There's not much going on in your program apart from a non-trivial if statement. That is, without analysing the entire program, it's hard to predict which branch is most likely. This leads me to believe that this is a branch misprediction issue. Modern CPUs do instruction pipelining which allows for higher CPU throughput. However, this requires a prediction of what the next instructions to execute are. If the guess is wrong, the instruction pipeline must be cleared out, and the correct instructions loaded in (which takes time).
At compile time, the compiler doesn't have enough information to predict which branch is most likely. CPUs do a bit of branch prediction as well, but static heuristics generally amount to "loops loop" and "ifs take the if branch rather than the else".
Java, however, has the advantage of being able to use information at runtime as well as compile time. This allows Java to identify the middle branch as the one that occurs most frequently and so have this branch predicted for the pipeline.
Somehow both GCC and clang fail to unroll this loop and pull out the invariants even in -O3 and -Os, but Java does.
Java's final JITted assembly code is similar to this (in reality repeated twice):
while (true) {
    loopCount++;
    if (++intArray[i++] >= FINISH_TRIGGER) break;
    loopCount++;
    if (++intArray[i++] >= FINISH_TRIGGER) break;
    loopCount++;
    if (++intArray[i++] >= FINISH_TRIGGER) break;
    loopCount++;
    if (++intArray[i++] >= FINISH_TRIGGER) { if (i >= ARRAY_LENGTH) i = 0; break; }
    if (i >= ARRAY_LENGTH) i = 0;
}
With this loop I'm getting the exact same timings (6.4s) between C++ and Java.
Why is this legal to do? Because ARRAY_LENGTH is 100, which is a multiple of 4, so i can only reach 100 and need to be reset to 0 every 4 iterations.
This looks like an opportunity for improvement for GCC and clang; they fail to unroll loops for which the total number of iterations is unknown, but even if unrolling is forced, they fail to recognize parts of the loop that apply to only certain iterations.
Regarding your findings in more complex code (a.k.a. real life): Java's optimizer is exceptionally good for small loops; a lot of thought has been put into that. But Java loses a lot of time on virtual calls and GC.
In the end it comes down to machine instructions running on a concrete architecture; whoever comes up with the best set wins. Don't assume the compiler will "do the right thing", look at the generated code, profile, repeat.
For example, if you restructure your loop just a bit:
while (!finished) {
    for (i = 0; i < ARRAY_LENGTH; ++i) {
        loopCount++;
        if (++intArray[i] >= FINISH_TRIGGER) {
            finished = true;
            break;
        }
    }
}
Then C++ will outperform Java (5.9s vs 6.4s). (revised C++ assembly)
And if you can allow a slight overrun (increment more intArray elements after reaching the exit condition):
while (!finished) {
    for (int i = 0; i < ARRAY_LENGTH; ++i) {
        ++intArray[i];
    }
    loopCount += ARRAY_LENGTH;
    for (int i = 0; i < ARRAY_LENGTH; ++i) {
        if (intArray[i] >= FINISH_TRIGGER) {
            loopCount -= ARRAY_LENGTH - i - 1;
            finished = true;
            break;
        }
    }
}
Now clang is able to vectorize the loop and reaches the speed of 3.5s vs. Java's 4.8s (GCC is unfortunately still not able to vectorize it).
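If you want to check what each toolchain actually emitted (my commands, standard flags, not from the original answer), the generated code can be inspected on both sides:
$ g++ -O3 -S cpp_test.cpp (the assembly is written to cpp_test.s)
$ clang++ -O3 -Rpass=loop-vectorize cpp_test.cpp (prints a remark for every loop it manages to vectorize)
$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly JavaTest (dumps the JITted assembly; requires the hsdis disassembler plugin)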

Why slowdown in this identical code?

I was trying to measure the time to execute this loop:
for (boolean t : test) {
    if (!t)
        ++count;
}
And was getting inconsistent results. Eventually I managed to get consistent results with the following code:
public class Test {
    public static void main(String[] args) {
        int size = 100;
        boolean[] test = new boolean[10_000_000];
        java.util.Random r = new java.util.Random();
        for (int n = 0; n < 10_000_000; ++n)
            test[n] = !r.nextBoolean();
        int expected = 0;
        long accumulated = 0;
        for (int repeat = -1; repeat < size; ++repeat) {
            int count = 0;
            long start = System.currentTimeMillis();
            for (boolean t : test) {
                if (!t)
                    ++count;
            }
            long end = System.currentTimeMillis();
            if (repeat != -1)     // first run does not count, VM warming up
                accumulated += end - start;
            else                  // use count to avoid compiler or JVM
                expected = count; // optimization of inner loop
            if (count != expected)
                throw new Error("Tests don't run the same amount of times");
        }
        float average = (float) accumulated / size;
        System.out.println("1st test : " + average);
        int expectedBis = 0;
        accumulated = 0;
        if ("reassign".equals(args[0])) {
            for (int n = 0; n < 10_000_000; ++n)
                test[n] = test[n];
        }
        for (int repeat = -1; repeat < size; ++repeat) {
            int count = 0;
            long start = System.currentTimeMillis();
            for (boolean t : test) {
                if (!t)
                    ++count;
            }
            long end = System.currentTimeMillis();
            if (repeat != -1)        // first run does not count, VM warming up
                accumulated += end - start;
            else                     // use count to avoid compiler or JVM
                expectedBis = count; // optimization of inner loop
            if (count != expected || count != expectedBis)
                throw new Error("Tests don't run the same amount of times");
        }
        average = (float) accumulated / size;
        System.out.println("2nd test : " + average);
    }
}
The results I get are :
$ java -jar Test.jar noreassign
1st test : 23.98
2nd test : 23.97
$ java -jar Test.jar reassign
1st test : 23.98
2nd test : 40.86
$ java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (Gentoo package icedtea-7.2.5.5)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
The difference is whether or not this loop executes before the 2nd test:
for (int n = 0; n < 10_000_000; ++n)
    test[n] = test[n];
Why? Why does that reassignment cause the loops after it to take twice the time?
Getting profiling right is hard...
"As for why the JIT compiler causes such behaviour... that is beyond my skill and knowledge."
Three basic facts:
Code runs faster after JIT compilation.
JIT compilation is triggered after a chunk of code has run for a bit. (How long "a bit" is depends on the JVM platform and command-line options.)
JIT compilation takes time.
In your case, when you insert the big assignment loop between test 1 and test 2, you are most likely moving the time point at which JIT compilation is triggered ... from during test 2 to between the 2 tests.
The simple way to address this in this case is to put the body of main into a loop and run it repeatedly, then discard the anomalous results in the first few runs.
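A minimal sketch of that restructuring (my illustration; runOnce() is a hypothetical method holding the original body of main):
public static void main(String[] args) {
    for (int run = 0; run < 10; run++) {
        // The first few runs include JIT compilation time; discard them
        // and look only at the later, steady-state results.
        runOnce(args);
    }
}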
(Turning off JIT compilation is not a good answer. Normally, it is the performance characteristics of code after JIT compilation that is going to be indicative of how a real application performs ...)
By setting the compiler to NONE, you are disabling JIT compilation, taking it out of the equation.
This kind of anomaly is common when people attempt to write micro-benchmarks by hand. Read this Q&A:
How do I write a correct micro-benchmark in Java?
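In practice that means reaching for a harness such as JMH instead of a hand-rolled timing loop. A minimal JMH version of the counting loop above might look like this (my sketch; assumes the JMH dependency is on the classpath, and the class name is made up):
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class CountBenchmark {
    boolean[] test;
    @Setup
    public void setup() {
        test = new boolean[10_000_000];
        java.util.Random r = new java.util.Random(42); // fixed seed for repeatability
        for (int n = 0; n < test.length; ++n)
            test[n] = !r.nextBoolean();
    }
    @Benchmark
    public int countFalse() {
        int count = 0;
        for (boolean t : test)
            if (!t)
                ++count;
        return count; // returning the result defeats dead-code elimination
    }
}
JMH takes care of the warm-up iterations, forking and statistics that the hand-written harness above only approximates.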
I would add this as a comment, but my reputation is too low, so it must be added as an answer.
I've created a jar with your exact code, and ran it several times. I also copied the code to C# and ran it in the .NET runtime as well.
Both Java and C# show the same exact time, with and without the 'reassign' loop.
What timing are you getting if you change the loop to
if ( "reassign".equals(args[0])) {
for (int n = 0; n < 5_000_000; ++n)
test[n] = test[n];
}
?
Marko Topolnik's and rossum's comments pointed me in the right direction.
It is a JIT compiler issue.
If I disable the JIT compiler I get these results :
$ java -jar Test.jar -Djava.compiler=NONE noreassign
1st test : 19.23
2nd test : 19.33
$ java -jar Test.jar -Djava.compiler=NONE reassign
1st test : 19.23
2nd test : 19.32
The strange slowdown disappears once the JIT compiler is deactivated.
As for why the JIT compiler causes such behaviour... that is beyond my skill and knowledge.
But it does not happen in all JVMs as Marius Dornean's tests show.
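If you want to see when the JIT kicks in rather than disabling it, HotSpot can log every compilation (my suggestion, not part of the original answers):
$ java -XX:+PrintCompilation -jar Test.jar reassign
Each output line names a method as it gets compiled, so you can check whether compilation of the test loop lands inside a timed region.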

Strange performance drop after innocent changes to a trivial program

Imagine you want to count how many non-ASCII chars a given char[] contains. Imagine, the performance really matters, so we can skip our favorite slogan.
The simplest way is obviously
int simpleCount() {
    int result = 0;
    for (int i = 0; i < string.length; i++) {
        result += string[i] >= 128 ? 1 : 0;
    }
    return result;
}
Then you think that many inputs are pure ASCII and that it could be a good idea to deal with them separately. For simplicity assume you write just this
private int skip(int i) {
    for (; i < string.length; i++) {
        if (string[i] >= 128) break;
    }
    return i;
}
Such a trivial method could be useful for more complicated processing, and here it can do no harm, right? So let's continue with
int smartCount() {
    int result = 0;
    for (int i = skip(0); i < string.length; i++) {
        result += string[i] >= 128 ? 1 : 0;
    }
    return result;
}
It's the same as simpleCount. I'm calling it "smart" as the actual work to be done is more complicated, so skipping over ASCII quickly makes sense. If there's no ASCII prefix, or only a very short one, it can cost a few cycles more, but that's all, right?
Maybe you want to rewrite it like this, it's the same, just possibly more reusable, right?
int smarterCount() {
    return finish(skip(0));
}
int finish(int i) {
    int result = 0;
    for (; i < string.length; i++) {
        result += string[i] >= 128 ? 1 : 0;
    }
    return result;
}
And then you run a benchmark on some very long random string and get this:
The parameters determine the ASCII to non-ASCII ratio and the average length of a non-ASCII sequence, but as you can see they don't matter. Trying different seeds and whatever doesn't matter. The benchmark uses caliper, so the usual gotchas don't apply. The results are fairly repeatable, the tiny black bars at the end denote the minimum and maximum times.
Does anybody have an idea what's going on here? Can anybody reproduce it?
Got it.
The difference is in the possibility for the optimizer/CPU to predict the number of loops in for. If it is able to predict the number of repeats up front, it can skip the actual check of i < string.length. Therefore the optimizer needs to know up front how often the condition in the for-loop will succeed and therefore it must know the value of string.length and i.
I made a simple test, replacing string.length with a local variable that is set once in the setup method. Result: smarterCount now has about the same runtime as simpleCount; before the change, smarterCount took about 50% longer than simpleCount. smartCount did not change.
It looks like the optimizer loses the information about how many iterations the loop will have to do when a call to another method occurs. That's the reason why finish() immediately ran faster with the constant set, but not smartCount(), since smartCount() has no clue about what i will be after the skip() step. So I did a second test, where I copied the loop from skip() into smartCount().
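The hand-inlined version presumably looked roughly like this (my reconstruction of the test described above, not the original author's code):
int smartCountInlined() {
    int i = 0;
    // body of skip(), copied in, so the optimizer keeps the trip-count information
    for (; i < string.length; i++) {
        if (string[i] >= 128) break;
    }
    int result = 0;
    for (; i < string.length; i++) {
        result += string[i] >= 128 ? 1 : 0;
    }
    return result;
}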
And voilà, all three methods return within the same time (800-900 ms).
My tentative guess would be that this is about branch prediction.
This loop:
for (int i = 0; i < string.length; i++) {
    result += string[i] >= 128 ? 1 : 0;
}
Contains exactly one branch, the backward edge of the loop, and it is highly predictable. A modern processor will be able to accurately predict this, and so fill its whole pipeline with instructions. The sequence of loads is also highly predictable, so it will be able to pre-fetch everything the pipelined instructions need. High performance results.
This loop:
for (; i < string.length - 1; i++) {
    if (string[i] >= 128) break;
}
Has a dirty great data-dependent conditional branch sitting in the middle of it. That is much harder for the processor to predict accurately.
Now, that doesn't entirely make sense, because (a) the processor will surely quickly learn that the break branch will usually not be taken, (b) the loads are still predictable, and so just as pre-fetchable, and (c) after that loop exits, the code goes into a loop which is identical to the loop which goes fast. So I wouldn't expect this to make all that much difference.
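The classic way to see the effect in isolation (my sketch, the well-known sorted-versus-unsorted experiment, not code from the original answer) is to time the same data-dependent branch over unsorted and then sorted data; sorting makes the branch almost perfectly predictable:
int[] data = new int[1_000_000];
java.util.Random rnd = new java.util.Random(0);
for (int k = 0; k < data.length; k++) data[k] = rnd.nextInt(256);
long count = 0;
long start = System.nanoTime();
for (int rep = 0; rep < 100; rep++)
    for (int v : data)
        if (v >= 128) count++; // hard to predict on random data
System.out.println("unsorted: " + (System.nanoTime() - start) / 1e9 + " s, count=" + count);
java.util.Arrays.sort(data); // same data, but now the branch pattern is predictable
count = 0;
start = System.nanoTime();
for (int rep = 0; rep < 100; rep++)
    for (int v : data)
        if (v >= 128) count++;
System.out.println("sorted:   " + (System.nanoTime() - start) / 1e9 + " s, count=" + count);
On typical hardware the sorted pass runs severalfold faster, which is the mechanism suspected here.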

JAVA display in sets of 13

I am sure this is a pretty easy question, but I am pretty rusty on my programming. I need to write code that will show all numbers between 14859 and 26551 in sets of 13.
So far I just have the normal for loop to show all the numbers; I'm not sure how to get sets of 13.
for (int i = 14859; i < 26551; i++) {
    System.out.println(i);
}
Assuming you want to display your numbers as Jukka said in comments:
for (int i = 14859; i < 26551; i++) {
    if ((i - 14858) % 13 == 0)
        System.out.println(); // or anything delimiting your sets
    System.out.println(i);
}
Or if you want to display only one number every 13:
for (int i = 14859; i < 26551; i += 13) {
    System.out.println(i);
}
You can't just type i+13 as you said you tried in the comments: the third part of a for loop must be a statement, so you have to assign something to a variable, as in i += 13.
for (int i = 14859; i < 26551; i++)
    System.out.print((i % 13 != 0) ? ", " + i : "\n" + i); // start a new row every 13 numbers, keeping the number itself
