Weird Multithreading Performance of Simple Benchmark (Java)


Recently I wrote some really simple code as a benchmark to see how performance scales on my machine. It creates a number of threads and divides a number of spins among them. Below is my class, which extends Thread:
public static class LoopThread extends Thread {
    int index;
    long numberOfRound;
    long numberOfSpins;

    public LoopThread(int index, long numberOfRound, long numberOfSpins) {
        this.index = index;
        this.numberOfRound = numberOfRound;
        this.numberOfSpins = numberOfSpins;
    }

    @Override
    public void run() {
        System.out.println("Thread " + index + " started for " + numberOfRound + " rounds and " + numberOfSpins + " spins");
        // Empty nested loops: this is the "work" being benchmarked.
        for (long i = 0; i < numberOfRound; i++) {
            for (long j = 0; j < numberOfSpins; j++) {
            }
        }
        System.out.println("Thread " + index + " ended");
    }
}
The weird thing is that when I first wrote this code, it scaled almost linearly up to 8 threads. With 9 threads, however, it took more time, and the improvement beyond 9 threads was small even though my machine has 16 hardware threads. To investigate the problem, I made one simple change: I added a timestamp to the last line of run(), like this:
System.out.println("Thread " + index + " ended: " + System.currentTimeMillis());
This change surprisingly made my code scale better. Since I can't see an obvious reason why this change caused the improvement, I tried again and again, and here are some results that I see consistently:
Before the change:
time passed for 8 threads and 1000000 rounds and 800000 spins is 63 seconds
time passed for 9 threads and 1000000 rounds and 800000 spins is 69 seconds
After the simple change:
time passed for 8 threads and 1000000 rounds and 800000 spins is 62 seconds
time passed for 9 threads and 1000000 rounds and 800000 spins is 56 seconds
Again, I can't see any obvious reason behind that and it seems very weird to me. Do you have any idea about why this is happening?
Thanks
Edit (the code that starts the threads and times them):
public static void main(String[] args) {
    int numberOfThreads = Integer.parseInt(args[0]);
    long numberOfRounds = Long.parseLong(args[1]);
    long numberOfSpins = Long.parseLong(args[2]);
    long startTime = System.currentTimeMillis();
    Thread[] threads = new Thread[numberOfThreads];
    for (int i = 0; i < numberOfThreads; i++) {
        // Each thread gets an equal share of the spins (integer division).
        threads[i] = new LoopThread(i, numberOfRounds, numberOfSpins / numberOfThreads);
        threads[i].start();
    }
    // Wait for every thread to finish before stopping the clock.
    for (Thread t : threads) {
        try {
            t.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
    long elapsedMillis = System.currentTimeMillis() - startTime;
    System.out.println("time passed for " + numberOfThreads + " threads and " + numberOfRounds + " rounds and " + numberOfSpins + " spins is " + elapsedMillis / 1000 + " seconds");
}
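
When comparing runs like the ones above, it can help to see how long each thread takes on its own rather than only the overall total. The following is a minimal sketch under stated assumptions, not the original poster's code: the class name SpinBenchmark, the method run and the field totalCount are hypothetical. It uses System.nanoTime() for per-thread timing and gives the loop a counter so that it produces an observable result, although a JIT compiler may still simplify such a loop.

```java
// Hypothetical class and method names; a sketch, not the original poster's code.
class SpinBenchmark {
    // Sum of all per-thread counters from the last run, so the loop result is observable.
    static long totalCount;

    // Runs `threads` workers and returns each worker's own elapsed time in nanoseconds.
    static long[] run(int threads, long rounds, long spinsPerThread) {
        long[] elapsedNanos = new long[threads];
        long[] counts = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int index = t;
            workers[t] = new Thread(() -> {
                long start = System.nanoTime();
                long count = 0;
                for (long i = 0; i < rounds; i++) {
                    for (long j = 0; j < spinsPerThread; j++) {
                        count++;
                    }
                }
                counts[index] = count;
                elapsedNanos[index] = System.nanoTime() - start;
            });
            workers[t].start();
        }
        long total = 0;
        for (int t = 0; t < threads; t++) {
            try {
                workers[t].join(); // after join(), this worker's writes are visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += counts[t];
        }
        totalCount = total;
        return elapsedNanos;
    }

    public static void main(String[] args) {
        // Small illustrative sizes (an assumption), far below the numbers in the question.
        long[] elapsed = run(4, 1_000, 100_000);
        for (int t = 0; t < elapsed.length; t++) {
            System.out.println("Thread " + t + " took " + elapsed[t] / 1_000_000 + " ms");
        }
        System.out.println("total count = " + totalCount);
    }
}
```

Per-thread timings make it visible whether all threads slow down together or whether a few threads finish much later than the rest.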

Related

Why is execution times measured differently here using Java?

What is the difference between these two? They both give me execution times with slightly different values, and I can tell they are written differently. But since the outcome is nearly identical, what does code 2 do that code 1 does not? What is the function of "sum", and is it even executed?
Code 1:
for (int i = 0; i < 10; i++)
{
    long n0 = System.nanoTime();
    long n1 = System.nanoTime();
    System.out.println(" resolution " + (n1 - n0) + " nanoseconds");
}
Code 2:
int[] given = {1,2,3,4,5,6,7,8,9,0};
int sum = 0;
for (int i = 0; i < 10; i++)
{
    long t0 = System.nanoTime();
    sum += given[i];
    long t1 = System.nanoTime();
    System.out.println(" resolution " + (t1 - t0) + " nanoseconds");
}
Here is the output for those who asked:
[Output for Code 1 and Code 2 not preserved]

It is simply code to try out System.nanoTime(). Doing something or nothing between the two calls makes no discernible difference. The resolution of the clock is about 100 ns.
Since sum is never printed, the compiler might have optimized the code by removing that extra addition.
Furthermore, it seems that a call to nanoTime alone already takes about 100 ns.
Note: the code is primarily written for one's own curiosity.
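
To make the two points above concrete, here is a minimal sketch, assuming a hypothetical class NanoTimeProbe with helper methods smallestTick and sum (none of these names come from the question). It estimates the smallest step the clock reports on the current machine, and it prints the sum so that the addition has a visible effect and cannot simply be discarded.

```java
// Hypothetical class name and helpers; a sketch for experimenting with System.nanoTime().
class NanoTimeProbe {
    // Returns the smallest positive difference observed between two nanoTime() readings.
    static long smallestTick(int samples) {
        long smallest = Long.MAX_VALUE;
        for (int i = 0; i < samples; i++) {
            long t0 = System.nanoTime();
            long t1 = System.nanoTime();
            while (t1 - t0 <= 0) {       // spin until the clock visibly advances
                t1 = System.nanoTime();
            }
            smallest = Math.min(smallest, t1 - t0);
        }
        return smallest;
    }

    // Adds up the array; returning the result keeps the work observable to the caller.
    static int sum(int[] given) {
        int sum = 0;
        for (int value : given) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] given = {1, 2, 3, 4, 5, 6, 7, 8, 9, 0};
        long t0 = System.nanoTime();
        int sum = sum(given);
        long t1 = System.nanoTime();
        System.out.println("smallest observed tick: " + smallestTick(1000) + " ns");
        System.out.println("sum = " + sum + ", measured in " + (t1 - t0) + " ns");
    }
}
```

The reported values depend on the operating system and hardware, so they are best read as a rough lower bound on what a single measurement can resolve.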

removing 1 million integers from an arraylist and linkedlist one by one from the end of both lists

I am trying to measure whether adding 1 million integers to an ArrayList and then deleting them one by one from the end is faster or slower than doing the same thing with a LinkedList, in milliseconds and seconds. I am having issues with removing from the end of the ArrayList and from the end of the LinkedList: it gives me an IndexOutOfBoundsException. Here is my code so far:
package blah;

import java.util.ArrayList;
import java.util.LinkedList;

public abstract class blah {
    public static void main(String[] args) {
        ArrayList<Integer> Array = new ArrayList<Integer>();
        LinkedList<Integer> Link = new LinkedList<Integer>();
        ArrayList<Integer> Array1 = new ArrayList<Integer>();
        LinkedList<Integer> Link1 = new LinkedList<Integer>();
        long start;
        long stop;
        long result;

        start = System.currentTimeMillis();
        for (int i = 1; i <= 1000000; i++) {
            Array.add(0, i);
        }
        stop = System.currentTimeMillis();
        result = stop - start;
        System.out.println("ArrayList time : " + result + " milliseconds");
        System.out.println("ArrayList time: " + result / 1000 + " seconds");
        System.out.println("");

        start = System.currentTimeMillis();
        for (int i = 1; i <= 1000000; i++) {
            Array1.add(i);
            Array1.remove(i);
        }
        stop = System.currentTimeMillis();
        result = stop - start;
        System.out.println("ArrayList time : " + result + " milliseconds");
        System.out.println("ArrayList time : " + result / 1000 + " seconds");
        System.out.println("");

        start = System.currentTimeMillis();
        for (int i = 1; i <= 1000000; i++) {
            Link.add(0, i);
        }
        stop = System.currentTimeMillis();
        result = stop - start;
        System.out.println("LinkedList time : " + result + " milliseconds");
        System.out.println("LinkedList time : " + result / 1000 + " seconds");
        System.out.println("");

        start = System.currentTimeMillis();
        for (int i = 1; i <= 1000000; i++) {
            Link1.add(i);
            Link1.remove(i);
        }
        stop = System.currentTimeMillis();
        result = stop - start;
        System.out.println("LinkedList time : " + result + " milliseconds");
        System.out.println("LinkedList time : " + result / 1000 + " seconds");
    }
}

This is a re-write of my original answer to clarify some things in my original response and to address the question a bit more directly than I had done previously.
Your code is doing this:
for (int i = 1; i <= 1000000; i++) {
    Array1.add(i);
    Array1.remove(i); // <<--- problem here
}
When you call Array1.add(i) you're adding the ith value into Array1. However, when you call Array1.remove(i) you're trying to remove the value at index i.
The fact that you add value i and then try to remove from index i means that you're not doing what you think you're doing.
Even in your first iteration, you end up with:
  0
+---+
| 1 |   Array1.add(1)
+---+

  0
+---+
| 1 |   Array1.remove(1): IndexOutOfBoundsException, only index 0 is valid
+---+
The remove operation is not trying to find the value 1 to remove it. It's trying to find whatever exists at index 1 in the array, which causes the exception.
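
The difference between removing by index and removing by value can be sketched as follows. This is one possible way to fill a list and then drain it from the end, not the only one; the class name RemoveFromEnd and the method fillAndDrain are hypothetical. With an int argument Java selects remove(int index), while an Integer argument selects remove(Object) and removes the first matching value.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical class and method names; a sketch of removing from the end of a list.
class RemoveFromEnd {
    // Adds 1..n, then removes the last element repeatedly; returns elapsed nanoseconds.
    static long fillAndDrain(List<Integer> list, int n) {
        long start = System.nanoTime();
        for (int i = 1; i <= n; i++) {
            list.add(i);
        }
        while (!list.isEmpty()) {
            list.remove(list.size() - 1); // int argument: remove(int index), the last position
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        values.add(1);
        values.remove(Integer.valueOf(1)); // Integer argument: remove(Object), removes the value 1
        System.out.println("empty after remove(Object): " + values.isEmpty());

        System.out.println("ArrayList:  " + fillAndDrain(new ArrayList<>(), 1_000_000) / 1_000_000 + " ms");
        System.out.println("LinkedList: " + fillAndDrain(new LinkedList<>(), 1_000_000) / 1_000_000 + " ms");
    }
}
```

Because the index is always size() - 1, it stays valid on every iteration until the list is empty.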

Measure java short time running thread execution time

I'm currently working on some sort of database benchmark application. Basically, I'm trying to use threads to simulate a certain number of clients that all repeat the same operation (for example, a read operation) against the database for a certain period of time.
During this time, I want, in each thread, to measure the average delay for getting an answer from the database.
My first choice was to rely on ThreadMXBean's getThreadCpuTime() method (http://docs.oracle.com/javase/7/docs/api/java/lang/management/ThreadMXBean.html) but the point is that the operation is done too quickly to be measured (getThreadCpuTime() before the operation is equal to getThreadCpuTime() after the operation).
I made a little experiment to understand and illustrate the problem:
import java.lang.management.ManagementFactory;
import java.util.ArrayList;

public class ExampleClass {
    // Declared static so the nested classes can be instantiated from the static main().
    static class LongRunningThread extends Thread {
        private int n;

        public LongRunningThread(int n) {
            this.n = n;
        }

        public void run() {
            ArrayList<Object> l = new ArrayList<Object>();
            for (int i = 0; i < n; i++) {
                l.add(new Object());
            }
            long time = ManagementFactory.getThreadMXBean().getThreadCpuTime(this.getId());
            System.out.println("Long running thread " + this.getId() + " execution time: " + time);
        }
    }

    static class MyThread extends Thread {
        int n;

        public MyThread(int n) {
            this.n = n;
        }

        public void run() {
            ArrayList<Object> l = new ArrayList<Object>();
            for (int i = 0; i < n; i++) {
                l.add(new Object());
            }
            long time = ManagementFactory.getThreadMXBean().getThreadCpuTime(this.getId());
            System.out.println("My thread " + this.getId() + " execution time: " + time);
        }
    }

    public static void main(String[] args) {
        System.out.println("Cpu time supported? " + ManagementFactory.getThreadMXBean().isThreadCpuTimeSupported());
        System.out.println("Cpu time enabled? " + ManagementFactory.getThreadMXBean().isThreadCpuTimeEnabled());
        for (int i = 1; i < 10; ++i) {
            new LongRunningThread(i * 1000000).start();
        }
        for (int i = 1; i < 10; ++i) {
            new MyThread(i * 100).start();
        }
    }
}
Output:
Cpu time supported? true
Cpu time enabled? true
My thread 18 execution time: 0
My thread 26 execution time: 0
My thread 20 execution time: 0
My thread 22 execution time: 0
My thread 24 execution time: 0
My thread 21 execution time: 0
My thread 25 execution time: 0
My thread 19 execution time: 0
My thread 23 execution time: 0
Long running thread 9 execution time: 15600100
Long running thread 10 execution time: 15600100
Long running thread 11 execution time: 46800300
Long running thread 12 execution time: 31200200
Long running thread 14 execution time: 78000500
Long running thread 13 execution time: 78000500
Long running thread 17 execution time: 124800800
Long running thread 15 execution time: 140400900
Long running thread 16 execution time: 109200700
I cannot get the execution time for any of the MyThread instances, but there is no problem for the LongRunningThread instances. Like I said, my hypothesis is that the operations done by the short-running threads happen too fast to actually be measured.
Is there any way to achieve what I'm trying to do? Is it possible to measure the execution time for such short-running threads?
Thanks in advance for your help :)

Have you considered this framework: http://metrics.codahale.com/? It's very good and comes with built-in support for exposing metrics via JMX.

Is it possible to measure the execution time for such short time running threads?
Without measuring wall-clock times with the nano-second clock, the answer may be no. For small loops, the measured CPU time may be smaller than the precision of the method. The javadocs for ThreadMXBean.getThreadCpuTime(...) say:
Returns the total CPU time for a thread of the specified ID in nanoseconds.
The returned value is of nanoseconds precision but
not necessarily nanoseconds accuracy.
One thing to consider would be to take the CPU time if it is > 0 and take the wall-clock time if it is == 0.
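
The fallback suggested above could look roughly like the following. It is a minimal sketch, assuming a hypothetical class CpuOrWallClock with a helper measureNanos; it measures the calling thread with ThreadMXBean.getCurrentThreadCpuTime() and falls back to the System.nanoTime() difference when the CPU time difference comes out as 0 or CPU time measurement is unavailable.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class and method names; a sketch of the "CPU time, else wall-clock" idea.
class CpuOrWallClock {
    // Runs the work on the calling thread and returns a duration in nanoseconds.
    static long measureNanos(Runnable work) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        boolean cpuAvailable = bean.isCurrentThreadCpuTimeSupported() && bean.isThreadCpuTimeEnabled();

        long wallStart = System.nanoTime();
        long cpuStart = cpuAvailable ? bean.getCurrentThreadCpuTime() : 0L;
        work.run();
        long cpuNanos = cpuAvailable ? bean.getCurrentThreadCpuTime() - cpuStart : 0L;
        long wallNanos = System.nanoTime() - wallStart;

        // Prefer CPU time when it is measurable, otherwise use wall-clock time.
        return cpuNanos > 0 ? cpuNanos : wallNanos;
    }

    public static void main(String[] args) {
        long nanos = measureNanos(() -> {
            List<Object> l = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                l.add(new Object());
            }
        });
        System.out.println("execution time: " + nanos + " ns");
    }
}
```

Note that the two clocks measure different things: wall-clock time also includes any time the thread spent waiting or descheduled, so values from the two branches are not directly comparable.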

As an easier solution, you can use the following:
class MyThread extends Thread {
    int n;

    public MyThread(int n) {
        this.n = n;
    }

    public void run() {
        long startTime = System.nanoTime();
        ArrayList<Object> l = new ArrayList<Object>(n);
        for (int i = 0; i < n; i++) {
            l.add(new Object());
        }
        long time = System.nanoTime() - startTime;
        System.out.println("My thread " + this.getId() + " execution time: " + time + " ns");
    }
}
If you don't need nanosecond precision, you can use System.currentTimeMillis() instead.

Java Multithreading - return statement taking too much time

In multithreading (Executor framework), the sum of all the times printed within the x() method does not match the total time printed by doPerform(). The difference keeps growing as the number of threads in the thread pool increases (it goes up to 20 seconds). Can someone figure out why? And is there any way to decrease the time taken to return from the x() method?
I have tested it with:
a) 500 submissions to executor (pool size = 100)
b) 500 submissions to executor (pool size = 300)
c) 300 submissions to executor (pool size = 100)
public void x() {
    long startTime = System.currentTimeMillis();
    for (long l = 0; l <= 10000000; l++) {
        if (l % 1000000 == 0) {
            System.out.println("Thread id: "
                    + Thread.currentThread().getId() + "\t"
                    + (System.currentTimeMillis() - startTime));
            startTime = System.currentTimeMillis();
        }
    }
}

public void doPerform() {
    long startTime = System.currentTimeMillis();
    x();
    System.out.println("Thread id: " + Thread.currentThread().getId()
            + "\t" + (System.currentTimeMillis() - startTime));
}

That's expected. You have 100 or 300 threads running in parallel and only 1, 2 or 4 cores to execute them all (unless you're running this on a giant supercomputer). This means that each thread is given some CPU time, then another thread, then another, and so on, giving the illusion of parallel execution. In reality, the instructions of the various threads are interleaved, and each core executes them sequentially.
So thread A could execute the startTime computation in doPerform() and then be replaced on the CPU by several other threads. A number of milliseconds could elapse before the thread scheduler reassigns A to a CPU and the startTime computation in x() is executed.
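
One way to reduce the effect described above, assuming the tasks are purely CPU-bound as in the question, is to size the pool to the number of available cores so that fewer threads compete for each core. The sketch below is only an illustration under that assumption; the class PoolSizing, the method runTasks and the field sink are hypothetical names, and each task measures its own elapsed time from start to finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class and method names; a sketch, not the original poster's setup.
class PoolSizing {
    // Receives the loop result so the work has an observable effect.
    static volatile long sink;

    // Submits `tasks` CPU-bound tasks to a fixed pool and returns each task's elapsed nanoseconds.
    static List<Long> runTasks(int tasks, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Callable<Long> task = () -> {
                long start = System.nanoTime();
                long acc = 0;
                for (long l = 0; l <= 10_000_000L; l++) {
                    acc += l;
                }
                sink = acc;
                return System.nanoTime() - start;
            };
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                futures.add(pool.submit(task));
            }
            List<Long> elapsedNanos = new ArrayList<>();
            for (Future<Long> future : futures) {
                try {
                    elapsedNanos.add(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return elapsedNanos;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        List<Long> elapsed = runTasks(500, cores);
        long worst = 0;
        for (long nanos : elapsed) {
            worst = Math.max(worst, nanos);
        }
        System.out.println("pool size " + cores + ", slowest task: " + worst / 1_000_000 + " ms");
    }
}
```

Running the same sketch with a pool size of 100 or 300 gives a rough way to compare how much longer individual tasks take when many threads share a few cores.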

With java jdk 1.7 64-bit, a for loop using an int is 20+ times faster than a for loop with a long. Why?

See edits below
There is no casting going on with the termination check. I would think the < and the ++ would be as fast with ints and longs on a 64bit machine. But I guess not?
int: 65 milliseconds:
public void testWTF() throws Exception {
    int runs = 10;
    long hs = 0;
    long timeSum = 0;
    for (int run = 0; run < runs; run++) {
        int term = Integer.MAX_VALUE;
        long start = System.currentTimeMillis();
        // ***** loop to be tested ******
        for (int i = 0; i < term; i++) {
            hs++;
        }
        timeSum += (System.currentTimeMillis() - start);
        System.out.println("hs = " + hs);
        hs = 0;
    }
    System.out.println("timeSum = " + timeSum);
    System.out.println("avg time = " + (timeSum / runs) + " for " + runs + " runs");
    System.out.println("hs = " + hs);
}
long: 1445 milliseconds
public void testWTF() throws Exception {
    int runs = 10;
    long hs = 0;
    long timeSum = 0;
    for (int run = 0; run < runs; run++) {
        long term = Integer.MAX_VALUE;
        long start = System.currentTimeMillis();
        // ***** loop to be tested ******
        for (long i = 0; i < term; i++) {
            hs++;
        }
        timeSum += (System.currentTimeMillis() - start);
        System.out.println("hs = " + hs);
        hs = 0;
    }
    System.out.println("timeSum = " + timeSum);
    System.out.println("avg time = " + (timeSum / runs) + " for " + runs + " runs");
    System.out.println("hs = " + hs);
}
hardware: 64-bit Xeon running windows 7 64bit.
edit: I updated this to do several iterations. For 1 million runs with the int version, the average time is 65 milliseconds. The long version takes too long for 1 million, 1000 and even 100. For 10 runs the average time is 1447 milliseconds.
Also, I'm using hs outside the loop so that the loop does not get jitted away.

This is a very bad/unreliable/unrealistic way of doing benchmarks, since the JIT isn't really given a chance to do much optimization -- you only run the benchmarks once, and you measure the first run.
Basically, Java's JIT will optimize your code significantly more once it sees your code getting used extensively. In a real program, the JIT will be optimizing any critical loops, so if you want a benchmark that mimics the real world, you have to convince the JIT to kick in.
The simplest way to get an accurate benchmark in Java is to use a tool like Caliper that knows how to properly warm up the JIT and get accurate measurements, and then see if the results are more consistent.
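
For a rough idea of what "warming up" means without a harness, here is a hand-rolled sketch. It is an illustration under stated assumptions, not a replacement for a tool like Caliper: the class WarmupSketch, its methods, the number of warm-up rounds (5) and the loop bound (200,000,000) are all hypothetical choices. Each loop lives in its own method, is run several times with the results discarded, and is only then timed.

```java
import java.util.function.LongSupplier;

// Hypothetical class and method names; a hand-rolled warm-up sketch, not a proper harness.
class WarmupSketch {
    // Counts up to `term` using an int loop variable.
    static long countWithInt(int term) {
        long hs = 0;
        for (int i = 0; i < term; i++) {
            hs++;
        }
        return hs;
    }

    // Counts up to `term` using a long loop variable.
    static long countWithLong(long term) {
        long hs = 0;
        for (long i = 0; i < term; i++) {
            hs++;
        }
        return hs;
    }

    // Times one call and checks the result, so the computed value is actually used.
    static long elapsedMillis(LongSupplier body, long expected) {
        long start = System.nanoTime();
        long result = body.getAsLong();
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        if (result != expected) {
            throw new IllegalStateException("unexpected count " + result);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int term = 200_000_000; // assumption: smaller than Integer.MAX_VALUE to keep the run short
        for (int round = 0; round < 5; round++) { // warm-up rounds, timings discarded
            elapsedMillis(() -> countWithInt(term), term);
            elapsedMillis(() -> countWithLong(term), term);
        }
        System.out.println("int loop:  " + elapsedMillis(() -> countWithInt(term), term) + " ms");
        System.out.println("long loop: " + elapsedMillis(() -> countWithLong(term), term) + " ms");
    }
}
```

Even with warm-up rounds, numbers from a sketch like this are indicative at best; a dedicated benchmarking tool controls for many more effects.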
