Issues with using too many threads in a benchmark program - Java

I've programmed a (very simple) benchmark in Java. It simply increments a double value up to a specified value and takes the time.
When I run it single-threaded or with a low number of threads (up to 100) on my 6-core desktop, the benchmark returns reasonable and repeatable results.
But when I use, for example, 1200 threads, the average multicore duration is significantly lower than the singlecore duration (about 10 times or more). I've made sure that the total number of increments is the same, no matter how many threads I use.
Why does the performance drop so much with more threads? Is there a trick to solve this problem?
I'm posting my source, but I don't think there is a problem with it.
Benchmark.java:
package sibbo.benchmark;
import java.text.DecimalFormat;
import java.util.LinkedList;
import java.util.List;
public class Benchmark implements TestFinishedListener {
private static final double TARGET = 1e10;
private static final int THREAD_MULTIPLICATOR = 2;
public static void main(String[] args) throws InterruptedException {
Benchmark b = new Benchmark(TARGET);
b.start();
}
private int coreCount;
private List<Worker> workers = new LinkedList<>();
private List<Worker> finishedWorkers = new LinkedList<>();
private double target;
public Benchmark(double target) {
this.target = target;
getSystemInfos();
printInfos();
}
private void getSystemInfos() {
coreCount = Runtime.getRuntime().availableProcessors();
}
private void printInfos() {
System.out.println("Usable cores: " + coreCount);
System.out.println("Multicore threads: " + coreCount * THREAD_MULTIPLICATOR);
System.out.println("Loops per core: " + new DecimalFormat("###,###,###,###,##0").format(TARGET));
System.out.println();
}
public synchronized void start() throws InterruptedException {
Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
System.out.print("Initializing singlecore benchmark... ");
Worker w = new Worker(this, 0);
workers.add(w);
Thread.sleep(1000);
System.out.println("finished");
System.out.print("Running singlecore benchmark... ");
w.runBenchmark(target);
wait();
System.out.println("finished");
printResult();
System.out.println();
// Multicore
System.out.print("Initializing multicore benchmark... ");
finishedWorkers.clear();
for (int i = 0; i < coreCount * THREAD_MULTIPLICATOR; i++) {
workers.add(new Worker(this, i));
}
Thread.sleep(1000);
System.out.println("finished");
System.out.print("Running multicore benchmark... ");
for (Worker worker : workers) {
worker.runBenchmark(target / THREAD_MULTIPLICATOR);
}
wait();
System.out.println("finished");
printResult();
Thread.currentThread().setPriority(Thread.NORM_PRIORITY);
}
private void printResult() {
DecimalFormat df = new DecimalFormat("###,###,###,##0.000");
long min = -1, av = 0, max = -1;
int threadCount = 0;
boolean once = true;
System.out.println("Result:");
for (Worker w : finishedWorkers) {
if (once) {
once = false;
min = w.getTime();
max = w.getTime();
}
if (w.getTime() > max) {
max = w.getTime();
}
if (w.getTime() < min) {
min = w.getTime();
}
threadCount++;
av += w.getTime();
if (finishedWorkers.size() <= 6) {
System.out.println("Worker " + w.getId() + ": " + df.format(w.getTime() / 1e9) + "s");
}
}
System.out.println("Min: " + df.format(min / 1e9) + "s, Max: " + df.format(max / 1e9) + "s, Av per Thread: "
+ df.format((double) av / threadCount / 1e9) + "s");
}
@Override
public synchronized void testFinished(Worker w) {
workers.remove(w);
finishedWorkers.add(w);
if (workers.isEmpty()) {
notify();
}
}
}
Worker.java:
package sibbo.benchmark;
public class Worker implements Runnable {
private double value = 0;
private long time;
private double target;
private TestFinishedListener l;
private final int id;
public Worker(TestFinishedListener l, int id) {
this.l = l;
this.id = id;
new Thread(this).start();
}
public int getId() {
return id;
}
public synchronized void runBenchmark(double target) {
this.target = target;
notify();
}
public long getTime() {
return time;
}
@Override
public void run() {
synWait();
value = 0;
long startTime = System.nanoTime();
while (value < target) {
value++;
}
long endTime = System.nanoTime();
time = endTime - startTime;
l.testFinished(this);
}
private synchronized void synWait() {
try {
wait();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}

You need to understand that the OS (or Java thread scheduler, or both) is trying to balance between all of the threads in your application to give them all a chance to perform some work, and there is a non-zero cost to switch between threads. With 1200 threads, you have just reached (and probably far exceeded) the tipping point wherein the processor is spending more time context switching than doing actual work.
Here is a rough analogy:
You have one job to do in room A. You stand in room A for 8 hours a day, and do your job.
Then your boss comes by and tells you that you have to do a job in room B also. Now you need to periodically leave room A, walk down the hall to room B, and then walk back. That walking takes 1 minute per day. Now you spend 3 hours, 59.5 minutes working on each job, and one minute walking between rooms.
Now imagine that you have 1200 rooms to work in. You are going to spend more time walking between rooms than doing actual work. This is the situation that you have put your processor into. It is spending so much time switching between contexts that no real work gets done.
EDIT: Now, as per the comments below, maybe you spend a fixed amount of time in each room before moving on; your work will progress, but the number of context switches between rooms still affects the overall runtime of a single task.
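To make this concrete, here is a minimal, self-contained sketch (not the original benchmark; the class name, thread counts and loop size are arbitrary illustrative choices) that splits the same fixed amount of work across different thread counts and measures the wall-clock time:

import java.util.ArrayList;
import java.util.List;

// Sketch: the same total work is split across a varying number of threads and
// timed from the start of the first thread to the completion of the last one.
public class ContextSwitchDemo {

    static final long TOTAL_INCREMENTS = 2_000_000_000L; // arbitrary total work

    public static void main(String[] args) throws InterruptedException {
        for (int threads : new int[] {1, 6, 100, 1200}) {
            System.out.println(threads + " threads: " + run(threads) + " ms");
        }
    }

    static long run(int threadCount) throws InterruptedException {
        long perThread = TOTAL_INCREMENTS / threadCount;
        double[] results = new double[threadCount]; // keeps the loops from being optimized away
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < threadCount; i++) {
            final int slot = i;
            Thread t = new Thread(() -> {
                double value = 0;
                while (value < perThread) {
                    value++;
                }
                results[slot] = value;
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wall clock covers all threads, not each thread's own interval
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}

On a typical 6-core machine you would expect the 6-thread run to be fastest and the 1200-thread run to give some of that gain back to scheduling overhead, though the exact numbers depend on the hardware and JVM.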

OK, I think I've found my problem, but so far no solution.
When measuring the time each thread takes to do its share of the work, the possible minimum differs for different total numbers of threads, while the maximum is the same every time: it occurs when a thread is started first, is paused very often, and finishes last. For example, this maximum value could be 10 seconds. Assuming the total number of operations across all threads stays the same no matter how many threads I use, the number of operations done by a single thread has to change when using a different number of threads. For example, using one thread it has to do 1000 operations, but using ten threads each of them has to do just 100 operations. So, using ten threads, the minimum amount of time one thread can take is much lower than when using one thread: it would be 1 second, which happens if a thread does its work without interruption. Calculating the average amount of time every thread needs to do its work is therefore nonsense.
EDIT
The solution would be to simply measure the amount of time between the start of the first thread and the completion of the last.
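One way to implement that measurement is sketched below (this latch-based variant is my own illustration, not code from the question): all workers are released by a common start signal, and the clock is stopped only after the last one has counted down.

import java.util.concurrent.CountDownLatch;

// Sketch: measure wall-clock time from a common start signal until the last worker finishes.
// In a real benchmark the loop result should also be consumed to avoid dead-code elimination.
public class WallClockBenchmark {

    public static void main(String[] args) throws InterruptedException {
        long nanos = timeWorkers(12, 1_000_000_000L); // thread count and per-thread target are arbitrary
        System.out.println("Total wall-clock time: " + nanos / 1e9 + " s");
    }

    static long timeWorkers(int threadCount, long perThreadTarget) throws InterruptedException {
        CountDownLatch startSignal = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threadCount);
        for (int i = 0; i < threadCount; i++) {
            new Thread(() -> {
                try {
                    startSignal.await();            // block until every worker has been created
                    double value = 0;
                    while (value < perThreadTarget) {
                        value++;
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();               // signal completion even if interrupted
                }
            }).start();
        }
        long start = System.nanoTime();
        startSignal.countDown();                    // release all workers together
        done.await();                               // wait for the last worker to finish
        return System.nanoTime() - start;           // elapsed wall-clock nanoseconds
    }
}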

Related

Multithreading in Java: finding prime numbers takes more time?

I tried to find a solution to this problem but was unable to find one on Stack Overflow.
I just want to know why my multithreaded version is working so slowly when in fact it should be the opposite.
import java.io.BufferedWriter;
import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.time.Duration;
import java.time.Instant;
public class Prime {
static BufferedWriter writer;
static DateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
public static void main(String[] args) throws IOException {
System.out.println("Without Thread" + findPrime() + " ms");
System.out.println("With thread : " + findPrimeWithThreads() + " ms");
}
public static long findPrimeWithThreads() {
Instant start = Instant.now();
int primaryNumber = 3;
while (primaryNumber <= 100000) {
int finalPrimaryNumber = primaryNumber;
new Thread(() -> {
multiplicationHelper(finalPrimaryNumber);
}).start();
new Thread(() -> {
multiplicationHelper(finalPrimaryNumber+1);
}).start();
primaryNumber+=2;
}
return Duration.between(start, Instant.now()).toMillis();
}
public static long findPrime() throws IOException {
Instant instant = Instant.now();
int primaryNumber = 3;
while (primaryNumber <= 100000) {
multiplicationHelper(primaryNumber);
primaryNumber++;
}
return Duration.between(instant, Instant.now()).toMillis();
}
public static void multiplicationHelper(int primaryNumber){
int j = 2;
boolean isPrime = true;
while (j <= primaryNumber/2) {
if (primaryNumber % j == 0) {
isPrime = false;
break;
}
j++;
}
if (isPrime) {
// System.out.println("PRIME :: " + primaryNumber);
}
}
}
This is the code, and its output was:
Without Thread497 ms
With thread : 22592 ms
Can you please explain why this is so, and how to improve the performance of the multithreaded version?
I am new to multithreaded programming, so am I doing something wrong here?
"Finding prime numbers" is a compute-bound operation. It will naturally use 100% CPU utilization because it never needs to perform I/O.
The two purposes of "multithreading" are: (a) to take advantage of multiple CPU cores, and (b) to overlap computation with I/O. (And to more-easily issue parallel I/O operations.)
Multithreading can save time in the right situation, or cost considerably more time in the wrong ones.
Your very ill-considered design launches nearly 100,000 threads (two per loop iteration)!
Change your function to something like this:
public static long findPrimeWithThreads() {
Instant start = Instant.now();
int primaryNumber = 3;
ExecutorService pool = Executors.newFixedThreadPool(4); // assuming you have 4 CPU cores
while (primaryNumber <= 100000) {
int finalPrimaryNumber = primaryNumber;
pool.submit(() -> multiplicationHelper(finalPrimaryNumber));
primaryNumber++;
}
pool.shutdown(); // stop accepting new tasks; this does not wait for them to finish
try {
pool.awaitTermination(1, TimeUnit.HOURS); // wait for the submitted tasks to complete before stopping the clock
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
return Duration.between(start, Instant.now()).toMillis();
}
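As a side note, rather than hard-coding 4, the pool size is usually derived from the machine with Runtime.getRuntime().availableProcessors(), since this workload is purely CPU-bound and threads beyond the core count only add scheduling overhead.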

More than 2 threads working slower than 1 or 2 threads unless Thread.sleep(1) is put in the run() method of a thread

The task I'm trying to implement is finding the Collatz sequence for numbers in a given interval using several threads, and seeing how much improvement is gained compared to one thread.
However, one thread always seemed faster regardless of the thread count I chose (edit: 2 threads are faster, but not by much, while 4 threads are slower than 1 thread and I have no idea why; I could even say that the more threads, the slower it gets). I hope someone can explain. Maybe I'm doing something wrong.
Below is my code that I have written so far. I'm using ThreadPoolExecutor for executing the tasks (one task = one Collatz sequence for one number in the interval).
The Collatz class:
public class ParallelCollatz implements Runnable {
private long result;
private long inputNum;
public long getResult() {
return result;
}
public void setResult(long result) {
this.result = result;
}
public long getInputNum() {
return inputNum;
}
public void setInputNum(long inputNum) {
this.inputNum = inputNum;
}
public void run() {
//System.out.println("number:" + inputNum);
//System.out.println("Thread:" + Thread.currentThread().getId());
//int j=0;
//if(Thread.currentThread().getId()==11) {
// ++j;
// System.out.println(j);
//}
long result = 1;
//main recursive computation
while (inputNum > 1) {
if (inputNum % 2 == 0) {
inputNum = inputNum / 2;
} else {
inputNum = inputNum * 3 + 1;
}
++result;
}
// try {
//Thread.sleep(10);
//} catch (InterruptedException e) {
// TODO Auto-generated catch block
// e.printStackTrace();
//}
this.result=result;
return;
}
}
And the main class where I run the threads (yes, for now I create two lists with the same numbers, since after running with one thread the initial input values are lost):
ThreadPoolExecutor executor = (ThreadPoolExecutor)Executors.newFixedThreadPool(1);
ThreadPoolExecutor executor2 = (ThreadPoolExecutor)Executors.newFixedThreadPool(4);
List<ParallelCollatz> tasks = new ArrayList<ParallelCollatz>();
for(int i=1; i<=1000000; i++) {
ParallelCollatz task = new ParallelCollatz();
task.setInputNum((long)(i+1000000));
tasks.add(task);
}
long startTime = System.nanoTime();
for(int i=0; i<1000000; i++) {
executor.execute(tasks.get(i));
}
executor.shutdown();
boolean tempFirst=false;
try {
tempFirst =executor.awaitTermination(5, TimeUnit.HOURS);
} catch (InterruptedException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
System.out.println("tempFirst " + tempFirst);
long endTime = System.nanoTime();
long durationInNano = endTime - startTime;
long durationInMillis = TimeUnit.NANOSECONDS.toMillis(durationInNano); //Total execution time in nano seconds
System.out.println("laikas " +durationInMillis);
List<ParallelCollatz> tasks2 = new ArrayList<ParallelCollatz>();
for(int i=1; i<=1000000; i++) {
ParallelCollatz task = new ParallelCollatz();
task.setInputNum((long)(i+1000000));
tasks2.add(task);
}
long startTime2 = System.nanoTime();
for(int i=0; i<1000000; i++) {
executor2.execute(tasks2.get(i));
}
executor2.shutdown();
boolean temp =false;
try {
temp=executor2.awaitTermination(5, TimeUnit.HOURS);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("temp "+ temp);
long endTime2 = System.nanoTime();
long durationInNano2 = endTime2 - startTime2;
long durationInMillis2 = TimeUnit.NANOSECONDS.toMillis(durationInNano2); //Total execution time in nano seconds
System.out.println("laikas2 " +durationInMillis2);
For example, running with one thread it completes in 3280 ms; running with two threads, 3437 ms. Should I be considering another concurrent structure for calculating each element?
EDIT
Clarification: I'm not trying to parallelize individual sequences, but an interval of numbers where each number has its own sequence (which is not related to the other numbers).
EDIT2
Today I ran the program on a good PC with 6 cores and 12 logical processors and the issue persists. Does anyone have an idea where the problem might be? I also updated my code. 4 threads do worse than 2 threads for some reason (even worse than 1 thread). I also applied what was given in the answer, but no change.
Another Edit
What I have noticed is that if I put a Thread.sleep(1) in my ParallelCollatz run() method, then the performance gradually increases with the thread count. Perhaps this detail tells someone what is wrong? However, no matter how many tasks I give, if there is no Thread.sleep(1), 2 threads perform fastest, 1 thread is in 2nd place, and the rest hang around a similar number of milliseconds, slower than both 1 and 2 threads.
New Edit
I also tried putting more tasks(for cycle for calculating not 1 but 10 or 100 Collatz sequences) in the run() method of the Runnable class so that the thread itself would do more work. Unfortunately, this did not help as well.
Perhaps I'm launching the tasks incorrectly? Anyone any ideas?
EDIT
So it would seem that adding more tasks to the run method fixes it a bit, but for more threads (8+) the issue still remains. I still wonder whether the cause is that it takes more time to create and run the threads than to execute the tasks themselves. Or should I create a new post with this question?
You are not waiting for your tasks to complete, only measuring the time it takes to submit them to the executor.
executor.shutdown() does not wait for all tasks to finish. You need to call executor.awaitTermination after that.
executor.shutdown();
executor.awaitTermination(5, TimeUnit.HOURS);
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html#shutdown()
Update
I believe that our testing methodology is flawed. I repeated your test on my machine (1 processor, 2 cores, 4 logical processors) and the time measured differed wildly from run to run.
I believe the following are main reasons:
JVM startup & JIT compilation time. At the beginning, the code is running in interpreted mode.
The result of the calculation is ignored. I have no intuition about what is removed by the JIT and what we are actually measuring.
println calls in the code
To test this, I converted your test to JMH.
In particular:
I converted the Runnable to a Callable, and I return the sum of the results to prevent the computation from being eliminated as dead code (alternatively, you can use Blackhole from JMH).
My tasks have no mutable state; I moved all moving parts to local variables. No GC is needed to clean up the tasks.
I still create executors in each round. This is not perfect, but I decided to keep it as is.
The results I received below are consistent with my expectations: one core is waiting in the main thread, the work is performed on a single core, and the numbers are roughly the same.
Benchmark Mode Cnt Score Error Units
SpeedTest.multipleThreads avgt 20 559.996 ± 20.181 ms/op
SpeedTest.singleThread avgt 20 562.048 ± 16.418 ms/op
Updated code:
public class ParallelCollatz implements Callable<Long> {
private final long inputNumInit;
public ParallelCollatz(long inputNumInit) {
this.inputNumInit = inputNumInit;
}
@Override
public Long call() {
long result = 1;
long inputNum = inputNumInit;
//main recursive computation
while (inputNum > 1) {
if (inputNum % 2 == 0) {
inputNum = inputNum / 2;
} else {
inputNum = inputNum * 3 + 1;
}
++result;
}
return result;
}
}
and the benchmark itself:
@State(Scope.Benchmark)
public class SpeedTest {
private static final int NUM_TASKS = 1000000;
private static List<ParallelCollatz> tasks = buildTasks();
@Benchmark
@Fork(value = 1, warmups = 1)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@SuppressWarnings("unused")
public long singleThread() throws Exception {
ThreadPoolExecutor executorOneThread = (ThreadPoolExecutor) Executors.newFixedThreadPool(1);
return measureTasks(executorOneThread, tasks);
}
@Benchmark
@Fork(value = 1, warmups = 1)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@SuppressWarnings("unused")
public long multipleThreads() throws Exception {
ThreadPoolExecutor executorMultipleThread = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
return measureTasks(executorMultipleThread, tasks);
}
private static long measureTasks(ThreadPoolExecutor executor, List<ParallelCollatz> tasks) throws InterruptedException, ExecutionException {
long sum = runTasksInExecutor(executor, tasks);
return sum;
}
private static long runTasksInExecutor(ThreadPoolExecutor executor, List<ParallelCollatz> tasks) throws InterruptedException, ExecutionException {
List<Future<Long>> futures = new ArrayList<>(NUM_TASKS);
for (int i = 0; i < NUM_TASKS; i++) {
Future<Long> f = executor.submit(tasks.get(i));
futures.add(f);
}
executor.shutdown();
boolean tempFirst = false;
try {
tempFirst = executor.awaitTermination(5, TimeUnit.HOURS);
} catch (InterruptedException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
long sum = 0l;
for (Future<Long> f : futures) {
sum += f.get();
}
//System.out.println(sum);
return sum;
}
private static List<ParallelCollatz> buildTasks() {
List<ParallelCollatz> tasks = new ArrayList<>();
for (int i = 1; i <= NUM_TASKS; i++) {
ParallelCollatz task = new ParallelCollatz((long) (i + NUM_TASKS));
tasks.add(task);
}
return tasks;
}
}
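For completeness, JMH benchmarks are not started like a normal main class. Assuming the jmh-core and jmh-generator-annprocess dependencies are on the classpath, a small launcher such as the following (my addition, not part of the original answer) can be used:

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// Hypothetical launcher for the SpeedTest benchmark above; requires the JMH libraries.
public class SpeedTestRunner {
    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(SpeedTest.class.getSimpleName()) // run only the SpeedTest benchmarks
                .build();
        new Runner(opts).run();
    }
}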

Java speedup processes / threads

I have a rather big ArrayList.
I have to go through every index and do an expensive calculation.
My first idea to speed it up was by putting it into a thread.
It works, but it is still extremely slow. I tinkered with the calculation to make it less expensive, but it's still too slow. The best solution I came up with is basically this one:
public void calculate(){
calculatePart(0);
calculatePart(1);
}
public void calculatePart(int offset) {
new Thread() {
@Override
public void run() {
int i = offset;
while(arrayList.size() > i) {
// Do the calculation
i +=2;
}
}
}.start();
}
Yet this feels like a lazy, unprofessional solution. That is why I'm asking if there is a cleaner and even faster solution.
Assuming that performing the task on each element doesn't lead to data races, you could leverage the power of parallelism. To maximize the number of computations occurring at the same time, you would have to give tasks to each of the processors available in your system.
In Java, you can get the number of processors (cores) available using this:
int parallelism = Runtime.getRuntime().availableProcessors();
The idea is to create number of threads equal to the available processors.
So, if you have 4 processors available, you can create 4 threads and ask them to process items at a gap of 4. Suppose you have a list of size 10, which needs to be processed in parallel.
Then,
Thread 1 processes items at index 0,4,8
Thread 2 processes items at index 1,5,9
Thread 3 processes items at index 2,6
Thread 4 processes items at index 3,7
I tried to simulate your scenario with the following code:
import java.util.Arrays;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class SpeedUpTest {
public static void main(String[] args) throws InterruptedException, ExecutionException {
long seqTime, twoThreadTime, multiThreadTime;
List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
long time = System.currentTimeMillis();
sequentialProcessing(list);
seqTime = System.currentTimeMillis() - time;
int parallelism = 2;
ExecutorService executorService = Executors.newFixedThreadPool(parallelism);
time = System.currentTimeMillis();
List<Future> tasks = new ArrayList<>();
for (int offset = 0; offset < parallelism; offset++) {
int finalParallelism = parallelism;
int finalOffset = offset;
Future task = executorService.submit(() -> {
int i = finalOffset;
while (list.size() > i) {
try {
processItem(list.get(i));
} catch (InterruptedException e) {
e.printStackTrace();
}
i += finalParallelism;
}
});
tasks.add(task);
}
for (Future task : tasks) {
task.get();
}
twoThreadTime = System.currentTimeMillis() - time;
parallelism = Runtime.getRuntime().availableProcessors();
executorService = Executors.newFixedThreadPool(parallelism);
tasks = new ArrayList<>();
time = System.currentTimeMillis();
for (int offset = 0; offset < parallelism; offset++) {
int finalParallelism = parallelism;
int finalOffset = offset;
Future task = executorService.submit(() -> {
int i = finalOffset;
while (list.size() > i) {
try {
processItem(list.get(i));
} catch (InterruptedException e) {
e.printStackTrace();
}
i += finalParallelism;
}
});
tasks.add(task);
}
for (Future task : tasks) {
task.get();
}
multiThreadTime = System.currentTimeMillis() - time;
log("RESULTS:");
log("Total time for sequential execution : " + seqTime / 1000.0 + " seconds");
log("Total time for execution with 2 threads: " + twoThreadTime / 1000.0 + " seconds");
log("Total time for execution with " + parallelism + " threads: " + multiThreadTime / 1000.0 + " seconds");
}
private static void log(String msg) {
System.out.println(msg);
}
private static void processItem(int index) throws InterruptedException {
Thread.sleep(5000);
}
private static void sequentialProcessing(List<Integer> list) throws InterruptedException {
for (int i = 0; i < list.size(); i++) {
processItem(list.get(i));
}
}
}
OUTPUT:
RESULTS:
Total time for sequential execution : 50.001 seconds
Total time for execution with 2 threads: 25.102 seconds
Total time for execution with 4 threads: 15.002 seconds
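These numbers line up with the striding scheme: each item takes 5 seconds, so with 2 threads each thread processes 5 of the 10 items (25 seconds), and with 4 threads the busiest threads process 3 items (15 seconds). The wall-clock time is bounded by the slowest thread, not by the total amount of work.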
Speaking purely theoretically:
if you have X elements and your calculation must perform N operations on each one, then your processor must perform X*N operations in total.
Parallel threads can make it faster only if some of those operations involve waiting (e.g. file or network operations); that waiting time can be used by other threads. But if all operations are pure CPU work (e.g. mathematics) and a thread never waits, the time required to perform X*N operations stays the same.
Also, each thread must give the other threads the ability to take control of the CPU at some point. This happens automatically between method calls, or if you have a Thread.yield() call in your code.
As an example, a method like:
public void run()
{
long a=0;
for (long i=1; i < Long.MAX_VALUE; i++)
{
a+=i;
}
}
will not give another thread a chance to take control of the CPU until it has fully completed and exited.

Multi threaded object creation slower then in a single thread

I have what probably is a basic question. When I create 100 million Hashtables it takes approximately 6 seconds (runtime = 6 seconds per core) on my machine if I do it on a single core. If I do this multi-threaded on 12 cores (my machine has 6 cores that allow hyperthreading) it takes around 10 seconds (runtime = 112 seconds per core).
This is the code I use:
Main
public class Tests
{
public static void main(String args[])
{
double start = System.currentTimeMillis();
int nThreads = 12;
double[] runTime = new double[nThreads];
TestsThread[] threads = new TestsThread[nThreads];
int totalJob = 100000000;
int jobsize = totalJob/nThreads;
for(int i = 0; i < threads.length; i++)
{
threads[i] = new TestsThread(jobsize,runTime, i);
threads[i].start();
}
waitThreads(threads);
for(int i = 0; i < runTime.length; i++)
{
System.out.println("Runtime thread:" + i + " = " + (runTime[i]/1000000) + "ms");
}
double end = System.currentTimeMillis();
System.out.println("Total runtime = " + (end-start) + " ms");
}
private static void waitThreads(TestsThread[] threads)
{
for(int i = 0; i < threads.length; i++)
{
while(threads[i].finished == false)//keep waiting untill the thread is done
{
//System.out.println("waiting on thread:" + i);
try {
Thread.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
}
Thread
import java.util.HashMap;
import java.util.Map;
public class TestsThread extends Thread
{
int jobSize = 0;
double[] runTime;
boolean finished;
int threadNumber;
TestsThread(int job, double[] runTime, int threadNumber)
{
this.finished = false;
this.jobSize = job;
this.runTime = runTime;
this.threadNumber = threadNumber;
}
public void run()
{
double start = System.nanoTime();
for(int l = 0; l < jobSize ; l++)
{
double[] test = new double[65];
}
double end = System.nanoTime();
double difference = end-start;
runTime[threadNumber] += difference;
this.finished = true;
}
}
I do not understand why creating the objects simultaneously in multiple threads takes longer per thread than doing it serially in only 1 thread. If I remove the line where I create the Hashtable, this problem disappears. If anyone could help me with this I would be very thankful.
Update: This problem has an associated bug report and has been fixed with Java 1.7u40. And it was never an issue for Java 1.8 as Java 8 has an entirely different hash table algorithm.
Since you are not using the created objects, that operation will get optimized away. So you're only measuring the overhead of creating threads, which naturally grows with the number of threads you start.
I have to correct my answer regarding a detail I didn't know yet: there is something special about the classes Hashtable and HashMap. They both invoke sun.misc.Hashing.randomHashSeed(this) in the constructor. In other words, their instances escape during construction, which has an impact on memory visibility. This implies that their construction, unlike, say, an ArrayList's, cannot be optimized away, and multi-threaded construction slows down due to what happens inside that method (i.e. synchronization).
As said, that's special to these classes and, of course, to this implementation (my setup: 1.7.0_13). For ordinary classes the construction time goes straight to zero for such code.
Here I add more sophisticated benchmark code. Watch the difference between DO_HASH_MAP = true and DO_HASH_MAP = false (when false, it will create an ArrayList instead, which has no such special behavior).
import java.util.*;
import java.util.concurrent.*;
public class AllocBench {
static final int NUM_THREADS = 1;
static final int NUM_OBJECTS = 100000000 / NUM_THREADS;
static final boolean DO_HASH_MAP = true;
public static void main(String[] args) throws InterruptedException, ExecutionException {
ExecutorService threadPool = Executors.newFixedThreadPool(NUM_THREADS);
Callable<Long> task=new Callable<Long>() {
public Long call() {
return doAllocation(NUM_OBJECTS);
}
};
long startTime=System.nanoTime(), cpuTime=0;
for(Future<Long> f: threadPool.invokeAll(Collections.nCopies(NUM_THREADS, task))) {
cpuTime+=f.get();
}
long time=System.nanoTime()-startTime;
System.out.println("Number of threads: "+NUM_THREADS);
System.out.printf("entire allocation required %.03f s%n", time*1e-9);
System.out.printf("time x numThreads %.03f s%n", time*1e-9*NUM_THREADS);
System.out.printf("real accumulated cpu time %.03f s%n", cpuTime*1e-9);
threadPool.shutdown();
}
static long doAllocation(int numObjects) {
long t0=System.nanoTime();
for(int i=0; i<numObjects; i++)
if(DO_HASH_MAP) new HashMap<Object, Object>(); else new ArrayList<Object>();
return System.nanoTime()-t0;
}
}
What if you do it on 6 cores? Hyperthreading isn't exactly the same as having double the cores, so you might want to try the number of real cores too.
Also, the OS won't necessarily schedule each of your threads onto its own core.
Since all you are doing is measuring the time and churning memory, your bottleneck is likely to be your L3 cache or the bus to main memory. In such cases, coordinating the work between threads can produce so much overhead that it makes things worse instead of better.
This is too long for a comment but your inner loop can be just
double start = System.nanoTime();
for(int l = 0; l < jobSize ; l++){
Map<String,Integer> test = new HashMap<String,Integer>();
}
// runtime is an AtomicLong for thread safety
runtime.addAndGet(System.nanoTime() - start); // time in nano-seconds.
Taking the time can be as slow as creating a HashMap, so you might not be measuring what you think you are if you call the timer too often.
BTW Hashtable is synchronized and you might find using HashMap is faster, and possibly more scalable.
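For illustration, a small worker along those lines might look like this (the class and field names are my own, not from the question):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: per-thread timing accumulated into a shared AtomicLong, with a single
// pair of timer calls per thread rather than one per iteration.
public class AllocWorker implements Runnable {

    static final AtomicLong totalRuntime = new AtomicLong(); // nanoseconds summed over all threads

    private final int jobSize;

    AllocWorker(int jobSize) {
        this.jobSize = jobSize;
    }

    @Override
    public void run() {
        long start = System.nanoTime();
        for (int l = 0; l < jobSize; l++) {
            Map<String, Integer> test = new HashMap<>(); // the object under test, as in the question
        }
        totalRuntime.addAndGet(System.nanoTime() - start);
    }
}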

what is wrong with this java thread program

I am trying to figure out what is wrong with this thread program in Java. Can anyone shed some light? Here is the code:
public class Z {
private Account account = new Account();
private Thread[] thread = new Thread[100];
public static void main(String[] args) {
Z test = new Z();
System.out.println("What is balance ? " + test.account.getBalance());
}
public Z() {
ThreadGroup g = new ThreadGroup("group");
boolean done = false;
// Create and launch 100 threads
for (int i = 0; i < 100; i++) {
thread[i] = new Thread(g, new AddAPennyThread(), "t" + i);
thread[i].start();
System.out.println("depositor: " + thread[i].getName());
}
// Check if all the threads are finished
while (!done)
if (g.activeCount() == 0)
done = true;
}
// A thread for adding a penny to the account
class AddAPennyThread extends Thread {
public void run() {
account.deposit(1);
}
}
// An inner class for account
class Account {
private int balance = 0;
public int getBalance() {
return balance;
}
public void deposit(int amount) {
int newBalance = balance + amount;
balance = newBalance;
}
}
}
It compiles and runs fine. It was a test question I missed, and I wanted to know what is actually wrong with it. Thanks!
There is not a single bit dedicated to synchronization of 100 threads all working on exactly one (1!!!) piece of data.
Anything can happen. "Anything" includes that the code as it is works most of the time due to some "coincidences":
The task at hand is quite small. (only an add)
The tasks are created and immediately started.
There is a small delay between two "create+start" pairs: the System.out.println.
This adds up to: it might work in most test runs, but it is an incorrect and non-deterministic program. One problematic interleaving of two threads looks like this:
[Thread 1] int newBalance = balance + amount;
[Thread 2] int newBalance = balance + amount;
[Thread 1] balance = newBalance;
[Thread 2] balance = newBalance;
The fix is to make the deposit method synchronized:
public synchronized void deposit(int amount)
balance should be a private volatile int (so that Java knows never to cache it; its value is liable to change between different thread accesses without the next thread knowing), and Account#deposit(int amount) should be a public synchronized void (so that Java makes the method body a critical region and prevents two threads from executing it on the same Account at the same time, ensuring the integrity of the value of balance).
Additionally, instantiating 100 threads yourself with new introduces a lot of overhead; though I appreciate this is just a toy example, a more efficient way to do this would be to use a Java thread pool.
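Putting those suggestions together, a minimal sketch might look like the following (the ExecutorService-based version and the class name ZFixed are my illustration, not code from the question; synchronizing the getter as well provides the same visibility guarantee that volatile would):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ZFixed {

    // Synchronized deposit makes the read-modify-write of balance atomic.
    static class Account {
        private int balance = 0;

        public synchronized int getBalance() {
            return balance;
        }

        public synchronized void deposit(int amount) {
            balance += amount;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Account account = new Account();
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors()); // reuse a few threads instead of creating 100
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> account.deposit(1));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for all deposits instead of spinning on activeCount()
        System.out.println("What is balance ? " + account.getBalance()); // reliably prints 100
    }
}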
