I am trying to write a multi-threaded job simulator with workers (threads) and jobs. Every thread takes a job, processes it, and then takes the next job in order. Each job is an integer: the time in seconds needed to process it. Because this is a simulation, the code only prints the index of the thread and the start time of each job; it does not actually sleep for that many seconds.
The problem is that I get a NullPointerException, but only when many jobs have the same duration, for example:
4 12 (4 threads for 12 jobs)
1 1 1 1 1 1 1 1 1 1 1 1 (12 jobs that each take 1 second)
The exception is thrown in this part:
if (workersReady.size()>1) {
bestWorker = workersReady.iterator().next();
workersReady.remove(bestWorker);
workersReadyAtTimeT.remove(currentTime);
workersReadyAtTimeT.put(currentTime,workersReady);
nextTimesQueue.add(currentTime);
The input has to look like this.
First line:
2 5 (two threads (workers) for 5 jobs)
Press Enter and write the second line:
1 2 3 4 5 (the jobs; each integer is the time it takes to process that job)
After pressing Enter, the output will be:
0 0 The two threads try to take jobs from the list simultaneously, so the thread with index 0 takes the first job and starts working on it at time 0.
1 0 The thread with index 1 takes the second job and starts working on it at time 0.
0 1 After 1 second, thread 0 is done with the first job, takes the third job from the list, and starts processing it immediately at time 1.
1 2 One second later, thread 1 is done with the second job, takes the fourth job from the list, and starts processing it immediately at time 2.
0 4 Finally, after 2 more seconds, thread 0 is done with the third job, takes the fifth job from the list, and starts processing it immediately at time 4.
This is the code:
import java.io.*;
import java.util.HashMap;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.StringTokenizer;
public class JobQueue {
private int numWorkers;
private int[] jobs;
private int[] assignedWorker;
private long[] startTime;
private FastScanner in;
private PrintWriter out;
public static void main(String[] args) throws IOException {
new JobQueue().solve();
}
private void readData() throws IOException {
numWorkers = in.nextInt();
int m = in.nextInt();
jobs = new int[m];
for (int i = 0; i < m; ++i) {
jobs[i] = in.nextInt();
}
}
private void writeResponse() {
for (int i = 0; i < jobs.length; ++i) {
out.println(assignedWorker[i] + " " + startTime[i]);
}
}
private void assignJobs() {
// TODO: replace this code with a faster algorithm.
assignedWorker = new int[jobs.length];
startTime = new long[jobs.length];
PriorityQueue<Integer> nextTimesQueue = new PriorityQueue<Integer>();
HashMap<Integer, Set<Integer>> workersReadyAtTimeT = new HashMap<Integer,Set<Integer>>();
long[] nextFreeTime = new long[numWorkers];
int duration = 0;
int bestWorker = 0;
for (int i = 0; i < jobs.length; i++) {
duration = jobs[i];
if(i<numWorkers) {
bestWorker = i;
nextTimesQueue.add(duration);
addToSet(workersReadyAtTimeT, duration, i,0);
}else {
int currentTime = nextTimesQueue.poll();
Set<Integer> workersReady = workersReadyAtTimeT.get(currentTime);
if (workersReady.size()>1) {
bestWorker = workersReady.iterator().next();
workersReady.remove(bestWorker);
workersReadyAtTimeT.remove(currentTime);
workersReadyAtTimeT.put(currentTime,workersReady);
nextTimesQueue.add(currentTime);
} else {
bestWorker = workersReady.iterator().next();
workersReadyAtTimeT.remove(currentTime);
nextTimesQueue.add(currentTime+duration);
addToSet(workersReadyAtTimeT, duration, bestWorker, currentTime);
}
}
assignedWorker[i] = bestWorker;
startTime[i] = nextFreeTime[bestWorker];
nextFreeTime[bestWorker] += duration;
}
}
private void addToSet(HashMap<Integer, Set<Integer>> workersReadyAtTimeT, int duration, int worker, int current) {
if(workersReadyAtTimeT.get(current+duration)==null) {
HashSet<Integer> s = new HashSet<Integer>();
s.add(worker);
workersReadyAtTimeT.put(current+duration, s);
}else {
Set<Integer> s = workersReadyAtTimeT.get(current+duration);
s.add(worker);
workersReadyAtTimeT.put(current+duration,s);
}
}
public void solve() throws IOException {
in = new FastScanner();
out = new PrintWriter(new BufferedOutputStream(System.out));
readData();
assignJobs();
writeResponse();
out.close();
}
static class FastScanner {
private BufferedReader reader;
private StringTokenizer tokenizer;
public FastScanner() {
reader = new BufferedReader(new InputStreamReader(System.in));
tokenizer = null;
}
public String next() throws IOException {
while (tokenizer == null || !tokenizer.hasMoreTokens()) {
tokenizer = new StringTokenizer(reader.readLine());
}
return tokenizer.nextToken();
}
public int nextInt() throws IOException {
return Integer.parseInt(next());
}
}
}
Edit: I switched to a ConcurrentHashMap and still get the NullPointerException.
HashMap is not threadsafe.
If you interact with a HashMap from multiple threads without 'external' synchronization, then the spec of HashMap says anything is fair game. If your computer starts playing Yankee Doodle Dandy, that would be compatible with the spec, and no bug report would be accepted on that account.
In other words, you MUST take care of this yourself.
Usually, the right move is to use ConcurrentHashMap instead (from the extremely useful java.util.concurrent package), and so it is here.
If you must, you can externally synchronize as well. For example:
synchronized (workersReady) {
// interact with workersReady here
}
but synchronized is a pretty clumsy cudgel to use here, and may well remove most/all of the benefits of trying to multithread this stuff.
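If the map really were shared between threads, the check-then-put in addToSet would also need to happen as one atomic step; a thread-safe map alone does not make a get followed by a put atomic. A minimal sketch, assuming a hypothetical ReadyWorkersSketch wrapper (the class and method names are illustrative and not from the question), using ConcurrentHashMap.computeIfAbsent with a concurrent key set:

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper around the question's workersReadyAtTimeT map.
class ReadyWorkersSketch {
    private final ConcurrentHashMap<Integer, Set<Integer>> workersReadyAtTimeT =
            new ConcurrentHashMap<>();

    // Creates the set for this time if it is missing, then adds the worker.
    // computeIfAbsent performs the "missing? then create" step atomically.
    void addWorker(int time, int worker) {
        workersReadyAtTimeT
                .computeIfAbsent(time, t -> ConcurrentHashMap.newKeySet())
                .add(worker);
    }

    // Returns an empty set instead of null when nothing is registered for this time.
    Set<Integer> workersAt(int time) {
        return workersReadyAtTimeT.getOrDefault(time, Collections.emptySet());
    }

    public static void main(String[] args) {
        ReadyWorkersSketch ready = new ReadyWorkersSketch();
        ready.addWorker(1, 0);
        ready.addWorker(1, 1);
        System.out.println(ready.workersAt(1).size());
        System.out.println(ready.workersAt(5).isEmpty());
    }
}
```

Returning an empty set from the lookup also avoids calling size() on a null reference when a time has no entry.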
Note that a 'pool of workers' sounds more like a job for e.g. ExecutorService. Make sure to check the j.u.c package; I'm pretty sure it has something much more appropriate, so you can delete most of what you wrote and use a carefully tweaked solution that is pre-tested and optimized.
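The pool abstraction in java.util.concurrent is ExecutorService, typically created through the Executors factory methods. A rough sketch of handing jobs to a fixed pool follows; the class name WorkerPoolSketch and the placeholder "work" (squaring the job value) are assumptions for illustration, not the asker's workload:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class WorkerPoolSketch {
    // Submits one task per job to a fixed pool and returns the results in submission order.
    static List<Long> runJobs(int numWorkers, int[] jobs) {
        ExecutorService pool = Executors.newFixedThreadPool(numWorkers);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int job : jobs) {
                final long j = job;
                futures.add(pool.submit(() -> j * j)); // placeholder work
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // blocks until that job has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(runJobs(2, new int[] {1, 2, 3, 4, 5}));
    }
}
```

The pool decides which thread picks up which job, so no hand-written bookkeeping of free workers is needed.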
Maybe look at ConcurrentHashMap.
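Worth noting: the posted assignJobs runs entirely on one thread. Tracing it by hand with the 4 12 input suggests the null comes from the bookkeeping itself: once the set for time 1 has been removed from the map, copies of time 1 are still left in nextTimesQueue, so a later workersReadyAtTimeT.get(currentTime) returns null. A minimal sketch of the same simulation, assuming a single priority queue of workers ordered by next free time and then by index (the class name JobSimulationSketch is hypothetical, and this is one possible approach rather than the asker's intended algorithm):

```java
import java.util.PriorityQueue;

class JobSimulationSketch {
    // Returns one row per job, in job order: {workerIndex, startTime}.
    static long[][] assign(int numWorkers, int[] jobs) {
        // Each entry is {nextFreeTime, workerIndex}.
        // Earliest free time first; ties go to the lower worker index.
        PriorityQueue<long[]> free = new PriorityQueue<>((a, b) ->
                a[0] != b[0] ? Long.compare(a[0], b[0]) : Long.compare(a[1], b[1]));
        for (int w = 0; w < numWorkers; w++) {
            free.add(new long[] {0L, w});
        }
        long[][] result = new long[jobs.length][];
        for (int i = 0; i < jobs.length; i++) {
            long[] worker = free.poll();   // every worker is re-queued below, so this is never null
            result[i] = new long[] {worker[1], worker[0]};
            worker[0] += jobs[i];          // busy until start time + duration
            free.add(worker);
        }
        return result;
    }

    public static void main(String[] args) {
        for (long[] row : assign(2, new int[] {1, 2, 3, 4, 5})) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

For the 2 5 example from the question this prints 0 0, 1 0, 0 1, 1 2, 0 4, one pair per line. Because each worker has exactly one entry in the queue, equal durations cannot leave stale times behind.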
I want to generate a list of unique random numbers from a given input range using threads in Java. For example, given a range of 1-4, I would run 4 threads and each thread would generate a random number such that no two threads produce the same value. I presume I need to implement some synchronization or something? I've tried using join() but it doesn't seem to work.
My constructor uses input values to populate an array list with a given range. In the run method, I generate a random value (from the same range) and check if it's in the list. If it is, I remove it from the list and print the value. The idea is that when another thread comes in, it can't generate that same value again.
Here is what I have so far:
public class Main {
public static void main(String[] args) {
randomThreadGen randomRange = new randomThreadGen(1, 2);
Thread thread1 = new Thread(randomRange);
Thread thread2 = new Thread(randomRange);
thread1.start();
try {
thread1.join();
} catch (InterruptedException e) {
}
thread2.start();
}
}
And this:
public class randomThreadGen implements Runnable {
private int lowerBound;
private int upperBound;
private final ArrayList<Integer> List = new ArrayList<Integer>();
public randomThreadGen(int lowerb, int upperb) {
this.lowerBound = lowerb;
this.upperBound = upperb;
for (int i = lowerb; i < upperb + 1; i++) { // populate list with values based on lower and upperbounds specified from main
List.add(i);
}
}
@Override
public void run() {
// generate random value
// check if in list. If in list, remove it
// print value
// otherwise try again
int val = ThreadLocalRandom.current().nextInt(lowerBound, upperBound+1); // generate random value based on lower and upper bound inputs from main
while(true){
if(List.contains(val)){
List.remove(new Integer(val));
System.out.println("Random value for " + Thread.currentThread().getName() + " " + val);
System.out.println("List values: " + List);
}
break;
}
}
}
This test case with a small range is meant to make testing easy. Sometimes it works, and Thread0 generates a different value from Thread01 (1 and 2, or 2 and 1, for example). But sometimes it doesn't: seemingly they generate the same value, in which case my code only prints one value, for example "Thread02 1" and nothing else.
Any ideas? Is there another way to do this other than join()?
It's quite an easy task. Just use a concurrent hash set to filter out duplicates. Make sure to declare the boundary ints and the set as final. Thread.join is needed to guarantee that the results are printed only after all threads have completed their work. There are other effective techniques that can replace join, but they are not for novices.
Try this:
import java.util.concurrent.ThreadLocalRandom;
import java.util.*;
import java.util.concurrent.*;
public class Main {
final static int low = 0;
final static int up = 5;
final static Set < Integer > inthashmap = ConcurrentHashMap.newKeySet();
// threadhashmap is needed to track down all threads generating ints
final static Set < Thread > threadhashmap = ConcurrentHashMap.newKeySet();
public static void main(String[] args) throws InterruptedException {
for (int i = 0; i < up - low + 1; i++) {
Thread t = new Thread() {
public void run() {
int randomNum;
try {
randomNum = ThreadLocalRandom.current().nextInt(low, up + 1);
inthashmap.add(randomNum);
System.out.println("A new random int generated : " + randomNum);
} finally {
}
}
};
threadhashmap.add(t);
t.start();
}
//by iterating through all threads in threadhashmap
// and joining them we guarantee that all threads were completed
// before we print the results of work of those threads (i.e. ints)
Iterator<Thread> iterator = threadhashmap.iterator();
while (iterator.hasNext())
iterator.next().join();
System.out.println("Unique ints from hashmap:");
inthashmap.forEach(System.out::println);
}
}
Output:
A new random int generated : 2
A new random int generated : 3
A new random int generated : 3
A new random int generated : 0
A new random int generated : 0
A new random int generated : 2
Unique ints from hashmap:
0
2
3
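As the output above shows, the threads can still draw the same number; only the set is free of duplicates. If every thread should end up with its own distinct value, one option is to shuffle the range once and let each thread take an element from a concurrent queue. This is a sketch under that assumption; the class name UniqueRandomSketch and the method drawAll are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

class UniqueRandomSketch {
    // Starts one thread per value in [low, up]; each thread takes a different value.
    static Set<Integer> drawAll(int low, int up) {
        List<Integer> values = new ArrayList<>();
        for (int v = low; v <= up; v++) {
            values.add(v);
        }
        Collections.shuffle(values);                 // random order, no repeats
        Queue<Integer> pool = new ConcurrentLinkedQueue<>(values);
        Set<Integer> seen = ConcurrentHashMap.newKeySet();

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            Thread t = new Thread(() -> {
                Integer v = pool.poll();             // atomic: no two threads get the same element
                if (v != null) {
                    seen.add(v);
                    System.out.println(Thread.currentThread().getName() + " drew " + v);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();                            // wait for every thread before reading the results
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(drawAll(1, 4));
    }
}
```

The randomness comes from the shuffle, so the threads never need to retry after a collision.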
According to the Java standard, longs may be written in two parts, and it is possible for one thread to read a number that was never written because it consists of the first part of one write and the second part of another (https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.7). I have tried to write a program that shows this happening, but it never happens. Do I misunderstand the standard, or is there an error in my example program?
In the program below if tearing happens, we should get an output
[main] INFO net.kasterma.basicjava.TearingTest2 - compare false
This has not happened in many runs.
package net.kasterma.basicjava;
import lombok.extern.slf4j.Slf4j;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
@Slf4j
public class TearingTest2 extends Thread {
private final boolean read;
private static long i = 0;
private final static int ITERATIONS = 1_000_000;
private final static long[] read_i = new long[ITERATIONS];
private final static long[] write_i = new long[ITERATIONS];
private final static Random random = new Random();
TearingTest2(boolean read) {
this.read = read;
}
public void run() {
if (read) {
for (int iter = 0; iter < ITERATIONS; iter++) {
// log.info("read {}", iter);
read_i[iter] = i;
}
} else {
for (int iter = 0; iter < ITERATIONS; iter++) {
// log.info("write {}", iter);
i = random.nextLong();
write_i[iter] = i;
}
}
}
static boolean compare() {
Set<Long> writes = new HashSet<>();
writes.add(0L);
Arrays.stream(write_i).forEach(l -> writes.add(l));
for (int iter = 0; iter < ITERATIONS; iter++) {
if (!writes.contains(read_i[iter])) {
log.info("not found {}", iter);
return false; // <--- tearing has happened.
}
}
// compute some statistics for debugging of the program
Set<Long> reads = new HashSet<>();
Arrays.stream(read_i).forEach(l -> reads.add(l));
log.info("Number of read values {}", reads.size());
int ct = 0;
for (int iter = 0; iter < ITERATIONS; iter++) {
if (read_i[iter] == 0) {
ct++;
}
}
log.info("number zeros {}", ct);
return true;
}
public static void main(String[] args) throws InterruptedException {
Thread T1 = new TearingTest2(true);
Thread T2 = new TearingTest2(false);
T1.start();
T2.start();
T1.join();
T2.join();
log.info("compare {}", compare());
}
}
The output of a run is:
[main] INFO net.kasterma.basicjava.TearingTest2 - Number of read values 105328
[main] INFO net.kasterma.basicjava.TearingTest2 - number zeros 1575
[main] INFO net.kasterma.basicjava.TearingTest2 - compare true
From the same section of the specification that you mentioned:
Some implementations may find it convenient to divide a single write action on a 64-bit long or double value into two write actions on adjacent 32-bit values. For efficiency's sake, this behavior is implementation-specific; an implementation of the Java Virtual Machine is free to perform writes to long and double values atomically or in two parts.
Implementations of the Java Virtual Machine are encouraged to avoid splitting 64-bit values where possible.
So I think that on some JVMs you may never see such a split, because of how the JVM is implemented.
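The same section (JLS 17.7) also states that writes and reads of volatile long and double values are always atomic, so code that must not depend on the JVM's choice can declare the field volatile (or use AtomicLong). A small sketch of that guarantee, with a hypothetical class name VolatileLongSketch; the writer alternates between two bit patterns whose halves differ, and the reader counts any value that is neither:

```java
class VolatileLongSketch {
    // volatile makes every read and write of this long atomic (JLS 17.7).
    private static volatile long shared = 0L;

    // Returns how many reads saw a value that was never written (a torn read).
    static int countTornReads(int iterations) {
        Thread writer = new Thread(() -> {
            for (int n = 0; n < iterations; n++) {
                shared = (n % 2 == 0) ? 0L : -1L;   // all bits clear / all bits set
            }
        });
        int[] torn = {0};
        Thread reader = new Thread(() -> {
            for (int n = 0; n < iterations; n++) {
                long v = shared;
                if (v != 0L && v != -1L) {
                    torn[0]++;                      // half of one write, half of another
                }
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();                          // join makes torn[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return torn[0];
    }

    public static void main(String[] args) {
        System.out.println("torn reads: " + countTornReads(1_000_000));
    }
}
```

With the volatile modifier the count is 0 on any conforming JVM; without it the result is implementation-specific, as the quoted text says.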
I have what probably is a basic question. When I create 100 million Hashtables it takes approximately 6 seconds (runtime = 6 seconds per core) on my machine if I do it on a single core. If I do this multi-threaded on 12 cores (my machine has 6 cores that allow hyperthreading) it takes around 10 seconds (runtime = 112 seconds per core).
This is the code I use:
Main
public class Tests
{
public static void main(String args[])
{
double start = System.currentTimeMillis();
int nThreads = 12;
double[] runTime = new double[nThreads];
TestsThread[] threads = new TestsThread[nThreads];
int totalJob = 100000000;
int jobsize = totalJob/nThreads;
for(int i = 0; i < threads.length; i++)
{
threads[i] = new TestsThread(jobsize,runTime, i);
threads[i].start();
}
waitThreads(threads);
for(int i = 0; i < runTime.length; i++)
{
System.out.println("Runtime thread:" + i + " = " + (runTime[i]/1000000) + "ms");
}
double end = System.currentTimeMillis();
System.out.println("Total runtime = " + (end-start) + " ms");
}
private static void waitThreads(TestsThread[] threads)
{
for(int i = 0; i < threads.length; i++)
{
while(threads[i].finished == false)//keep waiting untill the thread is done
{
//System.out.println("waiting on thread:" + i);
try {
Thread.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
}
Thread
import java.util.HashMap;
import java.util.Map;
public class TestsThread extends Thread
{
int jobSize = 0;
double[] runTime;
boolean finished;
int threadNumber;
TestsThread(int job, double[] runTime, int threadNumber)
{
this.finished = false;
this.jobSize = job;
this.runTime = runTime;
this.threadNumber = threadNumber;
}
public void run()
{
double start = System.nanoTime();
for(int l = 0; l < jobSize ; l++)
{
double[] test = new double[65];
}
double end = System.nanoTime();
double difference = end-start;
runTime[threadNumber] += difference;
this.finished = true;
}
}
I do not understand why creating the objects simultaneously in multiple threads takes longer per thread than doing it serially in only one thread. If I remove the line where I create the Hashtable, this problem disappears. If anyone could help me with this, I would be very grateful.
Update: This problem has an associated bug report and has been fixed with Java 1.7u40. And it was never an issue for Java 1.8 as Java 8 has an entirely different hash table algorithm.
Since you are not using the created objects, that operation will get optimized away. So you’re only measuring the overhead of creating threads, and that overhead grows with the number of threads you start.
I have to correct my answer regarding a detail I didn’t know before: there is something special about the classes Hashtable and HashMap. They both invoke sun.misc.Hashing.randomHashSeed(this) in the constructor. In other words, their instances escape during construction, which has an impact on memory visibility. This implies that their construction, unlike, say, that of an ArrayList, cannot be optimized away, and multi-threaded construction slows down due to what happens inside that method (i.e. synchronization).
As said, that’s special to these classes and of course this implementation (my setup:1.7.0_13). For ordinary classes the construction time goes straight to zero for such code.
Here I add a more sophisticated benchmark code. Watch the difference between DO_HASH_MAP = true and DO_HASH_MAP = false (when false it will create an ArrayList instead which has no such special behavior).
import java.util.*;
import java.util.concurrent.*;
public class AllocBench {
static final int NUM_THREADS = 1;
static final int NUM_OBJECTS = 100000000 / NUM_THREADS;
static final boolean DO_HASH_MAP = true;
public static void main(String[] args) throws InterruptedException, ExecutionException {
ExecutorService threadPool = Executors.newFixedThreadPool(NUM_THREADS);
Callable<Long> task=new Callable<Long>() {
public Long call() {
return doAllocation(NUM_OBJECTS);
}
};
long startTime=System.nanoTime(), cpuTime=0;
for(Future<Long> f: threadPool.invokeAll(Collections.nCopies(NUM_THREADS, task))) {
cpuTime+=f.get();
}
long time=System.nanoTime()-startTime;
System.out.println("Number of threads: "+NUM_THREADS);
System.out.printf("entire allocation required %.03f s%n", time*1e-9);
System.out.printf("time x numThreads %.03f s%n", time*1e-9*NUM_THREADS);
System.out.printf("real accumulated cpu time %.03f s%n", cpuTime*1e-9);
threadPool.shutdown();
}
static long doAllocation(int numObjects) {
long t0=System.nanoTime();
for(int i=0; i<numObjects; i++)
if(DO_HASH_MAP) new HashMap<Object, Object>(); else new ArrayList<Object>();
return System.nanoTime()-t0;
}
}
What about if you do it on 6 cores? Hyperthreading isn't the exact same as having double the cores, so you might want to try the amount of real cores too.
Also the OS won't necessarily schedule each of your threads to their own cores.
Since all you are doing is measuring the time and churning memory, your bottleneck is likely to be in your L3 cache or the bus to main memory. In this case, coordinating the work between threads could be producing so much overhead that it is worse instead of better.
This is too long for a comment but your inner loop can be just
double start = System.nanoTime();
for(int l = 0; l < jobSize ; l++){
Map<String,Integer> test = new HashMap<String,Integer>();
}
// runtime is an AtomicLong for thread safety
runtime.addAndGet(System.nanoTime() - start); // time in nano-seconds.
Taking the time can be as slow as creating a HashMap, so you might not be measuring what you think you are if you call the timer too often.
BTW Hashtable is synchronized and you might find using HashMap is faster, and possibly more scalable.
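Put together, the loop above could sit in a small harness like the following. This is a sketch, not a rigorous benchmark: the class name AllocTimingSketch and the method totalNanos are assumptions, and the JIT may still remove allocations whose results are unused. It calls the timer once before and once after the loop, accumulates into an AtomicLong, and waits with join() instead of polling the plain boolean finished field from the question, which gives no visibility guarantee across threads:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

class AllocTimingSketch {
    // Returns the per-thread times summed over all threads, in nanoseconds.
    static long totalNanos(int nThreads, int jobSize) {
        AtomicLong runtime = new AtomicLong();       // thread-safe accumulator
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                long start = System.nanoTime();      // one timer call before the loop
                for (int l = 0; l < jobSize; l++) {
                    Map<String, Integer> test = new HashMap<>();
                }
                runtime.addAndGet(System.nanoTime() - start); // and one after it
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();                            // also makes the thread's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return runtime.get();
    }

    public static void main(String[] args) {
        System.out.println(totalNanos(4, 1_000_000) / 1_000_000 + " ms summed over threads");
    }
}
```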
I've programmed a (very simple) benchmark in Java. It simply increments a double value up to a specified value and takes the time.
When I use this singlethreaded or with a low amount of threads (up to 100) on my 6-core desktop, the benchmark returns reasonable and repeatable results.
But when I use, for example, 1200 threads, the average multicore duration is significantly lower than the singlecore duration (by a factor of about 10 or more). I've made sure that the total number of increments is the same no matter how many threads I use.
Why does the performance drop so much with more threads? Is there a trick to solve this problem?
I'm posting my source, but I don't think the problem is there.
Benchmark.java:
package sibbo.benchmark;
import java.text.DecimalFormat;
import java.util.LinkedList;
import java.util.List;
public class Benchmark implements TestFinishedListener {
private static final double TARGET = 1e10;
private static final int THREAD_MULTIPLICATOR = 2;
public static void main(String[] args) throws InterruptedException {
Benchmark b = new Benchmark(TARGET);
b.start();
}
private int coreCount;
private List<Worker> workers = new LinkedList<>();
private List<Worker> finishedWorkers = new LinkedList<>();
private double target;
public Benchmark(double target) {
this.target = target;
getSystemInfos();
printInfos();
}
private void getSystemInfos() {
coreCount = Runtime.getRuntime().availableProcessors();
}
private void printInfos() {
System.out.println("Usable cores: " + coreCount);
System.out.println("Multicore threads: " + coreCount * THREAD_MULTIPLICATOR);
System.out.println("Loops per core: " + new DecimalFormat("###,###,###,###,##0").format(TARGET));
System.out.println();
}
public synchronized void start() throws InterruptedException {
Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
System.out.print("Initializing singlecore benchmark... ");
Worker w = new Worker(this, 0);
workers.add(w);
Thread.sleep(1000);
System.out.println("finished");
System.out.print("Running singlecore benchmark... ");
w.runBenchmark(target);
wait();
System.out.println("finished");
printResult();
System.out.println();
// Multicore
System.out.print("Initializing multicore benchmark... ");
finishedWorkers.clear();
for (int i = 0; i < coreCount * THREAD_MULTIPLICATOR; i++) {
workers.add(new Worker(this, i));
}
Thread.sleep(1000);
System.out.println("finished");
System.out.print("Running multicore benchmark... ");
for (Worker worker : workers) {
worker.runBenchmark(target / THREAD_MULTIPLICATOR);
}
wait();
System.out.println("finished");
printResult();
Thread.currentThread().setPriority(Thread.NORM_PRIORITY);
}
private void printResult() {
DecimalFormat df = new DecimalFormat("###,###,###,##0.000");
long min = -1, av = 0, max = -1;
int threadCount = 0;
boolean once = true;
System.out.println("Result:");
for (Worker w : finishedWorkers) {
if (once) {
once = false;
min = w.getTime();
max = w.getTime();
}
if (w.getTime() > max) {
max = w.getTime();
}
if (w.getTime() < min) {
min = w.getTime();
}
threadCount++;
av += w.getTime();
if (finishedWorkers.size() <= 6) {
System.out.println("Worker " + w.getId() + ": " + df.format(w.getTime() / 1e9) + "s");
}
}
System.out.println("Min: " + df.format(min / 1e9) + "s, Max: " + df.format(max / 1e9) + "s, Av per Thread: "
+ df.format((double) av / threadCount / 1e9) + "s");
}
@Override
public synchronized void testFinished(Worker w) {
workers.remove(w);
finishedWorkers.add(w);
if (workers.isEmpty()) {
notify();
}
}
}
Worker.java:
package sibbo.benchmark;
public class Worker implements Runnable {
private double value = 0;
private long time;
private double target;
private TestFinishedListener l;
private final int id;
public Worker(TestFinishedListener l, int id) {
this.l = l;
this.id = id;
new Thread(this).start();
}
public int getId() {
return id;
}
public synchronized void runBenchmark(double target) {
this.target = target;
notify();
}
public long getTime() {
return time;
}
@Override
public void run() {
synWait();
value = 0;
long startTime = System.nanoTime();
while (value < target) {
value++;
}
long endTime = System.nanoTime();
time = endTime - startTime;
l.testFinished(this);
}
private synchronized void synWait() {
try {
wait();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
You need to understand that the OS (or Java thread scheduler, or both) is trying to balance between all of the threads in your application to give them all a chance to perform some work, and there is a non-zero cost to switch between threads. With 1200 threads, you have just reached (and probably far exceeded) the tipping point wherein the processor is spending more time context switching than doing actual work.
Here is a rough analogy:
You have one job to do in room A. You stand in room A for 8 hours a day, and do your job.
Then your boss comes by and tells you that you have to do a job in room B also. Now you need to periodically leave room A, walk down the hall to room B, and then walk back. That walking takes 1 minute per day. Now you spend 3 hours, 59.5 minutes working on each job, and one minute walking between rooms.
Now imagine that you have 1200 rooms to work in. You are going to spend more time walking between rooms than doing actual work. This is the situation that you have put your processor into. It is spending so much time switching between contexts that no real work gets done.
EDIT: Now, as per the comments below, maybe you spend a fixed amount of time in each room before moving on. Your work will progress, but the number of context switches between rooms still affects the overall runtime of a single task.
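One common way to keep the number of "rooms" small is to split the work into many tasks but run them on a pool with roughly one thread per core. A minimal sketch, assuming a hypothetical class BoundedPoolSketch and a simple counting loop in place of the real benchmark body:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BoundedPoolSketch {
    // Splits totalIncrements into `tasks` equal chunks and runs them on one thread per core.
    // Returns the number of increments actually performed.
    static long run(long totalIncrements, int tasks) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            long chunk = totalIncrements / tasks;
            List<Callable<Long>> work = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                work.add(() -> {
                    long value = 0;
                    while (value < chunk) {
                        value++;
                    }
                    return value;
                });
            }
            long done = 0;
            for (Future<Long> f : pool.invokeAll(work)) { // blocks until all tasks are done
                done += f.get();
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(1_200_000_000L, 1200));
    }
}
```

With 1200 tasks the pool still uses only as many threads as there are cores, so each thread finishes one task before starting the next rather than competing with 1199 others.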
OK, I think I've found my problem, but so far no solution.
When measuring how long every thread runs to do its part of the work, the possible minimum differs with the total number of threads, while the maximum is the same every time: it occurs when a thread is started first, is then paused very often, and finishes last. For example, this maximum could be 10 seconds. Since the total number of operations stays the same no matter how many threads I use, the number of operations done by a single thread changes with the thread count. For example, with one thread, it has to do 1000 operations, but with ten threads, each of them has to do just 100 operations. So with ten threads, the minimum time one thread can take is much lower than with one thread: it would be 1 second, which happens if a thread does its work without interruption. Calculating the average time every thread needs to do its work is therefore nonsense.
EDIT
The solution would be to simply measure the amount of time between the start of the first thread and the completion of the last.
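That measurement could look roughly like this. It is a sketch with assumed names (WallClockSketch, measure): a CountDownLatch releases all threads at once, and the clock stops after the last join() returns. As with any hand-rolled benchmark, the JIT is free to optimize the counting loop, so the numbers are only indicative:

```java
import java.util.concurrent.CountDownLatch;

class WallClockSketch {
    // Returns the nanoseconds between releasing all threads and the last one finishing.
    static long measure(int threadCount, long incrementsPerThread) {
        CountDownLatch startGate = new CountDownLatch(1);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    startGate.await();               // wait until every thread has been created
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                double value = 0;
                while (value < incrementsPerThread) {
                    value++;
                }
            });
            threads[i].start();
        }
        long start = System.nanoTime();
        startGate.countDown();                       // all threads begin working now
        for (Thread t : threads) {
            try {
                t.join();                            // returns once that thread has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return System.nanoTime() - start;            // wall-clock time for the whole batch
    }

    public static void main(String[] args) {
        System.out.println(measure(12, 100_000_000L) / 1e9 + " s");
    }
}
```

This single number is comparable across thread counts, which the per-thread average is not.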