In Java, I have simple multithreaded code:
public class ThreadedAlgo {
public static final int threadsCount = 3;
public static void main(String[] args) {
// start timer prior computation
long time = System.currentTimeMillis();
// create threads
Thread[] threads = new Thread[threadsCount];
class ToDo implements Runnable {
public void run() { ... }
}
// create job objects
for (int i = 0; i < threadsCount; i++) {
ToDo job = new ToDo();
threads[i] = new Thread(job);
}
// start threads
for (int i = 0; i < threadsCount; i++) {
threads[i].start();
}
// wait for threads above to finish
for (int i = 0; i < threadsCount; i++) {
try {
threads[i].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
// display time after computation
System.out.println("Execution time: " + (System.currentTimeMillis() - time));
}
}
It works fine. Now I want to run it with 2 or 3 threads and measure the time each thread spends on its computation. Then I will compare the times: call them t1 and t2, and if |t1 - t2| < some small epsilon, I will say that my algorithm performs with fine granularity under the given conditions, i.e. the time spent by the threads is roughly the same.
How can I measure the time of a thread?
Use System.nanoTime() at the beginning and end of each thread's run (job) method to calculate the total time spent in each invocation. In your case, all threads run with the same (default) priority, so time slices should be distributed fairly evenly. If your threads contend on shared locks, use fair locks for the same reason, e.g. new ReentrantLock(true).
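For illustration, here is a minimal sketch of a fair lock guarding a shared counter; the class and field names are placeholders, not from the question:
import java.util.concurrent.locks.ReentrantLock;

public class FairLockExample {
    // fair lock: waiting threads acquire it roughly in arrival order
    private static final ReentrantLock lock = new ReentrantLock(true);
    private static long counter = 0;

    static void increment() {
        lock.lock();
        try {
            counter++; // critical section shared by all worker threads
        } finally {
            lock.unlock(); // always release, even if the critical section throws
        }
    }
}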
Add the timing logic inside your run() methods.
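A sketch of what that could look like for the code above; ToDo and threadsCount come from the question, while the elapsed array and the index passed to each job are assumptions added for illustration:
import java.util.concurrent.TimeUnit;

public class TimedThreads {
    public static final int threadsCount = 3;
    // one slot per thread to store its run time in nanoseconds
    static final long[] elapsed = new long[threadsCount];

    static class ToDo implements Runnable {
        private final int index;
        ToDo(int index) { this.index = index; }

        public void run() {
            long start = System.nanoTime();
            // ... the actual computation goes here ...
            elapsed[index] = System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[threadsCount];
        for (int i = 0; i < threadsCount; i++) {
            threads[i] = new Thread(new ToDo(i));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        for (int i = 0; i < threadsCount; i++) {
            System.out.println("Thread " + i + ": " + TimeUnit.NANOSECONDS.toMillis(elapsed[i]) + " ms");
        }
    }
}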
Related
The task I'm trying to implement is finding the Collatz sequence for every number in a given interval using several threads, and seeing how much improvement is gained compared to one thread.
However, one thread is always faster no matter how many threads I choose (edit: 2 threads are faster, but not by much, while 4 threads are slower than 1 thread, and I have no idea why; I could even say that the more threads I use, the slower it gets). I hope someone can explain. Maybe I'm doing something wrong.
Below is the code I have written so far. I'm using a ThreadPoolExecutor for executing the tasks (one task = one Collatz sequence for one number in the interval).
The Collatz class:
public class ParallelCollatz implements Runnable {
private long result;
private long inputNum;
public long getResult() {
return result;
}
public void setResult(long result) {
this.result = result;
}
public long getInputNum() {
return inputNum;
}
public void setInputNum(long inputNum) {
this.inputNum = inputNum;
}
public void run() {
//System.out.println("number:" + inputNum);
//System.out.println("Thread:" + Thread.currentThread().getId());
//int j=0;
//if(Thread.currentThread().getId()==11) {
// ++j;
// System.out.println(j);
//}
long result = 1;
//main recursive computation
while (inputNum > 1) {
if (inputNum % 2 == 0) {
inputNum = inputNum / 2;
} else {
inputNum = inputNum * 3 + 1;
}
++result;
}
// try {
//Thread.sleep(10);
//} catch (InterruptedException e) {
// TODO Auto-generated catch block
// e.printStackTrace();
//}
this.result=result;
return;
}
}
And the main class where I run the threads (yes, for now I create two lists with the same numbers, since after running with one thread the initial values are lost):
ThreadPoolExecutor executor = (ThreadPoolExecutor)Executors.newFixedThreadPool(1);
ThreadPoolExecutor executor2 = (ThreadPoolExecutor)Executors.newFixedThreadPool(4);
List<ParallelCollatz> tasks = new ArrayList<ParallelCollatz>();
for(int i=1; i<=1000000; i++) {
ParallelCollatz task = new ParallelCollatz();
task.setInputNum((long)(i+1000000));
tasks.add(task);
}
long startTime = System.nanoTime();
for(int i=0; i<1000000; i++) {
executor.execute(tasks.get(i));
}
executor.shutdown();
boolean tempFirst=false;
try {
tempFirst =executor.awaitTermination(5, TimeUnit.HOURS);
} catch (InterruptedException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
System.out.println("tempFirst " + tempFirst);
long endTime = System.nanoTime();
long durationInNano = endTime - startTime;
long durationInMillis = TimeUnit.NANOSECONDS.toMillis(durationInNano); // total execution time in milliseconds
System.out.println("laikas " +durationInMillis);
List<ParallelCollatz> tasks2 = new ArrayList<ParallelCollatz>();
for(int i=1; i<=1000000; i++) {
ParallelCollatz task = new ParallelCollatz();
task.setInputNum((long)(i+1000000));
tasks2.add(task);
}
long startTime2 = System.nanoTime();
for(int i=0; i<1000000; i++) {
executor2.execute(tasks2.get(i));
}
executor2.shutdown();
boolean temp =false;
try {
temp=executor2.awaitTermination(5, TimeUnit.HOURS);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("temp "+ temp);
long endTime2 = System.nanoTime();
long durationInNano2 = endTime2 - startTime2;
long durationInMillis2 = TimeUnit.NANOSECONDS.toMillis(durationInNano2); // total execution time in milliseconds
System.out.println("laikas2 " +durationInMillis2);
For example running with one thread it completes in 3280ms. Running with two threads 3437ms. Should I be considering another concurrent structure for calculating each element?
EDIT
Clarification: I'm not trying to parallelize individual sequences, but an interval of numbers where each number has its own sequence (which is unrelated to the other numbers).
EDIT2
Today I ran the program on a good PC with 6 cores and 12 logical processors, and the issue persists. Does anyone have an idea where the problem might be? I also updated my code. 4 threads do worse than 2 threads for some reason (even worse than 1 thread). I also applied what was suggested in the answer, but nothing changed.
Another Edit
What I have noticed is that if I put a Thread.sleep(1) in my ParallelCollatz run() method, the performance gradually increases with the thread count. Perhaps this detail tells someone what is wrong? However, no matter how many tasks I submit, if there is no Thread.sleep(1), 2 threads perform fastest, 1 thread is in 2nd place, and the rest land around a similar number of milliseconds, slower than both 1 and 2 threads.
New Edit
I also tried putting more work into the run() method of the Runnable class (a for loop calculating not 1 but 10 or 100 Collatz sequences) so that each task would do more work. Unfortunately, this did not help either.
Perhaps I'm launching the tasks incorrectly? Anyone any ideas?
EDIT
So it would seem that adding more work to the run method fixes it a bit, but for more threads (8+) the issue still remains. I still wonder whether the cause is that it takes more time to create and run the threads than to execute the tasks. Or should I create a new post with this question?
You are not waiting for your tasks to complete, only measuring the time it takes to submit them to the executor.
executor.shutdown() does not wait for all tasks to finish. You need to call executor.awaitTermination() after that.
executor.shutdown();
executor.awaitTermination(5, TimeUnit.HOURS);
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html#shutdown()
Update
I believe the testing methodology is flawed. I repeated your test on my machine (1 processor, 2 cores, 4 logical processors), and the time measured differed wildly from run to run.
I believe the following are the main reasons:
JVM startup & JIT compilation time. At the beginning, the code is running in interpreted mode.
The result of the calculation is ignored. I have no intuition about what the JIT removes and what we are actually measuring.
println calls in the code.
To test this, I converted your test to JMH.
In particular:
I converted the Runnable to a Callable and return the sum of the results so the computation cannot be optimized away (alternatively, you can use Blackhole from JMH).
My tasks have no state; I moved all moving parts into local variables, so no GC is needed to clean up the tasks.
I still create executors in each round. This is not perfect, but I decided to keep it as is.
The results I received below are consistent with my expectations: one core is waiting in the main thread, the work is performed on a single core, and the numbers are roughly the same.
Benchmark Mode Cnt Score Error Units
SpeedTest.multipleThreads avgt 20 559.996 ± 20.181 ms/op
SpeedTest.singleThread avgt 20 562.048 ± 16.418 ms/op
Updated code:
public class ParallelCollatz implements Callable<Long> {
private final long inputNumInit;
public ParallelCollatz(long inputNumInit) {
this.inputNumInit = inputNumInit;
}
@Override
public Long call() {
long result = 1;
long inputNum = inputNumInit;
//main recursive computation
while (inputNum > 1) {
if (inputNum % 2 == 0) {
inputNum = inputNum / 2;
} else {
inputNum = inputNum * 3 + 1;
}
++result;
}
return result;
}
}
and the benchmark itself:
@State(Scope.Benchmark)
public class SpeedTest {
private static final int NUM_TASKS = 1000000;
private static List<ParallelCollatz> tasks = buildTasks();
@Benchmark
@Fork(value = 1, warmups = 1)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@SuppressWarnings("unused")
public long singleThread() throws Exception {
ThreadPoolExecutor executorOneThread = (ThreadPoolExecutor) Executors.newFixedThreadPool(1);
return measureTasks(executorOneThread, tasks);
}
@Benchmark
@Fork(value = 1, warmups = 1)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@SuppressWarnings("unused")
public long multipleThreads() throws Exception {
ThreadPoolExecutor executorMultipleThread = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
return measureTasks(executorMultipleThread, tasks);
}
private static long measureTasks(ThreadPoolExecutor executor, List<ParallelCollatz> tasks) throws InterruptedException, ExecutionException {
long sum = runTasksInExecutor(executor, tasks);
return sum;
}
private static long runTasksInExecutor(ThreadPoolExecutor executor, List<ParallelCollatz> tasks) throws InterruptedException, ExecutionException {
List<Future<Long>> futures = new ArrayList<>(NUM_TASKS);
for (int i = 0; i < NUM_TASKS; i++) {
Future<Long> f = executor.submit(tasks.get(i));
futures.add(f);
}
executor.shutdown();
boolean tempFirst = false;
try {
tempFirst = executor.awaitTermination(5, TimeUnit.HOURS);
} catch (InterruptedException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
long sum = 0l;
for (Future<Long> f : futures) {
sum += f.get();
}
//System.out.println(sum);
return sum;
}
private static List<ParallelCollatz> buildTasks() {
List<ParallelCollatz> tasks = new ArrayList<>();
for (int i = 1; i <= NUM_TASKS; i++) {
ParallelCollatz task = new ParallelCollatz((long) (i + NUM_TASKS));
tasks.add(task);
}
return tasks;
}
}
Feel free to correct me if I am wrong!
The synchronized keyword in Java prevents a method from being run by different threads simultaneously. In my program I have 4 different threads that run at the same time, each counting to 100,000.
When I add the synchronized keyword to the method being called, shouldn't it take roughly four times as long as the multithreaded version?
Executing the program either way takes roughly 16 seconds.
Here's my code!
public class ExerciseThree {
public static void main(String[] args) {
Even even = new Even();
Thread t1 = new Thread(() -> {
for (int i = 0; i < 100000; i++) {
System.out.println(even.next());
}
});
Thread t2 = new Thread(() -> {
for (int i = 0; i < 100000; i++) {
System.out.println(even.next());
}
});
Thread t3 = new Thread(() -> {
for (int i = 0; i < 100000; i++) {
System.out.println(even.next());
}
});
Thread t4 = new Thread(() -> {
for (int i = 0; i < 100000; i++) {
System.out.println(even.next());
}
});
System.out.println("starting thread 1");
t1.start();
System.out.println("starting thread 2");
t2.start();
System.out.println("starting thread 3");
t3.start();
System.out.println("starting thread 4");
t4.start();
}
}
The method being called by the threads
public class Even {
private int n = 0;
// public synchronized int next() {
public int next() {
n++;
n++;
return n;
}
}
As already pointed out in the comment section, microbenchmarking is a complex matter as many factors influence the execution time (e.g., just-in-time compilation and garbage collection). A good reference was already provided in the comments section, but I suggest that you also take a look at my answer for a similar question which links to an external resource by Peter Sestoft that provides a very good introduction to microbenchmarking and what one needs to be aware of.
It has already been mentioned that println() has no place in a microbenchmark like this. In addition, I'd like to point out that you should use some sort of synchronization mechanism (e.g., a CountDownLatch) to make sure that the four threads start performing their work at the same time. The overhead involved in creating and starting the threads may result in the earlier threads getting a headstart on their work during the time it takes for the later ones to start, thereby creating less contention for the even lock than what you expect. This could for example look something like this:
public class ExerciseThree {
public static void main(String[] args) throws InterruptedException {
final CountDownLatch startSignal = new CountDownLatch(1);
final CountDownLatch threadReadyCheck = new CountDownLatch(4);
final CountDownLatch threadDoneCheck = new CountDownLatch(4);
Even even = new Even();
Thread t1 = new Thread(() -> {
threadReadyCheck.countDown();
try {
startSignal.await();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
for (int i = 0; i < 100000; i++) {
even.next();
}
threadDoneCheck.countDown();
});
Thread t2 = new Thread(() -> {
threadReadyCheck.countDown();
try {
startSignal.await();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
for (int i = 0; i < 100000; i++) {
even.next();
}
threadDoneCheck.countDown();
});
Thread t3 = new Thread(() -> {
threadReadyCheck.countDown();
try {
startSignal.await();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
for (int i = 0; i < 100000; i++) {
even.next();
}
threadDoneCheck.countDown();
});
Thread t4 = new Thread(() -> {
threadReadyCheck.countDown();
try {
startSignal.await();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
for (int i = 0; i < 100000; i++) {
even.next();
}
threadDoneCheck.countDown();
});
t1.start();
t2.start();
t3.start();
t4.start();
// Wait until all threads are ready to perform their work.
threadReadyCheck.await();
// All threads ready.
// This is where you log start time.
long start = System.nanoTime();
// Let threads progress to perform their actual work.
startSignal.countDown();
// Wait for threads to finish their work.
threadDoneCheck.await();
long end = System.nanoTime();
// Note that this is again subject to many factors, for example when the main thread gets scheduled again after the workers terminate.
long executionTime = end - start;
}
}
With println being much more expensive than the computation, the test is really about concurrent execution of println. However, println itself is synchronized, so there can be no speedup.
Without it, doing just
public int next() {
n++;
n++;
return n;
}
is subject to many optimizations. In particular, the double increment can be replaced by n += 2, and the return is eliminated because the returned value is not used. A loop like
for (int i = 0; i < 100000; i++) {
even.next();
}
can be reduced to just n += 200000.
Benchmarking is hard in general, and especially so in Java. By all means, use JMH, which takes care of most of these problems.
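As a rough sketch of what a JMH version of this comparison could look like (the annotations, thread count, and class name are illustrative choices; Even is copied from the question with the synchronized variant enabled):
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class EvenBenchmark {

    // the counter from the question, with synchronized enabled
    static class Even {
        private int n = 0;
        public synchronized int next() {
            n++;
            n++;
            return n;
        }
    }

    private final Even even = new Even();

    // four benchmark threads share one Even instance, so the synchronized
    // method actually sees contention during the measurement
    @Benchmark
    @Threads(4)
    public int next() {
        // returning the value keeps the JIT from discarding the call
        return even.next();
    }
}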
I wrote the code below trying to run two threads that each call a function in a for loop, but it takes the same time as if I ran it sequentially without multiple threads. Any thoughts on why the multithreading here is not working? Is there a better way to do it? For example, if I wanted 10 threads, with my code I would have to write 10 duplicate run() methods, so I wonder if there is an easier way to set the number of threads. Also, is it possible to create a number of threads depending on the loop counter, so that each loop iteration gets its own thread, i.e. for 10 iterations, 10 threads run concurrently to finish the processing very fast?
private Thread t1 = new Thread(){
public void run(){
for (int i = 0; i < 2; i++)
{
try {
myfn(i);
} catch (IOException e) {
e.printStackTrace();
}
}
}
};
private Thread t2 = new Thread(){
public void run(){
for (int i = 2; i < 4; i++)
{
try {
myfn(i);
} catch (IOException e) {
e.printStackTrace();
}
}
}
};
public Results getResults() throws IOException, SocketTimeoutException {
t1.start();
t2.start();
try {
t1.join(0);
} catch (InterruptedException e) {
e.printStackTrace();
}
try {
t2.join(0);
} catch (InterruptedException e) {
e.printStackTrace();
}
For running the same task across multiple threads, you're probably looking for a thread pool. Java provides a ThreadPoolExecutor for this.
Here is an introduction to Java concurrency with the following example:
ExecutorService executor = Executors.newFixedThreadPool(1);
Future<Integer> future = executor.submit(() -> {
try {
TimeUnit.SECONDS.sleep(2);
return 123;
}
catch (InterruptedException e) {
throw new IllegalStateException("task interrupted", e);
}
});
future.get(1, TimeUnit.SECONDS);
That example specifically creates a pool with only a single thread, but the parameter to Executors.newFixedThreadPool controls how many threads will be used.
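For the loop in the question, a sketch along these lines could spread the calls across a configurable number of threads; myfn is the method from the question (stubbed here), and the pool size and range are illustrative:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MyFnRunner {
    static void myfn(int i) { /* placeholder for the real work */ }

    public static void main(String[] args) throws InterruptedException {
        int nThreads = 10;
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        // submit one task per loop index; the pool decides which thread runs it
        for (int i = 0; i < 10; i++) {
            final int index = i;
            pool.submit(() -> myfn(index));
        }
        pool.shutdown();                          // no new tasks accepted
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for submitted tasks to finish
    }
}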
I'm not sure from your original question why you think two threads aren't being utilized.
public class MyThread extends Thread {
private int initValue = 0;
private int upperBound = 0;
public MyThread(int init, int ub) {
this.initValue = init;
this.upperBound = ub;
}
public void run() {
for (int i = initValue; i < upperBound; i++) {
myfn(i);
}
}
}
Create threads and start them:
List<Thread> threads = new ArrayList<>();
threads.add(new MyThread(0, 2));
threads.add(new MyThread(2, 4));
for (Thread t : threads) {
t.start();
}
for (Thread t : threads) {
try {
t.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
I wrote the code below trying to run two threads that each call a function in a for loop, but it takes the same time as if I ran it sequentially without multiple threads.
There are many reasons why that can happen although it's hard to know what is going on without seeing the myfn(...) code. Here are some possible reasons:
It could be that myfn runs so quickly that running it in different threads isn't going to be any faster.
It could be that myfn is waiting on some other resource in which case the threads can't really run concurrently.
It could be that myfn is blocking on IO (network or disk) and even though you are doing 2 (or more) of them at a time, the disk or the remote server can't handle the increased requests any faster.
Is there a better way to do it? For example, if I wanted 10 threads, with my code I would have to write 10 duplicate run() methods...
The right thing to do here is to create your own class that takes the lower and upper bounds. It should implement Runnable rather than extend Thread. Something like:
public class MyRunnable implements Runnable {
private final int start;
private final int end;
public MyRunnable(int start, int end) {
this.start = start;
this.end = end;
}
public void run() {
for (int i = start; i < end; i++) {
myfn(i);
}
}
}
You can then either start the threads by hand or use an ExecutorService which makes the thread maintenance a lot easier:
// this will start a new thread for every job
ExecutorService threadPool = Executors.newCachedThreadPool();
threadPool.submit(new MyRunnable(0, 2));
threadPool.submit(new MyRunnable(2, 4));
// once you've submitted your last task, you shutdown the pool
threadPool.shutdown();
// then we wait until all of the tasks have run
threadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
You don't need to copy your threads / loop 10 times, just take the logic and use it appropriately.
public class ExampleThread extends Thread {
private final int start, iterations;
public ExampleThread(int start, int iterations) {
this.start = start;
this.iterations = iterations;
}
@Override public void run() {
for (int i = 0; i < iterations; i++) {
myfn(start + i);
}
}
}
int iterations = 2;
List<Thread> threads = new ArrayList<>();
for (int threadId = 0; threadId < 10; threadId++) {
threads.add(new ExampleThread(threadId * iterations, iterations));
}
threads.forEach(Thread::start);
threads.forEach(t -> {
try {
t.join(0);
} catch (Exception e) {
e.printStackTrace(System.err);
}
});
I have what is probably a basic question. When I create 100 million Hashtables it takes approximately 6 seconds (runtime = 6 seconds per core) on my machine if I do it on a single core. If I do it multi-threaded on 12 cores (my machine has 6 cores that allow hyperthreading) it takes around 10 seconds (runtime = 112 seconds per core).
This is the code I use:
Main
public class Tests
{
public static void main(String args[])
{
double start = System.currentTimeMillis();
int nThreads = 12;
double[] runTime = new double[nThreads];
TestsThread[] threads = new TestsThread[nThreads];
int totalJob = 100000000;
int jobsize = totalJob/nThreads;
for(int i = 0; i < threads.length; i++)
{
threads[i] = new TestsThread(jobsize,runTime, i);
threads[i].start();
}
waitThreads(threads);
for(int i = 0; i < runTime.length; i++)
{
System.out.println("Runtime thread:" + i + " = " + (runTime[i]/1000000) + "ms");
}
double end = System.currentTimeMillis();
System.out.println("Total runtime = " + (end-start) + " ms");
}
private static void waitThreads(TestsThread[] threads)
{
for(int i = 0; i < threads.length; i++)
{
while(threads[i].finished == false) // keep waiting until the thread is done
{
//System.out.println("waiting on thread:" + i);
try {
Thread.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
}
Thread
import java.util.HashMap;
import java.util.Map;
public class TestsThread extends Thread
{
int jobSize = 0;
double[] runTime;
boolean finished;
int threadNumber;
TestsThread(int job, double[] runTime, int threadNumber)
{
this.finished = false;
this.jobSize = job;
this.runTime = runTime;
this.threadNumber = threadNumber;
}
public void run()
{
double start = System.nanoTime();
for(int l = 0; l < jobSize ; l++)
{
double[] test = new double[65];
}
double end = System.nanoTime();
double difference = end-start;
runTime[threadNumber] += difference;
this.finished = true;
}
}
I do not understand why creating the objects simultaneously in multiple threads takes longer per thread than doing it serially in only 1 thread. If I remove the line where I create the Hashtable, this problem disappears. If anyone could help me with this I would be very thankful.
Update: This problem has an associated bug report and has been fixed with Java 1.7u40. And it was never an issue for Java 1.8 as Java 8 has an entirely different hash table algorithm.
Since you are not using the created objects, that operation will get optimized away, so you're only measuring the overhead of creating threads. Naturally, the more threads you start, the more overhead you get.
I have to correct my answer regarding a detail I didn't know yet: there is something special about the classes Hashtable and HashMap. They both invoke sun.misc.Hashing.randomHashSeed(this) in the constructor; in other words, their instances escape during construction, which has an impact on memory visibility. This implies that their construction, unlike, say, an ArrayList's, cannot be optimized away, and multi-threaded construction slows down due to what happens inside that method (i.e. synchronization).
As said, that's special to these classes and, of course, to this implementation (my setup: 1.7.0_13). For ordinary classes the construction time goes straight to zero with such code.
Here I add a more sophisticated benchmark. Watch the difference between DO_HASH_MAP = true and DO_HASH_MAP = false (when false, it will create an ArrayList instead, which has no such special behavior).
import java.util.*;
import java.util.concurrent.*;
public class AllocBench {
static final int NUM_THREADS = 1;
static final int NUM_OBJECTS = 100000000 / NUM_THREADS;
static final boolean DO_HASH_MAP = true;
public static void main(String[] args) throws InterruptedException, ExecutionException {
ExecutorService threadPool = Executors.newFixedThreadPool(NUM_THREADS);
Callable<Long> task=new Callable<Long>() {
public Long call() {
return doAllocation(NUM_OBJECTS);
}
};
long startTime=System.nanoTime(), cpuTime=0;
for(Future<Long> f: threadPool.invokeAll(Collections.nCopies(NUM_THREADS, task))) {
cpuTime+=f.get();
}
long time=System.nanoTime()-startTime;
System.out.println("Number of threads: "+NUM_THREADS);
System.out.printf("entire allocation required %.03f s%n", time*1e-9);
System.out.printf("time x numThreads %.03f s%n", time*1e-9*NUM_THREADS);
System.out.printf("real accumulated cpu time %.03f s%n", cpuTime*1e-9);
threadPool.shutdown();
}
static long doAllocation(int numObjects) {
long t0=System.nanoTime();
for(int i=0; i<numObjects; i++)
if(DO_HASH_MAP) new HashMap<Object, Object>(); else new ArrayList<Object>();
return System.nanoTime()-t0;
}
}
What about doing it on 6 cores? Hyperthreading isn't the same as having double the cores, so you might want to try the number of physical cores too.
Also, the OS won't necessarily schedule each of your threads onto its own core.
Since all you are doing is measuring the time and churning memory, your bottleneck is likely to be your L3 cache or the bus to main memory. In this case, coordinating the work between threads could produce so much overhead that it makes things worse instead of better.
This is too long for a comment but your inner loop can be just
double start = System.nanoTime();
for(int l = 0; l < jobSize ; l++){
Map<String,Integer> test = new HashMap<String,Integer>();
}
// runtime is an AtomicLong for thread safety
runtime.addAndGet(System.nanoTime() - start); // time in nano-seconds.
Taking the time can be as slow as creating a HashMap, so you might not be measuring what you think you are if you call the timer too often.
BTW, Hashtable is synchronized, so you might find that HashMap is faster and possibly more scalable.
I work on a high-concurrency app. In the app code I try to avoid synchronization where possible. Recently, when comparing the test performance of unsynchronized and synchronized versions of the code, it turned out the synchronized code performed three to four times faster than its unsynchronized counterpart.
After some experiments I came to this test code:
private static final Random RND = new Random();
private static final int NUM_OF_THREADS = 3;
private static final int NUM_OF_ITR = 3;
private static final int MONKEY_WORKLOAD = 50000;
static final AtomicInteger lock = new AtomicInteger();
private static void syncLockTest(boolean sync) {
System.out.println("syncLockTest, sync=" + sync);
final AtomicLong jobsDone = new AtomicLong();
final AtomicBoolean stop = new AtomicBoolean();
for (int i = 0; i < NUM_OF_THREADS; i++) {
Runnable runner;
if (sync) {
runner = new Runnable() {
@Override
public void run() {
while (!stop.get()){
jobsDone.incrementAndGet();
synchronized (lock) {
monkeyJob();
}
Thread.yield();
}
}
};
} else {
runner = new Runnable() {
@Override
public void run() {
while (!stop.get()){
jobsDone.incrementAndGet();
monkeyJob();
Thread.yield();
}
}
};
}
new Thread(runner).start();
}
long printTime = System.currentTimeMillis();
for (int i = 0; i < NUM_OF_ITR;) {
long now = System.currentTimeMillis();
if (now - printTime > 10 * 1000) {
printTime = now;
System.out.println("Jobs done\t" + jobsDone);
jobsDone.set(0);
i++;
}
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
stop.set(true);
}
private static double[] monkeyJob() {
double[] res = new double[MONKEY_WORKLOAD];
for (int i = 0; i < res.length; i++) {
res[i] = RND.nextDouble();
res[i] = 1./(1. + res[i]);
}
return res;
}
I played with the number of threads, the workload, and the test iterations; each time the synchronized code performed much faster than the unsynchronized one.
Here are results for two different values of NUM_OF_THREADS
Number of threads: 3
syncLockTest, sync=true
Jobs done 5951
Jobs done 5958
Jobs done 5878
syncLockTest, sync=false
Jobs done 1399
Jobs done 1397
Jobs done 1391

Number of threads: 5
syncLockTest, sync=true
Jobs done 5895
Jobs done 6464
Jobs done 5886
syncLockTest, sync=false
Jobs done 1179
Jobs done 1260
Jobs done 1226
Test environment
Windows 7 Professional
Java Version 7.0
Here's a similar case: Synchronized code performs faster than unsynchronized one
Any ideas?
Random is a thread-safe class. You are most likely avoiding contention on calls into the Random class by synchronizing around the main job.
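One way to test that hypothesis is to take the shared Random out of the picture with ThreadLocalRandom; this is a sketch of a drop-in replacement for the question's monkeyJob (MONKEY_WORKLOAD is the constant from the question):
import java.util.concurrent.ThreadLocalRandom;

private static double[] monkeyJob() {
    double[] res = new double[MONKEY_WORKLOAD];
    // ThreadLocalRandom keeps one generator per thread, so the worker
    // threads no longer contend on a shared random source
    ThreadLocalRandom rnd = ThreadLocalRandom.current();
    for (int i = 0; i < res.length; i++) {
        res[i] = rnd.nextDouble();
        res[i] = 1. / (1. + res[i]);
    }
    return res;
}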
This is fascinating. I think @jtahlborn nailed it. If I move the Random and make it local to the thread, the times for the non-sync version jump ~10x while the synchronized ones don't change.
Here are my times with a static Random RND:
syncLockTest, sync=true
Jobs done 8800
Jobs done 8839
Jobs done 8896
syncLockTest, sync=false
Jobs done 1401
Jobs done 1381
Jobs done 1423
Here are my times with a Random rnd local variable per thread:
syncLockTest, sync=true
Jobs done 8846
Jobs done 8861
Jobs done 8866
syncLockTest, sync=false
Jobs done 25956 << much faster
Jobs done 26065 << much faster
Jobs done 26021 << much faster
I also wondered if this was GC-related, but moving the double[] res to a thread-local did not help the speeds at all. Here's the code I used:
...
@Override
public void run() {
// made this be a thread local but it affected the times only slightly
double[] res = new double[MONKEY_WORKLOAD];
// turned rnd into a local variable instead of static
Random rnd = new Random();
while (!stop.get()) {
jobsDone.incrementAndGet();
if (sync) {
synchronized (lock) {
monkeyJob(res, rnd);
}
} else {
monkeyJob(res, rnd);
}
}
}
...
private static double[] monkeyJob(double[] res, Random rnd) {
for (int i = 0; i < res.length; i++) {
res[i] = rnd.nextDouble();
res[i] = 1. / (1. + res[i]);
}
return res;
}