I'm trying to improve the performance of an app by adding threads to run tasks concurrently. The results confuse me and make me think there is some thread-related overhead I'm not aware of. Below are two copies of the same code; the only difference is that one uses threads and the other doesn't. The version without threads runs four times faster than the threaded one. I'm testing on my device, a Samsung Note 4 with a quad-core processor. Any insights are welcome.
Thanks,
cwm
public void testThreads() throws InterruptedException {
startMilli = System.currentTimeMillis();
Thread t1 = new Thread() {
public void run() {
load1();
}
};
Thread t2 = new Thread() {
public void run() {
load2();
}
};
t1.start();
t2.start();
t1.join();
t2.join();
// load1();
// load2();
stopMilli = System.currentTimeMillis();
diffMilli = stopMilli - startMilli;
startMilli = System.currentTimeMillis();
}
public void load1() {
List<Integer> list1 = new ArrayList<Integer>();
for(i = 0; i<100000; i++) {
list1.add(i);
}
}
public void load2() {
List<Integer> list2 = new ArrayList<Integer>();
for(j = 100000; j<200000; j++){
list2.add(j);
}
}
public void testThreads() throws InterruptedException {
startMilli = System.currentTimeMillis();
load1();
load2();
stopMilli = System.currentTimeMillis();
diffMilli = stopMilli - startMilli;
startMilli = System.currentTimeMillis();
}
public void load1() {
List<Integer> list1 = new ArrayList<Integer>();
for(i = 0; i<100000; i++) {
list1.add(i);
}
}
public void load2() {
List<Integer> list2 = new ArrayList<Integer>();
for(j = 100000; j<200000; j++){
list2.add(j);
}
}
I'm not sure, but I think OS memory management is the problem in your case. In the first method you are adding 100000 elements to each List simultaneously. So both threads call the resize operation at the same time, which is probably locking memory while it finds new space for the larger List, making the other thread wait. In the second method 200000 elements are added sequentially, so no two resize operations occur simultaneously and no memory is locked.
I ran tests with larger sets. The array was faster than the List implementation.
Here are the results for a total of 100000000 memory access operations, given as array / ArrayList times:
4 threads: 83 / 20000 ms (each thread got a chunk of 25000000 elements)
2 threads: 75 / 22000 ms (each thread got a chunk of 50000000 elements)
main thread: 146 / 25800 ms (the main thread processed all elements in one ArrayList)
However, when I fill an ArrayList twice with range/2 elements each time, the time drops from 25800 ms to 10100 ms.
int range = 100000000;
public static void main(String[] args) throws InterruptedException {
TEST m = new TEST();
m.testThreads1(); // m.testThreads();
}
public void testThreads() throws InterruptedException {
long startMilli = System.currentTimeMillis();
Thread t1 = new Thread() {
public void run() {
load1();
}
};
Thread t2 = new Thread() {
public void run() {
load1();
}
};
Thread t3 = new Thread() {...}
Thread t4 = new Thread() {...}
t1.start();
t2.start();
t3.start();
t4.start();
t1.join();
t2.join();
t3.join();
t4.join();
long stopMilli = System.currentTimeMillis();
System.out.println(stopMilli - startMilli);
}
public void load1() {
List<Integer> list1 = new ArrayList<Integer>();
// int[] arr = new int[range]; // change size according to thread #
for(int i = 0; i<range/2; i++) {
list1.add(i);
// arr[i]=i;
}
}
public void testThreads1() throws InterruptedException {
long startMilli = System.currentTimeMillis();
/* two load1() calls for `range/2` elements each perform better than one call for `range` elements */
load1();
load1();
long stopMilli = System.currentTimeMillis();
System.out.println(stopMilli - startMilli);
}
General Guidelines:
It can be quite fiddly to increase application performance with threads: with too few you're not doing as well as you might, and with too many the overhead of the threads erodes the benefits gained from your program's concurrency.
To get it just right you have to set up as many threads as there are cores, and make sure your program breaks down nicely into that many threads. That can be tricky to get right if you're allowing for the program to run on a wide range of hardware.
Thread pools were invented to help here. Broadly speaking, they have just the right number of threads for the hardware the program happens to be running on, and they let you submit lots of small concurrent tasks for execution without the overhead of setting up a new thread for each one. Thus your program runs well on a wide range of hardware without you having to worry too much about it.
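To make the thread pool idea concrete, here is a minimal sketch, assuming a pool sized with Runtime.getRuntime().availableProcessors() and a made-up workload (summing integer ranges). The class name PoolSketch and the helpers sumRange and parallelSum are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolSketch {
    // Sums the integers in [from, to); stands in for a small unit of real work.
    static long sumRange(int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += i;
        }
        return sum;
    }

    // Splits [0, n) into chunks and sums them on a pool with one thread per core.
    static long parallelSum(int n, int chunkSize) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int from = 0; from < n; from += chunkSize) {
                final int lo = from;
                final int hi = Math.min(from + chunkSize, n);
                tasks.add(() -> sumRange(lo, hi));
            }
            long total = 0;
            // invokeAll blocks until every submitted task has completed.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(1_000_000, 10_000));
    }
}
```

The number of tasks (100 here) is independent of the number of threads, so the same code should adapt to whatever core count the device reports. Whether it actually beats the sequential loop depends on the workload and has to be measured.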
Feel free to correct me if I am wrong!
The synchronized keyword in Java prevents a method from being run by different threads simultaneously. In my program I have 4 threads that run at the same time, each counting to 100,000.
When I add the synchronized keyword to the method they call, shouldn't it take four times as long as the unsynchronized, multithreaded version?
Either way, the program takes roughly 16 seconds to run.
Here's my code:
public class ExerciseThree {
public static void main(String[] args) {
Even even = new Even();
Thread t1 = new Thread(() -> {
for (int i = 0; i < 100000; i++) {
System.out.println(even.next());
}
});
Thread t2 = new Thread(() -> {
for (int i = 0; i < 100000; i++) {
System.out.println(even.next());
}
});
Thread t3 = new Thread(() -> {
for (int i = 0; i < 100000; i++) {
System.out.println(even.next());
}
});
Thread t4 = new Thread(() -> {
for (int i = 0; i < 100000; i++) {
System.out.println(even.next());
}
});
System.out.println("starting thread 1");
t1.start();
System.out.println("starting thread 2");
t2.start();
System.out.println("starting thread 3");
t3.start();
System.out.println("starting thread 4");
t4.start();
}
}
The method being called by the threads
public class Even {
private int n = 0;
// public synchronized int next() {
public int next() {
n++;
n++;
return n;
}
}
As already pointed out in the comment section, microbenchmarking is a complex matter as many factors influence the execution time (e.g., just-in-time compilation and garbage collection). A good reference was already provided in the comments section, but I suggest that you also take a look at my answer for a similar question which links to an external resource by Peter Sestoft that provides a very good introduction to microbenchmarking and what one needs to be aware of.
It has already been mentioned that println() has no place in a microbenchmark like this. In addition, I'd like to point out that you should use some sort of synchronization mechanism (e.g., a CountDownLatch) to make sure that the four threads start performing their work at the same time. The overhead involved in creating and starting the threads may result in the earlier threads getting a headstart on their work during the time it takes for the later ones to start, thereby creating less contention for the even lock than what you expect. This could for example look something like this:
import java.util.concurrent.CountDownLatch;

public class ExerciseThree {
    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch startSignal = new CountDownLatch(1);
        final CountDownLatch threadReadyCheck = new CountDownLatch(4);
        final CountDownLatch threadDoneCheck = new CountDownLatch(4);
        Even even = new Even();
        // All four threads do the same work, so the task is defined once.
        Runnable work = () -> {
            threadReadyCheck.countDown();
            try {
                // await() throws a checked exception, so it must be handled inside the lambda.
                startSignal.await();
                for (int i = 0; i < 100000; i++) {
                    even.next();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                threadDoneCheck.countDown();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        Thread t3 = new Thread(work);
        Thread t4 = new Thread(work);
        t1.start();
        t2.start();
        t3.start();
        t4.start();
        // Wait until all threads are ready to perform their work.
        threadReadyCheck.await();
        // All threads ready.
        // This is where you log start time.
        long start = System.nanoTime();
        // Let threads progress to perform their actual work.
        startSignal.countDown();
        // Wait for threads to finish their work.
        threadDoneCheck.await();
        long end = System.nanoTime();
        // Note that this is again subject to many factors, for example when the main thread gets scheduled again after the workers terminate.
        long executionTime = end - start;
    }
}
With println being much more expensive than the computation, it's all about concurrent execution of it. However, println itself is synchronized, so there can be no speed up.
Without it, doing just
public int next() {
n++;
n++;
return n;
}
is subject to many optimizations. Especially the double increment can be replaced by n+=2 and the return gets eliminated as the returned value doesn't get used. A loop like
for (int i = 0; i < 100000; i++) {
even.next();
}
can be reduced to just n += 200000.
Benchmarking is hard in general and especially in Java. By all means, use JMH, which takes care of most problems.
I'm working on a multithreading problem in which 2 threads are started from main. The code is provided below:
package com.multi;
public class App {
private int count = 0;
public void doWork() {
Thread thread1 = new Thread(new Runnable() {
public void run() {
for (int i = 0; i < 10000; i++) {
count++;
}
}
});
Thread thread2 = new Thread(new Runnable() {
public void run() {
for (int i = 0; i < 10000; i++) {
count++;
}
}
});
thread1.start();
thread2.start();
try {
thread1.join();
thread2.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("Count is: " + count);
}
public static void main(String[] args) {
App worker = new App();
worker.doWork();
}
}
The book says that the printed count can be less than 20000 in some cases. It provides an explanation, but even after reading it a few times I could not fully understand it. After all, there is a try block that joins the threads, which is meant to ensure that both for loops complete.
a. Under what circumstances can the printed count be less than 20000, and why wouldn't both threads increment the count value?
b. If I wrote either
private volatile int count = 0;
private AtomicInteger count = 0;
will these essentially solve the issue?
Consider this sequence
count is 1
thread1 reads count 1 into local var x
thread2 reads count 1 into local var y
thread1 increments x to 2
thread1 writes x value 2 to count
thread2 increments y to 2
thread2 writes the y value 2 to count
When you do count++, it is a read from the field count, an addition of 1 to the value, and then a write of the result back to the field count, so my example sequence is essentially what can happen in your code.
In my example sequence, even though the field was incremented twice, the count is just 2, and not 3.
This happens because both threads are reading and writing from the same field at the same time.
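On part b of the question: volatile on its own would not be enough, because count++ is still a separate read, add, and write, and volatile only makes each individual read and write visible to other threads. An AtomicInteger performs the whole increment as one atomic step. It has to be constructed with new AtomicInteger(0), since assigning 0 directly does not compile. Below is a minimal sketch of the question's App class rewritten that way; the class name AtomicCounterSketch is made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterSketch {
    private final AtomicInteger count = new AtomicInteger(0);

    // Runs two threads that each increment the shared counter 10000 times.
    public int doWork() throws InterruptedException {
        Runnable increment = () -> {
            for (int i = 0; i < 10000; i++) {
                count.incrementAndGet(); // atomic read-modify-write
            }
        };
        Thread thread1 = new Thread(increment);
        Thread thread2 = new Thread(increment);
        thread1.start();
        thread2.start();
        thread1.join();
        thread2.join();
        return count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Count is: " + new AtomicCounterSketch().doWork());
    }
}
```

Because no increment can be lost, the final count is 20000 on every run. Making the increment a synchronized method would work as well.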
I wrote the below code trying to run two threads for calling a function in a for loop, but the results have the same time as if I ran it sequentially without multiple threads. Any thoughts why the multithreading here is not working? Is there a better way to do it? Like for example if I wanted to have 10 threads, using my code this will mean I have to create 10 duplicate run() functions when creating the thread, I wonder if there is an easier way to set the number of threads? Also is it possible to create a number of threads depending on the loop counter so that each loop a thread is created to finish it so if I had 10 loops then 10 threads will run concurrently to finish the processing very fast?
private Thread t1 = new Thread(){
public void run(){
for (int i = 0; i < 2; i++)
{
try {
myfn(i);
} catch (IOException e) {
e.printStackTrace();
}
}
}
};
private Thread t2 = new Thread(){
public void run(){
for (int i = 2; i < 4; i++)
{
try {
myfn(i);
} catch (IOException e) {
e.printStackTrace();
}
}
}
};
public Results getResults() throws IOException, SocketTimeoutException {
t1.start();
t2.start();
try {
t1.join(0);
} catch (InterruptedException e) {
e.printStackTrace();
}
try {
t2.join(0);
} catch (InterruptedException e) {
e.printStackTrace();
}
For running the same task across multiple threads, you're probably looking for a thread pool. Java provides a ThreadPoolExecutor for this.
Here is an introduction to Java concurrency with the following example:
ExecutorService executor = Executors.newFixedThreadPool(1);
Future<Integer> future = executor.submit(() -> {
try {
TimeUnit.SECONDS.sleep(2);
return 123;
}
catch (InterruptedException e) {
throw new IllegalStateException("task interrupted", e);
}
});
future.get(1, TimeUnit.SECONDS);
That example specifically creates a pool with only a single thread, but the parameter to Executors.newFixedThreadPool controls how many threads will be used.
I'm not sure from your original question why you think two threads aren't being utilized.
public class MyThread extends Thread {
    private int initValue = 0;
    private int upperBound = 0;
    public MyThread(int init, int ub) {
        this.initValue = init;
        this.upperBound = ub;
    }
    public void run() {
        // Process only this thread's slice of the index range.
        for (int i = initValue; i < upperBound; i++) {
            myfn(i);
        }
    }
}
Create threads and start them:
List<Thread> threads = new ArrayList<>();
threads.add(new MyThread(0, 2));
threads.add(new MyThread(2, 4));
for (Thread t : threads) {
    t.start();
}
for (Thread t : threads) {
    t.join();
}
I wrote the below code trying to run two threads for calling a function in a for loop, but the results have the same time as if I ran it sequentially without multiple threads.
There are many reasons why that can happen although it's hard to know what is going on without seeing the myfn(...) code. Here are some possible reasons:
It could be that myfn runs so quickly that running it in different threads isn't going to be any faster.
It could be that myfn is waiting on some other resource in which case the threads can't really run concurrently.
It could be that myfn is blocking on IO (network or disk) and even though you are doing 2 (or more) of them at a time, the disk or the remote server can't handle the increased requests any faster.
Is there a better way to do it? Like for example if I wanted to have 10 threads, using my code this will mean I have to create 10 duplicate run() functions...
The right thing to do here is to create your own class which takes the lower and upper bounds. The right way to do this is to implement Runnable, not extend Thread. Something like:
public class MyRunnable implements Runnable {
private final int start;
private final int end;
public MyRunnable(int start, int end) {
this.start = start;
this.end = end;
}
public void run() {
for (int i = start; i < end; i++) {
myfn(i);
}
}
}
You can then either start the threads by hand or use an ExecutorService which makes the thread maintenance a lot easier:
// this will start a new thread for every job
ExecutorService threadPool = Executors.newCachedThreadPool();
threadPool.submit(new MyRunnable(0, 2));
threadPool.submit(new MyRunnable(2, 4));
// once you've submitted your last task, you shutdown the pool
threadPool.shutdown();
// then we wait until all of the tasks have run
threadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
You don't need to copy your threads / loop 10 times, just take the logic and use it appropriately.
public class ExampleThread extends Thread {
private final int start, iterations;
public ExampleThread(int start, int iterations) {
this.start = start;
this.iterations = iterations;
}
@Override public void run() {
for (int i = 0; i < iterations; i++) {
myfn(start + i);
}
}
}
int iterations = 2;
List<Thread> threads = new ArrayList<>();
for (int threadId = 0; threadId < 10; threadId++) {
threads.add(new ExampleThread(threadId * iterations, iterations));
}
threads.forEach(Thread::start);
threads.forEach(t -> {
try {
t.join(0);
} catch (Exception e) {
e.printStackTrace(System.err);
}
});
We want to process List exList (which has a variable size) in parallel.
How can we make this work with different sizes of exList, using a minimum of one core and a maximum of 4 cores?
The given code assumes that exList.size() > 40 (if the size is < 40, we simply use one thread).
But all of that is hard-coded. So how can this code be enhanced to make the parallel runs "dynamic", depending on the size of our list?
int threads = Runtime.getRuntime().availableProcessors();
final int start = exList.size() / threads;
try {
Thread t1 = new Thread(new Runnable() {
public void run()
{
for(int i =0; i < start;i++){
System.out.println(exList.get(i));
}
}});
t1.start();
Thread t2 = new Thread(new Runnable() {
public void run()
{
for(int i =start; i < start * 2;i++){
System.out.println(exList.get(i));
}
}});
t2.start();
Thread t3 = new Thread(new Runnable() {
public void run()
{
for(int i = start *2; i < start * 3;i++){
System.out.println(exList.get(i));
}
}});
t3.start();
Thread t4 = new Thread(new Runnable() {
public void run()
{
for(int i =start * 3 ; i < exList.size();i++){
System.out.println(exList.get(i));
}
}});
t4.start();
}catch (Exception e){
}
You are already computing the number of threads that might be good to use.
int threads = Runtime.getRuntime().availableProcessors();
But you are simply drawing the wrong conclusion from that! The idea of computing that start value only adds confusion; it doesn't give you anything meaningful. Instead, simply go for:
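final int listSize = exList.size();
// Ceiling division, so the last shard picks up any remainder.
final int shardSize = (listSize + threads - 1) / threads;
for (int shardNumber = 0; shardNumber < threads; shardNumber++) {
    // Copy the bounds into final locals so the anonymous class can capture them.
    final int from = Math.min(shardNumber * shardSize, listSize);
    final int to = Math.min(from + shardSize, listSize);
    new Thread(new Runnable() {
        public void run() {
            for (int listIndex = from; listIndex < to; listIndex++) {
                System.out.println(exList.get(listIndex));
            }
        }
    }).start();
}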
In other words: you simply slice your exList into thread "shards". And then you create one thread to process such a shard/slice.
Please note: the above isn't tested. It is meant as idea to get you going! You want to carefully check my math to ensure that the inner loop is really fetching the correct elements!
A hint: avoid creating and starting threads at that low level. You are better off creating an ExecutorService and submitting Runnables to it. Use abstractions, not "low level" stuff.
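A minimal sketch of that ExecutorService approach, assuming exList is a List<String> and that the thread count should be clamped to between 1 and 4 as the question asks. The class ShardSketch and the helpers shardBounds and processInParallel are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShardSketch {
    // Returns [from, to) index pairs covering a list of the given size in at most `shards` slices.
    static List<int[]> shardBounds(int size, int shards) {
        List<int[]> bounds = new ArrayList<>();
        int shardSize = (size + shards - 1) / shards; // ceiling division
        for (int from = 0; from < size; from += shardSize) {
            bounds.add(new int[] {from, Math.min(from + shardSize, size)});
        }
        return bounds;
    }

    // Prints every element of exList, one pool task per shard.
    static void processInParallel(List<String> exList) throws InterruptedException {
        // At least one thread, at most four, regardless of the core count.
        int threads = Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int[] b : shardBounds(exList.size(), threads)) {
            pool.submit(() -> {
                for (int i = b[0]; i < b[1]; i++) {
                    System.out.println(exList.get(i));
                }
            });
        }
        pool.shutdown();
        // Wait for the submitted shards to finish (the one-minute limit is arbitrary).
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        processInParallel(Arrays.asList("a", "b", "c", "d", "e"));
    }
}
```

Lists shorter than the thread count simply produce fewer shards, so no special case for small lists is needed. The order of the printed lines is not deterministic across shards.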
exList is a list of strings
final int threads = Runtime.getRuntime().availableProcessors();
// Ceiling division, so the last shard picks up any remainder.
final int shardSize = (exList.size() + threads - 1) / threads;
Thread[] t = new Thread[threads];
for (int i = 0; i < threads; i++) {
    final int shardNumber = i;
    // Clamp the bounds to the list size instead of catching out-of-range exceptions.
    final int from = Math.min(shardNumber * shardSize, exList.size());
    final int to = Math.min(from + shardSize, exList.size());
    t[i] = new Thread() {
        public void run() {
            for (int listIndex = from; listIndex < to; listIndex++) {
                // thread number, index into exList, string from exList
                System.out.println(shardNumber + " " + listIndex + " " + exList.get(listIndex));
            }
        }
    };
    t[i].start();
}
In Java, I have simple multithreaded code:
public class ThreadedAlgo {
public static final int threadsCount = 3;
public static void main(String[] args) {
// start timer prior to computation
long time = System.currentTimeMillis();
// create threads
Thread[] threads = new Thread[threadsCount];
class ToDo implements Runnable {
public void run() { ... }
}
// create job objects
for (int i = 0; i < threadsCount; i++) {
ToDo job = new ToDo();
threads[i] = new Thread(job);
}
// start threads
for (int i = 0; i < threadsCount; i++) {
threads[i].start();
}
// wait for threads above to finish
for (int i = 0; i < threadsCount; i++) {
try {
threads[i].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
// display time after computation
System.out.println("Execution time: " + (System.currentTimeMillis() - time));
}
}
It works fine. Now I want to run it with 2 or 3 threads and measure the computation time of each thread. Then I will compare the times: denote them t1 and t2, and if |t1 - t2| < some small epsilon, I will say that my algorithm has fine granularity under the given conditions, that is, the time spent by the threads is roughly the same.
How can I measure the time of a thread?
Use System.nanoTime() at the beginning and end of the thread's (job's) run() method to calculate the total time spent in each invocation. In your case, all threads will run with the same (default) priority, so time slices should be distributed fairly evenly. If your threads synchronize on shared locks, use 'fair locks' for the same reason, e.g. new ReentrantLock(true);
Add the timing logic inside your run() methods.
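A minimal sketch of timing inside run(), assuming each job is wrapped in a small Runnable that records its own elapsed time. TimedJobSketch, TimedJob, and the placeholder workload are hypothetical names and code, not part of the question. Note that System.nanoTime() measures wall-clock time, so the figure includes any time the thread spent waiting to be scheduled.

```java
public class TimedJobSketch {
    // Wraps a job and records how long its run() took, in nanoseconds.
    static class TimedJob implements Runnable {
        private final Runnable job;
        // volatile so the value written by the worker is visible to other threads
        private volatile long elapsedNanos = -1;

        TimedJob(Runnable job) {
            this.job = job;
        }

        @Override
        public void run() {
            long start = System.nanoTime();
            job.run();
            elapsedNanos = System.nanoTime() - start;
        }

        long elapsedNanos() {
            return elapsedNanos;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int threadsCount = 3;
        TimedJob[] jobs = new TimedJob[threadsCount];
        Thread[] threads = new Thread[threadsCount];
        for (int i = 0; i < threadsCount; i++) {
            // Placeholder workload; substitute the real computation here.
            jobs[i] = new TimedJob(() -> {
                long x = 0;
                for (int k = 0; k < 1_000_000; k++) {
                    x += k;
                }
            });
            threads[i] = new Thread(jobs[i]);
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        for (int i = 0; i < threadsCount; i++) {
            System.out.println("Thread " + i + ": " + jobs[i].elapsedNanos() + " ns");
        }
    }
}
```

Reading the times only after join() ensures each job has finished before its measurement is compared with the others.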