Adding more threads to executorservice only makes it slower

I have this code, with my own homemade array list class, which I want to use to test the speed of different concurrency tools in Java:
public class LongArrayListUnsafe {
private static final ExecutorService executor
= Executors.newFixedThreadPool(1);
public static void main(String[] args) {
LongArrayList dal1 = new LongArrayList();
int n = 100_000_000;
Timer t = new Timer();
List<Callable<Void>> tasks = new ArrayList<>();
tasks.add(() -> {
for (int i = 0; i <= n; i+=2){
dal1.add(i);
}
return null;
});
tasks.add(() -> {
for (int i = 0; i < n; i++){
dal1.set(i, i + 1);
}
return null;});
tasks.add(() -> {
for (int i = 0; i < n; i++) {
dal1.get(i);
}
return null;});
tasks.add(() -> {
for (int i = n; i < n * 2; i++) {
dal1.add(i + 1);
}
return null;});
try {
executor.invokeAll(tasks);
} catch (InterruptedException exn) {
System.out.println("Interrupted: " + exn);
}
executor.shutdown();
try {
executor.awaitTermination(1000, TimeUnit.MILLISECONDS);
} catch (Exception e){
System.out.println("what?");
}
System.out.println("Using toString(): " + t.check() + " ms");
}
}
class LongArrayList {
// Invariant: 0 <= size <= items.length
private long[] items;
private int size;
public LongArrayList() {
reset();
}
public static LongArrayList withElements(long... initialValues){
LongArrayList list = new LongArrayList();
for (long l : initialValues) list.add( l );
return list;
}
public void reset(){
items = new long[2];
size = 0;
}
// Number of items in the double list
public int size() {
return size;
}
// Return item number i
public long get(int i) {
if (0 <= i && i < size)
return items[i];
else
throw new IndexOutOfBoundsException(String.valueOf(i));
}
// Replace item number i, if any, with x
public long set(int i, long x) {
if (0 <= i && i < size) {
long old = items[i];
items[i] = x;
return old;
} else
throw new IndexOutOfBoundsException(String.valueOf(i));
}
// Add item x to end of list
public LongArrayList add(long x) {
if (size == items.length) {
long[] newItems = new long[items.length * 2];
for (int i=0; i<items.length; i++)
newItems[i] = items[i];
items = newItems;
}
items[size] = x;
size++;
return this;
}
public String toString() {
return Arrays.stream(items, 0,size)
.mapToObj( Long::toString )
.collect(Collectors.joining(", ", "[", "]"));
}
}
public class Timer {
private long start, spent = 0;
public Timer() { play(); }
public double check() { return (System.nanoTime()-start+spent)/1e9; }
public void pause() { spent += System.nanoTime()-start; }
public void play() { start = System.nanoTime(); }
}
The implementation of the LongArrayList class is not so important; the point is that it is not thread-safe.
The driver code with the ExecutorService performs a bunch of operations on the array list, split across 4 different tasks, each looping up to 100_000_000 times.
The problem is that when I give the thread pool more threads, e.g. Executors.newFixedThreadPool(2), it only becomes slower.
For example, with one thread a typical timing is 1.0366974 ms, but if I run it with 3 threads, the time ramps up to 5.7932714 ms.
What is going on? Why are more threads so much slower?
EDIT:
To boil the issue down, I made this much simpler driver code, which has four tasks that simply add elements:
ExecutorService executor
= Executors.newFixedThreadPool(2);
LongArrayList dal1 = new LongArrayList();
int n = 100_000_00;
Timer t = new Timer();
for (int i = 0; i < 4 ; i++){
executor.execute(new Runnable() {
@Override
public void run() {
for (int j = 0; j < n ; j++)
dal1.add(j);
}
});
}
executor.shutdown();
try {
executor.awaitTermination(1000, TimeUnit.MILLISECONDS);
} catch (Exception e){
System.out.println("what?");
}
System.out.println("Using toString(): " + t.check() + " ms");
Here it still does not seem to matter how many threads I allocate; there is no speedup at all. Could this simply be because of overhead?

There are some problems with your code that make it hard to reason about why the time increases with more threads.
By the way,
public double check() { return (System.nanoTime()-start+spent)/1e9; }
gives you back seconds, not milliseconds, so change this:
System.out.println("Using toString(): " + t.check() + " ms");
to
System.out.println("Using toString(): " + t.check() + "s");
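To keep the unit explicit, the conversion can live in the timer itself. This is only a sketch: ElapsedTimer and its method names are made up for illustration and are not part of your code.

```java
// Hypothetical alternative to the Timer class above; all names here are assumptions.
class ElapsedTimer {
    private final long start = System.nanoTime();

    // Pure conversions, so the unit is visible at the call site.
    static double nanosToSeconds(long nanos) { return nanos / 1e9; }
    static double nanosToMillis(long nanos) { return nanos / 1e6; }

    double seconds() { return nanosToSeconds(System.nanoTime() - start); }
    double millis() { return nanosToMillis(System.nanoTime() - start); }

    public static void main(String[] args) throws InterruptedException {
        ElapsedTimer t = new ElapsedTimer();
        Thread.sleep(50); // stand-in for the work being timed
        System.out.println("elapsed: " + t.millis() + " ms (" + t.seconds() + " s)");
    }
}
```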
First problem:
LongArrayList dal1 = new LongArrayList();
dal1 is shared among all threads, and those threads update it without any mutual exclusion, which leads to race conditions. Moreover, concurrent writes to the same memory can cause cache invalidation between cores, which can increase your overall execution time.
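One way to avoid the shared state altogether is to let every task fill its own private buffer and merge the buffers once the tasks are done. The following is a minimal sketch under that assumption (it only fits if the tasks are independent); PrivateBuffers and fill are hypothetical names, not something from your code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: no mutable state is shared while the tasks run.
class PrivateBuffers {
    static long[] fill(int threads, int perTask) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<long[]>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int offset = t * perTask;
                Callable<long[]> task = () -> {
                    long[] local = new long[perTask]; // visible to this task only
                    for (int i = 0; i < perTask; i++) {
                        local[i] = offset + i;
                    }
                    return local;
                };
                parts.add(pool.submit(task));
            }
            // Merge on the calling thread; Future.get() waits for each task to finish.
            long[] all = new long[threads * perTask];
            int pos = 0;
            for (Future<long[]> f : parts) {
                long[] part = f.get();
                System.arraycopy(part, 0, all, pos, part.length);
                pos += part.length;
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] all = fill(4, 1_000_000);
        System.out.println("elements: " + all.length + ", last: " + all[all.length - 1]);
    }
}
```

Because each array is written by exactly one thread and only read after get() returns, no locking is needed.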
The other thing is that you may have load balancing problems. You have 4 parallel tasks, but clearly the last one
tasks.add(() -> {
for (int i = n; i < n * 2; i++) {
dal1.add(i + 1);
}
return null;});
is the most computing-intensive task. Even if the 4 tasks run in parallel, and without the problems that I have mentioned (i.e., lack of synchronization around the shared data), the last task will dictate the overall execution time.
Not to mention that parallelism does not come for free: it adds overhead (e.g., scheduling the parallel work), which might be high enough to make parallelizing the code not worthwhile in the first place. In your code, there is at least the overhead of waiting for the tasks to complete, and also the overhead of shutting down the executor's thread pool.
Another possibility, which would also explain why you are not getting ArrayIndexOutOfBoundsException all over the place, is that the first 3 tasks are so small that they are being executed by the same thread. This would again make your overall execution time very dependent on the last task and on the overhead of executor.shutdown() and executor.awaitTermination. However, even if that is the case, the order of execution of the tasks, and which threads will execute them, is typically non-deterministic and, consequently, not something that your application should rely upon. Funnily enough, when I changed your code to execute the tasks immediately (i.e., with executor.execute), I got ArrayIndexOutOfBoundsException all over the place.

Related

More than 2 threads working slower than 1 or 2 threads unless Thread.sleep(1) is put in the run() method of a thread

The task I'm trying to implement is finding Collatz sequence for numbers in a set interval using several threads and seeing how much improvement is gained compared to one thread.
However, one thread is always faster, no matter how many threads I choose (edit: 2 threads are faster, but not by much, while 4 threads are slower than 1 thread), and I have no idea why. I could even say that the more threads, the slower it gets. I hope someone can explain. Maybe I'm doing something wrong.
Below is the code that I have written so far. I'm using ThreadPoolExecutor to execute the tasks (one task = one Collatz sequence for one number in the interval).
The Collatz class:
public class ParallelCollatz implements Runnable {
private long result;
private long inputNum;
public long getResult() {
return result;
}
public void setResult(long result) {
this.result = result;
}
public long getInputNum() {
return inputNum;
}
public void setInputNum(long inputNum) {
this.inputNum = inputNum;
}
public void run() {
//System.out.println("number:" + inputNum);
//System.out.println("Thread:" + Thread.currentThread().getId());
//int j=0;
//if(Thread.currentThread().getId()==11) {
// ++j;
// System.out.println(j);
//}
long result = 1;
//main recursive computation
while (inputNum > 1) {
if (inputNum % 2 == 0) {
inputNum = inputNum / 2;
} else {
inputNum = inputNum * 3 + 1;
}
++result;
}
// try {
//Thread.sleep(10);
//} catch (InterruptedException e) {
// TODO Auto-generated catch block
// e.printStackTrace();
//}
this.result=result;
return;
}
}
And the main class where I run the threads (yes, for now I create two lists with the same numbers, since after running with one thread the initial values are lost):
ThreadPoolExecutor executor = (ThreadPoolExecutor)Executors.newFixedThreadPool(1);
ThreadPoolExecutor executor2 = (ThreadPoolExecutor)Executors.newFixedThreadPool(4);
List<ParallelCollatz> tasks = new ArrayList<ParallelCollatz>();
for(int i=1; i<=1000000; i++) {
ParallelCollatz task = new ParallelCollatz();
task.setInputNum((long)(i+1000000));
tasks.add(task);
}
long startTime = System.nanoTime();
for(int i=0; i<1000000; i++) {
executor.execute(tasks.get(i));
}
executor.shutdown();
boolean tempFirst=false;
try {
tempFirst =executor.awaitTermination(5, TimeUnit.HOURS);
} catch (InterruptedException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
System.out.println("tempFirst " + tempFirst);
long endTime = System.nanoTime();
long durationInNano = endTime - startTime;
long durationInMillis = TimeUnit.NANOSECONDS.toMillis(durationInNano); //Total execution time in milliseconds
System.out.println("laikas " +durationInMillis);
List<ParallelCollatz> tasks2 = new ArrayList<ParallelCollatz>();
for(int i=1; i<=1000000; i++) {
ParallelCollatz task = new ParallelCollatz();
task.setInputNum((long)(i+1000000));
tasks2.add(task);
}
long startTime2 = System.nanoTime();
for(int i=0; i<1000000; i++) {
executor2.execute(tasks2.get(i));
}
executor2.shutdown();
boolean temp =false;
try {
temp=executor2.awaitTermination(5, TimeUnit.HOURS);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("temp "+ temp);
long endTime2 = System.nanoTime();
long durationInNano2 = endTime2 - startTime2;
long durationInMillis2 = TimeUnit.NANOSECONDS.toMillis(durationInNano2); //Total execution time in milliseconds
System.out.println("laikas2 " +durationInMillis2);
For example, running with one thread it completes in 3280 ms; running with two threads, in 3437 ms. Should I be considering another concurrent structure for calculating each element?
EDIT
Clarification: I'm not trying to parallelize individual sequences, but an interval of numbers where each number has its own sequence (which is not related to the other numbers).
EDIT2
Today I ran the program on a good PC with 6 cores and 12 logical processors and the issue persists. Does anyone have an idea where the problem might be? I also updated my code. 4 threads do worse than 2 threads for some reason (even worse than 1 thread). I also applied what was given in the answer, but there was no change.
Another Edit
What I have noticed is that if I put a Thread.sleep(1) in the run() method of ParallelCollatz, the performance gradually increases with the thread count. Perhaps this detail tells someone what is wrong? However, without the Thread.sleep(1), no matter how many tasks I give, 2 threads perform fastest, 1 thread is in second place, and the others hang around a similar number of milliseconds but are slower than both 1 and 2 threads.
New Edit
I also tried putting more work (a for loop calculating not 1 but 10 or 100 Collatz sequences) in the run() method of the Runnable class so that each thread would do more work per task. Unfortunately, this did not help either.
Perhaps I'm launching the tasks incorrectly? Any ideas?
EDIT
So it would seem that adding more work to the run method fixes it a bit, but for more threads (8+) the issue still remains. I still wonder whether the cause is that it takes more time to create and run the threads than to execute the task. Or should I create a new post with this question?

You are not waiting for your tasks to complete; you are only measuring the time it takes to submit them to the executor.
executor.shutdown() does not wait for all tasks to finish. You need to call executor.awaitTermination after it:
executor.shutdown();
executor.awaitTermination(5, TimeUnit.HOURS);
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html#shutdown()
Update
I believe that our testing methodology is flawed. I repeated your test on my machine (1 processor, 2 cores, 4 logical processors), and the time measured differed wildly from run to run.
I believe the following are the main reasons:
JVM startup and JIT compilation time. At the beginning, the code runs in interpreted mode.
The result of the calculation is ignored. I have no intuition about what is removed by the JIT and what we are actually measuring.
println calls in the code.
To test this, I converted your test to JMH.
In particular:
I converted the Runnable to a Callable and return the sum of the results, so that the JIT cannot discard the computation as dead code (alternatively, you can use Blackhole from JMH).
My tasks have no state; I moved all moving parts to local variables. No GC is needed to clean up the tasks.
I still create executors in each round. This is not perfect, but I decided to keep it as is.
The results I received below are consistent with my expectations: one core is waiting in the main thread, the work is performed on a single core, and the numbers are roughly the same.
Benchmark                  Mode  Cnt    Score    Error  Units
SpeedTest.multipleThreads  avgt   20  559.996 ± 20.181  ms/op
SpeedTest.singleThread     avgt   20  562.048 ± 16.418  ms/op
Updated code:
public class ParallelCollatz implements Callable<Long> {
private final long inputNumInit;
public ParallelCollatz(long inputNumInit) {
this.inputNumInit = inputNumInit;
}
@Override
public Long call() {
long result = 1;
long inputNum = inputNumInit;
//main recursive computation
while (inputNum > 1) {
if (inputNum % 2 == 0) {
inputNum = inputNum / 2;
} else {
inputNum = inputNum * 3 + 1;
}
++result;
}
return result;
}
}
and the benchmark itself:
@State(Scope.Benchmark)
public class SpeedTest {
private static final int NUM_TASKS = 1000000;
private static List<ParallelCollatz> tasks = buildTasks();
@Benchmark
@Fork(value = 1, warmups = 1)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@SuppressWarnings("unused")
public long singleThread() throws Exception {
ThreadPoolExecutor executorOneThread = (ThreadPoolExecutor) Executors.newFixedThreadPool(1);
return measureTasks(executorOneThread, tasks);
}
@Benchmark
@Fork(value = 1, warmups = 1)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@SuppressWarnings("unused")
public long multipleThreads() throws Exception {
ThreadPoolExecutor executorMultipleThread = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
return measureTasks(executorMultipleThread, tasks);
}
private static long measureTasks(ThreadPoolExecutor executor, List<ParallelCollatz> tasks) throws InterruptedException, ExecutionException {
long sum = runTasksInExecutor(executor, tasks);
return sum;
}
private static long runTasksInExecutor(ThreadPoolExecutor executor, List<ParallelCollatz> tasks) throws InterruptedException, ExecutionException {
List<Future<Long>> futures = new ArrayList<>(NUM_TASKS);
for (int i = 0; i < NUM_TASKS; i++) {
Future<Long> f = executor.submit(tasks.get(i));
futures.add(f);
}
executor.shutdown();
boolean tempFirst = false;
try {
tempFirst = executor.awaitTermination(5, TimeUnit.HOURS);
} catch (InterruptedException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
long sum = 0l;
for (Future<Long> f : futures) {
sum += f.get();
}
//System.out.println(sum);
return sum;
}
private static List<ParallelCollatz> buildTasks() {
List<ParallelCollatz> tasks = new ArrayList<>();
for (int i = 1; i <= NUM_TASKS; i++) {
ParallelCollatz task = new ParallelCollatz((long) (i + NUM_TASKS));
tasks.add(task);
}
return tasks;
}
}

Why iteration of list taking more time if java 8 stream feature use?

public static void main(String[] args) {
List<String> data = new ArrayList<>();
for (int i = 0; i < 10000000; i++) {
data.add("data" + i);
}
System.out.println("parallel stream start time" + System.currentTimeMillis());
data.parallelStream().forEach(x -> {
System.out.println("data -->" + x);
});
System.out.println("parallel stream end time" + System.currentTimeMillis());
System.out.println("simple stream start time" + System.currentTimeMillis());
data.stream().forEach(x -> {
System.out.println("data -->" + x);
});
System.out.println("simple stream end time" + System.currentTimeMillis());
System.out.println("normal foreach start time" + System.currentTimeMillis());
for (int i = 0; i < data.size(); i++) {
System.out.println("data -->" + data.get(i));
}
System.out.println("normal foreach end time" + System.currentTimeMillis());
}
Output
parallel stream start time 1501944014854
parallel stream end time 1501944014970
simple stream start time 1501944014970
simple stream end time 1501944015036
normal foreach start time 1501944015036
normal foreach end time 1501944015040
Total time taken
Simple stream -> 66
Parallel stream -> 116
simple foreach -> 4
Many blogs say that parallelStream executes in parallel by internally distributing the tasks among threads and collecting the results automatically.
But the experiment above clearly shows that the parallel stream takes more time than the simple stream and the normal for loop.
Why does it take more time if it is executed in parallel? Is it good to use in a project, given that this feature degrades performance?
Thanks in advance.

Your tests are based on I/O operations (printing to the console), which are the most expensive operations here.
If you want to use parallel streams, you have to take the overhead of creating and coordinating threads into account. So use them only if your operation is heavy enough to benefit; if not, just use normal streams or a regular for loop.
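As a rough illustration of that trade-off, here is a small sketch that runs the same CPU-bound, I/O-free function through a sequential and a parallel stream and sums the results so the work cannot be optimized away. StreamCost and work are hypothetical names, and the timings are only indicative; a harness such as JMH would give more trustworthy numbers.

```java
import java.util.stream.LongStream;

// Hypothetical sketch: same CPU-bound work, sequential vs. parallel stream.
class StreamCost {
    // Counts the Collatz steps for n; pure computation, no I/O.
    static long work(long n) {
        long steps = 0;
        while (n > 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            steps++;
        }
        return steps;
    }

    static long sequential(long from, long to) {
        return LongStream.range(from, to).map(StreamCost::work).sum();
    }

    static long parallel(long from, long to) {
        return LongStream.range(from, to).parallel().map(StreamCost::work).sum();
    }

    public static void main(String[] args) {
        // Repeat the measurement so that later rounds run JIT-compiled code.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sequential(1, 2_000_000);
            long t1 = System.nanoTime();
            long b = parallel(1, 2_000_000);
            long t2 = System.nanoTime();
            System.out.printf("round %d: sequential %d ms, parallel %d ms, same result: %b%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
        }
    }
}
```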
Basic rules for measurement:
Don't use I/O operations.
Repeat the same test more than once.
So if we re-formulate the test scenarios, we would probably define a test helper class as follows:
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
public class Benchmark {
public static <T> T performTest(Callable<T> callable, int iteration, String name) throws Exception {
Map<String, Iteraion> map = new HashMap<>();
T last = null;
for (int i = 0; i < iteration; i++) {
long s = System.nanoTime();
T temp = callable.call();
long f = System.nanoTime();
map.put(UUID.randomUUID().toString(), new Iteraion(s, f));
if (i == iteration - 1) {
last = temp;
}
}
System.out.print("TEST :\t" + name + "\t\t\t");
System.out.print("ITERATION: " + map.size());
long sum = 0l;
for (String i : map.keySet()) {
sum += (map.get(i).finish - map.get(i).start);
}
long avg = (sum / map.size()) / 1000000;
System.out.println("\t\t\tAVERAGE: " + avg + " ms");
return last;
}
public interface Callable<T> {
T call() throws Exception;
}
static class Iteraion {
Long start;
Long finish;
public Iteraion(Long s, Long f) {
start = s;
finish = f;
}
}
}
Now we can perform the same test more than once using different operations. The following code shows tests performed for two different scenarios.
import java.util.ArrayList;
import java.util.List;
import static java.lang.Math.*;
@SuppressWarnings("unused")
public class Test {
public static void main(String[] args) {
try {
final int iteration = 100;
final List<String> data = new ArrayList<>();
for (int i = 0; i < 10000000; i++) {
data.add("data" + i);
}
/**
* Scenario 1
*/
Benchmark.performTest(new Benchmark.Callable<Void>() {
@Override
public Void call() throws Exception {
data.parallelStream().forEach(x -> {
x.trim();
});
return (Void) null;
}
}, iteration, "PARALEL_STREAM_ASSIGN_VAL");
Benchmark.performTest(new Benchmark.Callable<Void>() {
@Override
public Void call() throws Exception {
data.stream().forEach(x -> {
x.trim();
});
return (Void) null;
}
}, iteration, "NORMAL_STREAM_ASSIGN_VAL");
Benchmark.performTest(new Benchmark.Callable<Void>() {
@Override
public Void call() throws Exception {
for (int i = 0; i < data.size(); i++) {
data.get(i).trim();
}
return (Void) null;
}
}, iteration, "NORMAL_FOREACH_ASSIGN_VAL");
/**
* Scenario 2
*/
Benchmark.performTest(new Benchmark.Callable<Void>() {
@Override
public Void call() throws Exception {
data.parallelStream().forEach(x -> {
Integer i = Integer.parseInt(x.substring(4, x.length()));
double d = tan(atan(tan(atan(i))));
});
return (Void) null;
}
}, iteration, "PARALEL_STREAM_COMPUTATION");
Benchmark.performTest(new Benchmark.Callable<Void>() {
@Override
public Void call() throws Exception {
data.stream().forEach(x -> {
Integer i = Integer.parseInt(x.substring(4, x.length()));
double d = tan(atan(tan(atan(i))));
});
return (Void) null;
}
}, iteration, "NORMAL_STREAM_COMPUTATION");
Benchmark.performTest(new Benchmark.Callable<Void>() {
@Override
public Void call() throws Exception {
for (int i = 0; i < data.size(); i++) {
Integer x = Integer.parseInt(data.get(i).substring(4, data.get(i).length()));
double d = tan(atan(tan(atan(x))));
}
return (Void) null;
}
}, iteration, "NORMAL_FOREACH_COMPUTATION");
} catch (Exception e) {
e.printStackTrace();
}
}
}
The first scenario calls the trim() method on every element of a list of 10_000_000 elements, repeated 100 times, first with a parallel stream, then with a normal stream, and last with an old-school for loop.
The second scenario performs some relatively heavy operations like tan(atan(tan(atan(i)))) for the same list with the same technique as in the first scenario.
The results are:
// First scenario, average times
Parallel stream: 78 ms
Regular stream: 113 ms
For-loop: 110 ms
// Second scenario, average times
Parallel stream: 1397 ms
Regular stream: 3866 ms
For-loop: 3826 ms
Note that if you debug the above code, you will notice that for parallel streams the program creates three extra threads named [ForkJoinPool-1], [ForkJoinPool-2] and [ForkJoinPool-3].
Edit:
The sequential streams and the for-loop use the caller's thread.
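If you would rather check this without a debugger, a sketch like the following records which threads actually execute the stream body. StreamThreads and threadsUsed are hypothetical names, and the number and names of the worker threads depend on your machine and JVM.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical sketch: collect the names of the threads that run the stream body.
class StreamThreads {
    static Set<String> threadsUsed(boolean parallel) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // safe for concurrent adds
        IntStream range = IntStream.range(0, 100_000);
        if (parallel) {
            range = range.parallel();
        }
        range.forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + threadsUsed(false));
        System.out.println("parallel:   " + threadsUsed(true));
    }
}
```

The sequential run should report only the calling thread, while the parallel run typically lists the caller plus some ForkJoinPool workers.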

Thread.sleep blocks other Thread

I have an Output class which just prints everything that it is given to print.
public class Output {
private static List<String> textList = new ArrayList<>();
private static Output output = null;
private Output() {
Runnable task = () -> {
int lastIndex = 0;
while (true) {
while (lastIndex < textList.size()) {
System.out.println(lastIndex + " - " + textList.size() + ": " + textList.get(lastIndex));
outputText(textList.get(lastIndex));
lastIndex ++;
}
}
};
new Thread(task).start();
}
private static void outputText(String text) {
synchronized (System.out) {
System.out.println(text);
}
}
public static void say(String text) {
if (output == null) {
output = new Output();
}
textList.add(text);
}
}
When I add something to print, everything works fine:
for (int i = 0; i < 10; i++) {
Output.say("" + i);
}
But when I add a Thread.sleep to the loop it stops on the first output:
for (int i = 0; i < 10; i++) {
Output.say("" + i);
Thread.sleep(100);
}
How can I prevent it? I mean, I'm stopping with sleep just the main thread and not the separate thread.

When you don’t synchronize threads correctly, there is no guarantee that threads see updates made by other threads. They may either completely miss updates or see only parts of them, creating an entirely inconsistent result. Sometimes they may even appear to do the right thing. Without proper synchronization (in the sense of any valid construct specified to be thread safe), this is entirely unpredictable.
Sometimes, the chances of seeing a particular behavior are higher, like in your example. In most runs, the loop without sleep will complete before the other thread even starts its work, whereas inserting sleep raises the chance of lost updates after the second thread has seen values. Once the second thread has seen a value for textList.size(), it might reuse the value forever, evaluating lastIndex < textList.size() to false and executing the equivalent of while(true) { }.
It’s funny that the only place where you inserted a construct for thread safety is the method outputText, which is called by a single thread only (and printing to System.out is synchronized internally in most environments anyway).
Besides, it’s not clear why you are creating an object of type Output that has no relevance here, as all fields and methods are static.
Your code can be corrected and simplified to
public static void main(String[] args) throws InterruptedException {
List<String> textList = new ArrayList<>();
new Thread( () -> {
int index=0;
while(true) synchronized(textList) {
for(; index<textList.size(); index++)
System.out.println(textList.get(index));
}
}).start();
for (int i = 0; i < 10; i++) {
synchronized(textList) {
textList.add(""+i);
}
Thread.sleep(100);
}
}
though it still contains the issues of your original code: it never terminates, due to the infinite loop in the second thread, and it burns CPU with a polling loop. You should let the second thread wait for new items and add a termination condition:
public static void main(String[] args) throws InterruptedException {
List<String> textList = new ArrayList<>();
new Thread( () -> {
synchronized(textList) {
for(int index=0; ; index++) {
while(index>=textList.size()) try {
textList.wait();
} catch(InterruptedException ex) { return; }
final String item = textList.get(index);
if(item==null) break;
System.out.println(item);
}
}
}).start();
for (int i = 0; i < 10; i++) {
synchronized(textList) {
textList.add(""+i);
textList.notify();
}
Thread.sleep(100);
}
synchronized(textList) {
textList.add(null);
textList.notify();
}
}
This is still only an academic example that you shouldn’t use in real-life code. The Java API provides classes for thread-safe data exchange that remove the burden of implementing such things yourself:
public static void main(String[] args) throws InterruptedException {
ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(10);
String endMarker = "END-OF-QUEUE"; // the queue does not allow null
new Thread( () -> {
for(;;) try {
String item = queue.take();
if(item == endMarker) break;// don't use == for ordinary strings
System.out.println(item);
} catch(InterruptedException ex) { return; }
}).start();
for (int i = 0; i < 10; i++) {
queue.put(""+i);
Thread.sleep(100);
}
queue.put(endMarker);
}

Java lock/concurrency issue when searching array with multiple threads

I am new to Java and trying to write a method that finds the maximum value in a 2D array of longs.
The method searches through each row in a separate thread, and the threads maintain a shared current maximum. Whenever a thread finds a value larger than its own local maximum, it compares this value with the shared maximum and updates its local maximum and possibly the shared maximum as appropriate. I need to make sure that appropriate synchronization is implemented so that the result is correct regardless of how the computations interleave.
My code is verbose and messy, but for starters, I have this function:
static long sharedMaxOf2DArray(long[][] arr, int r){
MyRunnableShared[] myRunnables = new MyRunnableShared[r];
for(int row = 0; row < r; row++){
MyRunnableShared rr = new MyRunnableShared(arr, row, r);
Thread t = new Thread(rr);
t.start();
myRunnables[row] = rr;
}
return myRunnables[0].sharedMax; //should be the same as any other one (?)
}
For the adapted runnable, I have this:
public static class MyRunnableShared implements Runnable{
long[][] theArray;
private int row;
private long rowMax;
public long localMax;
public long sharedMax;
private static Lock sharedMaxLock = new ReentrantLock();
MyRunnableShared(long[][] a, int r, int rm){
theArray = a;
row = r;
rowMax = rm;
}
public void run(){
localMax = 0;
for(int i = 0; i < rowMax; i++){
if(theArray[row][i] > localMax){
localMax = theArray[row][i];
sharedMaxLock.lock();
try{
if(localMax > sharedMax)
sharedMax = localMax;
}
finally{
sharedMaxLock.unlock();
}
}
}
}
}
I thought this use of a lock would be a safe way to prevent multiple threads from messing with the sharedMax at a time, but upon testing/comparing with a non-concurrent maximum-finding function on the same input, I found the results to be incorrect. I'm thinking the problem might come from the fact that I just say
...
t.start();
myRunnables[row] = rr;
...
in the sharedMaxOf2DArray function. Perhaps a given thread needs to finish before I put it in the array of myRunnables; otherwise, I will have "captured" the wrong sharedMax? Or is it something else? I'm not sure about the timing of things.

I'm not sure if this is a typo or not, but your Runnable implementation declares sharedMax as an instance variable:
public long sharedMax;
rather than a shared one:
public static long sharedMax;
In the former case, each Runnable gets its own copy and will not "see" the values of others. Changing it to the latter should help. Or, change it to:
public long[] sharedMax; // array of size 1 shared across all threads
and you can now create an array of size one outside the loop and pass it in to each Runnable to use as shared storage.
As an aside: please note that there will be tremendous lock contention since every thread checks the common sharedMax value by holding a lock for every iteration of its loop. This will likely lead to poor performance. You'd have to measure, but I'd surmise that letting each thread find the row maximum and then running a final pass to find the "max of maxes" might actually be comparable or quicker.
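A minimal sketch of that "max of maxes" idea, assuming one thread per row as in the question: every thread writes only its own slot of a results array, so no lock is needed, and the final pass runs after the threads have been joined. RowMaxima and maxOf are hypothetical names.

```java
// Hypothetical sketch: per-row maxima computed without locks, then one final pass.
class RowMaxima {
    static long maxOf(long[][] arr) throws InterruptedException {
        long[] rowMax = new long[arr.length];
        Thread[] workers = new Thread[arr.length];
        for (int r = 0; r < arr.length; r++) {
            final int row = r;
            workers[r] = new Thread(() -> {
                long m = Long.MIN_VALUE;
                for (long v : arr[row]) {
                    if (v > m) {
                        m = v;
                    }
                }
                rowMax[row] = m; // each thread writes only its own slot
            });
            workers[r].start();
        }
        long max = Long.MIN_VALUE;
        for (int r = 0; r < arr.length; r++) {
            workers[r].join(); // after join(), the worker's write is visible here
            if (rowMax[r] > max) {
                max = rowMax[r];
            }
        }
        return max;
    }

    public static void main(String[] args) throws InterruptedException {
        long[][] data = { {1, 5, 3}, {-2, 9, 0}, {4, 4, 4} };
        System.out.println("max: " + maxOf(data));
    }
}
```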

From the JavaDocs:
public interface Callable<V>
A task that returns a result and may throw an exception. Implementors define a single method with no arguments called call.
The Callable interface is similar to Runnable, in that both are designed for classes whose instances are potentially executed by another thread. A Runnable, however, does not return a result and cannot throw a checked exception.
Well, you can use a Callable to calculate the result for each 1D array and wait for completion with an ExecutorService. You can then compare the results of the Callables to find the maximum. The code may look like this:
Random random = new Random(System.nanoTime());
long[][] myArray = new long[5][5];
for (int i = 0; i < 5; i++) {
myArray[i] = new long[5];
for (int j = 0; j < 5; j++) {
myArray[i][j] = random.nextLong();
}
}
ExecutorService executor = Executors.newFixedThreadPool(myArray.length);
List<Future<Long>> myResults = new ArrayList<>();
// create a callable for each 1d array in the 2d array
for (int i = 0; i < myArray.length; i++) {
Callable<Long> callable = new SearchCallable(myArray[i]);
Future<Long> callResult = executor.submit(callable);
myResults.add(callResult);
}
// This makes the executor accept no new tasks
// and finish all tasks already in the queue
executor.shutdown();
// Wait until all tasks are finished
while (!executor.isTerminated()) {
}
// now compare the results and fetch the biggest one
long max = Long.MIN_VALUE; // random.nextLong() can be negative, so 0 is not a safe start
for (Future<Long> future : myResults) {
try {
max = Math.max(max, future.get());
} catch (InterruptedException | ExecutionException e) {
// something bad happened...!
e.printStackTrace();
}
}
System.out.println("The result is " + max);
And your Callable:
public class SearchCallable implements Callable<Long> {
private final long[] mArray;
public SearchCallable(final long[] pArray) {
mArray = pArray;
}
@Override
public Long call() throws Exception {
long max = Long.MIN_VALUE; // values may be negative, so do not start at 0
for (int i = 0; i < mArray.length; i++) {
max = Math.max(max, mArray[i]);
}
System.out.println("I've got the maximum " + max + ", and you guys?");
return max;
}
}

Your code has serious lock contention and thread-safety issues. Even worse, it doesn't actually wait for any of the threads to finish before the return myRunnables[0].sharedMax, which is a really bad race condition. Also, using explicit locking via ReentrantLock or even synchronized blocks is usually the wrong way of doing things unless you're implementing something low-level (e.g. your own concurrent data structure).
Here's a version that uses the Future concurrent primitive and an ExecutorService to handle the thread creation. The general idea is:
Submit a number of concurrent jobs to your ExecutorService
Add the Future returned from submit(...) to a List
Loop through the list, calling get() on each Future and aggregating the results
This version has the added benefit that there is no lock contention (or locking in general) between the worker threads, as each just returns the max for its slice of the array.
import java.util.concurrent.*;
import java.util.*;
public class PMax {
public static long pmax(final long[][] arr, int numThreads) {
ExecutorService pool = Executors.newFixedThreadPool(numThreads);
try {
List<Future<Long>> list = new ArrayList<Future<Long>>();
for(int i=0;i<arr.length;i++) {
// put sub-array in a final so the inner class can see it:
final long[] subArr = arr[i];
list.add(pool.submit(new Callable<Long>() {
public Long call() {
long max = Long.MIN_VALUE;
for(int j=0;j<subArr.length;j++) {
if( subArr[j] > max ) {
max = subArr[j];
}
}
return max;
}
}));
}
// find the max of each slice's max:
long max = Long.MIN_VALUE;
for(Future<Long> future : list) {
long threadMax = future.get();
System.out.println("threadMax: " + threadMax);
if( threadMax > max ) {
max = threadMax;
}
}
return max;
} catch( RuntimeException e ) {
throw e;
} catch( Exception e ) {
throw new RuntimeException(e);
} finally {
pool.shutdown();
}
}
public static void main(String args[]) {
int x = 1000;
int y = 1000;
long max = Long.MIN_VALUE;
long[][] foo = new long[x][y];
for(int i=0;i<x;i++) {
for(int j=0;j<y;j++) {
long r = (long)(Math.random() * 100000000);
if( r > max ) {
// save this to compare against pmax:
max = r;
}
foo[i][j] = r;
}
}
int numThreads = 32;
long pmax = pmax(foo, numThreads);
System.out.println("max: " + max);
System.out.println("pmax: " + pmax);
}
}
Bonus: If you're calling this method repeatedly then it would probably make sense to pull the ExecutorService creation out of the method and have it be reused across calls.
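A rough sketch of that reuse, assuming the pool is owned by a small wrapper object that is closed once at the end; the class name SharedPoolMax and its methods are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical wrapper: the pool is created once and reused by every pmax call.
class SharedPoolMax implements AutoCloseable {
    private final ExecutorService pool;

    public SharedPoolMax(int numThreads) {
        this.pool = Executors.newFixedThreadPool(numThreads);
    }

    public long pmax(long[][] arr) {
        List<Future<Long>> futures = new ArrayList<>();
        for (final long[] subArr : arr) {
            Callable<Long> task = () -> {
                long max = Long.MIN_VALUE;
                for (long v : subArr) {
                    if (v > max) {
                        max = v;
                    }
                }
                return max;
            };
            futures.add(pool.submit(task));
        }
        long max = Long.MIN_VALUE;
        try {
            for (Future<Long> future : futures) {
                max = Math.max(max, future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
        return max;
    }

    // Shut the pool down once, when no more calls will be made.
    @Override
    public void close() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        try (SharedPoolMax m = new SharedPoolMax(4)) {
            System.out.println("pmax: " + m.pmax(new long[][] { {1, 5}, {9, 2} }));
            System.out.println("pmax: " + m.pmax(new long[][] { {-3}, {-7} }));
        }
    }
}
```

The thread start-up cost is then paid once rather than on every call, which matters most when each call does little work.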

Well, that definitely is an issue, but without more code it is hard to tell whether it is the only one.
There is basically a race condition between the access of thread[0] (and this read of sharedMax) and the modification of the sharedMax in other threads.
Think about what happens if the scheduler decides not to let any of the threads run for now: when you are done creating the threads, you will return the answer before it has been modified even once! (Of course, there are other possible scenarios...)
You can overcome it by join()ing all threads before returning an answer.
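A minimal sketch of the join() approach. The original worker class is not shown here, so this version assumes each thread writes its partial maximum into its own array slot instead of a shared field; JoinBeforeRead and parallelMax are hypothetical names:

```java
// Hypothetical example: start all workers, join them all, then read the results.
class JoinBeforeRead {
    public static long parallelMax(long[][] slices) {
        final long[] partial = new long[slices.length];
        Thread[] threads = new Thread[slices.length];
        for (int t = 0; t < slices.length; t++) {
            final int idx = t;
            threads[t] = new Thread(() -> {
                long max = Long.MIN_VALUE;
                for (long v : slices[idx]) {
                    max = Math.max(max, v);
                }
                partial[idx] = max; // each thread writes only its own slot
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                // join() returns only after the thread has finished, and makes
                // its writes visible to the joining thread
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        long max = Long.MIN_VALUE;
        for (long p : partial) {
            max = Math.max(max, p);
        }
        return max;
    }

    public static void main(String[] args) {
        long[][] slices = { {4, 8}, {15, 16}, {23, 42} };
        System.out.println("max: " + parallelMax(slices));
    }
}
```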

what is wrong with this thread-safe byte sequence generator?

I need a byte generator that would generate values from Byte.MIN_VALUE to Byte.MAX_VALUE. When it reaches MAX_VALUE, it should start over again from MIN_VALUE.
I have written the code using AtomicInteger (see below); however, the code does not seem to behave properly if accessed concurrently and if made artificially slow with Thread.sleep() (if no sleeping, it runs fine; however, I suspect it is just too fast for concurrency problems to show up).
The code (with some added debug code):
public class ByteGenerator {
private static final int INITIAL_VALUE = Byte.MIN_VALUE-1;
private AtomicInteger counter = new AtomicInteger(INITIAL_VALUE);
private AtomicInteger resetCounter = new AtomicInteger(0);
private boolean isSlow = false;
private long startTime;
public byte nextValue() {
int next = counter.incrementAndGet();
//if (isSlow) slowDown(5);
if (next > Byte.MAX_VALUE) {
synchronized(counter) {
int i = counter.get();
//if value is still larger than max byte value, we reset it
if (i > Byte.MAX_VALUE) {
counter.set(INITIAL_VALUE);
resetCounter.incrementAndGet();
if (isSlow) slowDownAndLog(10, "resetting");
} else {
if (isSlow) slowDownAndLog(1, "missed");
}
next = counter.incrementAndGet();
}
}
return (byte) next;
}
private void slowDown(long millis) {
try {
Thread.sleep(millis);
} catch (InterruptedException e) {
}
}
private void slowDownAndLog(long millis, String msg) {
slowDown(millis);
System.out.println(resetCounter + " "
+ (System.currentTimeMillis()-startTime) + " "
+ Thread.currentThread().getName() + ": " + msg);
}
public void setSlow(boolean isSlow) {
this.isSlow = isSlow;
}
public void setStartTime(long startTime) {
this.startTime = startTime;
}
}
And, the test:
public class ByteGeneratorTest {
@Test
public void testGenerate() throws Exception {
ByteGenerator g = new ByteGenerator();
for (int n = 0; n < 10; n++) {
for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
assertEquals(i, g.nextValue());
}
}
}
@Test
public void testGenerateMultiThreaded() throws Exception {
final ByteGenerator g = new ByteGenerator();
g.setSlow(true);
final AtomicInteger[] counters = new AtomicInteger[Byte.MAX_VALUE-Byte.MIN_VALUE+1];
for (int i = 0; i < counters.length; i++) {
counters[i] = new AtomicInteger(0);
}
Thread[] threads = new Thread[100];
final CountDownLatch latch = new CountDownLatch(threads.length);
for (int i = 0; i < threads.length; i++) {
threads[i] = new Thread(new Runnable() {
public void run() {
try {
for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
byte value = g.nextValue();
counters[value-Byte.MIN_VALUE].incrementAndGet();
}
} finally {
latch.countDown();
}
}
}, "generator-client-" + i);
threads[i].setDaemon(true);
}
g.setStartTime(System.currentTimeMillis());
for (int i = 0; i < threads.length; i++) {
threads[i].start();
}
latch.await();
for (int i = 0; i < counters.length; i++) {
System.out.println("value #" + (i+Byte.MIN_VALUE) + ": " + counters[i].get());
}
//print out the number of hits for each value
for (int i = 0; i < counters.length; i++) {
assertEquals("value #" + (i+Byte.MIN_VALUE), threads.length, counters[i].get());
}
}
}
The result on my 2-core machine is that value #-128 gets 146 hits (all of them should get 100 hits equally as we have 100 threads).
If anyone has any ideas, what's wrong with this code, I'm all ears/eyes.
UPDATE: for those who are in a hurry and do not want to scroll down, the correct (and shortest and most elegant) way to solve this in Java would be like this:
public byte nextValue() {
return (byte) counter.incrementAndGet();
}
Thanks, Heinz!

Initially, Java stored all fields as 4 or 8 byte values, even short and byte. Operations on the fields would simply do bit masking to shrink the bytes. Thus we could very easily do this:
public byte nextValue() {
return (byte) counter.incrementAndGet();
}
Fun little puzzle, thanks Neeme :-)
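To see why the cast is enough: narrowing an int to a byte keeps only the low 8 bits, so (byte) 128 is -128, and because 2^32 is a multiple of 256 the sequence stays in step even when the int counter itself overflows. A small check, with ByteWrapDemo as a made-up class name:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo of the wrap-around produced by the (byte) cast.
class ByteWrapDemo {
    // Start one below MIN_VALUE so the first returned value is Byte.MIN_VALUE.
    private final AtomicInteger counter = new AtomicInteger(Byte.MIN_VALUE - 1);

    public byte nextValue() {
        return (byte) counter.incrementAndGet();
    }

    public static void main(String[] args) {
        ByteWrapDemo g = new ByteWrapDemo();
        boolean inOrder = true;
        // Walk through three full cycles and compare with the expected sequence.
        for (int n = 0; n < 3; n++) {
            for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
                inOrder &= (g.nextValue() == i);
            }
        }
        System.out.println("three cycles in order: " + inOrder);
        System.out.println("(byte) 128 = " + (byte) 128);
    }
}
```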

You make the decision to incrementAndGet() based on an old value of counter.get(). The value of the counter can reach MAX_VALUE again before you do the incrementAndGet() operation on the counter.
if (next > Byte.MAX_VALUE) {
synchronized(counter) {
int i = counter.get(); //here you make sure that the counter is not over MAX_VALUE
if (i > Byte.MAX_VALUE) {
counter.set(INITIAL_VALUE);
resetCounter.incrementAndGet();
if (isSlow) slowDownAndLog(10, "resetting");
} else {
if (isSlow) slowDownAndLog(1, "missed"); //the counter can reach MAX_VALUE again if you wait here long enough
}
next = counter.incrementAndGet(); //here you increment and return the counter, which may have gone past MAX_VALUE in the meantime
}
}
To make it work, one has to make sure that no decisions are made on stale info. Either reset the counter or return the old value.
public byte nextValue() {
int next = counter.incrementAndGet();
if (next > Byte.MAX_VALUE) {
synchronized(counter) {
next = counter.incrementAndGet();
//if value is still larger than max byte value, we reset it
if (next > Byte.MAX_VALUE) {
counter.set(INITIAL_VALUE + 1);
next = INITIAL_VALUE + 1;
resetCounter.incrementAndGet();
if (isSlow) slowDownAndLog(10, "resetting");
} else {
if (isSlow) slowDownAndLog(1, "missed");
}
}
}
return (byte) next;
}

Your synchronized block contains only the if body. It should wrap the whole method, including the if statement itself. Or just make your method nextValue synchronized. BTW, in this case you do not need Atomic variables at all.
I hope this will work for you. Try to use Atomic variables only if you really need the highest-performance code, i.e. if the synchronized statement bothers you. IMHO, in most cases it does not.
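A minimal sketch of that suggestion, assuming a plain int field guarded by the method's own lock; the class name SynchronizedByteGenerator is hypothetical, and the debug and slow-down code from the question is left out:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical variant: the whole read-check-update sequence runs under one lock.
class SynchronizedByteGenerator {
    private int next = Byte.MIN_VALUE; // guarded by "this"

    public synchronized byte nextValue() {
        int value = next;
        next = (value == Byte.MAX_VALUE) ? Byte.MIN_VALUE : value + 1;
        return (byte) value;
    }

    public static void main(String[] args) throws InterruptedException {
        final SynchronizedByteGenerator g = new SynchronizedByteGenerator();
        // The AtomicIntegerArray is only used by this check, not by the generator.
        final AtomicIntegerArray hits = new AtomicIntegerArray(256);
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 256; i++) {
                    hits.incrementAndGet(g.nextValue() - Byte.MIN_VALUE);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        boolean even = true;
        for (int i = 0; i < hits.length(); i++) {
            even &= (hits.get(i) == threads.length);
        }
        System.out.println("every value seen " + threads.length + " times: " + even);
    }
}
```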

If I understand you correctly, you care that the results of nextValue are in the range of Byte.MIN_VALUE and Byte.MAX_VALUE and you don't care about the value stored in the counter.
Then you can map integers onto bytes so that the enumeration behavior you require is exposed:
private static final int VALUE_RANGE = Byte.MAX_VALUE - Byte.MIN_VALUE + 1;
private final AtomicInteger counter = new AtomicInteger(0);
public byte nextValue() {
return (byte) (counter.incrementAndGet() % VALUE_RANGE + Byte.MIN_VALUE - 1);
}
Beware, this is untested code. But the idea should work.

I coded up the following version of nextValue using compareAndSet, which is designed to be used without a synchronized block. It passed your unit tests.
Oh, and I introduced new constants for MIN_VALUE and MAX_VALUE, but you can ignore those if you prefer:
static final int LOWEST_VALUE = Byte.MIN_VALUE;
static final int HIGHEST_VALUE = Byte.MAX_VALUE;
private AtomicInteger counter = new AtomicInteger(LOWEST_VALUE - 1);
private AtomicInteger resetCounter = new AtomicInteger(0);
public byte nextValue() {
int oldValue;
int newValue;
do {
oldValue = counter.get();
if (oldValue >= HIGHEST_VALUE) {
newValue = LOWEST_VALUE;
resetCounter.incrementAndGet();
if (isSlow) slowDownAndLog(10, "resetting");
} else {
newValue = oldValue + 1;
if (isSlow) slowDownAndLog(1, "missed");
}
} while (!counter.compareAndSet(oldValue, newValue));
return (byte) newValue;
}
compareAndSet() works in conjunction with get() to manage concurrency.
At the start of your critical section, you perform a get() to retrieve the old value. You then perform some function dependent only on the old value to compute a new value. Then you use compareAndSet() to set the new value. If the AtomicInteger is no longer equal to the old value at the time compareAndSet() is executed (because of concurrent activity), it fails and you must start over.
If you have an extreme amount of concurrency and the computation time is long, it is conceivable that compareAndSet() may fail many times before succeeding, and it may be worth gathering statistics on that if it concerns you.
I'm not suggesting that this is a better or worse approach than a simple synchronized block as others have suggested, but I personally would probably use a synchronized block for simplicity.
EDIT: I'll answer your actual question "Why doesn't mine work?"
Your code has:
int next = counter.incrementAndGet();
if (next > Byte.MAX_VALUE) {
As these two lines are not protected by a synchronized block, multiple threads can execute them concurrently and all obtain values of next > Byte.MAX_VALUE. All of them will then drop through into the synchronized block and set counter back to INITIAL_VALUE (one after another as they wait for each other).
Over the years, a huge amount has been written about the pitfalls of trying to get a performance tweak by not synchronizing when it doesn't seem necessary. For example, see Double Checked Locking.

Notwithstanding that Heinz Kabutz's is the clean answer to the specific question, ye olde Java SE 8 [March 2014] added AtomicInteger.updateAndGet (and friends). This leads to a more general solution if circumstances require it:
public class ByteGenerator {
private static final int MIN = Byte.MIN_VALUE;
private static final int MAX = Byte.MAX_VALUE;
private final AtomicInteger counter = new AtomicInteger(MIN);
public byte nextValue() {
return (byte)counter.getAndUpdate(ByteGenerator::update);
}
private static int update(int old) {
return old==MAX ? MIN : old+1;
}
}
