How to make writes to array visible to other Threads - java

I have an input array of basic type int, I would like to process this array using multiple threads and store the results in an output array of same type and size. Is the following code correct in terms of memory visibility?
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class ArraySynchronization2
{
final int width = 100;
final int height = 100;
final int[][] img = new int[width][height];
volatile int[][] avg = new int[width][height];
public static void main(String[] args) throws InterruptedException, ExecutionException
{
new ArraySynchronization2().doJob();
}
private void doJob() throws InterruptedException, ExecutionException
{
final int threadNo = 8;
ExecutorService pool = Executors.newFixedThreadPool(threadNo);
final CountDownLatch countDownLatch = new CountDownLatch(width - 2);
for (int x = 1; x < width - 1; x++)
{
final int col = x;
pool.execute(new Runnable()
{
public void run()
{
for (int y = 0; y < height; y++)
{
avg[col][y] = (img[col - 1][y] + img[col][y] + img[col + 1][y]) / 3;
}
// how can I make the writes to the data in avg[][] visible to other threads? is this ok?
avg = avg;
countDownLatch.countDown();
}
});
}
try
{
// Does this make any memory visibility guarantees?
countDownLatch.await();
}
catch (InterruptedException e)
{
e.printStackTrace();
}
// can I read avg here, will the results be correct?
for (int x = 0; x < width; x++)
{
for (int y = 0; y < height; y++)
{
System.out.println(avg[x][y]);
}
}
pool.shutdown();
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
// now I know tasks are completed and results synchronized (after thread death), but what if I plan to reuse the pool?
}
}
I do not want to synchronize on CountDownLatch. I would like to know how to make the writes to the output array visible to other threads. Let's imagine that I have an array (e.g. an image) that I would like to process. I could do this in multiple separate tasks that process chunks of the input array into the output array; there are no inter-dependencies between the writes to the output. After all computations complete, I would like to have all the results in the output array ready to read. How could I achieve such behaviour? I know that it is achievable by using submit and Future.get() instead of execute, but I'd like to know how to properly implement such a low-level mechanism. Please also refer to the questions raised in the comments in the code.

Hm, just wondering if you actually need a latch. The array itself is a reserved block in memory, with every cell being a dedicated memory address. (Btw., marking it volatile only makes the reference to the array volatile, not the cells of the array.) So you need to coordinate write access to the cells only if multiple threads write to the same cell.
The question is, are you actually doing this? The aim should be to avoid coordinating access where possible, because it comes at a cost.
In your algorithm, you operate on rows, so why not parallelize on rows, so that each thread only reads and calculates values of a row segment of the entire array and ignores the other rows?
i.e.
thread-0 -> rows 0, 8, 16, ...
thread-1 -> rows 1, 9, 17, ...
...
basically this (haven't tested):
for (int n = 0; n < threadNo; n++) { //each n relates to a thread
final int firstRow = n; //final copy: the anonymous class cannot capture the loop variable n
pool.execute(new Runnable() {
public void run() {
for (int row = firstRow; row < height; row += threadNo) { //proceed to the next row for the thread
for (int col = 1; col < width-1; col++) {
avg[col][row] = (img[col - 1][row] + img[col][row] + img[col + 1][row]) / 3;
}
}
}
});
}
So they can operate on the entire array without having to synchronize their writes with each other at all. Putting the loop that prints the result after shutting down the pool and awaiting its termination ensures that all calculating threads have finished, and the only thread that has to wait is the main thread.
An alternative to this approach is to create an avg array of size 100/threadNo for each thread, so that each thread writes only to its own array, and to merge the arrays afterwards into one array with System.arraycopy().
If you intend to reuse the pool, you should use submit instead of execute and call get() on the Futures you get from submit.
Set<Future<?>> futures = new HashSet<>();
for(int n = 0; ...) {
futures.add(pool.submit(new Runnable() {...}));
}
for(Future<?> f : futures) {
f.get(); //blocks until the task is completed
}
In case you want to read intermediate states of the array you can either read it directly, if inconsistent data on single cells is acceptable, or use AtomicIntegerArray, as Nicolas Filotto suggested.
-- EDIT --
After the edit that uses the width for the latch instead of the original thread number, and after all the discussion, I'd like to add a few words.
As @jameslarge pointed out, it's about how to establish a "happens-before" relationship, i.e. how to guarantee that operation A (a write) happens before operation B (a read). Therefore access between two threads needs to be coordinated. There are several options:
volatile keyword - doesn't work on arrays, as it marks only the reference and not the values as being volatile
synchronization - pessimistic locking (synchronized modifier or statement)
CAS - optimistic locking, used by quite a few concurrent implementations
However, every sync point (pessimistic or optimistic) establishes a happens-before relationship. Which one you choose depends on your requirements.
What you'd like to achieve is coordination between the read operation of the main thread and the write operations of the worker threads. How you implement it is up to you and your requirements. The CountDownLatch counting down the total number of jobs is one way (btw., the latch uses a state property which is a volatile int). A CyclicBarrier may also be a construct worth considering, especially if you'd like to read consistent intermediate states. Or a future.get(), or...
It all boils down to the worker threads having to signal that they're done writing so the reader thread can start reading.
However, beware of using sleep instead of synchronization. Sleep does not establish a happens-before relationship, and using sleep for synchronization is a typical concurrency bug pattern. In the worst case, the sleep finishes before any of the work has been done.
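As a rough illustration of the signalling idea above, here is a minimal sketch that uses a plain (non-volatile) int[][] and relies only on the latch for visibility. According to the CountDownLatch Javadoc, actions in a thread prior to countDown() happen-before actions following a successful return from the corresponding await(). The class name LatchVisibilityDemo and the method average are made up for this example; they are not from the question's code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LatchVisibilityDemo {

    // Averages each interior column with its two neighbours, one task per column.
    // Assumes img has at least 3 columns.
    static int[][] average(final int[][] img, int threads) {
        final int width = img.length;
        final int height = img[0].length;
        final int[][] avg = new int[width][height]; // plain array, no volatile needed
        final CountDownLatch done = new CountDownLatch(width - 2);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int x = 1; x < width - 1; x++) {
                final int col = x;
                pool.execute(() -> {
                    try {
                        for (int y = 0; y < height; y++) {
                            avg[col][y] = (img[col - 1][y] + img[col][y] + img[col + 1][y]) / 3;
                        }
                    } finally {
                        // Signals "done writing"; writes above happen-before the await() return.
                        done.countDown();
                    }
                });
            }
            done.await(); // after this returns, all worker writes to avg are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdown(); // the pool could equally be kept and reused
        }
        return avg;
    }

    public static void main(String[] args) {
        int[][] img = new int[100][100];
        for (int x = 0; x < 100; x++) {
            for (int y = 0; y < 100; y++) {
                img[x][y] = x;
            }
        }
        System.out.println(average(img, 8)[50][0]); // (49 + 50 + 51) / 3 = 50
    }
}
```

The pool does not have to be shut down for the results to be readable; the latch alone provides the ordering, which is what makes reusing the pool possible.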

What you need is an AtomicIntegerArray rather than a simple volatile int array. It is meant for exactly this case: updating array elements atomically, with the updates visible to all threads.

Related

Data Races in an AtomicIntegerArray

In the code below:
I am incrementing num[1] (initially 0) of an AtomicIntegerArray num 1000 times in each of 2 threads.
When both threads have finished, shouldn't the value of num[1] read in the main thread be 2000, as there shouldn't be data races in an AtomicIntegerArray?
However, I get random values < 2000. Could someone tell me why?
Code:
import java.util.concurrent.atomic.AtomicIntegerArray;
public class AtomicIntegerArr {
private static AtomicIntegerArray num= new AtomicIntegerArray(2);
public static void main(String[] args) throws InterruptedException {
Thread t1 = new Thread(new MyRun1());
Thread t2 = new Thread(new MyRun2());
num.set(0, 10);
num.set(1, 0);
System.out.println("In Main num before:"+num.get(1));
t1.start();
t2.start();
t1.join();
t2.join();
System.out.println("In Main num after:"+num.get(1));
}
static class MyRun1 implements Runnable {
public void run() {
for (int i = 0; i < 1000; i++) {
num.set(1,num.get(1)+1);
}
}
}
static class MyRun2 implements Runnable {
public void run() {
for (int i = 0; i < 1000; i++) {
num.set(1,num.get(1)+1);
}
}
}
}
Edit: Using num.compareAndSet(1, num.get(1), num.get(1)+1); instead of num.set(1,num.get(1)+1); doesn't work either.
I get random values < 2000. Could someone tell me why?
This is called the lost-update problem.
It happens because of the following code:
num.set(1, num.get(1) + 1);
Although each individual operation involved is atomic, the combined operation is not. The single operations from the two threads can interleave, causing an update from one thread to be overwritten with a stale value by the other thread.
You can use compareAndSet to solve this problem, but you have to check whether the operation was successful, and do it again when it fails.
int v;
do {
v = num.get(1);
} while (!num.compareAndSet(1, v, v+1));
There's also a method for exactly this purpose:
num.accumulateAndGet(1, 1, (x, d)->x+d);
accumulateAndGet(int i, int x, IntBinaryOperator accumulatorFunction)
Atomically updates the element at index i with the results of applying the given function to the current and given values, returning the updated value. The function should be side-effect-free, since it may be re-applied when attempted updates fail due to contention among threads. The function is applied with the current value at index i as its first argument, and the given update as the second argument.
This is a classic race condition. Any time you have a fetch, an operation, and a put, your code is racy.
Consider two threads, both executing num.set(1,num.get(1)+1) at roughly the "same time." First, let's break down what the expression itself is doing:
it fetches num.get(1); let's call this x
it adds 1 to that; let's call this y
it puts that sum in with num.set(1, y);
Even though the intermediate values in your expression are just values on the stack, and not explicit variables, the operation is the same: get, add, put.
Okay, so back to our two threads. What if the operations are ordered like this?
initial state: n[1] = 5
Thread A | Thread B
========================
x = n[1] = 5 |
| x = n[1] = 5
| y = 5 + 1 = 6
y = 5 + 1 = 6 |
n[1] = 6 |
| n[1] = 6
Since both threads fetched the value before either thread put its added value, they both do the same thing. You have 5 + 1 twice, and the result is 6, not 7!
What you want is getAndIncrement(int idx), or one of the similar methods that does the get, adding, and putting atomically.
These methods can actually all be built on top of the compareAndSet method you identified. But to do that, you need to do the increment within a loop, trying until the compareAndSet returns true. Also, for that to work, you have to store that initial num.get(1) value in a local variable, rather than fetching it a second time. In effect, this loop says "keep trying the get-add-put logic until it works without anyone else having raced between the operations." In my example above, Thread B would have noticed that compareAndSet(1, 5, 6) fails (since the actual value at that time is 6, not 5 as expected), and thus retried. This is in fact what all of those atomic methods, like getAndIncrement, do.
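The retry loop described above can be sketched roughly as follows. The class CasIncrementDemo and the helper casIncrement are names invented for this illustration; in real code the built-in num.incrementAndGet(1) or num.getAndIncrement(1) does the same job.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

class CasIncrementDemo {

    // Hand-written equivalent of num.incrementAndGet(index), shown only to expose the loop.
    static int casIncrement(AtomicIntegerArray num, int index) {
        int seen;
        do {
            seen = num.get(index);                              // fetch once into a local
        } while (!num.compareAndSet(index, seen, seen + 1));    // retry if someone raced us
        return seen + 1;
    }

    // Two threads each increment element 1 the given number of times.
    static int runTwoThreads(int incrementsPerThread) {
        final AtomicIntegerArray num = new AtomicIntegerArray(2);
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                casIncrement(num, 1);
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return num.get(1);
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(1000)); // 2000: no update is lost
    }
}
```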

Parallel Programming 2 Threads Running With Shared Variable

I have a question about concurrency. I just wrote a program that runs 2 threads with the following instructions:
Thread 1: increment the variable "num" by 1 in a loop until 1'000'000
Thread 2: same thing but decrementing
At the end I receive an undesired result. And yeah, I know that I could synchronize or try to use reentrant locks, but the problem is that I can't understand what's behind all these different undesired results.
I mean, the operations I'm using are commutative and hence we don't care about the ordering, so if the order doesn't matter we should still obtain 0, which is not the case!
Can someone explain to me what happens behind all the computing, so that I can get a feel for it and recognize these situations immediately?
EDIT:
Since I was just interested in understanding the main concept I thought it wasn't necessary to put the code.
Code:
class MyThread implements Runnable {
int id;
volatile static long num = 0;
MyThread(int id) {
this.id = id;
}
public void run() {
if (id == 0) {
for (int j = 0; j < 100000; ++j)
num++;
} else {
for (int j = 0; j < 100000; ++j)
num--;
}
}
}
After this I create the Threads and run them:
MyThread p = new MyThread(0);
MyThread q = new MyThread(1);
Thread t = new Thread(p);
Thread u = new Thread(q);
t.start();
u.start();
try {
t.join();
u.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
EDIT2:
I understand the concept now, but I would also like to know why declaring the variable as volatile still gives me wrong results?
EDIT3: I thought about it, and I think it's because bad interleaving can still give problems!
If the increment/decrement operations are not atomic, you can end up with this kind of behavior.
An operation is considered atomic if it appears to the rest of the system to occur instantaneously (cf. Wikipedia).
Consider the following case:
Thread 1 reads the value n in the variable x.
Thread 2 reads the value n in the variable x.
Thread 1 increments the value and stores it in the variable x, which now evaluates to n+1.
Thread 2 decrements the value and stores it in the variable x, which now evaluates to n-1.
But what you wanted was for the variable x to still evaluate to n.
I do not know the specifics of Java primitives, but it appears that using AtomicInteger or a synchronized method could solve your issue here.
Marking this field as volatile is not enough on its own. volatile guarantees that each individual read and write of num is visible to other threads, but num++ and num-- are still separate read, modify and write steps that can interleave, so you still need an atomic class such as AtomicLong or some other synchronization to get a correct result.
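For comparison with the volatile long counter in the question, here is a minimal sketch that uses AtomicLong, whose incrementAndGet and decrementAndGet perform the read-modify-write as one atomic step. The class name AtomicCounterDemo and the method run are invented for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicCounterDemo {

    // One thread increments, the other decrements, the same number of times.
    static long run(int iterations) {
        final AtomicLong num = new AtomicLong();
        Thread up = new Thread(() -> {
            for (int j = 0; j < iterations; j++) {
                num.incrementAndGet(); // atomic read-modify-write
            }
        });
        Thread down = new Thread(() -> {
            for (int j = 0; j < iterations; j++) {
                num.decrementAndGet();
            }
        });
        up.start();
        down.start();
        try {
            up.join();
            down.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return num.get();
    }

    public static void main(String[] args) {
        System.out.println(run(100000)); // 0, regardless of interleaving
    }
}
```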

How should I parallelize a computationally expensive for loop and collate the iteration results?

I'm working on an 8-core machine and am performing a computationally heavy task. However, each execution of the task (i.e., iteration of for loop) is rather independent of the previous one. There are only some variables that are 'summed up' from one execution to the next. I'm guessing this is a good example for parallelizing/threading but I'm not sure how to go about it.
Here's how the code looks. As of now, it's just part of the main method in my main executor class:
double testerPayoffSum = 0.0, developerPayoffSum = 0.0;
Random seed = new Random();
try {
for (int i = 0; i < GameConstants.MAX_GAMES; i++) {
EraserSimulator eraser = new EraserSimulator(GameConstants.MAX_TARGETS, GameConstants.MAX_RESOURCES, GameConstants.NUM_ATTACKER_TYPES, seed.nextInt());
Map<Set<SingleObjectiveTarget>, Double> gameStrategy = eraser.run();
assert (gameStrategy != null);
TestingGameSimulator testingGame = new TestingGameSimulator(GameConstants.MAX_TARGETS, gameStrategy, GameConstants.NUM_GAMES_TO_STORE_FOR_HISTORY, GameConstants.NUM_TESTING_GAMES_TO_PLAY);
PlayerPayoffs payoffs = testingGame.run(eraser.getEraserInstance());
testerPayoffSum += payoffs.getAverageTesterPayoff(GameConstants.NUM_TESTING_GAMES_TO_PLAY);
developerPayoffSum += payoffs.getAverageDeveloperPayoff(GameConstants.NUM_TESTING_GAMES_TO_PLAY);
System.out.print("Output: ERASER Games played; Number of developers caught");
System.out.print(", " + GameConstants.NUM_TESTING_GAMES_TO_PLAY + ", " + payoffs.getNumTimesCaught() + "\n");
}
} catch (Exception e) { sendEmailAlert("Execution Failed with Exception"); }
I'd like to parallelize the for-loop computation if possible and keep summing up the testerPayoffSum and developerPayoffSum variables. How might I achieve this?
Note: Each execution of the for loop takes about 20-30 minutes depending on the input size (as set by the various GameConstants). Even for a small number of MAX_GAMES the above takes close to 2-3 hours.
Create a task object implementing Callable that returns the tester and developer payoffs of one game, and submit it to an executor, which hands back a Future for each task. Start the calculations and sum the results obtained from the Futures (see also https://blogs.oracle.com/CoreJavaTechTips/entry/get_netbeans_6).
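A minimal sketch of that Callable/Future approach might look like the following. The method playOneGame is a hypothetical, deterministic stand-in for one EraserSimulator plus TestingGameSimulator run (the real classes are not available here), and the double[] result is a placeholder for PlayerPayoffs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PayoffSumDemo {

    // Hypothetical stand-in for one game: returns {testerPayoff, developerPayoff}.
    static double[] playOneGame(int game) {
        return new double[] { game, 2.0 * game };
    }

    // Runs all games on a fixed pool and sums the payoffs in the calling thread.
    static double[] sumPayoffs(int games, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<double[]>> futures = new ArrayList<>();
            for (int i = 0; i < games; i++) {
                final int game = i;
                Callable<double[]> task = () -> playOneGame(game);
                futures.add(pool.submit(task));
            }
            double testerSum = 0.0;
            double developerSum = 0.0;
            for (Future<double[]> f : futures) {
                double[] payoffs = f.get(); // blocks until this game has finished
                testerSum += payoffs[0];
                developerSum += payoffs[1];
            }
            return new double[] { testerSum, developerSum };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for games", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a game failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] sums = sumPayoffs(10, 4);
        System.out.println(sums[0] + " " + sums[1]); // 45.0 90.0
    }
}
```

Because only the calling thread touches the running sums, no lock is needed around them.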
Are you absolutely sure you have no dependencies?
1. The classes used must not share any variables.
If they do, you have to add locks, but that will affect performance. If some shared variables are used extensively, performance can drop significantly, even below that of the non-parallel execution.
2. The classes used must not use any kind of machine learning.
There is no solution for this, because parallelization will corrupt your results.
Now how to do it (I am not JAVA coder so I stick to C++ code).
//--- globals and headers -----------------------------------------------------
unsigned long __stdcall function(LPVOID p);
Random seed = new Random();
const int N=8; // threads count (<=CPU count)
int id[N]; // thread id
int max[N]; // number of games per thread
double testerPayoffSum[N]; // sum to separate variables to avoid locks need
double developerPayoffSum[N];
volatile int run=0,stop=0; // thread control variables run is number of running threads and stop force stop...
//--- main code ---------------------------------------------------------------
// init some variables ... may be the seed init will be better here too
int i;
for (i = 0; i < N; i++)
{
id[i]=i;
max[i]=GameConstants.MAX_GAMES / N;
testerPayoffSum[i]=0.0;
developerPayoffSum[i]=0.0;
}
max[0]+=GameConstants.MAX_GAMES % N; // thread 0 also takes the remainder
// create threads
for (i = 0; i < N; i++)
{
HANDLE hnd=CreateThread(0,0,function,&id[i],0,0);
if (hnd!=NULL) CloseHandle(hnd); // this line is important !!!
// because if you do not close Handle it will be allocated until the end of app
// handle leaks are nasty and cause weird OS behaviour
// I saw many times this bug in commercial drivers
// it is a nightmare for 24/7 software
}
// wait for them
while (run) Sleep(200);
// sum the results to [0]
for (i = 1; i < N; i++)
{
testerPayoffSum[0] +=testerPayoffSum[i];
developerPayoffSum[0]+=developerPayoffSum[i];
}
// here do what you need to do with the results
//--- thread function ---------------------------------------------------------
unsigned long __stdcall function(LPVOID p)
{
run++;
int ix=((int*)p)[0];
for (int i = 0; i < max[ix]; i++) // local counter: the global i would be shared between threads
{
if (stop) break;
EraserSimulator eraser = new EraserSimulator(GameConstants.MAX_TARGETS, GameConstants.MAX_RESOURCES, GameConstants.NUM_ATTACKER_TYPES, seed.nextInt());
Map<Set<SingleObjectiveTarget>, Double> gameStrategy = eraser.run();
assert (gameStrategy != null);
TestingGameSimulator testingGame = new TestingGameSimulator(GameConstants.MAX_TARGETS, gameStrategy, GameConstants.NUM_GAMES_TO_STORE_FOR_HISTORY, GameConstants.NUM_TESTING_GAMES_TO_PLAY);
PlayerPayoffs payoffs = testingGame.run(eraser.getEraserInstance());
testerPayoffSum[ix] += payoffs.getAverageTesterPayoff(GameConstants.NUM_TESTING_GAMES_TO_PLAY);
developerPayoffSum[ix] += payoffs.getAverageDeveloperPayoff(GameConstants.NUM_TESTING_GAMES_TO_PLAY);
// do not call any visual stuff from a thread !!! sometimes it can cause a lot of problems ...
// instead create some global string variable and set it to what should be printed out
// and inside the wait while loop in the main code add if string != "" then System.out.print(string);
// but in that case you should add a lock to it.
// System.out.print("Output: ERASER Games played; Number of developers caught");
// System.out.print(", " + GameConstants.NUM_TESTING_GAMES_TO_PLAY + ", " + payoffs.getNumTimesCaught() + "\n");
//Sleep(100); // well placed sleep
}
run--;
return 0;
}
[Notes]
From your code I am assuming that GameConstants is a shared variable!
If it is only read, then it is OK.
But if you also write to it inside a thread (I suspect that you do), then you have a big problem, because you need to add locks inside your game class.
If no machine learning is done, you could avoid this by creating separate GameConstants variables for each thread, like GameConstants[N],
but then you need to rewrite the code so that it accesses GameConstants[ix] and not GameConstants.
[lock]
I have no clue how locks are implemented in Java,
but you can also use your own, something like this:
class _lock
{
public:
volatile bool locked;
_lock() { locked=false; }
void lock() { while(locked) Sleep(1); locked=true; }
void unlock() { locked=false; }
};
// now for each shared variable (or group of variables) add one global _lock variable
_lock l1; int sv1; // shared variable 1 and her lock
// any write access and sometimes also read access needs lock
l1.lock();
sv1++;
l1.unlock();
Beware that locks can sometimes cause the app to freeze, especially under heavy use.
It does not matter whether it is your own lock or an OS lock.
This occurs mainly when mixing visual stuff or some OS calls inside threads rather than in the main thread.
In that case a well-placed sleep sometimes helps, but avoid OS calls inside threads if you can,
because they cause many other problems.
Also try to hold a lock for as short a time as possible, because in case of a conflict the conflicting threads are stopped!
Therefore you cannot just lock at the start of the loop and unlock at the end,
because the parallelism speedup would then be lost.
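On the Java side, the standard library already provides locks, so a hand-rolled flag is not needed. Note also that a flag-based lock which tests the flag and then sets it in two separate steps leaves a gap in which two threads can both see it as free. The sketch below guards a shared counter with java.util.concurrent.locks.ReentrantLock and holds the lock only around the shared update; the class name SharedCounter and its methods are made up for this illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

class SharedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int sv1; // shared variable, only touched while holding lock

    void increment() {
        lock.lock();
        try {
            sv1++; // keep the locked region as short as possible
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    int get() {
        lock.lock();
        try {
            return sv1;
        } finally {
            lock.unlock();
        }
    }

    // Starts the given number of threads, each incrementing the counter.
    static int run(int threads, int incrementsPerThread) {
        final SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining workers", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // 4000
    }
}
```

A synchronized method or block would work equally well here; ReentrantLock is shown only because it mirrors the explicit lock()/unlock() calls used above.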
Declare a queue to collect results and submit tasks to a thread pool:
final BlockingQueue<PlayerPayoffs> queue = new ArrayBlockingQueue<PlayerPayoffs>(GameConstants.MAX_GAMES); // capacity is required
ExecutorService exec = Executors.newFixedThreadPool(N); // number of threads depends on hardware
for (int i = 0; i < GameConstants.MAX_GAMES; i++) {
exec.execute(new Runnable() {
public void run() {
try {
EraserSimulator eraser = new EraserSimulator(GameConstants.MAX_TARGETS, GameConstants.MAX_RESOURCES, GameConstants.NUM_ATTACKER_TYPES, seed.nextInt());
Map<Set<SingleObjectiveTarget>, Double> gameStrategy = eraser.run();
assert (gameStrategy != null);
TestingGameSimulator testingGame = new TestingGameSimulator(GameConstants.MAX_TARGETS, gameStrategy, GameConstants.NUM_GAMES_TO_STORE_FOR_HISTORY, GameConstants.NUM_TESTING_GAMES_TO_PLAY);
PlayerPayoffs payoffs = testingGame.run(eraser.getEraserInstance());
queue.put(payoffs);
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); // put() was interrupted; restore the flag and give up
}
}
});
}
exec.shutdown(); // already submitted tasks still run to completion
Then collect and sum results:
double testerPayoffSum = 0.0, developerPayoffSum = 0.0;
for (int i = 0; i < GameConstants.MAX_GAMES; i++) {
PlayerPayoffs payoffs=queue.take();
testerPayoffSum += payoffs.getAverageTesterPayoff(GameConstants.NUM_TESTING_GAMES_TO_PLAY);
developerPayoffSum += payoffs.getAverageDeveloperPayoff(GameConstants.NUM_TESTING_GAMES_TO_PLAY);
System.out.print("Output: ERASER Games played; Number of developers caught");
System.out.print(", " + GameConstants.NUM_TESTING_GAMES_TO_PLAY + ", " + payoffs.getNumTimesCaught() + "\n");
}

(Thread pools in Java) Increasing number of threads creates slow down for simple for loop. Why?

I've got a little bit of work that is easily parallelizable, and I want to use Java threads to split up the work across my four core machine. It's a genetic algorithm applied to the traveling salesman problem. It doesn't sound easily parallelizable, but the first loop is very easily so. The second part where I talk about the actual evolution may or may not be, but I want to know if I'm getting slow down because of the way I'm implementing threading, or if its the algorithm itself.
Also, if anyone has better ideas on how I should be implementing what I'm trying to do, that would be very much appreciated.
In main(), I have this:
final ArrayBlockingQueue<Runnable> queue = new ArrayBlockingQueue<Runnable>(numThreads*numIter);
ThreadPoolExecutor tpool = new ThreadPoolExecutor(numThreads, numThreads, 10, TimeUnit.SECONDS, queue);
barrier = new CyclicBarrier(numThreads);
k.init(tpool);
I have a loop that is done inside of init() and looks like this:
for (int i = 0; i < numCities; i++) {
x[i] = rand.nextInt(width);
y[i] = rand.nextInt(height);
}
That I changed to this:
int errorCities = 0, stepCities = 0;
stepCities = numCities/numThreads;
errorCities = numCities - stepCities*numThreads;
// Split up work, assign to threads
for (int i = 1; i <= numThreads; i++) {
int startCities = (i-1)*stepCities;
int endCities = startCities + stepCities;
// This is a bit messy...
if(i == numThreads) endCities += errorCities; // only the last chunk takes the remainder
tpool.execute(new citySetupThread(startCities, endCities));
}
And here is citySetupThread() class:
public class citySetupThread implements Runnable {
int start, end;
public citySetupThread(int s, int e) {
start = s;
end = e;
}
public void run() {
for (int j = start; j < end; j++) {
x[j] = ThreadLocalRandom.current().nextInt(0, width);
y[j] = ThreadLocalRandom.current().nextInt(0, height);
}
try {
barrier.await();
} catch (InterruptedException ie) {
return;
} catch (BrokenBarrierException bbe) {
return;
}
}
}
The above code is run once in the program, so it was sort of a test case for my threading constructs (this is my first experience with Java threads). I implemented the same sort of thing in a real critical section, specifically the evolution part of the genetic algorithm, whose class is as follows:
public class evolveThread implements Runnable {
int start, end;
public evolveThread(int s, int e) {
start = s;
end = e;
}
public void run() {
// Get midpoint
int n = population.length/2, m;
for (m = start; m > end; m--) {
int i, j;
i = ThreadLocalRandom.current().nextInt(0, n);
do {
j = ThreadLocalRandom.current().nextInt(0, n);
} while(i == j);
population[m].crossover(population[i], population[j]);
population[m].mutate(numCities);
}
try {
barrier.await();
} catch (InterruptedException ie) {
return;
} catch (BrokenBarrierException bbe) {
return;
}
}
}
Which exists in a function evolve() that is called in init() like so:
for (int p = 0; p < numIter; p++) evolve(p, tpool);
Yes I know that's not terribly good design, but for other reasons I'm stuck with it. Inside of evolve is the relevant parts, shown here:
// Threaded inner loop
int startEvolve = popSize - 1,
endEvolve = (popSize - 1) - (popSize - 1)/numThreads;
// Split up work, assign to threads
for (int i = 0; i < numThreads; i++) {
endEvolve = (popSize - 1) - (popSize - 1)*(i + 1)/numThreads + 1;
tpool.execute(new evolveThread(startEvolve, endEvolve));
startEvolve = endEvolve;
}
// Wait for our comrades
try {
barrier.await();
} catch (InterruptedException ie) {
return;
} catch (BrokenBarrierException bbe) {
return;
}
population[1].crossover(population[0], population[1]);
population[1].mutate(numCities);
population[0].mutate(numCities);
// Pick out the strongest
Arrays.sort(population, population[0]);
current = population[0];
generation++;
What I really want to know is this:
What role does the "queue" have? Am I right to create a queue for as many jobs as I think will be executed for all threads in the pool? If the size isn't sufficiently large, I get RejectedExecutionException's. I just decided to do numThreads*numIterations because that's how many jobs there would be (for the actual evolution method that I mentioned earlier). It's weird though.. I shouldn't have to do this if the barrier.await()'s were working, which leads me to...
Am I using the barrier.await() correctly? Currently I have it in two places: inside the run() method for the Runnable object, and after the for loop that executes all the jobs. I would've thought only one would be required, but I get errors if I remove one or the other.
I'm suspicious of contention for the threads, as that is the only thing I can glean from the absurd slowdown (which does scale with the input parameters). I want to know if it is anything to do with how I'm implementing the thread pool and barriers. If not, then I'll have to look inside the crossover() and mutate() methods, I suppose.
First, I think you may have a bug with how you intended to use the CyclicBarrier. Currently you are initializing it with the number of executor threads as the number of parties. You have an additional party, however; the main thread. So I think you need to do:
barrier = new CyclicBarrier(numThreads + 1);
I think this should work, but personally I find it an odd use of the barrier.
When using a worker-queue thread-pool model I find it easier to use a Semaphore or Java's Future model.
For a semaphore:
class MyRunnable implements Runnable {
private final Semaphore sem;
public MyRunnable(Semaphore sem) {
this.sem = sem;
}
public void run() {
// do work
// signal complete
sem.release();
}
}
Then in your main thread:
Semaphore sem = new Semaphore(0);
for (int i = 0; i < numJobs; ++i) {
threadPool.execute(new MyRunnable(sem));
}
sem.acquire(numJobs);
It's really doing the same thing as the barrier, but I find it easier to think about the worker tasks "signaling" that they are done instead of "syncing up" with the main thread again.
For example, if you look at the example code in the CyclicBarrier JavaDoc, the call to barrier.await() is inside the loop inside the worker. So it is really syncing up the multiple long-running worker threads, and the main thread is not participating in the barrier. Calling barrier.await() at the end of the worker, outside the loop, is more like signaling completion.
Each task adds overhead, so the more tasks you create, the more total overhead you pay. This means you want to minimise the number of tasks, ideally to the number of CPUs you have. When the workload is uneven, using double the number of CPUs can work better.
BTW: You don't need a barrier in each task; you can wait for the future of each task to complete by calling get() on each one.
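A minimal sketch of the Future-based alternative, assuming one task per thread and a made-up workload (filling an int array): ExecutorService.invokeAll submits all chunks and returns once they have completed, and calling get() on each Future both surfaces any exception and makes the workers' writes visible to the caller. The names ChunkedFillDemo and fill are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedFillDemo {

    // Splits [0, n) into numThreads contiguous chunks and fills out[j] = 2 * j in parallel.
    static int[] fill(int n, int numThreads) {
        final int[] out = new int[n];
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            // Integer arithmetic spreads the remainder across chunks without a special case.
            final int start = (int) ((long) n * t / numThreads);
            final int end = (int) ((long) n * (t + 1) / numThreads);
            tasks.add(() -> {
                for (int j = start; j < end; j++) {
                    out[j] = 2 * j;
                }
                return null;
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // no barrier needed: get() waits and rethrows task failures
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return out;
    }

    public static void main(String[] args) {
        int[] out = fill(10, 4);
        System.out.println(out[9]); // 18
    }
}
```

With this shape the work queue can be unbounded (as in Executors.newFixedThreadPool), so no queue capacity has to be guessed in advance.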

Java strange performance inconsistency

I have a simple recursive method, a depth first search. On each call, it checks if it's in a leaf, otherwise it expands the current node and calls itself on the children.
I'm trying to make it parallel, but I notice the following strange (for me) problem.
I measure execution time with System.currentTimeMillis().
When I break the search into a number of subsearches and add the total execution time, I get a bigger number than the sequential search. I only measure execution time, no communication or sync, etc. I would expect to get the same time when I add the times of the subtasks. This happens even if I just run one task after the other, so without threads. If I just break the search into some subtasks and run the subtasks one after the other, I get a bigger time.
If I add the number of method calls for the subtasks, I get the same number as the sequential search. So, basically, in both cases I do the same number of method calls, but I get different times.
I'm guessing there's some overhead on initial method calls or something else caused by a JVM mechanism. Any ideas what could it be?
For example, one sequential search takes around 3300 ms. If I break it into 13 tasks, it takes a total time of 3500ms.
My method looks like this:
private static final int dfs(State state) {
method_calls++;
if(state.isLeaf()){
return 1;
}
State[] children = state.expand();
int result = 0;
for (int i = 0; i < children.length; i++) {
result += dfs(children[i]);
}
return result;
}
Whenever I call it, I do it like this:
for(int i = 0; i < num_tasks; i++){
long start = System.currentTimeMillis();
dfs(tasks[i]);
totalTime += (System.currentTimeMillis() - start);
}
Problem is totalTime increases with num_tasks and I would expect to stay the same because the method_calls variable stays the same.
You should average out the numbers over longer runs. Secondly the precision of currentTimeMillis may not be sufficient, you can try using System.nanoTime().
As in all programming languages, whenever you call a procedure or a method, you have to push the environment, initialize the new one, execute the program's instructions, return the value on the stack and finally restore the previous environment. That costs a bit! Creating a thread costs even more!
I suppose that if you enlarge the search tree you will benefit from the parallelization.
Adding system clock time for several threads seems a weird idea. Either you are interested in the time until processing is complete, in which case adding doesn't make sense, or in cpu usage, in which case you should only count when the thread is actually scheduled to execute.
What probably happens is that, at least part of the time, more threads are ready to execute than the system has CPU cores, and the scheduler puts one of your threads to sleep, which causes it to take longer to complete. It makes sense that this effect is exacerbated the more threads you use. (Even if your program uses fewer threads than you have cores, other programs, such as your development environment, might use the rest.)
If you are interested in CPU usage, you might wish to query ThreadMXBean.getCurrentThreadCpuTime
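To show the difference between the two kinds of measurement, here is a small sketch that times the same piece of work with System.nanoTime() (wall-clock) and with ThreadMXBean.getCurrentThreadCpuTime() (CPU time of the calling thread). The class CpuTimeDemo and the busyWork loop are placeholders invented for this example, standing in for a dfs call.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class CpuTimeDemo {

    // Placeholder workload standing in for one dfs(...) call.
    static long busyWork(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i % 7;
        }
        return sum;
    }

    // Returns {wallNanos, cpuNanos, result}; cpuNanos is -1 if the JVM cannot measure it.
    static long[] measure(int n) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = bean.isCurrentThreadCpuTimeSupported()
                && bean.isThreadCpuTimeEnabled();
        long cpuStart = cpuSupported ? bean.getCurrentThreadCpuTime() : 0L;
        long wallStart = System.nanoTime();

        long result = busyWork(n);

        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = cpuSupported ? bean.getCurrentThreadCpuTime() - cpuStart : -1L;
        return new long[] { wallNanos, cpuNanos, result };
    }

    public static void main(String[] args) {
        long[] m = measure(50_000_000);
        System.out.println("wall ns: " + m[0] + ", cpu ns: " + m[1]);
    }
}
```

Wall-clock time includes any period in which the thread was descheduled, while the CPU time only advances when the thread is actually running, so summing CPU times across subtasks is the more meaningful comparison with a sequential run.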
I'd expect to see Threads used. Something like this:
import java.util.concurrent.TimeUnit;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class Puzzle {
static volatile long totalTime = 0;
private static int method_calls = 0;
/**
* @param args
*/
public static void main(String[] args) {
final int num_tasks = 13;
final State[] tasks = new State[num_tasks];
ExecutorService threadPool = Executors.newFixedThreadPool(5);
for(int i = 0; i < num_tasks; i++){
threadPool.submit(new DfsRunner(tasks[i]));
}
try {
threadPool.shutdown();
threadPool.awaitTermination(1, TimeUnit.SECONDS);
} catch (InterruptedException e) {
System.out.println("Interrupted");
}
System.out.println(method_calls + " Methods in " + totalTime + "msecs");
}
static final int dfs(State state) {
method_calls++;
if(state.isLeaf()){
return 1;
}
State[] children = state.expand();
int result = 0;
for (int i = 0; i < children.length; i++) {
result += dfs(children[i]);
}
return result;
}
}
With the runnable bit like this:
public class DfsRunner implements Runnable {
private State state;
public DfsRunner(State state) {
super();
this.state = state;
}
@Override
public void run() {
long start = System.currentTimeMillis();
Puzzle.dfs(state);
Puzzle.totalTime += (System.currentTimeMillis() - start);
}
}
