I was running some tests with parallel processing and made a program that, given a matrix of integers, recalculates each position's value based on its neighbours.
I needed a copy of the matrix so the values wouldn't be overridden, and used a CyclicBarrier to merge the results once the partial problems were solved:
CyclicBarrier cyclic_barrier = new CyclicBarrier(n_tasks + 1, new Runnable() {
    public void run() {
        ParallelProcess.mergeResult();
    }
});
ParallelProcess p = new ParallelProcess(cyclic_barrier, n_rows, r_cols); // init
Each task is assigned a portion of the matrix: I'm splitting it into equal pieces by rows. But the division might not be exact, so a small piece corresponding to the last rows wouldn't be submitted to the thread pool.
Example: if I have 16 rows and n_tasks = 4, there's no problem: all 4 chunks will be submitted to the pool. But if I had 18 rows instead, the first 16 would be submitted, but not the last two.
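For reference, this is roughly how I split the rows (the variable names here are approximations, not my exact code; e is the fixed thread pool I mention below):
int chunk_size = n_rows / n_tasks; // e.g. 18 / 4 = 4
int lower_limit = 0;
for (int i = 0; i < n_tasks; i++) {
    int upper_limit = lower_limit + chunk_size;
    e.submit(new ParallelProcess(lower_limit, upper_limit)); // each task gets one equal chunk
    lower_limit = upper_limit;
}
// with 18 rows, lower_limit is now 16 and rows 16-17 are still unassigned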
So I'm forcing a submit if this case happens. Well, I'm not actually submitting, because I'm using a fixed thread pool that I created like this: ExecutorService e = Executors.newFixedThreadPool(n_tasks). Since all the slots in the pool are occupied and those threads are blocked by the barrier (mybarrier.await() is called in the run method), I couldn't submit the last chunk to the pool, so I just used Thread.start().
Let's get to the point. Since I need to account for that possible remaining chunk in the CyclicBarrier, the number of parties must be incremented by 1.
But if that case didn't happen, I would be one party short of tripping the barrier.
So what's my solution?
if (lower_limit != n_rows) { // the remaining chunk to be processed
    Thread t = new Thread(new ParallelProcess(lower_limit, n_rows));
    t.start();
    t.join();
} else {
    cyclic_barrier.await();
}
I feel like I'm cheating when I use the cyclic_barrier.await() trick to trip the barrier by force.
Is there any other way I could approach this problem so that I don't have to resort to this?
Though this doesn't answer your question about CyclicBarrier, can I recommend using a Phaser? It lets you register parties dynamically, and it also allows you to run mergeResult when a phase is tripped.
So, before you submit an async calculation, simply register. Then, inside that calculation, have the thread arrive on the phaser. When all registered threads have arrived, the phase will advance and the overridden onAdvance method will be invoked.
The submission:
ParallelProcess process = new ParallelProcess(lower_limit, n_rows);
phaser.register();
executor.submit(process);
The processor
public void run() {
    // do stuff
    phaser.arrive();
}
The phaser
Phaser phaser = new Phaser() {
    protected boolean onAdvance(int phase, int registeredParties) {
        ParallelProcess.mergeResult();
        return true;
    }
};
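Putting the pieces together, a minimal self-contained sketch might look like this (the executor setup, the task count and the println standing in for mergeResult are assumptions, not your actual code):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;

public class PhaserSketch {
    public static void main(String[] args) {
        int nTasks = 4; // assumption
        ExecutorService executor = Executors.newFixedThreadPool(nTasks);

        Phaser phaser = new Phaser() {
            @Override
            protected boolean onAdvance(int phase, int registeredParties) {
                System.out.println("all parties arrived - merge results here");
                return true; // one-shot: terminate the phaser after this phase
            }
        };

        phaser.register(); // the main thread is also a party
        for (int t = 0; t < nTasks; t++) {
            phaser.register(); // one party per submitted task
            executor.submit(() -> {
                // ... process one chunk ...
                phaser.arrive(); // signal completion without blocking
            });
        }

        phaser.arriveAndAwaitAdvance(); // main arrives last and waits for the workers
        executor.shutdown();
    }
}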
I am running the class below.
public class RunThreads implements Runnable {
    static int i;

    public static void main(String[] args) {
        RunThreads job = new RunThreads();
        Thread alpha = new Thread(job);
        Thread beta = new Thread(job);
        alpha.setName("Alpha");
        beta.setName("beta");
        alpha.start();
        beta.start();
    }

    public void run() {
        for (; i < 10; i++) {
            System.out.println(Thread.currentThread().getName() + i);
        }
    }
}
And my output is:
beta0
beta1
Alpha0
beta2
beta4
beta5
beta6
Alpha3
Alpha8
beta7
Alpha9
I understand I will get different outputs every time I execute it. My question is: why does the output have the value of i as 0 twice, once for each of the alpha and beta threads, i.e. Alpha0 and beta0?
The value of i had already been incremented to 1 by the beta thread. So how does the alpha thread print out Alpha0?
I may be missing something very obvious here. Thanks!
Things get scary when you access shared data with no synchronization:
There's no guarantee that the Alpha thread reading i will see the "latest" update from the beta thread or vice versa
It's possible for both threads to start i++ at roughly the same time... and as i++ is basically:
int tmp = i;
tmp++;
i = tmp;
you can easily "lose" increments
If you want to make this thread-safe, you should use AtomicInteger instead:
import java.util.concurrent.atomic.AtomicInteger;

public class RunThreads implements Runnable {
    static AtomicInteger counter = new AtomicInteger();

    public static void main(String[] args) {
        RunThreads job = new RunThreads();
        Thread alpha = new Thread(job);
        Thread beta = new Thread(job);
        alpha.setName("Alpha");
        beta.setName("beta");
        alpha.start();
        beta.start();
    }

    public void run() {
        int local;
        while ((local = counter.getAndIncrement()) < 10) {
            System.out.println(Thread.currentThread().getName() + local);
        }
    }
}
Your output may still appear to be in the wrong order (because the alpha thread may "start" writing "Alpha0" while the beta thread "starts" writing "beta1" but the beta thread gets the lock on console output first), but you'll only see each count once. Note that you have to use the result of getAndIncrement() for both the checking and the printing - if you called counter.get() in the body of the loop, you could still see duplicates due to the interleaving of operations.
The simple answer here is that, whether the variable is volatile or atomic or whatever, both your threads start out while the variable's value is 0, and only change it after the print.
This means that both threads can reach the "print" line before either of them has reached the i++.
Which means that both are very likely to print 0, unless one of them is delayed long enough for the other to update the variable (at which point the questions of memory model and data visibility arise).
"While" thread A is doing its update, thread B is reading that value.
Like in:
A: prints i (current value 0)
B: prints i (current value 0)
A: stores 1 to i
You can't assume that those two actions:
for(;i<10;i++){ // increasing the loop counter
and
System.out.println(Thread.currentThread().getName() + i);
(printing the loop counter) happen in "one shot". On the contrary:
Both threads execute the corresponding instructions in parallel, and there are absolutely no guarantees about the outcome.
Threads might run on separate cores in a multicore architecture, and each core has its own register set and local cache. As long as the cache isn't invalidated, the running core might not reach out to RAM for the up-to-date value and will keep using the cached one. To make it work as you expected, the variable would have to be marked volatile, which invalidates the cached copy on each write to that memory location and fetches the value directly from RAM.
Update:
As was pointed out in the comments, volatile is not enough to keep the value correct here: i++ is a separate read and a write. To make it work, an AtomicInteger would be enough, or the slightly more expensive option of locking around i++.
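For completeness, a sketch of that locking alternative (same structure as the original class; the lock object is made up):
public class RunThreads implements Runnable {
    static int i;
    static final Object lock = new Object();

    public static void main(String[] args) {
        RunThreads job = new RunThreads();
        Thread alpha = new Thread(job, "Alpha");
        Thread beta = new Thread(job, "beta");
        alpha.start();
        beta.start();
    }

    public void run() {
        while (true) {
            int local;
            synchronized (lock) { // read-check-increment now happens as one unit
                if (i >= 10) {
                    return;
                }
                local = i++;
            }
            System.out.println(Thread.currentThread().getName() + local);
        }
    }
}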
I'm trying to convert this code to Java, using threads to implement it:
turn = 0 // shared control variable
while (turn != i);
// CS
turn = (turn + 1) % n;
I really tried hard to get the code right, but I failed. This is my code:
/*
 * Mutual exclusion using thread
 */
class gV {
    int turn = 0;
}

class newThread extends Thread {
    static int i;
    int n = 10;

    newThread(gV obj) {
        this.i = obj.turn;
        start();
    }

    public void run() {
        while (obj.turn != i && obj.turn < n);
        criticalSection(i);
        obj.turn = (obj.turn + 1);
        i++;
    }

    public void criticalSection(int numOfProcess) {
        System.out.println("Process " + numOfProcess + " done!!");
    }
}

class MutualExclusion {
    public static void main(String args[]) {
        gV obj = new gV();
        new newThread(obj);
    }
}
I know my code has some mistakes. Thank you for the help!
Use an AtomicInteger.
Atomic means that any operation on it will fully complete before any other thread can see the result, so two simultaneous operations can't 'clobber' it. For example, imagine you had a non-atomic integer and two threads attempted to increment it simultaneously: say it had the value 1; they both read it as 1 and both attempt to set it to 2. They each incremented it once, but instead of becoming 3 it became 2! AtomicInteger solves this problem by giving you incrementAndGet, which guarantees no other thread can access the AtomicInteger's value before the increment completes.
In particular, use these methods:
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/atomic/AtomicInteger.html#get()
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/atomic/AtomicInteger.html#incrementAndGet()
You might notice that this increments it, but it doesn't take it modulo n. Well, you can take it modulo n whenever you read its value, you don't need it to be stored that way.
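For illustration, a minimal sketch of the turn-taking loop using an AtomicInteger (the thread count and class name are made up; note that it still busy-waits, see the edit below):
import java.util.concurrent.atomic.AtomicInteger;

public class TurnSketch {
    static final int N = 3; // number of threads (assumption)
    static final AtomicInteger turn = new AtomicInteger(0);

    public static void main(String[] args) {
        for (int id = 0; id < N; id++) {
            final int myTurn = id;
            new Thread(() -> {
                while (turn.get() % N != myTurn) { // modulo taken when reading, not when storing
                    Thread.yield();
                }
                System.out.println("Process " + myTurn + " done!!"); // critical section
                turn.incrementAndGet(); // pass the turn to the next thread
            }).start();
        }
    }
}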
EDIT: By the way, doing something like this:
while (turn != i);
is called busy-waiting, and it's a bad idea because it means CPU usage will be at 100%, checking the variable hundreds of thousands of times per second. In this kind of scenario, instead of making each thread check as often as possible, you want threads to wait and be notified by another thread when it is their turn to continue execution.
I believe that in Java, using a lock or a synchronized block to implement mutual exclusion also gives you this property: if you try to acquire a lock or enter a synchronized block that is already in use, the thread goes to sleep and is woken up when it is its turn. So you can look into this as well.
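A sketch of that blocking approach, in case it helps (class and method names are made up): threads sleep in wait() instead of spinning, and notifyAll() wakes them whenever the turn changes.
class TurnLock {
    private int turn = 0;

    synchronized void awaitTurn(int id) throws InterruptedException {
        while (turn != id) {
            wait(); // releases the lock and sleeps; no CPU is burned
        }
    }

    synchronized void passTurn(int n) {
        turn = (turn + 1) % n;
        notifyAll(); // wake all waiting threads so they can re-check the condition
    }
}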
I am trying out the executor service in Java, and wrote the following code to run Fibonacci (yes, the massively recursive version, just to stress the executor service).
Surprisingly, it will run faster if I set nThreads to 1. It might be related to the fact that the size of each "task" submitted to the executor service is really small. But still it must be the same number also if I set nThreads to 1.
To see if access to the shared atomic variables could cause this issue, I commented out the three lines with the comment "see text" and looked at the system monitor to see how long the execution takes. The results were the same.
Any idea why this is happening?
BTW, I wanted to compare it with the equivalent Fork/Join implementation. It turns out to be way slower than the F/J implementation.
public class MainSimpler {
    static int N = 35;
    static AtomicInteger result = new AtomicInteger(0), pendingTasks = new AtomicInteger(1);
    static ExecutorService executor;

    public static void main(String[] args) {
        int nThreads = 2;
        System.out.println("Number of threads = " + nThreads);
        executor = Executors.newFixedThreadPool(nThreads);
        Executable.inQueue = new AtomicInteger(nThreads);
        long before = System.currentTimeMillis();
        System.out.println("Fibonacci " + N + " is ... ");
        executor.submit(new FibSimpler(N));
        waitToFinish();
        System.out.println(result.get());
        long after = System.currentTimeMillis();
        System.out.println("Duration: " + (after - before) + " milliseconds\n");
    }

    private static void waitToFinish() {
        while (0 < pendingTasks.get()) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        executor.shutdown();
    }
}
class FibSimpler implements Runnable {
    int N;

    FibSimpler(int n) { N = n; }

    @Override
    public void run() {
        compute();
        MainSimpler.pendingTasks.decrementAndGet(); // see text
    }

    void compute() {
        int n = N;
        if (n <= 1) {
            MainSimpler.result.addAndGet(n); // see text
            return;
        }
        MainSimpler.executor.submit(new FibSimpler(n - 1));
        MainSimpler.pendingTasks.incrementAndGet(); // see text
        N = n - 2;
        compute(); // similar to the F/J counterpart
    }
}
Runtime (approximately):
1 thread : 11 seconds
2 threads: 19 seconds
4 threads: 19 seconds
Update:
I noticed that even if I use one thread inside the executor service, the whole program uses all four cores of my machine (each core at around 80% usage on average). This could explain why using more threads inside the executor service slows down the whole process. But then, why does this program use 4 cores if only one thread is active inside the executor service?
It might be related to the fact that the size of each "task" submitted to the executor service is really small.
This is certainly the case and as a result you are mainly measuring the overhead of context switching. When n == 1, there is no context switching and thus the performance is better.
But still it must be the same number also if I set nThreads to 1.
I'm guessing you meant 'to higher than 1' here.
You are running into the problem of heavy lock contention. When you have multiple threads, the lock on the result is contended all the time. Threads have to wait for each other before they can update the result and that slows them down. When there is only a single thread, the JVM probably detects that and performs lock elision, meaning it doesn't actually perform any locking at all.
You may get better performance if you don't divide the problem into N tasks, but rather divide it into N/nThreads tasks, which can be handled simultaneously by the threads (assuming you choose nThreads to be at most the number of physical cores/threads available). Each thread then does its own work, calculating its own total and only adding that to a grand total when the thread is done. Even then, for fib(35) I expect the costs of thread management to outweigh the benefits. Perhaps try fib(1000).
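To illustrate the "per-thread partial total" idea in isolation (a plain sum stands in for the real work here, since the naive recursive Fibonacci doesn't split into independent ranges this way):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedSum {
    public static void main(String[] args) throws Exception {
        int nThreads = Runtime.getRuntime().availableProcessors();
        int total = 1_000_000; // total number of "work items" (assumption)
        ExecutorService executor = Executors.newFixedThreadPool(nThreads);

        List<Future<Long>> partials = new ArrayList<>();
        int chunk = total / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int from = t * chunk;
            final int to = (t == nThreads - 1) ? total : from + chunk;
            partials.add(executor.submit(() -> {
                long sum = 0; // each task keeps its own total; no shared state while working
                for (int i = from; i < to; i++) {
                    sum += i; // stand-in for the real per-item work
                }
                return sum;
            }));
        }

        long grandTotal = 0;
        for (Future<Long> f : partials) {
            grandTotal += f.get(); // merge each partial total exactly once
        }
        executor.shutdown();
        System.out.println(grandTotal);
    }
}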
I have a favorite C# program, similar to the one below, that shows that if two threads share the same memory address for counting (one thread incrementing n times, one thread decrementing n times), you can get a final result other than zero. As long as n is reasonably large, it's pretty easy to get C# to display some non-zero value in [-n, n]. However, I can't get Java to produce a non-zero result, even when increasing the number of threads to 1000 (500 up, 500 down). Is there some memory model or specification difference with respect to C# that I'm not aware of, which guarantees this program will always yield 0 regardless of the scheduling or the number of cores? Would we agree that this program could produce a non-zero value, even if we can't prove that experimentally?
(Note: I found this exact question over here, but when I run that topic's code I also get zero.)
public class Counter
{
    private int _counter = 0;

    Counter() throws Exception
    {
        final int limit = Integer.MAX_VALUE;

        Thread add = new Thread()
        {
            public void run()
            {
                for (int i = 0; i < limit; i++)
                {
                    _counter++;
                }
            }
        };

        Thread sub = new Thread()
        {
            public void run()
            {
                for (int i = 0; i < limit; i++)
                {
                    _counter--;
                }
            }
        };

        add.run();
        sub.run();
        add.join();
        sub.join();

        System.out.println(_counter);
    }

    public static void main(String[] args) throws Exception
    {
        new Counter();
    }
}
The code you've given only runs on a single thread, so it will always give a result of 0. If you actually start two threads, you can indeed get a non-zero result:
// Don't call run(), which is a synchronous call, which doesn't start any threads
// Call start(), which starts a new thread and calls run() *in that thread*.
add.start();
sub.start();
On my box in a test run that gave -2146200243.
Assuming you really meant start, not run.
On most common platforms it will very likely produce a non-zero result, because ++/-- are not atomic operations when multiple cores are involved. On a single core/single CPU you will most likely get 0, because ++/-- are atomic if compiled to one instruction (add/inc), but that part depends on the JVM.
Check result here: http://ideone.com/IzTT2
The problem with your program is that you are not creating an OS thread, so your program is essentially single-threaded. In Java you must call Thread.start() to create a new OS thread, not Thread.run(). This has to do with a regrettable mistake made in the initial Java API: the designers made Thread implement Runnable.
add.start();
sub.start();
add.join();
sub.join();
What is the best way to stop a thread and wait for a statement (or a method) to be executed a certain number of times by another thread?
I was thinking about something like this (let "number" be an int):
number = 5;
while (number > 0) {
    synchronized (number) { number.wait(); }
}

...

synchronized (number) {
    number--;
    number.notify();
}
Obviously this wouldn't work, first of all because it seems you can't wait() on an int. Furthermore, all the other solutions that come to my Java-naive mind are really complicated for such a simple task. Any suggestions? (Thanks!)
Sounds like you're looking for CountDownLatch.
CountDownLatch latch = new CountDownLatch(5);
...
latch.await(); // Possibly put timeout
// Other thread... in a loop
latch.countDown(); // When this has executed 5 times, first thread will unblock
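Put together, a minimal runnable sketch of the latch version (the class name and printed messages are made up):
import java.util.concurrent.CountDownLatch;

public class LatchSketch {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(5);

        new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                System.out.println("executing statement " + i);
                latch.countDown(); // counts down once per execution
            }
        }).start();

        latch.await(); // blocks until the count reaches zero
        System.out.println("the statement ran 5 times, continuing");
    }
}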
A Semaphore would also work:
Semaphore semaphore = new Semaphore(0);
...
semaphore.acquire(5);
// Other thread... in a loop
semaphore.release(); // When this has executed 5 times, first thread will unblock
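And the same thing with the semaphore variant (again, the names are made up):
import java.util.concurrent.Semaphore;

public class SemaphoreSketch {
    public static void main(String[] args) throws InterruptedException {
        Semaphore semaphore = new Semaphore(0);

        new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                System.out.println("executing statement " + i);
                semaphore.release(); // one permit per execution
            }
        }).start();

        semaphore.acquire(5); // blocks until 5 permits have been released
        System.out.println("the statement ran 5 times, continuing");
    }
}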
You might find something like a CountDownLatch useful.