Complexity of the notifyAll operation

I don't understand the following fragment of the Java Concurrency in Practice book:
Using notifyAll when only one thread can make progress is inefficient - sometimes a little, sometimes grossly so. If ten threads are waiting on a condition queue, calling notifyAll causes each of them to wake up and contend for the lock; then most or all of them will go right back to sleep. This means a lot of context switches and a lot of contended lock acquisitions for each event that enables (maybe) a single thread to make progress. (In the worst case, using notifyAll results in O(n^2) wakeups where n would suffice.)
An example code is in listing 14.6:
@ThreadSafe
public class BoundedBuffer<V> extends BaseBoundedBuffer<V> {
    // CONDITION PREDICATE: not-full (!isFull())
    // CONDITION PREDICATE: not-empty (!isEmpty())
    public BoundedBuffer(int size) { super(size); }
    // BLOCKS-UNTIL: not-full
    public synchronized void put(V v) throws InterruptedException {
        while (isFull())
            wait();
        doPut(v);
        notifyAll();
    }
    // BLOCKS-UNTIL: not-empty
    public synchronized V take() throws InterruptedException {
        while (isEmpty())
            wait();
        V v = doTake();
        notifyAll();
        return v;
    }
}
We can have, for example, the following sequence of events:
two consumer threads try to get an object from the buffer, the buffer is empty, so they are suspended.
10 producers put 10 objects to the buffer, the buffer capacity is 10.
100001 producers try to put 100001 objects to the buffer, the buffer is full, so they are suspended.
first consumer gets an object from the buffer and invokes notifyAll.
a producer puts an object to the buffer and invokes notifyAll, the buffer is full.
Now only one thread can make progress - the consumer thread. We also have 100000 producers, who can't make progress.
I don't understand why in the worst case there will be O(n^2) wakeups, before the thread which can make progress is woken up.
I think the worst case is the following sequence
All threads are woken up (because of notifyAll). We "used" O(n) wakeups.
A producer thread gets the lock, other threads are suspended. The producer thread can't make progress, so it is suspended and it releases the lock.
Now only one producer thread is woken up, because a different mechanism is used (a thread resumes execution, because it gets the lock - but this time the notifyAll is not called). We "use" only O(1) wakeups.
The second producer can't make progress, so it is suspended and it releases the lock.
Similar events happen for all other waiting producers.
Finally the thread which can make progress (the consumer thread) is woken up.
I think we "used" O(n) + O(n)*O(1) = O(n) wakeups.
Is there an error in the book, or am I missing something here?

Something gets put into the queue n times. "n wakeups would suffice" means that ideally we'd like one consumer to be notified when a producer drops something into the buffer, for instance, so there would be n notifications, and even better they would all be uncontended. But instead all of the threads waiting on the lock, including all the producers (minus 1, the one doing the putting) and all the consumers (the ones who are waiting anyway), get notified every time something gets dropped in the queue, they all fight for the lock and the scheduler picks a winner. (And we're not even considering the case where the chosen thread has to wait, that's just a detail.) So there are n times that notifyAll gets called, once for each put, and each notifyAll wakes up multiple producers and multiple consumers, which is where they get O(n^2) wakeups.

Suppose we have n consumer threads and n producer threads and the buffer is empty (the case of a full buffer is similar). All threads are in the ready-to-run state (the scheduler may choose any of them to run).
If a consumer runs, it goes to the waiting state. If a producer runs, it succeeds and invokes notifyAll().
The case that maximizes the number of wait() calls (and wakeups):
Example with 5 producers and 5 consumers:
+--------------+-------------------------------------+
| C-C-C-C-C-P | all consumers move to waiting state |
+--------------+-------------------------------------+
| C*-C-C-C-C-P | 5 wake ups |
+--------------+-------------------------------------+
| C*-C-C-C-P | 4 wake ups |
+--------------+-------------------------------------+
| C*-C-C-P | 3 wake ups |
+--------------+-------------------------------------+
| C*-C-P | 2 wake ups |
+--------------+-------------------------------------+
| C* | 1 wake up |
+--------------+-------------------------------------+
P - producer
C - consumer
C* - consumer that successfully finishes the take() method (without invoking wait())
Let’s count:
5 + 4 + 3 + 2 + 1 = 15
For n producer and n consumer:
n + (n-1) + (n-2) + (n-3) + … + 1 = 1 + 2 + 3 + 4 + ...+ n = sum of first n elements of arithmetic progression =
n * (1 + n) /2 = (n + n^2) / 2 → O(n^2)
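
The sum above can be checked with a few lines of Java. This is only a sketch of the counting argument, not a measurement of a real JVM. The names WakeupCount and worstCaseWakeups are made up for illustration, and the code assumes the schedule from the table: each notifyAll() wakes every consumer that is still waiting, and exactly one of them completes take() per round.

```java
// Hypothetical helper (not from the book): counts wakeups for the schedule in the table above.
final class WakeupCount {
    // Assumption: in each round every still-waiting consumer is woken,
    // one of them succeeds, and the rest go back to waiting.
    static long worstCaseWakeups(int n) {
        long wakeups = 0;
        for (int waiting = n; waiting >= 1; waiting--) {
            wakeups += waiting; // each waiting consumer is woken once this round
        }
        return wakeups;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 10, 100}) {
            long closedForm = (long) n * (n + 1) / 2;
            System.out.println("n=" + n + ": counted " + worstCaseWakeups(n)
                    + ", closed form " + closedForm);
        }
    }
}
```

For n = 5 the loop adds 5 + 4 + 3 + 2 + 1, matching the table, and the closed form n(n+1)/2 grows quadratically.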

Related

Why is starvation caused by notifyAll()?

Here is my code,
class Shared {
    private static int index = 0;
    public synchronized void printThread() {
        try {
            while(true) {
                Thread.sleep(1000);
                System.out.println(Thread.currentThread().getName() + ": " + index++);
                notifyAll();
                // notify();
                wait();
            }
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
class Example13 implements Runnable {
    private Shared shared = new Shared();
    @Override
    public void run() {
        shared.printThread();
    }
}
public class tetest {
    public static void main(String[] args) {
        Example13 r = new Example13();
        Thread t1 = new Thread(r, "Thread 1");
        Thread t2 = new Thread(r, "Thread 2");
        Thread t3 = new Thread(r, "Thread 3");
        Thread t4 = new Thread(r, "Thread 4");
        Thread t5 = new Thread(r, "Thread 5");
        t1.start();
        t2.start();
        t3.start();
        t4.start();
        t5.start();
    }
}
and the result is here
Thread 1: 0
Thread 5: 1
Thread 4: 2
Thread 3: 3
Thread 2: 4
Thread 3: 5
Thread 2: 6
Thread 3: 7
Thread 2: 8
Thread 3: 9
The question is: why are only two of the threads working? I'm so confused. I thought notify() randomly wakes up one of the waiting threads, but it doesn't seem to.
Is this starvation? If so, what causes it? I tried both notify() and notifyAll(), but got the same results.
Can anyone help my toasted brain?

This isn't 'starvation'. Your 5 threads are all doing nothing. They all want to 'wake up' - notify() will wake up an arbitrary one. The choice is neither reliably random nor reliably ordered: the JMM does not ascribe an order to this, so one of them will wake up, but you can't rely on it being random (do not use this to generate random numbers), nor can you rely on specific ordering behaviour.
It's not starvation (it's not: Oh no! Thread 2 and 3 are doing all the work and 4 and 5 are just hanging out doing nothing! That's bad - the system could be more efficient!) - because it doesn't matter which thread 'does the work'. A CPU core is a CPU core, it cares not which thread it ends up running.
Starvation is a different principle. Imagine that instead of calling Thread.sleep (which means the threads aren't waiting for anything specific, other than some time to elapse), the threads want to print the result of some expensive-ish math operation. If you just let 2 threads each say 'Hello!', then the impl of System.out says that it would be acceptable for the JVM to produce:
HelHellloo!!
So to prevent this, you use locks to create a 'talking stick' of sorts: A thread only gets to print if it has the talking stick. Each of 5 threads will all perform, in a loop, the following operation:
Do an expensive math operation.
Acquire the talking stick.
Print the result of the operation.
Release the talking stick.
Loop back to the top.
Now imagine that, despite the fact that the math operation is quite expensive, for whatever reason you have an excruciatingly slow terminal, and the 'print the result of the operation' job takes a really long time to finish.
Now you can run into starvation. Imagine this scenario:
Threads 1-5 all do their expensive math simultaneously.
Arbitrarily, thread 4 ends up nabbing the talking stick.
The other 4 threads soon too want the talking stick but they have to wait; t4 has it. They do nothing now. Twiddling their thumbs (they could be calculating, but they are not!)
After the excruciatingly long time, 4 is done and releases the stick. 1, 2, 3, and 5 dogpile on like it's the All-Blacks and 2 so happens to win the scrum and crawls out of the pile with the stick. 1, 3, and 5 gnash their teeth and go back yet again to waiting for the stick, still not doing any work. Whilst 2 is busy spending the really long time printing results, 4 goes back to the top of the loop and calculates another result. It ends up doing this faster than 2 manages to print, so 4 ends up wanting the talking stick again before 2 is done.
2 is finally done and 1, 3, 4, and 5 all scrum into a pile again. 4 happens to get the stick - java makes absolutely no guarantees about fairness, any one of them can get it, there is also no guarantee of randomness or lack thereof. A JVM is not broken if 4 is destined to win this fight.
Repeat ad nauseam. 2 and 4 keep trading the stick back and forth. 1, 3, and 5 never get to talk.
The above is, as per the JMM, valid - a JVM is not broken if it behaves like this (it would be a tad weird). Any bugs filed about this behaviour would be denied: The lock isn't so-called "fair". Java has fair locks if you want them - in the java.util.concurrent package. Fair locks incur some slight extra bookkeeping cost, the assumption made by the synchronized and wait/notify system is that you don't want to pay this extra cost.
A better solution to the above scenario is possibly to make a 6th thread that JUST prints, with 5 threads JUST filling a buffer; at least then the 'print' part is left to a single thread and that might be faster. But mostly, the bottleneck in this getup is simply the printing - the code has zero benefit from being multicore. Just having ONE thread do ONE math calculation, print it, do another, and so forth would be better. Or possibly 2 threads: whilst one is printing, the other calculates a number, but there's no point in having more than one calculating; even a single thread can calculate faster than results can be printed. Thus in some ways this is just what the situation honestly requires: this hypothetical scenario still prints as fast as it can. Whether the printing needs to be 'fair' is a separate question, and who says it does? It's not intrinsic to the problem description that fairness is a requirement. Maybe all the various calculations are equally useful so it doesn't matter that one thread gets to print more than others; let's say it's bitcoin miners, generating a random number and checking if that results in a hash with the requisite 7 zeroes at the end or whatever bitcoin is up to now - who cares that one thread gets more time than another? A 'fair' system is no more likely to successfully mine a block.
Thus, 'fairness' is something you need to explicitly determine you actually need. If you do, AND starvation is an issue, use a fair lock. new ReentrantLock(true) is all you need (that boolean parameter is the fair parameter - true means you want fairness).
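
As a rough illustration of the 'talking stick' guarded by a fair lock, here is a minimal sketch. The class TalkingStick and its methods are hypothetical names invented for this example; the only real API used is java.util.concurrent.locks.ReentrantLock, whose fair mode favors granting the lock to the longest-waiting thread (it does not make thread scheduling itself fair).

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical "talking stick": only the holder of the lock may print.
final class TalkingStick {
    private final ReentrantLock lock = new ReentrantLock(true); // true = fair ordering
    private int turns = 0; // guarded by lock

    void speak(String message) {
        lock.lock();
        try {
            turns++;
            System.out.println(Thread.currentThread().getName() + ": " + message);
        } finally {
            lock.unlock(); // always release, even if printing throws
        }
    }

    boolean isFair() {
        return lock.isFair();
    }

    int turns() {
        lock.lock();
        try {
            return turns;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TalkingStick stick = new TalkingStick();
        Thread[] threads = new Thread[5];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int k = 0; k < 3; k++) {
                    stick.speak("result " + k);
                }
            }, "Thread " + (i + 1));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("turns taken: " + stick.turns());
    }
}
```

Swapping `new ReentrantLock(true)` for `new ReentrantLock()` gives the default non-fair behaviour, which is closer to what synchronized offers.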

Amount of Threads with subtasks

The optimal number of threads in a pool is case specific, though there is a rule of thumb which says #threads = #CPU + 1.
However, how does this work with threads spawning other threads and waiting (i.e. blocked until thread.join() is successful) for these 'subthreads'?
Assume that I have code that requires the execution of a list of tasks (2), each of which has subtasks (2), each of which has subsubtasks (3), and so on. The total number of tasks is 2*2*3 = 12, though 18 threads will be created (because a thread will 'spawn' more subtasks (threads), and the thread spawning more threads will be blocked until all of them are finished). See below for pseudocode.
I am assuming that for a CPU with N cores there is a rule of thumb that everything can be parallelized if the highest number of active threads (12) is #CPU + 1. Is this correct?
PseudoCode
outputOfTask = []
for subtask in SubTaskList
outputOfTask --> append(subtask.doCompute())
// wait until all output is finished.
in subtask.java:
Each subtask, for example, implements the same interface, but can be different.
outputOfSubtask = []
for task in subsubTaskList
// do some magic depending on the type of subtask
outputOfSubtask -> append( task.doCompute())
return outputOfSubtask
in subsubtask.java:
outputOfSubsubtask = []
for task in subsubsubtask
// do some magic depending on the type of subsubtask
outputOfSubsubtask -> append( task.doCompute())
return outputOfSubsubtask
EDIT:
Dummy Java code. I used this in my original question to check how many threads were active, but I assume that the pseudocode is clearer. Please note: I used Eclipse Collections, which introduces the asParallel function and allows for a shorter notation of the code.
@Test
public void testasParallelthreads() {
    // ExecutorService executor = Executors.newWorkStealingPool();
    ExecutorService executor = Executors.newCachedThreadPool();
    MutableList<Double> myMainTask = Lists.mutable.with(1.0, 2.0);
    MutableList<Double> mySubTask = Lists.mutable.with(1.0, 2.0);
    MutableList<Double> mySubSubTask = Lists.mutable.with(1.0, 2.0);
    MutableList<Double> mySubSubSubTask = Lists.mutable.with(1.0, 2.0, 2.0);
    MutableList<Double> a = myMainTask.asParallel(executor, 1)
        .flatCollect(task -> mySubTask.asParallel(executor, 1)
            .flatCollect(subTask -> mySubSubTask.asParallel(executor, 1)
                .flatCollect(subsubTask -> mySubSubSubTask.asParallel(executor, 1)
                    // innermost level: each nested lambda parameter needs its own name
                    .flatCollect(subsubsubTask -> dummyFunction(task, subTask, subsubTask, subsubsubTask, executor))
                    .toList()).toList()).toList()).toList();
    System.out.println("pool size: " + ((ThreadPoolExecutor) executor).getPoolSize());
    executor.shutdownNow();
}
private MutableList<Double> dummyFunction(double a, double b, double c, double d, ExecutorService ex) {
    System.out.println("ThreadId: " + Thread.currentThread().getId());
    System.out.println("Active threads size: " + ((ThreadPoolExecutor) ex).getActiveCount());
    return Lists.mutable.with(a, b, c, d);
}

I am assuming that for a CPU with N cores there is a rule of thumb that everything can be parallelized if the highest number of active threads (12) is #CPU + 1. Is this correct?
This topic is extremely hard to generalize about. Even with the actual code, the performance of your application is going to be very difficult to determine. Even if you could come up with an estimate, the actual performance may vary wildly between runs – especially considering that the threads are interacting with each other. The only time we can rely on the #CPU + 1 number is if the jobs that are submitted into the thread-pool are independent and completely CPU bound.
I'd recommend trying a number of different thread-pool size values under simulated load to find the optimal values for your application. Examining the overall throughput numbers or system load stats should give you the feedback you need.
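
One way to try different pool sizes could look like the sketch below. Everything here is an assumption for illustration: the class PoolSizing, the placeholder work(n) (a plain summation standing in for a CPU-bound job), and the task counts. Timings from such a toy loop are noisy and say little about a real workload, so treat it as a starting point for measuring your own tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness: runs independent CPU-bound tasks on pools of different sizes.
final class PoolSizing {
    // Placeholder for CPU-bound work: sum of 1..n.
    static long work(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Submits taskCount independent tasks to a fixed pool and returns the combined result.
    static long runAll(int poolSize, int taskCount, int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                tasks.add(() -> work(n));
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) { // blocks until all tasks are done
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cpus = Runtime.getRuntime().availableProcessors();
        for (int size : new int[] {1, cpus, cpus + 1, 2 * cpus}) {
            long start = System.nanoTime();
            runAll(size, 64, 5_000_000);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("pool size " + size + ": " + millis + " ms");
        }
    }
}
```

Note that the tasks here never block on each other; with tasks that spawn and join subtasks, as in the question, a fixed pool can stall when all of its threads are waiting.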

However, how does this work with threads spawning other threads and waiting (i.e. blocked until thread.join() is successful) for these 'subthreads'?
Threads will block, and it is up to the OS/JVM to schedule another one if possible. If you have a single-thread pool executor and call join from one of your tasks, the other task won't even get started. With executors that use more threads, the blocking task will block a single thread and the OS/JVM is free to schedule other threads.
These blocked threads should not consume CPU time, because they are blocked. So I am assuming that for a CPU with N cores there is a rule of thumb that everything can be parallelized if the highest number of active threads (24) is #CPU + 1. Is this correct?
Active threads can be blocking. I think you're mixing terms here, #CPU, the number of cores, and the number of virtual cores. If you have N physical cores, then you can run N cpu bound tasks in parallel. When you have other types of blocking or very short lived tasks, then you can have more parallel tasks.

Join a group of threads with overall timeout

Is there a way to join a group of threads simultaneously with an overall timeout?
Suppose we have Collection<Thread> threads; and int timeout;. If I didn't care about the timeout, I would do
for (Thread t : threads)
    t.join();
but I want to wait until either all threads are done, or a certain amount of time passes, whichever comes first. I was searching for a (hypothetical) ThreadGroup.join(int) which would do this.
Note that what I'm asking for is different from doing
for (Thread t : threads)
    t.join(timeout);
Rather, I'm looking for something less verbose (and perhaps more reliable) than
int timeout = 10000;
for (Thread t : threads) {
    if (timeout <= 0) break;
    long start = System.currentTimeMillis();
    t.join(timeout);
    long end = System.currentTimeMillis();
    // subtract time elapsed from next timeout:
    timeout -= (int) (end - start);
}

First create a single CountDownLatch having a count for every thread in the group.
A controlling thread can await(timeout, TimeUnit) on the latch.
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CountDownLatch.html#await-long-java.util.concurrent.TimeUnit-
Start the threads that are in the group.
Each of the threads in the group should decrement the latch when it completes.
The controlling thread will wait until everything in the group has completed or the timeout happens, and because await returns a boolean, the controlling thread can tell whether the latch was decremented naturally or whether a timeout occurred.
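
The steps above could be sketched as follows. The helper GroupJoin.runAndAwait is a hypothetical name, and the sketch assumes you control how the threads are created, so that each one can count the latch down when its task ends; it does not apply directly to an existing Collection<Thread> whose code you cannot change.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating the latch approach.
final class GroupJoin {
    // Starts one thread per task and waits for all of them, up to timeoutMillis in total.
    // Returns true if every task finished in time, false if the timeout elapsed first.
    static boolean runAndAwait(Runnable[] tasks, long timeoutMillis) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(tasks.length);
        for (Runnable task : tasks) {
            new Thread(() -> {
                try {
                    task.run();
                } finally {
                    done.countDown(); // count down even if the task throws
                }
            }).start();
        }
        return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable quick = () -> { };
        boolean finished = runAndAwait(new Runnable[] {quick, quick, quick}, 10_000);
        System.out.println("all finished in time: " + finished);
    }
}
```

A false return only means the wait gave up; the worker threads keep running unless you interrupt or otherwise stop them.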

Using AtomicInteger as a static shared counter

In an effort to learn about synchronization via Java, I'm just messing around with some simple things like creating a counter shared between threads.
The problem I've run into is that I can't figure out how to print the counter sequentially 100% of the time.
int counterValue = this.counter.incrementAndGet();
System.out.println(this.threadName + ": " + counterValue);
The above increments the AtomicInteger counter, gets the new value, and prints it to the console identified by the thread name that is responsible for that update. The problem occurs when it appears that the incrementAndGet() method is causing the JVM to context switch to another thread for its updates before printing the current thread's updated value. This means that the value gets incremented but not printed until the thread returns to the executing state. This is obvious when looking at this example output:
Thread 3: 4034
Thread 3: 4035
Thread 3: 4036
Thread 1: 3944
Thread 1: 4037
Thread 1: 4039
Thread 1: 4040
Thread 2: 3863
Thread 1: 4041
Thread 1: 4043
You can see that when execution returns to Thread 1, it prints its value and continues updating. The same is evident with Thread 2.
I have a feeling that I'm missing something very obvious.

The problem occurs when it appears that the incrementAndGet() method is causing the JVM to context switch to another thread for its updates before printing the current thread's updated value
This is a race condition that often can happen in these situations. Although the AtomicInteger counters are being incremented properly, there is nothing to stop Thread 2 from being swapped out after the increment happens and before the println is called.
int counterValue = this.counter.incrementAndGet();
// there is nothing stopping a context switch here
System.out.println(this.threadName + ": " + counterValue);
If you want to print the "counter sequentially 100% of the time" you are going to have to synchronize on a lock around both the increment and the println call. Of course if you do that then the AtomicInteger is wasted.
synchronized (counter) {
    System.out.println(this.threadName + ": " + counter.incrementAndGet());
}
If you edit your question to explain why you need the output to be sequential maybe there is a better solution that doesn't have this race condition.

You need to synchronize the whole construction for that:
synchronized(this) {
    int counterValue = this.counter.incrementAndGet();
    System.out.println(this.threadName + ": " + counterValue);
}
In this case, though, you don't have to use AtomicInteger. Plain int would work (counter++).

To print sequentially, both incrementAndGet and println must be inside a critical region: a piece of code that only one thread may enter at a time, with the others blocked. This can be realized with a binary semaphore or, for instance, with Java's synchronized.
You could also turn things around and have a single thread increment the counter and print it. The other threads would then only take the counter value, one after another, inside a "critical region". That would be more efficient, as critical regions should remain small and preferably do no I/O.
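
A minimal sketch of the "increment and print inside one critical region" idea is shown below. The class SequentialCounter and its printed list are hypothetical additions for illustration; the list exists only so the ordering can be checked afterwards. It uses a plain int, since the lock already guards the counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared counter: increment and print happen under the same lock.
final class SequentialCounter {
    private int counter = 0;                                 // guarded by "this"
    private final List<Integer> printed = new ArrayList<>(); // record of the print order

    // No other thread can run between the increment and the println.
    synchronized int incrementAndPrint(String threadName) {
        int value = ++counter;
        System.out.println(threadName + ": " + value);
        printed.add(value);
        return value;
    }

    // Returns a copy of the values in the order they were printed.
    synchronized List<Integer> printedSoFar() {
        return new ArrayList<>(printed);
    }

    public static void main(String[] args) throws InterruptedException {
        SequentialCounter shared = new SequentialCounter();
        Thread[] threads = new Thread[3];
        for (int i = 0; i < threads.length; i++) {
            String name = "Thread " + (i + 1);
            threads[i] = new Thread(() -> {
                for (int k = 0; k < 5; k++) {
                    shared.incrementAndPrint(name);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("values in print order: " + shared.printedSoFar());
    }
}
```

Which thread prints which value still varies between runs; only the sequence of values is guaranteed to be increasing by one. The cost is that the I/O now happens while holding the lock.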

Thread concurrency issue even within one single command?

I am slightly surprised by what I get if I compile and run the following (horrible non-synchronized) Java SE program.
public class ThreadRace {
    // this is the main class.
    public static void main(String[] args) {
        TestRunnable tr=new TestRunnable(); // tr is a Runnable.
        Thread one=new Thread(tr,"thread_one");
        Thread two=new Thread(tr,"thread_two");
        one.start();
        two.start(); // starting two threads both with associated object tr.
    }
}
class TestRunnable implements Runnable {
    int counter=0; // Both threads can see this counter.
    public void run() {
        for(int x=0;x<1000;x++) {
            counter++;
        }
        // We can't get here until we've added one to counter 1000 times.
        // Can we??
        System.out.println("This is thread "+
            Thread.currentThread().getName()+" and the counter is "+counter);
    }
}
If I run "java ThreadRace" at the command line, then here is my interpretation
of what happens. Two new threads are created and started. The threads have
the same Runnable object instance tr, and so they see the same tr.counter .
Both new threads add one to this counter 1000 times, and then print the value
of the counter.
If I run this lots and lots of times, then usually I get output of the form
This is thread thread_one and the counter is 1000
This is thread thread_two and the counter is 2000
and occasionally I get output of the form
This is thread thread_one and the counter is 1204
This is thread thread_two and the counter is 2000
Note that what happened in this latter case was that thread_one finished
adding one to the counter 1000 times, but thread_two had started adding
one already, before thread_one printed out the value of the counter.
In particular, this output is still comprehensible to me.
However, very occasionally I get something like
This is thread thread_one and the counter is 1723
This is thread thread_two and the counter is 1723
As far as I can see, this "cannot happen". The only way the System.out.println() line
can be reached in either thread, is if the thread has finished counting to 1000.
So I am not bothered if one of the threads reports the counter as being some
random number between 1000 and 2000, but I cannot see how both threads can
get as far as their System.out.println() line (implying both for loops have finished,
surely?) and counter not be 2000 by the time the second statement is printed.
Is what is happening that both threads somehow attempt to do counter++ at exactly
the same time, and one overwrites the other? That is, a thread can even be
interrupted even in the middle of executing a single statement?

The "++" operator is not atomic -- it doesn't happen in one uninterruptible cycle. Think of it like this:
1. Fetch the old value
2. Add one to it
3. Store the new value back
So imagine that you get this sequence:
Thread A: Step 1
Thread B: Step 1
Thread A: Step 2
Thread B: Step 2
Thread A: Step 3
Thread B: Step 3
Both threads think they've incremented the variable, but its value has only increased by one! The second "store back" operation effectively cancels out the result of the first.
Now, truth is, when you add in multiple levels of cache, far weirder things can actually happen; but this is an easy explanation to understand. You can fix these kinds of issues by synchronizing access to the variable: either the whole run() method, or the inside of the loop using a synchronized block. As Jon suggests, you could also use some of the fancier tools in java.util.concurrent.atomic.
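
For the synchronized-block variant mentioned above, a minimal sketch might look like this. The class name SynchronizedCounter and the value() accessor are hypothetical; the loop mirrors the question's run() method with the increment wrapped in a synchronized block so the fetch, add, and store happen as one unit.

```java
// Hypothetical variant of the question's TestRunnable with a guarded increment.
final class SynchronizedCounter implements Runnable {
    private int counter = 0; // guarded by "this"

    @Override
    public void run() {
        for (int x = 0; x < 1000; x++) {
            synchronized (this) {
                counter++; // fetch, add, store: no other thread can interleave here
            }
        }
    }

    synchronized int value() {
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedCounter tr = new SynchronizedCounter();
        Thread one = new Thread(tr, "thread_one");
        Thread two = new Thread(tr, "thread_two");
        one.start();
        two.start();
        one.join(); // wait for both loops to finish before reading
        two.join();
        System.out.println("final counter: " + tr.value()); // prints "final counter: 2000"
    }
}
```

Reading the value only after both joins is what makes the final number deterministic; a thread that prints right after its own loop can still see anything from 1000 to 2000.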

It absolutely can happen. counter++ isn't an atomic operation. Consider it as:
int tmp = counter;
tmp++;
counter = tmp;
Now imagine two threads executing that code at about the same time:
Both read the counter
Both increment their local copy (0 to 1)
Both write 1 into counter
This sort of thing is precisely why java.util.concurrent.atomic exists. Change your code to:
import java.util.concurrent.atomic.AtomicInteger;

class TestRunnable implements Runnable {
    private final AtomicInteger counter = new AtomicInteger();
    public void run() {
        for(int x=0;x<1000;x++) {
            counter.incrementAndGet(); // atomic read-modify-write
        }
        System.out.println("This is thread "+
            Thread.currentThread().getName()+" and the counter is " + counter.get());
    }
}
That code is safe.
