Consider this example. We have:
int var = 0;
Thread A:
System.out.println(var);
System.out.println(var);
Thread B:
var = 1;
The threads run concurrently. Is the following output possible?
1
0
That is, the original value is read after the new value was read. The var isn't volatile. My gut feeling is that it's not possible.
You are using System.out.println, which internally does a synchronized(this) {...}; that will make things a bit worse. But even with that, your reader thread can still observe 1, 0, i.e. a racy read.
I am by far not an expert on this, but after going through lots of videos/examples/blogs from Alexey Shipilev, I think I understand at least something.
The JLS states that:
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
Since both reads of var are in program order, we can draw:
(po)
firstRead(var) ------> secondRead(var)
// po == program order
That sentence also says that this builds a happens-before order, so:
(hb)
firstRead(var) ------> secondRead(var)
// hb == happens before
But that is within "the same thread". If we want to reason about multiple threads, we need to look into synchronization order. We need that because the same paragraph about happens-before order says:
If an action x synchronizes-with a following action y, then we also have hb(x, y).
So if we build this chain of actions between program order and synchronizes-with order, we can reason about the result. Let's apply that to your code:
(NO SW) (hb)
write(var) ---------> firstRead(var) -------> secondRead(var)
// NO SW == there is "no synchronizes-with order" here
// hb == happens-before
And this is where happens-before consistency comes at play in the same chapter:
A set of actions A is happens-before consistent if for all reads r in A, where W(r) is the write action seen by r, it is not the case that either hb(r, W(r)) or that there exists a write w in A such that w.v = r.v and hb(W(r), w) and hb(w, r).
In a happens-before consistent set of actions, each read sees a write that it is allowed to see by the happens-before ordering
I admit that I very vaguely understand the first sentence and this is where Alexey has helped me the most, as he puts it:
Reads either see the last write that happened in the happens-before or any other write.
Because there is no synchronizes-with order there, and implicitly there is no happens-before order, the reading thread is allowed to read via a race.
and thus get 1, then 0.
As soon as you introduce a correct synchronizes-with order, for example one from here
An unlock action on monitor m synchronizes-with all subsequent lock actions on...
A write to a volatile variable v synchronizes-with all subsequent reads of v by any thread...
The graph changes (let's say you chose to make var volatile):
      (SW)                  (PO)
write(var) ---------> firstRead(var) -------> secondRead(var)
// SW == there IS a "synchronizes-with order" here
// PO == program order
PO (program order) gives us HB (happens-before) via the first sentence I quoted from the JLS in this answer. And SW gives HB because:
If an action x synchronizes-with a following action y, then we also have hb(x, y).
As such:
HB HB
write(var) ---------> firstRead(var) -------> secondRead(var)
And now the happens-before order says that each read will see the last write in the happens-before order, which means that reading 1 then 0 is impossible.
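A minimal runnable sketch of that fix (class and variable names are mine): with var declared volatile, the write synchronizes-with each subsequent read, so once the first read observes 1, the second read cannot observe 0.

```java
public class VolatileVisibility {

    // volatile: the write of 1 synchronizes-with every subsequent read
    static volatile int var = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            int first = var;
            int second = var;
            // hb(write, firstRead) and hb(firstRead, secondRead) imply
            // that "1 then 0" can never be observed here
            if (first == 1 && second == 0) {
                throw new AssertionError("forbidden by the JMM");
            }
        });
        Thread writer = new Thread(() -> var = 1);

        reader.start();
        writer.start();
        reader.join();
        writer.join();
    }
}
```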
I took this example from the jcstress samples and introduced a small change (doing just what your System.out.println does):
@JCStressTest
@Outcome(id = "0, 0", expect = Expect.ACCEPTABLE, desc = "Doing both reads early.")
@Outcome(id = "1, 1", expect = Expect.ACCEPTABLE, desc = "Doing both reads late.")
@Outcome(id = "0, 1", expect = Expect.ACCEPTABLE, desc = "Doing first read early, not surprising.")
@Outcome(id = "1, 0", expect = Expect.ACCEPTABLE_INTERESTING, desc = "First read saw the racy value early, and the second one did not.")
@State
public class SO64983578 {

    private final Holder h1 = new Holder();
    private final Holder h2 = h1;

    private static class Holder {
        int a;
        int trap;
    }

    @Actor
    public void actor1() {
        h1.a = 1;
    }

    @Actor
    public void actor2(II_Result r) {
        Holder h1 = this.h1;
        Holder h2 = this.h2;

        h1.trap = 0;
        h2.trap = 0;

        synchronized (this) {
            r.r1 = h1.a;
        }

        synchronized (this) {
            r.r2 = h2.a;
        }
    }
}
Notice the synchronized(this){....} that is not part of the initial example. Even with synchronization, I can still see 1, 0 as a result. This is just to prove that even with synchronized (which comes internally from System.out.println), you can still get 1 then 0.
When the value of var is read and it's 1, it won't change back. This output can't happen, neither due to visibility nor due to reorderings. What can happen is 0 0, 0 1 and 1 1.
The key point to understand here is that println involves synchronization. Look inside that method and you should see a synchronized block there. These blocks have the effect that the prints happen in exactly that order. While the write can happen at any time, it's not possible that the first print sees the new value of var but the second print sees the old value. Therefore, the write can only happen before both prints, in between them, or after them.
Besides that, there is no guarantee that the write will be visible at all, as var is neither marked volatile nor is the write synchronized in any way.
I think what is missing here is the fact that those threads run on actual physical cores, and we have a few possible variants here:
All threads run on the same core. Then the problem is reduced to the order of execution of those 3 instructions, and in this case 1, 0 is not possible, I think: the println executions are ordered due to the memory barriers created by synchronization, and that excludes 1, 0.
A and B run on 2 different cores. Then 1, 0 does not look possible either: as soon as the core that runs thread A reads 1, there is no way it will read 0 afterwards; same as above, the printlns are ordered.
Thread A is rescheduled in between those 2 printlns, so the second println is executed on a different core, either the one where B was/will be executed or a third core. When the 2 printlns are executed on different cores, it depends on what values the 2 cores see; if var is not synchronized (it is not clear whether var is a member of this), those 2 cores can see different values of var, so there is a possibility for 1, 0.
So this is a cache coherence problem.
P.S. I'm not a jvm expert, so there might be other things in play here.
Adding to the other answers:
With long and double, writes may not be atomic, so the first 32 bits could become visible before the last 32 bits, or vice versa. Therefore completely different values could be output.
Here is my code,
class Shared {
    private static int index = 0;

    public synchronized void printThread() {
        try {
            while (true) {
                Thread.sleep(1000);
                System.out.println(Thread.currentThread().getName() + ": " + index++);
                notifyAll();
                // notify();
                wait();
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

class Example13 implements Runnable {
    private Shared shared = new Shared();

    @Override
    public void run() {
        shared.printThread();
    }
}

public class tetest {
    public static void main(String[] args) {
        Example13 r = new Example13();

        Thread t1 = new Thread(r, "Thread 1");
        Thread t2 = new Thread(r, "Thread 2");
        Thread t3 = new Thread(r, "Thread 3");
        Thread t4 = new Thread(r, "Thread 4");
        Thread t5 = new Thread(r, "Thread 5");

        t1.start();
        t2.start();
        t3.start();
        t4.start();
        t5.start();
    }
}
and the result is here
Thread 1: 0
Thread 5: 1
Thread 4: 2
Thread 3: 3
Thread 2: 4
Thread 3: 5
Thread 2: 6
Thread 3: 7
Thread 2: 8
Thread 3: 9
The question is: why are only two of the threads working? I'm so confused; I thought notify() randomly wakes up one of the waiting threads, but apparently it doesn't.
Is this starvation? If so, what is causing the starvation? I tried notify() and notifyAll(), but got the same results for both.
Can anyone help my toasted brain?
This isn't 'starvation'. Your 5 threads are all doing nothing. They all want to 'wake up' - notify() will wake up an arbitrary one. The JMM does not ascribe an order to this, so one of them will wake up, but you can't rely on it being random (do not use this to generate random numbers), nor can you rely on any specific ordering behaviour.
It's not starvation (it's not: Oh no! Thread 2 and 3 are doing all the work and 4 and 5 are just hanging out doing nothing! That's bad - the system could be more efficient!) - because it doesn't matter which thread 'does the work'. A CPU core is a CPU core, it cares not which thread it ends up running.
Starvation is a different principle. Imagine that instead of Thread.sleep (which means the threads aren't waiting for anything specific, other than for some time to elapse), the threads want to print the result of some expensive-ish math operation. If you just let 2 threads each say 'Hello!', then the implementation of System.out says it would be acceptable for the JVM to produce:
HelHellloo!!
So to prevent this, you use locks to create a 'talking stick' of sorts: A thread only gets to print if it has the talking stick. Each of 5 threads will all perform, in a loop, the following operation:
Do an expensive math operation.
Acquire the talking stick.
Print the result of the operation.
Release the talking stick.
Loop back to the top.
Now imagine that, despite the fact that the math operation is quite expensive, for whatever reason you have an excruciatingly slow terminal, and the 'print the result of the operation' job takes a really long time to finish.
Now you can run into starvation. Imagine this scenario:
Threads 1-5 all do their expensive math simultaneously.
Arbitrarily, thread 4 ends up nabbing the talking stick.
The other 4 threads soon too want the talking stick but they have to wait; t4 has it. They do nothing now. Twiddling their thumbs (they could be calculating, but they are not!)
After the excruciatingly long time, 4 is done and releases the stick. 1, 2, 3, and 5 dogpile on like it's the All Blacks, and 2 happens to win the scrum and crawl out of the pile with the stick. 1, 3, and 5 gnash their teeth and go back yet again to waiting for the stick, still not doing any work. Whilst 2 is busy spending the really long time printing results, 4 goes back to the top of the loop and calculates another result. It ends up doing this faster than 2 manages to print, so 4 ends up wanting the talking stick again before 2 is done.
2 is finally done and 1, 3, 4, and 5 all scrum into a pile again. 4 happens to get the stick - java makes absolutely no guarantees about fairness, any one of them can get it, there is also no guarantee of randomness or lack thereof. A JVM is not broken if 4 is destined to win this fight.
Repeat ad nauseam. 2 and 4 keep trading the stick back and forth. 1, 3, and 5 never get to talk.
The above is, as per the JMM, valid - a JVM is not broken if it behaves like this (it would be a tad weird). Any bugs filed about this behaviour would be denied: The lock isn't so-called "fair". Java has fair locks if you want them - in the java.util.concurrent package. Fair locks incur some slight extra bookkeeping cost, the assumption made by the synchronized and wait/notify system is that you don't want to pay this extra cost.
A better solution to the above scenario might be to make a 6th thread that JUST prints, with the 5 threads JUST filling a buffer; at least then the 'print' part is left to a single thread, and that might be faster. But mostly, the bottleneck in this setup is simply the printing - the code gets zero benefit from being multicore. Just having ONE thread do ONE math calculation, print it, do another, and so forth would be better. Or possibly 2 threads: whilst one prints, the other calculates a number; but there's no point in having more than one, since even a single thread can calculate faster than results can be printed. Thus in some ways this is just what the situation honestly requires: this hypothetical scenario still prints as fast as it can. And who says you need the printing to be 'fair'? It's not intrinsic to the problem description that fairness is a requirement. Maybe all the various calculations are equally useful, so it doesn't matter that one thread gets to print more than others. Say the threads are bitcoin miners, each generating a random number and checking whether it results in a hash with the requisite zeroes - who cares that one thread gets more time than another? A 'fair' system is no more likely to successfully mine a block.
Thus, 'fairness' is something you need to explicitly determine that you actually need. If you do, AND starvation is an issue, use a fair lock. new ReentrantLock(true) is all you need (that boolean parameter is the fairness parameter - true means you want fairness).
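A minimal sketch of that (class and field names are mine): a fair ReentrantLock as the 'talking stick'. Fairness only shows up under real contention, but the locking pattern looks like this:

```java
import java.util.concurrent.locks.ReentrantLock;

public class FairLockDemo {

    // true = fair: waiting threads acquire the lock roughly in FIFO arrival order
    static final ReentrantLock stick = new ReentrantLock(true);
    static final StringBuilder log = new StringBuilder();

    public static void main(String[] args) throws InterruptedException {
        Runnable talker = () -> {
            for (int i = 0; i < 3; i++) {
                stick.lock();              // grab the talking stick
                try {
                    log.append(Thread.currentThread().getName());
                } finally {
                    stick.unlock();        // always release in finally
                }
            }
        };
        Thread a = new Thread(talker, "A");
        Thread b = new Thread(talker, "B");
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(log); // some interleaving of three As and three Bs
    }
}
```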
I have a global variable
volatile int i = 0;
and two threads. Each does the following:
i++;
System.out.print(i);
I receive the following combinations. 12, 21 and 22.
I understand why I don't get 11 (volatile disallows the caching of i) and I also understand 12 and 22.
What I don't understand is how it is possible to get 21?
The only way I can see to get this combination is that the thread that prints later was the first to increment i from 0 to 1, and then cached i == 1. Then the other thread incremented i from 1 to 2 and printed 2. Then the first thread printed its cached i == 1. But I thought that volatile disallows caching.
Edit: After running the code 10,000 times I got 11 once. Adding volatile to i does not change the possible combinations at all.
markspace is right: volatile forbids caching of i, but i++ is not atomic. This means that i still gets sort of "cached" in a register during the increment:
r1 = i
//if i changes here r1 does not change
r1 = r1 + 1
i = r1
This is the reason why 11 is still possible. 21 happens because PrintStream calls are not ordered with respect to each other (see Karol Dowbecki's answer).
Your code does not guarantee which thread will call System.out first.
The increments and reads of i happen in order due to the volatile keyword, but the prints don't.
Unfortunately ++ is not an atomic operation. Despite volatile not allowing caching, the JVM is permitted to read, increment, and then write as separate operations. Thus, the concept you are trying to implement just isn't workable: you need to use synchronized for its mutex, or use something like AtomicInteger, which does provide an atomic increment operation.
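A sketch of the AtomicInteger version (class name is mine). The increment is now atomic, so the output 11 becomes impossible; note that 21 is still possible, because the prints themselves are not ordered:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {

    static final AtomicInteger i = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            // atomic read-modify-write: the two threads get distinct values 1 and 2
            int v = i.incrementAndGet();
            System.out.print(v); // print order is still not guaranteed
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // i is now exactly 2; no increment can be lost
    }
}
```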
The only possible way...is that the thread that prints later had to be the first to increment i from 0 to 1 and then cached i==1...
You are forgetting about what System.out.print(i); does: That statement calls the System.out object's print(...) method with whatever value was stored in i at the moment when the call was started.
So here's one scenario that could happen:
Thread A
increments i (i now equals 1)
Starts to call `print(1)` //Notice! that's the digit 1, not the letter i.
gets bogged down somewhere deep in the guts...
Thread B
increments i (i=2)
Calls `print(2)`
Gets lucky, and the call runs to completion.
Thread A
finishes its `print(1)` call.
Neither thread is caching the i variable. But the System.out.print(...) function doesn't know anything about your volatile int i. It only knows about the value (1 or 2) that was passed to it.
I am trying to learn concurrency in Java, but whatever I do, 2 threads run in serial, not parallel, so I am not able to replicate common concurrency issues explained in tutorials (like thread interference and memory consistency errors). Sample code:
public class Synchronization {

    static int v;

    public static void main(String[] args) {
        Runnable r0 = () -> {
            for (int i = 0; i < 10; i++) {
                Synchronization.v++;
                System.out.println(v);
            }
        };

        Runnable r1 = () -> {
            for (int i = 0; i < 10; i++) {
                Synchronization.v--;
                System.out.println(v);
            }
        };

        Thread t0 = new Thread(r0);
        Thread t1 = new Thread(r1);
        t0.start();
        t1.start();
    }
}
This always gives me a result starting from 1 and ending with 0 (whatever the loop length is). For example, the code above gives me, every time:
1
2
3
4
5
6
7
8
9
10
9
8
7
6
5
4
3
2
1
0
Sometimes, the second thread starts first and the results are the same but negative, so it is still running serially.
Tried in both Intellij and Eclipse with identical results. CPU has 2 cores if it matters.
UPDATE: it finally became reproducible with huge loops (starting from 1_000_000), though still not every time and only with a small final discrepancy. Also, making the operations in the loops "heavier", like printing the thread name, seems to make it more reproducible as well. Manually adding a sleep to a thread also works, but it makes the experiment less clean, so to say. The reason doesn't seem to be that the first loop finishes before the second starts, because I see both loops printing to the console while continuing to run, and I still get 0 at the end. It seems more like a thread race for the same variable. I will dig deeper into that, thanks.
It seems like the first started thread just never gives the second a chance in the thread race to take the variable, or the second one simply never has time to even start (I couldn't say for sure), so the second will almost* always be waiting until the first loop has finished.
Some heavy operation will mix the result:
TimeUnit.MILLISECONDS.sleep(100);
*it is not always true, but you were just lucky in your tests
Starting a thread is a heavyweight operation, meaning that it takes some time to perform. Due to that fact, by the time you start the second thread, the first is finished.
The reason why it is sometimes in "reverse order" is due to how the thread scheduler works. By the specs there are no guarantees about thread execution order - with that in mind, we know that it is possible for the second thread to run first (and finish).
Increase the iteration count to something meaningful, like 10000, and see what happens then.
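One way to take thread startup time out of the picture is to hold both threads at a latch so they enter their loops at the same moment (a sketch; class and variable names are mine). With a large enough iteration count, the lost updates usually leave v nonzero:

```java
import java.util.concurrent.CountDownLatch;

public class RaceDemo {

    static int v = 0; // deliberately unsynchronized

    public static void main(String[] args) throws InterruptedException {
        final int n = 1_000_000;
        CountDownLatch start = new CountDownLatch(1);

        Runnable inc = () -> {
            try { start.await(); } catch (InterruptedException e) { return; }
            for (int i = 0; i < n; i++) v++; // racy read-modify-write
        };
        Runnable dec = () -> {
            try { start.await(); } catch (InterruptedException e) { return; }
            for (int i = 0; i < n; i++) v--;
        };

        Thread t0 = new Thread(inc);
        Thread t1 = new Thread(dec);
        t0.start();
        t1.start();
        start.countDown(); // release both threads at once
        t0.join();
        t1.join();
        System.out.println(v); // usually nonzero: updates were lost in the race
    }
}
```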
This is called lucky timing, as per Brian Goetz (author of Java Concurrency in Practice). Since there is no synchronization on the static variable v, it is clear that this class is not thread-safe.
I found a solution to the mutual-exclusion problem online that has two processes P0 and P1. (Assume that the variable turn is initialized to 0)
volatile int turn;
Process P0:
/* Other code */
while (turn != 0) { } /* Do nothing and wait. */
Critical Section /* . . . */
turn = 1;
/* Other code */
Process P1:
/*Other code*/
while (turn != 1) { } /* Do nothing and wait. */
Critical Section /* . . . */
turn = 0;
/* Other code */
How does this solution solve the mutual-exclusion problem? I don't understand it fully.
Assuming there's no other code that can set turn to a value other than 0 or 1, and assuming the only things messing with the turn variable are P0 and P1, then this does provide the mutual exclusion property. Specifically, you say that turn is initialized to 0. That means P1 can't enter the critical section: it's busy in the while (turn != 1) loop, and it'll stay in that loop until something sets turn == 1. Given our assumption that only P0 and P1 make changes to turn, that means P1 can't enter the critical section until P0 sets turn to 1.
So P0 will immediately exit its while (turn != 0) loop (as turn is initially 0) and safely enter its critical section. It knows P1 can't enter its critical section until turn gets set to 1, and that only happens after P0 has left its critical section. Once P0 sets turn to 1, P0 will be stuck in its while (turn != 0) loop until P1 sets it free, so now P1 is in its critical section and P0 can't be in its own. And so on.
An easy way to think of this is two people and a baton. They each agree not to do anything (enter their critical section) unless they hold the baton. So Person 1 has the baton at first and is free to do stuff, knowing that Person 2 can't do anything - they don't have the baton. Once Person 1 is done, they hand the baton to Person 2. Person 2 is now free to do whatever they want, and they know Person 1 is doing nothing but waiting for the baton to be handed back to them.
As @JustinSteele points out, this definitely does not solve the mutual exclusion problem. Maybe if you changed turn to a boolean you could get a dirty fix, since a boolean only consists of two values. If you want a more proper way of providing mutual exclusion, I would suggest taking a look at mutexes, semaphores and condition variables. Good luck!
If both P0 and P1 are executed, and each is executed only once, it is true that P0 will enter the critical section first, exclusively, before P1 does.
In terms of the Java Memory Model, this program is correctly synchronized, because all inter-thread actions are volatile reads and writes. Therefore the program is sequentially consistent and easy to analyze.
Or, more specifically: all volatile reads and writes are in a total order (one that is consistent with program order), and this order guarantees the mutual exclusiveness of the critical sections.
However, there is a serious problem here. If P1 arrives first, it must wait for P0, no matter how late P0 arrives. This is quite unfair. And if P0 is not executed, P1 cannot advance. And if P0 is executed and P1 is not, P0 cannot enter the critical section a second time (it must wait for P1 to reset turn). This locking mechanism only allows a strict P0-P1-P0-P1-... sequence (unless that is exactly what's desired).
To solve this problem, there are Dekker's algorithm, Peterson's algorithm, etc. See this post - https://cs.stackexchange.com/a/12632
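As a contrast to the strict alternation above, here is a sketch of Peterson's algorithm for two threads in Java (class and field names are mine; all protocol variables are volatile, so the sequential-consistency reasoning above applies, and the spin loops make this a demo rather than production code). Either thread can re-enter the critical section repeatedly while the other is not interested:

```java
public class Peterson {

    static volatile boolean wants0 = false, wants1 = false;
    static volatile int turn = 0;
    static int counter = 0; // only touched inside the critical section

    static void lock(int id) {
        if (id == 0) {
            wants0 = true;
            turn = 1;                       // politely yield priority first
            while (wants1 && turn == 1) { } // spin while the other has priority
        } else {
            wants1 = true;
            turn = 0;
            while (wants0 && turn == 0) { }
        }
    }

    static void unlock(int id) {
        if (id == 0) wants0 = false; else wants1 = false;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work0 = () -> { for (int i = 0; i < 100_000; i++) { lock(0); counter++; unlock(0); } };
        Runnable work1 = () -> { for (int i = 0; i < 100_000; i++) { lock(1); counter++; unlock(1); } };
        Thread a = new Thread(work0);
        Thread b = new Thread(work1);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(counter); // 200000: mutual exclusion prevented lost updates
    }
}
```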
I've got this little problem with threads.
int x = 0;

add() {
    x = x + 1;
}
If we run this in multiple threads, say 4 threads, is the final value always x = 4, or could it be 1, 2, 3 or 4?
Thanks
PS
let's say the atomic operations for the adding are like this,
LOAD A x
ADD A 1
STORE x A
Then the final result will be 4. Am I right, or what have I got wrong?
This is a classic example of a data race.
Now, let's take a closer look at what add() does:
add()
{
    x = x + 1;
}
This translates to:
Give me the most recent value of X and store it in my private workspace
Add 1 to that value that is stored in my private workspace
Copy what I have in my workspace to the memory that I copied from (that is globally accessible).
Now, before we go further in explaining this, you have something called a context switch, which is the process by which your operating system divides your processor's time among different threads and processes. This process usually gives your threads a finite amount of processor time (on Windows it is about 40 milliseconds), then interrupts that work, copies everything the processor has in its registers (thus preserving its state), and switches to the next task. This is called round-robin task scheduling.
You have no control over when your processing is going to be interrupted and transferred to another thread.
Now imagine you have two threads doing the same:
1. Give me the most recent value of X and store it in my private workspace
2. Add 1 to that value that is stored in my private workspace
3. Copy what I have in my workspace to the memory that I copied from (that is globally accessible).
and X is equal to 1 before any of them runs.
The first thread might execute the first instruction and store in its private workspace the value of X that was most recent at the time - 1. Then a context switch occurs: the operating system interrupts your thread and gives control to the next task in the queue, which happens to be the second thread. The second thread also reads the value of X, which is equal to 1.
Thread number two manages to run to completion - it adds 1 to the value it "downloaded" and "uploads" the calculated value.
The operating system forces a context switch again.
Now the first thread continues execution at the point where it was interrupted. It will still think that the most recent value is 1, so it will increment that value by one and save the result of its computation to that memory area. And this is how data races occur: you expect the final result to be 3, but it is 2.
There are many ways to avoid this problem such as locks/mutexes, compare and swap or atomic operations.
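One of those alternatives, compare-and-swap, can be sketched with Java's AtomicInteger (class name is mine): the write only succeeds if x still holds the value we read; otherwise we reload and retry.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasAdd {

    static final AtomicInteger x = new AtomicInteger(0);

    static void add() {
        while (true) {
            int current = x.get();                // step 1: read the most recent value
            int next = current + 1;               // step 2: compute in private
            if (x.compareAndSet(current, next)) { // step 3: write only if x is unchanged
                return;
            }
            // another thread wrote in between; loop and retry with a fresh read
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) threads[i] = new Thread(CasAdd::add);
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        System.out.println(x.get()); // always 4: no update can be lost
    }
}
```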
Your code is broken at two levels:
No happens-before relationship imposed between actions of threads;
Atomicity of get-and-increment not enforced.
To solve 1. you can add the volatile modifier. This will still leave the operation non-atomic. To ensure atomicity, you would use (preferably) an AtomicInteger, or synchronized (which involves locking and is not preferred).
As it stands, the result can be any number from 0 to 4 if read from a thread that was not involved in incrementing.
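A sketch of the synchronized variant (class name is mine): one mechanism fixes both levels, since the monitor provides atomicity and each unlock happens-before the next lock.

```java
public class SyncAdd {

    static int x = 0;

    // the monitor gives atomicity; unlock/lock pairs give happens-before edges
    static synchronized void add() {
        x = x + 1;
    }

    static synchronized int get() {
        return x;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> { for (int i = 0; i < 10_000; i++) add(); };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(get()); // always 20000, never less
    }
}
```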
Multi-threaded applications are concurrent (this is the whole point).
t1: LOAD A1 x
t2: LOAD A2 x
t3: LOAD A3 x
t4: LOAD A4 x
t1: ADD A1 1
t2: ADD A2 1
t3: ADD A3 1
t4: ADD A4 1
t1: STORE x A1
t2: STORE x A2
t3: STORE x A3
t4: STORE x A4
A1, A2, A3, A4 are local registers.
In this interleaving the result is 1, but it could also be 2, 3 or 4. If you have yet another thread, it could even see the old value 0 due to visibility issues.