What does "subsequent read" mean in the context of volatile variables? - java

Java memory visibility documentation says that:
A write to a volatile field happens-before every subsequent read of that same field.
I'm confused about what subsequent means in the context of multithreading. Does this sentence imply some global clock for all processors and cores? So, for example, I assign a value to a variable in cycle c1 in some thread, and then a second thread is able to see this value in the subsequent cycle c1 + 1?

It sounds to me like it's saying that it provides lockless acquire/release memory-ordering semantics between threads. See Jeff Preshing's article explaining the concept (mostly for C++, but the main point of the article is language-neutral: the concept of lock-free acquire/release synchronization).
In fact Java volatile provides sequential consistency, not just acq/rel. There's no actual locking, though. See Jeff Preshing's article for an explanation of why the naming matches what you'd do with a lock.
If a reader sees the value you wrote, then it knows that everything in the producer thread before that write has also already happened.
This ordering guarantee is only useful in combination with other guarantees about ordering within a single thread.
e.g.
int[] data = new int[100];
volatile boolean dataReady = false;
Producer:
data[0..99] = stuff;   // (pseudocode) fill the array
// release store keeps previous ops above this line
dataReady = true;
Consumer:
while (!dataReady) {}  // spin until we see the write
// acquire load keeps later ops below this line
int tmp = data[99];    // gets the value from the producer
If dataReady was not volatile, reading it wouldn't establish a happens-before relationship between the two threads.
You don't have to have a spinloop, you could be reading a sequence number, or an array index from a volatile int, and then reading data[i].
I don't know Java well. I think volatile actually gives you sequential consistency, not just release/acquire. A sequentially consistent store isn't allowed to reorder with later loads, so on typical hardware it needs an expensive memory barrier to make sure the local core's store buffer is drained before any later loads are allowed to execute.
Volatile Vs Atomic explains more about the ordering volatile gives you.
Java volatile is just an ordering keyword; it's not equivalent to C11 _Atomic or C++11 std::atomic<T>, which also give you atomic RMW operations. In Java, volatile_var++ is not an atomic increment; it's a separate load and store, like volatile_var = volatile_var + 1. In Java, you need a class like AtomicInteger to get an atomic RMW.
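For example, a minimal sketch (the class and field names are illustrative, not from the question):

import java.util.concurrent.atomic.AtomicInteger;

class Counters {
    static volatile int volatileCounter = 0;                // ++ is NOT atomic here
    static final AtomicInteger atomicCounter = new AtomicInteger();

    static void incrementBoth() {
        volatileCounter++;               // volatile load, add, volatile store:
                                         // two threads can both read 5 and both write 6
        atomicCounter.incrementAndGet(); // a single atomic read-modify-write
    }
}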
And note that C/C++ volatile doesn't imply atomicity or ordering at all; it only tells the compiler to assume that the value can be modified asynchronously. This is only a small part of what you need to write lockless code for anything except the simplest cases.

It means that once a certain thread writes to a volatile field, all other threads will observe (on their next read) that written value; but this does not protect you against races.
Threads have their own caches, and those caches will be invalidated and updated with the newly written value via the cache coherency protocol.
EDIT
Subsequent means whatever happens after the write itself. Since you don't know the exact cycle/timing when that will happen, you usually say that when some other thread observes the write, it will observe all the actions done before that write; thus a volatile establishes the happens-before guarantees.
Sort of like in an example:
// Shared fields
int a = 0;
volatile int b = 0;
// Actions done in Thread A
a = 2;
b = 3; // volatile write
// Actions done in Thread B
if (b == 3) { // observe the volatile write
    // Thread B is guaranteed to see a = 2 here
}
You could also loop (spin wait) until you see 3 for example.
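A complete runnable version of this sketch might look like the following (names and the spin loop are illustrative only):

class HappensBeforeDemo {
    static int a = 0;          // plain field
    static volatile int b = 0; // volatile flag

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            a = 2;             // ordinary write
            b = 3;             // volatile write publishes everything above it
        });
        Thread reader = new Thread(() -> {
            while (b != 3) { } // spin until we observe the volatile write
            System.out.println(a); // guaranteed to print 2
        });
        reader.start();
        writer.start();
        writer.join();
        reader.join();
    }
}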

Peter's answer gives the rationale behind the design of the Java memory model.
In this answer I'm attempting to give an explanation using only the concepts defined in the JLS.
In Java every thread is composed of a set of actions.
Some of these actions have the potential to be observable by other threads (e.g. writing a shared variable); these are called synchronization actions.
The order in which the actions of a thread are written in the source code is called the program order.
An order defines what is before and what is after (or better, what is not before).
Within a thread, each action has a happens-before relationship (denoted by <) with the next (in program order) action.
This relationship is important, yet hard to understand, because it's very fundamental: it guarantees that if A < B then
the "effects" of A are visible to B.
This is indeed what we expect when writing the code of a function.
Consider
Thread 1    Thread 2
A0          A'0
A1          A'1
A2          A'2
A3          A'3
Then by the program order we know A0 < A1 < A2 < A3 and that A'0 < A'1 < A'2 < A'3.
We don't know how to order all the actions.
It could be A0 < A'0 < A'1 < A'2 < A1 < A2 < A3 < A'3 or the sequence with the primes swapped.
However, every such sequence must order the actions of each thread according to that thread's program order.
The two program orders are not sufficient to order every action; they are partial orders, as opposed to the total order we are looking for.
The total order that puts the actions in a row according to a measurable time (like a clock) at which they happened is called the execution order.
It is the order in which the actions actually happened (strictly, it is only required that the actions appear to have happened in this order, but that's just an optimization detail).
Up until now, the actions are not ordered inter-thread (between two different threads).
The synchronization actions serve this purpose.
Each synchronization action synchronizes-with at least one other synchronization action (they usually come in pairs, like a write and a read of a volatile variable, or a lock and the unlock of a mutex).
The synchronizes-with relationship is the inter-thread happens-before (the former implies the latter); it is exposed as a different concept because 1) it slightly is one, and 2) happens-before within a thread is enforced naturally by the hardware, while synchronizes-with may require software intervention.
happens-before is derived from the program order, synchronize-with from the synchronization order (denoted by <<).
The synchronization order is defined in terms of two properties: 1) it is a total order 2) it is consistent with each thread's
program order.
Let's add some synchronization action to our threads:
Thread 1    Thread 2
A0          A'0
S1          A'1
A1          S'1
A2          S'2
S2          A'3
The program orders are trivial.
What is the synchronization order?
We are looking for something that, by 1), includes all of S1, S2, S'1 and S'2 and, by 2), has S1 < S2 and S'1 < S'2.
Possible outcomes:
S1 < S2 < S'1 < S'2
S1 < S'1 < S'2 < S2
S'1 < S1 < S'2 < S2
All are synchronization orders: there is not one synchronization order but many, so the question above is wrong; it should be "What are the synchronization orders?".
If S1 and S'1 are such that S1 << S'1, then we are restricting the possible outcomes to the ones where S1 < S'1, so the outcome S'1 < S1 < S'2 < S2 above is now forbidden.
If S2 << S'1, then the only possible outcome is S1 < S2 < S'1 < S'2; when there is only a single outcome I believe we have sequential consistency (the converse is not true).
Note that if A << B this doesn't mean that there is a mechanism in the code to force an execution order where A < B.
Synchronization actions are affected by the synchronization order; they do not impose any materialization of it.
Some synchronization actions (e.g. locks) impose a particular execution order (and thereby a synchronization order), but some don't (e.g. reads/writes of volatiles).
It is the execution order that creates the synchronization order; this is completely orthogonal to the synchronizes-with relationship.
Long story short, the "subsequent" adjective refers to any synchronization order, that is, any valid (according to each thread's program order) order that encompasses all the synchronization actions.
The JLS then continues by defining when a data race happens (when two conflicting accesses are not ordered by happens-before) and what it means to be happens-before consistent.
Those are out of scope.

I'm confused what does subsequent means in context of multithreading. Does this sentence implies some global clock for all processors and cores...?
Subsequent means (according to the dictionary) coming after in time. There certainly is a global clock across all CPUs in a computer (think X GHz), and the document is trying to say that if thread-1 did something at clock tick 1 and then thread-2 does something on another CPU at clock tick 2, its actions are considered subsequent.
A write to a volatile field happens-before every subsequent read of that same field.
The key phrase that could be added to this sentence to make it more clear is "in another thread". It might make more sense to understand it as:
A write to a volatile field happens-before every subsequent read of that same field in another thread.
What this is saying is that if a read of a volatile field happens in Thread-2 after (in time) the write in Thread-1, then Thread-2 is guaranteed to see the updated value. Further up in the documentation you point to is the section (emphasis mine):
... The results of a write by one thread are guaranteed to be visible to a read by another thread only if the write operation happens-before the read operation. The synchronized and volatile constructs, as well as the Thread.start() and Thread.join() methods, can form happens-before relationships. In particular.
Notice the highlighted phrase. The Java compiler is free to reorder instructions in any one thread's execution for optimization purposes as long as the reordering doesn't violate the definition of the language – this is called execution order and is critically different from program order.
Let's look at the following example with variables a and b that are non-volatile ints initialized to 0 with no synchronized clauses. What is shown is program order and the time in which the threads are encountering the lines of code.
Time    Thread-1        Thread-2
 1      a = 1;
 2      b = 2;
 3                      x = a;
 4                      y = b;
 5      c = a + b;      z = x + y;
If Thread-1 adds a + b at Time 5, it is guaranteed to be 3. However, if Thread-2 adds x + y at Time 5, it might get 0, 1, 2, or 3, depending on race conditions. Why? Because the compiler might have reordered the instructions in Thread-1 to set a after b for efficiency reasons. Also, Thread-1 may not have appropriately published the values of a and b, so Thread-2 might get out-of-date values. Even if Thread-1 gets context-switched out or crosses a write memory barrier and a and b are published, Thread-2 needs to cross a read barrier to update any cached values of a and b.
If a and b were marked as volatile then the write to a must happen-before (in terms of visibility guarantees) the subsequent read of a at Time 3, and the write to b must happen-before the subsequent read of b at Time 4. Both threads would get 3.
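To make that concrete, a hedged sketch of the volatile version (field names are illustrative; the guarantee relies on Thread-2's reads really coming after Thread-1's writes in time):

class VolatileVisibility {
    static volatile int a = 0;
    static volatile int b = 0;

    static void thread1() {
        a = 1;  // volatile write: cannot be reordered after the write to b
        b = 2;
        System.out.println(a + b); // always 3 within this thread
    }

    static void thread2() {
        int x = a; // if these reads run after thread1's writes (in time),
        int y = b; // they are guaranteed to see the updated values
        System.out.println(x + y); // 3 in that case
    }
}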
We use the volatile and synchronized keywords in Java to ensure happens-before guarantees. A write memory barrier is crossed when assigning a volatile or exiting a synchronized block, and a read barrier is crossed when reading a volatile or entering a synchronized block. The Java compiler cannot reorder write instructions past these memory barriers, so the order of updates is assured. These keywords control instruction reordering and ensure proper memory synchronization.
NOTE: volatile is unnecessary in a single-threaded application because program order assures the reads and writes will be consistent. A single-threaded application might see any value of (non-volatile) a and b at times 3 and 4 but it always sees 3 at Time 5 because of language guarantees. So although use of volatile changes the reordering behavior in a single-threaded application, it is only required when you share data between threads.

This is more a definition of what will not happen rather than what will happen.
Essentially it is saying that once a write to an atomic variable has happened there cannot be any other thread that, on reading the variable, will read a stale value.
Consider the following situation.
Thread A is continuously incrementing an atomic value a.
Thread B occasionally reads A.a and exposes that value as a
non-atomic b variable.
Thread C occasionally reads both A.a and B.b.
Given that a is atomic, it is possible to reason that, from the point of view of C, b may occasionally be less than a but will never be greater than a.
If a was not atomic no such guarantee could be given. Under certain caching situations it would be quite possible for C to see b progress beyond a at any time.
This is a simplistic demonstration of how the Java memory model allows you to reason about what can and cannot happen in a multi-threaded environment. In real life the potential race conditions between reading and writing to data structures can be much more complex but the reasoning process is the same.
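A minimal sketch of that scenario (names are illustrative; b is made volatile here so that C's read order is pinned down, a detail the prose glosses over):

import java.util.concurrent.atomic.AtomicInteger;

class MonotonicDemo {
    static final AtomicInteger a = new AtomicInteger(); // Thread A increments this
    static volatile int b = 0;                          // Thread B's lagging copy of a

    static void threadA() { while (true) a.incrementAndGet(); }

    static void threadB() { while (true) b = a.get(); } // b always trails a

    static void threadC() {
        while (true) {
            int bSeen = b;        // read b first...
            int aSeen = a.get();  // ...then a; a only grows, so aSeen >= bSeen
            if (bSeen > aSeen) throw new AssertionError("b overtook a");
        }
    }
}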

Related

volatile vs not volatile

Let's consider the following piece of code in Java
int x = 0;
int who = 1;
Thread #1:
(1) x++;
(2) who = 2;
Thread #2
while(who == 1);
x++;
print x; (the value should be equal to 2 but, perhaps, it is not*)
(I don't know the Java memory model; let's assume it is a strong memory model, i.e. (1) and (2) will not be swapped.)
The Java memory model guarantees that access/store to 32-bit variables is atomic, so our program is safe. But nevertheless we should use the volatile attribute because of *: the value of x may be equal to 1, because x can be kept in a register when Thread #2 reads it. To resolve this we should make the x variable volatile. That is clear.
But, what about that situation:
int x = 0;
mutex m; ( just any mutex)
Thread #1:
mutex.lock()
x++;
mutex.unlock()
Thread #2
mutex.lock()
x++;
print x; // the value is always 2, why**?
mutex.unlock()
The value of x is always 2 even though we don't make it volatile. Do I correctly understand that locking/unlocking a mutex is connected with inserting memory barriers?
I'll try to tackle this. The Java memory model is kind of involved and hard to contain in a single StackOverflow post. Please refer to Brian Goetz's Java Concurrency in Practice for the full story.
The value of x is always 2 even though we don't make it volatile. Do I correctly understand that locking/unlocking a mutex is connected with inserting memory barriers?
First if you want to understand the Java memory model, it's always Chapter 17 of the spec you want to read through.
That spec says:
An unlock on a monitor happens-before every subsequent lock on that monitor.
So yes, there's a memory visibility event at the unlock of your monitor. (I assume by "mutex" you mean monitor. Most of the locks and other classes in the java.util.concurrent package also have happens-before semantics; check the documentation.)
Happens-before is what Java means when it guarantees not just that the events are ordered, but also that memory visibility is guaranteed.
We say that a read r of a variable v is allowed to observe a write w to v if, in the happens-before partial order of the execution trace:
r is not ordered before w (i.e., it is not the case that hb(r, w)), and
there is no intervening write w' to v (i.e. no write w' to v such that hb(w, w') and hb(w', r)).
Informally, a read r is allowed to see the result of a write w if there is no happens-before ordering to prevent that read.
This is all from 17.4.5. It's a little confusing to read through, but the info is all there if you do read through it.
Let's go over some things. The following statement is true: the Java memory model guarantees that access/store to 32-bit variables is atomic. However, it does not follow that the first pseudoprogram you listed is safe. Simply because two statements are ordered syntactically does not mean that the visibility of their updates is also so ordered as viewed by other threads. Thread #2 may see the update caused by who=2 before the increment of x is visible. Making x volatile would still not make the program correct. Instead, making the variable 'who' volatile would make the program correct. That is because volatile interacts with the Java memory model in specific ways.
I feel like there is some notion of 'writing back to main memory' at the core of a common-sense understanding of volatile which is incorrect. Volatile does not write the value back to main memory in Java. What reading from and writing to a volatile variable does is create what's called a happens-before relationship. When thread #1 writes to a volatile variable, you're creating a relationship that ensures that any other thread #2 viewing that volatile variable will also be able to 'view' all the actions thread #1 has taken before that. In your example that means making 'who' volatile. By writing the value 2 to 'who' you are creating a happens-before relationship, so that when thread #2 views who=2 it will similarly see an updated version of x.
In your second example (assuming you meant to have the 'who' variable too), the mutex unlocking creates a happens-before relationship as I specified above. Since other threads observing the unlock of the mutex (i.e. being able to lock it themselves) synchronize with it, they will see the updated version of x.
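A minimal sketch of the monitor version (a done flag is added so thread 2 knows whether thread 1 already ran; names are illustrative):

class MonitorVisibility {
    static final Object lock = new Object();
    static int x = 0;
    static boolean done = false; // plain field: protected by the monitor

    static void thread1() {
        synchronized (lock) {
            x++;
            done = true;
        } // unlock happens-before every subsequent lock of `lock`
    }

    static void thread2() {
        synchronized (lock) { // if thread1 unlocked first, its writes are visible
            if (done) {
                x++;
                System.out.println(x); // prints 2
            }
        }
    }
}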

how synchronized keyword works internally

I read the below program and answer in a blog.
int x = 0;
boolean bExit = false;
Thread 1 (not synchronized)
x = 1;
bExit = true;
Thread 2 (not synchronized)
if (bExit == true)
System.out.println("x=" + x);
Is it possible for Thread 2 to print "x=0"?
Ans: Yes (reason: every thread has its own copy of variables).
How do you fix it?
Ans: By making both threads synchronize on a common mutex, or by making both variables volatile.
My doubt is: if we make the two variables volatile, then the two threads will share the variables via main memory. That makes sense. But in the case of synchronization, how is it resolved, since both threads have their own copies of the variables?
Please help me.
This is actually more complicated than it seems. There are several arcane things at work.
Caching
Saying "Every thread has their own copy of variables" is not exactly correct. Every thread may have their own copy of variables, and they may or may not flush these variables into the shared memory and/or read them from there, so the whole thing is non-deterministic. Moreover, the very term flushing is really implementation-dependent. There are strict terms such as memory consistency, happens-before order, and synchronization order.
Reordering
This one is even more arcane. This
x = 1;
bExit = true;
does not even guarantee that Thread 1 will first write 1 to x and then true to bExit. In fact, it does not even guarantee that any of these will happen at all. The compiler may optimize away some values if they are not used later. The compiler and CPU are also allowed to reorder instructions any way they want, provided that the outcome is indistinguishable from what would happen if everything was really in program order. That is, indistinguishable for the current thread! Nobody cares about other threads until...
Synchronization comes in
Synchronization does not only mean exclusive access to resources. It is also not just about preventing threads from interfering with each other. It's also about memory barriers. It can be roughly described as each synchronization block having invisible instructions at the entry and exit, the first one saying "read everything from the shared memory to be as up-to-date as possible" and the last one saying "now flush whatever you've been doing there to the shared memory". I say "roughly" because, again, the whole thing is an implementation detail. Memory barriers also restrict reordering: actions may still be reordered, but the results that appear in the shared memory after exiting the synchronized block must be identical to what would happen if everything was indeed in program order.
All that works, of course, only if both blocks use the same locking object.
The whole thing is described in detail in Chapter 17 of the JLS. In particular, what's important is the so-called "happens-before order". If you ever see in the documentation that "this happens-before that", it means that everything the first thread does before "this" will be visible to whoever does "that". This may not even require any locking. Concurrent collections are a good example: one thread puts something there, another one reads it, and that magically guarantees that the second thread will see everything the first thread did before putting that object into the collection, even if those actions had nothing to do with the collection itself!
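For instance, a hedged sketch of that concurrent-collection guarantee (java.util.concurrent documents this as its memory-consistency property; names are illustrative):

import java.util.concurrent.ConcurrentLinkedQueue;

class QueuePublication {
    static final ConcurrentLinkedQueue<int[]> queue = new ConcurrentLinkedQueue<>();

    static void producer() {
        int[] payload = new int[100];
        payload[99] = 42;     // ordinary writes, no volatile or lock in sight
        queue.offer(payload); // enqueueing happens-before a later dequeue
    }

    static void consumer() {
        int[] payload = queue.poll();
        if (payload != null) {
            System.out.println(payload[99]); // guaranteed to print 42
        }
    }
}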
Volatile variables
One last warning: you'd better give up on the idea that making variables volatile will solve everything. In this case maybe making bExit volatile will suffice, but there are so many troubles that using volatiles can lead to that I'm not even willing to go into them. But one thing is for sure: using synchronized has a much stronger effect than using volatile, and that goes for memory effects too. What's worse, volatile semantics changed in some Java version, so there may exist some versions that still use the old semantics, which was even more obscure and confusing, whereas synchronized has always worked well provided you understand what it is and how to use it.
Pretty much the only reason to use volatile is performance because synchronized may cause lock contention and other troubles. Read Java Concurrency in Practice to figure all that out.
Q & A
1) You wrote "now flush whatever you've been doing there to the shared memory" about synchronized blocks. But will we see only the variables that we access in the synchronized block, or all the changes made by the thread calling synchronized (even to variables not accessed in the synchronized block)?
Short answer: it will "flush" all variables that were updated during the synchronized block or before entering the synchronized block. And again, because flushing is an implementation detail, you don't even know whether it will actually flush something or do something entirely different (or do nothing at all, because the implementation and the specific situation already somehow guarantee that it will work).
Variables that weren't accessed inside the synchronized block obviously won't change during the execution of the block. However, if you change some of those variables before entering the synchronized block, for example, then you have a happens-before relationship between those changes and whatever happens in the synchronized block (the first bullet in 17.4.5). If some other thread enters another synchronized block using the same lock object, then it synchronizes-with the first thread's exit of the synchronized block, which means that you have another happens-before relationship here. So in this case the second thread will see the variables that the first thread updated prior to entering the synchronized block.
If the second thread tries to read those variables without synchronizing on the same lock, then it is not guaranteed to see the updates. But then again, it isn't guaranteed to see the updates made inside the synchronized block as well. But this is because of the lack of the memory-read barrier in the second thread, not because the first one didn't "flush" its variables (memory-write barrier).
2) In this chapter you posted (of the JLS) it is written that: "A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field." Doesn't this mean that when the variable is volatile you will see only changes of it (because it says write happens-before read, not happens-before every operation between them)? I mean, doesn't this mean that in the example given in the description of the problem, we can see bExit = true but x = 0 in the second thread if only bExit is volatile? I ask because I found this question here: http://java67.blogspot.bg/2012/09/top-10-tricky-java-interview-questions-answers.html and it is written there that if bExit is volatile the program is OK. So will the registers flush only bExit's value, or both bExit's and x's values?
By the same reasoning as in Q1, if you do bExit = true after x = 1, then there is an in-thread happens-before relationship because of the program order. Now since volatile writes happen-before volatile reads, it is guaranteed that the second thread will see whatever the first thread updated prior to writing true to bExit. Note that this behavior has only existed since Java 1.5 or so, so older or buggy implementations may or may not support it. I have seen bits in the standard Oracle implementation that use this feature (the java.util.concurrent collections), so you can at least assume that it works there.
3) Why does the monitor matter for memory visibility when using synchronized blocks? I mean, when exiting a synchronized block, aren't all variables (those accessed in the block, or all variables in the thread; this relates to the first question) flushed from registers to main memory or broadcast to all CPU caches? Why does the object of synchronization matter? I just cannot imagine what the relations are and how they are made (between the object of synchronization and memory). I know that we should use the same monitor to see these changes, but I don't understand how memory that should be visible is mapped to objects. Sorry for the long questions, but these are really interesting questions for me and they are related to the question (I would post questions exactly for this primer).
Ha, this one is really interesting. I don't know. Probably it flushes anyway, but the Java specification is written with a high level of abstraction in mind, so maybe it allows for some really weird hardware where partial flushes or other kinds of memory barriers are possible. Suppose you have a two-CPU machine with 2 cores on each CPU. Each CPU has some local cache for every core and also a common cache. A really smart VM may want to schedule two threads on one CPU and two threads on another one. Each pair of threads uses its own monitor, and the VM detects that variables modified by these two threads are not used in any other threads, so it only flushes them as far as the CPU-local cache.
See also this question about the same issue.
4) I thought that everything written before a volatile write will be up to date when we read it (moreover, when we use a volatile read, in Java that is a memory barrier), but the documentation doesn't say this.
It does:
17.4.5.
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
If hb(x, y) and hb(y, z), then hb(x, z).
A write to a volatile field (§8.3.1.4) happens-before every subsequent
read of that field.
If x = 1 comes before bExit = true in program order, then we have happens-before between them. If some other thread reads bExit after that, then we have happens-before between write and read. And because of the transitivity, we also have happens-before between x = 1 and read of bExit by the second thread.
5) Also, if we have a volatile Person p, do we have some dependency when we use p.age = 20 and print(p.age), or do we have a memory barrier in this case (assume age is not volatile)? I think no.
You are correct. Since age is not volatile, there is no memory barrier, and that's one of the trickiest things. Here is a fragment from CopyOnWriteArrayList, for example:
Object[] elements = getArray();
E oldValue = get(elements, index);
if (oldValue != element) {
    int len = elements.length;
    Object[] newElements = Arrays.copyOf(elements, len);
    newElements[index] = element;
    setArray(newElements);
} else {
    // Not quite a no-op; ensures volatile write semantics
    setArray(elements);
}
Here, getArray and setArray are a trivial getter and setter for the array field. But since the code changes elements of the array, it is necessary to write the reference to the array back to where it came from in order for the changes to the elements of the array to become visible. Note that it is done even if the element being replaced is the same element that was there in the first place! It is precisely because some fields of that element may have changed by the calling thread, and it's necessary to propagate these changes to future readers.
6) And is there any happens-before between 2 subsequent reads of a volatile field? I mean, will the second read see all changes from the thread which read this field before it (of course we will have changes only if volatile influences visibility of all changes before it, which I am a little confused about whether it is true or not)?
No, there is no relationship between volatile reads. Of course, if one thread performs a volatile write and then two other threads perform volatile reads, they are guaranteed to see everything at least as up to date as it was before the volatile write, but there is no guarantee of whether one thread will see more up-to-date values than the other. Moreover, there is not even a strict definition of one volatile read happening before another! It is wrong to think of everything happening on a single global timeline. It is more like parallel universes with independent timelines that sometimes sync their clocks by performing synchronization and exchanging data with memory barriers.
It depends on the implementation, which decides whether threads keep a copy of the variables in their own memory. In the case of class-level variables, threads have shared access; in the case of local variables, threads keep their own copy. I will provide two examples which show this fact; please have a look at them.
And in your example, if I understood it correctly, your code should look something like this:
package com.practice.multithreading;

public class LocalStaticVariableInThread {
    static int x = 0;
    static boolean bExit = false;

    public static void main(String[] args) {
        Thread t1 = new Thread(run1);
        Thread t2 = new Thread(run2);
        t1.start();
        t2.start();
    }

    static Runnable run1 = () -> {
        x = 1;
        bExit = true;
    };

    static Runnable run2 = () -> {
        if (bExit == true)
            System.out.println("x=" + x);
    };
}
Output
x=1
I am getting this output always. It is because the threads share the variable, and when it is changed by one thread the other thread can see the change. But in real-life scenarios we can never say which thread will start first; since the threads here are not doing anything else, we see the expected result.
Now take this example:
Here the i variable used in the for-loops is a shared field rather than a loop-local variable, so the threads don't keep their own copy of it and you won't see the desired output, i.e. the count value will not be 2000 every time, even if you have synchronized the count increment.
package com.practice.multithreading;

public class RaceCondition2Fixed {
    private int count;
    int i;

    /* Making it synchronized forces the thread to acquire an intrinsic lock on the
       method; another thread cannot enter it until this lock is released when the
       method completes. */
    public synchronized void increment() {
        count++;
    }

    public static void main(String[] args) {
        RaceCondition2Fixed rc = new RaceCondition2Fixed();
        rc.doWork();
    }

    private void doWork() {
        Thread t1 = new Thread(new Runnable() {
            @Override
            public void run() {
                for (i = 0; i < 1000; i++) {
                    increment();
                }
            }
        });
        Thread t2 = new Thread(new Runnable() {
            @Override
            public void run() {
                for (i = 0; i < 1000; i++) {
                    increment();
                }
            }
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        /* If we don't use join, the count may be printed as 0: after t1.start() and
           t2.start() the threads update count in the separate threads, while the
           main thread prints the value immediately. So we need to wait for the
           threads to complete. */
        System.out.println(Thread.currentThread().getName() + " Count is : " + count);
    }
}

Is it true that java volatile accesses cannot be reordered?

Note
By saying that a memory access can (or cannot) be reordered I mean that it can be reordered either by the compiler when emitting byte code, or by the JIT when emitting machine code, or by the CPU when executing out of order (eventually requiring barriers to prevent this), with respect to any other memory access.
I often read that accesses to volatile variables cannot be reordered due to the Happens-Before relationship (HBR).
I found that an HBR exists between every two consecutive (in program order) actions of a given thread, and yet they can be reordered.
Also, a volatile access HBs only with accesses of the same variable/field.
What I think makes volatile accesses non-reorderable is this:
A write to a volatile field (§8.3.1.4) happens-before every subsequent read [of any thread]
of that field.
If there are other threads, a reordering of the variables will become visible, as in this simple example:
volatile int a, b;

Thread 1            Thread 2
a = 1;              while (b != 2);
b = 2;              print(a); // a must be 1
So it is not the HBR itself that prevents the reordering, but the fact that volatile extends this relationship to other threads; the presence of other threads is the element that prevents reordering.
If the compiler could prove that a reordering of a volatile variable would not change the program semantics, it could reorder it even if there is an HBR.
If a volatile variable is never accessed by other threads, then its accesses could be reordered:
volatile int a, b, c;

Thread 1            Thread 2
a = 1;              while (b != 2);
b = 2;              print(a); // a must be 1
c = 3;              // c never accessed by Thread 2
I think c = 3 could very well be reordered before a = 1; this quote from the specs confirms it:
It should be noted that the presence of a happens-before relationship between
two actions does not necessarily imply that they have to take place in that order
in an implementation. If the reordering produces results consistent with a legal
execution, it is not illegal.
So I made these simple Java programs:
public class vtest1 {
    public static volatile int DO_ACTION, CHOOSE_ACTION;

    public static void main(String[] args) {
        CHOOSE_ACTION = 34;
        DO_ACTION = 1;
    }
}

public class vtest2 {
    public static volatile int DO_ACTION, CHOOSE_ACTION;

    public static void main(String[] args) {
        (new Thread() {
            public void run() {
                while (DO_ACTION != 1);
                System.out.println(CHOOSE_ACTION);
            }
        }).start();
        CHOOSE_ACTION = 34;
        DO_ACTION = 1;
    }
}
In both cases both fields are marked as volatile and accessed with putstatic.
Since this is all the information the JIT has [1], the machine code would be identical, and thus the vtest1 accesses will not be optimized [2].
My question
Are volatile accesses really never reordered by specification, or could they be [3] but this is never done in practice?
If volatile accesses can never be reordered, what parts of the specs say so? And would this mean that all volatile accesses are executed and seen in program order by the CPUs?
[1] Or can the JIT know that a field will never be accessed by other threads? If yes, how?
[2] Memory barriers will be present, for example.
[3] For example if no other threads are involved.
What the JLS says (from JLS-8.3.1.4. volatile Fields) is, in part, that
The Java programming language provides a second mechanism, volatile fields, that is more convenient than locking for some purposes.
A field may be declared volatile, in which case the Java Memory Model ensures that all threads see a consistent value for the variable (§17.4).
Which means the access may be reordered, but the results of any reordering must eventually be consistent (when accessed by another thread) with the original order. A field in a single threaded application wouldn't need locking (from volatile or synchronization).
The Java memory model provides sequential consistency (SC) for correctly synchronized programs. SC in simple terms means that, if all possible executions of some program can be explained by executions in which all memory actions are executed in some sequential order consistent with the program order (PO) of each of the threads, then the program is consistent with these sequential executions; so it is sequentially consistent (hence the name).
What this effectively means is that the JIT/CPU/memory subsystem can reorder volatile writes and reads as much as it wants, as long as there exists a sequential execution that could also explain the outcome of the actual execution. So the actual execution isn't that important.
If we look at the following example:
volatile int a, b, c;

Thread 1            Thread 2
a = 1;              while (c != 1);
b = 1;              print(b);
c = 1;
There is a happens-before relation between a = 1 and b = 1 (PO), a happens-before relation between b = 1 and c = 1 (PO), a happens-before relation between c = 1 and the read that sees c == 1 (volatile variable rule), and a happens-before relation between that read and print(b) (PO).
Since the happens-before relation is transitive, there is a happens-before relation between a = 1 and print(b). So in that sense, it can't be reordered. However, there is nobody who can prove that a reordering happened, so it can still be reordered.
I'm going to be using notation from JLS §17.4.5.
In your second example (if you'll excuse my loose notation) you have
Thread 1 ordering:
hb(a = 1, b = 2)
hb(b = 2, c = 3)
Volatile guarantees:
hb(b = 2, b != 2)
hb(a = 1, access a for print)
Thread 2 ordering:
hb(while(b != 2);, print(a))
and we have (emphasis mine)
More specifically, if two actions share a happens-before relationship,
they do not necessarily have to appear to have happened in that order
to any code with which they do not share a happens-before
relationship. Writes in one thread that are in a data race with reads
in another thread may, for example, appear to occur out of order to
those reads.
There is no happens-before relationship between c=3 and Thread 2. The implementation is free to reorder c=3 to its heart's content.
From 17.4. Memory Model of JLS
The memory model describes possible behaviors of a program. An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.
This provides a great deal of freedom for the implementor to perform a myriad of code transformations, including the reordering of actions and removal of unnecessary synchronization.

Could the JIT collapse two volatile reads as one in certain expressions?

Suppose we have a volatile int a. One thread does
while (true) {
a = 1;
a = 0;
}
and another thread does
while (true) {
System.out.println(a+a);
}
Now, would it be illegal for a JIT compiler to emit assembly corresponding to 2*a instead of a+a?
On one hand the very purpose of a volatile read is that it should always be fresh from memory.
On the other hand, there's no synchronization point between the two reads, so I can't see that it would be illegal to treat a+a atomically, in which case I don't see how an optimization such as 2*a would break the spec.
References to JLS would be appreciated.
Short answer:
Yes, this optimization is allowed. Collapsing two sequential read operations produces the observable behavior of the sequence being atomic, but does not appear as a reordering of operations. Any sequence of actions performed on a single thread of execution can be executed as an atomic unit. In general, it is difficult to ensure a sequence of operations executes atomically, and it rarely results in a performance gain because most execution environments introduce overhead to execute items atomically.
In the example given by the original question, the sequence of operations in question is the following:
read(a)
read(a)
Performing these operations atomically guarantees that the value read on the first line is equal to the value read on the second line. Furthermore, it means the value read on the second line is the value contained in a at the time the first read was executed (and vice versa, because, being atomic, both read operations occurred at the same time according to the observable execution state of the program). The optimization in question, which is reusing the value of the first read for the second read, is equivalent to the compiler and/or JIT executing the sequence atomically, and is thus valid.
Original longer answer:
The Java Memory Model describes operations using a happens-before partial ordering. In order to express the restriction that the first read r1 and second read r2 of a cannot be collapsed, you need to show that some operation is semantically required to appear between them.
The operations on the thread with r1 and r2 are the following:
--> r(a) --> r(a) --> add -->
To express the requirement that something (say y) lie between r1 and r2, you need to require that r1 happens-before y and y happens-before r2. As it happens, there is no rule where a read operation appears on the left side of a happens-before relationship. The closest you could get is saying y happens-before r2, but the partial order would allow y to also occur before r1, thus collapsing the read operations.
If no scenario exists which requires an operation to fall between r1 and r2, then you can declare that no operation ever appears between r1 and r2 and not violate the required semantics of the language. Using a single read operation would be equivalent to this claim.
Edit: My answer is getting voted down, so I'm going to go into additional detail.
Here are some related questions:
Is the Java compiler or JVM required to collapse these read operations?
No. The expressions a and a used in the add expression are not constant expressions, so there is no requirement that they be collapsed.
Does the JVM collapse these read operations?
To this, I'm not sure of the answer. By compiling a program and using javap -c, it's easy to see that the Java compiler does not collapse these read operations. Unfortunately it's not as easy to prove the JVM does not collapse the operations (or even tougher, the processor itself).
Should the JVM collapse these read operations?
Probably not. Each optimization takes time to execute, so there is a balance between the time it takes to analyze the code and the benefit you expect to gain. Some optimizations, such as array bounds check elimination or checking for null references, have proven to have extensive benefits for real-world applications. The only case where this particular optimization has the possibility of improving performance is cases where two identical read operations appear sequentially.
Furthermore, as shown by the response to this answer along with the other answers, this particular change would result in an unexpected behavior change for certain applications which users may not desire.
Edit 2: Regarding Rafael's description of the claim that two volatile read operations cannot be reordered. This statement is designed to highlight the fact that caching the read operation of a in the following sequence could produce an incorrect result:
a1 = read(a)
b1 = read(b)
a2 = read(a)
result = op(a1, b1, a2)
Suppose initially a and b have their default value 0. Then you execute just the first read(a).
Now suppose another thread executes the following sequence:
a = 1
b = 1
Finally, suppose the first thread executes the line read(b). If you were to cache the originally read value of a, you would end up with the following call:
op(0, 1, 0)
This is not correct. Since the updated value of a was stored before writing to b, there is no way to read the value b1 = 1 and then read the value a2 = 0. Without caching, the correct sequence of events leads to the following call.
op(0, 1, 1)
However, if you were to ask the question "Is there any way to allow the read of a to be cached?", the answer is yes. If you can execute all three read operations in the first thread sequence as an atomic unit, then caching the value is allowed. While synchronizing across multiple variables is difficult and rarely provides an opportunistic optimization advantage, it is certainly conceivable to encounter an exception. For example, suppose a and b are each 4 bytes, and they appear sequentially in memory with a aligned on an 8-byte boundary. A 64-bit process could implement the sequence read(a) read(b) as an atomic 64-bit load operation, which would allow the value of a to be cached (effectively treating all three read operations as an atomic operation instead of just the first two).
In my original answer, I argued against the legality of the suggested optimization. I backed this mainly with information from the JSR-133 cookbook, where it states that a volatile read must not be reordered with another volatile read and where it further states that a cached read is to be treated as a reordering. The latter statement is, however, formulated with some ambiguity, which is why I went through the formal definition of the JMM, where I did not find such an indication. Therefore, I would now argue that the optimization is allowed. However, the JMM is quite complex and the discussion on this page indicates that this corner case might be decided differently by someone with a more thorough understanding of the formalism.
Denoting thread 1 to execute
while (true) {
System.out.println(a // r_1
+ a); // r_2
}
and thread 2 to execute:
while (true) {
a = 0; // w_1
a = 1; // w_2
}
The two reads r_i and two writes w_i of a are synchronization actions, as a is volatile (JLS 17.4.2). They are external actions, as variable a is used in several threads. These actions are contained in the set of all actions A. There exists a total order of all synchronization actions, the synchronization order, which is consistent with the program order for thread 1 and thread 2 (JLS 17.4.4). From the definition of the synchronizes-with partial order, there is no edge defined for this order in the above code. As a consequence, the happens-before order only reflects the intra-thread semantics of each thread (JLS 17.4.5).
With this, we define W as a write-seen function where W(r_i) = w_2, and a value-written function V(w_i) (JLS 17.4.6). I took some freedom and eliminated w_1, as it makes this outline of a formal proof even simpler. The question is whether this proposed execution E is well-formed (JLS 17.4.7). The proposed execution E obeys intra-thread semantics, is happens-before consistent, obeys the synchronizes-with order, and each read observes a consistent write. Checking the causality requirements is trivial (JLS 17.4.8). Nor do I see why the rules for non-terminating executions would be relevant, as the loop covers the entire discussed code (JLS 17.4.9), and we do not need to distinguish observable actions.
For all this, I cannot find any indication of why this optimization would be forbidden. Nevertheless, it is not applied to volatile reads by the HotSpot VM, as one can observe using -XX:+PrintAssembly. I assume that the performance benefits are, however, minor and that this pattern is not normally observed.
Remark: After watching the "Java Memory Model Pragmatics" talk (multiple times), I am pretty sure this reasoning is correct.
On one hand the very purpose of a volatile read is that it should always be fresh from memory.
That is not how the Java Language Specification defines volatile. The JLS simply says:
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).
Therefore, a write to a volatile variable happens-before (and is visible to) any subsequent reads of that same variable.
This constraint is trivially satisfied for a read that is not subsequent. That is, volatile only ensures visibility of a write if the read is known to occur after the write.
This is not the case in your program. For every well-formed execution that observes a to be 1, I can construct another well-formed execution where a is observed to be 0, simply by moving the read after the write. This is possible because the happens-before relation looks as follows:
write 1 --> read 1            write 1 --> read 1
   |           |                 |           |
   |           v                 v           |
   v       --> read 1         write 0        v
write 0        |        vs.      |       --> read 0
   |           |                 |           |
   v           v                 v           v
write 1 --> read 1            write 1 --> read 1
That is, all the JMM guarantees for your program is that a+a will yield 0, 1 or 2. That is satisfied if a+a always yields 0. Just as the operating system is permitted to execute this program on a single core, and always interrupt thread 1 before the same instruction of the loop, the JVM is permitted to reuse the value - after all, the observable behavior remains the same.
In general, moving the read across the write violates happens-before consistency, because some other synchronization action is "in the way". In the absence of such intermediary synchronization actions, a volatile read can be satisfied from a cache.
Modified the OP Problem a little
volatile int a
//thread 1
while (true) {
a = some_oddNumber;
a = some_evenNumber;
}
// Thread 2
while (true) {
if(isOdd(a+a)) {
break;
}
}
If the above code were executed sequentially, there would exist a valid sequentially consistent execution which breaks the thread 2 while loop.
Whereas if the compiler optimizes a+a to 2a, then the thread 2 while loop will never exit.
So the above optimization prohibits one particular execution that would exist if the code were sequentially executed.
The main question: is this optimization a problem?
Q. Is the transformed code sequentially consistent?
Ans. A program is correctly synchronized if, when it is executed in a sequentially consistent manner, there are no data races. Refer to Example 17.4.8-1 from JLS chapter 17.
Sequential consistency: the result of any execution is the same as
if the read and write operations by all processes were executed in
some sequential order and the operations of each individual
process appear in this sequence in the order specified by its
program [Lamport, 1979].
Also see http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.3
Sequential consistency is a strong guarantee. The execution path where the compiler optimizes a+a to 2a is also a valid sequentially consistent execution.
So the answer is yes.
Q. Does the transformed code still satisfy the happens-before guarantees?
Ans. Sequential consistency implies that the happens-before guarantee is valid here.
So the answer is yes (JLS ref).
So I don't think the optimization is invalid, legally at least, in the OP's case.
The case where the thread 2 while loop gets stuck in an infinite loop is also quite possible without the compiler transformation.
As laid out in other answers there are two reads and two writes. Imagine the following execution (T1 and T2 denote two threads), using annotations that match the JLS statement below:
T1: a = 0 //W(r)
T2: read temp1 = a //r_initial
T1: a = 1 //w
T2: read temp2 = a //r
T2: print temp1+temp2
In a concurrent environment this is definitely a possible thread interleaving. Your question is then: would the JVM be allowed to make r observe W(r) and read 0 instead of 1?
JLS #17.4.5 states:
A set of actions A is happens-before consistent if for all reads r in A, where W(r) is the write action seen by r, it is not the case that either hb(r, W(r)) or that there exists a write w in A such that w.v = r.v and hb(W(r), w) and hb(w, r).
The optimisation you propose (temp = a; print (2 * temp);) would violate that requirement. So your optimisation can only work if there is no intervening write between r_initial and r, which can't be guaranteed in a typical multi threaded framework.
As a side comment, note however that there is no guarantee as to how long it will take for the writes to become visible from the reading thread. See for example: Detailed semantics of volatile regarding timeliness of visibility.

The volatile key word and memory consistency errors

In the oracle Java documentation located here, the following is said:
Atomic actions cannot be interleaved, so they can be used without fear of thread interference. However, this does not eliminate all need to synchronize atomic actions, because memory consistency errors are still possible. Using volatile variables reduces the risk of memory consistency errors, because any write to a volatile variable establishes a happens-before relationship with subsequent reads of that same variable. This means that changes to a volatile variable are always visible to other threads. What's more, it also means that when a thread reads a volatile variable, it sees not just the latest change to the volatile, but also the side effects of the code that led up to the change.
It also says:
Reads and writes are atomic for reference variables and for most
primitive variables (all types except long and double).
Reads and writes are atomic for all variables declared volatile (including long
and double variables).
I have two questions regarding these statements:
"Using volatile variables reduces the risk of memory consistency errors" - What do they mean by "reduces the risk", and how is a memory consistency error still possible when using volatile?
Would it be true to say that the only effect of placing volatile on a non-double, non-long primitive is to enable the "happens-before" relationship with subsequent reads from other threads? I ask this since it seems that those variables already have atomic reads.
What do they mean by "reduces the risk"?
Atomicity is one issue addressed by the Java Memory Model. However, more important than Atomicity are the following issues:
memory architecture, e.g. impact of CPU caches on read and write operations
CPU optimizations, e.g. reordering of loads and stores
compiler optimizations, e.g. added and removed loads and stores
The following listing contains a frequently used example. The operations on x and y are atomic. Still, the program can print both lines.
int x = 0, y = 0;

// thread 1
x = 1;
if (y == 0) System.out.println("foo");

// thread 2
y = 1;
if (x == 0) System.out.println("bar");
However, if you declare x and y as volatile, at most one of the two lines can be printed.
How is a memory consistency error still possible when using volatile?
The following example uses volatile. However, updates might still get lost.
volatile int x = 0;
// thread 1
x += 1;
// thread 2
x += 1;
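The update is lost because x += 1 is a read-modify-write, not a single action. A hedged sketch of one conventional fix (an AtomicInteger would work equally well; names are illustrative):

class LostUpdateFix {
    static volatile int x = 0;
    static final Object lock = new Object();

    static void unsafeIncrement() {
        x += 1;  // volatile read, add, volatile write: two threads can both
                 // read 0 and both write 1, so one increment is lost
    }

    static void safeIncrement() {
        synchronized (lock) { // mutual exclusion makes the read-modify-write atomic
            x += 1;
        }
    }
}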
Would it be true to say that the only effect of placing volatile on a non-double, non-long primitive is to enable the "happens-before" relationship with subsequent reads from other threads?
Happens-before is often misunderstood. The consistency model defined by happens-before is weak and difficult to use correctly. This can be demonstrated with the following example, that is known as Independent Reads of Independent Writes (IRIW):
volatile int x = 0, y = 0;
// thread 1
x = 1;
// thread 2
y = 1;
// thread 3
if (x == 1) System.out.println(y);
// thread 4
if (y == 1) System.out.println(x);
With only happens-before, two 0s would be a valid result. However, that's apparently counter-intuitive. For that reason, Java provides a stricter consistency model that forbids this relativity issue, known as sequential consistency. You can find it in sections §17.4.3 and §17.4.5 of the Java Language Specification. The most important part is:
A program is correctly synchronized if and only if all sequentially consistent executions are free of data races. If a program is correctly synchronized, then all executions of the program will appear to be sequentially consistent (§17.4.3).
That means, volatile gives you more than happens-before. It gives you sequential consistency if used for all conflicting accesses (§17.4.3).
The usual example:
while (!condition)
    sleep(10);
if condition is volatile, this behaves as expected. If it is not, the compiler is allowed to optimize this to
if (!condition)
    for (;;)
        sleep(10);
This is completely orthogonal to atomicity: if condition is of a hypothetical integer type that is not atomic, then the sequence
thread 1 writes upper half to 0
thread 2 reads upper half (0)
thread 2 reads lower half (0)
thread 1 writes lower half (1)
can happen while the variable is updated from a nonzero value that just happens to have a lower half of zero to a nonzero value that has an upper half of zero; in this case, thread 2 reads the variable as zero. The volatile keyword in this case makes sure that thread 2 really reads the variable instead of using its local copy, but it does not affect timing.
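In Java, the concrete instance of such a non-atomic type is a non-volatile long or double (JLS §17.7). A hedged sketch (tearing is permitted by the spec but rarely observable on modern 64-bit JVMs; names are illustrative, and the assert needs -ea):

class TornLongDemo {
    static long plain = 0L;          // non-volatile long: a write may be split
                                     // into two 32-bit halves (JLS §17.7)
    static volatile long safe = 0L;  // volatile long: reads/writes are atomic

    static void writer() {
        while (true) {
            plain = -1L;             // 0xFFFFFFFFFFFFFFFF
            plain = 0L;
            safe = -1L;
            safe = 0L;
        }
    }

    static void reader() {
        while (true) {
            long p = plain;          // may observe a half-written value such
                                     // as 0xFFFFFFFF00000000 on a 32-bit JVM
            if (p != 0L && p != -1L)
                System.out.println("torn read: " + Long.toHexString(p));
            long s = safe;           // never torn: always 0 or -1
            assert s == 0L || s == -1L;
        }
    }
}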
Third, atomicity does not protect against
thread 1 reads value (0)
thread 2 reads value (0)
thread 1 writes incremented value (1)
thread 2 writes incremented value (1)
One of the best ways to use atomic volatile variables is as the read and write counters of a ring buffer:
thread 1 looks at read pointer, calculates free space
thread 1 fills free space with data
thread 1 updates write pointer (which is `volatile`, so the side effects of filling the free space are also committed before)
thread 2 looks at write pointer, calculates amount of data received
...
Here, no lock is needed to synchronize the threads: atomicity guarantees that the read and write pointers are always accessed consistently, and volatile enforces the necessary ordering. A sketch of this pattern follows below.
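As a hedged sketch of that pattern (a single-producer/single-consumer queue in the style of Lamport; the capacity and names are illustrative):

class SpscRingBuffer {
    private final int[] buf = new int[1024];  // capacity must be a power of two
    private volatile int readIndex = 0;       // advanced only by the consumer
    private volatile int writeIndex = 0;      // advanced only by the producer

    /** Producer side: returns false if the buffer is full. */
    boolean offer(int value) {
        int w = writeIndex;
        if (w - readIndex == buf.length) return false;  // full
        buf[w & (buf.length - 1)] = value;  // plain write into the slot
        writeIndex = w + 1;  // volatile write publishes the slot contents
        return true;
    }

    /** Consumer side: returns null if the buffer is empty. */
    Integer poll() {
        int r = readIndex;
        if (r == writeIndex) return null;  // empty
        int value = buf[r & (buf.length - 1)];  // safe: published by writeIndex
        readIndex = r + 1;  // volatile write tells the producer the slot is free
        return value;
    }
}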
For question 1, the risk is only reduced (and not eliminated) because volatile only applies to a single read/write operation, not to more complex operations such as increment, decrement, etc.
For question 2, the effect of volatile is to make changes immediately visible to other threads. As the quoted passage states, "this does not eliminate all need to synchronize atomic actions, because memory consistency errors are still possible." Simply because reads are atomic does not mean that they are thread-safe. So establishing a happens-before relationship is almost a (necessary) side effect of guaranteeing memory consistency across threads.
Ad 1: With a volatile variable, the variable is always checked against a master copy and all threads see a consistent state. But if you use that volatile variable in a non-atomic operation that writes back the result (say a = f(a)), then you might still create a memory inconsistency. That's how I would understand the remark "reduces the risk". A volatile variable is consistent at the time of the read, but you still might need to use synchronization.
Ad 2: I don't know. But: If your definition of "happens before" includes the remark
This means that changes to a volatile variable are always visible to other threads. What's more, it also means that when a thread reads a volatile variable, it sees not just the latest change to the volatile, but also the side effects of the code that led up to the change.
I would not dare to rely on any other property except that volatile ensures this. What else do you expect from it?!
Assume that you have a CPU with a CPU cache or CPU registers. Independent of your CPU architecture in terms of the number of cores it has, volatile does NOT guarantee you perfect consistency. The only way to achieve this is to use synchronized or atomic references, with a performance price.
For example, you have multiple threads (Thread A & Thread B) working on shared data. Assume that Thread A wants to update the shared data and it is started. For performance reasons, Thread A's stack was moved to the CPU cache or registers, and then Thread A updated the shared data. But the problem with those places is that they don't actually flush the updated value back to main memory immediately. This is where inconsistency arises, because until the flush-back operation, Thread B might have wanted to work with the same data, which it would have taken from main memory, still holding the un-updated value.
If you use volatile, all the operations will be performed on main memory, so you don't have a flush-back latency. But this time you may suffer from thread scheduling instead: in the middle of a write operation (composed of a number of atomic operations), Thread B may be scheduled by the OS to perform a read operation, and that's it! Thread B will read the un-updated value again. That's why it's said to reduce the risk.
Hope you got it.
When it comes to concurrency, you might want to ensure two things:
atomic operations: a set of operations is atomic; this is usually achieved with "synchronized" (higher-level constructs), and also with volatile, for instance, for reads/writes of long and double.
visibility: a thread B sees a modification made by a thread A. Even if an operation is atomic, like a write to an int variable, a second thread can still see a non-up-to-date value of the variable, due to processor caches. Marking a variable as volatile ensures that the second thread does see the up-to-date value of that variable. More than that, it ensures that the second thread sees an up-to-date value of ALL the variables written by the first thread before the write to the volatile variable.
