I am reading the JLS spec on memory model, 17.4.5 Happens-before Order.
I do not understand the first rule:
"# If x and y are actions of the same thread and x comes before y in program
order, then hb(x, y)."
Let's assume A and B are objects (instances of some class) that can be shared between multiple threads:
int i=A.getNum(); // ActionA
int j=B.getNum(); // ActionB
Three questions:
According to the above rule, does it mean hb(ActionA,ActionB)?
If the answer to 1 is yes, does it mean that, according to the happens-before rule, ActionB cannot be reordered to come before ActionA in any JVM that follows the JSR-133 memory model?
If 1 and 2 are both true: ActionA and ActionB seem completely unrelated, so why can't they be reordered? Is it just because the spec says so?
It is my understanding that:
you're right
they can be reordered, but only if action B doesn't depend on the result of action A
Happens-before relationship doesn't say anything about reordering actions. It only says that if HB(A, B) holds, then action B must see memory effects of action A.
If action B doesn't use any result of action A, then there is no reason why they cannot be reordered. (In general, "use any result of another action" is pretty broad, and it can only be detected for quite simple actions, like memory reads/writes, not for actions using external resources like I/O operations, or time-based operations)
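A minimal single-threaded sketch of this point (the class name ProgramOrderSketch is my own, not from the JLS): program-order hb only obliges later actions to see earlier actions' effects, which is observable exactly when there is a dependency.

```java
// Within one thread, hb(ActionA, ActionB) only guarantees that ActionB
// sees ActionA's memory effects; it does not forbid reordering of
// independent actions. When ActionB depends on ActionA, that effect
// must be visible:
public class ProgramOrderSketch {
    static int x;

    public static void main(String[] args) {
        x = 1;                 // ActionA
        int y = x + 1;         // ActionB reads ActionA's write, so hb forces y == 2
        System.out.println(y); // prints 2
    }
}
```

Had ActionB read an unrelated variable instead, the JVM would be free to reorder the two actions, because no within-thread observation could tell the difference.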
Yes, ActionA happens before ActionB. Read further in that section though. It doesn't necessarily mean that the JVM can't reorder these. It means that ActionB must observe the effect of ActionA, that is all. If ActionB never depends on ActionA's effect, that is trivially true.
You are basically correct in your understanding. However, the key thing to remember is:
Reordering is allowed if it doesn't affect the outcome of the thread in which it runs
This doesn't mean reordering isn't allowed if it affects other threads
It is this last fact that is a common source of errors and bewilderment in multi-threaded programming in Java.
According to the Java Memory Model (JMM):
A program is correctly synchronized if and only if all sequentially consistent executions are free of data races.
If a program is correctly synchronized, then all executions of the program will appear to be sequentially consistent (§17.4.3).
I don't see how the fact that no SC execution has data races guarantees that every execution has no data races (which would mean that every execution is SC).
Is there a proof of that?
What I found:
A blog post by Jeremy Manson (one of the authors of the JMM).
The following paragraph might mean that the guarantee is provided by causality (but I don't see how):
So there is the intuition behind the model. When you want to justify the fact that a read returns the value of a write, you can do so if:
a) That write happened before it, or
b) You have already justified the write.
The way that the model works is that you start with an execution that has all of the actions justified by property a), and you start justifying actions iteratively. So, in the second example above, you create successive executions that justify 0, then 2, then 3, then 4, then 1.
This definition has the lovely property of being reasonably intuitive and also guaranteeing SC for DRF programs.
Foundations of the C++ Concurrency Memory Model.
This article describes C++ memory model (which has similarities with the JMM).
Section 8 of the article has a proof of a similar guarantee for C++:
THEOREM 8.1. If a program allows a type 2 data race in a consistent execution, then there exists a sequentially consistent execution, with two conflicting actions, neither of which happens before the other.8
In effect, we only need to look at sequentially consistent executions in order to determine whether there is a data race in a consistent execution.
[...]
8 The latter is essentially the condition used in [25] to define “correctly synchronized” for Java.
Unfortunately, I'm not sure this proof holds for the JMM because the following doesn't work:
Consider the longest prefix P of <T that contains no data race. Note that each load in P must see a store that precedes it in either the synchronization or happens-before orders.
It seems to me that the above doesn't work in JMM because causality allows a read to return a later store.
The proof is in The Java Memory Model by J.Manson, W.Pugh and S.Adve:
9.2.1 Correctly synchronized programs exhibit only sequentially consistent behaviors
Lemma 2. Consider an execution E of a correctly synchronized program P that is legal under the Java memory model. If, in E, each read sees a write that happens-before it, E has sequentially consistent behavior.
Proof:
Since the execution is legal under the memory model, the execution is happens-before consistent and synchronization order consistent.
A topological sort on the happens-before edges of the actions in an execution gives a total order consistent with program order and synchronization order. Let r be the first read in E that doesn’t see the most recent conflicting write w in the sorted order but instead sees w′. Let the topological sort of E be αw′βwγrδ.
Let αw′βwγr′δ′ be the topological sort of an execution E′. E′ is obtained exactly as E, except that instead of r, it performs the action r′, which is the same as r except that it sees w; δ′ is any sequentially consistent completion of the program such that each read sees the previous conflicting write.
The execution E′ is sequentially consistent, and it is not the case that w′ ‒hb→ w ‒hb→ r, so P is not correctly synchronized.
Thus, no such r exists and the program has sequentially consistent behavior. □
Theorem 3. If an execution E of a correctly synchronized program is legal under the Java memory model, it is also sequentially consistent.
Proof: By Lemma 2, if an execution E is not sequentially consistent, there must be a read r that sees a write w such that w does not happen-before r. The read must be committed, because otherwise it would not be able to see a write that does not happen-before it. There may be multiple reads of this sort; if so, let r be the first such read that was committed. Let Eᵢ be the execution that was used to justify committing r.
The relative happens-before order of committed actions and actions being committed must remain the same in all executions considering the resulting set of committed actions. Thus, if we don’t have w ‒hb→ r in E, then we didn’t have w ‒hb→ r in Eᵢ when we committed r.
Since r was the first read to be committed that doesn't see a write that happens-before it, each committed read in Eᵢ must see a write that happens-before it. Non-committed reads always see writes that happen-before them. Thus, each read in Eᵢ sees a write that happens-before it, and there is a write w in Eᵢ that is not ordered with respect to r by the happens-before ordering.
A topological sort of the actions in Eᵢ according to their happens-before edges gives a total order consistent with program order and synchronization order. This gives a total order for a sequentially consistent execution in which the conflicting accesses w and r are not ordered by happens-before edges. However, Lemma 2 shows that executions of correctly synchronized programs in which each read sees a write that happens-before it must be sequentially consistent. Therefore, this program is not correctly synchronized. This is a contradiction. □
The part of the language specification dedicated to the Java Memory Model (JMM) (link) mentions "execution trace" a lot.
For example right from the start:
A memory model describes, given a program and an execution trace of that program, whether the execution trace is a legal execution of the program. The Java programming language memory model works by examining each read in an execution trace and checking that the write observed by that read is valid according to certain rules.
But I cannot find there any description/definition of this term.
So, what is "execution trace" exactly according to the JMM, and what exactly does it consist of?
References to specific places in the language specification text are most welcome.
You're right; it's not very clear. They also refer to it as "program trace", and simply "trace" on its own.
The following is a quote:
Consider, for example, the example program traces shown in Table 17.4-A.
Table 17.4-A.
Thread 1    Thread 2
B = 1;      A = 2;
r2 = A;     r1 = B;
So, it's simply an ordered list of statements, per thread, representing one possible permutation of how the statements may be executed (since statements may be reordered). A trace may be valid or invalid within the JMM; traces are used to exemplify what is legal and what is not.
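To make this concrete, here is a sketch (my own illustration, not from the JLS) that enumerates every sequentially consistent interleaving of the Table 17.4-A trace and collects the possible (r1, r2) results. Note that (r1 = 0, r2 = 0), which the JMM does permit via reordering, never appears among them:

```java
import java.util.*;

public class Interleavings {
    public static void main(String[] args) {
        // Thread 1: B = 1; r2 = A;   Thread 2: A = 2; r1 = B;
        // All six interleavings that preserve each thread's program order.
        int[][] orders = {
            {0,0,1,1}, {0,1,0,1}, {0,1,1,0}, {1,0,0,1}, {1,0,1,0}, {1,1,0,0}
        };
        Set<String> results = new TreeSet<>();
        for (int[] order : orders) {
            int A = 0, B = 0, r1 = -1, r2 = -1;
            int[] pc = {0, 0};   // per-thread program counter
            for (int t : order) {
                if (t == 0) { if (pc[0]++ == 0) B = 1; else r2 = A; }
                else        { if (pc[1]++ == 0) A = 2; else r1 = B; }
            }
            results.add("r1=" + r1 + ",r2=" + r2);
        }
        // (r1=0, r2=0) is absent: no sequentially consistent execution
        // produces it, yet the JMM allows it via reordering.
        System.out.println(results);
    }
}
```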
This is not a full-fledged answer, but I think this is worth mentioning.
Even if we don't know what an "execution trace" is in details, we can deduce which information it should provide.
Let's read the first paragraph of 17.4. Memory Model:
A memory model describes, given a program and an execution trace of that program, whether the execution trace is a legal execution of the program. The Java programming language memory model works by examining each read in an execution trace and checking that the write observed by that read is valid according to certain rules.
This means that "a program" (i.e. source code) and "an execution trace" should provide all the information required to determine whether the program execution is legal.
The information is described in 17.4.6. Executions.
I'm not going to copy-paste it here because it's too long.
I'll try to explain it in simple words instead:
a program consists of statements, each statement consists of (possibly nested) expressions evaluated in some order
an execution of a thread can be represented as a sequence of actions: one action per every simple expression
an execution of a program is several threads executing in parallel
an execution trace should provide information about the actions performed during the program execution, i.e. it should provide the following information:
all executed actions: a sequence of actions per every thread
Note: the JMM only cares about so called inter-thread actions (17.4.2. Actions):
An inter-thread action is an action performed by one thread that can be detected or directly influenced by another thread
Inter-thread action kinds:
read/write
volatile read/write
lock/unlock
various special and synthetic actions (e.g. thread start/stop, etc.)
for every action it should store:
thread id
action kind
what expression in the source code it corresponds to
for write and volatile write: the written value
for read and volatile read: the write action, which provided the value
for lock/unlock: the monitor being locked/unlocked
various relations with other actions (e.g. position in a so-called synchronization order for synchronization actions)
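As a rough sketch of that tuple (all names here are my own invention, not the JLS's formalism), an action in a trace could be modeled like this:

```java
import java.util.List;

// Hypothetical model of an inter-thread action tuple per 17.4.2 / 17.4.6.
// Fields that don't apply to a given action kind are left null.
record Action(long threadId, String kind, String sourceLocation,
              Integer writtenValue,   // for (volatile) writes
              Action seenWrite,       // for (volatile) reads: the observed write
              Object monitor) {}      // for lock/unlock

public class TraceSketch {
    public static void main(String[] args) {
        Action w = new Action(1, "volatile-write", "b = 1", 1, null, null);
        Action r = new Action(2, "volatile-read",  "r1 = b", null, w, null);
        List<Action> trace = List.of(w, r);
        // A read's value is the value of the write it observed:
        System.out.println(trace.get(1).seenWrite().writtenValue()); // prints 1
    }
}
```

The JMM's job is then to decide, for each read in such a trace, whether the write it observed is legal under the happens-before and causality rules.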
According to these reordering rules, if I have code like this:
volatile int a = 0;
boolean b = false;

foo1() { a = 10; b = true; }

foo2() { if (b) { assert a == 10; } }
and I make thread A run foo1 and thread B run foo2: since a = 10 is a volatile store and b = true is a normal store, could these two statements be reordered? That would mean thread B may see b == true while a != 10. Is that correct?
Added:
Thanks for your answers!
I am just starting to learn about java multi-threading and have been troubled with keyword volatile a lot.
Many tutorials talk about the visibility of volatile fields, saying things like "a volatile field becomes visible to all readers (other threads in particular) after a write operation completes on it". I have doubts about how a completed write to a field could be invisible to other threads (or CPUs).
As I understand it, a completed write means you have successfully written the field back to cache, and according to MESI, all other threads should have an Invalid cache line if this field has been cached by them. One exception (since I am not very familiar with the hardware, this is just a conjecture) is that maybe the result will be written back to a register instead of the cache, and I do not know whether there is some protocol to keep consistency in this situation, or whether volatile in Java prevents writing to a register.
Here is an example of a situation where something that looks like "invisibility" happens:
A = 0, B = 0;
thread1 { A = 1; B = 2; }
thread2 { if (B == 2) { /* A may be 0 here */ } }
Suppose the compiler did not reorder anything. What we see in thread2 is due to the store buffer, and I do not think a write operation sitting in the store buffer counts as a completed write. Because of the store-buffer and invalidate-queue strategy, the write to variable A looks invisible, but in fact the write operation has simply not finished when thread2 reads A. Even if we make field B volatile, so that the write to B enters the store buffer with memory barriers, thread2 can still read B as 0 and finish. To me, volatile looks like it is not about the visibility of the field it declares, but more like an edge ensuring that all writes that happen before the volatile write in thread A are visible to all operations after the volatile read (a volatile read that happens after the volatile write in thread A has completed) in another thread B.
By the way, since I am not a native speaker, I have seen many tutorials in my mother tongue (and some English ones) say that volatile instructs JVM threads to read the value of the volatile variable from main memory and not cache it locally, and I do not think that is true. Am I right?
Anyway, thanks for your answers! Since I'm not a native speaker, I hope I have expressed myself clearly.
I'm pretty sure the assert can fire. I think a volatile load is only an acquire operation (https://preshing.com/20120913/acquire-and-release-semantics/) wrt. non-volatile variables, so nothing is stopping load-load reordering.
Two volatile operations can't reorder with each other, but reordering with non-atomic operations is possible in one direction, and you picked the direction without guarantees.
(Caveat, I'm not a Java expert; it's possible but unlikely volatile has some semantics that require a more expensive implementation.)
More concrete reasoning is that if the assert can fire when translated into asm for some specific architecture, it must be allowed to fire by the Java memory model.
Java volatile is (AFAIK) equivalent to C++ std::atomic with the default memory_order_seq_cst. Thus foo2 can JIT-compile for ARM64 with a plain load for b and an LDAR acquire load for a.
ldar can't reorder with later loads/stores, but can with earlier ones (except for stlr release stores). ARM64 was specifically designed to make C++ std::atomic<> with memory_order_seq_cst / Java volatile efficient with ldar and stlr, not having to flush the store buffer immediately on seq_cst stores, only on encountering an LDAR; that design gives the minimal amount of ordering necessary to still recover sequential consistency as specified by C++ (and, I assume, Java).
On many other ISAs, sequential-consistency stores do need to wait for the store buffer to drain itself, so they are in practice ordered wrt. later non-atomic loads. And again on many ISAs, an acquire or SC load is done with a normal load preceded with a barrier which blocks loads from crossing it in either direction, otherwise they wouldn't work. That's why having the volatile load of a compile to an acquire-load instruction that just does an acquire operation is key to understanding how this can happen in practice.
(In x86 asm, all loads are acquire loads and all stores are release stores. Not sequential-release, though; x86's memory model is program order + store buffer with store-forwarding, which allows StoreLoad reordering, so Java volatile stores need special asm.
So the assert can't fire on x86, except via compile/JIT-time reordering of the assignments. This is a good example of one reason why testing lock-free code is hard: a failing test can prove there is a problem, but testing on some hardware/software combo can't prove correctness.)
Answer to your addition.
Many tutorials talk about the visibility of volatile fields, saying things like "a volatile field becomes visible to all readers (other threads in particular) after a write operation completes on it". I have doubts about how a completed write to a field could be invisible to other threads (or CPUs).
The compiler might mess up code.
e.g.
boolean stop;

void run() {
    while (!stop) println();
}

first optimization

void run() {
    boolean r1 = stop;
    while (!r1) println();
}

second optimization

void run() {
    boolean r1 = stop;
    if (r1) return;
    while (true) println();
}

So now it is obvious that this loop will never stop, because the new value of stop will effectively never be seen. For stores you can do something similar that could postpone them indefinitely.
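For completeness, a runnable sketch of the fix (the class name is mine): declaring the flag volatile forbids hoisting the load out of the loop, so the loop is guaranteed to eventually observe the store and terminate.

```java
public class StopFlag {
    static volatile boolean stop;   // volatile: the load cannot be hoisted

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            // Each iteration performs a fresh volatile load of stop.
            while (!stop) { Thread.onSpinWait(); }
        });
        worker.start();
        stop = true;               // volatile store: guaranteed to become visible
        worker.join();             // terminates because the worker sees stop == true
        System.out.println("stopped"); // prints stopped
    }
}
```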
As I understand it, a completed write means you have successfully written the field back to cache, and according to MESI, all other threads should have an Invalid cache line if this field has been cached by them.
Correct. This is normally called 'globally visible' or 'globally performed'.
One exception (since I am not very familiar with the hardware, this is just a conjecture) is that maybe the result will be written back to a register instead of the cache, and I do not know whether there is some protocol to keep consistency in this situation, or whether volatile in Java prevents writing to a register.
All modern processors are load/store architectures (even X86 after uops conversion) meaning that there are explicit load and store instructions that transfer data between registers and memory and regular instructions like add/sub can only work with registers. So a register needs to be used anyway. The key part is that the compiler should respect the loads/stores of the source code and limit optimizations.
Suppose the compiler did not reorder anything; what we see in thread2 is due to the store buffer, and I do not think a write operation in the store buffer means a completed write. Because of the store-buffer and invalidate-queue strategy, the write to variable A looks invisible, but in fact the write operation has not finished when thread2 reads A.
On x86 the order of the stores in the store buffer is consistent with program order, and they commit to the cache in program order. But there are architectures where stores from the store buffer can commit to the cache out of order, e.g. due to:
write coalescing
allowing stores to commit to cache as soon as the cache line is returned in the right state, no matter whether an earlier store is still waiting.
sharing the store buffer with a subset of the CPUs.
Store buffers can be a source of reordering; but also out of order and speculative execution can be a source.
Apart from the stores, reordering loads can also lead to observing stores out of order. On the X86 loads can't be reordered, but on the ARM it is allowed. And of course the JIT can mess things up as well.
Even if we make field B volatile, so that the write to B enters the store buffer with memory barriers, thread 2 can still read B as 0 and finish.
It is important to realize that the JMM is based on sequential consistency: even though it is a relaxed memory model (separating plain loads and stores from synchronization actions like volatile load/store and lock/unlock), if a program has no data races, it will only produce sequentially consistent executions. For sequential consistency the real-time order doesn't need to be respected. So it is perfectly fine for a load/store to be skewed as long as:
the memory order is a total order over all loads/stores
the memory order is consistent with the program order
a load sees the most recent write before it in the memory order.
As for me, volatile looks like it is not about the visibility of the field it declares, but more like an edge to make sure that all writes that happen before the volatile write in thread A are visible to all operations after the volatile read (a volatile read that happens after the volatile write in thread A has completed) in another thread B.
You are on the right path.
Example.
int a = 0;
volatile int b = 0;

thread1() {
    1: a = 1;
    2: b = 1;
}

thread2() {
    3: r1 = b;
    4: r2 = a;
}
In this case there is a happens-before edge between 1 and 2 (program order). If r1 == 1, then there is a happens-before edge between 2 and 3 (volatile variable rule) and a happens-before edge between 3 and 4 (program order).
Because the happens before relation is transitive, there is a happens before edge between 1-4. So r2 must be 1.
volatile takes care of the following:
Visibility: it makes sure the load/store doesn't get optimized out.
Atomicity: the load/store is atomic, so it can never be seen partially.
Ordering: most importantly, it makes sure that the order between 1-2 and between 3-4 is preserved.
By the way, since I am not a native speaker, I have seen many tutorials in my mother tongue (also some English tutorials) say that volatile will instruct JVM threads to read the value of the volatile variable from main memory and not cache it locally, and I do not think that is true.
You are completely right. This is a very common misconception. Caches are the source of truth since they are always coherent. If every write needs to go to main memory, programs would become extremely slow. Memory is just a spill bucket for whatever doesn't fit in cache and can be completely incoherent with the cache. Plain/volatile loads/stores are stored in the cache. It is possible to bypass the cache for special situations like MMIO or when using e.g. SIMD instructions but it isn't relevant for these examples.
Anyway, thanks for your answers! Since I'm not a native speaker, I hope I have expressed myself clearly.
Most people here are not native speakers (I'm certainly not). Your English is good enough, and you show a lot of promise.
In addition to Peter Cordes's great answer: in terms of the JMM, there is a data race on b, since there is no happens-before edge between the write of b and the read of b, because b is a plain variable. Only if this happens-before edge existed would you be guaranteed that a thread which loads b == 1 also sees a == 1.
Instead of making a volatile, you need to make b volatile:
int a = 0;
volatile int b = 0;

thread1() {
    a = 1;
    b = 1;
}

thread2() {
    if (b == 1) assert a == 1;
}
So if thread2 sees b == 1, then the write b = 1 is ordered before that read in the happens-before order (volatile variable rule). And since a = 1 and b = 1 are ordered by happens-before (program order rule), and the read of b and the read of a are ordered by happens-before (program order rule again), then, due to the transitive nature of the happens-before relation, there is a happens-before edge between the write a = 1 and the read of a, which must therefore see the value 1.
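Here is a runnable version of that corrected pattern (the class name is mine). Whatever interleaving occurs, if thread 2 observes b == 1 the happens-before chain guarantees it also observes a == 10, so the program always finishes cleanly:

```java
public class VolatileFlag {
    static int a = 0;
    static volatile boolean b = false;   // the volatile flag carries the hb edge

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { a = 10; b = true; });
        Thread t2 = new Thread(() -> {
            // If the volatile read of b sees true, the JMM guarantees the
            // earlier plain write a = 10 is visible (transitive happens-before).
            if (b && a != 10) throw new AssertionError("JMM violated");
        });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("done"); // prints done
    }
}
```

Note that a single run proving nothing went wrong is not a proof of correctness; the guarantee comes from the model, not from testing.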
You are referring to a possible implementation of the JMM using fences. And although it provides some insights into what happens under the hood, it is equally damaging to think in terms of fences because they are not a suitable mental model. See the following counter example:
https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#myth-barriers-are-sane
Yes, the assert can fail.
volatile int a = 0;
boolean b = false;

foo1() { a = 10; b = true; }
foo2() { if (b) { assert a == 10; } }
The JMM guarantees that a write to a volatile field happens-before subsequent reads of that field. In your example, whatever thread A did before a = 10 happens-before whatever thread B does after reading a (while executing assert a == 10). But b = true is a plain store that executes after the volatile store a = 10 in thread A (within a single thread, happens-before always holds), so nothing orders it with respect to thread B's plain read of b; there is no ordering guarantee for the assert. However, consider this:
int a = 0;
volatile boolean b = false;

foo1() { a = 10; b = true; }
foo2() { if (b) { assert a == 10; } }
In this example, the situation is:
a = 10 ---> b = true ---|
                        |
                        |  (happens-before due to volatile's semantics)
                        |
                        |---> if (b) ---> assert a == 10
Since you have a total order, the assert is guaranteed to pass.
I'm currently learning about the Java Memory Model and how it affects the reorderings that a compiler may make. However, I'm a bit confused about external actions. The JMM defines them as actions that may be observable outside of an execution. Going off of this question, I understand external actions to be things like printing a value, writing to a file, network operations, etc.
Now, how are external actions affected by reordering? I think it's obvious that an external action cannot be reordered with another external action, as this will change the observable behaviour of the program (and thus is not a valid transformation according to the JMM). But what about reordering an external action with a normal memory access, or a synchronisation action? For example:
volatile int v = 5;
int x = v;
System.out.println("!");
Can the print and int x = v be reordered here? I can't see it changing behaviour, but the read of volatile v is the same as a monitor acquire, so I don't think the reordering is valid.
External actions are added to avoid surprising outcomes:
class ExternalAction {
int foo = 0;
void method() {
jni();
foo = 42;
}
native void jni(); /* {
assert foo == 0;
} */
}
Assuming that the JNI method is implemented to run the same assertion, you would not expect this to fail. The JIT compiler cannot determine the outcome of anything external, so the JMM forbids such reorderings, too.
I think it's obvious that an external action cannot be reordered with another external action, as this will change the observable behaviour of the program (and thus is not a valid transformation according to the JMM).
According to the JLS, observable behaviour doesn't require a total order of all external actions:
Note that a behavior B does not describe the order in which the external actions in B are observed, but other (internal) constraints on how the external actions are generated and performed may impose such constraints.
It seems that two external actions cannot be reordered if the result of the 1st external action is used as a parameter of the 2nd external action (either used directly, or indirectly — to compute the value of a parameter)
This is what the JLS says about the result of an external action:
An external action tuple contains an additional component, which contains the results of the external action as perceived by the thread performing the action. This may be information as to the success or failure of the action, and any values read by the action.
I suppose there could be stronger ordering guarantees for external actions which might access the JVM's internal state, like JNI, as explained in Rafael's answer.
Other than that it seems like the JLS allows almost anything:
An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.
This provides a great deal of freedom for the implementor to perform a myriad of code transformations, including the reordering of actions and removal of unnecessary synchronization.
Of course, Java implementations can provide stronger guarantees. This would be legal because stronger guarantees don't produce new behaviours.
I was trying to learn the Java Memory Model, but still cannot understand how people use it in practice.
I know that many just rely on appropriate memory barriers (as described in the Cookbook), but in fact the model itself does not operate such terms.
The model introduces different orders defined on a set of actions and defines so called "well-formed executions".
Some people are trying to explain the memory model restrictions using one of such orders, namely "happens-before", but it seems like the order, at least by itself, does not define acceptable execution:
It should be noted that the presence of a happens-before relationship between two actions does not necessarily imply that they have to take place in that order in an implementation. If the reordering produces results consistent with a legal execution, it is not illegal
My question is: how can one verify, in practice (according to the model), that certain code or a change to it can lead to an "illegal execution"?
To be more concrete, let's consider a very simple example:
public class SomeClass {
private int a;
private int b;
public void someMethod() {
a = 2; // 1
b = 3; // 2
}
// other methods
}
It's clear that within the thread w(a = 2) happens before w(b = 3) according to the program order.
How can the compiler/optimizer be sure that reordering 1 and 2 won't produce an "illegal execution" (strictly in terms of the model)? And why, if we make b volatile, will it?
Are you asking about how the VM/JIT analyzes the bytecode flow? That's far too broad to answer; entire research papers have been written about it. And what the VM actually implements may change from release to release.
Or is the question simply about which rules of the memory model govern what is "legal"? For the executing thread, the memory model already makes the strong guarantee that every action on a given thread appears to happen in program order for that thread. That means that if the JIT determines, by whatever method(s) it implements, that a reordering produces the same observable result(s), that reordering is legal.
The presence of actions that establish happens-before guarantees with respect to other threads (such as volatile accesses) simply adds more constraints to the legal reorderings.
Simplified, it can be memorized as: everything that happened before in program order also appears to have (already) happened to other threads when a happens-before-establishing action is executed.
For your example that means that, in the case of non-volatile (a, b), only the guarantee "appears to happen in program order" (to the executing thread) needs to be upheld. So any reordering of the writes to (a, b) is legal; even delaying them until they are actually read (e.g. holding the value in a CPU register and bypassing main memory) would be valid. The JIT could even omit writing the members at all if it detects they are never actually read before the object goes out of scope (and, to be precise, there is also no finalizer using them).
Making b volatile in your example changes the constraints in that other threads reading b would also be guaranteed to see the last update of a because it happened before the write to b. Again simplified, happens-before actions extend some of the perceived ordering guarantees from the executing thread to other threads.
It seems you are making the common mistake of thinking too much about low level aspects of the JMM in isolation. Regarding your question “how people use it in practice”, if you are talking about an application programmer, (s)he will use it in practice by not thinking about memory barriers or possible reorderings all the time.
Regarding your example:
public void someMethod() {
a = 2; // 1
b = 3; // 2
}
Given that a and b are non-final and non-volatile.
It's clear that within the thread w(a = 2) happens before w(b = 3) according to the program order. How can compiler/optimizer be sure that reordering 1 and 2 won't produce an "illegal execution" (strictly in terms of the model) ?
Here, it backfires that you are focusing on reordering in isolation. First of all, the resulting code (after HotSpot optimization, JIT compilation, etc.) does not need to write the values to heap memory at all. It might hold the new values in CPU registers and use them from there in subsequent operations of the same thread. Only when reaching a point where these changes have to be made visible to other threads do they have to be written to the heap, which may happen in arbitrary order.
But if, for example, the caller of the method enters an infinite loop after calling this method, the values don’t have to be written ever.
And why if we set b to be volatile it will ?
Declaring b as volatile does not guarantee that a and b are written at all. This is another mistake which arises from focusing on memory barriers.
Let’s go more abstract:
Suppose you have two concurrent actions, A and B. For concurrent execution in Java, there are several perfectly valid behaviors, including:
A might be executed entirely before B
B might be executed entirely before A
All or parts of A and B run in parallel
In the case where B is executed entirely before A, there is no sense in having a write barrier in A and a read barrier in B; B will still not notice any activities of A. You can draw your conclusions about the different parallel scenarios from this starting point.
This is where the happens-before relationship comes into play: a write of a value to a volatile variable happens-before a read of that value from that variable by another thread. If the read operation is executed before the write operation, the reading thread will not see the value; hence there is no happens-before relationship, and so there is no statement we can make about the other variables.
To stay at your example with b being volatile: this implies that if a reading thread reads b and reads the value 3, and only then it is guaranteed to see the value of 2 (or an even more recent value if there are other writes) for a on subsequent reads.
So if a JVM can prove that there will never be a read operation on b seeing the written value, perhaps because the entire instance being modified will never be seen by another thread, no happens-before relationship is ever established. In other words, b being volatile has no impact on the allowed code transformations in this case; it might be reordered as well, or even never written to the heap at all.
So the bottom line is that it is not useful to look at a small piece of code and ask whether it will allow reordering or whether it will contain a memory barrier. This might not even be answerable, as the answer might change depending on how the code is actually used. Only if your view is wide enough to see how threads will interact when accessing the data, and you can safely deduce whether a happens-before relationship will be established, can you start drawing conclusions about the correct working of the code. As you found out yourself, correct working does not imply knowing whether reordering will happen at the lowest level.
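A small sketch of the escape point above (the class name is mine): when the instance never leaves the constructing thread, the volatile write on b cannot be observed by any other thread, so only single-threaded semantics constrain the result, and the JVM is free to treat the volatile write like a plain one.

```java
// The object below is thread-local: no other thread can ever read b,
// so no happens-before edge via b can be established, and the volatile
// modifier imposes no extra constraint on the generated code.
public class NoEscape {
    private int a;
    private volatile int b;

    int compute() {
        a = 2;
        b = 3;          // volatile write, but unobservable by other threads
        return a + b;   // single-threaded semantics: always 5
    }

    public static void main(String[] args) {
        System.out.println(new NoEscape().compute()); // prints 5
    }
}
```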