I'm having trouble understanding how the JMM relates to possible instruction reordering. For example, consider the following code snippet:
volatile boolean t1HasArrived = false, t2HasArrived = false;
// In thread 1
t1HasArrived = true;
while (!t2HasArrived)
;
// In thread 2
t2HasArrived = true;
while (!t1HasArrived)
;
Now, I want to believe that this code implements barrier synchronization for two threads. But I'm not sure about it. The thing that gives me doubts is read/write reordering: is there anything in the JMM that would prevent the compiler or CPU from rearranging the execution path of the code as in the snippet below? And if not, how do you actually prove that such reordering is allowed?
// In thread 1
while (!t2HasArrived)
;
t1HasArrived = true;
// In thread 2
while (!t1HasArrived)
;
t2HasArrived = true;
Note that I'm not trying to implement a lock-free barrier. This is just an example that came to my mind after I started thinking about instruction reordering. I just want to understand how to apply the JMM rules to it. It is relatively easy to reason about a piece of code when there is only one volatile variable/lock involved, but when there are several, things become complicated.
Volatile variables cannot be reordered with each other by the JMM.
In your case, you have a volatile store followed by a volatile load, and those cannot be reordered into a load followed by a store. This is true for all versions of Java.
See The JSR-133 Cookbook for Compiler Writers for specifics.
Abandon all hope, ye who enter here.
From the JSR-133 specification:
§5 Definitions
Inter-thread Actions An inter-thread action is an action performed by one thread that can be detected or directly influenced by another thread. Inter-thread actions include reads and writes of shared variables and synchronization actions, such as locking or unlocking a monitor, reading or writing a volatile variable, or starting a thread.
Synchronization Actions Synchronization actions include locks, unlocks, reads of and writes to volatile variables, actions that start a thread, and actions that detect that a thread is done.
§7.3 Well-Formed Executions
We only consider well-formed executions. An execution E = 〈P, A,
po→, so→, W, V, sw→, hb→〉 is well formed if the following conditions are true:
Synchronization order is consistent with program order and mutual exclusion. Having synchronization order consistent with program order implies that the happens-before order, given by the transitive closure of synchronizes-with edges and program order, is a valid partial order: reflexive, transitive and antisymmetric. Having synchronization order consistent with mutual exclusion means that on each monitor, the lock and unlock actions are correctly nested.
The JMM defines reordering restrictions via the transitive closure of certain orderings of program actions. If such an ordering exists between a write and a read, the JVM is required to return the value dictated by that ordering.
In your case, a synchronization order is implied for any volatile read that observes the value of a volatile write, because the fields are volatile. Synchronization order requires a thread, after the volatile read, to observe all writes that were committed prior to the corresponding volatile write, in program order.
This implies that whenever a volatile field is read, the volatile writes must appear committed according to program order, which induces a happens-before relationship from the writes to the reads. Thus, the reordering that you suggest as an optimization is invalid, and the JMM guarantees the visibility that your original source code implies.
I recently gave a presentation on the JMM if you want to understand this in greater detail (time - 7 hours, 27 minutes).
I am trying to understand the volatile keyword from the book Java Concurrency in Practice. It compares the synchronized keyword with volatile variables in three aspects: atomicity, visibility, and reordering. I have some doubts about them, which I discuss one by one below:
1) Visibility: `synchronized` vs `volatile`
Book says following with respect to visibility of synchronized:
Everything thread A did in or prior to a synchronized block is visible to B when it executes a synchronized block guarded by the same lock.
It says following with respect to visibility of volatile variables:
Volatile variables are not cached in registers or in caches where they are hidden from other processors, so a read of a volatile variable always returns the most recent write by any thread.
The visibility effects of volatile variables extend beyond the value of the
volatile variable itself. When thread A writes to a volatile variable and subsequently thread B reads that same variable, the values of all variables that were visible to A prior to writing to the volatile variable become visible to B after reading the volatile variable. So from a memory visibility perspective, writing a volatile variable is like exiting a synchronized block and reading a volatile variable is like entering a synchronized block.
Q1. I feel the second paragraph above (on volatile) corresponds to what the book said about synchronized. But is there a synchronized-equivalent to volatile's first paragraph? In other words, does using synchronized ensure that any / some variables do not get cached in processor caches and registers?
Note that book also says following about visibility for synchronized:
Locking is not just about mutual exclusion; it is also about memory visibility.
2) Reordering: `synchronized` vs `volatile`
Book says following about volatile in the context of reordering:
When a field is declared volatile, the compiler and runtime are put on notice that this variable is shared and that operations on it should not be reordered with other memory operations.
Q2. Book does not say anything about reordering in the context of synchronized. Can someone explain what can be said of reordering in the context of synchronized?
3) Atomicity
Book says following about atomicity of synchronized and volatile.
the semantics of volatile are not strong enough to make the increment operation (count++) atomic, unless you can guarantee that the variable is written only from a single thread.
Locking can guarantee both visibility and atomicity; volatile variables can
only guarantee visibility.
Q3. I guess this means two threads can read volatile int a together, both will increment it and then save it. But only the last write will take effect, making the whole "read-increment-write" sequence non-atomic. Am I correct in this interpretation of the non-atomicity of volatile?
Q4. Are all locking equivalents comparable, with the same visibility, ordering, and atomicity properties: synchronized blocks, atomic variables, locks?
PS: This question is related to, and a completely revamped version of, a question I asked some days back. Since it's a full revamp, I haven't deleted the older one. I wrote this question in a more focused and structured way. I will delete the older one once I get an answer to this one.
The key difference between 'synchronized' and 'volatile', is that 'synchronized' can make threads pause, whereas volatile can't.
'caches and registers' isn't a thing. The book says that because in practice that's usually how things are implemented, and it makes it easier (or perhaps not, given these questions) to understand the how & why of the JMM (java memory model).
The JMM doesn't name them, however. All it says is that the VM is free to give each thread its own local copy of any variable, or not, to be synchronized at some arbitrary time with some or all other threads, or not... unless there is a happens-before relationship anywhere, in which case the VM must ensure that at the point of execution between the two threads where a happens before relationship has been established, they observe all variables in the same state.
In practice that would presumably mean to flush caches. Or not; it might mean the other thread overwrites its local copy.
The VM is free to implement this stuff however it wants, and differently on every architecture out there. As long as the VM sticks to the guarantees that the JMM makes, it's a good implementation, and as a consequence, your software must work given solely those guarantees and no other assumptions; what works on your machine might not work on another if you rely on assumptions that aren't guaranteed by the JMM.
Reordering
Reordering is also not in the VM spec, at all. What IS in the VM spec are the following two concepts:
Within the confines of a single thread, all you can observe from inside it is consistent with an ordered view. That is, if you write 'x = 5; y = 10;' it is not possible to observe, from within the same thread, y being 10 but x being its old value. Regardless of synchronized or volatile. So, anytime it can reorder things without that being observable, then the VM is free to. Will it? Up to the VM. Some do, some don't.
When observing effects caused by other threads, and you have not established a happens-before relationship, you may see some, all, or none of these effects, in any order. Really, anything can happen here. In practice, then: Do NOT attempt to observe effects caused by other threads without establishing a happens-before, because the results are arbitrary and untestable.
Happens-before relationships are established by all sorts of things. Synchronized blocks obviously do it: if your thread is frozen trying to acquire a lock and then runs, any synchronized blocks on that object that finished 'happened before', and anything they did you can now observe, with the guarantee that what you observe is consistent with those things having run in order, and that you see all data they wrote (as in, you won't get an older 'cache' or whatnot). Volatile accesses do so as well.
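To make that concrete, here is a minimal sketch of publication through a volatile field. All names (data, ready, the class name) are invented for illustration; the JMM guarantees the reader that sees ready == true also sees the earlier plain write.

```java
public class VolatilePublish {
    static int data;                 // plain, non-volatile field
    static volatile boolean ready;   // volatile flag used to publish

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;      // plain write...
            ready = true;   // ...happens-before any read that observes ready == true
        });
        Thread reader = new Thread(() -> {
            while (!ready) { }          // spin until the volatile read sees true
            System.out.println(data);   // guaranteed to print 42
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}
```

Without the volatile modifier on ready, the reader could spin forever or observe a stale data value; the volatile write/read pair is what creates the happens-before edge.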
Atomicity
Yes, your interpretation of why x++ is not atomic even if x is volatile, is correct.
I'm not sure what your Q4 is trying to ask.
In general, if you want to atomically increment an integer, or do any of many other concurrent-esque operations, look at the java.util.concurrent package. It contains efficient and useful implementations of various concepts. AtomicInteger, for example, can be used to atomically increment something, in a way that is visible from other threads, while still being quite efficient (for example, if your CPU supports Compare-And-Set (CAS) operations, AtomicInteger will use them; not something you can do from general Java without resorting to Unsafe).
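A short sketch of that approach (class and variable names invented): two threads hammer the same AtomicInteger, and no updates are lost, unlike a bare volatile int++.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter.incrementAndGet();  // atomic read-modify-write (CAS loop)
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter.get()); // always 20000; a volatile int++ could lose updates
    }
}
```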
Just to complement rzwitserloot's excellent answer:
A1. You can think of it like so: synchronized guarantees that all cached changes will become visible to other threads that enter a synchronized block (flushed from the cache) once the first thread exits the synchronized block and before the other enters.
A2. Operations executed by a thread T1 within a synchronized block appear to some other thread T2 as not reordered if and only if T2 synchronizes on the same guard.
A3. I'm not sure what you understand by that. It may happen that when incrementing both threads will first perform a read of the variable a which will yield some value v, then both threads will locally increase their local copy of the value v producing v' = v + 1, then both threads will write v' to a. Thus finally the value of a could be v + 1 instead of v + 2.
A4. Basically yes, although in a synchronized block you can perform atomically many operations, while atomic variables allow you to only do a certain single operation like an atomic increment. Moreover, the difference is that when using the synchronized block incorrectly, i.e. by reading variables outside of a synchronized block which are modified by another thread within a synchronized block, you can observe them not-atomically and reordered. Something which is impossible with atomic variables. Locking is exactly the same as synchronized.
Q1. I feel second paragraph above (of volatile) corresponds to what book said about synchronized.
Sure. volatile access can be regarded as synchronization lite.
But is there
synchronized-equivalent to volatile's first paragraph? In other words,
does using synchronized ensures any / some variables not getting
cached in processor caches and registers?
The book has confused you by mixing levels. volatile access has nothing directly to do with processor caches or registers, and in fact the book is surely incorrect about the caches. Volatility and synchronization are about the inter-thread visibility of certain actions, especially of writes to shared variables. How the semantics are implemented is largely a separate concern.
In any case, no, synchronization does not put any constraints on storage of variables. Everything to do with synchronized semantics happens at the boundaries of synchronized regions. This is why all accesses to a given variable from a set of concurrently running threads must be synchronized on the same object in order for a program to be properly synchronized with respect to that variable.
Book says following about volatile in the context of reordering:
When a field is declared volatile, the compiler and runtime are put on notice that this variable is shared and that operations on it
should not be reordered with other memory operations.
Q2. Book does not say anything about reordering in the context of synchronized. Can someone explain what can be said of reordering in
the context of synchronized?
But that already does say something (not everything) about synchronized access. You need to understand that a "memory operation" in this sense is a read or write of a shared variable, or acquiring or releasing any object's monitor. Entry to a synchronized region involves acquiring a monitor, so already the book says, correctly, that volatile accesses will not be reordered across the boundaries of a synchronized region.
More generally, reads of shared variables will not be reordered relative to the beginning of a synchronized region, and writes will not be reordered relative to the end of one.
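A hedged sketch of what those boundary rules buy you in practice: a plain counter guarded by synchronized methods (the class and field names here are invented). The unlock at the end of each increment() happens-before the next lock, so the plain field needs no volatile.

```java
public class SyncCounter {
    private int count = 0;   // plain field, guarded by the monitor of 'this'

    synchronized void increment() { count++; }  // reads/writes can't move out of the region
    synchronized int get() { return count; }

    public static void main(String[] args) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Runnable task = () -> { for (int i = 0; i < 10_000; i++) c.increment(); };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.get()); // 20000: each unlock happens-before the next lock
    }
}
```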
Q3. I guess this means two threads can see volatile int a together, both will increment it and then save it. But only one last
read will have effect, thus making whole "read-increment-save" non
atomic. Am I correct with this interpretation on non-atomic-ness of
volatile?
Yes. The autoincrement operator performs both a read and a write of the variable to which it is applied. If that variable is volatile then volatile semantics apply to those individually, so other operations on the same variable can happen between if there is no other protection.
Q4. Does all locking-equivalent are comparable and have same visibility, ordering and atomicity property: synchronized blocks,
atomic variables, locks?
Huh? This sub-question is much too broad. You're reading a whole book about it. Generally speaking, though, no: these mechanisms have some characteristics in common and some that differ. All have effects on the visibility of memory operations and their ordering, but they are not identical. "Atomicity" is a function of the other two.
Ok, I know this may sound quite stupid (and I'm afraid it is), but I'm not completely satisfied with the answer I gave myself so I thought it was worth it asking it here.
I'm dealing with an exercise about concurrency (in Java) which goes like this
Given a solved Sudoku chart, determine, using a fixed number of threads running at the same time, whether the chart has been correctly solved, i.e. no violation of the canonical rules occur (a number must appear within its row, its column, and its block only once).
Now my question is: since the threads only have to perform "reads", gathering info from the chart and processing it somewhere else, couldn't they work without worrying about concurrency? The chart's state is always consistent since no "writes" are performed, hence it is never changed.
Aren't locks/synchronized blocks/synchronized methods necessary if and only if there's a risk of resources' consistency being lost? In other words, did I understand concurrency the right way?
This is a fairly subtle question, not stupid at all.
Multiple threads that are reading a data structure concurrently may do so without synchronization, but only if the data structure has been safely published. This is a memory visibility issue, not a timing issue or race condition.
See section 3.5 of Goetz et al., Java Concurrency in Practice, for further discussion of the concept of safe publication. Section 3.5.4 on "Effectively Immutable Objects" seems applicable here, as the board becomes effectively immutable at a certain point, because it is never written to after it has reached the solved state.
Briefly, the writer threads and the reader threads must perform some memory-coordinating activity to ensure that the reader threads have a consistent view of what has been written. For example, the writer thread could write the sudoku board and then, while holding a lock, store a reference to the board in a static field. The reading threads could then load that reference, while holding the lock. Once they've done that, they are assured that all previous writes to the board are visible and consistent. After that, the reader threads may access the board structure freely, with no further synchronization.
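The lock-based publication described above might be sketched like this. All names are invented, and this is a minimal demo rather than the exercise's actual code; the point is that the reference is only stored and loaded while holding the same lock.

```java
public class SafePublication {
    private static final Object LOCK = new Object();
    private static int[][] board;   // plain (non-volatile) reference

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            int[][] b = new int[9][9];
            b[0][0] = 5;                        // fill in the board...
            synchronized (LOCK) { board = b; }  // ...then publish under the lock
        });
        writer.start();
        writer.join();

        Thread reader = new Thread(() -> {
            int[][] b;
            synchronized (LOCK) { b = board; }  // load the reference under the same lock
            System.out.println(b[0][0]);        // sees 5: unlock happens-before this lock
        });
        reader.start();
        reader.join();
    }
}
```

After loading the reference under the lock once, the reader may then traverse the board freely with no further synchronization, exactly as the answer says.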
There are other ways to coordinate memory visibility, such as writes/reads to a volatile variable or an AtomicReference. Use of higher-level concurrency constructs, such as latches or barriers, or submitting tasks to an ExecutorService, will also provide memory visibility guarantees.
UPDATE
Based on an exchange in the comments with Donal Fellows, I should also point out that the safe publication requirement also applies when getting results back from the reader threads. That is, once one of the reader threads has a result from its portion of the computation, it needs to publish that result somewhere so that it can be combined with the other reader threads' results. The same techniques can be used as before, such as locking/synchronization over a shared data structure, volatiles, etc. However, this is usually not necessary, since the results can be obtained from a Future returned by ExecutorService.submit or invoke. These constructs handle the safe publication requirements automatically, so the application doesn't have to deal with synchronization.
In my opinion your understanding is correct. Data corruption can only happen if any of the threads writes to the data.
If you're 100% sure that no thread is writing, then it's safe to skip synchronization and locking...
EDIT: skipping locking in these cases is the best practice!
:)
There is no need to synchronize the file if it is read-only. Basically, a lock is applied to a critical section: a place where different threads access shared memory concurrently.
Since synchronization makes the program slower (multiple threads cannot access the section at the same time), it is better not to use a lock in the case of read-only files.
Imagine you have a bunch of work to complete (check 9 rows, 9 columns, 9 blocks). If you want threads to complete this bunch of 27 units of work and if you want to complete the work without double work, then the threads would need to be synchronized. If on the other hand, you are happy to have threads that may perform a work unit that has been done by another thread, then you don't need to synchronize the threads.
Scenario where Thread1 writes some data and then a bunch of threads need to read this data doesn't require locking if done properly. By properly I mean that your SUDOKU board is an immutable object, and by immutable object I mean:
State cannot be modified after construction
State is not actually modified via some reflection dark magic
All the fields are final
'this' reference does not escape during construction (this could happen if, during construction, you do something along the lines of MyClass.instance = this).
If you pass this object to the worker threads you are good to go. If your objects don't satisfy all these conditions you may still run into concurrency problems; in most cases it is because the JVM may reorder statements at will (for performance reasons), and it might reorder them in such a way that worker threads are launched before the sudoku board is fully constructed.
Here is a very nice article about immutable objects.
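Under the stated conditions, a minimal immutable holder might look like this (the class name, fields, and methods are invented for illustration). The final field plus the defensive copy satisfies the four conditions above.

```java
public class Board {
    private final int[][] cells;   // final field, assigned exactly once in the constructor

    public Board(int[][] source) {
        int[][] copy = new int[source.length][];
        for (int i = 0; i < source.length; i++) {
            copy[i] = source[i].clone();  // defensive copy: state can't change after construction
        }
        this.cells = copy;                // 'this' never escapes before construction completes
    }

    public int cell(int r, int c) { return cells[r][c]; }

    public static void main(String[] args) {
        Board b = new Board(new int[][]{{1, 2}, {3, 4}});
        System.out.println(b.cell(1, 0)); // 3
    }
}
```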
Abstract
For a thread to be guaranteed to observe the effects of a write to main memory, the write must happen-before the read. If write and read occur in different threads, that requires a synchronization action. The spec defines many different kinds of synchronization actions. One such action is executing a synchronized statement, but alternatives exist.
Details
The Java Language Specification writes:
Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second.
and
More specifically, if two actions share a happens-before relationship, they do not necessarily have to appear to have happened in that order to any code with which they do not share a happens-before relationship. Writes in one thread that are in a data race with reads in another thread may, for example, appear to occur out of order to those reads.
In your case, you want the reading threads to solve the right sudoku. That is, the initialization of the sudoku object must be visible to the reading threads, and therefore the initialization must happen-before the reading threads read from the sudoku.
The spec defines happens-before as follows:
If we have two actions x and y, we write hb(x, y) to indicate that x happens-before y.
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
There is a happens-before edge from the end of a constructor of an object to the start of a finalizer (§12.6) for that object.
If an action x synchronizes-with a following action y, then we also have hb(x, y).
If hb(x, y) and hb(y, z), then hb(x, z).
Since reading occurs in a different thread than writing (and not in a finalizer), we therefore need a synchronization action to establish that the write happens-before the read. The spec gives the following exhaustive list of synchronization actions:
An unlock action on monitor m synchronizes-with all subsequent lock actions on m (where "subsequent" is defined according to the synchronization order).
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).
An action that starts a thread synchronizes-with the first action in the thread it starts.
The write of the default value (zero, false, or null) to each variable synchronizes-with the first action in every thread. (Although it may seem a little strange to write a default value to a variable before the object containing the variable is allocated, conceptually every object is created at the start of the program with its default initialized values.)
The final action in a thread T1 synchronizes-with any action in another thread T2 that detects that T1 has terminated (T2 may accomplish this by calling T1.isAlive() or T1.join())
If thread T1 interrupts thread T2, the interrupt by T1 synchronizes-with any point where any other thread (including T2) determines that T2 has been interrupted (by having an InterruptedException thrown or by invoking Thread.interrupted or Thread.isInterrupted).
You can choose any of these methods to establish happens-before. In practice, starting the reading threads after the sudoku has been fully constructed is probably the easiest way.
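That easiest option can be sketched as follows (names invented): Thread.start() is itself on the list of synchronization actions, so fully initializing the data before calling start() is sufficient.

```java
public class StartBefore {
    public static void main(String[] args) throws InterruptedException {
        int[] sudoku = new int[81];
        sudoku[0] = 9;                       // initialize before starting any readers

        Thread checker = new Thread(() ->
            System.out.println(sudoku[0]));  // start() happens-before this read: prints 9
        checker.start();
        checker.join();
    }
}
```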
From my point of view, locking is necessary if you write and this writing takes a long time to complete due to say network latency or massive processing overhead.
Otherwise it's pretty safe to leave the locking out.
I recently found this gem in my code base:
/** This class is used to "publish" changes to a non-volatile variable.
 *
 * Access to non-volatile and volatile variables cannot be reordered,
 * so if you make changes to a non-volatile variable before calling publish,
 * they are guaranteed to be visible to a thread which calls syncChanges
 */
private static class Publisher {
    // This variable may not look like it's doing anything, but it really is.
    // See the documentation for this class.
    private volatile AtomicInteger sync = new AtomicInteger(0);

    void publish() {
        sync.incrementAndGet();
    }

    /**
     * @return the return value of this function has no meaning.
     * You should not make *any* assumptions about it.
     */
    int syncChanges() {
        return sync.get();
    }
}
This is used as such:
Thread 1
float[][] matrix;
matrix[x][y] = n;
publisher.publish();
Thread 2
publisher.syncChanges();
myVar = matrix[x][y];
Thread 1 is a background updating thread that runs continuously. Thread 2 is a HTTP worker thread that does not care that what it reads is in any way consistent or atomic, only that the writes "eventually" get there and are not lost as offerings to the concurrency gods.
Now, this triggers all my warning bells. Custom concurrency algorithm written deep inside of unrelated code.
Unfortunately, fixing the code is not trivial. The Java support for concurrent primitive matrices is not good. It looks like the clearest way to fix this is using a ReadWriteLock, but that would probably have negative performance implications. Correctness is more important, clearly, but it seems like I should prove that this is not correct before just ripping it out of a performance sensitive area.
According to the java.util.concurrent documentation, the following create happens-before relationships:
Each action in a thread happens-before every action in that thread that comes later in the program's order.
A write to a volatile field happens-before every subsequent read of that same field. Writes and reads of volatile fields have similar memory consistency effects as entering and exiting monitors, but do not entail mutual exclusion locking.
So it sounds like:
matrix write happens-before publish() (rule 1)
publish() happens-before syncChanges() (rule 2)
syncChanges() happens-before matrix read (rule 1)
So the code indeed has established a happens-before chain for the matrix.
But I'm not convinced. Concurrency is hard, and I'm not a domain expert.
What have I missed? Is this indeed safe?
In terms of visibility, all you need is a volatile write-read pair, on any volatile field. This would work:
final float[][] matrix = ...;
volatile float[][] matrixV = matrix;
Thread 1
matrix[x][y] = n;
matrixV = matrix; // volatile write
Thread 2
float[][] m = matrixV; // volatile read
myVar = m[x][y];
or simply
myVar = matrixV[x][y];
But this is only good for updating one variable. If writer threads are writing multiple variables and the read thread is reading them, the reader may see an inconsistent picture. Usually it's dealt with by a read-write lock. Copy-on-write might be suitable for some use patterns.
Doug Lea has a new StampedLock (http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166edocs/jsr166e/StampedLock.html) for Java 8, which is a version of a read-write lock that is much cheaper for read locks, but also much harder to use. Basically the reader gets the current version, then goes ahead and reads a bunch of variables, then checks the version again; if the version hasn't changed, there were no concurrent writes during the read session.
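The version-check protocol described above might look roughly like this with the standardized java.util.concurrent.locks.StampedLock (available since Java 8; the field and values here are illustrative):

```java
import java.util.concurrent.locks.StampedLock;

public class OptimisticRead {
    static final StampedLock lock = new StampedLock();
    static float value = 1.5f;   // stand-in for a cell of the matrix

    static float read() {
        long stamp = lock.tryOptimisticRead();  // grab the current version
        float v = value;                        // read without blocking writers
        if (!lock.validate(stamp)) {            // version changed: a write intervened
            stamp = lock.readLock();            // fall back to a real read lock
            try {
                v = value;                      // re-read under the lock
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(read());
    }
}
```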
This does look safe for publishing single updates to the matrix, but of course it doesn't provide any atomicity. Whether that's okay depends on your application, but it should probably be documented in a utility class like this.
However, it contains some redundancy and could be improved by making the sync field final. The volatile access of this field is the first of two memory barriers; by contract, calling incrementAndGet() has the same effect on memory as a write and a read on a volatile variable, and calling get() has the same effect as a read.
So, the code can rely on the synchronization provided by these methods alone, and make the field itself final.
Using volatile is not a magic bullet to synchronize everything. It is guaranteed that if another thread reads the updated value of a volatile variable, it will also see every change made to a non-volatile variable before that. But nothing guarantees that the other thread will read the updated value.
In the example code, if you make several writes to matrix and then call publish(), and the other thread calls syncChanges() and then reads the matrix, then the other thread may see some, all, or none of the changes:
All the changes, if it reads the updated value from publish()
None of the changes, if it reads the old published value and none of the changes have leaked through
Some of the changes, if it reads the previously published value, but some of the changes have leaked through
See this article
You correctly mentioned rule #2 of the happens-before relationship:
A write to a volatile field happens-before every subsequent read of that same field.
However, it doesn't guarantee that publish() will ever be called before syncChanges() on the absolute timeline. Let's change your example a bit.
Thread 1:
matrix[0][0] = 42.0f;
Thread.sleep(1000*1000); // assume the thread was preempted here
publisher.publish(); //assume initial state of sync is 0
Thread 2:
int a = publisher.syncChanges();
float b = matrix[0][0];
What values of the a and b variables are possible?
a is 0, b can be 0 or 42
a is 1, b is 42 because of the happens-before relationship
a is greater than 1 (Thread 2 was slow for some reason and Thread 1 managed to publish updates several times); the value of b depends on the business logic and the way the matrix is handled: does it depend on the previous state or not?
How to deal with it? It depends on the business logic.
If Thread 2 polls the state of a matrix from time to time and it's perfectly fine to have some outdated values in between, if in the end the correct value will be processed, then leave it as is.
If Thread 2 doesn't care about missed updates but always wants to observe an up-to-date matrix, then use copy-on-write collections or a ReadWriteLock, as mentioned above.
If Thread 2 does care about individual updates then it should be handled in a smarter way; you might want to consider the wait() / notify() pattern and notify Thread 2 whenever the matrix is updated.
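The wait()/notify() pattern mentioned could be sketched as follows (all names are invented; a single-cell stand-in for the matrix):

```java
public class MatrixSignal {
    private final Object monitor = new Object();
    private boolean updated = false;
    private float cell;   // stand-in for the matrix data

    void write(float v) {
        synchronized (monitor) {
            cell = v;
            updated = true;
            monitor.notifyAll();   // wake any consumer waiting on this monitor
        }
    }

    float awaitUpdate() throws InterruptedException {
        synchronized (monitor) {
            while (!updated) {     // loop guards against spurious wakeups
                monitor.wait();
            }
            updated = false;
            return cell;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MatrixSignal s = new MatrixSignal();
        new Thread(() -> s.write(3.5f)).start();
        System.out.println(s.awaitUpdate());
    }
}
```

The synchronized block around both the write and the read provides the happens-before edge, so no field here needs to be volatile.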
What does the AtomicXXX.lazySet(value) method mean in terms of the happens-before edges used in most JMM reasoning? The javadoc says little about it, and Sun bug 6275329 states:
The semantics are that the write is guaranteed not to be re-ordered with any previous write, but may be reordered with subsequent operations (or equivalently, might not be visible to other threads) until some other volatile write or synchronizing action occurs).
But this is not reasoning about HB edges, so it confuses me. Does it mean that lazySet() semantics can't be expressed in terms of HB edges?
UPDATE: I'll try to make my question concrete. I can use an ordinary volatile field in the following scenario:
//thread 1: producer
...fill some data structure
myVolatileFlag = 1;
//thread 2: consumer
while(myVolatileFlag!=1){
//spin-wait
}
...use data structure...
In this scenario the use of the "data structure" in the consumer is correct, since the volatile flag write/read makes an HB edge, guaranteeing that all writes to the "data structure" by the producer will be completed and visible to the consumer. But what if I use AtomicInteger.lazySet/get instead of the volatile write/read in this scenario?
//thread 1: producer
...fill some data structure
myAtomicFlag.lazySet(1);
//thread 2: consumer
while(myAtomicFlag.get()!=1){
//spin-wait
}
...use data structure...
Will it still be correct? Can I still rely on the visibility of the "data structure" values in the consumer thread?
This is not a question out of thin air: I've seen this method used in the LMAX Disruptor code in exactly this scenario, and I don't understand how to prove that it is correct...
The lazySet operations do not create happens-before edges and are therefore not guaranteed to be immediately visible. This is a low-level optimization that has only a few use-cases, which are mostly in concurrent data structures.
The garbage collection example of nulling out linked list pointers has no user-visible side effects. The nulling is preferred so that if nodes in the list are in different generations, it doesn't force a more expensive collection to be performed to discard the link chain. The use of lazySet maintains hygienic semantics without incurring the volatile write overhead.
Another example is the usage of volatile fields guarded by a lock, such as in ConcurrentHashMap. The fields are volatile to allow lock-free reads, but writes must be performed under a lock to ensure strict consistency. As the lock guarantees the happens-before edge on release, an optimization is to use lazySet when writing to the fields and flushing all of the updates when unlocking. This helps keep the critical section short by avoiding unnecessary stalls and bus traffic.
If you write a concurrent data structure, then lazySet is a good trick to be aware of. It's a low-level optimization, so it's only worth considering when performance tuning.
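The GC-nulling use case described above can be sketched as follows; Node and its unlink logic are simplified stand-ins, not a real queue implementation:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the GC-nulling use case: a dequeued node's link is nulled
// so the old chain can be collected. Node is a hypothetical example class.
class Node<E> {
    final E item;
    final AtomicReference<Node<E>> next = new AtomicReference<>();

    Node(E item) { this.item = item; }

    void unlink() {
        // The dead link is never read for its value again, so no
        // happens-before edge is needed: lazySet avoids the cost of a
        // full volatile write while still making the old chain
        // unreachable for GC.
        next.lazySet(null);
    }
}
```

On the writing thread itself, program order still guarantees that a subsequent get() sees the null; it is only cross-thread visibility that lazySet leaves unordered.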
Based on the Javadoc of Unsafe (putOrderedInt is used by AtomicInteger.lazySet):
/**
 * Version of {@link #putObjectVolatile(Object, long, Object)}
 * that does not guarantee immediate visibility of the store to
 * other threads. This method is generally only useful if the
 * underlying field is a Java volatile (or if an array cell, one
 * that is otherwise only accessed using volatile accesses).
 */
public native void putOrderedObject(Object o, long offset, Object x);

/** Ordered/Lazy version of {@link #putIntVolatile(Object, long, int)} */
public native void putOrderedInt(Object o, long offset, int x);
The backing fields in the AtomicXXX classes are volatile. lazySet appears to write to these fields as if they were not volatile, which removes the happens-before edges you are expecting. As noted in your link, this is useful for nulling values to make them eligible for GC without incurring the cost of a volatile write.
Edit:
This is to answer your update.
If you take a look at the quote you provided from the link then you lose any memory guarantees you had with the volatile write.
The lazySet will not be reordered above earlier writes, but without any other actual synchronization you lose the guarantee that the consumer will see the changes written before it. It is perfectly legal to delay the write of myAtomicFlag, and therefore any writes before it, until some other form of synchronization occurs.
So I am reading the book Java Concurrency in Practice and I am stuck on one explanation which I cannot comprehend without an example. This is the quote:
When thread A writes to a volatile variable and subsequently thread B reads that same variable, the values of all variables that were visible to A prior to writing to the volatile variable become visible to B after reading the volatile variable.
Can someone explain with an example why "the values of ALL variables that were visible to A prior to writing to the volatile variable become visible to B AFTER reading the volatile variable"?
I am confused about why all the other non-volatile variables are not visible to B before it reads the volatile variable.
Declaring a volatile Java variable means:
The value of this variable will never be cached thread-locally: all reads and writes will go straight to "main memory".
Access to the variable acts as though it is enclosed in a synchronized block, synchronized on itself.
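The second point can be illustrated by two classes that give the same visibility guarantee for a single field; VolatileCounter and SyncCounter are illustrative names, not a real API:

```java
// Two ways to get the same visibility guarantee for one field.
// Both publish the written value to any thread that subsequently reads it.
class VolatileCounter {
    private volatile int value;
    int get()       { return value; }   // volatile read
    void set(int v) { value = v; }      // volatile write
}

class SyncCounter {
    private int value;
    synchronized int get()       { return value; }  // lock acquire/release
    synchronized void set(int v) { value = v; }     // lock acquire/release
}
```

Note the equivalence is only about visibility: volatile gives no mutual exclusion, so a compound action like value++ is still not atomic on VolatileCounter, whereas a synchronized increment method on SyncCounter would be.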
Just for your reference: when is volatile needed?
When multiple threads use the same variable, each thread may keep its own copy of that variable in a local cache. When a thread updates the value, the update may land in that local cache rather than in main memory, so another thread using the same variable knows nothing about the change. To avoid this problem, declare the variable volatile: it will then not be held only in the local cache, and whenever a thread updates the value, the update is written to main memory, so other threads can access the updated value.
From JLS §17.4.7 Well-Formed Executions
We only consider well-formed executions. An execution E = < P, A, po, so, W, V, sw, hb > is well formed if the following conditions are true:

1. Each read sees a write to the same variable in the execution. All reads and writes of volatile variables are volatile actions. For all reads r in A, we have W(r) in A and W(r).v = r.v. The variable r.v is volatile if and only if r is a volatile read, and the variable w.v is volatile if and only if w is a volatile write.

2. Happens-before order is a partial order. Happens-before order is given by the transitive closure of synchronizes-with edges and program order. It must be a valid partial order: reflexive, transitive and antisymmetric.

3. The execution obeys intra-thread consistency. For each thread t, the actions performed by t in A are the same as would be generated by that thread in program-order in isolation, with each write w writing the value V(w), given that each read r sees the value V(W(r)). Values seen by each read are determined by the memory model. The program order given must reflect the program order in which the actions would be performed according to the intra-thread semantics of P.

4. The execution is happens-before consistent (§17.4.6).

5. The execution obeys synchronization-order consistency. For all volatile reads r in A, it is not the case that either so(r, W(r)) or that there exists a write w in A such that w.v = r.v and so(W(r), w) and so(w, r).
Useful Link : What do we really know about non-blocking concurrency in Java?
Thread B may have a CPU-local cache of those variables. A read of a volatile variable ensures that any cache flushes triggered by the previous write to that volatile are observed.
For an example, read the following link, which concludes with "Fixing Double-Checked Locking using Volatile":
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
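The linked article concludes with the volatile fix for double-checked locking, which can be sketched as follows; Foo follows the article's idiom, while Helper and its field are placeholder names:

```java
// Double-checked locking made safe by declaring the field volatile:
// the volatile write publishes the fully constructed Helper, and the
// volatile read in the fast path observes it.
class Foo {
    private volatile Helper helper;

    Helper getHelper() {
        Helper h = helper;                 // single volatile read on the fast path
        if (h == null) {
            synchronized (this) {
                h = helper;                // re-check under the lock
                if (h == null) {
                    h = new Helper();
                    helper = h;            // volatile write publishes the object
                }
            }
        }
        return h;
    }
}

class Helper { final int id = 42; }        // placeholder payload
```

Without the volatile, another thread could observe a non-null helper reference pointing at a partially constructed object; the volatile write/read pair rules that out.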
If a variable is non-volatile, then the compiler and the CPU, may re-order instructions freely as they see fit, in order to optimize for performance.
If the variable is now declared volatile, then the compiler no longer attempts to optimize accesses (reads and writes) to that variable. It may however continue to optimize access for other variables.
At runtime, when a volatile variable is accessed, the JVM emits appropriate memory barrier instructions to the CPU. The memory barrier serves the same purpose: the CPU is likewise prevented from re-ordering instructions.
When a volatile variable is written to (by thread A), all writes to other variables are completed (or will at least appear to be) and made visible to A before the write to the volatile variable; this is often due to a memory-write barrier instruction. Likewise, any reads of other variables (by thread B) will be performed only after the read of the volatile; this is often due to a memory-read barrier instruction. The ordering enforced by the barrier(s) means that all writes visible to A will be visible to B. This does not, however, mean that no re-ordering of instructions has happened (the compiler may have re-ordered other instructions); it simply means that if any writes were visible to A when it wrote the volatile, they will be visible to B after it reads it. In simpler terms, it means that strict program order is not maintained.
I will point to this writeup on Memory Barriers and JVM Concurrency, if you want to understand how the JVM issues memory barrier instructions, in finer detail.
Related questions
What is a memory fence?
What are some tricks that a processor does to optimize code?
Threads are allowed to cache variable values that other threads may have updated since they were read. The volatile keyword prevents threads from working with such cached values.
This is simply an additional bonus the memory model gives you, if you work with volatile variables.
Normally (i.e. in the absence of volatile variables and synchronization), the VM may make variables from one thread visible to other threads in any order it wants, or not at all. E.g. the reading thread could see some mixture of earlier versions of another thread's variable assignments. This is caused by the threads possibly running on different CPUs with their own caches, which are only sometimes copied to "main memory", and additionally by code reordering for optimization purposes.
If you use a volatile variable, then as soon as thread B reads some value X from it, the VM makes sure that anything which thread A wrote before it wrote X is also visible to B (and, transitively, everything which A itself was guaranteed to see).
Similar guarantees are given for synchronized blocks and other types of locks.