Java Memory Model in practice

I was trying to learn the Java Memory Model, but still cannot understand how people use it in practice.
I know that many people just rely on the appropriate memory barriers (as described in the Cookbook), but in fact the model itself does not operate in such terms.
The model introduces different orders defined on a set of actions and defines so-called "well-formed executions".
Some people try to explain the memory model's restrictions using one such order, namely "happens-before", but it seems that the order, at least by itself, does not define the acceptable executions:
"It should be noted that the presence of a happens-before relationship between two actions does not necessarily imply that they have to take place in that order in an implementation. If the reordering produces results consistent with a legal execution, it is not illegal."
My question is: how can one verify, in practice, that certain code or a change to it can lead to an "illegal execution" (according to the model)?
To be more concrete, let's consider a very simple example:
public class SomeClass {
    private int a;
    private int b;

    public void someMethod() {
        a = 2; // 1
        b = 3; // 2
    }
    // other methods
}
It's clear that within the thread w(a = 2) happens before w(b = 3) according to the program order.
How can the compiler/optimizer be sure that reordering 1 and 2 won't produce an "illegal execution" (strictly in terms of the model)? And why, if we make b volatile, would it?

Are you asking about how the VM/JIT analyzes the bytecode flow? That's far too broad to answer; entire research papers have been written about it, and what the VM actually implements may change from release to release.
Or is the question simply about which rules of the memory model govern what is "legal"? For the executing thread, the memory model already makes the strong guarantee that every action on a given thread appears to happen in program order for that thread. That means that if the JIT determines, by whatever method(s) it implements, that a reordering produces the same observable result(s), then that reordering is legal.
The presence of actions that establish happens-before guarantees with respect to other threads (such as volatile accesses) simply adds more constraints to the legal reorderings.
Simplified, it can be memorized like this: everything that happened earlier in program order also appears to have (already) happened, from the point of view of other threads, when a happens-before-establishing action is executed.
For your example that means: in the case of non-volatile (a, b), only the guarantee "appears to happen in program order" (to the executing thread) needs to be upheld. That means any reordering of the writes to (a, b) is legal; even delaying them until they are actually read (e.g. holding the value in a CPU register and bypassing main memory) would be valid. The JIT could even omit writing the fields at all if it detects they are never actually read before the object goes out of scope (and, to be precise, that there is also no finalizer using them).
Making b volatile in your example changes the constraints in that other threads reading b would also be guaranteed to see the last update of a because it happened before the write to b. Again simplified, happens-before actions extend some of the perceived ordering guarantees from the executing thread to other threads.
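A minimal sketch of that guarantee (the class and method names are illustrative, not taken from the question):

class Publisher {
    private int a;
    private volatile int b;

    void writer() {        // runs on thread 1
        a = 2;             // plain write
        b = 3;             // volatile write: publishes everything written before it
    }

    void reader() {        // runs on thread 2
        if (b == 3) {      // a volatile read that sees the volatile write...
            assert a == 2; // ...guarantees the earlier plain write is visible too
        }
    }
}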

It seems you are making the common mistake of thinking too much about low level aspects of the JMM in isolation. Regarding your question “how people use it in practice”, if you are talking about an application programmer, (s)he will use it in practice by not thinking about memory barriers or possible reorderings all the time.
Regarding your example:
public void someMethod() {
    a = 2; // 1
    b = 3; // 2
}
Given a and b are non-final, non-volatile.
It's clear that within the thread w(a = 2) happens before w(b = 3) according to the program order. How can the compiler/optimizer be sure that reordering 1 and 2 won't produce an "illegal execution" (strictly in terms of the model)?
Here it backfires that you are focusing on reordering in isolation. First of all, the resulting code (of HotSpot optimization, JIT compilation, etc.) does not need to write the values to the heap memory at all. It might hold the new values in CPU registers and use them from there in subsequent operations of the same thread. Only when reaching a point where these changes have to be made visible to other threads do they have to be written to the heap, and that may happen in arbitrary order.
But if, for example, the caller of the method enters an infinite loop after calling this method, the values don’t have to be written ever.
And why, if we make b volatile, would it?
Declaring b as volatile does not guarantee that a and b are written at all. This is another mistake, which arises from focusing on memory barriers.
Let’s go more abstract:
Suppose you have two concurrent actions, A and B. For concurrent execution in Java, there are several perfectly valid behaviors, including:
A might be executed entirely before B
B might be executed entirely before A
All or parts of A and B run in parallel
In the case where B is executed entirely before A, there is no sense in having a write barrier in A and a read barrier in B; B will still not notice any of A's activities. You can draw your own conclusions about the different parallel scenarios from this starting point.
This is where the happens-before relationship comes into play: a write of a value to a volatile variable happens-before a read of that value from that variable by another thread. If the read operation is executed before the write operation, the reading thread will not see the value; hence there is no happens-before relationship, and so there is no statement we can make about the other variables.
To stay with your example of b being volatile: this implies that if a reading thread reads b and sees the value 3, then, and only then, is it guaranteed to see the value 2 (or an even more recent value, if there are other writes) for a on subsequent reads.
So if a JVM can prove that there will never be a read operation on b that sees the written value, perhaps because the entire instance we are modifying will never be seen by another thread, there is no happens-before relationship to be established. In other words, b being volatile has no impact on the allowed code transformations in this case: it might be reordered as well, or even never written to the heap at all.
So the bottom line is that it is not useful to look at a small piece of code and ask whether it will allow reordering or whether it will contain a memory barrier. This might not even be answerable, as the answer might change depending on how the code is actually used. Only if your view is wide enough to see how threads will interact when accessing the data, and you can safely deduce whether a happens-before relationship will be established, can you start drawing conclusions about the correct working of the code. As you found out by yourself, correct working does not imply that you know whether reordering will happen or not at the lowest level.

Related

Is this understanding correct for these code about java volatile and reordering?

According to these reordering rules, if I have code like this:
volatile int a = 0;
boolean b = false;

void foo1() { a = 10; b = true; }
void foo2() { if (b) { assert a == 10; } }
Make thread A run foo1 and thread B run foo2. Since a = 10 is a volatile store and b = true is a normal store, these two statements could possibly be reordered, which means thread B may see b == true while a != 10? Is that correct?
Added:
Thanks for your answers!
I am just starting to learn about Java multi-threading and have been troubled a lot by the volatile keyword.
Many tutorials talk about the visibility of a volatile field, saying things like "a volatile field becomes visible to all readers (other threads in particular) after a write operation on it completes". My doubt is: how could a completed write to a field be invisible to other threads (or CPUs)?
As I understand it, a completed write means you have successfully written the field back to the cache, and according to MESI, all other threads should then have an invalid cache line if this field has been cached by them. One exception (since I am not very familiar with the hardware, this is just a conjecture) is that maybe the result is written back to a register instead of the cache, and I do not know whether there is some protocol to keep consistency in this situation, or whether volatile in Java prevents keeping the value in a register.
Here is an example of a situation where this "invisibility" seems to happen:
A = 0; B = 0;
thread1 { A = 1; B = 2; }
thread2 { if (B == 2) { /* A may be 0 here */ } }
Suppose the compiler did not reorder it; then what we see in thread2 is due to the store buffer, and I do not think a write operation sitting in a store buffer counts as a completed write. Because of the store buffer and invalidate queue strategy, the write to variable A looks invisible, but in fact the write operation simply has not finished when thread2 reads A. Even if we make field B volatile and the write to B enters the store buffer with memory barriers, thread 2 can still read b with the value 0 and finish. To me, volatile looks like it is not about the visibility of the field it declares, but more like an edge that makes sure all the writes which happen before the volatile write in thread A are visible to all operations after the volatile read (a volatile read that happens after the volatile write in thread A has completed) in another thread B.
By the way, since I am not a native speaker, I have seen many tutorials in my mother tongue (and also some English tutorials) that say volatile instructs JVM threads to read the value of the volatile variable from main memory and not cache it locally, and I do not think that is true. Am I right?
Anyway, thanks for your answers! Since I am not a native speaker, I hope I have expressed myself clearly.
I'm pretty sure the assert can fire. I think a volatile load is only an acquire operation (https://preshing.com/20120913/acquire-and-release-semantics/) wrt. non-volatile variables, so nothing is stopping load-load reordering.
Two volatile operations can't reorder with each other, but reordering with non-volatile operations is possible in one direction, and you picked the direction without guarantees.
(Caveat, I'm not a Java expert; it's possible but unlikely volatile has some semantics that require a more expensive implementation.)
More concrete reasoning is that if the assert can fire when translated into asm for some specific architecture, it must be allowed to fire by the Java memory model.
Java volatile is (AFAIK) equivalent to C++ std::atomic with the default memory_order_seq_cst. Thus foo2 can JIT-compile for ARM64 with a plain load for b and an LDAR acquire load for a.
ldar can't reorder with later loads/stores, but can with earlier ones (except for stlr release stores). ARM64 was specifically designed to make C++ std::atomic<> with memory_order_seq_cst / Java volatile efficient with ldar and stlr: the store buffer doesn't have to be flushed immediately on seq_cst stores, only upon encountering an ldar, so that design gives the minimal amount of ordering necessary to still recover sequential consistency as specified by C++ (and, I assume, Java).
On many other ISAs, sequential-consistency stores do need to wait for the store buffer to drain itself, so they are in practice ordered wrt. later non-atomic loads. And again on many ISAs, an acquire or SC load is done with a normal load preceded with a barrier which blocks loads from crossing it in either direction, otherwise they wouldn't work. That's why having the volatile load of a compile to an acquire-load instruction that just does an acquire operation is key to understanding how this can happen in practice.
(In x86 asm, all loads are acquire loads and all stores are release stores. Not sequential-release, though; x86's memory model is program order + store buffer with store-forwarding, which allows StoreLoad reordering, so Java volatile stores need special asm.
So the assert can't fire on x86, except via compile/JIT-time reordering of the assignments. This is a good example of one reason why testing lock-free code is hard: a failing test can prove there is a problem, but testing on some hardware/software combo can't prove correctness.)
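To make the testing caveat concrete, here is a hedged, minimal stress harness for the snippet above, using plain threads rather than a dedicated tool such as jcstress. A run that never prints anything proves nothing, especially on x86, where the message can only appear via JIT-time reordering:

class ReorderTest {
    static volatile int a = 0;
    static boolean b = false;

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 1_000_000; i++) {
            a = 0;
            b = false;
            Thread writer = new Thread(() -> { a = 10; b = true; });
            Thread reader = new Thread(() -> {
                if (b && a != 10) System.out.println("saw b == true but a != 10");
            });
            writer.start();
            reader.start();
            writer.join();
            reader.join();
        }
    }
}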
Answer to your addition.
Many tutorials talk about the visibility of a volatile field, saying things like "a volatile field becomes visible to all readers (other threads in particular) after a write operation on it completes". My doubt is: how could a completed write to a field be invisible to other threads (or CPUs)?
The compiler might mess up code.
e.g.
boolean stop;

void run() {
    while (!stop) println();
}
first optimization
void run() {
    boolean r1 = stop;
    while (!r1) println();
}
second optimization
void run() {
    boolean r1 = stop;
    if (r1) return;
    while (true) println();
}
So now it is obvious that this loop will never stop, because the new value of stop will effectively never be seen. For stores you can do something similar that postpones them indefinitely.
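For completeness, a sketch of why volatile forbids that hoisting, in the same style as the snippets above: the flag must be re-read on every iteration.

volatile boolean stop;

void run() {
    // every iteration must perform a real load of stop, so the loop
    // terminates once another thread sets the flag
    while (!stop) println();
}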
As I understand it, a completed write means you have successfully written the field back to the cache, and according to MESI, all other threads should then have an invalid cache line if this field has been cached by them.
Correct. This is normally called 'globally visible' or 'globally performed'.
One exception (since I am not very familiar with the hardware, this is just a conjecture) is that maybe the result is written back to a register instead of the cache, and I do not know whether there is some protocol to keep consistency in this situation, or whether volatile in Java prevents keeping the value in a register.
All modern processors are load/store architectures (even x86, after µop conversion), meaning that there are explicit load and store instructions that transfer data between registers and memory, and regular instructions like add/sub can only work with registers. So a register needs to be used anyway. The key part is that the compiler should respect the loads/stores of the source code and limit its optimizations.
Suppose the compiler did not reorder it; then what we see in thread2 is due to the store buffer, and I do not think a write operation sitting in a store buffer counts as a completed write. Because of the store buffer and invalidate queue strategy, the write to variable A looks invisible, but in fact the write operation simply has not finished when thread2 reads A.
On x86 the order of the stores in the store buffer is consistent with program order, and they will commit to the cache in program order. But there are architectures where stores from the store buffer can commit to the cache out of order, e.g. due to:
write coalescing
allowing stores to commit to the cache as soon as the cache line is returned in the right state, no matter whether an earlier store is still waiting
sharing the store buffer with a subset of the CPUs
Store buffers can be a source of reordering, but out-of-order and speculative execution can be a source as well.
Apart from the stores, reordering loads can also lead to observing stores out of order. On x86 loads can't be reordered, but on ARM it is allowed. And of course the JIT can mess things up as well.
Even if we make field B volatile and the write to B enters the store buffer with memory barriers, thread 2 can still read b with the value 0 and finish.
It is important to realize that the JMM gives you sequential consistency for data-race-free programs: even though it is a relaxed memory model (separating plain loads and stores from synchronization actions like volatile load/store and lock/unlock), if a program has no data races, it will only produce sequentially consistent executions. For sequential consistency the real-time order doesn't need to be respected, so it is perfectly fine for a load/store to be skewed as long as:
the memory order is a total order over all loads/stores
the memory order is consistent with the program order
a load sees the most recent write before it in the memory order
To me, volatile looks like it is not about the visibility of the field it declares, but more like an edge that makes sure all the writes which happen before the volatile write in thread A are visible to all operations after the volatile read (a volatile read that happens after the volatile write in thread A has completed) in another thread B.
You are on the right path.
Example.
int a = 0;
volatile int b = 0;

thread1() {
    1: a = 1;
    2: b = 1;
}

thread2() {
    3: r1 = b;
    4: r2 = a;
}
In this case there is a happens-before edge between 1 and 2 (program order). If r1 == 1, then there is a happens-before edge between 2 and 3 (volatile variable rule) and a happens-before edge between 3 and 4 (program order).
Because the happens-before relation is transitive, there is a happens-before edge between 1 and 4. So r2 must be 1.
volatile takes care of the following:
Visibility: it makes sure the load/store doesn't get optimized out.
Atomicity: a load/store is atomic, so it cannot be observed partially.
And most importantly, it makes sure that the order between 1-2 and 3-4 is preserved.
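The atomicity point is easiest to see with long: per JLS §17.7, a plain long write may be performed as two separate 32-bit writes on some (typically 32-bit) JVMs, while a volatile long may never tear. A sketch (the class name is illustrative):

class LongTearing {
    static volatile long x = 0L; // remove volatile and torn reads become legal

    static void writer() {
        for (;;) {
            x = 0L;  // all-zeros bit pattern
            x = -1L; // all-ones bit pattern
        }
    }

    static void reader() {
        for (;;) {
            long v = x;
            // with volatile, v must be 0 or -1; a mixed pattern would be a torn read
            if (v != 0L && v != -1L) throw new AssertionError("torn read: " + v);
        }
    }
}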
By the way, since I am not a native speaker, I have seen many tutorials in my mother tongue (and also some English tutorials) that say volatile instructs JVM threads to read the value of the volatile variable from main memory and not cache it locally, and I do not think that is true.
You are completely right. This is a very common misconception. Caches are the source of truth since they are always coherent. If every write needs to go to main memory, programs would become extremely slow. Memory is just a spill bucket for whatever doesn't fit in cache and can be completely incoherent with the cache. Plain/volatile loads/stores are stored in the cache. It is possible to bypass the cache for special situations like MMIO or when using e.g. SIMD instructions but it isn't relevant for these examples.
Anyway, thanks for your answers! Since I am not a native speaker, I hope I have expressed myself clearly.
Most people here are not native speakers (I'm certainly not). Your English is good enough and you show a lot of promise.
In addition to Peter Cordes's great answer: in terms of the JMM there is a data race on b, since there is no happens-before edge between the write of b and the read of b, because it is a plain variable. Only if this happens-before edge existed would you be guaranteed that, when the load of b sees 1, the load of a sees 1 as well.
Instead of making a volatile, you need to make b volatile.
int a = 0;
volatile int b = 0;

thread1() {
    a = 1;
    b = 1;
}

thread2() {
    if (b == 1) assert a == 1;
}
So if thread2 sees b == 1, then the write of b = 1 is ordered before that read in the happens-before order (volatile variable rule). And since a = 1 and b = 1 are ordered by happens-before (program order rule), and the read of b and the read of a are ordered by happens-before (program order rule again), then due to the transitive nature of the happens-before relation there is a happens-before edge between the write of a = 1 and the read of a, which therefore needs to see the value 1.
You are referring to a possible implementation of the JMM using fences. And although it provides some insights into what happens under the hood, it is equally damaging to think in terms of fences because they are not a suitable mental model. See the following counter example:
https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#myth-barriers-are-sane
Yes, the assert can fail.
volatile int a = 0;
boolean b = false;

void foo1() { a = 10; b = true; }
void foo2() { if (b) { assert a == 10; } }
The JMM guarantees that writes to volatile fields happen-before reads of them. In your example, whatever thread A did before a = 10 will happen-before whatever thread B does after reading a (while executing assert a == 10). But since b = true executes after a = 10 in thread A (within a single thread, program order always gives happens-before), and b is a plain variable, there is no ordering guarantee for b: thread B may see b == true without seeing a == 10. However, consider this:
int a = 0;
volatile boolean b = false;

void foo1() { a = 10; b = true; }
void foo2() { if (b) { assert a == 10; } }
In this example, the situation is:
a = 10 ---> b = true ---|
                        |
                        | (happens-before due to volatile's semantics)
                        |
                        |---> if (b) ---> assert a == 10
Since you have a total order, the assert is guaranteed to pass.

How does the volatile keyword work?

I'm reading about the volatile keyword, and I am going through the example below for more understanding.
public class TaskRunner {
    private static int number;
    private static boolean ready;

    private static class Reader extends Thread {
        @Override
        public void run() {
            while (!ready) {
                Thread.yield();
            }
            System.out.println(number);
        }
    }

    public static void main(String[] args) {
        new Reader().start();
        number = 42;
        ready = true;
    }
}
What I've understood is that in a Java application multiple threads can access a shared data structure at any point in time: some write to it for the first time, some update it, some read it, and so on.
While this is happening, every thread accesses the shared data structure's value from main memory only. But sometimes a thread's update to the shared data structure remains in its cache until it is written back to main memory. During that window, if any other thread accesses the shared data structure, it will not get the updated value, which was written by the last thread and is still in that thread's cache.
Volatile is used so that, once the shared data structure's value is changed, it is moved to main memory first, before it can be accessed by any other thread. Is that a correct understanding?
What's scenario where thread still not getting updated value even after using volatile ?
But sometimes a thread's update to the shared data structure remains in its cache until it is written back to main memory. During that window, if any other thread accesses the shared data structure, it will not get the updated value, which was written by the last thread and is still in that thread's cache.
It is not the OS. There are CPU instructions (memory barriers) that the JVM uses to make such writes visible. Honestly speaking, even that claim is imprecise, because the Java Memory Model says nothing about such instructions; they are just one way to implement the volatile behaviour.
Does the volatile keyword in Java really have to do with caches?
Java is rather high-level: As a language it is not designed for any particular CPU design. Furthermore, java compiles to bytecode, which is an intermediate product: Java does not offer, nor does it have the goal of, letting you write low-level CPU-architecture-specific operations.
And yet, caches are a low-level CPU architecture specific concept. Sure, every modern CPU has them, pretty much, but who knows what happens in 20 years?
So putting volatile in terms of what it does to CPU caches is skipping some steps.
volatile has an effect on your java code. That effect is currently implemented on most VMs I know of by sending the CPU some instructions about flushing caches.
It's better to deal with volatile at the java level itself, not at the 'well most VMs implement it like this' level - after all, that can change.
The way java is set up is essentially as follows:
If there are no comes-before relationships established between any two lines of code anywhere in Java, then you should assume that Java is like Schrödinger's cat: every thread both has and does not have a local cached copy of every field on every object loaded in the entire VM, and whenever you write or read anything, the universe flips a coin to determine whether you get the copy or not, and will always flip it to mess with you. During tests on your own machine, the coin flips to make the tests pass. During production on crunch weekend, when millions of dollars are on the line, it flips to make your code fail.
The only way out is to ensure your code doesn't depend on the coin flip.
The way to do that is to use the comes-before rules, which you can review in the Java Memory Model.
volatile is one way to add them.
In the above code, without the volatile, the Reader thread may always use its local copy of ready, and thus will NEVER be ready, even if it has been many hours since your main set ready to true. In practice that's unlikely, but the JMM says that a VM is allowed to coin-flip here: it may have your Reader thread continue almost immediately, it may hold it up for an hour, it may hold it up forever. All legal; this code is broken, because its behaviour depends on the coin flip, which is bad.
Once you introduce volatile, though, you establish a comes-before relationship and now you're guaranteeing that Reader continues. Effectively, volatile both disables coin flips on the variable so marked and also establishes comes-before once reads/writes matter:
IF a thread observes an updated value in a volatile variable, THEN all lines that ran before the write that updated that variable have a comes-before relationship with all lines that run after the reading thread's code observed the update.
So, to be clear:
Without any volatile marks here, it is legal for a VM to let Reader hang forever. It is also legal for a VM to let Reader continue (to let it observe that ready is now true) while Reader STILL sees that number is 0 (and not 42), even after it has passed the ready check! But it doesn't have to: the VM is also allowed to have Reader never pass the ready check, or to have it pass the ready check and observe 42. A VM is free to do it however it wants, whatever seems fastest for this particular mix of CPU, architecture, and phase of the moon right now.
With volatile, Reader WILL end up continuing sooner rather than later, and once it has done so, it will definitely observe 42. But if you swap ready = true; and number = 42;, that guarantee is no longer granted.
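A sketch of the fixed version implied by that last paragraph: marking ready volatile, and keeping the write order (number first, ready second), establishes the comes-before edge, so a Reader that sees ready == true must also see 42.

public class TaskRunner {
    private static int number;
    private static volatile boolean ready;

    private static class Reader extends Thread {
        @Override
        public void run() {
            while (!ready) {
                Thread.yield();
            }
            System.out.println(number); // guaranteed to print 42
        }
    }

    public static void main(String[] args) {
        new Reader().start();
        number = 42;  // plain write, ordered before the volatile write below
        ready = true; // volatile write publishes number as well
    }
}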

java, when (and for how long) can a thread cache the value of a non-volatile variable?

From this post: http://www.javamex.com/tutorials/synchronization_volatile_typical_use.shtml
public class StoppableTask extends Thread {
    private volatile boolean pleaseStop;

    public void run() {
        while (!pleaseStop) {
            // do some stuff...
        }
    }

    public void tellMeToStop() {
        pleaseStop = true;
    }
}
If the variable were not declared volatile (and without other synchronization), then it would be legal for the thread running the loop to cache the value of the variable at the start of the loop and never read it again.
In Java 5 or later:
is the last paragraph correct?
So, exactly at what moment can a thread cache the value of the pleaseStop variable (and for how long)? Just before calling one of StoppableTask's methods (run, tellMeToStop) on the object? (And must the thread update the variable when exiting the method, at the latest?)
can you point me to a documentation/tutorial reference about this (Java 5 or later)?
Update: here is my compilation of the answers posted on this question:
Without using volatile nor synchronized, there are actually two problems with the above program:
1- Threads can cache the variable pleaseStop from the very first moment the thread starts and never update it again, so the loop would keep going forever. This can be solved by either using volatile or synchronized. This thread caching mechanism does not exist in C.
2- The Java compiler can optimise the code and replace while (!pleaseStop) {...} with if (!pleaseStop) { while (true) {...} }, so the loop would keep going forever. Again, this can be solved by either using volatile or synchronized. This compiler optimisation also exists in C.
Some more info:
https://www.ibm.com/developerworks/library/j-5things15/
When can it cache?
As for your question about "when can it cache" the value, the answer to that is "always". To understand what that means, read on. Processors have storage called caches, which make it possible for the running thread to access values in memory by reading from the cache rather than from memory. The running thread can also write to this cache as if it were writing the value to memory. Thus, so long as the thread is running, it could be using the cache to store the data it's using. Something has to explicitly happen to flush the value from the cache to memory. For a single-threaded process, this is all well and dandy, but if you have another thread, it might be trying to read the data from memory while the other thread is plugging away reading and writing it to the processor cache without flushing to memory.
How long can it cache?
As for the "for how long" part: the answer is, unfortunately, forever unless you do something about it. Synchronizing on the data in question is one way to force a flush from the cache so that all threads see the updates to the value. For more detail about ways to cause a flush, see the next section.
Where's some Documentation?
As for the "where's the documentation" question, a good place to start is here. For specifically how you can force a flush, java refers to this by discussing whether one action (such as a data write) appears to "happen before" another (like a data read). For more about this, see here.
What about volatile?
volatile in essence prevents the type of processor caching described above. This ensures that all writes to a variable are visible from other threads. To learn more, the tutorial you linked to in your post seems like a good start.
The relevant documentation is on the volatile keyword (Java Language Specification, Chapter 8.3.1.4) here and the Java memory model (Java Language Specification, Chapter 17.4) here
Declaring the variable volatile ensures that there is some synchronization of actions by other threads that might change its value. Without declaring it volatile, Java can reorder operations on the variable taken by different threads.
As the spec says (see 8.3.1.4), for variables declared volatile, "accesses ... occur exactly as many times, and in exactly the same order, as they appear to occur during execution of the program text by each thread..."
So the caching you speak of can happen at any time if the variable is not volatile. But the Java memory model enforces consistent access to a variable that is declared volatile. No such enforcement takes place otherwise (unless the threads are synchronized).
The official documentation is in section 17 of the Java Language Specification, especially 17.4 Memory Model.
The correct viewpoint is to start by assuming multi-threaded code won't work, and try to force it to work whether it likes it or not. Without the volatile declaration, or similar, there would be nothing forcing the read of pleaseStop to ever see the write if it happens in another thread.
I agree with the Java Concurrency in Practice recommendation. It is a good distillation of the implications of the JLS material for practical Java programming.

In Java can I depend on reference assignment being atomic to implement copy on write?

If I have an unsynchronized java collection in a multithreaded environment, and I don't want to force readers of the collection to synchronize[1], is a solution where I synchronize the writers and use the atomicity of reference assignment feasible? Something like:
private Collection global = new HashSet(); // start threading after this

void allUpdatesGoThroughHere(Object exampleOperand) {
    // My hypothesis is that this prevents operations in the block being re-ordered
    synchronized (global) {
        Collection copy = new HashSet(global);
        copy.remove(exampleOperand);
        // Given my hypothesis, we should have a fully constructed object here. So a
        // reader will either get the old or the new Collection, but never an
        // inconsistent one.
        global = copy;
    }
}

// Do multithreaded reads here. All reads are done through a reference copy like:
//   Collection copy = global;
//   for (Object elm : copy) { ...
// so the global reference being updated half way through should have no impact
Rolling your own solution seems to often fail in these type of situations, so I'd be interested in knowing other patterns, collections or libraries I could use to prevent object creation and blocking for my data consumers.
[1] The reasons being a large proportion of time spent in reads compared to writes, combined with the risk of introducing deadlocks.
Edit: A lot of good information in several of the answers and comments, some important points:
A bug was present in the code I posted: synchronizing on global (a badly named variable) can fail to protect the synchronized block after a swap.
You could fix this by synchronizing on the class (moving the synchronized keyword to the method), but there may be other bugs. A safer and more maintainable solution is to use something from java.util.concurrent.
There is no "eventual consistency" guarantee in the code I posted; one way to make sure that readers do get to see the writers' updates is to use the volatile keyword.
On reflection, the general problem that motivated this question was trying to implement lock-free reads with locked writes in Java; however, my (solved) problem was with a collection, which may be unnecessarily confusing for future readers. So, in case it is not obvious: the code I posted works by allowing one writer at a time to perform edits to "some object" that is being read, unprotected, by multiple reader threads. Commits of the edit are done through an atomic operation, so readers can only get the pre-edit or post-edit "object". When/if a reader thread gets the update, it cannot occur in the middle of a read, as the read is occurring on the old copy of the "object". A simple solution that had probably been discovered, and proved to be broken in some way, prior to the availability of better concurrency support in Java.
Rather than trying to roll out your own solution, why not use a ConcurrentHashMap as your set and just set all the values to some standard value? (A constant like Boolean.TRUE would work well.)
I think this implementation works well with the many-readers-few-writers scenario. There's even a constructor that lets you set the expected "concurrency level".
Update: Veer has suggested using the Collections.newSetFromMap utility method to turn the ConcurrentHashMap into a Set. Since the method takes a Map<E,Boolean> my guess is that it does the same thing with setting all the values to Boolean.TRUE behind-the-scenes.
Update: Addressing the poster's example
That is probably what I will end up going with, but I am still curious about how my minimalist solution could fail. – MilesHampson
Your minimalist solution would work just fine with a bit of tweaking. My worry is that, although it's minimal now, it might get more complicated in the future. It's hard to remember all of the conditions you assume when making something thread-safe—especially if you're coming back to the code weeks/months/years later to make a seemingly insignificant tweak. If the ConcurrentHashMap does everything you need with sufficient performance then why not use that instead? All the nasty concurrency details are encapsulated away and even 6-months-from-now you will have a hard time messing it up!
You do need at least one tweak before your current solution will work. As has already been pointed out, you should probably add the volatile modifier to global's declaration. I don't know if you have a C/C++ background, but I was very surprised when I learned that the semantics of volatile in Java are actually much more complicated than in C. If you're planning on doing a lot of concurrent programming in Java then it'd be a good idea to familiarize yourself with the basics of the Java memory model. If you don't make the reference to global a volatile reference then it's possible that no thread will ever see any changes to the value of global until they try to update it, at which point entering the synchronized block will flush the local cache and get the updated reference value.
However, even with the addition of volatile there's still a huge problem. Here's a problem scenario with two threads:
We begin with the empty set, or global={}. Threads A and B both have this value in their thread-local cached memory.
Thread A obtains the synchronized lock on global and starts the update by making a copy of global and adding the new key to the set.
While Thread A is still inside the synchronized block, Thread B reads its local value of global onto the stack and tries to enter the synchronized block. Since Thread A is currently inside the monitor Thread B blocks.
Thread A completes the update by setting the reference and exiting the monitor, resulting in global={1}.
Thread B is now able to enter the monitor and makes a copy of the global={1} set.
Thread A decides to make another update, reads in its local global reference and tries to enter the synchronized block. Since Thread B currently holds the lock on {} there is no lock on {1} and Thread A successfully enters the monitor!
Thread A also makes a copy of {1} for purposes of updating.
Now Threads A and B are both inside the synchronized block and they have identical copies of the global={1} set. This means that one of their updates will be lost! This situation is caused by the fact that you're synchronizing on an object stored in a reference that you're updating inside your synchronized block. You should always be very careful which objects you use to synchronize. You can fix this problem by adding a new variable to act as the lock:
private volatile Collection global = new HashSet(); // start threading after this
private final Object globalLock = new Object(); // final reference used for synchronization

void allUpdatesGoThroughHere(Object exampleOperand) {
    // My hypothesis is that this prevents operations in the block being re-ordered
    synchronized (globalLock) {
        Collection copy = new HashSet(global);
        copy.remove(exampleOperand);
        // Given my hypothesis, we should have a fully constructed object here. So a
        // reader will either get the old or the new Collection, but never an
        // inconsistent one.
        global = copy;
    }
}
This bug was insidious enough that none of the other answers have addressed it yet. It's these kinds of crazy concurrency details that cause me to recommend using something from the already-debugged java.util.concurrent library rather than trying to put something together yourself. I think the above solution would work—but how easy would it be to screw it up again? This would be so much easier:
private final Set<Object> global = Collections.newSetFromMap(new ConcurrentHashMap<Object,Boolean>());
Since the reference is final you don't need to worry about threads using stale references, and since the ConcurrentHashMap handles all the nasty memory model issues internally you don't have to worry about all the nasty details of monitors and memory barriers!
According to the relevant Java Tutorial,
We have already seen that an increment expression, such as c++, does not describe an atomic action. Even very simple expressions can define complex actions that can decompose into other actions. However, there are actions you can specify that are atomic:
Reads and writes are atomic for reference variables and for most primitive variables (all types except long and double).
Reads and writes are atomic for all variables declared volatile (including long and double variables).
This is reaffirmed by Section §17.7 of the Java Language Specification
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values.
It appears that you can indeed rely on reference access being atomic; however, recognize that this does not ensure that all readers will read an updated value for global after this write -- i.e. there is no memory ordering guarantee here.
If you use an implicit lock via synchronized on all access to global, then you can forge some memory consistency here... but it might be better to use an alternative approach.
You also appear to want the collection in global to remain immutable... luckily, there is Collections.unmodifiableSet which you can use to enforce this. As an example, you should likely do something like the following...
private volatile Collection global = Collections.unmodifiableSet(new HashSet());
... that, or using AtomicReference,
private AtomicReference<Collection> global = new AtomicReference<>(Collections.unmodifiableSet(new HashSet()));
You would then use Collections.unmodifiableSet for your modified copies as well.
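If you go the AtomicReference route, the writer side can even drop the lock entirely by using a compare-and-set retry loop. A sketch under that assumption (the class and method names are made up for illustration):

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

class CopyOnWriteHolder {
    private final AtomicReference<Set<Object>> global =
            new AtomicReference<>(Collections.unmodifiableSet(new HashSet<Object>()));

    void remove(Object operand) {
        Set<Object> current;
        Set<Object> updated;
        do {
            current = global.get();
            Set<Object> copy = new HashSet<>(current);
            copy.remove(operand);
            updated = Collections.unmodifiableSet(copy);
        } while (!global.compareAndSet(current, updated)); // retry if another writer raced us
    }

    Set<Object> snapshot() {
        return global.get(); // an immutable, internally consistent snapshot
    }
}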
// ... All reads are done through a reference copy like:
// Collection copy = global;
// for (Object elm: copy) {...
// so the global reference being updated half way through should have no impact
You should know that making a copy here is redundant, as internally for (Object elm : global) creates an Iterator as follows...
final Iterator it = global.iterator();
while (it.hasNext()) {
Object elm = it.next();
}
There is therefore no chance of switching to an entirely different value for global in the midst of reading.
All that aside, I agree with the sentiment expressed by DaoWen... is there any reason you're rolling your own data structure here when there may be an alternative available in java.util.concurrent? I figured maybe you're dealing with an older Java, since you use raw types, but it won't hurt to ask.
You can find copy-on-write collection semantics provided by CopyOnWriteArrayList, or its cousin CopyOnWriteArraySet (which implements a Set using the former).
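A tiny usage sketch of CopyOnWriteArraySet; note that iteration runs over a snapshot, so concurrent writers never disturb a reader's loop:

import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

class CowSetDemo {
    public static void main(String[] args) {
        Set<Object> global = new CopyOnWriteArraySet<>();
        global.add("x");            // every mutation copies the backing array
        global.add("y");
        for (Object elm : global) { // iterates over a snapshot taken here
            System.out.println(elm);
        }
    }
}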
Also suggested by DaoWen, have you considered using a ConcurrentHashMap? They guarantee that using a for loop as you've done in your example will be consistent.
Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
Internally, an Iterator is used for enhanced for over an Iterable.
You can craft a Set from this by utilizing Collections.newSetFromMap like follows:
final Set<E> safeSet = Collections.newSetFromMap(new ConcurrentHashMap<E, Boolean>());
...
/* guaranteed to reflect the state of the set at read-time */
for (final E elem : safeSet) {
    ...
}
I think your original idea was sound, and DaoWen did a good job getting the bugs out. Unless you can find something that does everything for you, it's better to understand these things than hope some magical class will do it for you. Magical classes can make your life easier and reduce the number of mistakes, but you do want to understand what they are doing.
ConcurrentSkipListSet might do a better job for you here. It could get rid of all your multithreading problems.
However, it is slower than a HashSet (usually; HashSets and skip lists/trees are hard to compare). If you are doing a lot of reads for every write, what you've got will be faster. More importantly, if you update more than one entry at a time, your readers could see inconsistent results. If you expect that whenever there is an entry A there is an entry B, and vice versa, the skip list could give you one without the other.
With your current solution, to the readers, the contents of the map are always internally consistent. A read can be sure there's an A for every B. It can be sure that the size() method gives the precise number of elements that will be returned by the iterator. Two iterations will return the same elements in the same order.
In other words, allUpdatesGoThroughHere and ConcurrentSkipListSet are two good solutions to two different problems.
Can you use the Collections.synchronizedSet method? From HashSet Javadoc http://docs.oracle.com/javase/6/docs/api/java/util/HashSet.html
Set s = Collections.synchronizedSet(new HashSet(...));
Replace the synchronized by making global volatile and you'll be alright as far as the copy-on-write goes.
Although the assignment is atomic, in other threads it is not ordered with the writes to the object referenced. There needs to be a happens-before relationship which you get with a volatile or synchronising both reads and writes.
The problem of multiple updates happening at once is separate - use a single thread or whatever you want to do there.
If you used synchronized for both reads and writes then it'd be correct, but the performance may not be great, with reads needing to hand off. A ReadWriteLock may be appropriate, but you'd still have writes blocking reads.
Another approach to the publication issue is to use final field semantics to create an object that is (in theory) safe to be published unsafely.
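A minimal sketch of that final-field idiom (the Holder class is illustrative): because the field is final, any thread that sees a Holder instance, even one published via a data race, is guaranteed to see the fully constructed set it wraps (JLS §17.5), provided this does not escape the constructor.

import java.util.HashSet;
import java.util.Set;

class Holder {
    final Set<Object> set; // final field: safe even under racy publication

    Holder(Set<Object> contents) {
        this.set = new HashSet<>(contents); // fully built before the constructor returns
    }
}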
Of course, there are also concurrent collections available.

Understanding JVM Happens-before and reorder

I am reading the JLS spec on memory model, 17.4.5 Happens-before Order.
I do not understand the first rule:
"# If x and y are actions of the same thread and x comes before y in program
order, then hb(x, y)."
Let's assume A and B are objects (instances of some class) that can be shared between multiple threads:
int i=A.getNum(); // ActionA
int j=B.getNum(); // ActionB
Three questions:
According to the above rule, does it mean hb(ActionA, ActionB)?
If the answer to 1 is true, does it mean that, according to the happens-before rule, ActionB cannot be reordered to come before ActionA in any JVM that follows the JSR-133 memory model?
If 1 and 2 are both true: ActionA and ActionB seem unrelated, so why can't they be reordered? Is it just because the spec says so?
It is my understanding that:
you're right
they can be reordered, but only if action B doesn't depend on the result of action A
the happens-before relationship doesn't say anything about reordering actions; it only says that if hb(A, B) holds, then action B must see the memory effects of action A
If action B doesn't use any result of action A, then there is no reason why they cannot be reordered. (In general, "uses any result of another action" is pretty broad, and it can only be detected for quite simple actions, like memory reads/writes, not for actions using external resources like I/O operations, or time-based operations.)
Yes, ActionA happens before ActionB. Read further in that section though. It doesn't necessarily mean that the JVM can't reorder these. It means that ActionB must observe the effect of ActionA, that is all. If ActionB never depends on ActionA's effect, that is trivially true.
You are basically correct in your understanding. However, the key thing to remember is:
Reordering is allowed if it doesn't affect the outcome of the thread in which it runs
This doesn't mean reordering isn't allowed if it affects other threads
It is this last fact that is a common source of errors and bewilderment in multi-threaded programming in java.
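A sketch of that pitfall: the reordering below is invisible to the writer thread, whose own outcome is unchanged, but perfectly visible to a reader thread.

class Reordering {
    int x = 0, y = 0;

    void writer() { // thread 1: either write order gives this thread the same result,
        x = 1;      // so the compiler/CPU may perform y = 1 before x = 1
        y = 1;
    }

    void reader() { // thread 2
        int r1 = y; // may observe 1...
        int r2 = x; // ...while this still observes 0, exposing the reordering
    }
}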
