Interpretation of "program order rule" in Java concurrency

Program order rule states "Each action in a thread happens-before every action in that thread that comes later in the program order"
1. I read in another thread that an action is one of:
reads and writes to variables
locks and unlocks of monitors
starting and joining with threads
Does this mean that reads and writes can be reordered among themselves, but that reads and writes cannot be reordered with the actions specified in the 2nd or 3rd lines?
2. What does "program order" mean?
An explanation with examples would be really helpful.
Additional related question
Suppose I have the following code:
long tick = System.nanoTime(); //Line1: Note the time
//Block1: some code whose time I wish to measure goes here
long tock = System.nanoTime(); //Line2: Note the time
Firstly, it's a single-threaded application to keep things simple. The compiler notices that it needs to check the time twice and also notices a block of code that has no dependency on the surrounding time-noting lines, so it sees a potential to reorganize the code, which could result in Block1 not being surrounded by the timing calls during actual execution (for instance, consider this order: Line1 -> Line2 -> Block1). But I as a programmer can see the dependency between Lines 1 and 2 and Block1: Line1 should immediately precede Block1, Block1 takes a finite amount of time to complete, and Line2 should immediately follow it.
So my question is: Am I measuring the block correctly?
If yes, what is preventing the compiler from rearranging the order?
If no (which I think is correct after going through Enno's answer), what can I do to prevent it?
P.S.: I stole this code from another question I asked in SO recently.

It probably helps to explain why such a rule exists in the first place.
Java is a procedural language, i.e. you tell Java how to do something for you. If Java did not execute your instructions in the order you wrote them, it would obviously not work. E.g. in the below example, if Java did 2 -> 1 -> 3 then the stew would be ruined.
1. Take lid off
2. Pour salt in
3. Cook for 3 hours
So, why does the rule not simply say "Java executes what you wrote in the order you wrote it"? In a nutshell, because Java is clever. Take the following example:
1. Take eggs out of the freezer
2. Take lid off
3. Take milk out of the freezer
4. Pour egg and milk in
5. Cook for 3 hours
If Java were like me, it would just execute the steps in order. However, Java is clever enough to understand that it's more efficient AND that the end result would be the same should it do 1 -> 3 -> 2 -> 4 -> 5 (you don't have to walk to the freezer again, and that doesn't change the recipe).
So what the rule "Each action in a thread happens-before every action in that thread that comes later in the program order" is trying to say is: "In a single thread, your program will run as if it was executed in the exact order you wrote it. We might change the ordering behind the scenes, but we make sure that none of that would change the output."
So far so good. Why does it not do the same across multiple threads? In multi-threaded programming, Java isn't clever enough to do it automatically. It will for some operations (e.g. joining threads, starting threads, when a lock (monitor) is used, etc.), but for other stuff you need to explicitly tell it not to do reordering that would change the program output (e.g. the volatile marker on fields, use of locks, etc.).
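To make that concrete, here is a minimal sketch (the class and field names are invented for the illustration) of explicitly telling Java not to reorder in a way another thread could observe:

class Recipe {
    private int saltGrams;          // plain field, written before the flag
    private volatile boolean ready; // volatile: publishes the write above

    void cook() {          // runs on thread 1
        saltGrams = 10;    // happens-before the volatile write below
        ready = true;      // volatile write
    }

    void taste() {         // runs on thread 2
        if (ready) {       // volatile read
            // guaranteed to see saltGrams == 10, never a stale 0
            System.out.println(saltGrams);
        }
    }
}

Without the volatile marker, Java would be free to reorder (or cache) the write to saltGrams relative to the flag, and thread 2 could print 0.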
Note:
Quick addendum about the "happens-before relationship". This is a fancy way of saying that no matter what reordering Java might do, stuff A will happen before stuff B. In our weird latter stew example, steps 1 & 3 happen-before step 4, "Pour egg and milk in". Also, for example, steps 1 & 3 do not need a happens-before relationship between themselves, because they don't depend on each other in any way.
On the additional question & response to the comment
First, let us establish what "time" means in the programming world. In programming, we have the notion of "absolute time" (what's the time in the world now?) and the notion of "relative time" (how much time has passed since x?). In an ideal world, time is time, but unless we have an atomic clock built in, the absolute time would have to be corrected from time to time. On the other hand, for relative time we don't want corrections, as we are only interested in the differences between events.
In Java, System.currentTimeMillis() deals with absolute time and System.nanoTime() deals with relative time. This is why the Javadoc of nanoTime states, "This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time".
In practice, both currentTimeMillis and nanoTime are native calls and thus the compiler can't practically prove if a reordering won't affect the correctness, which means it will not reorder the execution.
But let us imagine we want to write a compiler implementation that actually looks into native code and reorders everything as long as it's legal. When we look at the JLS, all that it tells us is that "You can reorder anything as long as it cannot be detected". Now as the compiler writer, we have to decide if the reordering would violate the semantics. For relative time (nanoTime), it would clearly be useless (i.e. violates the semantics) if we'd reorder the execution. Now, would it violate the semantics if we'd reorder for absolute time (currentTimeMillis)? As long as we can limit the difference from the source of the world's time (let's say the system clock) to whatever we decide (like "50ms")*, I say no. For the below example:
long tick = System.currentTimeMillis();
result = compute();
long tock = System.currentTimeMillis();
print(result + ":" + (tock - tick));
If the compiler can prove that compute() takes less than whatever maximum divergence from the system clock we can permit, then it would be legal to reorder this as follows:
long tick = System.currentTimeMillis();
long tock = System.currentTimeMillis();
result = compute();
print(result + ":" + (tock - tick));
Doing that won't violate the spec we defined, and thus won't violate the semantics.
You also asked why this is not included in the JLS. I think the answer would be "to keep the JLS short". But I don't know much about this realm so you might want to ask a separate question for that.
*: In actual implementations, this difference is platform dependent.

The program order rule guarantees that, within individual threads, reordering optimizations introduced by the compiler cannot produce different results from what would have happened if the program had been executed in serial fashion. It makes no guarantees about what order the thread's actions may appear to occur in to any other threads if its state is observed by those threads without synchronization.
Note that this rule speaks only to the ultimate results of the program, and not to the order of individual executions within that program. For instance, if we have a method which makes the following changes to some local variables:
x = 1;
z = z + 1;
y = 1;
The compiler remains free to reorder these operations however it sees fit to improve performance. One way to think of this is: if you could reorder these ops in your source code and still obtain the same results, the compiler is free to do the same. (And in fact, it can go even further and completely discard operations which are shown to have no results, such as invocations of empty methods.)
With your second bullet point the monitor lock rule comes into play: "An unlock on a monitor happens-before every subsequent lock on that same monitor lock." (Java Concurrency in Practice p. 341) This means that a thread acquiring a given lock will have a consistent view of the actions which occurred in other threads before they released that lock. However, note that this guarantee only applies when two different threads release or acquire the same lock. If Thread A does a bunch of stuff before releasing Lock X, and then Thread B acquires Lock Y, Thread B is not assured to have a consistent view of A's pre-X actions.
It is possible for reads and writes to variables to be reordered with start and join if a.) doing so doesn't break within-thread program order, and b.) the variables have not had other "happens-before" thread synchronization semantics applied to them, say by storing them in volatile fields.
A simple example:
// Stub types for the example; any class with a doSomeStuff() method would do.
class BeforeStartObject { void doSomeStuff() {} }
class AfterStartObject extends BeforeStartObject {}

class ThreadStarter {
    BeforeStartObject a = null;
    BeforeStartObject b = null;
    Thread thread;

    ThreadStarter(Thread threadToStart) {
        this.thread = threadToStart;
    }

    public void aMethod() {
        a = new BeforeStartObject();
        b = new BeforeStartObject();
        thread.start();
        a = new AfterStartObject();
        b = new AfterStartObject();
        a.doSomeStuff();
        b.doSomeStuff();
    }
}
Since the fields a and b and the method aMethod() are not synchronized in any way, and the action of starting the thread does not change the results of the writes to the fields (or the doing of stuff with those fields), the compiler is free to reorder thread.start() to anywhere in the method. The only thing it could not do with the order of aMethod() would be to move the write of one of the BeforeStartObjects to a field after the write of an AfterStartObject to that field, or to move one of the doSomeStuff() invocations on a field before the AfterStartObject is written to it. (That is assuming that such reordering would change the results of the doSomeStuff() invocation in some way.)
The critical thing to bear in mind here is that, in the absence of synchronization, the thread started in aMethod() could theoretically observe either or both of the fields a and b in any of the states which they take on during the execution of aMethod() (including null).
Additional question answer
The assignments to tick and tock cannot be reordered with respect to the code in Block1 if they are to be actually used in any measurements, for example by calculating the difference between them and printing the result as output. Such reordering would clearly break Java's within-thread as-if-serial semantics. It changes the results from what would have been obtained by executing instructions in the specified program order. If the assignments aren't used for any measurements and have no side-effects of any kind on the program result, they'll likely be optimized away as no-ops by the compiler rather than being reordered.
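As a hedged sketch of that last point (compute() here is an invented stand-in for Block1): as long as both readings and the measured result feed into the program's output, the as-if-serial guarantee pins the block between the two readings:

class Timing {
    static long compute() { // stands in for Block1
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long tick = System.nanoTime();
        long result = compute();
        long tock = System.nanoTime();
        // Both readings and the result are used below, so per the reasoning
        // above, hoisting compute() out of the tick/tock pair would change
        // the observable output of the program.
        System.out.println(result + " took " + (tock - tick) + " ns");
    }
}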

Before I answer the question,
reads and writes to variables
should be
volatile reads and volatile writes (of the same field)
Program order doesn't guarantee this happens-before relationship; rather, the happens-before relationship guarantees program order.
To your questions:
Does this mean that reads and writes can be changed in order, but reads and writes cannot change order with actions specified in 2nd or 3rd lines?
The answer actually depends on what action happens first and what action happens second. Take a look at the JSR 133 Cookbook for Compiler Writers. There is a Can Reorder grid that lists the allowed compiler reordering that can occur.
For instance, a Volatile Store can be reordered with a subsequent Normal Store, but a Normal Store cannot be reordered with a subsequent Volatile Store, and a Volatile Store cannot be reordered above or below a Volatile Load. This is all assuming intra-thread semantics still hold.
What does "program order" mean?
This is from the JLS
Among all the inter-thread actions performed by each thread t, the program order of t is a total order that reflects the order in which these actions would be performed according to the intra-thread semantics of t.
In other words, if you can change the writes and loads of a variable in such a way that it will perform exactly the same way as you wrote it, then it maintains program order.
For instance
private static Object instance; // field assumed by the snippet

public static Object getInstance() {
    if (instance == null) {
        instance = new Object();
    }
    return instance;
}
Can be reordered to
public static Object getInstance() {
    Object temp = instance;
    if (instance == null) {
        temp = instance = new Object();
    }
    return temp;
}

It simply means that though the threads may be multiplexed (interleaved), the internal order of each thread's actions/operations/instructions remains constant (relatively):
thread1: T1op1, T1op2, T1op3...
thread2: T2op1, T2op2, T2op3...
Though the order of operations (TnOpM) among threads may vary, operations T1op1, T1op2, T1op3 within a thread will always be in this order, and likewise for T2op1, T2op2, T2op3.
For example, one possible interleaving:
T2op1, T1op1, T1op2, T2op2, T2op3, T1op3

The Java tutorial http://docs.oracle.com/javase/tutorial/essential/concurrency/memconsist.html says that the happens-before relationship is simply a guarantee that memory writes by one specific statement are visible to another specific statement. Here is an illustration:
int x;

synchronized void x() {
    x += 1;
}

synchronized void y() {
    System.out.println(x);
}
synchronized creates a happens-before relationship; if we remove it, there will be no guarantee that after thread A increments x, thread B will print 1 - it may print 0.


HashMap synchronized `put` but not `get`

I have the following code snippet and I'm trying to see if it can crash/misbehave at some point. The HashMap is being called from multiple threads, in which put is inside a synchronized block and get is not. Is there any issue with this code? If so, what modification do I need to make to see that happen, given that I only use put and get this way, and there are no putAll, clear or any other operations involved.
import java.util.HashMap;
import java.util.Map;

public class Main {
    Map<Integer, String> instanceMap = new HashMap<>();

    public static void main(String[] args) {
        System.out.println("Hello");
        Main main = new Main();
        Thread thread1 = new Thread("Thread 1") {
            public void run() {
                System.out.println("Thread 1 running");
                for (int i = 0; i <= 100; i++) {
                    System.out.println("Thread 1 " + i + "-" + main.getVal(i));
                }
            }
        };
        thread1.start();
        Thread thread2 = new Thread("Thread 2") {
            public void run() {
                System.out.println("Thread 2 running");
                for (int i = 0; i <= 100; i++) {
                    System.out.println("Thread 2 " + i + "-" + main.getVal(i));
                }
            }
        };
        thread2.start();
    }

    private String getVal(int key) {
        check(key);
        return instanceMap.get(key);
    }

    private void check(int key) {
        if (!instanceMap.containsKey(key)) {
            synchronized (instanceMap) {
                if (!instanceMap.containsKey(key)) {
                    // System.out.println(Thread.currentThread().getName());
                    instanceMap.put(key, "" + key);
                }
            }
        }
    }
}
What I have checked out:
Are size(), put(), remove(), get() atomic in Java synchronized HashMap?
Extending HashMap<K,V> and synchronizing only puts
Why does HashMap.get(key) needs to be synchronized when change operations are synchronized?
I somewhat modified your code:
removed System.out.println() from the "hot" loop, as it is internally synchronized
increased the number of iterations
changed the printing to only print when there's an unexpected value
There's much more we can do and try, but this already fails, so I stopped there. The next step would be to rewrite the whole thing to jcstress.
And voila, as expected, sometimes this happens on my Intel MacBook Pro with Temurin 17:
Exception in thread "Thread 2" java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "java.util.Map.get(Object)" is null
at com.gitlab.janecekpetr.playground.Playground.getVal(Playground.java:35)
at com.gitlab.janecekpetr.playground.Playground.lambda$0(Playground.java:21)
at java.base/java.lang.Thread.run(Thread.java:833)
Code:
// imports and class declaration reconstructed so the snippet stands alone
// (the stack trace above references Playground.java)
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

public class Playground {
    private record Val(int index, int value) {}

    private static final int MAX = 100_000;
    private final Map<Integer, Integer> instanceMap = new HashMap<>();

    public static void main(String... args) {
        Playground main = new Playground();
        Runnable runnable = () -> {
            System.out.println(Thread.currentThread().getName() + " running");
            Val[] vals = new Val[MAX];
            for (int i = 0; i < MAX; i++) {
                vals[i] = new Val(i, main.getVal(i));
            }
            System.out.println(Stream.of(vals).filter(val -> val.index() != val.value()).toList());
        };
        Thread thread1 = new Thread(runnable, "Thread 1");
        thread1.start();
        Thread thread2 = new Thread(runnable, "Thread 2");
        thread2.start();
    }

    private int getVal(int key) {
        check(key);
        return instanceMap.get(key);
    }

    private void check(int key) {
        if (!instanceMap.containsKey(key)) {
            synchronized (instanceMap) {
                if (!instanceMap.containsKey(key)) {
                    instanceMap.put(key, key);
                }
            }
        }
    }
}
To specifically explain the excellent sleuthing work in the answer by @PetrJaneček:
Every field in Java has an evil coin attached to it. Any time any thread reads the field, it flips this coin. It is not a fair coin - it is evil. It will flip heads 10,000 times in a row if that's going to ruin your day (for example, you may have code that depends on coin flips landing a certain way, or it'll fail to work. The coin is evil: you may run into the situation that, just to ruin your day, during all your extensive testing the coin flips heads, and during the first week in production it's all heads flips. And then the big new potential customer demos your app and the coin starts flipping some tails on you).
The coin flip decides which variant of the field is used - because every thread may or may not have a local cache of that field. When you write to a field from any thread? Coin is flipped; on tails, the local cache is updated and nothing more happens. Read from any thread? Coin is flipped; on tails, the local cache is used.
That's not really what happens of course (your JVM does not actually have evil coins nor is it out to get you), but the JMM (Java Memory Model), along with the realities of modern hardware, means that this abstraction works very well: It will reliably lead to the right answer when writing concurrent code, namely, that any field that is touched by more than one thread must have guards around it, or must never change at all during the entire duration of the multi-thread access 'session'.
You can force the JVM to flip the coin the way you want, by establishing so-called Happens-Before relationships. This is explicit terminology used by the JMM. If 2 lines of code have a Happens-Before relationship (one is defined as 'happening before' the other, as per the JMM's list of HB-relationship-establishing actions), then it is not possible (short of a bug in the JVM itself) to observe any side effect of the HA line whilst not also observing all side effects of the HB line. (That is to say: the 'happens before' line happens before the 'happens after' line as far as your code could ever tell, though it's a bit of a Schrödinger's cat situation. If your code doesn't actually look at these fields in a way that would ever let you tell, then the JVM is free to not do that. And it won't; you can rely on the evil coin being evil: if the JMM grants a 'right', there will be some combination of CPU, OS, JVM release, version, and phase of the moon that combine to use it.)
A small selection of common HB/HA establishing conditions:
The first line inside a synchronized(lock) block is HA relative to the hitting of that block in any other thread.
Exiting a synchronized(lock) block is HB relative to any other thread entering any synchronized(lock) block, assuming the two locks are the same reference.
thread.start() is HB relative to the first line that thread will run (see the sketch after this list).
The 'natural' HB/HA: line X is HB relative to line Y if X and Y are run by the same thread and X is 'before it' in your code. You can't write x = 5; y = x; and have y be set by a version of x that did not witness the x = 5 happening (of course, if another thread is also modifying x, all bets are off unless you have HB/HA with whatever line is doing that).
writes and reads to volatile establish HB/HA but you usually can't get any guarantees about which direction.
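As promised above, a minimal sketch of the thread.start() rule (class and field names are invented for the illustration):

class StartHB {
    static int data; // deliberately not volatile

    public static void main(String[] args) {
        data = 42; // this write...
        Thread t = new Thread(() -> {
            // ...is visible here, because start() is HB relative to the
            // first action of the started thread
            System.out.println(data); // guaranteed to print 42
        });
        t.start();
    }
}

If the write to data were moved after t.start(), no HB edge would cover it and the started thread could print either 0 or 42.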
This explains the way your code may fail: The get() call establishes absolutely no HB/HA relationship with the other thread that is calling put(), and therefore the get() call may or may not use locally cached variants of the various fields that HashMap uses internally, depending on the evil coin (which is of course hitting some fields; it'll be private fields in the HashMap implementation someplace, so you don't know which ones, but HashMap obviously has long-lived state, which implies fields are involved).
So why haven't you actually managed to 'see' your code asplode like the JMM says it will? Because the coin is EVIL. You cannot rely on this line of reasoning: "I wrote some code that should fail if the synchronizations I need aren't happening the way I want. I ran it a whole bunch of times, and it never failed, therefore, apparently this code is concurrency-safe and I can use this in my production code". That is simply not ever actually reliable. That's why you need to be thinking: Evil! That coin is out to get me! Because if you don't, you may be tempted to write test code like this.
You should be scared of writing code where more than one thread interacts with the same field. You should be bending over backwards to avoid it. Use message queues. Do your chat between threads by using databases, which have much nicer primitives for this stuff (transactions and isolation levels). Rewrite the code so that it takes a bunch of params up front and then runs without interacting with other threads via fields at all, until it is all done, and it then returns a result (and then use e.g. fork/join framework to make it all work). Make your webserver performant and using all the cores simply by relying on the fact that every incoming request will be its own thread, so the only thing that needs to happen for you to use all the cores is for that many folks to hit your server at the same time. If you don't have enough requests, great! Your server isn't busy so it doesn't matter you aren't using all the cores.
If truly you decide that interacting with the same field from multiple threads is the right answer, you need to think NASA programming mars rovers on the lines that interact with those fields, because tests simply cannot be relied upon. It's not as hard as it sounds - especially if you keep the actual interacting with the relevant fields down to a minimum and keep thinking: "Have I established HB/HA"?
In this case, I think Petr figured it out correctly: System.out.println is hella slow and does various synchronizing actions. The JMM is a package deal, and transitive: once HB/HA establishes, everything the HB line changed is observable to the code in the HA line, and add in the natural rule, which means all code that follows the HA line cannot possibly observe a universe where something any line before the HB line did is not yet visible. In other words, the System.out.println statements HB/HA with each other in some order, but you can't rely on that (System.out is not specced to synchronize. But just about every implementation does. You should not rely on implementation details, and I can trivially write you some Java code that is legal, compiles, runs, and breaks no contracts, because you can set System.out with System.setOut - to a stream that does not synchronize when interacting with System.out!). The evil coin in this case took the shape of 'accidental' synchronization via intentionally unspecced behaviour of System.out.
The following explanation is more in line with the terminology used in the JMM. It could be useful if you want a more solid understanding of this topic.
2 Actions are conflicting when they access the same address and there is at least 1 write.
2 Actions are concurrent when they are not ordered by a happens-before relation (there is no happens-before edge between them).
2 Actions are in data race when they are conflicting and concurrent.
When there are data races in your program, weird problems can happen like unexpected reordering of instructions, visibility problems, or atomicity problems.
So what makes up the happens-before relation? If a volatile read observes a particular volatile write, then there is a happens-before edge between the write and the read. This means that the read will not only see that write, but everything that happened before that write. There are other sources of happens-before edges, like the release of a monitor and a subsequent acquire of the same monitor. And there is a happens-before edge between A and B when A occurs before B in the program order. Note: the happens-before relation is transitive, so if A happens-before B and B happens-before C, then A happens-before C.
In your case, you have a get/put operations which are conflicting since they access the same address(es) and there is at least 1 write.
The put/get actions are concurrent, since there is no happens-before edge between the writing and the reading: even though the write releases the monitor, the get doesn't acquire it.
Since the put/get operations are concurrent and conflicting, they are in data race.
The simplest way to fix this problem is to execute the map.get in a synchronized block (using the same monitor). This will introduce the desired happens-before edge and make the actions sequential instead of concurrent; as a consequence, the data race disappears.
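One way to apply that fix to the getVal from the question (a sketch; synchronized is reentrant, so the inner synchronized block in check() is harmless):

private int getVal(int key) {
    synchronized (instanceMap) { // same monitor the writer uses in check()
        check(key);              // containsKey/put are now race-free too
        return instanceMap.get(key);
    }
}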
A better-performing solution would be to make use of a ConcurrentHashMap. Instead of a single central lock, there are many locks and they can be acquired concurrently to improve scalability and performance. I'm not going to dig into the optimizations of the ConcurrentHashMap because it would create confusion.
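A hedged sketch of that variant (the class name is invented); computeIfAbsent folds the whole check-then-put sequence into one atomic operation:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SafeCache {
    private final ConcurrentMap<Integer, Integer> instanceMap = new ConcurrentHashMap<>();

    int getVal(int key) {
        // Atomic check-then-put; also provides the happens-before edges we need.
        return instanceMap.computeIfAbsent(key, k -> k);
    }
}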
[Edit]
Apart from a data-race, your code also suffers from race conditions.

Explain how JIT reordering works

I have been reading a lot about synchronization in Java and all the problems that can occur. However, what I'm still slightly confused about is how the JIT can reorder a write.
For instance, a simple double check lock makes sense to me:
class Foo {
    private volatile Helper helper = null; // 1

    public Helper getHelper() {            // 2
        if (helper == null) {              // 3
            synchronized (this) {          // 4
                if (helper == null)        // 5
                    helper = new Helper(); // 6
            }
        }
        return helper;
    }
}
We use volatile on line 1 to enforce a happens-before relationship. Without it, it is entirely possible for the JIT to reorder our code. For example:
Thread 1 is at line 6 and memory is allocated to helper; however, the constructor has not yet run, because the JIT could reorder our code.
Thread 2 comes in at line 2 and gets an object that is not fully created yet.
I understand this, but I don't fully understand the limitations that the JIT has on reordering.
For instance, say I have a method that creates and puts a MyObject into a HashMap<String, MyObject> (I know that a HashMap is not thread safe and should not be used in a multi-threaded environment, but bear with me). Thread 1 calls createNewObject:
public class MyObject {
    private Double value = null;

    public MyObject(Double value) {
        this.value = value;
    }
}

Map<String, MyObject> map = new HashMap<String, MyObject>();

public void createNewObject(String key, Double val) {
    map.put(key, new MyObject(val));
}
At the same time thread 2 calls a get from the Map.
public MyObject getObject(String key) {
    return map.get(key);
}
Is it possible for thread 2 to receive an object from getObject(String key) that is not fully constructed? Something like:
Thread 1: Allocate memory for new MyObject( val )
Thread 1: Place object in map
Thread 2: call getObject(String key)
Thread 1: Finish constructing the new MyObject.
Or will map.put(key, new MyObject( val )) not put an object into the map until it's fully constructed?
I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?
In a nutshell, can it only reorder when creating a new Object and assigning it to a reference variable, like the double-checked lock? A complete rundown on the JIT might be too much for an SO answer, but what I'm really curious about is how it can reorder a write (like line 6 on the double-checked lock) and what stops it from putting an object into a Map that is not fully constructed.
WARNING: WALL OF TEXT
The answer to your question is before the horizontal line. I will continue to explain deeper the fundamental problem in the second portion of my answer (which is not related to the JIT, so that's it if you are only interested in the JIT). The answer to the second part of your question lies at the bottom because it relies on what I describe further.
TL;DR The JIT will do whatever it wants, the JMM will do whatever it wants, being valid under the condition that you let them by writing thread unsafe code.
NOTE: "initialization" refers to what happens in the constructor, which excludes anything else such as calling a static init method after constructing etc...
"If the reordering produces results consistent with a legal execution, it is not illegal." (JLS 17.4.5-200)
If the result of a set of actions conforms to a valid chain of execution as per the JMM, then the result is allowed regardless of whether the author intended the code to produce that result or not.
"The memory model describes possible behaviors of a program. An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.
This provides a great deal of freedom for the implementor to perform a myriad of code transformations, including the reordering of actions and removal of unnecessary synchronization" (JLS 17.4).
The JIT will reorder whatever it sees fit unless we do not allow it using the JMM (in a multithreaded environment).
The details of what the JIT can or will do are nondeterministic. Looking at millions of samples of runs will not produce a meaningful pattern because reorderings are subjective; they depend on very specific details such as CPU arch, timings, heuristics, graph size, JVM vendor, bytecode size, etc... We only know that the JIT will assume that the code runs in a single-threaded environment when it does not need to conform to the JMM. In the end, the JIT matters very little to your multithreaded code. If you want to dig deeper, see this SO answer and do a little research on such topics as IR Graphs, the JDK HotSpot source, and compiler articles such as this one. But again, remember that the JIT has very little to do with your multithreaded code transforms.
In practice, the "object that is not fully created yet" is not a side effect of the JIT but rather the memory model (JMM). In summary, the JMM is a specification that puts forth guarantees of what can and cannot be results of a certain set of actions, where actions are operations that involve a shared state. The JMM is more easily understood by higher level concepts such as atomicity, memory visibility, and ordering, those three of which are components of a thread-safe program.
To demonstrate this, it is highly unlikely for your first sample of code (the DCL pattern) to be modified by the JIT in a way that would produce "an object that is not fully created yet." In fact, I believe that it is not possible to do this because it would not follow the order of execution of a single-threaded program.
So what exactly is the problem here?
The problem is that if the actions aren't ordered by a synchronization order, a happens-before order, etc... (described again by JLS 17.4-17.5), then threads are not guaranteed to see the side effects of performing such actions. Threads might not flush their caches to update the field; threads might observe the write out of order. Specific to this example, threads are allowed to see the object in an inconsistent state because it is not properly published. I'm sure that you have heard of safe publication before if you have ever worked even the tiniest bit with multithreading.
You might ask, well if single-threaded execution cannot be modified by the JIT, why can the multithreaded version be?
Put simply, it's because the thread is allowed to think ("perceive" as usually written in textbooks) that the initialization is out of order due to the lack of proper synchronization.
"If Helper is an immutable object, such that all of the fields of Helper are final, then double-checked locking will work without having to use volatile fields. The idea is that a reference to an immutable object (such as a String or an Integer) should behave in much the same way as an int or float; reading and writing references to immutable objects are atomic" (The "Double-Checked Locking is Broken" Declaration).
Making the object immutable ensures that the state is fully initialized when the constructor exits.
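A hedged sketch of such an immutable Helper (the field and accessor are invented for the example); because every field is final, the JMM's final-field guarantees (JLS 17.5) mean any thread that sees a reference to it sees the fully initialized object:

class Helper {
    private final int state; // final: frozen once the constructor exits

    Helper(int state) {
        this.state = state;
    }

    int state() {
        return state;
    }
}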
Remember that object construction is always unsynchronized. An object that is being initialized is ONLY visible and safe with respect to the thread that constructed it. In order for other threads to see the initialization, you must publish it safely. Here are those ways:
"There are a few trivial ways to achieve safe publication:
Exchange the reference through a properly locked field (JLS 17.4.5)
Use static initializer to do the initializing stores (JLS 12.4)
Exchange the reference via a volatile field (JLS 17.4.5), or as the consequence of this rule, via the AtomicX classes
Initialize the value into a final field (JLS 17.5)."
(Safe Publication and Safe Initialization in Java)
Safe publication ensures that other threads will be able to see the fully initialized object once publication finishes.
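A compact sketch of two of the listed idioms, reusing the Helper sketched above (the class and field names are invented):

class Publication {
    // Static initializer: class initialization publishes EAGER safely (JLS 12.4).
    static final Helper EAGER = new Helper(1);

    // Volatile field: the volatile write publishes the fully built object.
    static volatile Helper shared;

    static void publish() {
        shared = new Helper(2); // safe publication via the volatile write
    }
}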
Revisiting our idea that threads are only guaranteed to see side effects if they are in order, the reason that you need volatile is so that your write to the helper in thread 1 is ordered with respect to the read in thread 2. Thread 2 is not allowed to perceive the initialization after the read, because it occurs before the write to helper. It piggybacks on the volatile write such that the read must happen after the initialization AND THEN the write to the volatile field (transitive property).
To conclude, an initialization will only appear to occur after the object is published because another thread THINKS that is the order. An initialization will never actually occur after construction due to a JIT optimisation. You can fix this by ensuring proper publication through a volatile field or by making your helper immutable.
Now that I've described the general concepts behind how publication works in the JMM, hopefully understanding how your second example won't work will be easy.
I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?
To the constructing thread, it will put it into the map after initialization.
To the reader thread, it can see whatever the hell it wants. (improperly constructed object in HashMap? That is definitely within the realm of possibility).
What you described with your 4 steps is completely legal. There is no order between assigning value or adding it to the map, thus thread 2 can perceive the initialization out of order since MyObject was published unsafely.
You can actually fix this problem by just converting to ConcurrentHashMap and getObject() will be completely thread safe as once you put the object in the map, the initialization will occur before the put and both will need to occur before the get as a result of ConcurrentHashMap being thread safe. However, once you modify the object, it will become a management nightmare because you need to ensure that updating the state is visible and atomic - what if a thread retrieves an object and another thread updates the object before the first thread could finish modifying and putting it back in the map?
T1 -> get() MyObject=30 ------> +1 --------------> put(MyObject=31)
T2 -------> get() MyObject=30 -------> +1 -------> put(MyObject=31)
Alternatively, you could also make MyObject immutable, but you still need to make the map a ConcurrentHashMap in order for other threads to see the put - thread caching behavior might cache an old copy, not flush, and keep reusing the old version. ConcurrentHashMap ensures that its writes are visible to readers and ensures thread-safety. Recalling our 3 prerequisites for thread-safety, we get visibility from using a thread-safe data structure, atomicity by using an immutable object, and finally ordering by piggybacking on ConcurrentHashMap's thread safety.
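A hedged sketch combining both suggestions for the question's code: an immutable MyObject (rewritten here as a record for brevity, so all fields are final) stored in a ConcurrentHashMap:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class Repository {
    // Immutable: a record's fields are final, so it is safe once constructed.
    record MyObject(double value) {}

    private final ConcurrentMap<String, MyObject> map = new ConcurrentHashMap<>();

    void createNewObject(String key, double val) {
        map.put(key, new MyObject(val)); // safely published by the concurrent map
    }

    MyObject getObject(String key) {
        return map.get(key); // never observes a half-constructed MyObject
    }
}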
To wrap up this entire answer, I will say that multithreading is a very difficult profession to master, one that I myself most definitely have not. By understanding concepts of what makes a program thread-safe and thinking about what the JMM allows and guarantees, you can ensure that your code will do what you want it to do. Bugs in multithreaded code occur often as a result of the JMM allowing a counterintuitive result that is within its parameters, not the JIT doing performance optimisations. Hopefully you will have learned something a little bit more about multithreading if you read everything. Thread safety should be achieved by building a repertoire of thread-safe paradigms rather than using little inconveniences of the spec (Lea or Bloch, not even sure who said this).

Does Java volatile read flush writes, and does volatile write update reads

I understand read-acquire(does not reorder with subsequent read/write operations after it), and write-release(does not reorder with read/write operations preceding it).
My questions are:
In case of read-acquire, do the writes preceding it get flushed?
In case of write-release, do the previous reads get updated?
Also, is read-acquire same as volatile read, and write release same as volatile write in Java?
Why this is important: let's take the case of write-release.
y = x; // a read.. let's say x is 1 at this point
System.out.println(y);// 1 printed
//or you can also consider System.out.println(x);
write_release_barrier();
//somewhere here, some thread sets x = 2
ready = true;// this is volatile
System.out.println(y);// or maybe, println(x).. what will be printed?
At this point, is x 2 or 1?
Here, consider ready to be volatile.
I understand that all stores before volatile will first be made visible.. and then only the volatile will be made visible. Thanks.
Ref:- http://preshing.com/20120913/acquire-and-release-semantics/
No: not all writes are flushed, nor are all reads updated.
Java works on a "happens-before" basis for multithreading. Basically, if A happens-before B, and B happens-before C, then A happens-before C. So your question amounts to whether x=2 formally happens-before some action that reads x.
Happens-before edges are basically established by synchronizes-with relationships, which are defined in JLS 17.4.4. There are a few different ways to do this, but for volatiles, it basically amounts to a write to a volatile happening-before a read of that same volatile:
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).
Given that, if your thread writes ready = true, then that write alone doesn't mean anything happens-before it (as far as that write is concerned). It's actually the opposite; that write to ready happens-before things on other threads, iff they read ready.
So, if the other thread (that sets x = 2) had written to ready after it set x = 2, and this thread (that you posted above) then read ready, then it would see x = 2. That is because the write happens-before the read, and the reading thread therefore sees everything that the writing thread had done (up to and including the write). Otherwise, you have a data race, and basically all bets are off.
A couple additional notes:
If you don't have a happens-before edge, you may still see the update; it's just that you're not guaranteed to. So, don't assume that if you don't read a write to ready, then you'll still see x=1. You might see x=1, or x=2, or possibly some other write (up to and including the default value of x=0)
In your example, y is always going to be 1, because you don't re-read x after the "somewhere here" comment. For purposes of this answer, I've assumed that there's a second y=x line immediately before or after ready = true. If there's not, then y's value will be unchanged from what it was in the first println (assuming no other thread directly changes it - which is guaranteed if it's a local variable), because actions within a thread always appear as if they are not reordered.
The Java memory model is not specified in terms of "read-acquire" and "write-release". These terms / concepts come from other contexts, and as the article you referenced makes abundantly clear, they are often used (by different experts) to mean different things.
If you want to understand how volatiles work in Java, you need to understand the Java memory model and the Java terminology ... which is (fortunately) well-founded and precisely specified1. Trying to map the Java memory model onto "read-acquire" and "write-release" semantics is a bad idea because:
"read-acquire" and "write-release" terminology and semantics are not well specified, and
a hypothetical JMM -> "read-acquire" / "write-release" semantic mapping is only one possible implementation of the JMM. Others mappings may exist with different, and equally valid semantics.
1 - ... modulo that experts have noted flaws in some versions of the JMM. But the point is that a serious attempt has been made to provide a theoretically sound specification ... as part of the Java Language Specification.
No, reading a volatile variable will not flush preceding writes. Visible actions will ensure that preceding actions are visible, but reading a volatile variable is not visible to other threads.
No, writing to a volatile variable will not clear the cache of previously read values. It is only guaranteed to flush previous writes.
In your example, clearly y will still be 1 on the last line. Only one assignment has been made to y, and that was 1, according to the preceding output. Perhaps that was a typo, and you meant to write println(x), but even then, the value of 2 is not guaranteed to be visible.
For your 1st question, the answer is that FIFO order applies.
For your 2nd question: please check Volatile Vs Static in java

Could the JIT collapse two volatile reads as one in certain expressions?

Suppose we have a volatile int a. One thread does
while (true) {
    a = 1;
    a = 0;
}
and another thread does
while (true) {
    System.out.println(a + a);
}
Now, would it be illegal for a JIT compiler to emit assembly corresponding to 2*a instead of a+a?
On one hand the very purpose of a volatile read is that it should always be fresh from memory.
On the other hand, there's no synchronization point between the two reads, so I can't see that it would be illegal to treat a+a atomically, in which case I don't see how an optimization such as 2*a would break the spec.
References to JLS would be appreciated.
Short answer:
Yes, this optimization is allowed. Collapsing two sequential read operations produces the observable behavior of the sequence being atomic, but does not appear as a reordering of operations. Any sequence of actions performed on a single thread of execution can be executed as an atomic unit. In general, it is difficult to ensure a sequence of operations executes atomically, and it rarely results in a performance gain because most execution environments introduce overhead to execute items atomically.
In the example given by the original question, the sequence of operations in question is the following:
read(a)
read(a)
Performing these operations atomically guarantees that the value read on the first line is equal to the value read on the second line. Furthermore, it means the value read on the second line is the value contained in a at the time the first read was executed (and vice versa, because, being atomic, both read operations occurred at the same time according to the observable execution state of the program). The optimization in question, which is reusing the value of the first read for the second read, is equivalent to the compiler and/or JIT executing the sequence atomically, and is thus valid.
Original longer answer:
The Java Memory Model describes operations using a happens-before partial ordering. In order to express the restriction that the first read r1 and second read r2 of a cannot be collapsed, you need to show that some operation is semantically required to appear between them.
The operations on the thread with r1 and r2 are the following:
--> r(a) --> r(a) --> add -->
To express the requirement that something (say y) lie between r1 and r2, you need to require that r1 happens-before y and y happens-before r2. As it happens, there is no rule where a read operation appears on the left side of a happens-before relationship. The closest you could get is saying y happens-before r2, but the partial order would allow y to also occur before r1, thus collapsing the read operations.
If no scenario exists which requires an operation to fall between r1 and r2, then you can declare that no operation ever appears between r1 and r2 and not violate the required semantics of the language. Using a single read operation would be equivalent to this claim.
Edit: My answer is getting voted down, so I'm going to go into additional details.
Here are some related questions:
Is the Java compiler or JVM required to collapse these read operations?
No. The expressions a and a used in the add expression are not constant expressions, so there is no requirement that they be collapsed.
Does the JVM collapse these read operations?
To this, I'm not sure of the answer. By compiling a program and using javap -c, it's easy to see that the Java compiler does not collapse these read operations. Unfortunately it's not as easy to prove the JVM does not collapse the operations (or even tougher, the processor itself).
Should the JVM collapse these read operations?
Probably not. Each optimization takes time to execute, so there is a balance between the time it takes to analyze the code and the benefit you expect to gain. Some optimizations, such as array bounds check elimination or checking for null references, have proven to have extensive benefits for real-world applications. The only case where this particular optimization has the possibility of improving performance is cases where two identical read operations appear sequentially.
Furthermore, as shown by the response to this answer along with the other answers, this particular change would result in an unexpected behavior change for certain applications which users may not desire.
Edit 2: Regarding Rafael's description of a claim that two read operations cannot be reordered. This statement is designed to highlight the fact that caching the read operation of a in the following sequence could produce an incorrect result:
a1 = read(a)
b1 = read(b)
a2 = read(a)
result = op(a1, b1, a2)
Suppose initially a and b have their default value 0. Then you execute just the first read(a).
Now suppose another thread executes the following sequence:
a = 1
b = 1
Finally, suppose the first thread executes the line read(b). If you were to cache the originally read value of a, you would end up with the following call:
op(0, 1, 0)
This is not correct. Since the updated value of a was stored before writing to b, there is no way to read the value b1 = 1 and then read the value a2 = 0. Without caching, the correct sequence of events leads to the following call.
op(0, 1, 1)
However, if you were to ask the question "Is there any way to allow the read of a to be cached?", the answer is yes. If you can execute all three read operations in the first thread sequence as an atomic unit, then caching the value is allowed. While synchronizing across multiple variables is difficult and rarely provides an opportunistic optimization advantage, it is certainly conceivable to encounter an exception. For example, suppose a and b are each 4 bytes, and they appear sequentially in memory with a aligned on an 8-byte boundary. A 64-bit process could implement the sequence read(a) read(b) as an atomic 64-bit load operation, which would allow the value of a to be cached (effectively treating all three read operations as an atomic operation instead of just the first two).
In my original answer, I argued against the legality of the suggested optimization. I backed this mainly with information from the JSR-133 cookbook, where it states that a volatile read must not be reordered with another volatile read and where it further states that a cached read is to be treated as a reordering. The latter statement is however formulated with some ambiguity, which is why I went through the formal definition of the JMM, where I did not find such an indication. Therefore, I would now argue that the optimization is allowed. However, the JMM is quite complex and the discussion on this page indicates that this corner case might be decided differently by someone with a more thorough understanding of the formalism.
Denoting thread 1 to execute
while (true) {
    System.out.println(a  // r_1
                     + a); // r_2
}
and thread 2 to execute:
while (true) {
    a = 0; // w_1
    a = 1; // w_2
}
The two reads r_i and two writes w_i of a are synchronization actions, as a is volatile (JLS 17.4.2). They are external actions, as variable a is used in several threads. These actions are contained in the set of all actions A. There exists a total order of all synchronization actions, the synchronization order, which is consistent with program order for thread 1 and thread 2 (JLS 17.4.4). From the definition of the synchronizes-with partial order, there is no edge defined for this order in the above code. As a consequence, the happens-before order only reflects the intra-thread semantics of each thread (JLS 17.4.5).
With this, we define W as a write-seen function where W(r_i) = w_2 and a value-written function V(w_i) = w_2 (JLS 17.4.6). I took some freedom and eliminated w_1, as it makes this outline of a formal proof even simpler. The question is if this proposed execution E is well-formed (JLS 17.5.7). The proposed execution E obeys intra-thread semantics, is happens-before consistent, obeys the synchronizes-with order, and each read observes a consistent write. Checking the causality requirements is trivial (JLS 17.4.8). I do not see why the rules for non-terminating executions would be relevant, as the loop covers the entire discussed code (JLS 17.4.9), and we do not need to distinguish observable actions.
For all this, I cannot find any indication of why this optimization would be forbidden. Nevertheless, it is not applied for volatile reads by the HotSpot VM as one can observe using -XX:+PrintAssembly. I assume that the performance benefits are however minor and this pattern is not normally observed.
Remark: After watching the Java Memory Model Pragmatics talk (multiple times), I am pretty sure this reasoning is correct.
On one hand the very purpose of a volatile read is that it should always be fresh from memory.
That is not how the Java Language Specification defines volatile. The JLS simply says:
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).
Therefore, a write to a volatile variable happens-before (and is visible to) any subsequent reads of that same variable.
This constraint is trivially satisfied for a read that is not subsequent. That is, volatile only ensures visibility of a write if the read is known to occur after the write.
This is not the case in your program. For every well formed execution that observes a to be 1, I can construct another well formed execution where a is observed to be 0, simply be moving the read after the write. This is possible because the happens-before relation looks as follows:
write 1 --> read 1          write 1 --> read 1
   |           |               |            |
   |           v               v            |
   v       --> read 1       write 0         v
write 0       |        vs.     |       --> read 0
   |          |                |            |
   v          v                v            v
write 1 --> read 1          write 1 --> read 1
That is, all the JMM guarantees for your program is that a+a will yield 0, 1 or 2. That is satisfied if a+a always yields 0. Just as the operating system is permitted to execute this program on a single core, and always interrupt thread 1 before the same instruction of the loop, the JVM is permitted to reuse the value - after all, the observable behavior remains the same.
In general, moving the read across the write violates happens-before consistency, because some other synchronization action is "in the way". In the absence of such intermediary synchronization actions, a volatile read can be satisfied from a cache.
Modified the OP Problem a little
volatile int a;

// Thread 1
while (true) {
    a = some_oddNumber;
    a = some_evenNumber;
}

// Thread 2
while (true) {
    if (isOdd(a + a)) {
        break;
    }
}
If the above code were executed sequentially, then there would exist a valid sequentially consistent execution which breaks the thread2 while loop.
Whereas if the compiler optimizes a+a to 2a, then the thread2 while loop will never exit.
So the above optimization would prohibit one particular execution that was possible under sequentially executed code.
Main question: is this optimization a problem?
Q. Is the transformed code sequentially consistent?
Ans. A program is correctly synchronized if, when it is executed in a sequentially consistent manner, there are no data races. Refer Example 17.4.8-1 from JLS chapter 17
Sequential consistency: the result of any execution is the same as if the read and write operations by all processes were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program [Lamport, 1979].
Also see http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.3
Sequential consistency is a strong guarantee. The execution path where the compiler optimizes a+a as 2a is also a valid sequentially consistent execution.
So the answer is yes.
Q. Does the transformed code respect the happens-before guarantees?
Ans. Sequential consistency implies that the happens-before guarantees are valid here.
So the answer is yes. JLS ref
So I don't think the optimization is legally invalid, at least in the OP's case.
The case where the Thread 2 while loop gets stuck in an infinite loop is also quite possible without the compiler transformation.
As laid out in other answers there are two reads and two writes. Imagine the following execution (T1 and T2 denote two threads), using annotations that match the JLS statement below:
T1: a = 0 //W(r)
T2: read temp1 = a //r_initial
T1: a = 1 //w
T2: read temp2 = a //r
T2: print temp1+temp2
In a concurrent environment this is definitely a possible thread interleaving. Your question is then: would the JVM be allowed to make r observe W(r) and read 0 instead of 1?
JLS #17.4.5 states:
A set of actions A is happens-before consistent if for all reads r in A, where W(r) is the write action seen by r, it is not the case that either hb(r, W(r)) or that there exists a write w in A such that w.v = r.v and hb(W(r), w) and hb(w, r).
The optimisation you propose (temp = a; print(2 * temp);) would violate that requirement. So your optimisation can only work if there is no intervening write between r_initial and r, which can't be guaranteed in a typical multithreaded framework.
As a side comment, note however that there is no guarantee as to how long it will take for the writes to become visible from the reading thread. See for example: Detailed semantics of volatile regarding timeliness of visibility.

Java thread safety of list

I have a List which is to be used either in a thread-safe context or in a non-thread-safe one. Which one it will be is impossible to determine in advance.
In this special case, whenever the list enters a non-thread-safe context, I wrap it using
Collections.synchronizedList(...)
But I don't want to wrap it if it doesn't enter a non-thread-safe context, e.g. because the list is huge and used intensively.
I've read that Java's optimization policy is strict about multi-threading - if you don't synchronize your code correctly, it is not guaranteed to execute correctly in an inter-thread context - it can reorganize code significantly, providing consistency in the context of only one thread (see http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.3). E.g.,
op1;
op2;
op3;
may be reorganized in
op3;
op2;
op1;
if it produces the same result (in a one-thread context).
Now I wonder, if I
fill my list before wrapping it by synchronizedList,
then wrap it up,
then use it from a different thread
- is there a possibility that the different thread will see this list filled only partially, or not filled at all? Might the JVM postpone (1) till after (3)? Is there a correct and fast way to make a (big) non-thread-safe List become thread-safe?
When you give your list to another thread by thread-safe means (for example using a synchronized block, a volatile variable or an AtomicReference), it is guaranteed that the second thread sees the whole list in the state it was when transferring (or any later state, but not an earlier state).
If you don't change it afterwards, you also don't need your synchronizedList.
Edit (after some comments, to backup my claim):
I assume the following:
we have a volatile variable list.
volatile List<String> list = null;
Thread A:
creates a List L and fills L with elements.
sets list to point to L (this means writes L to list)
does no further modifications on L.
Sample source:
public void threadA() {
    List<String> L = new ArrayList<String>();
    L.add("Hello");
    L.add("World");
    list = L;
}
Thread B:
reads K from list
iterates over K, printing the elements.
Sample source:
public void threadB() {
    List<String> K = list;
    for (String s : K) {
        System.out.println(s);
    }
}
All other threads do not touch the list.
Now we have this:
The actions 1-A and 2-A in Thread A are ordered by program order so 1 comes before 2.
The action 1-B and 2-B in Thread B are ordered by program order so 1 comes before 2.
The action 2-A in Thread A and action 1-B in Thread B are ordered by synchronization order, so 2-A comes before 1-B, since
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where subsequent is defined according to the synchronization order).
The happens-before-order is the transitive closure of the program orders of the individual threads and the synchronization order. So we have:
1-A happens-before 2-A happens-before 1-B happens-before 2-B
and thus 1-A happens-before 2-B.
Finally,
If one action happens-before another, then the first is visible to and ordered before the second.
So our iterating thread really can see the whole list, and not only some parts of it.
So, transmitting the list with a single volatile variable is sufficient, and we don't need synchronization in this simple case.
One more edit (here, since I have more formatting freedom than in the comments) about the program order of Thread A. (I also added some sample Code above.)
From the JLS (section program order):
Among all the inter-thread actions performed by each thread t, the program order of t is a total order that reflects the order in which these actions would be performed according to the intra-thread semantics of t.
So, what are the intra-thread semantics of thread A?
Some paragraphs above:
The memory model determines what values can be read at every point in the program. The actions of each thread in isolation must behave as governed by the semantics of that thread, with the exception that the values seen by each read are determined by the memory model. When we refer to this, we say that the program obeys intra-thread semantics. Intra-thread semantics are the semantics for single threaded programs, and allow the complete prediction of the behavior of a thread based on the values seen by read actions within the thread. To determine if the actions of thread t in an execution are legal, we simply evaluate the implementation of thread t as it would be performed in a single threaded context, as defined in the rest of this specification.
The rest of this specification includes section 14.2 (Blocks):
A block is executed by executing each of the local variable declaration statements and other statements in order from first to last (left to right).
So, the program order is indeed the order in which the statements/expressions are given in the program source code.
Thus, in our example source, the memory actions (create a new ArrayList, add "Hello", add "World", and assign to list - the first three consist of more subactions) are indeed in this program order.
(The VM does not have to execute the actions in this order, but this program order still contributes to the happens-before order, and thus to the visibility to other threads.)
If you fill your list and then wrap it in the same thread, you'll be safe.
However there are several things to bear in mind:
Collections.synchronizedList() only guarantees you low-level thread safety. Complex operations, like if ( !list.contains( elem ) ) list.add( elem );, will still need custom synchronization code (see the sketch after this list).
Even this guarantee is void if any thread can obtain a reference to the original list. Make sure this doesn't happen.
Get the functionality right first, then you can start worrying about synchronization being too slow. I have very rarely encountered code where the speed of Java synchronization was a serious factor.
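As mentioned in the first point above, a hedged sketch of what such custom synchronization looks like for the contains-then-add compound operation; the documented idiom for a synchronizedList wrapper is to lock on the wrapper itself:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AddIfAbsent {
    private final List<String> list =
            Collections.synchronizedList(new ArrayList<>());

    boolean addIfAbsent(String elem) {
        synchronized (list) { // the wrapper's own mutex guards both calls
            if (!list.contains(elem)) {
                return list.add(elem);
            }
            return false;
        }
    }
}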
Update: I'd like to add a few excerpts from the JLS to hopefully clarify matters a bit.
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
This is why filling the list and then wrapping it in the same thread is a safe option. But more importantly:
This is an extremely strong guarantee for programmers. Programmers do not need to reason about reorderings to determine that their code contains data races. Therefore they do not need to reason about reorderings when determining whether their code is correctly synchronized. Once the determination that the code is correctly synchronized is made, the programmer does not need to worry that reorderings will affect his or her code.
The message is clear: make sure that your program, executed in the order which you wrote your code in, doesn't contain data races, and don't worry about reordering.
If traversal happens more often than writing, I'd look into CopyOnWriteArrayList.
A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
Take a look at how AtomicInteger (and the like) is implemented to be thread safe and not synchronized. The mechanism does not introduce synchronization, but if one is needed, it handles it gracefully.
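A hedged sketch of the lock-free idiom such classes use internally, a compare-and-set retry loop (the wrapper class here is invented; AtomicInteger already provides incrementAndGet out of the box):

import java.util.concurrent.atomic.AtomicInteger;

class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    int increment() {
        while (true) {
            int current = value.get(); // volatile read
            int next = current + 1;
            // Succeeds only if no other thread updated value in between;
            // otherwise loop and retry with the fresh value.
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}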
