Java thread safety of list


I have a List which is to be used either in a thread-safe context or in a non-thread-safe one. Which of the two it will be is impossible to determine in advance.
In this special case, whenever the list enters a non-thread-safe context, I wrap it using
Collections.synchronizedList(...)
But I don't want to wrap it if it doesn't enter a non-thread-safe context, for example because the list is huge and used intensively.
I've read that Java's optimization policy is strict about multi-threading: if you don't synchronize your code correctly, it is not guaranteed to execute correctly in an inter-thread context. The JVM can reorganize code significantly while preserving consistency only within a single thread (see http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.3). For example,
op1;
op2;
op3;
may be reorganized into
op3;
op2;
op1;
if it produces the same result (in a single-threaded context).
Now I wonder: if I
1. fill my list before wrapping it with synchronizedList,
2. then wrap it,
3. then use it from a different thread,
is there a possibility that the other thread will see this list only partially filled, or not filled at all? Might the JVM postpone (1) until after (3)? Is there a correct and fast way to make a (big) non-thread-safe List thread-safe?

When you give your list to another thread by thread-safe means (for example using a synchronized block, a volatile variable or an AtomicReference), it is guaranteed that the second thread sees the whole list in the state it was when transferring (or any later state, but not an earlier state).
If you don't change it afterwards, you also don't need your synchronizedList.
Edit (after some comments, to back up my claim):
I assume the following:
we have a volatile variable list.
volatile List<String> list = null;
Thread A:
1. creates a List L and fills L with elements.
2. sets list to point to L (this means it writes L to list).
3. does no further modifications on L.
Sample source:
public void threadA() {
    List<String> L = new ArrayList<String>();
    L.add("Hello");
    L.add("World");
    list = L; // volatile write: publishes the filled list
}
Thread B:
1. reads K from list.
2. iterates over K, printing the elements.
Sample source:
public void threadB() {
    List<String> K = list;
    for (String s : K) {
        System.out.println(s);
    }
}
All other threads do not touch the list.
Now we have this:
The actions 1-A and 2-A in Thread A are ordered by program order so 1 comes before 2.
The action 1-B and 2-B in Thread B are ordered by program order so 1 comes before 2.
The action 2-A in Thread A and the action 1-B in Thread B are ordered by synchronization order, so 2-A comes before 1-B (assuming B's read sees A's write), since
A write to a volatile variable (§8.3.1.4) v synchronizes-with all subsequent reads of v by any thread (where subsequent is defined according to the synchronization order).
The happens-before order is the transitive closure of the program orders of the individual threads and the synchronizes-with relation. So we have:
1-A happens-before 2-A happens-before 1-B happens-before 2-B
and thus 1-A happens-before 2-B.
Finally,
If one action happens-before another, then the first is visible to and ordered before the second.
So our iterating thread really can see the whole list, and not only some parts of it.
So, transmitting the list with a single volatile variable is sufficient, and we don't need synchronization in this simple case.
One more edit (here, since I have more formatting freedom than in the comments) about the program order of Thread A. (I also added some sample Code above.)
From the JLS (section program order):
Among all the inter-thread actions performed by each thread t, the program order
of t is a total order that reflects the order in which these actions would be
performed according to the intra-thread semantics of t.
So, what are the intra-thread semantics of thread A?
Some paragraphs above:
The memory model determines what values can be read at every point in the program.
The actions of each thread in isolation must behave as governed by the semantics
of that thread, with the exception that the values seen by each read are
determined by the memory model. When we refer to this, we say that the program
obeys intra-thread semantics. Intra-thread semantics are the semantics for
single threaded programs, and allow the complete prediction of the behavior of
a thread based on the values seen by read actions within the thread. To determine
if the actions of thread t in an execution are legal, we simply evaluate the
implementation of thread t as it would be performed in a single threaded context,
as defined in the rest of this specification.
The rest of this specification includes section 14.2 (Blocks):
A block is executed by executing each of the local variable declaration
statements and other statements in order from first to last (left to right).
So, the program order is indeed the order in which the statements/expressions
are given in the program source code.
Thus, in our example source, the memory actions create a new ArrayList, add "Hello", add "World", and assign to list (the first three consist of more subactions) indeed are in this program order.
(The VM does not have to execute the actions in this order, but this program order still contributes to the happens-before order, and thus to the visibility to other threads.)
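The argument above can be tried out with a small runnable sketch. It is only an illustration under assumptions: the class name VolatilePublication and the helper method publishAndRead are made up here, and thread B spins until it sees a non-null reference so that its read is guaranteed to come after A's volatile write.

```java
import java.util.ArrayList;
import java.util.List;

class VolatilePublication {
    // Shared reference; volatile makes the write below a synchronization action.
    static volatile List<String> list = null;

    // Thread A fills a list and publishes it; thread B waits for it and copies what it sees.
    static List<String> publishAndRead() {
        list = null;
        final List<String> seen = new ArrayList<>();

        Thread a = new Thread(() -> {
            List<String> l = new ArrayList<>();
            l.add("Hello");
            l.add("World");
            list = l;                      // volatile write: publishes the filled list
        });
        Thread b = new Thread(() -> {
            List<String> k;
            while ((k = list) == null) {   // volatile read, repeated until the write is seen
                Thread.yield();
            }
            seen.addAll(k);                // everything A did before the write is visible here
        });

        b.start();
        a.start();
        try {
            a.join();
            b.join();                      // join makes B's writes to 'seen' visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead()); // prints [Hello, World]
    }
}
```

Note that the list itself is a plain ArrayList; this only works because nobody modifies it after the volatile write.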

If you fill your list and then wrap it in the same thread, you'll be safe.
However there are several things to bear in mind:
Collections.synchronizedList() only guarantees you a low-level thread safety. Complex operations, like if ( !list.contains( elem ) ) list.add( elem ); will still need custom synchronization code.
Even this guarantee is void if any thread can obtain a reference to the original list. Make sure this doesn't happen.
Get the functionality right first, then you can start worrying about synchronization being too slow. I very rarely encountered code where the speed of Java synchronization was a serious factor.
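As a rough sketch of the first point, a compound check-then-add can be made atomic by holding the wrapper's own lock around both calls. The Javadoc of Collections.synchronizedList documents synchronizing on the returned list for iteration, and the same lock is used here; the class and method names (SynchronizedListCompound, addIfAbsent, distinctCount) are invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedListCompound {
    // contains() and add() are each synchronized, but the pair is not,
    // so the caller holds the list's lock across both calls.
    static <T> boolean addIfAbsent(List<T> syncList, T elem) {
        synchronized (syncList) {
            if (!syncList.contains(elem)) {
                return syncList.add(elem);
            }
            return false;
        }
    }

    // Four threads all try to add the values 0..99; each value must end up in the list once.
    static int distinctCount() {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    addIfAbsent(list, i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // prints 100
    }
}
```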
Update: I'd like to add a few excerpts from the JLS to hopefully clarify matters a bit.
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
This is why filling the list and then wrapping it in the same thread is a safe option. But more importantly:
This is an extremely strong guarantee for programmers. Programmers do not need to reason about reorderings to determine that their code contains data races. Therefore they do not need to reason about reorderings when determining whether their code is correctly synchronized. Once the determination that the code is correctly synchronized is made, the programmer does not need to worry that reorderings will affect his or her code.
The message is clear: make sure that your program, executed in the order which you wrote your code in, doesn't contain data races, and don't worry about reordering.

If traversals happen more often than writes, I'd look into CopyOnWriteArrayList.
A thread-safe variant of ArrayList in
which all mutative operations (add,
set, and so on) are implemented by
making a fresh copy of the underlying
array.
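A small single-threaded sketch of what the fresh-copy behaviour means in practice: an iterator works on the snapshot of the array that existed when it was created, so adding during iteration neither throws ConcurrentModificationException nor changes what the loop visits. The class and method names are made up for the illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CopyOnWriteDemo {
    // Returns {number of elements visited by the loop, final size of the list}.
    static int[] iterateWhileAdding() {
        List<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");

        int visited = 0;
        for (String s : list) {   // iterates over the snapshot [a, b]
            list.add(s + "!");    // each add copies the backing array
            visited++;
        }
        return new int[] { visited, list.size() };
    }

    public static void main(String[] args) {
        int[] r = iterateWhileAdding();
        System.out.println(r[0] + " visited, size " + r[1]); // prints "2 visited, size 4"
    }
}
```

The price is that every write copies the whole array, which is why this only pays off when reads dominate.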

Take a look at how AtomicInteger (and the like) are implemented to be thread-safe without being synchronized. The mechanism does not introduce synchronization, but if one is needed, it handles it gracefully.

Related

Is it OK to modify items in an ArrayList from multiple threads, if those threads never modify the same item?

A bit of (simplified) context.
Let's say I have an ArrayList<ContentStub> where ContentStub is:
public class ContentStub {
    ContentType contentType;
    Object content;
}
And I have multiple implementations of classes that "inflate" stubs for each ContentType, e.g.
public class TypeAStubInflater {
    public void inflate(List<ContentStub> contentStubs) {
        contentStubs.forEach(stub -> {
            if (stub.contentType == ContentType.TYPE_A) {
                stub.content = someService.getContent();
            }
        });
    }
}
The idea being, there is TypeAStubInflater which only modifies items ContentType.TYPE_A running in one thread, and TypeBStubInflater which only modifies items ContentType.TYPE_B, etc. - but each instance's inflate() method is modifying items in the same contentStubs List, in parallel.
However:
No thread ever changes the size of the ArrayList
No thread ever attempts to modify a value that's being modified by another thread
No thread ever attempts to read a value written by another thread
Given all this, it seems that no additional measures to ensure thread-safety are necessary. From a (very) quick look at the ArrayList implementation, it seems that there is no risk of a ConcurrentModificationException - however, that doesn't mean that something else can't go wrong. Am I missing something, or is this safe to do?

In general, that will work, because you are not modifying the state of the List itself, which would throw a ConcurrentModificationException if any iterator is active at the time of looping, but rather are modifying just an object inside the list, which is fine from the list's POV.
I would recommend splitting up your list into a Map<ContentType, List<ContentStub>> and then starting threads with those specific lists.
You could convert your list to such a map with this:
Map<ContentType, List<ContentStub>> typeToStubsMap = stubs.stream().collect(Collectors.groupingBy(stub -> stub.contentType));
If your List is not that big (<1000 entries) I would even recommend not using any threading at all, but just a plain indexed for loop to iterate, or even .forEach if those 2 extra integers are no concern.

Let's assume that thread A writes TYPE_A content and thread B writes TYPE_B content. The List contentStubs is only used to obtain instances of ContentStub: read-access only. So from the perspective of A, B and contentStubs, there is no problem. However, the updates done by threads A and B are not guaranteed to ever be seen by another thread, e.g. another thread C may still conclude that stub.content == null for all elements in the list.
The reason for this is the Java Memory Model. If you don't use constructs like locks, synchronization, volatile and atomic variables, the memory model gives no guarantee if and when modifications of an object by one thread are visible for another thread. To make this a little more practical, let's have an example.
Imagine that a thread A executes the following code:
stub.content = someService.getContent(); // happens to be element[17]
List element 17 is a reference to a ContentStub object on the global heap. The VM is allowed to make a private thread copy of that object. All subsequent accesses to that reference in thread A use the copy. The VM is free to decide when and if to update the original object on the global heap.
Now imagine a thread C that executes the following code:
ContentStub stub = contentStubs.get(17);
The VM will likely do the same trick with a private copy in thread C.
If thread C already accessed the object before thread A updated it, thread C will likely use the – not updated – copy and ignore the global original for a long time. But even if thread C accesses the object for the first time after thread A updated it, there is no guarantee that the changes in the private copy of thread A already ended up in the global heap.
In short: without a lock or other synchronization, thread C has no guarantee of ever reading anything but null from stub.content.
The reason for this memory model is performance. On modern hardware, there is a trade-off between performance and consistency across all CPUs/cores. If the memory model of a modern language requires consistency, that is very hard to guarantee on all hardware and it will likely impact performance too much. Modern languages therefore embrace low consistency and offer the developer explicit constructs to enforce it when needed. In combination with instruction reordering by both compilers and processors, that makes old-fashioned linear reasoning about your program code … interesting.
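One of the simplest constructs that restores visibility here is Thread.join(): everything a thread did happens-before another thread's successful return from join() on it. The sketch below assumes a simplified ContentStub and a made-up inflate helper standing in for the inflater classes from the question; it is an illustration, not the original code.

```java
import java.util.ArrayList;
import java.util.List;

class DisjointWrites {
    enum ContentType { TYPE_A, TYPE_B }

    static class ContentStub {
        final ContentType contentType;
        Object content;                       // plain field, no volatile
        ContentStub(ContentType contentType) { this.contentType = contentType; }
    }

    // Each worker only touches stubs of its own type; the list structure is never modified.
    static void inflate(List<ContentStub> stubs, ContentType type, Object value) {
        for (ContentStub stub : stubs) {
            if (stub.contentType == type) {
                stub.content = value;
            }
        }
    }

    static long countInflated() {
        List<ContentStub> stubs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            stubs.add(new ContentStub(i % 2 == 0 ? ContentType.TYPE_A : ContentType.TYPE_B));
        }
        Thread a = new Thread(() -> inflate(stubs, ContentType.TYPE_A, "A content"));
        Thread b = new Thread(() -> inflate(stubs, ContentType.TYPE_B, "B content"));
        a.start();                            // start(): the filled list is visible to the workers
        b.start();
        try {
            a.join();                         // join(): the workers' writes are visible from here on
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long inflated = 0;
        for (ContentStub stub : stubs) {
            if (stub.content != null) {
                inflated++;
            }
        }
        return inflated;
    }

    public static void main(String[] args) {
        System.out.println(countInflated()); // prints 10
    }
}
```

A thread that reads the stubs without joining the writers (or without some other happens-before edge, such as an ExecutorService future's get()) gets no such guarantee.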

Explain how JIT reordering works

I have been reading a lot about synchronization in Java and all the problems that can occur. However, what I'm still slightly confused about is how the JIT can reorder a write.
For instance, a simple double check lock makes sense to me:
class Foo {
    private volatile Helper helper = null;      // 1
    public Helper getHelper() {                 // 2
        if (helper == null) {                   // 3
            synchronized(this) {                // 4
                if (helper == null)             // 5
                    helper = new Helper();      // 6
            }
        }
        return helper;
    }
}
We use volatile on line 1 to enforce a happens-before relationship. Without it, it is entirely possible for the JIT to reorder our code. For example:
Thread 1 is at line 6 and memory is allocated to helper; however, the constructor has not yet run because the JIT could reorder our code.
Thread 2 comes in at line 2 and gets an object that is not fully created yet.
I understand this, but I don't fully understand the limitations that the JIT has on reordering.
For instance, say I have a method that creates and puts a MyObject into a HashMap<String, MyObject> (I know that a HashMap is not thread safe and should not be used in a multi-thread environment, but bear with me). Thread 1 calls createNewObject:
public class MyObject {
    private Double value = null;
    public MyObject(Double value) {
        this.value = value;
    }
}
Map<String, MyObject> map = new HashMap<String, MyObject>();
public void createNewObject(String key, Double val) {
    map.put(key, new MyObject(val));
}
At the same time thread 2 calls a get from the Map.
public MyObject getObject(String key) {
    return map.get(key);
}
Is it possible for thread 2 to receive an object from getObject(String key) that is not fully constructed? Something like:
Thread 1: Allocate memory for new MyObject( val )
Thread 1: Place object in map
Thread 2: call getObject(String key)
Thread 1: Finish constructing the new MyObject.
Or will map.put(key, new MyObject( val )) not put an object into the map until it's fully constructed?
I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?
In a nutshell can it only reorder when creating a new Object and assigning it to a reference variable, like the double checked lock? A complete rundown on the JIT might be much for a SO answer, but what I'm really curious about is how it can reorder a write (like line 6 on the double checked lock) and what stops it from putting an object into a Map that is not fully constructed.

WARNING: WALL OF TEXT
The answer to your question is before the horizontal line. I will continue to explain deeper the fundamental problem in the second portion of my answer (which is not related to the JIT, so that's it if you are only interested in the JIT). The answer to the second part of your question lies at the bottom because it relies on what I describe further.
TL;DR: The JIT and the JMM may each do whatever they want, and the result is still valid, as long as you allow it by writing thread-unsafe code.
NOTE: "initialization" refers to what happens in the constructor, which excludes anything else such as calling a static init method after constructing etc...
"If the reordering produces results consistent with a legal execution, it is not illegal." (JLS 17.4.5-200)
If the result of a set of actions conforms to a valid chain of execution as per the JMM, then the result is allowed regardless of whether the author intended the code to produce that result or not.
"The memory model describes possible behaviors of a program. An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.
This provides a great deal of freedom for the implementor to perform a myriad of code transformations, including the reordering of actions and removal of unnecessary synchronization" (JLS 17.4).
The JIT will reorder whatever it sees fit unless we do not allow it using the JMM (in a multithreaded environment).
The details of what the JIT can or will do is nondeterministic. Looking at millions of samples of runs will not produce a meaningful pattern because reorderings are subjective, they depend on very specific details such as CPU arch, timings, heuristics, graph size, JVM vendor, bytecode size, etc... We only know that the JIT will assume that the code runs in a single threaded environment when it does not need to conform to the JMM. In the end, the JIT matters very little to your multithreaded code. If you want to dig deeper, see this SO answer and do a little research on such topics as IR Graphs, the JDK HotSpot source, and compiler articles such as this one. But again, remember that the JIT has very little to do with your multithreaded code transforms.
In practice, the "object that is not fully created yet" is not a side effect of the JIT but rather the memory model (JMM). In summary, the JMM is a specification that puts forth guarantees of what can and cannot be results of a certain set of actions, where actions are operations that involve a shared state. The JMM is more easily understood by higher level concepts such as atomicity, memory visibility, and ordering, those three of which are components of a thread-safe program.
To demonstrate this, it is highly unlikely for your first sample of code (the DCL pattern) to be modified by the JIT in a way that would produce "an object that is not fully created yet." In fact, I believe that it is not possible to do this because it would not follow the order of execution of a single-threaded program.
So what exactly is the problem here?
The problem is that if the actions aren't ordered by a synchronization order, a happens-before order, etc... (described again by JLS 17.4-17.5) then threads are not guaranteed to see the side effects of performing such actions. Threads might not flush their caches to update the field, threads might observe the write out of order. Specific to this example, threads are allowed to see the object in an inconsistent state because it is not properly published. I'm sure that you have heard of safe publishing before if you have ever worked even the tiniest bit with multithreading.
You might ask, well if single-threaded execution cannot be modified by the JIT, why can the multithreaded version be?
Put simply, it's because the thread is allowed to think ("perceive" as usually written in textbooks) that the initialization is out of order due to the lack of proper synchronization.
"If Helper is an immutable object, such that all of the fields of Helper are final, then double-checked locking will work without having to use volatile fields. The idea is that a reference to an immutable object (such as a String or an Integer) should behave in much the same way as an int or float; reading and writing references to immutable objects are atomic" (The "Double-Checked Locking is Broken" Declaration).
Making the object immutable ensures that the state is fully initialized when the constructor exits.
Remember that object construction is always unsynchronized. An object that is being initialized is ONLY visible and safe with respect to the thread that constructed it. In order for other threads to see the initialization, you must publish it safely. Here are those ways:
"There are a few trivial ways to achieve safe publication:
Exchange the reference through a properly locked field (JLS 17.4.5)
Use static initializer to do the initializing stores (JLS 12.4)
Exchange the reference via a volatile field (JLS 17.4.5), or as the consequence of this rule, via the AtomicX classes
Initialize the value into a final field (JLS 17.5)."
(Safe Publication and Safe Initialization in Java)
Safe publication ensures that other threads see the fully initialized object once publication finishes.
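As an illustration of the static-initializer route from the list above, here is a minimal sketch of the well-known initialization-on-demand holder idiom. It relies on class initialization (JLS 12.4) being performed under a lock by the JVM. The names HolderIdiom and HelperHolder, and the value 42, are invented for the example.

```java
class HolderIdiom {
    static final class Helper {
        final int value;
        Helper(int value) { this.value = value; }
    }

    // The nested class is not initialized until getHelper() first touches it.
    // The JVM runs the static initializer exactly once and publishes INSTANCE safely.
    private static final class HelperHolder {
        static final Helper INSTANCE = new Helper(42);
    }

    static Helper getHelper() {
        return HelperHolder.INSTANCE;   // no volatile and no explicit locking needed
    }

    public static void main(String[] args) {
        System.out.println(getHelper().value);            // prints 42
        System.out.println(getHelper() == getHelper());   // prints true
    }
}
```

This only replaces double-checked locking when the helper can be a static singleton; for a per-instance lazy field the volatile version is still needed.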
Revisiting our idea that threads are only guaranteed to see side effects if they are in order, the reason that you need volatile is so that your write to the helper in thread 1 is ordered with respect to the read in thread 2. Thread 2 is not allowed to perceive the initialization after the read because it occurs before the write to helper. It piggy backs on the volatile write such that the read must happen after the initialization AND THEN the write to the volatile field (transitive property).
To conclude, the initialization only appears to occur after the object is created because another thread perceives that order. An initialization will never actually occur after construction due to a JIT optimisation. You can fix this by ensuring proper publication through a volatile field or by making your helper immutable.
Now that I've described the general concepts behind how publication works in the JMM, hopefully understanding how your second example won't work will be easy.
I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?
To the constructing thread, it will put it into the map after initialization.
To the reader thread, it can see whatever the hell it wants. (improperly constructed object in HashMap? That is definitely within the realm of possibility).
What you described with your 4 steps is completely legal. There is no order between assigning value or adding it to the map, thus thread 2 can perceive the initialization out of order since MyObject was published unsafely.
You can actually fix this problem by just converting to ConcurrentHashMap and getObject() will be completely thread safe as once you put the object in the map, the initialization will occur before the put and both will need to occur before the get as a result of ConcurrentHashMap being thread safe. However, once you modify the object, it will become a management nightmare because you need to ensure that updating the state is visible and atomic - what if a thread retrieves an object and another thread updates the object before the first thread could finish modifying and putting it back in the map?
T1 -> get() MyObject=30 ------> +1 --------------> put(MyObject=31)
T2 -------> get() MyObject=30 -------> +1 -------> put(MyObject=31)
Alternatively you could also make MyObject immutable, but you still need to make the map a ConcurrentHashMap in order for other threads to see the put - thread caching behavior might cache an old copy and not flush and keep reusing the old version. ConcurrentHashMap ensures that its writes are visible to readers and ensures thread-safety. Recalling our 3 prerequisites for thread-safety, we get visibility from using a thread-safe data structure, atomicity by using an immutable object, and finally ordering by piggybacking on ConcurrentHashMap's thread safety.
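A hedged sketch of that combination: an immutable value class plus ConcurrentHashMap.compute, which performs the read-modify-write atomically per key and so avoids the lost update from the T1/T2 diagram. The immutable MyObject here is a simplified stand-in for the one in the question, and AtomicMapUpdate and incrementTwice are made-up names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AtomicMapUpdate {
    // Immutable: the only field is final and is set in the constructor.
    static final class MyObject {
        final double value;
        MyObject(double value) { this.value = value; }
    }

    // Two threads each add 1 to the value stored under the same key.
    static double incrementTwice() {
        ConcurrentMap<String, MyObject> map = new ConcurrentHashMap<>();
        map.put("k", new MyObject(30));

        // compute() runs the remapping function atomically for the key,
        // so the get / +1 / put sequence cannot interleave with another update.
        Runnable increment = () ->
                map.compute("k", (key, old) -> new MyObject(old.value + 1));

        Thread t1 = new Thread(increment);
        Thread t2 = new Thread(increment);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.get("k").value;
    }

    public static void main(String[] args) {
        System.out.println(incrementTwice()); // prints 32.0
    }
}
```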
To wrap up this entire answer, I will say that multithreading is a very difficult profession to master, one that I myself most definitely have not. By understanding concepts of what makes a program thread-safe and thinking about what the JMM allows and guarantees, you can ensure that your code will do what you want it to do. Bugs in multithreaded code occur often as a result of the JMM allowing a counterintuitive result that is within its parameters, not the JIT doing performance optimisations. Hopefully you will have learned something a little bit more about multithreading if you read everything. Thread safety should be achieved by building a repertoire of thread-safe paradigms rather than using little inconveniences of the spec (Lea or Bloch, not even sure who said this).

Why is this code thread safe?

I am preparing for the OCP exam and I found this question in a mock exam:
Given:
class Calculator {
    private AtomicInteger i = new AtomicInteger();
    public void add(int value) {
        int oldValue = i.get();
        int newValue = oldValue + value;
        System.out.print(i.compareAndSet(oldValue, newValue));
    }
    public int getValue() {
        return i.get();
    }
}
What could you do to make this class thread safe?
And surprisingly, to me, the answer is:
"Class Calculator is thread-safe"
It must be that I have not understood correctly the concept. To my understanding, a class is thread safe when all the methods work as expected under thread concurrency. Now, if two thread call at the same time getValue(), then call add() passing a different value, and then they call getValue() again, the 'second' thread won't see its value passed increased.
I understand that the oldValue and the newValue as local variables are stored in the method stack, but that doesn't prevent that the second call to compareAndSet will find that the oldValue is not the current value and so won't add the newValue.
What am I missing here?

According to JCIP
A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or
interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or
other coordination on the part of the calling code.
Although there is no definition of thread-safety and no specification of the class, in my opinion, by any sane definition of an add method in a Calculator class it is "correct" if the value of the AtomicInteger i is increased in any case, "regardless of the scheduling or interleaving of the execution".
Therefore, in my opinion, the class is not thread-safe by this definition.

There is clearly a problem with the term "thread-safe" here, in that it isn't absolute. What is considered thread-safe depends on what you expect the program to do. And in most real-world applications you wouldn't consider this code thread-safe.
However the JLS formally specifies a different concept:
A program is correctly synchronized if and only if all sequentially
consistent executions are free of data races.
If a program is correctly synchronized, then all executions of the
program will appear to be sequentially consistent
Correctly synchronized is a precisely defined, objective condition and according to that definition the code is correctly synchronized because every access to i is in a happens-before relationship with every other access, and that's enough to satisfy the criteria.
The fact that the exact order of reads/writes depends on unpredictable timing is a different problem, outside the realms of correct synchronization (but well within what most people would call thread safety).

The add method is going to do one of two things:
Add value to the value of i, and print true.
Do nothing and print false.
The theoretically sound [1] definitions of thread-safe that I have seen say something like this:
Given a set of requirements, a program is thread-safe with respect to those requirements if it is correct with respect to those requirements for all possible executions in a multi-threaded environment.
In this case, we don't have a clear statement of requirements, but if we infer that the intended behavior is as above, then that class is thread-safe with respect to the inferred requirements.
Now if the requirements were that add always added value to i, then that implementation does not meet the requirement. In that case, you could argue that the method is not thread-safe. (In a single-threaded use-case add would always work, but in a multi-threaded use-case the add method could occasionally fail to meet the requirement ... and hence would be non-thread-safe.)
[1] By contrast, the Wikipedia description (seen on 2016-01-17) is this: "A piece of code is thread-safe if it only manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time." The problem is that it doesn't say what "safe execution" means. Eric Lippert's 2009 blog post "What is this thing you call thread-safe" is really pertinent.

It's threadsafe because compareAndSet is threadsafe and that's the only part that's modifying shared resources in the object.
It doesn't make a difference how many threads enter that method body at the same time. The first one to reach the end and call compareAndSet "wins" and gets to change the value while the others find the value has changed on them and compareAndSet returns false. It never results in the system being in an indeterminate state, though callers would probably have to handle the false outcome in any realistic scenario.
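If the requirement is that add must always add, one common way to handle the false outcome is to retry the compare-and-set in a loop. The sketch below is a variant of the exam's class written for illustration (RetryingCalculator and concurrentSum are made-up names); AtomicInteger.addAndGet gives the same effect in a single call.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RetryingCalculator {
    private final AtomicInteger i = new AtomicInteger();

    public void add(int value) {
        int oldValue;
        int newValue;
        do {
            oldValue = i.get();               // re-read the current value on every attempt
            newValue = oldValue + value;
        } while (!i.compareAndSet(oldValue, newValue)); // retry if another thread got in first
    }

    public int getValue() {
        return i.get();
    }

    // Four threads each call add(1) a thousand times; no increment may be lost.
    static int concurrentSum() {
        RetryingCalculator calc = new RetryingCalculator();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    calc.add(1);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return calc.getValue();
    }

    public static void main(String[] args) {
        System.out.println(concurrentSum()); // prints 4000
    }
}
```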

What thread safety issues do java.util.Stack and java.util.Queue have?

If I ignore the size() inaccuracy, and assume I allocated a large enough underlying Vector so that no reallocation happens, what thread safety issues do java.util.Stack and java.util.Queue have?
I cannot think of a valid/reasonable consistency argument to say they are thread-unsafe.
Does anybody have any insights?

"Thread safe" isn't an absolute attribute for a class -- what's safe or unsafe is your usage of the object. You can come up with unsafe ways to use a ConcurrentHashMap, and you can come up with thread-safe ways to use a plain HashMap.
When people say a class is thread-safe, they generally mean that each method is implemented in a way that's thread-safe on its own. In that sense, a Stack is thread-safe. But its interface doesn't allow for easy/safe handling of common use cases, so in that sense it's not very thread-safe.
For instance, if your code checks that the Stack is not empty, and if so, pops an element -- that's unsafe because it could be that it had one element (and thus was not empty), but someone else popped it before you got a chance to (in which case you're trying to pop an empty stack, and will get an exception).
To be more thread-safe, you really need a single method that handles that case for you. A BlockingQueue gives you that. For instance, take() will block until there's a value to pop, while poll() will instantly return a value or null if there's no element to pop.
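A short sketch of both options, under the assumption that you either keep the Stack and guard the check-then-pop yourself, or switch to a BlockingQueue and use its single-call poll(). Stack's methods synchronize on the stack object itself, so locking on it composes with them. The names CheckThenAct, popOrNull and pollTwice are invented for the example.

```java
import java.util.Stack;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class CheckThenAct {
    // Holding the stack's own monitor makes isEmpty() + pop() one atomic step.
    static Integer popOrNull(Stack<Integer> stack) {
        synchronized (stack) {
            return stack.isEmpty() ? null : stack.pop();
        }
    }

    // poll() already combines the emptiness check and the removal in one call.
    static Integer[] pollTwice() {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        queue.offer(1);
        Integer first = queue.poll();    // 1
        Integer second = queue.poll();   // null: the queue is empty, no exception
        return new Integer[] { first, second };
    }

    public static void main(String[] args) {
        Stack<Integer> stack = new Stack<>();
        stack.push(7);
        System.out.println(popOrNull(stack));  // prints 7
        System.out.println(popOrNull(stack));  // prints null
        Integer[] polled = pollTwice();
        System.out.println(polled[0] + ", " + polled[1]); // prints "1, null"
    }
}
```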

Stack, which extends Vector, has every method synchronized. This means that interactions with individual methods are thread-safe.
Queue is an interface. The safety of use across threads is up to the individual implementations. For example, an ArrayBlockingQueue is thread safe, but a LinkedList is not.

Look at this method from ArrayBlockingQueue (leave any existing synchronisation aside):
private void insert(E x) {
    items[putIndex] = x;
    // HERE
    putIndex = inc(putIndex);
    ++count;
    notEmpty.signal();
}
Let thread A progress until HERE, and let thread B take over and execute the method; then let A continue. It is easy to see that B's E x overwrites A's E x, with count being incremented by 2 and putIndex being advanced twice.
Similar HEREs can be found in other methods as well.
All data structures with memory for data and variables for bookkeeping are blatantly vulnerable to unsynced concurrent access.

Interpretation of "program order rule" in Java concurrency

Program order rule states "Each action in a thread happens-before every action in that thread that comes later in the program order"
1. I read in another thread that an action is one of the following:
reads and writes to variables
locks and unlocks of monitors
starting and joining with threads
Does this mean that reads and writes can be changed in order, but reads and writes cannot change order with actions specified in 2nd or 3rd lines?
2. What does "program order" mean?
An explanation with examples would be really helpful.
Additional related question
Suppose I have the following code:
long tick = System.nanoTime(); //Line1: Note the time
//Block1: some code whose time I wish to measure goes here
long tock = System.nanoTime(); //Line2: Note the time
Firstly, it's a single threaded application to keep things simple. Compiler notices that it needs to check the time twice and also notices a block of code that has no dependency with surrounding time-noting lines, so it sees a potential to reorganize the code, which could result in Block1 not being surrounded by the timing calls during actual execution (for instance, consider this order Line1->Line2->Block1). But, I as a programmer can see the dependency between Line1,2 and Block1. Line1 should immediately precede Block1, Block1 takes a finite amount of time to complete, and immediately succeeded by Line2.
So my question is: Am I measuring the block correctly?
If yes, what is preventing the compiler from rearranging the order?
If no (which I think is correct after going through Enno's answer), what can I do to prevent it?
P.S.: I stole this code from another question I asked in SO recently.

It probably helps to explain why such a rule exists in the first place.
Java is a procedural language, i.e. you tell Java how to do something for you. If Java executed your instructions in a different order than you wrote them, the result would obviously be wrong. E.g. in the example below, if Java did 2 -> 1 -> 3, the stew would be ruined.
1. Take lid off
2. Pour salt in
3. Cook for 3 hours
So, why does the rule not simply say "Java executes what you wrote in the order you wrote"? In a nutshell, because Java is clever. Take the following example:
1. Take eggs out of the freezer
2. Take lid off
3. Take milk out of the freezer
4. Pour egg and milk in
5. Cook for 3 hours
If Java were like me, it would just execute the steps in order. However, Java is clever enough to understand that it is more efficient, AND that the end result would be the same, if it did 1 -> 3 -> 2 -> 4 -> 5 (you don't have to walk to the freezer again, and that doesn't change the recipe).
So what the rule "Each action in a thread happens-before every action in that thread that comes later in the program order" is trying to say is: "In a single thread, your program will run as if it was executed in the exact order you wrote it. We might change the ordering behind the scenes, but we make sure that none of that changes the output."
So far so good. Why does it not do the same across multiple threads? In multi-thread programming, Java isn't clever enough to do it automatically. It will for some operations (e.g. joining threads, starting threads, when a lock (monitor) is used etc.) but for other stuff you need to explicitly tell it to not do reordering that would change the program output (e.g. volatile marker on fields, use of locks etc.).
Note:
Quick addendum about the "happens-before relationship". This is a fancy way of saying that, no matter what reordering Java might do, stuff A will appear to happen before stuff B. In the second stew example, steps 1 and 3 happen-before step 4 ("Pour egg and milk in"), whereas steps 1 and 3 need no happens-before relationship with each other, because they don't depend on each other in any way.
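To make the "explicitly tell it" part concrete, here is a minimal sketch using a volatile field. VolatileHandoff and its members are hypothetical names, not taken from the question. The write to the plain field payload comes before the volatile write to ready in program order, and a thread that reads ready as true is guaranteed to also see that earlier write.

```java
class VolatileHandoff {
    private int payload = 0;                // plain field
    private volatile boolean ready = false; // volatile flag used for the handoff

    void publish(int value) {
        payload = value; // (1) ordinary write
        ready = true;    // (2) volatile write; (1) cannot be moved after it
    }

    int awaitPayload() {
        while (!ready) {     // volatile read; loops until it observes (2)
            Thread.yield();
        }
        return payload;      // guaranteed to observe (1)
    }

    // Publishes from a second thread and reads from the calling thread.
    static int demo(int value) {
        VolatileHandoff handoff = new VolatileHandoff();
        Thread writer = new Thread(() -> handoff.publish(value));
        writer.start();
        return handoff.awaitPayload();
    }

    public static void main(String[] args) {
        System.out.println(demo(42)); // prints 42
    }
}
```

Without the volatile modifier on ready, the reading loop would have no such guarantee: it might never see the flag change, or it might see the flag and still read a stale payload.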
On the additional question & response to the comment
First, let us establish what "time" means in the programming world. In programming, we have the notion of "absolute time" (what's the time in the world now?) and the notion of "relative time" (how much time has passed since x?). In an ideal world, time is time, but unless we have an atomic clock built in, the absolute time has to be corrected from time to time. For relative time, on the other hand, we don't want corrections, as we are only interested in the differences between events.
In Java, System.currentTimeMillis() deals with absolute time and System.nanoTime() deals with relative time. This is why the Javadoc of nanoTime states, "This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time".
In practice, both currentTimeMillis and nanoTime are native calls, so the compiler cannot practically prove that a reordering would not affect correctness, which means it will not reorder the calls.
But let us imagine we want to write a compiler implementation that actually looks into native code and reorders everything as long as it's legal. When we look at the JLS, all that it tells us is that "You can reorder anything as long as it cannot be detected". Now as the compiler writer, we have to decide if the reordering would violate the semantics. For relative time (nanoTime), it would clearly be useless (i.e. violates the semantics) if we'd reorder the execution. Now, would it violate the semantics if we'd reorder for absolute time (currentTimeMillis)? As long as we can limit the difference from the source of the world's time (let's say the system clock) to whatever we decide (like "50ms")*, I say no. For the below example:
long tick = System.currentTimeMillis();
result = compute();
long tock = System.currentTimeMillis();
print(result + ":" + (tock - tick));
If the compiler can prove that compute() takes less than whatever maximum divergence from the system clock we can permit, then it would be legal to reorder this as follows:
long tick = System.currentTimeMillis();
long tock = System.currentTimeMillis();
result = compute();
print(result + ":" + (tock - tick));
Since doing that won't violate the spec we defined, and thus won't violate the semantics.
You also asked why this is not included in the JLS. I think the answer would be "to keep the JLS short". But I don't know much about this realm so you might want to ask a separate question for that.
*: In actual implementations, this difference is platform dependent.

The program order rule guarantees that, within individual threads, reordering optimizations introduced by the compiler cannot produce different results from what would have happened if the program had been executed in serial fashion. It makes no guarantees about what order the thread's actions may appear to occur in to any other threads if its state is observed by those threads without synchronization.
Note that this rule speaks only to the ultimate results of the program, and not to the order of individual executions within that program. For instance, if we have a method which makes the following changes to some local variables:
x = 1;
z = z + 1;
y = 1;
The compiler remains free to reorder these operations however it sees fit to improve performance. One way to think of this is: if you could reorder these operations in your source code and still obtain the same results, the compiler is free to do the same. (In fact, it can go even further and completely discard operations which are shown to have no effect, such as invocations of empty methods.)
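The "reorder it in your source code" test can be tried by hand. In the sketch below (AsIfSerialDemo and both method names are invented for the illustration), the three statements are written once in the original order and once shuffled. Since none of them depends on another, both methods return the same values, which is exactly why a compiler may pick either order.

```java
import java.util.Arrays;

class AsIfSerialDemo {
    // The three statements in the order they appear in the text.
    static int[] sourceOrder(int z) {
        int x;
        int y;
        x = 1;
        z = z + 1;
        y = 1;
        return new int[] {x, y, z};
    }

    // The same statements shuffled by hand; no statement reads another's result.
    static int[] reordered(int z) {
        int x;
        int y;
        y = 1;
        z = z + 1;
        x = 1;
        return new int[] {x, y, z};
    }

    public static void main(String[] args) {
        // Both orders yield x = 1, y = 1, z = 6 for an input of 5.
        System.out.println(Arrays.equals(sourceOrder(5), reordered(5))); // prints true
    }
}
```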
With your second bullet point the monitor lock rule comes into play: "An unlock on a monitor lock happens-before every subsequent lock on that same monitor lock." (Java Concurrency in Practice p. 341) This means that a thread acquiring a given lock will have a consistent view of the actions which other threads performed before they released that lock. However, note that this guarantee only applies when the two threads release and acquire the same lock. If Thread A does a bunch of stuff before releasing Lock X, and then Thread B acquires Lock Y, Thread B is not assured to have a consistent view of A's pre-X actions.
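A small sketch of the monitor lock rule, assuming a hypothetical class SameLockVisibility in which both threads use the same lock object lockX. The reader keeps acquiring lockX until it finds the written flag set; once it does, the rule guarantees it also sees the value stored before the writer's unlock.

```java
class SameLockVisibility {
    private final Object lockX = new Object();
    private int value = 0;           // guarded by lockX
    private boolean written = false; // guarded by lockX

    void write(int v) {
        synchronized (lockX) {
            value = v;
            written = true;
        } // unlock of lockX: everything above is visible to the next thread locking lockX
    }

    // Polls under the same lock until the writer's update is seen.
    int readWhenWritten() {
        while (true) {
            synchronized (lockX) {
                if (written) {
                    return value;
                }
            }
            Thread.yield();
        }
    }

    // Writes from a second thread and reads from the calling thread.
    static int demo(int v) {
        SameLockVisibility shared = new SameLockVisibility();
        new Thread(() -> shared.write(v)).start();
        return shared.readWhenWritten();
    }

    public static void main(String[] args) {
        System.out.println(demo(7)); // prints 7
    }
}
```

If readWhenWritten() synchronized on a different object (a "Lock Y"), the guarantee would be gone, even though each method would still look properly locked on its own.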
It is possible for reads and writes to variables to be reordered with start and join if a.) doing so doesn't break within-thread program order, and b.) the variables have not had other "happens-before" thread synchronization semantics applied to them, say by storing them in volatile fields.
A simple example:
class ThreadStarter {
Object a = null;
Object b = null;
Thread thread;
ThreadStarter(Thread threadToStart) {
this.thread = threadToStart;
}
public void aMethod() {
a = new BeforeStartObject();
b = new BeforeStartObject();
thread.start();
a = new AfterStartObject();
b = new AfterStartObject();
a.doSomeStuff();
b.doSomeStuff();
}
}
Since the fields a and b and the method aMethod() are not synchronized in any way, and the action of starting thread does not change the results of the writes to the fields (or the doing of stuff with those fields), the compiler is free to reorder thread.start() to anywhere in the method. The only thing it could not do with the order of aMethod() would be to move the order of writing one of the BeforeStartObjects to a field after writing an AfterStartObject to that field, or to move one of the doSomeStuff() invocations on a field before the AfterStartObject is written to it. (That is, assuming that such reordering would change the results of the doSomeStuff() invocation in some way.)
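The critical thing to bear in mind here is that, in the absence of synchronization, the thread started in aMethod() could theoretically observe the fields a and b in more than one of the states which they take on during the execution of aMethod(). Note, though, that the JLS (17.4.5) specifies that a call to start() happens-before any action in the started thread, so the writes made before thread.start() are visible to it; the uncertainty concerns the writes made after start().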
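One way to remove the uncertainty on the reading side, without volatile fields or locks, is to wait for the other thread with join(): the JLS states that all actions in a thread happen-before any other thread successfully returns from a join() on that thread. A minimal sketch, with JoinVisibility and its fields being hypothetical names:

```java
class JoinVisibility {
    private String a = null; // plain, unsynchronized fields
    private String b = null;

    // Lets a worker thread write both fields, then reads them after join().
    static String[] demo() {
        JoinVisibility shared = new JoinVisibility();
        Thread worker = new Thread(() -> {
            shared.a = "a-from-worker";
            shared.b = "b-from-worker";
        });
        worker.start();
        try {
            worker.join(); // the worker's writes happen-before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new String[] {shared.a, shared.b};
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println(seen[0] + ", " + seen[1]); // prints a-from-worker, b-from-worker
    }
}
```

Reading the fields before join() returns would be a data race, and the caller could then see null or either assigned value.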
Additional question answer
The assignments to tick and tock cannot be reordered with respect to the code in Block1 if they are to be actually used in any measurements, for example by calculating the difference between them and printing the result as output. Such reordering would clearly break Java's within-thread as-if-serial semantics. It changes the results from what would have been obtained by executing instructions in the specified program order. If the assignments aren't used for any measurements and have no side-effects of any kind on the program result, they'll likely be optimized away as no-ops by the compiler rather than being reordered.
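As a sketch of the pattern from the question with the measurement actually being used: block1 below is a made-up stand-in for "Block1", and ElapsedTimeDemo is a hypothetical class name. Because both the computed result and the difference tock - tick are returned and printed, neither the block nor the two timing calls can be dropped or moved without changing the observable output.

```java
class ElapsedTimeDemo {
    // Hypothetical stand-in for "Block1": sums the integers 1..n.
    static long block1(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Returns {result of block1, elapsed nanoseconds}.
    static long[] timeBlock1(int n) {
        long tick = System.nanoTime(); // Line1: note the time
        long result = block1(n);       // Block1
        long tock = System.nanoTime(); // Line2: note the time
        return new long[] {result, tock - tick};
    }

    public static void main(String[] args) {
        long[] measured = timeBlock1(1_000_000);
        System.out.println("result  = " + measured[0]);
        System.out.println("elapsed = " + measured[1] + " ns");
    }
}
```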

Before I answer the question,
reads and writes to variables
Should be
volatile reads and volatile writes (of the same field)
Program order doesn't guarantee this happens before relationship, rather the happens-before relationship guarantees program order
To your questions:
Does this mean that reads and writes can be changed in order, but reads and writes cannot change order with actions specified in 2nd or 3rd lines?
The answer actually depends on what action happens first and what action happens second. Take a look at the JSR 133 Cookbook for Compiler Writers. There is a Can Reorder grid that lists the allowed compiler reordering that can occur.
For instance, a Volatile Store can be reordered above or below a Normal Store, but a Volatile Store cannot be reordered above or below a Volatile Load. This all assumes that intra-thread semantics still hold.
What does "program order" mean?
This is from the JLS
Among all the inter-thread actions performed by each thread t, the
program order of t is a total order that reflects the order in which
these actions would be performed according to the intra-thread
semantics of t.
In other words, if you can change the writes and loads of a variable in such a way that it will perform exactly the same way as you wrote it, then it maintains program order.
For instance
public static Object getInstance(){
if(instance == null){
instance = new Object();
}
return instance;
}
Can be reordered to
public static Object getInstance(){
Object temp = instance;
if(instance == null){
temp = instance = new Object();
}
return temp;
}

It simply means that, although threads may be multiplexed, the internal order of each thread's actions/operations/instructions remains constant (relative to each other).
thread1: T1op1, T1op2, T1op3...
thread2: T2op1, T2op2, T2op3...
Although the order of operations (TnopM) among threads may vary, the operations T1op1, T1op2, T1op3 within a thread will always be in this order, and likewise T2op1, T2op2, T2op3.
For example:
T2op1, T1op1, T1op2, T2op2, T2op3, T1op3

The Java tutorial http://docs.oracle.com/javase/tutorial/essential/concurrency/memconsist.html says that a happens-before relationship is simply a guarantee that memory writes by one specific statement are visible to another specific statement. Here is an illustration:
int x;
synchronized void x() {
x += 1;
}
synchronized void y() {
System.out.println(x);
}
synchronized creates a happens-before relationship. If we remove it, there is no guarantee that thread B will print 1 after thread A increments x; it may print 0.
