Concurrent variable modification: cannot fully understand this example

I need some help to fully understand what happens when I run this code:
public class Main extends Thread {
    private static int x;
    public static void main(String[] args) {
        Thread th1 = new Main("A");
        Thread th2 = new Main("B");
        th1.start();
        th2.start();
    }
    public Main(String n) {
        super(n);
    }
    public void run() {
        while(x<4) { //1
            x++; //2
            System.out.print(Thread.currentThread().getName()+x+" "); //3
        }
    }
}
I get the output
B2 B3 B4 A2
I understand that threads A and B both increment x, then B loops, incrementing and outputting... but why is the last output A2? Shouldn't A see x as 4 when executing //3?
Bonus question: why is it impossible for x to become 5?
EDIT
This question (in a slightly different form) comes from a mock test for OCP certification, where explanation states that x will never be 5. I'm glad to see that I'm not the only one to disagree.

When you update a variable's value in one thread, the new value is not necessarily visible to other threads immediately. One reason is that values may be held in CPU caches or registers, which can be read and written much more quickly than main memory.
Periodically, the updated contents of the cache are copied to main memory. It is only when this happens that other threads see updates to values.
What it looks like is happening here is that B is updating the value, but that value is not being committed to main memory; as such, A sees old values of it.
If you make the variable volatile, all reads and writes are done directly from/to main memory (or, at least, the cache is refreshed from/flushed to main memory), so updates to the values are visible immediately to all threads.
Note, however, that you are not performing atomic reads and writes: it is possible for another thread to update the value of x in between the current thread checking x < 4 and incrementing x++. As such, you might end up with a value of 5 being printed.
The easiest way to fix this is to make the checking/incrementing synchronized:
synchronized (Main.class) {
    if (x < 4) {
        x++;
        System.out.println(...);
    }
}
This ensures that updates to x are visible in all threads, and also that only one thread can check/increment x at once.
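For illustration, here is a minimal runnable sketch of that fix applied to the whole loop. The class name SynchronizedCounter and the helper runOnce are made up for this example; the check and the increment sit inside one synchronized block, so the final value can never exceed 4, although the order of the A and B outputs still varies between runs.

```java
class SynchronizedCounter {
    private static int x;

    // Hypothetical helper: runs threads A and B to completion and returns the final x.
    static int runOnce() throws InterruptedException {
        x = 0; // safe: Thread.start() below happens-before the threads' first actions
        Runnable task = () -> {
            while (true) {
                synchronized (SynchronizedCounter.class) {
                    if (x >= 4) {
                        return; // the check and the increment share one lock
                    }
                    x++;
                    System.out.print(Thread.currentThread().getName() + x + " ");
                }
            }
        };
        Thread a = new Thread(task, "A");
        Thread b = new Thread(task, "B");
        a.start();
        b.start();
        a.join(); // join() makes the threads' writes visible to the caller
        b.join();
        return x;
    }

    public static void main(String[] args) throws InterruptedException {
        int result = runOnce();
        System.out.println();
        System.out.println("final x = " + result);
    }
}
```

The lock is taken once per iteration rather than around the whole loop, so both threads still get a chance to run.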

This is a classic race condition. When you call th1.start() and th2.start(), each call only schedules the thread to start; the thread does not necessarily begin running then and there. As a result, your threads can and do start in any order. Add to that the fact that any thread can be descheduled between while (x<4), x++, and System.out.print, allowing another thread to run, and you basically get nondeterministic behavior.
Bonus question: why is it impossible for x to become 5?
It's not impossible (for the same reason the output is interleaved). Try increasing your number of threads and eventually you'll see x become 5 and maybe even higher depending on how much thread contention you can create.
I disagree with others that this is a volatility issue. Rather this is a shared memory access issue. Using volatile alone will not fix this. A simple mutex around the static x variable access will properly protect it and sequence how you expect with the exception of the order of 'A' vs. 'B' which would require additional synchronization.

You, my friend, have run into what is called a Data Race.
Wikipedia has an example depicting exactly what you are going through:
https://en.wikipedia.org/wiki/Race_condition.
So, why is this happening?
The reason lies hidden in the way a computer processes instructions. Take, for example, the following line of Java code:
x++;
Now, ignoring compiler magic for the moment, we have to think what the computer needs to do to execute this instruction.
We need to read the old value of x.
We need to perform the addition x + 1.
We need to write the new value back into the variable x.
This works wonderfully when just looking at it from a sequential standpoint. But what happens if two people are doing the exact same thing, at the same time?
See the Wikipedia example for exact answers.
The important thing to note here is that your single x++ instruction is actually multiple instructions for a computer. Even if each instruction can be carried out atomically by the processor, you are not guaranteed atomicity for the whole sequence of instructions.
The same holds true for using the variable x. When you are calling the System.out.println() function, you are once again accessing x. This access means that we have to read x from memory again.
Do we know what B has done to the variable from the time you changed it?
Nope.
Also, I noticed the volatile comment. This is actually wrong (as confirmed by running the code on my computer). volatile ensures that writes become visible to other threads and that we do not read/write jumbled data into the variable. It does not make compound operations such as x++ atomic.
Bonus question: why is it impossible for x to become 5?
It is very possible, although perhaps unlikely. The part of your program that takes time is the work and synchronization done inside your System.out.println() statement. This is probably why you do not see the value 5 often.

Your variable x is static so it is shared between both threads.
Thread B increments x to 4 and completes, writing each step as it goes.
Thread A gets one chance to look at x when it is at 1 so increments it and prints A2. The next time it sees x it is at >= 4 so it exits its loop.
Bonus question - yes it is possible for x to become 5 - and even print as 5. If both threads check x<4 when it happens to be 3 at the same time they will both increment it.
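The interleaving described above can be replayed deterministically in a single thread. This is only a sketch: the class name CheckThenActReplay and the method replay are invented for the example, and the local variable x stands in for the shared static field.

```java
class CheckThenActReplay {
    // Replays one possible interleaving of two threads running
    // "while (x < 4) { x++; }", starting from the given value of x.
    static int replay(int x) {
        boolean aPassedCheck = x < 4; // thread A evaluates //1
        boolean bPassedCheck = x < 4; // thread B evaluates //1 before A increments
        if (aPassedCheck) {
            x++;                      // thread A executes //2
        }
        if (bPassedCheck) {
            x++;                      // thread B executes //2 based on a stale check
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(replay(3)); // prints 5
    }
}
```

Starting from 3, both checks pass before either increment runs, so x ends at 5.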

First: start() is an asynchronous call, so whichever thread is scheduled first begins running before the other.
Second: x is static, but each thread may work with a locally cached copy of it. The first running thread changes x while the second is still waiting; while it waits, the second thread holds a stale local copy of x, which it uses once it wakes up.
After the second thread prints its local value of x, it rereads x from shared (global) memory, finds it equal to 4, and stops.
This may help:
|------------------------------------------------------------------------------------------|
| Thread A:x works |local| big static X that changed . . . . . . . . . . . . . . ..|
| Thread B:x=2 sleep |local| big static X that will be read after first loop.|
|------------------------------------------------------------------------------------------|
So we can say x is effectively local and global at the same time.
Proof: add a sleep with a random duration after the increment, change the condition to x<10, and look at the result. Don't forget the try/catch clause.

Related

how synchronized keyword works internally

I read the below program and answer in a blog.
int x = 0;
boolean bExit = false;
Thread 1 (not synchronized)
x = 1;
bExit = true;
Thread 2 (not synchronized)
if (bExit == true)
System.out.println("x=" + x);
is it possible for Thread 2 to print “x=0”?
Ans : Yes ( reason : Every thread has their own copy of variables. )
how do you fix it?
Ans: By making both threads synchronize on a common mutex, or by making both variables volatile.
My doubt is: if we make the two variables volatile, then the two threads will share the variables through main memory. That makes sense, but how does synchronization resolve the problem, given that each thread has its own copy of the variables?
Please help me.

This is actually more complicated than it seems. There are several arcane things at work.
Caching
Saying "Every thread has their own copy of variables" is not exactly correct. Every thread may have their own copy of variables, and they may or may not flush these variables into the shared memory and/or read them from there, so the whole thing is non-deterministic. Moreover, the very term flushing is really implementation-dependent. There are strict terms such as memory consistency, happens-before order, and synchronization order.
Reordering
This one is even more arcane. This
x = 1;
bExit = true;
does not even guarantee that Thread 1 will first write 1 to x and then true to bExit. In fact, it does not even guarantee that any of these will happen at all. The compiler may optimize away some values if they are not used later. The compiler and CPU are also allowed to reorder instructions any way they want, provided that the outcome is indistinguishable from what would happen if everything was really in program order. That is, indistinguishable for the current thread! Nobody cares about other threads until...
Synchronization comes in
Synchronization does not only mean exclusive access to resources. It is also not just about preventing threads from interfering with each other. It's also about memory barriers. It can be roughly described as each synchronization block having invisible instructions at the entry and exit, the first one saying "read everything from the shared memory to be as up-to-date as possible" and the last one saying "now flush whatever you've been doing there to the shared memory". I say "roughly" because, again, the whole thing is an implementation detail. Memory barriers also restrict reordering: actions may still be reordered, but the results that appear in the shared memory after exiting the synchronized block must be identical to what would happen if everything was indeed in program order.
All of that works, of course, only if both blocks use the same lock object.
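As a rough sketch of the example from the question with both threads synchronizing on one common lock (the names SharedLockVisibility, LOCK, and runOnce are assumptions made for this illustration):

```java
class SharedLockVisibility {
    private static final Object LOCK = new Object(); // the common monitor
    private static int x;
    private static boolean bExit;

    // Hypothetical helper: returns the value of x that the reader observed.
    static int runOnce() throws InterruptedException {
        synchronized (LOCK) {
            x = 0;
            bExit = false;
        }
        int[] seen = new int[1];
        Thread writer = new Thread(() -> {
            synchronized (LOCK) {
                x = 1;
                bExit = true;
            } // releasing LOCK publishes both writes
        });
        Thread reader = new Thread(() -> {
            while (true) {
                synchronized (LOCK) { // acquiring LOCK sees earlier releases
                    if (bExit) {
                        seen[0] = x;
                        return;
                    }
                }
                Thread.yield();
            }
        });
        reader.start();
        writer.start();
        writer.join();
        reader.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("x=" + runOnce()); // prints x=1
    }
}
```

Because the reader only looks at bExit and x while holding the same lock the writer used, it cannot observe bExit == true together with x == 0.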
The whole thing is described in detail in Chapter 17 of the JLS. In particular, what's important is the so-called "happens-before order". If you ever see in the documentation that "this happens-before that", it means that everything the first thread does before "this" will be visible to whoever does "that". This may not even require any locking. Concurrent collections are a good example: one thread puts something there, another one reads it, and that magically guarantees that the second thread will see everything the first thread did before putting that object into the collection, even if those actions had nothing to do with the collection itself!
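A small sketch of the concurrent-collection case, assuming a LinkedBlockingQueue as the collection (the class QueueHandoff and the field payload are invented for the illustration). The plain field is written before the element is added, and the thread that removes the element is guaranteed to see that write:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueHandoff {
    private static int payload; // plain field, deliberately not volatile

    // Hypothetical helper: returns the payload as seen by the consuming thread.
    static int runOnce() throws InterruptedException {
        payload = 0;
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            payload = 42;       // ordinary write before the hand-off
            queue.add("ready"); // placing the element publishes earlier writes
        });
        producer.start();
        queue.take();           // blocks until the producer has added the token
        return payload;         // ordered after the producer's write
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnce()); // prints 42
    }
}
```

No explicit lock or volatile appears in this code; the ordering comes from the memory-consistency guarantees documented for java.util.concurrent collections.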
Volatile variables
One last warning: you had better give up on the idea that making variables volatile will solve things. In this case maybe making bExit volatile will suffice, but using volatiles can lead to so many troubles that I'm not even willing to go into that. But one thing is for sure: using synchronized has a much stronger effect than using volatile, and that goes for memory effects too. What's worse, volatile semantics changed in some Java version, so there may exist some versions that still use the old semantics, which were even more obscure and confusing, whereas synchronized always worked well provided you understand what it is and how to use it.
Pretty much the only reason to use volatile is performance because synchronized may cause lock contention and other troubles. Read Java Concurrency in Practice to figure all that out.
Q & A
1) You wrote "now flush whatever you've been doing there to the shared
memory" about synchronized blocks. But we will see only the variables
that we access in the synchronize block or all the changes that the
thread call synchronize made (even on the variables not accessed in the
synchronized block)?
Short answer: it will "flush" all variables that were updated during the synchronized block or before entering the synchronized block. And again, because flushing is an implementation detail, you don't even know whether it will actually flush something or do something entirely different (or doesn't do anything at all because the implementation and the specific situation already somehow guarantee that it will work).
Variables that weren't accessed inside the synchronized block obviously won't change during the execution of the block. However, if you change some of those variables before entering the synchronized block, for example, then you have a happens-before relationship between those changes and whatever happens in the synchronized block (the first bullet in 17.4.5). If some other thread enters another synchronized block using the same lock object, then it synchronizes-with the first thread exiting the synchronized block, which means that you have another happens-before relationship here. So in this case the second thread will see the variables that the first thread updated prior to entering the synchronized block.
If the second thread tries to read those variables without synchronizing on the same lock, then it is not guaranteed to see the updates. But then again, it isn't guaranteed to see the updates made inside the synchronized block as well. But this is because of the lack of the memory-read barrier in the second thread, not because the first one didn't "flush" its variables (memory-write barrier).
2) In this chapter you post (of JLS) it is written that: "A write to a
volatile field (§8.3.1.4) happens-before every subsequent read of that
field." Doesn't this mean that when the variable is volatile you will
see only changes of it (because it is written write happens-before
read, not happens-before every operation between them!). I mean
doesn't this mean that in the example, given in the description of the
problem, we can see bExit = true, but x = 0 in the second thread if
only bExit is volatile? I ask, because I find this question here: http://java67.blogspot.bg/2012/09/top-10-tricky-java-interview-questions-answers.html
and it is written that if bExit is volatile the program is OK. So the
registers will flush only bExits value only or bExits and x values?
By the same reasoning as in Q1, if you do bExit = true after x = 1, then there is an in-thread happens-before relationship because of the program order. Now since volatile writes happen-before volatile reads, it is guaranteed that the second thread will see whatever the first thread updated prior to writing true to bExit. Note that this behavior exists only since Java 1.5 or so, so older or buggy implementations may or may not support it. I have seen bits in the standard Oracle implementation that use this feature (java.util.concurrent collections), so you can at least assume that it works there.
3) Why monitor matters when using synchronized blocks about memory
visibility? I mean when try to exit synchronized block aren't all
variables (which we accessed in this block or all variables in the
thread - this is related to the first question) flushed from registers
to main memory or broadcasted to all CPU caches? Why object of
synchronization matters? I just cannot imagine what are relations and
how they are made (between object of synchronization and memory).
I know that we should use the same monitor to see this changes, but I
don't understand how memory that should be visible is mapped to
objects. Sorry, for the long questions, but these are really
interesting questions for me and it is related to the question (I
would post questions exactly for this primer).
Ha, this one is really interesting. I don't know. Probably it flushes anyway, but Java specification is written with high abstraction in mind, so maybe it allows for some really weird hardware where partial flushes or other kinds of memory barriers are possible. Suppose you have a two-CPU machine with 2 cores on each CPU. Each CPU has some local cache for every core and also a common cache. A really smart VM may want to schedule two threads on one CPU and two threads on another one. Each pair of the threads uses its own monitor, and VM detects that variables modified by these two threads are not used in any other threads, so it only flushes them as far as the CPU-local cache.
See also this question about the same issue.
4) I thought that everything before writing a volatile will be up to
date when we read it (moreover when we use volatile a read that in
Java it is memory barrier), but the documentation don't say this.
It does:
17.4.5.
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
If hb(x, y) and hb(y, z), then hb(x, z).
A write to a volatile field (§8.3.1.4) happens-before every subsequent
read of that field.
If x = 1 comes before bExit = true in program order, then we have happens-before between them. If some other thread reads bExit after that, then we have happens-before between write and read. And because of the transitivity, we also have happens-before between x = 1 and read of bExit by the second thread.
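The chain above can be sketched as runnable code. The class name VolatileFlag and the helper runOnce are assumptions for this example; only bExit is volatile, x is a plain field.

```java
class VolatileFlag {
    private static int x;                  // plain field
    private static volatile boolean bExit; // volatile flag

    // Hypothetical helper: returns the value of x the reader saw after bExit became true.
    static int runOnce() throws InterruptedException {
        x = 0;
        bExit = false;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!bExit) {   // volatile read
                Thread.yield();
            }
            seen[0] = x;       // ordered after the volatile read that returned true
        });
        Thread writer = new Thread(() -> {
            x = 1;             // hb(x = 1, bExit = true) by program order
            bExit = true;      // volatile write
        });
        reader.start();
        writer.start();
        writer.join();
        reader.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("x=" + runOnce()); // prints x=1
    }
}
```

If bExit were not volatile, the reader might spin forever or see a stale x; with it, transitivity of happens-before rules both out.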
5) Also, if we have volatile Person p does we have some dependency
when we use p.age = 20 and print(p.age) or have we memory barrier in
this case(assume age is not volatile) ? - I think - No
You are correct. Since age is not volatile, then there is no memory barrier, and that's one of the trickiest things. Here is a fragment from CopyOnWriteArrayList, for example:
Object[] elements = getArray();
E oldValue = get(elements, index);
if (oldValue != element) {
    int len = elements.length;
    Object[] newElements = Arrays.copyOf(elements, len);
    newElements[index] = element;
    setArray(newElements);
} else {
    // Not quite a no-op; ensures volatile write semantics
    setArray(elements);
}
Here, getArray and setArray are trivial setter and getter for the array field. But since the code changes elements of the array, it is necessary to write the reference to the array back to where it came from in order for the changes to the elements of the array to become visible. Note that it is done even if the element being replaced is the same element that was there in the first place! It is precisely because some fields of that element may have changed by the calling thread, and it's necessary to propagate these changes to future readers.
6) And is there any happens before 2 subsequent reads of volatile
field? I mean does the second read will see all changes from thread
which reads this field before it(of course we will have changes only
if volatile influence visibility of all changes before it - which I am
a little confused whether it is true or not)?
No, there is no relationship between volatile reads. Of course, if one thread performs a volatile write and then two other threads perform volatile reads, they are guaranteed to see everything at least as up to date as it was before the volatile write, but there is no guarantee of whether one thread will see more up-to-date values than the other. Moreover, there is not even a strict definition of one volatile read happening before another! It is wrong to think of everything happening on a single global timeline. It is more like parallel universes with independent timelines that sometimes sync their clocks by performing synchronization and exchanging data with memory barriers.

Whether threads keep a copy of a variable in their own memory depends on the implementation. Class-level variables are shared between threads, while each thread keeps its own copy of local variables. Here are two examples that show this; please have a look.
And in your example if I understood it correctly your code should look something like this--
package com.practice.multithreading;
public class LocalStaticVariableInThread {
    static int x=0;
    static boolean bExit = false;
    public static void main(String[] args) {
        Thread t1=new Thread(run1);
        Thread t2=new Thread(run2);
        t1.start();
        t2.start();
    }
    static Runnable run1=()->{
        x = 1;
        bExit = true;
    };
    static Runnable run2=()->{
        if (bExit == true)
            System.out.println("x=" + x);
    };
}
Output
x=1
I always get this output. That is because the threads share the variables, so when one thread changes them the other thread can see the change. In real-life scenarios, though, we can never say which thread will start first; since the threads here do almost nothing, we happen to see the expected result.
Now take this example--
Here, if the loop variable i is a field shared by both threads (static or, as in the code below, an instance field) rather than a local variable declared inside the for loop, the threads won't each keep their own copy of it and you won't see the desired output: the count will not be 2000 every time, even though the count increment is synchronized.
package com.practice.multithreading;
public class RaceCondition2Fixed {
    private int count;
    int i;
    /* Making the method synchronized forces a thread to acquire the object's intrinsic lock before
    running it; another thread cannot enter until the lock is released when the method completes. */
    public synchronized void increment() {
        count++;
    }
    public static void main(String[] args) {
        RaceCondition2Fixed rc= new RaceCondition2Fixed();
        rc.doWork();
    }
    private void doWork() {
        Thread t1 = new Thread(new Runnable() {
            @Override
            public void run() {
                for ( i = 0; i < 1000; i++) {
                    increment();
                }
            }
        });
        Thread t2 = new Thread(new Runnable() {
            @Override
            public void run() {
                for ( i = 0; i < 1000; i++) {
                    increment();
                }
            }
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        /* If we don't use join, the printed count may be 0: after t1.start() and t2.start() the
        threads update count in separate threads while the main thread may print the value before
        they have run. So we need to wait for the threads to complete. */
        System.out.println(Thread.currentThread().getName()+" Count is : "+count);
    }
}

Two threads accessing the same ArrayList at the same time?

I have the following code in thread 1:
synchronized (queues.get(currentQueue)) { //line 1
    queues.get(currentQueue).add(networkEvent); //line 2
}
and the following in thread 2:
synchronized (queues.get(currentQueue)) {
    if (queues.get(currentQueue).size() > 10) {
        currentQueue = 1;
    }
}
Now to my question: the currentQueue variable currently has the value 0. When thread 2 changes the value of currentQueue to 1 while thread 1 waits at line 1 (because of the synchronized), does thread 1 then use the updated currentQueue value in line 2 after thread 2 has finished? (That's what I want.)

The answer to the question is that it depends. I assume there is another chunk of code that increments the currentQueue variable. This being the case, the lock is taken neither on the 'currentQueue' variable nor on the 'queues' collection, but rather on one of the 10 queues (or however many you have) in the 'queues' collection.
Hence, if both threads happen to access the same queue (say queue 5), then the answer to your question is yes. However, the chance of that happening is one in ten (one in x, where x = the number of queues in the 'queues' collection). If the threads access different queues, then the answer is no.

The correct answer to your question is: The result is undefined.
Your monitor object is queues.get(currentQueue), but since currentQueue is variable, your monitor is variable, therefore the state it is currently in is more or less random. Effectively this code would break eventually.
A simple way to fix it would be a function like this:
protected synchronized QueueType getCurrentQueue() {
    return queues.get(currentQueue);
}
However this is still a bad way of implementing the whole thing. You should either try to eliminate the synchronization completely through the use of a concurrent Queue (like ConcurrentLinkedQueue) or work with a lock/final monitor object.
final Object queueLock = new Object();
...
synchronized(queueLock) {
    queues.get(currentQueue).add(networkEvent);
}
Note that you will have to use that locking every time you access queues or currentQueue as both define the dataset you are using.
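A fuller sketch of the final-lock approach might look like the following. The class QueueSwitcher, its method names, and the use of String as the event type are all assumptions made for this illustration; the question does not show the real types.

```java
import java.util.ArrayList;
import java.util.List;

class QueueSwitcher {
    private final Object queueLock = new Object();
    private final List<List<String>> queues = new ArrayList<>(); // guarded by queueLock
    private int currentQueue = 0;                                // guarded by queueLock

    QueueSwitcher(int queueCount) {
        for (int i = 0; i < queueCount; i++) {
            queues.add(new ArrayList<>());
        }
    }

    // Thread 1's job: append an event to whichever queue is current.
    void add(String networkEvent) {
        synchronized (queueLock) {
            queues.get(currentQueue).add(networkEvent);
        }
    }

    // Thread 2's job: switch to queue 1 once the current queue holds more than 10 events.
    void switchIfFull() {
        synchronized (queueLock) {
            if (queues.get(currentQueue).size() > 10) {
                currentQueue = 1;
            }
        }
    }

    int currentQueue() {
        synchronized (queueLock) {
            return currentQueue;
        }
    }

    int size(int index) {
        synchronized (queueLock) {
            return queues.get(index).size();
        }
    }

    public static void main(String[] args) {
        QueueSwitcher switcher = new QueueSwitcher(2);
        for (int i = 0; i < 11; i++) {
            switcher.add("event" + i);
        }
        switcher.switchIfFull();
        switcher.add("late event"); // lands in queue 1
        System.out.println(switcher.size(0) + " " + switcher.size(1)); // prints 11 1
    }
}
```

Because the monitor never changes, a thread that was blocked on it always rereads currentQueue under the same lock that protected the update.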

Assuming no other thread changes the value of currentQueue, yes, Thread 1 will end up using the queue pointed to by the updated value of currentQueue, since you're invoking queues.get(currentQueue) once again in the body of the synchronized block. This however doesn't mean that your synchronization is sound. You actually should synchronize on currentQueue, since it seems to be the shared key to access the current queue.
Also remember when you use synchronize you're synchronizing on the reference of the variable, and not its value. So if you reassign a new object to it, your synchronization doesn't make sense anymore.

How atomicity is achieved in the classes defined in java.util.concurrent.atomic package?

I was going through the source code of java.util.concurrent.atomic.AtomicInteger to find out how atomicity is achieved by the atomic operations provided by the class. For instance AtomicInteger.getAndIncrement() method source is as follows
public final int getAndIncrement() {
    for (;;) {
        int current = get();
        int next = current + 1;
        if (compareAndSet(current, next))
            return current;
    }
}
I am not able to understand the purpose of writing the sequence of operations inside an infinite for loop. Does it serve any special purpose in the Java Memory Model (JMM)? Please help me find a descriptive understanding. Thanks in advance.

I am not able to understand the purpose of writing the sequence of operations inside a infinite for loop.
The purpose of this code is to ensure that the volatile field gets updated appropriately without the overhead of a synchronized lock. Unless there are a large number of threads all competing to update this same field, this will most likely spin a very few times to accomplish this.
The volatile keyword provides visibility and memory synchronization guarantees but does not in itself ensure atomic operations with multiple operations (test and set). If you are testing and then setting a volatile field there are race-conditions if multiple threads are trying to perform the same operation at the same time. In this case, if multiple threads are trying to increment the AtomicInteger at the same time, you might miss one of the increments. The concurrent code here uses the spin loop and the compareAndSet underlying methods to make sure that the volatile int is only updated to 4 (for example) if it still is equal to 3.
t1 gets the atomic-int and it is 0.
t2 gets the atomic-int and it is 0.
t1 adds 1 to it
t1 atomically tests to make sure it is 0, it is, and stores 1.
t2 adds 1 to it
t2 atomically tests to make sure it is 0, it is not, so it has to spin and try again.
t2 gets the atomic-int and it is 1.
t2 adds 1 to it
t2 atomically tests to make sure it is 1, it is, and stores 2.
Does it serve any special purpose in Java Memory Model (JMM).
No, it serves the purpose of the class and method definitions and uses the JMM and the language definitions around volatile to achieve its purpose. The JMM defines what the language does with the synchronized, volatile, and other keywords and how multiple threads interact with cached and central memory. This is mostly about native code interactions with operating system and hardware and is rarely, if ever, about Java code.
It is the compareAndSet(...) method which gets closer to the JMM by calling into the Unsafe class which is mostly native methods with some wrappers:
public final boolean compareAndSet(int expect, int update) {
    return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
}

I am not able to understand the purpose of writing the sequence of
operations inside a infinite for loop.
To understand why it is in an infinite loop I find it helpful to understand what the compareAndSet does and how it may return false.
Atomically sets the value to the given updated value if the current
value == the expected value.
Parameters:
expect - the expected value
update - the new value
Returns:
true if successful. False return indicates that the actual value was not
equal to the expected value
So you read the Returns message and ask how is that possible?
Suppose two threads invoke incrementAndGet at close to the same time, and both enter and see the value current == 1. Both threads will create a thread-local next == 2 and try to set it via compareAndSet. Only one thread will win, as documented, and the thread that loses must try again.
This is how CAS works. You attempt to change the value; if you fail, you try again; if you succeed, you continue on.
Now simply declaring the field as volatile will not work because incrementing is not atomic. So something like this is not safe from the scenario I explained
volatile int count = 0;
public int incrementAndGet(){
    return ++count; //may return the same number more than once.
}

Java's compareAndSet is based on CPU compare-and-swap (CAS) instructions; see http://en.wikipedia.org/wiki/Compare-and-swap. It compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value.
In case of incrementAndGet we read the current value and call compareAndSet(current, current + 1). If it returns false it means that another thread interfered and changed the current value, which means that our attempt failed and we need to repeat the whole cycle until it succeeds.
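The same retry pattern can be written against the public AtomicInteger API. The sketch below is not JDK source: the class CasRetryLoop and the method incrementIfBelow are invented here to show a CAS loop that also enforces an upper bound, something a plain check followed by an increment cannot do safely.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasRetryLoop {
    // Increments the counter only while it is below limit.
    // Returns true if this call performed an increment.
    static boolean incrementIfBelow(AtomicInteger counter, int limit) {
        for (;;) {
            int current = counter.get();
            if (current >= limit) {
                return false;
            }
            if (counter.compareAndSet(current, current + 1)) {
                return true;
            }
            // CAS failed: another thread changed the value in between, so retry.
        }
    }

    // Hypothetical helper: lets several threads race to the limit and returns the final value.
    static int runOnce(int threads, int limit) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                while (incrementIfBelow(counter, limit)) {
                    // keep going until the limit is reached
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnce(4, 1000)); // prints 1000
    }
}
```

However many threads compete, the counter stops exactly at the limit, because an increment succeeds only if the value is still the one that passed the check.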

Interpretation of "program order rule" in Java concurrency

Program order rule states "Each action in a thread happens-before every action in that thread that comes later in the program order"
1. I read in another thread that an action is
reads and writes to variables
locks and unlocks of monitors
starting and joining with threads
Does this mean that reads and writes can be changed in order, but reads and writes cannot change order with actions specified in 2nd or 3rd lines?
2. What does "program order" mean?
An explanation with examples would be really helpful.
Additional related question
Suppose I have the following code:
long tick = System.nanoTime(); //Line1: Note the time
//Block1: some code whose time I wish to measure goes here
long tock = System.nanoTime(); //Line2: Note the time
Firstly, it's a single-threaded application, to keep things simple. The compiler notices that it needs to check the time twice and also notices a block of code that has no dependency on the surrounding time-noting lines, so it sees a potential to reorganize the code, which could result in Block1 not being surrounded by the timing calls during actual execution (for instance, consider the order Line1->Line2->Block1). But I, as a programmer, can see the dependency between Line1, Line2, and Block1: Line1 should immediately precede Block1, Block1 takes a finite amount of time to complete, and it should be immediately followed by Line2.
So my question is: Am I measuring the block correctly?
If yes, what is preventing the compiler from rearranging the order.
If no (which I think is correct after going through Enno's answer), what can I do to prevent it?
P.S.: I stole this code from another question I asked in SO recently.

It probably helps to explain why such a rule exists in the first place.
Java is a procedural language. I.e. you tell Java how to do something for you. If Java executes your instructions not in the order you wrote, it would obviously not work. E.g. in the below example, if Java would do 2 -> 1 -> 3 then the stew would be ruined.
1. Take lid off
2. Pour salt in
3. Cook for 3 hours
So, why does the rule not simply say "Java executes what you wrote in the order you wrote"? In a nutshell, because Java is clever. Take the following example:
1. Take eggs out of the freezer
2. Take lid off
3. Take milk out of the freezer
4. Pour egg and milk in
5. Cook for 3 hours
If Java were like me, it would just execute it in order. However, Java is clever enough to understand that it's more efficient AND that the end result would be the same should it do 1 -> 3 -> 2 -> 4 -> 5 (you don't have to walk to the freezer again, and that doesn't change the recipe).
So what the rule "Each action in a thread happens-before every action in that thread that comes later in the program order" is trying to say is, "In a single thread, your program will run as if it was executed in the exact order you wrote it. We might change the ordering behind the scenes, but we make sure that none of that would change the output."
So far so good. Why does it not do the same across multiple threads? In multi-thread programming, Java isn't clever enough to do it automatically. It will for some operations (e.g. joining threads, starting threads, when a lock (monitor) is used etc.) but for other stuff you need to explicitly tell it to not do reordering that would change the program output (e.g. volatile marker on fields, use of locks etc.).
Note:
Quick addendum about "happens-before relationship". This is a fancy way of saying no matter what reordering Java might do, stuff A will happen before stuff B. In our weird later stew example, "Step 1 & 3 happens-before step 4 "Pour egg and milk in" ". Also for example, "Step 1 & 3 do not need a happens-before relationship because they don't depend on each other in any way"
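To make the "starting threads" and "joining threads" cases concrete, here is a minimal sketch. The class StartJoinDemo and its fields are made up for illustration. It relies on two rules from the Java Language Specification: a call to Thread.start() happens-before every action in the started thread, and every action in a thread happens-before another thread returns from join() on it. Under those rules the plain fields below need neither volatile nor a lock.

```java
// Hypothetical demo class; all names are illustrative only.
class StartJoinDemo {
    // Plain (non-volatile) fields: visibility comes from start() and join().
    static int input;
    static int output;

    static int run() {
        input = 20;                       // written before start()
        Thread worker = new Thread(() -> {
            output = input + 1;           // guaranteed to see input == 20
        });
        worker.start();                   // start() happens-before the worker's actions
        try {
            worker.join();                // the worker's actions happen-before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return output;                    // guaranteed to see the worker's write
    }

    public static void main(String[] args) {
        System.out.println(run());        // prints 21
    }
}
```

If main read output without calling join() first, nothing would order that read after the worker's write, and it could legitimately see 0.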
On the additional question & response to the comment
First, let us establish what "time" means in the programming world. In programming, we have the notion of "absolute time" (what's the time in the world now?) and the notion of "relative time" (how much time has passed since x?). In an ideal world, time is time, but unless we have an atomic clock built in, the absolute time has to be corrected from time to time. On the other hand, for relative time we don't want corrections, as we are only interested in the differences between events.
In Java, System.currentTimeMillis() deals with absolute time and System.nanoTime() deals with relative time. This is why the Javadoc of nanoTime states, "This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time".
In practice, both currentTimeMillis and nanoTime are native calls and thus the compiler can't practically prove if a reordering won't affect the correctness, which means it will not reorder the execution.
But let us imagine we want to write a compiler implementation that actually looks into native code and reorders everything as long as it's legal. When we look at the JLS, all that it tells us is that "You can reorder anything as long as it cannot be detected". Now as the compiler writer, we have to decide if the reordering would violate the semantics. For relative time (nanoTime), it would clearly be useless (i.e. violates the semantics) if we'd reorder the execution. Now, would it violate the semantics if we'd reorder for absolute time (currentTimeMillis)? As long as we can limit the difference from the source of the world's time (let's say the system clock) to whatever we decide (like "50ms")*, I say no. For the below example:
long tick = System.currentTimeMillis();
result = compute();
long tock = System.currentTimeMillis();
print(result + ":" + (tock - tick));
If the compiler can prove that compute() takes less than whatever maximum divergence from the system clock we can permit, then it would be legal to reorder this as follows:
long tick = System.currentTimeMillis();
long tock = System.currentTimeMillis();
result = compute();
print(result + ":" + (tock - tick));
Since doing that won't violate the spec we defined, and thus won't violate the semantics.
You also asked why this is not included in the JLS. I think the answer would be "to keep the JLS short". But I don't know much about this realm so you might want to ask a separate question for that.
*: In actual implementations, this difference is platform dependent.
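For the elapsed-time case, the tick/tock pattern would normally use nanoTime rather than currentTimeMillis. The following is a minimal sketch, not a benchmark harness: ElapsedTimeDemo, compute() and measure() are hypothetical names, and the loop is only a stand-in for the code being timed. The computed result is returned and printed so that the work feeds into the program's output.

```java
// Hypothetical demo class; all names are illustrative only.
class ElapsedTimeDemo {
    // Stand-in workload: sums the integers 0 .. 999_999.
    static long compute() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;
        }
        return sum;
    }

    // Returns { result of compute(), elapsed nanoseconds }.
    static long[] measure() {
        long tick = System.nanoTime();    // relative time, suitable for differences
        long result = compute();
        long tock = System.nanoTime();
        return new long[] { result, tock - tick };
    }

    public static void main(String[] args) {
        long[] measured = measure();
        System.out.println(measured[0] + ": " + measured[1] + " ns");
    }
}
```

A single measurement like this is still affected by JIT warm-up and other noise, so for serious measurements a dedicated harness such as JMH is the usual recommendation.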

The program order rule guarantees that, within individual threads, reordering optimizations introduced by the compiler cannot produce different results from what would have happened if the program had been executed in serial fashion. It makes no guarantees about what order the thread's actions may appear to occur in to any other threads if its state is observed by those threads without synchronization.
Note that this rule speaks only to the ultimate results of the program, and not to the order of individual executions within that program. For instance, if we have a method which makes the following changes to some local variables:
x = 1;
z = z + 1;
y = 1;
The compiler remains free to reorder these operations however it sees fit to improve performance. One way to think of this is: if you could reorder these ops in your source code and still obtain the same results, the compiler is free to do the same. (And in fact, it can go even further and completely discard operations which are shown to have no results, such as invocations of empty methods.)
With your second bullet point the monitor lock rule comes into play: "An unlock on a monitor lock happens-before every subsequent lock on that same monitor lock." (Java Concurrency in Practice p. 341) This means that a thread acquiring a given lock will have a consistent view of the actions which occurred in other threads before releasing that lock. However, note that this guarantee only applies when two different threads release and acquire the same lock. If Thread A does a bunch of stuff before releasing Lock X, and then Thread B acquires Lock Y, Thread B is not assured to have a consistent view of A's pre-X actions.
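A minimal sketch of the monitor lock rule, assuming both threads synchronize on the same lock object (SameLockDemo, LOCK and counter are made-up names). Because every increment happens while holding LOCK, each acquire sees the increments made before the previous release, so no update is lost.

```java
// Hypothetical demo class; all names are illustrative only.
class SameLockDemo {
    private static final Object LOCK = new Object(); // the one lock both threads use
    private static int counter;                      // plain field, guarded by LOCK

    static int run() {
        counter = 0;
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                synchronized (LOCK) {                // sees all writes made before the last unlock of LOCK
                    counter++;
                }
            }
        };
        Thread a = new Thread(task, "A");
        Thread b = new Thread(task, "B");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;                              // join() makes the final value visible here
    }

    public static void main(String[] args) {
        System.out.println(run());                   // prints 20000
    }
}
```

If the two threads each synchronized on a different object, the rule would no longer connect them and the final count would not be guaranteed.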
It is possible for reads and writes to variables to be reordered with start and join if a.) doing so doesn't break within-thread program order, and b.) the variables have not had other "happens-before" thread synchronization semantics applied to them, say by storing them in volatile fields.
A simple example:
class ThreadStarter {
Object a = null;
Object b = null;
Thread thread;
ThreadStarter(Thread threadToStart) {
this.thread = threadToStart;
}
public void aMethod() {
a = new BeforeStartObject();
b = new BeforeStartObject();
thread.start();
a = new AfterStartObject();
b = new AfterStartObject();
a.doSomeStuff();
b.doSomeStuff();
}
}
Since the fields a and b and the method aMethod() are not synchronized in any way, and the action of starting thread does not change the results of the writes to the fields (or the doing of stuff with those fields), the compiler is free to reorder thread.start() to anywhere in the method. The only thing it could not do with the order of aMethod() would be to move the order of writing one of the BeforeStartObjects to a field after writing an AfterStartObject to that field, or to move one of the doSomeStuff() invocations on a field before the AfterStartObject is written to it. (That is, assuming that such reordering would change the results of the doSomeStuff() invocation in some way.)
The critical thing to bear in mind here is that, in the absence of synchronization, the thread started in aMethod() could theoretically observe either or both of the fields a and b in any of the states which they take on during the execution of aMethod() (including null).
Additional question answer
The assignments to tick and tock cannot be reordered with respect to the code in Block1 if they are to be actually used in any measurements, for example by calculating the difference between them and printing the result as output. Such reordering would clearly break Java's within-thread as-if-serial semantics. It changes the results from what would have been obtained by executing instructions in the specified program order. If the assignments aren't used for any measurements and have no side-effects of any kind on the program result, they'll likely be optimized away as no-ops by the compiler rather than being reordered.

Before I answer the question,
reads and writes to variables
Should be
volatile reads and volatile writes (of the same field)
Program order doesn't guarantee this happens before relationship, rather the happens-before relationship guarantees program order
To your questions:
Does this mean that reads and writes can be changed in order, but reads and writes cannot change order with actions specified in 2nd or 3rd lines?
The answer actually depends on what action happens first and what action happens second. Take a look at the JSR 133 Cookbook for Compiler Writers. There is a Can Reorder grid that lists the allowed compiler reordering that can occur.
For instance, a Normal Store that follows a Volatile Store can be reordered above it, but a Normal Store that precedes a Volatile Store cannot be moved below it, and a Volatile Store cannot be reordered with a subsequent Volatile Load. This is all assuming intra-thread semantics still hold.
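One practical consequence of these ordering rules is the common "publish through a volatile flag" idiom. The sketch below is illustrative only (VolatilePublishDemo, payload and ready are made-up names): the normal store to payload comes before the volatile store to ready in program order, and the reader performs a volatile load of ready before its normal load of payload, so the reader cannot observe the flag without also observing the payload.

```java
// Hypothetical demo class; all names are illustrative only.
class VolatilePublishDemo {
    static int payload;              // normal field
    static volatile boolean ready;   // volatile flag

    static int run() {
        payload = 0;
        ready = false;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {         // volatile load
                Thread.yield();
            }
            seen[0] = payload;       // normal load, ordered after the volatile load
        });
        reader.start();
        payload = 42;                // normal store
        ready = true;                // volatile store, the store above may not move below it
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints 42
    }
}
```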
What does "program order" mean?
This is from the JLS
Among all the inter-thread actions performed by each thread t, the
program order of t is a total order that reflects the order in which
these actions would be performed according to the intra-thread
semantics of t.
In other words, if you can change the writes and loads of a variable in such a way that it will perform exactly the same way as you wrote it, then it maintains program order.
For instance
public static Object getInstance(){
if(instance == null){
instance = new Object();
}
return instance;
}
Can be reordered to
public static Object getInstance(){
Object temp = instance;
if(instance == null){
temp = instance = new Object();
}
return temp;
}

It simply means that although the threads may be multiplexed, the internal order of each thread's actions/operations/instructions remains constant (relative to each other).
thread1: T1op1, T1op2, T1op3...
thread2: T2op1, T2op2, T2op3...
Although the order of operations (TnopM) across threads may vary, the operations T1op1, T1op2, T1op3 within a thread will always be in this order, and so will T2op1, T2op2, T2op3.
for ex:
T2op1, T1op1, T1op2, T2op2, T2op3, T1op3

Java tutorial http://docs.oracle.com/javase/tutorial/essential/concurrency/memconsist.html says that happens-before relationship is simply a guarantee that memory writes by one specific statement are visible to another specific statement. Here is an illustration
int x;
synchronized void x() {
x += 1;
}
synchronized void y() {
System.out.println(x);
}
synchronized creates a happens-before relationship. If we remove it, there is no guarantee that thread B will print 1 after thread A has incremented x; it may print 0.
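A runnable variant of that illustration might look like the sketch below. The names SyncCounter, increment() and get() are made up here, replacing x() and y() for readability. The reader keeps calling the synchronized getter until it observes the increment. Each call acquires the same monitor the writer released, so once the writer is done the next read is guaranteed to return 1.

```java
// Hypothetical demo class; all names are illustrative only.
class SyncCounter {
    private int x;                               // guarded by this object's monitor

    synchronized void increment() {
        x += 1;
    }

    synchronized int get() {
        return x;
    }

    static int run() {
        SyncCounter counter = new SyncCounter();
        Thread writer = new Thread(counter::increment);
        writer.start();
        int seen = counter.get();
        while (seen == 0) {                      // 0 only means the writer has not run yet
            Thread.yield();
            seen = counter.get();                // lock acquire: sees writes before the last unlock
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());               // prints 1
    }
}
```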

behaviour of volatile keyword in java

I need some example on Volatile Keyword of Java Threads.
As per definition of volatile keyword it says, when variable is declared as volatile then thread will directly read/write to variable memory instead of read/write from local thread cache.
please correct me if I am wrong.
So in that understanding when I run the below program,
public class ThreadRunnableBoth implements Runnable{
private volatile int num =0;
public void run(){
Thread t = Thread.currentThread();
String name = t.getName();
for(int i=0; i<100; i++){
if(name.equals("Thread1")){
num=10;
System.out.println("value of num 1 is :"+num);
}else{
num=15;
System.out.println("value of num 2 is :"+num);
}
}
}
public static void main(String args[]) throws InterruptedException{
Runnable r = new ThreadRunnableBoth();
Thread t1 = new Thread(r);
t1.setName("Thread1");
Thread t2 = new Thread(r);
t2.setName("Thread2");
t1.start();
t2.start();
}
}
I got this example from some site, and when I tried running it I couldn't see any difference between removing and adding the volatile keyword.
Please explain what difference removing or adding it makes.
Thanks a lot.

The main difference between having the volatile keyword or not is whether you need a memory fence to operate on the data safely.
Memory fences prevent side effects that can occur amongst multiple threads due to out-of-order execution. By instructing the CPU, the compiler / runtime environment can tell the CPU that the original ordering constraint on the read cannot be manipulated without destroying the correctness of the program.
Read up on memory fences here, and remember that the key to the solution is consistency, not location. The read request can stop at the cache, provided that the cache is guaranteed to be consistent (by the CPU's internal mechanisms).

As per definition of volatile keyword it says, when variable is
declared as volatile then thread will directly read/write to variable
memory instead of read/write from local thread cache.
Not necessarily. A system that supports cache coherence can have volatile fields up to date without ever reading from main-memory. Volatile says that each thread will see the most up-to-date value of a certain field.
As for memory visibility, you won't necessarily see any changes (immediately) if you remove volatile, but your program becomes susceptible to failure. The longer it runs, the more problems you may end up seeing.
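A case where volatile does matter is a stop flag polled in a loop. This is a minimal sketch with made-up names (StopFlagDemo, stop). With the flag declared volatile, the worker is guaranteed to see the write and leave the loop. Without volatile, the read could be hoisted out of the loop and the worker might spin forever, although on many machines it would still appear to work.

```java
// Hypothetical demo class; all names are illustrative only.
class StopFlagDemo {
    private static volatile boolean stop;  // removing volatile makes the loop below unsafe

    // Returns true if the worker terminated after the flag was set.
    static boolean run() {
        stop = false;
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait until another thread sets the flag
            }
        });
        worker.start();
        try {
            Thread.sleep(50);              // let the worker spin for a moment
            stop = true;                   // volatile write, visible to the worker
            worker.join(5_000);            // wait up to five seconds for it to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + run());
    }
}
```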

So without the volatile keyword, the threads are just printing the value of num in their local memory cache. Their changes to num are in no way synchronized with the other thread's view of num. I see output like:
value of num 1 is :10
value of num 2 is :15
value of num 1 is :10
value of num 2 is :15
value of num 1 is :10
value of num 2 is :15
...
With volatile, they are both updating and printing to the same global storage location with memory barriers around the set/get. But this won't change the output which is very subject to race conditions. I see output like:
value of num 2 is :15
value of num 1 is :15
value of num 2 is :15
value of num 1 is :10
value of num 2 is :15
value of num 1 is :10
...
There is a race between which set was last when the value is printed.
You may not be seeing this output because your processor architecture or JRE is context switching only on the IO events or otherwise not providing a full threaded execution. If you show some output then I can comment some more.
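If the goal were for each thread to always print the value it has just written, volatile alone would not be enough: the write and the following read would have to happen as one step that the other thread cannot interrupt. A minimal sketch of that idea, assuming a shared lock (SetThenReadDemo, LOCK, setAndRead() and check() are made-up names, not part of the original program):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; all names are illustrative only.
class SetThenReadDemo {
    private static final Object LOCK = new Object();
    private static int num;                          // guarded by LOCK

    // Writes value to num and reads it back while holding the lock,
    // so no other thread can write in between.
    static int setAndRead(int value) {
        synchronized (LOCK) {
            num = value;
            return num;
        }
    }

    // Repeats the write/read pair and records any mismatch.
    private static void check(int value, AtomicBoolean consistent) {
        for (int i = 0; i < 100; i++) {
            if (setAndRead(value) != value) {
                consistent.set(false);
            }
        }
    }

    // Returns true if each thread only ever read back its own value.
    static boolean run() {
        AtomicBoolean consistent = new AtomicBoolean(true);
        Thread t1 = new Thread(() -> check(10, consistent), "Thread1");
        Thread t2 = new Thread(() -> check(15, consistent), "Thread2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return consistent.get();
    }

    public static void main(String[] args) {
        System.out.println(run());                   // prints true
    }
}
```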

The effect of a volatile variable is evident on a multiprocessor system, where different threads run on different processors. On an ordinary single-processor system, the impact may not be evident.
Here is a good discussion thread on this site on the same subject.

In your example, num starts with a default value of 0, and you then (on its declaration line) assign it to 0. That assignment would be a data race if num weren't volatile, but of course you wouldn't be able to tell the difference.
Then you only use num in one thread, and within a thread you will always see things happen in the order the code said they would. So in this case, num doesn't have to be volatile.
Now, if you modified your main method so that it checked t1.num after the thread had started (but without checking that it has finished in a way that creates a happens-before edge, like Thread.join), you would have a data race without num being volatile. You could have main wait for 5 days, and it still wouldn't be guaranteed to see num as anything other than 0. And not just that, but if ThreadRunnableBoth also had a non-volatile boolean that started false and was set to true at the end of the loop, main could also see that boolean as true (thus meaning the thread had finished) but num still at 0! This is a data race, and can happen (for instance) on a multicore machine where the boolean is flushed out of a local register before num is. In this example, making both num and the boolean volatile will ensure that if the boolean is true, num == 10 || num == 15.
But here's the kicker: even without the volatile keyword -- that is, even in the presence of a data race -- you're not guaranteed to see racy behavior. That is, the data race says you can't guarantee that you'll see the change in another thread -- but it doesn't guarantee that you won't. It could be that it works just fine 100 times on your machine, and then someone puts it on an 8-core machine in the wild, and it's part of a more complex program so it gets optimized differently, and then things break.

Most of the talk is about hardware. Actually, compiler optimisations are typically more relevant. You're accessing a field repeatedly in a small method, so the compiler may keep it in a register. Altering the physical memory won't alter the value in the register.
Although the ("new", but many years old) Java Memory Model (JMM) does not talk about main memory like the old one and does not provide guarantees of progress (very difficult to actually specify), implementation of the volatile/happens-before specification will result in eviction from the register and synchronisation between threads.

