From my understanding, if the hardware supports cache coherence on a multi-processor system, then writes to a shared variable will be visible to threads running on other processors. To test this, I wrote a simple program in both Java and pthreads:
public class mainTest {

    public static int i = 1, j = 0;

    public static void main(String[] args) throws InterruptedException {
        /*
         * Thread1: Sleeps for 30ms and then sets i to 0.
         */
        (new Thread() {
            public void run() {
                synchronized (this) {
                    try {
                        Thread.sleep(30);
                        System.out.println("Thread1: j=" + mainTest.j);
                        mainTest.i = 0;
                    } catch (Exception e) {
                        throw new RuntimeException("Thread1 Error");
                    }
                }
            }
        }).start();

        /*
         * Thread2: Loops while i == 1, then exits.
         */
        (new Thread() {
            public void run() {
                synchronized (this) {
                    while (mainTest.i == 1) {
                        //System.out.println("Thread2: i = " + i); Comment1
                        mainTest.j++;
                    }
                    System.out.println("\nThread2: i!=1, j=" + j);
                }
            }
        }).start();

        /*
         * Sleep the main thread for 30 seconds, instead of using join.
         */
        Thread.sleep(30000);
    }
}
/* pThreads */
#include <stdio.h>
#include <pthread.h>
#include <assert.h>
#include <unistd.h> /* for sleep() */

int i = 1, j = 0;

void *threadFunc1(void *args) {
    sleep(1);
    printf("Thread1: j = %d\n", j);
    i = 0;
    return NULL;
}

void *threadFunc2(void *args) {
    while (i == 1) {
        //printf("Thread2: i = %d\n", i);
        j++;
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    int res;
    printf("Main: creating threads\n");
    res = pthread_create(&t1, NULL, threadFunc1, "Thread1"); assert(res == 0);
    res = pthread_create(&t2, NULL, threadFunc2, "Thread2"); assert(res == 0);
    res = pthread_join(t1, NULL); assert(res == 0);
    res = pthread_join(t2, NULL); assert(res == 0);
    printf("i = %d\n", i);
    printf("Main: End\n");
    return 0;
}
I noticed that the pthreads program always ends. (I tested it with different sleep times for thread 1.) However, the Java program terminates only occasionally; most of the time it does not end.
If I uncomment Comment1 in the Java program, it terminates every time. Likewise, if I make the variable volatile, the Java program terminates in all cases.
So my confusion is:
If cache coherence is done in hardware, then 'i=0' should be visible to other threads unless the compiler optimized the code. But if the compiler optimized the code, then I don't understand why the thread ends sometimes and not at other times. Also, adding a System.out.println seems to change the behavior.
Can anyone see a compiler optimization that Java does (which is not done by the C compiler) that is causing this behavior?
Is there something additional the compiler has to do to get cache coherence even if the hardware already supports it (like enabling/disabling it)?
Should I be using volatile for all shared variables by default?
Am I missing something? Any additional comments are welcome.
if cache coherence is done in hardware, then 'i=0' should be visible to other threads unless the compiler optimized the code. But if the compiler optimized the code, then I don't understand why the thread ends sometimes and not at other times. Also adding a System.out.println seems to change the behavior.
Note: The javac does next to no optimization, so don't think in terms of static optimisations.
You are locking on different objects which are unrelated to the object you are modifying. As the field you are modifying is not volatile, the JVM optimiser is free to optimise it dynamically as it chooses, regardless of the support your hardware could otherwise provide.
As this is dynamic, it may or may not optimise the read of the field which you don't change in that thread.
Can anyone see a compiler optimization that Java does (which is not done by the C compiler) that is causing this behavior?
The optimisation is most likely that the read is cached in a register or the code is eliminated completely. This optimisation typically takes about 10-30 ms so you are testing whether this optimisation has occurred before the program finishes.
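To make this concrete, here is a sketch (my illustration, not actual JIT output) of the effect of caching the non-volatile read in a register:

// Hedged sketch: the observable effect of hoisting a non-volatile read
// out of a loop, not actual generated code.
class HoistingSketch {
    static int i = 1;
    static int j = 0;

    // What the source of Thread2's loop says:
    static void loopAsWritten() {
        while (i == 1) {
            j++;
        }
    }

    // How the optimised code may behave once the read of 'i' is cached:
    static void loopAsOptimised() {
        if (i == 1) {
            while (true) { // 'i' is never re-read, so the loop cannot exit
                j++;
            }
        }
    }
}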
Is there something additional the compiler has to do to get cache coherence even if the hardware already supports it (like enabling/disabling it)?
You have to use the model correctly, forget about the idea that the compiler will optimise your code, and ideally use the concurrency libraries for passing work between threads.
// requires: import java.util.concurrent.atomic.AtomicBoolean;
public static void main(String... args) {
    final AtomicBoolean flag = new AtomicBoolean(true);

    /*
     * Thread1: Sleeps for 30ms and then sets flag to false.
     */
    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                Thread.sleep(30);
                System.out.println("Thread1: flag=" + flag);
                flag.set(false);
            } catch (Exception e) {
                throw new RuntimeException("Thread1 Error");
            }
        }
    }).start();

    /*
     * Thread2: Loops until flag is false and then exits.
     */
    new Thread(new Runnable() {
        @Override
        public void run() {
            long j = 0;
            while (flag.get())
                j++;
            System.out.println("\nThread2: flag=" + flag + ", j=" + j);
        }
    }).start();
}
prints
Thread1: flag=true
Thread2: flag=false, j=39661265
Should I be using volatile for all shared variables by default?
Almost never. It would work if you have a single flag that you only set once. However, using locking is more likely to be useful generally.
Your specific problem is that the second thread needs to synchronize memory after i has been set to 0 by the first thread. But both threads are synchronizing on this which, as @Peter and @Marko have pointed out, are different objects. It is possible for the second thread to enter the while loop before the first thread sets i = 0. No additional memory barrier is crossed in the while loop, so the field is never updated.
If I uncomment Comment1 in the Java program, it terminates every time.
This works because the underlying System.out PrintStream is synchronized, which causes a memory barrier to be crossed. Memory barriers force synchronization of the thread's memory with main memory and ensure ordering of memory operations. Here's the PrintStream.println(...) source:
public void println(String x) {
    synchronized (this) {
        print(x);
        newLine();
    }
}
if cache coherence is done in hardware, then 'i=0' should be visible to other threads unless the compiler optimized the code
You have to remember that each of the processors has both a few registers and a lot of per-processor cache memory. It is the cached memory which is the main issue here, not compiler optimizations.
Can anyone see a compiler optimization that Java does (which is not done by the C compiler) that is causing this behavior?
The use of cached memory and memory operation reordering both are significant performance optimizations. Processors are free to change the order of operations to improve pipelining and they do not synchronize their dirty pages unless a memory barrier is crossed. This means that a thread can run asynchronously using local high-speed memory to [significantly] increase performance. The Java memory model allows for this and is vastly more complicated compared to pthreads.
Should I be using volatile for all shared variables by default?
If you expect thread #1 to update a field and thread #2 to see that update, then yes, you will need to mark the field as volatile. Using the Atomic* classes is often recommended, and is required if you want to increment a shared variable (++ is two operations).
If you are doing multiple operations (such as iterating across a shared collection), the synchronized keyword should be used.
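For example, a minimal sketch of those three tools (the names here are illustrative, not from the question):

import java.util.concurrent.atomic.AtomicInteger;

// Sketch: volatile for a simple once-set flag, AtomicInteger for a shared
// counter, synchronized for compound multi-step operations.
class SharedState {
    volatile boolean done = false;                 // visibility for a flag
    final AtomicInteger counter = new AtomicInteger();

    void recordEvent() {
        counter.incrementAndGet();                 // '++' as a single atomic step
    }

    synchronized void finishIfReady() {
        // Several dependent reads/writes stay consistent under one lock.
        if (!done && counter.get() >= 10) {
            done = true;
        }
    }
}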
The program will end if Thread 2 starts running after Thread 1 has already set i to 0. Using synchronized(this) may contribute to this somewhat because there's a memory barrier at each entry into a synchronized block, regardless of the lock acquired (you use disparate locks, so no contention will ensue).
Aside from this there may be other complicated interactions between the moment your code gets JITted and the moment Thread 1 writes 0, since this changes the level of optimization. Optimized code will normally read only once from the global var and cache the value in a register or similar thread-local location.
Cache coherency is a hardware level feature. How manipulating a variable maps to CPU instructions and indirectly to the hardware is a language/runtime feature.
In other words, setting a variable does not necessarily translate into CPU instructions that write to that variable's memory. A compiler (offline or JIT) can use other information to determine that it does not need to be written to memory.
Having said that, most languages with support for concurrency have additional syntax to tell the compiler that the data you are working with is intended for concurrent access. For many (like Java), it's opt-in.
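In Java, that opt-in is chiefly the volatile modifier (or the synchronized/Atomic* constructs). Applied to the fields from the question, the sketch below is the one-word change:

// Sketch: marking the shared fields volatile tells the compiler/JIT that
// they are intended for concurrent access, so reads may not be cached.
public class mainTest {
    public static volatile int i = 1;
    public static volatile int j = 0;
    // ... the rest of the program is unchanged ...
}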
If the expected behavior is for thread 2 to detect the change in the variable and terminate, the volatile keyword is definitely required. It allows the threads to communicate via the volatile variable. The compiler usually optimizes reads to fetch from the cache, as it is faster than fetching from main memory.
Check out this awesome post, it will give you your answer:
http://jeremymanson.blogspot.sg/2008/11/what-volatile-means-in-java.html
I believe in this case it has nothing to do with cache coherence. As mentioned, that is a computer-architecture feature which should be transparent to a C/Java program.
If no volatile is specified, the behaviour is undefined: sometimes the other thread will see the value change and sometimes it won't.
volatile has different meanings in the C and Java contexts.
http://en.wikipedia.org/wiki/Volatile_variable
Depending on your C compiler, the program might get optimized and show the same behavior as your Java program. So the volatile keyword is always recommended.
Related
I have the following code snippet that I'm trying to see whether it can crash or misbehave at some point. The HashMap is called from multiple threads, where put is inside a synchronized block and get is not. Is there any issue with this code? If so, what modification do I need to make to see it happen, given that I only use put and get this way, and there are no putAll, clear, or other operations involved?
import java.util.HashMap;
import java.util.Map;

public class Main {

    Map<Integer, String> instanceMap = new HashMap<>();

    public static void main(String[] args) {
        System.out.println("Hello");
        Main main = new Main();

        Thread thread1 = new Thread("Thread 1") {
            public void run() {
                System.out.println("Thread 1 running");
                for (int i = 0; i <= 100; i++) {
                    System.out.println("Thread 1 " + i + "-" + main.getVal(i));
                }
            }
        };
        thread1.start();

        Thread thread2 = new Thread("Thread 2") {
            public void run() {
                System.out.println("Thread 2 running");
                for (int i = 0; i <= 100; i++) {
                    System.out.println("Thread 2 " + i + "-" + main.getVal(i));
                }
            }
        };
        thread2.start();
    }

    private String getVal(int key) {
        check(key);
        return instanceMap.get(key);
    }

    private void check(int key) {
        if (!instanceMap.containsKey(key)) {
            synchronized (instanceMap) {
                if (!instanceMap.containsKey(key)) {
                    // System.out.println(Thread.currentThread().getName());
                    instanceMap.put(key, "" + key);
                }
            }
        }
    }
}
What I have checked out:
Are size(), put(), remove(), get() atomic in Java synchronized HashMap?
Extending HashMap<K,V> and synchronizing only puts
Why does HashMap.get(key) needs to be synchronized when change operations are synchronized?
I somewhat modified your code:
removed System.out.println() from the "hot" loop, as it is internally synchronized
increased the number of iterations
changed the printing to only print when there's an unexpected value
There's much more we could do and try, but this already fails, so I stopped there. The next step would be to rewrite the whole thing to jcstress.
And voila, as expected, sometimes this happens on my Intel MacBook Pro with Temurin 17:
Exception in thread "Thread 2" java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "java.util.Map.get(Object)" is null
at com.gitlab.janecekpetr.playground.Playground.getVal(Playground.java:35)
at com.gitlab.janecekpetr.playground.Playground.lambda$0(Playground.java:21)
at java.base/java.lang.Thread.run(Thread.java:833)
Code:
private record Val(int index, int value) {}

private static final int MAX = 100_000;
private final Map<Integer, Integer> instanceMap = new HashMap<>();

public static void main(String... args) {
    Playground main = new Playground();
    Runnable runnable = () -> {
        System.out.println(Thread.currentThread().getName() + " running");
        Val[] vals = new Val[MAX];
        for (int i = 0; i < MAX; i++) {
            vals[i] = new Val(i, main.getVal(i));
        }
        System.out.println(Stream.of(vals).filter(val -> val.index() != val.value()).toList());
    };
    Thread thread1 = new Thread(runnable, "Thread 1");
    thread1.start();
    Thread thread2 = new Thread(runnable, "Thread 2");
    thread2.start();
}

private int getVal(int key) {
    check(key);
    return instanceMap.get(key);
}

private void check(int key) {
    if (!instanceMap.containsKey(key)) {
        synchronized (instanceMap) {
            if (!instanceMap.containsKey(key)) {
                instanceMap.put(key, key);
            }
        }
    }
}
To specifically explain the excellent sleuthing work in the answer by @PetrJaneček:
Every field in Java has an evil coin attached to it. Any time any thread reads the field, it flips this coin. It is not a fair coin - it is evil. It will flip heads 10,000 times in a row if that's going to ruin your day. For example, you may have code that depends on coin flips landing a certain way, or it'll fail to work; the coin may flip heads during all your extensive testing and through the first week in production, and then start flipping tails just as the big new potential customer demos your app.
The coinflip decides which variant of the field is used - because every thread may or may not have a local cache of that field. When you write to a field from any thread? Coin is flipped, on tails, the local cache is updated and nothing more happens. Read from any thread? Coin is flipped. On tails, the local cache is used.
That's not really what happens of course (your JVM does not actually have evil coins nor is it out to get you), but the JMM (Java Memory Model), along with the realities of modern hardware, means that this abstraction works very well: It will reliably lead to the right answer when writing concurrent code, namely, that any field that is touched by more than one thread must have guards around it, or must never change at all during the entire duration of the multi-thread access 'session'.
You can force the JVM to flip the coin the way you want, by establishing so-called happens-before relationships. This is explicit terminology used by the JMM. If 2 lines of code have a happens-before relationship (one is defined as 'happening before' the other, as per the JMM's list of HB-relationship-establishing actions), then it is not possible (short of a bug in the JVM itself) to observe any side effect of the HA line whilst not also observing all side effects of the HB line. (That is to say: the 'happens before' line happens before the 'happens after' line as far as your code could ever tell, though it's a bit of a Schrödinger's cat situation. If your code doesn't actually look at these fields in a way that would ever let you tell the difference, then the JVM is free to not do that. And you can rely on the evil coin being evil: if the JMM grants a 'right', there will be some combination of CPU, OS, JVM release, version, and phase of the moon that combines to use it.)
A small selection of common HB/HA establishing conditions:
The first line inside a synchronized(lock) block is HA relative to the hitting of that block in any other thread.
Exiting a synchronized(lock) block is HB relative to any other thread entering any synchronized(lock) block, assuming the two locks are the same reference.
thread.start() is HB relative to the first line that thread will run.
The 'natural' HB/HA: line X is HB relative to line Y if X and Y are run by the same thread and X is 'before it' in your code. You can't write x = 5; y = x; and have y be set by a version of x that did not witness the x = 5 happening (of course, if another thread is also modifying x, all bets are off unless you have HB/HA with whatever line is doing that).
writes and reads to volatile establish HB/HA but you usually can't get any guarantees about which direction.
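As an illustration of the volatile rule above, here is a minimal sketch (my example, not from the original post):

// Sketch: a volatile write happens-before a subsequent volatile read that
// observes it, which also publishes the ordinary write to 'payload'.
class StopFlag {
    static volatile boolean stop = false;
    static int payload;                    // deliberately not volatile

    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            while (!stop) { /* spin until the volatile read sees true */ }
            // HB chain: payload = 42 hb stop = true hb the read of stop,
            // so this is guaranteed to print 42.
            System.out.println(payload);
        });
        worker.start();
        payload = 42;  // ordinary write, published by the volatile write below
        stop = true;
    }
}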
This explains the way your code may fail: The get() call establishes absolutely no HB/HA relationship with the other thread that is calling put(), and therefore the get() call may or may not use locally cached variants of the various fields that HashMap uses internally, depending on the evil coin (which is of course hitting some fields; it'll be private fields in the HashMap implementation someplace, so you don't know which ones, but HashMap obviously has long-lived state, which implies fields are involved).
So why haven't you actually managed to 'see' your code asplode like the JMM says it will? Because the coin is EVIL. You cannot rely on this line of reasoning: "I wrote some code that should fail if the synchronizations I need aren't happening the way I want. I ran it a whole bunch of times, and it never failed, therefore, apparently this code is concurrency-safe and I can use this in my production code". That is simply not ever actually reliable. That's why you need to be thinking: Evil! That coin is out to get me! Because if you don't, you may be tempted to write test code like this.
You should be scared of writing code where more than one thread interacts with the same field. You should be bending over backwards to avoid it. Use message queues. Do your chat between threads by using databases, which have much nicer primitives for this stuff (transactions and isolation levels). Rewrite the code so that it takes a bunch of params up front and then runs without interacting with other threads via fields at all, until it is all done, and it then returns a result (and then use e.g. fork/join framework to make it all work). Make your webserver performant and using all the cores simply by relying on the fact that every incoming request will be its own thread, so the only thing that needs to happen for you to use all the cores is for that many folks to hit your server at the same time. If you don't have enough requests, great! Your server isn't busy so it doesn't matter you aren't using all the cores.
If truly you decide that interacting with the same field from multiple threads is the right answer, you need to think NASA programming mars rovers on the lines that interact with those fields, because tests simply cannot be relied upon. It's not as hard as it sounds - especially if you keep the actual interacting with the relevant fields down to a minimum and keep thinking: "Have I established HB/HA"?
In this case, I think Petr figured it out correctly: System.out.println is hella slow and does various synchronizing actions. JMM is a package deal, and commutative: Once HB/HA establishes, everything the HB line changed is observable to the code in the HA line, and add in the natural rule, which means all code that follows the HA line cannot possibly observe a universe where something any line before the HB line did is not yet visible. In other words, the System.out.println statements HB/HA with each other in some order, but you can't rely on that (System.out is not specced to synchronize. But, just about every implementation does. You should not rely on implementation details, and I can trivially write you some java code that is legal, compiles, runs, and breaks no contracts, because you can set System.out with System.setOut - that does not synchronize when interacting with System.out!). The evil coin in this case took the shape of 'accidental' synchronization via intentionally unspecced behaviour of System.out.
The following explanation is more in line with the terminology used in the JMM. Could be useful if you want a more solid understanding of this topic.
2 Actions are conflicting when they access the same address and there is at least 1 write.
2 Actions are concurrent when they are not ordered by a happens-before relation (there is no happens-before edge between them).
2 Actions are in data race when they are conflicting and concurrent.
When there are data races in your program, weird problems can happen like unexpected reordering of instructions, visibility problems, or atomicity problems.
So what makes up the happens-before relation? If a volatile read observes a particular volatile write, then there is a happens-before edge between the write and the read. This means that the read will not only see that write, but everything that happened before that write. There are other sources of happens-before edges, like the release of a monitor and subsequent acquire of the same monitor. And there is a happens-before edge between A and B when A occurs before B in the program order. Note: the happens-before relation is transitive, so if A happens-before B and B happens-before C, then A happens-before C.
In your case, you have a get/put operations which are conflicting since they access the same address(es) and there is at least 1 write.
The put/get actions are concurrent, since there is no happens-before edge between the writing and the reading: even though the write releases the monitor, the get doesn't acquire it.
Since the put/get operations are concurrent and conflicting, they are in data race.
The simplest way to fix this problem is to execute the map.get in a synchronized block (using the same monitor). This introduces the desired happens-before edge and makes the actions sequential instead of concurrent; as a consequence, the data race disappears.
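Applied to the getVal method from the question, the fix would be a sketch like this:

// Sketch: read under the same monitor that guards the writes, so the get
// synchronizes-with the put and the data race disappears.
private int getVal(int key) {
    check(key);
    synchronized (instanceMap) {
        return instanceMap.get(key);
    }
}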
A better-performing solution would be to make use of a ConcurrentHashMap. Instead of a single central lock, there are many locks, and they can be acquired concurrently to improve scalability and performance. I'm not going to dig into the optimizations of the ConcurrentHashMap because it would create confusion.
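A sketch of that variant, assuming the same int-keyed cache as above:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: computeIfAbsent on ConcurrentHashMap is atomic per key, so the
// whole containsKey/put dance and the external lock disappear.
class ConcurrentCache {
    private final Map<Integer, Integer> instanceMap = new ConcurrentHashMap<>();

    int getVal(int key) {
        return instanceMap.computeIfAbsent(key, k -> k);
    }
}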
[Edit]
Apart from a data-race, your code also suffers from race conditions.
Volatile is supposed to make threads read values from RAM, disabling the thread-local cache; without volatile, caching is enabled, making a thread unaware of a variable change made by another thread. But this does not work for the code below.
Why does this happen and code works the same with and without volatile keyword there?
public class Racing {

    private boolean won = false; // without volatile keyword

    public void race() throws InterruptedException {
        Thread one = new Thread(() -> {
            System.out.println("Player-1 is racing...");
            while (!won) {
                won = true;
            }
            System.out.println("Player-1 has won...");
        });

        Thread two = new Thread(() -> {
            System.out.println("Player-2 is racing...");
            while (!won) {
                System.out.println("Player-2 Still Racing...");
            }
        });

        one.start();
        //Thread.sleep(2000);
        two.start();
    }

    public static void main(String[] k) {
        Racing racing = new Racing();
        try {
            racing.race();
        } catch (InterruptedException ie) {
        }
    }
}
Why does this behave the same with and without volatile ?
Volatile is supposed to make the threads read the values from RAM, disabling thread cache
No, this is not accurate. It depends on the architecture where the code is running. The Java language standard itself does not state anything about how volatile should or should not be implemented.
From Myths Programmers Believe about CPU Caches one can read:
As a computer engineer who has spent half a decade working with caches at Intel and Sun, I've learnt a thing or two about cache-coherency.
(...)
For another, if volatile variables were truly written/read from main memory every single time, they would be horrendously slow – main-memory references are 200x slower than L1 cache references. In reality, volatile reads (in Java) can often be just as cheap as an L1 cache reference, putting to rest the notion that volatile forces reads/writes all the way to main memory. If you've been avoiding the use of volatiles because of performance concerns, you might have been a victim of the above misconceptions.
Unfortunately, there still are several articles online propagating this inaccuracy (i.e., that volatile forces variables to be read from main memory).
According to the language standard (§17.4):
A field may be declared volatile, in which case the Java Memory Model ensures that all threads see a consistent value for the variable
So informally, all threads will have a view of the most up-to-date value of that variable. There is nothing about how the hardware should enforce such a constraint.
Why does this happen and the code works the same with and without volatile
Well, in your case, without volatile the behavior is undefined, meaning you might or might not see the most up-to-date value of the won flag; consequently, theoretically, the race condition is still there. However, because you have added the following statement
System.out.println("Player-2 Still Racing...");
in:
Thread two = new Thread(() -> {
    System.out.println("Player-2 is racing...");
    while (!won) {
        System.out.println("Player-2 Still Racing...");
    }
});
two things happen: first, you avoid the spin-on-field problem, and second, if one looks at the System.out.println code:
public void println(String x) {
    synchronized (this) {
        print(x);
        newLine();
    }
}
one can see that there is a synchronized being used, which increases the likelihood that the threads will be reading the most up-to-date value of the won flag (before the call to the println method). However, even that might change based on the JVM implementation.
Without volatile, there is no guarantee that another thread will see updates written to a variable. That does not mean that another thread will not see those updates if the value is not volatile. Other threads may eventually see the modified value.
In your example, you are using System.out.printlns, which contain memory barriers. That means that once the println completes, all variables updated before that point are visible to all threads. The program might work differently if you do not print anything.
I read the below program and answer in a blog.
int x = 0;
boolean bExit = false;

Thread 1 (not synchronized):
x = 1;
bExit = true;

Thread 2 (not synchronized):
if (bExit == true)
    System.out.println("x=" + x);
Is it possible for Thread 2 to print "x=0"?
Ans: Yes. (Reason: every thread has its own copy of the variables.)
How do you fix it?
Ans: Make both threads synchronize on a common mutex, or make both variables volatile.
My doubt is: if we make the two variables volatile, then the two threads will share the variables from main memory. That makes sense; but in the case of synchronization, how is the problem resolved, given that both threads have their own copy of the variables?
Please help me.
This is actually more complicated than it seems. There are several arcane things at work.
Caching
Saying "Every thread has their own copy of variables" is not exactly correct. Every thread may have their own copy of variables, and they may or may not flush these variables into the shared memory and/or read them from there, so the whole thing is non-deterministic. Moreover, the very term flushing is really implementation-dependent. There are strict terms such as memory consistency, happens-before order, and synchronization order.
Reordering
This one is even more arcane. This
x = 1;
bExit = true;
does not even guarantee that Thread 1 will first write 1 to x and then true to bExit. In fact, it does not even guarantee that any of these will happen at all. The compiler may optimize away some values if they are not used later. The compiler and CPU are also allowed to reorder instructions any way they want, provided that the outcome is indistinguishable from what would happen if everything was really in program order. That is, indistinguishable for the current thread! Nobody cares about other threads until...
Synchronization comes in
Synchronization does not only mean exclusive access to resources. It is also not just about preventing threads from interfering with each other. It's also about memory barriers. It can be roughly described as each synchronization block having invisible instructions at the entry and exit, the first one saying "read everything from the shared memory to be as up-to-date as possible" and the last one saying "now flush whatever you've been doing there to the shared memory". I say "roughly" because, again, the whole thing is an implementation detail. Memory barriers also restrict reordering: actions may still be reordered, but the results that appear in the shared memory after exiting the synchronized block must be identical to what would happen if everything was indeed in program order.
All that works, of course, only if both blocks use the same locking object.
The whole thing is described in details in Chapter 17 of the JLS. In particular, what's important is the so-called "happens-before order". If you ever see in the documentation that "this happens-before that", it means that everything the first thread does before "this" will be visible to whoever does "that". This may even not require any locking. Concurrent collections are a good example: one thread puts there something, another one reads that, and that magically guarantees that the second thread will see everything the first thread did before putting that object into the collection, even if those actions had nothing to do with the collection itself!
Volatile variables
One last warning: you better give up on the idea that making variables volatile will solve things. In this case maybe making bExit volatile will suffice, but there are so many troubles that using volatiles can lead to that I'm not even willing to go into that. But one thing is for sure: using synchronized has much stronger effect than using volatile, and that goes for memory effects too. What's worse, volatile semantics changed in some Java version so there may exist some versions that still use the old semantics which was even more obscure and confusing, whereas synchronized always worked well provided you understand what it is and how to use it.
Pretty much the only reason to use volatile is performance because synchronized may cause lock contention and other troubles. Read Java Concurrency in Practice to figure all that out.
Q & A
1) You wrote "now flush whatever you've been doing there to the shared memory" about synchronized blocks. But will we see only the variables that we access in the synchronized block, or all the changes made by the thread that called synchronized (even on variables not accessed in the synchronized block)?
Short answer: it will "flush" all variables that were updated during the synchronized block or before entering the synchronized block. And again, because flushing is an implementation detail, you don't even know whether it will actually flush something or do something entirely different (or doesn't do anything at all because the implementation and the specific situation already somehow guarantee that it will work).
Variables that weren't accessed inside the synchronized block obviously won't change during the execution of the block. However, if you change some of those variables before entering the synchronized block, for example, then you have a happens-before relationship between those changes and whatever happens in the synchronized block (the first bullet in 17.4.5). If some other thread enters another synchronized block using the same lock object, then it synchronizes-with the first thread exiting the synchronized block, which means that you have another happens-before relationship here. So in this case the second thread will see the variables that the first thread updated prior to entering the synchronized block.
If the second thread tries to read those variables without synchronizing on the same lock, then it is not guaranteed to see the updates. But then again, it isn't guaranteed to see the updates made inside the synchronized block as well. But this is because of the lack of the memory-read barrier in the second thread, not because the first one didn't "flush" its variables (memory-write barrier).
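A small sketch of that guarantee (hypothetical names):

// Sketch: a write made before entering a synchronized block is visible to
// another thread that later synchronizes on the same lock.
class Publish {
    static final Object LOCK = new Object();
    static int before;                 // written outside the block
    static int inside;                 // written inside the block

    static void writer() {
        before = 1;                    // happens-before entering the block
        synchronized (LOCK) {
            inside = 2;
        }
    }

    static void reader() {
        synchronized (LOCK) {          // synchronizes-with the writer's unlock
            // If inside == 2 is observed, before == 1 is guaranteed too.
            System.out.println(before + " " + inside);
        }
    }
}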
2) In the chapter you cite (of the JLS) it is written that: "A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field." Doesn't this mean that when the variable is volatile you will see only changes to it (because it says the write happens-before the read, not happens-before every operation between them)? I mean, doesn't this mean that in the example given in the description of the problem, we could see bExit = true but x = 0 in the second thread if only bExit is volatile? I ask because I found this question here: http://java67.blogspot.bg/2012/09/top-10-tricky-java-interview-questions-answers.html and it says that if bExit is volatile the program is OK. So will the registers flush only bExit's value, or both bExit's and x's values?
By the same reasoning as in Q1, if you do bExit = true after x = 1, then there is an in-thread happens-before relationship because of the program order. Now since volatile writes happen-before volatile reads, it is guaranteed that the second thread will see whatever the first thread updated prior to writing true to bExit. Note that this behavior is only since Java 1.5 or so, so older or buggy implementations may or may not support this. I have seen bits in the standard Oracle implementation that use this feature (java.concurrent collections), so you can at least assume that it works there.
3) Why does the monitor matter for memory visibility when using synchronized blocks? I mean, when exiting a synchronized block, aren't all variables (those accessed in the block, or all variables of the thread - this relates to the first question) flushed from registers to main memory or broadcast to all CPU caches? Why does the object of synchronization matter? I just cannot imagine what the relations between the object of synchronization and memory are and how they are made. I know that we should use the same monitor to see these changes, but I don't understand how the memory that should be visible is mapped to objects. Sorry for the long questions, but these are really interesting questions for me and they are related to the example in the question.
Ha, this one is really interesting. I don't know. Probably it flushes anyway, but Java specification is written with high abstraction in mind, so maybe it allows for some really weird hardware where partial flushes or other kinds of memory barriers are possible. Suppose you have a two-CPU machine with 2 cores on each CPU. Each CPU has some local cache for every core and also a common cache. A really smart VM may want to schedule two threads on one CPU and two threads on another one. Each pair of the threads uses its own monitor, and VM detects that variables modified by these two threads are not used in any other threads, so it only flushes them as far as the CPU-local cache.
See also this question about the same issue.
4) I thought that everything written before a volatile write would be up to date when we read the volatile field (moreover, a volatile read in Java is a memory barrier), but the documentation doesn't say this.
It does, in 17.4.5:
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
If hb(x, y) and hb(y, z), then hb(x, z).
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
If x = 1 comes before bExit = true in program order, then we have happens-before between them. If some other thread reads bExit after that, then we have happens-before between write and read. And because of the transitivity, we also have happens-before between x = 1 and read of bExit by the second thread.
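A minimal sketch of that chain, using the variables from the question:

// Sketch: program order plus the volatile write/read rule, chained by
// transitivity, guarantees x == 1 whenever bExit is seen as true.
class Flags {
    static int x = 0;
    static volatile boolean bExit = false;

    static void thread1() {
        x = 1;          // (1) hb (2) by program order
        bExit = true;   // (2) volatile write
    }

    static void thread2() {
        if (bExit) {    // (3) volatile read observing (2): (2) hb (3)
            // By transitivity (1) hb (3), so this always prints x=1.
            System.out.println("x=" + x);
        }
    }
}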
5) Also, if we have a volatile Person p, do we have some dependency when we use p.age = 20 and print(p.age), or do we have a memory barrier in this case (assume age is not volatile)? I think no.
You are correct. Since age is not volatile, then there is no memory barrier, and that's one of the trickiest things. Here is a fragment from CopyOnWriteArrayList, for example:
Object[] elements = getArray();
E oldValue = get(elements, index);
if (oldValue != element) {
    int len = elements.length;
    Object[] newElements = Arrays.copyOf(elements, len);
    newElements[index] = element;
    setArray(newElements);
} else {
    // Not quite a no-op; ensures volatile write semantics
    setArray(elements);
}
6) And is there any happens-before between two subsequent reads of a volatile field? I mean, will the second read see all the changes from the thread which read this field before it (of course, we will have changes only if volatile influences the visibility of all changes before it - which I am a little confused about whether it is true or not)?
No, there is no relationship between volatile reads. Of course, if one thread performs a volatile write and then two other threads perform volatile reads, they are guaranteed to see everything at least as up to date as it was before the volatile write, but there is no guarantee of whether one thread will see more up-to-date values than the other. Moreover, there is not even a strict definition of one volatile read happening before another! It is wrong to think of everything happening on a single global timeline. It is more like parallel universes with independent timelines that sometimes sync their clocks by performing synchronization and exchanging data with memory barriers.
It depends on the implementation whether threads will keep a copy of the variables in their own memory. In the case of class-level variables, threads have shared access; in the case of local variables, threads keep their own copy. I will provide two examples which show this; please have a look.
And in your example, if I understood it correctly, your code should look something like this:
package com.practice.multithreading;

public class LocalStaticVariableInThread {

    static int x = 0;
    static boolean bExit = false;

    public static void main(String[] args) {
        Thread t1 = new Thread(run1);
        Thread t2 = new Thread(run2);
        t1.start();
        t2.start();
    }

    static Runnable run1 = () -> {
        x = 1;
        bExit = true;
    };

    static Runnable run2 = () -> {
        if (bExit == true)
            System.out.println("x=" + x);
    };
}
Output
x=1
I am getting this output always. It is because the threads share the variable, and when it is changed by one thread the other thread can see it. But in real-life scenarios we can never say which thread will start first; since here the threads are not doing much else, we see the expected result.
Now take this example:
Here, if you make the i variable of the for loop a shared field (as in the code below) instead of a local variable, the threads won't keep a copy of it and you won't see the desired output, i.e. the count value will not be 2000 every time, even if you have synchronized the count increment.
package com.practice.multithreading;

public class RaceCondition2Fixed {

    private int count;
    int i;

    /* Making it synchronized forces the thread to acquire an intrinsic lock on the method,
       and another thread cannot access it until this lock is released after the method is completed. */
    public synchronized void increment() {
        count++;
    }

    public static void main(String[] args) {
        RaceCondition2Fixed rc = new RaceCondition2Fixed();
        rc.doWork();
    }

    private void doWork() {
        Thread t1 = new Thread(new Runnable() {
            @Override
            public void run() {
                for (i = 0; i < 1000; i++) {
                    increment();
                }
            }
        });

        Thread t2 = new Thread(new Runnable() {
            @Override
            public void run() {
                for (i = 0; i < 1000; i++) {
                    increment();
                }
            }
        });

        t1.start();
        t2.start();

        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        /* If we don't use join, the count will be printed as 0: when we call t1.start() and
           t2.start(), the threads start updating count in separate threads while the main
           thread prints the value immediately. So we need to wait for the threads to complete. */
        System.out.println(Thread.currentThread().getName() + " Count is : " + count);
    }
}
I am new to Java and am currently learning about volatile. Say I have the following code:
public class Test {

    private static boolean b = false;

    public static void main(String[] args) throws Exception {
        new Thread(new Runnable() {
            public void run() {
                while (true) {
                    b = true;
                }
            }
        }).start();

        // Give time for thread to start
        Thread.sleep(2000);

        System.out.println(b);
    }
}
Output:
true
This code has two threads (the main thread and another thread). Why is the other thread able to modify the value of b? Shouldn't b have to be volatile in order for this to happen?
The volatile keyword guarantees that changes are visible amongst multiple threads, but you're interpreting that to mean the opposite is also true: that the absence of the volatile keyword guarantees isolation between threads. There is no such guarantee.
Also, while your code example is multi-threaded, it isn't necessarily concurrent. It could be that the values were cached per-thread, but there was enough time for the JVM to propagate the change before you printed the result.
You are right that with volatile, you can ensure/guarantee that your 2 threads will see the appropriate value from main memory at all times, and never a thread-specific cached version of it.
Without volatile, you lose that guarantee. And each thread is working with its own cached version of the value.
However, there is nothing preventing the 2 threads from resynchronizing their memory if and when they feel like it, and eventually viewing the same value (maybe). It's just that you can't guarantee that it will happen, and you most certainly cannot guarantee when it will happen. But it can happen at some indeterminate point in time.
The point is that your code may work sometimes and sometimes not. Even if it seems to read the variable properly every time you run it on your personal computer, it's very likely that this same code will break on a different machine. So you are taking big risks.
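To remove that risk in the code from the question, the field just needs to be declared volatile. A sketch, with the background thread simplified to a single write:

public class Test {
    // volatile guarantees the main thread sees the background thread's write
    private static volatile boolean b = false;

    public static void main(String[] args) throws Exception {
        new Thread(() -> b = true).start();
        Thread.sleep(2000); // give the thread time to run
        System.out.println(b); // reliably prints true once the write has happened
    }
}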
I met the following Java class on the internet:
public class Lock1 implements Runnable {

    int b = 100;

    public synchronized void m1() throws Exception {
        b = 1000;
        Thread.sleep(50);
        System.out.println("b=" + b);
    }

    public synchronized void m2() throws Exception {
        Thread.sleep(30);
        //System.out.println("m2");
        b = 2000;
    }

    public void run() {
        try {
            m1();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {
        Lock1 tt = new Lock1();
        Thread t = new Thread(tt);
        t.start();
        tt.m2();
        System.out.println(tt.b);
    }
}
Tried running this a lot of times, the result is almost always:
1000
b=1000
In my original guess, I thought the first line should be "2000": since tt.m2() is just a method invocation (not a thread), the main method should continue with its execution and read b as the value 2000 assigned in method m2.
The second thing I tried was to uncomment the
System.out.println("m2")
in the m2 method. Surprisingly, the result will then nearly always be:
m2
2000
b=1000
Why does adding a statement in the m2 method cause the output value of tt.b to change?
Sorry, I am quite confused here about the difference between threads and method invocation; I hope experts can help out!
Synchronization in the Java sense combines several things. In this case these points are interesting:
mutual exclusion
memory barriers for readers
memory barriers for writers
After entering a synchronized block (or method) you have two guarantees: you have the lock (mutual exclusion), and the JVM and the compiler will discard any cache for the synchronization object. This means an access to this.b will fetch the actual value of b from RAM and not from any cache, but only once; after that it will work with the cached copy again.
Leaving a synchronized block in turn guarantees that the CPU flushes all dirty (i.e. written) caches to the memory.
The point in your case is: System.out.println(tt.b); is in no way synchronized, which means the access to it has not crossed a defined memory barrier. So although the other thread has written a new value for b and flushed it to RAM, the main thread has no idea that it should read b from RAM rather than from its own cache.
The solution is:
synchronized (tt) {
    System.out.println(tt.b);
}
This follows the golden rule that if something is synchronized, then every access to it should be synchronized, not just half of the accesses.
And regarding your added System.out: There are three things:
First: it is slow (compared to some memory fiddling). This means that in the meantime the CPU or the JVM might decide, on their own, that a fresh look at tt might be appropriate.
Second: it is big (compared to some memory fiddling). This means that the touched code alone might evict tt from the caches.
Third: it is synchronized internally. This means that you cross some memory barriers (which might have nothing to do with your tt, who knows), but these might also have some effect.
This is the lead rule of multithreading debugging: Adding System.out in order to catch errors will, according to Murphy, actually hide the problem.
I guess this is JVM-implementation specific. Basically, each thread has its own copy (view) of the object's variables, and the way they are synced back and forth is not determined.
The most likely cause is that System.out.println is slow. The "unexpected" results come from a race condition between the delay (Thread.sleep) and the overhead of opening the output stream (System.out.println).