Can we use volatile writes exclusively to enforce cache coherence?

Can we use volatile writes exclusively to enforce cache coherence? - java

I have encountered the following claim: "Reading or writing to a volatile variable imposes a memory barrier in which the entire cache is flushed/invalidated."
Now consider the following execution scenario:
initial volatile boolean barrier;
initial int b = 0;
thread 1 b = 1; // write1
thread 1 barrier = true; // write2
thread 2 barrier = true; // write3
thread 2 print(b); // r1
Question: is thread 2 guaranteed to print 1?
Based on the claim, I would answer yes: thread 1 flushes its cache on write2 (so that b = 1 ends up in main memory), and thread 2 invalidates its cache on write3 (so that it will read b from main memory).
However, in the relevant JLS sections I am unable to find a guarantee for this behaviour, since write3 is a write, and not a read. Thus the following seemingly crucial clause does not apply:
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).
Is there some other information I am missing, or am I perhaps misunderstanding something?
(Relevant questions:
Volatile variables and other variables
Is a write to a volatile a memory-barrier in Java)

I think, you highlighted a wrong word in the phrase you quoted (I mean, the fact that it synchronizes with reads is by far not the main issue here): "A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread".
Note that it does not say anything at all about reads of other variables. Thread-1's version of b might still be sitting in the register for all you know.

Reasoning of volatile in terms of cache flushing/invalidation will lead to more confusion and it doesn't present the full picture.
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).
volatile write of variable followed by a read of that variable in other thread establishes a weaker consistency in terms of happens before gaurantee.
i.e changing a little in op code
thread 1 b = 1; // write1
thread 1 barrier = true; // write2
thread 2 while(!barrier) // read barrier
thread 2 print(b); // r1
now it is guaranteed to print b as 1.
Now if you see when thread 2 reads barrier as true and then thread 2 is guaranteed to see everything which happened before writing to barrier in program order.
If you need to see volatile in terms of barrier then please refer
http://gee.cs.oswego.edu/dl/jmm/cookbook.html.
In brief volatile implementation invoke two kinds of things
first compiler is restricted in what can be optimize w.r.t to volatile read and write. Secondly and more importantly it is implemented
as volatile read -> loadload
volatile wtire - > storeStore|loadStore
before write and StoreLoad after write.
here loadload means process the invalidate queue
before reading the next value ensures reading lates values.
StoreStore means flush store buffer before writing the next value.
http://gee.cs.oswego.edu/dl/jmm/cookbook.html.
http://www.puppetmastertrading.com/images/hwViewForSwHackers.pdf

Related

What does "subsequent read" mean in the context of volatile variables?

Java memory visibility documentation says that:
A write to a volatile field happens-before every subsequent read of that same field.
I'm confused what does subsequent means in context of multithreading. Does this sentence implies some global clock for all processors and cores. So for example I assign value to variable in cycle c1 in some thread and then second thread is able to see this value in subsequent cycle c1 + 1?

It sounds to me like it's saying that it provides lockless acquire/release memory-ordering semantics between threads. See Jeff Preshing's article explaining the concept (mostly for C++, but the main point of the article is language neutral, about the concept of lock-free acquire/release synchronization.)
In fact Java volatile provides sequential consistency, not just acq/rel. There's no actual locking, though. See Jeff Preshing's article for an explanation of why the naming matches what you'd do with a lock.)
If a reader sees the value you wrote, then it knows that everything in the producer thread before that write has also already happened.
This ordering guarantee is only useful in combination with other guarantees about ordering within a single thread.
e.g.
int data[100];
volatile bool data_ready = false;
Producer:
data[0..99] = stuff;
// release store keeps previous ops above this line
data_ready = true;
Consumer:
while(!data_ready){} // spin until we see the write
// acquire-load keeps later ops below this line
int tmp = data[99]; // gets the value from the producer
If data_ready was not volatile, reading it wouldn't establish a happens-before relationship between two threads.
You don't have to have a spinloop, you could be reading a sequence number, or an array index from a volatile int, and then reading data[i].
I don't know Java well. I think volatile actually gives you sequential-consistency, not just release/acquire. A sequential-release store isn't allowed to reorder with later loads, so on typical hardware it needs an expensive memory barrier to make sure the local core's store buffer is flushed before any later loads are allowed to execute.
Volatile Vs Atomic explains more about the ordering volatile gives you.
Java volatile is just an ordering keyword; it's not equivalent to C11 _Atomic or C++11 std::atomic<T> which also give you atomic RMW operations. In Java, volatile_var++ is not an atomic increment, it a separate load and store, like volatile_var = volatile_var + 1. In Java, you need a class like AtomicInteger to get an atomic RMW.
And note that C/C++ volatile doesn't imply atomicity or ordering at all; it only tells the compiler to assume that the value can be modified asynchronously. This is only a small part of what you need to write lockless for anything except the simplest cases.

It means that once a certain Thread writes to a volatile field, all other Thread(s) will observe (on the next read) that written value; but this does not protect you against races though.
Threads have their caches, and those caches will be invalidated and updated with that newly written value via cache coherency protocol.
EDIT
Subsequent means whenever that happens after the write itself. Since you don't know the exact cycle/timing when that will happen, you usually say when some other thread observes the write, it will observer all the actions done before that write; thus a volatile establishes the happens-before guarantees.
Sort of like in an example:
// Actions done in Thread A
int a = 2;
volatile int b = 3;
// Actions done in Thread B
if(b == 3) { // observer the volatile write
// Thread B is guaranteed to see a = 2 here
}
You could also loop (spin wait) until you see 3 for example.

Peter's answer gives the rationale behind the design of the Java memory model.
In this answer I'm attempting to give an explanation using only the concepts defined in the JLS.
In Java every thread is composed by a set of actions.
Some of these actions have the potential to be observable by other threads (e.g. writing a shared variable), these
are called synchronization actions.
The order in which the actions of a thread are written in the source code is called the program order.
An order defines what is before and what is after (or better, not before).
Within a thread, each action has a happens-before relationship (denoted by <) with the next (in program order) action.
This relationship is important, yet hard to understand, because it's very fundamental: it guarantees that if A < B then
the "effects" of A are visible to B.
This is indeed what we expect when writing the code of a function.
Consider
Thread 1 Thread 2
A0 A'0
A1 A'1
A2 A'2
A3 A'3
Then by the program order we know A0 < A1 < A2 < A3 and that A'0 < A'1 < A'2 < A'3.
We don't know how to order all the actions.
It could be A0 < A'0 < A'1 < A'2 < A1 < A2 < A3 < A'3 or the sequence with the primes swapped.
However, every such sequence must have that the single actions of each thread are ordered according to the thread's program order.
The two program orders are not sufficient to order every action, they are partial orders, in opposition of the
total order we are looking for.
The total order that put the actions in a row according to a measurable time (like a clock) they happened is called the execution order.
It is the order in which the actions actually happened (it is only requested that the actions appear to be happened in
this order, but that's just an optimization detail).
Up until now, the actions are not ordered inter-thread (between two different threads).
The synchronization actions serve this purpose.
Each synchronization action synchronizes-with at least another synchronization action (they usually comes in pairs, like
a write and a read of a volatile variable, a lock and the unlock of a mutex).
The synchronize-with relationship is the happens-before between thread (the former implies the latter), it is exposed as
a different concept because 1) it slightly is 2) happens-before are enforced naturally by the hardware while synchronize-with
may require software intervention.
happens-before is derived from the program order, synchronize-with from the synchronization order (denoted by <<).
The synchronization order is defined in terms of two properties: 1) it is a total order 2) it is consistent with each thread's
program order.
Let's add some synchronization action to our threads:
Thread 1 Thread 2
A0 A'0
S1 A'1
A1 S'1
A2 S'2
S2 A'3
The program orders are trivial.
What is the synchronization order?
We are looking for something that by 1) includes all of S1, S2, S'1 and S'2 and by 2) must have S1 < S2 and S'1 < S'2.
Possible outcomes:
S1 < S2 < S'1 < S'2
S1 < S'1 < S'2 < S2
S'1 < S1 < S'2 < S'2
All are synchronization orders, there is not one synchronization order but many, the question of above is wrong, it
should be "What are the synchronization orders?".
If S1 and S'1 are so that S1 << S'1 than we are restricting the possible outcomes to the ones where S1 < S'2 so the
outcome S'1 < S1 < S'2 < S'2 of above is now forbidden.
If S2 << S'1 then the only possible outcome is S1 < S2 < S'1 < S'2, when there is only a single outcome I believe we have
sequential consistency (the converse is not true).
Note that if A << B these doesn't mean that there is a mechanism in the code to force an execution order where A < B.
Synchronization actions are affected by the synchronization order they do not impose any materialization of it.
Some synchronization actions (e.g. locks) impose a particular execution order (and thereby a synchronization order) but some don't (e.g. reads/writes of volatiles).
It is the execution order that create the synchronization order, this is completely orthogonal to the synchronize-with relationship.
Long story short, the "subsequent" adjective refers to any synchronization order, that is any valid (according to each thread
program order) order that encompasses all the synchronization actions.
The JLS then continues defining when a data race happens (when two conflicting accesses are not ordered by happens-before)
and what it means to be happens-before consistent.
Those are out of scope.

I'm confused what does subsequent means in context of multithreading. Does this sentence implies some global clock for all processors and cores...?
Subsequent means (according to the dictionary) coming after in time. There certainly is a global clock across all CPUs in a computer (think X Ghz) and the document is trying to say that if thread-1 did something at clock tick 1 then thread-2 does something on another CPU at clock tick 2, it's actions are considered subsequent.
A write to a volatile field happens-before every subsequent read of that same field.
The key phrase that could be added to this sentence to make it more clear is "in another thread". It might make more sense to understand it as:
A write to a volatile field happens-before every subsequent read of that same field in another thread.
What this is saying that if a read of a volatile field happens in Thread-2 after (in time) the write in Thread-1, then Thread-2 will be guaranteed to see the updated value. Further up in the documentation you point to is the section (emphasis mine):
... The results of a write by one thread are guaranteed to be visible to a read by another thread only if the write operation happens-before the read operation. The synchronized and volatile constructs, as well as the Thread.start() and Thread.join() methods, can form happens-before relationships. In particular.
Notice the highlighted phrase. The Java compiler is free to reorder instructions in any one thread's execution for optimization purposes as long as the reordering doesn't violate the definition of the language – this is called execution order and is critically different than program order.
Let's look at the following example with variables a and b that are non-volatile ints initialized to 0 with no synchronized clauses. What is shown is program order and the time in which the threads are encountering the lines of code.
Time Thread-1 Thread-2
1 a = 1;
2 b = 2;
3 x = a;
4 y = b;
5 c = a + b; z = x + y;
If Thread-1 adds a + b at Time 5, it is guaranteed to be 3. However, if Thread-2 adds x + y at Time 5, it might get 0, 1, 2, or 3 depends on race conditions. Why? Because the compiler might have reordered the instructions in Thread-1 to set a after b because of efficiency reasons. Also, Thread-1 may not have appropriately published the values of a and b so that Thread-2 might get out of date values. Even if Thread-1 gets context-switched out or crosses a write memory barrier and a and b are published, Thread-2 needs to cross a read barrier to update any cached values of a and b.
If a and b were marked as volatile then the write to a must happen-before (in terms of visibility guarantees) the subsequent read of a on line 3 and the write to b must happen-before the subsequent read of b on line 4. Both threads would get 3.
We use volatile and synchronized keywords in java to ensure happens-before guarantees. A write memory barrier is crossed when assigning a volatile or exiting a synchronized block and a read barrier is crossed when reading a volatile or entering a synchronized block. The Java compiler cannot reorder write instructions past these memory barriers so the order of updates is assured. These keywords control instruction reordering and insure proper memory synchronization.
NOTE: volatile is unnecessary in a single-threaded application because program order assures the reads and writes will be consistent. A single-threaded application might see any value of (non-volatile) a and b at times 3 and 4 but it always sees 3 at Time 5 because of language guarantees. So although use of volatile changes the reordering behavior in a single-threaded application, it is only required when you share data between threads.

This is more a definition of what will not happen rather than what will happen.
Essentially it is saying that once a write to an atomic variable has happened there cannot be any other thread that, on reading the variable, will read a stale value.
Consider the following situation.
Thread A is continuously incrementing an atomic value a.
Thread B occasionally reads A.a and exposes that value as a
non-atomic b variable.
Thread C occasionally reads both A.a and B.b.
Given that a is atomic it is possible to reason that from the point of view of C, b may occasionally be less than a but will never be greater than a.
If a was not atomic no such guarantee could be given. Under certain caching situations it would be quite possible for C to see b progress beyond a at any time.
This is a simplistic demonstration of how the Java memory model allows you to reason about what can and cannot happen in a multi-threaded environment. In real life the potential race conditions between reading and writing to data structures can be much more complex but the reasoning process is the same.

volatile vs not volatile

Let's consider the following piece of code in Java
int x = 0;
int who = 1
Thread #1:
(1) x++;
(2) who = 2;
Thread #2
while(who == 1);
x++;
print x; ( the value should be equal to 2 but, perhaps, it is not* )
(I don't know Java memory models- let assume that it is strong memory model- I mean: (1) and (2) will be doesn't swapped)
Java memory model guarantees that access/store to the 32 bit variables is atomic so our program is safe. But, nevertheless we should use a attribute volatile because *. The value of x may be equal to 1 because x can be kept in register when Thread#2 read it. To resolve it we should make the x variable volatile. It is clear.
But, what about that situation:
int x = 0;
mutex m; ( just any mutex)
Thread #1:
mutex.lock()
x++;
mutex.unlock()
Thread #2
mutex.lock()
x++;
print x; // the value is always 2, why**?
mutex.unlock()
The value of x is always 2 though we don't make it volatile. Do I correctly understand that locking/unlocking mutex is connected with inserting memory barriers?

I'll try to tackle this. The Java memory model is kind of involved and hard to contain in a single StackOverflow post. Please refer to Brian Goetz's Java Concurrency in Practice for the full story.
The value of x is always 2 though we don't make it volatile. Do I correctly understand that locking/unlocking mutex is connected with inserting memory barriers?
First if you want to understand the Java memory model, it's always Chapter 17 of the spec you want to read through.
That spec says:
An unlock on a monitor happens-before every subsequent lock on that monitor.
So yes, there's a memory visibility event at the unlock of your monitor. (I assume by "mutex" you mean monitor. Most of the locks and other classes in the java.utils.concurrent package also have happens-before semantics, check the documentation.)
Happens-before is what Java means when it guarantees not just that the events are ordered, but also that memory visibility is guaranteed.
We say that a read r of a variable v is allowed to observe a write w
to v if, in the happens-before partial order of the execution trace:
r is not ordered before w (i.e., it is not the case that
hb(r, w)), and
there is no intervening write w' to v (i.e. no write w' to v such
that hb(w, w') and hb(w', r)).
Informally, a read r is allowed to see the result of a write w if there
is no happens-before ordering to prevent that read.
This is all from 17.4.5. It's a little confusing to read through, but the info is all there if you do read through it.

Let's go over some things. The following statement is true: Java memory model guarantees that access/store to the 32 bit variables is atomic. However, it does not follow that the first pseudoprogram you listed is safe. Simply because two statements are ordered syntactically does not mean that the visibility of their updates are also so ordered as viewed by other threads. Thread #2 may see the update caused by who=2 before the increment in x is visible. Making x volatile would still not make the program correct. Instead, making the variable 'who' voliatile would make the program correct. That is because volatile interacts with the java memory model in specific ways.
I feel like there is some notion of 'writing back to main memory' at the core of a common sense understanding of volatile which is incorrect. Volatile does not write back the value to main memory in Java. What reading from and writing to a volatile variable does is create what's called a happens-before relationship. When thread #1 writes to a volatile variable you're creating a relationship that ensures that any other threads #2 viewing that volatile variable will also be able to 'view' all the actions thread #1 has taken before that. In your example that means making 'who' volatile. By writing the value 2 to 'who' you are creating a happens-before relationship so that when thread #2 views who=2 it will similarly see an updated version of x.
In your second example (assuming you meant to have the 'who' variable too) the mutex unlocking creates a happens-before relationship as I specified above. Since that means other threads viewing the unlock of the mutex (ie. they are able to lock it themselves) they will see the updated version of x.

The volatile key word and memory consistency errors

In the oracle Java documentation located here, the following is said:
Atomic actions cannot be interleaved, so they can be used without fear of thread interference. However, this does not eliminate all need to synchronize atomic actions, because memory consistency errors are still possible. Using volatile variables reduces the risk of memory consistency errors, because any write to a volatile variable establishes a happens-before relationship with subsequent reads of that same variable. This means that changes to a volatile variable are always visible to other threads. What's more, it also means that when a thread reads a volatile variable, it sees not just the latest change to the volatile, but also the side effects of the code that led up the change.
It also says:
Reads and writes are atomic for reference variables and for most
primitive variables (all types except long and double).
Reads and writes are atomic for all variables declared volatile (including long
and double variables).
I have two questions regarding these statements:
"Using volatile variables reduces the risk of memory consistency errors" - What do they mean by "reduces the risk", and how is a memory consistency error still possible when using volatile?
Would it be true to say that the only effect of placing volatile on a non-double, non-long primitive is to enable the "happens-before" relationship with subsequent reads from other threads? I ask this since it seems that those variables already have atomic reads.

What do they mean by "reduces the risk"?
Atomicity is one issue addressed by the Java Memory Model. However, more important than Atomicity are the following issues:
memory architecture, e.g. impact of CPU caches on read and write operations
CPU optimizations, e.g. reordering of loads and stores
compiler optimizations, e.g. added and removed loads and stores
The following listing contains a frequently used example. The operations on x and y are atomic. Still, the program can print both lines.
int x = 0, y = 0;
// thread 1
x = 1
if (y == 0) System.out.println("foo");
// thread 2
y = 1
if (x == 0) System.out.println("bar");
However, if you declare x and y as volatile, only one of the two lines can be printed.
How is a memory consistency error still possible when using volatile?
The following example uses volatile. However, updates might still get lost.
volatile int x = 0;
// thread 1
x += 1;
// thread 2
x += 1;
Would it be true to say that the only effect of placing volatile on a non-double, non-long primitive is to enable the "happens-before" relationship with subsequent reads from other threads?
Happens-before is often misunderstood. The consistency model defined by happens-before is weak and difficult to use correctly. This can be demonstrated with the following example, that is known as Independent Reads of Independent Writes (IRIW):
volatile int x = 0, y = 0;
// thread 1
x = 1;
// thread 2
y = 1;
// thread 3
if (x == 1) System.out.println(y);
// thread 4
if (y == 1) System.out.println(x);
Only with happens-before, two 0s would be valid result. However, that's apparently counter-intuitive. For that reason, Java provides a stricter consistency model, that forbids this relativity issue, and that is known as sequential consistency. You can find it in sections §17.4.3 and §17.4.5 of the Java Language Specification. The most important part is:
A program is correctly synchronized if and only if all sequentially consistent executions are free of data races. If a program is correctly synchronized, then all executions of the program will appear to be sequentially consistent (§17.4.3).
That means, volatile gives you more than happens-before. It gives you sequential consistency if used for all conflicting accesses (§17.4.3).

The usual example:
while(!condition)
sleep(10);
if condition is volatile, this behaves as expected. If it is not, the compiler is allowed to optimize this to
if(!condition)
for(;;)
sleep(10);
This is completely orthogonal to atomicity: if condition is of a hypothetical integer type that is not atomic, then the sequence
thread 1 writes upper half to 0
thread 2 reads upper half (0)
thread 2 reads lower half (0)
thread 1 writes lower half (1)
can happen while the variable is updated from a nonzero value that just happens to have a lower half of zero to a nonzero value that has an upper half of zero; in this case, thread 2 reads the variable as zero. The volatile keyword in this case makes sure that thread 2 really reads the variable instead of using its local copy, but it does not affect timing.
Third, atomicity does not protect against
thread 1 reads value (0)
thread 2 reads value (0)
thread 1 writes incremented value (1)
thread 2 writes incremented value (1)
One of the best ways to use atomic volatile variables are the read and write counters of a ring buffer:
thread 1 looks at read pointer, calculates free space
thread 1 fills free space with data
thread 1 updates write pointer (which is `volatile`, so the side effects of filling the free space are also committed before)
thread 2 looks at write pointer, calculates amount of data received
...
Here, no lock is needed to synchronize the threads, atomicity guarantees that the read and write pointers will always be accessed consistently and volatile enforces the necessary ordering.

For question 1, the risk is only reduced (and not eliminated) because volatile only applies to a single read/write operation and not more complex operations such as increment, decrement, etc.
For question 2, the effect of volatile is to make changes immediately visible to other threads. As the quoted passage states "this does not eliminate all need to synchronize atomic actions, because memory consistency errors are still possible." Simply because reads are atomic does not mean that they are thread safe. So establishing a happens before relationship is almost a (necessary) side-effect of guaranteeing memory consistency across threads.

Ad 1: With a volatile variable, the variable is always checked against a master copy and all threads see a consistent state. But if you use that volatility variable in a non-atomic operation writing back the result (say a = f(a)) then you might still create a memory inconsistency. That's how I would understand the remark "reduces the risk". A volatile variable is consistent at the time of read, but you still might need to use a synchronize.
Ad 2: I don't know. But: If your definition of "happens before" includes the remark
This means that changes to a volatile variable are always visible to other threads. What's more, it also means that when a thread reads a volatile variable, it sees not just the latest change to the volatile, but also the side effects of the code that led up the change.
I would not dare to rely on any other property except that volatile ensures this. What else do you expect from it?!

Assume that you have a CPU with a CPU cache or CPU registers. Independent from your CPU architecture in terms of number of cores it has, volatile does NOT guarantee you a perfect inconsistency. The only way to achieve this is to use synchronized or atomic references with a performance price.
For example you have multiple threads (Thread A & Thread B) working on a shared data. Assume that Thread A wants to update the shared data and it's is started .For performance reasons, Thread A's stack was moved to CPU cache or registers. Then Thread A updated the shared data. But the problem with those places is that actually they don't flush back the updated value to the main memory immediately. This is where inconsistency's offered because up to the flash back operation, Thread B might have wanted to play with the same data, which would have taken it from the main memory - yet unupdated value.
If you use volatile all the operations will be perfomed on the main memory so you don't have a flush back latency. But, this time you may suffer from thread pipeline. In the middle of write operation (composed of number of atomic operations), Thread B may have been executed by the os to perform a read operation and that's it! Thread B will read the unupdated value again. That's why it's said it reduces the risk.
Hope you got it.

when coming to concurrency, you might want to ensure 2 things:
atomic operations: a set of operations is atomic - this is usually achieved with
"synchronized" (higher level constructs). Also with volatile for instance for read/write on long and double.
visibility: a thread B sees a modification made by a thread A. Even if an operation is atomic, like a write to an int variable, a second thread can still see a non-up-to-date value of the variable, due to processor caches. Putting a variable as volatile ensures that the second thread does see the up-to-date value of that variable. More than that, it ensures that the second thread sees an up-to-date value of ALL the variables written by the first thread before the write to the volatile variable.

Volatile and more threads

I'm trying to understand the volatile keyword and its proper using. Looking at the Brian Goetz's article Java theory and practice: Fixing the Java Memory Model, I'm stuck on this example:
Map configOptions;
char[] configText;
volatile boolean initialized = false;
// In Thread A
configOptions = new HashMap();
configText = readConfigFile(fileName);
processConfigOptions(configText, configOptions);
initialized = true;
// In Thread B
while (!initialized)
sleep();
// use configOptions
The volatile variable above is used as a "guard" to indicate that a set of shared variables had been initialized.
I understand that since java 1.5, the volatile is strong enough to ensure that when thread B reads the volatile variable, it sees all variables that was visible to the thread A at the time the thread A writes to the volatile variable.
But what if there would be a thread C doing something like this:
// In Thread C
configOptions = new HashMap();
// put something to configOptions
My question: Is the volatile strong enough to ensure that when thread B reads the volatile variable, it sees all variables from all threads. Maybe some kind of flushing all caches? If not, then such a code with 3 threads is broken, right?

per the lang spec (http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.4):
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).
and
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
so the volatile variable itself is safe from stale cache problems. Your questions is; "what about all other variables?" Well no, the volatile keyword only affects caching on the variable it is on: all other variables on those threads are unsynchronized.

In this answer I will try to explain what volatile variables in Java is.
So, where to start?
Read and write operations with volatile variables are guaranteed to be atomic, even for 64-bit length variables. Note: i++; is not atomic because technically it is three variables.
Writing some value to volatile variable happens-before this value can be read from it. You can find lots of questions on SO about what happens-before is. Important: in JVM it is implemented with memory fences, store fence on writing and load fence on reading. From practical side that means when you read some value from it, you're guaranteed to see all values written to non-volatile variables before volatile write;
Values written to volatile variables are available to all CPUs and all threads at once, without any CPU caches.
Now, regarding your question.
Is the volatile strong enough to ensure that when thread B reads the volatile variable, it sees all variables from all threads?
No. It is strong enough to ensure that when thread B read some value from volatile variable, it sees (will read) values from variables written before volatile write.
Maybe some kind of flushing all caches?
Actually yes, on x86 architecture volatile write empties store order buffer, volatile read empties load order buffer. If you want more details on that, you may want to read answer for this question: Java 8 Unsafe: xxxFence() instructions
If not, then such a code with 3 threads is broken, right?
This code works as intended (I guess), because thread B does volatile read prior to reading configOptions which guarantees its visibility.

Is the volatile strong enough to ensure that when thread B reads the
volatile variable, it sees all variables from all threads. Maybe some
kind of flushing all caches?
All variables from the volatile-writing-thread that are written prior to the volatile store will be visible.
So there is no 'flush all caches' magic.
If not, then such a code with 3 threads is broken, right?
It could very well be broken with two threads if you do not synchronize correctly. There is a reason the initialized flag is written to. That effectively flushes all the writes that occurred on that thread.

Happens-before relationships with volatile fields and synchronized blocks in Java - and their impact on non-volatile variables?

I am still pretty new to the concept of threading, and try to understand more about it. Recently, I came across a blog post on What Volatile Means in Java by Jeremy Manson, where he writes:
When one thread writes to a volatile variable, and another thread sees
that write, the first thread is telling the second about all of the
contents of memory up until it performed the write to that volatile
variable. [...] all of the memory contents seen by Thread 1, before
it wrote to [volatile] ready, must be visible to Thread 2, after it
reads the value true for ready. [emphasis added by myself]
Now, does that mean that all variables (volatile or not) held in Thread 1's memory at the time of the write to the volatile variable will become visible to Thread 2 after it reads that volatile variable? If so, is it possible to puzzle that statement together from the official Java documentation/Oracle sources? And from which version of Java onwards will this work?
In particular, if all Threads share the following class variables:
private String s = "running";
private volatile boolean b = false;
And Thread 1 executes the following first:
s = "done";
b = true;
And Thread 2 then executes afterwards (after Thread 1 wrote to the volatile field):
boolean flag = b; //read from volatile
System.out.println(s);
Would this be guaranteed to print "done"?
What would happen if instead of declaring b as volatile I put the write and read into a synchronized block?
Additionally, in a discussion entitled "Are static variables shared between threads?", #TREE writes:
Don't use volatile to protect more than one piece of shared state.
Why? (Sorry; I can't comment yet on other questions, or I would have asked there...)

Yes, it is guaranteed that thread 2 will print "done" . Of course, that is if the write to b in Thread 1 actually happens before the read from b in Thread 2, rather than happening at the same time, or earlier!
The heart of the reasoning here is the happens-before relationship. Multithreaded program executions are seen as being made of events. Events can be related by happens-before relationships, which say that one event happens before another. Even if two events are not directly related, if you can trace a chain of happens-before relationships from one event to another, then you can say that one happens before the other.
In your case, you have the following events:
Thread 1 writes to s
Thread 1 writes to b
Thread 2 reads from b
Thread 2 reads from s
And the following rules come into play:
"If x and y are actions of the same thread and x comes before y in program order, then hb(x, y)." (the program order rule)
"A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field." (the volatile rule)
The following happens-before relationships therefore exist:
Thread 1 writes to s happens before Thread 1 writes to b (program order rule)
Thread 1 writes to b happens before Thread 2 reads from b (volatile rule)
Thread 2 reads from b happens before Thread 2 reads from s (program order rule)
If you follow that chain, you can see that as a result:
Thread 1 writes to s happens before Thread 2 reads from s

What would happen if instead of declaring b as volatile I put the write and read into a synchronized block?
If and only if you protect all such synchronized blocks with the same lock will you have the same guarantee of visibility as with your volatile example. You will in addition have mutual exclusion of the execution of such synchronized blocks.
Don't use volatile to protect more than one piece of shared state.
Why?
volatile does not guarantee atomicity: in your example the s variable may also have been mutated by other threads after the write you are showing; the reading thread won't have any guarantee as to which value it sees. Same thing goes for writes to s occurring after your read of the volatile, but before the read of s.
What is safe to do, and done in practice, is sharing immutable state transitively accessible from the reference written to a volatile variable. So maybe that's the meaning intended by "one piece of shared state".
is it possible to puzzle that statement together from the official Java documentation/Oracle sources?
Quotes from the spec:
17.4.4. Synchronization Order
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).
17.4.5. Happens-before Order
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
If an action x synchronizes-with a following action y, then we also have hb(x, y).
This should be enough.
And from which version of Java onwards will this work?
Java Language Specification, 3rd Edition introduced the rewrite of the Memory Model specification which is the key to the above guarantees. NB most previous versions acted as if the guarantees were there and many lines of code actually depended on it. People were surprised when they found out that the guarantees had in fact not been there.

Would this be guaranteed to print "done"?
As said in Java Concurrency in Practice:
When thread A writes to a volatile variable and subsequently
thread B reads that same variable, the values of all variables that
were visible to A prior to writing to the volatile variable become
visible to B after reading the volatile variable.
So YES, This guarantees to print "done".
What would happen if instead of declaring b as volatile I put the
write and read into a synchronized block?
This too will guarantee the same.
Don't use volatile to protect more than one piece of shared state.
Why?
Because, volatile guarantees only Visibility. It does'nt guarantee atomicity. If We have two volatile writes in a method which is being accessed by a thread A and another thread B is accessing those volatile variables , then while thread A is executing the method it might be possible that thread A will be preempted by thread B in the middle of operations(e.g. after first volatile write but before second volatile write by the thread A). So to guarantee the atomicity of operation synchronization is the most feasible way out.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.