First off, I'm aware that volatile does not make multiple operations (as i++) atomic. This question is about a single read or write operation.
My initial understanding was that volatile only enforces a memory barrier (i.e. other threads will be able to see updated values).
Now I've noticed that JLS section 17.7 says that volatile additionally makes a single read or write atomic. For instance, given two threads, both writing a different value to a volatile long x, then x will finally represent exactly one of the values.
I'm curious how this is possible. On a 32 bit system, if two threads write to a 64 bit location in parallel and without "proper" synchronization (i.e. some kind of lock), it should be possible for the result to be a mixup. For clarity, let's use an example in which thread 1 writes 0L while thread 2 writes -1L to the same 64 bit memory location.
T1 writes lower 32 bit
T2 writes lower 32 bit
T2 writes upper 32 bit
T1 writes upper 32 bit
The result could then be 0x00000000FFFFFFFF (T1's upper half combined with T2's lower half), which is undesirable. How does volatile prevent this scenario?
I've also read elsewhere that this does, typically, not degrade performance. How is it possible to synchronize writes with only a minor speed impact?
Your statement that volatile only enforces a memory barrier (in the sense of flushing the processor cache) is incomplete. It also establishes a happens-before relationship between reads and writes of volatile values. For example:
class Foo {
    boolean y;
    volatile boolean x;

    void qux() {
        y = true;
        x = true; // volatile write
    }

    void baz() {
        System.out.print(x); // volatile read
        System.out.print(" ");
        System.out.print(y);
    }
}
When you run both methods from two threads, the above code will print false false, false true or true true, but never true false: once the volatile read observes x == true, the write to y that precedes the volatile write is guaranteed to be visible as well. Without the volatile keyword, you are not guaranteed the latter condition because the JIT compiler might reorder the statements.
In the same way as the JIT compiler can assure this condition, it can guard 64-bit value reads and writes in the generated assembly. volatile values are treated specially by the JIT compiler to assure their atomicity. Some processor instruction sets support this directly with specific 64-bit instructions; otherwise the JIT compiler emulates it.
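To make the guarantee concrete, here is a small sketch (the class and method names are mine) that races two writers on a volatile long, exactly as in the question's example. A single run cannot prove the absence of tearing, but the JLS guarantees the final value is exactly one of the written values:

```java
// Two threads race to write 0L and -1L to a volatile long. The JLS
// guarantees the final value is one of the two, never a 32/32-bit mix
// such as 0x00000000FFFFFFFFL.
class VolatileLongRace {
    static volatile long x;

    static long race() {
        Thread t1 = new Thread(() -> x = 0L);
        Thread t2 = new Thread(() -> x = -1L);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return x; // must be 0L or -1L
    }
}
```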
The JVM is more complex than you might expect, and it is often explained without full scope. Consider reading this excellent article, which covers all the details.
volatile assures that what a thread reads is the latest value at that point, but it doesn't synchronize two writes.
If a thread writes a normal variable, the value may be kept within the thread until certain events happen. If a thread writes a volatile variable, the change is published to memory immediately.
On a 32 bit system, if two threads write to a 64 bit location in parallel and without "proper" synchronization (i.e. some kind of lock), it should be possible for the result to be a mixup
This is indeed what can happen if a variable isn't marked volatile. Now, what does the system do if the field is marked volatile? Here is a resource that explains this: http://gee.cs.oswego.edu/dl/jmm/cookbook.html
Nearly all processors support at least a coarse-grained barrier instruction, often just called a Fence, that guarantees that all loads and stores initiated before the fence will be strictly ordered before any load or store initiated after the fence [...] if available, you can implement volatile store as an atomic instruction (for example XCHG on x86) and omit the barrier. This may be more efficient if atomic instructions are cheaper than StoreLoad barriers
Essentially the processors provide facilities to implement the guarantee, and what facility is available depends on the processor.
Related
Now we have
Load A
StoreStore
Store B
Is it possible that the actual execution order is as follows
StoreStore
Store B
Load A
If it is possible, how do we explain a situation that seems to violate the Java volatile happens-before guarantee?
As far as I know, volatile semantics are implemented using the following JMM memory-barrier insertion strategy:
insert a StoreStore before a volatile write
insert a StoreLoad after a volatile write
insert a LoadLoad after a volatile read
insert a LoadStore after a volatile read
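The insertion strategy above can be sketched as comments on a pair of methods. Java itself has no barrier syntax, so the class is mine and the bracketed names only mark where a JIT following this strategy would conceptually emit the barriers:

```java
class BarrierSketch {
    static int plain;
    static volatile int v;

    static void write() {
        plain = 1;
        // [StoreStore] inserted before the volatile write
        v = 2;        // volatile write
        // [StoreLoad] inserted after the volatile write
    }

    static int read() {
        int r = v;    // volatile read
        // [LoadLoad]  inserted after the volatile read
        // [LoadStore] inserted after the volatile read
        return r + plain;
    }
}
```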
Now suppose we have two Java threads as follows:
thread 1
Load A
StoreStore
Store volatile B
thread 2
Load volatile B
Load C
According to "The Java volatile Happens-Before Guarantee", Load A should happen-before Load C when Load volatile B comes after Store volatile B. But if Load A can be reordered to after Store volatile B, how is it guaranteed that Load A is before Load C?
Technically speaking, the Java language doesn't have memory barriers. Rather the Java Memory Model is specified in terms of happens before relations; see the following for details:
Behavior of memory barrier in Java
The terminology you are discussing comes from The JSR-133 Cookbook for Compiler Writers. As the document says it is a guide for people who are writing compilers that implement the Java Memory Model. It is interpreting the implications of the JMM, and is clearly not intended to be an official specification. The JLS is the specification.
The section in the JSR-133 Cookbook on memory barriers classifies them in terms of the way that they constrain specific sequences of loads and stores. For StoreStore barriers it says:
The sequence: Store1; StoreStore; Store2
ensures that Store1's data are visible to other processors (i.e., flushed to memory) before the data associated with Store2 and all subsequent store instructions. In general, StoreStore barriers are needed on processors that do not otherwise guarantee strict ordering of flushes from write buffers and/or caches to other processors or main memory.
As you can see, a StoreStore barrier only constrains the behavior of store operations.
In your example, you have a load followed by a store. The semantics of a StoreStore barrier says nothing about load operations. Therefore, the reordering that you propose is allowed.
This is answering just the updated part of your Question.
First of all, the example you have presented is not Java code. Therefore we cannot apply JMM reasoning to it. (Just so that we are clear about this.)
If you want to understand how Java code behaves, forget about memory barriers. The Java Memory Model tells you everything that you need to do in order for memory reads and writes to have guaranteed behavior, and everything you need to know in order to reason about (correct) behavior. So:
Write your Java code
Analyze the code to ensure that there are proper happens-before chains in all cases where one thread needs to read a value written by another thread.
Leave the problem of compiling your (correct) Java code to machine instructions to the compiler.
Looking at the sequences of pseudo-instructions in your example, they don't make much sense. I don't think that a real Java compiler would (internally) use barriers like that when compiling real Java code. Rather, I think there would be a StoreLoad memory barrier after each volatile write and before each volatile read.
Let's consider some real Java code snippets:
public int a;
public volatile int b;

// thread "one"
{
    a = 1;
    b = 2;
}

// thread "two"
{
    if (b == 2) {
        print(a);
    }
}
Now assuming that the code in thread "two" is executed after thread "one", there will be a happens-before chain like this:
a = 1 happens-before b = 2
b = 2 happens-before b == 2
b == 2 happens-before print(a)
Unless there is some other code involved, the happens-before chain means that thread "two" will print "1".
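A runnable sketch of the two snippets (the class is mine; Thread.join stands in for "thread two executes after thread one"):

```java
class HappensBeforeChain {
    static int a;
    static volatile int b;

    static int run() {
        Thread one = new Thread(() -> {
            a = 1;
            b = 2; // volatile write: publishes everything before it
        });
        one.start();
        try {
            one.join(); // ensures "thread two" runs after "thread one"
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        // Volatile read of b; the happens-before chain guarantees a == 1 here.
        return (b == 2) ? a : -1;
    }
}
```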
Note:
It is not necessary to consider the memory barriers that the compiler uses when compiling the code.
The barriers are implementation specific and internal to the compiler.
If you look at the native code you won't see memory barriers per se. You will see native instructions that have the required semantics to ensure that the (hidden) memory barrier is present.
This Q is looking for specific details on how exactly Java makes a volatile field visible.
The volatile keyword in Java is used to make a variable "actively" visible to its readers right after a write operation on it completes. This is one form of happens-before relationship: it exposes the results of a write to whoever accesses that variable's memory location. When used, it makes single read/write operations on that variable atomic, for long and double as well; reads and writes of every other variable type are already atomic.
I'm looking to find out what Java does to make a variable value visible after a write operation?
Eg.: The following code is from one of the answers on this discussion:
public class Foo extends Thread {
    private volatile boolean close = false;

    public void run() {
        while (!close) {
            // do work
        }
    }

    public void close() {
        close = true;
        // interrupt here if needed
    }
}
Reads and writes to boolean variables are atomic. If the method close() above is invoked, setting the value of close to true is an atomic operation even if it isn't declared volatile.
What more volatile is doing in this code is making sure that a change to this value is seen the moment it happens.
How exactly volatile is achieving this?
By giving priority to threads with operations on a volatile variable? If so, how: in thread scheduling, or by making reader threads look up a flag to see whether a writer thread is pending? I'm aware that "a write to a volatile field happens-before every subsequent read of that same field." Does it choose, among the threads, the one(s) that have a pending write to the volatile variable before giving CPU time to threads that only read?
If this is managed at the thread-scheduling level (which I doubt), then running a thread with a write to a volatile field has a bigger effect than it seems.
How exactly is Java managing visibility of volatile variables?
TIA.
This is a comment from the OpenJDK source code about volatile:
// ----------------------------------------------------------------------------
// Volatile variables demand their effects be made known to all CPU's
// in order. Store buffers on most chips allow reads & writes to
// reorder; the JMM's ReadAfterWrite.java test fails in -Xint mode
// without some kind of memory barrier (i.e., it's not sufficient that
// the interpreter does not reorder volatile references, the hardware
// also must not reorder them).
//
// According to the new Java Memory Model (JMM):
// (1) All volatiles are serialized wrt to each other. ALSO reads &
// writes act as aquire & release, so:
// (2) A read cannot let unrelated NON-volatile memory refs that
// happen after the read float up to before the read. It's OK for
// non-volatile memory refs that happen before the volatile read to
// float down below it.
// (3) Similar a volatile write cannot let unrelated NON-volatile
// memory refs that happen BEFORE the write float down to after the
// write. It's OK for non-volatile memory refs that happen after the
// volatile write to float up before it.
//
// We only put in barriers around volatile refs (they are expensive),
// not _between_ memory refs (that would require us to track the
// flavor of the previous memory refs). Requirements (2) and (3)
// require some barriers before volatile stores and after volatile
// loads.
I hope it's helpful.
According to this :
http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#volatile
Java's new memory model does this by
1) prohibiting the compiler and runtime from allocating volatile variables in registers.
2) not allowing the compiler/optimizer to reorder field access from the code. Effectively, this is like acquiring a lock.
3) Forcing the compiler/runtime to flush a volatile variable to main memory from cache as soon as it is written.
4) Marking a cache as invalidated before a volatile field is read.
From the article:
"Volatile fields are special fields which are used for communicating state between threads. Each read of a volatile will see the last write to that volatile by any thread; in effect, they are designated by the programmer as fields for which it is never acceptable to see a "stale" value as a result of caching or reordering. The compiler and runtime are prohibited from allocating them in registers. They must also ensure that after they are written, they are flushed out of the cache to main memory, so they can immediately become visible to other threads. Similarly, before a volatile field is read, the cache must be invalidated so that the value in main memory, not the local processor cache, is the one seen. There are also additional restrictions on reordering accesses to volatile variables. "
...
"Writing to a volatile field has the same memory effect as a monitor release, and reading from a volatile field has the same memory effect as a monitor acquire. In effect, because the new memory model places stricter constraints on reordering of volatile field accesses with other field accesses, volatile or not..."
In the Oracle Java documentation located here, the following is said:
Atomic actions cannot be interleaved, so they can be used without fear of thread interference. However, this does not eliminate all need to synchronize atomic actions, because memory consistency errors are still possible. Using volatile variables reduces the risk of memory consistency errors, because any write to a volatile variable establishes a happens-before relationship with subsequent reads of that same variable. This means that changes to a volatile variable are always visible to other threads. What's more, it also means that when a thread reads a volatile variable, it sees not just the latest change to the volatile, but also the side effects of the code that led up the change.
It also says:
Reads and writes are atomic for reference variables and for most primitive variables (all types except long and double).
Reads and writes are atomic for all variables declared volatile (including long and double variables).
I have two questions regarding these statements:
"Using volatile variables reduces the risk of memory consistency errors" - What do they mean by "reduces the risk", and how is a memory consistency error still possible when using volatile?
Would it be true to say that the only effect of placing volatile on a non-double, non-long primitive is to enable the "happens-before" relationship with subsequent reads from other threads? I ask this since it seems that those variables already have atomic reads.
What do they mean by "reduces the risk"?
Atomicity is one issue addressed by the Java Memory Model. However, more important than Atomicity are the following issues:
memory architecture, e.g. impact of CPU caches on read and write operations
CPU optimizations, e.g. reordering of loads and stores
compiler optimizations, e.g. added and removed loads and stores
The following listing contains a frequently used example. The operations on x and y are atomic. Still, the program can print both lines.
int x = 0, y = 0;
// thread 1
x = 1;
if (y == 0) System.out.println("foo");
// thread 2
y = 1;
if (x == 0) System.out.println("bar");
However, if you declare x and y as volatile, at most one of the two lines can be printed.
How is a memory consistency error still possible when using volatile?
The following example uses volatile. However, updates might still get lost.
volatile int x = 0;
// thread 1
x += 1;
// thread 2
x += 1;
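volatile cannot repair this read-modify-write race, but AtomicInteger can. Here is a minimal sketch (class and method names are mine) where incrementAndGet performs each increment as a single atomic operation, so no update is lost:

```java
import java.util.concurrent.atomic.AtomicInteger;

class LostUpdateFix {
    static final AtomicInteger x = new AtomicInteger();

    static int increment(int threads, int perThread) {
        x.set(0);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    x.incrementAndGet(); // one atomic read-modify-write
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return x.get();
    }
}
```

With a plain `volatile int x` and `x += 1`, the same experiment can end below 200000 because increments overlap.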
Would it be true to say that the only effect of placing volatile on a non-double, non-long primitive is to enable the "happens-before" relationship with subsequent reads from other threads?
Happens-before is often misunderstood. The consistency model defined by happens-before is weak and difficult to use correctly. This can be demonstrated with the following example, that is known as Independent Reads of Independent Writes (IRIW):
volatile int x = 0, y = 0;
// thread 1
x = 1;
// thread 2
y = 1;
// thread 3
if (x == 1) System.out.println(y);
// thread 4
if (y == 1) System.out.println(x);
Only with happens-before, two 0s would be a valid result. However, that's rather counter-intuitive. For that reason, Java provides a stricter consistency model that forbids this relativity issue, known as sequential consistency. You can find it in sections §17.4.3 and §17.4.5 of the Java Language Specification. The most important part is:
A program is correctly synchronized if and only if all sequentially consistent executions are free of data races. If a program is correctly synchronized, then all executions of the program will appear to be sequentially consistent (§17.4.3).
That means, volatile gives you more than happens-before. It gives you sequential consistency if used for all conflicting accesses (§17.4.3).
The usual example:
while(!condition)
sleep(10);
if condition is volatile, this behaves as expected. If it is not, the compiler is allowed to optimize this to
if(!condition)
for(;;)
sleep(10);
This is completely orthogonal to atomicity: if condition is of a hypothetical integer type that is not atomic, then the sequence
thread 1 writes upper half to 0
thread 2 reads upper half (0)
thread 2 reads lower half (0)
thread 1 writes lower half (1)
can happen while the variable is updated from a nonzero value that just happens to have a lower half of zero to a nonzero value that has an upper half of zero; in this case, thread 2 reads the variable as zero. The volatile keyword in this case makes sure that thread 2 really reads the variable instead of using its local copy, but it does not affect timing.
Third, atomicity does not protect against
thread 1 reads value (0)
thread 2 reads value (0)
thread 1 writes incremented value (1)
thread 2 writes incremented value (1)
One of the best ways to use atomic volatile variables is for the read and write counters of a ring buffer:
thread 1 looks at read pointer, calculates free space
thread 1 fills free space with data
thread 1 updates write pointer (which is `volatile`, so the side effects of filling the free space are also committed before)
thread 2 looks at write pointer, calculates amount of data received
...
Here, no lock is needed to synchronize the threads, atomicity guarantees that the read and write pointers will always be accessed consistently and volatile enforces the necessary ordering.
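A minimal single-producer/single-consumer ring buffer along those lines (an illustration, not production code; the class is mine). The volatile head and tail indices carry the necessary happens-before edges:

```java
class SpscRing {
    private final int[] buf;
    private volatile int head; // next slot to read; advanced only by the consumer
    private volatile int tail; // next slot to write; advanced only by the producer

    SpscRing(int capacity) {
        buf = new int[capacity]; // holds at most capacity - 1 elements
    }

    boolean offer(int v) {
        int t = tail;
        if ((t + 1) % buf.length == head) return false; // full
        buf[t] = v;                  // plain array write...
        tail = (t + 1) % buf.length; // ...published by this volatile write
        return true;
    }

    Integer poll() {
        int h = head;
        if (h == tail) return null;  // empty (volatile read of tail)
        int v = buf[h];              // visible: the write to buf[t]
                                     // happens-before the tail update we read
        head = (h + 1) % buf.length;
        return v;
    }
}
```

Because offer writes tail last, a consumer that observes the new tail is guaranteed to also see the element stored before it.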
For question 1, the risk is only reduced (and not eliminated) because volatile only applies to a single read/write operation and not more complex operations such as increment, decrement, etc.
For question 2, the effect of volatile is to make changes immediately visible to other threads. As the quoted passage states "this does not eliminate all need to synchronize atomic actions, because memory consistency errors are still possible." Simply because reads are atomic does not mean that they are thread safe. So establishing a happens before relationship is almost a (necessary) side-effect of guaranteeing memory consistency across threads.
Ad 1: With a volatile variable, the variable is always checked against a master copy and all threads see a consistent state. But if you use that volatile variable in a non-atomic operation that writes back the result (say a = f(a)), then you might still create a memory inconsistency. That's how I would understand the remark "reduces the risk". A volatile variable is consistent at the time of the read, but you still might need to synchronize.
Ad 2: I don't know. But: If your definition of "happens before" includes the remark
This means that changes to a volatile variable are always visible to other threads. What's more, it also means that when a thread reads a volatile variable, it sees not just the latest change to the volatile, but also the side effects of the code that led up the change.
I would not dare to rely on any other property except that volatile ensures this. What else do you expect from it?!
Assume that you have a CPU with a CPU cache or CPU registers. Independent of how many cores your CPU has, volatile does NOT guarantee perfect consistency. The only way to achieve that is to use synchronized or atomic references, at a performance price.
For example, you have multiple threads (thread A and thread B) working on shared data. Assume thread A wants to update the shared data and starts. For performance reasons, thread A's data is held in a CPU cache or register. Then thread A updates the shared data. The problem with those places is that the updated value is not flushed back to main memory immediately. This is where inconsistency arises, because before the flush happens, thread B might want to work with the same data and will take the not-yet-updated value from main memory.
If you use volatile, all the operations are performed against main memory, so you don't have flush-back latency. But this time you may suffer from the thread pipeline: in the middle of a compound operation (composed of a number of atomic operations), thread B may be scheduled by the OS to perform a read, and it will read the not-yet-updated value. That's why it's said that volatile reduces the risk.
Hope you got it.
When it comes to concurrency, you may want to ensure two things:
atomic operations: a set of operations is atomic. This is usually achieved with synchronized (a higher-level construct), and also by volatile, for instance for single reads/writes of long and double.
visibility: a thread B sees a modification made by a thread A. Even if an operation is atomic, like a write to an int variable, a second thread can still see a non-up-to-date value of the variable, due to processor caches. Declaring a variable volatile ensures that the second thread does see the up-to-date value of that variable. More than that, it ensures that the second thread sees an up-to-date value of ALL the variables written by the first thread before the write to the volatile variable.
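That last point can be sketched as follows (class and names are mine): the writer publishes a plain field by writing a volatile flag afterwards, and a reader that spins on the flag is guaranteed to see the payload:

```java
class Publish {
    static int payload;             // deliberately NOT volatile
    static volatile boolean ready;

    static int exchange() {
        Thread writer = new Thread(() -> {
            payload = 42; // ordinary write...
            ready = true; // ...published by the volatile write after it
        });
        writer.start();
        while (!ready) {
            Thread.yield(); // volatile read each iteration; the loop
                            // cannot be optimized into an infinite spin
        }
        return payload; // guaranteed to see 42
    }
}
```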
After reading more blogs/articles etc, I am now really confused about the behavior of load/store before/after memory barrier.
Following are two quotes from Doug Lea, from one of his clarification articles about the JMM, both of which are very straightforward:
Anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f.
Note that it is important for both threads to access the same volatile variable in order to properly set up the happens-before relationship. It is not the case that everything visible to thread A when it writes volatile field f becomes visible to thread B after it reads volatile field g.
But then when I looked into another blog about memory barrier, I got these:
A store barrier, “sfence” instruction on x86, forces all store instructions prior to the barrier to happen before the barrier and have the store buffers flushed to cache for the CPU on which it is issued.
A load barrier, “lfence” instruction on x86, forces all load instructions after the barrier to happen after the barrier and then wait on the load buffer to drain for that CPU.
To me, Doug Lea's clarification is stricter than the other one: basically, it means that if the load barrier and store barrier are on different monitors, data consistency is not guaranteed, while the latter means that consistency is guaranteed even if the barriers are on different monitors. I am not sure whether I understand these two correctly, and I'm also not sure which of them is correct.
Considering the following codes:
public class MemoryBarrier {
    volatile int i = 1, j = 2;
    int x;

    public void write() {
        x = 14; // W01
        i = 3;  // W02
    }

    public void read1() {
        if (i == 3) {       // R11
            if (x == 14)    // R12
                System.out.println("Foo");
            else
                System.out.println("Bar");
        }
    }

    public void read2() {
        if (j == 2) {       // R21
            if (x == 14)    // R22
                System.out.println("Foo");
            else
                System.out.println("Bar");
        }
    }
}
Let's say we have one writer thread TW1 that first calls MemoryBarrier's write() method, and then two reader threads TR1 and TR2 that call read1() and read2() respectively. Consider this program running on a CPU that does not preserve ordering (x86 DOES preserve ordering for such cases, so it is not such a CPU). According to the memory model, there will be a StoreStore barrier (say SB1) between W01 and W02, as well as two LoadLoad barriers between R11/R12 and R21/R22 (say RB1 and RB2).
Since SB1 and RB1 are on the same monitor i, thread TR1, which calls read1, should always see 14 on x, and "Foo" is always printed.
SB1 and RB2 are on different monitors. If Doug Lea is correct, thread TR2 is not guaranteed to see 14 on x, which means "Bar" may be printed occasionally. But if memory barriers work as Martin Thompson describes in the blog, the store barrier pushes all data to main memory and the load barrier pulls all data from main memory to cache/buffer, so TR2 would also be guaranteed to see 14 on x.
I am not sure which one is correct, or whether both are and what Martin Thompson describes applies only to the x86 architecture: the JMM does not guarantee that the change to x is visible to TR2, but the x86 implementation does.
Thanks~
Doug Lea is right. You can find the relevant part in section §17.4.4 of the Java Language Specification:
§17.4.4 Synchronization Order
[..] A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order). [..]
The memory model of the concrete machine doesn't matter, because the semantics of the Java programming language are defined in terms of an abstract machine, independent of the concrete machine. It's the responsibility of the Java runtime environment to execute the code in such a way that it complies with the guarantees given by the Java Language Specification.
Regarding the actual question:
If there is no further synchronization, the method read2 can print "Bar", because read2 can be executed before write.
If there is an additional synchronization with a CountDownLatch to make sure that read2 is executed after write, then method read2 will never print "Bar", because the synchronization with CountDownLatch removes the data race on x.
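A sketch of that CountDownLatch variant (the class is mine, condensed from the code above): countDown happens-before await returning, so the data race on x is removed and read2's checks can only yield "Foo":

```java
import java.util.concurrent.CountDownLatch;

class LatchedMemoryBarrier {
    static volatile int i = 1, j = 2;
    static int x;

    static String demo() {
        CountDownLatch written = new CountDownLatch(1);
        new Thread(() -> {
            x = 14; // W01
            i = 3;  // W02
            written.countDown();
        }).start();
        try {
            written.await(); // countDown() happens-before await() returning,
                             // which removes the data race on x
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return (j == 2 && x == 14) ? "Foo" : "Bar"; // read2's checks
    }
}
```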
Independent volatile variables:
Does it make sense, that a write to a volatile variable does not synchronize-with a read of any other volatile variable?
Yes, it makes sense. If two threads need to interact with each other, they usually have to use the same volatile variable in order to exchange information. On the other hand, if a thread uses a volatile variable without a need for interacting with all other threads, we don't want to pay the cost for a memory barrier.
It is actually important in practice. Let's make an example. The following class uses a volatile member variable:
class Int {
public volatile int value;
public Int(int value) { this.value = value; }
}
Imagine this class is used only locally within a method. The JIT compiler can easily detect that the object is only used within this method (escape analysis).
public int deepThought() {
return new Int(42).value;
}
With the above rule, the JIT compiler can remove all effects of the volatile reads and writes, because the volatile variable cannot be accessed from any other thread.
This optimization actually exists in the Java JIT compiler:
src/share/vm/opto/memnode.cpp
As far as I understood the question is actually about volatile read/writes and its happens-before guarantees. Speaking of that part, I have only one thing to add to nosid's answer:
Volatile writes cannot be moved before normal writes, volatile reads cannot be moved after normal reads. That's why read1() and read2() results will be as nosid wrote.
Speaking about barriers: the definitions sound fine to me, but the one thing that probably confused you is that these are the mechanisms used to implement the behavior described by the JMM in HotSpot. When writing Java, you should rely on JMM guarantees, not on implementation details.
I am a little confused...
Is it true that reads and writes from several threads to all types except long and double are atomic operations, and that volatile is needed only with long and double?
It sounds like you're referring to this section of the JLS. It is guaranteed for all primitive types -- except double and long -- that all threads will see some value that was actually written to that variable. (With double and long, the first four bytes might have been written by one thread, and the last four bytes by another thread, as specified in that section of the JLS.) But they won't necessarily see the same value at the same time unless the variable is marked volatile.
Even using volatile, x += 3 is not atomic, because it's x = x + 3, which does a read and a write, and there might be writes to x between the read and the write. That's why we have things like AtomicInteger and the other utilities in java.util.concurrent.
Let's not confuse atomic with thread-safe. Long and double writes are not atomic underneath because each is two separate 32-bit stores. Stores and loads of non-long/double fields are perfectly atomic, assuming they are not compound writes (i++, for example).
By atomic I mean you will not read some garbled object as a result of many threads writing different objects to the same field.
From Java Concurrency In Practice 3.1.2
Out-of-thin-air safety: When a thread reads a variable without synchronization, it may see a stale value, but at least it sees a value that was actually placed there by some thread rather than some random value. This is true for all variables, except 64-bit long and double variables that are not volatile: the JVM is permitted to treat a 64-bit read or write as two separate 32-bit operations, which are not atomic.
That doesn't sound right.
An atomic operation is one that forces all threads to wait to access a resource until another thread is done with it. I don't see why other data types would be atomic, and others not.
volatile has other semantics than just writing the value atomically: it means that other threads can see the updated value immediately (and that the access can't be optimized out).