Double check locking and code reordering in Java [duplicate] - java

In some articles I have read that double-checked locking is broken, because the compiler can reorder the sequence of constructor events:
1. allocate memory for the object
2. return the address to a reference variable
3. initialize the state of the object
while typically one would expect:
1. allocate memory for the object
2. initialize the state of the object
3. return the address to the reference variable
Again, when using the synchronized keyword, code reordering never happens, as per the JMM specification.
So why does the compiler reorder the sequence of constructor events when it is inside the synchronized block?
I saw a lot of posts here about DCL, but I am expecting a description based on the JMM and compiler reordering.

The compiler is free to reorder instructions within a synchronized block. And the compiler is free to reorder instructions before (as long as they stay before) or after (as long as they stay after) the synchronized block. However, the compiler is not free to reorder instructions across the synchronized block boundaries (block start or block end).
Thus, the construction and assignment, which are wholly within the synchronized block, can be reordered, and an outside viewer that has not correctly synchronized can see the assignment before the construction.
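To make that concrete, here is a minimal runnable sketch (the class and field names are my own illustration, not from the question) of what such an unsynchronized observer can see:
class Broken {
    static Broken instance; // not volatile: unsynchronized readers race on it
    int state;

    Broken() { state = 42; }

    static void create(Object lock) {
        synchronized (lock) {
            // Both the construction and the assignment stay inside the
            // synchronized block, so the JIT may effectively publish the
            // reference before the constructor's write of 'state'.
            instance = new Broken();
        }
    }

    static void racyReader() {
        Broken b = instance; // unsynchronized read: a data race
        if (b != null) {
            System.out.println(b.state); // may legally print 0 instead of 42
        }
    }
}
Printing 0 here is a legal outcome precisely because racyReader() does not synchronize on the same lock.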

First of all:
Again, when using the synchronized keyword, code reordering never happens, as per the JMM specification.
The above statement is not fully accurate. The JMM defines the happens-before relationship.
The JLS only defines program order and happens-before order. See 17.4.5. Happens-before Order.
It has effects on the reordering of instructions. For example,
x = 1;
synchronized (obj) {
    y = 2;
}
z = 3;
Now for the above piece of code, the below types of reordering are possible.
synchronized (obj) {
    x = 1;
    y = 2;
    z = 3;
}
The above is a valid reordering.
See Roach Motels and The Java Memory Model.
synchronized (obj) {
    z = 3;
    y = 2;
    x = 1;
}
The above is also a valid reordering.
What is not possible is for y = 2 to move outside the block: it will only be executed after the lock has been acquired and before the lock is released. That is the guarantee given by the JMM. Also, to see the correct effects from another thread, we need to access y inside a synchronized block (on the same lock).
Now I come to DCL.
See the code of DCL.
if (singleton == null) {
    synchronized (obj) {
        if (singleton == null) {
            singleton = new Singleton();
        }
    }
}
return singleton;
Now the problem in the above approach is:
singleton = new Singleton() is not a single instruction but a set of instructions. It is quite possible that the singleton reference is assigned the object's address first, before the constructor has fully initialized the object.
If that happens, it is quite possible that another thread reads singleton as non-null in the first, unsynchronized check and thus sees a partially constructed object.
These effects can be controlled by making singleton volatile, which establishes the missing happens-before and visibility guarantees.
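For completeness, a minimal sketch of the corrected pattern (the local variable is a common optimization to avoid a second volatile read on the fast path; the essential fix is the volatile modifier itself):
class Singleton {
    private static volatile Singleton instance; // volatile is the essential fix

    private Singleton() { }

    static Singleton getInstance() {
        Singleton result = instance; // one volatile read on the fast path
        if (result == null) {
            synchronized (Singleton.class) {
                result = instance; // re-check under the lock
                if (result == null) {
                    instance = result = new Singleton(); // volatile write publishes safely
                }
            }
        }
        return result;
    }
}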

Why does the compiler reorder the sequence of constructor events when it is inside the synchronized block?
It would typically do this to make the code run faster.
The Java Language Specification (JLS) says that the implementation (for example, the compiler) is allowed to reorder instructions and sequences of instructions subject to certain constraints.
The problem is that the broken variants of DCL make assumptions that fall outside of what the JLS says can be made. The result is an execution that the JLS says is not well-formed. Whether this manifests itself as an actual bug / unexpected behaviour depends on the compiler version, the hardware and various other things.
But the point is that the compiler isn't doing anything wrong. The fault is in the DCL code.
I just want to add that the JIT compiler is often not reordering the events per se. What it is often doing is removing constraints on hardware-level memory read/write actions. For example, by removing the constraint that a particular memory write is flushed to main memory, you allow the hardware to defer (or even skip entirely) a slow write to main memory and just write to the L1 cache. By contrast, the end of a synchronized block forces the cached writes out to main memory, incurring extra memory traffic and (probably) pipeline stalls.

Related

Java Constructors and thread safety [closed]

Literature talks about advertising a reference to an object before its constructor has finished allocating and initializing its data structures. This usually involves putting the reference somewhere other threads can see it prematurely. As I understand it, it involves an explicit act of advertising, such as registering a Listener.
My question relates to the implementation of a constructor and the possibility of something similar happening. I can imagine that a constructor can be implemented with something similar to:
Type t = new Type(...);
An implementation in C might do something like:
t = malloc(sizeof(Type));
And then proceed to initialize all the fields.
If it can be implemented like this, then the reference t will be non-null before the data is initialized. If another thread checks it for being non-null, it may then proceed to use it before it is fully initialized. The result will be mayhem.
I cannot find anything that says that you cannot implement it like this. I am probably missing something pretty basic in my understanding of the Java Memory Model. Is there anything that instructs JVM implementors not to do it like this?
Every once in a while, the thread-safety of Java object constructors comes up. More specifically, it's not so much about the process of object construction but rather the visibility of writes triggered by that process in relation to other threads.
What if a JVM implementation were to allocate memory for the new instance, store the new reference value and only then execute the constructor? What are the guarantees provided by the Java memory model and would that represent a violation?
It's all about the actual reference assignment. Constructors themselves do not come with a guarantee that all writes happen before the write of the object reference. If the reference is not assigned to a volatile or final field, the JIT and/or the target CPU (in terms of memory reordering) are free to assign the reference before object construction. That's an optimization decision the JIT can easily make. In case of volatile or final fields, however, the situation is different as of Java 1.5.
A prominent example affected by constructor thread-safety is the double-checked locking pattern (lazy initialization not requiring a lock after the initialization phase), which, if implemented as follows, suffers from a concurrency issue and is not thread-safe. Another thread may see a partially constructed Singleton instance because the Java memory model does not mandate any specific memory ordering for normal reads and writes.
private Singleton singleton;

public Singleton getInstance() {
    if (singleton == null) {
        synchronized (this) {
            if (singleton == null) {
                singleton = new Singleton();
            }
        }
    }
    return singleton;
}
With Java 1.5, the memory model was changed in respect to volatile and final fields. With the new model, volatile writes have release semantics and volatile reads have acquire semantics. Provided volatile is used for singleton, this pattern works as expected because the memory model guarantees the expected order of events.
tmp = new Singleton();
// implicit release memory barrier caused by volatile
singleton = tmp;
Release semantics prevent memory reordering of any read or write that precedes it in program order with any write that follows it in program order. This is equivalent to a combination of LoadStore and StoreStore memory barriers. Consequently, reads and writes belonging to Singleton object construction must not move after the volatile singleton write.
tmp = singleton;
// implicit acquire memory barrier caused by volatile
if (tmp == null) {
    synchronized (this) {
        if (tmp == null) {
            // ...
Acquire semantics prevent memory reordering of any read that precedes it in program order with any read or write that follows it in program order. This is equivalent to a combination of LoadLoad and LoadStore memory barriers. Consequently, Singleton reads and writes must not move before the volatile singleton read.
It's worth noting that in all versions of Java, volatile reads and writes are totally ordered: all threads observe the same volatile read/write order. To achieve that, either a volatile write precedes a StoreLoad memory barrier or a volatile read follows a StoreLoad memory barrier. On x86, only the StoreLoad memory barrier emits an actual instruction; the other barriers merely have to be respected during JIT reordering.
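As an aside (my own litmus-test illustration, not from the answer), the StoreLoad barrier is exactly what rules out the classic "both threads read stale" outcome for volatile fields:
class StoreLoadExample {
    volatile int x = 0, y = 0;
    int r1, r2;

    void thread1() { x = 1; r1 = y; } // volatile store, then volatile load
    void thread2() { y = 1; r2 = x; }

    // Because volatile accesses are totally ordered, r1 == 0 && r2 == 0 is
    // impossible: one of the stores comes first in that order, so at least
    // one of the loads must observe 1. With plain int fields, a StoreLoad
    // reordering would make the 0/0 outcome possible.
}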
Similarly, the semantics in terms of final fields have changed with Java 1.5. JSR133, which introduced the memory model changes, used the following example to illustrate the problem:
class FinalFieldExample {
    final int x;
    int y;
    static FinalFieldExample f;

    public FinalFieldExample() {
        x = 3;
        y = 4;
    }

    static void writer() {
        f = new FinalFieldExample();
    }

    static void reader() {
        if (f != null) {
            int i = f.x;
            int j = f.y;
        }
    }
}
Given two threads, thread A calling writer() and thread B calling reader(), the natural assumption would be that thread B is guaranteed to see the value 3 for i, while j may be 0 or 4. Due to reordering, under the original memory model thread B could see 0 for i instead - a clear violation of the premise of final, not in terms of the original memory model but with respect to the higher-level contract of final to represent immutable constant values.
To address this, Java 1.5 and later specify this guarantee:
[...] A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields. [...]
The implementation uses a StoreStore memory barrier to prevent the write of x from moving after the assignment of f. The default value of y can still be observed.
In Java 9, java.lang.invoke.VarHandle was introduced to provide access to acquire/release and volatile semantics. VarHandle is comparable to C++11's std::atomic in that it provides atomic primitives and memory ordering control including explicit memory barriers.
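A minimal sketch of release/acquire publication with VarHandle (Java 9+; the class and field names here are my own illustration):
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Holder {
    private Object payload; // plain field, accessed through the VarHandle
    private static final VarHandle PAYLOAD;

    static {
        try {
            PAYLOAD = MethodHandles.lookup()
                    .findVarHandle(Holder.class, "payload", Object.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(Object value) {
        PAYLOAD.setRelease(this, value); // release: earlier writes cannot move below
    }

    Object consume() {
        return PAYLOAD.getAcquire(this); // acquire: later reads cannot move above
    }
}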
The Java object constructor is not inherently thread-safe. With the help of volatile, final, and VarHandle, required guarantees can be established. For most common use cases, alternative patterns exist that do not require dealing with these kinds of low-level details. Whenever possible, prefer not to roll your own lock-free code to reduce code complexity and maximize the probability of correctness.
A direct violation of your hypothetical allocator lies in JLS 17.5, which requires that every thread that sees a reference to a completely initialized object also sees correctly initialized values for its final fields: "A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields." The allocator you describe would break this invariant.

Behavior of memory barrier in Java

After reading more blogs/articles etc, I am now really confused about the behavior of load/store before/after memory barrier.
Following are two quotes from Doug Lea in one of his clarification articles about the JMM, both of which are very straightforward:
Anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f.
Note that it is important for both threads to access the same volatile variable in order to properly set up the happens-before relationship. It is not the case that everything visible to thread A when it writes volatile field f becomes visible to thread B after it reads volatile field g.
But then when I looked into another blog about memory barrier, I got these:
A store barrier, “sfence” instruction on x86, forces all store instructions prior to the barrier to happen before the barrier and have the store buffers flushed to cache for the CPU on which it is issued.
A load barrier, “lfence” instruction on x86, forces all load instructions after the barrier to happen after the barrier and then wait on the load buffer to drain for that CPU.
To me, Doug Lea's clarification is stricter than the other one: basically, it means that if the load barrier and store barrier are on different volatile variables, data consistency is not guaranteed. But the latter one means that even if the barriers are on different variables, data consistency is guaranteed. I am not sure whether I am understanding these two correctly, and I am also not sure which of them is correct.
Considering the following codes:
public class MemoryBarrier {
    volatile int i = 1, j = 2;
    int x;

    public void write() {
        x = 14; // W01
        i = 3;  // W02
    }

    public void read1() {
        if (i == 3) {    // R11
            if (x == 14) // R12
                System.out.println("Foo");
            else
                System.out.println("Bar");
        }
    }

    public void read2() {
        if (j == 2) {    // R21
            if (x == 14) // R22
                System.out.println("Foo");
            else
                System.out.println("Bar");
        }
    }
}
Let's say we have one writer thread TW1 that first calls MemoryBarrier's write() method, and then two reader threads TR1 and TR2 that call read1() and read2() respectively. Consider this program running on a CPU that does not preserve ordering (x86 does preserve ordering for such cases, so assume a weaker architecture). According to the memory model, there will be a StoreStore barrier (call it SB1) between W01/W02, as well as two LoadLoad barriers between R11/R12 and R21/R22 (call them RB1 and RB2).
Since SB1 and RB1 are on the same volatile variable i, thread TR1, which calls read1(), should always see 14 for x, and "Foo" is always printed.
SB1 and RB2 are on different variables. If Doug Lea is correct, thread TR2 is not guaranteed to see 14 for x, which means "Bar" may be printed occasionally. But if memory barriers work the way Martin Thompson describes in the blog, the store barrier will push all data to main memory and the load barrier will pull all data from main memory into the cache/buffers, and then TR2 would also be guaranteed to see 14 for x.
I am not sure which one is correct, or whether both are and what Martin Thompson described holds only for the x86 architecture: the JMM would not guarantee that the change to x is visible to TR2, but the x86 implementation would.
Thanks~
Doug Lea is right. You can find the relevant part in section §17.4.4 of the Java Language Specification:
§17.4.4 Synchronization Order
[..] A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order). [..]
The memory model of the concrete machine doesn't matter, because the semantics of the Java Programming Language are defined in terms of an abstract machine -- independent of the concrete machine. It's the responsibility of the Java runtime environment to execute the code in such a way, that it complies with the guarantees given by the Java Language Specification.
Regarding the actual question:
If there is no further synchronization, the method read2 can print "Bar", because read2 can be executed before write.
If there is an additional synchronization with a CountDownLatch to make sure that read2 is executed after write, then method read2 will never print "Bar", because the synchronization with CountDownLatch removes the data race on x.
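A sketch of that second case (the latch usage is my own illustration of the answer's point):
import java.util.concurrent.CountDownLatch;

MemoryBarrier mb = new MemoryBarrier();
CountDownLatch written = new CountDownLatch(1);

new Thread(() -> {
    mb.write();          // x = 14, then volatile i = 3
    written.countDown(); // countDown() happens-before a successful await()
}).start();

new Thread(() -> {
    try {
        written.await(); // after this returns, the write of x is visible
    } catch (InterruptedException e) {
        return;
    }
    mb.read2();          // can now only print "Foo"
}).start();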
Independent volatile variables:
Does it make sense, that a write to a volatile variable does not synchronize-with a read of any other volatile variable?
Yes, it makes sense. If two threads need to interact with each other, they usually have to use the same volatile variable in order to exchange information. On the other hand, if a thread uses a volatile variable without a need for interacting with all other threads, we don't want to pay the cost for a memory barrier.
It is actually important in practice. Let's make an example. The following class uses a volatile member variable:
class Int {
    public volatile int value;
    public Int(int value) { this.value = value; }
}
Imagine this class is used only locally within a method. The JIT compiler can easily detect that the object is only used within this method (escape analysis).
public int deepThought() {
    return new Int(42).value;
}
With the above rule, the JIT compiler can remove all effects of the volatile reads and writes, because the volatile variable cannot be accessed from any other thread.
This optimization actually exists in the Java JIT compiler:
src/share/vm/opto/memnode.cpp
As far as I understand, the question is actually about volatile reads/writes and their happens-before guarantees. Speaking of that part, I have only one thing to add to nosid's answer:
Volatile writes cannot be moved before normal writes that precede them, and volatile reads cannot be moved after normal reads that follow them. That's why read1() and read2() behave as nosid wrote.
Speaking about barriers - the definition sounds fine to me, but the one thing that probably confused you is that these are mechanisms (call them whatever you like) for implementing the behavior described by the JMM in HotSpot. When using Java, you should rely on JMM guarantees, not implementation details.

Does synchronized keyword prevent reordering in Java?

Suppose I have the following code in Java
a = 5;
synchronized (lock) {
    b = 5;
}
c = 5;
Does synchronized prevent reordering? There is no dependency between a, b and c. Would assignment to a first happen then to b and then to c? If I did not have synchronized, the statements can be reordered in any way the JVM chooses right?
Locking the assignment to b will, at the very least, introduce an acquire-fence before the assignment, and a release-fence after the assignment.
This prevents instructions after the acquire-fence from being moved above the fence, and instructions before the release-fence from being moved below the fence.
Using the ↓↑ notation:
a = 5;
↓
b = 5;
↑
c = 5;
The ↓ prevents instructions from being moved above it.
The ↑ prevents instructions from being moved below it.
Does synchronized prevent reordering?
It prevents some reordering. You can still have reordering outside the synchronized block and inside the synchronized block, but not from inside the synchronized block to outside it.
There is no dependency between a, b and c.
That makes no difference.
Would assignment to a first happen then to b and then to c?
Yes. But as has been noted, this is not guaranteed for all JVMs. (See below)
If I did not have synchronized, the statements can be reordered in any way the JVM chooses right?
Yes, by the JVM and/or the CPU instruction optimizer and/or the CPU cache, but it is unlikely here, given there is no obvious reason to suspect that changing the order of a = 5; and b = 5; would improve performance.
What you could see is a change in visibility through the cache, i.e. another thread reading these values could see b = 5; before a = 5; (e.g. if they are on different cache lines), if that thread is not also synchronized.
Does synchronized prevent reordering?
Partially, see below.
Would assignment to a first happen then to b and then to c?
No. As dcastro pointed out, actions can be moved into synchronized blocks. So the compiler is allowed to generate code that corresponds to the following statements:
synchronized (lock) {
    a = 5;
    b = 5;
    c = 5;
}
And the compiler is also allowed to reorder statements within a synchronized block that have no dependency on each other. So the compiler can also generate code that corresponds to the following statements:
synchronized (lock) {
    c = 5;
    b = 5;
    a = 5;
}
If I did not have synchronized, the statements can be reordered in any way the JVM chooses right?
Well, I think that's the wrong question, and it's also the wrong way to think about the Java Memory Model. The Java Memory Model is not defined in terms of reorderings. Actually, it's much easier than most people think. Basically, there is only one important rule, that you can find in §17.4.5 in the Java Language Specification:
A program is correctly synchronized if and only if all sequentially consistent executions are free of data races. If a program is correctly synchronized, then all executions of the program will appear to be sequentially consistent.
In other words: if you completely ignore reorderings, and for all possible executions of the program there is no way for two threads to access the same memory location (which is neither volatile nor atomic) with at least one of the accesses being a write, then all executions will appear as if there are no reorderings.
In short: Avoid data races, and you will never observe a reordering.
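For instance, a data-race-free variant of the example (a sketch assuming every thread that touches a, b and c synchronizes on the same lock) can never expose any reordering:
// writer
synchronized (lock) {
    a = 5;
    b = 5;
    c = 5;
}

// reader (any other thread)
synchronized (lock) {
    int sum = a + b + c; // guaranteed to observe the writes in a sequentially consistent way
}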

Java - happens-before relationship for monitor unlock

I have recently read http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html which clearly describes a lot of the intrinsics of the Java memory model. One particular excerpt got my attention, namely:
The rule for a monitorexit (i.e., releasing synchronization) is that
actions before the monitorexit must be performed before the monitor is released.
Seems obvious to me; however, having read http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html and the happens-before definition, all I could find about the monitor unlock is that when one thread unlocks the monitor, that happens-before another thread locking it again (which makes perfect sense as well). Could someone explain how the JLS captures the obvious condition that all the actions within the synchronized block must happen-before the unlock operation?
FURTHER COMMENTS:
Based on a couple of responses I wanted to write up further comments to what you guys have been saying:
Reordering within a single thread
A couple of "truths" from the source I cited:
a = new A()
If new A() involves a hundred operations, followed by the assignment of the heap address to a, the compiler can simply reorder those to assign the heap address to a first and then follow with the usual initialization (the problem from double-checked locking).
synchronized (lock) {
    a = 5;
}
print a;
can be changed to
synchronized (lock) {
    a = 5;
    print a;
}
So we reordered the monitorexit with the print statement (also valid according to the JLS).
Now, the simple case I mentioned:
x = 1;
y = 2;
c = x + y;
print c;
I see no reason that stops a compiler from assigning either x or y first. Nothing prevents it, as the final output is unchanged regardless of whether x or y is assigned first. So the reordering is perfectly possible.
monitor.unlock
Based on the example with the print statement being "pulled into" the synchronized block, let's try to reverse this, i.e. starting with:
synchronized (lock) {
    a = 5;
    print a;
}
I could expect the compiler to do this:
synchronized (lock) {
    a = 5;
}
print a;
This seems perfectly reasonable within the single-threaded world, YET it is definitely invalid and against the JLS (according to the cited source). Now why is that the case, if I cannot find anything about this within the JLS? And clearly the motivation about the "order of the program" now seems irrelevant, since the compiler can make reorderings such as "pulling" statements into the synchronized block.
It's not just all actions performed within the synchronized block, it's also referring to all actions by that thread prior to the monitorexit.
Could someone explain how JLS explains the obvious condition that all
the actions within the synchronization block must happen-before the
unlock operation?
For a particular thread (and only that thread), all actions, regardless of synchronization, maintain program order, so it appears as if all reads and writes happen in order (we don't need a happens-before ordering in the single-thread case).
The happens-before relationship takes multiple threads into account: all actions happening in one thread prior to a monitorexit are visible to all threads after a subsequent monitorenter on the same monitor.
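Spelled out as a minimal sketch (variable names are illustrative):
int x;
final Object m = new Object();

// Thread A:
x = 42;              // (1) program order gives hb((1), (2))
synchronized (m) { } // (2) monitorexit

// Thread B, locking m after thread A released it:
synchronized (m) {   // (3) monitorenter: hb((2), (3)) via synchronizes-with
    int r = x;       // (4) by transitivity hb((1), (4)), so r is guaranteed to be 42
}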
EDIT to address your update.
There are particular rules the compiler must follow to reorder. The specific one in this case is demonstrated in the "Can Reorder" grid found in the JSR-133 Cookbook.
Specifically useful entries are
First action: Normal Load (load a; print a)
Second action: Monitor Exit
The entry here is "No", meaning the compiler cannot reorder two actions where the first is a normal load and the second is a monitorexit, so in your case this reordering would violate the JLS.
There is a rule known as roach-motel ordering: reads/writes can be reordered into a synchronized block, but not out of it.
Maybe you missed this (§17.4.5):
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
Combined with what you already know about happens-before, it should be clear that this implies that all actions preceding the unlock action will be visible to the other thread.
Regarding your additions to the question, if you write this:
synchronized (lock) {
    a = 5;
    b = 3;
}
and the compiler emits this:
synchronized (lock) {
    a = 5;
}
b = 3;
then the stipulation I have quoted above is violated: now b = 3 does not happen-before the lock release. This is why it is illegal. (Note that your example with print a isn't instructive, because it involves only a read plus side effects that are not easily describable with simple variables.)

Java memory barriers

I'm reading the JSR-133 Cookbook and have the following question about memory barriers. The book contains an example of inserted memory barriers, but only writing and reading of local variables is used. Suppose I have the following variables:
int a;
volatile int b;
And the code
b = a;
Do I understand correctly that this one line would produce the following instructions
load a
LoadStore membar
store b
The underlying behavior of the JVM is guaranteed only with respect to the volatile variable. It may be possible that two separate threads have access to different values for variable 'a' even after a thread completes evaluation of the b = a; statement. The JVM only guarantees that access to the volatile variable is serialized and has happens-before semantics. What this means is that the result of executing b = a; on two different threads (in the face of a "volatile" value for 'a' (ha ha)) is indeterminate, because the JVM only says that the store to 'b' is serialized; it puts no guarantee on which thread has precedence.
More precisely, what this means is that the JVM treats variable 'b' as if it had its own lock, allowing only one thread to read or write 'b' at a time; and this lock only protects access to 'b' and nothing else.
Now, this means different things under different JVMs, and how this lock is actually implemented on different machine architectures may result in vastly different runtime behavior for your application. The only guarantee you should trust is what the Java reference manual says: "A field may be declared volatile, in which case the Java Memory Model ensures that all threads see a consistent value for the variable." For further review, see Dennis Byrne's excellent article for some examples of how different JVM implementations deal with this issue.
Happens-before semantics are not very interesting in the provided example, because an integer primitive doesn't provide much opportunity for the kind of instruction reordering that volatile was intended (in part) to remedy. A better example is this:
private AnObjectWithAComplicatedConstructor _sampleA;
private volatile AnObjectWithAComplicatedConstructor _sampleB;

public AnObjectWithAComplicatedConstructor getSampleA() {
    if (_sampleA == null) {
        _sampleA = new AnObjectWithAComplicatedConstructor();
    }
    return _sampleA;
}

public AnObjectWithAComplicatedConstructor getSampleB() {
    if (_sampleB == null) {
        _sampleB = new AnObjectWithAComplicatedConstructor();
    }
    return _sampleB;
}
In this example field '_sampleA' has a serious problem; in a multithreaded situation it is very possible that '_sampleA' is in the process of being initialized in one thread at the same time another thread attempts to use it, leading to all sorts of sporadic and very, very difficult-to-reproduce bugs. To see this, consider thread X executing the 'new' bytecode instruction in getSampleA() and then storing the (yet-to-be-initialized) result in field '_sampleA'. Thread X is now paused by the JVM, and thread Y starts executing getSampleA() and sees that '_sampleA' is not null; that uninitialized value is then returned, and thread Y starts calling methods on the resulting instance, causing all sorts of problems; which will, of course, only appear in production, at odd hours, and under heavy service loads.
The worst case for field '_sampleB' is that multiple threads may initialize individual instances, all but one of which will eventually be discarded. Code like this should be wrapped in a "synchronized" block, but the volatile keyword will do the trick here, because it requires that the value finally stored in '_sampleB' has happens-before semantics, which means that the stuff to the right of the equals sign is guaranteed to be complete when the stuff on the left-hand side of the equals sign is assigned.
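For comparison, here is a sketch of the synchronized alternative the answer alludes to (the method and field names are my own, mirroring the answer's example):
private AnObjectWithAComplicatedConstructor _sampleC;

public synchronized AnObjectWithAComplicatedConstructor getSampleC() {
    // The monitor guarantees both mutual exclusion (exactly one instance is
    // created) and visibility (the constructor's writes happen-before any
    // later reader's access through the same lock).
    if (_sampleC == null) {
        _sampleC = new AnObjectWithAComplicatedConstructor();
    }
    return _sampleC;
}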
