double check locking without volatile (but with VarHandle release/acquire)


The question is rather easy, in a way. Suppose I have this class:
static class Singleton {
}
And I want to provide a singleton factory for it. I can do the (probably) obvious. I am not going to mention the enum possibility or any other, as they are of no interest to me.
static final class SingletonFactory {
    private static volatile Singleton singleton;

    public static Singleton getSingleton() {
        if (singleton == null) { // volatile read
            synchronized (SingletonFactory.class) {
                if (singleton == null) { // volatile read
                    singleton = new Singleton(); // volatile write
                }
            }
        }
        return singleton; // volatile read
    }
}
I can get away from one volatile read with the price of higher code complexity:
public static Singleton improvedGetSingleton() {
    Singleton local = singleton; // volatile read
    if (local == null) {
        synchronized (SingletonFactory.class) {
            local = singleton; // volatile read
            if (local == null) {
                local = new Singleton();
                singleton = local; // volatile write
            }
        }
    }
    return local; // NON volatile read
}
This is pretty much what our code has been using for close to a decade now.
The question is can I make this even faster with release/acquire semantics added in java-9 via VarHandle:
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

static final class SingletonFactory {
    private static final SingletonFactory FACTORY = new SingletonFactory();
    private Singleton singleton; // plain field, only accessed through VAR_HANDLE
    private static final VarHandle VAR_HANDLE;

    static {
        try {
            VAR_HANDLE = MethodHandles.lookup().findVarHandle(SingletonFactory.class, "singleton", Singleton.class);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static Singleton getInnerSingleton() {
        Singleton localSingleton = (Singleton) VAR_HANDLE.getAcquire(FACTORY); // acquire
        if (localSingleton == null) {
            synchronized (SingletonFactory.class) {
                localSingleton = (Singleton) VAR_HANDLE.getAcquire(FACTORY); // acquire
                if (localSingleton == null) {
                    localSingleton = new Singleton();
                    VAR_HANDLE.setRelease(FACTORY, localSingleton); // release
                }
            }
        }
        return localSingleton;
    }
}
Would this be a valid and correct implementation?

Yes, this is correct, and it is present on Wikipedia. (It doesn't matter that the field is not volatile, since it is only ever accessed through the VarHandle.)
If the first read sees a stale value, it enters the synchronized block. Since synchronized blocks establish happens-before relationships, the second read will always see the written value. Wikipedia does say that sequential consistency is lost, but that refers to the field accesses; synchronized blocks are sequentially consistent, even though they use release-acquire semantics.
So once the value has been written, the second null check will never succeed, and the object is never instantiated twice.
It is guaranteed that the second read will see the written value, because it is executed with the same lock held as when the value was computed and stored in the variable.
On x86 all loads have acquire semantics, so the only overhead would be the null check. Release-acquire allows values to be seen eventually (that's why the relevant method was called lazySet before Java 9, and its Javadoc used that exact same word). This is prevented in this scenario by the synchronized block.
Instructions inside a synchronized block may not be reordered out of it.
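To make the argument above concrete, here is a minimal, self-contained sketch of the acquire/release variant together with a small harness that checks only one instance is ever handed out. It is my own simplification, not the asker's exact code: it uses findStaticVarHandle on a static field instead of the FACTORY instance, and the names AcquireReleaseSingletonDemo and distinctInstances are made up for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

final class AcquireReleaseSingletonDemo {
    static final class Singleton { }

    // Plain (non-volatile) static field; every access goes through the VarHandle.
    private static Singleton singleton;
    private static final VarHandle SINGLETON;

    static {
        try {
            SINGLETON = MethodHandles.lookup()
                    .findStaticVarHandle(AcquireReleaseSingletonDemo.class, "singleton", Singleton.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static Singleton getSingleton() {
        Singleton local = (Singleton) SINGLETON.getAcquire();   // acquire load
        if (local == null) {
            synchronized (AcquireReleaseSingletonDemo.class) {
                local = (Singleton) SINGLETON.getAcquire();     // re-check under the lock
                if (local == null) {
                    local = new Singleton();
                    SINGLETON.setRelease(local);                // release store
                }
            }
        }
        return local;
    }

    // Hypothetical harness: release all threads at once and count distinct instances seen.
    static int distinctInstances(int threads) {
        Set<Singleton> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen.add(getSingleton());
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }
}
```

Calling AcquireReleaseSingletonDemo.distinctInstances(8) should return 1. A passing run does not prove the code correct, of course; the guarantee comes from the lock, as explained above.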

I am going to try and answer this myself... TL;DR: This is a correct implementation, but it is potentially more expensive than the one with volatile.
Though this looks better, it can under-perform in some cases. I am going to test myself against the famous IRIW example (independent reads of independent writes):
volatile x, y
-----------------------------------------------------
x = 1  |  y = 1  |  int r1 = x  |  int r3 = y
       |         |  int r2 = y  |  int r4 = x
This reads as :
there are two threads (ThreadA and ThreadB) that write to x and y (x = 1 and y = 1)
there are two more threads (ThreadC and ThreadD) that read x and y, but in reverse order.
Because x and y are volatile a result as below is impossible:
r1 = 1 (x) r3 = 1 (y)
r2 = 0 (y) r4 = 0 (x)
This is what sequential consistency of volatile guarantees. If ThreadC observed the write to x (it saw that x = 1), it means that ThreadD MUST observe the same x = 1. This is because in a sequentially consistent execution writes happen as if in a global order, or as if atomically, everywhere. So every single thread must see the same value. So this execution is impossible, according to the JLS too:
If a program has no data races, then all executions of the program will appear to be sequentially consistent.
Now if we move the same example to release/acquire (x = 1 and y = 1 are releases while the other reads are acquires):
non-volatile x, y
-----------------------------------------------------
x = 1  |  y = 1  |  int r1 = x  |  int r3 = y
       |         |  int r2 = y  |  int r4 = x
A result like:
r1 = 1 (x) r3 = 1 (y)
r2 = 0 (y) r4 = 0 (x)
is possible and allowed. This breaks sequential consistency and this is normal, since release/acquire is "weaker". On x86, release/acquire does not impose a StoreLoad barrier, so an acquire is allowed to move above (reorder with) a release (unlike volatile, which prohibits this). In simpler words, volatiles themselves are not allowed to be re-ordered, while a chain like:
release ... // (STORE)
acquire ... // this acquire (LOAD) can float ABOVE the release
is allowed to be "inverted" (reordered), since StoreLoad is not mandatory.
Though this is somewhat wrong and irrelevant, because the JLS does not explain things in terms of barriers. Unfortunately, release/acquire semantics are not yet documented in the JLS either...
If I extrapolate this to the example of SingletonFactory, it means that after a release :
VAR_HANDLE.setRelease(FACTORY, localSingleton);
any other thread that does an acquire:
Singleton localSingleton = (Singleton) VAR_HANDLE.getAcquire(FACTORY);
is not guaranteed to read the value from the release (a non-null Singleton).
Think about it: in case of volatile, if one thread has seen the volatile write, every other thread will, for sure, see it too. There is no such guarantee with release/acquire.
As such, with release/acquire every thread might need to enter the synchronized block. And this might happen for many threads, because it's really unknown when the store that happened in the release will be visible by the load acquire.
And even if the synchronized itself does offer happens-before order, this code, at least for some time (until the release is observed) is going to perform worse? (I assume so): every thread competing to enter the synchronized block.
So in the end - this is about what is more expensive? A volatile store or an eventually seen release. I have no answer to this one.
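For reference, the IRIW shape discussed above can be written with VarHandle access modes roughly as follows. This is only a sketch under my own naming (IriwSketch, runOnce are hypothetical), and it is not a way to reproduce the interesting outcome: freshly started threads rarely overlap closely enough, and a harness such as jcstress is the appropriate tool for that.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class IriwSketch {
    int x, y;               // plain fields, accessed only via the VarHandles below
    int r1, r2, r3, r4;     // results, read by the caller after join()

    private static final VarHandle X, Y;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            X = lookup.findVarHandle(IriwSketch.class, "x", int.class);
            Y = lookup.findVarHandle(IriwSketch.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Runs the four threads once and returns {r1, r2, r3, r4}.
    int[] runOnce() {
        Thread a = new Thread(() -> X.setRelease(this, 1));   // ThreadA: x = 1 (release)
        Thread b = new Thread(() -> Y.setRelease(this, 1));   // ThreadB: y = 1 (release)
        Thread c = new Thread(() -> {                         // ThreadC: reads x, then y
            r1 = (int) X.getAcquire(this);
            r2 = (int) Y.getAcquire(this);
        });
        Thread d = new Thread(() -> {                         // ThreadD: reads y, then x
            r3 = (int) Y.getAcquire(this);
            r4 = (int) X.getAcquire(this);
        });
        Thread[] all = {a, b, c, d};
        for (Thread t : all) {
            t.start();
        }
        try {
            for (Thread t : all) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {r1, r2, r3, r4};
    }
}
```

With volatile accesses the result {1, 0, 1, 0} is forbidden; with the release/acquire accesses shown here the specification of the access modes does not rule it out.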

Related

Is it guaranteed that volatile field would be properly initialized

Here:
An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.
Are the same guarantees held for the volatile field?
What if the y field in the following example would be volatile could we observe 0?
class FinalFieldExample {
final int x;
int y;
static FinalFieldExample f;
public FinalFieldExample() {
x = 3;
y = 4;
}
static void writer() {
f = new FinalFieldExample();
}
static void reader() {
if (f != null) {
int i = f.x; // guaranteed to see 3
int j = f.y; // could see 0
}
}
}

Yes, it is possible to see 0 when
class FinalFieldExample {
final int x;
volatile int y;
static FinalFieldExample f;
...
}
The short explanation:
writer() thread publishes f = new FinalFieldExample() object via a data race
because of this data race, the reader() thread is allowed to see the f = new FinalFieldExample() object as semi-initialized.
In particular, reader() thread can see a value of y that was before y = 4; — i.e. initial value 0.
More detailed explanations are here.
You can reproduce this behavior on ARM64 with this jcstress test.

I think reading 0 is possible.
The spec says:
A write to a volatile variable v synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).
In our case, we have a write and a read of the same variable, but there is nothing that ensures the read to be subsequent. In particular, the write and read occur in different threads that are not related by any other synchronization action.
That is, it is possible that the read will occur before the write in synchronization order.
This may sound surprising given that the writing thread writes f after y, and the reading thread reads y only if it detects f has been written. But since the write and read to f are not synchronized, the following quote applies:
More specifically, if two actions share a happens-before relationship, they do not necessarily have to appear to have happened in that order to any code with which they do not share a happens-before relationship. Writes in one thread that are in a data race with reads in another thread may, for example, appear to occur out of order to those reads.
The explanatory notes to example 17.4.1 also reaffirm that the runtime is permitted to reorder these writes:
If some execution exhibited this behavior, then we would know that instruction 4 came before instruction 1, which came before instruction 2, which came before instruction 3, which came before instruction 4. This is, on the face of it, absurd.
However, compilers are allowed to reorder the instructions in either thread, when this does not affect the execution of that thread in isolation.
In our case, the behavior of the writing thread, in isolation, is not affected by reordering the writes to f and y.

Yes, 0 is possible when x is volatile, because there is no guarantee that the write x = 3 in the writer() thread always happens-before the read local_f.x in the reader() thread.
class FinalFieldExample {
volatile int x;
static FinalFieldExample f;
public FinalFieldExample() {
x = 3;
}
static void writer() {
f = new FinalFieldExample();
}
static void reader() {
var local_f = f;
if (local_f != null) {
int i = local_f.x; // could see 0
}
}
}
As a result, even though x is volatile (which means that all reads and writes to x happen in a global order), nothing prevents the read local_f.x in the reader() thread from happening before the write x = 3 in the writer() thread.
local_f.x in this case will return 0 (the default value for int, which works like an initial write).
The problem is that after the reader() thread reads f, there is no guarantee (i.e. no happens-before relation) that it sees the inner state on f correctly: i.e. it may not see the write x = 3 into the inner field f.x made by the writer() thread in FinalFieldExample constructor.
You can create this happens-before relation by:
either making f volatile (x can be made non-volatile)
class FinalFieldExample {
int x;
static volatile FinalFieldExample f;
...
}
From the JLS:
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
or making x final instead of volatile
class FinalFieldExample {
final int x;
static FinalFieldExample f;
...
}
From the JLS:
An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.
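As a rough illustration of the first option (a volatile f with a plain x), here is a small runnable sketch. The class name VolatilePublication and the helper readAfterPublish are made up for this example; the reader spins until it sees the published reference, and the volatile read of f is what gives it the happens-before edge to the constructor's write.

```java
final class VolatilePublication {
    int x;                                    // plain, non-volatile field
    static volatile VolatilePublication f;    // the volatile reference carries the happens-before edge

    VolatilePublication() {
        x = 3;
    }

    // Starts a writer thread, waits until the object is published, returns the observed x.
    static int readAfterPublish() {
        f = null;
        Thread writer = new Thread(() -> f = new VolatilePublication());
        writer.start();

        VolatilePublication local;
        while ((local = f) == null) {         // volatile read; loops until the writer publishes
            Thread.onSpinWait();
        }
        int seen = local.x;                   // the write x = 3 happens-before this read

        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }
}
```

Because the read of f that returns non-null is subsequent to the volatile write in synchronization order, readAfterPublish() can only return 3.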

EDIT: My answer below looks like it's wrong. volatile only requires that all reads and writes (and other "actions") have completed when the write is made, but subsequent writes can still be reordered to occur before the write to the volatile. Thus it is possible to see f before the write to y has occurred.
Which is really weird, but here we are.
user17206833's answer above mine appears to be correct and contains a link to a very useful resource, I suggest you check it out.
Wrong stuff (I'm leaving it up because it illustrates a common misconception):
OP I think I misread your question:
"What if the y field in the following example would be volatile could we observe 0?"
If y is volatile, then no you cannot observe a 0.
class FinalFieldExample {
final int x;
volatile int y;
If this is what you mean, then the write to y followed by a read of y must create a happens-before edge for the read. The JLS says: "A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field." and never qualifies that statement requiring a read of a reference of some type. The fact that f is neither volatile nor final should make no difference.

Firstly, volatile and initialization are unrelated concepts: A field's initialization guarantees are unaffected by it being volatile or not.
Unless this "escapes" from within the constructor (which is not the case here), the constructor is guaranteed to have completed execution before any other process can access the instance's fields/methods, so y must be initialized in reader() if f != null, ie
int j = f.y; // will always see 4
See JLS volatile

Java Selling Tickets with Multithreading

I have two threads to sell tickets.
public class MyThread {
public static void main(String[] args) {
Ticket ticket = new Ticket();
Thread thread1 = new Thread(()->{
for (int i = 0; i < 30; i++) {
ticket.sell();
} }, "A");
thread1.start();
Thread thread2 = new Thread(()->{
for (int i = 0; i < 30; i++) {
ticket.sell();
} }, "B");
thread2.start();
}
}
class Ticket {
private Integer num = 20 ;
private Object obj = new Object();
public void sell() {
// why shouldn't I use "num" as a monitor object ?
// I thought "num" is unique among two threads.
synchronized ( num ) {
if (this.num >= 0) {
System.out.println(Thread.currentThread().getName() + " sells " + this.num + "th ticket");
this.num--;
}
}
}
}
The output will be wrong if I use num as a monitor object.
But if I use obj as a monitor object, the output will be correct.
What's the difference between using num and using obj ?
===============================================
And why does it still not work if I use (Object)num as a monitor object ?
class Ticket {
private int num = 20 ;
private Object obj = new Object();
public void sell() {
// Can I use (Object)num as a monitor object ?
synchronized ( (Object)num ) {
if (this.num >= 0) {
System.out.println(Thread.currentThread().getName() + " sells " + this.num + "th ticket");
this.num--;
}
}
}
}

Integer is a boxed value. It contains a primitive int, and the compiler deals with autoboxing/autounboxing that int. Because of this, the statement this.num-- is actually:
num=Integer.valueOf(num.intValue()-1)
That is, the num instance containing the lock is lost once you perform that update.
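A quick way to convince yourself of this is to keep a second reference to the Integer before decrementing and compare identities afterwards. This is a small sketch with made-up names (BoxedMonitorDemo, monitorChangesAfterDecrement):

```java
final class BoxedMonitorDemo {
    // Returns true if `num` refers to a different object after the decrement.
    static boolean monitorChangesAfterDecrement() {
        Integer num = 20;
        Integer before = num;   // the object a thread would currently be locking on
        num--;                  // same as: num = Integer.valueOf(num.intValue() - 1)
        return before != num;   // reference comparison of two Integer objects
    }
}
```

The method returns true: the variable now points at the Integer for 19, while any thread inside the synchronized block still holds the monitor of the Integer for 20.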

The fundamental problem here is synchronizing on a non-final value.
The most important thing to understand about the Java Memory Model - that is, what values a thread sees whilst executing a Java program - is the happens-before relationship.
In the specific case of a synchronized block, actions done in one thread before exiting the synchronized block happen before actions done inside the synchronized block in another thread - so, if the first thread increments a variable inside that synchronized block, the second thread sees that updated value.
This goes over and above the well-known fact that a synchronized block can only be entered by one thread at a time: only one thread at a time and you get to see what the previous thread did.
// Thread 1 // Thread 2
synchronized (monitor) {
num = 1
} // Exiting monitor
// *happens before*
// entering monitor
synchronized (monitor) {
int n = num; // Guaranteed to see n = 1 (provided no other thread has entered a block synchronized on monitor and changed it first).
}
There is a very important caveat to this guarantee: it only holds if the two executions of the synchronized block use the same monitor. And that's not the same variable, it's the same actual concrete object on the heap (variables don't have monitors, they're just pointers to a value in the heap).
So, if you reassign the monitor inside the synchronized block:
synchronized (num) {
if (num > 0) {
num--; // This is the same as `num = Integer.valueOf(num.intValue() - 1);`
}
}
then you are destroying the happens-before guarantee, because the next thread to arrive at that synchronized block is entering the monitor of a different object (*).
Once you do, the behavior of your program is ill-defined: if you're lucky, it fails in an obvious way; if you're very unlucky, it can seem to work, and then start failing mysteriously at a later date.
Your code is just broken.
This isn't something that's specific to Integers either: this code would have the same problem.
// Assume `Object someObject = new Object();` is defined as a field.
synchronized (someObject) {
someObject = new Object();
}
(*) Actually, you still get a happens-before relationship for the new object: it's just not for the things inside this synchronized block, it's for things that happened in some other synchronized block that used the object as the monitor. Essentially, it's impossible to reason about what this means, so you may as well just consider it "broken".
The correct way to do it is to synchronize on a field that you can't (not just don't) reassign. You could simply synchronize on this (which can't be reassigned):
synchronized (this) {
if (num > 0) {
num--; // This is the same as `num = Integer.valueOf(num.intValue() - 1);`
}
}
Now it doesn't matter that you're reassigning num inside the block, because you're not synchronizing on it any more. You get the happens-before guarantee from the fact that you're always synchronizing on the same thing.
Note, however, that you must always access num from inside a synchronized block - for example, if you have a getter to get the number of tickets remaining, that must also synchronize on this, in order to get the happens-before guarantee that the value changed in the sell() method is visible in that getter.
This works, but it may not be entirely desirable: anybody who has access to a reference to your Ticket instance can also synchronize on it. This means they can potentially deadlock your code.
Instead, it is a common practice to introduce a private field which is used purely for locking: this is what the obj field gives you. The only modification from your code should be to make it final (and give it a better name than obj):
private final Object obj = new Object();
This can't be accessed outside your class, so nefarious clients cannot cause a deadlock for you directly.
Again, this can't be reassigned inside your synchronized block (or anywhere else), so there is no risk of you breaking the happens-before guarantee by reassigning it.
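Putting the advice together, a corrected version might look something like the sketch below. The names TicketOffice, lock, sold and runTwoSellers are my own, not from the question, and I use a primitive int and a `> 0` check so that exactly the initial number of tickets is sold.

```java
final class TicketOffice {
    private final Object lock = new Object();   // never reassigned, never exposed
    private int remaining;
    private int sold;

    TicketOffice(int tickets) {
        this.remaining = tickets;
    }

    // Sells one ticket if any are left; returns whether a ticket was sold.
    boolean sell() {
        synchronized (lock) {
            if (remaining > 0) {
                remaining--;
                sold++;
                return true;
            }
            return false;
        }
    }

    // Getters synchronize on the same lock to get the happens-before guarantee.
    int remaining() {
        synchronized (lock) {
            return remaining;
        }
    }

    int sold() {
        synchronized (lock) {
            return sold;
        }
    }

    // Hypothetical driver mirroring the question: two threads, each trying to sell repeatedly.
    static TicketOffice runTwoSellers(int tickets, int attemptsPerThread) {
        TicketOffice office = new TicketOffice(tickets);
        Runnable seller = () -> {
            for (int i = 0; i < attemptsPerThread; i++) {
                office.sell();
            }
        };
        Thread a = new Thread(seller, "A");
        Thread b = new Thread(seller, "B");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return office;
    }
}
```

With runTwoSellers(20, 30), sold() ends at 20 and remaining() at 0 regardless of how the two threads interleave.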

Final Field Semantics

17.5. final Field Semantics
Example 17.5-1. final Fields In The Java Memory Model
class FinalFieldExample {
final int x;
int y;
static FinalFieldExample f;
public FinalFieldExample() {
x = 3;
y = 4;
}
static void writer() {
f = new FinalFieldExample();
}
static void reader() {
if (f != null) {
int i = f.x; // guaranteed to see 3
int j = f.y; // could see 0
}
}
}
I have been troubled by this problem for several days.
Could anyone explain directly why f.y could be seen as 0?

If thread A assigns y:
Thread A:
writer();
And thread B reads y:
Thread B
reader();
Then y has been read without any synchronization and therefore might not see the value assigned. It's a simple application of Java's synchronization requirement. If you still don't understand it please clarify in your question.
To put this another way, if y were declared as volatile then it would be guaranteed to be seen.
class FinalFieldExample {
final int x;
volatile int y;
Then:
static void reader() {
if (f != null) {
int i = f.x; // guaranteed to see 3
int j = f.y; // guaranteed to see 4

This is because there is no happens-before relationship between the write operation [this.]y=4 and the read operation f.y. The assignment y=4 happens-before the constructor finishes, and the constructor finishes before the assignment to the static field, but since both threads are directly accessing a static field and do not have a direct joint sequencing relationship, there is no formal guarantee that the write to the non-final field y is visible in the reading thread. The JLS does make a specific promise about final fields; in essence, there is a happens-before rule that isn't being expressed explicitly (and probably should).
Any mechanism for creating a happens-before relationship would solve this theoretical problem, including making either y or f volatile.
(Note that I very seriously doubt that any implementation has ever existed that would resequence such that the assignment to f was executed before the constructor finished, but the JLS is making the point that there is no guarantee except in the case of final fields that this won't happen.)

Does object construction guarantee in practice that all threads see non-final fields initialized?

The Java memory model guarantees a happens-before relationship between an object's construction and finalizer:
There is a happens-before edge from the end of a constructor of an
object to the start of a finalizer (§12.6) for that object.
As well as the constructor and the initialization of final fields:
An object is considered to be completely initialized when its
constructor finishes. A thread that can only see a reference to an
object after that object has been completely initialized is guaranteed
to see the correctly initialized values for that object's final
fields.
There's also a guarantee about volatile fields since, there's a happens-before relations with regard to all access to such fields:
A write to a volatile field (§8.3.1.4) happens-before every subsequent
read of that field.
But what about regular, good old non-volatile fields? I've seen a lot of multi-threaded code that doesn't bother creating any sort of memory barrier after object construction with non-volatile fields. But I've never seen or heard of any issues because of it and I wasn't able to recreate such partial construction myself.
Do modern JVMs just put memory barriers after construction? Avoid reordering around construction? Or was I just lucky? If it's the latter, is it possible to write code that reproduces partial construction at will?
Edit:
To clarify, I'm talking about the following situation. Say we have a class:
public class Foo{
public int bar = 0;
public Foo(){
this.bar = 5;
}
...
}
And some Thread T1 instantiates a new Foo instance:
Foo myFoo = new Foo();
Then passes the instance to some other thread, which we'll call T2:
Thread t = new Thread(() -> {
if (myFoo.bar == 5){
....
}
});
t.start();
T1 performed two writes that are interesting to us:
T1 wrote the value 5 to bar of the newly instantiated myFoo
T1 wrote the reference to the newly created object to the myFoo variable
For T1, we get a guarantee that write #1 happened-before write #2:
Each action in a thread happens-before every action in that thread
that comes later in the program's order.
But as far as T2 is concerned the Java memory model offers no such guarantee. Nothing prevents it from seeing the writes in the opposite order. So it could see a fully built Foo object, but with the bar field equal to 0.
Edit2:
I took a second look at the example above a few months after writing it. And that code is actually guaranteed to work correctly since T2 was started after T1's writes. That makes it an incorrect example for the question I wanted to ask. The fix is to assume that T2 is already running when T1 is performing the write. Say T2 is reading myFoo in a loop, like so:
Foo myFoo = null;
Thread t2 = new Thread(() -> {
for (;;) {
if (myFoo != null && myFoo.bar == 5){
...
}
...
}
});
t2.start();
myFoo = new Foo(); //The creation of Foo happens after t2 is already running

Taking your example as the question itself - the answer would be yes, that is entirely possible. The initialized fields are visible only to the constructing thread, like you quoted. This is called safe publication (but I bet you already knew about this).
The reason you are not seeing it via experimentation is that, AFAIK, on x86 (which has a strong memory model) stores are not re-ordered anyway, so unless the JIT re-ordered the stores that T1 did, you can't see it. But that is playing with fire, literally: see this question and its follow-up (close to the same) here, about a guy who (not sure if true) lost 12 million of equipment.
The JLS guarantees only a few ways to achieve the visibility. And it's not the other way around btw, the JLS will not say when this would break, it will say when it will work.
1) final field semantics
Notice how the example shows that each field has to be final - even if under the current implementation a single one would suffice, and there are two memory barriers inserted (when final(s) are used) after the constructor: LoadStore and StoreStore.
2) volatile fields (and implicitly AtomicXXX); I think this one does not need any explanations and it seems you quoted this.
3) Static initializers well, kind of should be obvious IMO
4) Some locking involved - this should be obvious too, happens-before rule...

But anecdotal evidence suggests that it doesn't happen in practice
To see this issue, you have to avoid using any memory barriers. For example, using a thread-safe collection of any kind, or even a System.out.println, can prevent the problem from occurring.
I have seen a problem with this before though a simple test I just wrote for Java 8 update 161 on x64 didn't show this problem.

It seems there is no synchronization during object construction.
The JLS doesn't permit it, nor was I able to produce any signs of it in code. However, it's possible to demonstrate the opposite.
Running the following code:
public class Main {
public static void main(String[] args) throws Exception {
new Thread(() -> {
while(true) {
new Demo(1, 2);
}
}).start();
}
}
class Demo {
int d1, d2;
Demo(int d1, int d2) {
this.d1 = d1;
new Thread(() -> System.out.println(Demo.this.d1+" "+Demo.this.d2)).start();
try {
Thread.sleep(500);
} catch(InterruptedException e) {
e.printStackTrace();
}
this.d2 = d2;
}
}
The output would continuously show 1 0, proving that the created thread was able to access data of a partially created object.
However, if we synchronized this:
Demo(int d1, int d2) {
synchronized(Demo.class) {
this.d1 = d1;
new Thread(() -> {
synchronized(Demo.class) {
System.out.println(Demo.this.d1+" "+Demo.this.d2);
}
}).start();
try {
Thread.sleep(500);
} catch(InterruptedException e) {
e.printStackTrace();
}
this.d2 = d2;
}
}
The output is 1 2, showing that the newly created thread will in fact wait for the lock, as opposed to the unsynchronized example.
Related: Why can't constructors be synchronized?

Testing initialization safety of final fields

I am trying to simply test out the initialization safety of final fields as guaranteed by the JLS. It is for a paper I'm writing. However, I am unable to get it to 'fail' based on my current code. Can someone tell me what I'm doing wrong, or if this is just something I have to run over and over again and then see a failure with some unlucky timing?
Here is my code:
public class TestClass {
final int x;
int y;
static TestClass f;
public TestClass() {
x = 3;
y = 4;
}
static void writer() {
TestClass.f = new TestClass();
}
static void reader() {
if (TestClass.f != null) {
int i = TestClass.f.x; // guaranteed to see 3
int j = TestClass.f.y; // could see 0
System.out.println("i = " + i);
System.out.println("j = " + j);
}
}
}
and my threads are calling it like this:
public class TestClient {
    public static void main(String[] args) {
        for (int i = 0; i < 10000; i++) {
            Thread writer = new Thread(new Runnable() {
                @Override
                public void run() {
                    TestClass.writer();
                }
            });
            writer.start();
        }
        for (int i = 0; i < 10000; i++) {
            Thread reader = new Thread(new Runnable() {
                @Override
                public void run() {
                    TestClass.reader();
                }
            });
            reader.start();
        }
    }
}
I have run this scenario many, many times. My current loops are spawning 10,000 threads, but I've done with this 1000, 100000, and even a million. Still no failure. I always see 3 and 4 for both values. How can I get this to fail?

I wrote the spec. The TL;DR version of this answer is that just because it may see 0 for y, that doesn't mean it is guaranteed to see 0 for y.
In this case, the final field spec guarantees that you will see 3 for x, as you point out. Think of the writer thread as having 4 instructions:
r1 = <create a new TestClass instance>
r1.x = 3;
r1.y = 4;
f = r1;
The reason you might not see 3 for x is if the compiler reordered this code:
r1 = <create a new TestClass instance>
f = r1;
r1.x = 3;
r1.y = 4;
The way the guarantee for final fields is usually implemented in practice is to ensure that the constructor finishes before any subsequent program actions take place. Imagine someone erected a big barrier between r1.y = 4 and f = r1. So, in practice, if you have any final fields for an object, you are likely to get visibility for all of them.
Now, in theory, someone could write a compiler that isn't implemented that way. In fact, many people have often talked about testing code by writing the most malicious compiler possible. This is particularly common among the C++ people, who have lots and lots of undefined corners of their language that can lead to terrible bugs.

From Java 5.0, you are guaranteed that all threads will see the final state set by the constructor.
If you want to see this fail, you could try an older JVM like 1.3.
I wouldn't print out every test, I would only print out the failures. You could get one failure in a million but miss it. But if you only print failures, they should be easy to spot.
A simpler way to see this fail is to add to the writer.
f.y = 5;
and test for
int y = TestClass.f.y; // could see 0, 4 or 5
if (y != 5)
System.out.println("y = " + y);
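The "only report failures" idea can be sketched as a harness that counts unexpected observations instead of printing every read. The names FailureCounter and run are made up, and the nested TestClass is a copy of the one in the question so the example stands alone. On x86/HotSpot I would expect both counters to stay at 0, but only the first one is actually guaranteed.

```java
final class FailureCounter {
    static final class TestClass {
        final int x;
        int y;
        static TestClass f;   // published via a data race, as in the question

        TestClass() {
            x = 3;
            y = 4;
        }
    }

    // Returns {number of reads with x != 3, number of reads with y != 4}.
    static long[] run(int iterations) {
        long[] failures = new long[2];   // written only by the reader thread

        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                TestClass.f = new TestClass();
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                TestClass local = TestClass.f;   // read the racy reference once
                if (local != null) {
                    if (local.x != 3) failures[0]++;   // ruled out by final field semantics
                    if (local.y != 4) failures[1]++;   // allowed by the JLS, rare in practice
                }
            }
        });

        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures;   // safe to read here: join() happens-before this point
    }
}
```

A caller would print the two counters once at the end, for example after run(1_000_000), rather than logging each iteration.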

I'd like to see a test which fails or an explanation why it's not possible with current JVMs.
Multithreading and Testing
You can't prove that a multithreaded application is broken (or not) by testing for several reasons:
the problem might only appear once every x hours of running, x being so high that it is unlikely that you see it in a short test
the problem might only appear with some combinations of JVM / processor architectures
In your case, to make the test break (i.e. to observe y == 0) would require the program to see a partially constructed object where some fields have been properly constructed and some not. This typically does not happen on x86 / hotspot.
How to determine if a multithreaded code is broken?
The only way to prove that the code is valid or broken is to apply the JLS rules to it and see what the outcome is. With data race publishing (no synchronization around the publication of the object or of y), the JLS provides no guarantee that y will be seen as 4 (it could be seen with its default value of 0).
Can that code really break?
In practice, some JVMs will be better at making the test fail. For example some compilers (cf "A test case showing that it doesn't work" in this article) could transform TestClass.f = new TestClass(); into something like (because it is published via a data race):
(1) allocate memory
(2) write fields default values (x = 0; y = 0) //always first
(3) write final fields final values (x = 3) //must happen before publication
(4) publish object //TestClass.f = new TestClass();
(5) write non final fields (y = 4) //has been reordered after (4)
The JLS mandates that (2) and (3) happen before the object publication (4). However, due to the data race, no guarantee is given for (5) - it would actually be a legal execution if a thread never observed that write operation. With the proper thread interleaving, it is therefore conceivable that if reader runs between 4 and 5, you will get the desired output.
I don't have a Symantec JIT at hand so can't prove it experimentally :-)

Here is an example of default values of non-final fields being observed even though the constructor sets them and doesn't leak this. It is based on my other question, which is a bit more complicated. I keep seeing people say it can't happen on x86, but my example happens on x64 Linux with OpenJDK 6...

This is a good question with a complicated answer. I've split it in pieces for an easier read.
People here have said enough times that, under the strict rules of the JLS, you should be able to see the desired behavior. But while the compilers (I mean C1 and C2) have to respect the JLS, they are free to make optimizations. I will get to this later.
Let's take the first, easy scenario, where there are two non-final variables, and see if we can publish an object incorrectly. For this test I am using a specialized tool, jcstress, that was tailored for exactly this kind of test. Here is a test using it:
import org.openjdk.jcstress.annotations.*;
import org.openjdk.jcstress.infra.results.II_Result;

@Outcome(id = "0, 2", expect = Expect.ACCEPTABLE_INTERESTING, desc = "not correctly published")
@Outcome(id = "1, 0", expect = Expect.ACCEPTABLE_INTERESTING, desc = "not correctly published")
@Outcome(id = "1, 2", expect = Expect.ACCEPTABLE, desc = "published OK")
@Outcome(id = "0, 0", expect = Expect.ACCEPTABLE, desc = "II_Result default values for int, not interesting")
@Outcome(id = "-1, -1", expect = Expect.ACCEPTABLE, desc = "actor2 acted before actor1, this is OK")
@State
@JCStressTest
public class FinalTest {
    int x = 1;
    Holder h;

    @Actor
    public void actor1() {
        // plain (racy) write of the reference
        h = new Holder(x, x + 1);
    }

    @Actor
    public void actor2(II_Result result) {
        Holder local = h;
        // the other actor did its job
        if (local != null) {
            // if correctly published, we can only see {1, 2}
            result.r1 = local.left;
            result.r2 = local.right;
        } else {
            // this is the case to "ignore" default values that are
            // stored in II_Result object
            result.r1 = -1;
            result.r2 = -1;
        }
    }

    public static class Holder {
        // non-final
        int left, right;

        public Holder(int left, int right) {
            this.left = left;
            this.right = right;
        }
    }
}
You do not have to understand the code in detail; the minimal explanation is this: there are two Actors that mutate some shared data, and the results are recorded. The @Outcome annotations classify those recorded results and set certain expectations (under the hood things are far more interesting and verbose). Just bear in mind that this is a very sharp and specialized tool; you can't really do the same thing with two plain threads running.
Now, if I run this, the results in these two outcomes:
@Outcome(id = "0, 2", expect = Expect.ACCEPTABLE_INTERESTING....)
@Outcome(id = "1, 0", expect = Expect.ACCEPTABLE_INTERESTING....)
will be observed (meaning there was an unsafe publication of the object, which the other Actor/thread has actually seen).
Specifically these are observed in the so-called TC2 suite of tests, and these are actually run like this:
java... -XX:-TieredCompilation
-XX:+UnlockDiagnosticVMOptions
-XX:+StressLCM
-XX:+StressGCM
I will not dive too deep into what these do, but here is what StressLCM and StressGCM do and, of course, what the TieredCompilation flag does.
The entire point of the test is that:
This code proves that two non-final variables set in the constructor can be incorrectly published, and it was run on x86.
The sane thing to do now, since there is a specialized tool in place, is to change a single field to final and see it break. So make this change and run again; we should observe the failure:
public static class Holder {
    // this is the change
    final int right;
    int left;

    public Holder(int left, int right) {
        this.left = left;
        this.right = right;
    }
}
But if we run it again, the failure is not there, i.e. neither of the two @Outcome results that we talked about above is part of the output. How come?
It turns out that as soon as you write to even a single final field, the JVM (specifically C1) will do the correct thing, all the time. As such, this is impossible to demonstrate, at least at the moment.
In theory you could throw Shenandoah into this, with its interesting flag ShenandoahOptimizeInstanceFinals (not going to dive into it). I have tried running the previous example with:
-XX:+UnlockExperimentalVMOptions
-XX:+UseShenandoahGC
-XX:+ShenandoahOptimizeInstanceFinals
-XX:-TieredCompilation
-XX:+UnlockDiagnosticVMOptions
-XX:+StressLCM
-XX:+StressGCM
but this does not work as I hoped it would. What is far worse for my argument for even trying this is that these flags are going to be removed in jdk-14.
Bottom-line: At the moment there is no way to break this.
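If the aim is code that stays correct under racy publication, rather than a test that breaks, the JLS only makes a promise about final fields. A minimal sketch, assuming a hypothetical FinalHolderSketch class that mirrors the Holder above but declares both fields final, so that the guarantee comes from the specification and not from how the current compilers happen to behave:

```java
public class FinalHolderSketch {

    // Every field is final, so the final-field semantics of JLS 17.5 cover both.
    static final class Holder {
        final int left;
        final int right;

        Holder(int left, int right) {
            this.left = left;
            this.right = right;
        }
    }

    static Holder h; // deliberately plain: published through a data race

    // Mirrors actor1 in the test above.
    public static void publish(int x) {
        h = new Holder(x, x + 1);
    }

    // Mirrors actor2: returns {-1, -1} if nothing has been published yet.
    public static int[] read() {
        Holder local = h;
        return local == null
                ? new int[] {-1, -1}
                : new int[] {local.left, local.right};
    }

    public static void main(String[] args) {
        publish(1);
        int[] r = read();
        System.out.println(r[0] + ", " + r[1]); // prints "1, 2"
    }
}
```

A reader that sees a non-null h here is guaranteed by the JLS to see left and right as set in the constructor, because the constructor does not leak this.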

What if you modified the constructor to do this:
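public TestClass() throws InterruptedException {
    // Thread.sleep throws a checked exception, so the constructor has to declare it
    Thread.sleep(300);
    x = 3;
    y = 4;
}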
I am not an expert on JLS finals and initializers, but common sense tells me this should delay setting x long enough for writers to register another value?

What if one changes the scenario into
public class TestClass {
    final int x;
    static TestClass f;

    public TestClass() {
        x = 3;
    }

    int y = 4;
    // etc...
}
?

What's going on in this thread? Why should that code fail in the first place?
You launch 1000s of threads that will each do the following:
TestClass.f = new TestClass();
What that does, in order:
evaluate TestClass.f to find out its memory location
evaluate new TestClass(): this creates a new instance of TestClass, whose constructor will initialize both x and y
assign the right-hand value to the left-hand memory location
An assignment is an atomic operation which is always performed after the right-hand value has been generated. Here is a citation from the Java language spec (see the first bulleted point) but it really applies to any sane language.
This means that while the TestClass() constructor is taking its time to do its job, and x and y could conceivably still be zero, the reference to the partially initialized TestClass object only lives in that thread's stack, or CPU registers, and has not been written to TestClass.f
Therefore TestClass.f will always contain:
either null, at the start of your program, before anything else is assigned to it,
or a fully initialized TestClass instance.

To understand why this test does not fail, it helps to look at what actually happens when a constructor is invoked. The JVM is a stack-based machine. TestClass.f = new TestClass(); consists of four actions. First the new instruction is executed; like malloc in C/C++, it allocates memory and places a reference to it on top of the stack. Then the reference is duplicated for the constructor call. A constructor is in fact like any other instance method, and it is invoked with the duplicated reference. Only after that is the reference stored in the method frame or in a field, where it becomes accessible from elsewhere. Before that last step the reference exists only on top of the creating thread's stack and nobody else can see it. It makes no difference what kind of field you are working with: both will be initialized if TestClass.f != null. You can read the x and y fields from different objects, but this will not result in y = 0. For more information, see the JVM Specification and the Stack-oriented programming language article.
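The four actions can be sketched as follows. The class name BytecodeSketch is hypothetical, and the instruction listing in the comment is what javac typically emits for such an assignment (it can be checked with javap -c; constant-pool indexes are omitted). Keep in mind that this is the bytecode order within the creating thread; on its own it says nothing about the order in which a JIT compiler or the CPU makes the writes visible to other threads.

```java
public class BytecodeSketch {
    static BytecodeSketch f;

    final int x;
    int y;

    BytecodeSketch() {
        x = 3;
        y = 4;
    }

    public static void publish() {
        // The statement below compiles to four instructions:
        //   new           BytecodeSketch   allocate the object, push its reference
        //   dup                            duplicate the reference on the stack
        //   invokespecial <init>           run the constructor on one copy
        //   putstatic     f                store the other copy into the static field
        f = new BytecodeSketch();
    }

    public static void main(String[] args) {
        publish();
        // Same thread, so program order applies: prints "3 4".
        System.out.println(f.x + " " + f.y);
    }
}
```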
UPD: One important thing I forgot to mention. Under the Java memory model there is no way to see a partially initialized object, provided you do not publish this from inside the constructor.
JLS:
An object is considered to be completely initialized when its
constructor finishes. A thread that can only see a reference to an
object after that object has been completely initialized is guaranteed
to see the correctly initialized values for that object's final
fields.
JLS:
There is a happens-before edge from the end of a constructor of an
object to the start of a finalizer for that object.
Broader explanation of this point of view:
It turns out that the end of an object's constructor happens-before
the execution of its finalize method. In practice, what this means is
that any writes that occur in the constructor must be finished and
visible to any reads of the same variable in the finalizer, just as if
those variables were volatile.
UPD: That was the theory, let's turn to practice.
Consider the following code, with simple non-final variables:
public class Test {
    int myVariable1;
    int myVariable2;
    Test() {
        myVariable1 = 32;
        myVariable2 = 64;
    }
    public static void main(String args[]) throws Exception {
        Test t = new Test();
        System.out.println(t.myVariable1 + t.myVariable2);
    }
}
The following command displays the machine instructions generated by the JVM; the wiki explains how to use it:
java.exe -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -Xcomp
-XX:PrintAssemblyOptions=hsdis-print-bytes -XX:CompileCommand=print,*Test.main Test
Its output:
...
0x0263885d: movl $0x20,0x8(%eax) ;...c7400820 000000
;*putfield myVariable1
; - Test::<init>@7 (line 12)
; - Test::main@4 (line 17)
0x02638864: movl $0x40,0xc(%eax) ;...c7400c40 000000
;*putfield myVariable2
; - Test::<init>@13 (line 13)
; - Test::main@4 (line 17)
0x0263886b: nopl 0x0(%eax,%eax,1) ;...0f1f4400 00
...
Field assignments are followed by a NOPL instruction; one of its purposes is to prevent instruction reordering.
Why does this happen? According to the specification, finalization happens after the constructor returns, so the GC thread can't see a partially initialized object. At the CPU level the GC thread is indistinguishable from any other thread. If such guarantees are provided to the GC, then they are provided to any other thread. This is the most obvious way to satisfy that restriction.
Results:
1) The constructor is not synchronized; synchronization is done by other instructions.
2) Assignment to the object's reference can't happen before the constructor returns.
