Unsafe publication in Java - java

In the book Java Concurrency in Practice by Brian Goetz et al.:
If you do not ensure that publishing the shared reference happens-before another thread loads that shared reference, then the write of the reference to the new object can be reordered (from the perspective of the thread consuming the object) with writes to its fields. In that case, another thread could see an up-to-date value for the object reference but out-of-date values for some or all of that object's state - a partially constructed object.
Does this mean that: in the thread publishing the object, the write of the reference to the new object is not reordered with writes to its fields; the write to its fields happens before the write of the reference. However, that publishing thread may flush the updated reference to main memory before it flushes the updated object fields. Therefore, the thread consuming the object may see a non-null reference for the object, yet see outdated values for the object fields? And in that sense, the operations are reordered for the consuming thread.

Yes.
The answer to your question is right there in the paragraph that you quoted, and you seem to echo the answer in your question.
One comment though: You said that, "[the] publishing thread may flush the updated reference to main memory before it flushes the updated object fields." If you're talking about Java code, then it's best to stick with what is written in the Java Language Specification (JLS).
The JLS tells you how a Java program is allowed to behave. It says nothing about "main memory" or "caches" or "flushing." It only says that without explicit synchronization, the updates that one thread performs in a certain order on two or more variables may seem to have happened in a different order when viewed from the perspective of some other thread. How or why that can happen is "implementation details."

in the thread publishing the object, the write of the reference to the new object is not reordered with writes to its fields; the write to its fields happens before the write of the reference.
Yes. Within a single thread, actions are ordered by program order, so the thread never observes its own writes out of order: "If x and y are actions of the same thread and x comes before y in program order, then hb(x, y)." (https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.5). To rephrase slightly: "the write of the reference to the new object is not reordered with writes to its fields" means that if the publishing thread itself reads the reference, it is guaranteed to also see every field value it wrote before storing that reference.
the thread consuming the object may see a non-null reference for the object, yet see outdated values for the object fields?
Yes, it may, when you publish the object in an unsafe manner, without the appropriate HB edges (typically implemented with memory barriers). Without those HB edges/membars the program has a data race: for each field, a reader in another thread may see the default value or any value written by a racing write, though never an out-of-thin-air (OoTA) value, which the JMM explicitly forbids. Safe publication makes all the values written before the publication visible to all readers that observed the published object. A few popular and simple ways to make the publication safe:
Publish the reference through a properly locked field (https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4.5)
Use the static initializer to do the initializing stores (http://docs.oracle.com/javase/specs/jls/se8/html/jls-12.html#jls-12.4)
Publish the reference via a volatile field (https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4.5), or as the consequence of this rule, via the AtomicX classes
Initialize the value into a final field, which leads to the freeze action (https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.5).
You can also use other actions that produce HB edges, such as Thread.start(), but my day-to-day favorites are:
the final fields for immutable data
volatile/AtomicXXX fields and locks (explicit synchronized block/ReadWriteLock, implicit locks in BlockingQueue) for mutable data.
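The publication idioms listed above can be sketched as follows. This is only a minimal illustration: ImmutablePoint, MutableConfig and SafePublisher are hypothetical names invented for the example and appear neither in JCIP nor in the JLS.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical classes for illustration only.
final class ImmutablePoint {
    // final fields: frozen at the end of the constructor, so any thread that
    // obtains the reference after construction sees these values.
    final int x;
    final int y;

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class MutableConfig {
    int timeoutMillis; // plain field: visible only if the reference is published safely

    MutableConfig(int timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }
}

final class SafePublisher {
    // Static initializer: class initialization is synchronized by the JVM.
    static final ImmutablePoint ORIGIN = new ImmutablePoint(0, 0);

    // Volatile field: the write happens-before every subsequent read of it.
    private volatile MutableConfig viaVolatile;

    // AtomicReference: set/get have the memory effects of a volatile write/read.
    private final AtomicReference<MutableConfig> viaAtomic = new AtomicReference<>();

    // Field guarded by the monitor of 'this': unlock happens-before a later lock.
    private MutableConfig viaLock;

    void publishVolatile(int t) { viaVolatile = new MutableConfig(t); }
    MutableConfig readVolatile() { return viaVolatile; }

    void publishAtomic(int t) { viaAtomic.set(new MutableConfig(t)); }
    MutableConfig readAtomic() { return viaAtomic.get(); }

    synchronized void publishLocked(int t) { viaLock = new MutableConfig(t); }
    synchronized MutableConfig readLocked() { return viaLock; }
}
```

In each case the reader must go through the same volatile field, atomic reference or lock as the writer; reading the reference through some other, unsynchronized path gives no guarantee.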

Related

Is calling start() on a object of this class safe? An example from Java Concurrency in practice

First off, I will give links to the source code that I will be talking about since copy/paste would make this question page too long.
In Listing 5.15 http://jcip.net/listings/CellularAutomata.java of JCIP, I imagine that in some main method, one will create a CellularAutomata object and then call start() on that object.
However, is it okay to do so? When the object's start method is called, it will create N (the number of processors) threads with instances of Worker. It seems, though, that the N threads created with the Worker objects might see an incomplete reference to, or an incompletely constructed, Worker.
The reasoning behind it is that, the this reference escapes during the construction of the CellularAutomata object when calling
new Runnable() and new Worker(mainBoard.getSubBoard(count, i))
And since Worker[] workers; and CyclicBarrier barrier; are fields of the CellularAutomata object, the threads created in the start() method of that object might not be able to see those objects in a proper state.
I am thinking this is similar to the Holder's example http://jcip.net/listings/StuffIntoPublic.java
http://jcip.net/listings/Holder.java
where the Holder's field might not be visible by other threads.
I understand that the Holder example was problematic because the field was not final, and therefore might not be visible, and in the CellularAutomata they are final. I read that class with only final fields are guaranteed visibility for their fields when published. However, I also read that although final fields might be the only fields of a class, if the class is not properly constructed, then that guarantee is gone. And in this example, since the this reference escapes, I assume it is not properly constructed. Here is an example of implicitly letting the this reference escape which is similar to what's going on in CellularAutomata. http://jcip.net/listings/ThisEscape.java
Please let me know if my thoughts need correction, I would really appreciate it. This concurrency journey has been filling me with so many doubts and questions and if you have any other references to where I can learn concurrency and the foundations for concurrency in Java, please let me know.
Thank you
The danger in allowing this to escape is that it might be seen before it's fully constructed. In this case, that's not an issue because the runnable doesn't execute until start() is invoked, which must be after the constructor completes.
Furthermore, besides the final field guarantees, there are at least two additional happens-before barriers between the assignment of mainBoard and the execution of the runnable. One is the call to Thread.start() by what will be the last thread entering the barrier, which happens-before any action in the started thread. Then there is the actual call to CyclicBarrier.await(), which happens-before actions that are part of the barrier action.
So I would say the code is pretty safe.
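A minimal sketch of the same shape may help. BarrierDemo is a hypothetical, heavily simplified stand-in and not the book's CellularAutomata listing: the constructor creates runnables that capture this, but nothing runs them until start-up code is called on the finished object.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for the listing discussed above.
final class BarrierDemo {
    private final int[] board;
    private final CyclicBarrier barrier;
    private final Runnable[] workers;
    private final AtomicInteger barrierTrips = new AtomicInteger();

    BarrierDemo(int parties) {
        this.board = new int[parties];
        // The barrier action captures 'this', but it only runs when the barrier
        // trips, which cannot happen before startAndSum() is called.
        this.barrier = new CyclicBarrier(parties, () -> barrierTrips.incrementAndGet());
        this.workers = new Runnable[parties];
        for (int i = 0; i < parties; i++) {
            final int slot = i;
            this.workers[i] = () -> {
                this.board[slot] = slot + 1; // each worker writes only its own slot
                try {
                    this.barrier.await();
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            };
        }
    }

    // Thread.start() happens-before every action of the started thread, so each
    // worker sees the fully constructed object; join() makes the workers' writes
    // visible to the caller.
    int startAndSum() {
        Thread[] threads = new Thread[workers.length];
        for (int i = 0; i < workers.length; i++) {
            threads[i] = new Thread(workers[i]);
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int sum = 0;
        for (int v : board) {
            sum += v;
        }
        return sum;
    }

    int trips() {
        return barrierTrips.get();
    }
}
```

The code would become unsafe if the constructor itself started a thread or stored this somewhere another running thread could already read it.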
You can read the relevant section of the Java Language Specification: 17.5. final Field Semantics
The first relevant section (emphasis added by me):
An object is considered to be completely initialized when its
constructor finishes. A thread that can only see a reference to an
object after that object has been completely initialized is guaranteed
to see the correctly initialized values for that object's final
fields.
The this reference is not seen by any other thread before the constructor completes, so it's fine.
There is nothing magic about the this reference "escaping" from the constructor; the relevant thing is that no other thread should see it (before the constructor completes).
The next paragraph in the JLS expands on this (emphasis and italics added by me):
The usage model for final fields is a simple one: Set the final fields
for an object in that object's constructor; and do not write a
reference to the object being constructed in a place where another
thread can see it before the object's constructor is finished. If this
is followed, then when the object is seen by another thread, that
thread will always see the correctly constructed version of that
object's final fields.

Non-volatile fields + first object access from another thread (java)

I have been working on a certain server-type application for a while now, and I found that its design challenges the way I see memory coherence (so to speak) in Java.
This application uses NIO, therefore there is a limited amount of I/O threads (they only do network I/O and nothing else; they never terminate, but may get blocked waiting for more work).
Each connection is internally represented as an object of a specific type, let's call it ClientCon for the sake of this example. ClientCon has various session related fields, none of which are volatile. There is no synchronization of any kind in relation to getting/setting values for these fields.
Received data is made up of logical units with a fixed maximum size. Each such unit has some metadata that allows the handling type (class) to be decided. Once that is done, a new object of that type is created. All such handlers have fields, none of which are volatile. An I/O thread (a concrete I/O thread is assigned to each ClientCon) then calls a protected read method with remaining buffer contents (after metadata was read) on the new handler object.
After this, the same handler object is put into a special queue, which (the queue) is then submitted to a thread pool for execution (where each handler's run method is called to take actions based on the read data). For the sake of this example, we can say that TP threads never terminate.
Therefore, a TP thread will get its hands on an object it never had access to before. All fields of that object are non-volatile (and most/all are non-final, as they were modified outside the constructor).
The handler's run method may act based on session-specific fields in ClientCon as well as set them and/or act on handler object's own fields, whose values were set in the read method.
According to CPJ (Concurrent Programming in Java: Design and Principles):
The first time a thread accesses a field of an object, it sees either the initial value of the field or a value since written by some other thread.
A more comprehensive example of this quote can be found in JLS 17.5:
class FinalFieldExample {
    final int x;
    int y;
    static FinalFieldExample f;

    public FinalFieldExample() {
        x = 3;
        y = 4;
    }

    static void writer() {
        f = new FinalFieldExample();
    }

    static void reader() {
        if (f != null) {
            int i = f.x; // guaranteed to see 3
            int j = f.y; // could see 0
        }
    }
}
The class FinalFieldExample has a final int field x and a non-final
int field y. One thread might execute the method writer and another
might execute the method reader.
Because the writer method writes f after the object's constructor
finishes, the reader method will be guaranteed to see the properly
initialized value for f.x: it will read the value 3. However, f.y is
not final; the reader method is therefore not guaranteed to see the
value 4 for it.
This application has been running on x86 (and x86/64) Windows/Unix OSes (Linux flavors, Solaris) for years now (both Sun/Oracle and OpenJDK JVMs, versions 1.5 to 8) and apparently there have been no memory coherency issues related to received data handling. Why?
To sum it up, is there a way for a TP thread to see the object as it was initialized after construction and be unable to see all or some changes done by an I/O thread when it called the protected read method? If so, it would be nice if a detailed example could be presented.
Otherwise, are there some side-effects that could cause the object's field values to always be visible in other threads (e.g. I/O thread acquiring a monitor when adding the handler object to a queue)? Neither the I/O thread nor the TP thread synchronizes on the handler object itself. The queue does no such thing as well (not that it would make sense, anyway). Is this related to a concrete JVM's implementation details, perhaps?
EDIT:
It follows from the above definitions that:
An unlock on a monitor happens-before every subsequent lock on that
monitor. – Not applicable: monitor is not acquired on the handler object
A write to a volatile field (§8.3.1.4) happens-before every subsequent
read of that field. – Not applicable: no volatile fields
A call to start() on a thread happens-before any actions in the
started thread. – A TP thread might already exist when the queue with handler object(s) is submitted for execution. A new handler object might be added to queue amidst an execution on an existing TP thread.
All actions in a thread happen-before any other thread successfully
returns from a join() on that thread. – Not applicable: threads do not wait for each other
The default initialization of any object happens-before any other
actions (other than default-writes) of a program. – Not applicable: field writes are after default init AND after constructor finishes
When a program contains two conflicting accesses (§17.4.1) that are
not ordered by a happens-before relationship, it is said to contain a
data race.
and
Memory that can be shared between threads is called shared memory or
heap memory.
All instance fields, static fields, and array elements are stored in
heap memory. In this chapter, we use the term variable to refer to
both fields and array elements.
Local variables (§14.4), formal method parameters (§8.4.1), and
exception handler parameters (§14.20) are never shared between threads
and are unaffected by the memory model.
Two accesses to (reads of or writes to) the same variable are said to
be conflicting if at least one of the accesses is a write.
There was a write without forcing a HB relationship on field(s), and later there is a read, once again, not forcing a HB relationship on those field(s). Or am I horribly wrong here? That is, there is no declaration that anything about the object could have changed, so why would the JVM force-flush possibly cached values for these fields?
TL;DR
Thread #1 writes values to a new object's fields in a way that does not allow JVM to know that those values should be propagated to other threads.
Thread #2 acquires the object that was modified after construction by Thread #1 and reads those field values.
Why does the issue described in FinalFieldExample/JLS 17.5 NEVER happen in practice?
Why does Thread #2 never see only a default-initialized object (or, alternatively, the object as it was after construction, but before or in the middle of the field value changes made by Thread #1)?
I'm quite sure that when a thread pool starts a thread or runs a callable, it has happens-before semantics, so all changes made before the hand-off are visible to the thread.
The scenario you mentioned in CPJ is valid when you have more than one thread modifying data concurrently on the same object instance (e.g. 2 threads already running and modifying the same value, or values that happen to be next to each other in the heap).
It looks like in your case, there is no concurrent modification/read of the fields.
It might depend on what type of thread pool you are using. If it's an ExecutorService, then that class makes some strong guarantees about its task. From the documentation:
Memory consistency effects: Actions in a thread prior to the submission of a Runnable or Callable task to an ExecutorService happen-before any actions taken by that task, which in turn happen-before the result is retrieved via Future.get().
So when you initialize any object, plus any other objects, then submit that object to an ExecutorService, all those writes are made visible to the thread that will eventually handle your task.
Now, if you home-rolled your own thread pool, or you're using a thread pool without these guarantees, then all bets are off. I'd say switch to something that has the guarantee though.
In practice one reason you will never see a violation here is "most are non-final" which means that there is at least one final field. The way HotSpot implements the guarantees given by the JLS when final fields are involved is to put a memory barrier at the end of the constructor, thereby granting the non-final fields the same visibility guarantees.
In theory now this is obviously not necessary, which means it depends on how you queue additional work in your thread pool. Generally I cannot imagine any design where there is not some synchronization going on when queueing the work, before it is executed - not only would it make working with this incredibly awkward (for the same reason that starting a thread invokes happens-before behavior), the way to implement such data structures also necessitates some synchronization.
Java's ThreadPoolExecutor.execute() for example does use a BlockingQueue internally which already gives you all the visibility and ordering guarantees you'd need.
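As a rough sketch of the hand-off described in these answers: PacketHandler and HandOffDemo below are hypothetical names standing in for the question's handler and I/O thread. The handler has only plain fields, yet the pool thread is guaranteed to see them because the writes precede the submission to the ExecutorService.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical handler, loosely modelled on the one described in the question.
final class PacketHandler implements Callable<Integer> {
    private int opcode;      // plain fields: neither volatile nor final
    private int payloadSize;

    // Called by the submitting ("I/O") thread after construction.
    void read(int opcode, int payloadSize) {
        this.opcode = opcode;
        this.payloadSize = payloadSize;
    }

    // Called later by a pool thread.
    @Override
    public Integer call() {
        return opcode * 1000 + payloadSize;
    }
}

final class HandOffDemo {
    static int handOff(int opcode, int payloadSize) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            PacketHandler handler = new PacketHandler();
            handler.read(opcode, payloadSize);           // plain writes before submission
            Future<Integer> result = pool.submit(handler); // submit happens-before call()
            return result.get();                           // call() happens-before get() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

If the submitting thread kept modifying the handler after submit(), those later writes would race with call() and would need their own synchronization.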

"Monitor" in java threads

I have read different things in different blogs about monitors. So I'm a bit confused now.
As far as I understand, a monitor is something that makes sure that only one thread is executing the code in the critical section. So if we have 3 synchronized methods/blocks, would we then have 3 monitors to make sure that only one thread is in the critical section?
If the above is true, then why is it said that in Java every object has a monitor associated with it? It should be that every synchronized block is associated with a monitor.
What is a monitor?
A monitor is something a thread can grab and hold, preventing all other threads from grabbing that same monitor and forcing them to wait until the monitor is released. This is what a synchronized block does.
Where do these monitors come from in the first place?
The answer is: from any Java object. When you write:
Object foo = new Object();
synchronized (foo) {
    System.out.println("Hello world.");
}
...what this means is: the current thread will first grab the monitor associated with the object stored in variable foo and hold it while it prints "Hello world", then releases it.
Why does every Java object have a monitor associated with it?
There is no technical reason for it to be that way. It was a design decision made in the early versions of Java and it's too late to change now (even though it is confusing at first and it does cause problems if people aren't careful).
When using synchronized with blocks, you specify an object to lock on. In that case, the monitor of that object is used for locking.
When using synchronized with methods, you don't specify an object to lock on, and instead this object is implied. Again, the monitor of this is used for locking.
So, objects have monitors, and synchronized methods/blocks do not have their own monitors, but instead they use the monitors of specific objects.
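To illustrate that point, here is a small sketch with hypothetical class names (Account, MonitorDemo). Two different synchronized methods exclude each other because both lock the monitor of the same object, not because each method has a monitor of its own.

```java
// Hypothetical class for illustration only.
final class Account {
    private int balance;

    // Both methods lock the monitor of 'this', so a thread inside deposit()
    // keeps every other thread out of withdraw() on the same Account.
    synchronized void deposit(int amount) {
        balance += amount;
    }

    synchronized void withdraw(int amount) {
        balance -= amount;
    }

    int balance() {
        synchronized (this) { // same monitor as the synchronized methods
            return balance;
        }
    }
}

final class MonitorDemo {
    static int run(int iterations) {
        Account account = new Account();
        Thread depositor = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                account.deposit(2);
            }
        });
        Thread withdrawer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                account.withdraw(1);
            }
        });
        depositor.start();
        withdrawer.start();
        try {
            depositor.join();
            withdrawer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return account.balance();
    }
}
```

Two separate Account instances would have two separate monitors, so threads working on different accounts would not block each other.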
In the context of Java programming, the monitor is the intrinsic lock (where intrinsic means "built-in") on a Java object. For a thread to enter any synchronized instance method on an object it must first acquire the intrinsic lock on that object. For a thread to enter any synchronized static method on a class it must first acquire the intrinsic lock on that class.
This is how monitor is defined in the Java tutorial:
Synchronization is built around an internal entity known as the intrinsic lock or monitor lock. (The API specification often refers to this entity simply as a "monitor.")
There is a good reason that the monitor belongs to an object, and not to an individual block: the monitor is there to protect the state of the object. Objects should be designed to be cohesive, making it likely that instance variables will end up being referenced by multiple methods; the safe thing to do, in order to guarantee that the object is always in a consistent state, is to allow only one synchronized method on that object to execute at a time.
The term "monitor" comes from Concurrent Pascal. See Per Brinch Hansen's paper "Java's Insecure Parallelism", which argues that Java doesn't actually implement monitors:
Gosling (1996, p. 399) claims that Java uses monitors to synchronize threads. Unfortunately, a closer inspection reveals that Java does not support a monitor concept:
Unless they are declared as synchronized, Java class methods are unsynchronized.
Unless they are declared as private, Java class variables are public (within a package)
Another quote from the same paper:
The failure to give an adequate meaning to thread interaction is a very deep flaw of Java that vitiates the conceptual integrity of the monitor concept.

Java Concurrent Collections and visibility

I'm getting a little unsure about what to expect from Concurrent Collections (e.g. ConcurrentMap) regarding visibility of the data in the collection.
A: Thread1 puts a complex object and Thread2 gets it. Will all attributes be visible in Thread2?
B: Thread1 puts a complex object and later changes some attributes. Then Thread2 gets it, will all changes be visible in Thread2?
I guess B is false, and if so I should synchronize every access on the complex object?
Pushing to a concurrent collection is defined as publishing it. See "Memory Consistency Properties" in the Package description.
This means that if you just change a stored object, you do not automatically get a happens-before relationship. You would need to make those changes synchronized/volatile or use a concurrent primitive yourself.
A: If the object is immutable or if the object is mutable but all the properties are set before the object is added to the collection then yes, they will be all visible.
B: If no synchronisation mechanisms are in place then it is not guaranteed, it depends when the thread 2 accesses the object.
If you need this sort of behaviour guaranteed (i.e. the reading thread to be guaranteed to see all the modifications made by the mutator thread in a transactional-like manner) I suggest you set up a semaphoring mechanism. Even better, it would be simpler if you use immutable objects.
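One way to follow the immutable-object advice is sketched below, under the assumption that the stored value can be replaced rather than mutated; Profile and MapVisibilityDemo are hypothetical names. Every change is republished through the map, so a reader either sees the old object or the new one, never a half-updated one.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical immutable value object.
final class Profile {
    final String name;
    final int level;

    Profile(String name, int level) {
        this.name = name;
        this.level = level;
    }

    // "Changing" a profile produces a new object instead of mutating this one.
    Profile withLevel(int newLevel) {
        return new Profile(name, newLevel);
    }
}

final class MapVisibilityDemo {
    static int run() {
        ConcurrentMap<String, Profile> map = new ConcurrentHashMap<>();
        Thread writer = new Thread(() -> {
            // Case A: everything written before put() is visible after get().
            map.put("alice", new Profile("alice", 1));
            // Case B done safely: replace the stored object rather than mutate it.
            map.computeIfPresent("alice", (key, old) -> old.withLevel(2));
        });
        writer.start();

        // The reader polls the map; each Profile it observes is fully constructed.
        Profile seen;
        do {
            Thread.yield();
            seen = map.get("alice");
        } while (seen == null || seen.level < 2);
        return seen.level;
    }
}
```

Mutating a Profile-like object in place after the put() would bring back case B from the question, since the map only orders what happened before the put().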

Does a volatile variable make sense here (multi-core processor)?

I declared an instance variable as volatile. Say two threads are created by two processors on a multi-core machine, and one thread updates the variable. To ensure instantaneous visibility, I believe declaring the variable as volatile is the right choice here, so that an update done by one thread happens in main memory and is visible to the other thread.
Right?
The intention here is to understand the concept in terms of a multi-core processor.
I am assuming you are considering using volatile vs. not using any special provisions for concurrency (such as synchronized or AtomicReference).
It is irrelevant whether you are running single-core or multicore: sharing data between threads is never safe without volatile (or some other synchronization mechanism). There are many more things the runtime is allowed to do without it; basically it can pretend the accessing thread is the only thread running on the JVM. The thread can read the value once and store it on the call stack forever; a loop reading the value, but never writing it, may be transformed such that the value is read only once at the outset and never reconsidered, and so on.
So the message is simple: use volatile—but that's not necessarily all you need to take care of in concurrent code.
It doesn't matter if it's done by different processors or not. When you don't have multiple processors, you can still run into concurrency problems because context switches may happen at any time.
If a field is not volatile, it may still be in one thread's cache while its context is switched out and the other thread's context switches in. In that case, the thread that just took over the (single) processor will not see that the field has changed.
Since these things can happen even with one processor, they are bound to happen with more than one processor, so indeed, you need to protect your shared data.
Whether volatile is the right choice or not depends on what type it is and what kind of change you are trying to protect from. But again, that has nothing to do with the number of processors.
If the field is a reference type, then volatile only ensures the visibility of new assignments to the field. It doesn't protect against changes in the object it points to - for that you need to synchronize.
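A minimal sketch of the typical use of volatile discussed in these answers, a stop flag read in a loop, using the hypothetical class name StopFlagDemo. Without volatile the JIT would be allowed to hoist the read out of the loop, and the worker might never stop, regardless of the number of cores.

```java
// Hypothetical class for illustration only.
final class StopFlagDemo {
    // volatile: every loop iteration must re-read the field, and the write by
    // the controlling thread becomes visible to the worker.
    private volatile boolean stopRequested;

    // Returns true if the worker noticed the flag and terminated.
    static boolean workerStops() {
        StopFlagDemo demo = new StopFlagDemo();
        Thread worker = new Thread(() -> {
            while (!demo.stopRequested) {
                Thread.yield();
            }
        });
        worker.start();
        try {
            Thread.sleep(20);          // let the worker spin for a moment
            demo.stopRequested = true; // volatile write
            worker.join(5000);         // wait up to 5 seconds for termination
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }
}
```

The flag is a primitive, so the assignment is the whole update. For a volatile reference to a mutable object, only the assignment of the reference gets this guarantee, as noted above.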
