Immutable Objects and Initialization Safety with Thread Safety - java

A note from the The Java Concurrency In Practice
Immutable objects can be used safely by any thread without additional
synchronization, even when synchronization is not used to publish them
What I get is that they are thread - safe.
A snippet from Jeremy Manson blog-
class String {
// Don't do this, either.
static String lastConstructed;
private final byte[] bytes;
public String(byte[] value) {
bytes = new byte[value.length];
System.arraycopy(value, 0, bytes, 0, value.length);
lastConstructed = this;
}
}
Since the this reference is being stored in lastConstructed" hence Escaping the constructor
Ofcourse, it would work if you made lastConstructed volatile (postJDK5+ semantics)
One of the questions asked there is-
If lastConstructed was volatile but then the reference was unsafely
published to another thread, then the String would not be immutable.
Right?
to which Jeremy's reply was-
It wouldn't be thread-safe because it was immutable, but it would be thread-safe because lastConstructed was volatile.
I perfectly understand that it would be thread-safe because lastConstructed was volatile, but I don't get It wouldn't be thread-safe because it was immutable.
Why? The note from Concurrency In Practice says Immutable objects can be used safely by any thread (i.e Thread Safety guarantee). If something is immutable, then it is thread safe.
Please suggest.

Though #Peter Lawrey has explained the details and problems of designing thread safe and immutable classes, and based on the further discussion, I think the question hasn't received the direct answer. So, I'd like to elaborate a bit:
The main problem in the understanding the phrase "Immutable objects can be used safely by any thread" is that it's not complete per se. It relies on the premise that any object must also be safely published to be thread-safe. Immutable objects is not an exception. So, the full phrase should be "Immutable and safely published objects can be used safely by any thread."
The problem in the String example is that it allows the reference to the object to escape from the constructor, thus presenting to other threads potentially invalid object state. For instance, if the compiler decides to optimize the constructor and rearrange operations for performance reasons this way:
public String(byte[] value) {
bytes = new byte[value.length];
lastConstructed = this;
System.arraycopy(value, 0, bytes, 0, value.length);
}
some other thread that reads lastConstructed will be able to see the string that is not fully constructed. So, the class is not thread safe, even though its instances are immutable.
Here we come to the meaning of the phrase "It wouldn't be thread-safe because it was immutable". It means that immutability alone doesn't guarantee thread safety and the example proves it.
Making lastConstructed volatile would force the compiler to emit a memory barrier that would prevent the optimizer from rearranging operations in the way described above. It would guarantee that the array copying always happens before the assigning lastConstructed = this. As the result, another thread, that reads lastConstructed, would never see an underconstructed string. It would also guarantee that other threads would always read the actual value of lastConstructed. Thats why "it would be thread-safe because lastConstructed was volatile" in this particular case.

A common mis-understanding is that you have Object fields in Java. You only have references and primitives. This means that
static String lastConstructed;
The field lastConstructed is a mutable reference. It's visibility is not thread safe. Having an immutable object doesn't confer any properties on a reference to that object.
Similarly if you have a final field, this doesn't make your Object immutable.
final Date today = new Date();
today is not immutable just because you made one reference to it final.
A more subtle on is in the use of volatile You have to be careful as to whether you are reading or write the volatile value. Even if you do
volatile Date now = new Date();
now.setTime(System.currentTimeMillis()); // no thread safe.
There is two reasons this is not thread safe. 1) The access to now is a read not a write and secondly it occurs before the write in any case. What is required is a write barrier after. Something you see what appears to be nonsense.
now = now; // this adds a write barrier.
A related myth is that if you use a thread safe collection, any series of operations you perform is also thread safe. It's a bit like fairy dust, you just sprinkle it around and many of your bugs disappear, but this doesn't mean you really are thread safe.
In short, thread safety is like a chain of dependencies, if any aspect of how data is accessed is not thread safe, none of it is.
Note: adding some thread safe constructs can hide thread safety issues, but it doesn't fix them, it just means that some change in JVM, or OS or hardware will break your code in the future.

Related

Do i need a lock at all if only 1 thread updates the value? [duplicate]

public class Test{
private MyObj myobj = new MyObj(); //it is not volatile
public class Updater extends Thred{
myobje = getNewObjFromDb() ; //not am setting new object
}
public MyObj getData(){
//getting stale date is fine for
return myobj;
}
}
Updated regularly updates myobj
Other classes fetch data using getData
IS this code thread safe without using volatile keyword?
I think yes. Can someone confirm?
No, this is not thread safe. (What makes you think it is?)
If you are updating a variable in one thread and reading it from another, you must establish a happens-before relationship between the write and the subsequent read.
In short, this basically means making both the read and write synchronized (on the same monitor), or making the reference volatile.
Without that, there are no guarantees that the reading thread will see the update - and it wouldn't even be as simple as "well, it would either see the old value or the new value". Your reader threads could see some very odd behaviour with the data corruption that would ensue. Look at how lack of synchronization can cause infinite loops, for example (the comments to that article, especially Brian Goetz', are well worth reading):
The moral of the story: whenever mutable data is shared across threads, if you don’t use synchronization properly (which means using a common lock to guard every access to the shared variables, read or write), your program is broken, and broken in ways you probably can’t even enumerate.
No, it isn't.
Without volatile, calling getData() from a different thread may return a stale cached value.
volatile forces assignments from one thread to be visible on all other threads immediately.
Note that if the object itself is not immutable, you are likely to have other problems.
You may get a stale reference. You may not get an invalid reference.
The reference you get is the value of the reference to an object that the variable points to or pointed to or will point to.
Note that there are no guarantees how much stale the reference may be, but it's still a reference to some object and that object still exists. In other words, writing a reference is atomic (nothing can happen during the write) but not synchronized (it is subject to instruction reordering, thread-local cache et al.).
If you declare the reference as volatile, you create a synchronization point around the variable. Simply speaking, that means that all cache of the accessing thread is flushed (writes are written and reads are forgotten).
The only types that don't get atomic reads/writes are long and double because they are larger than 32-bits on 32-bit machines.
If MyObj is immutable (all fields are final), you don't need volatile.
The big problem with this sort of code is the lazy initialization. Without volatile or synchronized keywords, you could assign a new value to myobj that had not been fully initialized. The Java memory model allows for part of an object construction to be executed after the object constructor has returned. This re-ordering of memory operations is why the memory-barrier is so critical in multi-threaded situations.
Without a memory-barrier limitation, there is no happens-before guarantee so you do not know if the MyObj has been fully constructed. This means that another thread could be using a partially initialized object with unexpected results.
Here are some more details around constructor synchronization:
Constructor synchronization in Java
Volatile would work for boolean variables but not for references. Myobj seems to perform like a cached object it could work with an AtomicReference. Since your code extracts the value from the DB I'll let the code stay as is and add the AtomicReference to it.
import java.util.concurrent.atomic.AtomicReference;
public class AtomicReferenceTest {
private AtomicReference<MyObj> myobj = new AtomicReference<MyObj>();
public class Updater extends Thread {
public void run() {
MyObj newMyobj = getNewObjFromDb();
updateMyObj(newMyobj);
}
public void updateMyObj(MyObj newMyobj) {
myobj.compareAndSet(myobj.get(), newMyobj);
}
}
public MyObj getData() {
return myobj.get();
}
}
class MyObj {
}

Explain how JIT reordering works

I have been reading a lot about synchronization in Java and all the problems that can occur. However, what I'm still slightly confused about is how the JIT can reorder a write.
For instance, a simple double check lock makes sense to me:
class Foo {
private volatile Helper helper = null; // 1
public Helper getHelper() { // 2
if (helper == null) { // 3
synchronized(this) { // 4
if (helper == null) // 5
helper = new Helper(); // 6
}
}
return helper;
}
}
We use volatile on line 1 to enforce a happens-before relationship. Without it, is entirely possible for the JIT to reoder our code. For example:
Thread 1 is at line 6 and memory is allocated to helper however, the constructor has not yet run because the JIT could reordering our code.
Thread 2 comes in at line 2 and gets an object that is not fully created yet.
I understand this, but I don't fully understand the limitations that the JIT has on reordering.
For instance, say I have a method that creates and puts a MyObject into a HashMap<String, MyObject> (I know that a HashMapis not thread safe and should not be used in a multi-thread environment, but bear with me). Thread 1 calls createNewObject:
public class MyObject {
private Double value = null;
public MyObject(Double value) {
this.value = value;
}
}
Map<String, MyObject> map = new HashMap<String, MyObject>();
public void createNewObject(String key, Double val){
map.put(key, new MyObject( val ));
}
At the same time thread 2 calls a get from the Map.
public MyObject getObject(String key){
return map.get(key);
}
Is it possible for thread 2 to receive an object from getObject(String key) that is not fully constructed? Something like:
Thread 1: Allocate memory for new MyObject( val )
Thread 1: Place object in map
Thread 2: call getObject(String key)
Thread 1: Finish constructing the new MyObject.
Or will map.put(key, new MyObject( val )) not put an object into the map until it's fully constructed?
I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?
In a nutshell can it only reorder when creating a new Object and assigning it to a reference variable, like the double checked lock? A complete rundown on the JIT might be much for a SO answer, but what I'm really curious about is how it can reorder a write (like line 6 on the double checked lock) and what stops it from putting an object into a Map that is not fully constructed.
WARNING: WALL OF TEXT
The answer to your question is before the horizontal line. I will continue to explain deeper the fundamental problem in the second portion of my answer (which is not related to the JIT, so that's it if you are only interested in the JIT). The answer to the second part of your question lies at the bottom because it relies on what I describe further.
TL;DR The JIT will do whatever it wants, the JMM will do whatever it wants, being valid under the condition that you let them by writing thread unsafe code.
NOTE: "initialization" refers to what happens in the constructor, which excludes anything else such as calling a static init method after constructing etc...
"If the reordering produces results consistent with a legal execution, it is not illegal." (JLS 17.4.5-200)
If the result of a set of actions conforms to a valid chain of execution as per the JMM, then the result is allowed regardless of whether the author intended the code to produce that result or not.
"The memory model describes possible behaviors of a program. An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.
This provides a great deal of freedom for the implementor to perform a myriad of code transformations, including the reordering of actions and removal of unnecessary synchronization" (JLS 17.4).
The JIT will reorder whatever it sees fit unless we do not allow it using the JMM (in a multithreaded environment).
The details of what the JIT can or will do is nondeterministic. Looking at millions of samples of runs will not produce a meaningful pattern because reorderings are subjective, they depend on very specific details such as CPU arch, timings, heuristics, graph size, JVM vendor, bytecode size, etc... We only know that the JIT will assume that the code runs in a single threaded environment when it does not need to conform to the JMM. In the end, the JIT matters very little to your multithreaded code. If you want to dig deeper, see this SO answer and do a little research on such topics as IR Graphs, the JDK HotSpot source, and compiler articles such as this one. But again, remember that the JIT has very little to do with your multithreaded code transforms.
In practice, the "object that is not fully created yet" is not a side effect of the JIT but rather the memory model (JMM). In summary, the JMM is a specification that puts forth guarantees of what can and cannot be results of a certain set of actions, where actions are operations that involve a shared state. The JMM is more easily understood by higher level concepts such as atomicity, memory visibility, and ordering, those three of which are components of a thread-safe program.
To demonstrate this, it is highly unlikely for your first sample of code (the DCL pattern) to be modified by the JIT that would produce "an object that is not fully created yet." In fact, I believe that it is not possible to do this because it would not follow the order or execution of a single-threaded program.
So what exactly is the problem here?
The problem is that if the actions aren't ordered by a synchronization order, a happens-before order, etc... (described again by JLS 17.4-17.5) then threads are not guaranteed to see the side effects of performing such actions. Threads might not flush their caches to update the field, threads might observe the write out of order. Specific to this example, threads are allowed to see the object in an inconsistent state because it is not properly published. I'm sure that you have heard of safe publishing before if you have ever worked even the tiniest bit with multithreading.
You might ask, well if single-threaded execution cannot be modified by the JIT, why can the multithreaded version be?
Put simply, it's because the thread is allowed to think ("perceive" as usually written in textbooks) that the initialization is out of order due to the lack of proper synchronization.
"If Helper is an immutable object, such that all of the fields of Helper are final, then double-checked locking will work without having to use volatile fields. The idea is that a reference to an immutable object (such as a String or an Integer) should behave in much the same way as an int or float; reading and writing references to immutable objects are atomic" (The "Double-Checked Locking is Broken" Declaration).
Making the object immutable ensures that the state is fully initialized when the constructor exits.
Remember that object construction is always unsynchronized. An object that is being initialized is ONLY visible and safe with respect to the thread that constructed it. In order for other threads to see the initialization, you must publish it safely. Here are those ways:
"There are a few trivial ways to achieve safe publication:
Exchange the reference through a properly locked field (JLS 17.4.5)
Use static initializer to do the initializing stores (JLS 12.4)
Exchange the reference via a volatile field (JLS 17.4.5), or as the consequence of this rule, via the AtomicX classes
Initialize the value into a final field (JLS 17.5)."
(Safe Publication and Safe Initialization in Java)
Safe publication ensures that other threads will be able to see the fully initialized objects when after it finishes.
Revisiting our idea that threads are only guaranteed to see side effects if they are in order, the reason that you need volatile is so that your write to the helper in thread 1 is ordered with respect to the read in thread 2. Thread 2 is not allowed to perceive the initialization after the read because it occurs before the write to helper. It piggy backs on the volatile write such that the read must happen after the initialization AND THEN the write to the volatile field (transitive property).
To conclude, an initialization will only occur after the object is created only because another thread THINKS that is the order. An initialization will never occur after construction due to a JIT optimisation. You can fix this by ensuring proper publication through a volatile field or by making your helper immutable.
Now that I've described the general concepts behind how publication works in the JMM, hopefully understanding how your second example won't work will be easy.
I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?
To the constructing thread, it will put it into the map after initialization.
To the reader thread, it can see whatever the hell it wants. (improperly constructed object in HashMap? That is definitely within the realm of possibility).
What you described with your 4 steps is completely legal. There is no order between assigning value or adding it to the map, thus thread 2 can perceive the initialization out of order since MyObject was published unsafely.
You can actually fix this problem by just converting to ConcurrentHashMap and getObject() will be completely thread safe as once you put the object in the map, the initialization will occur before the put and both will need to occur before the get as a result of ConcurrentHashMap being thread safe. However, once you modify the object, it will become a management nightmare because you need to ensure that updating the state is visible and atomic - what if a thread retrieves an object and another thread updates the object before the first thread could finish modifying and putting it back in the map?
T1 -> get() MyObject=30 ------> +1 --------------> put(MyObject=31)
T2 -------> get() MyObject=30 -------> +1 -------> put(MyObject=31)
Alternatively you could also make MyObject immutable, but you still need to map the map ConcurrentHashMap in order for other threads to see the put - thread caching behavior might cache an old copy and not flush and keep reusing the old version. ConcurrentHashMap ensures that its writes are visible to readers and ensures thread-safety. Recalling our 3 prerequisites for thread-safety, we get visibility from using a thread-safe data structure, atomicity by using an immutable object, and finally ordering by piggybacking on ConcurrentHashMap's thread safety.
To wrap up this entire answer, I will say that multithreading is a very difficult profession to master, one that I myself most definitely have not. By understanding concepts of what makes a program thread-safe and thinking about what the JMM allows and guarantees, you can ensure that your code will do what you want it to do. Bugs in multithreaded code occur often as a result of the JMM allowing a counterintuitive result that is within its parameters, not the JIT doing performance optimisations. Hopefully you will have learned something a little bit more about multithreading if you read everything. Thread safety should be achieved by building a repertoire of thread-safe paradigms rather than using little inconveniences of the spec (Lea or Bloch, not even sure who said this).

Are Mutable Atomic References a Bad Idea?

I have a data structure that I occasionally wish to modify, and occasionally wish to replace outright. At the moment, I'm storing this in an AtomicReference, and using synchrnonized blocks (synchronized on the AtomicReference itself, not its stored value) when I need to modify it, rather than replace it.
So something like:
public void foo(AtomicReference reference){
synchronized(reference){
reference.get()
.performSomeModification();
}
}
Notice that the modifying call is a member of the wrapped value, not the atomic reference, and is not guaranteed to have any thread safety of its own.
Is this safe? Findbugs (a freeware code reviewing tool) had this to say about it, so now I'm worried there's something happening under the hood, where it may be prematurely releasing the lock or something. I've also seen documentation referencing AtomicReference as specifically for immutable things.
Is this safe? If it isn't I could create my own Reference-storing class that I would be more certain about the behavior of, but I don't want to jump to conclusions.
From the linked documentation:
For example, synchronizing on an AtomicBoolean will not prevent other threads from modifying the AtomicBoolean.
It can't prevent other threads from modifying the AtomicBoolean because it can't force other threads to synchronize on the AtomicBoolean.
If I understand your question correctly, your intention is to synchronize calls to performSomeModification(). The code you've written will achieve that, if and only if every call to performSomeModification() is synchronized on the same object. As in the example from the docs, the basic problem is the enforceability of that requirement. You can't force other callers to synchronize on the AtomicReference. You or some other developer who comes after you could easily call performSomeModification() without external synchronization.
You should make it hard to use your API incorrectly. Since AtomicReference is a generic type (AtomicReference<V>), you can enforce the synchronization in a variety of ways, depending on what V is:
If V is an interface, you could easily wrap the instance in a synchronized wrapper.
If V is a class that you can modify, you could synchronize performSomeModification(), or create a subclass in which it is synchronized. (Possibly an anonymous subclass produced by a factory method.)
If V is a class that you cannot modify, it may be difficult to wrap. In that case, you could encapsulate the AtomicReference in a class that you do control, and have that class perform the required synchronization.
Are Mutable Atomic References a Bad Idea?
Definitely not! AtomicReference is designed to provide thread-safe, atomic updates of the underlying reference. In fact, the Javadoc description of AtomicReference is:
An object reference that may be updated atomically.
So they most definitely are designed to be mutated!
Is this safe?
It depends on what you mean by "safe", and what the rest of your code is doing. There's nothing inherently unsafe about your snippet of code in isolation. It's perfectly valid, though perhaps a bit unusual, to synchronize on an AtomicReference. As a developer unfamiliar with this code, I would see the synchronization on reference and assume that it means that the underlying object may be replaced at any time, and you want to make sure your code is always operating on the "newest" reference.
The standard best practices for synchronization apply, and violating them could result in unsafe behavior. For example, since you say performSomeModification() is not thread-safe, it would be unsafe if you accessed the underlying object somewhere else without synchronizing on reference.
public void bar(AtomicReference reference) {
// no synchronization: performSomeModification could be called on the object
// at the same time another thread is executing foo()
reference.get().performSomeModification();
}
If could also be "unsafe" if your application requires that only one instance of the underlying object be operated on at any one time, and you haven't synchronized on the reference when .set()ing it:
public void makeNewFoo(AtomicReference reference) {
// no synchronication on "reference", so it may be updated by another thread
// while foo() is executing performSomeModification() on the "old" reference
SomeObject foo = new SomeObject();
reference.set(foo);
}
If you need to synchronize on the AtomicReference, do so, it's perfectly safe. But I would highly recommend adding a few code comments about why you're doing it.

Immutable objects are thread safe, but why?

Lets say for example, a thread is creating and populating the reference variable of an immutable class by creating its object and another thread kicks in before the first one completes and creates another object of the immutable class, won't the immutable class usage be thread unsafe?
Creating an immutable object also means that all fields has to be marked as final.
it may be necessary to ensure correct behavior if a reference to
a newly created instance is passed from one thread to another without
synchronization
Are they trying to say that the other thread may re-point the reference variable to some other object of the immutable class and that way the threads will be pointing to different objects leaving the state inconsistent?
Actually immutable objects are always thread-safe, but its references may not be.
Confused?? you shouldn't be:-
Going back to basic:
Thread-safe simply means that two or more threads must work in coordination on the shared resource or object. They shouldn't over-ride the changes done by any other thread.
Now String is an immutable class, whenever a thread tries to change it, it simply end up creating a new object. So simply even the same thread can't make any changes to the original object & talking about the other thread would be like going to Sun but the catch here is that generally we use the same old reference to point that newly created object.
When we do code, we evaluate any change in object with the reference only.
Statement 1:
String str = "123"; // initially string shared to two threads
Statement 2:
str = str+"FirstThread"; // to be executed by thread one
Statement 3:
str=str+"SecondThread"; // to be executed by thread two
Now since there is no synchronize, volatile or final keywords to tell compiler to skip using its intelligence for optimization (any reordering or caching things), this code can be run in following manner.
Load Statement2, so str = "123"+"FirstThread"
Load Statement3, so str = "123"+"SecondThread"
Store Statement3, so str = "123SecondThread"
Store Statement2, so str = "123FirstThread"
and finally the value in reference str="123FirstThread" and for sometime if we assume that luckily our GC thread is sleeping, that our immutable objects still exist untouched in our string pool.
So, Immutable objects are always thread-safe, but their references may not be. To make their references thread-safe, we may need to access them from synchronized blocks/methods.
In addition to other answers posted already, immutable objects once created, they cannot be modified further. Hence they are essentially read-only.
And as we all know, read-only things are always thread-safe. Even in databases, multiple queries can read same rows simultaneously, but if you want to modify something, you need exclusive lock for that.
Immutable objects are thread safe, but why?
An immutable object is an object that is no longer modified once it has been constructed. If in addition, the immutable object is only made accessible to other thread after it has been constructed, and this is done using proper synchronization, all threads will see the same valid state of the object.
If one thread is creating populating the reference variable of the immutable class by creating its object and at the second time the other thread kicks in before the first thread completes and creates another object of the immutable class, won't the immutable class usage be thread unsafe?
No. What makes you think so? An object's thread safety is completely unaffected by what you do to other objects of the same class.
Are they trying to say that the other thread may re-point the reference variable to some other object of the immutable class and that way the threads will be pointing to different objects leaving the state inconsistent?
They are trying to say that whenever you pass something from one thread to another, even if it is just a reference to an immutable object, you need to synchronize the threads. (For instance, if you pass the reference from one thread to another by storing it in an object or a static field, that object or field is accessed by several threads, and must be thread-safe)
Thread safety is data sharing safety, And because in your code you make decisions based on the data your objects hold, the integrity and deterministic behaviour of it is vital. i.e
Imagine we have a shared boolean instance variable across two threads that are about to execute a method with the following logic
If flag is false, then I print "false" and then I set the flag back to true.
If flag is true, then I print "true" and then I set the flag back to false.
If you run continuously in a single thread loop, you will have a deterministic output which will look like:
false - true - false - true - false - true - false ...
But, if you ran the same code with two threads, then, the output of your output is not deterministic anymore, the reason is that the thread A can wake up, read the flag, see that is false, but before it can do anything, thread B wakes up and reads the flag, which is also false!! So both will print false... And this is only one problematic scenario I can think of... As you can see, this is bad.
If you take out the updates of the equation the problem is gone, just because you are eliminating all the risks associated with data sync. that's why we say that immutable objects are thread safe.
It is important to note though, that immutable objects are not always the solution, you may have a case of data that you need to share among different threads, in this cases there are many techniques that go beyond the plain synchronization and that can make a whole lot of difference in the performance of your application, but this is a complete different subject.
Immutable objects are important to guarantee that the areas of the application that we are sure that don't need to be updated, are not updated, so we know for sure that we are not going to have multithreading issues
You probably might be interested in taking a look at a couple of books:
This is the most popular: http://www.amazon.co.uk/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601/ref=sr_1_1?ie=UTF8&qid=1329352696&sr=8-1
But I personally prefer this one: http://www.amazon.co.uk/Concurrency-State-Models-Java-Programs/dp/0470093552/ref=sr_1_3?ie=UTF8&qid=1329352696&sr=8-3
Be aware that multithreading is probably the trickiest aspect of any application!
Immutability doesn't imply thread safety.In the sense, the reference to an immutable object can be altered, even after it is created.
//No setters provided
class ImmutableValue
{
private final int value = 0;
public ImmutableValue(int value)
{
this.value = value;
}
public int getValue()
{
return value;
}
}
public class ImmutableValueUser{
private ImmutableValue currentValue = null;//currentValue reference can be changed even after the referred underlying ImmutableValue object has been constructed.
public ImmutableValue getValue(){
return currentValue;
}
public void setValue(ImmutableValue newValue){
this.currentValue = newValue;
}
}
Two threads will not be creating the same object, so no problem there.
With regards to 'it may be necessary to ensure...', what they are saying is that if you DON'T make all fields final, you will have to ensure correct behavior yourself.

visibility of immutable object after publication

I have an immutable object, which is capsulated in class and is global state.
Lets say i have 2 threads that get this state, execute myMethod(state) with it. And lets say thread1 finish first. It modify the global state calling GlobalStateCache.updateState(state, newArgs);
GlobalStateCache {
MyImmutableState state = MyImmutableState.newInstance(null, null);
public void updateState(State currentState, Args newArgs){
state = MyImmutableState.newInstance(currentState, newArgs);
}
}
So thread1 will update the cached state, then thread2 do the same, and it will override the state (not take in mind the state updated from thread1)
I searched google, java specifications and read java concurrency in practice but this is clearly not specified.
My main question is will the immutable state object value be visible to a thread which already had read the immutable state. I think it will not see the changed state, only reads after the update will see it.
So i can not understand when to use immutable objects? Is this depends on if i am ok with concurrent modifications during i work with the latest state i have saw and not need to update the state?
Publication seems to be somewhat tricky concept, and the way it's explained in java concurrency in practice didn't work well to me (as opposed to many other multithreading concepts explained in this wonderful book).
With above in mind, let's first get clear on some simpler parts of your question.
when you state lets say thread1 finish first - how would you know that? or, to be more precise, how would thread2 "know" that? as far as I can tell this could be only possible with some sort of synchronization, explicit or not-so-explicit like in thread join (see the JLS - 17.4.5 Happens-before Order). Code you provided so far does not give sufficient details to tell whether this is the case or not
when you state that thread1 will update the cached state - how would thread2 "know" that? with the piece of code you provided, it looks entirely possible (but not guaranteed mind you) for thread2 to never know about this update
when you state thread2... will override the state what does override mean here? There's nothing in GlobalStateCache code example that could somehow guarantee that thread1 will ever notice this override. Even more, the code provided suggests nothing that would somehow impose happen-before relation of updates from different threads so one can even speculate that override may happen the other way around, you see?
the last but not the least, the wording the immutable state sounds rather fuzzy to me. I would say dangerously fuzzy given this tricky subject. The field state is mutable, it can be changed, well, by invoking method updateState right? From your code I would rather conclude that instances of MyImmutableState class are assumed to be immutable - at least that's what name tells me.
With all above said, what is guaranteed to be visible with the code you provided so far? Not much I'm afraid... but maybe better than nothing at all. The way I see it is...
For thread1, it is guaranteed that prior to invoking updateState it will see either null or properly constructed (valid) object updated from thread2. After the update, it is guaranteed to see either of properly constructed (valid) objects updated from thread1 or thread2. Note after this update thread1 is guaranteed not to see null per the very JLS 17.4.5 I refer to above ("...x and y are actions of the same thread and x comes before y in program order...")
For thread2, guarantees are pretty similar to above.
Essentially, all that is guaranteed with the code you provided is that both threads will see either null or one of properly constructed (valid) instances of MyImmutableState class.
Above guarantees may look insignificant at the first glance, but if you skim one page above the one with quote that confused you ("Immutable objects can be used safely etc..."), you'll find an example worth deeper drilling into in 3.5.1. Improper Publication: When Good Objects Go Bad.
Yeah object being immutable alone won't guarantee its visibility but it at least will guarantee that the object won't "explode from inside", like in example provided in 3.5.1:
public class Holder {
private int n;
public Holder(int n) { this.n = n; }
public void assertSanity() {
if (n != n)
throw new AssertionError("This statement is false.");
}
}
Goetz comments for above code begin at explaining issues true for both mutable and immutable objects, ...we say the Holder was not properly published. Two things can go wrong with improperly published objects. Other threads could see a stale value for the holder field, and thus see a null reference or other older value even though a value has been placed in holder...
...then he dives into what can happen if object is mutable, ...But far worse, other threads could see an up-todate value for the holder reference, but stale values for the state of the Holder. To make things even less predictable, a thread may see a stale value the first time it reads a field and then a more up-to-date value the next time, which is why assertSanity can throw AssertionError.
Above "AssertionHorror" may sound counter-intuitive but all the magic goes away if you consider scenario like below (completely legal per Java 5 memory model - and for a good reason btw):
thread1 invokes sharedHolderReference = Holder(42);
thread1 first fills n field with default value (0) then is going to assign it within constructor but...
...but scheduler switches to thread2,
sharedHolderReference from thread1 becomes visible to thread2 because, say because why not? maybe optimizing hot-spot compiler decided it's a good time for that
thread2 reads the up-todate sharedHolderReference with field value still being 0 btw
thread2 invokes sharedHolderReference.assertSanity()
thread2 reads the left side value of if statement within assertSanity which is, well, 0 then it is going to read the right side value but...
...but scheduler switches back to thread1,
thread1 completes the constructor assignment suspended at step #2 above by setting n field value 42
value 42 in the field n from thread1 becomes visible to thread2 because, say because why not? maybe optimizing hot-spot compiler decided it's a good time for that
then, at some moment later, scheduler switches back to thread2
thread2 proceeds from where it was suspended at step #6 above, ie it reads right-hand side of if statement, which is, well, 42 now
oops our innocent if (n != n) suddenly turns into if (0 != 42) which...
...naturally throws AssertionError
As far as I understand, initialization safety for immutable objects just guarantees that above won't happen - no more... and no less
I think the key is to distinguish between objects and references.
The immutable objects are safe to publish, so any thread can publish object, and if any other thread reads a reference to such object - it can safely use the object. Of course, reader thread will see the immutable object state that was published at the moment the thread read the reference, it will not see any updates, until it reads the reference again.
It is very useful in many situations. E.g. if there is a single publisher, and many readers - and readers need to see a consistent state. The readers periodically read the reference, and work on the obtained state - it is guaranteed to be consistent, and it does not require any locking on reader thread. Also when it is OK to loose some updates, e.g. you don't care which thread updates the state.
If I understand your question, immutability doesn't seem to be relevant here. You're just asking whether threads will see updates to shared objects.
[Edit] after an exchange of comments, I now see that you need also to hold a reference to your shared singleton state while doing some actions, and then setting the state to reflect that action.
The good news, as before, is that providing this will of necessity also solve your memory consistency issue.
Instead of defining separate synchronized getState and updateState methods, you'll have to perform all three actions without being interrupted: getState, yourAction, and updateState.
I can see three ways to do it:
1) Do all three steps inside a single synchronized method in GlobalStateCache. Define an atomic doActionAndUpdateState method in GlobalStateCache, synchronized of course on your state singleton, which would take a functor object to do your action.
2) Do getState and updateState as separate calls, and change updateState so that it checks to be sure state hasn't changed since the get. Define getState and checkAndUpdateState in GlobalStateCache. checkAndUpdateState will take the original state caller got from getState, and must be able to check if state has changed since your get. If it has changed, you'll need to do something to let caller know they potentially need to revert their action (depends on your use case).
3) Define a getStateWithLock method in GlobalStateCache. This implies that you'll also need to assure callers release their lock. I'd create an explicit releaseStateLock method, and have your updateState method call it.
Of these, I advise against #3, because it leaves you vulnerable to leaving that state locked in the event of some kinds of bugs. I'd also advise (though less strongly) against #2, because of the complexity it creates with what happens in the event that the state has changed: do you just abandon the action? Do you retry it? Must it be (can it be) reverted? I'm for #1: a single synchronized atomic method, which will look something like this:
public interface DimitarActionFunctor {
public void performAction();
}
GlobalStateCache {
private MyImmutableState state = MyImmutableState.newInstance(null, null);
public MyImmutableState getState {
synchronized(state) {
return state;
}
}
public void doActionAndUpdateState(DimitarActionFunctor functor, State currentState, Args newArgs){
synchronized(state) {
functor.performAction();
state = MyImmutableState.newInstance(currentState, newArgs);
}
}
}
}
Caller then constructs a functor for the action (an instance of DimitarActionFunctor), and calls doActionAndUpdateState. Of course, if the actions need data, you'll have to define your functor interface to take that data as arguments.
Again, I point you to this question, not for the actual difference, but for how they both work in terms of memory consistency: Difference between volatile and synchronized in Java
So much depends on the actual use case here that it's hard to make a recommendation, but it looks like you want some sort of Compare-And-Set semantics for the GlobalStateCache, using a java.util.concurrent.atomic.AtomicReference.
public class GlobalStateCache {
AtomicReference<MyImmutableState> atomic = new AtomicReference<MyImmutableState>(MyImmutableState.newInstance(null, null);
public State getState()
{
return atomic.get();
}
public void updateState( State currentState, Args newArgs )
{
State s = currentState;
while ( !atomic.compareAndSet( s, MyImmutableState.newInstance( s, newArgs ) ) )
{
s = atomic.get();
}
}
}
This, of course, depends on the expense of potentially creating a few extra MyImmutableState objects, and whether you need to re-run myMethod(state) if the state has been updated underneath, but the concept should be correct.
Answering you "main" question: no Thread2 will not see the change. Immutable objects do not change :-)
So if Thread1 read state A and then Thread2 stores state B, Thread1 should read the variable again to see the changes.
Visibily of variables is affected by volatile keyword. If variable is declared as volatile then Java guarantees that if one thread updates the variable all other threads will see the change immediately (at the cost of speed).
Still immutable objects are very useful in multithreaded environments. I will give you an example how I used it once. Lets say you have an object that is periodically changed (life field in my case) by one thread and it is somehow processed by other threads (my program was sending it to clients over the network). These threads fail if the object is changed in the middle of processing (they send inconsistent life field state). If you make this object immutable and will create a new instance every time it changes, you don't have to write any synchronization at all. Updating thread will periodically publish new versions of an object and every time other threads read it they will have most recent version of it and can safely process it. This particular example saves time spent on synchronization but wastes more memory. This should give you a general understanding when you can use them.
Another link I found: http://download.oracle.com/javase/tutorial/essential/concurrency/immutable.html
Edit (answer comment):
I will explain my task. I had to write a network server that will send clients the most recent life field and will constantly update it. With design mentioned above, I have two types of threads:
Thread that updates object that represents life field. It is immutable, so actually it creates a new instance every time and only changes reference. The fact that reference is declared volatile is crucial.
Every client is served with its own thread. When client requests life field, it reads the reference once and starts sending. Because network connection can be slow, life field can be updated many times while this thread sends data. The fact that object is immutable guarantees that server will send consistent state of life field. In the comments you are concerned about changes made while this thread processes the object. Yes, when client receives data it may not be up to date but there is nothing you can do about it. This is not synchronization issue but rather a slow connection issue.
I am not stating that immutable objects can solve all of your concurrency problems. This is obviously not true and you point this out. I am trying to explain you where it actually can solve problems. I hope my example is clear now.

Categories

Resources