Are Mutable Atomic References a Bad Idea?

Are Mutable Atomic References a Bad Idea? - java

I have a data structure that I occasionally wish to modify, and occasionally wish to replace outright. At the moment, I'm storing this in an AtomicReference, and using synchrnonized blocks (synchronized on the AtomicReference itself, not its stored value) when I need to modify it, rather than replace it.
So something like:
public void foo(AtomicReference reference){
synchronized(reference){
reference.get()
.performSomeModification();
}
}
Notice that the modifying call is a member of the wrapped value, not the atomic reference, and is not guaranteed to have any thread safety of its own.
Is this safe? Findbugs (a freeware code reviewing tool) had this to say about it, so now I'm worried there's something happening under the hood, where it may be prematurely releasing the lock or something. I've also seen documentation referencing AtomicReference as specifically for immutable things.
Is this safe? If it isn't I could create my own Reference-storing class that I would be more certain about the behavior of, but I don't want to jump to conclusions.

From the linked documentation:
For example, synchronizing on an AtomicBoolean will not prevent other threads from modifying the AtomicBoolean.
It can't prevent other threads from modifying the AtomicBoolean because it can't force other threads to synchronize on the AtomicBoolean.
If I understand your question correctly, your intention is to synchronize calls to performSomeModification(). The code you've written will achieve that, if and only if every call to performSomeModification() is synchronized on the same object. As in the example from the docs, the basic problem is the enforceability of that requirement. You can't force other callers to synchronize on the AtomicReference. You or some other developer who comes after you could easily call performSomeModification() without external synchronization.
You should make it hard to use your API incorrectly. Since AtomicReference is a generic type (AtomicReference<V>), you can enforce the synchronization in a variety of ways, depending on what V is:
If V is an interface, you could easily wrap the instance in a synchronized wrapper.
If V is a class that you can modify, you could synchronize performSomeModification(), or create a subclass in which it is synchronized. (Possibly an anonymous subclass produced by a factory method.)
If V is a class that you cannot modify, it may be difficult to wrap. In that case, you could encapsulate the AtomicReference in a class that you do control, and have that class perform the required synchronization.

Are Mutable Atomic References a Bad Idea?
Definitely not! AtomicReference is designed to provide thread-safe, atomic updates of the underlying reference. In fact, the Javadoc description of AtomicReference is:
An object reference that may be updated atomically.
So they most definitely are designed to be mutated!
Is this safe?
It depends on what you mean by "safe", and what the rest of your code is doing. There's nothing inherently unsafe about your snippet of code in isolation. It's perfectly valid, though perhaps a bit unusual, to synchronize on an AtomicReference. As a developer unfamiliar with this code, I would see the synchronization on reference and assume that it means that the underlying object may be replaced at any time, and you want to make sure your code is always operating on the "newest" reference.
The standard best practices for synchronization apply, and violating them could result in unsafe behavior. For example, since you say performSomeModification() is not thread-safe, it would be unsafe if you accessed the underlying object somewhere else without synchronizing on reference.
public void bar(AtomicReference reference) {
// no synchronization: performSomeModification could be called on the object
// at the same time another thread is executing foo()
reference.get().performSomeModification();
}
If could also be "unsafe" if your application requires that only one instance of the underlying object be operated on at any one time, and you haven't synchronized on the reference when .set()ing it:
public void makeNewFoo(AtomicReference reference) {
// no synchronication on "reference", so it may be updated by another thread
// while foo() is executing performSomeModification() on the "old" reference
SomeObject foo = new SomeObject();
reference.set(foo);
}
If you need to synchronize on the AtomicReference, do so, it's perfectly safe. But I would highly recommend adding a few code comments about why you're doing it.

Related

Explain how JIT reordering works

I have been reading a lot about synchronization in Java and all the problems that can occur. However, what I'm still slightly confused about is how the JIT can reorder a write.
For instance, a simple double check lock makes sense to me:
class Foo {
private volatile Helper helper = null; // 1
public Helper getHelper() { // 2
if (helper == null) { // 3
synchronized(this) { // 4
if (helper == null) // 5
helper = new Helper(); // 6
}
}
return helper;
}
}
We use volatile on line 1 to enforce a happens-before relationship. Without it, is entirely possible for the JIT to reoder our code. For example:
Thread 1 is at line 6 and memory is allocated to helper however, the constructor has not yet run because the JIT could reordering our code.
Thread 2 comes in at line 2 and gets an object that is not fully created yet.
I understand this, but I don't fully understand the limitations that the JIT has on reordering.
For instance, say I have a method that creates and puts a MyObject into a HashMap<String, MyObject> (I know that a HashMapis not thread safe and should not be used in a multi-thread environment, but bear with me). Thread 1 calls createNewObject:
public class MyObject {
private Double value = null;
public MyObject(Double value) {
this.value = value;
}
}
Map<String, MyObject> map = new HashMap<String, MyObject>();
public void createNewObject(String key, Double val){
map.put(key, new MyObject( val ));
}
At the same time thread 2 calls a get from the Map.
public MyObject getObject(String key){
return map.get(key);
}
Is it possible for thread 2 to receive an object from getObject(String key) that is not fully constructed? Something like:
Thread 1: Allocate memory for new MyObject( val )
Thread 1: Place object in map
Thread 2: call getObject(String key)
Thread 1: Finish constructing the new MyObject.
Or will map.put(key, new MyObject( val )) not put an object into the map until it's fully constructed?
I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?
In a nutshell can it only reorder when creating a new Object and assigning it to a reference variable, like the double checked lock? A complete rundown on the JIT might be much for a SO answer, but what I'm really curious about is how it can reorder a write (like line 6 on the double checked lock) and what stops it from putting an object into a Map that is not fully constructed.

WARNING: WALL OF TEXT
The answer to your question is before the horizontal line. I will continue to explain deeper the fundamental problem in the second portion of my answer (which is not related to the JIT, so that's it if you are only interested in the JIT). The answer to the second part of your question lies at the bottom because it relies on what I describe further.
TL;DR The JIT will do whatever it wants, the JMM will do whatever it wants, being valid under the condition that you let them by writing thread unsafe code.
NOTE: "initialization" refers to what happens in the constructor, which excludes anything else such as calling a static init method after constructing etc...
"If the reordering produces results consistent with a legal execution, it is not illegal." (JLS 17.4.5-200)
If the result of a set of actions conforms to a valid chain of execution as per the JMM, then the result is allowed regardless of whether the author intended the code to produce that result or not.
"The memory model describes possible behaviors of a program. An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.
This provides a great deal of freedom for the implementor to perform a myriad of code transformations, including the reordering of actions and removal of unnecessary synchronization" (JLS 17.4).
The JIT will reorder whatever it sees fit unless we do not allow it using the JMM (in a multithreaded environment).
The details of what the JIT can or will do is nondeterministic. Looking at millions of samples of runs will not produce a meaningful pattern because reorderings are subjective, they depend on very specific details such as CPU arch, timings, heuristics, graph size, JVM vendor, bytecode size, etc... We only know that the JIT will assume that the code runs in a single threaded environment when it does not need to conform to the JMM. In the end, the JIT matters very little to your multithreaded code. If you want to dig deeper, see this SO answer and do a little research on such topics as IR Graphs, the JDK HotSpot source, and compiler articles such as this one. But again, remember that the JIT has very little to do with your multithreaded code transforms.
In practice, the "object that is not fully created yet" is not a side effect of the JIT but rather the memory model (JMM). In summary, the JMM is a specification that puts forth guarantees of what can and cannot be results of a certain set of actions, where actions are operations that involve a shared state. The JMM is more easily understood by higher level concepts such as atomicity, memory visibility, and ordering, those three of which are components of a thread-safe program.
To demonstrate this, it is highly unlikely for your first sample of code (the DCL pattern) to be modified by the JIT that would produce "an object that is not fully created yet." In fact, I believe that it is not possible to do this because it would not follow the order or execution of a single-threaded program.
So what exactly is the problem here?
The problem is that if the actions aren't ordered by a synchronization order, a happens-before order, etc... (described again by JLS 17.4-17.5) then threads are not guaranteed to see the side effects of performing such actions. Threads might not flush their caches to update the field, threads might observe the write out of order. Specific to this example, threads are allowed to see the object in an inconsistent state because it is not properly published. I'm sure that you have heard of safe publishing before if you have ever worked even the tiniest bit with multithreading.
You might ask, well if single-threaded execution cannot be modified by the JIT, why can the multithreaded version be?
Put simply, it's because the thread is allowed to think ("perceive" as usually written in textbooks) that the initialization is out of order due to the lack of proper synchronization.
"If Helper is an immutable object, such that all of the fields of Helper are final, then double-checked locking will work without having to use volatile fields. The idea is that a reference to an immutable object (such as a String or an Integer) should behave in much the same way as an int or float; reading and writing references to immutable objects are atomic" (The "Double-Checked Locking is Broken" Declaration).
Making the object immutable ensures that the state is fully initialized when the constructor exits.
Remember that object construction is always unsynchronized. An object that is being initialized is ONLY visible and safe with respect to the thread that constructed it. In order for other threads to see the initialization, you must publish it safely. Here are those ways:
"There are a few trivial ways to achieve safe publication:
Exchange the reference through a properly locked field (JLS 17.4.5)
Use static initializer to do the initializing stores (JLS 12.4)
Exchange the reference via a volatile field (JLS 17.4.5), or as the consequence of this rule, via the AtomicX classes
Initialize the value into a final field (JLS 17.5)."
(Safe Publication and Safe Initialization in Java)
Safe publication ensures that other threads will be able to see the fully initialized objects when after it finishes.
Revisiting our idea that threads are only guaranteed to see side effects if they are in order, the reason that you need volatile is so that your write to the helper in thread 1 is ordered with respect to the read in thread 2. Thread 2 is not allowed to perceive the initialization after the read because it occurs before the write to helper. It piggy backs on the volatile write such that the read must happen after the initialization AND THEN the write to the volatile field (transitive property).
To conclude, an initialization will only occur after the object is created only because another thread THINKS that is the order. An initialization will never occur after construction due to a JIT optimisation. You can fix this by ensuring proper publication through a volatile field or by making your helper immutable.
Now that I've described the general concepts behind how publication works in the JMM, hopefully understanding how your second example won't work will be easy.
I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?
To the constructing thread, it will put it into the map after initialization.
To the reader thread, it can see whatever the hell it wants. (improperly constructed object in HashMap? That is definitely within the realm of possibility).
What you described with your 4 steps is completely legal. There is no order between assigning value or adding it to the map, thus thread 2 can perceive the initialization out of order since MyObject was published unsafely.
You can actually fix this problem by just converting to ConcurrentHashMap and getObject() will be completely thread safe as once you put the object in the map, the initialization will occur before the put and both will need to occur before the get as a result of ConcurrentHashMap being thread safe. However, once you modify the object, it will become a management nightmare because you need to ensure that updating the state is visible and atomic - what if a thread retrieves an object and another thread updates the object before the first thread could finish modifying and putting it back in the map?
T1 -> get() MyObject=30 ------> +1 --------------> put(MyObject=31)
T2 -------> get() MyObject=30 -------> +1 -------> put(MyObject=31)
Alternatively you could also make MyObject immutable, but you still need to map the map ConcurrentHashMap in order for other threads to see the put - thread caching behavior might cache an old copy and not flush and keep reusing the old version. ConcurrentHashMap ensures that its writes are visible to readers and ensures thread-safety. Recalling our 3 prerequisites for thread-safety, we get visibility from using a thread-safe data structure, atomicity by using an immutable object, and finally ordering by piggybacking on ConcurrentHashMap's thread safety.
To wrap up this entire answer, I will say that multithreading is a very difficult profession to master, one that I myself most definitely have not. By understanding concepts of what makes a program thread-safe and thinking about what the JMM allows and guarantees, you can ensure that your code will do what you want it to do. Bugs in multithreaded code occur often as a result of the JMM allowing a counterintuitive result that is within its parameters, not the JIT doing performance optimisations. Hopefully you will have learned something a little bit more about multithreading if you read everything. Thread safety should be achieved by building a repertoire of thread-safe paradigms rather than using little inconveniences of the spec (Lea or Bloch, not even sure who said this).

discussion for concurrency when using reference value

when writing concurrency program, sometimes we use the reference parameter, assume it is ref1 with fake type Reference, a method like
public void testRefVarInMethod(Reference ref1) {
Reference ref2 = ref1;
....
....
}
In this method, I declare a new variable ref2 which points to ref1. We all know that method variable is thread safe, however, as to reference ref1, anybody can change its value outside the method, so the ref2's value will be changed too. I guess this cannot guarantee thread safe, why do some people write code like this?

That's why people use methods like clone to ensure thread-safety.
Reference ref2 = ref1.clone();
By referencing the copy of ref1, ref2 will not be affected regardless how ref1 is changed by some other threads.
Edit:
As pointed out in the comments, the clone method does not necessarily enforce thread-safety. It has to be correctly implemented in a way that modifying ref1 will not change the state of ref2. i.e., ref1 and ref2 do not share any mutable fields.

The value of the local variable ref2 itself cannot be changed outside your method (none can make it point to another object from outside). It's only the state of the object it references that can be changed (someone can call ref1.setField(newValue)) concurrently.
People do that because they need to share objects between threads. Otherwise, they wouldn't be able to gain benefits of multithreading in many cases.
But people don't do it recklessly, they usually introduce various forms of synchronization to guarantee thread safety. For instance, one can use synchronized section as the simplest and most straightforward tool to delineate a critical section that can be executed by only one thread at any given time:
synchronized(ref2) {
// Change or read object here
}
If all the code uses the same approach, making changes (and reading them) on the object will be safe.
There're many other, more specialised and more efficient, synchronization primitives and techniques that you should learn about if you're going to write multithreaded programs with shared objects: immutability, volatile, ReadWriteLock etc. Books like "Java Concurrency in Practice" can give you a good introduction into the field.

Are non-synchronised static methods thread safe if they don't modify static class variables?

I was wondering if you have a static method that is not synchronised, but does not modify any static variables is it thread-safe? What about if the method creates local variables inside it? For example, is the following code thread-safe?
public static String[] makeStringArray( String a, String b ){
return new String[]{ a, b };
}
So if I have two threads calling ths method continously and concurrently, one with dogs (say "great dane" and "bull dog") and the other with cats (say "persian" and "siamese") will I ever get cats and dogs in the same array? Or will the cats and dogs never be inside the same invocation of the method at the same time?

This method is 100% thread safe, it would be even if it wasn't static. The problem with thread-safety arises when you need to share data between threads - you must take care of atomicity, visibility, etc.
This method only operates on parameters, which reside on stack and references to immutable objects on heap. Stack is inherently local to the thread, so no sharing of data occurs, ever.
Immutable objects (String in this case) are also thread-safe because once created they can't be changed and all threads see the same value. On the other hand if the method was accepting (mutable) Date you could have had a problem. Two threads can simultaneously modify that same object instance, causing race conditions and visibility problems.

A method can only be thread-unsafe when it changes some shared state. Whether it's static or not is irrelevant.

The function is perfectly thread safe.
If you think about it... assume what would happen if this were different. Every usual function would have threading problems if not synchronized, so all API functions in the JDK would have to be synchronized, because they could potentially be called by multiple threads. And since most time the app is using some API, multithreaded apps would effectively be impossible.
This is too ridiculous to think about it, so just for you: Methods are not threadsafe if there is a clear reason why there could be problems. Try to always think about what if there were multiple threads in my function, and what if you had a step-debugger and would one step after another advance the first... then the second thread... maybe the second again... would there be problems? If you find one, its not thread safe.
Please be also aware, that most of the Java 1.5 Collection classes are not threadsafe, except those where stated, like ConcurrentHashMap.
And if you really want to dive into this, have a close look at the volatile keyword and ALL its side effects. Have a look at the Semaphore() and Lock() class, and their friends in java.util.Concurrent. Read all the API docs around the classes. It is worth to learn and satisfying, too.
Sorry for this overly elaborate answer.

Use the static keyword with synchronized static methods to modify static data shared among threads. With the static keyword all the threads created will contend for a single version of the method.
Use the volatile keyword along with synchronized instance methods will guarantee that each thread has its own copy of the shared data and no read/ writes will leak out between threads.

String objects being immutable is another reason for thread-safe scenario above. Instead if mutable objects are used (say makeMutableArray..) then surely thread-safety will break.

Since the complete method was pushed onto the stack, any variable creation that takes place lives within the stack (again exceptions being static variables) and only accessible to one thread. So all the methods are thread safe until they change the state of some static variable.
See also:
Is static method is thread safe in Java?

How can I use the volatile keyword in Java correctly?

Say I have two threads and an object. One thread assigns the object:
public void assign(MyObject o) {
myObject = o;
}
Another thread uses the object:
public void use() {
myObject.use();
}
Does the variable myObject have to be declared as volatile? I am trying to understand when to use volatile and when not, and this is puzzling me. Is it possible that the second thread keeps a reference to an old object in its local memory cache? If not, why not?
Thanks a lot.

I am trying to understand when to use
volatile and when not
You should mostly avoid using it. Use an AtomicReference instead (or another atomic class where appropriate). The memory effects are the same and the intent is much clearer.
I highly suggest reading the excellent Java Concurrency in Practice for a better understanding.

Leaving the complicated technical details behind, you can see volatile less or more as a synchronized modifier for variables. When you'd like to synchronize access to methods or blocks, then you'd usually like to use the synchronized modifier as follows:
public synchronized void doSomething() {}
If you'd like to "synchronize" access to variables, then you'd like to use the volatile modifier:
private volatile SomeObject variable;
Behind the scenes they do different things, but the effect is the same: the changes are immediately visible for the next accessing thread.
In your specific case, I don't think that the volatile modifier has any value. The volatile does not guarantee in any way that the thread assigning the object will run before the thread using the object. It can be as good the other way round. You probably just want to do a nullcheck in use() method first.
Update: also see this article:
Access to the variable acts as though it is enclosed in a synchronized block, synchronized on itself. We say "acts as though" in the second point, because to the programmer at least (and probably in most JVM implementations) there is no actual lock object involved.

Declaring a volatile Java variable means:
The value of this variable will never be cached thread-locally
Access to the variable acts as though it is enclosed in a synchronized block
The typical and most common use of volatile is :
public class StoppableThread extends Thread {
private volatile boolean stop = false;
public void run() {
while (!stop) {
// do work
}
}
public void stopWork() {
stop = true;
}
}

You can use volatile in this case. You will require volatile, synchronization around the access to the variable or some similar mechanism (like AtomicReference) to guarantee that changes made on the assignment thread are actually visible to the reading thread.

I have spent quite a lot of time trying to understanding the volatile keyword.
I think #aleroot has given the best and simplest example in the world.
This is in turn my explanation for dummies (like me :-)):
Scenario1: Assuming the stop is not declared as volatile then
a given thread does and 'thinks' the following:
stopWork() is called: I have to set the stop to true
Great, I did it in my local stack now I have to update the main heap of JVM.
Oops, JVM tells me to give a way in CPU to another thread, I have to stop for a while...
OK, I am back. Now I can update the main heap with my value. Updating ...
Scenario2: Now let the stop be declared as volatile:
stopWork() is called: I have to set the stop to true
Great, I did it in my local stack now I have to update the main heap of JVM.
Sorry guys, I have to do (2) NOW - I am told it is volatile. I have to occupy CPU a bit longer...
Updating the main heap ...
OK, I am done. Now I can yield.
No synchronization, just a simple idea...
Why not to declare all variables volatile just in case? Because of Scenario2/Step3. It is a bit inefficient but still better than regular synchronization.

There are some confusing comments here: to clarify, your code is incorrect as it stands, assuming two different threads call assign() and use().
In the absence of volatile, or another happens-before relationship (for example, synchronization on a common lock) any write to myObject in assign() is not guaranteed to be seen by the thread calling use() -- not immediately, not in a timely fashion, and indeed not ever.
Yes, volatile is one way of correcting this (assuming this is incorrect behaviour -- there are plausible situations where you don't care about this!).
You are exactly correct that the 'use' thread can see any 'cached' value of myObject, including the one it was assigned at construction time and any intermediate value (again in the absence of other happens-before points).

Mutating a lock object

Just curious to know (in as much detail as possible), why is it a bad practice to
modify the object while using it as a lock.
//Assuming the lockObject is globally available
synchronized(lockObject){
lockObject.someMutativeOperation(...);
}
Cheers

I don't know that I've ever heard that assertion. Certainly it would be bad to reassign lockObject (because then you'd be locking on a different object elsewhere), but I don't see anything wrong with mutating it.
Furthermore, it is fairly common to have a synchronized method which mutates an object:
public synchronized void setSomething(int something) {
this.something = something;
}
In this case, the object itself is used as the lock. What is the point in synchronizing on a separate object?

Thats not bad practice, thats good practice. Where did you hear otherwise?
If you're using primitive synchronization, you synchronize on an object (or another lock) before you modify it.
It depends on the scope of the object though. If the object is scoped outside of your class, you should use a different synchronization mechanism

I guess what you have heard about is mutating the reference:
synchronized (thing) {
...
thing = newThing;
...
}
This usually indicates an error. It should probably have locked using a reference that does not change. I think it was Bitter Java that had a bug of this nature in a read-write lock (there is has been a read-write lock in the Java library for five years, so the specific implementation is no longer necessary).

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.