Safe publication example in Java Concurrency in Practice


Java Concurrency in Practice says you can safely publish an effectively immutable object (say, a Date object that you construct and never change again) by sticking it into a synchronized collection like the following (from the book, page 53):
public Map<String, Date> lastLogin =
    Collections.synchronizedMap(new HashMap<String, Date>());
I understand that any Date object put into this map will be visible (at least in its initial but completely constructed state) once placed into this synchronized map, but only once other threads can obtain the reference to this Map object.
Since the reference field lastLogin has none of the properties of fields that guarantee visibility (final, volatile, guarded, or initialized by a static initializer), I think that it's possible the map itself will not show up in a completely constructed state to other threads, therefore putting the cart before the horse. Or am I missing something?

Your suspicion is half right, in that the value of lastLogin is not guaranteed to be visible to other threads. Because lastLogin is not volatile or final, another thread may read it as null.
However, you do not need to worry that other threads will see an incomplete version of the map. Collections.synchronizedMap(...) returns an instance of a private class with final fields. JLS section 17.5 says:
The usage model for final fields is a simple one: Set the final fields for an object in that object's constructor; and do not write a reference to the object being constructed in a place where another thread can see it before the object's constructor is finished. If this is followed, then when the object is seen by another thread, that thread will always see the correctly constructed version of that object's final fields.
SynchronizedMap follows these rules, so another thread reading lastLogin will either read null or a reference to the fully constructed map, never a reference to an incomplete or unsafe version of the map.
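One way to close the remaining gap (a thread reading lastLogin as null) is to make the field final, as in the minimal sketch below. This assumes the field never needs to be reassigned; volatile would be the alternative if it does. The class name LoginTracker and its two helper methods are hypothetical and do not come from the book.

```java
import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper class; only the lastLogin declaration mirrors the book's example.
class LoginTracker {
    // final: any thread that obtains a properly constructed LoginTracker
    // (one whose `this` did not escape the constructor) sees this reference, never null.
    public final Map<String, Date> lastLogin =
            Collections.synchronizedMap(new HashMap<String, Date>());

    // The Date is treated as effectively immutable: it is never modified after this put.
    void recordLogin(String user) {
        lastLogin.put(user, new Date());
    }

    Date lastLoginOf(String user) {
        return lastLogin.get(user);
    }
}
```

The synchronized map then takes care of publishing each Date to readers, and the final field takes care of publishing the map itself.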

Related

Java final keyword semantics with respect to cache

What is the behavior of Java final keyword with respect of caches?
Quote from the JSR-133 FAQ:
The values for an object's final fields are set in its constructor.
Assuming the object is constructed "correctly", once an object is
constructed, the values assigned to the final fields in the
constructor will be visible to all other threads without
synchronization. In addition, the visible values for any other object
or array referenced by those final fields will be at least as
up-to-date as the final fields.
I don't understand what it refers to when it says "as up-to-date as the final fields":
In addition, the visible values for any other object or array
referenced by those final fields will be at least as up-to-date as the
final fields.
My guess is, for example:
public class CC{
private final Mutable mutable; //final field
private Other other; //non-final field
public CC(Mutable m, Other o){
mutable=m;
other=o;
}
}
When the constructor of CC returns, besides the pointer value of mutable, all values in the object graph rooted at m, if they exist in the local processor cache, will be flushed to main memory. At the same time, the corresponding cache lines in other processors' local caches will be marked as Invalid.
Is that the case? What does it look like in assembly? How do they actually implement it?

Is that the case?
The actual guarantee is that any thread that can see an instance of CC created using that constructor is guaranteed to see the mutable reference and also the state of the Mutable object's fields as of the time that the constructor completed.
It does not guarantee that the state of all values in the closure of the Mutable instance will be visible. However, any writes (in the closure or not) made by the thread that executed the constructor prior to the constructor completing will be visible. (By "happens-before" analysis.)
Note that the behavior is specified in terms of what one thread is guaranteed to see, not in terms of cache flushing / invalidation. The latter is one way of implementing the behavior that the specification requires. There may be other ways.
What does it look like in assembly?
That will be version / platform / etc specific. There is a way to get the JIT compiler to dump out the compiled code, if you want to investigate what the native code looks like for your hardware.
How to see JIT-compiled code in JVM?
How do they actually implement it?
See above.
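To make the guarantee concrete, here is a small sketch under stated assumptions. Mutable is given a single int field, the Other type from the question is replaced by Object to keep the example self-contained, and the CC instance is handed over through a plain static field, which is a deliberate data race. The names value, FinalFieldDemo, shared, publish and read are illustrative and not part of the question.

```java
class Mutable {
    int value;                      // plain, non-final field
}

class CC {
    private final Mutable mutable;  // final field: frozen when the constructor exits
    private Object other;           // non-final field: no guarantee under a data race

    CC(Mutable m, Object o) {
        mutable = m;
        other = o;
    }

    Mutable mutable() { return mutable; }
    Object other() { return other; }
}

class FinalFieldDemo {
    static CC shared;               // deliberately neither volatile nor final

    static void publish() {
        Mutable m = new Mutable();
        m.value = 42;               // written before the constructor completes
        shared = new CC(m, "x");    // racy publication of the CC reference
    }

    // Returns -1 if the reference is not visible yet, otherwise the observed value.
    static int read() {
        CC cc = shared;             // another thread may still see null here
        // If cc is non-null, reading through the final field is guaranteed to see 42,
        // provided nobody writes to value after the constructor has finished.
        return cc == null ? -1 : cc.mutable().value;
    }
}
```

A reader calling other() through the same racy reference has no comparable guarantee, because other is not final.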

Storing object reference into a volatile field

I'm using the following field:
private DateDao dateDao;
private volatile Map<String, Date> dates;
public Map<String, Date> getDates() {
return Collections.unmodifiableMap(dates);
}
public void retrieveDates() {
dates = dateDao.retrieveDates();
}
Where
public interface DateDao {
//Currently returns HashMap instance
public Map<String, Date> retrieveDates();
}
Is it safe to publish the map of dates that way? I mean, volatile field means that the reference to a field won't be cached in CPU registers and be read from memory any time it is accessed.
So, we might as well read a stale value for the state of the map because HashMap doesn't do any synchronization.
Is it safe to do so?
UPD: For instance assume that the DAo method implemented in the following way:
public Map<String, Date> retrieveDates() {
Map<String, Date> retVal = new HashMap<>();
retVal.put("SomeString", new Date());
// and so forth...
return retVal;
}
As can be seen, the Dao method doesn't do any synchronization, and both HashMap and Date are mutable and not thread safe. Now, we've created and published them as shown above. Is it guaranteed that any subsequent read of dates from another thread will observe not only the correct reference to the Map object, but also its "fresh" state?
I'm not sure whether the thread can observe some stale value (e.g. dates.get("SomeString") returns null).

I think you're asking two questions:
Given that DAO code, is it possible for your code using it to use the object reference it gets here:
dates = dateDao.retrieveDates();
before the dateDao.retrieveDates method as quoted is done adding to that object? E.g., do the memory model's statement reordering semantics allow the retrieveDates method to return the reference before the last put (etc.) is complete?
Once your code has the dates reference, is there an issue with unsynchronized access to dates in your code and also via the read-only view of it you return from getDates.
Whether your field is volatile has no bearing on either of those questions. The only thing that making your field volatile does is prevent a thread calling getDates from getting an out-of-date value for your dates field. That is:
Thread A                                         Thread B
----------                                       --------
1. Updates `dates` from dateDao.retrieveDates
2. Updates `dates` from "        " again
                                                 3. getDates returns read-only
                                                    view of `dates` from #1
Without volatile, the scenario above is possible (but harmless). With volatile, it isn't: Thread B will see the value of dates from #2, not #1.
But that doesn't relate to either of the questions I think you're asking.
Question 1
No, your code in retrieveDates cannot see the object reference returned by dateDao.retrieveDates before dateDao.retrieveDates is done filling in that map. The memory model allows reordering statements, but:
...compilers are allowed to reorder the instructions in either thread, when this does not affect the execution of that thread in isolation
(My emphasis on "in isolation".) Returning the reference to your code before dateDao.retrieveDates is done would obviously affect the execution of the thread in isolation.
Question 2
The DAO code you've shown can never modify the map it returns to you, since it doesn't keep a copy of it, so we don't need to worry about the DAO.
In your code, you haven't shown anything that modifies the contents of dates. If your code doesn't modify the contents of dates, then there's no need for synchronization, since the map is unchanging. You might want to make that a guarantee by wrapping dates in the read-only view when you get it, rather than when you return it:
dates = Collections.unmodifiableMap(dateDao.retrieveDates());
If your code does modify dates somewhere you haven't shown, then yes, there's potential for trouble because Collections.unmodifiableMap does nothing to synchronize map operations. It just creates a read-only view.
If you wanted to ensure synchronization, you'd want to wrap dates in a Collections.synchronizedMap instance:
dates = Collections.synchronizedMap(dateDao.retrieveDates());
Then all access to it in your code will be synchronized, and all access to it via the read-only view you return will also be synchronized, as they all go through the synchronized map.
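Pulling the pieces of this answer together, a hedged sketch of the holder class might look like the following. The map is wrapped in a read-only view at the moment it is stored, and the field stays volatile. The class name DateCache, the constructor, and the empty-map initial value are assumptions made for illustration; only the DateDao interface and the two method names come from the question.

```java
import java.util.Collections;
import java.util.Date;
import java.util.Map;

interface DateDao {
    Map<String, Date> retrieveDates();
}

// Hypothetical holder class for the question's two methods.
class DateCache {
    private final DateDao dateDao;
    // Assumption: start with an empty map so getDates() never returns null.
    private volatile Map<String, Date> dates = Collections.emptyMap();

    DateCache(DateDao dateDao) {
        this.dateDao = dateDao;
    }

    public Map<String, Date> getDates() {
        return dates;               // already a read-only view
    }

    public void retrieveDates() {
        // Everything the DAO wrote into the map happens-before this volatile write,
        // and therefore before any read of `dates` that observes the new reference.
        dates = Collections.unmodifiableMap(dateDao.retrieveDates());
    }
}
```

Because nothing mutates a map after it has been stored, readers need no further synchronization in this variant.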

As far as I can tell, declaring a map volatile won't synchronize its access (i.e. readers could read the map while it is being updated by the dao). What volatile does guarantee is that a write to the reference, together with everything the writing thread did before it, is visible to any thread that subsequently reads the reference. What I usually do when I need synchronization and freshness is use a lock object, something similar to the following:
private DateDao dateDao;
private volatile Map<String, Date> dates;
private final Object _lock = new Object();
public Map<String, Date> getDates() {
synchronized(_lock) {
return Collections.unmodifiableMap(dates);
}
}
public void retrieveDates() {
synchronized(_lock) {
dates = dateDao.retrieveDates();
}
}
This provides reader/writer synchronization (but note that writers are not prioritized, i.e. if a reader is getting the map, the writers will have to wait) and 'data freshness' via volatile. Moreover, this is a pretty basic approach, and there are other ways of achieving the same thing (e.g. Locks and Semaphores), but most of the time this does the trick for me.

Is setting a HashMap thread safe?

I have a HashMap in my program which is accessed by multiple threads, and is occasionally set by a single thread.
For example:
Map<String, String> myMap = new HashMap<String, String>();
This is accessed by multiple threads. Once an hour, a single thread calls:
myMap = myRefreshedVersionOfTheMap;
So my question is whether or not this is thread safe. If both maps always have the key "importantKey", is it possible for a reading thread to ever access the map at a time when "importantKey" does not exist?
Edit:
Thanks to the answers, I've realized this question is actually independent of the HashMap. It was more a question about object reference assignment.

This is not thread safe. Even though there are no writes to the map itself after the point of publication (from the point of view of the thread doing the publication), and reference assignment is atomic, the new Map has not been safely published. In particular, there are writes to the Map during its construction, either in the constructor or after it, depending on how you add those elements. Those writes may or may not be seen by other threads: even though they intuitively occur before the map is published to the other threads, this isn't formally the case according to the memory model.
For an object to be safely published, it must be communicated to the outside world using some mechanism that either establishes a happens-before relationship between the object construction, the reference publication and the reference read, or it must use one of a handful of narrower methods which are guaranteed to be safe for publishing:
Initializing an object reference from a static initializer.
Storing a reference to it into a final field.
Your idiom would be safe if you declared myMap volatile. More details on safe publication can be found in JCIP (highly recommended), or here, or in this longer answer on a similar topic.
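The safe routes mentioned in this answer (static initializer, final field, volatile field) can be sketched side by side. This is an illustration under assumptions, not the asker's code: the class name MapHolder, the DEFAULTS and initial fields, and the refresh and get methods are all hypothetical; only myMap and myRefreshedVersionOfTheMap echo names from the question.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class showing three ways to publish a map safely.
class MapHolder {
    // 1. Static initializer: class initialization guarantees visibility.
    static final Map<String, String> DEFAULTS;
    static {
        Map<String, String> m = new HashMap<>();
        m.put("importantKey", "default");
        DEFAULTS = Collections.unmodifiableMap(m);
    }

    // 2. Final field: visible, with its contents as of the end of the
    //    constructor, to any thread that sees this MapHolder.
    final Map<String, String> initial;

    // 3. Volatile field: the write in refresh() happens-before every read
    //    that observes the new reference.
    private volatile Map<String, String> myMap;

    MapHolder(Map<String, String> initial) {
        this.initial = Collections.unmodifiableMap(new HashMap<>(initial));
        this.myMap = this.initial;
    }

    // Called by the single refreshing thread; the new map must be fully
    // populated before this call and never mutated afterwards.
    void refresh(Map<String, String> myRefreshedVersionOfTheMap) {
        myMap = myRefreshedVersionOfTheMap;
    }

    String get(String key) {
        return myMap.get(key);      // one volatile read, then a plain map lookup
    }
}
```

With the volatile field, a reader sees either the old map or the new one in its fully populated state, so "importantKey" is present in both cases as long as each map contained it before being assigned.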

If you mean you are creating an entirely new Map and are assigning it to myMap which is what the other threads are accessing, then yes. Reference assignment is atomic. It's threadsafe because you are not modifying the contents of a Map while other threads are reading from it - you just have multiple threads reading from a Map.
You just need to declare it volatile so other threads don't cache it.

First off, Java's HashMap class is not thread safe, so there are no guarantees when reads and writes are happening concurrently.
However, since reads and writes to references in Java are atomic, then the pattern you described could be thread-safe as long as the refresh code is not mutating the old map. For example, the following would be fine:
// this refresh code would be thread-safe
Map<String, String> copy = new HashMap<String, String>(myMap);
copy.put(x, y); // change the map
myMap = copy;
// you might also consider
myMap = Collections.unmodifiableMap(copy);
// to make sure that the non-thread-safe map will never be mutated
One thing to consider with this pattern is that you may want the myMap field to be declared as volatile so that all threads will get the most recent version of myMap whenever they read from that variable.
Finally, as other posters have mentioned ConcurrentHashMap may be a better approach depending on the complexity of the refresh code. One disadvantage of ConcurrentHashMap is that it doesn't offer any way to batch the operations, so you'd have to make sure that the state at every point during the refresh process was valid for the rest of your application to consume.

HashMap is not thread safe. You can use any of the following:
ConcurrentHashMap.
HashMap with synchronized on the outside.
Different HashMap for each thread.
Check this similar answer here

Confused about ThreadLocal

I just learned about ThreadLocal this morning. I read that it should always be final and static like:
private static final ThreadLocal<Session> threadLocal = new ThreadLocal<Session>();
(Session is a Hibernate Session)
My confusion is this: Because it is static, it is available to any thread in the JVM. Yet it will hold information local to each thread which accesses it? I'm trying to wrap my head around this so I apologize if this is unclear. Each thread in the application has access to the same ThreadLocal object, but the ThreadLocal object will store objects local to each thread?

Yes, the instance is the same, but the implementation associates the value you set with Thread.currentThread(), both when you set and when you retrieve, so a value accessed through the set and get methods is accessible only from the thread that set it.
It's really easy to understand.
Imagine that each Thread has a map that associates a value with a ThreadLocal instance. Every time you perform a get or a set on a ThreadLocal, the implementation of ThreadLocal gets the map associated with the current Thread (Thread.currentThread()) and performs the get or set in that map using itself as key.
Example:
ThreadLocal<Object> tl = new ThreadLocal<>();
tl.set(new Object()); // at this moment the implementation will do something similar to Thread.currentThread().threadLocals.put(tl, [object you gave])
Object obj = tl.get(); // at this moment the implementation will do something similar to Thread.currentThread().threadLocals.get(tl)
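One related point: a plain ThreadLocal is not hierarchical. A value set in a parent thread is not visible from a child thread. That behaviour is provided by the InheritableThreadLocal subclass, which copies the parent's value into the child when the child thread is created.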

You always access the same instance of ThreadLocal for a specific problem but this instance returns a different value for each thread calling the get method.
That's the point : it's easy to find the object but each thread will have its specific own value. Thus you can for example make sure your specific value won't be accessed by two different threads.
You could see it (conceptually) as a kind of HashMap<Thread, V> which would always be accessed with Thread.currentThread() as key.

Because the thread-specific values are not stored in the ThreadLocal object, but the current Thread's ThreadLocalMap. The ThreadLocal object merely serves as key in these maps.
For details, read the JavaDoc of ThreadLocal and its subclasses, or, if you are curious about the implementation, the source code available in every recent JDK's src.zip.
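The per-thread behaviour can be checked with a small sketch. It uses a String instead of a Hibernate Session to stay self-contained; the class name ThreadLocalDemo, the field names and both methods are hypothetical. It also contrasts a plain ThreadLocal with InheritableThreadLocal, the JDK subclass whose value is copied to threads created by the current thread.

```java
import java.util.concurrent.atomic.AtomicReference;

class ThreadLocalDemo {
    // One shared ThreadLocal object per variable, as in the question.
    private static final ThreadLocal<String> PLAIN = new ThreadLocal<>();
    private static final InheritableThreadLocal<String> INHERITED = new InheritableThreadLocal<>();

    // Sets both variables in the calling thread, then reports what a newly
    // started child thread observes, formatted as "plain/inherited".
    static String valuesSeenByChild() {
        PLAIN.set("parent");
        INHERITED.set("parent");
        AtomicReference<String> seen = new AtomicReference<>();
        Thread child = new Thread(() -> seen.set(PLAIN.get() + "/" + INHERITED.get()));
        child.start();
        try {
            child.join();           // wait so the result is available to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    // What the calling thread itself reads back from the plain ThreadLocal.
    static String valueSeenByCaller() {
        return PLAIN.get();
    }
}
```

Calling valuesSeenByChild() returns "null/parent": the child has no value of its own for the plain ThreadLocal, while the inheritable one was copied at thread creation. The calling thread still reads "parent" from both.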

Why can an Object member variable not be both final and volatile in Java?

If in a class I have a ConcurrentHashMap instance that will be modified and read by multiple threads I might define like this:
public class MyClass {
private volatile ConcurrentHashMap<String,String> myMap = new ConcurrentHashMap<String,String>();
...
}
Adding final to the myMap field results in an error saying I can only use final or volatile. Why can it not be both?

volatile only has relevance to modifications of the variable itself, not the object it refers to. It makes no sense to have a final volatile field because final fields cannot be modified. Just declare the field final and it should be fine.
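A minimal sketch of that advice, assuming the map reference is never reassigned: the field is final and the thread safety of the contents comes from ConcurrentHashMap itself. The put, get, size and demo methods are hypothetical additions for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

class MyClass {
    // final: the reference never changes and is safely published with the object.
    // Concurrent modification of the contents is handled by ConcurrentHashMap.
    private final ConcurrentHashMap<String, String> myMap = new ConcurrentHashMap<>();

    void put(String key, String value) { myMap.put(key, value); }
    String get(String key) { return myMap.get(key); }
    int size() { return myMap.size(); }

    // Starts two writer threads with disjoint keys, waits for both,
    // and returns the resulting number of entries.
    static int demo() {
        MyClass c = new MyClass();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) c.put("a" + i, "x"); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) c.put("b" + i, "y"); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c.size();
    }
}
```

Since the two threads write 1000 distinct keys each, demo() returns 2000 once both have finished.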

It's because of the Java Memory Model (JMM).
Essentially, when you declare an object field as final you need to initialize it in the object's constructor, and the final field then won't change its value. The JMM promises that after the constructor has finished, any thread will see the same (correct) value of the final field. So you don't need explicit synchronization, such as synchronized or Lock, for all threads to see the correct value of the final field.
When you declare an object's field as volatile, the field's value can change, but every read of the value from any thread will still see the latest value written to it.
So final and volatile serve the same purpose, visibility of the object's field value, but the first is for a variable that may only be assigned once and the second is for a variable that can be changed many times.
References:
http://docs.oracle.com/javase/specs/jls/se7/html/jls-4.html#jls-4.12.4
http://docs.oracle.com/javase/specs/jls/se7/html/jls-8.html#jls-8.3.1.4

Because volatile and final are two extreme ends in Java
volatile means the variable is bound to changes
final means the value of the variable will never change whatsoever

volatile is used for variables whose value may change; otherwise there is no need for volatile. final means that the variable may not change, so there is no need for volatile.
Your concurrency concerns are important, but making the HashMap volatile will not solve the problem, for handling the concurrency issues, you already use ConcurrentHashMap.

A volatile field gives you guarantees about what happens when you change it. (Not the object it might be a reference to.)
A final field cannot be changed. (What the field references can be changed.)
It makes no sense to have both.

The volatile modifier guarantees that every read of the variable sees the most recent write to it, much as if each access to the variable were inside a synchronized block. This is irrelevant for a final variable, which cannot be changed.

Because it doesn't make any sense. Volatile affects object reference value, not the object's fields/etc.
In your situation (you have a concurrent map) you should make the field final.

In a multithreaded environment different threads will read a variable from main memory and add it to the CPU cache. This may result in two different threads making changes to the same variable while ignoring each other's results.
We use the word volatile to indicate that a variable will be saved to main memory and read from main memory. Thus whenever a thread wants to read or write the variable, it will be done through main memory, which makes writes to the variable visible to other threads.
When we use the final keyword we indicate that the variable will not change. If a variable is unchangeable, then it doesn't matter whether multiple threads use it. No thread can change the variable, so even if the variable is saved to CPU caches at different times, and threads use it at different times, it's still OK, because the variable can only be read.
