As I understand both maps are designed to work in multithreaded enviroment. But I intereted in what features they guarantee(i.e. availability, consistency).
I believe they don't use read locks (relying on volatile fields to ensure that reads see writes from other threads) and are internally broken down into some number of segments (based on the expected concurrency level) that entries are distributed among, each of which uses its own write lock separate from the others. That way reads never block and writes only block if they happen to need to write to the same segment at the same time. I'm no expert on it though.
As far as guarantees, I'm not sure what you're asking. ConcurrentMap specifies a memory consistency guarantee:
Memory consistency effects: As with other concurrent collections, actions in a thread prior to placing an object into a ConcurrentMap as a key or value happen-before actions subsequent to the access or removal of that object from the ConcurrentMap in another thread.
Related
I have a Map which is read by multiple threads but which is (from time to time) cleared and rebuilt by another thread.
I have surrounded all the access to this map with
readWriteLock.readLock().lock()
try {
... access myMap here...
} finally {
readWriteLock.readLock().unlock()
}
... or the writeLock() equivalents, depending on the type of access.
My question is... will the ReadWriteLock ensure the updates to myMap are visible to the other threads (since they must wait until after the unlock() is called by the writing thread? Or, do I also need to make myMap a concurrent map, like ConcurrentHashMap?
I will probably do that, just to be safe, but I'd like to understand better.
Yes, this should be fine even without a thread-aware map. The Javadoc for ReadWriteLock explicitly says:
All ReadWriteLock implementations must guarantee that the memory synchronization effects of writeLock operations (as specified in the Lock interface) also hold with respect to the associated readLock. That is, a thread successfully acquiring the read lock will see all updates made upon previous release of the write lock.
(Of course, by using a reader/writer lock at all you depend on the map supporting concurrent lookups from different threads. One could imagine clever data structure that try to save time overall by mutating some internal cached state during a lookup. But the standard collections such as HashMap will not do that).
Monitor = mutex(lock) + condition variable
Each Java object has a monitor, holding above principle.
synchronized key word claim a monitor(lock + conditionvar) of an object.
My understanding is, for atomicity, conditionvar is not required, lock(mutex) would suffice.
To maintain atomicity of a memory area, Java provides Lock , atomic package and binary semaphore.
For atomicity, Which approach is better in terms of performance?
It depends on the access pattern: synchronized(var) { ... } is the simplest to use as it doesn't require explicit unlock, unlike ReentrantLock. Those two are the same: synchronized(var) will grab a lock on var, while Lock will grab a lock on "itself" (so to say). But the ReentrantLock allows you to get extended information (see its javadoc for more information: isHeldByCurrentThread() and getHoldCount()).
Performance-wise, ReentrantReadWriteLock will improve performance when you have few writes and many reads (as you don't need to lock when you're only reading), but you should take extra care when taking and releasing locks, as "ownable synchronizers" can deadlock your threads (read and write locks are not treated in the same manner).
But if the data you want to read/write is a "simple type" (as described in the atomic package javadoc), you will get the best performance by using the AtomicInteger and the likes, as they use specific, optimized CPU instruction set such as compare-and-swap in SSE* set.
Consider these two situations:
a map which you are going to populate once at the beginning and then will be accessed from many different threads.
a map which you are going to use as cache that will be accessed from many different threads. you would like to avoid computing the result that will be stored in the map unless it is missing, the get-computation-store block will be synchronized. (and the map will not otherwise be used)
In either of these cases, does ConcurrentHashMap offer you anything additional in terms of thread safety above an ordinary HashMap?
In the first case, it should not matter in practice, but there is no guarantee that modifications written to a regular hashmap will ever be seen by other threads. So if one thread initially creates and populates the map, and that thread never synchronized with your other threads, then those threads may never see the initial values set into the map.
The above situation is unlikely in practice, and would only take a single synchronization event or happens before guarantee between the threads (read / write to a volatile variable for instance) to ensure even theoretical correctness.
In the second case, there is a concern since access to a HashMap that modifies it structurally (adding a value) requires synchronization. Furthermore, you need some type of synchronization to establish a happens-before relationship / shared visibility with the other threads or there is no guarantee that the other threads will see the new values you put in. ConcurrentHashMap offers these guarantees and will not break when one thread modifies it structurally.
There is no difference in thread safety, no. For scenario #2 there is a difference in performance and a small difference in timing guarantees.
There will be no synchronization for your scenario #2, so threads that want to use the cache don't have to queue up and wait for others to finish. However, in order to get that benefit you don't have hard happens-before relationships at the synchronization boundaries, so it's possible two threads will compute the same cached value more or less at the same time. This is generally harmless as long as the computation is repeatable.
(There is also the slight difference that ConcurrentHashMap does not allow null to be used as a key.)
We know that ConcurrentHashMap can provide concurrent access to multiple threads to boost performance , and inside this class, segments are synchronized up (am I right?). Question is, can this design guarantee the thread safety? Say we have 30+ threads accessing &changing an object mapped by the same key in a ConcurrentHashMap instance, my guess is, they still have to line up for that, don't they?
From my recollection that the book "Java Concurrency in Practice" says the ConcurrentHashMap provide concurrent reading and a decent level of concurrent writing. in the aforementioned scenario, and if my guess is correct, the performance won't be better than using the Collection's static synchonization wrapper api?
Thanks for clarifying,
John
You will still have to synchronize any access to the object being modified, and as you suspect all access to the same key will still have contention. The performance improvement comes in access to different keys, which is of course the more typical case.
All a ConcurrentMap can give you wrt to concurrency is that modifications to the map itself are done atomically, and that any writes happen-before any reads (this is important as it provides safe publishing of any reference from the map.
Safe-publishing means that any (mutable) object retrieved from the map will be seen with all writes to it before it was placed in the map. It won't help for publishing modifications that are made after retrieving it though.
However, concurrency and thread-safety is generally hard to reason about and make correct if you have mutable objects that are being modified by multiple parties. Usually you have to lock in order to get it right. A better approach is often to use immutable objects in conjunction with the ConcurrentMap conditional putIfAbsent/replace methods and linearize your algorithm that way. This lock-free style tends to be easier to reason about.
Question is, can this design guarantee the thread safety?
It guarantees the thread safety of the map; i.e. that access and updates on the map have a well defined and orderly behaviour in the presence of multiple threads performing updates simultaneously.
It does guarantee thread safety of the key or value objects. And it does not provide any form of higher level synchronization.
Say we have 30+ threads accessing &changing an object mapped by the same key in a ConcurrentHashMap instance, my guess is, they still have to line up for that, don't they?
If you have multiple threads trying to use the same key, then their operations will inevitably be serialized to some degree. That is unavoidable.
In fact, from briefly looking at the source code, it looks like ConcurrentHashMap falls back to using conventional locks if there is too much contention for a particular segment of the map. And if you have multiple threads trying to access AND update the same key simultaneously, that will trigger locking.
first remember that a thread safe tool doesn't guarantee thread safe usage of it in and of itself
the if(!map.contains(k))map.put(k,v); construct to putIfAbsent for example is not thread safe
and each value access/modification still has to be made thread safe independently
Reads are concurrent, even for the same key, so performance will be better for typical applications.
When using any of the java.util.concurrent classes, do I still need to synchronize access on the instance to avoid visibility issues between difference threads?
Elaborating the question a bit more
When using an instance of java.util.concurrent, is it possible that one thread modify the instance (i.e., put an element in a concurrent hashmap) and a subsequent thread won't be seeing the modification?
My question arises from the fact that The Java Memory Model allows threads to cache values instead of fetching them directly from memory if the access to the value is not synchronized.
On the java.util.concurrent package Memory Consistency Properties, you can check the Javadoc API for the package:
The methods of all classes in
java.util.concurrent and its
subpackages extend these guarantees to
higher-level synchronization. In
particular:
Actions in a thread prior to placing an object into any
concurrent collection
happen-before actions subsequent to the access or removal of that
element from the collection in
another thread.
[...]
Actions prior to "releasing" synchronizer methods such as
Lock.unlock, Semaphore.release, and
CountDownLatch.countDown
happen-before actions subsequent to a successful "acquiring" method
such as Lock.lock,
Semaphore.acquire, Condition.await,
and CountDownLatch.await on the
same synchronizer object in another
thread.
[...]
So, the classes in this package make sure of the concurrency, making use of a set of classes for thread control (Lock, Semaphore, etc.). This classes handle the happen-before logic programmatically, i.e. managing a FIFO stack of concurrent threads, locking and releasing current and subsequent threads (i.e. using Thread.wait() and Thread.resume(), etc.
Then, (theoretically) you don't need to synchronize your statements accessing this classes, because they are controlling concurrent threads access programmatically.
Because the ConcurrentHashMap (for example) is designed to be used from a concurrent context, you don't need to synchronise it further. In fact, doing so could undermine the optimisations it introduces.
For example, Collections.synchronizedMap(...) represents a way to make a map thread safe, as I understand it, it works essentially by wrapping all the calls within the synchronized keyword. Something like ConcurrentHashMap on the other hand creates synchronized "buckets" across the elements in the collection, causing finer grained concurrency control and therefore giving less lock contention under heavy usage. It may also not lock on reads for example. If you wrap this again with some synchronised access, you could undermine this. Obviously, you have to be careful that all access to the collection is syncrhronised etc which is another advantage of the newer library; you don't have to worry (as much!).
The java.lang.concurrent collections may implement their thread safety via syncrhonised. in which case the language specification guarantees visibility. They may implement things without using locks. I'm not as clear on this, but I assume it the same visibility would be in place here.
If you're seeing what looks like lost updates in your code, it may be that its just a race condition. Something like the ConcurrentHashpMap will give you the most recent value on a read and the write may not have yet been written. It's often a trade off between accuracy and performance.
The point is; java.util.concurrent stuff is meant to do this stuff so I'd be confident that it ensures visibility and use of volatile and/or addition syncrhonisation shouldn't be needed.