I'm using Java 8 and would like to know whether the computeIfPresent operation of ConcurrentHashMap locks the whole table/map or just the bin containing the key.
From the documentation of the computeIfPresent method:
Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map
This looks like the whole map is locked when invoking this method for a key. Why does the whole map have to be locked if a value of a certain key is updated? Wouldn't it be better to just lock the bin containing the key/value pair?
Judging by implementation (Oracle JDK 1.8.0_101), just the corresponding bin is locked. This does not contradict the documentation snippet you've cited, since it mentions that some update operations may be blocked, not necessarily all. Of course, it'd be clearer if the docs stated explicitly what gets locked, but that'd be leaking implementation details to what is de facto a part of the interface.
If you look at the source code of ConcurrentHashMap#computeIfPresent, you'll notice that the synchronisation is done directly on the node itself.
So an operation will only block if you attempt to update any node that is being computed. You should not have any problems with other nodes.
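As a hedged illustration (not from the original post; the keys and timings are made up, and "a" and "b" landing in different bins depends on the table size and hashing): a long computeIfPresent on one key typically holds up other updates to the same key, while updates to other keys proceed.

    import java.util.concurrent.ConcurrentHashMap;

    public class ComputeIfPresentDemo {
        public static void main(String[] args) throws InterruptedException {
            ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
            map.put("a", 1);
            map.put("b", 1);

            // A deliberately slow computation on key "a" (holds that bin's lock).
            Thread slow = new Thread(() -> map.computeIfPresent("a", (k, v) -> {
                sleep(2000);
                return v + 1;
            }));

            // Updates key "b": with the default table size it hashes to a different
            // bin, so it does not wait for the slow computation.
            Thread other = new Thread(() -> map.computeIfPresent("b", (k, v) -> v + 1));

            // Updates key "a": same bin, so it typically has to wait until the
            // slow computation completes.
            Thread sameKey = new Thread(() -> map.put("a", 42));

            slow.start();
            sleep(100); // let the slow computation acquire the bin lock first
            other.start();
            sameKey.start();

            slow.join();
            other.join();
            sameKey.join();
            System.out.println(map);
        }

        private static void sleep(long millis) {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }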
From my understanding, synchronizing directly on nodes is actually the major improvement of ConcurrentHashMap over the old Hashtable.
If you look at the source code of Hashtable, you'll notice that the synchronization is much coarser: its methods are synchronized on the whole table. By contrast, synchronization in ConcurrentHashMap happens directly on nodes.
The end of the Hashtable documentation also suggests this:
[...] Hashtable is synchronized. If a thread-safe implementation is not
needed, it is recommended to use HashMap in place of Hashtable. If a
thread-safe highly-concurrent implementation is desired, then it is
recommended to use ConcurrentHashMap in place of Hashtable.
Related
Can we use HashMap's containsKey() method without synchronizing in a multi-threaded environment?
Note: Threads are only going to read the Hashmap. The map is initialized once, and is never modified again.
It really depends on how/when your map is accessed.
Assuming the map is initialized once, and never modified again, then methods that don't modify the internal state like containsKey() should be safe.
In this case though, you should make sure your map really is immutable, and is published safely.
Now if in your particular case the state does change during the course of your program, then no, it is not safe.
From the documentation:
Note that this implementation is not synchronized.
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
In this case, you should use ConcurrentHashMap, or synchronize externally.
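For example (a minimal sketch, assuming the map really is built once and never touched again; the class and field names are made up):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public class CountryCodes {
        // Built once, wrapped as unmodifiable, and published via a final field:
        // the final-field guarantee of the Java Memory Model makes the fully
        // initialized map visible to every thread that obtains the reference.
        private final Map<String, String> codes;

        public CountryCodes() {
            Map<String, String> m = new HashMap<>();
            m.put("FR", "France");
            m.put("DE", "Germany");
            codes = Collections.unmodifiableMap(m);
        }

        // Safe to call from any number of threads: the map never changes
        // after construction, so containsKey() only performs reads.
        public boolean isKnown(String code) {
            return codes.containsKey(code);
        }
    }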
You shouldn't look at a single method this way. A HashMap is not meant to be used in a multi-threaded setup.
Having said that, the one exception would be: a map that gets created once (single threaded), and afterwards is "read" only. In other words: if a map doesn't get changed anymore, then you can have as many threads reading it as you want.
From that point of view, just containsKey() calls shouldn't cause a problem. The problem arises when the information that this method relies on changes over time.
No, it is not thread-safe for any operations. You need to synchronise all access or use something like ConcurrentHashMap.
My favourite production system troubleshooting horror story is when we found that HashMap.get went into an infinite loop locking up 100% CPU forever because of missing synchronisation. This happened because the linked lists that were used within each bucket got into an inconsistent state. The same could happen with containsKey.
You should be safe if no one modifies the HashMap after it has been initially published, but better use an implementation that guarantees this explicitly (such as ImmutableMap or, again, a ConcurrentMap).
No. (No it is not. Not at all.)
It's complicated, but, mostly, no.
The spec of HashMap makes no guarantees whatsoever. It therefore reserves the right to blast yankee doodle dandy from your speakers if you try: You're just not supposed to use it that way.
... however, in practice, whilst the API of HashMap makes no guarantees, it generally works out. But mind the horror story in @Thilo's answer.
... buuut, the Java Memory Model works like this: you should assume that each thread gets an individual copy of every field across the entire heap of the VM, and that these copies are synced up at indeterminate times. That means all sorts of code simply isn't going to work right: you add an entry to the map from one thread, and another thread that later reads the map may not see it, even though a lot of time has passed – that's theoretically possible. Also, internally, the map uses multiple fields, and presumably these fields must be consistent with each other or you'll get weird behaviour (exceptions and wrong results). The JMM makes no guarantees about that consistency either. The way out of this dilemma is that the JMM offers happens-before relationships, which guarantee that changes have been synced up. Using the synchronized keyword is one easy way to establish such a relationship.
Why not use a ConcurrentHashMap, which has all the bells and whistles built in and does in fact guarantee that adding an entry from thread A and then querying it via containsKey from thread B gives you a consistent answer? (That answer might still be 'no, that key is not in the map', because perhaps thread B got there slightly before thread A, and there is no way for you to know. But it won't throw any exceptions or do something really bizarre such as suddenly returning false for things you added ages ago.)
So, whilst it's complicated, the answer is basically: Don't do that; either use a synchronized guard, or, probably the better choice: ConcurrentHashMap.
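A minimal sketch of those two choices (class and method names are made up; the point is only that every read and write of the plain HashMap goes through the same lock, whereas ConcurrentHashMap needs no external lock):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class GuardedLookup {
        // Option 1: a plain HashMap guarded by a single lock. Every access,
        // reads included, must go through the same synchronized block so that
        // a happens-before edge exists between the writer and the readers.
        private final Map<String, String> guarded = new HashMap<>();

        public boolean containsGuarded(String key) {
            synchronized (guarded) {
                return guarded.containsKey(key);
            }
        }

        public void putGuarded(String key, String value) {
            synchronized (guarded) {
                guarded.put(key, value);
            }
        }

        // Option 2: ConcurrentHashMap, which provides the same visibility
        // guarantees internally, without an external lock.
        private final Map<String, String> concurrent = new ConcurrentHashMap<>();

        public boolean containsConcurrent(String key) {
            return concurrent.containsKey(key);
        }

        public void putConcurrent(String key, String value) {
            concurrent.put(key, value);
        }
    }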
No. Read the bold part of the HashMap documentation:
Note that this implementation is not synchronized.
So you have to handle it yourself:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
And suggested solutions:
This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method
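For example (a small sketch of the wrapping the documentation describes; note that the synchronizedMap javadoc still requires you to synchronize on the wrapper manually while iterating):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public class SynchronizedMapExample {
        public static void main(String[] args) {
            // Wrap the HashMap as the documentation suggests; the wrapper
            // synchronizes every method call on itself.
            Map<String, Integer> map = Collections.synchronizedMap(new HashMap<>());

            map.put("x", 1);                        // thread-safe
            boolean present = map.containsKey("x"); // thread-safe

            // Iteration is the one exception: the javadoc requires manual
            // synchronization on the wrapper while iterating.
            synchronized (map) {
                for (Map.Entry<String, Integer> e : map.entrySet()) {
                    System.out.println(e.getKey() + "=" + e.getValue());
                }
            }
            System.out.println(present);
        }
    }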
@user7294900 is right.
If your application does not modify the HashMap structurally, the map is built in a thread-safe way before it is shared, and your application only invokes the containsKey method, it is thread-safe.
For instance, I've used HashMap like this:
    import java.util.HashMap;
    import java.util.Map;

    import javax.annotation.PostConstruct;

    import org.springframework.stereotype.Component;

    @Component
    public class SpringSingletonBean {

        private Map<String, String> map = new HashMap<>();

        public void doSomething() {
            if (map.containsKey("aaaa")) {
                // do something
            }
        }

        @PostConstruct
        public void init() {
            // do something to initialize the map
        }
    }
It works well.
Suppose there is a ConcurrentHashMap and there are two threads.
If both threads are reading some data from the same bucket, then my understanding says that both can read that bucket concurrently, as CHM does not block reading operations.
But suppose one thread is writing (put) to a bucket. Then, can a second thread simultaneously read (get) from the same bucket or will the second thread have to wait for the put operation to complete?
If it were a Hashtable, the get would have to wait until the put operation completed. But how does CHM behave in this case?
There is no need for speculation. The source code for ConcurrentHashMap is open, and anyone can read it. (This is JDK 8 build 128, the first JDK 8 release candidate.)
You should have no trouble understanding it, as it's only 6,300 lines long. :-) Actually, a good fraction of this is comments, and most of the code goes toward handling edge cases. The straightforward paths of get() and put() aren't terribly complicated and are only a few dozen lines of code.
Your understanding of read operations (get(), contains()) is correct; there is no blocking. Hashing to a bucket and searching within the bucket, if necessary, is straightforward, with no locking. Memory visibility is ensured by volatile reads. (At lines 622-623, the val and next fields of Node are volatile.) Read operations proceed concurrently with other reads and also with writes to the same bucket.
The policy for removing and replacing values is fairly straightforward in that the head of the bucket is locked while the bucket is being searched and modified. See the synchronized block at line 1117 of replaceNode. A put that adds to an existing bucket is similar; see the synchronized block at line 1027 of putVal. These operations will of course block other threads attempting to remove, replace, or add entries to this same bucket. If a value is in the midst of being replaced, a thread that is getting the value for this key will see either the old value or the new value, depending on whether the reading thread finds the node before or after the value is replaced by the writing thread.
There is a special case for putting the first element into a bucket. At lines 1018-1020, if putVal finds a bucket empty, it will create a new Node and CAS (compare-and-swap) it into place. If this succeeds, the operation is complete. If two threads are attempting to add nodes into the same bucket more-or-less simultaneously, the CAS for the first will succeed, and the CAS for the second will fail. But note that this code is within a for-loop (line 1014). The thread whose CAS has failed simply goes around the loop and retries. In fact, all the other write operations are within a loop. The general approach is that operations proceed optimistically but are checked for concurrent writers. If the optimistic attempt fails, the operation is retried and goes through a (possibly) different path based on the now updated state.
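To make the pattern concrete, here is a deliberately simplified sketch of that approach – optimistic CAS into an empty bin, otherwise lock the bin's head node, and retry on failure. It is not the JDK code (no resizing, no tree bins, naive hashing), just an illustration of the structure described above:

    import java.util.concurrent.atomic.AtomicReferenceArray;

    // Simplified sketch of the ConcurrentHashMap write pattern, NOT the real code.
    class TinyBinMap<K, V> {
        private static final class Node<K, V> {
            final K key;
            volatile V val;            // volatile, like ConcurrentHashMap.Node.val
            volatile Node<K, V> next;
            Node(K key, V val) { this.key = key; this.val = val; }
        }

        private final AtomicReferenceArray<Node<K, V>> bins = new AtomicReferenceArray<>(16);

        public void put(K key, V val) {
            int i = (key.hashCode() & 0x7fffffff) % bins.length();
            for (;;) {                               // retry loop, as in putVal
                Node<K, V> head = bins.get(i);
                if (head == null) {
                    // Empty bin: publish a new node with CAS; if another thread
                    // got there first, the CAS fails and we go around the loop.
                    if (bins.compareAndSet(i, null, new Node<>(key, val))) {
                        return;
                    }
                } else {
                    synchronized (head) {            // lock only this bin's head node
                        if (bins.get(i) != head) {
                            continue;                // bin changed under us, retry
                        }
                        for (Node<K, V> n = head; ; n = n.next) {
                            if (n.key.equals(key)) { n.val = val; return; }
                            if (n.next == null) { n.next = new Node<>(key, val); return; }
                        }
                    }
                }
            }
        }

        public V get(K key) {
            int i = (key.hashCode() & 0x7fffffff) % bins.length();
            for (Node<K, V> n = bins.get(i); n != null; n = n.next) {
                if (n.key.equals(key)) return n.val; // no locking on reads
            }
            return null;
        }
    }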
As per my knowledge, ConcurrentHashMap allows multiple readers to read concurrently without any blocking. This is achieved by partitioning the map into parts based on the concurrency level and locking only a portion of the map during updates. The default concurrency level is 16, so the map is divided into 16 parts and each part is governed by a different lock. This means 16 threads can operate on the map simultaneously, as long as they are operating on different parts of it. This gives ConcurrentHashMap high performance while keeping thread safety intact. It does come with a caveat, though: since retrievals are not synchronized with update operations like put(), remove(), putAll() or clear(), a concurrent retrieval may not reflect the most recent change to the map.
I hope this helps.
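For what it's worth, the concurrency level mentioned above can be passed to the constructor. A small hedged example (on Java 7 and earlier this controls the number of segments and therefore locks; on Java 8+ the argument is still accepted but only used as a sizing hint, since locking is per bin):

    import java.util.concurrent.ConcurrentHashMap;

    public class ConcurrencyLevelExample {
        public static void main(String[] args) {
            // initialCapacity = 64, loadFactor = 0.75, concurrencyLevel = 32.
            // On Java 7 this creates 32 segments (32 independent locks);
            // on Java 8+ it is merely a sizing hint.
            ConcurrentHashMap<String, Integer> map =
                    new ConcurrentHashMap<>(64, 0.75f, 32);

            map.put("hits", 0);
            map.merge("hits", 1, Integer::sum); // atomic read-modify-write per key
            System.out.println(map.get("hits"));
        }
    }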
This is from the JavaDocs of ConcurrentHashMap class:
"Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset"
In Hashtable, concurrent operations lock the whole collection, but in ConcurrentHashMap only one bucket is locked.
From the doc:
A hash table supporting full concurrency of retrievals and adjustable
expected concurrency for updates. This class obeys the same functional
specification as Hashtable, and includes versions of methods
corresponding to each method of Hashtable. However, even though all
operations are thread-safe, retrieval operations do not entail
locking, and there is not any support for locking the entire table in
a way that prevents all access. This class is fully interoperable with
Hashtable in programs that rely on its thread safety but not on its
synchronization details.
Retrieval operations (including get) generally do not block, so may
overlap with update operations (including put and remove). Retrievals
reflect the results of the most recently completed update operations
holding upon their onset. For aggregate operations such as putAll and
clear, concurrent retrievals may reflect insertion or removal of only
some entries. Similarly, Iterators and Enumerations return elements
reflecting the state of the hash table at some point at or since the
creation of the iterator/enumeration. They do not throw
ConcurrentModificationException. However, iterators are designed to be
used by only one thread at a time.
So, you shouldn't expect operations to synchronize exactly as they would with a Hashtable, but the same (series of) operations is thread-safe. The second highlighted sentence does not state, but in my opinion strongly suggests, what is going on here: a put in progress, i.e. not yet finished, will not block a get – the get will simply not see the changes yet.
Although I have not worked myself through the whole CHM class, this piece of documentation supports my hypothesis (taken from OpenJDK 6):
    static final class Segment<K,V> extends ReentrantLock implements Serializable {
        /*
         * Segments maintain a table of entry lists that are always
         * kept in a consistent state, so can be read (via volatile
         * reads of segments and tables) without locking. This
         * requires replicating nodes when necessary during table
         * resizing, so the old lists can be traversed by readers
         * still using old version of table.
When an update is "complete" doesn't seem to be explicitly defined; generally, as soon as the new entry is linked into the bucket's list, I guess. CHM also makes heavy use of volatile fields to ensure that threads read the most recent entries in the list.
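A small sketch (not from the original posts) of the non-blocking retrieval this describes, on Java 8+ where compute() is available: while one thread holds a bin lock inside a slow compute(), a concurrent get() on the same key still returns immediately with the old value.

    import java.util.concurrent.ConcurrentHashMap;

    public class NonBlockingGetDemo {
        public static void main(String[] args) throws InterruptedException {
            ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
            map.put("k", 1);

            Thread writer = new Thread(() -> map.compute("k", (key, old) -> {
                try {
                    Thread.sleep(2000); // simulate a slow update holding the bin lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return old + 1;
            }));
            writer.start();

            Thread.sleep(100); // give the writer time to enter compute()

            long start = System.nanoTime();
            Integer value = map.get("k"); // returns the old value without blocking
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("get() returned " + value + " after " + micros + " microseconds");

            writer.join();
            System.out.println("after the update: " + map.get("k"));
        }
    }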
I was reading about ConcurrentHashMap.
I read that it provides an Iterator that requires no synchronization and even allows the Map to be modified during iteration and thus there will be no ConcurrentModificationException.
I was wondering if this is a good thing, as during iteration I might not see an element that was put into the ConcurrentHashMap earlier, because another thread might have changed it.
Is my thinking correct? If yes, is it good or bad?
I was wondering if this is a good thing, as during iteration I might not see an element that was put into the ConcurrentHashMap earlier, because another thread might have changed it.
I don't think this should be a concern – the same statement is true if you use synchronization and the thread doing the iteration happens to grab the lock and execute its loop prior to the thread that would insert the value.
If you need some sort of coordination between your threads to ensure that some action takes place after (and only after) another action, then you still need to manage this coordination, regardless of the type of Map used.
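A small example of that weakly consistent behaviour (not from the original posts; the keys and values are made up): iterating while another thread inserts never throws ConcurrentModificationException, and the iteration may or may not include the concurrently added entries.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class WeaklyConsistentIteration {
        public static void main(String[] args) throws InterruptedException {
            ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
            for (int i = 0; i < 5; i++) {
                map.put(i, "v" + i);
            }

            // A writer thread that keeps adding entries while we iterate.
            Thread writer = new Thread(() -> {
                for (int i = 5; i < 10; i++) {
                    map.put(i, "v" + i);
                }
            });
            writer.start();

            // The iterator is weakly consistent: no ConcurrentModificationException,
            // and the concurrently added entries may or may not show up here.
            for (Map.Entry<Integer, String> e : map.entrySet()) {
                System.out.println(e.getKey() + " -> " + e.getValue());
            }

            writer.join();
        }
    }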
Usually, the ConcurrentHashMap weakly consistent iterator is sufficient. If instead you want a strongly consistent iterator, then you have a few options:
The ctrie is a hash array mapped trie that provides constant time snapshots. There is Java source code available for the data structure.
Clojure has a PersistentHashMap that you can use - this lets you iterate over a snapshot of the data.
Use a local database, e.g. HSQLDB, to store the data instead of using a ConcurrentHashMap. Use a composite primary key of key|timestamp, and when you "update" a value you instead store a new entry with the current timestamp. To get an iterator, retrieve a result set with a where timestamp < System.currentTimeMillis() clause, and iterate over the result set.
In either case you're iterating over a snapshot, so you've got a strongly consistent iterator; in the former case you run the risk of running out of memory, while the latter case is a more complex solution.
The whole point of concurrent-anything is that you acknowledge concurrent activity and don't trust that all access is serialized. With most collections, you cannot expect inter-element consistency without working for it.
If you don't care about seeing the latest data, but want a consistent (but possibly old) view of data, have a look at purely functional structures like Finger Trees.
I'm using JDK 1.4...so I don't have access to the nice concurrency stuff in 1.5+.
Consider the following class fragment:
    private Map map = Collections.EMPTY_MAP;

    public Map getMap() {
        return map;
    }

    public synchronized void updateMap(Object key, Object value) {
        Map newMap = new HashMap(map);
        newMap.put(key, value);
        map = Collections.unmodifiableMap(newMap);
    }
Is it necessary to synchronize (or volatile) the map reference given that I will only be allowed to update the map via the updateMap method (which is synchronized)?
The map object will be accessed (read) in multiple threads, especially via Iterators. Knowing that Iterators will throw an exception if the back-end map's structure is changed, I figured I would make the map unmodifiable. So as I change the structure of the map via updateMap, the existing iterators will continue working on the "old" map (which is fine for my purposes).
The side effect is, I don't have to worry about synchronizing reads. In essence, I'm going to have a much greater magnitude of reads compared to writes. The threads that are currently iterating over the map object will continue to do so and any new threads that kick off will grab the latest map. (Well, I assume it will, considering erickson's comments here - Java concurrency scenario -- do I need synchronization or not?)
Can somebody comment on whether or not this idea is a good one?
Thanks!
You should use the volatile keyword, to ensure that Threads will see the most recent Map version. Otherwise, without synchronization, there is no guarantee that other threads will ever see anything except the empty map.
Since your updateMap() is synchronized, each access to it will see the latest value for map. Thus, you won't lose any updates. This is guaranteed. However, since your getMap() is not synchronized and map is not volatile, there is no guarantee that a thread will see the latest value for map unless that thread itself was the most recent thread to update the map. Use of volatile will fix this.
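A minimal sketch of that suggestion applied to the code from the question (the enclosing class name MapHolder is made up):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public class MapHolder {
        // volatile ensures that a thread calling getMap() sees the most
        // recent reference written by updateMap().
        private volatile Map map = Collections.EMPTY_MAP;

        public Map getMap() {
            return map;
        }

        public synchronized void updateMap(Object key, Object value) {
            Map newMap = new HashMap(map);     // copy-on-write
            newMap.put(key, value);
            map = Collections.unmodifiableMap(newMap);
        }
    }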
However, you do have access to the Java 1.5 and 1.6 concurrency additions. A backport exists. I highly recommend use of the backport, as it will allow easy migration to the JDK concurrency classes when you are able to migrate to a later JDK, and it allows higher performance than your method does. (Although if updates to your map are rare, your performance should be OK.)
Technically, if you make map volatile, the assignment to map is atomic and all other instructions in that method operate on objects local to the current thread. Note, though, that this only covers visibility for readers; if more than one thread can call updateMap, keep it synchronized, or two concurrent copy-put-assign sequences could lose one of the updates.
I know this isn't part of your question, but if you are worried about concurrency and can't use Java 1.5's ConcurrentHashMap, use a Hashtable instead. It is a blocking, synchronized map implementation which handles all concurrent writes and reads.
Normally (i.e. not concurrently), putAll() cannot be more efficient than lots of calls to put(), even assuming that you exclude the cost of building the other Map that you pass to putAll(). That's because putAll() will need to iterate the passed Map's elements, as well as running through the same algorithm for adding each key/value pair that put() executes.
But for a ConcurrentHashMap, does it make sense to construct a regular Map and then use putAll() to update it? Or should I just do 10 (or 100, or 1000) calls to put()?
Does the answer change for multiple calls to putIfAbsent()?
Thanks!
The first (mostly) thread-safe Map in the Java Collections framework was the synchronized HashMap obtained via Collections.synchronizedMap(). It worked by only allowing one operation at a time. Java 5 added ConcurrentHashMap, which works differently: basically, the map is divided into segments, and a put() operation only locks the relevant segment. It also added thread-safe primitives like putIfAbsent().
The reason I explain this is that putAll() might be more or less efficient depending on how it's implemented. It may work by locking the whole map, which may actually be more efficient than trying to obtain individual locks on every put(). Or it may work by doing a bunch of put() calls anyway, in which case there's not much difference.
So if it makes sense for your code and you're doing a lot of updates at once, I would use putAll() unless it's putIfAbsent() that you're after.
Edit: I just checked, the Java 6 ConcurrentHashMap implements putAll() as a loop of put() operations so it's no better or worse than doing this yourself.
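A small sketch comparing the two approaches discussed above (names are made up); given that putAll() boils down to a loop of put() calls, they behave much the same, and neither is atomic as a whole:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class BatchPutExample {
        public static void main(String[] args) {
            ConcurrentHashMap<String, Integer> target = new ConcurrentHashMap<>();

            // Option A: build an intermediate map and hand it to putAll().
            Map<String, Integer> batch = new HashMap<>();
            for (int i = 0; i < 100; i++) {
                batch.put("key" + i, i);
            }
            target.putAll(batch);

            // Option B: call put() directly in a loop. Internally this is
            // essentially what putAll() does, so the two perform similarly;
            // only each individual put() is atomic, not the batch as a whole.
            for (int i = 100; i < 200; i++) {
                target.put("key" + i, i);
            }

            // putIfAbsent() is different: it only writes when the key is missing,
            // and that check-and-insert is atomic per key.
            target.putIfAbsent("key0", -1); // no effect, "key0" already present

            System.out.println(target.size()); // 200
        }
    }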
putAll() just delegates to put(). There's no need to build up an intermediate map if you don't have it already. You can see this in the source code, and it doesn't matter what Java implementation you're using, since the code is public domain and shared by all of them.
Note that putAll() is not atomic, but merely has the guarantee that each individual put() is atomic.