I am using a ConcurrentHashMap, and when calculating a new element that is not yet present I need to iterate over all of the map's elements and possibly make other modifications to the same map.
I want those operations to be atomic, and to block the ConcurrentHashMap so that I don't get an exception caused by concurrent access.
The solution I programmed was to synchronize on the ConcurrentHashMap object itself as the lock, but Sonar reports a major issue, so I do not know whether that solution is correct.
Proposed code:
public class MyClass<K, V> {
    ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();

    public V get(K key) {
        return map.computeIfAbsent(key, this::calculateNewElement);
    }

    protected V calculateNewElement(K key) {
        V result = null;
        // the following line triggers the Sonar issue:
        synchronized (map) {
            // calculation of the new element (assigning it to result)
            // with iterations over the whole map
            // and possibly other modifications to the same map
        }
        return result;
    }
}
This code triggers a Sonar major issue:
Multi-threading - Synchronization performed on util.concurrent
instance
findbugs:JLM_JSR166_UTILCONCURRENT_MONITORENTER
This method performs synchronization on an object that is an instance
of a class from the java.util.concurrent package (or its subclasses).
Instances of these classes have their own concurrency control
mechanisms that are orthogonal to the synchronization provided by the
Java keyword synchronized. For example, synchronizing on an
AtomicBoolean will not prevent other threads from modifying the
AtomicBoolean.
Such code may be correct, but should be carefully reviewed and
documented, and may confuse people who have to maintain the code at a
later date.
If you have to change many nodes for each update, maybe you're using the wrong data structure. Check out concurrent implementations of trees. A persistent collection (that provides immutability plus fast updates) would seem ideal.
There is a method provided for atomic updates:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/concurrent/ConcurrentHashMap.html#compute(K,java.util.function.BiFunction)
ConcurrentHashMap is built to allow a high degree of concurrent access. (See this article describing its inner workings) If an update is made to an entry using the provided means (such as compute, computeIfPresent, etc.) that should lock only the segment the entry is in, not the whole thing.
When you lock the whole map for an update you're not getting the benefit of using this specialized data structure. That's what Sonar is complaining about.
There is also the issue that readers would have to do the same locking; updater threads aren't the only ones that need to lock. This kind of thing is why ConcurrentHashMap was invented in the first place.
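To illustrate, a small sketch (the key and update logic here are placeholders, not the asker's code) of a per-key atomic update via compute, which avoids locking the whole map:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ComputeDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        // compute runs the remapping function atomically for this key,
        // locking only the bin the key hashes to, not the whole map
        counts.compute("word", (k, v) -> v == null ? 1 : v + 1);
        counts.compute("word", (k, v) -> v == null ? 1 : v + 1);
        System.out.println(counts.get("word")); // prints 2
    }
}
```

Concurrent callers of compute on different keys proceed in parallel; only callers hitting the same bin contend.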
https://www.amazon.com.tr/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601.
"Concurrent objects do not support client-side locking."
You can, however, perform client-side locking on synchronized collections, for example:
final List<Type> synchronizedList = Collections.synchronizedList(new ArrayList<>());
// do not keep another reference to the internal ArrayList;
// access the list only through the synchronizedList reference
In this case, you could use;
synchronized(synchronizedList){
//do something with synchronized list.
}
NOTE: This might perform badly, i.e. introduce scalability issues, because the code is highly serialized (Amdahl's Law).
Concurrent objects are designed for scalability. Maybe you can take a snapshot of the map into another, "local" collection and operate on that. Or you can use the map directly without any synchronization. (In that case, elements could be added or deleted concurrently, and your iterator might or might not reflect those changes.)
"ConcurrentHashMap, along with the other concurrent collections, further improve on the synchronized collection classes by providing iterators that do not throw ConcurrentModificationException, thus eliminating the need to lock the collection during iteration. The iterators returned by ConcurrentHashMap are weakly consistent instead of fail-fast. A weakly consistent iterator can tolerate concurrent modification, traverses elements as they existed when the iterator was constructed, and may (but is not guaranteed to) reflect modifications to the collection after the construction of the iterator."
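A minimal sketch of that weak consistency (keys and values here are illustrative): the map is structurally modified during iteration, and no ConcurrentModificationException is thrown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        // A HashMap iterator would throw ConcurrentModificationException here;
        // the ConcurrentHashMap iterator tolerates the modification.
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            map.put("c", 3); // structural modification during iteration
        }
        System.out.println(map.size()); // prints 3
    }
}
```

Whether the iteration itself sees the entry for "c" is not guaranteed either way; that is exactly the "weakly consistent" behaviour the quote describes.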
There are operations on ConcurrentHashMap that allow you to perform atomic operations on a specific key, such as compute, computeIfAbsent and computeIfPresent.
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html
You could replace your combination of synchronization and access to the map with a "regular" collection and the use of a ReadWriteLock (e.g. the java.util.concurrent.locks.ReentrantReadWriteLock)
See this part of the description of the java.util.concurrent package:
The "Concurrent" prefix used with some classes in this package is a shorthand indicating several differences from similar "synchronized" classes. For example java.util.Hashtable and Collections.synchronizedMap(new HashMap()) are synchronized. But ConcurrentHashMap is "concurrent". A concurrent collection is thread-safe, but not governed by a single exclusion lock. In the particular case of ConcurrentHashMap, it safely permits any number of concurrent reads as well as a large number of concurrent writes. "Synchronized" classes can be useful when you need to prevent all access to a collection via a single lock, at the expense of poorer scalability. In other cases in which multiple threads are expected to access a common collection, "concurrent" versions are normally preferable. And unsynchronized collections are preferable when either collections are unshared, or are accessible only when holding other locks.
From the docs of ReadWriteLock:
A ReadWriteLock maintains a pair of associated locks, one for read-only operations and one for writing. The read lock may be held simultaneously by multiple reader threads, so long as there are no writers. The write lock is exclusive.
The "reentrant" implementation mimics the behaviour of a synchronized block:
(from the docs of ReentrantLock)
A reentrant mutual exclusion Lock with the same basic behavior and semantics as the implicit monitor lock accessed using synchronized methods and statements, but with extended capabilities.
Your code for it could look like this:
public class MyClass<K, V> {
private final Map<K, V> map = new HashMap<>();
private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
private final Lock readLock = lock.readLock();
private final Lock writeLock = lock.writeLock();
public V get(K key) {
    readLock.lock();
    try {
        V value = map.get(key);
        if (value != null) {
            return value;
        }
    } finally {
        readLock.unlock();
    }
    // a read lock cannot be upgraded to a write lock, so the read lock is
    // released before calculateNewElement acquires the write lock
    return calculateNewElement(key);
}

protected V calculateNewElement(K key) {
    writeLock.lock();
    try {
        // re-check: another thread may have inserted the value between
        // releasing the read lock and acquiring the write lock
        V result = map.get(key);
        if (result == null) {
            // calculation of the new element (assigning it to result)
            // with iterations over the whole map
            // and possibly other modifications to the same map
            map.put(key, result);
        }
        return result;
    } finally {
        writeLock.unlock();
    }
}
public V put(K key, V value) {
writeLock.lock();
try {
return map.put(key, value);
} finally {
writeLock.unlock();
}
}
}
With this implementation reads are blocked while a write is happening and vice versa. Multiple reads are still possible at the same time, but write is exclusive.
But you have to take care that the map doesn't "escape" the object and is accessed somehow differently - also inside the class you have to protect all the access to the map with the lock.
The JavaDocs of ReentrantReadWriteLock provide examples and some conditions you should be aware of (e.g. the maximum lock hold count).
Thanks to all answers, I could finally program a solution.
public class MyClass<K, V> {
ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    public V get(K key) {
        V result = map.get(key);
        if (result == null) {
            result = calculateNewElement(key);
        }
        return result;
    }

    public synchronized void put(K key, V value) {
        map.put(key, value);
    }

    protected synchronized V calculateNewElement(K key) {
        // re-check under the lock: another thread may have stored the value already
        V result = map.get(key);
        if (result == null) {
            // calculation of the new element (assigning it to result)
            // with iterations over the whole map
            // and possibly other modifications to the same map
            put(key, result);
        }
        return result;
    }
}
I will describe the problem a little more.
Description of the particular problem that the solution tries to solve:
K has two attributes of type Class<?> (let's call them originClass and destinationClass),
and V is a translator from a Pojo with an originClass (or one of its superclasses) to a Pojo with a destinationClass (or one of its derived classes), so it fits the translation originClass -> destinationClass.
When no V is found for a particular K,
the calculateNewElement function tries to find an indirect path between originClass and destinationClass.
(That means we might have a key K0 with our originClass and a destinationClass(0),
and another key K1 whose originClass(1) equals destinationClass(0) and whose destinationClass(1) is a derived class of our destinationClass.)
That could lead to a new V that is a combination of the keys:
K0 ( originClass(0) = originClass, destinationClass(0) ) (V0) -->
K1 ( originClass(1) = destinationClass(0), destinationClass(1) (a derived class of destinationClass) ) (V1) -->
K2 ( originClass(2) = destinationClass(1), destinationClass(2) = destinationClass ) (direct translation that does not exist in the map)
We could think of joining K1 and K2 into a new_K, this way:
put( new_K(originClass(1), destinationClass), V1 ) // this is the new key put inside calculateNewElement
Then the new V (for our original K) created by calculateNewElement would be a combination of K0 and new_K:
V v = new VCombination(K0, new_K), which will also be put by our calculateNewElement function.
As in my case the put function is called seldom (only during initialization), the synchronization is acceptable.
This scenario fits within the limitations that Holger mentioned below, which in my case do not apply, due to the nature of the particular problem:
Only when a "halfway" element does not exist yet will the calculateNewElement function put it in the map.
The new elements (which are combinations of existing ones) only require that the elements of the combination exist in the map, so removing elements from the map is not allowed (only clearing it entirely would be acceptable).
Related
1.
I have multiple threads updating a ConcurrentHashMap.
Each thread appends a list of integers to the Value of a map entry based on the key.
There is no removal operation by any thread.
The main point here is that I want to minimize the scope of locking and synchronization as much as possible.
I saw that the doc of computeIf...() method says "Some attempted update operations on this map by other threads may be blocked while computation is in progress", which is not so encouraging. On the other hand, when I look into its source code, I failed to observe where it locks/synchronizes on the entire map.
Therefore, I wonder about the comparison of theoretical performance of using computeIf...() and the following home-grown 'method 2'.
2.
Also, I feel that the problem I described here is perhaps one of the most simplified check-n-set (or generally a 'compound') operation you can carry out on ConcurrentHashMap.
Yet I'm not quite confident and can't quite find much guideline about how to do even this kind of simple compound operations on ConcurrentHashMap, without Locking/Synchronizing on the entire map.
So any general good practice advice for this will be much appreciated.
public void myConcurrentHashMapTest1() {
    ConcurrentHashMap<String, List<Integer>> myMap = new ConcurrentHashMap<String, List<Integer>>();
    // MAP KEY: a Word found by a thread on a page of a book
    String myKey = "word1";
    // -- Method 1:
    // Step 1.1 first, try to use computeIfPresent(). doc says it may lock the
    // entire myMap.
    myMap.computeIfPresent(myKey, (key, val) -> { val.addAll(getMyVals()); return val; });
    // Step 1.2 then use computeIfAbsent(). Again, doc says it may lock the
    // entire myMap.
    myMap.computeIfAbsent(myKey, key -> getMyVals());
}
public void myConcurrentHashMapTest2() {
    // same map and key as above (declared here so the snippet compiles)
    ConcurrentHashMap<String, List<Integer>> myMap = new ConcurrentHashMap<String, List<Integer>>();
    String myKey = "word1";
    // -- Method 2: home-grown lock splitting (kind of). Will it theoretically
    // perform better?
    // Step 2.1: TRY to directly put an empty list for the key.
    // This may have no effect if the key is already present in the map.
    List<Integer> myEmptyList = new ArrayList<Integer>();
    myMap.putIfAbsent(myKey, myEmptyList);
    // Step 2.2: By now, we should have the key present in the map.
    // ASSUMPTION: no thread does removal.
    List<Integer> listInMap = myMap.get(myKey);
    // Step 2.3: Synchronize on that list, append all the values.
    synchronized (listInMap) {
        listInMap.addAll(getMyVals());
    }
}
public List<Integer> getMyVals(){
// MAP VALUE: e.g. Page Indices where word is found (by a thread)
List<Integer> myValList = new ArrayList<Integer>();
myValList.add(1);
myValList.add(2);
return myValList;
}
You're basing your assumption (that using ConcurrentHashMap as intended will be too slow for you) on a misinterpretation of the Javadoc. The Javadoc doesn't state that the whole map will be locked. It also doesn't state that each computeIfAbsent() operation performs pessimistic locking.
What could actually be locked is a bin (a.k.a. bucket) which corresponds to a single element in the internal array backing of the ConcurrentHashMap. Note that this is not Java 7's map segment containing multiple buckets. When such a bin is locked, potentially blocked operations are solely updates for keys that hash to the same bin.
On the other hand, your solution doesn't mean that all internal locking within ConcurrentHashMap is avoided - computeIfAbsent() is just one of the methods that can degrade to using a synchronized block while updating. Even the putIfAbsent() with which you're initially putting an empty list for some key, can block if it doesn't hit an empty bin.
What's worse though is that your solution doesn't guarantee the visibility of your synchronized bulk updates. You are guaranteed that a get() happens-before a putIfAbsent() which value it observes, but there's no happens-before between your bulk updates and a subsequent get().
P.S. You can read further about the locking in ConcurrentHashMap in its OpenJDK implementation: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/concurrent/ConcurrentHashMap.java, lines 313-352.
As already explained by Dimitar Dimitrov, a compute… method doesn’t generally lock the entire map. In the best case, i.e. there’s no need to increase the capacity and there’s no hash collision, only the mapping for the single key is locked.
However, there are still things you can do better:
generally, avoid performing multiple lookups. This applies to both variants, using computeIfPresent followed by computeIfAbsent, as well as using putIfAbsent followed by get
it’s still recommended to minimize the code executed when holding a lock, i.e. don’t invoke getMyVals() while holding the lock as it doesn’t depend on the map’s state
Putting it together, the update should look like:
// compute without holding a lock
List<Integer> toAdd = getMyVals();
// update the map
myMap.compute(myKey, (key, val) -> {
    if (val == null) val = toAdd; else val.addAll(toAdd);
    return val;
});
or
// compute without holding a lock
List<Integer> toAdd=getMyVals();
// update the map
myMap.merge(myKey, toAdd, (a,b) -> { a.addAll(b); return a; });
which can be simplified to
myMap.merge(myKey, getMyVals(), (a,b) -> { a.addAll(b); return a; });
Back to concurrency. By now it is clear that for the double checked locking to work the variable needs to be declared as volatile. But then what if double checked locking is used as below.
class Test<A, B> {
private final Map<A, B> map = new HashMap<>();
public B fetch(A key, Function<A, B> loader) {
B value = map.get(key);
if (value == null) {
synchronized (this) {
value = map.get(key);
if (value == null) {
value = loader.apply(key);
map.put(key, value);
}
}
}
return value;
}
}
Why does it really have to be a ConcurrentHashMap and not a regular HashMap? All map modification is done within the synchronized block and the code doesn't use iterators so technically there should be no "concurrent modification" problems.
Please avoid suggesting the use of putIfAbsent/computeIfAbsent as I am asking about the concept and not the use of API :) unless using this API contributes to HashMap vs ConcurrentHashMap subject.
Update 2016-12-30
This question was answered by a comment below by Holger "HashMap.get doesn’t modify the structure, but your invocation of put does. Since there is an invocation of get outside of the synchronized block, it can see an incomplete state of a put operation happening concurrently." Thanks!
This question is muddled on so many counts that its hard to answer.
If this code is only ever called from a single thread, then you're making it too complicated; you don't need any synchronization. But clearly that's not your intention.
So, multiple threads will call the fetch method, which delegates to HashMap.get() without any synchronization. HashMap is not thread-safe. Bam, end of story. Doesn't even matter if you're trying to simulate double-checked locking; the reality is that calling get() and put() on a map will manipulate the internal mutable data structures of the HashMap, without consistent synchronization on all code paths, and since you can be calling these concurrently from multiple threads, you're already dead.
(Also, you probably think that HashMap.get() is a pure read operation, but that's wrong too. What if the HashMap is actually a LinkedHashMap (which is a subclass of HashMap.) LinkedHashMap.get() will update the access order, which involves writing to internal data structures -- here, concurrently without synchronization. But even if get() is doing no writing, your code here is still broken.)
Rule of thumb: when you think you have a clever trick that lets you avoid synchronizing, you're almost certainly wrong.
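For contrast, here is a minimal sketch of a variant that is at least memory-safe (my adaptation, not the asker's code): backing the same double-check structure with a ConcurrentHashMap makes the unsynchronized first get safe, while the synchronized block still ensures the loader runs at most once per key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class SafeFetch<A, B> {
    private final Map<A, B> map = new ConcurrentHashMap<>();

    public B fetch(A key, Function<A, B> loader) {
        B value = map.get(key); // safe unsynchronized read: the map is concurrent
        if (value == null) {
            synchronized (this) {
                value = map.get(key); // re-check under the lock
                if (value == null) {
                    value = loader.apply(key);
                    map.put(key, value);
                }
            }
        }
        return value;
    }
}
```

With a counting loader, calling fetch twice for the same key invokes the loader only once; the second call is served from the map.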
Is the code below thread/concurrency safe when there are multiple threads calling the totalBadRecords() method from inside other method? Both map objects parameters to this method are ConcurrentHashMap. I want to ensure that each call updates the total properly.
If it is not safe, please explain what do I have to do to ensure thread safety.
Do I need to synchronize the add/put or is there a better way?
Do i need to synchronize the get method in TestVO. TestVO is simple java bean and having getter/setter method.
Below is my Sample code:
public void totalBadRecords(final Map<Integer, TestVO> sourceMap,
final Map<String, String> logMap) {
BigDecimal badCharges = new BigDecimal(0);
boolean badRecordsFound = false;
for (Entry<Integer, TestVO> e : sourceMap.entrySet()) {
    if ("Y".equals(e.getValue().getInd())) {
        badCharges = badCharges.add(e.getValue().getAmount());
        badRecordsFound = true;
    }
}
if (badRecordsFound)
logMap.put("badRecordsFound:", badCharges.toPlainString());
}
That depends on how your objects are used in your whole application.
If each call to totalBadRecords takes a different sourceMap and the map (and its content) is not mutated while counting, it's thread-safe:
badCharges is a local variable; it can't be shared between threads, and is thus thread-safe (no need to synchronize add).
logMap can be shared between calls to totalBadRecords: the put method of ConcurrentHashMap is already synchronized (or behaves as if it were).
if instances of TestVO are not mutated, the values from getValue() and getInd() are always coherent with one another.
the sourceMap is not mutated, so you can iterate over it.
Actually, in this case, you don't need a concurrent map for sourceMap. You could even make it immutable.
If the instances of TestVO and the sourceMap can change while counting, then of course you could be counting wrongly.
It depends on what you mean by thread-safe. And that boils down to what the requirements for this method are.
At the data structure level, the method will not corrupt any data structures, because the only data structures that could be shared with other threads are ConcurrentHashMap instances, and they safe against that kind of problem.
The potential thread-safety issue is that iterating a ConcurrentHashMap is not an atomic operation. The guarantees for the iterators are such that you are not guaranteed to see all entries in the iteration if the map is updated (e.g. by another thread) while you are iterating. That means that the totalBadRecords method may not give an accurate count if some other thread modifies the map during the call. Whether this is a real thread-safety issue depends on whether or not the totalBadRecords is required to give an accurate result in that circumstance.
If you need to get an accurate count, then you have to (somehow) lock out updates to the sourceMap while making the totalBadRecords call. AFAIK, there is no way to do this using (just) the ConcurrentHashMap API, and I can't think of a way to do it that doesn't make the map a concurrency bottleneck.
In fact, if you need to calculate accurate counts, you have to use external locking for (at least) the counting operation, and all operations that could change the outcome of the counting. And even that doesn't deal with the possibility that some thread may modify one of the TestVO objects while you are counting records, and cause the TestVO to change from "good" to "bad" or vice-versa.
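As a sketch of such external locking (the class and method names are hypothetical, not the asker's code): guard the map with a ReadWriteLock, route every mutation through the write lock, and hold the read lock for the whole counting pass so the total is exact.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedRecords {
    private final Map<Integer, BigDecimal> source = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // every mutation must go through the write lock
    void put(Integer key, BigDecimal amount) {
        lock.writeLock().lock();
        try {
            source.put(key, amount);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // holding the read lock for the whole pass yields an exact total,
    // at the cost of blocking writers while counting
    BigDecimal total() {
        lock.readLock().lock();
        try {
            BigDecimal sum = BigDecimal.ZERO;
            for (BigDecimal v : source.values()) {
                sum = sum.add(v);
            }
            return sum;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

This is exactly the concurrency bottleneck mentioned above: while total() runs, all writers block, which is the price of an accurate snapshot.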
You could use something like the following.
That would guarantee you that after a call to the totalBadRecords method, the String representing the bad charges in the logMap is accurate, you don't have lost updates. Of course a phantom read can always happen, as you do not lock the sourceMap.
private static final String BAD_RECORDS_KEY = "badRecordsFound:";
public void totalBadRecords(final ConcurrentMap<Integer, TestVO> sourceMap,
final ConcurrentMap<String, String> logMap) {
while (true) {
// get the old value that is going to be replaced.
String oldValue = logMap.get(BAD_RECORDS_KEY);
// calculate new value
BigDecimal badCharges = BigDecimal.ZERO;
for (TestVO e : sourceMap.values()) {
if ("Y".equals(e.getInd()))
badCharges = badCharges.add(e.getAmount());
}
final String newValue = badCharges.toPlainString();
// insert into map if there was no mapping before
if (oldValue == null) {
oldValue = logMap.putIfAbsent(BAD_RECORDS_KEY, newValue);
if (oldValue == null) {
oldValue = newValue;
}
}
// replace the entry in the map
if (logMap.replace(BAD_RECORDS_KEY, oldValue, newValue)) {
// update succeeded -> there where no updates to the logMap while calculating the bad charges.
break;
}
}
}
To write a fully functional pool of Java objects, using READ/WRITE locks is not a big problem.
The problem I see is that a READ operation has to wait until the storage monitor (or something similar, depending on the model) is released, which really slows it down.
So, the following requirements should be met:
READ (or GET) operation should be INSTANT - using some key, the latest version of the object should be returned immediately, without waiting for any lock.
WRITE (CREATE/UPDATE) - may be queued, reasonably delayed in time, probably waiting for some storage lock.
Any code sample?
I didn't find a question that directly targets the issue.
It popped up in some discussions, but I couldn't find a question that was fully devoted to the problems of creating such a pool in Java.
When the modification of the data structure takes too long (for whatever reason), simply waiting and write-locking the structure will not be successful. You just cannot foresee when you will have enough time to perform the modification without blocking any reads.
The only thing you can do (or try to do) is reduce the time spent within the write operation to a minimum. As @assylias stated, a CopyOnWrite* collection does this by cloning the data structure upon write operations and atomically activating the modified structure when the operation is complete.
With this, the read locks take only as long as the duration of the clone operation plus the time for switching the reference. You can work that down to small parts of the data structure: if only the state of one object changes, you can modify a copy of that object and afterwards change the reference in your more complex data structure to that copy.
The other way around is to do that copy on or before read operations. Often you return a copy of an object via the API of your data structure anyway, so just "cache" that copy and, during modifications, let the readers access the cached copy. This is what database caches do, among other things.
It depends on your model what is best for you. If you have few writes on data that can be copied easily, CopyOnWrite will probably perform best. If you have lots of writes, you are probably better off providing a single "read"/cached state of your structure and switching it from time to time.
AtomicReference<Some> datastructure = ...;

// copy on write
synchronized /* one writer */ void change(Object modification)
        throws CloneNotSupportedException {
    Some copy = datastructure.get().clone(); // clone the structure itself, not the AtomicReference
    apply(copy, modification);
    datastructure.set(copy); // atomically publish the modified copy
}

Object search(Object select) {
    return datastructure.get().search(select);
}

// copy for read
AtomicReference<Some> cached = new AtomicReference<Some>(datastructure.get().clone());

synchronized void change(Object modification) throws CloneNotSupportedException {
    apply(datastructure.get(), modification); // modify the master structure in place
    cached.set(datastructure.get().clone()); // publish a fresh snapshot for readers
}

Object search(Object select) {
    return cached.get().search(select);
}
With both variants there is no waiting when reading, apart from the time it takes to switch the reference.
In this case you can simply use a volatile variable to avoid locking on the reader side and keep the writes exclusive with a synchronized method. volatile adds little to no overhead to reads, but writes will be a little slower. This might be a good solution depending on the expected throughput and read/write ratio.
class Cache<K, V> {
    private volatile Map<K, V> cache; // assuming a map is the right data structure

    public V get(K key) {
        return cache.get(key);
    }

    // synchronized writes for exclusive access
    public synchronized V put(K key, V value) {
        Map<K, V> copy = new HashMap<>(cache);
        V previous = copy.put(key, value);
        // volatile guarantees that the new map will be visible from the getter
        cache = copy;
        return previous;
    }
}
Here is a totally-lock-free Java object pool solution. FYI
http://daviddengcn.blogspot.com/2015/02/a-lock-free-java-object-pool.html
I'm reading the docs for java.util.HashMap, and it says:
If multiple threads access this map concurrently, and at least one of
the threads modifies the map structurally, it must be synchronized
externally.
What does "it" mean? "it" could be interpreted to mean the thread that modifies the map, or it could mean the map itself.
Both the "safe for multiple threads reading" case and the "safe only on a single thread when there is a writer" case are no-brainers (at least to me). The fact that the documentation calls out the "multiple readers and single writer" case specifically makes me believe the statement should be interpreted as "safe to have multiple threads reading and a single thread writing", rather than the no-brainer "lock everything when you have a writer".
More so, the hashtable implementation in .Net is (unambiguously) documented as:
Hashtable is thread safe for use by multiple reader threads and a single writing thread
(the .Net classes are not thread safe by default), so there must be something to the "multiple reader threads and one single writer thread" case.
The internal elements are in an indeterminate state when a thread "modifies the map structurally" so reads could be affected. Thus the requirement to use some method external to the map to synchronize both reads and writes.
Perhaps the writers of the .Net library were more careful to keep their internal structure in a determinate state during updates.
The map itself. HashMap is not thread safe.
Look at ConcurrentHashMap, it is a thread-safe map.
You can also manage it yourself. The code could look like this:
class SomeClass {
    private Map<Object, Object> map = new HashMap<Object, Object>();

    public synchronized void put(Object key, Object value) {
        map.put(key, value);
    }

    public synchronized Object get(Object key) {
        return map.get(key);
    }
}
Safer still, return a copy of the value object to avoid unexpected behavior:
public synchronized ValueType get(Object key) {
    return ((ValueType) map.get(key)).clone(); // assumes ValueType implements Cloneable with a public clone()
    // of course, you can return a copy in any way you like
}
This allows only the put method to modify the map, and all operations will be thread-safe.