minimizing lock scope for JDK8 ConcurrentHashMap check-and-set operation - java

1.
I have multiple threads updating a ConcurrentHashMap.
Each thread appends a list of integers to the Value of a map entry based on the key.
There is no removal operation by any thread.
The main point here is that I want to minimize the scope of locking and synchronization as much as possible.
I saw that the doc of the computeIf...() methods says "Some attempted update operations on this map by other threads may be blocked while computation is in progress", which is not very encouraging. On the other hand, when I looked into the source code, I could not find where it locks/synchronizes on the entire map.
Therefore, I wonder how the theoretical performance of computeIf...() compares with that of the home-grown 'method 2' below.
2.
Also, I feel that the problem I described here is perhaps one of the simplest check-and-set (or, more generally, 'compound') operations you can carry out on a ConcurrentHashMap.
Yet I'm not quite confident, and I can't find much guidance on how to perform even this kind of simple compound operation on a ConcurrentHashMap without locking/synchronizing on the entire map.
So any general good practice advice for this will be much appreciated.
// Shared state (hoisted so that both test methods below compile against it):
private final ConcurrentHashMap<String, List<Integer>> myMap =
        new ConcurrentHashMap<String, List<Integer>>();
// MAP KEY: a word found by a thread on a page of a book
private final String myKey = "word1";

public void myConcurrentHashMapTest1() {
    // -- Method 1:
    // Step 1.1: first, try computeIfPresent(). doc says it may lock the
    // entire myMap. (addAll() returns a boolean, so the remapping function
    // must return the list explicitly.)
    myMap.computeIfPresent(myKey, (key, val) -> {
        val.addAll(getMyVals());
        return val;
    });
    // Step 1.2: then use computeIfAbsent(). Again, doc says it may lock the
    // entire myMap.
    myMap.computeIfAbsent(myKey, key -> getMyVals());
}
public void myConcurrentHashMapTest2() {
    // -- Method 2: home-grown lock splitting (kind of). Will it theoretically
    // perform better?
    // Step 2.1: TRY to directly put an empty list for the key.
    // This may have no effect if the key is already present in the map.
    List<Integer> myEmptyList = new ArrayList<Integer>();
    myMap.putIfAbsent(myKey, myEmptyList);
    // Step 2.2: By now, we should have the key present in the map.
    // ASSUMPTION: no thread does removal.
    List<Integer> listInMap = myMap.get(myKey);
    // Step 2.3: Synchronize on that list, append all the values.
    synchronized (listInMap) {
        listInMap.addAll(getMyVals());
    }
}
public List<Integer> getMyVals() {
    // MAP VALUE: e.g. page indices where the word is found (by a thread)
    List<Integer> myValList = new ArrayList<Integer>();
    myValList.add(1);
    myValList.add(2);
    return myValList;
}

You're basing your assumption (that using ConcurrentHashMap as intended will be too slow for you) on a misinterpretation of the Javadoc. The Javadoc doesn't state that the whole map will be locked. It also doesn't state that each computeIfAbsent() operation performs pessimistic locking.
What could actually be locked is a bin (a.k.a. bucket), which corresponds to a single element in the internal array backing the ConcurrentHashMap. Note that this is not Java 7's map segment, which contained multiple buckets. When such a bin is locked, the operations that can potentially be blocked are solely updates for keys that hash to the same bin.
On the other hand, your solution doesn't mean that all internal locking within ConcurrentHashMap is avoided: computeIfAbsent() is just one of the methods that can degrade to using a synchronized block while updating. Even the putIfAbsent() with which you're initially putting an empty list for some key can block if it doesn't hit an empty bin.
What's worse though is that your solution doesn't guarantee the visibility of your synchronized bulk updates. You are guaranteed that a get() happens-before a putIfAbsent() which value it observes, but there's no happens-before between your bulk updates and a subsequent get().
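If you wanted to keep the two-step structure anyway, visibility can be restored by having every reader synchronize on the same per-key list; a minimal sketch under that assumption:
List<Integer> listInMap = myMap.get(myKey);
List<Integer> snapshot;
synchronized (listInMap) { // same lock the writers hold for addAll()
    snapshot = new ArrayList<>(listInMap); // defensive copy; re-establishes happens-before
}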
P.S. You can read further about the locking in ConcurrentHashMap in its OpenJDK implementation: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/concurrent/ConcurrentHashMap.java, lines 313-352.

As already explained by Dimitar Dimitrov, a compute… method doesn’t generally lock the entire map. In the best case, i.e. there’s no need to increase the capacity and there’s no hash collision, only the mapping for the single key is locked.
However, there are still things you can do better:
generally, avoid performing multiple lookups; this applies to both variants, computeIfPresent followed by computeIfAbsent as well as putIfAbsent followed by get
it's still recommended to minimize the code executed while holding a lock, i.e. don't invoke getMyVals() while holding the lock, as it doesn't depend on the map's state
Putting it together, the update should look like:
// compute without holding a lock
List<Integer> toAdd = getMyVals();
// update the map
myMap.compute(myKey, (key, val) -> {
    if (val == null) val = toAdd;
    else val.addAll(toAdd);
    return val;
});
or
// compute without holding a lock
List<Integer> toAdd = getMyVals();
// update the map
myMap.merge(myKey, toAdd, (a, b) -> { a.addAll(b); return a; });
which can be simplified to
myMap.merge(myKey, getMyVals(), (a, b) -> { a.addAll(b); return a; });

Related

How to prevent threads from reading inconsistent data from ConcurrentHashMap?

I'm using a ConcurrentHashMap<String, String> that works as a cache, where read operations check whether an element is already in the cache and write operations add an element to the cache.
So, my question is: what are the best practices to always read the most recent ConcurrentHashMap values?
I want to ensure data consistency and not have cases like:
With the map.get("key") method, the first thread validates that this key does not yet exist in the map, then it does map.put("key", "value")
The second thread reads the data before the first thread puts the element into the map, leading to inconsistent data.
Code example:
Optional<String> cacheValue = Optional.ofNullable(cachedMap.get("key"));
if (cacheValue.isPresent()) {
    // Perform actions
} else {
    cachedMap.putIfAbsent("key", "value");
    // Perform actions
}
How can I ensure that my ConcurrentHashMap is synchronized and doesn't retrieve inconsistent data?
Should I perform these map operations inside a synchronized block?
You probably need to do it this way:
if (cachedMap.putIfAbsent("key", "value") == null) {
    // Perform actions "IS NOT PRESENT"
} else {
    // Perform actions "IS PRESENT"
}
Doing it in two checks is obviously not atomic, so if you're having problems with the wrong values getting put in the cache, then that's likely your problem.
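If the value is expensive to build, computeIfAbsent gives the same single-call atomicity while constructing the value only when the key is absent; a minimal sketch (computeValue is a hypothetical factory method):
String value = cachedMap.computeIfAbsent("key", k -> computeValue(k)); // computeValue is hypothetical
// The mapping function runs atomically, at most once per absent key, so no
// thread can observe the key while the value is still being computed.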
what are the best practices to always read the most recent ConcurrentHashMap values?
Oracle's Javadoc for ConcurrentHashMap says, "Retrievals reflect the results of the most recently completed update operations holding upon their onset." In other words, any time you call map.get(...) or any other method on the map, you are always working with the "most recent" content.
*BUT*
Is that enough? Maybe not. If your program threads expect any kind of consistency between two or more keys in the map, or if your threads expect any kind of consistency between something that is stored in the map and something that is stored elsewhere, then you are going to need to provide some explicit higher-level synchronization between the threads.
I can't provide an example that would be specific to the problem that's puzzling you because your question doesn't really say what that problem is.
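As a generic illustration, though: suppose your threads require two keys, "a" and "b" (hypothetical names), to always hold the same version. The map's per-call atomicity cannot enforce that, so both writers and readers share a higher-level lock; a minimal sketch:
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

class VersionedPair {
    private final Object lock = new Object(); // guards the cross-key invariant
    private final Map<String, String> map = new ConcurrentHashMap<>();

    void updateBoth(String version) {
        synchronized (lock) { // spans both writes, so they are published together
            map.put("a", version);
            map.put("b", version);
        }
    }

    boolean readConsistent() {
        synchronized (lock) { // readers take the same lock: both writes or neither
            return Objects.equals(map.get("a"), map.get("b"));
        }
    }
}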

Thread safety in Set obtained from a cache

I stumbled upon the following piece of code:
public static final Map<String, Set<String>> fooCacheMap = new ConcurrentHashMap<>();
This cache is accessed from a REST controller method:
public void fooMethod(String fooId) {
    Set<String> fooSet = fooCacheMap.computeIfAbsent(fooId, k -> new ConcurrentSet<>());
    // operations with fooSet
}
Is the ConcurrentSet really necessary, when I know for sure that the set is accessed only in this method?
Since you use it in a controller, multiple threads can call your method simultaneously (e.g. multiple parallel requests can hit your method).
As this method is not synchronized in any way, a ConcurrentSet is probably necessary here.
Is ConcurrentSet really necessary?
Possibly, possibly not. We don't know how this code is being used.
However, assuming that it is being used in a multithreaded way (specifically: that two threads can invoke fooMethod concurrently), yes.
The atomicity in ConcurrentHashMap is only guaranteed for each invocation of computeIfAbsent. Once it completes, the lock is released, and other threads are able to invoke the method. As such, access to the returned value is not atomic, and so you can get thread interference when accessing that value.
In terms of the question "do I need ConcurrentSet?": no, you can make the accesses to the set atomic instead:
fooCacheMap.compute(fooId, (k, fooSet) -> {
    if (fooSet == null) fooSet = new HashSet<>();
    // Operations with fooSet
    return fooSet; // return the set itself, not an undefined variable
});
Using a concurrent map alone will not guarantee thread safety for a compound check-then-add sequence: a separate containsKey() followed by a put() can still race. Either perform the addition through one of the map's atomic methods, such as putIfAbsent or computeIfAbsent, or guard the check and the put together in a synchronized block so that two threads don't attempt to add the same key. Note that the Map being static and final does not make its contents thread-safe. Furthermore, if the code modifies the Set inside the Map, which appears likely, that access needs to be made thread-safe as well.
A classic approach with explicit locking is double-checked insertion: check for the key; if it does not exist, enter a synchronized block and check the key again. This guarantees the key is inserted only once, without entering a synchronized block on every call.
Set modifications should typically occur in a synchronized block as well, unless the Set itself is a concurrent one; see the sketch below.
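A minimal sketch of that last option, assuming the controller setup from the question and using the JDK's own concurrent set view (so no third-party ConcurrentSet is needed):
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class FooCache {
    static final Map<String, Set<String>> fooCacheMap = new ConcurrentHashMap<>();

    void fooMethod(String fooId) {
        // newKeySet() returns a concurrent Set backed by a ConcurrentHashMap,
        // so later mutations of fooSet are thread-safe even outside compute().
        Set<String> fooSet = fooCacheMap.computeIfAbsent(fooId, k -> ConcurrentHashMap.newKeySet());
        fooSet.add("someValue"); // hypothetical operation on the cached set
    }
}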

Thread safe or not? Updating a not-thread-safe-map from a parallel stream

The code snippet below updates a not-thread-safe map (itemsById is not thread safe) from a parallel stream's forEach block.
// Update stuff in `itemsById` by iterating over all stuff in newItemsById:
newItemsById.entrySet()
        .parallelStream()
        .unordered()
        .filter(...)
        .forEach(entry -> {
            itemsById.put(entry.getKey(), entry.getValue()); // <-- look
        });
To me, this looks non-thread-safe, because the parallel stream will invoke the forEach block in many threads at the same time, and thus call itemsById.put(..) in many threads at the same time, and itemsById isn't thread-safe. (With a ConcurrentMap, however, I think the code would be safe.)
I wrote to a colleague: "Please note that the map might allocate new memory when you insert new data. That's likely not thread-safe, since the collection is not thread-safe. -- Whether or not writing to different keys from many threads is thread-safe is implementation-dependent, I would think. It's nothing I would choose to rely on."
He however says that the above code is thread safe. -- Is it?
((Please note: I don't think this question is too localized. Actually, now with Java 8, I think fairly many people will do something like parallelStream()...forEach(...), and then it might be good to know about thread-safety issues, for many people))
You're right: this code is not thread-safe, and depending on the Map implementation and on race conditions it may produce any random effect: a correct result, silent loss of data, some exception, or an endless loop. You may easily check it like this:
int equal = 0;
for (int i = 0; i < 100; i++) {
    // create test input map like {0 -> 0, 1 -> 1, 2 -> 2, ...}
    Map<Integer, Integer> input = IntStream.range(0, 200).boxed()
            .collect(Collectors.toMap(x -> x, x -> x));
    Map<Integer, Integer> result = new HashMap<>();
    // write it into another HashMap in parallel way without key collisions
    input.entrySet().parallelStream().unordered()
            .forEach(entry -> result.put(entry.getKey(), entry.getValue()));
    if (result.equals(input)) equal++;
}
System.out.println(equal);
On my machine this code usually prints something between 20 and 40 instead of 100. If I change HashMap to TreeMap, it usually fails with NullPointerException or becomes stuck in the infinite loop inside TreeMap implementation.
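For contrast (and not part of the measurement above), both of the following variants produce the full, correct map with the same input: either let the stream collect the result itself, or target a thread-safe map:
// Variant 1: let the stream build the map itself via a collector.
Map<Integer, Integer> viaCollector = input.entrySet().parallelStream()
        .collect(Collectors.toConcurrentMap(Map.Entry::getKey, Map.Entry::getValue));

// Variant 2: write into a ConcurrentHashMap, whose put() is thread-safe.
Map<Integer, Integer> viaConcurrentMap = new ConcurrentHashMap<>();
input.entrySet().parallelStream().unordered()
        .forEach(e -> viaConcurrentMap.put(e.getKey(), e.getValue()));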
I'm no expert on streams, but I assume there is no fancy synchronization employed here, and thus I wouldn't consider adding elements to itemsById in parallel thread-safe.
One of the things that could happen is an endless loop: if two elements happen to end up in the same bucket, the underlying list might get corrupted, with elements referring to each other in a cycle (A.next = B, B.next = A). A ConcurrentHashMap prevents that by synchronizing write access on the bucket: unless the elements end up in the same bucket it does not block, and if they do, the adds are applied sequentially.
This code is not thread-safe.
Oracle docs state:
Operations like forEach and peek are designed for side effects; a lambda expression that returns void, such as one that invokes System.out.println, can do nothing but have side effects. Even so, you should use the forEach and peek operations with care; if you use one of these operations with a parallel stream, then the Java runtime may invoke the lambda expression that you specified as its parameter concurrently from multiple threads.

Sort array based on constantly changing map

I have a ConcurrentHashMap which is asynchronously updated to mirror the data in a database. I am attempting to sort an array based on this data, which works fine most of the time, but if the data updates while sorting, things can get messy.
I have thought of copying the map and then sorting with the copied map, but due to the frequency at which I need to sort, and the size of the map, this is not a possibility.
I'm not sure I understood your requirements perfectly, so I'll deal with two separate cases.
Let's say your async "update" operation requires you to update 2 keys in your map.
Scenario 1 is: it's OK if a "sort" operation occurs while only 1 of the two updates is visible.
Scenario 2 is: you need the 2 updates to be visible simultaneously or not at all (which is called atomic behavior).
Case 1: you do not need atomic bulk updates
In this case, ConcurrentHashMap is OK as is, since its iterators are guaranteed not to fail upon modification of the map. From the ConcurrentHashMap documentation (emphasis mine):
Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
So you are guaranteed that you can iterate through the map, even while it is being modified, without the iteration crashing with concurrent modification exceptions. But (see the emphasis) you are NOT guaranteed that all modifications made concurrently to the map are immediately visible to the iteration, nor whether only part of them will be, nor in which order.
Case 2: you need bulk updates to be atomic
Furthermore, with ConcurrentHashMap you do not have any guarantee that bulk operations (putAll) behave atomically:
For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries.
So I see two scenarios for handling this case, each of which entails locking.
Solution 1: building a copy
Building a "frozen" copy can help you only if the copy is built during a phase in which all other updates are locked out, because copying the map implies iterating through it, and our hypothesis is that iteration is not safe under concurrent modification.
This could look like :
ConcurrentMap<String, String> map = new ConcurrentHashMap<String, String>();
// Start the frozen copy as an empty snapshot, not as a reference to the live
// map, so readers never iterate the map that is being concurrently updated.
AtomicReference<Map<String, String>> frozenCopy =
        new AtomicReference<Map<String, String>>(new HashMap<String, String>());

public void sortOperation() {
    sortUsingFrozenCopy();
}

public void updateOperation() {
    synchronized (map) { // Exclusive access to the map instance
        updateMap();
        Map<String, String> newCopy = new HashMap<String, String>();
        newCopy.putAll(map); // Building the copy is safe thanks to the exclusive access.
        frozenCopy.set(newCopy); // Then publish the reference to the new copy.
    }
}
This solution could be refined...
Seeing that your two operations (map reads and map writes) are totally asynchronous, one can assume that your read operations cannot know (and should not care) whether the previous write operation occurred 0.1 sec before or will occur 0.1 sec after.
So having your read operations depend on a "frozen copy" of the map that is refreshed only once every 1 (or 2, or 5, or 10) seconds (or update events), instead of after every single update, may be a possibility for your case.
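A minimal sketch of that refinement, assuming the map and frozenCopy fields from Solution 1 above and a dedicated single-threaded scheduler:
ScheduledExecutorService refresher = Executors.newSingleThreadScheduledExecutor();

// Publish a fresh snapshot every 2 seconds instead of after every update.
refresher.scheduleAtFixedRate(() -> {
    synchronized (map) { // same lock as updateOperation()
        frozenCopy.set(new HashMap<>(map));
    }
}, 0, 2, TimeUnit.SECONDS);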
Solution 2: lock the map for updates
Locking the map without copying it is a solution. You'd want a ReadWriteLock (or StampedLock in Java 8) so as to allow multiple simultaneous sorts, with mutual exclusion between read and write operations.
Solution 2 is actually easy to implement. You'd have something like
ReadWriteLock lock = new ReentrantReadWriteLock();

public void sortOperation() {
    lock.readLock().lock();
    // read lock granted, which prevents the write lock from being granted
    try {
        sort(); // This is safe, nobody can write
    } finally {
        lock.readLock().unlock();
    }
}

public void updateOperation() {
    lock.writeLock().lock();
    // Write lock granted; no other write lock (except to myself) can be granted,
    // nor any read lock
    try {
        updateMap(); // Nobody is reading, that's OK.
    } finally {
        lock.writeLock().unlock();
    }
}
With a ReadWriteLock, multiple reads can occur simultaneously, or a single write, but not multiple writes nor reads and writes.
You'd have to consider the possibility of using a fair variant of the lock, so that you are sure that every read and write process will eventually have a chance of being executed, depending on your usage pattern.
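For reference, the fair variant is selected through a constructor flag:
// Fair mode: the longest-waiting thread (reader or writer) acquires the lock
// next, which prevents writer starvation at some cost in throughput.
ReadWriteLock lock = new ReentrantReadWriteLock(true);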
(NB: if you use locking/synchronization, your Map may not need to be concurrent, as write and read operations will be mutually exclusive, but that is another topic.)

concurrent HashMap: checking size

ConcurrentHashMap solves the synchronization issues that are seen with a plain HashMap, so adding and removing entries is fast without wrapping a HashMap with the synchronized keyword. But what about checking the map's size: if multiple threads are checking a ConcurrentHashMap's size, do we still need the synchronized keyword, something as follows:
public static synchronized int getSize() {
    return aConcurrentHashmap.size();
}
aConcurrentHashmap.size() will return the size known at the moment of the call, but it might be a stale value by the time you use that number, because another thread may have added or removed items in the meantime.
However, the whole purpose of concurrent maps is that you don't need to synchronize on them, as they are thread-safe collections.
You can simply call aConcurrentHashmap.size(). However, you have to bear in mind that by the time you get the answer it might already be obsolete. This would happen if another thread were to concurrently modify the map.
You don't need to use synchronized with ConcurrentHashMap except in very rare occasions where you need to perform multiple operations atomically.
To just get the size, you can call it without synchronization.
To clarify when I would use synchronization with ConcurrentHashMap...
Say you have an expensive object you want to create on demand. You want concurrent reads, but also want to ensure that values are only created once.
public ExpensiveObject get(String key) {
    return map.get(key); // reads can proceed concurrently.
}

public void put(String key, ExpensiveBuilder builder) {
    // cannot use putIfAbsent, because the object would have to be built before the check.
    synchronized (map) {
        if (!map.containsKey(key))
            map.put(key, builder.create());
    }
}
Note: This requires that all writes are synchronized, but reads can still be concurrent.
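Since Java 8, computeIfAbsent expresses the same create-once intent without an external lock; a sketch under the same assumptions (ExpensiveObject and ExpensiveBuilder being the hypothetical types from above):
public ExpensiveObject get(String key, ExpensiveBuilder builder) {
    // The mapping function runs atomically, at most once per absent key,
    // so the expensive object is built only if it is actually missing.
    return map.computeIfAbsent(key, k -> builder.create());
}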
The designers of ConcurrentHashMap chose to favor individual operations like get(), put() and remove() over methods that operate on the complete map, like isEmpty() or size(). This is because the chances of the latter methods getting called are (in general) lower.
A synchronization for size() is not needed here. We can get the size by calling aConcurrentHashmap.size(). This method may return a stale value, as another thread might modify the map in the meantime, but that is an explicitly accepted trade-off: these aggregate operations are deprioritized by design.
ConcurrentHashMap is fail-safe: it won't throw any ConcurrentModificationException, and it works well for multi-threaded operations.
A HashMap, in contrast, has fail-fast iterators: if the map is structurally modified while you iterate over it, the iteration throws a ConcurrentModificationException. In ConcurrentHashMap, locking happens at bucket level and its iterators are weakly consistent, so that exception does not occur.
So, to answer your question here: checking the size of a ConcurrentHashMap only gives you a snapshot, because the size keeps changing with whatever modifications other threads make to the map. Its size() method is called just like HashMap's, with no extra synchronization needed.
