Sort array based on constantly changing map - java

I have a ConcurrentHashMap that is asynchronously updated to mirror the data in a database. I am attempting to sort an array based on this data, which works fine most of the time, but if the data updates while sorting then things can get messy.
I have thought of copying the map and then sorting against the copy, but given how frequently I need to sort and the size of the map, this is not a possibility.

I'm not sure I understood your requirements perfectly, so I'll deal with two separate cases.
Let's say your async "update" operation requires you to update 2 keys in your map.
Scenario 1: it's OK if a "sort" operation occurs while only one of the two updates is visible.
Scenario 2: you need the two updates to be visible simultaneously or not at all (which is called atomic behavior).
Case 1: you do not need atomic bulk updates
In this case, ConcurrentHashMap is OK as is, seeing that its iterators are guaranteed not to fail upon modification of the map. From the ConcurrentHashMap documentation (emphasis mine):
Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
So you are guaranteed that you can iterate through the map even while it is being modified, without the iteration crashing on concurrent modifications. But (see the emphasis) you are NOT guaranteed that modifications made concurrently with the iteration will be visible to it: all, some, or none of them may be seen, in no guaranteed order.
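As a sketch of what Case 1 means for sorting (the `scores` map and item names are hypothetical; in a real program other threads would be updating the map during the sort):

```java
import java.util.*;
import java.util.concurrent.*;

public class WeaklyConsistentSortDemo {
    // Hypothetical score map, updated asynchronously by other threads.
    static final ConcurrentHashMap<String, Integer> scores = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        scores.put("a", 3);
        scores.put("b", 1);
        scores.put("c", 2);

        String[] items = {"a", "b", "c"};
        // Each get() sees some recent value, but the sort as a whole may
        // mix values read before and after a concurrent update.
        Arrays.sort(items, Comparator.comparing((String k) -> scores.getOrDefault(k, 0)));
        System.out.println(Arrays.toString(items)); // [b, c, a]
    }
}
```

Beware that if values change mid-sort, the comparator can become inconsistent and Arrays.sort may even throw IllegalArgumentException ("Comparison method violates its general contract"), which is one reason to prefer a snapshot or locking approach.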
Case 2: you need bulk updates to be atomic
Furthermore, with ConcurrentHashMap you do not have any guarantee that bulk operations (putAll) will behave atomically:
For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries.
So I see two solutions for working this case, each of which entails locking.
Solution 1: building a copy
Building a "frozen" copy can help you only if this copy is built during a phase where all other updates are locked, because copying your map implies iterating through it, and our hypothesis is that iteration is not safe under concurrent modification.
This could look like:
ConcurrentMap<String, String> map = new ConcurrentHashMap<String, String>();
AtomicReference<Map<String, String>> frozenCopy =
        new AtomicReference<Map<String, String>>(map);

public void sortOperation() {
    sortUsingFrozenCopy(); // reads frozenCopy.get() internally
}

public void updateOperation() {
    synchronized (map) { // exclusive access to the map instance
        updateMap();
        Map<String, String> newCopy = new HashMap<String, String>();
        newCopy.putAll(map); // building the copy is safe thanks to the exclusive access
        frozenCopy.set(newCopy); // publish the new snapshot
    }
}
This solution could be refined...
Seeing that your two operations (map reads and map writes) are totally asynchronous, one can assume that your read operations cannot know (and should not care) whether the previous write operation occurred 0.1 s before or will occur 0.1 s after.
So having your read operations depend on a "frozen copy" of the map that is updated once every 1 (or 2, or 5, or 10) seconds (or update events), instead of on every update, may be a possibility for your case.
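A minimal runnable sketch of such a periodic refresh (the map contents and the `refresh()` helper are illustrative; a real application would only use the scheduled refresh, not the direct call made here for a deterministic demo):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class PeriodicSnapshot {
    static final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    static final AtomicReference<Map<String, String>> frozenCopy =
            new AtomicReference<Map<String, String>>(Collections.<String, String>emptyMap());

    static void refresh() {
        // Copying iterates the live map (weakly consistent, per Case 1),
        // but readers only ever see complete snapshots.
        frozenCopy.set(new HashMap<>(map));
    }

    public static void main(String[] args) {
        map.put("k1", "v1");
        refresh(); // one deterministic refresh for the demo

        // In a real application, refresh on a timer instead of per update:
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(PeriodicSnapshot::refresh, 1, 1, TimeUnit.SECONDS);

        System.out.println(frozenCopy.get().get("k1")); // v1
        scheduler.shutdownNow();
    }
}
```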
Solution 2: lock the map for updates
Locking the Map without copying it is a solution. You'd want a ReadWriteLock (or StampedLock in Java 8) so as to have multiple sorts possible, and a mutual exclusion of read and write operations.
Solution 2 is actually easy to implement. You'd have something like
ReadWriteLock lock = new ReentrantReadWriteLock();

public void sortOperation() {
    lock.readLock().lock();
    // Read lock granted, which prevents the write lock from being granted
    try {
        sort(); // this is safe, nobody can write
    } finally {
        lock.readLock().unlock();
    }
}

public void updateOperation() {
    lock.writeLock().lock();
    // Write lock granted: no other write lock (except to myself)
    // nor any read lock can be granted
    try {
        updateMap(); // nobody is reading, that's OK
    } finally {
        lock.writeLock().unlock();
    }
}
With a ReadWriteLock, multiple reads can occur simultaneously, or a single write, but not multiple writes nor reads and writes.
You'd have to consider the possibility of using a fair variant of the lock, so that you are sure that every read and write process will eventually have a chance of being executed, depending on your usage pattern.
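The fair variant is requested through the ReentrantReadWriteLock constructor flag:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockDemo {
    public static void main(String[] args) {
        // true selects the fair policy: the longest-waiting thread
        // (reader or writer) tends to acquire the lock next, which
        // keeps writers from starving under heavy read traffic.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        System.out.println(lock.isFair()); // true
    }
}
```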
(NB: if you use locking/synchronized, your Map may not need to be concurrent, as write and read operations will be exclusive, but this is another topic.)

Related

How to prevent threads from reading inconsistent data from ConcurrentHashMap?

I'm using a ConcurrentHashMap<String, String> that works as a cache, where read operations validate whether an element is already in the cache and write operations add an element to the cache.
So, my question is: what are the best practices to always read the most recent ConcurrentHashMap values?
I want to ensure data consistency and avoid cases like:
The first thread validates with map.get("key") that this key does not yet exist in the map, then calls map.put("key", "value").
The second thread reads the data before the first thread puts the element into the map, leading to inconsistent data.
Code example:
Optional<String> cacheValue = Optional.ofNullable(cachedMap.get("key"));
if (cacheValue.isPresent()) {
    // Perform actions
} else {
    cachedMap.putIfAbsent("key", "value");
    // Perform actions
}
How can I ensure that my ConcurrentHashMap is synchronized and doesn't retrieve inconsistent data?
Should I perform these map operations inside a synchronized block?
You probably need to do it this way:
if (cachedMap.putIfAbsent("key", "value") == null) {
    // Perform actions "IS NOT PRESENT"
} else {
    // Perform actions "IS PRESENT"
}
Doing it in two checks is obviously not atomic, so if you're having problems with the wrong values getting put in the cache, then that's likely your problem.
what are the best practices to always read the most recent ConcurrentHashMap values?
Oracle's Javadoc for ConcurrentHashMap says, "Retrievals reflect the results of the most recently completed update operations holding upon their onset." In other words, any time you call map.get(...) or any other method on the map, you are always working with the "most recent" content.
*BUT*
Is that enough? Maybe not. If your program threads expect any kind of consistency between two or more keys in the map, or if your threads expect any kind of consistency between something that is stored in the map and something that is stored elsewhere, then you are going to need to provide some explicit higher-level synchronization between the threads.
I can't provide an example that would be specific to the problem that's puzzling you because your question doesn't really say what that problem is.
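As a generic illustration only (hypothetical "min"/"max" keys that must stay mutually consistent), higher-level synchronization across multiple keys could look like:

```java
import java.util.concurrent.ConcurrentHashMap;

public class AtomicPairUpdate {
    // Hypothetical invariant: "min" and "max" must always match.
    static final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
    static final Object pairLock = new Object();

    static void updateBoth(int min, int max) {
        synchronized (pairLock) { // both puts become visible together...
            map.put("min", min);
            map.put("max", max);
        }
    }

    static int[] readBoth() {
        synchronized (pairLock) { // ...to readers that take the same lock
            return new int[] { map.get("min"), map.get("max") };
        }
    }

    public static void main(String[] args) {
        updateBoth(1, 10);
        int[] pair = readBoth();
        System.out.println(pair[0] + ".." + pair[1]); // 1..10
    }
}
```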

minimizing lock scope for JDK8 ConcurrentHashMap check-and-set operation

1.
I have multiple threads updating a ConcurrentHashMap.
Each thread appends a list of integers to the Value of a map entry based on the key.
There is no removal operation by any thread.
The main point here is that I want to minimize the scope of locking and synchronization as much as possible.
I saw that the doc of computeIf...() method says "Some attempted update operations on this map by other threads may be blocked while computation is in progress", which is not so encouraging. On the other hand, when I look into its source code, I failed to observe where it locks/synchronizes on the entire map.
Therefore, I wonder how the theoretical performance of computeIf...() compares with that of the following home-grown 'method 2'.
2.
Also, I feel that the problem I described here is perhaps one of the most simplified check-n-set (or generally a 'compound') operation you can carry out on ConcurrentHashMap.
Yet I'm not quite confident, and can't find much guidance about how to do even this kind of simple compound operation on ConcurrentHashMap without locking/synchronizing on the entire map.
So any general good practice advice for this will be much appreciated.
ConcurrentHashMap<String, List<Integer>> myMap =
        new ConcurrentHashMap<String, List<Integer>>();
// MAP KEY: a word found by a thread on a page of a book
String myKey = "word1";

public void myConcurrentHashMapTest1() {
    // -- Method 1:
    // Step 1.1: first, try computeIfPresent(). Doc says it may block
    // other update operations on myMap. (Note: addAll returns boolean,
    // so it must be wrapped in a block lambda that returns the list.)
    myMap.computeIfPresent(myKey, (key, val) -> { val.addAll(getMyVals()); return val; });
    // Step 1.2: then use computeIfAbsent(). Again, it may block other updates.
    myMap.computeIfAbsent(myKey, key -> getMyVals());
}
public void myConcurrentHashMapTest2() {
    // -- Method 2: home-grown lock splitting (kind of). Will it
    // theoretically perform better?

    // Step 2.1: TRY to put an empty list for the key.
    // This has no effect if the key is already present in the map.
    List<Integer> myEmptyList = new ArrayList<Integer>();
    myMap.putIfAbsent(myKey, myEmptyList);

    // Step 2.2: by now, the key should be present in the map.
    // ASSUMPTION: no thread performs removal.
    List<Integer> listInMap = myMap.get(myKey);

    // Step 2.3: synchronize on that list, append all the values.
    synchronized (listInMap) {
        listInMap.addAll(getMyVals());
    }
}
public List<Integer> getMyVals() {
    // MAP VALUE: e.g. page indices where the word was found (by a thread)
    List<Integer> myValList = new ArrayList<Integer>();
    myValList.add(1);
    myValList.add(2);
    return myValList;
}
You're basing your assumption (that using ConcurrentHashMap as intended will be too slow for you) on a misinterpretation of the Javadoc. The Javadoc doesn't state that the whole map will be locked. It also doesn't state that each computeIfAbsent() operation performs pessimistic locking.
What could actually be locked is a bin (a.k.a. bucket), which corresponds to a single element in the internal array backing the ConcurrentHashMap. Note that this is not Java 7's map segment containing multiple buckets. When such a bin is locked, the only potentially blocked operations are updates for keys that hash to the same bin.
On the other hand, your solution doesn't mean that all internal locking within ConcurrentHashMap is avoided: computeIfAbsent() is just one of the methods that can degrade to using a synchronized block while updating. Even the putIfAbsent() with which you initially put an empty list for some key can block if it doesn't hit an empty bin.
What's worse, though, is that your solution doesn't guarantee the visibility of your synchronized bulk updates. You are guaranteed that a putIfAbsent() happens-before any get() that observes its value, but there's no happens-before relationship between your bulk updates and a subsequent get().
P.S. You can read further about the locking in ConcurrentHashMap in its OpenJDK implementation: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/concurrent/ConcurrentHashMap.java, lines 313-352.
As already explained by Dimitar Dimitrov, a compute… method doesn’t generally lock the entire map. In the best case, i.e. there’s no need to increase the capacity and there’s no hash collision, only the mapping for the single key is locked.
However, there are still things you can do better:
generally, avoid performing multiple lookups; this applies to both variants, using computeIfPresent followed by computeIfAbsent, as well as using putIfAbsent followed by get
it's still recommended to minimize the code executed while holding a lock, i.e. don't invoke getMyVals() while holding the lock, as it doesn't depend on the map's state
Putting it together, the update should look like:
// compute without holding a lock
List<Integer> toAdd = getMyVals();
// update the map
myMap.compute(myKey, (key, val) -> {
    if (val == null) val = toAdd; else val.addAll(toAdd);
    return val;
});
or
// compute without holding a lock
List<Integer> toAdd = getMyVals();
// update the map
myMap.merge(myKey, toAdd, (a, b) -> { a.addAll(b); return a; });
which can be simplified to
myMap.merge(myKey, getMyVals(), (a,b) -> { a.addAll(b); return a; });
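A runnable sketch of the merge variant (the key and values are illustrative stand-ins for getMyVals()):

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

public class MergeDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, List<Integer>> myMap = new ConcurrentHashMap<>();
        // First call: the key is absent, so the given list is inserted as-is.
        myMap.merge("word1", new ArrayList<>(Arrays.asList(1, 2)),
                (a, b) -> { a.addAll(b); return a; });
        // Second call: the key is present, so the remapping function appends.
        myMap.merge("word1", new ArrayList<>(Arrays.asList(3)),
                (a, b) -> { a.addAll(b); return a; });
        System.out.println(myMap.get("word1")); // [1, 2, 3]
    }
}
```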

is the below code not thread safe?

What's wrong with the code below?
private Map<Integer, Record> records = new ConcurrentHashMap<Integer, Record>();

Record rec = records.get(id);
if (rec == null) {
    rec = new Record(id);
    records.put(id, rec);
}
return rec;
Is the above code not thread-safe? Why should I use putIfAbsent here in this case?
Locking is applied only for updates. In the case of retrievals, it allows full concurrency. What does this statement mean?
It's not thread safe.
If there was another thread, then in the time between records.get and records.put the other thread might have put the record as well.
Read only operations (i.e. ones that do not modify a structure) can be done by multiple threads at the same time. For example, 1000 threads can safely read the value of an int. However, those 1000 threads cannot update the value of the int without some sort of locking operation.
I know that this may sound like a very unlikely event, but remember that a 1 in a million event happens 1000 times per second at 1GHz.
This is thread safe:
private Map<Integer, Record> records = new ConcurrentHashMap<Integer, Record>();

// presumably records is a member and the code below is in a function
records.putIfAbsent(id, new Record(id));
Record rec = records.get(id);
return rec;
Note that this might create a Record and never use it.
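On Java 8+, computeIfAbsent sidesteps that wasted allocation: the factory runs only when the key is actually absent. A sketch with a hypothetical Record class standing in for the question's type:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ComputeIfAbsentDemo {
    // Hypothetical Record class standing in for the question's type.
    static class Record {
        final int id;
        Record(int id) { this.id = id; }
    }

    static final ConcurrentHashMap<Integer, Record> records = new ConcurrentHashMap<>();

    static Record getOrCreate(int id) {
        // The factory lambda runs only when the key is absent, so no
        // throwaway Record is constructed on the fast path.
        return records.computeIfAbsent(id, Record::new);
    }

    public static void main(String[] args) {
        Record first = getOrCreate(42);
        Record second = getOrCreate(42);
        System.out.println(first == second); // true
    }
}
```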
It could or could not be thread-safe, depending on how you want it to act.
By the end of the code, aMap will safely have a Record for id. However, it's possible that two threads will both create and put a Record in, such that there are two (or more, if more threads do it) Records in existence. That might be fine, and it might not be -- really depends on your application.
One of the dangers of a non-thread-safe approach (for instance, using a normal HashMap without synchronization) is that threads can read partially-created or partially-updated objects from other threads; in other words, things can go really haywire. This will not happen in your code, because ConcurrentHashMap ensures memory is kept up-to-date between threads, and in that sense it is thread-safe.
One thing you can do is to use putIfAbsent, which will atomically put a key-value pair into the map, but only if there's nothing at that key already:
if (rec == null) {
    records.putIfAbsent(id, new Record(id));
    rec = records.get(id);
}
In this approach, you might create a second Record object, but if so, it'll not get inserted and will immediately be available for garbage collection. By the end of the snippet:
records will contain a Record for the given id
only one Record will have ever been put into records for that id (whether put there by this thread or another)
rec will point to that record

Java synchronized block vs concurrentHashMap vs Collections.synchronizedMap

Say I have a synchronized method, and within that method I update a hashmap like this:
public synchronized void method1()
{
    myHashMap.clear();
    // populate the hashmap, takes about 5 seconds
}
Now, while method1 is running and the hashmap is being re-populated, if there are other threads trying to get values from the hashmap, I assume they will get blocked?
Now, instead of using a synchronized method, if I change the hashmap to a ConcurrentHashMap like below, what's the behaviour?
public void method1()
{
    myConcurrentHashMap.clear();
    // populate the hashmap, takes about 5 seconds
}
What if I use Collections.synchronizedMap? Is it the same?
ConcurrentHashMap (CHM), instead of synchronizing every method on a common lock, restricting access to a single thread at a time, uses a finer-grained locking mechanism called lock striping to allow a greater degree of shared access. Arbitrarily many reading threads can access the map concurrently, readers can access the map concurrently with writers, and a limited number of writers can modify the map concurrently. The result is far higher throughput under concurrent access, with little performance penalty for single-threaded access.
ConcurrentHashMap, along with the other concurrent collections, further improve on
the synchronized collection classes by providing iterators that do not throw
ConcurrentModificationException, thus eliminating the need to lock the collection
during iteration.
As with all improvements, there are still a few tradeoffs. The semantics of methods
that operate on the entire Map, such as size and isEmpty, have been slightly
weakened to reflect the concurrent nature of the collection. Since the result of size
could be out of date by the time it is computed, it is really only an estimate, so size
is allowed to return an approximation instead of an exact count. While at first this
may seem disturbing, in reality methods like size and isEmpty are far less useful in
concurrent environments because these quantities are moving targets.
Secondly, Collections.synchronizedMap:
It's just a simple HashMap with synchronized methods. I'd call it deprecated in practice, due to CHM.
If you want to have all read and write actions to your HashMap synchronized, you need to put the synchronize on all methods accessing the HashMap; it is not enough to block just one method.
ConcurrentHashMap allows thread-safe access to your data without explicit locking on your part. That means you can add/remove values in one thread and at the same time get values out in another thread without running into an exception. See also the documentation of ConcurrentHashMap.
you could probably do
private volatile HashMap map = newMap();

private HashMap newMap() {
    HashMap map = new HashMap();
    // populate the hashmap, takes about 5 seconds
    return map;
}

public void updateMap() {
    map = newMap();
}
A reader sees a constant map, so reads don't require synchronization, and are not blocked.
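A runnable sketch of this reference-swap idea (the "version" entry is illustrative, standing in for the 5-second population):

```java
import java.util.HashMap;
import java.util.Map;

public class MapSwapDemo {
    // Readers always see a fully-built map through this volatile reference.
    private static volatile Map<String, Integer> map = buildMap(1);

    private static Map<String, Integer> buildMap(int version) {
        Map<String, Integer> m = new HashMap<>();
        m.put("version", version); // stand-in for the slow population
        return m;
    }

    public static void updateMap() {
        map = buildMap(2); // swap in the new map atomically
    }

    public static void main(String[] args) {
        System.out.println(map.get("version")); // 1
        updateMap();
        System.out.println(map.get("version")); // 2
    }
}
```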

concurrent HashMap: checking size

ConcurrentHashMap solves the synchronization issues seen with HashMap, so adding and removing entries is fast without using the synchronized keyword. But what about checking the size, if multiple threads are checking a ConcurrentHashMap's size? Do we still need the synchronized keyword, something as follows:
public static synchronized int getSize() {
    return aConcurrentHashmap.size();
}
concurrentHashMap.size() will return the size known at the moment of the call, but it might be a stale value by the time you use that number, because another thread has added/removed items in the meantime.
However, the whole purpose of concurrent maps is that you don't need to synchronize them, as they are thread-safe collections.
You can simply call aConcurrentHashmap.size(). However, you have to bear in mind that by the time you get the answer it might already be obsolete. This would happen if another thread were to concurrently modify the map.
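For example (with no concurrent writers running, the result is exact; under concurrent updates both calls below return estimates, and Java 8's mappingCount() is the preferred variant because it returns a long):

```java
import java.util.concurrent.ConcurrentHashMap;

public class SizeDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        // No synchronization needed; both are estimates under concurrency.
        System.out.println(map.size());         // 2
        System.out.println(map.mappingCount()); // 2
    }
}
```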
You don't need to use synchronized with ConcurrentHashMap except on very rare occasions where you need to perform multiple operations atomically.
To just get the size, you can call it without synchronization.
To clarify when I would use synchronization with ConcurrentHashMap...
Say you have an expensive object you want to create on demand. You want concurrent reads, but also want to ensure that values are only created once.
public ExpensiveObject get(String key) {
    return map.get(key); // can work concurrently
}

public void put(String key, ExpensiveBuilder builder) {
    // cannot use putIfAbsent because it needs the object before checking
    synchronized (map) {
        if (!map.containsKey(key))
            map.put(key, builder.create());
    }
}
Note: This requires that all writes are synchronized, but reads can still be concurrent.
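On Java 8+, computeIfAbsent can replace the synchronized write path while keeping the build-at-most-once guarantee, provided the builder is cheap enough to run under the bin lock. A sketch with hypothetical types:

```java
import java.util.concurrent.ConcurrentHashMap;

public class LazyExpensiveDemo {
    // Hypothetical stand-in for the answer's ExpensiveObject.
    static class ExpensiveObject {
        final String key;
        ExpensiveObject(String key) { this.key = key; }
    }

    static final ConcurrentHashMap<String, ExpensiveObject> map = new ConcurrentHashMap<>();

    static ExpensiveObject get(String key) {
        // computeIfAbsent runs the builder at most once per key and
        // blocks only other updates to the same bin, not the whole map.
        return map.computeIfAbsent(key, ExpensiveObject::new);
    }

    public static void main(String[] args) {
        ExpensiveObject a = get("x");
        ExpensiveObject b = get("x");
        System.out.println(a == b); // true
    }
}
```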
The designers of ConcurrentHashMap chose to give more weight to individual operations like get(), put() and remove() than to methods that operate on the complete map, like isEmpty() or size(). This is because the chances of the latter being called are (in general) lower than those of the individual operations.
Synchronization for size() is not needed here. We can get the size by calling concurrentHashMap.size(). This method may return a stale value, as another thread might modify the map in the meanwhile; but this is explicitly accepted, as these aggregate operations are deprioritized.
ConcurrentHashMap is fail-safe: it won't throw ConcurrentModificationException, which makes it work well for multi-threaded operations.
By contrast, a plain HashMap's iterators are fail-fast: modifying the map while iterating over it makes the iterator throw a ConcurrentModificationException.
In ConcurrentHashMap, the locking happens at bucket level, so concurrent modification exceptions are not thrown.
So, to answer your question: checking the size of a ConcurrentHashMap doesn't help much, because it keeps changing with the operations and modifications your code performs on the map. It exposes the same size() method as HashMap, but under concurrency the result is only an estimate.
