java.util.ConcurrentModificationException when combining results from Futures into HashMap - java

I'm running into an issue where I intermittently get a java.util.ConcurrentModificationException while merging the results of futures. I've tried making all my HashMaps concurrent and I tried using an iterator, but the error still persists and I'm totally confused as to how it is occurring. I'm not looping through the same HashMap I'm inserting into either (as done in similar questions with this error).
For my code, to start off with I build a bunch of tasks that each return a HashMap<String, HashSet<String>>. Using .invokeAll, my tasks all return the above HashMap, which I then try to merge together (I'm reading a large CSV to get all unique values for each column).
I start by defining all the keys.
HashMap<String, HashSet<String>> headersHashset = new HashMap<String, HashSet<String>>();
headers = Arrays.asList(br.readLine().split(delimiter));
for (String headerID : headers) {
    headersHashset.put(headerID, new HashSet<>());
}
I then make my tasks, which clone the keys, do the processing, and return their results:
tasks.add(() -> {
    HashMap<String, HashSet<String>> localHeadersHashset = (HashMap<String, HashSet<String>>) headersHashset.clone();
    for (String[] values : sampleSet.values()) { // sampleSet is a SortedMap<Integer, String[]>
        int headerAsINT = 0;
        for (String value : values) {
            localHeadersHashset.get(headers.get(headerAsINT)).add(value);
            headerAsINT++;
        }
    }
    return localHeadersHashset;
});
I invokeAll the tasks, and the results are put into a List of Futures:
ExecutorService es = Executors.newCachedThreadPool();
List<Future<HashMap<String, HashSet<String>>>> futures = null;
try {
    futures = es.invokeAll(tasks);
    es.shutdown();
} catch (InterruptedException e) {
    e.printStackTrace();
}
I then, for very safe keeping, make a ConcurrentHashMap with a copy of all the pre-defined keys. I then loop through all the results from my tasks, put each one into a ConcurrentHashMap as well, and addAll the results into the original copied ConcurrentHashMap with the pre-defined keys. This is when the error occurs (only sometimes though, which suggests it's something to do with threads, but I thought all the threading stuff was done at this point?).
ConcurrentHashMap<String, HashSet<String>> headersHashsetConcurrent = new ConcurrentHashMap<>(headersHashset);
try {
    for (Future<HashMap<String, HashSet<String>>> f : futures) {
        ConcurrentHashMap<String, HashSet<String>> threadSafeItems = new ConcurrentHashMap<>(f.get());
        for (Map.Entry<String, HashSet<String>> items : threadSafeItems.entrySet()) {
            headersHashsetConcurrent.get(items.getKey()).addAll(items.getValue()); // ERROR SHOWS HERE
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
Here is the error in full:
java.util.ConcurrentModificationException
at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1584)
at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1607)
at java.base/java.util.AbstractCollection.addAll(AbstractCollection.java:335)
at project/project.parseCSV.processFile(parseCSV.java:101)
at project/project.parseCSV.call(parseCSV.java:126)
at project/project.parseCSV.call(parseCSV.java:11)
at javafx.graphics/javafx.concurrent.Task$TaskCallable.call(Task.java:1425)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.lang.Thread.run(Thread.java:832)
I'm really not sure why this is causing an error, as addAll is only run on the main thread, and it's inserting into a ConcurrentHashMap too anyway - any ideas on why this is happening would be hugely appreciated!

The problem is that headersHashset.clone() doesn't clone the HashSets in the values.
From the docs:
Returns a shallow copy of this HashMap instance: the keys and values themselves are not cloned.
It means that localHeadersHashset in your tasks, and your headersHashsetConcurrent, and threadSafeItems returned by futures — all of these use the same HashSet objects for the same keys.
Since tasks are executed in parallel threads, it is entirely possible that some task executes HashSet.add() at the same time as the main thread iterates over the elements of the same HashSet inside:
headersHashsetConcurrent.get(items.getKey()).addAll(items.getValue());
This is what causes your ConcurrentModificationException.
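One way to fix it, as a minimal sketch reusing the question's tasks, headers, and sampleSet variables: have each task build its own map with fresh HashSets instead of shallow-cloning the shared one.
// Sketch of a fix: each task creates its own empty HashSets, so no
// set is ever shared between worker threads or with the main thread.
tasks.add(() -> {
    HashMap<String, HashSet<String>> localHeadersHashset = new HashMap<>();
    for (String headerID : headers) {
        localHeadersHashset.put(headerID, new HashSet<>()); // fresh, unshared sets
    }
    for (String[] values : sampleSet.values()) {
        int headerAsINT = 0;
        for (String value : values) {
            localHeadersHashset.get(headers.get(headerAsINT)).add(value);
            headerAsINT++;
        }
    }
    return localHeadersHashset;
});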

Related

ConcurrentHashMap with synchronized

I am maintaining some legacy code and found an implementation with the synchronized keyword on a ConcurrentHashMap. It seems unnecessary to me:
public class MyClass {
    private final Map<MyObj, Map<String, List<String>>> conMap = new ConcurrentHashMap<>();
    //...
    // adding a new record into conMap:
    private void addToMap(MyObj id, String name, String value) {
        conMap.putIfAbsent(id, new ConcurrentHashMap<>());
        Map<String, List<String>> subMap = conMap.get(id);
        synchronized (subMap) { // <-- is it necessary?
            subMap.putIfAbsent(name, new ArrayList<>());
            subMap.get(name).add(value);
        }
    }
    //...
    public void doSomething(MyObj id) {
        List<Map<String, List<String>>> mapsList = new LinkedList<>();
        for (MyObj objId : conMap.keySet()) {
            if (objId.key1.equals(id.key1)) {
                mapsList.add(conMap.get(objId));
            }
        }
        for (Map<String, List<String>> map : mapsList) {
            synchronized (map) { // <-- is it necessary?
                if (timeout <= 0) {
                    log(map.size());
                    for (List<String> value : map.values()) {
                        log(id, value);
                    }
                } else {
                    int sum = 0;
                    for (Map.Entry<String, List<String>> val : map.entrySet()) {
                        sum += val.getValue().size();
                    }
                    log(sum);
                    map.wait(timeout);
                }
            }
        }
        //...
    }
}
So, is it reasonable to use synchronized on an object that is already concurrent? Or are those two different things?
In this case:
synchronized (subMap) { // <-- is it necessary?
    subMap.putIfAbsent(name, new ArrayList<>());
    subMap.get(name).add(value);
}
the synchronized is necessary. Without it, you could have two threads simultaneously updating the same ArrayList instance. Since ArrayList is not thread-safe, the addToMap method would not be thread-safe either.
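To illustrate, here is a standalone sketch (not from the question): two threads appending to one unsynchronized ArrayList routinely lose updates, and the run can even die with an ArrayIndexOutOfBoundsException.
import java.util.ArrayList;
import java.util.List;

public class UnsafeListDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> list = new ArrayList<>();
        Runnable adder = () -> {
            for (int i = 0; i < 10_000; i++) {
                list.add(i); // ArrayList.add is not atomic
            }
        };
        Thread t1 = new Thread(adder);
        Thread t2 = new Thread(adder);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Expected 20000; without synchronization the result is often
        // smaller, or the run throws an exception instead.
        System.out.println(list.size());
    }
}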
In this case:
synchronized (map) { // <-- is it necessary?
    if (/*condition*/) {
        log(map.size());
        for (List<String> value : map.values()) {
            log(id, value);
        }
    } else {
        int sum = 0;
        for (Map.Entry<String, List<String>> val : map.entrySet()) {
            sum += val.getValue().size();
        }
        log(sum);
        map.wait(timeout);
    }
}
the synchronized is necessary.
In the if branch, the log method (or something called from it) will probably call ArrayList::toString, which will iterate each ArrayList. Without synchronizing at the submap level, there could be a simultaneous add by another thread (e.g. an addToMap call). That means there are memory hazards, and a ConcurrentModificationException is possible in the toString() method.
In the else branch, the size() call is accessing a size field in each ArrayList in the submap. Without synchronizing at the submap level, there could be a simultaneous add on one of those lists. That could cause size() to return a stale value. In addition, you are not guaranteed to see map entries added to a submap while you are iterating over it. If either of those events happens, the sum could be inaccurate. (Whether that is really an issue depends on the requirements for this method: inaccurate counts could be acceptable.)
ConcurrentHashMap synchronizes each individual method call itself, so that no other thread can access the map (and possibly break the internal data structure of the map).
A synchronized block synchronizes two or more consecutive method calls, so that no other thread can modify the data structure between the calls (and possibly break the consistency of the data with regard to the application logic).
Note that the synchronized block only works if all access to the map is performed from synchronized blocks using the same monitor object.
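To make the difference concrete, here is a hypothetical standalone sketch (not from the question): each individual call on a ConcurrentHashMap is atomic, but a check-then-act sequence spanning two calls needs either a synchronized block or one of the map's own atomic compound operations.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CompoundActionDemo {
    private static final Map<String, Integer> counts = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        // Racy: another thread can modify the map between the two calls.
        if (!counts.containsKey("key")) {
            counts.put("key", 1);
        }

        // One fix: a synchronized block spanning the compound action
        // (works only if every access uses the same monitor).
        synchronized (counts) {
            if (!counts.containsKey("key")) {
                counts.put("key", 1);
            }
        }

        // Better: use the map's own atomic compound operations.
        counts.putIfAbsent("key", 1);
        counts.merge("key", 1, Integer::sum); // atomic read-modify-write
        System.out.println(counts);
    }
}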
It is sort of necessary, as multiple threads may try to append to the same ArrayList at the same time. The synchronized block protects against that happening, as ArrayList is obviously not synchronized.
Since Java 8 we have computeIfAbsent, which means the puts followed by gets can be simplified. I would write it like this, with no synchronization required:
conMap.computeIfAbsent(id, k -> new ConcurrentHashMap<>())
      .computeIfAbsent(name, k -> new CopyOnWriteArrayList<>()) // or other thread-safe list
      .add(value);
Other answers don't adequately address this bit...
for (Map<String, List<String>> map : mapsList) {
    synchronized (map) { // <-- is it necessary?
        if (/*condition*/) {
            ...iterate over map...
        } else {
            ...iterate over map...
        }
    }
}
Is it necessary? Hard to tell.
What is /*condition*/ ? Does synchronizing on map prevent some other thread A from changing the value of /*condition*/ after thread B has tested it, but before or while thread B is performing either of the two branches? If so, then the synchronized block could be very important.
How about those iterations? Does synchronizing on map prevent some other thread A from changing the contents of the map while thread B is iterating? If so, then the synchronized block could be very important.

Java: get+clear atomic for map

I would like to implement the following logic:
-the following structure is to be used
//Map<String, CopyOnWriteArrayList> keeping the pending updates
//grouped by the id of the updated object
final Map<String, List<Update>> updatesPerId = new ConcurrentHashMap<>();
-n producers will add updates to updatesPerId map (for the same id, 2 updates can be added at the same time)
-one TimerThread will run from time to time and has to process the received updates. Something like:
final Map<String, List<Update>> toBeProcessed = new HashMap<>(updatesPerId);
updatesPerId.clear();
// iterate over toBeProcessed and process them
Is there any way to make this logic thread-safe without synchronizing the adding logic from the producers and the logic from the TimerThread (consumer)? I am thinking about an atomic get+clear, but it seems that ConcurrentMap does not provide anything like that.
Also, I have to mention that updates should be kept by updated object id so I cannot replace the map with a queue or something else.
Any ideas?
Thanks!
You can leverage the fact that ConcurrentHashMap.compute executes atomically.
You can put into the updatesPerId like so:
updatesPerId.compute(id, (key, list) -> {
    if (list == null) list = new ArrayList<>();
    // ... add to the list
    // Return a non-null list, so the key/value pair is stored in the map.
    return list;
});
Note that this is different from calling computeIfAbsent and then adding to the returned list: that add would happen outside the compute and so would not be atomic.
Then in your thread to remove things:
for (String key : updatesPerId.keySet()) {
    updatesPerId.compute(key, (k, list) -> {
        // ... Process the contents of the list.
        // Return null, so the key/value pair is removed from the map.
        return null;
    });
}
So, adding to the list for a key (or processing the values for that key) might block if you happen to try to do both at once; otherwise, neither will be blocked.
Edit: as pointed out by @StuartMarks, it might be better to simply get everything out of the map first, and then process it later, in order to avoid blocking other threads trying to add:
Map<String, List<Update>> newMap = new HashMap<>();
for (String key : updatesPerId.keySet()) {
    newMap.put(key, updatesPerId.remove(key));
}
// ... Process entries in newMap.
I'd suggest using LinkedBlockingQueue instead of CopyOnWriteArrayList as the map value. With COWAL, adds get successively more expensive, so adding N elements results in O(N^2) performance; LBQ addition is O(1). Also, LBQ has drainTo, which can be used effectively here. You could do this:
final Map<String, Queue<Update>> updatesPerId = new ConcurrentHashMap<>();
Producer:
updatesPerId.computeIfAbsent(id, k -> new LinkedBlockingQueue<>()).add(update);
Consumer:
updatesPerId.forEach((id, queue) -> {
    List<Update> updates = new ArrayList<>();
    queue.drainTo(updates);
    processUpdates(id, updates);
});
This is somewhat different from what you had suggested. This technique processes the updates for each id, but lets producers continue to add updates to the map while this is going on. This leaves map entries and queues in the map for each id. If the ids end up getting reused a lot, the number of map entries will plateau at a high-water mark.
If new ids are continually coming in, and old ids becoming disused, the map will grow continually, which probably isn't what you want. If this is the case you could use the technique in Andy Turner's answer.
If the consumer really needs to snapshot and clear the entire map, I think you have to use locking, which you wanted to avoid.
Is there any way to make this logic thread safe without synchronizing the adding logic from producers and the logic from timerThread(consumer)?
In short, no - depending on what you mean by "synchronizing".
The easiest way is to wrap your Map into a class of your own.
class UpdateManager {
    Map<String, List<Update>> updates = new HashMap<>();

    public void add(Update update) {
        synchronized (updates) {
            updates.computeIfAbsent(update.getKey(), k -> new ArrayList<>()).add(update);
        }
    }

    public Map<String, List<Update>> getUpdatesAndClear() {
        synchronized (updates) {
            Map<String, List<Update>> copy = new HashMap<>(updates);
            updates.clear();
            return copy;
        }
    }
}
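A hypothetical usage sketch of the class above (the Update class here is a stand-in with the getKey() accessor the sketch assumes; processing is just a println): producers call add from any thread, and a timer thread drains everything in one atomic step.
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Stand-in for the question's Update type.
class Update {
    private final String key;
    Update(String key) { this.key = key; }
    String getKey() { return key; }
}

public class UpdateManagerDemo {
    public static void main(String[] args) {
        UpdateManager manager = new UpdateManager();

        // Producers, from any thread:
        manager.add(new Update("id-1"));
        manager.add(new Update("id-1"));

        // Timer thread: snapshot-and-clear happens as one atomic step.
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            Map<String, List<Update>> batch = manager.getUpdatesAndClear();
            batch.forEach((id, updates) ->
                    System.out.println(id + ": " + updates.size() + " update(s)"));
        }, 0, 1, TimeUnit.SECONDS);
    }
}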

Does Collections.synchronizedMap make the Iterator thread-safe?

There are two threads in a system. One is a reader thread and another is a writer thread.
The map is synchronized using the following code.
Map<String, ArrayList<String>> m = Collections.synchronizedMap(new HashMap<String, ArrayList<String>>());
The reader thread obtains an Iterator on the values of the map and at the same time writer thread modifies the map.
So, my question is will the Iterator throw ConcurrentModificationException?
Maybe. It's not safe to do so. The documentation says
It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views
Collections.synchronized... makes single method calls atomic so they don't need further synchronization. But iteration is more than a single method call so it needs extra synchronization. Below is an example
Map<String, String> shared = Collections.synchronizedMap(new HashMap<>());

new Thread(() -> {
    while (true) {
        synchronized (shared) {
            for (String key : shared.keySet()) {
                System.out.println(key);
            }
        }
        try {
            Thread.sleep(1000);
        } catch (Exception e) {
            break;
        }
    }
}).start();

new Thread(() -> {
    while (true) {
        try {
            // this is atomic
            shared.put(UUID.randomUUID().toString(), "Yo!");
            Thread.sleep(1000);
        } catch (Exception e) {
            break;
        }
    }
}).start();
Yes, the Iterator may still throw a ConcurrentModificationException, as the exception is not related to synchronization (although its name suggests so). The Iterator tries to detect structural modifications (additions or deletions of objects) on a best-effort basis, irrespective of whether the operations on the collection are synchronized or not.
Once an Iterator has been obtained via iterator(), any structural change made to the collection (other than through the Iterator itself) will trigger a ConcurrentModificationException on a best-effort basis.
The only ways to ensure that a ConcurrentModificationException is not thrown are either to have your reader operations complete before the writer operations start (or vice versa), or to use the fail-safe iterators of a ConcurrentHashMap:
Map<String, ArrayList<String>> hashMap = new HashMap<>();
// ...
Map<String, ArrayList<String>> m = new ConcurrentHashMap<>(hashMap);
ConcurrentHashMap's iterators are fail-safe (weakly consistent), so now you can focus on synchronizing the reader and writer operations rather than worrying about ConcurrentModificationException.
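For example (a standalone sketch, not from the question), iterating a ConcurrentHashMap while another thread keeps inserting never throws ConcurrentModificationException; the iteration simply may or may not see the concurrent insertions.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 1_000; i++) {
            map.put(i, "value");
        }

        Thread writer = new Thread(() -> {
            for (int i = 1_000; i < 2_000; i++) {
                map.put(i, "value"); // concurrent structural modification
            }
        });
        writer.start();

        // Weakly consistent iteration: no ConcurrentModificationException,
        // but keys added during the loop may or may not be seen.
        int seen = 0;
        for (Integer key : map.keySet()) {
            seen++;
        }
        writer.join();
        System.out.println("Iterated over " + seen + " of " + map.size() + " keys");
    }
}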

Android: HashMap ConcurrentModificationException

I keep getting a ConcurrentModificationException in my code. I'm simply iterating through a HashMap and modifying values. From researching this, I found people said to use iterators and iterator.remove, etc. I tried implementing that and still kept getting the error. I thought maybe multiple threads were accessing it (although in my code this block is only run in one thread), so I put it in a synchronized block. However, I'm still getting the error...
Map map = Collections.synchronizedMap(questionNumberAnswerCache);
synchronized (map) {
    for (Iterator<Map.Entry<String, Integer>> it = questionNumberAnswerCache.entrySet().iterator(); it.hasNext(); ) {
        Map.Entry<String, Integer> entry = it.next();
        if (entry.getKey() == null || entry.getValue() == null) {
            continue;
        } else {
            try {
                Question me = Question.getQuery().get(entry.getKey());
                int i = Activity.getQuery()
                        .whereGreaterThan(Constants.kQollegeActivityCreatedAtKey, lastUpdated.get("AnswerNumberCache " + entry.getKey()))
                        .whereEqualTo(Constants.kQollegeActivityTypeKey, Constants.kQollegeActivityTypeAnswer)
                        .whereEqualTo(Constants.kQollegeActivityQuestionKey, me)
                        .find().size();
                lastUpdated.put("AnswerNumberCache " + entry.getKey(), Calendar.getInstance().getTime());
                int old_num = entry.getValue();
                entry.setValue(i + old_num);
            } catch (ParseException e) {
                entry.setValue(0);
            }
        }
    }
}
Error:
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:787)
at java.util.HashMap$EntryIterator.next(HashMap.java:824)
at java.util.HashMap$EntryIterator.next(HashMap.java:822)
at com.juryroom.qollege_android_v1.QollegeCache.refreshQuestionAnswerNumberCache(QollegeCache.java:379)
at com.juryroom.qollege_android_v1.QollegeCache.refreshQuestionCaches(QollegeCache.java:267)
at com.juryroom.qollege_android_v1.UpdateCacheService.onHandleIntent(UpdateCacheService.java:28)
at android.app.IntentService$ServiceHandler.handleMessage(IntentService.java:65)
at android.os.Handler.dispatchMessage(Handler.java:102)
at android.os.Looper.loop(Looper.java:135)
at android.os.HandlerThread.run(HandlerThread.java:61)
What is happening:
The iterator is looping through the map. The map isn't really like a list, because it doesn't care about order. So when you add something to the map, it might get inserted somewhere in the middle of the objects you have already looped through, at the end, etc. So instead of giving you random behavior, it fails fast.
Your solutions:
A synchronized map and synchronized blocks protect you when two threads access the map at the same time. That doesn't really help here, since the problem is that the same thread is modifying the map in an illegal manner while iterating it.
What you should do:
You could just save the keys and new values you want to apply in a separate map; building that map won't be a problem unless this is a really time-critical piece of code. Then you just iterate through the newValues map and update the oldValues map. Since you are not iterating through the map being updated, it's not a problem (see the sketch below).
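For example, that first approach might look like this minimal standalone sketch (the map contents and the +1 computation are stand-ins for the question's Parse queries):
import java.util.HashMap;
import java.util.Map;

public class CollectThenApply {
    public static void main(String[] args) {
        Map<String, Integer> oldValues = new HashMap<>();
        oldValues.put("a", 1);
        oldValues.put("b", 2);

        // Phase 1: compute new values without touching the map we iterate.
        Map<String, Integer> newValues = new HashMap<>();
        for (Map.Entry<String, Integer> entry : oldValues.entrySet()) {
            newValues.put(entry.getKey(), entry.getValue() + 1); // stand-in computation
        }

        // Phase 2: bulk-apply after the iteration has finished.
        oldValues.putAll(newValues);
        System.out.println(oldValues);
    }
}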
Or you could simply iterate through the keys (for (String s : yourMap.keySet())) and then look up the values you want to change. Since you are just iterating through the keys, you are free to change the values (but you can't add or remove entries).
You could also try using a ConcurrentHashMap, which allows modification while iterating: its iterators are weakly consistent, so just changing values shouldn't lead to problems, but if you add or remove entries you never know whether the iteration will see them or not.
Creating an object and locking on it like that is a good way to shoot yourself in the foot. I recommend the following code for removing entries from a HashMap:
HashMap<Key, Object> hashMap = new HashMap<>();
LinkedList<Key> listToRemove = new LinkedList<>();

for (Map.Entry<Key, Object> s : hashMap.entrySet()) {
    if (s.getValue().equals("ToDelete")) {
        listToRemove.add(s.getKey());
    }
}
for (Key s : listToRemove) {
    hashMap.remove(s);
}
It's neither the most beautiful nor the fastest option, but it should help you understand how to work with a HashMap. Once you understand how this works, you can learn how iterators work and how to use them in a loop (rather than simply copy-pasting):
Iterator<Key> it = tokenMap.keySet().iterator();
while (it.hasNext()) {
    Key key = it.next();
    if (/* some condition */) it.remove();
}
I would suggest the following for your use case:
for (Key key : hashMap.keySet()) {
    Object value = hashMap.get(key);
    if (<condition>) {
        hashMap.put(key, <new value>);
    }
}
If you are not deleting any entries and just changing the value, this should work for you.

Is it thread-safe to iterate a HashMap object concurrently?

If multiple threads concurrently iterate a HashMap object, without modifying it, is there a chance for race conditions?
No race, if you can guarantee that no other thread would modify this HashMap while it is being iterated.
Nope, that is perfectly fine. As long as all reads are synchronized with all writes, and all writes are synchronized with each other, there is no harm in concurrent reads; so if there are no writes at all, then all concurrent access is safe.
It will be all right. But if any thread adds or removes an item, this will throw an exception in any other threads that are just iterating over the HashMap (any collection, in fact).
If you are going to iterate over a Map repeatedly, you may find it marginally faster to iterate over an array copy.
private final Map<String, String> properties = new HashMap<String, String>();
private volatile Map.Entry<String, String>[] propertyEntries = null;

private void updatePropertyEntries() {
    propertyEntries = properties.entrySet().toArray(new Map.Entry[properties.size()]);
}

{
    // no objects created
    for (Map.Entry<String, String> entry : propertyEntries) {
    }
}
BTW: With this pattern, you can have one thread modify/replace propertyEntries while many other threads iterate over it.
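For completeness, here is a runnable sketch of that pattern (the put/dump methods and the single-writer assumption are illustrative additions): the writer publishes a fresh snapshot array on every change, and readers iterate whatever snapshot they see, with no locking.
import java.util.HashMap;
import java.util.Map;

public class SnapshotIterationDemo {
    private final Map<String, String> properties = new HashMap<>();
    @SuppressWarnings("unchecked")
    private volatile Map.Entry<String, String>[] propertyEntries = new Map.Entry[0];

    @SuppressWarnings("unchecked")
    private synchronized void put(String key, String value) {
        properties.put(key, value);
        // Publish a fresh snapshot; the array reference is replaced
        // wholesale, never mutated, so readers see a complete array.
        propertyEntries = properties.entrySet().toArray(new Map.Entry[0]);
    }

    private void dump() {
        // No lock needed: we iterate whichever snapshot was last published.
        for (Map.Entry<String, String> entry : propertyEntries) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }
    }

    public static void main(String[] args) {
        SnapshotIterationDemo demo = new SnapshotIterationDemo();
        demo.put("a", "1");
        demo.put("b", "2");
        demo.dump();
    }
}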
