Does Collections.synchronized map makes Iterator threadsafe - java

There are two threads in a system. One is a reader thread and another is a writer thread.
The map is synchronized using the following code.
Map<String,ArrayList<String>> m = Collections.synchronizedMap(new HashMap<String,ArrayList<String>())
The reader thread obtains an Iterator on the values of the map and at the same time writer thread modifies the map.
So, my question is will the Iterator throw ConcurrentModificationException?

Maybe. It's not safe to do so. The documentation says
It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views
Collections.synchronized... makes single method calls atomic so they don't need further synchronization. But iteration is more than a single method call so it needs extra synchronization. Below is an example
Map<String, String> shared = Collections.synchronizedMap(new HashMap<>());
new Thread(() -> {
while (true) {
synchronized (shared) {
for (String key : shared.keySet()) {
System.out.println(key);
}
}
try {
Thread.sleep(1000);
} catch (Exception e) {
break;
}
}
}).start();
new Thread(() -> {
while (true) {
try {
// this is atomic
shared.put(UUID.randomUUID().toString(), "Yo!");
Thread.sleep(1000);
} catch (Exception e) {
break;
}
}
}).start();
}

Yes,the Iterator may still throw ConcurrentModificationException as it is not related with synchronization(although its name suggests so).The Iterator tries to detect structural modifications(addition or deletion of Objects) in a best effort attempt and is irrespective of whether the operations of a List are synchronized or not.
Once an Iterator has been obtained through List.iterator() or List.listIterator(),any changes done to the List (apart by the Iterator itself) will throw a CME exception in best effort basis.
The only way you can ensure that ConcurrentModificationException is not thrown
is either by having your Reader operations complete first and then writer operations(or vice-versa) or by using a fail-safe Iterator from ConcurrentHashMap
Map hashmap = new HashMap<String,ArrayList<String>();
----
Map<String,ArrayList<String>> m = new ConcurrentHashMap<String,ArrayList<String>(hashmap));
ConcurrentHashMap is a fail-safe iterator and now you can focus on synchronizing reader-writer operations than worrying about ConcurrentModificationException.

Related

java.util.ConcurrentModificationException when combing results from Future's into HashMap

I'm running into an issue where I intermittently get an java.util.ConcurrentModificationException error whilst merging the results of futures. I've tried making all my HashMaps concurrent and I tried using an iterator but the error still persists and I'm totally confused to how this error is occurring. I'm not looping through the same HashMap I'm inserting too either (as done in similar questions with this error).
For my code, to start off with I build a bunch of tasks that return's HashMap<String, HashSet>, then using .invokeAll my tasks all return the above HashMap which I then try to merge together (I'm reading a large CSV to get all unique results for each column).
I start by defining all the keys.
HashMap<String, HashSet<String>> headersHashset = new HashMap<String, HashSet<String>>();
headers = Arrays.asList(br.readLine().split(delimiter));
for (String headerID : headers) {
headersHashset.put(headerID, new HashSet<>());
}
I then make my tasks, which clone the keys, do the processing and return it's result
tasks.add(() -> {
HashMap<String, HashSet<String>> localHeadersHashset = (HashMap<String, HashSet<String>>) headersHashset.clone();
for (String[] values : sampleSet.values()) { // sampleSet is a SortedMap<Integer, String[]>
int headerAsINT = 0;
for (String value : values) {
localHeadersHashset.get(headers.get(headerAsINT)).add(value);
headerAsINT++;
}
}
return localHeadersHashset;
});
I invokeAll the tasks, where the results are put into a List of Future's
ExecutorService es = Executors.newCachedThreadPool();
List<Future<HashMap<String, HashSet<String>>>> futures = null;
try {
futures = es.invokeAll(tasks);
es.shutdown();
} catch (InterruptedException e) {
e.printStackTrace();
}
I then, for very safe keeping, make a ConruentHashMap with a copy of all the keys pre-defined. I then loop through all the results from my tasks, put them into a ConcurrentHashMap as well, and addAll the results into the original copied ConcurrentHashMap with the pre-defined keys. This is when the error occurs (only sometimes though, which backs that it's something to do with threads, but I thought all the threading stuff is done at this point?).
ConcurrentHashMap<String, HashSet<String>> headersHashsetConcurrent = new ConcurrentHashMap<>(headersHashset);
try {
for (Future<HashMap<String, HashSet<String>>> f : futures) {
ConcurrentHashMap<String, HashSet<String>> threadSafeItems = new ConcurrentHashMap<>(f.get());
for (Map.Entry<String, HashSet<String>> items : threadSafeItems.entrySet()) {
headersHashsetConcurrent.get(items.getKey()).addAll(items.getValue()); // ERROR SHOWS HERE
}
}
} catch (Exception e) {
e.printStackTrace();
}
Here is the error in full:
java.util.ConcurrentModificationException
at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1584)
at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1607)
at java.base/java.util.AbstractCollection.addAll(AbstractCollection.java:335)
at project/project.parseCSV.processFile(parseCSV.java:101)
at project/project.parseCSV.call(parseCSV.java:126)
at project/project.parseCSV.call(parseCSV.java:11)
at javafx.graphics/javafx.concurrent.Task$TaskCallable.call(Task.java:1425)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.lang.Thread.run(Thread.java:832)
I'm really not sure why this is causing an error, as addAll is only run on the main thread, and it's inserting into a ConCurrent HashMap too anyway - any ideas on why this is happening would be hugely appreciated!
The problem is that headersHashset.clone() doesn't clone HashSets in the values.
From the docs:
Returns a shallow copy of this HashMap instance: the keys and values themselves are not cloned.
It means that localHeadersHashset in your tasks, and your headersHashsetConcurrent, and threadSafeItems returned by futures — all of these use the same HashSet objects for the same keys.
Since tasks are executed in parallel threads, it is entirely possible that some task executes HashSet.add() at the same time when the main thread iterates over the elements of the same HashSet inside:
headersHashsetConcurrent.get(items.getKey()).addAll(items.getValue());
This is what causes your ConcurrentModificationException.

ConcurrentHashMap with synchronized

I am maintaining some legacy code and found some implementation with synchronized key-word on ConcurrentHashMap. It seem unnecessary to me:
public class MyClass{
private final Map<MyObj, Map<String, List<String>>> conMap = new ConcurrentHashMap<>();
//...
//adding new record into conMap:
private void addToMap(MyObj id, String name, String value){
conMap.putIfAbsent(id, new ConcurrentHashMap<>());
Map<String, List<String>> subMap = conMap.get(id);
synchronized(subMap){ // <-- is it necessary?
subMap.putIfAbsent(name, new ArrayList<>());
subMap.get(name).add(value);
}
}
//...
public void doSomthing((MyObj id){
List<Map<String, List<String>>> mapsList = new LinkedList<>();
for(MyObj objId: conMap.keySet()){
if(objId.key1.equals(id.key1)){
mapsList.add(conMap.get(objId));
}
}
for(Map<String, List<String>> map: mapsList){
synchronized(map){ // <-- is it necessary?
if(timeout <= 0){
log(map.size());
for(List<String> value: map.values(){
log(id, value);
}
}
else{
int sum = 0;
for(map.Entry<String, List<String>> val: map.entrySet()){
sum += val.getValue().size();
}
log(sum);
map.wait(timeout);
}
}
//...
}
So, is it reasonable to use synchronized key on object that already concurrent? Or those are two different things?
In this case:
synchronized(subMap){ // <-- is it necessary?
subMap.putIfAbsent(name, new ArrayList<>());
subMap.get(name).add(value);
}
the synchronized is necessary. Without it, you could have two threads simultaneously updating the same ArrayList instance. Since ArrayList is not thread-safe, the addToMap method would not be thread-safe either.
In this case:
synchronized(map){ // <-- is it necessary?
if(/*condition*/){
log(map.size());
for(List<String> value: map.values(){
log(id, value);
}
}
else{
int sum = 0;
for(map.Entry<String, List<String>> val: map.entrySet()){
sum += val.getValue().size();
}
log(sum);
map.wait(timeout);
}
the synchronized is necessary.
In the if branch, the log method (or something called from it) will probably call ArrayList::toString which will iterate each ArrayList. Without the synchronizing at the submap level, there could be a simultaneous add by another thread (e.g. an addToMap call). That means that there are memory hazards, and a ConcurrentModificationException may be possible in the toString() method.
In the else branch, the size() call is accessing a size field in each ArrayList in the submap. Without the synchronizing at the submap level, there could be a simultaneous add on one of those list. That could cause the size() method to return a stale value. In addition, you are not guaranteed to see map entries added to a submap while your are iterating it. If either of those events happen, the sum could be inaccurate. (Whether that is really an issue depends on the requirements for this method: inaccurate counts could be acceptable.)
ConcurrentHashMap synchronizes each individual method call itself, so that no other thread can access the map (and possibly break the internal data structure of the map).
Synchronized block synchronizes two or more consecutive method calls, so that no other thread can modify the data structure between the calls (and possibly break the consistency of the data, with regards to the application logic).
Note that the synchornized block only works if all access to the HashMap is performed from synchronized blocks using the same monitor object.
It is sort of necessary, as multiple threads may try to append to the same ArrayList at the same time. The synchonized is protecting against that happening as ArrayList is obviously not synchronized.
Since Java 8 we have computeIfAbsent which means the puts followed by gets they are doing can be simplified. I would write it like this, no synchronization required:
conMap.computeIfAbsent(id, k -> new ConcurrentHashMap<>())
.computeIfAbsent(name, k -> new CopyOnWriteArrayList<>()) // or other thread-safe list
.add(value);
Other answers don't adequately this bit...
for(Map<String, List<String>> map: mapsList){
synchronized(map){ // <-- is it necessary?
if(/*condition*/){
...iterate over map...
}
else {
...iterate over map...
}
}
}
Is it necessary? Hard to tell.
What is /*condition*/ ? Does synchronizing on map prevent some other thread A from changing the value of /*condition*/ after thread B has tested it, but before or while thread B is performing either of the two branches? If so, then the synchronized block could be very important.
How about those iterations? Does synchronizing on map prevent some other thread A from changing the contents of the map while thread B is iterating? If so, then the synchronized block could be very important.

Why to synchronize on SynchronizedMap or SynchronizedCollections?

I am referring to question asked here and using authors code example, now my question is
Why does author uses synchronized(synchronizedMap), is it really necessary because synchronizedMap will always make sure that there are no two threads trying to do read/put operation on Map so why do we need to synchronize on that map itself?
Would really appreciate an explanation.
public class MyClass {
private static Map<String, List<String>> synchronizedMap =
Collections.synchronizedMap(new HashMap<String, List<String>>());
public void doWork(String key) {
List<String> values = null;
while ((values = synchronizedMap.remove(key)) != null) {
//do something with values
}
}
public static void addToMap(String key, String value) {
synchronized (synchronizedMap) {
if (synchronizedMap.containsKey(key)) {
synchronizedMap.get(key).add(value);
}
else {
List<String> valuesList = new ArrayList<String>();
valuesList.add(value);
synchronizedMap.put(key, valuesList);
}
}
}
}
why do we need to synchronize on that synchronizemap itself?
You may need to synchronize on an already synchronized collection because you are performing two operations on the collection -- in your example, a containsKey() and then a put(). You are trying to protect against race conditions in the code that is calling the collection. In addition, in this case, the synchronized block also protects the ArrayList values so that multiple threads can add their values to these unsynchronized collections.
If you look at the code you linked to, they first check for the existence of the key and then put a value into the map if the key did not exist. You need to protect against 2 threads checking for a key's existence and then both of them putting into the map. The race is which one will put first and which one will overwrite the previous put.
The synchronized collection protects itself from multiple threads corrupting the map itself. It does not protect against logic race conditions around multiple calls to the map.
synchronized (synchronizedMap) {
// test for a key in the map
if (synchronizedMap.containsKey(key)) {
synchronizedMap.get(key).add(value);
} else {
List<String> valuesList = new ArrayList<String>();
valuesList.add(value);
// store a value into the map
synchronizedMap.put(key, valuesList);
}
}
This is one of the reasons why the ConcurrentMap interface has the putIfAbsent(K key, V value);. That does not require two operations so you may not need to synchronize around it.
Btw, I would rewrite the above code to be:
synchronized (synchronizedMap) {
// test for a key in the map
List<String> valuesList = synchronizedMap.get(key);
if (valueList == null) {
valuesList = new ArrayList<String>();
// store a value into the map
synchronizedMap.put(key, valuesList);
}
valuesList.add(value);
}
Lastly, if most of the operations on the map need to be in a synchronized block anyway, you might as well not pay for the synchronizedMap and just use a HashMap always inside of synchronized blocks.
It is not just about updating synchronizedMap values, it is about sequence of operations effecting the map. There are two operations happening on the map inside same method.
If you don't synchronize block/method, assume there may be case like Thread1 executing first part and thread2 executing second part, your business operation may results in weird results (even though updates to map are synchronized)

Is it thread-safe to iterate a HashMap object concurrently?

If multiple threads concurrently iterate a HashMap object, without modifying it, is there a chance for race conditions?
No race, if you can guarantee that no other thread would modify this HashMap while it is being iterated.
Nope, that is perfectly fine. As long as all reads are synchronized with all writes, and all writes are synchronized with each other, there is no harm in concurrent reads; so if there are no writes at all, then all concurrent access is safe.
It will be al right. But if any of the threads add or remove an item, this will throw exception in any other threads that are just iterating HashMap (any collection in fact)
If you are going to iterate of a Map repeatedly you may find it marginally faster to iterate over an array copy.
private final HashMap<String, String> properties = new HashMap<String, String>();
private volatile Map.Entry<String, String>[] propertyEntries = null;
private void updatePropertyEntries() {
propertyEntries = properties.entrySet().toArray(new Map.Entry[properties.size()]);
}
{
// no objects created
for (Map.Entry<String, String> entry : propertyEntries) {
}
}
BTW: You can have one thread modify/replace the propertyEntries while iterating in many threads with this pattern.

Using iterator on a TreeSet

SITUATION: I have a TreeSet of custom Objects and I have also used a custom Comparator. I have created an iterator to use on this TreeSet.
TreeSet<Custom> ts=new TreeSet<Custom>();
Iterator<Custom> itr=ts.iterator();
while(itr.hasNext()){
Custom c=itr.next();
//Code to add a new element to the TreeSet ts
}
QUESTION: Well I want to know that if I add a new element to the TreeSet within the while loop, then will that new element get sorted immediately. In other words, if I add a new element within the while loop and it is less than the one which I am currently holding in c, then in the next iteration will I be getting the same element in c as in the last iteration?(since after sorting, the newly added element will occupy a place somewhere before the current element).
If you add an element during your iteration, your next iterator call will likely throw a ConcurrentModificationException. See the fail-fast behavior in TreeSet docs.
To iterate and add elements, you could copy first to another set:
TreeSet<Custom> ts = ...
TreeSet<Custom> tsWithExtra = new TreeSet(ts);
for (Custom c : ts) {
// possibly add to tsWithExtra
}
// continue, using tsWithExtra
or create a separate collection to be merged with ts after iteration, as Colin suggests.
You will get a java.util.ConcurrentModificationException if you add an element into the TreeSet inside while loop.
Set<String> ts = new TreeSet<>();
ts.addAll(Arrays.asList(new String[]{"abb", "abd", "abg"}));
Iterator<String> itr = ts.iterator();
while(itr.hasNext()){
String s = itr.next();
System.out.println("s: " + s);
if (s.equals("abd"))
ts.add("abc");
}
###Output
Exception in thread "main" java.util.ConcurrentModificationException
public static void main(String[] args) {
TreeSet<Integer> ts=new TreeSet<Integer>();
ts.add(2);
ts.add(4);
ts.add(0);
Iterator<Integer> itr=ts.iterator();
while(itr.hasNext()){
Integer c=itr.next();
System.out.println(c);
//Code
ts.add(1);
}
}
Exception in thread "main" java.util.ConcurrentModificationException
This will come to all collections like List , Map , Set
Because when iterator starts it may be putting some lock on it .
if you iterate list using iterator then this exception will come. I think otherwise this loop will be infinite as you are adding element whole iterating.
Consider without iterator:
public static void main(String[] args) {
List<Integer> list=new ArrayList<Integer>();
list.add(2);
list.add(4);
list.add(0);
for (int i = 0; i < 3; i++) {
System.out.println(list.get(i));
list.add(3);
}
System.out.println("Size" +list.size());
}
this will be fine .
In order to avoid the ConcurrentModificationException you might want to check out my UpdateableTreeSet. I have even added a new test case showing how to add elements during a loop. To be more exact, you mark new elements for a later, deferred update of the set. This works quite nicely. Basically you do something like
for (MyComparableElement element : myUpdateableTreeSet) {
if (someCondition) {
// Add new element (deferred)
myUpdateableTreeSet.markForUpdate(
new MyComparableElement("foo", "bar", 1, 2)
);
}
}
// Perform bulk update
myUpdateableTreeSet.updateMarked();
I guess this is quite exactly what you need. :-)
To prevent the ConcurrentModificationException while walking.
Below is my version to allow high frequency insertion into the TreeSet() and allow concurrently iterate on it. This class use a extra queue to store the inserting object when the TreeSet is being iterating.
public class UpdatableTransactionSet {
TreeSet <DepKey> transactions = new TreeSet <DepKey> ();
LinkedList <DepKey> queue = new LinkedList <DepKey> ();
boolean busy=false;
/**
* directly call it
* #param e
*/
void add(DepKey e) {
boolean bb = getLock();
if(bb) {
transactions.add(e);
freeLock();
} else {
synchronized(queue) {
queue.add(e);
}
}
}
/**
* must getLock() and freeLock() while call this getIterator function
* #return
*/
Iterator<DepKey> getIterator() {
return null;
}
synchronized boolean getLock() {
if(busy) return false;
busy = true;
return true;
}
synchronized void freeLock() {
synchronized(queue) {
for(DepKey e:queue) {
transactions.add(e);
}
}
busy = false;
}
}
While the question has already been answered, I think the most satisfactory answer lies in javadoc of TreeSet itself
The iterators returned by this class's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, >generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
To avoid the concurrent modification error that's bound to occur when you're doing the insertion, you could also create a temporary copy of the Set, iterate through the copy instead, and modify the original.

Categories

Resources