Java ConcurrentHashMap and for each loop

Java ConcurrentHashMap and for each loop - java

Supposed I have the following ConcurrentHashMap:
ConcurrentHashMap<Integer,String> identificationDocuments = new ConcurrentHashMap<Integer,String>();
identificationDocuments.put(1, "Passport");
identificationDocuments.put(2, "Driver's Licence");
How would I safely iterate over the map with a for each loop and append the value of each entry to a string?

Iterators produced by a ConcurrentHashMap are weakly consistent. That is:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.
The last bullet-point is pretty important, an iterator returns a view of the map at some point since the creation of the iterator, to quote a different section of the javadocs for ConcurrentHashMap:
Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
So when you loop through a keyset like the following, you need to double check if the item still exists in the collection:
for(Integer i: indentificationDocuments.keySet()){
// Below line could be a problem, get(i) may not exist anymore but may still be in view of the iterator
// someStringBuilder.append(indentificationDocuments.get(i));
// Next line would work
someStringBuilder.append(identificationDocuments.getOrDefault(i, ""));
}
The act of appending all the strings to the StringBuilder itself is safe, as long as you are doing it on one thread or have encapsulated the StringBuilder entirely in a thread-safe manner.

I don't know If you are really asking this, but to iterate over any map you iterate over the keySet()
StringBuffer result = new StringBuffer("");
for(Integer i: indentificationDocuments.keySet()){
result.append(indentificationDocuments.get(i));
}
return result.toString();

Related

For loop vs. Iterator to avoid ConcurrentModificationException with an ArrayList

Question: What is the optimal (performance-wise) solution for the add, removal, modification of items within an ArrayList which at the same time avoids the ConcurrentModificationException from being thrown during operations?
Context: Based on my research looking into this question, there doesn't seem to be any straight-forward answers to the question at hand - most recommend using CopyOnWriteArrayList, but my understanding is that it is not recommended for array lists of large size (which I am working with, hence the performance-aspect of the question).
Thus, my understanding can be summarized as the following, but want to make sure if is correct/incorrect:
IMPORTANT NOTE: The following statements all assume that the operation is done within a synchronized block.
Remove during iteration of an ArrayList should be done with an Iterator, because for loop results in unpredictable behavior if removal is done within the middle of a collection. Example:
Iterator<Item> itemIterator = items.iterator();
while (itemIterator.hasNext()) {
Item item = itemIterator.next();
// check if item needs to be removed
itemIterator.remove();
}
For add operations, cannot be done with an Iterator, but can be with ListIterator. Example:
ListIterator<Item> itemIterator = list.listIterator();
while(itemIterator.hasNext()){
\\ do some operation which requires iteration of the ArrayList
itemIterator.add(item);
}
For add operations, a ListIterator does NOT have to be necessarily be used (i.e. simply items.add(item) should not cause any problems).
For add operations while going through the collection can be done with EITHER a ListIterator or a for loop, but NOT an Iterator. Example:
Iterator<Item> itemIterator = item.iterator();
while (itemIterator.hasNext()) {
\\ do some operation which requires iteration of the ArrayList
items.add(item); \\ NOT acceptable - cannot modify ArrayList while in an Iterator of that ArrayList
}
Modification of an item within an ArrayList can be done with either an Iterator or a for loop with the same performance complexity (is this true?). Example:
\\ iterator example
Iterator<Item> itemIterator = item.iterator();
while (itemIterator.hasNext()) {
Item item = itemIterator.next();
item.update(); // modifies the item within the ArrayList during iteration
}
\\ for loop example
for (Item item : items){
item.update();
}
Will modification during iteration with the Iterator have the same performance as the for loop? Are there any thread-safety differences between the approaches?
Bonus question: what advantage does using a synchronizedList of the ArrayList for add/remove/modify operations vs. for loop vs. iterator if it also requires a synchronized block?

There is no difference between while loops and for loops and in fact, the idiomatic form of a loop using an iterator explicitly, is a for loop:
for(Iterator<Item> it = items.iterator(); it.hasNext(); ) {
Item item = it.next();
item.update();
}
which gets compiled to exactly the same code as
for(Item item: items) {
item.update();
}
Try it online!
There are no performance differences for identical compiled code dependent to the original source code used to produce it.
Instead of focusing on the loop form, you have to focus on the fundamental limitations when inserting or removing elements of an ArrayList. Each time you insert or remove an element, the elements behind the affected index have to be copied to a new location. This isn’t very expensive, as the array only consists of references to the objects, but the costs can easily add up when doing it repeatedly.
So, if you know that the number of insertions or removals is predictably small or will happen at the end or close to the end (so there is only a small number of elements to copy), it’s not a problem. But when inserting or removing an arbitrary number of elements at arbitrary positions in a loop, you run into a quadratic time complexity.
You can avoid this, by using
items.removeIf(item -> /* check and return whether to remove the item*/);
This will use an internal iteration and postpone the moving of elements until their final position is known, leading to a linear time complexity.
If that’s not feasible, you might be better off copying the list into a new list, skipping the unwanted elements. This will be slightly less efficient but still have a linear time complexity. That’s also the solution for inserting a significant number of items at arbitrary positions.
The item.update(); in an entirely different category. “the item within the ArrayList” is a wrong mindset. As said above, the ArrayList contains references to objects whereas the object itself is not affected by “being inside the ArrayList”. In fact, objects can be in multiple collections at the same time, as all standard collections only contain references.
So item.update(); changes the Item object, which is an operation independent of the ArrayList, which is dangerous when you assume a thread safety based on the list.
When you have code like
Item item = items.get(someIndex);
// followed by using item
where get is from a synchronizedList
or a manually synchronized retrieval operation which returns the item to the caller or any other form of code which uses a retrieved Item outside the synchronized block,
then your code is not thread safe. It doesn’t help when the update() call is done under a synchronization or lock when looping over the list, when the other uses are outside the synchronization or lock. To be thread safe, all uses of an object must be protected by the same thread safety construct.
So even when you use the synchronizedList, you must not only guard your loops manually, as the documentation already tells, you also have to expand the protection to all other uses of the contained elements, if they are mutable.
Alternatively, you could have different mechanisms for the list and the contained elements, if you know what you are doing, but it still means that the simplicity of “just wrap the list with synchronizedList” isn’t there.
So what advantage does it have? Actually none. It might have helped developers during the migration from Java 1.1 and its all-synchronized Vector and Hashtable to Java 2’s Collection API. But I never had a use for the synchronized wrappers at all. Any nontrivial use case requires manual synchronization (or locking) anyway.

Why is removing the 1st entry from a ConcurrentHashMap not immediately reflected in the iterator, but removing the 2nd or subsequent entries is?

I created an iterator() and then removed the 1st entry from the map before iterating it. I always get the 1st item returned from the iterator. But when I remove the 2nd or subsequent entries, the current iterator does not return that entry.
Example of removing 1st entry from map:
Map<Integer,Integer> m1 = new ConcurrentHashMap<>();
m1.put(4, 1);
m1.put(5, 2);
m1.put(6, 3);
Iterator i1 = m1.entrySet().iterator();
m1.remove(4); // remove entry from map
while (i1.hasNext())
System.out.println("value :: "+i1.next()); //still shows entry 4=1
and the output is:
value :: 4=1
value :: 5=2
value :: 6=3
Example of removing 3rd entry from map:
Map<Integer,Integer> m1 = new ConcurrentHashMap<>();
m1.put(4, 1);
m1.put(5, 2);
m1.put(6, 3);
Iterator i1 = m1.entrySet().iterator();
m1.remove(6); // remove entry from map
while (i1.hasNext())
System.out.println("value :: "+i1.next()); //does not show entry 6=3
and the output is:
value :: 4=1
value :: 5=2
Why is removing the 1st entry from the map not reflected in the iterator, but removing the 2nd or subsequent entry is?
The Java documentation says:
Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException.
That means, its iterators reflect the state of the hash table at the point of creation of the iterator.
And when we add or remove an entry to or from the Map, the Iterator will show the original entries?

According to this iterators and spliterators are weakly consistent. The definition of "weakly consistent" can be found here:
Most concurrent Collection implementations (including most Queues)
also differ from the usual java.util conventions in that their
Iterators and Spliterators provide weakly consistent rather than
fast-fail traversal:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse
elements as they existed upon construction exactly once, and may (but
are not guaranteed to) reflect any modifications subsequent to
construction.
Which means that any modifications made after an iterator has been created may be reflected, but it's not guaranteed. That's just a normal behaviour of a concurrent iterator\spliterator.

To achieve exactly once iteration behavior, when you remove an element via the Iterator object, the iterator data structure would need to be updated to keep it in step with what has happened to the collection. This is not possible in the current implementations because they don't keep links to the outstanding iterators. And if they did, they would need to use Reference objects or risk memory leaks.
The iterator is guaranteed to reflect the state of the map at the time of it's creation. Further changes may be reflected in the iterator, but they do not have to be.
Think Iterator as a LinkedList and you have head reference with you. If you happen to remove the head from linked list and does not reset the head.next value and start iterating from head you still be traversing the from the same head because you are using an outdated head reference. But when you remove non-head elements the prior element.next is updated.

The answer is in the documentation you quoted:
Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
At OR since. The iterator might or might not show changes to the map since the iterator's creation.
It is not practical for the designers to enforce more precise behavior during concurrent modification and iteration, but it is not broken.

Actual value check is done in next(),
public final Map.Entry<K,V> next() {
Node<K,V> p;
if ((p = next) == null)
throw new NoSuchElementException();
K k = p.key;
V v = p.val;
lastReturned = p;
advance();
return new MapEntry<K,V>(k, v, map);
}
advance method advances if possible, returning next valid node, or null if none.
So for first entry K k = 4; V v = 1; even if it is removed. But for subsequent k,v would decided with update fromadvance()
So if you call next() after remove, it won't be there(which is obvious),
Map<Integer,Integer> m1 = new ConcurrentHashMap<>();
m1.put(4, 1);
m1.put(5, 2);
m1.put(6, 3);
Iterator<Map.Entry<Integer, Integer>> i1 = m1.entrySet().iterator();
m1.remove(4); // remove entry from map
i1.next();
while (i1.hasNext())
System.out.println("value :: "+i1.next());

How come I am not getting "ConcurrentModificationException" while iterating and modifying on HashMap at the same time?

I have a map
Map<String, String> map = new HashMap<String, String>();
map.put("Pujan", "pujan");
map.put("Swati", "swati");
map.put("Manish", "manish");
map.put("Jayant", "jayant");
Iterator<Map.Entry<String, String>> itr = map.entrySet().iterator();
while(itr.hasNext()){
Entry<String,String> entry=(Entry<String, String>) itr.next();
map.put("Manish", "Updated");
}
I don't get an exception here (where I am trying to modify an existing key value "Manish"). But if I try to add a new key map.put("Manish123", "Updated") I get ConcurrentModificationException.

Because your aren't modifying the iterator,
put will mutate an existing entry in this case because a Map.Entry with the same key already exists in the Map.

If you see the Javadoc for the modCount field of a HashMap (in Java 8 HashMap.java source), you will see:
/**
* The number of times this HashMap has been structurally modified.
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash). This field is used to make iterators on Collection-views of
* the HashMap fail-fast. (See ConcurrentModificationException).
*/
Thus this field keeps the number of times there have been structural modifications to the map. The various iterators in this class throw the ConcurrentModificationException (a better name could have been chosen) when the expected modification count expectedModCount (which is initialized to modCount when you construct this iterator, for example, at the line Iterator<Map.Entry<String, String>> itr = map.entrySet().iterator();) does not match modCount which is mutated any time there are structural modifications to the map (e.g. calling put with a new entry, among other things). Note that different threads are not involved here. All of this can happen in one single thread, for example, when you remove entries from the map or add entries to it while iterating).
As you can now relate, remapping an existing key to a different value should not result in changes to the internal structure of the hash map (since it simply replaces the value associated with the key). And all you are doing is simply remapping the key Manish to a value Updated repeatedly as many times as there are entries in the map (which is 4 and is fixed for the duration of iteration). If, however, added or removed any key you will get the ConcurrentModificationException.
This is analogous to the following code (Note: for illustration purposes only):
List<String> names = Arrays.asList("Larry", "Moe", "Curly");
int i = 0;
Iterator<String> strIter = names.iterator();
while (strIter.hasNext()) {
names.set(i, strIter.next() + " " + i); // value changed, no structural modification to the list
i += 1;
}
System.out.println(names);
which prints:
[Larry 0, Moe 1, Curly 2]

According to Java API : Iterating over collection using Iterator is subject to ConcurrentModificationException if Collection is modified after Iteration started, but this only happens in case of fail-fast Iterators.
There are two types of Iterators in Java, fail-fast and fail-safe, check difference between fail-safe and fail-fast Iterator for more details.
Fail-Fast Iterators in Java
Difference between fail-safe vs fail-fast iterator in javaAs name suggest fail-fast Iterators fail as soon as they realized that structure of Collection has been changed since iteration has begun. Structural changes means adding, removing or updating any element from collection while one thread is Iterating over that collection. fail-fast behavior is implemented by keeping a modification count and if iteration thread realizes the change in modification count it throws ConcurrentModificationException.
Java doc says this is not a guaranteed behavior instead its done of "best effort basis", So application programming can not rely on this behavior. Also since multiple threads are involved while updating and checking modification count and this check is done without synchronization, there is a chance that Iteration thread still sees a stale value and might not be able to detect any change done by parallel threads. Iterators returned by most of JDK1.4 collection are fail-fast including Vector, ArrayList, HashSet etc
Fail-Safe Iterator in java
Contrary to fail-fast Iterator, fail-safe iterator doesn't throw any Exception if Collection is modified structurally
while one thread is Iterating over it because they work on clone of Collection instead of original collection and that’s why they are called as fail-safe iterator. Iterator of CopyOnWriteArrayList is an example of fail-safe Iterator also iterator written by ConcurrentHashMap keySet is also fail-safe iterator and never throw ConcurrentModificationException in Java.

Difference between CopyOnWriteArrayList and synchronizedList

As per my understanding concurrent collection classes preferred over synchronized collections because the concurrent collection classes don't take a lock on the complete collection object. Instead they take locks on a small segment of the collection object.
But when I checked the add method of CopyOnWriteArrayList, we are acquiring a lock on complete collection object. Then how come CopyOnWriteArrayList is better than a list returned by Collections.synchronizedList? The only difference I see in the add method of CopyOnWriteArrayList is that we are creating copy of that array each time the add method is called.
public boolean add(E e) {
final ReentrantLock lock = this.lock;
lock.lock();
try {
Object[] elements = getArray();
int len = elements.length;
Object[] newElements = Arrays.copyOf(elements, len + 1);
newElements[len] = e;
setArray(newElements);
return true;
} finally {
lock.unlock();
}
}

As per my understanding concurrent collection classes preferred over synchronized collection because concurrent collection classes don't take lock on complete collection object. Instead it takes lock on small segment of collection object.
This is true for some collections but not all. A map returned by Collections.synchronizedMap locks the entire map around every operation, whereas ConcurrentHashMap locks only one hash bucket for some operations, or it might use a non-blocking algorithm for others.
For other collections, the algorithms in use, and thus the tradeoffs, are different. This is particularly true of lists returned by Collections.synchronizedList compared to CopyOnWriteArrayList. As you noted, both synchronizedList and CopyOnWriteArrayList take a lock on the entire array during write operations. So why are the different?
The difference emerges if you look at other operations, such as iterating over every element of the collection. The documentation for Collections.synchronizedList says,
It is imperative that the user manually synchronize on the returned list when iterating over it:
List list = Collections.synchronizedList(new ArrayList());
...
synchronized (list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
Failure to follow this advice may result in non-deterministic behavior.
In other words, iterating over a synchronizedList is not thread-safe unless you do locking manually. Note that when using this technique, all operations by other threads on this list, including iterations, gets, sets, adds, and removals, are blocked. Only one thread at a time can do anything with this collection.
By contrast, the doc for CopyOnWriteArrayList says,
The "snapshot" style iterator method uses a reference to the state of the array at the point that the iterator was created. This array never changes during the lifetime of the iterator, so interference is impossible and the iterator is guaranteed not to throw ConcurrentModificationException. The iterator will not reflect additions, removals, or changes to the list since the iterator was created.
Operations by other threads on this list can proceed concurrently, but the iteration isn't affected by changes made by any other threads. So, even though write operations lock the entire list, CopyOnWriteArrayList still can provide higher throughput than an ordinary synchronizedList. (Provided that there is a high proportion of reads and traversals to writes.)

For write (add) operation, CopyOnWriteArrayList uses ReentrantLock and creates a backup copy of the data and the underlying volatile array reference is only updated via setArray(Any read operation on the list during before setArray will return the old data before add).Moreover, CopyOnWriteArrayList provides snapshot fail-safe iterator and doesn't throw ConcurrentModifficationException on write/ add.
But when I checked add method of CopyOnWriteArrayList.class, we are acquiring lock on complete collection object. Then how come CopyOnWriteArrayList is better than synchronizedList. The only difference I see in add method of CopyOnWriteArrayList is we are creating copy of that array each time add method get called.
No, the lock is not on the entire Collection object. As stated above it is a ReentrantLock and it is different from the intrinsic object lock.
The add method will always create a copy of the existing array and do the modification on the copy and then finally update the volatile reference of the array to point to this new array. And that's why we have the name "CopyOnWriteArrayList" - makes copy when you write into it.. This also avoids the ConcurrentModificationException

1) get and other read operation on CopyOnWriteArrayList are not synchronized.
2) CopyOnWriteArrayList's iterator never throws ConcurrentModificationException while Collections.synchronizedList's iterator may throw it.

ConcurrentModificationException when deleting an element from ArrayList

Java is throwing ConcurrentModificationException when I am running the following code. Any idea why is that?
ArrayList<String> list1 = new ArrayList<String>();
list1.add("Hello");
list1.add("World");
list1.add("Good Evening");
for (String s : list1){
list1.remove(2);
System.out.println(s);
}

If you take a look at documentation of ConcurrentModificationException you will find that
This exception may be thrown by methods that have detected concurrent
modification of an object when such modification is not permissible.
For example, it is not generally permissible for one thread to modify
a Collection while another thread is iterating over it
...
Note that this exception does not always indicate that an object has
been concurrently modified by a different thread. If a single thread
issues a sequence of method invocations that violates the contract of
an object, the object may throw this exception. For example, if a
thread modifies a collection directly while it is iterating over the
collection with a fail-fast iterator, the iterator will throw this
exception.
Important thing about this exception is that we can't guarantee it will always be thrown as stated in documentation
Note that fail-fast behavior cannot be guaranteed as it is, generally
speaking, impossible to make any hard guarantees in the presence of
unsynchronized concurrent modification. Fail-fast operations throw
ConcurrentModificationException on a best-effort basis.
Also from ArrayList documentation
The iterators returned by this class's iterator and listIterator
methods are fail-fast: if the list is structurally modified at any
time after the iterator is created, in any way except through the
iterator's own remove or add methods, the iterator will throw a
ConcurrentModificationException.
(emphasis mine)
So you can't manipulate content of Collection (in your case List) while iterating over it via enhanced for loop because you are not doing it via iterator for-each is using internally.
To solve it just get your own Iterator and use it in your loop. To remove elements from collection use remove like in this example
Iterator<String> it = list1.iterator();
int i=0;
while(it.hasNext()){
String s = it.next();
i++;
if (i==2){
it.remove();
System.out.println("removed: "+ s);
}
}

form the doc you can read
This exception may be thrown by methods that have detected concurrent
modification of an object when such modification is not permissible.
For example, it is not generally permissible for one thread to modify
a Collection while another thread is iterating over it. In general,
the results of the iteration are undefined under these circumstances.
Some Iterator implementations (including those of all the general
purpose collection implementations provided by the JRE) may choose to
throw this exception if this behavior is detected. Iterators that do
this are known as fail-fast iterators, as they fail quickly and
cleanly, rather that risking arbitrary, non-deterministic behavior at
an undetermined time in the future.
further more, suppose that you can remove item 2 at each iteration. You will end up in a index out of bound exception:
at first iteration you remove item #2 ("Good evening") and list size become 1 (item 0 "hello" and item 1 "World")
at next iteration you remove item #2 which actually does not exist in your list. Your list is indeed of size two, but counter starts from 0, thus you end up removing something which is not there: this is an example of the non-deterministic behavior.

You cann't iterating over an list after the underlying list is modified.if you do that its give you ConcurrentModificationException
to resolve this issue use java.util.ListIterator for iteration of list.
ListIterator<String> it = list1.listIterator();
for (String s : list1) {
if (it.hasNext()) {
String item = it.next();
System.out.println(item);
}
}

You can't delete an item with ArrayList's remove() method while iterating over it. Use Iterator if you want to delete the item also while iterating.
But you are deleting an item based on index then simply moving below line outside the loop will solve your problem.
list1.remove(2);

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.