How can I iterate over ConcurrentSkipListMap while preserving weak consistency - java

Assume I have multiple threads adding entries to and removing entries from a ConcurrentSkipListMap.
I have another thread that periodically runs over the collection and updates its data using an iterator. How can this be done safely, given the concurrent access?
How should I iterate?
Does the iterator support weak consistency?

Read the Javadoc:
Iterators are weakly consistent, returning elements reflecting the state of the map at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations.
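A minimal sketch of the scenario in the question, assuming the periodic update simply doubles each value (the class name SkipListSweep and the doubling rule are made up for illustration). Entries returned by ConcurrentSkipListMap views are snapshots and do not support setValue, so the sketch writes back through the map's atomic replace(key, oldValue, newValue), which quietly does nothing if another thread changed or removed the entry in the meantime.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

class SkipListSweep {
    // Hypothetical periodic task: doubles every value the iterator encounters.
    static int sweep(ConcurrentSkipListMap<Integer, Integer> map) {
        int visited = 0;
        for (Map.Entry<Integer, Integer> e : map.entrySet()) {
            // Atomic compare-and-replace; returns false (and changes nothing)
            // if the entry was modified or removed by another thread.
            map.replace(e.getKey(), e.getValue(), e.getValue() * 2);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentSkipListMap<Integer, Integer> map = new ConcurrentSkipListMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(i, i);
        }

        // Writer thread adds and removes entries while the sweep is running.
        Thread writer = new Thread(() -> {
            for (int i = 1000; i < 2000; i++) {
                map.put(i, i);
                map.remove(i - 1000);
            }
        });
        writer.start();

        int visited = sweep(map); // does not throw ConcurrentModificationException
        writer.join();

        // The visited count varies from run to run: that is weak consistency.
        System.out.println("visited " + visited + ", final size " + map.size());
    }
}
```

The number of visited entries is not predictable here; what the iterator promises is only that it will not fail and will not visit an entry twice.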

Related

why does `weakly consistent iterator` only reflect modification and removal changes, but does not reflect insertion changes

I have been reading the book Java Generics and Collections, and in a section discussing iterators the authors mention:
Collections which rely on CAS (compare and swap) have weakly consistent iterators, which
reflect some but not necessarily all of the changes that have been
made to their backing collection since they were created. For example,
if elements in the collection have been modified or removed before the
iterator reaches them, it definitely will reflect these changes, but
no such guarantee is made for insertions. Weakly consistent iterators
also do not throw ConcurrentModificationException.
I was wondering why a weakly consistent iterator reflects only modifications and removals, but not insertions. What is the reason for defining the behaviour like this? What use case does it serve?
I don't think it's correct. The official definition of weakly consistent iteration is:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect
any modifications subsequent to construction.
There's no distinction made between removal and insertion.

Weakly consistent iterator by ConcurrentHashMap

Java Concurrency in Practice mentions that:
The iterators returned by ConcurrentHashMap are weakly consistent
rather than fail-fast. A weakly consistent iterator can tolerate
concurrent modification, traverses elements as they existed when the
iterator was constructed, and may (but is not guaranteed to) reflect modifications to the collection after the construction of the iterator.
How does making the iterator weakly consistent (or fail-safe) help in a concurrent environment, given that the state of the ConcurrentHashMap will still be modified? The only difference is that it will not throw ConcurrentModificationException.
Why do the ordinary collections return fail-fast iterators when a fail-safe iterator is better for concurrency?
Correctness in your particular case
Please keep in mind that Fail Fast iterator iterates over the original collection.
In contrast Fail Safe (a.k.a weakly consistent) iterator iterates over a copy of the original collection. Therefore any changes to the original collection go unnoticed, and that's how it guarantees lack of ConcurrentModificationExceptions.
To answer your questions:
Using Fail Safe iterator helps concurrency as you don't have to block on the reading threads on the whole collection. Collection can be modified underneath while the reading happens. The drawback is that the reading thread will see the state of the collection as a snapshot taken at the time when the iterator got created.
If the above limitation is not good for your particular use case (your readers should always see the same state of the collection) you have to use Fail Fast iterator and keep the concurrent access to the collection controlled tighter.
As you can see it's a trade-off between correctness of your use case and speed.
ConcurrentHashMap
ConcurrentHashMap (CHM) exploits multiple tricks in order to increase concurrency of access.
Firstly, CHM is actually a grouping of multiple maps; each map entry is stored in one of a number of segments, each of which is itself a hash table that can be read concurrently (read methods do not block).
The number of segments is the last argument of the three-argument constructor and is called concurrencyLevel (default 16). The number of segments determines the number of concurrent writers across the whole of the data. An additional internal hashing algorithm ensures that entries are spread evenly between the segments.
Each entry's value is volatile, which ensures fine-grained consistency for contended modifications and subsequent reads; each read reflects the most recently completed update.
Iterators and Enumerations are Fail Safe - reflecting the state at some point since the creation of iterator/enumeration; this allows for simultaneous reads and modifications at the cost of reduced consistency.
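For reference, the three-argument constructor mentioned above looks like this in use. This is only a sketch: the class name ChmSizing and the capacity of 64 are arbitrary choices. Note also that the segment design described in this answer is the pre-Java 8 implementation; the current Javadoc describes concurrencyLevel as an estimate that the implementation may use as a sizing hint.

```java
import java.util.concurrent.ConcurrentHashMap;

class ChmSizing {
    static ConcurrentHashMap<String, Integer> create() {
        // Arguments: initialCapacity, loadFactor, concurrencyLevel.
        // The values here are illustrative, not recommendations.
        return new ConcurrentHashMap<>(64, 0.75f, 16);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = create();
        map.put("a", 1);
        System.out.println(map.get("a"));
    }
}
```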
TL;DR: Because locking.
If you want a consistent iterator, then you have to lock all modifications to the Map - this is a massive penalty in a concurrent environment.
You can of course do this manually if that is what you want, but iterating over a Map is not its purpose so the default behaviour allows for concurrent writes while iterating.
The same argument does not apply for normal collections, which are only (allowed to be) accessed by a single thread. Iteration over an ArrayList is expected to be consistent, so the fail fast iterators enforce consistency.
First of all, the iterators of concurrent collections are not fail-safe because they do not have failure modes which they could somehow handle with some kind of emergency procedure. They simply do not fail.
The iterators of the non-concurrent collections are fail-fast because, for performance reasons, they are designed in a way that does not allow the internal structure of the collection they iterate over to be modified. E.g. a HashMap's iterator would not know how to continue iterating after the reshuffling that happens when a HashMap gets resized.
That means they would not just fail because other threads access them, they would also fail if the current thread performs a modification that invalidates the assumptions of the iterator.
Instead of ignoring those troublesome modifications and returning unpredictable and corrupted results those collections instead try to track modifications and throw an exception during iteration to inform the programmer that something is wrong. This is called fail-fast.
Those fail-fast mechanisms are not thread-safe either, which means that if the illegal modifications come not from the current thread but from a different thread, they are no longer guaranteed to be detected. In that case it can only be thought of as a best-effort failure detection mechanism.
On the other hand concurrent collections must be designed in a manner that can deal with multiple writes and reads at the same time and the underlying structure changing constantly.
So iterators can't always assume that the underlying structure is never modified during iteration.
Instead they're designed to provide weaker guarantees, such as either iterating over outdated data or maybe also showing some but not all updates that happened after the creation of the iterator. Which also means that they might return outdated data when they are modified during iteration within a single thread, which might be somewhat counter-intuitive for a programmer as one would usually expect immediate visibility of modifications within a single thread.
Examples:
HashMap: best-effort fail-fast iterator.
iterator supports removal
structural modification from same thread, such as clear()ing the Map during iteration: guaranteed to throw a ConcurrentModificationException on the next iterator step
structural modification from different thread during iteration: iterator usually throws an exception, but might also cause inconsistent, unpredictable behavior
CopyOnWriteArrayList: snapshot iterator
iterator does not support removal
iterator shows a view on the items frozen at the time it was created
collection can be modified by any thread including the current one during iteration without causing an exception, but it has no effect on the items visited by the iterator
clear()ing the list will not stop iteration
iterator never throws CME
ConcurrentSkipListMap: weakly consistent iterator
iterator supports removal, but may cause surprising behavior since it's solely based on Map keys, not the current value
iterator may see updates that happened since its creation but is not guaranteed to. That means, for example, that clear()ing the Map may or may not stop iteration, and removing entries may or may not stop them from showing up during the remaining iteration
iterator never throws CME
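The three behaviours listed above can be reproduced in a single thread. The following is a rough sketch; the class and method names are invented for the demonstration. Only the first two results are deterministic. For the ConcurrentSkipListMap case the count depends on whether the iterator happens to observe the removal, so no particular value is claimed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CopyOnWriteArrayList;

class IteratorKinds {
    // HashMap: a structural modification from the iterating thread is
    // detected on the next iterator step.
    static boolean hashMapFailsFast() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                map.put("new-" + key, 0); // adds an entry during iteration
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // CopyOnWriteArrayList: the iterator walks a frozen snapshot, so
    // clear()ing the list does not stop the iteration.
    static List<String> snapshotSeen() {
        List<String> list = new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));
        List<String> seen = new ArrayList<>();
        for (String s : list) {
            list.clear();
            seen.add(s);
        }
        return seen;
    }

    // ConcurrentSkipListMap: no exception; the removed key may or may not
    // still be visited.
    static int weaklyConsistentVisits() {
        ConcurrentSkipListMap<Integer, String> map = new ConcurrentSkipListMap<>();
        for (int i = 0; i < 5; i++) {
            map.put(i, "v" + i);
        }
        int visited = 0;
        for (Integer key : map.keySet()) {
            map.remove(4); // remove an entry the iterator has not reached yet
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("HashMap threw CME: " + hashMapFailsFast());
        System.out.println("CopyOnWriteArrayList saw: " + snapshotSeen());
        System.out.println("ConcurrentSkipListMap visited: " + weaklyConsistentVisits());
    }
}
```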

Ways to avoid Iterator ConcurrentModificationException

As far as I know, there are two ways to avoid ConcurrentModificationException while one thread iterates over the collection and another thread modifies it:
client-side locking: lock the collection during the iteration. Other threads that need to access the collection will block until the iteration is complete.
thread confinement: clone the collection and iterate over the copy.
I am wondering, are there any other alternatives?
The first way is obviously undesirable and poor performance-wise: if the collection is large, other threads could wait for a long time. As for the second way, I am not sure: since we clone the collection and iterate over the copy, if other threads come in and modify the original, the copy becomes stale, right? Does that mean we need to start over by cloning and iterating again once it has been modified?
I am wondering, are there any other alternatives?
Use one of the concurrent collections, which do not throw this exception. Instead they provide weak consistency, i.e. an added or deleted element may or may not appear while iterating.
http://docs.oracle.com/javase/tutorial/essential/concurrency/collections.html
The java.util.concurrent package includes a number of additions to the Java Collections Framework. These are most easily categorized by the collection interfaces provided:
BlockingQueue defines a first-in-first-out data structure that blocks or times out when you attempt to add to a full queue, or retrieve from an empty queue.
ConcurrentMap is a subinterface of java.util.Map that defines useful atomic operations. These operations remove or replace a key-value pair only if the key is present, or add a key-value pair only if the key is absent. Making these operations atomic helps avoid synchronization. The standard general-purpose implementation of ConcurrentMap is ConcurrentHashMap, which is a concurrent analog of HashMap.
ConcurrentNavigableMap is a subinterface of ConcurrentMap that supports approximate matches. The standard general-purpose implementation of ConcurrentNavigableMap is ConcurrentSkipListMap, which is a concurrent analog of TreeMap.
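To make the "useful atomic operations" of ConcurrentMap concrete, here is a small sketch using putIfAbsent, the three-argument replace and the two-argument remove. The class name AtomicMapOps and the "hits" key are made up for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AtomicMapOps {
    static String demo() {
        ConcurrentMap<String, Integer> m = new ConcurrentHashMap<>();
        Integer first = m.putIfAbsent("hits", 1);   // key absent: stores 1, returns null
        Integer second = m.putIfAbsent("hits", 99); // key present: no change, returns 1
        boolean replaced = m.replace("hits", 1, 2); // succeeds only if current value is 1
        boolean removed = m.remove("hits", 1);      // fails: current value is now 2
        return first + " " + second + " " + replaced + " " + removed + " " + m;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "null 1 true false {hits=2}"
    }
}
```

Each call is a single atomic check-then-act step, which is what removes the need for external synchronization around it.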
You could use classes from java.util.concurrent, such as CopyOnWriteArrayList.

How does ConcurrentHashMap work internally?

I was reading the official Oracle documentation about Concurrency in Java and I was wondering what could be the difference between a Collection returned by
public static <T> Collection<T> synchronizedCollection(Collection<T> c);
and using for example a
ConcurrentHashMap. I'm assuming that I use synchronizedCollection(Collection<T> c) on a HashMap. I know that in general a synchronized collection is essentially just a decorator for my HashMap so it is obvious that a ConcurrentHashMap has something different in its internals. Do you have some information about those implementation details?
Edit: I realized that the source code is publicly available:
ConcurrentHashMap.java
I would read the source of ConcurrentHashMap as it is rather complicated in the detail. In short it has
Multiple partitions which can be locked independently. (16 by default)
Uses java.util.concurrent locks for thread safety instead of synchronized.
Has thread safe Iterators. synchronizedCollection's iterators are not thread safe.
Does not expose the internal locks. synchronizedCollection does.
ConcurrentHashMap is very similar to the java.util.Hashtable class, except that ConcurrentHashMap offers better concurrency than Hashtable or synchronizedMap does. ConcurrentHashMap does not lock the Map while you are reading from it. Additionally, ConcurrentHashMap does not lock the entire Map when writing to it. Internally, it only locks the part of the Map that is being written to.
Another difference is that ConcurrentHashMap does not throw ConcurrentModificationException if the ConcurrentHashMap is changed while being iterated. The Iterator is not designed to be used by more than one thread, though, whereas synchronizedMap may throw ConcurrentModificationException.
This is the article that helped me understand it: Why ConcurrentHashMap is better than Hashtable and just as good as a HashMap
Hashtable’s offer concurrent access to their entries, with a small caveat, the entire map is locked to perform any sort of operation.
While this overhead is ignorable in a web application under normal
load, under heavy load it can lead to delayed response times and
overtaxing of your server for no good reason.
This is where ConcurrentHashMap’s step in. They offer all the features
of Hashtable with a performance almost as good as a HashMap.
ConcurrentHashMap’s accomplish this by a very simple mechanism.
Instead of a map wide lock, the collection maintains a list of 16
locks by default, each of which is used to guard (or lock on) a single
bucket of the map. This effectively means that 16 threads can modify
the collection at a single time (as long as they’re all working on
different buckets). In fact there is no operation performed by this
collection that locks the entire map. The concurrency level of the
collection, the number of threads that can modify it at the same time
without blocking, can be increased. However a higher number means more
overhead of maintaining this list of locks.
The "scalability issues" for Hashtable are present in exactly the same way in Collections.synchronizedMap(Map) - they use very simple synchronization, which means that only one thread can access the map at the same time.
This is not much of an issue when you have simple inserts and lookups (unless you do it extremely intensively), but becomes a big problem when you need to iterate over the entire Map, which can take a long time for a large Map - while one thread does that, all others have to wait if they want to insert or lookup anything.
The ConcurrentHashMap uses very sophisticated techniques to reduce the need for synchronization and allow parallel read access by multiple threads without synchronization and, more importantly, provides an Iterator that requires no synchronization and even allows the Map to be modified during iteration (though it makes no guarantees whether or not elements that were inserted during iteration will be returned).
synchronizedCollection() returns an object whose methods are all synchronized on this, so all concurrent operations on such a wrapper are serialized. ConcurrentHashMap is a truly concurrent container with fine-grained locking optimized to keep contention as low as possible. Have a look at the source code and you will see what is inside.
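The practical difference shows up when iterating. As a sketch (class and method names are invented): the Javadoc for Collections.synchronizedMap requires the caller to synchronize on the wrapper manually while traversing any of its views, whereas a ConcurrentHashMap view can be traversed without any lock.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SyncVsConcurrent {
    // The wrapper's individual methods are synchronized, but iteration is not:
    // the caller must hold the wrapper's lock for the whole traversal.
    static int sumSynchronized(Map<String, Integer> syncMap) {
        int sum = 0;
        synchronized (syncMap) {
            for (int value : syncMap.values()) {
                sum += value;
            }
        }
        return sum;
    }

    // No lock needed; the iterator is weakly consistent, so the sum may or may
    // not include values written by other threads during the traversal.
    static int sumConcurrent(ConcurrentHashMap<String, Integer> map) {
        int sum = 0;
        for (int value : map.values()) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> syncMap =
                Collections.synchronizedMap(new HashMap<String, Integer>());
        syncMap.put("a", 1);
        syncMap.put("b", 2);
        ConcurrentHashMap<String, Integer> chm = new ConcurrentHashMap<>(syncMap);
        System.out.println(sumSynchronized(syncMap) + " " + sumConcurrent(chm));
    }
}
```

While sumSynchronized holds the lock, every other thread touching the wrapper blocks, which is the scalability problem described in the answers above.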
ConcurrentHashMap implements ConcurrentMap which provides the concurrency.
Deep internally its iterators are designed to be used by only one thread at a time which maintains the synchronization.
This map is used widely in concurrency.

How to get a fixed state iterator for a set/map without cloning overheads

I'm looking to avoid a ConcurrentModificationException where the functionality is to iterate over an expanding set (there are no removes), and the add operations are being done by different threads.
I considered cloning the collection before iterating, but this solution doesn't scale very well as the set becomes large. Synchronizing doesn't work because the collection is being used in tonnes of places and the code is pretty old. Short of a massive refactoring, the only bet is to change the set implementation.
Wondering if there's a Java implementation where the iterator returns a snapshot state of the collection (which is okay for my functionality) but avoid the cost of cloning too much. I checked out CopyOnWriteArrayList but it doesn't fit the bill mainly because of being a list.
The java.util.concurrent package has everything you need.
The classes there are like the java.util collections, but are highly optimized to cater for concurrent access, interestingly addressing specifically your comment:
the iterator returns a snapshot state of the collection
Don't reinvent the wheel :)
Wondering if there's a Java implementation where the iterator returns a snapshot state of the collection
Yes, there is. Unlike the synchronized collections made available via the Collections.synchronizedxxx() methods, the Concurrentxxx classes in java.util.concurrent package would allow for this scenario. The concurrent collection classes allow for multiple threads to access the collection at the same point in time, without the need to synchronize on a lock.
Depending on the exact nature of your problem, ConcurrentHashMaps can be used. The relevant section of the documentation of the class, that applies to your problem is:
Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
Note the last sentence carefully.
Also, remember that the iterators do not return consistent snapshots of the collection. The iterators returned by most methods have the following property:
The view's iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.
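Since the question asks for a Set rather than a Map, one option (Java 8 and later) is the set view created by ConcurrentHashMap.newKeySet(). The sketch below is an illustration only: the class name GrowingSet and the element counts are arbitrary. One thread keeps adding while another iterates; the iterator is weakly consistent rather than a true snapshot, so it sees the elements present at its creation and possibly some of the later ones.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class GrowingSet {
    static int iterateWhileAdding() {
        // A concurrent Set backed by a ConcurrentHashMap.
        Set<Integer> set = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < 100; i++) {
            set.add(i);
        }

        // Another thread keeps expanding the set (adds only, no removes).
        Thread adder = new Thread(() -> {
            for (int i = 100; i < 10_000; i++) {
                set.add(i);
            }
        });
        adder.start();

        int seen = 0;
        for (Integer ignored : set) { // no cloning, no locking, no CME
            seen++;
        }

        try {
            adder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        // Somewhere between 100 and 10000, depending on timing.
        System.out.println("elements seen: " + iterateWhileAdding());
    }
}
```

If an exact point-in-time snapshot is required, this is not enough; a copy (or CopyOnWriteArraySet for small, rarely modified sets) would still be needed.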
Related questions
Is iterating ConcurrentHashMap values thread safe?
Java ConcurrentHashMap not thread safe.. wth?
