As far as I know, there are two ways to avoid a ConcurrentModificationException while one thread iterates over a collection and another thread modifies it.
Client-side locking: lock the collection for the duration of the iteration. Other threads that need to access the collection block until the iteration is complete.
Thread confinement: clone the collection and iterate over the copy.
I am wondering whether there are any other alternatives.
I ask because the first approach is obviously undesirable performance-wise: if the collection is large, other threads could wait for a long time. As for the second, I am not sure about it: since we clone the collection and iterate over the copy, the copy becomes stale if other threads modify the original, right? Does that mean we need to start over, cloning and iterating again, whenever the original is modified?
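For reference, the two approaches described above can be sketched roughly as follows. This is only an illustration: the class and method names are made up, and it assumes the shared list is a Collections.synchronizedList wrapper, whose documented lock is the wrapper object itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class IterationStrategies {

    // Approach 1: client-side locking. The lock is held for the whole traversal,
    // so writers going through the same wrapper block until the loop finishes.
    static int sumWithLock(List<Integer> shared) {
        int sum = 0;
        synchronized (shared) {
            for (int value : shared) {
                sum += value;
            }
        }
        return sum;
    }

    // Approach 2: copy while holding the lock briefly, then iterate the private copy.
    // The copy may already be stale when the loop runs; that is the trade-off.
    static int sumWithSnapshot(List<Integer> shared) {
        List<Integer> snapshot;
        synchronized (shared) {
            snapshot = new ArrayList<>(shared);
        }
        int sum = 0;
        for (int value : snapshot) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        Collections.addAll(shared, 1, 2, 3);
        System.out.println(sumWithLock(shared));
        System.out.println(sumWithSnapshot(shared));
    }
}
```

Whether a stale copy is acceptable depends on what the iteration is for; for reporting or monitoring it usually is, for anything that must act on the latest state it is not.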
I am wondering whether there are any other alternatives.
Use one of the concurrent collections, which don't throw this exception. Instead, they provide weak consistency, i.e. an added or deleted element may or may not appear while iterating.
http://docs.oracle.com/javase/tutorial/essential/concurrency/collections.html
The java.util.concurrent package includes a number of additions to the Java Collections Framework. These are most easily categorized by the collection interfaces provided:
BlockingQueue defines a first-in-first-out data structure that blocks or times out when you attempt to add to a full queue, or retrieve from an empty queue.
ConcurrentMap is a subinterface of java.util.Map that defines useful atomic operations. These operations remove or replace a key-value pair only if the key is present, or add a key-value pair only if the key is absent. Making these operations atomic helps avoid synchronization. The standard general-purpose implementation of ConcurrentMap is ConcurrentHashMap, which is a concurrent analog of HashMap.
ConcurrentNavigableMap is a subinterface of ConcurrentMap that supports approximate matches. The standard general-purpose implementation of ConcurrentNavigableMap is ConcurrentSkipListMap, which is a concurrent analog of TreeMap.
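To make the quoted descriptions a bit more concrete, here is a small sketch of the atomic ConcurrentMap operations and of an "approximate match" on a ConcurrentNavigableMap. The registry and the grade thresholds are invented for illustration; only the JDK calls (putIfAbsent, remove(key, value), floorEntry) come from the library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class ConcurrentMapBasics {

    // Hypothetical registry: resource name -> current owner.
    static final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    // Adds the mapping only if the key is absent; the check and the insert are one atomic step.
    static boolean claim(String resource, String who) {
        return owners.putIfAbsent(resource, who) == null;
    }

    // Removes the mapping only if it is still mapped to the expected owner.
    static boolean release(String resource, String who) {
        return owners.remove(resource, who);
    }

    // Hypothetical score thresholds, kept sorted by key.
    static final ConcurrentNavigableMap<Integer, String> grades = new ConcurrentSkipListMap<>();
    static {
        grades.put(60, "C");
        grades.put(70, "B");
        grades.put(80, "A");
    }

    // Approximate match: the entry with the greatest key less than or equal to the score.
    static String gradeFor(int score) {
        Map.Entry<Integer, String> entry = grades.floorEntry(score);
        return entry == null ? "F" : entry.getValue();
    }

    public static void main(String[] args) {
        System.out.println(claim("printer", "alice"));
        System.out.println(claim("printer", "bob"));
        System.out.println(release("printer", "bob"));
        System.out.println(release("printer", "alice"));
        System.out.println(gradeFor(75));
    }
}
```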
You could use classes from java.util.concurrent, such as CopyOnWriteArrayList.
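As a rough illustration of what CopyOnWriteArrayList gives you: its iterator works on the array as it was when the iterator was created, so the loop below neither throws nor sees the elements added during the traversal. The class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CopyOnWriteIteration {

    // Returns how many elements the loop visited while the list grew underneath it.
    static int visitWhileAdding(List<Integer> list) {
        int visited = 0;
        for (int value : list) {
            list.add(value + 10); // every write copies the backing array
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        List<Integer> list = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(visitWhileAdding(list) + " visited, size now " + list.size());
    }
}
```

Because every write copies the whole array, this tends to pay off only when reads and iterations heavily outnumber writes.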
Related
I'm researching iterator invalidation rules in Java, but I couldn't find proper information like this one for C++. Everything I found for Java is more generic, like this one. Is there documentation that I could follow?
The Java "Collections Framework Overview" documentation says:
The general-purpose implementations support all of the optional operations in the collection interfaces and have no restrictions on the elements they may contain. They are unsynchronized, but the Collections class contains static factories called synchronization wrappers that can be used to add synchronization to many unsynchronized collections. All of the new implementations have fail-fast iterators, which detect invalid concurrent modification, and fail quickly and cleanly (rather than behaving erratically).
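A minimal sketch of the fail-fast behaviour described in that quote. Note that no second thread is needed: modifying an ArrayList from inside its own for-each loop is enough to trigger the check. The class name is invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class FailFastDemo {

    // Returns true if the iterator detected the modification and failed fast.
    static boolean throwsOnModification() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            for (Integer value : list) {
                if (value == 1) {
                    list.add(4); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fail-fast triggered: " + throwsOnModification());
    }
}
```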
Java also has thread-safe concurrent collection implementations. They are part of the java.util.concurrent package, whose documentation says:
Most concurrent Collection implementations (including most Queues) also differ from the usual java.util conventions in that their Iterators and Spliterators provide weakly consistent rather than fast-fail traversal:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.
For example, for ConcurrentHashMap:
Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time. Bear in mind that the results of aggregate status methods including size, isEmpty, and containsValue are typically useful only when a map is not undergoing concurrent updates in other threads. Otherwise the results of these methods reflect transient states that may be adequate for monitoring or estimation purposes, but not for program control.
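To see what "weakly consistent" means in practice, here is a small sketch that modifies a ConcurrentHashMap in the middle of a traversal. The names are illustrative. The loop is guaranteed not to throw, but how many keys it visits is deliberately left open: "b" and "c" are always visited, while the removed "a" and the added "d" may or may not be.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentIteration {

    // Adds one key and removes another during the first step of the traversal,
    // then returns how many keys the loop ended up visiting.
    static int iterateWhileModifying(Map<String, Integer> map) {
        int visited = 0;
        boolean modified = false;
        for (String key : map.keySet()) {
            if (!modified) {
                map.put("d", 4);
                map.remove("a");
                modified = true;
            }
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        System.out.println("visited " + iterateWhileModifying(map) + " keys, final map " + map);
    }
}
```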
So the short answer is: if you want to iterate over a collection while another thread may change it, use one of the concurrent collection implementations. Their iterators are never invalidated in the "C++ meaning".
Or use the thread-unsafe collections and catch ConcurrentModificationException to deal with the concurrent modification. In this case too, the Java iterator is never invalidated in the "C++ meaning".
I want to use a Comparator-based key-value Map. It will see many reads and a rare write operation (once every 3 months, through a scheduler). The initial load of the collection will be done at application startup.
Also note that the write will:
Add a single entry to the Map.
Not modify any existing entry in the Map.
Would ConcurrentSkipListMap be a good candidate for this? Does its get operation allow access by multiple threads simultaneously? I'm looking for concurrent, non-blocking reads but atomic writes.
ConcurrentHashMap is exactly what you're looking for. From the Javadoc:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.)
That sounds like it satisfies your requirement for "concurrent non blocking read but atomic write".
Since you're doing so few writes, you may want to specify a high loadFactor and an appropriate initialCapacity when creating the ConcurrentHashMap, which will prevent table resizing as you're populating the map, though this is a modest benefit at best. (You could also set a concurrencyLevel of 1, though Java 8's Javadoc seems to imply that it is now used only as a sizing hint.)
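A sketch of what that pre-sizing could look like, using the three-argument constructor (initialCapacity, loadFactor, concurrencyLevel). The entry count, the 0.9 load factor and the key format are arbitrary values chosen for the example, not recommendations.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PresizedMap {

    // Loads a hypothetical startup data set into a map sized for it up front.
    static Map<String, Integer> load(int expectedEntries) {
        // Arguments: initialCapacity, loadFactor, concurrencyLevel.
        // Assumption: one writer (the startup loader), hence concurrencyLevel 1.
        Map<String, Integer> map = new ConcurrentHashMap<>(expectedEntries, 0.9f, 1);
        for (int i = 0; i < expectedEntries; i++) {
            map.put("key-" + i, i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = load(1000);
        System.out.println(map.size() + " entries, key-42 -> " + map.get("key-42"));
    }
}
```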
If you absolutely must have a SortedMap or NavigableMap, then ConcurrentSkipListMap is the out-of-the-box way to go. But I would double-check that you actually need the functionality provided by those interfaces (getting the first/last key, submaps, finding nearby entries, etc.) before using them. You will pay a steep price (log n vs. constant time for most operations).
Since you are looking for concurrent operations, you have basically 3 competitors:
Hashtable, ConcurrentHashMap and ConcurrentSkipListMap (or Collections.synchronizedMap(), but that's not efficient).
Of these three, the latter two are more suitable for concurrent operation: ConcurrentHashMap locks only a portion of the map and ConcurrentSkipListMap is lock-free, rather than locking the entire map like Hashtable.
Of those two, ConcurrentSkipListMap uses a skip list data structure, which gives expected average O(log n) cost for search and a variety of other operations.
It also offers a number of operations that ConcurrentHashMap doesn't, e.g. ceilingEntry/Key(), floorEntry/Key(), etc. It also maintains a sort order, which would otherwise have to be calculated.
Thus, if you had asked only for faster search, I'd have suggested ConcurrentHashMap, but since you also mentioned 'rare write operations' and a desired sort order, I think ConcurrentSkipListMap wins the race.
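For the Comparator-based case from the question, a sketch could look like this. The case-insensitive comparator and the fruit keys are stand-ins for whatever ordering and data the real application uses.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class SortedConcurrentLookup {

    // Builds a sorted, thread-safe map ordered by a caller-chosen Comparator.
    static ConcurrentNavigableMap<String, Integer> build() {
        ConcurrentNavigableMap<String, Integer> map =
                new ConcurrentSkipListMap<>(String.CASE_INSENSITIVE_ORDER);
        map.put("banana", 2);
        map.put("Apple", 1);
        map.put("cherry", 3);
        return map;
    }

    public static void main(String[] args) {
        ConcurrentNavigableMap<String, Integer> map = build();
        map.put("date", 4); // the rare write: adds one entry, touches no existing one
        System.out.println(map.firstKey());
        System.out.println(map.get("APPLE"));
        System.out.println(map.ceilingKey("coconut"));
    }
}
```

Reads such as get and ceilingKey do not block each other or the occasional put, which matches the "non-blocking read, atomic write" requirement.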
If you are willing to try third party code, you could consider a copy-on-write version of maps, which are ideal for infrequent writes. Here's one that came up via Googling:
https://bitbucket.org/atlassian/atlassian-util-concurrent/wiki/CopyOnWrite%20Maps
Never tried it myself so caveat emptor.
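To show the general copy-on-write idea without any third-party dependency, here is a minimal sketch. It is not the Atlassian API; CopyOnWriteSortedMap is a hypothetical class written for this example, and it assumes writes are rare enough that copying the whole TreeMap on each put is acceptable.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CopyOnWriteSortedMap<K, V> {

    // Readers only ever see a fully built, unmodifiable map through this reference.
    private volatile NavigableMap<K, V> current;

    public CopyOnWriteSortedMap(Comparator<? super K> comparator) {
        current = Collections.unmodifiableNavigableMap(new TreeMap<K, V>(comparator));
    }

    // Lock-free read: one volatile load, then an ordinary TreeMap lookup.
    public V get(K key) {
        return current.get(key);
    }

    // Rare write: copy, modify the copy, publish it. Writers are serialized.
    public synchronized void put(K key, V value) {
        TreeMap<K, V> copy = new TreeMap<K, V>(current); // SortedMap constructor keeps the comparator
        copy.put(key, value);
        current = Collections.unmodifiableNavigableMap(copy);
    }

    // A stable, read-only view that later writes will not change.
    public NavigableMap<K, V> snapshot() {
        return current;
    }

    public static void main(String[] args) {
        CopyOnWriteSortedMap<String, Integer> map =
                new CopyOnWriteSortedMap<>(String.CASE_INSENSITIVE_ORDER);
        map.put("b", 2);
        map.put("A", 1);
        System.out.println(map.snapshot());
    }
}
```

Iterating over snapshot() can never throw ConcurrentModificationException, because the published map is never modified after it becomes visible.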
Assume I have multiple threads adding entries to and removing entries from a ConcurrentSkipListMap.
I have another thread that periodically runs over the collection and updates its data using an iterator. How can this be done, considering the concurrent access?
How should I iterate?
Does the iterator support weak consistency?
Read the Javadoc:
Iterators are weakly consistent, returning elements reflecting the state of the map at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations.
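A sketch of what such a periodic pass might look like. One detail worth knowing: the entries handed out by ConcurrentSkipListMap are snapshots and do not support setValue, so updates have to go through the map itself, for example with replace. The class name, the key ranges and the "add one to every value" update are assumptions made up for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListRefresh {

    // Walks the map once and bumps every value it sees; returns the number of updates.
    static int refresh(ConcurrentSkipListMap<Integer, Integer> map) {
        int updated = 0;
        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            // replace(key, old, new) succeeds only if the mapping is unchanged since we read it.
            if (map.replace(entry.getKey(), entry.getValue(), entry.getValue() + 1)) {
                updated++;
            }
        }
        return updated;
    }

    // Runs refresh() while another thread adds and removes unrelated keys.
    static ConcurrentSkipListMap<Integer, Integer> runDemo() {
        ConcurrentSkipListMap<Integer, Integer> map = new ConcurrentSkipListMap<>();
        for (int i = 0; i < 100; i++) {
            map.put(i, 0);
        }
        Thread writer = new Thread(() -> {
            for (int i = 100; i < 1100; i++) {
                map.put(i, 0);
                map.remove(i);
            }
        });
        writer.start();
        refresh(map);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(runDemo().size() + " entries after refresh");
    }
}
```

Keys that exist for the whole traversal are visited exactly once; keys added or removed by the writer during the pass may or may not be seen.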
I was reading the official Oracle documentation about Concurrency in Java and I was wondering what could be the difference between a Collection returned by
public static <T> Collection<T> synchronizedCollection(Collection<T> c);
and using for example a
ConcurrentHashMap. I'm assuming that I use synchronizedCollection(Collection<T> c) on a HashMap. I know that in general a synchronized collection is essentially just a decorator for my HashMap so it is obvious that a ConcurrentHashMap has something different in its internals. Do you have some information about those implementation details?
Edit: I realized that the source code is publicly available:
ConcurrentHashMap.java
I would read the source of ConcurrentHashMap, as it is rather complicated in the details. In short (describing the segment-based design used before Java 8), it:
Has multiple partitions which can be locked independently (16 by default).
Uses java.util.concurrent lock operations for thread safety instead of synchronized.
Has thread-safe iterators. synchronizedCollection's iterators are not thread-safe.
Does not expose its internal locks. synchronizedCollection does.
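The point about exposed locks shows up most clearly with compound actions such as "read, then write". A rough sketch, with invented class and method names: with the synchronized wrapper the caller locks on the wrapper itself, while with ConcurrentHashMap there is no lock to grab and an atomic method such as merge is used instead.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CompoundActions {

    // Synchronized wrapper: each call is atomic, but get-then-put is not,
    // so the caller holds the wrapper's own lock around both calls.
    static void incrementSynchronized(Map<String, Integer> syncMap, String key) {
        synchronized (syncMap) {
            syncMap.put(key, syncMap.getOrDefault(key, 0) + 1);
        }
    }

    // ConcurrentHashMap: no exposed lock; merge performs the update atomically.
    static void incrementConcurrent(ConcurrentHashMap<String, Integer> map, String key) {
        map.merge(key, 1, Integer::sum);
    }

    // Runs the task `repetitions` times in each of `threads` threads and waits for them.
    static void runInThreads(int threads, int repetitions, Runnable task) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < repetitions; i++) {
                    task.run();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> syncMap = Collections.synchronizedMap(new HashMap<>());
        ConcurrentHashMap<String, Integer> concurrentMap = new ConcurrentHashMap<>();
        runInThreads(4, 1000, () -> incrementSynchronized(syncMap, "hits"));
        runInThreads(4, 1000, () -> incrementConcurrent(concurrentMap, "hits"));
        System.out.println(syncMap.get("hits") + " " + concurrentMap.get("hits"));
    }
}
```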
ConcurrentHashMap is very similar to the java.util.Hashtable class, except that ConcurrentHashMap offers better concurrency than Hashtable or synchronizedMap does. ConcurrentHashMap does not lock the Map while you are reading from it. Additionally, ConcurrentHashMap does not lock the entire Map when writing to it; internally, it locks only the part of the Map that is being written to.
Another difference is that ConcurrentHashMap does not throw ConcurrentModificationException if it is changed while being iterated, although its Iterator is not designed to be used by more than one thread. An iterator over a synchronizedMap, by contrast, may throw ConcurrentModificationException.
This is the article that helped me understand it: Why ConcurrentHashMap is better than Hashtable and just as good as a HashMap
Hashtables offer concurrent access to their entries, with a small caveat: the entire map is locked to perform any sort of operation. While this overhead is ignorable in a web application under normal load, under heavy load it can lead to delayed response times and overtaxing of your server for no good reason.
This is where ConcurrentHashMaps step in. They offer all the features of Hashtable with a performance almost as good as a HashMap.
ConcurrentHashMaps accomplish this by a very simple mechanism. Instead of a map-wide lock, the collection maintains a list of 16 locks by default, each of which is used to guard (or lock on) a single bucket of the map. This effectively means that 16 threads can modify the collection at a single time (as long as they're all working on different buckets). In fact, there is no operation performed by this collection that locks the entire map. The concurrency level of the collection, the number of threads that can modify it at the same time without blocking, can be increased. However, a higher number means more overhead of maintaining this list of locks.
The "scalability issues" for Hashtable are present in exactly the same way in Collections.synchronizedMap(Map) - they use very simple synchronization, which means that only one thread can access the map at the same time.
This is not much of an issue when you have simple inserts and lookups (unless you do it extremely intensively), but becomes a big problem when you need to iterate over the entire Map, which can take a long time for a large Map - while one thread does that, all others have to wait if they want to insert or lookup anything.
The ConcurrentHashMap uses very sophisticated techniques to reduce the need for synchronization and allow parallel read access by multiple threads without synchronization and, more importantly, provides an Iterator that requires no synchronization and even allows the Map to be modified during iteration (though it makes no guarantees whether or not elements that were inserted during iteration will be returned).
synchronizedCollection() returns a wrapper object whose methods are all synchronized on the wrapper itself, so all concurrent operations on it are serialized. ConcurrentHashMap is a truly concurrent container with fine-grained locking, optimized to keep contention as low as possible. Have a look at the source code and you will see what is inside.
ConcurrentHashMap implements ConcurrentMap which provides the concurrency.
Internally, its iterators are designed to be used by only one thread at a time, so an iterator should not be shared between threads.
This map is used widely in concurrency.
I'm looking to avoid a ConcurrentModificationException where the functionality is to iterate over an expanding set (there are no removes), and the add operations are being done by different threads.
I considered cloning the collection before iterating, but this solution doesn't scale very well as the set becomes large. Synchronizing doesn't work because the collection is used in tons of places and the code is pretty old. Short of a massive refactoring, the only option is to change the set implementation.
I'm wondering if there's a Java implementation where the iterator returns a snapshot state of the collection (which is okay for my functionality) but avoids most of the cost of cloning. I checked out CopyOnWriteArrayList, but it doesn't fit the bill, mainly because it is a list.
The java.util.concurrent package has everything you need.
The classes there are like the java.util collections, but are highly optimized to cater for concurrent access, interestingly addressing specifically your comment:
the iterator returns a snapshot state of the collection
Don't reinvent the wheel :)
Wondering if there's a Java implementation where the iterator returns a snapshot state of the collection
Yes, there is. Unlike the synchronized collections made available via the Collections.synchronizedXxx() methods, the ConcurrentXxx classes in the java.util.concurrent package allow for this scenario. The concurrent collection classes allow multiple threads to access the collection at the same time, without the need to synchronize on a lock.
Depending on the exact nature of your problem, a ConcurrentHashMap can be used. The relevant section of the class documentation that applies to your problem is:
Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
Note the last sentence carefully.
Also, remember that these iterators do not return consistent snapshots of the collection. The iterators of the views returned by most of its methods have the following property:
The view's iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.
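For the growing-set case from the question, two JDK options can be compared side by side. This is a sketch with invented names: CopyOnWriteArraySet gives true snapshot iterators but copies its array on every add (its Javadoc recommends it for sets that stay small), while a set created with ConcurrentHashMap.newKeySet() scales to large sets but only offers weakly consistent iteration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArraySet;

public class GrowingSetIteration {

    // Adds one element during the traversal and returns how many elements were visited.
    static int visitWhileGrowing(Set<String> set) {
        int visited = 0;
        boolean added = false;
        for (String element : set) {
            if (!added) {
                set.add("late");
                added = true;
            }
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        // Snapshot iteration: the loop never sees "late".
        Set<String> snapshotSet = new CopyOnWriteArraySet<>(Arrays.asList("a", "b", "c"));
        System.out.println("snapshot set visited " + visitWhileGrowing(snapshotSet));

        // Weakly consistent iteration: the loop may or may not see "late".
        Set<String> weakSet = ConcurrentHashMap.newKeySet();
        weakSet.addAll(Arrays.asList("a", "b", "c"));
        System.out.println("concurrent set visited " + visitWhileGrowing(weakSet));
    }
}
```

Neither loop throws ConcurrentModificationException; the difference is only in whether elements added mid-traversal can show up.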
Related questions
Is iterating ConcurrentHashMap values thread safe?
Java ConcurrentHashMap not thread safe.. wth?