Sanity checks on dynamic set - java

I have an object QueueSet built from a ConcurrentSkipListSet in Java. The documentation warns:
Beware that, unlike in most collections, the size method is not a constant-time operation. Because of the asynchronous nature of these sets, determining the current number of elements requires a traversal of the elements, and so may report inaccurate results if this collection is modified during traversal. Additionally, the bulk operations addAll, removeAll, retainAll, containsAll, equals, and toArray are not guaranteed to be performed atomically. For example, an iterator operating concurrently with an addAll operation might view only some of the added elements.
Problem: There is a flawed sanity check on this, if (!activeQueueSet.add(queue)), but as you can see from the documentation it is an O(n) operation, i.e. the whole set is traversed, and it misreads the state of the set quite a lot of the time. I'm looking for a foolproof sanity check on this.

It is true that your ConcurrentSkipListSet.add(element) can return true or false depending on whether the set is being simultaneously modified by another thread, either through an iterator (which here is weakly consistent) or through the bulk methods (the xxxAll() methods), which are not atomic.
Please mind, however, that the add() and remove() methods are thread-safe, so as long as you modify your set using only these you will be fine.
What to do about it is down to your specific application. If the element was not there but got added, that's good. Is it so bad if the element was there in the first place and was therefore not added?
You can devise a class containing (or perhaps extending) ConcurrentSkipListSet with a very controlled API that prevents any of the problematic operations, or that makes them thread-safe by using locks.
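Below is a minimal sketch of that idea, assuming the element type is Comparable; the class name ActiveQueueSet and its method set are illustrative, not taken from the question. Only the atomic single-element operations are exposed, so the non-atomic bulk methods can never be called on the set.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Sketch: expose only the thread-safe single-element operations and hide the
// non-atomic bulk methods (addAll, removeAll, retainAll, ...). Names are illustrative.
public final class ActiveQueueSet<E extends Comparable<E>> {

    private final ConcurrentSkipListSet<E> delegate = new ConcurrentSkipListSet<>();

    /** Atomic: returns true only if the element was not already present. */
    public boolean add(E element) {
        return delegate.add(element);
    }

    /** Atomic: returns true only if the element was present and has been removed. */
    public boolean remove(E element) {
        return delegate.remove(element);
    }

    public boolean contains(E element) {
        return delegate.contains(element);
    }
}
```

With an API like this, the result of add() is a reliable sanity check: it can only return false if an equal element is already in the set at that moment.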

Related

Why does a `weakly consistent iterator` only reflect modification and removal changes, but not insertion changes?

I have been reading the book Java Generics and Collections, and in a section discussing iterators the author mentions:
Collections which rely on CAS (compare and swap) have weakly consistent iterators, which reflect some but not necessarily all of the changes that have been made to their backing collection since they were created. For example, if elements in the collection have been modified or removed before the iterator reaches them, it definitely will reflect these changes, but no such guarantee is made for insertions. Weakly consistent iterators also do not throw ConcurrentModificationException.
I was wondering: why does a weakly consistent iterator reflect only modification and removal changes, but not insertion changes? What is the reason behind defining the behaviour like this? What use case does it serve?
I don't think it's correct. The official definition of weakly consistent iteration is:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.
There's no distinction made between removal and insertion.
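A small single-threaded sketch of that guarantee using ConcurrentSkipListSet (any of the CAS-based collections behaves similarly); the concrete elements and the choice of collection are just for illustration:

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

public class WeaklyConsistentDemo {
    public static void main(String[] args) {
        ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
        set.add(1);
        set.add(2);
        set.add(3);

        Iterator<Integer> it = set.iterator(); // constructed while 1, 2, 3 exist
        set.add(4);                            // insertion after construction: may or may not be seen
        set.remove(2);                         // removal after construction: 2 may or may not still be seen

        while (it.hasNext()) {
            System.out.println(it.next());     // never throws ConcurrentModificationException
        }
    }
}
```

Which of the two later modifications the loop reflects is exactly what the definition leaves open.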

Java Concurrent collection for few writes and frequent reads

I want to use a Comparator-based key-value Map. It will have frequent reads and a rare write operation (once every 3 months, through a scheduler). The initial load of the collection will be done at application startup.
Also note that the write will:
Add a single entry to the Map
Will not modify any existing entry in the Map.
Will ConcurrentSkipListMap be a good candidate for this? Does its get operation allow access by multiple threads simultaneously? I'm looking for concurrent, non-blocking reads but atomic writes.
ConcurrentHashMap is exactly what you're looking for. From the Javadoc:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.)
That sounds like it satisfies your requirement for "concurrent non blocking read but atomic write".
Since you're doing so few writes, you may want to specify a high loadFactor and an appropriate initialCapacity when creating the ConcurrentHashMap, which will prevent table resizing as you're populating the map, though this is a modest benefit at best. (You could also set a concurrencyLevel of 1, though since Java 8 that parameter is used only as a hint for internal sizing.)
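A minimal sketch of such a setup; the expected size of 1,000 entries and the String key/value types are assumptions for illustration only:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StartupLoadedMap {
    // Assumed expected size; pick whatever your startup load actually produces.
    private static final int EXPECTED_ENTRIES = 1_000;

    // initialCapacity avoids table resizing during the startup load; the
    // loadFactor and concurrencyLevel arguments are only sizing hints in Java 8+.
    private static final ConcurrentMap<String, String> MAP =
            new ConcurrentHashMap<>(EXPECTED_ENTRIES, 0.9f, 1);

    public static void main(String[] args) {
        MAP.put("key", "value");              // the rare write
        System.out.println(MAP.get("key"));   // non-blocking read
    }
}
```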
If you absolutely must have a SortedMap or NavigableMap, then ConcurrentSkipListMap is the out-of-the-box way to go. But I would double-check that you actually need the functionality provided by those interfaces (getting the first/last key, submaps, finding nearby entries, etc.) before using them. You will pay a steep price (log n vs. constant time for most operations).
Since you are looking for concurrent operations, you basically have three candidates:
Hashtable, ConcurrentHashMap, and ConcurrentSkipListMap (or Collections.synchronizedMap(), but that's not efficient).
Of these three, the latter two are more suitable for concurrent operation, as they lock only a portion of the map rather than locking the entire map like Hashtable does.
Of the latter two, ConcurrentSkipListMap uses a skip list data structure, which ensures average O(log n) performance for fast search and a variety of other operations.
It also offers a number of operations that ConcurrentHashMap can't, e.g. ceilingEntry()/ceilingKey(), floorEntry()/floorKey(), etc. It also maintains a sort order which would otherwise have to be calculated.
So if you had asked only for faster search I'd have suggested ConcurrentHashMap, but since you also mention rare write operations and a desired sort order, I think ConcurrentSkipListMap wins the race.
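A short sketch of what those navigation methods buy you on a ConcurrentSkipListMap; the Integer keys and sample values are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListNavigation {
    public static void main(String[] args) {
        ConcurrentNavigableMap<Integer, String> map = new ConcurrentSkipListMap<>();
        map.put(10, "ten");
        map.put(20, "twenty");
        map.put(30, "thirty");

        // Sorted iteration comes for free (keys in ascending order).
        System.out.println(map.keySet());                    // [10, 20, 30]

        // Navigation methods that ConcurrentHashMap does not offer.
        Map.Entry<Integer, String> e = map.ceilingEntry(15); // 20=twenty
        System.out.println(e.getKey() + "=" + e.getValue());
        System.out.println(map.floorKey(25));                // 20
        System.out.println(map.firstKey());                  // 10
    }
}
```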
If you are willing to try third party code, you could consider a copy-on-write version of maps, which are ideal for infrequent writes. Here's one that came up via Googling:
https://bitbucket.org/atlassian/atlassian-util-concurrent/wiki/CopyOnWrite%20Maps
Never tried it myself so caveat emptor.
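If you'd rather not pull in a third-party dependency, the same copy-on-write idea can be hand-rolled on top of AtomicReference. The sketch below is an assumption of how such a map might look, not the Atlassian API, and it only pays off when writes are as rare as yours:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of a copy-on-write map: every write copies the whole map,
// so this is only sensible when writes are very infrequent.
public class CopyOnWriteMap<K, V> {

    private final AtomicReference<Map<K, V>> ref =
            new AtomicReference<>(new HashMap<>());

    /** Lock-free read against the current published copy (never mutated after publication). */
    public V get(K key) {
        return ref.get().get(key);
    }

    /** Copies the current map, applies the write, then publishes the new copy atomically. */
    public void put(K key, V value) {
        ref.updateAndGet(current -> {
            Map<K, V> copy = new HashMap<>(current);
            copy.put(key, value);
            return copy;
        });
    }
}
```

Readers never block and never see a half-applied write, because each write publishes a complete new copy in one atomic step.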

Weakly consistent iterator by ConcurrentHashMap

Java Concurrency in Practice mentions that:
The iterators returned by ConcurrentHashMap are weakly consistent rather than fail-fast. A weakly consistent iterator can tolerate concurrent modification, traverses elements as they existed when the iterator was constructed, and may (but is not guaranteed to) reflect modifications to the collection after the construction of the iterator.
How does making the iterator weakly consistent or fail-safe help in a concurrent environment, given that the state of the ConcurrentHashMap will still be modified? The only difference is that it will not throw a ConcurrentModificationException.
Why do the ordinary Collections return fail-fast iterators, when a fail-safe iterator is better for concurrency?
Correctness in your particular case
Please keep in mind that a fail-fast iterator iterates over the original collection.
In contrast, a fail-safe (a.k.a. weakly consistent) iterator iterates over a copy of the original collection. Therefore any changes to the original collection go unnoticed, and that's how it guarantees the absence of ConcurrentModificationExceptions.
To answer your questions:
Using a fail-safe iterator helps concurrency because the reading threads don't have to block on the whole collection. The collection can be modified underneath while the reading happens. The drawback is that the reading thread will see the state of the collection as a snapshot taken at the time the iterator was created.
If that limitation is not acceptable for your particular use case (your readers must always see the current state of the collection), you have to use a fail-fast iterator and control concurrent access to the collection more tightly.
As you can see, it's a trade-off between the correctness requirements of your use case and speed.
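The tightly controlled fail-fast route typically looks like client-side locking: a minimal sketch, assuming a Collections.synchronizedMap wrapper and holding its lock for the whole iteration (the keys and values are illustrative):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class LockedIteration {
    public static void main(String[] args) {
        // All access goes through the synchronized wrapper.
        Map<String, Integer> map = Collections.synchronizedMap(new HashMap<>());
        map.put("a", 1);
        map.put("b", 2);

        // Iteration must hold the same lock (as the synchronizedMap Javadoc requires),
        // so readers see one consistent state at the cost of blocking writers.
        synchronized (map) {
            for (Map.Entry<String, Integer> e : map.entrySet()) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
        }
    }
}
```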
ConcurrentHashMap
ConcurrentHashMap (CHM) uses multiple tricks in order to increase the concurrency of access.
Firstly, CHM is actually a grouping of multiple maps: each map entry is stored in one of a number of segments, each of which is itself a hash table that can be read concurrently (the read methods do not block).
The number of segments is the last argument of the three-argument constructor and is called concurrencyLevel (default 16). The number of segments determines the number of concurrent writers across the whole of the data. An even spread of entries between the segments is ensured by an additional internal hashing algorithm.
Each entry's value is volatile, thereby ensuring fine-grained consistency for contended modifications and subsequent reads; each read reflects the most recently completed update.
Iterators and Enumerations are fail-safe, reflecting the state at some point at or since the creation of the iterator/enumeration; this allows simultaneous reads and modifications at the cost of reduced consistency.
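A small sketch of that last point: one thread iterates a ConcurrentHashMap while another thread keeps writing to it. The element counts and key/value types are arbitrary; the point is only that the loop never throws and may see some, all, or none of the concurrent additions:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentIterationDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(i, "v" + i);
        }

        // Writer thread keeps adding entries while the main thread iterates.
        Thread writer = new Thread(() -> {
            for (int i = 1000; i < 2000; i++) {
                map.put(i, "v" + i);
            }
        });
        writer.start();

        // Weakly consistent (fail-safe) iteration: never throws
        // ConcurrentModificationException; may or may not see the writer's entries.
        int seen = 0;
        for (Integer key : map.keySet()) {
            seen++;
        }
        System.out.println("saw " + seen + " keys");

        writer.join();
    }
}
```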
TL;DR: Because locking.
If you want a consistent iterator, then you have to lock all modifications to the Map - this is a massive penalty in a concurrent environment.
You can of course do this manually if that is what you want, but iterating over a Map is not its purpose so the default behaviour allows for concurrent writes while iterating.
The same argument does not apply for normal collections, which are only (allowed to be) accessed by a single thread. Iteration over an ArrayList is expected to be consistent, so the fail fast iterators enforce consistency.
First of all, the iterators of concurrent collections are not fail-safe because they do not have failure modes which they could somehow handle with some kind of emergency procedure. They simply do not fail.
For performance reasons, the iterators of the non-concurrent collections are designed in a way that does not allow the internal structure of the collection they iterate over to be modified. E.g. a HashMap's iterator would not know how to continue iterating after the rehashing that happens when a HashMap gets resized.
That means they would not just fail because other threads access them; they would also fail if the current thread performs a modification that invalidates the assumptions of the iterator.
Instead of ignoring those troublesome modifications and returning unpredictable and corrupted results, those collections try to track modifications and throw an exception during iteration to inform the programmer that something is wrong. This is called fail-fast.
Those fail-fast mechanisms are not thread-safe, either. That means that if the illegal modifications happen not from the current thread but from a different thread, they are no longer guaranteed to be detected. In that case fail-fast can only be thought of as a best-effort failure-detection mechanism.
On the other hand concurrent collections must be designed in a manner that can deal with multiple writes and reads at the same time and the underlying structure changing constantly.
So iterators can't always assume that the underlying structure is never modified during iteration.
Instead they're designed to provide weaker guarantees, such as iterating over outdated data, or perhaps showing some but not all of the updates that happened after the creation of the iterator. This also means that they might return outdated data when they are modified during iteration within a single thread, which can be somewhat counter-intuitive for a programmer, as one would usually expect immediate visibility of modifications within a single thread.
Examples (a short sketch contrasting these behaviors follows the list):
HashMap: best-effort fail-fast iterator.
iterator supports removal
structural modification from the same thread, such as clear()ing the Map during iteration: guaranteed to throw a ConcurrentModificationException on the next iterator step
structural modification from a different thread during iteration: the iterator usually throws an exception, but might also cause inconsistent, unpredictable behavior
CopyOnWriteArrayList: snapshot iterator
iterator does not support removal
iterator shows a view on the items frozen at the time it was created
collection can be modified by any thread including the current one during iteration without causing an exception, but it has no effect on the items visited by the iterator
clear()ing the list will not stop iteration
iterator never throws CME
ConcurrentSkipListMap: weakly consistent iterator
iterator supports removal, but may cause surprising behavior since it's solely based on Map keys, not the current value
iterator may see updates that happened since its creation, but is not guaranteed to. That means, for example, that clear()ing the Map may or may not stop iteration, and removing entries may or may not stop them from showing up during the remaining iteration
iterator never throws CME
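The sketch below contrasts the three behaviors in a single thread; the concrete keys and values are arbitrary, and the "may or may not" cases are exactly where the output is allowed to vary:

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class IteratorFlavors {
    public static void main(String[] args) {
        // 1) HashMap: best-effort fail-fast iterator.
        Map<String, Integer> hashMap = new HashMap<>();
        hashMap.put("a", 1);
        hashMap.put("b", 2);
        try {
            for (String key : hashMap.keySet()) {
                hashMap.clear();              // structural modification from the same thread
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("HashMap iterator failed fast");
        }

        // 2) CopyOnWriteArrayList: snapshot iterator.
        List<String> cow = new CopyOnWriteArrayList<>(Arrays.asList("a", "b"));
        for (String s : cow) {
            cow.clear();                      // no effect on the elements this loop visits
            System.out.println("still visiting: " + s);
        }

        // 3) ConcurrentSkipListMap: weakly consistent iterator, never throws CME.
        ConcurrentSkipListMap<String, Integer> skip = new ConcurrentSkipListMap<>();
        skip.put("a", 1);
        skip.put("b", 2);
        for (String key : skip.keySet()) {
            skip.remove("b");                 // may or may not stop "b" from showing up
            System.out.println("visited: " + key);
        }
    }
}
```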

Ways to avoid Iterator ConcurrentModificationException

As far as I know there are two ways to avoid a ConcurrentModificationException while one thread iterates the collection and another thread modifies it.
Client-side locking: basically, lock the collection during the iteration. Other threads that need to access the collection will block until the iteration is complete.
"Thread-confined": clone the collection and iterate over the copy.
I am wondering: are there any other alternatives?
The first way is obviously undesirable and performs poorly: if the collection is large, other threads could wait for a long time. As for the second way, I'm not sure: since we clone the collection and iterate over the copy, if other threads come in and modify the original one, the copy becomes stale, right? Does that mean we need to start over by cloning and iterating again once it's modified?
I am wondering: are there any other alternatives?
Use one of the concurrent collections, which don't throw this exception. Instead they provide weak consistency, i.e. an added or deleted element may or may not appear while iterating.
http://docs.oracle.com/javase/tutorial/essential/concurrency/collections.html
The java.util.concurrent package includes a number of additions to the Java Collections Framework. These are most easily categorized by the collection interfaces provided:
BlockingQueue defines a first-in-first-out data structure that blocks or times out when you attempt to add to a full queue, or retrieve from an empty queue.
ConcurrentMap is a subinterface of java.util.Map that defines useful atomic operations. These operations remove or replace a key-value pair only if the key is present, or add a key-value pair only if the key is absent. Making these operations atomic helps avoid synchronization. The standard general-purpose implementation of ConcurrentMap is ConcurrentHashMap, which is a concurrent analog of HashMap.
ConcurrentNavigableMap is a subinterface of ConcurrentMap that supports approximate matches. The standard general-purpose implementation of ConcurrentNavigableMap is ConcurrentSkipListMap, which is a concurrent analog of TreeMap.
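A quick sketch of the atomic ConcurrentMap operations mentioned in the quote above; the key name and values are purely illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AtomicMapOps {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();

        // Add only if the key is absent (atomic check-then-act).
        map.putIfAbsent("hits", 0);

        // Replace only if the key is currently mapped to the expected value.
        boolean replaced = map.replace("hits", 0, 1);
        System.out.println("replaced: " + replaced);   // true

        // Remove only if the key is currently mapped to the given value.
        boolean removed = map.remove("hits", 99);
        System.out.println("removed: " + removed);     // false, the value is 1
    }
}
```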
You could also use classes from java.util.concurrent like CopyOnWriteArrayList.
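For example, a minimal sketch (elements and thread structure assumed for illustration) in which another thread adds to a CopyOnWriteArrayList while it is being iterated:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class NoCmeDemo {
    public static void main(String[] args) throws InterruptedException {
        List<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");

        // Another thread modifies the list while we iterate: no exception,
        // the loop simply works on the snapshot taken when the iterator was created.
        Thread writer = new Thread(() -> list.add("c"));
        writer.start();

        for (String s : list) {
            System.out.println(s);   // "c" may or may not be included, never a CME
        }
        writer.join();
    }
}
```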

How to get a fixed state iterator for a set/map without cloning overheads

I'm looking to avoid a ConcurrentModificationException where the functionality is to iterate over an expanding set (there are no removes), and the add operations are being done by different threads.
I considered cloning the collection before iterating, but this solution doesn't scale very well as the set becomes large. Synchronizing doesn't work because the collection is used in tonnes of places and the code is pretty old. Short of a massive refactoring, the only bet is to change the set implementation.
I'm wondering if there's a Java implementation where the iterator returns a snapshot state of the collection (which is okay for my functionality) but avoids the cost of too much cloning. I checked out CopyOnWriteArrayList but it doesn't fit the bill, mainly because it is a list.
The java.util.concurrent package has everything you need.
The classes there are like the java.util collections, but are highly optimized to cater for concurrent access, interestingly addressing specifically your comment:
the iterator returns a snapshot state of the collection
Don't reinvent the wheel :)
Wondering if there's a Java implementation where the iterator returns a snapshot state of the collection
Yes, there is. Unlike the synchronized collections made available via the Collections.synchronizedXxx() methods, the ConcurrentXxx classes in the java.util.concurrent package allow for this scenario. The concurrent collection classes allow multiple threads to access the collection at the same point in time, without the need to synchronize on a lock.
Depending on the exact nature of your problem, ConcurrentHashMap can be used. The relevant section of the documentation of the class that applies to your problem is:
Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
Note the last sentence carefully.
Also, remember that what you get back is not a consistent snapshot of the collection. The iterators returned by most of its view methods have the following property:
The view's iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.
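Since your set only grows and the adds come from different threads, a concurrent Set view backed by ConcurrentHashMap fits that description. A minimal sketch (the element counts and thread layout are assumptions), using ConcurrentHashMap.newKeySet() from Java 8, or Collections.newSetFromMap(new ConcurrentHashMap<>()) on older JDKs:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class GrowingSetDemo {
    public static void main(String[] args) throws InterruptedException {
        // Concurrent Set view backed by ConcurrentHashMap (Java 8+).
        Set<Integer> set = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < 100; i++) {
            set.add(i);
        }

        // Other threads keep adding while we iterate: no cloning, no
        // ConcurrentModificationException; late additions may or may not be seen.
        Thread writer = new Thread(() -> {
            for (int i = 100; i < 200; i++) {
                set.add(i);
            }
        });
        writer.start();

        int count = 0;
        for (Integer ignored : set) {
            count++;
        }
        System.out.println("iterated over " + count + " elements");
        writer.join();
    }
}
```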
Related questions
Is iterating ConcurrentHashMap values thread safe?
Java ConcurrentHashMap not thread safe.. wth?
