Iteration of ConcurrentHashMap - Java

I was reading about ConcurrentHashMap.
I read that it provides an Iterator that requires no synchronization and even allows the Map to be modified during iteration, so it never throws a ConcurrentModificationException.
I was wondering whether this is a good thing: during iteration I might not see an element that was put into the ConcurrentHashMap earlier, because another thread might have changed it in the meantime.
Is my thinking correct? If yes, is it good or bad?

I was wondering whether this is a good thing: during iteration I might not see an element that was put into the ConcurrentHashMap earlier, because another thread might have changed it in the meantime.
I don't think this should be a concern - the same statement is true if you use synchronization and the thread doing the iteration happens to grab the lock and execute its loop prior to the thread that would insert the value.
If you need some sort of coordination between your threads to ensure that some action takes place after (and only after) another action, then you still need to manage this coordination, regardless of the type of Map used.
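To make the weakly consistent behavior concrete, here is a minimal sketch (single-threaded for determinism; the class name is illustrative). It shows that modifying a ConcurrentHashMap while an iterator is open throws no ConcurrentModificationException, and that the iterator may or may not reflect the new entry:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        Iterator<String> it = map.keySet().iterator();
        map.put("c", 3); // modification during iteration: no exception

        int seen = 0;
        while (it.hasNext()) {
            it.next(); // never throws ConcurrentModificationException
            seen++;
        }
        // The iterator reflects the state of the table at some point at or
        // since its creation, so "c" may or may not have been seen.
        System.out.println("seen " + seen + " of " + map.size() + " entries");
    }
}
```

A synchronized HashMap would have thrown a ConcurrentModificationException at `it.next()` after the third `put`.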

Usually, the ConcurrentHashMap weakly consistent iterator is sufficient. If instead you want a strongly consistent iterator, then you have a couple of options:
The ctrie is a hash array mapped trie that provides constant time snapshots. There is Java source code available for the data structure.
Clojure has a PersistentHashMap that you can use - this lets you iterate over a snapshot of the data.
Use a local database, e.g. HSQLDB, to store the data instead of using a ConcurrentHashMap. Use a composite primary key of key|timestamp, and when you "update" a value you instead store a new entry with the current timestamp. To get an iterator, retrieve a result set with a where timestamp < System.currentTimeMillis() clause, and iterate over the result set.
In either case you're iterating over a snapshot, so you've got a strongly consistent iterator; in the former case you run the risk of running out of memory, while the latter case is a more complex solution.

The whole point of concurrent-anything is that you acknowledge concurrent activity and don't trust that all access is serialized. With most collections, you cannot expect inter-element consistency without working for it.
If you don't care about seeing the latest data, but want a consistent (but possibly old) view of data, have a look at purely functional structures like Finger Trees.

Related

Fast multi-threaded Set implementation that needs no read/write-sync

I'm looking for a fast Set or Map implementation that has weaker thread-safety in favor of speed.
The idea is to have a data structure that can quickly be checked whether it contains a (Long) entry at best without thread-synchronization. It is okay if a new entry that is written by another thread becomes visible to the other threads at a later time.
I know already that the non-thread-safe HashSet Java standard implementation may corrupt its internal data structures while inserting a new element, so that a reader thread ends up in an endless loop during lookup.
I also know that whenever the writing methods are using synchronized blocks, all reader methods should be synchronized as well in a multi-threaded implementation.
So my ultimate goal is to find a possibility to insert in O(1) and lookup in O(1) where
the inserts might get queued in some way for bulk insert at a sync-point (if there is no other possibility)
the read does not get stuck but should not need to wait (for any writers)
the inserted element should be visible to any subsequent reads of the thread that added the element (which might prevent the aforementioned queue)
I am experimenting with Longs that represent hash-codes mapping to Lists of usually one, sometimes two or more entries.
Is there a way to achieve this e.g. via an array and compare-and-exchange and which is faster than using the ConcurrentHashMap?
What would a sketched implementation look like, given that the input consists of node IDs (type Long) of a graph that is traversed by multiple threads, which somehow exchange information about which nodes have been visited already (as described in the list above)?
I really appreciate any comments and ideas,
thanks in advance!
Edit: Added extended information on the actual task that I am doing some hobby-research on and which led me to asking the question here in the forum.
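The array-plus-compare-and-exchange idea from the question could be sketched roughly as follows. This is a hypothetical, fixed-size, lock-free open-addressing set of positive longs; the class name and hashing constant are my own, 0 is reserved as the "empty" marker, and there is no resizing, so it only works when the number of inserts is bounded well below the capacity:

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class LockFreeLongSet {
    private final AtomicLongArray table; // 0 means "empty slot"
    private final int mask;

    public LockFreeLongSet(int capacityPow2) {
        table = new AtomicLongArray(capacityPow2);
        mask = capacityPow2 - 1;
    }

    private int index(long value) {
        long h = value * 0x9E3779B97F4A7C15L; // Fibonacci hashing
        return (int) (h >>> 40) & mask;
    }

    /** Returns true if the value was newly added. */
    public boolean add(long value) {
        int i = index(value);
        while (true) {
            long cur = table.get(i);
            if (cur == value) return false;             // already present
            if (cur == 0) {
                if (table.compareAndSet(i, 0, value)) return true;
                continue;          // slot was filled concurrently: re-read it
            }
            i = (i + 1) & mask;    // occupied by another value: linear probing
        }
    }

    public boolean contains(long value) {
        int i = index(value);
        while (true) {
            long cur = table.get(i);
            if (cur == value) return true;
            if (cur == 0) return false; // concurrent inserts may be missed,
            i = (i + 1) & mask;         // which the question explicitly allows
        }
    }
}
```

AtomicLongArray reads and writes have volatile semantics, so the thread that performed a successful add will always see its own element on subsequent lookups; whether this actually beats ConcurrentHashMap.newKeySet() would need benchmarking on the real workload.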

When using forEach on a ConcurrentHashMap, are updates reflected or does it behave like a failsafe Iterator?

If I start a forEach action on a ConcurrentHashMap, and other threads are still performing puts on this map, will I see new updates to other bins?
The reason for this is I am trying to find the most effective way to broadcast the contents of a ConcurrentHashMap to listeners, without causing contention to writers of new data to the map. But I want all listeners to receive the same snapshot of the Map when I notify the listeners.
It's not fail-safe, but updates are not reflected in the way that you think they are: once you have already seen a bin, you will not get updates from that bin anymore.
If you know how spread works internally, you can even cause an OutOfMemoryError (this was an excellent comment from Holger on a question I had answered, but I can't seem to find it right now...):
    ConcurrentHashMap<Integer, Integer> chm = new ConcurrentHashMap<>(500_000_000);
    chm.put(1, 1);
    chm.forEach((key, value) -> chm.put(++value ^ (value >>> 16), value));
The class-level API docs have this to say:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.) For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
(Emphasis added.) That does not explicitly address forEach(), but I would expect its behavior to be similar to that achievable via an Iterator over the map's entry set. That is, the forEach() iteration would reflect the map's contents as of some fixed point in time. Therefore, I do not think it at all safe to suppose that modifications to the map by other threads will be seen by the forEach(). I would in fact expect modifications by other threads generally not to be reflected in the forEach()'s behavior, though there is room in the spec to allow it seeing some modifications.
To provide a snapshot of a map you need to copy the map at a given point. If the iterator were simply a snapshot, it would have to create a copy initially as well. As that costs extra memory and computation it won't do that, and there are architectural reasons why that might not be desirable anyway.
Any non-null result returned from get(key) and related access methods bears a happens-before relation with the associated insertion or update. The result of any bulk operation reflects the composition of these per-element relations
In these lines it is (not so clearly) stated that any changes that happened-before a get (whether via an iterator or a single call) are already included and reflected by that get operation. So a forEach bulk operation works on the most recent state of the map at the time each element is visited.
You already gave the (only) solution to this problem in your question: create a local snapshot using the map's copy constructor prior to distribution. That costs extra memory, but it is the only way to get a true snapshot.
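A minimal sketch of that copy-first approach (class and variable names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SnapshotDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> live = new ConcurrentHashMap<>();
        live.put("a", 1);
        live.put("b", 2);

        // Copy first, then hand the copy to listeners: every listener sees
        // the same fixed contents, and writers to `live` are never blocked.
        Map<String, Integer> snapshot = new HashMap<>(live);

        live.put("c", 3); // later writes do not affect the snapshot

        System.out.println(snapshot.size()); // prints 2
        System.out.println(live.size());     // prints 3
    }
}
```

One caveat: the copy itself is made with a weakly consistent traversal, so if writers are active while the copy is being taken, the snapshot is "fuzzy" in the same way an iterator is; it still guarantees that all listeners see identical contents.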

Why should we use HashMap in multi-threaded environments?

Today I was reading about how HashMap works in Java. I came across a blog and I am quoting directly from its article. I have gone through this article on Stack Overflow, but I still want to know the details.
So the answer is yes, there is a potential race condition while resizing a HashMap in Java: if two threads at the same time find that the HashMap needs resizing, they will both try to resize it. During the resizing of a HashMap in Java, the elements in a bucket, which are stored in a linked list, get reversed in order during their migration to the new bucket, because the Java HashMap doesn't append new elements at the tail; instead it inserts them at the head, to avoid tail traversal. If the race condition happens, you will end up with an infinite loop.
It states that, as HashMap is not thread-safe, a potential race condition can occur during resizing. I have seen even in our office projects that people extensively use HashMaps knowing they are not thread-safe. If it is not thread-safe, why should we use HashMap at all? Is it just lack of knowledge among developers, who might not be aware of structures like ConcurrentHashMap, or is there some other reason? Can anyone shed some light on this puzzle?
I can confidently say ConcurrentHashMap is a pretty ignored class. Not many people know about it and not many people care to use it. The class offers a very robust and fast method of synchronizing a Map collection. I have read a few comparisons of HashMap and ConcurrentHashMap on the web. Let me just say that they’re totally wrong. There is no way you can compare the two, one offers synchronized methods to access a map while the other offers no synchronization whatsoever.
What most of us fail to notice is that while our applications, web applications especially, work fine during the development and testing phases, they usually fall over under heavy (or even moderately heavy) load. This is due to the fact that we expect our HashMaps to behave a certain way, but under load they usually misbehave. Hashtables offer concurrent access to their entries, with a small caveat: the entire map is locked to perform any sort of operation.
While this overhead is ignorable in a web application under normal load, under heavy load it can lead to delayed response times and overtaxing of your server for no good reason. This is where ConcurrentHashMaps step in. They offer all the features of Hashtable with performance almost as good as a HashMap. ConcurrentHashMaps accomplish this with a very simple mechanism.
Instead of a map-wide lock, the collection maintains a list of 16 locks by default, each of which guards a segment of the map's buckets. This effectively means that 16 threads can modify the collection at the same time (as long as they're all working on different segments). In fact, no operation performed by this collection locks the entire map.
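A small sketch of why this matters in practice (note: the 16-segment description above is the pre-Java-8 design; since Java 8 the map locks individual bins instead, but the effect is the same). Four threads update four different keys concurrently, and no update is lost even though no thread ever locks the whole map:

```java
import java.util.concurrent.ConcurrentHashMap;

public class StripedDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<Integer, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            final int key = t; // each worker updates its own key
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    counts.merge(key, 1, Integer::sum); // atomic per-bin update
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        for (int t = 0; t < workers.length; t++) {
            assert counts.get(t) == 10_000; // no lost updates
        }
        System.out.println(counts);
    }
}
```

With a plain HashMap, the same program could corrupt the table; with a Hashtable it would be correct but every `merge` would contend for the single map-wide lock.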
There are several aspects to this. First of all, most of the collections are not thread-safe. If you want a thread-safe collection you can call Collections.synchronizedCollection or Collections.synchronizedMap.
But the main point is this: you want your threads to run in parallel, with no synchronization at all - if possible, of course. This is something you should strive for, but of course it cannot be achieved every time you deal with multithreading.
But there is no point in making the default collection/map thread-safe, because sharing a map between threads should be an edge case, and synchronization means more work for the JVM.
In a multithreaded environment, you have to ensure that the map is not modified concurrently, or you can run into serious memory-consistency problems, because it is not synchronized in any way.
Just check the API; I used to think the same way.
I thought that the solution was to use the static Collections.synchronizedMap method. I was expecting it to return a better implementation. But if you look at the source code you will realize that all they do in there is just a wrapper with a synchronized call on a mutex, which happens to be the same map, not allowing reads to occur concurrently.
In the Jakarta commons project, there is an implementation that is called FastHashMap. This implementation has a property called fast. If fast is true, then the reads are non-synchronized, and the writes will perform the following steps:
Clone the current structure
Perform the modification on the clone
Replace the existing structure with the modified clone
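Those three steps amount to a copy-on-write map. A minimal sketch of the idea (the class name and details below are mine, not the actual Jakarta Commons FastHashMap source): reads hit the currently published map with no locking, while each write clones the map, modifies the clone, and republishes it:

```java
import java.util.HashMap;
import java.util.Map;

public class CopyOnWriteMap<K, V> {
    // volatile so readers always see the most recently published map
    private volatile Map<K, V> current = new HashMap<>();

    public V get(K key) {
        return current.get(key); // unsynchronized read of the published map
    }

    public synchronized V put(K key, V value) {
        Map<K, V> clone = new HashMap<>(current); // 1. clone the structure
        V old = clone.put(key, value);            // 2. modify the clone
        current = clone;                          // 3. publish the clone
        return old;
    }
}
```

As the answer notes, this only pays off when writes are rare and reads dominate, since every write copies the whole map.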
public class FastSynchronizedMap<K, V> implements Map<K, V>, Serializable {
    private final Map<K, V> m;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // ...

    public V get(Object key) {
        lock.readLock().lock();
        V value = null;
        try {
            value = m.get(key);
        } finally {
            lock.readLock().unlock();
        }
        return value;
    }

    public V put(K key, V value) {
        lock.writeLock().lock();
        V v = null;
        try {
            v = m.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
        return v;
    }

    // ...
}
Note that we use a try/finally block: we want to guarantee that the lock is released no matter what problem is encountered in the block.
This implementation works well when you have almost no write operations, and mostly read operations.
A HashMap can be used when only a single thread has access to it. However, when multiple threads start accessing the HashMap there will be 2 main problems:
1. resizing of the HashMap is not guaranteed to work as expected, and
2. a ConcurrentModificationException can be thrown. This can also happen with a single thread, when the HashMap is modified while it is being iterated.
A workaround for using HashMap in a multi-threaded environment is to initialize it with the expected number of entries, hence avoiding the need for resizing.

Data structure for non-blocking aggregation of Thread values?

Background:
I have a large thread pool in Java; each worker has some internal state.
I would like to gather some global information about the states. To do that, I have an associative, commutative aggregation function (e.g. sum; mine needs to be pluggable, though).
The solution needs to have fixed memory consumption and be lock-free in the best case, not disturbing the pool at all. So no thread should need to acquire a lock (or enter a synchronized area) when writing to the data structure. The aggregated value is only read after the threads are done, so I don't need an accurate value all the time. Simply collecting all values and aggregating them after the pool is done might lead to memory problems.
The values are going to be more complex datatypes so I cannot use AtomicInteger etc.
My general Idea for the solution:
Have a lock-free collection that all threads put their updates into. I don't even need the order of the events.
If it gets too big, run the aggregation function on it (compacting it) while the threads continue filling it.
My question:
Is there a data structure that allows for something like that or do I need to implement it from scratch? I couldn't find anything that directly matches my problem. If I have to implement from scratch what would be a good non-blocking collection class to start from?
If the updates are infrequent (relatively speaking) and the aggregation function is fast, I would recommend aggregating every time:
    State myState;
    AtomicReference<State> combinedState;

    State original;
    State newCombined;
    do {
        original = combinedState.get();
        newCombined = aggregate(original, myState);
    } while (!combinedState.compareAndSet(original, newCombined));
Note that original must be declared outside the loop so that the compareAndSet in the loop condition can see the value that was read at the start of the attempt.
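A complete, runnable version of this loop might look as follows. The State record and aggregate function are placeholders for whatever pluggable aggregation you need (a record requires Java 16+); the approach assumes State is immutable, which the CAS retry loop relies on:

```java
import java.util.concurrent.atomic.AtomicReference;

public class CasAggregation {
    public record State(long sum) {} // immutable; stands in for a richer type

    public static State aggregate(State a, State b) {
        return new State(a.sum() + b.sum()); // pluggable, commutative, associative
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicReference<State> combined = new AtomicReference<>(new State(0));
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                State myState = new State(1000); // this worker's contribution
                State original;
                State updated;
                do { // lock-free publish: retry until the CAS succeeds
                    original = combined.get();
                    updated = aggregate(original, myState);
                } while (!combined.compareAndSet(original, updated));
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(combined.get().sum()); // 4 workers x 1000 = 4000
    }
}
```

Memory consumption stays fixed (one combined State plus one in-flight copy per thread), which matches the requirement in the question.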
I don't quite understand the question but I would, at first sight, suggest an IdentityHashMap where keys are (references to) your thread objects and values are where your thread objects write their statistics.
An IdentityHashMap only relies on reference equality, as such there would never be any conflict between two thread objects; you could pass a reference to that map to each thread (which would then call .get(this) on the map to get a reference to the collecting data structure), which would then collect the data it wants. Otherwise you could just pass a reference to the collecting data structure to the thread object.
Such a map is inherently thread-safe for your use case, as long as you create the key/value pair for each thread before starting the thread, because no thread object will ever modify the map anyway, since they won't have a reference to it. With some management smartness you can even remove entries from this map, even though the map itself is not thread-safe, once the thread is done with its work.
When all is done, you have a map whose values contains all the data collected.
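A minimal sketch of this pattern (names are illustrative, and LongAdder stands in for whatever per-thread collecting structure you use): the IdentityHashMap is fully populated before any thread starts and only read after they finish, so it needs no synchronization of its own:

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

public class PerThreadStats {
    public static long run(int nThreads, int perThread) throws InterruptedException {
        // keys are the worker objects themselves; identity equality only
        Map<Runnable, LongAdder> stats = new IdentityHashMap<>();
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            LongAdder counter = new LongAdder();
            Runnable work = () -> {
                for (int j = 0; j < perThread; j++) counter.increment();
            };
            stats.put(work, counter); // entry created BEFORE the thread starts
            threads[i] = new Thread(work);
        }
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        // after join, reading the map is safe without any locking
        return stats.values().stream().mapToLong(LongAdder::sum).sum();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(3, 1000)); // 3 threads x 1000 = 3000
    }
}
```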
Hope this helps.

ConcurrentHashMap and putAll() method

Normally (i.e. not concurrently), putAll() cannot be more efficient than using lots of calls to put(), even assuming that you exclude the cost of building the other Map that you pass to putAll(). That's because putAll() will need to iterate the passed Map's entries, as well as running the same key-value insertion algorithm that put() executes.
But for a ConcurrentHashMap, does it make sense to construct a regular Map and then use putAll() to update it? Or should I just do 10 (or 100, or 1000) calls to put()?
Does the answer change for multiple calls to putIfAbsent()?
Thanks!
The first (mostly) thread-safe Map in Java Collections was the synchronized HashMap from Collections.synchronizedMap(). It worked by only allowing one operation at a time. Java 5 added the ConcurrentHashMap, which works differently: basically the Map is divided into segments, and a put() operation will only lock the relevant segment. It also added thread-safe primitives like putIfAbsent().
The reason I explain this is that putAll() might be more or less efficient depending on how it's implemented. It may work by locking the whole map, which may actually be more efficient than trying to obtain individual locks on every put(). Or it may work by doing a bunch of put() calls anyway, in which case there's not much difference.
So if it makes sense for your code and you're doing a lot of updates at once, I would use putAll() unless it's putIfAbsent() that you're after.
Edit: I just checked, the Java 6 ConcurrentHashMap implements putAll() as a loop of put() operations so it's no better or worse than doing this yourself.
putAll() just delegates to put(). There's no need to build up an intermediate map if you don't have it already. You can see this in the source code, and it doesn't matter what Java implementation you're using, since the code is public domain and shared by all of them.
Note that putAll() is not atomic, but merely has the guarantee that each individual put() is atomic.
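To make that distinction concrete, a small sketch (single-threaded, so only the per-call semantics are visible): putAll behaves like a loop of individual atomic put calls, while putIfAbsent is an atomic check-then-act for a single key:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PutAllDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> target = new ConcurrentHashMap<>();
        Map<String, Integer> source = new HashMap<>();
        source.put("a", 1);
        source.put("b", 2);

        // putAll is internally a loop of put()s: each put is atomic, but a
        // concurrent reader may observe "a" present while "b" is not yet.
        target.putAll(source);

        // putIfAbsent is an atomic check-then-act: the existing value wins,
        // and the previous mapping is returned.
        Integer prev = target.putIfAbsent("a", 99);
        System.out.println(prev);            // prints 1
        System.out.println(target.get("a")); // prints 1 (99 was not stored)
    }
}
```

So if the 10 (or 1000) entries are only being built to feed putAll(), skipping the intermediate map and calling put() directly saves work without changing the guarantees.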
