Thread safe way to copy a map

Thread safe way to copy a map - java

Am using JDK 7, SQLite, and have Guava in my project.
I have a TreeMap with less than 100 entries that is being updated by a single "worker" thread hundreds of times a second. I am now writing a component (another thread - the "DB thread") that will write the map to my database every 5 or 10 seconds.
I know that I need to make a deep copy of the map so the DB thread will use a snapshot, while the worker thread continues its job. I am looking at the Guava Maps class which has many methods that make copies, but I am not sure if any of them meet my needs to synchronize on the map whenever a copy is needed. Is there a method there that will meet my needs, or should I write a synchronized block to make my own deep copy?

It depends on what you want:
If you want a fully concurrent map (cant read while adding and so on) You should use what JSlain said before me.
If all you want is the CURRENT snapshot of the map and you do not care if the map will be modified as long as the iterator you are using wont be changed.
Then use ConcurrentSkipListMap
This will provide each iteration with a new independent iterator so even if the real map is changed you wont notice it.
You will see it in the next update (5 seconds in your case.)

From TreeMap javadoc:
Note that this implementation is not synchronized. If multiple threads
access a map
concurrently, and at least one of the threads modifies the map structurally, it must be
synchronized externally. (A structural modification is any operation that adds or deletes one
or more mappings; merely changing the value associated with an existing key is not a
structural modification.) This is typically accomplished by synchronizing on some object that
naturally encapsulates the map. If no such object exists, the map should be "wrapped" using
the Collections.synchronizedSortedMap method. This is best done at creation time, to prevent
accidental unsynchronized access to the map:
SortedMap m = Collections.synchronizedSortedMap(new TreeMap(...));

Related

Is Hashmap's containsKey method threadsafe if the map is initialized once, and is never modified again

Can we use Hashmap's containsKey() method without synchronizing in an multi-threaded environment?
Note: Threads are only going to read the Hashmap. The map is initialized once, and is never modified again.

It really depends on how/when your map is accessed.
Assuming the map is initialized once, and never modified again, then methods that don't modify the internal state like containsKey() should be safe.
In this case though, you should make sure your map really is immutable, and is published safely.
Now if in your particular case the state does change during the course of your program, then no, it is not safe.
From the documentation:
Note that this implementation is not synchronized.
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
In this case, you should use ConcurrentHashMap, or synchronize externally.

You shouldn't look at a single method this way. A HashMap is not meant to be used in a multi-threaded setup.
Having said that, the one exception would be: a map that gets created once (single threaded), and afterwards is "read" only. In other words: if a map doesn't get changed anymore, then you can have as many threads reading it as you want.
From that point of view, just containsKey() calls shouldn't call a problem. The problem arises when the information that this method relies on changes over time.

No, it is not thread-safe for any operations. You need to synchronise all access or use something like ConcurrentHashMap.
My favourite production system troubleshooting horror story is when we found that HashMap.get went into an infinite loop locking up 100% CPU forever because of missing synchronisation. This happened because the linked lists that were used within each bucket got into an inconsistent state. The same could happen with containsKey.
You should be safe if no one modifies the HashMap after it has been initially published, but better use an implementation that guarantees this explicitly (such as ImmutableMap or, again, a ConcurrentMap).

No. (No it is not. Not at all. 30 characters?)

It's complicated, but, mostly, no.
The spec of HashMap makes no guarantees whatsoever. It therefore reserves the right to blast yankee doodle dandy from your speakers if you try: You're just not supposed to use it that way.
... however, in practice, whilst the API of HashMap makes no guarantees, generally it works out. But, mind the horror story of #Thilo's answer.
... buuut, the Java Memory Model works like this: You should consider that each thread gets an individual copy of each and every field across the entire heap of the VM. These individual copies are then synced up at indeterminate times. That means that all sorts of code simply isn't going to work right; you add an entry to the map from one thread, and if you then access that map from another you won't see it even though a lot of time has passed – that's theoretically possible. Also, internally, map uses multiple fields and presumably these fields must be consistent with each other or you'll get weird behaviours (exceptions and wrong results). The JMM makes no guarantees about consistency either. The way out of this dilemma is that the JMM offers these things called 'comes-before/comes-after' relationships which give you guarantees that changes have been synced up. Using the 'synchronized' keyword is one easy way to get such relationships going.
Why not use a ConcurrentHashMap which has all the bells and whistles built in and does in fact guarantee that adding an entry from thread A and then querying it via containsKey from thread B will get you a consistent answer (which might still be 'no, that key is not in the map', because perhaps thread B got there slightly before thread A or slightly after but there's no way for you to know. It won't throw any exceptions or do something really bizarre such as returning 'false' for things you added ages ago all of a sudden).
So, whilst it's complicated, the answer is basically: Don't do that; either use a synchronized guard, or, probably the better choice: ConcurrentHashMap.

No, Read the bold part of HashMap documentation:
Note that this implementation is not synchronized.
So you should handle it:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
And suggested solutions:
This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method

#user7294900 is right.
If your application does not modifies the HashMap structurally which is build thread-safely and your application just invoke containsKey method, it's thread safe.
For instance, I've used HashMap like this:
#Component
public class SpringSingletonBean {
private Map<String, String> map = new HashMap<>();
public void doSomething() {
//
if (map.containsKey("aaaa")) {
//do something
}
}
#PostConstruct
public void init() {
// do something to initialize the map
}
}
It works well.

When using forEach on a ConcurrentHashMap, are updates reflected or does it behave like a failsafe Iterator?

If I start a forEach action on a ConcurrentHashMap, and other threads are still performing puts on this map, will I see new updates into other bins?
The reason for this is I am trying to find the most effective way to broadcast the contents of a ConcurrentHashMap to listeners, without causing contention to writers of new data to the map. But I want all listeners to receive the same snapshot of the Map when I notify the listeners.

It's not fail safe, but updates are not reflected in way that you think they are; so if you have already seen a bin before you will not get updates from that bin anymore.
If you know how spread works internally you can even cause an OOM(this was an excellent comment from a question I've answered from Holger, but I can't seem to find it right now... )
ConcurrentHashMap<Integer, Integer> chm = new ConcurrentHashMap<>(500_000_000);
chm.put(1, 1);
chm.forEach((key, value) -> chm.put(++value^(value>>>16), value));

The class-level API docs have this to say:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.) For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
(Emphasis added.) That does not explicitly address forEach(), but I would expect its behavior to be similar to that achievable via an Iterator over the map's entry set. That is, the forEach() iteration would reflect the map's contents as of some fixed point in time. Therefore, I do not think it at all safe to suppose that modifications to the map by other threads will be seen by the forEach(). I would in fact expect modifications by other threads generally not to be reflected in the forEach()'s behavior, though there is room in the spec to allow it seeing some modifications.

To provide a snapshot of a map you need to copy the map at a given point. If the iterator would simply be a snapshot, it would have to create a copy initially as well. As that costs extra memory and computation it won't do that, and then there are architectural reasons why that might not be desirable anyways.
Any non-null result returned from get(key) and related access methods bears a happens-before relation with the associated insertion or update. The result of any bulk operation reflects the composition of these per-element relations
In these lines it is (not so clearly) stated, that any changes that happened before any get (iterator or single call) are already included and reflected by that get operation. So a forEach bulk operation will work on the most recent state of the map at any given time.
You already gave the (only) solution to this problem in your question: create a local snap-shot using the copy constructor of the map prior to distribution. It is an extra memory overhead, but that is the only way to get a snapshot.

Concurrent read access on map structure without lock

I have a hashmap:
LinkedHashMap<Long, List<IOperation>> operations.
Which is written at by multiple threads. I use a lock around it.
synchronized (lock){...}
To make sure that only 1 thread writes on it at a given moment.
Neveretheless, in some occasions I need to operate some long read requests on it.
For this purpose I copy the map:
temp.putAll(operations);
or
= new LinkedHashMap<>(operations)
Is there a way to make such a copy with the following premises:
No need to lock the map to copy it.
No call by value on the members of the map and its copy.
Thanks already
Some additional details.
I execute quite often some series of long reads on it and performance is crucial.

It would most likely be easier for you when initializing your LinkedHashMap<Long, List<IOperation>> to wrap it in Collections#synchronizedMap so each atomic operation on the map will be synchronized, rather than having to do it yourself.
If you insist on copying the Map without having to lock it, I would just create another LinkedHashMap with the synchronized Map when beginning to read it.
Map<Long, List<IOperation>> copy = new LinkedHashMap<>(synchronizedMap);
You can also eliminate the need for the copy of the Map by simply wrapping the "long read requests" in a synchronized block using the Map as the lock.

Get and Put to Map at same time in Java

I have one doubt. What will happen if I get from map at same time when I am putting to map some data?
What I mean is if map.get() and map.put() are called by two separate processes at the same time. Will get() wait until put() has been executed?

It depends on which Map implementation you are using.
For example, ConcurrentHashMap supports full concurrency, and get() will not wait for put() to get executed, and stated in the Javadoc :
* <p> Retrieval operations (including <tt>get</tt>) generally do not
* block, so may overlap with update operations (including
* <tt>put</tt> and <tt>remove</tt>). Retrievals reflect the results
* of the most recently <em>completed</em> update operations holding
* upon their onset.
Other implementations (such as HashMap) don't support concurrency and shouldn't be used by multiple threads at the same time.

It might throw ConcurrentModificationException- not sure about it. It is always better to use synchronizedMap.This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method.This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map map = Collections.synchronizedMap(new HashMap(...));

Map is an interface, so the answer depends on the implementation you're using.
Generally speaking, the simpler implementations of this interface, such as HashMap and TreeMap are not thread safe. If you don't have some synchronization built around them, concurrently puting and geting will result in an undefined behavior - you may get the new value, you may get the old one, bust most probably you'd just get a ConcurrentModificationException, or something worse.
If you want to handle the same Map from different threads, either use one of the implementations of a ConcurrentMap (e.g., a ConcurrentHashMap), which guarantees a happens-before-sequence (i.e., if the get was fired before the put, you'd get the old value, even if the put is ongoing, and vise-versa), or synchronize the Map's access (e.g., by calling Collections#synchronizedMap(Map).

ConcurrentHashMap and its operations

Suppose there is a ConcurrentHashMap and there are two threads.
If both threads are reading some data from the same bucket, then my understanding says that both can read that bucket concurrently, as CHM does not block reading operations.
But suppose one thread is writing (put) to a bucket. Then, can a second thread simultaneously read (get) from the same bucket or will the second thread have to wait for the put operation to complete?
If it were Hashtable then get will have to wait until the put operation is complete. But in case of CHM how it will behave?

There is no need for speculation. The source code for ConcurrentHashMap is open, and anyone can read it. (This is JDK 8 build 128, the first JDK 8 release candidate.)
You should have no trouble understanding it, as it's only 6,300 lines long. :-) Actually, a good fraction of this is comments, and most of the code goes toward handling edge cases. The straightforward paths of get() and put() aren't terribly complicated and are only a few dozen lines of code.
Your understanding of read operations (get(), contains()) is correct; there is no blocking. Hashing to a bucket and searching within the bucket, if necessary, is straightforward, with no locking. Memory visibility is ensured by volatile reads. (At lines 622-623, the val and next fields of Node are volatile.) Read operations proceed concurrently with other reads and also with writes to the same bucket.
The policy for removing and replacing values is fairly straightforward in that the head of the bucket is locked while the bucket is being searched and modified. See the synchronized block at line 1117 of replaceNode. A put that adds to an existing bucket is similar; see the synchronized block at line 1027 of putVal. These operations will of course block other threads attempting to remove, replace, or add entries to this same bucket. If a value is in the midst of being replaced, a thread that is getting the value for this key will see either the old value or the new value, depending on whether the reading thread finds the node before or after the value is replaced by the writing thread.
There is a special case for putting the first element into a bucket. At lines 1018-1020, if putVal finds a bucket empty, it will create a new Node and CAS (compare-and-swap) it into place. If this succeeds, the operation is complete. If two threads are attempting to add nodes into the same bucket more-or-less simultaneously, the CAS for the first will succeed, and the CAS for the second will fail. But note that this code is within a for-loop (line 1014). The thread whose CAS has failed simply goes around the loop and retries. In fact, all the other write operations are within a loop. The general approach is that operations proceed optimistically but are checked for concurrent writers. If the optimistic attempt fails, the operation is retried and goes through a (possibly) different path based on the now updated state.

Hi as Per my knowledge ConcurrentHashMap allows multiple readers to read concurrently without any blocking. This is achieved by partitioning Map into different parts based on concurrency level and locking only a portion of Map during updates. Default concurrency level is 16, and accordingly Map is divided into 16 part and each part is governed with different lock. This means, 16 thread can operate on Map simultaneously, until they are operating on different part of Map. This makes ConcurrentHashMap high performance despite keeping thread-safety intact. Though, it comes with caveat. Since update operations like put(), remove(), putAll() or clear() is not synchronized, concurrent retrieval may not reflect most recent change on Map.
I hope this will help..

This is from the JavaDocs of ConcurrentHashMap class:
"Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset"

In Hastable concurrent operations will lock the whole collection, but in ConcurrentHashMap only one bucket will be locked.

From the doc:
A hash table supporting full concurrency of retrievals and adjustable
expected concurrency for updates. This class obeys the same functional
specification as Hashtable, and includes versions of methods
corresponding to each method of Hashtable. However, even though all
operations are thread-safe, retrieval operations do not entail
locking, and there is not any support for locking the entire table in
a way that prevents all access. This class is fully interoperable with
Hashtable in programs that rely on its thread safety but not on its
synchronization details.
Retrieval operations (including get) generally do not block, so may
overlap with update operations (including put and remove). Retrievals
reflect the results of the most recently completed update operations
holding upon their onset. For aggregate operations such as putAll and
clear, concurrent retrievals may reflect insertion or removal of only
some entries. Similarly, Iterators and Enumerations return elements
reflecting the state of the hash table at some point at or since the
creation of the iterator/enumeration. They do not throw
ConcurrentModificationException. However, iterators are designed to be
used by only one thread at a time.
So, you shouldn't expect operations to synchronize exactly as a Hashtable, but the same (series of) operation are threadsafe. The second highlighted sentence does not imply, but in my opinion strongly suggest, what is going on here: a put in progress, i.e. not finished, will not block a get - the get will simply not see the changes yet.
Although I have not worked myself through the whole CHM class, this piece of documentation supports my hypothesis (taken from OpenJDK 6)
static final class Segment<K,V> extends ReentrantLock implements Serializable {
/*
* Segments maintain a table of entry lists that are always
* kept in a consistent state, so can be read (via volatile
* reads of segments and tables) without locking. This
* requires replicating nodes when necessary during table
* resizing, so the old lists can be traversed by readers
* still using old version of table.
When an update is "complete" doesn't seem to be explicitly defined; generally as soon as the new bucket is linked into the list of buckets, I guess. CHM also makes heavy use of volatile fields to ensure that threads read the most recent buckets in the list.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.