I have one doubt. What will happen if I get from map at same time when I am putting to map some data?
What I mean is if map.get() and map.put() are called by two separate processes at the same time. Will get() wait until put() has been executed?
It depends on which Map implementation you are using.
For example, ConcurrentHashMap supports full concurrency, and get() will not wait for put() to get executed, and stated in the Javadoc :
* <p> Retrieval operations (including <tt>get</tt>) generally do not
* block, so may overlap with update operations (including
* <tt>put</tt> and <tt>remove</tt>). Retrievals reflect the results
* of the most recently <em>completed</em> update operations holding
* upon their onset.
Other implementations (such as HashMap) don't support concurrency and shouldn't be used by multiple threads at the same time.
It might throw ConcurrentModificationException- not sure about it. It is always better to use synchronizedMap.This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method.This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map map = Collections.synchronizedMap(new HashMap(...));
Map is an interface, so the answer depends on the implementation you're using.
Generally speaking, the simpler implementations of this interface, such as HashMap and TreeMap are not thread safe. If you don't have some synchronization built around them, concurrently puting and geting will result in an undefined behavior - you may get the new value, you may get the old one, bust most probably you'd just get a ConcurrentModificationException, or something worse.
If you want to handle the same Map from different threads, either use one of the implementations of a ConcurrentMap (e.g., a ConcurrentHashMap), which guarantees a happens-before-sequence (i.e., if the get was fired before the put, you'd get the old value, even if the put is ongoing, and vise-versa), or synchronize the Map's access (e.g., by calling Collections#synchronizedMap(Map).
Related
Task is to keep track of some running processes. Keeping that information in memory is just fine, so I'm using a concurrent hash map to store that data:
ConcurrentHashMap<String, ProcessMetaData> RUNNING_PROCESSES = new ConcurrentHashMap();
It's all good and fine with safely putting new objects in the map, problem is that state of those processes change so I have to update ProcessMetaData from time to time. I made ProcessMetaData immutable and use ConcurrentHashMap's compute() method to update values, but now the problem is ProcessMetaData gets more complicated and keeping it immutable gets hardly manageable. The question is - as long as I only update ProcessMetaData in atomic (as per javadoc) compute() method - the object may be mutable and overall things will still be thread-safe? Is my assumption correct?
As long as you only access the value within the function passed to compute, modifications made in that function are safe.
This, however, is a pointless theoretical view. The purpose of storing values into a collection or map, is to eventually retrieve and use them. And this is where the problems start.
The compute method returns the result value just like get returns the currently stored value. Once a caller starts using that value, this use may be concurrent to subsequent compute operations on the map. The get method may even retrieve the value while a compute operation is in progress. Allowing non-blocking retrieval operation is one of ConcurrentHashMap’s main features. Therefore, all kind of race conditions may occur.
So, using a mutable object and modifying an already stored value in compute is only safe, when you use the map as write-only memory, which is a far-fetched scenario. It might work when you use a different thread safe mechanism to ensure that all updates have been completed before starting to read the map, but your use case seems to be different.
Can we use Hashmap's containsKey() method without synchronizing in an multi-threaded environment?
Note: Threads are only going to read the Hashmap. The map is initialized once, and is never modified again.
It really depends on how/when your map is accessed.
Assuming the map is initialized once, and never modified again, then methods that don't modify the internal state like containsKey() should be safe.
In this case though, you should make sure your map really is immutable, and is published safely.
Now if in your particular case the state does change during the course of your program, then no, it is not safe.
From the documentation:
Note that this implementation is not synchronized.
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
In this case, you should use ConcurrentHashMap, or synchronize externally.
You shouldn't look at a single method this way. A HashMap is not meant to be used in a multi-threaded setup.
Having said that, the one exception would be: a map that gets created once (single threaded), and afterwards is "read" only. In other words: if a map doesn't get changed anymore, then you can have as many threads reading it as you want.
From that point of view, just containsKey() calls shouldn't call a problem. The problem arises when the information that this method relies on changes over time.
No, it is not thread-safe for any operations. You need to synchronise all access or use something like ConcurrentHashMap.
My favourite production system troubleshooting horror story is when we found that HashMap.get went into an infinite loop locking up 100% CPU forever because of missing synchronisation. This happened because the linked lists that were used within each bucket got into an inconsistent state. The same could happen with containsKey.
You should be safe if no one modifies the HashMap after it has been initially published, but better use an implementation that guarantees this explicitly (such as ImmutableMap or, again, a ConcurrentMap).
No. (No it is not. Not at all. 30 characters?)
It's complicated, but, mostly, no.
The spec of HashMap makes no guarantees whatsoever. It therefore reserves the right to blast yankee doodle dandy from your speakers if you try: You're just not supposed to use it that way.
... however, in practice, whilst the API of HashMap makes no guarantees, generally it works out. But, mind the horror story of #Thilo's answer.
... buuut, the Java Memory Model works like this: You should consider that each thread gets an individual copy of each and every field across the entire heap of the VM. These individual copies are then synced up at indeterminate times. That means that all sorts of code simply isn't going to work right; you add an entry to the map from one thread, and if you then access that map from another you won't see it even though a lot of time has passed – that's theoretically possible. Also, internally, map uses multiple fields and presumably these fields must be consistent with each other or you'll get weird behaviours (exceptions and wrong results). The JMM makes no guarantees about consistency either. The way out of this dilemma is that the JMM offers these things called 'comes-before/comes-after' relationships which give you guarantees that changes have been synced up. Using the 'synchronized' keyword is one easy way to get such relationships going.
Why not use a ConcurrentHashMap which has all the bells and whistles built in and does in fact guarantee that adding an entry from thread A and then querying it via containsKey from thread B will get you a consistent answer (which might still be 'no, that key is not in the map', because perhaps thread B got there slightly before thread A or slightly after but there's no way for you to know. It won't throw any exceptions or do something really bizarre such as returning 'false' for things you added ages ago all of a sudden).
So, whilst it's complicated, the answer is basically: Don't do that; either use a synchronized guard, or, probably the better choice: ConcurrentHashMap.
No, Read the bold part of HashMap documentation:
Note that this implementation is not synchronized.
So you should handle it:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
And suggested solutions:
This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method
#user7294900 is right.
If your application does not modifies the HashMap structurally which is build thread-safely and your application just invoke containsKey method, it's thread safe.
For instance, I've used HashMap like this:
#Component
public class SpringSingletonBean {
private Map<String, String> map = new HashMap<>();
public void doSomething() {
//
if (map.containsKey("aaaa")) {
//do something
}
}
#PostConstruct
public void init() {
// do something to initialize the map
}
}
It works well.
Am using JDK 7, SQLite, and have Guava in my project.
I have a TreeMap with less than 100 entries that is being updated by a single "worker" thread hundreds of times a second. I am now writing a component (another thread - the "DB thread") that will write the map to my database every 5 or 10 seconds.
I know that I need to make a deep copy of the map so the DB thread will use a snapshot, while the worker thread continues its job. I am looking at the Guava Maps class which has many methods that make copies, but I am not sure if any of them meet my needs to synchronize on the map whenever a copy is needed. Is there a method there that will meet my needs, or should I write a synchronized block to make my own deep copy?
It depends on what you want:
If you want a fully concurrent map (cant read while adding and so on) You should use what JSlain said before me.
If all you want is the CURRENT snapshot of the map and you do not care if the map will be modified as long as the iterator you are using wont be changed.
Then use ConcurrentSkipListMap
This will provide each iteration with a new independent iterator so even if the real map is changed you wont notice it.
You will see it in the next update (5 seconds in your case.)
From TreeMap javadoc:
Note that this implementation is not synchronized. If multiple threads
access a map
concurrently, and at least one of the threads modifies the map structurally, it must be
synchronized externally. (A structural modification is any operation that adds or deletes one
or more mappings; merely changing the value associated with an existing key is not a
structural modification.) This is typically accomplished by synchronizing on some object that
naturally encapsulates the map. If no such object exists, the map should be "wrapped" using
the Collections.synchronizedSortedMap method. This is best done at creation time, to prevent
accidental unsynchronized access to the map:
SortedMap m = Collections.synchronizedSortedMap(new TreeMap(...));
The ConcurrentHashMap provides thread-safe but the docs state:
" However, even though all operations are thread-safe, retrieval operations do not entail locking"
So from this I understand that getting or setting a key and value are thread-safe, but modifying the actual VALUE of any given key isn't (by value I actaully mean the value or state of that object).
I'm just confused on how this works, at the moment I think things work like this.
The ConcurrentHashMap only gaurantees the key's are thread-safe in terms setting/getting them. But the object you put inside the map has to gaurd for concurrency by itself.
Is this correct?
But the object you put inside the map has to gaurd for concurrency by itself.
Your understanding is correct.
From the documentation:
However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access.
What the above is also saying is that there is no built-in mechanism for automatic locking of the hash map while the reading takes place. In particular, this means that get() operations can overlap with concurrent modifications performed by other threads.
The document goes on to explain the concurrency semantics:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
What you say is true by default -- there would be no way for the map to enforce the thread safety of either its keys or its values since these are objects that come from outside. What you read about the retrieval of objects, however, has nothing to do with that fact. The map doesn't block while retrieving a value so another update may be happening at the same time (these operation can overlap).
The basic idea of the ConcurrentHashMap is that only modifications use a lock, while retrieval-only operations don't. This is possible because the entire data structure and the operations on it are defined in a way that allows get() to only ever see a "consistent enough" state of the map to do its work. If there's currently an insert operation in progress, then get() either sees the result or it doesn't, but it won't ever see a partial result or even temporarily invalid data.
Normally (ie. not concurrently), putAll() cannot be more efficient than using lot's of calls to put(), even assuming that you exclude the cost of building the other Map that you pass to putAll(). That's because putAll() will need to iterate the passed Map's elements, as well as running through the algorithm for adding each key value pair to the Map that put() executes.
But for a ConcurrentHashMap, does it make sense to construct a regular Map and then use putAll() to update it? Or should I just do 10 (or 100, or 1000) calls to put()?
Does the answer change for multiple calls to putIfAbsent()?
Thanks!
The first (mostly) threadsafe Map in Java Collections was the synchronized HashMap using Collections.synchronizedMap(). It worked by only allowing one operation at a time. Java 5 added the ConcurrentHashMap, which works differently. Basically the Map is divided into slices. A put() operation will only lock the relevant slice. It also added the threadsafe primitives like putIfAbsent().
The reason I explain this is that putAll() might be more or less efficient depending on how it's implemented. It may work by locking the whole map, which may actually be more efficient than trying to obtain individual locks on every put(). Or it may work by doing a bunch of put() calls anyway, in which case there's not much difference.
So if it makes sense for your code and you're doing a lot of updates at once, I would use putAll() unless it's putIfAbsent() that you're after.
Edit: I just checked, the Java 6 ConcurrentHashMap implements putAll() as a loop of put() operations so it's no better or worse than doing this yourself.
putAll() just delegates to put(). There's no need to build up an intermediate map if you don't have it already. You can see this in the source code, and it doesn't matter what Java implementation you're using, since the code is public domain and shared by all of them.
Note that putAll() is not atomic, but merely has the guarantee that each individual put() is atomic.