Synchronize HashMap in multithreaded environment only for put and remove - java

I have two thread that shares a common HashMap, one thread will always insert an objects to the Map and the second thread will remove objects from the HashMap.
my question is if this is the only logic of the two thread should I "protect" the Map with synchronize or ConcurrentHashMap, could I have a race condition?
if yes can you please explain what is the risk of not protecting the Map.
thanks

could I have a race condition
Yes, without synchronization or ConcurrentHashMap.
It is clearly stated in the Javadoc of HashMap:
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
You have two threads modifying the map structurally, so you need to synchronize if you use HashMap.
please explain what is the risk of not protecting the Map
Undefined behavior, ranging from doing completely the wrong thing (the very best kind of undefined behavior, because you know it needs fixing), down to seeming to work, until you change your JVM version and it mysteriously stops working (the very worst kind of undefined behavior).

Related

Is it safe to replace all the occurrences of Hashtable with ConcurrentHashmap?

Our legacy multi-threaded application has a lots of usage of Hashtable. Is it safe to replace the Hashtable instances with ConcurrentHashmap instances for performance gain? Will there be any side effect?
Is it safe to replace the Hashtable instances with ConcurrentHashmap instances for performance gain?
In most cases it should be safe and yield better performance. The effort on changing depends on whether you used the Map interface or Hashtable directly.
Will there be any side effect?
There might be side effects if your application expects to immediately be able to access elements that were put into the map by another thread.
From the JavaDoc on ConcurrentHashMap:
Retrieval operations (including get) generally do not block, so may overlap
with update operations (including put and remove). Retrievals reflect the
results of the most recently completed update operations holding upon their onset.
Edit: to clarify on "immediately" consider thread 1 adds element A to the map and while that write is executed thread 2 tries to whether A exists in the map. With Hashtable thread 2 would be blocked until after the write so the check would return true but when using ConcurrentHashMap it would return false since thread 2 would not be blocked and the write operation is not yet completed (thus thread 2 would see an outdated version of the bucket).
Depending on the size of your Hashtable objects you might get some performance gains by switching to ConcurrentHashmap.
ConcurrentHashmap is broken into segments which allow for the table to be only partially locked. This means that you can get more accesses per second than a Hashtable, which requires that you lock the entire table.
The tables themselves are both thread safe and both implement the Map interface, so replacement should be relatively easy.

ConcurrentHashMap operations are thread safe

Java Docs for the ConcurrentHashMap says,
even though all operations are thread-safe
What is the meaning when we say all operations of ConcurrentHashMap are thread safe?
EDIT:
what i mean to ask is that suppose there is put() operation. then according to above statement put() in CHM is thread safe. What does this mean?
From Wikipedia:
A piece of code is thread-safe if it only manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time.
To answer your expanded question, if multiple threads were to execute put() the effect would be that the last one to run would set the value for that key in the map. All of the puts would happen in some sequence, but they would not interfere with each other. How might they interfere without a concurrency guarantee? Well, put() returns null if no value had previously been associated with the mapping or the previous value. If two puts happened on a non-concurrent map they can both get the same return value from the put.
This sequence is possible without concurrency:
Thread1: map.put("key1", "value1") => null
then
Thread2: map.put("key2", "value2") => "value1"
Thread3: map.put("key3", "value3") => "value1"
If Thread3 got in just after Thread2, it might see "value1" rather than "value2", even though that's not what it replaces. This won't happen in a concurrent map.
What thread safety means is that you are permitted to share a ConcurrentHashMap object across multiple threads, and to access/modify that object concurrently without external locking.
Thread-safety means that an object can be used simultaneously by multiple threads while still operating correctly. In the specific case of ConcurrentHashMap, these characteristics are guaranteed:
Iterators produced by the map never throw ConcurrentModificationException, and they'll iterate in an order that's fixed when they're created. They may or may not reflect any modifications made while the map is being accessed. Ordinary HashMap iterators will throw exceptions if modified while a thread is iterating over them.
Insertion and removal operations are thread-safe. Ordinary HashMaps might get into an inconsistent internal state if multiple threads tried to insert or remove items simultaneously, especially if modifications required a rehash.
that if two threads will concurrently try to do operations on the ConcurrentHashMap you are guaranteed that the operations will not leave the data structure in an inconsistent state.
That's not something other non concurrent data structure guarantee.
It means that all the operations you do to add/delete objects into your hash map is thread safe, but retrieving is not thread safe. Means that when you added a object in a perfect thread safe environment, after that moment that object should be visible to all the thread who are retrieving object from this MAP. But this thing is not guaranteed here.

advantages of java's ConcurrentHashMap for a get-only map?

Consider these two situations:
a map which you are going to populate once at the beginning and then will be accessed from many different threads.
a map which you are going to use as cache that will be accessed from many different threads. you would like to avoid computing the result that will be stored in the map unless it is missing, the get-computation-store block will be synchronized. (and the map will not otherwise be used)
In either of these cases, does ConcurrentHashMap offer you anything additional in terms of thread safety above an ordinary HashMap?
In the first case, it should not matter in practice, but there is no guarantee that modifications written to a regular hashmap will ever be seen by other threads. So if one thread initially creates and populates the map, and that thread never synchronized with your other threads, then those threads may never see the initial values set into the map.
The above situation is unlikely in practice, and would only take a single synchronization event or happens before guarantee between the threads (read / write to a volatile variable for instance) to ensure even theoretical correctness.
In the second case, there is a concern since access to a HashMap that modifies it structurally (adding a value) requires synchronization. Furthermore, you need some type of synchronization to establish a happens-before relationship / shared visibility with the other threads or there is no guarantee that the other threads will see the new values you put in. ConcurrentHashMap offers these guarantees and will not break when one thread modifies it structurally.
There is no difference in thread safety, no. For scenario #2 there is a difference in performance and a small difference in timing guarantees.
There will be no synchronization for your scenario #2, so threads that want to use the cache don't have to queue up and wait for others to finish. However, in order to get that benefit you don't have hard happens-before relationships at the synchronization boundaries, so it's possible two threads will compute the same cached value more or less at the same time. This is generally harmless as long as the computation is repeatable.
(There is also the slight difference that ConcurrentHashMap does not allow null to be used as a key.)

Can concurrntHashMap guarantee true thread safety and concurrency at the same time?

We know that ConcurrentHashMap can provide concurrent access to multiple threads to boost performance , and inside this class, segments are synchronized up (am I right?). Question is, can this design guarantee the thread safety? Say we have 30+ threads accessing &changing an object mapped by the same key in a ConcurrentHashMap instance, my guess is, they still have to line up for that, don't they?
From my recollection that the book "Java Concurrency in Practice" says the ConcurrentHashMap provide concurrent reading and a decent level of concurrent writing. in the aforementioned scenario, and if my guess is correct, the performance won't be better than using the Collection's static synchonization wrapper api?
Thanks for clarifying,
John
You will still have to synchronize any access to the object being modified, and as you suspect all access to the same key will still have contention. The performance improvement comes in access to different keys, which is of course the more typical case.
All a ConcurrentMap can give you wrt to concurrency is that modifications to the map itself are done atomically, and that any writes happen-before any reads (this is important as it provides safe publishing of any reference from the map.
Safe-publishing means that any (mutable) object retrieved from the map will be seen with all writes to it before it was placed in the map. It won't help for publishing modifications that are made after retrieving it though.
However, concurrency and thread-safety is generally hard to reason about and make correct if you have mutable objects that are being modified by multiple parties. Usually you have to lock in order to get it right. A better approach is often to use immutable objects in conjunction with the ConcurrentMap conditional putIfAbsent/replace methods and linearize your algorithm that way. This lock-free style tends to be easier to reason about.
Question is, can this design guarantee the thread safety?
It guarantees the thread safety of the map; i.e. that access and updates on the map have a well defined and orderly behaviour in the presence of multiple threads performing updates simultaneously.
It does guarantee thread safety of the key or value objects. And it does not provide any form of higher level synchronization.
Say we have 30+ threads accessing &changing an object mapped by the same key in a ConcurrentHashMap instance, my guess is, they still have to line up for that, don't they?
If you have multiple threads trying to use the same key, then their operations will inevitably be serialized to some degree. That is unavoidable.
In fact, from briefly looking at the source code, it looks like ConcurrentHashMap falls back to using conventional locks if there is too much contention for a particular segment of the map. And if you have multiple threads trying to access AND update the same key simultaneously, that will trigger locking.
first remember that a thread safe tool doesn't guarantee thread safe usage of it in and of itself
the if(!map.contains(k))map.put(k,v); construct to putIfAbsent for example is not thread safe
and each value access/modification still has to be made thread safe independently
Reads are concurrent, even for the same key, so performance will be better for typical applications.

ConcurrentHashMap with ArrayList as value

I need to use a HashMap of the form <String, ArrayList<String>> that is going to be
accessed by several different threads. From what I've managed to understand, ConcurrentHashMap is the preferred method. But will there be any problem with the fact that the value of the map is an ArrayList? Do I have to define the value as a synchronized ArrayList or something like that?
yes, there can be a problem. The ConcurrentHashMap will be thread safe for accesses into the Map, but the Lists served out need to be thread-safe, if multiple threads can operate on the same List instances concurrently.
So use a thread-safe list if that is true.
Edit -- now that i think about it, the rabbit-hole goes further. You have your Map, you have your List, and you have the objects in the list. Anything multiple threads can modify should be thread safe. So if many threads can modify the Map, Lists, and Objects in the Lists, then all of those should have thread-safety guards. If only the Map and List instances can be modified concurrently, only they need thread safety. If multiple threads can read everything, but not modify, then you don't need any thread safety (I think, someone will correct me if this is wrong)
ConcurrentHashMap guarantees atomicity in its mutating methods e.g. putIfAbsent, computeIfAbsent, computeIfPresent so if all the modification is done through these methods no problems will arise.
But at the same time, multiple threads can read the same map entry (concurrent reads are allowed). therefore multiple threads can access and modify an unsafe collection (map value)

Categories

Resources