Hashmap and hashtable in multithreaded environment - java

I am really confused on how these 2 collections behave in multithreaded environment.
Hash table is synchronized that means no 2 threads will be updating its value simultaneously right?

Look at ConcurrentHashMaps for Thread safe Maps.
They offer all the features of HashTable with a performance very close to a HashMap.
Performance is gained by instead of using a map wide lock, the collection maintains a list of 16 locks by default, each of which is used to lock a single bucket of the map. You can even configure the number of buckets :) Tweaking this can help performance depending on your data.
I can't recommend enough Java Concurrency in Practice by Brian Goetz
http://jcip.net/
I still learn something new every time I read it.

Exactly, HashTable is synchronized that mean that it's safe to use it in multi-thread environment (many thread access the same HashTable) If two thread try to update the hashtable at the sametime, one of them will have to wait that the other thread finish his update.
HashMap is not synchronized, so it's faster, but you can have problem in a multi-thread environment.

Also note that Hashtable and Collections.synchronizedMap are safe only for individual operations. Any operations involving multiple keys or check-then-act that need to be atomic will not be so and additional client side locking will be required.
For example, you cannot write any of the following methods without additional locking:
swap the values at two different keys: swapValues(Map, Object k1, Object k2)
append the parameter to value at a key: appendToValue(Map, Object k1, String suffix)
And yes, all of this is covered in JCIP :-)

Yes, all the methods are done atomically, but values() method not (see docs).
Paul was faster than me recommending you the java.util.concurrent package, which gives you a very fine control and data structures for multithreade environments.

Hashtables are synchronized but they're an old implementation that you could almost say is deprecated. Also, they don't allow null keys (maybe not null values either? not sure).
One problem is that although every method call is synchronized, most interesting actions require more than one call so you have to synchronize around the several calls.
A similar level of synchronization can be obtained for HashMaps by calling:
Map m = Collections.synchronizedMap(new HashMap());
which wraps a map in synchronized method calls. But this has the same concurrency drawbacks as Hashtable.
As Paul says, ConcurrentHashMaps provide thread safe maps with additional useful methods for atomic updates.

Related

Is it safe to replace all the occurrences of Hashtable with ConcurrentHashmap?

Our legacy multi-threaded application has a lots of usage of Hashtable. Is it safe to replace the Hashtable instances with ConcurrentHashmap instances for performance gain? Will there be any side effect?
Is it safe to replace the Hashtable instances with ConcurrentHashmap instances for performance gain?
In most cases it should be safe and yield better performance. The effort on changing depends on whether you used the Map interface or Hashtable directly.
Will there be any side effect?
There might be side effects if your application expects to immediately be able to access elements that were put into the map by another thread.
From the JavaDoc on ConcurrentHashMap:
Retrieval operations (including get) generally do not block, so may overlap
with update operations (including put and remove). Retrievals reflect the
results of the most recently completed update operations holding upon their onset.
Edit: to clarify on "immediately" consider thread 1 adds element A to the map and while that write is executed thread 2 tries to whether A exists in the map. With Hashtable thread 2 would be blocked until after the write so the check would return true but when using ConcurrentHashMap it would return false since thread 2 would not be blocked and the write operation is not yet completed (thus thread 2 would see an outdated version of the bucket).
Depending on the size of your Hashtable objects you might get some performance gains by switching to ConcurrentHashmap.
ConcurrentHashmap is broken into segments which allow for the table to be only partially locked. This means that you can get more accesses per second than a Hashtable, which requires that you lock the entire table.
The tables themselves are both thread safe and both implement the Map interface, so replacement should be relatively easy.

Update two ConcurrentHashMaps atomically

Map<String,Integer> m1 = new ConcurrentHashMap<>();
Map<String,Integer> m2 = new ConcurrentHashMap<>();
public void update(int i) {
m1.put("key", i);
m2.put("key", i);
}
In the dummy code above, the updates to m1 & m2 are not atomic. If I try to synchronize this block, findBugs complains "Synchronization performed on util.concurrent instance"
Is there a recommended way to do this other than not using concurrent collections and doing all synchronization explicitly.
As an aside I also don't know the exact implications of wrapping concurrent collections in explicit synchronization.
If the intention is that for a key both maps always retrieve the same value, the design of using two different concurrent maps is flawed. Even if you synchronize the write, reading threads can still access the two maps while youre writing. Thats probably what the FindBugs rule tries to catch.
There are two ways to go about this, either use explicit synchronization (that is two regular maps with synchronized reads and writes), or use just one concurrent map and put in a value object hold both ints.
A ConcurrentMap is not a synchronized map. The latter does indeed synchronizes on itself, so synchronizing on the Map will have an effect regarding other Map operations.
In contrast, a ConcurrentMap is designed to allow concurrent updates of different keys and does not synchronizes on itself, so if other code synchronizes on the Map instance, it has no effect. It wouldn’t help to use a different mutex; as long as other thread perform operations on the same maps without using this mutex, they won’t respect you intended atomicity. On the other hand, if all threads would use the mutex for all accesses to the maps, you wouldn’t need a ConcurrentMap anymore.
But as Jarrod Roberson noted, this is likely an X-Y Problem, as updating two maps atomically without having an atomic read of both maps, no thread would notice whether the update happened atomically or not. The question is what you actually want to achieve, either, you don’t really need the atomicity or you shouldn’t use ConcurrentMaps.

Is ConcurrentSkipListMap put method thread-safe?

Recently while exploring ConcurrentSkipListMap I went through its implementation and found that its put method is not thread-safe. It internally calls doPut which actually adds the item. But I found that this method does not use any kind of lock similar to ConcurrentHashMap.
Therefore, I want to know whether add is thread-safe or not. Looking at the method it seems that it is not thread-safe--that is if this method is executed by two threads simultaneously then a problem may occur.
I know ConcurrentSkipListMap internally uses a skiplist data structure but I was expecting add method to be thread safe. Am I understanding anything wrong ? Is ConcurrentSkipListMap really not thread-safe ?
Just because it doesn't use a Lock doesn't make it thread unsafe. The Skip list structure can be implemented lock free.
You should read the API carefully.
... Insertion, removal, update, and access operations safely execute concurrently by multiple threads. Iterators are weakly consistent, returning elements reflecting the state of the map at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations. ...
The comments in the implementation say:
Given the use of tree-like index nodes, you might wonder why this
doesn't use some kind of search tree instead, which would support
somewhat faster search operations. The reason is that there are no
known efficient lock-free insertion and deletion algorithms for search
trees. The immutability of the "down" links of index nodes (as opposed
to mutable "left" fields in true trees) makes this tractable using
only CAS operations.
So they use some low level programming features with compare-and-swap operations to make changes to the map atomic. With this they ensure thread safety without the need to synchronize access.
You can read it in more detail in the source code.
We should trust Java API. And this is what java.util.concurrent package docs says:
Concurrent Collections
Besides Queues, this package supplies Collection implementations designed for use in multithreaded contexts: ConcurrentHashMap, ConcurrentSkipListMap, ConcurrentSkipListSet, CopyOnWriteArrayList, and CopyOnWriteArraySet.

Does a Collection which is Thread Safe have to be Synchronized?

I wanted to use Collection for only single threaded environment and I am using a HashMap that is synchronized.
However, I still doubt if it is thread safe to have it synchronized or not.
If you're only using a single thread, you don't need a thread-safe collection - HashMap should be fine.
You should be very careful to work out your requirements:
If you're really using a single thread, stick with HashMap (or consider LinkedHashMap)
If you're sharing the map, you need to work out what kind of safety you want:
If the map is fully populated before it's used by multiple threads, which just read, then
HashMap is still fine.
Collections.synchronizedMap will only synchronize each individual operation; it still isn't
safe to iterate in one thread and modify the map in another thread without synchronization.
ConcurrentHashMap is a more "thoroughly" thread-safe approach, and one I'd generally prefer
over synchronizedMap. It allows for modification during iteration, but doesn't guarantee
where such modifications will be seen while iterating. Also note that while HashMap allows null
keys and values, ConcurrentHashMap doesn't.
For your needs, use ConcurrentHashMap. It allows concurrent modification of the Map from several threads without the need to block them. Collections.synchronizedMap(map) creates a blocking Map which will degrade performance, albeit ensure consistency
the standard java HashMap is not synchronized.
If you are in a single threaded environment you don't need to worry about synchronization.
The commonly used Collection classes, such as java.util.ArrayList, are not synchronized. However, if there's a chance that two threads could be altering a collection concurrently, you can generate a synchronized collection from it using the synchronizedCollection() method. Similar to the read-only methods, the thread-safe APIs let you generate a synchronized collection, list, set, or map. For instance, this is how you can generate a synchronized map from a HashMap:
Map map = Collections.synchronizedMap(new HashMap());
map.put(...
As its a single-threaded environment you can safely use HashMap.

Can concurrntHashMap guarantee true thread safety and concurrency at the same time?

We know that ConcurrentHashMap can provide concurrent access to multiple threads to boost performance , and inside this class, segments are synchronized up (am I right?). Question is, can this design guarantee the thread safety? Say we have 30+ threads accessing &changing an object mapped by the same key in a ConcurrentHashMap instance, my guess is, they still have to line up for that, don't they?
From my recollection that the book "Java Concurrency in Practice" says the ConcurrentHashMap provide concurrent reading and a decent level of concurrent writing. in the aforementioned scenario, and if my guess is correct, the performance won't be better than using the Collection's static synchonization wrapper api?
Thanks for clarifying,
John
You will still have to synchronize any access to the object being modified, and as you suspect all access to the same key will still have contention. The performance improvement comes in access to different keys, which is of course the more typical case.
All a ConcurrentMap can give you wrt to concurrency is that modifications to the map itself are done atomically, and that any writes happen-before any reads (this is important as it provides safe publishing of any reference from the map.
Safe-publishing means that any (mutable) object retrieved from the map will be seen with all writes to it before it was placed in the map. It won't help for publishing modifications that are made after retrieving it though.
However, concurrency and thread-safety is generally hard to reason about and make correct if you have mutable objects that are being modified by multiple parties. Usually you have to lock in order to get it right. A better approach is often to use immutable objects in conjunction with the ConcurrentMap conditional putIfAbsent/replace methods and linearize your algorithm that way. This lock-free style tends to be easier to reason about.
Question is, can this design guarantee the thread safety?
It guarantees the thread safety of the map; i.e. that access and updates on the map have a well defined and orderly behaviour in the presence of multiple threads performing updates simultaneously.
It does guarantee thread safety of the key or value objects. And it does not provide any form of higher level synchronization.
Say we have 30+ threads accessing &changing an object mapped by the same key in a ConcurrentHashMap instance, my guess is, they still have to line up for that, don't they?
If you have multiple threads trying to use the same key, then their operations will inevitably be serialized to some degree. That is unavoidable.
In fact, from briefly looking at the source code, it looks like ConcurrentHashMap falls back to using conventional locks if there is too much contention for a particular segment of the map. And if you have multiple threads trying to access AND update the same key simultaneously, that will trigger locking.
first remember that a thread safe tool doesn't guarantee thread safe usage of it in and of itself
the if(!map.contains(k))map.put(k,v); construct to putIfAbsent for example is not thread safe
and each value access/modification still has to be made thread safe independently
Reads are concurrent, even for the same key, so performance will be better for typical applications.

Categories

Resources