Why hashmap is not thread safe？ [duplicate]

Why hashmap is not thread safe？ [duplicate] - java

This question already has answers here:
How to prove that HashMap in java is not thread-safe
(12 answers)
Closed 4 years ago.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.*;
public class TestLock {
private static ExecutorService executor = Executors.newCachedThreadPool();
private static Map<Integer, Integer> map = new HashMap<>(1000000);
private static CountDownLatch doneSignal = new CountDownLatch(1000);
public static void main(String[] args) throws Exception {
for (int i = 0; i < 1000; i++) {
final int j = i;
executor.execute(new Runnable() {
#Override
public void run() {
map.put(j, j);
doneSignal.countDown();
}
});
}
doneSignal.await();
System.out.println("done,size:" + map.size());
}
}
Some people say that hashmap insertion is not safe when concurrency. Because the hashmap will perform capacity expansion operations, but I set the size here to 1000000, which will only expand at 750,000. I am doing 1000 inserts here, so I won't expand it. So there should be no problem. But the result is always less than 1000, what went wrong?

Why hashmap is not thread safe？
Because the javadocs say so. See below.
You stated:
Some people say that hashmap insertion is not safe when concurrency.
It is not just "some people"1. The javadocs state this clearly:
"Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification."
You asked:
I am doing 1000 inserts here, so I won't expand it. So there should be no problem. But the result is always less than 1000, what went wrong?
It is not just expansion of the hash array that you need to think about. It is not just insertion. Any operation that performs a structural modification on the HashMap needs to be synchronized ... or you may get unspecified behavior2.
And that is what you got.
1 - I strongly recommend that you do not rely on your intuition or on what "some people" say. Instead, take the time to read and understand the relevant specifications; i.e. the javadocs and the Java Language Specification.
2 - In this example, you can easily see why you get unspecified behavior by reading the HashMap source code. For instance in the OpenJDK Java 11 source code, size() is not synchronized, and it returns the value of the private transient int size field. This is not thread-safe. When other threads add or remove map entries they will update size, the thread calling size() is liable to get a stale value.

"Because the hashmap will perform capacity expansion operations" is not only reason why HashMap is not thread safe.
You have to refer to Java Memory Model to understand what guarantee it can offer.
One of such guarantee is visibility. This mean that changes made in one thread may not be visible in other threads unless specific conditions are meet.

Well the Question title is not really describing what you are asking for. Anyways,
Here you have set the Capacity to 1000000. Not the size.
Capacity : Initially how many slots to have in this hashmap. Basically
Empty Slots.
Size : Number of Elements filled in the Map.
So even though you set the capacity to 1000000, you don't have that many elements at the end. So the number of elements filled in the map will be returned through .size() method. Its nothing related to a concurrent issue. And yes HashMap is not thread safe due to several reasons.

If you see the implementation of 'put' in HashMap class here, no where 'synchronize' is used although it does many thread-unsafe operations like creation of TreeNode if key's hash is not found, increment the modCount etc.
ConcurrentHashMap would be suitable for your use case

If you need a thread-safe HashMap, you may use the Hashtable class instead.
Unlike the new collection implementations, Hashtable is synchronized. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable, says the javadoc about Hashtable.
Same, if you need a thread-safe ArrayList one day, use Vector.
EDIT : Oh, I suggested the wrong way to do. My appologies !
comments suggest better solutions than mine :
Collections.synchronizedXxx() or ConcurrentHashMap, that worked for the opener of this question.

Related

Java Array elements & Memory visibility question

I have read some questions & answers on visibility of Java array elements from multiple threads, but I still can't really wrap my head around some cases. To demonstrate what I'm having trouble with, I have come up with a simple scenario: Assume that I have a simple collection that adds elements into one of its n buckets by hashing them into one (Bucket is like a list of some sort). And each bucket is separately synchronized. E.g. :
private final Object[] locks = new Object[10];
private final Bucket[] buckets = new Bucket[10];
Here a bucket i is supposed to be guarded by lock[i]. Here is how add elements code looks like:
public void add(Object element) {
int bucketNum = calculateBucket(element); //hashes element into a bucket
synchronized (locks[bucketNum]) {
buckets[bucketNum].add(element);
}
}
Since 'buckets' is final, this would not have any visibility problem even without synchronization. My guess is, with synchronization, this wouldn't have any visibility problems without the final either, is this correct?
And finally, a bit trickier part. Assume I want to copy out and merge the contents of all buckets and empty the whole data structure, from an arbitrary thread, like this:
public List<Bucket> clear() {
List<Bucket> allBuckets = new List<>();
for(int bucketNum = 0; bucketNum < buckets.length; bucketNum++) {
synchronized (locks[bucketNum]) {
allBuckets.add(buckets[bucketNum]);
buckets[bucketNum] = new Bucket();
}
}
return allBuckets;
}
I basically swap the old bucket with a newly created one and return the old one. This case is different from the add() one because we are not modifying the object referred by the reference in the array but we are directly changing the array/reference.
Note that I do not care if bucket 2 is modified while I'm holding the lock for bucket 1, I don't need the structure to be fully synchronized and consistent, just visibility and near consistency is enough.
So assuming every bucket[i] is only ever modified under lock[i], would you say that this code works? I hope to be able to learn why and why nots and have a better grasp of visibility, thanks.

First question.
Thread safety in this case depends on whether the reference to the object containing locks and buckets (let's call it Container) is properly shared.
Just imagine: one thread is busy instantiating a new Container object (allocating memory, instantiating arrays, etc.), while another thread starts using this half-instantiated object where locks and buckets are still null (they have not been instantiated by the first thread yet). In this case this code:
synchronized (locks[bucketNum]) {
becomes broken and throws NullPointerException. The final keyword prevents this and guarantees that by the time the reference to Container is not null, its final fields have already been initialized:
An object is considered to be completely initialized when its
constructor finishes. A thread that can only see a reference to an
object after that object has been completely initialized is guaranteed
to see the correctly initialized values for that object's final
fields. (JLS 17.5)
Second question.
Assuming that locks and buckets fields are final and you don't care about consistency of the whole array and "every bucket[i] is only ever modified under lock[i]", this code is fine.

Just to add to Pavel's answer:
In your first question you ask
Since 'buckets' is final, this would not have any visibility problem even without synchronization. My guess is, with synchronization, this wouldn't have any visibility problems without the final either, is this correct?
I'm not sure what you mean by 'visibility problems', but for sure, without the synchronized this code would be incorrect, if multiple threads would access buckets[i] with one of them modifying it (e.g. writing to it). There would be no guarantee that what one thread have written, becomes visible to another one. This also involves internal structures of the bucket which might be modified by the call to add.
Remember that final on buckets pertain only to the single reference to the array itself, not to its cells.

Java visibility: final static non-threadsafe collection changes after construction

I found the following code snippet in luaj and I started to doubt that if there is a possibility that changes made to the Map after it has been constructed might not be visible to other threads since there is no synchronization in place.
I know that since the Map is declared final, its initialized values after construction is visible to other threads, but what about changes that happen after that.
Some might also realize that this class is so not thread-safe that calling coerce in a multi-threaded environment might even cause infinite loop in the HashMap, but my question is not about that.
public class CoerceJavaToLua {
static final Map COERCIONS = new HashMap(); // this map is visible to all threads after construction, since its final
public static LuaValue coerce(Object paramObject) {
...;
if (localCoercion == null) {
localCoercion = ...;
COERCIONS.put(localClass, localCoercion); // visible?
}
return ...;
}
...
}

You're correct that changes to the Map may not be visible to other threads. Every method that accesses COERCIONS (both reading and writing) should be synchronized on the same object. Alternatively, if you never need sequences of accesses to be atomic, you could use a synchronized collection.
(BTW, why are you using raw types?)

This code is actually bad and may cause many problems (probably not infinite loop, that's more common with TreeMap, with HashMap it's more likely to get the silent data loss due to overwrite or probably some random exception). And you're right, it's not guaranteed that the changes made in one thread will be visible by another one.
Here the problem may look not very big as this Map is used for caching purposes, thus silent overwrites or visibility lag doesn't lead to real problems (just two distinct instances of coersion will be used for the same class, which is probably ok in this case). However it's still possible that such code will break your program. If you like, you can submit a patch to LuaJ team.

Two options:
// Synchronized (since Java 1.2)
static final Map COERCIONS = Collections.synchronizedMap(new HashMap());
// Concurrent (since Java 5)
static final Map COERCIONS = new ConcurrentHashMap();
They each have their pros and cons.
ConcurrentHashMap pro is no locking. Con is that operations are not atomic, e.g. an Iterator in one thread and a call to putAll in another will allow iterator to see some of the values added.

Java synchronized block vs concurrentHashMap vs Collections.synchronizedMap

Say If have a synchronized method and within that method, I update a hashmap like this:
public synchronized void method1()
{
myHashMap.clear();
//populate the hashmap, takes about 5 seconds.
}
now while the method1 is running and the hashmap is being re-populated, if there are other threads tring to get the value of the hashmap, I assume they will get blocked?
Now instead of using sync method, if I change hashmap to ConcurrentHashMap like below, what's the behaviour?
public void method1()
{
myConcurrentHashMap.clear();
//populate the hashmap, takes about 5 seconds.
}
what if i use Collections.synchronizedMap ? is it the same?

CHM(ConcurrentHashMap), instead of synchronizing every method on a common lock, restricting access to a single thread
at a time, it uses a finer-grained locking mechanism called lock striping to allow a greater degree of shared access. Arbitrarily many reading threads
can access the map concurrently, readers can access the map concurrently with
writers, and a limited number of writers can modify the map concurrently. The result
is far higher throughput under concurrent access, with little performance penalty for
single-threaded access.
ConcurrentHashMap, along with the other concurrent collections, further improve on
the synchronized collection classes by providing iterators that do not throw
ConcurrentModificationException, thus eliminating the need to lock the collection
during iteration.
As with all improvements, there are still a few tradeoffs. The semantics of methods
that operate on the entire Map, such as size and isEmpty, have been slightly
weakened to reflect the concurrent nature of the collection. Since the result of size
could be out of date by the time it is computed, it is really only an estimate, so size
is allowed to return an approximation instead of an exact count. While at first this
may seem disturbing, in reality methods like size and isEmpty are far less useful in
concurrent environments because these quantities are moving targets.
Secondly, Collections.synchronizedMap
It's just simple HashMap with synchronized methods - I'd call it deprecated dute to CHM

If you want to have all read and write actions to your HashMap synchronized, you need to put the synchronize on all methods accessing the HashMap; it is not enough to block just one method.
ConcurrentHashMap allows thread-safe access to your data without locking. That means you can add/remove values in one thread and at the same time get values out in another thread without running into an exception. See also the documentation of ConcurrentHashMap

you could probably do
volatile private HashMap map = newMap();
private HashMap newMap() {
HashMap map = new HashMap();
//populate the hashmap, takes about 5 seconds
return map;
}
public void updateMap() {
map = newMap();
}
A reader sees a constant map, so reads don't require synchronization, and are not blocked.

concurrent HashMap: checking size

Concurrent Hashmap could solve synchronization issue which is seen in hashmap. So adding and removing would be fast if we are using synchronize key work with hashmap. What about checking hashmap size, if mulitple threads checking concurrentHashMap size? do we still need synchronzation key word: something as follows:
public static synchronized getSize(){
return aConcurrentHashmap.size();
}

concurentHashMap.size() will return the size known at the moment of the call, but it might be a stale value when you use that number because another thread has added / removed items in the meantime.
However the whole purpose of ConcurrentMaps is that you don't need to synchronize it as it is a thread safe collection.

You can simply call aConcurrentHashmap.size(). However, you have to bear in mind that by the time you get the answer it might already be obsolete. This would happen if another thread where to concurrently modify the map.

You don't need to use synchronized with ConcurretnHashMap except in very rare occasions where you need to perform multiple operations atomically.
To just get the size, you can call it without synchronization.
To clarify when I would use synchronization with ConcurrentHashMap...
Say you have an expensive object you want to create on demand. You want concurrent reads, but also want to ensure that values are only created once.
public ExpensiveObject get(String key) {
return map.get(key); // can work concurrently.
}
public void put(String key, ExepensiveBuilder builder) {
// cannot use putIfAbsent because it needs the object before checking.
synchronized(map) {
if (!map.containsKey(key))
map.put(key, builder.create());
}
}
Note: This requires that all writes are synchronized, but reads can still be concurrent.

The designers of ConcurrentHashMap thought of giving weightage to individual operations like : get(), put() and remove() over methods which operate over complete HashMap like isEmpty() or size(). This is done because the changes of these methods getting called (in general) are less than the other individual methods.
A synchronization for size() is not needed here. We can get the size by calling concurentHashMap.size() method. This method may return stale values as other thread might modify the map in the meanwhile. But, this is explicitely assumed to be broken as these operations are deprioritized.

ConcorrentHashMap is fail-safe. it won't give any concurrent modification exceptions. it works good for multi threaded operations.
The whole implementation of ConcurrentHashMap is same as HashMap but the while retrieving the elements , HashMap locks whole map restricting doing further modifications which gives concurrent modification exception.'
But in ConcurrentHashMap, the locking happens at bucket level so the chance of giving concurrent modification exception is not present.
So to answer you question here, checking size of ConcurrentHashMap doesn't help because , it keeps chaining based on the operations or modification code that you write on the map. It has size method which is same from the HashMap.

Java HashMap race condition

I am trying to find out if there is going to be any race condition in this piece of code. If the key weren't 'Thread.currentThread' then I would think yes. But since the thread itself is key, how is it possible to have a race condition? No other thread can possibly update the same key in the HashMap!
public class SessionTracker {
static private final Map<Thread,Session> threadSessionMap = new HashMap<Thread,Session>();
static public Session get() {
return threadSessionMap.get(Thread.currentThread());
}
static public void set(Session s) {
threadSessionMap.put(Thread.currentThread(),s);
}
static public void reset() {
threadSessionMap.remove(Thread.currentThread());
}
}

The answer is yes, there are potential race conditions:
when resizing an HashMap by two threads at the same time
when collisions happens. Collision can happen when two elements map to the same cell even if they have a different hashcode. During the conflict resolution, there can be a race condition and one added key/value pair could be overwritten by another pair inserted by another thread.
To explain better what I mean on the second point, I was looking at the source code of HashMap in OpenJdk 7
389 int hash = hash(key.hashCode());
390 int i = indexFor(hash, table.length);
First it calculates an Hash of your key (combining two hash functions), then it maps to a cell with indexFor, then it checks if that cell contains the same key or is already occupied by another one. If it's the same key, it just overwrite the value and there is no problem here.
If it's occupied it looks at the next cell and then the next until it finds an empty position and call addEntry(), which could even decide to resize the array if the array is more loaded than a certain loadFactor.
Our table containing the entries is just a vector of Entry which holds key and value.
146 /**
147 * The table, resized as necessary. Length MUST Always be a power of two.
148 */
149 transient Entry[] table;
In a concurrent environment, all sort of evil things can happen, for instance one thread gets a collision for cell number 5 and looks for the next cell (6) and finds it empty.
Meanwhile another thread gets an index of 6 as a result of indexFor and both decide to use that cell at the same time, one of the two overwriting the other.

Without getting into specific details of the Hashmap implementations, I would say that there is still the possibility of an error, given the fact that the Hashmap class is not safe for concurrent access.
While I agree that there should be only 1 modification to a single key at a time, because you are using currentThread(), there is still the possibility that multiple threads will be modifying the Hashmap concurrently. Unless you look at the specific implementation, you should not assume that only concurrent access to the same key would cause a problem on the Hashmap, and that concurrent modification to different keys would not.
Imagine a case where two different keys generate to the same hash value, and its easy to see that there can still be errors with concurrent modification.

Yes, that is not a safe thing to do (as the other answers already point out). A better solution entirely might be to use a ThreadLocal which is a more natural way to keep thread local data than using a Map. It's got a couple of nice features including default values and that the values are removed when a thread terminates.

According to article written by Pierre Hugues, if you share hashmap between multiple threads, your process may hang and eat all your cpu resource due to infinite looping.

I agree with previous answers that your code is not thread safe and while using ConcurrentHashMap would solve your problem, this is the perfect use case for ThreadLocal.
A short introduction for ThreadLocal:
ThreadLocal will internally hold a different instance of a class for each thread that accesses the ThreadLocal, therefor solving any concurrency issues. Additionally (depending on situation this could be good/bad), the value stored in a ThreadLocal can only be accessed by the thread that populated that value in the first place. If it is the first time the current thread is accessing ThreadLocal, the value will be null.
Simple example of ThreadLocal that holds String values:
private static ThreadLocal<String> threadVar = new ThreadLocal<>();
public void foo() {
String myString = threadVar.get();
if (myString == null) {
threadVar.set("some new value");
myString = threadVar.get();
}
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.