Java HashMap race condition

I am trying to find out if there is going to be any race condition in this piece of code. If the key weren't 'Thread.currentThread' then I would think yes. But since the thread itself is key, how is it possible to have a race condition? No other thread can possibly update the same key in the HashMap!
public class SessionTracker {
    private static final Map<Thread,Session> threadSessionMap = new HashMap<Thread,Session>();

    public static Session get() {
        return threadSessionMap.get(Thread.currentThread());
    }

    public static void set(Session s) {
        threadSessionMap.put(Thread.currentThread(), s);
    }

    public static void reset() {
        threadSessionMap.remove(Thread.currentThread());
    }
}

The answer is yes, there are potential race conditions:
when two threads resize the HashMap at the same time
when collisions happen. A collision occurs when two elements map to the same bucket even though they have different hash codes. During the conflict resolution, there can be a race condition in which a key/value pair added by one thread is overwritten by a pair inserted by another thread.
To explain better what I mean by the second point, I looked at the source code of HashMap in OpenJDK 7:
389 int hash = hash(key.hashCode());
390 int i = indexFor(hash, table.length);
First it computes a hash of your key (combining two hash functions), then it maps the hash to a bucket with indexFor, then it walks the entries chained in that bucket, checking whether one already holds the same key. If it finds the same key, it just overwrites the value, and there is no problem here.
If no entry with that key exists, it calls addEntry(), which prepends a new Entry to the head of the bucket's chain and may even decide to resize the array if it is more loaded than a certain loadFactor.
The table containing the entries is just an array of Entry objects, where each Entry holds a key, a value and a link to the next entry in its bucket:
146 /**
147 * The table, resized as necessary. Length MUST Always be a power of two.
148 */
149 transient Entry[] table;
In a concurrent environment, all sorts of evil things can happen. For instance, two threads both get an index of 5 from indexFor at the same time, both read the current head of that bucket's chain, and both link their new Entry in front of it; whichever thread writes the bucket slot last wins, and the pair inserted by the other thread is silently lost.
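The lost-update race described above is easy to provoke. Below is a sketch (the class name and the thread/key counts are mine, not from the question) that hammers a map from several threads; with a plain HashMap the final size routinely falls short of the number of distinct keys inserted, while a ConcurrentHashMap always ends up with all of them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LostUpdateDemo {

    // Inserts 'keys' distinct keys into 'map' from 'threads' threads and
    // returns the final size. For a plain HashMap the result is frequently
    // less than 'keys'; for a ConcurrentHashMap it is always exactly 'keys'.
    static int run(Map<Integer, Integer> map, int threads, int keys)
            throws InterruptedException {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            workers[t] = new Thread(() -> {
                for (int k = offset; k < keys; k += threads) {
                    map.put(k, k); // unsynchronized put: pairs can be silently lost
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join(); // join() also gives the happens-before needed to read the map
        }
        return map.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // The HashMap is pre-sized to avoid resizing, so only the
        // bucket-collision race remains in play.
        System.out.println("HashMap:           " + run(new HashMap<>(1 << 17), 8, 50_000));
        System.out.println("ConcurrentHashMap: " + run(new ConcurrentHashMap<>(), 8, 50_000));
    }
}
```

The lost updates are nondeterministic, so runs will differ; only the ConcurrentHashMap count is guaranteed.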

Without getting into the specific details of the HashMap implementation, I would say that there is still the possibility of an error, given that the HashMap class is not safe for concurrent access.
While I agree that there should be only one modification to a single key at a time, because you are using currentThread(), there is still the possibility that multiple threads will modify the HashMap concurrently. Unless you examine the specific implementation, you should not assume that only concurrent access to the same key would cause a problem on the HashMap and that concurrent modification of different keys would not.
Imagine a case where two different keys hash to the same bucket, and it's easy to see that there can still be errors with concurrent modification.

Yes, that is not a safe thing to do (as the other answers already point out). An entirely better solution might be to use a ThreadLocal, which is a more natural way to keep thread-local data than a Map. It has a couple of nice features, including default values, and its values are released when a thread terminates.
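Applied to the SessionTracker in the question, the ThreadLocal version might look like this sketch (the nested Session class is a stand-in for the questioner's type, which is not shown):

```java
public class SessionTracker {

    // Stand-in for the questioner's Session class (assumed, not from the original).
    public static class Session {
    }

    // Each thread gets its own slot; there is no shared HashMap, hence no race.
    private static final ThreadLocal<Session> threadSession = new ThreadLocal<>();

    public static Session get() {
        return threadSession.get(); // null on a thread that never called set()
    }

    public static void set(Session s) {
        threadSession.set(s);
    }

    public static void reset() {
        threadSession.remove(); // also lets the stored value be garbage-collected
    }
}
```

Note that reset() calls remove() rather than set(null), so the slot itself is released.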

According to an article written by Pierre Hugues, if you share a HashMap between multiple threads, your process may hang and eat all your CPU resources due to infinite looping.

I agree with the previous answers that your code is not thread-safe, and while using ConcurrentHashMap would solve your problem, this is the perfect use case for ThreadLocal.
A short introduction to ThreadLocal:
A ThreadLocal internally holds a different value for each thread that accesses it, thereby avoiding any concurrency issues. Additionally (depending on the situation, this can be good or bad), the value stored in a ThreadLocal can only be accessed by the thread that populated it in the first place. The first time a thread accesses the ThreadLocal, the value will be null (unless an initial value is supplied).
Simple example of ThreadLocal that holds String values:
private static ThreadLocal<String> threadVar = new ThreadLocal<>();

public void foo() {
    String myString = threadVar.get();
    if (myString == null) {
        threadVar.set("some new value");
        myString = threadVar.get();
    }
}
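On Java 8 and later, the null check can also be dropped by giving the ThreadLocal an initial value up front; the sketch below assumes the same "some new value" default as the example above:

```java
public class ThreadVarExample {

    // withInitial supplies a per-thread default lazily, removing the null check.
    // The supplier runs at most once per thread, on that thread's first get().
    private static final ThreadLocal<String> threadVar =
            ThreadLocal.withInitial(() -> "some new value");

    public static String foo() {
        return threadVar.get(); // never null: the default is created on demand
    }
}
```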

Related

Using Threads and Flyweight pattern together in Java?

I'm new to both multi-threading and design patterns.
I have some threads created with explicit multi-threading, and each is supposed to compute the factorial of a number if it hasn't ever been computed by any thread. I'm using the Flyweight pattern for this.
public class Fact {
    private final long Comp;
    private static Map<String, Fact> instances = new HashMap<String, Fact>();

    private Fact(long comp) {
        Comp = comp;
    }

    public static Fact getInstance(int num) {
        String key = String.valueOf(num);
        if (!instances.containsKey(key)) {
            long comp = 1; // calculate factorial of num
            for (int i = 2; i <= num; i++) {
                comp *= i;
            }
            instances.put(key, new Fact(comp));
        }
        return instances.get(key);
    }

    public long get_Comp() {
        return this.Comp;
    }
}

public class Th implements Runnable {
    // code elided
    @Override
    public void run() {
        // get the number and check if it's already in the HashMap;
        // if not, compute it
    }
}
If I do so, is it right to say that my threads Th are computing factorials?
If I add the computation to the Fact (Flyweight) class, does it remain a Flyweight? I guess yes.
Any other way of doing what I want would be highly appreciated as well.
There are a couple of aims you might have here; what to do depends on what you are trying to achieve.
It seems in this case you are attempting to avoid repeated computation, but that computation is not particularly expensive, so you could run into a problem of lock contention. To make it thread-safe in that case, use ThreadLocal<Map<String, Fact>>, or potentially InheritableThreadLocal<Map<String, Fact>> where childValue copies the Map.
Often there is a known set of values that are likely to be common, and you just want these. In that case, compute a Map (or array) during class static initialisation.
If you want the flyweights to be shared between threads and be unique, use ConcurrentHashMap together with the Map.computeIfAbsent method.
If you want the flyweights to be shared between threads, be unique, and you want to make sure the computation is only done once, it gets a bit more difficult: you need to put (if absent) a placeholder into the ConcurrentMap; if the current thread wins, replace the placeholder with the computed value and notify, otherwise wait for the computation to finish.
Now if you want the flyweights to be garbage-collected, you would want a WeakHashMap. This cannot be a ConcurrentMap using the Java SE collections, which makes it a bit hopeless; you can use good old-fashioned locking. Alternatively, the value can be a WeakReference<Fact>, but then you'll need to manage eviction yourself.
It may be that a strong reference to a Fact is kept only intermittently but you don't want it to be recreated too often, in which case you will need SoftReference instead of WeakReference. Indeed, WeakHashMap can behave surprisingly, in some circumstances causing performance to drop to unusable after previously working fine.
(Note that in this case your Map would be better keyed on Integer.)
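The shared-and-unique variant from the list above can be sketched with ConcurrentHashMap.computeIfAbsent, keyed on Integer as suggested. The Fact class here is condensed from the question, and the factorial helper is illustrative (it uses BigInteger to avoid long overflow):

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class Fact {

    private static final Map<Integer, Fact> instances = new ConcurrentHashMap<>();

    private final BigInteger comp;

    private Fact(BigInteger comp) {
        this.comp = comp;
    }

    // computeIfAbsent is atomic: the mapping function runs at most once per
    // key, and every thread gets back the same unique Fact instance.
    public static Fact getInstance(int num) {
        return instances.computeIfAbsent(num, n -> new Fact(factorial(n)));
    }

    public BigInteger getComp() {
        return comp;
    }

    // Illustrative factorial; BigInteger avoids overflow for larger inputs.
    private static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }
}
```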

Java Array elements & Memory visibility question

I have read some questions & answers on the visibility of Java array elements from multiple threads, but I still can't really wrap my head around some cases. To demonstrate what I'm having trouble with, I have come up with a simple scenario: assume that I have a simple collection that adds elements into one of its n buckets by hashing them (a Bucket is like a list of some sort), and each bucket is synchronized separately. E.g.:
private final Object[] locks = new Object[10];
private final Bucket[] buckets = new Bucket[10];
Here bucket i is supposed to be guarded by locks[i]. Here is what the element-adding code looks like:
public void add(Object element) {
    int bucketNum = calculateBucket(element); // hashes element into a bucket
    synchronized (locks[bucketNum]) {
        buckets[bucketNum].add(element);
    }
}
Since 'buckets' is final, this would not have any visibility problem even without synchronization. My guess is, with synchronization, this wouldn't have any visibility problems without the final either, is this correct?
And finally, a bit trickier part. Assume I want to copy out and merge the contents of all buckets and empty the whole data structure, from an arbitrary thread, like this:
public List<Bucket> clear() {
    List<Bucket> allBuckets = new ArrayList<>();
    for (int bucketNum = 0; bucketNum < buckets.length; bucketNum++) {
        synchronized (locks[bucketNum]) {
            allBuckets.add(buckets[bucketNum]);
            buckets[bucketNum] = new Bucket();
        }
    }
    return allBuckets;
}
I basically swap each old bucket with a newly created one and return the old ones. This case is different from the add() one because we are not modifying the object referred to by the reference in the array; we are directly replacing the reference in the array itself.
Note that I do not care if bucket 2 is modified while I'm holding the lock for bucket 1, I don't need the structure to be fully synchronized and consistent, just visibility and near consistency is enough.
So assuming every bucket[i] is only ever modified under lock[i], would you say that this code works? I hope to be able to learn why and why nots and have a better grasp of visibility, thanks.
First question.
Thread safety in this case depends on whether the reference to the object containing locks and buckets (let's call it Container) is properly shared.
Just imagine: one thread is busy instantiating a new Container object (allocating memory, instantiating arrays, etc.), while another thread starts using this half-instantiated object where locks and buckets are still null (they have not been instantiated by the first thread yet). In this case this code:
synchronized (locks[bucketNum]) {
becomes broken and throws a NullPointerException. The final keyword prevents this and guarantees that by the time a thread sees a non-null reference to the Container, its final fields have already been initialized:
An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields. (JLS 17.5)
Second question.
Assuming that locks and buckets fields are final and you don't care about consistency of the whole array and "every bucket[i] is only ever modified under lock[i]", this code is fine.
Just to add to Pavel's answer:
In your first question you ask
Since 'buckets' is final, this would not have any visibility problem even without synchronization. My guess is, with synchronization, this wouldn't have any visibility problems without the final either, is this correct?
I'm not sure what you mean by 'visibility problems', but for sure, without the synchronized this code would be incorrect if multiple threads accessed buckets[i] with one of them modifying it (e.g. writing to it). There would be no guarantee that what one thread has written becomes visible to another. This also involves the internal structures of the bucket, which might be modified by the call to add.
Remember that final on buckets pertains only to the reference to the array itself, not to its cells.
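To make the safe-publication point concrete, here is a sketch of the questioner's structure with the step that is usually left implicit spelled out: both arrays are fully populated in the constructor, before the reference escapes, so the final-field guarantee covers arrays whose slots are actually filled. Bucket and the hash function are stand-ins, not from the question:

```java
import java.util.ArrayList;
import java.util.List;

public class StripedBuckets {

    // Stand-in for the question's Bucket type.
    public static class Bucket {
        private final List<Object> elements = new ArrayList<>();
        void add(Object o) { elements.add(o); }
        public int size() { return elements.size(); }
    }

    private final Object[] locks = new Object[10];
    private final Bucket[] buckets = new Bucket[10];

    public StripedBuckets() {
        // Fill both arrays before 'this' escapes the constructor; the final
        // fields then guarantee other threads never see null slots.
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new Object();
            buckets[i] = new Bucket();
        }
    }

    public void add(Object element) {
        // floorMod keeps the index non-negative for negative hash codes.
        int bucketNum = Math.floorMod(element.hashCode(), buckets.length);
        synchronized (locks[bucketNum]) {
            buckets[bucketNum].add(element);
        }
    }

    public List<Bucket> clear() {
        List<Bucket> allBuckets = new ArrayList<>();
        for (int bucketNum = 0; bucketNum < buckets.length; bucketNum++) {
            synchronized (locks[bucketNum]) {
                allBuckets.add(buckets[bucketNum]);
                buckets[bucketNum] = new Bucket(); // swap in a fresh bucket
            }
        }
        return allBuckets;
    }
}
```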

Why hashmap is not thread safe? [duplicate]

This question already has answers here:
How to prove that HashMap in java is not thread-safe
(12 answers)
Closed 4 years ago.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.*;

public class TestLock {
    private static ExecutorService executor = Executors.newCachedThreadPool();
    private static Map<Integer, Integer> map = new HashMap<>(1000000);
    private static CountDownLatch doneSignal = new CountDownLatch(1000);

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 1000; i++) {
            final int j = i;
            executor.execute(new Runnable() {
                @Override
                public void run() {
                    map.put(j, j);
                    doneSignal.countDown();
                }
            });
        }
        doneSignal.await();
        System.out.println("done,size:" + map.size());
    }
}
Some people say that HashMap insertion is not safe under concurrency, because the HashMap will perform capacity-expansion operations. But I set the initial capacity to 1000000 here, and the map would only expand at 750,000 entries. I am doing just 1000 inserts, so it won't expand, and there should be no problem. Yet the result is always less than 1000. What went wrong?
Why hashmap is not thread safe?
Because the javadocs say so. See below.
You stated:
Some people say that hashmap insertion is not safe when concurrency.
It is not just "some people" [1]. The javadocs state this clearly:
"Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)"
You asked:
I am doing 1000 inserts here, so I won't expand it. So there should be no problem. But the result is always less than 1000, what went wrong?
It is not just expansion of the hash array that you need to think about. It is not just insertion. Any operation that performs a structural modification on the HashMap needs to be synchronized ... or you may get unspecified behavior [2].
And that is what you got.
[1] - I strongly recommend that you do not rely on your intuition or on what "some people" say. Instead, take the time to read and understand the relevant specifications; i.e. the javadocs and the Java Language Specification.
[2] - In this example, you can easily see why you get unspecified behavior by reading the HashMap source code. For instance, in the OpenJDK Java 11 source code, size() is not synchronized and simply returns the value of the private transient int size field. This is not thread-safe: when other threads add or remove map entries they update size, and a thread calling size() is liable to get a stale value.
"Because the hashmap will perform capacity expansion operations" is not the only reason why HashMap is not thread-safe.
You have to refer to the Java Memory Model to understand what guarantees it can offer.
One such guarantee is visibility. This means that changes made in one thread may not be visible to other threads unless specific conditions are met.
Well, the question title does not really describe what you are asking. Anyway:
Here you have set the capacity to 1000000, not the size.
Capacity: how many slots this HashMap initially has, i.e. empty slots.
Size: the number of elements filled into the Map.
So even though you set the capacity to 1000000, you don't have that many elements at the end; the number of elements actually present is what the size() method returns. (That the count comes out below 1000, though, is a concurrency effect: unsynchronized puts can be lost.) And yes, HashMap is not thread-safe, for several reasons.
If you look at the implementation of put in the HashMap class, you'll see that synchronized is used nowhere, although put performs many thread-unsafe operations, such as creating a TreeNode if the key's hash is not found, incrementing the modCount, etc.
ConcurrentHashMap would be suitable for your use case.
If you need a thread-safe HashMap, you may use the Hashtable class instead.
"Unlike the new collection implementations, Hashtable is synchronized. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable," says the javadoc of Hashtable.
Similarly, if you need a thread-safe ArrayList one day, use Vector.
EDIT: Oh, I suggested the wrong way to do it. My apologies!
The comments suggest better solutions than mine: Collections.synchronizedXxx() or ConcurrentHashMap, which worked for the asker of this question.
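For completeness, here is the question's test program with ConcurrentHashMap swapped in, condensed into a method so the result can be checked; with this one change the reported size is 1000 every time:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TestLockFixed {

    static int run() throws InterruptedException {
        ExecutorService executor = Executors.newCachedThreadPool();
        Map<Integer, Integer> map = new ConcurrentHashMap<>(); // thread-safe replacement
        CountDownLatch doneSignal = new CountDownLatch(1000);
        for (int i = 0; i < 1000; i++) {
            final int j = i;
            executor.execute(() -> {
                map.put(j, j);
                doneSignal.countDown();
            });
        }
        doneSignal.await(); // also establishes happens-before for reading size()
        executor.shutdown();
        return map.size();  // always 1000 now
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("done,size:" + run());
    }
}
```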

Does double-checked locking work with a final Map in Java?

I'm trying to implement a thread-safe Map cache, and I want the cached Strings to be lazily initialized. Here's my first pass at an implementation:
public class ExampleClass {
    private static final Map<String, String> CACHED_STRINGS = new HashMap<String, String>();

    public String getText(String key) {
        String string = CACHED_STRINGS.get(key);
        if (string == null) {
            synchronized (CACHED_STRINGS) {
                string = CACHED_STRINGS.get(key);
                if (string == null) {
                    string = createString();
                    CACHED_STRINGS.put(key, string);
                }
            }
        }
        return string;
    }
}
After writing this code, NetBeans warned me about "double-checked locking," so I started researching it. I found The "Double-Checked Locking is Broken" Declaration and read it, but I'm unsure whether my implementation falls prey to the issues it mentions. It seems like all the issues mentioned in the article are related to object instantiation with the new operator within the synchronized block. I'm not using the new operator, and Strings are immutable, so I'm not sure whether the article is relevant to this situation. Is this a thread-safe way to cache strings in a HashMap? Does the thread-safety depend on what is done in the createString() method?
No, it's not correct, because the first access is done outside of a synchronized block.
It comes down to how get and put might be implemented; you must bear in mind that they are not atomic operations.
For example, what if they were implemented like this:
public String get(String key) {
    Entry e = findEntry(key);
    return e.value;
}

public void put(String key, String value) {
    Entry e = addNewEntry(key);
    // danger for a concurrent get() in between these two lines
    e.value = value;
}

private Entry addNewEntry(String key) {
    Entry entry = new Entry(key, ""); // a new entry starts with an empty string, not null!
    addToBuckets(entry);              // now it's findable by get
    return entry;
}
Now get might not return null while the put operation is still in progress, and the whole getText method could return the wrong value.
The example is a bit convoluted, but you can see that the correct behaviour of your code relies on the inner workings of the map class. That's not good.
And while you can look that code up, you cannot account for compiler, JIT and processor optimisations and inlining, which can effectively change the order of operations, just like the wacky but correct way I chose to write that map implementation.
Consider using a ConcurrentHashMap and the method Map.computeIfAbsent(), which takes a function to call to compute a default value if the key is absent from the map.
Map<String, String> cache = new ConcurrentHashMap<>();
cache.computeIfAbsent("key", key -> "ComputedDefaultValue");
Javadoc: If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.
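Applied to the getText method from the question, the whole double-checked pattern collapses into a single call. This is a sketch with createString() stubbed, since its real body is not shown in the question:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ExampleClass {

    private static final Map<String, String> CACHED_STRINGS = new ConcurrentHashMap<>();

    public String getText(String key) {
        // Atomic check-then-create: createString() runs at most once per key,
        // with no explicit synchronized block and no double-checking.
        return CACHED_STRINGS.computeIfAbsent(key, k -> createString());
    }

    // Stand-in for the question's createString(), whose body is not shown.
    private String createString() {
        return "computed";
    }
}
```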
Non-trivial problem domains:
Concurrency is easy to do and hard to do correctly.
Caching is easy to do and hard to do correctly.
Both are right up there with Encryption in the category of hard to get right without an intimate understanding of the problem domain and its many subtle side effects and behaviors.
Combine them and you get a problem an order of magnitude harder than either one.
This is a non-trivial problem that your naive implementation will not solve in a bug-free manner. The HashMap you are using is not going to be thread-safe if any accesses are unchecked and unserialized; it will not be performant, and it will cause lots of contention, and hence lots of blocking and latency, depending on the use.
The proper way to implement a lazy-loading cache is to use something like Guava's Cache with a CacheLoader; it takes care of all the concurrency and cache race conditions for you transparently. A cursory glance through the source code shows how they do it.
No, and ConcurrentHashMap alone would not help.
Recap: the double-check idiom is typically about assigning a new instance to a variable/field; it is broken because the compiler can reorder instructions, meaning the field can be assigned with a partially constructed object.
For your setup, you have a distinct issue: map.get() is not safe from a concurrent put(), which may be rehashing the table. Using a ConcurrentHashMap fixes ONLY that, but not the risk of a false positive (you think the map has no entry while that entry is actually being made). The issue is not so much a partially constructed object as the duplication of work.
As for the aforementioned Guava CacheLoader: this is just a lazy-init callback that you give to the map so it can create the object if it is missing. This is essentially the same as putting all the 'if null' code inside the lock, which is certainly NOT going to be faster than good old direct synchronization. (The only time it makes sense to use a CacheLoader is to plug in a factory of such missing objects while passing the map to classes that don't know how to make the missing objects and don't want to be told how.)

Concurrent add on non threadsafe HashSet - what is the worst that could happen?

Situation:
Multiple Threads are only adding values to a non threadsafe java.util.HashSet and no other operation is done on the Set until these threads have been stopped.
Question:
What is the worst that could happen?
That depends on what you consider as being "worst".
I'm not sure whether this question is aimed at a detailed, technical analysis of the current implementation, considering all possible race conditions and the nitty-gritty details of the Java memory model.
So if the question is: "What can provably happen in the current implementation?" then I have to say: "I don't know". And I assume that hardly anybody knows this for sure in detail. (It's somewhat like asking "Which parts of your car will be broken after you hit a wall at 100 mph?" - well, maybe the steering wheel will still be intact, but does that matter?)
But if the question is "What is not unlikely to happen when accessing a non-thread-safe HashMap from multiple threads?" then there are many possible answers:
Deadlocks
Exceptions
Missing elements
Elements being inserted multiple times
Elements being inserted into the wrong hash bin
...
(Roughly ordered by my subjective interpretation of "badness"...)
EDIT: A clarification regarding the comment: of course, an element can only be added twice if the call to insert it occurs multiple times. According to the specification, the HashMap should contain each key at most once. But the call for adding a new entry to the HashMap eventually delegates to the call
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
And there is no (obvious) reason why another thread could not cause a rehash (and thus the creation of a new table array) between the first and the second line of this method. Then the bucketIndex for this call would be wrong. When the entry is then added a second time, it could use the (by then) correct bucketIndex, and thus would afterwards be contained twice in the map.
But again: In order to really prove that this might happen, one would have to study the implementation in a detail that is hardly feasible. The bottom line is: Basically anything can go wrong when adding elements with multiple threads to a non-threadsafe HashMap.
The worst that can happen (besides an erroneous state, of course) is probably an infinite loop when adding a value, blocking one of your threads.
See Paul Tyma's article for more information on this case.
What I have seen happen is that you can get a corrupted linked list in your underlying HashMap (used for handling collisions) which points back to itself. This is an issue I have seen a number of times over the years, and it results in the thread going into an infinite loop.
