Double-checked locking with a regular HashMap - Java

Back to concurrency. By now it is clear that for double-checked locking to work, the variable needs to be declared volatile. But what if double-checked locking is used as below?
class Test<A, B> {
    private final Map<A, B> map = new HashMap<>();

    public B fetch(A key, Function<A, B> loader) {
        B value = map.get(key);
        if (value == null) {
            synchronized (this) {
                value = map.get(key);
                if (value == null) {
                    value = loader.apply(key);
                    map.put(key, value);
                }
            }
        }
        return value;
    }
}
Why does it really have to be a ConcurrentHashMap and not a regular HashMap? All map modification is done within the synchronized block, and the code doesn't use iterators, so technically there should be no "concurrent modification" problems.
Please avoid suggesting the use of putIfAbsent/computeIfAbsent, as I am asking about the concept and not the use of an API :) unless using this API bears on the HashMap vs ConcurrentHashMap subject.
Update 2016-12-30
This question was answered by a comment below by Holger "HashMap.get doesn’t modify the structure, but your invocation of put does. Since there is an invocation of get outside of the synchronized block, it can see an incomplete state of a put operation happening concurrently." Thanks!

This question is muddled on so many counts that it's hard to answer.
If this code is only ever called from a single thread, then you're making it too complicated; you don't need any synchronization. But clearly that's not your intention.
So, multiple threads will call the fetch method, which delegates to HashMap.get() without any synchronization. HashMap is not thread-safe. Bam, end of story. It doesn't even matter that you're trying to simulate double-checked locking; the reality is that calling get() and put() on a map manipulates the internal mutable data structures of the HashMap, without consistent synchronization on all code paths, and since these can be called concurrently from multiple threads, you're already dead.
(Also, you probably think that HashMap.get() is a pure read operation, but that's wrong too. What if the HashMap is actually a LinkedHashMap (which is a subclass of HashMap)? LinkedHashMap.get() may update the access order (when access ordering is enabled), which involves writing to internal data structures -- here, concurrently and without synchronization. But even if get() is doing no writing, your code here is still broken.)
Rule of thumb: when you think you have a clever trick that lets you avoid synchronizing, you're almost certainly wrong.
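For what it's worth, the smallest change that keeps the questioner's unsynchronized fast path is to make the map itself safe for concurrent readers. A minimal sketch, assuming the intent of the original fetch method is otherwise unchanged (this is one possible fix, not the only one):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class Test<A, B> {
    // A ConcurrentHashMap gives the unsynchronized get() a well-defined result
    private final Map<A, B> map = new ConcurrentHashMap<>();

    public B fetch(A key, Function<A, B> loader) {
        B value = map.get(key);          // safe concurrent read, no lock
        if (value == null) {
            synchronized (this) {        // serialize the expensive load
                value = map.get(key);
                if (value == null) {
                    value = loader.apply(key);
                    map.put(key, value);
                }
            }
        }
        return value;
    }
}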

Related

Does double-checked locking work with a final Map in Java?

I'm trying to implement a thread-safe Map cache, and I want the cached Strings to be lazily initialized. Here's my first pass at an implementation:
public class ExampleClass {
    private static final Map<String, String> CACHED_STRINGS = new HashMap<String, String>();

    public String getText(String key) {
        String string = CACHED_STRINGS.get(key);
        if (string == null) {
            synchronized (CACHED_STRINGS) {
                string = CACHED_STRINGS.get(key);
                if (string == null) {
                    string = createString();
                    CACHED_STRINGS.put(key, string);
                }
            }
        }
        return string;
    }
}
After writing this code, Netbeans warned me about "double-checked locking," so I started researching it. I found The "Double-Checked Locking is Broken" Declaration and read it, but I'm unsure whether my implementation falls prey to the issues it mentioned. It seems like all the issues mentioned in the article are related to object instantiation with the new operator within the synchronized block. I'm not using the new operator, and Strings are immutable, so I'm not sure whether the article is relevant to this situation. Is this a thread-safe way to cache strings in a HashMap? Does the thread-safety depend on what action is taken in the createString() method?
No, it's not correct, because the first access is done outside of a synchronized block.
It's somewhat down to how get and put might be implemented. You must bear in mind that they are not atomic operations.
For example, what if they were implemented like this:
public String get(String key) {
    Entry e = findEntry(key);
    return e.value;
}

public void put(String key, String value) {
    Entry e = addNewEntry(key);
    // danger for a concurrent get() in between these two lines
    e.value = value;
}

private Entry addNewEntry(String key) {
    Entry entry = new Entry(key, ""); // a new entry starts with an empty string, not null!
    addToBuckets(entry);              // now it's findable by get
    return entry;
}
Now the get might not return null when the put operation is still in progress, and the whole getText method could return the wrong value.
The example is a bit convoluted, but you can see that correct behaviour of your code relies on the inner workings of the map class. That's not good.
And while you can look the real code up, you cannot account for compiler, JIT and processor optimisations and inlining, which can effectively reorder operations just like the wacky but correct way I chose to write that map implementation.
Consider using a ConcurrentHashMap and the method Map.computeIfAbsent(), which takes a function to compute a default value if the key is absent from the map.
Map<String, String> cache = new ConcurrentHashMap<>( );
cache.computeIfAbsent( "key", key -> "ComputedDefaultValue" );
Javadoc: If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.
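Applied to the questioner's getText cache, the whole check-then-compute-then-put sequence collapses into a single atomic call. A sketch, assuming CACHED_STRINGS is redeclared as a ConcurrentHashMap and createString() is the loader from the question:

private static final ConcurrentMap<String, String> CACHED_STRINGS = new ConcurrentHashMap<>();

public String getText(String key) {
    // the mapping function runs at most once per key; concurrent callers wait for the result
    return CACHED_STRINGS.computeIfAbsent(key, k -> createString());
}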
Non-trivial problem domains:
Concurrency is easy to do and hard to do correctly.
Caching is easy to do and hard to do correctly.
Both are right up there with Encryption in the category of hard to get right without an intimate understanding of the problem domain and its many subtle side effects and behaviors.
Combine them and you get a problem an order of magnitude harder than either one.
This is a non-trivial problem that your naive implementation will not solve in a bug-free manner. The HashMap you are using is not going to be thread-safe if any accesses are not checked and serialized; even then it will not be performant, and will cause lots of contention, blocking and latency depending on the use.
The proper way to implement a lazy-loading cache is to use something like Guava's Cache with a CacheLoader; it takes care of all the concurrency and cache race conditions for you transparently. A cursory glance through the source code shows how they do it.
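A minimal sketch of such a cache, assuming Guava is on the classpath and using the question's createString() as the loader:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

private static final LoadingCache<String, String> CACHED_STRINGS =
        CacheBuilder.newBuilder()
                .maximumSize(10_000)                    // bound the cache (an assumption)
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        return createString();          // loader taken from the question
                    }
                });

public String getText(String key) {
    // the value is computed at most once per key, even under contention
    return CACHED_STRINGS.getUnchecked(key);
}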
No, and ConcurrentHashMap would not help.
Recap: the double check idiom is typically about assigning a new instance to a variable/field; it is broken because the compiler can reorder instructions, meaning the field can be assigned with a partially constructed object.
For your setup, you have a distinct issue: the map.get() is not safe from a put() which may be occurring concurrently and possibly rehashing the table. Using a ConcurrentHashMap fixes ONLY that, but not the risk of a spurious miss (you think the map has no entry while one is actually being made). The issue is not so much a partially constructed object as the duplication of work.
As for the avoidable Guava CacheLoader: this is just a lazy-init callback that you give to the map so it can create the object if missing. This is essentially the same as putting all the 'if null' code inside the lock, which is certainly NOT going to be faster than good old direct synchronization. (The only time it makes sense to use a CacheLoader is for plugging in a factory of such missing objects while you are passing the map to classes that don't know how to make missing objects and don't want to be told how.)
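To illustrate the duplication-of-work point with the names from the original question (a sketch, not the questioner's code; map is assumed to be a ConcurrentHashMap<A, B>): with no lock around the computation, every racing thread may run the loader, and the map only guarantees that a single result wins.

public B fetch(A key, Function<A, B> loader) {
    B value = map.get(key);                       // safe read, no lock
    if (value == null) {
        value = loader.apply(key);                // several threads may all run the loader
        B previous = map.putIfAbsent(key, value); // only the first result is kept
        if (previous != null) {
            value = previous;                     // the losers throw their work away
        }
    }
    return value;
}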

Java visibility: final static non-threadsafe collection changes after construction

I found the following code snippet in luaj and I started to wonder whether changes made to the Map after it has been constructed might not be visible to other threads, since there is no synchronization in place.
I know that since the Map is declared final, its initialized values after construction are visible to other threads, but what about changes that happen after that?
Some might also realize that this class is so not thread-safe that calling coerce in a multi-threaded environment might even cause an infinite loop in the HashMap, but my question is not about that.
public class CoerceJavaToLua {
    static final Map COERCIONS = new HashMap(); // this map is visible to all threads after construction, since it's final

    public static LuaValue coerce(Object paramObject) {
        ...;
        if (localCoercion == null) {
            localCoercion = ...;
            COERCIONS.put(localClass, localCoercion); // visible?
        }
        return ...;
    }
    ...
}
You're correct that changes to the Map may not be visible to other threads. Every method that accesses COERCIONS (both reading and writing) should be synchronized on the same object. Alternatively, if you never need sequences of accesses to be atomic, you could use a synchronized collection.
(BTW, why are you using raw types?)
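A minimal sketch of the first option, synchronizing every access on the map itself. The generic types and the createCoercion factory are assumptions for illustration; this is not LuaJ's actual code:

static final Map<Class<?>, LuaValue> COERCIONS = new HashMap<>();

static LuaValue coerce(Object paramObject) {
    Class<?> cls = paramObject.getClass();
    synchronized (COERCIONS) {              // same lock for reads and writes
        LuaValue coercion = COERCIONS.get(cls);
        if (coercion == null) {
            coercion = createCoercion(cls); // hypothetical factory
            COERCIONS.put(cls, coercion);
        }
        return coercion;
    }
}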
This code is actually bad and may cause many problems (probably not an infinite loop; that's more common with TreeMap. With HashMap it's more likely that you get silent data loss due to an overwrite, or possibly some random exception). And you're right, it's not guaranteed that the changes made in one thread will be visible to another one.
Here the problem may not look very big, as this Map is used for caching purposes, so silent overwrites or visibility lag don't lead to real problems (just two distinct coercion instances will be used for the same class, which is probably OK in this case). However, it's still possible that such code will break your program. If you like, you can submit a patch to the LuaJ team.
Two options:
// Synchronized (since Java 1.2)
static final Map COERCIONS = Collections.synchronizedMap(new HashMap());
// Concurrent (since Java 5)
static final Map COERCIONS = new ConcurrentHashMap();
They each have their pros and cons.
ConcurrentHashMap's pro is that there is no single map-wide lock. The con is that compound operations are not atomic; e.g. an Iterator in one thread and a call to putAll in another will allow the iterator to see some of the values added.

Is a java synchronized method entry point thread safe enough?

I have a singleton class handling a kind of cache with different objects in a HashMap.
(The format of a key is directly linked to the type of object stored in the map - hence the map is of )
Three different actions are possible on the map : add, get, remove.
I secured access to the map by using a public entry-point method (access is not intensive):
public synchronized Object doAction(String actionType, String key, Object data) {
    Object myObj = null;
    if (actionType.equalsIgnoreCase("ADD")) {
        addDataToMyMap(key, data);
    } else if (actionType.equalsIgnoreCase("GET")) {
        myObj = getDataFromMyMap(key);
    } else if (actionType.equalsIgnoreCase("REM")) {
        removeDataFromMyMap(key);
    }
    return myObj;
}
Notes:
The map is private. Methods addDataToMyMap(), getDataFromMyMap() and removeDataFromMyMap() are private. Only the entry point method is public and nothing else except the static getInstance() of the class itself.
Can you confirm it is thread-safe for concurrent access to the map, since there is no other way to use the map but through that method?
If it is safe for a Map, I guess this principle could be applied to any other kind of shared resource.
Many thanks in advance for your answers.
David
I would need to see the implementation of your methods, but it could be enough.
BUT I would recommend you use a synchronized Map from the Java Collections API; then you wouldn't need to synchronize your method unless you're sharing some other instance.
read this: http://www.java-examples.com/get-synchronized-map-java-hashmap-example
Yes your class will be thread safe as long as the only entry point is doAction.
If your cache class has a private HashMap, your three methods are all public synchronized and not static, and you don't have any other public instance variable, then I think your cache is thread-safe.
Better to post your code.
This is entirely safe. As long as all the threads access it using a common lock, which in this case is the singleton object itself, it's thread-safe. (Other answers may be more performant, but your implementation is safe.)
You can use Collections.synchronizedMap to synchronize access to the Map.
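For instance (a sketch; the field name and type parameters are assumptions, since the question doesn't show the map declaration):

private final Map<String, Object> cacheMap =
        Collections.synchronizedMap(new HashMap<String, Object>());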
As is, it is hard to determine whether the code is thread-safe. Important information missing from your example:
Are the methods public?
Are the methods synchronized?
Is the map only accessed through the methods?
I would advise you to look into synchronization to get a grasp of the problems and how to tackle them. Exploring the ConcurrentHashMap class would give further information about your problem.
You should use ConcurrentHashMap. It offers better throughput than synchronized doAction and better thread safety than Collections.synchronizedMap().
This depends on your code. As someone else stated, you can use Collections.synchronizedMap. However, this only synchronizes the individual method calls on the map. So if:
map.get(key);
map.put(key,value);
Are executed at the same time in two different threads, one will block until the other exits. However, if your critical section is larger than the single call into the map:
SomeExpensiveObject value = map.get(key);
if (value == null) {
    value = new SomeExpensiveObject();
    map.put(key, value);
}
Now let's assume the key is not present. The first thread executes, and gets a null value back. The scheduler yields that thread, and runs thread 2, which also gets back a null value.
It constructs the new object and puts it in the map. Then thread 1 resumes and does the same, since it still has a null value.
This is where you'd want a larger synchronized block around your critical section:
SomeExpensiveObject value = null;
synchronized (map) {
    value = map.get(key);
    if (value == null) {
        value = new SomeExpensiveObject();
        map.put(key, value);
    }
}
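For what it's worth, on Java 8 and later the whole check-then-act can also be collapsed into a single atomic call on a ConcurrentHashMap (a sketch reusing the names above; the String key type is an assumption):

ConcurrentMap<String, SomeExpensiveObject> map = new ConcurrentHashMap<>();

// computeIfAbsent runs the factory at most once per key; racing threads wait for the result
SomeExpensiveObject value = map.computeIfAbsent(key, k -> new SomeExpensiveObject());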

concurrent HashMap: checking size

ConcurrentHashMap solves the synchronization issues seen with HashMap, so adding and removing would be fast compared to using the synchronized keyword with a HashMap. What about checking the map's size, if multiple threads are checking a ConcurrentHashMap's size? Do we still need the synchronized keyword, something as follows:
public static synchronized int getSize() {
    return aConcurrentHashmap.size();
}
concurrentHashMap.size() will return the size known at the moment of the call, but it might be a stale value by the time you use that number, because another thread has added/removed items in the meantime.
However, the whole purpose of ConcurrentMaps is that you don't need to synchronize them, as they are thread-safe collections.
You can simply call aConcurrentHashmap.size(). However, you have to bear in mind that by the time you get the answer it might already be obsolete. This would happen if another thread were to concurrently modify the map.
You don't need to use synchronized with ConcurrentHashMap, except on very rare occasions where you need to perform multiple operations atomically.
To just get the size, you can call it without synchronization.
To clarify when I would use synchronization with ConcurrentHashMap...
Say you have an expensive object you want to create on demand. You want concurrent reads, but also want to ensure that values are only created once.
public ExpensiveObject get(String key) {
    return map.get(key); // can work concurrently
}

public void put(String key, ExpensiveBuilder builder) {
    // cannot use putIfAbsent because it needs the object before checking
    synchronized (map) {
        if (!map.containsKey(key))
            map.put(key, builder.create());
    }
}
Note: This requires that all writes are synchronized, but reads can still be concurrent.
The designers of ConcurrentHashMap chose to give priority to individual operations like get(), put() and remove() over methods which operate on the complete map, like isEmpty() or size(). This is done because the chances of these aggregate methods being called are (in general) lower than those of the individual operations.
Synchronization for size() is not needed here. We can get the size by calling the concurrentHashMap.size() method. This method may return a stale value, as another thread might modify the map in the meantime, but this staleness is explicitly accepted, since these aggregate operations are deprioritized.
ConcurrentHashMap is fail-safe: its iterators won't throw ConcurrentModificationException, which makes it work well for multi-threaded operations.
HashMap, by contrast, is not designed for concurrent use; if the map is structurally modified while you iterate over it, the iterator fails fast with a ConcurrentModificationException.
But in ConcurrentHashMap the locking happens at bucket level, so that exception does not arise.
So, to answer your question here: checking the size of a ConcurrentHashMap needs no extra synchronization, but the value keeps changing based on whatever modifications your code makes to the map. Its size() method has the same signature as HashMap's.

Using putIfAbsent like a short circuit operator

Is it possible to use putIfAbsent or any of its equivalents like a short-circuit operator?
myConcurrentMap.putIfAbsent(key,calculatedValue)
I want that, if there is already a calculated value, it shouldn't be calculated again.
By default, putIfAbsent would still do the calculation every time, even though it will not actually store the value again.
Java doesn't allow any form of short-circuiting save the built-in cases, sadly - all method calls result in the arguments being fully evaluated before control passes to the method. Thus you couldn't do this with "normal" syntax; you'd need to manually wrap up the calculation inside a Callable or similar, and then explicitly invoke it.
In this case I find it difficult to see how it could work anyway, though. putIfAbsent works on the basis of being an atomic, non-blocking operation. If it were to do what you want, the sequence of events would roughly be:
Check if key exists in the map (this example assumes it doesn't)
Evaluate calculatedValue (probably expensive, given the context of the question)
Put result in map
It would be impossible for this to be non-blocking if the value didn't already exist at step two - two different threads calling this method at the same time could only perform correctly if blocking happened. At this point you may as well just use synchronized blocks with the flexibility of implementation that that entails; you can definitely implement what you're after with some simple locking, something like the following:
private final Map<K, V> map = ...;

public void myAdd(K key, Callable<V> valueComputation) throws Exception {
    synchronized (map) {
        if (!map.containsKey(key)) {
            // the computation only runs when the key is absent, and only in one thread
            map.put(key, valueComputation.call()); // Callable.call() may throw Exception
        }
    }
}
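Hypothetical usage, wrapping the expensive computation in a Callable so it is only evaluated under the lock when the key is missing (calculateValue stands in for your expensive computation):

myAdd(key, () -> calculateValue(key));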
You can put Future<V> objects into the map. Using putIfAbsent, only one object will be there, and computation of the final value will be performed by calling Future.get() (e.g. via the FutureTask + Callable classes). Check out Java Concurrency in Practice for a discussion of this technique. (Example code is also in this question here on SO.)
This way, your value is computed only once, and all threads get the same value. Access to the map isn't blocked, although access to the value (through Future.get()) will block until that value is computed by one of the threads.
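A minimal sketch of that technique, along the lines of the Memoizer from Java Concurrency in Practice (names here are illustrative, not the book's exact code; assumes the java.util.concurrent imports):

private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();

public V compute(K key, Callable<V> calculation) throws InterruptedException, ExecutionException {
    Future<V> future = cache.get(key);
    if (future == null) {
        FutureTask<V> task = new FutureTask<>(calculation);
        future = cache.putIfAbsent(key, task);   // only one task wins the race
        if (future == null) {
            future = task;
            task.run();                          // the winning thread performs the calculation
        }
    }
    return future.get();                         // others block here until the value is ready
}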
You could consider using a Guava ComputingMap:
ConcurrentMap<Key, Value> myConcurrentMap = new MapMaker()
        .makeComputingMap(
                new Function<Key, Value>() {
                    public Value apply(Key key) {
                        Value calculatedValue = calculateValue(key);
                        return calculatedValue;
                    }
                });
