Thread safe swap of entire map in Java


I'm implementing a thread-safe map in a Spring web service.
The requirements are as follows:
The map is read simultaneously by thousands of client threads.
The map's content has to be replaced entirely from time to time (about once per hour).
I've chosen ConcurrentHashMap as the thread-safe map, but it has no operation that simply swaps its content with a newer one, like std::map::swap() in C++.
(I assume that an atomic update of the entire content is required in a multi-threaded environment, but maybe I'm wrong.)
Is there an alternative map with swap?
Any suggestion or reply will be appreciated. Thanks.

If you don't need to mutate the map but only to replace it atomically, you could wrap the map in an AtomicReference and replace the reference in a single step. The threads would then hold a reference not to the map instance itself but to the surrounding AtomicReference instance.
class Example {
private final AtomicReference<Map<String, String>> mapRef = new AtomicReference<>(someInitialState);
private void consumerThread() {
// Get the current version of the map and look up a value from it.
String value = mapRef.get().get("Hello");
// Do something with value.
}
private void producerThread() {
// Time to replace the whole map for all threads
Map<String, String> newMap = calculateNewMap();
mapRef.set(newMap);
}
}
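A slightly fuller sketch of the same idea, in case it helps. The class name SwappableLookup and the String/String types are made up for illustration, and Map.copyOf assumes Java 10 or later (it also rejects null keys and values). The point it tries to show is that the published map is never mutated, so readers need no locking at all.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder class; name and key/value types are illustrative only.
class SwappableLookup {
    // Readers always go through the reference and never cache the map long-term.
    private final AtomicReference<Map<String, String>> mapRef;

    SwappableLookup(Map<String, String> initial) {
        // Map.copyOf (Java 10+) produces an unmodifiable snapshot.
        this.mapRef = new AtomicReference<>(Map.copyOf(initial));
    }

    // Called by reader threads: a single lookup against the current version.
    String lookup(String key) {
        return mapRef.get().get(key);
    }

    // For several related lookups, grab the snapshot once so that all of
    // them see the same version of the map.
    Map<String, String> snapshot() {
        return mapRef.get();
    }

    // Called by the refresh thread (for example once per hour).
    void replaceAll(Map<String, String> fresh) {
        mapRef.set(Map.copyOf(fresh));
    }
}
```

A reader that calls lookup twice may see two different versions if a swap happens in between; when that matters, take snapshot() once and query the returned map.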

Related

What is the difference between map.put and creating a new map?

I'm reading the source code of Sentinel, and I found that when the map needs a new entry, the code creates a new HashMap to replace the old one rather than calling map.put directly, like this:
public class NodeSelectorSlot extends AbstractLinkedProcessorSlot<Object> {
private volatile Map<String, DefaultNode> map = new HashMap<String, DefaultNode>(10);
@Override
public void entry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
throws Throwable {
DefaultNode node = map.get(context.getName());
if (node == null) {
synchronized (this) {
node = map.get(context.getName());
if (node == null) {
node = new DefaultNode(resourceWrapper, null);
// create a new hashmap
HashMap<String, DefaultNode> cacheMap = new HashMap<String, DefaultNode>(map.size());
cacheMap.putAll(map);
cacheMap.put(context.getName(), node);
map = cacheMap;
((DefaultNode) context.getLastNode()).addChild(node);
}
}
}
context.setCurNode(node);
fireEntry(context, resourceWrapper, node, count, prioritized, args);
}
...
}
What's the difference between them?

The code you are looking at fetches a Node from the map, creating and adding a new Node if one is not present.
Clearly, this operation needs to be thread-safe. The simple ways to implement this would be:
Lock the map and perform get and put operations while holding the lock.
Use a ConcurrentHashMap which has operations for doing this kind of thing atomically; e.g. computeIfAbsent.
The authors of this code have chosen a different approach. They are using so-called Double Checked Locking (DCL) to avoid doing the initial get while holding a lock. That is what this code does:
DefaultNode node = map.get(context.getName());
if (node == null) {
synchronized (this) {
node = map.get(context.getName());
...
The authors have decided that when they then need to add a new entry to the map they need to do it by replacing the entire map with a new one. On the face of it, that seems unnecessary. The map updates are being performed while holding the lock and the volatile adds a happens before that seems to ensure that the initial map.get call sees any recent writes to the HashMap.
But that reasoning is INCORRECT. The problem is that there is a small time window between fetching the map reference and the get call completing. During that time window, a simultaneous put operation may be updating the HashMap data structures. This is harmful because those changes could cause the get to read stale data (because there is no happens before relationship from the put writes to the get reads). Even worse, the put could trigger reconstruction of a hash chain or even an expansion of the hash array. The resulting behavior is (at least) outside of the HashMap spec, since HashMap is not defined to be thread-safe.
The authors' solution is to create a new HashMap with the existing entries and the new one, then update map with a single assignment. I haven't done a formal analysis, but I think that this approach is thread-safe.
In short, the reason that the code creates a new HashMap is to make the DCL approach thread-safe.
And if you ignore the thread-safety aspect, this approach is functionally equivalent to a simple put.
Finally, we need to consider whether the authors' approach is going to give optimal performance. The answer will depend on whether the number of cache entries stabilizes, and whether it is relatively small. One observation is that the cost of adding N entries to the cache is O(N^2) !! (Assuming that entries are never removed, as appears to be the case.)
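For illustration, here is a minimal generic sketch of the copy-and-replace pattern discussed above. It is not Sentinel's actual class; the name CopyOnWriteRegistry and the factory parameter are invented for this example. It shows why readers are safe (a published map is never modified again) and where the quadratic cost comes from (every insertion copies all existing entries).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical generic version of the pattern; not taken from Sentinel.
class CopyOnWriteRegistry<K, V> {
    // volatile: once the reference is published, readers see a fully built map.
    private volatile Map<K, V> map = new HashMap<>();

    V getOrCreate(K key, Function<? super K, ? extends V> factory) {
        V value = map.get(key);            // fast path, no lock taken
        if (value == null) {
            synchronized (this) {
                value = map.get(key);      // re-check while holding the lock
                if (value == null) {
                    value = factory.apply(key);
                    // Copy all entries, add the new one, then publish.
                    // The map that readers may be using is never mutated.
                    Map<K, V> copy = new HashMap<>(map);
                    copy.put(key, value);
                    map = copy;
                }
            }
        }
        return value;
    }

    int size() {
        return map.size();
    }
}
```

If the copying cost becomes a concern, ConcurrentHashMap.computeIfAbsent gives the same get-or-create behavior without copying.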

This is the so-called copy-on-write technique, which is intended to ensure thread safety. When reads greatly outnumber writes, it can be more efficient than mechanisms like ConcurrentHashMap.
Ref: https://github.com/alibaba/Sentinel/issues/1733

Implementing per-key or striped locking in a Map - best approach?

I came across this dilemma at work and wanted to see if there is a better solution... it feels like there should be an easier, cleaner answer.
Goal: Concurrently access a map with locks at the key level, not at the entire map level, to ensure atomicity while impacting performance as little as possible.
I have a Map which needs to be concurrent. *(Added) The map will be filled with an unknown number of entries over time. I have multiple readers and a single writer. The writer does a "check-then-put" and the reader does a simple get(). I need these to be atomic... but only at the key level. So for example, if the reader is checking for Key X, and the writer is writing to Key Y, I don't care if I miss the write to Key Y. If the reader and the writer are working on the same key, however, I need that to be atomic.
The easiest solution is to lock the whole map. But this seems like it would impact performance, since there are about 10,000 keys that will end up in the map. (If that doesn't seem like it would hurt performance because the size of the Map is relatively small, let's pretend the Map has many more keys, for argument's sake.)
As far as I know, ConcurrentHashMap will not guarantee the "per-key" atomic behavior I need.
The next solution that came to mind was to have an array of lock objects. You would index into that array of plain Object locks based on a hash of the original key. This would still have some contention, since you have fewer locks than keys in the original map. I'm aware that ConcurrentHashMap does a similar thing under the hood (striping) to provide concurrency (but not atomicity).
Is there an easier way to perform this type of per-key or striped locking?
Thanks.

This concern can come up when value generation is a time-consuming process. You don't want to lock the whole map and find a missing value, and keep the map locked while you generate the value. You could release the map during generation, but then you could have two simultaneous misses and generations.
Instead of directly storing the value with the key, store it inside a reference object:
public class Ref<T>
{
private T value;
public T getValue()
{
return value;
}
public void setValue(T value)
{
this.value = value;
}
}
So if you originally had a map of Map<String, MyThing>, you instead use Map<String, Ref<MyThing>>. Don't bother with a concurrent implementation, just use HashMap or LinkedHashMap or whatever.
Now you can lock the map to find or create a reference holder, and then release the map. Following that, you can lock the reference to find or create the value object:
String key; // key you're looking up
Map<String, Ref<MyThing>> map; // the map
// Find the reference container, create it if necessary
Ref<MyThing> ref;
synchronized(map)
{
ref = map.get(key);
if (ref == null)
{
ref = new Ref<MyThing>();
map.put(key, ref);
}
}
// Map is released at this point
// Now get the value, creating if necessary
MyThing result;
synchronized(ref)
{
result = ref.getValue();
if (result == null)
{
result = generateMyThing();
ref.setValue(result);
}
}
// result == your existing or new object
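For comparison, the lock-array (striping) idea from the question could be sketched roughly as follows. The class name StripedMap, the method names, and the stripe count are all assumptions made for this example. The check-then-put and the get for a given key both run under that key's stripe lock, so they are atomic with respect to each other, while keys on other stripes proceed independently.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of striped locking; names and sizes are illustrative.
class StripedMap<K, V> {
    private final Object[] stripes;
    // A concurrent map is used underneath so that operations on different
    // stripes cannot corrupt the shared table.
    private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();

    StripedMap(int stripeCount) {
        stripes = new Object[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new Object();
        }
    }

    // Equal keys always map to the same lock object.
    Object lockFor(Object key) {
        // floorMod keeps the index non-negative for negative hash codes.
        return stripes[Math.floorMod(key.hashCode(), stripes.length)];
    }

    V get(K key) {
        synchronized (lockFor(key)) {
            return data.get(key);
        }
    }

    // Check-then-put, atomic with respect to other operations on the same key.
    V putIfMissing(K key, Supplier<? extends V> factory) {
        synchronized (lockFor(key)) {
            V existing = data.get(key);
            if (existing == null) {
                existing = factory.get();
                data.put(key, existing);
            }
            return existing;
        }
    }
}
```

Note that since Java 8, ConcurrentHashMap.computeIfAbsent and compute are documented to run atomically for the affected key, so for a simple check-then-put they may make a hand-written lock array unnecessary.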

Does map need to be synchronized if for each entry only one thread is accessing it?

I have a map. Lets say:
Map<String, Object> map = new HashMap<String, Object>();
Multiple threads are accessing this map, however each thread accesses only its own entries in the map. This means that if thread T1 inserts object A into the map, it is guaranteed that no other thread will access object A. Finally thread T1 will also remove object A.
It is guaranteed as well that no thread will iterate over the map.
Does this map need to be synchronized? If yes how would you synchronize it? (ConcurrentHashMap, Collections.synchronizedMap() or synchronized block)

Yes, you would need synchronization, or a concurrent map. Just think about the size of the map: two threads could add an element in parallel, and both increment the size. If you don't synchronize the map, you could have a race condition and it would result in an incorrect size. There are many other things that could go wrong.
But you could also use a different map for each thread, couldn't you?
A ConcurrentHashMap is typically faster than a synchronized HashMap. But the choice depends on your requirements.
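A small sketch of the scenario, with made-up names (OwnKeyPerThreadDemo, fill, demo): several threads insert entries under keys that only they use. With a ConcurrentHashMap the final size is exactly threads × perThread. Passing a plain HashMap to the same method gives no such guarantee, because the threads still share the internal table and the size counter.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; names and numbers are illustrative only.
class OwnKeyPerThreadDemo {
    // Starts `threads` workers; each inserts `perThread` entries under keys
    // that no other worker touches. Returns the map's size afterwards.
    static int fill(Map<String, Object> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put("t" + id + "-" + i, new Object());
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    // Safe variant: the concurrent map keeps its internal state consistent.
    static int demo() {
        return fill(new ConcurrentHashMap<>(), 8, 10_000);
    }
}
```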

If you're sure that there's only one entry per thread and no thread iterates or searches through the map, then why do you need a map?
You can use a ThreadLocal object instead, which will contain thread-specific data. If you need to keep string-object pairs, you can create a special class for this pair and keep it inside a ThreadLocal field.
class Foo {
String key;
Object value;
....
}
//below was your Map declaration
//Map<String, Object> map = ...
//Use here ThreadLocal instead
final ThreadLocal<Foo> threadLocalFoo = new ThreadLocal<Foo>();
...
threadLocalFoo.set(new Foo(...));
threadLocalFoo.get() //returns your object
threadLocalFoo.remove() //clears threadLocal container
You can find more information on ThreadLocal in the ThreadLocal javadocs.

I would say yes. Getting the data is not the issue; adding the data is.
The HashMap has a series of buckets (lists); when you put data into the HashMap, the hashCode is used to decide which bucket the item goes into, and the item is added to that list.
So it can happen that two items are added to the same bucket at the same time and, due to a race condition, only one of them is actually stored.

You have to synchronize write operations on the map. If, after the map has been initialized, no thread is going to insert or delete entries, you don't need to synchronize it.
However, in your case (where each thread has its own entry) I'd recommend using ThreadLocal, which allows you to have a "local" object which will have different values per thread.
Hope it helps

For this scenario I think ConcurrentHashMap is the best Map, because both Collections.synchronizedMap() and a synchronized block (which are basically the same) have more overhead.
If you want to insert entries from different threads, and not only read them, you have to synchronize access because of the way HashMap works.

- First of all, it's always good practice to write thread-safe code, especially in cases like the one above, though it is not required in all situations.
- It's better to use Hashtable, which is a synchronized Map, or java.util.concurrent.ConcurrentHashMap<K,V>.

How do I create a thread-safe write-once read-many Map in Java?

I have a Java class with private static maps used to store information during the execution of the application. I would only ever put a key/value once into the Map but the map value may be read many times.
So the way I have it now, the code does a get and checks for null. If null then I gather the data I need and put it into the map. Subsequent calls by the client code would be guaranteed to get the value from the map. The client would not need to do null checks.
The reason for this is that getting the data to put in the map could be expensive so I only want to do this once per key.
Is there any pattern for this? I can't seem to find anything out there that discusses this situation.
TIA
Here's a totally non-thread-safe example:
public class TestWorm {
private static Map<String, Object> map = new HashMap<String, Object>(32);
public Object getValue(String key) {
if (map.get(key) != null) {
return map.get(key);
}
// do some process to get Object
Object o = new Object();
map.put(key, o);
return o;
}
}

Your best bet is ConcurrentHashMap and its putIfAbsent method.
Your implementation is not thread-safe. To make it thread-safe, declare the field final and change the implementation class to ConcurrentHashMap. That is enough if you don't mind that values will occasionally be computed and stored more than once (this would be rare: it happens only when two threads enter get simultaneously and the corresponding value has not yet been computed). This is usually a good trade-off, as the most common case is that the value is already in the cache, and in that case you need no extra synchronization to retrieve it.
If you want to make sure that there is at most one value present in your application for a given key, you can further extend your implementation by using putIfAbsent instead of put.
Another way to implement this would be to use guava library: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/MapMaker.html
Yet another way to do that would be to use computeIfAbsent of ConcurrentHashMapV8 (http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/ConcurrentHashMapV8.java?view=markup) which will someday appear in Java 8.
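The computeIfAbsent method mentioned above became part of java.util.concurrent.ConcurrentHashMap in Java 8. A minimal sketch of the write-once, read-many cache using it might look like this; the class name WriteOnceCache, the expensiveLoad placeholder, and the LOADS counter are invented for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cache class; names are illustrative only.
class WriteOnceCache {
    private static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<>();

    // Counts loader calls, only to make the once-per-key behavior observable.
    static final AtomicInteger LOADS = new AtomicInteger();

    static Object getValue(String key) {
        // ConcurrentHashMap applies the function at most once per absent key,
        // and callers racing on the same key all receive the stored value.
        return CACHE.computeIfAbsent(key, WriteOnceCache::expensiveLoad);
    }

    // Placeholder for the expensive work described in the question.
    private static Object expensiveLoad(String key) {
        LOADS.incrementAndGet();
        return new Object();
    }
}
```

One caveat: the mapping function should be reasonably short and must not try to update the same map, because other updates to that part of the map may be blocked while it runs.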

Check out ConcurrentHashMap.putIfAbsent(), which provides all you need for storing and retrieving values in a thread-safe way.

This is a typical caching pattern.
For example, the Spring Framework has offered a good caching API since 3.1.
But there are also specialized frameworks such as Ehcache.

Java concurrent access to field, trick to not use volatile

Preface: I know that in most cases using a volatile field won't cause any measurable performance penalty, but this question is more theoretical and targeted at a design with extremely high concurrency support.
I've got a field that is a List<Something> which is filled after construction. To save some performance I would like to convert the List into a read-only Map. Doing so at any later point requires at least a volatile Map field to make the changes visible to all threads.
I was thinking of doing the following:
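Map<Object, Something> map;
public Something get(Object key){
if(map==null){
// Build the map in a local variable first, then publish it.
Map<Object, Something> temp = new HashMap<>();
for(Something value : super.getList()){
temp.put(value.getKey(),value);
}
map = temp;
}
return map.get(key);
}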
This could cause multiple threads to generate the map even if they enter the get block in a serialized way. This would be no big issue, since the threads would merely work on different but identical instances of the map. What worries me more is:
Is it possible that one thread assigns the new temp map to the map field, and then a second thread sees that map!=null and therefore accesses the map field without generating a new one, but to my surprise finds that the map is empty, because the put operations were not yet pushed to some shared memory area?
Answers to comments:
The threads only modify the temporary map; after that, it is read-only.
I must convert a List to a Map because of some special JAXB setup which makes it infeasible to have a Map to begin with.

Is it possible that one thread assigns the new temp map to the map field, and then a second thread sees that map!=null and therefore accesses the map field without generating a new one, but to my surprise finds that the map is empty, because the put operations were not yet pushed to some shared memory area?
Yes, this is absolutely possible; for example, an optimizing compiler could actually completely get rid of the local temp variable, and just use the map field the whole time, provided it restored map to null in the case of an exception.
Similarly, a thread could also see a non-null, non-empty map that is nonetheless not fully populated. And unless your Map class is carefully designed to allow simultaneous reads and writes (or uses synchronized to avoid the issue), you could also get bizarre behavior if one thread is calling its get method while another is calling its put.

Can you create your Map in the ctor and declare it final? Provided you don't leak the map so others can modify it, that should suffice to make your get() safely sharable by multiple threads.
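A rough sketch of that suggestion, assuming the list is already available when the object is constructed (which may not hold for the JAXB setup in the question). Item stands in for the question's Something type, and ItemIndex is an invented name. Because the field is final, fully populated inside the constructor, and never modified afterwards, other threads that obtain the ItemIndex see the complete map without volatile or locking.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the question's Something type.
class Item {
    private final String key;

    Item(String key) {
        this.key = key;
    }

    String getKey() {
        return key;
    }
}

// Hypothetical index class built once from the list.
class ItemIndex {
    // final and fully populated before the constructor returns.
    private final Map<String, Item> byKey;

    ItemIndex(List<Item> items) {
        Map<String, Item> temp = new HashMap<>();
        for (Item item : items) {
            temp.put(item.getKey(), item);
        }
        this.byKey = temp; // never modified after this point
    }

    Item get(String key) {
        return byKey.get(key);
    }
}
```

The guarantee relies on the constructor not leaking `this` and on nobody mutating the map later, as the answer above points out.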

If you are really in doubt about whether another thread could read a "half completed" map
(I don't think so, but never say never ;-), you may try this.
Here the map field is either null or complete:
static class MyMap extends HashMap {
MyMap (List pList) {
for(Object value : pList){
put(value.getKey(), value);
}
}
}
MyMap map;
public Object get(Object key){
if(map==null){
map = new MyMap (super.getList());
}
return map.get(key);
}
Or does someone see a newly introduced problem?

In addition to the visibility concerns previously mentioned, there is another problem with the original code, viz. it can throw a NullPointerException here:
return this.map.get(key)
Which is counter-intuitive, but that is what you can expect from incorrectly synchronized code.
Sample code to prevent this:
Map temp;
if ((temp = this.map) == null)
{
temp = new ImmutableMap(getList());
this.map = temp;
}
return temp.get(key);
