Implementing per-key or striped locking in a Map - best approach?

I came across this dilemma at work and wanted to see if there is a better solution... it feels like there should be an easier, cleaner answer.
Goal: Concurrently access a map with locks at the key level, not at the entire map level, to ensure atomicity while impacting performance as little as possible.
I have a Map which needs to be concurrent. *(Added) The map will be filled with an unknown number of entries over time. I have multiple readers and a single writer. The writer does a "check-then-put" and the reader does a simple get(). I need these to be atomic... but only at the key level. So for example, if the reader is checking for Key X, and the writer is writing to Key Y, I don't care if I miss the write to Key Y. If the reader and writer are working on the same key, however, I need that to be atomic.
The easiest solution is to lock the whole map. But this seems like it would impact performance, since there are about 10,000 keys that will end up in the map. (If that doesn't seem like it would hurt performance because the size of the Map is relatively small, let's pretend the Map has many more keys, for argument's sake.)
As far as I know, ConcurrentHashMap will not guarantee the "per-key" atomic behavior I need.
The next solution that came to mind was to have an array of lock objects. You would index into that array of lock Objects based on a hash of the original key. This would still have some contention, since you have fewer locks than keys in the original map. I'm aware that ConcurrentHashMap does a similar thing under the hood (striping) to provide concurrency (but not atomicity).
Is there an easier way to perform this type of per-key or striped locking?
Thanks.

This concern can come up when value generation is a time-consuming process. You don't want to lock the whole map and find a missing value, and keep the map locked while you generate the value. You could release the map during generation, but then you could have two simultaneous misses and generations.
Instead of directly storing the value with the key, store it inside a reference object:
public class Ref<T>
{
    private T value;
    public T getValue()
    {
        return value;
    }
    public void setValue(T value)
    {
        this.value = value;
    }
}
So if you originally had a map of Map<String, MyThing>, you instead use Map<String, Ref<MyThing>>. Don't bother with a concurrent implementation, just use HashMap or LinkedHashMap or whatever.
Now you can lock the map to find or create a reference holder, and then release the map. Following that, you can lock the reference to find or create the value object:
String key; // key you're looking up
Map<String, Ref<MyThing>> map; // the map
// Find the reference container, create it if necessary
Ref<MyThing> ref;
synchronized(map)
{
    ref = map.get(key);
    if (ref == null)
    {
        ref = new Ref<MyThing>();
        map.put(key, ref);
    }
}
// Map is released at this point
// Now get the value, creating if necessary
MyThing result;
synchronized(ref)
{
    result = ref.getValue();
    if (result == null)
    {
        result = generateMyThing();
        ref.setValue(result);
    }
}
// result == your existing or new object
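The lock-array idea from the question can also be written down directly. This is only a minimal sketch, assuming a fixed stripe count and a ConcurrentHashMap as the backing store (a plain HashMap would not be safe here, because writers holding different stripe locks could modify it at the same time). The class name StripedLockMap and its method names are hypothetical, and the factory is assumed never to return null.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical helper: per-stripe locks guarding compound operations on a map.
class StripedLockMap<K, V> {
    private final Object[] locks;
    // The backing map must itself be thread-safe: different stripes may write concurrently.
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();

    StripedLockMap(int stripes) {
        locks = new Object[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
        }
    }

    // Equal keys always select the same lock; floorMod avoids negative indexes.
    private Object lockFor(K key) {
        return locks[Math.floorMod(key.hashCode(), locks.length)];
    }

    V get(K key) {
        synchronized (lockFor(key)) {
            return map.get(key);
        }
    }

    // Check-then-put, atomic with respect to other operations on the same stripe.
    V getOrCreate(K key, Supplier<V> factory) {
        synchronized (lockFor(key)) {
            V value = map.get(key);
            if (value == null) {
                value = factory.get();
                map.put(key, value);
            }
            return value;
        }
    }
}
```

Keys that hash to the same stripe still contend with each other, so the stripe count trades memory for contention. If a library dependency is acceptable, Guava's Striped class packages the same idea.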

Related

What is the difference between map.put and creating a new map?

I'm reading the source code of Sentinel, and I found that when the map needs a new entry, the code creates a new HashMap to replace the old one rather than calling map.put directly, like this:
public class NodeSelectorSlot extends AbstractLinkedProcessorSlot<Object> {
    private volatile Map<String, DefaultNode> map = new HashMap<String, DefaultNode>(10);
    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
        throws Throwable {
        DefaultNode node = map.get(context.getName());
        if (node == null) {
            synchronized (this) {
                node = map.get(context.getName());
                if (node == null) {
                    node = new DefaultNode(resourceWrapper, null);
                    // create a new hashmap
                    HashMap<String, DefaultNode> cacheMap = new HashMap<String, DefaultNode>(map.size());
                    cacheMap.putAll(map);
                    cacheMap.put(context.getName(), node);
                    map = cacheMap;
                    ((DefaultNode) context.getLastNode()).addChild(node);
                }
            }
        }
        context.setCurNode(node);
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }
    ...
}
What's the difference between them?

The code you are looking at fetches a Node from the map, creating and adding a new Node if one is not present.
Clearly, this operation needs to be thread-safe. The simple ways to implement this would be:
Lock the map and perform get and put operations while holding the lock.
Use a ConcurrentHashMap which has operations for doing this kind of thing atomically; e.g. computeIfAbsent.
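For comparison, the second option might look roughly like the sketch below. NodeCache and its nested Node class are hypothetical stand-ins, not Sentinel's actual types; the only library call relied on is ConcurrentHashMap.computeIfAbsent, which runs the mapping function at most once per absent key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical cache that creates one node per context name.
class NodeCache {
    // Hypothetical stand-in for a node type such as DefaultNode.
    static final class Node {
        final String name;

        Node(String name) {
            this.name = name;
        }
    }

    private final ConcurrentMap<String, Node> nodes = new ConcurrentHashMap<>();

    // Atomically returns the existing node or creates and stores a new one.
    Node nodeFor(String contextName) {
        return nodes.computeIfAbsent(contextName, Node::new);
    }
}
```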
The authors of this code have chosen a different approach. They are using so-called Double Checked Locking (DCL) to avoid doing the initial get while holding a lock. That is what this code does:
DefaultNode node = map.get(context.getName());
if (node == null) {
synchronized (this) {
node = map.get(context.getName());
...
The authors have decided that when they then need to add a new entry to the map they need to do it by replacing the entire map with a new one. On the face of it, that seems unnecessary. The map updates are being performed while holding the lock and the volatile adds a happens before that seems to ensure that the initial map.get call sees any recent writes to the HashMap.
But that reasoning is INCORRECT. The problem is that there is a small time window between fetching the map reference and the get call completing. During that time window, a simultaneous put operation may be updating the HashMap data structures. This is harmful because those changes could cause the get to read stale data (because there is no happens before relationship from the put writes to the get reads). Even worse, the put could trigger reconstruction of a hash chain or even an expansion of the hash array. The resulting behavior is (at least) outside of the HashMap spec, since HashMap is not defined to be thread-safe.
The authors' solution is to create a new HashMap with the existing entries and the new one, then update map with a single assignment. I haven't done a formal analysis, but I think that this approach is thread-safe.
In short, the reason that the code creates a new HashMap is to make the DCL approach thread-safe.
And if you ignore the thread-safety aspect, this approach is functionally equivalent to a simple put.
Finally, we need to consider whether the authors' approach is going to give optimal performance. The answer will depend on whether the number of cache entries stabilizes, and whether it is relatively small. One observation is that the cost of adding N entries to the cache is O(N^2) !! (Assuming that entries are never removed, as appears to be the case.)
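Stripped of the Sentinel specifics, the pattern discussed above can be sketched as follows. This is an illustration under assumptions, not the library's code: CopyOnWriteCache, getOrCreate and the factory parameter are hypothetical names, and the factory is assumed never to return null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical copy-on-write cache: readers never lock, writers replace the whole map.
class CopyOnWriteCache<K, V> {
    // Every map stored here is treated as immutable once published.
    private volatile Map<K, V> map = new HashMap<>();

    V getOrCreate(K key, Function<K, V> factory) {
        V value = map.get(key);             // lock-free read of the current snapshot
        if (value == null) {
            synchronized (this) {
                value = map.get(key);       // re-check while holding the lock
                if (value == null) {
                    value = factory.apply(key);
                    Map<K, V> copy = new HashMap<>(map); // O(size) copy on every insert
                    copy.put(key, value);
                    map = copy;             // the volatile write publishes the finished copy
                }
            }
        }
        return value;
    }

    int size() {
        return map.size();
    }
}
```

Because each insert copies every existing entry, adding N entries costs O(N^2) in total, which is the trade-off mentioned above.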

This is so-called copy-on-write, which is intended to ensure thread safety. When reads greatly outnumber writes, it can be more efficient than mechanisms like ConcurrentHashMap.
Ref: https://github.com/alibaba/Sentinel/issues/1733

Thread safe swap of entire map in Java

I'm implementing a thread-safe map in a Spring web service.
The map is used like this:
The map is read simultaneously by thousands of client threads.
The map's content occasionally has to be replaced entirely (about once per hour).
I've chosen ConcurrentHashMap as the thread-safe map, but it has no operation that simply swaps its content with a newer one, like std::map::swap() in C++.
(I thought that an atomic update of the entire content is required in a multi-threaded environment; maybe I'm wrong.)
Is there an alternative map with swap?
Any suggestion or reply will be appreciated. Thanks.

If you don't need to mutate the map in place and only need to replace it atomically, you can wrap the map in an AtomicReference and replace the reference in a single step. The threads then hold a reference to the surrounding AtomicReference instance rather than to the map instance itself.
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
class Example {
    // someInitialState and calculateNewMap() are placeholders for your own code.
    private final AtomicReference<Map<String, String>> mapRef = new AtomicReference<>(someInitialState);
    private void consumerThread() {
        // Get the current version of the map and look up a value from it.
        String value = mapRef.get().get("Hello");
        // Do something with value.
    }
    private void producerThread() {
        // Time to replace the whole map for all threads
        Map<String, String> newMap = calculateNewMap();
        mapRef.set(newMap);
    }
}
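One detail worth spelling out: a reader that needs several values from the same version of the map should fetch the reference once and keep using that snapshot, otherwise a swap could land between two lookups. A minimal sketch, assuming Java 10 or later (for Map.of and Map.copyOf); the class name SwappableConfig and the "host"/"port" keys are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder whose whole content is replaced at once.
class SwappableConfig {
    private final AtomicReference<Map<String, String>> ref =
            new AtomicReference<>(Map.of());

    // Publish an immutable copy so later changes to 'fresh' cannot leak into readers.
    void replaceAll(Map<String, String> fresh) {
        ref.set(Map.copyOf(fresh));
    }

    // Both lookups use the same snapshot, so they always come from one version.
    String endpoint() {
        Map<String, String> snapshot = ref.get();
        return snapshot.get("host") + ":" + snapshot.get("port");
    }
}
```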

Can we use Synchronized for each entry instead of ConcurrentHashMap?

This is the problem: we want a hash table whose entries are thread-safe.
Suppose I have a hash table of <String, Long>, and I want to increase the value of one of the entries thread safely: is the following OK?:
HashMap<String , Long> hashTable = new HashMap<String, Long>();
Then whenever I want to increase an entry:
synchronized (hashTable.get("key"))
{
    Long value = hashTable.get("key");
    value++;
    hashTable.put("key", value);
}
I think it is better than ConcurrentHashMap, as it locks just one entry, unlike ConcurrentHashMap, which uses buckets and locks a group of entries together.
More importantly, I don't know how to increment it safely using ConcurrentHashMap. For example, I think the following code is not correct:
ConcurrentHashMap<String , Long> hashTable = new ConcurrentHashMap<String, Long>();
Long value = hashTable.get("key");
value++;
hashTable.put("key", value);
I think it is not correct, because two threads can read the key one after another, and write one after another and end up in a wrong value.
What do you think guys?

Your proposed approach is not thread-safe because the initial hashTable.get() operation -- by which you obtain the object on which you intend to synchronize -- is not itself synchronized relative to other threads put()ing a value associated with the same key. Moreover, your code does not account for the possibility of new values being added to the map or keys being removed from the map (so-called "structural modifications"). If ever that can happen, regardless of key, then those actions have to be synchronized with respect to all other accesses to the map.
You are right, however, that ConcurrentHashMap does not solve these problems either. It is thread-safe with respect to the individual operations it provides, which include some that Map itself does not define, but series of operations that must be performed as an uninterrupted unit still need to be protected by synchronization.
I suggest a slightly different approach: use a ConcurrentHashMap with AtomicLong, which is mutable, as your value type instead of Long:
ConcurrentHashMap<String, AtomicLong> map;
Then, to update the value for a key, even if you're not confident that the key already has an entry in the map, you do this:
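AtomicLong fresh = new AtomicLong(0);
AtomicLong previous = map.putIfAbsent(key, fresh);
AtomicLong value = (previous != null) ? previous : fresh; // putIfAbsent returns null if the key was absent
long updatedValue = value.incrementAndGet();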
The putIfAbsent() ensures that value objects are not clobbered by conflicting put operations. The use of AtomicLong avoids the need for multiple operations to be jointly synchronized, because only one map access is needed -- the value retrieved is shared by all threads accessing it, and can itself be atomically updated without further accessing the map.
If you can be certain that the map already has a mapping for the given key, then you can simply do this:
AtomicLong value = map.get(key);
long updatedValue = value.incrementAndGet();
One way or the other, I think this is about the best you can do for the operations you describe and imply.
Update:
You could even consider combining the two approaches like this:
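AtomicLong value = map.get(key);
if (value == null) {
    AtomicLong fresh = new AtomicLong(0);
    value = map.putIfAbsent(key, fresh);
    if (value == null) {
        value = fresh; // putIfAbsent returns null when this thread installed the mapping
    }
}
long updatedValue = value.incrementAndGet();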
That supposes that it will be comparatively rare that there is not already a mapping for the given key, and it avoids creating a new AtomicLong in that case. If no mapping is found then the map must be accessed a second time to ensure that there is a mapping and to get the corresponding value, but here we still need putIfAbsent() if we want to avoid synchronization, because it is possible for two threads to both try to add a mapping for the same key, at about the same time. That's more costly when a new entry needs to be added, but it's possible that it would turn out to be less costly on average than my first suggestion. As with any performance question, however, it is essential to test.
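On Java 8 and later there is also a way to keep plain Long values, since ConcurrentHashMap.merge performs the read-modify-write atomically for a single key. The sketch below is an alternative to the AtomicLong approach, not part of it; HitCounter and its methods are hypothetical names, and hammer() exists only to exercise the counter from several threads.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical counter keyed by String.
class HitCounter {
    private final ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();

    // merge() inserts 1 if the key is absent, otherwise atomically adds 1 to the old value.
    long increment(String key) {
        return counts.merge(key, 1L, Long::sum);
    }

    long get(String key) {
        return counts.getOrDefault(key, 0L);
    }

    // Demo helper: several threads increment the same key concurrently.
    static long hammer(int threads, int perThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment("key");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get("key");
    }
}
```

Whether this beats AtomicLong values under heavy contention is something to measure rather than assume.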

Java concurrent access to field, trick to not use volatile

Preface: I know that in most cases using a volatile field won't yield any measurable performance penalty, but this question is more theoretical and targeted towards a design with extremely high concurrency support.
I've got a field that is a List<Something> which is filled after construction. To save some performance I would like to convert the List into a read-only Map. Doing so at any point requires at least a volatile Map field to make the changes visible to all threads.
I was thinking of doing the following:
Map map;
public Object get(Object key){
    if(map==null){
        Map temp = new Map();
        for(Object value : super.getList()){
            temp.put(value.getKey(),value);
        }
        map = temp;
    }
    return map.get(key);
}
This could cause multiple threads to generate the map, even if they enter the get block in a serialized way. That would be no big issue, since the threads would just work on different but identical instances of the map. What worries me more is:
Is it possible that one thread assigns the new temp map to the map field, and then a second thread sees that map!=null and therefore accesses the map field without generating a new one, but to my surprise finds that the map is empty, because the put operations were not yet pushed to some shared memory area?
Answers to comments:
The threads only modify the temporary map; after that it is read-only.
I must convert a List to a Map because of some special JAXB setup which doesn't make it feasible to have a Map to begin with.

Is it possible that one thread assigns the new temp map to the map field, and then a second thread sees that map!=null and therefore accesses the map field without generating a new one, but to my surprise finds that the map is empty, because the put operations were not yet pushed to some shared memory area?
Yes, this is absolutely possible; for example, an optimizing compiler could actually completely get rid of the local temp variable, and just use the map field the whole time, provided it restored map to null in the case of an exception.
Similarly, a thread could also see a non-null, non-empty map that is nonetheless not fully populated. And unless your Map class is carefully designed to allow simultaneous reads and writes (or uses synchronized to avoid the issue), you could also get bizarre behavior if one thread is calling its get method while another is calling its put.

Can you create your Map in the ctor and declare it final? Provided you don't leak the map so others can modify it, that should suffice to make your get() safely sharable by multiple threads.
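If building the map at construction time is an option, the idea could look like the sketch below. ItemIndex and Item are hypothetical types standing in for the question's classes. The guarantee relied on is the Java memory model's final-field semantics: as long as `this` does not escape during construction, any thread that obtains the object sees the fully populated map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical read-only index built once in the constructor.
class ItemIndex {
    // Hypothetical element type standing in for the question's "Something".
    static final class Item {
        final String key;
        final String payload;

        Item(String key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // final: safely published together with the ItemIndex instance.
    private final Map<String, Item> byKey;

    ItemIndex(List<Item> items) {
        Map<String, Item> temp = new HashMap<>();
        for (Item item : items) {
            temp.put(item.key, item);
        }
        // Never modified after this point, so concurrent reads are safe.
        byKey = Collections.unmodifiableMap(temp);
    }

    Item get(String key) {
        return byKey.get(key);
    }
}
```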

If you are really in doubt whether another thread could read a "half completed" map
(I don't think so, but never say never ;-), you may try this.
The map is either null or complete:
static class MyMap extends HashMap {
    MyMap (List pList) {
        for(Object value : pList){
            put(value.getKey(), value);
        }
    }
}
MyMap map;
public Object get(Object key){
    if(map==null){
        map = new MyMap (super.getList());
    }
    return map.get(key);
}
Or does anyone see a newly introduced problem?

In addition to the visibility concerns previously mentioned, there is another problem with the original code, viz. it can throw a NullPointerException here:
return this.map.get(key);
Which is counter-intuitive, but that is what you can expect from incorrectly synchronized code.
Sample code to prevent this:
Map temp;
if ((temp = this.map) == null)
{
    temp = new ImmutableMap(getList());
    this.map = temp;
}
return temp.get(key);

modifying a ConcurrentHashMap and Synchronized ArrayList in same method

I have a collection of objects that is modified by one thread and read by another (more specifically the EDT). I needed a solution that gave me fast lookup and also fast indexing (by insertion order), so I'm using a ConcurrentHashMap with an accompanying ArrayList of the keys, so if I want to index an entry, I can index the List for the key and then use the returned key to get the value from the hash map. So I have a wrapper class that makes sure that when an entry is added, the mapping is added to the hash map and the key is added to the list at the same time, and similarly for removal.
I'm posting an example of the code in question:
private List<K> keys = Collections.synchronizedList(new ArrayList<K>(INITIAL_CAPACITY));
private ConcurrentMap<K, T> entries = new ConcurrentHashMap<K, T>(INITIAL_CAPACITY, .75f);
public synchronized T getEntryAt(int index){
    return entries.get(keys.get(index));
}
public synchronized void addOrReplaceEntry(K key, T value){
    T result = entries.get(key);
    if(result == null){
        entries.putIfAbsent(key, value);
        keys.add(key);
    }
    else{
        entries.replace(key, value); // replace with the new value, not the old one
    }
}
public synchronized boolean removeEntry(K key, T value){
    // Drop the key from the list only if the mapping was actually removed
    boolean removed = entries.remove(key, value);
    if(removed){
        keys.remove(key);
    }
    return removed;
}
public synchronized int getSize(){
    return keys.size();
}
My question is: am I losing all the benefits of using the ConcurrentHashMap (over a synchronized HashMap) by operating on it in synchronized methods? I have to synchronize the methods to safely modify/read from the ArrayList of keys (CopyOnWriteArrayList is not an option because a lot of modification happens...). Also, if you know of a better way to do this, that would be appreciated...

Yes, using a concurrent collection and a synchronized collection only inside synchronized blocks is a waste. You won't get the benefits of ConcurrentHashMap because only one thread will be accessing it at a time.
You could have a look at this implementation of a concurrent linked hashmap; I haven't used it, so I can't attest to its features.
One thing to consider would be switching from synchronized blocks to a ReadWriteLock to improve concurrent read-only performance.
I'm not really sure of the utility of providing a remove-at-index method; perhaps you could give some more details about the problem you are trying to solve?
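A rough sketch of the ReadWriteLock suggestion, under a few assumptions: IndexedMap is a hypothetical name, values are never null, and because a single lock guards both structures, a plain HashMap and ArrayList are sufficient inside it. Readers proceed in parallel; writers get exclusive access.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper: key lookup plus access by insertion index.
class IndexedMap<K, V> {
    private final List<K> keys = new ArrayList<>();
    private final Map<K, V> entries = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    V getEntryAt(int index) {
        lock.readLock().lock();            // many readers may hold this at once
        try {
            return entries.get(keys.get(index));
        } finally {
            lock.readLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return keys.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    void addOrReplaceEntry(K key, V value) {
        lock.writeLock().lock();           // exclusive: list and map change together
        try {
            if (entries.put(key, value) == null) {
                keys.add(key);             // new key: remember its insertion position
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean removeEntry(K key) {
        lock.writeLock().lock();
        try {
            if (entries.remove(key) == null) {
                return false;
            }
            keys.remove(key);              // List.remove(Object): linear scan
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With one writer and one reader thread the gain over synchronized methods may be small, so this is worth measuring before adopting.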

It seems that you only care about finding values by index. If so, dump the Map and just use a List. Why do you need the Map?

Mixing synchronized and concurrent collections the way you have done it is not recommended. Any reason why you are maintaining two copies of the stuff you are interested in? You can easily get a list of all the keys from the map anytime rather than maintaining a separate list.

Why not store the values in the list, and the key -> index mapping in the map?
That way getEntry needs only one lookup (in the list, which should be faster than a map anyway), and remove does not have to traverse the whole list. Synchronization is handled the same way.

You can get all access to the List keys onto the event queue using EventQueue.invokeLater. This will get rid of the synchronization. With all the synching you were not running much in parallel anyway. Also it means the getSize method will give the same answer for the duration of an event.
If you stick with synchronization instead of using invokeLater, at least get the entries hash table out of the synch block. Either way, you get more parallel processing. Of course, entries can now become out-of-synch with keys. The only down side is sometimes a key will come up with a null entry. With such a dynamic table this is unlikely to matter much.
Using the suggestion made by chrisichris to put the values in the list will solve this problem if it is one. In fact, this puts a nice wall between keys and entries; they are now used in completely separate ways. (If your only need for entries is to provide values to the JTable, you can get rid of it.) But entries (if still needed) should reference the entries, not contain an index; maintaining indexes there would be a hopeless task. And always remember that keys and entries are snapshots of "reality" (for lack of a better word) taken at different times.
