How to make access to a value in a Java HashMap synchronized? - java

Let's say I have a Java HashMap where the keys are strings or whatever, and the values are lists of other values, for example:
Map<String, List<String>> myMap = new HashMap<String, List<String>>();
// adding a value to it would look like this
myMap.put("catKey", new ArrayList<String>() {{ add("catValue1"); }});
If we have many threads adding and removing values from the lists (not changing the keys, just the values of the HashMap), is there a way to make only the access to the lists thread-safe, so that many threads can edit many values at the same time?

Use a synchronized or concurrent list implementation instead of ArrayList, e.g.
new Vector() (synchronized)
Collections.synchronizedList(new ArrayList<>()) (synchronized wrapper)
new CopyOnWriteArrayList<>() (concurrent)
new ConcurrentLinkedDeque<>() (concurrent, not a List)
The last is not a List, but it is useful if you don't actually need access by index (i.e. random access), because it performs better than the others.
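One caveat worth keeping in mind if you pick the synchronizedList wrapper (a minimal sketch of my own, not from the answer above): individual calls are synchronized, but iteration still has to be guarded manually:
List<String> list = Collections.synchronizedList(new ArrayList<>());
list.add("catValue1");
// Iteration is NOT automatically synchronized; lock the list manually while iterating
synchronized (list) {
    for (String s : list) {
        System.out.println(s);
    }
}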
Note that since you likely need concurrent insertion of the initial empty list into the map for a new key, you should use a ConcurrentHashMap for the Map itself instead of a plain HashMap.
Recommendation
Map<String, Deque<String>> myMap = new ConcurrentHashMap<>();
// Add new key/value pair
String key = "catKey";
String value = "catValue1";
myMap.computeIfAbsent(key, k -> new ConcurrentLinkedDeque<>()).add(value);
The above code is fully thread-safe when adding a new key to the map, and fully thread-safe when adding a new value to a list. The code doesn't spend time obtaining synchronization locks, and doesn't suffer the degradation that CopyOnWriteArrayList has when the list grows large.
The only problem is that it uses a Deque, not a List, but the reality is that most uses of List could just as easily use a Deque and only specify a List out of habit, so this is likely an acceptable change.

There is a ConcurrentHashMap class, which implements ConcurrentMap and can be used for thread-safe Map handling. compute, putIfAbsent, and merge all handle, in a thread-safe way, multiple threads trying to affect the same value at once.
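For example, a minimal sketch (my own addition, using the types from the question) with computeIfAbsent, so that the empty list is created and inserted only once per key:
ConcurrentMap<String, List<String>> myMap = new ConcurrentHashMap<>();
// computeIfAbsent runs atomically per key on ConcurrentHashMap, so two threads
// racing on a missing key still end up sharing a single list
myMap.computeIfAbsent("catKey", k -> Collections.synchronizedList(new ArrayList<>()))
     .add("catValue1");
// Note: the add() itself happens outside the atomic step, which is why the
// list needs to be a thread-safe one (a synchronizedList here)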

Firstly, use a ConcurrentHashMap, which will synchronize access to that particular bucket.
Secondly, atomic compound operations must be used; otherwise, between one thread's get and put, another thread can call put as well. Like below:
// wrong
if (myMap.get("catKey") == null) {
    myMap.put("catKey", new ArrayList<String>() {{ add("catValue1"); }});
}
// correct
myMap.compute("catKey", (key, value) -> {
    if (value == null) {
        value = new ArrayList<String>() {{ add("catValue1"); }};
    }
    return value;
});

Related

What is the difference between map.put and creating a new map?

I'm reading the source code of Sentinel, and I found that when the map needs a new entry added, it creates a new HashMap to replace the old one rather than using map.put directly, like this:
public class NodeSelectorSlot extends AbstractLinkedProcessorSlot<Object> {

    private volatile Map<String, DefaultNode> map = new HashMap<String, DefaultNode>(10);

    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
            throws Throwable {
        DefaultNode node = map.get(context.getName());
        if (node == null) {
            synchronized (this) {
                node = map.get(context.getName());
                if (node == null) {
                    node = new DefaultNode(resourceWrapper, null);
                    // create a new hashmap
                    HashMap<String, DefaultNode> cacheMap = new HashMap<String, DefaultNode>(map.size());
                    cacheMap.putAll(map);
                    cacheMap.put(context.getName(), node);
                    map = cacheMap;
                    ((DefaultNode) context.getLastNode()).addChild(node);
                }
            }
        }
        context.setCurNode(node);
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }
    ...
}
What's the difference between them?
The code you are looking at is fetching a Node from the map, creating and adding a new Node if one is not present.
Clearly, this operation needs to be thread-safe. The simple ways to implement this would be:
Lock the map and perform get and put operations while holding the lock.
Use a ConcurrentHashMap which has operations for doing this kind of thing atomically; e.g. computeIfAbsent.
The authors of this code have chosen a different approach. They are using so-called Double Checked Locking (DCL) to avoid doing the initial get while holding a lock. That is what this code does:
DefaultNode node = map.get(context.getName());
if (node == null) {
    synchronized (this) {
        node = map.get(context.getName());
        ...
The authors have decided that when they need to add a new entry to the map, they need to do it by replacing the entire map with a new one. On the face of it, that seems unnecessary. The map updates are performed while holding the lock, and the volatile adds a happens-before that seems to ensure that the initial map.get call sees any recent writes to the HashMap.
But that reasoning is INCORRECT. The problem is that there is a small time window between fetching the map reference and the get call completing. During that time window, a simultaneous put operation may be updating the HashMap data structures. This is harmful because those changes could cause the get to read stale data (because there is no happens-before relationship from the put writes to the get reads). Even worse, the put could trigger reconstruction of a hash chain or even an expansion of the hash array. The resulting behavior is (at least) outside of the HashMap spec, since HashMap is not defined to be thread-safe.
The authors' solution is to create a new HashMap with the existing entries and the new one, then update map with a single assignment. I haven't done a formal analysis, but I think that this approach is thread-safe.
In short, the reason that the code creates a new HashMap is to make the DCL approach thread-safe.
And if you ignore the thread-safety aspect, this approach is functionally equivalent to a simple put.
Finally, we need to consider whether the authors' approach will give optimal performance. The answer depends on whether the number of cache entries stabilizes, and whether it is relatively small. One observation is that the cost of adding N entries to the cache is O(N^2)! (Assuming that entries are never removed, as appears to be the case.)
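For comparison, here is a rough sketch of what option 2 above (a ConcurrentHashMap with computeIfAbsent) might look like; this is not what Sentinel actually does, and the addChild side effect is deliberately left out because it would need separate handling:
private final ConcurrentMap<String, DefaultNode> map =
        new ConcurrentHashMap<String, DefaultNode>(10);
// ...
// The mapping function is invoked at most once per missing key,
// so the get-or-create step is atomic without any explicit locking
DefaultNode node = map.computeIfAbsent(
        context.getName(),
        name -> new DefaultNode(resourceWrapper, null));
context.setCurNode(node);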
It is so-called copy-on-write, which is intended to ensure thread safety. When read operations greatly outnumber write operations, it is more efficient than mechanisms like ConcurrentHashMap.
Ref: https://github.com/alibaba/Sentinel/issues/1733

Can we use Synchronized for each entry instead of ConcurrentHashMap?

This is the problem: we want a hash table whose entries are thread-safe.
Suppose I have a hash table of <String, Long>, and I want to increase the value of one of the entries thread safely: is the following OK?:
HashMap<String , Long> hashTable = new HashMap<String, Long>();
Then whenever I want to increase an entry:
synchronized (hashTable.get("key")) {
    Long value = hashTable.get("key");
    value++;
    hashTable.put("key", value);
}
I think it is better than ConcurrentHashMap, as it locks just one entry, unlike ConcurrentHashMap, which uses buckets and locks a group of entries together.
More importantly, I don't know how to increment it safely using ConcurrentHashMap. For example, I think the following code is not correct:
ConcurrentHashMap<String , Long> hashTable = new ConcurrentHashMap<String, Long>();
Long value = hashTable.get("key");
value++;
hashTable.put("key", value);
I think it is not correct, because two threads can read the key one after another, then write one after another, and end up with a wrong value.
What do you think guys?
Your proposed approach is not thread-safe, because the initial hashTable.get() operation -- by which you obtain the object you intend to synchronize on -- is not itself synchronized relative to other threads putting a value for the same key. Moreover, your code does not account for the possibility of new values being added to the map or keys being removed from it (so-called "structural modifications"). If that can ever happen, regardless of key, then those actions have to be synchronized with respect to all other accesses to the map.
You are right, however, that ConcurrentHashMap does not solve these problems either. It is thread-safe with respect to the individual operations it provides, which include some that Map itself does not define, but series of operations that must be performed as an uninterrupted unit still need to be protected by synchronization.
I suggest a slightly different approach: use a ConcurrentHashMap with AtomicLong, which is mutable, as your value type instead of Long:
ConcurrentHashMap<String, AtomicLong> map;
Then, to update the value for a key, even if you're not confident that the key already has an entry in the map, you do this:
AtomicLong fresh = new AtomicLong(0);
AtomicLong value = map.putIfAbsent(key, fresh);   // returns null if 'fresh' was actually inserted
long updatedValue = (value == null ? fresh : value).incrementAndGet();
The putIfAbsent() ensures that value objects are not clobbered by conflicting put operations. The use of AtomicLong avoids the need for multiple operations to be jointly synchronized, because only one map access is needed -- the value retrieved is shared by all threads accessing it, and can itself be atomically updated without further accessing the map.
If you can be certain that the map already has a mapping for the given key, then you can simply do this:
AtomicLong value = map.get(key);
long updatedValue = value.incrementAndGet();
One way or the other, I think this is about the best you can do for the operations you describe and imply.
Update:
You could even consider combining the two approaches like this:
AtomicLong value = map.get(key);
if (value == null) {
    AtomicLong fresh = new AtomicLong(0);
    value = map.putIfAbsent(key, fresh);   // null means 'fresh' was inserted
    if (value == null) {
        value = fresh;
    }
}
long updatedValue = value.incrementAndGet();
That supposes that it will be comparatively rare that there is not already a mapping for the given key, and it avoids creating a new AtomicLong in that case. If no mapping is found then the map must be accessed a second time to ensure that there is a mapping and to get the corresponding value, but here we still need putIfAbsent() if we want to avoid synchronization, because it is possible for two threads to both try to add a mapping for the same key, at about the same time. That's more costly when a new entry needs to be added, but it's possible that it would turn out to be less costly on average than my first suggestion. As with any performance question, however, it is essential to test.
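For completeness (my own addition, not part of the answer above): on Java 8+ the same counter can also be kept as a plain Long and updated atomically with ConcurrentHashMap.merge, which removes the need for AtomicLong entirely:
ConcurrentMap<String, Long> counts = new ConcurrentHashMap<>();
// merge() runs the remapping function atomically per key and
// inserts 1L when there is no entry for the key yet
long updated = counts.merge("key", 1L, Long::sum);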

is Treemap inside ConcurrentHashMap thread safe?

I have a case of nested maps as follows:
private final static Map<String, TreeMap<Long,String>> outerConcurrentMap = new ConcurrentHashMap<>();
I know that ConcurrentHashMap is thread-safe, but I want to know about the TreeMaps this CHM is holding: are they also thread-safe inside the CHM?
The operations I am doing are:
If a specific key is not found --> create a new TreeMap and put it against the key.
If the key is found, then get the TreeMap and update it.
Retrieve the TreeMap from the CHM using get(K).
Retrieve data from the TreeMap using the tailMap(K, boolean) method.
clear() the CHM.
I want a thread-safe structure in this scenario. Is the above implementation thread-safe or not? If not then please suggest a solution.
Once you've done TreeMap<?, ?> tm = chm.get(key); you are no longer in thread-safe territory. In particular, if another thread updates the TreeMap (through the CHM or not), you may or may not see the change. Worse, the map that you have in tm may be corrupted...
One option would be to use a thread safe map, such as a ConcurrentSkipListMap.
Simple answer: no.
If your map is a ConcurrentHashMap, then all operations that affect the state of your hashmap are thread-safe. That does not at all mean that objects stored in that map become thread-safe.
How would that work; you create any kind of object, and by adding it to such a map, the object itself becomes thread-safe? And when you remove that object from the map, the "thread-unsafety" is restored?!
Assuming you're doing all of this in multiple threads, no, it's not thread-safe.
Ignore the fact that you've accessed the TreeMap via a ConcurrentHashMap - you end up with multiple threads accessing the TreeMap at the same time, including one or more of them writing to the map. That's not safe, because TreeMap isn't thread-safe for that situation:
Note that this implementation is not synchronized. If multiple threads access a map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
Some of your scenarios are thread-safe, some are not:
1. Yes, this is thread-safe, though other threads cannot see the newly created TreeMap until you put it into the CHM. But this should be implemented carefully to avoid race conditions - you should make sure that checking and insertion are performed atomically:
// create an empty treemap somewhere beforehand
TreeMap<Long, String> emptyMap = new TreeMap<>();
...
// On access, use the putIfAbsent method to make sure that if 2 threads
// try to get the same key without an associated value simultaneously,
// the same empty map is returned
if (outerConcurrentMap.putIfAbsent(key, emptyMap) == null) {
    // our map was inserted; prepare a fresh empty map for the next new key
    emptyMap = new TreeMap<>();
}
map = outerConcurrentMap.get(key);
2, 3, 4. No, you first need to lock the TreeMap with an explicit lock or by using synchronized. TreeMap is not synchronized by itself.
5. Yes, this operation is performed on the CHM, so it is thread-safe.
If you need a fully thread-safe sorted map, use ConcurrentSkipListMap instead. It is slower than TreeMap, but its internal structure doesn't need to lock the full collection during access, which makes it effective in a concurrent environment.
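As a rough sketch of that suggestion applied to the operations listed in the question (field and key names here are placeholders, not from the original post):
ConcurrentMap<String, ConcurrentNavigableMap<Long, String>> outerMap = new ConcurrentHashMap<>();

// 1 + 2: atomically create-or-get the inner map, then update it
outerMap.computeIfAbsent("someKey", k -> new ConcurrentSkipListMap<>()).put(42L, "someValue");

// 3 + 4: read path; tailMap on ConcurrentSkipListMap is thread-safe (weakly consistent)
ConcurrentNavigableMap<Long, String> inner = outerMap.get("someKey");
if (inner != null) {
    NavigableMap<Long, String> tail = inner.tailMap(40L, true);
}

// 5: clearing the outer map is a plain CHM operation
outerMap.clear();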
The TreeMap itself is not thread-safe, since only the ConcurrentHashMap's own methods are protected.
What you could do is the following:
private static final Map<String, SortedMap<Long, String>> outerConcurrentMap =
        new ConcurrentHashMap<String, SortedMap<Long, String>>();
static {
    // Just an example
    SortedMap<Long, String> map = Collections.synchronizedSortedMap(new TreeMap<Long, String>(...));
    outerConcurrentMap.put("...", map);
}

Does map need to be synchronized if for each entry only one thread is accessing it?

I have a map. Let's say:
Map<String, Object> map = new HashMap<String, Object>();
Multiple threads are accessing this map, however each thread accesses only its own entries in the map. This means that if thread T1 inserts object A into the map, it is guaranteed that no other thread will access object A. Finally thread T1 will also remove object A.
It is guaranteed as well that no thread will iterate over the map.
Does this map need to be synchronized? If yes how would you synchronize it? (ConcurrentHashMap, Collections.synchronizedMap() or synchronized block)
Yes, you would need synchronization, or a concurrent map. Just think about the size of the map: two threads could add an element in parallel, and both increment the size. If you don't synchronize the map, you could have a race condition and it would result in an incorrect size. There are many other things that could go wrong.
But you could also use a different map for each thread, couldn't you?
A ConcurrentHashMap is typically faster than a synchronized HashMap. But the choice depends on your requirements.
If you're sure that there's only one entry per thread and no thread iterates/searches through the map, then why do you need a map?
You can use a ThreadLocal object instead, which will contain thread-specific data. If you need to keep string-object pairs, you can create a special class for the pair and keep it inside a ThreadLocal field.
class Foo {
    String key;
    Object value;
    ....
}

// below was your Map declaration
// Map<String, Object> map = ...
// Use a ThreadLocal here instead
final ThreadLocal<Foo> threadLocalFoo = new ThreadLocal<Foo>();
...
threadLocalFoo.set(new Foo(...));
threadLocalFoo.get()    // returns your object
threadLocalFoo.remove() // clears the thread-local container
You can find more info on ThreadLocal in the ThreadLocal javadocs.
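As a small aside (Java 8+, and assuming Foo has a matching constructor, which is not shown in the snippet above), ThreadLocal.withInitial makes the per-thread creation a bit more compact:
// withInitial() lazily creates the per-thread Foo the first time get() is called
final ThreadLocal<Foo> threadLocalFoo =
        ThreadLocal.withInitial(() -> new Foo("key", new Object()));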
I would say yes. Getting the data is not the issue; adding the data is.
The HashMap has a series of buckets (lists); when you put data into the HashMap, the hashCode is used to decide which bucket the item goes into, and the item is added to that bucket's list.
So it can happen that two items are added to the same bucket at the same time and, due to a race condition, only one of them is effectively stored.
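To make the failure mode concrete, here is a small self-contained demonstration (my own sketch; results vary from run to run) of concurrent puts into an unsynchronized HashMap silently losing entries:
import java.util.*;
import java.util.concurrent.*;

public class LostUpdateDemo {
    public static void main(String[] args) throws Exception {
        Map<Integer, Integer> map = new HashMap<>();   // deliberately NOT synchronized
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 10_000; i++) {
            final int key = i;
            pool.submit(() -> map.put(key, key));      // concurrent puts race on the buckets
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(map.size());                // frequently prints less than 10000
    }
}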
You have to synchronize write operations on the map. If, after initializing the map, no thread is going to insert new entries or delete entries, you don't need to synchronize it.
However, in your case (where each thread has its own entry) I'd recommend using ThreadLocal, which allows you to have a "local" object which will have different values per thread.
Hope it helps
For this scenario I think ConcurrentHashMap is the best Map, because both Collections.synchronizedMap() and a synchronized block (which are basically the same) have more overhead.
If you want to insert entries in different threads, and not only read them, you have to synchronize because of the way HashMap works.
- First of all, it's always good practice to write thread-safe code, especially in cases like the above, though not in all situations.
- It's better to use Hashtable, which is a synchronized Map, or java.util.concurrent.ConcurrentHashMap<K,V>.

modifying a ConcurrentHashMap and Synchronized ArrayList in same method

I have a collection of objects that is modified by one thread and read by another (more specifically the EDT). I needed a solution that gave me fast lookup and also fast indexing (by insertion order), so I'm using a ConcurrentHashMap with an accompanying ArrayList of the keys: if I want to index an entry, I index into the List to get the key and then use that key to get the value from the hash map. So I have a wrapper class that makes sure that when an entry is added, the mapping is added to the hash map and the key is added to the list at the same time, and similarly for removal.
I'm posting an example of the code in question:
private List<K> keys = Collections.synchronizedList(new ArrayList<K>(INITIAL_CAPACITY));
private ConcurrentMap<K, T> entries = new ConcurrentHashMap<K, T>(INITIAL_CAPACITY, .75f);

public synchronized T getEntryAt(int index) {
    return entries.get(keys.get(index));
}

public synchronized void addOrReplaceEntry(K key, T value) {
    T result = entries.get(key);
    if (result == null) {
        entries.putIfAbsent(key, value);
        keys.add(key);
    } else {
        entries.replace(key, value);
    }
}

public synchronized void removeEntry(K key, T value) {
    keys.remove(key);
    entries.remove(key, value);
}

public synchronized int getSize() {
    return keys.size();
}
My question is: am I losing all the benefits of using the ConcurrentHashMap (over a synchronized HashMap) by operating on it only in synchronized methods? I have to synchronize the methods to safely modify/read from the ArrayList of keys (CopyOnWriteArrayList is not an option because a lot of modification happens...). Also, if you know of a better way to do this, that would be appreciated...
Yes, using a concurrent collection and a synchronized collection only inside synchronized blocks is a waste. You won't get the benefits of ConcurrentHashMap because only one thread will be accessing it at a time.
You could have a look at this implementation of a concurrent linked hashmap; I haven't used it, so I can't attest to its features.
One thing to consider would be switching from synchronized blocks to a ReadWriteLock to improve concurrent read-only performance.
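As a hedged sketch of that ReadWriteLock idea applied to the wrapper from the question (using java.util.concurrent.locks, with the other methods shortened; this is only one way to slice it):
private final ReadWriteLock lock = new ReentrantReadWriteLock();

public T getEntryAt(int index) {
    lock.readLock().lock();            // many readers can hold the read lock at once
    try {
        return entries.get(keys.get(index));
    } finally {
        lock.readLock().unlock();
    }
}

public void addOrReplaceEntry(K key, T value) {
    lock.writeLock().lock();           // writers are exclusive
    try {
        if (entries.putIfAbsent(key, value) == null) {
            keys.add(key);             // new key: remember insertion order
        } else {
            entries.replace(key, value);
        }
    } finally {
        lock.writeLock().unlock();
    }
}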
I'm not really sure of the utility of providing a remove-at-index method; perhaps you could give some more details about the problem you are trying to solve?
It seems that you only care about finding values by index. If so, dump the Map and just use a List. Why do you need the Map?
Mixing synchronized and concurrent collections the way you have done it is not recommended. Any reason why you are maintaining two copies of the stuff you are interested in? You can easily get a list of all the keys from the map anytime rather than maintaining a separate list.
Why not store the values in the list and keep the key -> index mapping in the map?
Then for getEntry you only need one lookup (in the list, which should be faster than a map anyway), and for remove you do not have to traverse the whole list. Synchronization works the same way.
You can move all access to the List keys onto the event queue using EventQueue.invokeLater. This will get rid of the synchronization. With all the synching, you were not running much in parallel anyway. It also means the getSize method will give the same answer for the duration of an event.
If you stick with synchronization instead of using invokeLater, at least get the entries hash table out of the synchronized block. Either way, you get more parallel processing. Of course, entries can now become out of sync with keys. The only downside is that sometimes a key will come up with a null entry. With such a dynamic table, this is unlikely to matter much.
Using the suggestion made by chrisichris to put the values in the list will solve this problem, if it is one. In fact, this puts a nice wall between keys and entries; they are now used in completely separate ways. (If your only need for entries is to provide values to the JTable, you can get rid of it.) But entries (if still needed) should reference the entries, not contain an index; maintaining indexes there would be a hopeless task. And always remember that keys and entries are snapshots of "reality" (for lack of a better word) taken at different times.
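For what it's worth, a minimal sketch of the invokeLater idea from the previous paragraphs (my own illustration, assuming getEntryAt/getSize are only ever called on the EDT, e.g. by a JTable model, and that keys can then be a plain ArrayList):
// entries (a ConcurrentHashMap) stays safe to update from any thread;
// the plain List of keys is now only ever touched on the event dispatch thread
public void addOrReplaceEntry(final K key, final T value) {
    entries.put(key, value);
    java.awt.EventQueue.invokeLater(() -> {
        if (!keys.contains(key)) {
            keys.add(key);
        }
    });
}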
