I have a java class that is accessed by a lot of threads at once and want to make sure it is thread safe. The class has one private field, which is a Map of Strings to Lists of Strings. I've implemented the Map as a ConcurrentHashMap to ensure gets and puts are thread safe:
public class ListStore {
    private Map<String, List<String>> innerListStore;

    public ListStore() {
        innerListStore = new ConcurrentHashMap<String, List<String>>();
    }
    ...
}
So given that gets and puts to the Map are thread safe, my concern is with the lists that are stored in the Map. For instance, consider the following method that checks if a given entry exists in a given list in the store (I've omitted error checking for brevity):
public boolean listEntryExists(String listName, String listEntry) {
    List<String> listToSearch = innerListStore.get(listName);
    for (String entryName : listToSearch) {
        if (entryName.equals(listEntry)) {
            return true;
        }
    }
    return false;
}
It would seem that I need to synchronize the entire contents of this method because if another method changed the contents of the list at innerListStore.get(listName) while this method is iterating over it, a ConcurrentModificationException would be thrown.
Is that correct and if so, do I synchronize on innerListStore or would synchronizing on the local listToSearch variable work?
UPDATE: Thanks for the responses. It sounds like I can synchronize on the list itself. For more information, here is the add() method, which can be running at the same time the listEntryExists() method is running in another thread:
public void add(String listName, String entryName) {
    List<String> addTo = innerListStore.get(listName);
    if (addTo == null) {
        addTo = Collections.synchronizedList(new ArrayList<String>());
        List<String> added = innerListStore.putIfAbsent(listName, addTo);
        if (added != null) {
            addTo = added;
        }
    }
    addTo.add(entryName);
}
If this is the only method that modifies the underlying lists stored in the map and no public methods return references to the map or entries in the map, can I synchronize iteration on the lists themselves and is this implementation of add() sufficient?
You can synchronize on listToSearch ("synchronized(listToSearch) {...}"). Make sure that there is no race condition creating the lists (use innerListStore.putIfAbsent to create them).
You could synchronize on just listToSearch; there's no reason to lock the entire map whenever anyone is using just one entry.
Just remember, though, that you need to synchronize on the list everywhere it is modified! Synchronizing around the iteration doesn't automagically block other people from doing an add() or whatnot if you have passed out references to the unsynchronized list.
It would be safest to just store synchronized lists in the Map and then lock on them when you iterate, and also document that when you return a reference to a list, the user must synchronize on it if they iterate. Synchronization is pretty cheap in modern JVMs when there is no actual contention. Of course, if you never let a reference to one of the lists escape your class, you can handle it internally with a finer comb.
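For example, a minimal sketch of that advice applied to the listEntryExists method from the question, assuming the lists in the map were created with Collections.synchronizedList as in the add() method shown above:

public boolean listEntryExists(String listName, String listEntry) {
    List<String> listToSearch = innerListStore.get(listName);
    if (listToSearch == null) {
        return false;
    }
    synchronized (listToSearch) { // the synchronizedList wrapper locks on itself, so this also excludes add()
        for (String entryName : listToSearch) {
            if (entryName.equals(listEntry)) {
                return true;
            }
        }
    }
    return false;
}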
Alternately you can use a threadsafe list such as CopyOnWriteArrayList that uses snapshot iterators. What kind of point in time consistency you need is a design decision we can't make for you. The javadoc also includes a helpful discussion of performance characteristics.
It would seem that I need to synchronize the entire contents of this method because if another method changed the contents of the list at innerListStore.get(listName) while this method is iterating over it, a ConcurrentModificationException would be thrown.
Are other threads accessing the List itself, or only though operations exposed by ListStore?
Will operations invoked by other threads result in the contents of a List stored in the Map being changed? Or will entries only be added/removed from the Map?
You would only need to synchronize access to the List stored within the Map if different threads can result in changes to the same List instances. If the threads are only allowed to add/remove List instances from the Map (i.e. change the structure of the Map), then synchronization is not necessary.
If the lists stored in the map are of a type that doesn't throw ConcurrentModificationException (CopyOnWriteArrayList, for example), you can iterate at will.
This can introduce some races, though, if you're not careful.
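A rough sketch of that variant (the field and method names here are illustrative, not taken from the question): iteration needs no lock because CopyOnWriteArrayList iterators work on a snapshot, but the get-or-create step still has to be made atomic to avoid the race mentioned above.

private final ConcurrentMap<String, List<String>> store = new ConcurrentHashMap<String, List<String>>();

public void add(String listName, String entryName) {
    List<String> list = store.get(listName);
    if (list == null) {
        List<String> fresh = new CopyOnWriteArrayList<String>();
        List<String> existing = store.putIfAbsent(listName, fresh);
        list = (existing != null) ? existing : fresh;
    }
    list.add(entryName); // readers iterating elsewhere never see a ConcurrentModificationException
}

public boolean listEntryExists(String listName, String listEntry) {
    List<String> list = store.get(listName);
    return list != null && list.contains(listEntry); // no explicit locking needed for reads
}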
If the Map is already thread safe, then I think synchronizing on listToSearch should work. I'm not 100% sure, but I think it should work:
synchronized (listToSearch) {
    // ...
}
You could use another abstraction from Guava
Note that this will synchronize on the whole map, so it might be not that useful for you.
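Presumably this means something like a synchronized multimap; a minimal sketch, assuming Multimaps.synchronizedListMultimap is the abstraction being referred to:

ListMultimap<String, String> store =
        Multimaps.synchronizedListMultimap(ArrayListMultimap.<String, String>create());

store.put("listName", "entryName");

// Iterating over any view still requires holding the multimap's own lock:
synchronized (store) {
    for (String entry : store.get("listName")) {
        // ...
    }
}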
As you haven't provided any client for the map of lists apart from the boolean listEntryExists(String listName, String listEntry) method, I wonder why you are storing lists at all? This structure seems to be more naturally a Map<String, Set<String>> and the listEntryExists should use the contains method (available on List as well, but O(n) to the size of the list):
public boolean listEntryExists(String name, String entry) {
    Set<String> set = map.get(name);
    return (set == null) ? false : set.contains(entry);
}
Now, the contains call can encapsulate whatever internal concurrency protocol you want it to.
For the add you can either use a synchronized wrapper (simple, but maybe slow) or if writes are infrequent compared to reads, utilise ConcurrentMap.replace to implement your own copy-on-write strategy. For instance, using Guava ImmutableSet:
public boolean add(String name, String entry) {
    while (true) {
        Set<String> set = map.get(name);
        if (set == null) {
            // putIfAbsent returns null if this thread won the race to insert the new set
            if (map.putIfAbsent(name, ImmutableSet.of(entry)) == null)
                return true;
            continue;
        }
        if (set.contains(entry))
            return false; // no need to change, already exists
        Set<String> newSet = ImmutableSet.copyOf(Iterables.concat(set, ImmutableSet.of(entry)));
        if (map.replace(name, set, newSet))
            return true;
    }
}
This is now an entirely thread-safe lock-free structure, where concurrent readers and writers will not block each other (modulo the lock-freeness of the underlying ConcurrentMap implementation). This implementation does have an O(n) cost in its write, where your original implementation was O(n) in the read. Again, if you are read-mostly rather than write-mostly, this could be a big win.
Related
I'm reading the source code of Sentinel, and I find that when the map needs a new entry added, it creates a new HashMap to replace the old one rather than calling map.put directly, like this:
public class NodeSelectorSlot extends AbstractLinkedProcessorSlot<Object> {

    private volatile Map<String, DefaultNode> map = new HashMap<String, DefaultNode>(10);

    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
            throws Throwable {
        DefaultNode node = map.get(context.getName());
        if (node == null) {
            synchronized (this) {
                node = map.get(context.getName());
                if (node == null) {
                    node = new DefaultNode(resourceWrapper, null);
                    // create a new HashMap containing the old entries plus the new node
                    HashMap<String, DefaultNode> cacheMap = new HashMap<String, DefaultNode>(map.size());
                    cacheMap.putAll(map);
                    cacheMap.put(context.getName(), node);
                    map = cacheMap;
                    ((DefaultNode) context.getLastNode()).addChild(node);
                }
            }
        }
        context.setCurNode(node);
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }
    ...
}
What's the difference between them?
The code you are looking at is fetching a Node from the map, creating and adding a new Node if one is not present.
Clearly, this operation needs to be thread-safe. The simple ways to implement this would be:
Lock the map and perform get and put operations while holding the lock.
Use a ConcurrentHashMap, which has operations for doing this kind of thing atomically, e.g. computeIfAbsent (sketched below).
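For comparison, a hedged sketch of option 2 (Java 8+): computeIfAbsent makes the check-and-create step atomic, so neither the explicit lock nor the map copy is needed. The addChild bookkeeping from the original code is omitted here for brevity.

private final ConcurrentHashMap<String, DefaultNode> map = new ConcurrentHashMap<String, DefaultNode>();

// inside entry(...):
DefaultNode node = map.computeIfAbsent(context.getName(),
        name -> new DefaultNode(resourceWrapper, null)); // created at most once per name
context.setCurNode(node);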
The authors of this code have chosen a different approach. They are using so-called Double Checked Locking (DCL) to avoid doing the initial get while holding a lock. That is what this code does:
DefaultNode node = map.get(context.getName());
if (node == null) {
    synchronized (this) {
        node = map.get(context.getName());
        ...
The authors have decided that when they then need to add a new entry to the map they need to do it by replacing the entire map with a new one. On the face of it, that seems unnecessary. The map updates are being performed while holding the lock and the volatile adds a happens before that seems to ensure that the initial map.get call sees any recent writes to the HashMap.
But that reasoning is INCORRECT. The problem is that there is a small time window between fetching the map reference and the get call completing. During that time window, a simultaneous put operation may be updating the HashMap data structures. This is harmful because those changes could cause the get to read stale data (because there is no happens before relationship from the put writes to the get reads). Even worse, the put could trigger reconstruction of a hash chain or even an expansion of the hash array. The resulting behavior is (at least) outside of the HashMap spec, since HashMap is not defined to be thread-safe.
The authors' solution is to create a new HashMap with the existing entries and the new one, then update map with a single assignment. I haven't done a formal analysis, but I think that this approach is thread-safe.
In short, the reason that the code creates a new HashMap is to make the DCL approach thread-safe.
And if you ignore the thread-safety aspect, this approach is functionally equivalent to a simple put.
Finally, we need to consider whether the authors' approach is going to give optimal performance. The answer will depend on whether the number of cache entries stabilizes, and whether it is relatively small. One observation is that the cost of adding N entries to the cache is O(N^2) !! (Assuming that entries are never removed, as appears to be the case.)
It is so-called copy-on-write, which is intended to ensure thread safety. When read operations are far more frequent than write operations, it is more efficient than mechanisms like ConcurrentHashMap.
Ref: https://github.com/alibaba/Sentinel/issues/1733
The Maps.synchronizedBiMap() method states that
it is imperative that the user manually synchronize on the returned map
when accessing any of its collection views.
Does this include the inverse() view of the BiMap? For example, if the variables are initialized as in the following example, can invoking inverse.put() from other threads be problematic (e.g. the change is not visible in a get() call on either map or inverse, even if put happened-before get)?
BiMap<Object, Object> map = Maps.synchronizedBiMap(HashBiMap.create());
BiMap<Object, Object> inverse = map.inverse();
If this is in fact a problem, is there a standard/recommended way of solving this?
// EDIT
Looking at the implementation, it seems like the inverse() of a SynchronizedBiMap is also a SynchronizedBiMap, sharing the same mutex. Does this mean the described problem is non-existent? Confirmation from a Guava Collections expert would be much appreciated ;)
No, in this case you don't have to synchronize on the inverse map. You cited only a fragment of the documentation; I'll also switch the original keySet() with inverse() in the example code:
Returns a synchronized (thread-safe) bimap backed by the specified bimap. In order to guarantee serial access, it is critical that all access to the backing bimap is accomplished through the returned bimap.
It is imperative that the user manually synchronize on the returned map when accessing any of its collection views:
BiMap<Long, String> map = Maps.synchronizedBiMap(
        HashBiMap.<Long, String>create());
// ...
BiMap<String, Long> inverse = map.inverse(); // Needn't be in synchronized block
Set<String> set = inverse.keySet();          // Needn't be in synchronized block
// ...
synchronized (map) { // Synchronizing on map, not set!
    Iterator<String> it = set.iterator(); // Must be in synchronized block
    while (it.hasNext()) {
        foo(it.next());
    }
}
Failure to follow this advice may result in non-deterministic behavior.
So when you want deterministic behavior during iteration over its views (which includes iterating over the inverse view), you have to synchronize on your map instance.
In the case of .inverse(), as you mentioned, it creates a new synchronized bimap using the same mutex object, so it synchronizes properly on methods like get or contains.
I need a data structure that is a LinkedHashMap and is thread safe.
How can I do that ?
You can wrap a LinkedHashMap in Collections.synchronizedMap to get a synchronized map that maintains insertion order. This is not as efficient as a ConcurrentHashMap (and doesn't implement the extra interface methods of ConcurrentMap), but it does get you (somewhat) thread-safe behavior.
Even the mighty Google Collections doesn't appear to have solved this particular problem yet. However, there is one project that does try to tackle the problem.
I say somewhat on the synchronization, because iteration is still not thread safe in the sense that concurrent modification exceptions can happen.
There's a number of different approaches to this problem. You could use:
Collections.synchronizedMap(new LinkedHashMap());
as the other responses have suggested, but this has several gotchas you'll need to be aware of. Most notable is that you will often need to hold the collection's lock when iterating over it, which in turn prevents other threads from accessing the collection until you've finished iterating. (See Java theory and practice: Concurrent collections classes.) For example:
synchronized (map) {
    for (Object key : map.keySet()) {
        // Do work here
    }
}
Using
new ConcurrentHashMap();
is probably a better choice as you won't need to lock the collection to iterate over it.
Finally, you might want to consider a more functional programming approach. That is, you could consider the map as essentially immutable. Instead of adding to an existing Map, you would create a new one that contains the contents of the old map plus the new addition. This sounds pretty bizarre at first, but it is actually the way Scala deals with concurrency and collections.
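A rough sketch of that idea in plain Java (the class and method names are illustrative): writers build a new map and swap a single volatile reference, readers never lock.

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class CopyOnWriteCache<K, V> {
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    public synchronized void put(K key, V value) {          // writers serialize among themselves
        Map<K, V> next = new LinkedHashMap<K, V>(snapshot);  // copy preserves insertion order
        next.put(key, value);
        snapshot = Collections.unmodifiableMap(next);        // readers see the old or new map, never a partial one
    }

    public V get(K key) {                                    // readers never lock
        return snapshot.get(key);
    }
}

Every write copies the whole map, so this only pays off when reads vastly outnumber writes.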
There is one implementation available under Google code. A quote from their site:
A high performance version of java.util.LinkedHashMap for use as a software cache.
Design
A concurrent linked list runs through a ConcurrentHashMap to provide eviction ordering.
Supports insertion and access ordered eviction policies (FIFO, LRU, and Second Chance).
You can use a ConcurrentSkipListMap, only available in Java SE/EE 6 or later. It is order preserving in that keys are sorted according to their natural ordering. You need to have a Comparator or make the keys Comparable objects. In order to mimic linked hash map behavior (iteration order is the order in time in which entries were added), I implemented my key objects to always compare as greater than a given other object unless it is equal (whatever that means for your object).
A wrapped synchronized linked hash map did not suffice because as stated in
http://www.ibm.com/developerworks/java/library/j-jtp07233.html: "The synchronized collections wrappers, synchronizedMap and synchronizedList, are sometimes called conditionally thread-safe -- all individual operations are thread-safe, but sequences of operations where the control flow depends on the results of previous operations may be subject to data races. The first snippet in Listing 1 shows the common put-if-absent idiom -- if an entry does not already exist in the Map, add it. Unfortunately, as written, it is possible for another thread to insert a value with the same key between the time the containsKey() method returns and the time the put() method is called. If you want to ensure exactly-once insertion, you need to wrap the pair of statements with a synchronized block that synchronizes on the Map m."
So the only thing that helps is a ConcurrentSkipListMap, which is 3-5 times slower than a normal ConcurrentHashMap.
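One way to sketch the idea (a variant that uses an explicit insertion sequence number rather than making the key objects themselves compare by insertion time; all names are illustrative):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

class InsertionOrderedConcurrentMap<K, V> {
    private final AtomicLong nextSeq = new AtomicLong();
    private final ConcurrentHashMap<K, Long> seqByKey = new ConcurrentHashMap<K, Long>();
    private final ConcurrentSkipListMap<Long, V> valuesBySeq = new ConcurrentSkipListMap<Long, V>();

    public void put(K key, V value) {
        // assign each key a sequence number on first insertion, reuse it afterwards
        Long seq = seqByKey.computeIfAbsent(key, k -> nextSeq.incrementAndGet());
        valuesBySeq.put(seq, value);
    }

    public V get(K key) {
        Long seq = seqByKey.get(key);
        return (seq == null) ? null : valuesBySeq.get(seq);
    }

    public Iterable<V> valuesInInsertionOrder() {
        return valuesBySeq.values(); // weakly consistent view, sorted by insertion sequence
    }
}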
Collections.synchronizedMap(new LinkedHashMap())
Since the ConcurrentHashMap offers a few important extra methods that are not in the Map interface, simply wrapping a LinkedHashMap with a synchronizedMap won't give you the same functionality; in particular, it won't give you anything like the putIfAbsent(), replace(key, oldValue, newValue) and remove(key, oldValue) methods which make the ConcurrentHashMap so useful.
Unless there's some apache library that has implemented what you want, you'll probably have to use a LinkedHashMap and provide suitable synchronized{} blocks of your own.
I just tried a synchronized, bounded LRU map based on an insertion-ordered LinkedHashMap (LinkedConcurrentHashMap below), with a ReadWriteLock for synchronization.
So when you use an iterator, you have to acquire the write lock (via getLock()) to avoid ConcurrentModificationException. This is better than Collections.synchronizedMap.
public class LinkedConcurrentHashMap<K, V> {

    private LinkedHashMap<K, V> linkedHashMap = null;
    private final int cacheSize;
    private ReadWriteLock readWriteLock = null;

    public LinkedConcurrentHashMap(LinkedHashMap<K, V> psCacheMap, int size) {
        this.linkedHashMap = psCacheMap;
        cacheSize = size;
        readWriteLock = new ReentrantReadWriteLock();
    }

    public void put(K key, V value) {
        Lock writeLock = readWriteLock.writeLock();
        try {
            writeLock.lock();
            if (linkedHashMap.size() >= cacheSize && cacheSize > 0) {
                // evict the oldest entry (first key in insertion order)
                K oldAgedKey = linkedHashMap.keySet().iterator().next();
                remove(oldAgedKey);
            }
            linkedHashMap.put(key, value);
        } finally {
            writeLock.unlock();
        }
    }

    public V get(K key) {
        Lock readLock = readWriteLock.readLock();
        try {
            readLock.lock();
            return linkedHashMap.get(key);
        } finally {
            readLock.unlock();
        }
    }

    public boolean containsKey(K key) {
        Lock readLock = readWriteLock.readLock();
        try {
            readLock.lock();
            return linkedHashMap.containsKey(key);
        } finally {
            readLock.unlock();
        }
    }

    public V remove(K key) {
        Lock writeLock = readWriteLock.writeLock();
        try {
            writeLock.lock();
            return linkedHashMap.remove(key);
        } finally {
            writeLock.unlock();
        }
    }

    public ReadWriteLock getLock() {
        return readWriteLock;
    }

    public Set<Map.Entry<K, V>> entrySet() {
        return linkedHashMap.entrySet();
    }
}
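A usage sketch for iteration, following the advice above to take the lock exposed by getLock() before touching entrySet():

LinkedConcurrentHashMap<String, String> cache =
        new LinkedConcurrentHashMap<String, String>(new LinkedHashMap<String, String>(), 100);

Lock iterationLock = cache.getLock().writeLock();
iterationLock.lock();
try {
    for (Map.Entry<String, String> entry : cache.entrySet()) {
        System.out.println(entry.getKey() + " -> " + entry.getValue()); // no other thread can modify the map here
    }
} finally {
    iterationLock.unlock();
}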
The answer is pretty much no: there's nothing equivalent to a ConcurrentHashMap that preserves ordering (like LinkedHashMap does). As other people pointed out, you can wrap your collection using Collections.synchronizedMap(-yourmap-); however, this will not give you the same level of fine-grained locking. It will simply block the entire map on every operation.
Your best bet is to either use synchronized around any access to the map (where it matters, of course. You may not care about dirty reads, for example) or to write a wrapper around the map that determines when it should or should not lock.
How about this.
Take your favourite open-source concurrent HashMap implementation. Sadly it can't be Java's ConcurrentHashMap as it's basically impossible to copy and modify that due to huge numbers of package-private stuff. (Why do the Java authors always do that?)
Add a ConcurrentLinkedDeque field.
Modify all of the put methods so that if an insertion is successful the Entry is added to the end of the deque. Modify all of the remove methods so that any removed entries are also removed from the deque. Where a put method replaces the existing value, we don't have to do anything to the deque.
Change all iterator/spliterator methods so that they delegate to the deque.
There's no guarantee that the deque and the map have exactly the same contents at all times, but concurrent hash maps don't make those sort of promises anyway.
Removal won't be super fast (have to scan the deque). But most maps are never (or very rarely) asked to remove entries anyway.
You could also achieve this by extending ConcurrentHashMap, or decorating it (decorator pattern).
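A rough sketch of the decorator variant (only a few operations shown; as noted above, the deque and the map are only loosely consistent with each other, and all names here are illustrative):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;

class InsertionOrderTrackingMap<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<K, V>();
    private final ConcurrentLinkedDeque<K> insertionOrder = new ConcurrentLinkedDeque<K>();

    public V put(K key, V value) {
        V previous = map.put(key, value);
        if (previous == null) {
            insertionOrder.addLast(key); // record the key only on first insertion
        }
        return previous;
    }

    public V get(K key) {
        return map.get(key);
    }

    public V remove(K key) {
        V removed = map.remove(key);
        if (removed != null) {
            insertionOrder.remove(key); // O(n) scan of the deque, as the answer points out
        }
        return removed;
    }

    public Iterable<K> keysInInsertionOrder() {
        return insertionOrder; // iteration delegates to the deque
    }
}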
Preface: I know that in most cases using a volatile field won't yield any measurable performance penalty, but this question is more theoretical and targeted towards a design with extremely high concurrency support.
I've got a field that is a List<Something> which is filled after construction. To save some performance, I would like to convert the List into a read-only Map. Doing so at any point requires at least a volatile Map field to make the change visible to all threads.
I was thinking of doing the following:
private Map<Object, Something> map; // note: not volatile -- that is the question

public Something get(Object key) {
    if (map == null) {
        Map<Object, Something> temp = new HashMap<Object, Something>();
        for (Something value : super.getList()) {
            temp.put(value.getKey(), value);
        }
        map = temp;
    }
    return map.get(key);
}
This could cause multiple threads to generate the map even if they enter the get block in a serialized way. This would be no big issue if the threads work on different, identical instances of the map. What worries me more is:
Is it possible that one thread assigns the new temp map to the map field, and then a second thread sees that map != null and therefore accesses the map field without generating a new one, but to my surprise finds that the map is empty, because the put operations were not yet pushed to some shared memory area?
Answers to comments:
The threads only modify the temporary map after that it is read only.
I must convert a List to a Map because of some special JAXB setup which doesn't make it feasible to have a Map to begin with.
Is it possible that one thread assigns the new temp map to the map field, and then a second thread sees that map != null and therefore accesses the map field without generating a new one, but to my surprise finds that the map is empty, because the put operations were not yet pushed to some shared memory area?
Yes, this is absolutely possible; for example, an optimizing compiler could actually completely get rid of the local temp variable, and just use the map field the whole time, provided it restored map to null in the case of an exception.
Similarly, a thread could also see a non-null, non-empty map that is nonetheless not fully populated. And unless your Map class is carefully designed to allow simultaneous reads and writes (or uses synchronized to avoid the issue), you could also get bizarre behavior if one thread is calling its get method while another is calling its put.
Can you create your Map in the ctor and declare it final? Provided you don't leak the map so others can modify it, that should suffice to make your get() safely sharable by multiple threads.
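For example, a minimal sketch of that suggestion (Something and getKey() stand in for whatever your list elements actually provide):

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Lookup {
    interface Something {
        Object getKey();
    }

    private final Map<Object, Something> map; // final field: safely published once the constructor finishes

    Lookup(List<Something> list) {
        Map<Object, Something> temp = new HashMap<Object, Something>();
        for (Something value : list) {
            temp.put(value.getKey(), value);
        }
        map = Collections.unmodifiableMap(temp); // never modified (or leaked) after construction
    }

    public Something get(Object key) {
        return map.get(key);
    }
}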
When you are really in doubt whether another thread could read a "half completed" map (I don't think so, but never say never ;-), you may try this.
Here map is either null or complete:
static class MyMap extends HashMap<Object, Something> {
    MyMap(List<Something> pList) {
        for (Something value : pList) {
            put(value.getKey(), value);
        }
    }
}

MyMap map;

public Something get(Object key) {
    if (map == null) {
        map = new MyMap(super.getList());
    }
    return map.get(key);
}
Or does someone see a newly introduced problem?
In addition to the visibility concerns previously mentioned, there is another problem with the original code, viz. it can throw a NullPointerException here:
return this.map.get(key);
Which is counter-intuitive, but that is what you can expect from incorrectly synchronized code.
Sample code to prevent this:
Map temp;
if ((temp = this.map) == null)
{
temp = new ImmutableMap(getList());
this.map = temp;
}
return temp.get(key);
I have a collection of objects that is modified by one thread and read by another (more specifically the EDT). I needed a solution that gives me fast lookup and also fast indexing (by insertion order), so I'm using a ConcurrentHashMap with an accompanying ArrayList of the keys: to index an entry, I index into the list to get the key and then use that key to get the value from the hash map. So I have a wrapper class that makes sure that when an entry is added, the mapping is added to the hash map and the key is added to the list at the same time, and similarly for removal.
I'm posting an example of the code in question:
private List<K> keys = Collections.synchronizedList(new ArrayList<K>(INITIAL_CAPACITY));
private ConcurrentMap<K, T> entries = new ConcurrentHashMap<K, T>(INITIAL_CAPACITY, .75f);

public synchronized T getEntryAt(int index) {
    return entries.get(keys.get(index));
}

public synchronized void addOrReplaceEntry(K key, T value) {
    T result = entries.get(key);
    if (result == null) {
        entries.putIfAbsent(key, value);
        keys.add(key);
    } else {
        entries.replace(key, value);
    }
}

public synchronized void removeEntry(K key, T value) {
    keys.remove(key);
    entries.remove(key, value);
}

public synchronized int getSize() {
    return keys.size();
}
My question is: am I losing all the benefits of using the ConcurrentHashMap (over a synchronized HashMap) by operating on it in synchronized methods? I have to synchronize the methods to safely modify/read from the ArrayList of keys (CopyOnWriteArrayList is not an option because a lot of modification happens...). Also, if you know of a better way to do this, that would be appreciated...
Yes, using a concurrent collection and a synchronized collection only inside synchronized blocks is a waste. You won't get the benefits of ConcurrentHashMap because only one thread will be accessing it at a time.
You could have a look at this implementation of a concurrent linked hashmap; I haven't used it, so I can't attest to its features.
One thing to consider would be switching from synchronized blocks to a ReadWriteLock to improve concurrent read-only performance.
I'm not really sure of the utility of providing a remove-at-index method; perhaps you could give some more details about the problem you are trying to solve?
It seems that you only care about finding values by index. If so, dump the Map and just use a List. Why do you need the Map?
Mixing synchronized and concurrent collections the way you have done it is not recommended. Any reason why you are maintaining two copies of the stuff you are interested in? You can easily get a list of all the keys from the map anytime rather than maintaining a separate list.
Why not store the values in the list and, in the map, the key -> index mapping?
Then for getEntry you only need one lookup (in the list, which should be faster than a map anyway), and for remove you do not have to traverse the whole list. Synchronization works the same way.
You can get all access to the List keys onto the event queue using EventQueue.invokeLater. This will get rid of the synchronization. With all the synching you were not running much in parallel anyway. Also it means the getSize method will give the same answer for the duration of an event.
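A rough sketch of that suggestion (keys and key as in the question; the surrounding code is illustrative):

// From the writer thread: hand the list update to the EDT instead of synchronizing on keys
java.awt.EventQueue.invokeLater(new Runnable() {
    @Override
    public void run() {
        keys.add(key); // runs on the EDT, the same thread that reads keys
    }
});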
If you stick with synchronization instead of using invokeLater, at least get the entries hash table out of the synch block. Either way, you get more parallel processing. Of course, entries can now become out-of-synch with keys. The only down side is sometimes a key will come up with a null entry. With such a dynamic table this is unlikely to matter much.
Using the suggestion made by chrisichris to put the values in the list will solve this problem if it is one. In fact, this puts a nice wall between keys and entries; they are now used in completely separate ways. (If your only need for entries is to provide values to the JTable, you can get rid of it.) But entries (if still needed) should reference the entries, not contain an index; maintaining indexes there would be a hopeless task. And always remember that keys and entries are snapshots of "reality" (for lack of a better word) taken at different times.