Necessary to synchronize a concurrent HashMap when calling values()? - java

In the following code:
private final Map<A, B> entriesMap = Collections
.synchronizedMap(new HashMap<A, B>());
// ...
List<B> entries = new ArrayList<>(this.entriesMap.values());
If entriesMap is being accessed/modified by multiple threads in other methods, is it necessary to synchronize on entriesMap? In other words:
List<B> entries;
synchronized (this.entriesMap) {
entries = new ArrayList<>(this.entriesMap.values());
}
If I am correct, values() is not an atomic operation, unlike put() and get(), right?
Thanks!

The problem is that even if values() itself were atomic, the act of iterating over it isn't. The ArrayList constructor can't take a copy of the values in an atomic way, and the iterator will be invalidated if another thread changes the map while it's copying them.

Well, calling values might well be an atomic operation, but the Collection it returns is not a snapshot copy, but backed by the underlying Map, so it will croak when there are concurrent modifications to the Map as you iterate it afterwards (when copying it into the ArrayList).
Note that this (ConcurrentModificationException) also happens when there is just one thread, as long as you iterate the values and modify the Map in an interleaved fashion, so this is not really a problem of thread synchronization.
Further note that there is ConcurrentHashMap, whose iterators are weakly consistent: you can iterate while the Map is being modified without getting a ConcurrentModificationException, and modifications made after the iterator was created may or may not be reflected. But even with ConcurrentHashMap, the Collection returned by values() is not a snapshot copy; it is a live view backed by the Map, just as for the normal HashMap.
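To make the single-threaded case concrete, here is a minimal sketch (the class and method names are made up for illustration). It iterates over values() and adds a new key inside the loop, on one thread, once against a HashMap and once against a ConcurrentHashMap:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class InterleavedIterationDemo {

    // Iterates over values() and adds a new key during the loop, all on one thread.
    // Returns true if the iterator threw ConcurrentModificationException.
    static boolean throwsOnInterleavedPut(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (Integer value : map.values()) {
                // Structural modification while the values() iterator is live.
                map.put("added-" + value, value);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap threw: "
                + throwsOnInterleavedPut(new HashMap<>()));
        System.out.println("ConcurrentHashMap threw: "
                + throwsOnInterleavedPut(new ConcurrentHashMap<>()));
    }
}
```

The HashMap iterator is fail-fast and notices the put on its next step; the ConcurrentHashMap iterator tolerates it.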

Yes, you are right.
When you put the values of the map into the ArrayList, an iteration is performed over the values. So you need the synchronized block. See this page.

Collections.synchronizedMap() guarantees that each individual operation you run on the map is synchronized, and values() itself is atomic. But copying the values into an ArrayList iterates over them, and that iteration is not synchronized, so in this case you need to add the synchronization yourself.
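Putting the answers above together, a minimal sketch of the copy-under-lock pattern could look like this. EntriesHolder and its method names are hypothetical; the only library behaviour relied on is that the wrapper returned by Collections.synchronizedMap locks on itself, which its documentation states:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EntriesHolder<A, B> {
    private final Map<A, B> entriesMap =
            Collections.synchronizedMap(new HashMap<A, B>());

    void put(A key, B value) {
        // A single call is already synchronized by the wrapper.
        entriesMap.put(key, value);
    }

    List<B> snapshotValues() {
        // The copy constructor traverses values(), so hold the wrapper's
        // monitor for the whole copy; other synchronized operations wait.
        synchronized (entriesMap) {
            return new ArrayList<>(entriesMap.values());
        }
    }

    public static void main(String[] args) {
        EntriesHolder<String, Integer> holder = new EntriesHolder<>();
        holder.put("x", 1);
        holder.put("y", 2);
        System.out.println(holder.snapshotValues().size());
    }
}
```

The returned list is an independent copy, so callers can iterate it without holding any lock.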

Related

Java Concurrency: HashMap Vs ConcurrentHashMap when concurrent threads only remove elements

I have a main thread that creates a HashMap, adds multiple runnable objects to it and then starts each runnable (passing the HashMap to each). The runnable removes its object from the map just before it is about to finish processing.
I would like to know if there is any reason to use a ConcurrentHashMap (rather than a HashMap) in this case - the only operation the runnables perform on the map is to remove themselves from it. Is there a concurrency consideration that necessitates the use of ConcurrentHashMap in this case?
Main thread
private final Map<Integer, Consumer> runnableMap = new HashMap<>();
Runnable runnable;
for (int i = 1; i <= NUM_RUNNABLES; i++) {
runnable = new Consumer(i, runnableMap);
runnableMap.put(i, runnable);
executionContext.execute(runnable);
}
Consumer implements Runnable
private final Integer consumerNumber;
private final Map<Integer, Consumer> runnableMap;
public Consumer(int consumerNumber, final Map<Integer, Consumer> runnableMap){
this.consumerNumber = consumerNumber;
this.runnableMap = runnableMap;
}
public void run() {
:::
// business logic
:::
// Below remove is the only operation this thread executes on the map
runnableMap.remove(consumerNumber);
}
If your reason for doing this is to track thread completion, why not use a CountDownLatch? I'm not sure whether a HashMap can have concurrency issues when threads only remove from it. I recommend using it only if your code will not break on any possible issue; otherwise go with a ConcurrentHashMap.
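If completion tracking is indeed the goal, a CountDownLatch version might look roughly like the sketch below. The class name, pool size, and timeout are assumptions for illustration, and the counter increment merely stands in for the business logic:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LatchCompletionDemo {

    // Runs taskCount tasks, waits for all of them, and returns how many finished.
    static int runAndAwait(int taskCount) {
        CountDownLatch done = new CountDownLatch(taskCount);
        AtomicInteger finished = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (int i = 0; i < taskCount; i++) {
                pool.execute(() -> {
                    try {
                        finished.incrementAndGet(); // placeholder for business logic
                    } finally {
                        done.countDown(); // always signal, even on failure
                    }
                });
            }
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
            return finished.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndAwait(8));
    }
}
```

No shared map is needed at all in this variant; the latch is the only object the tasks touch.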
The javadoc of HashMap says:
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map.
As mentioned above, deletion is a structural change and you must use synchronization.
Furthermore, in the removeNode() method of HashMap (which is called by the remove() method), the modCount variable is incremented. This counter is what the fail-fast iterators check before throwing ConcurrentModificationException, so you might get this exception if anything iterates over the map while elements are removed without synchronization.
Therefore you must use a ConcurrentHashMap (or synchronize externally).
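As a rough sketch of the question's setup with a ConcurrentHashMap swapped in: the tasks below do nothing except remove themselves, and the class name, pool size, and timeout are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SelfRemovingTasksDemo {

    // Registers taskCount tasks, runs them, and returns how many are still
    // registered once every task has finished.
    static int runAll(int taskCount) {
        Map<Integer, Runnable> running = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 1; i <= taskCount; i++) {
            final int id = i;
            // The only map operation a task performs is removing itself.
            Runnable task = () -> running.remove(id);
            running.put(id, task);   // register before submitting
            pool.execute(task);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return running.size();
    }

    public static void main(String[] args) {
        System.out.println(runAll(100));
    }
}
```

Note that the main thread is still calling put while earlier tasks may already be calling remove, which is exactly the concurrent structural modification the HashMap javadoc warns about.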
You asked about differences between HashMap and ConcurrentHashMap, but there is an additional data structure to consider: Hashtable. There are differences and trade-offs for each. You will need to evaluate which is the best fit for your intended usage.
HashMap is unsynchronized, so if more than one thread can read from or write to it, your results will be unpredictable. HashMap also permits null as key or value.
Hashtable is synchronized, doesn't support null keys or values. From the Hashtable Javadoc:
Hashtable is synchronized. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use ConcurrentHashMap in place of Hashtable.
ConcurrentHashMap is thread-safe, doesn't allow null to be used as a key or value.

How to initialize a hashTable with safe-thread object as value?

Hashtable is a thread-safe collection, but does initializing it with an ArrayList (which is not thread-safe) as the value endanger the whole thread-safety aspect?
Hashtable <Employee, ArrayList<Car>> carDealership = new Hashtable<>();
Further on, I am planning to wrap every action of ArrayLists in a synchronized block to prevent any race-conditions when operating with any methods.
Yet I haven't declared the ArrayLists in the Hashtable as synchronized lists, which would be achieved with the following code:
Collections.synchronizedList(new ArrayList<>())
This will happen when I add ArrayLists to the Hashtable, obviously.
How can I be sure that the ArrayLists in the Hashtable are thread-safe?
Is it enough to pass a thread-safe ArrayList to the put() method of the Hashtable and I'm good to go (and not even worry about the constructor of the Hashtable)? In other words, does the put() method of the Hashtable not even recognize whether I am passing a thread-safe or unsafe argument?
Note: Thread-safety is a requirement. Otherwise I wouldn't have opted for this implementation.
The only way to ensure that the values in the Hashtable or ConcurrentHashMap are thread-safe is to wrap it in a way that prevents anyone from adding something that you don't control. Never expose the Map itself or any of the Lists contained in it to other parts of your code. Provide methods to get snapshot copies if you need them, provide methods to add values to the lists, but make sure the class wrapping the map is the one that creates all lists that can ever get added to it. Iteration over the "live" lists in your map will require external synchronisation (as mentioned in the JavaDocs of synchronizedList).
Both Hashtable and ConcurrentHashMap are thread-safe in that concurrent operations will not leave them in an invalid state. This means e.g. that if you invoke put from two threads with the same key, one of them will return the value the other inserted as the "old" value. But of course you can't tell which will be the first and which will be second in advance without some external synchronization.
The implementation is quite different, though: Hashtable and the synchronized Map returned by Collections.synchronizedMap(new HashMap()); are similar in that they basically add synchronized modifiers to most methods. This can be inefficient if you have lots of threads (i.e. high contention for the locks) that mostly read, but only occasionally modify the map. ConcurrentHashMap provides more fine grained locking:
Retrieval operations (including get) generally do not block
which can yield significantly better performance, depending on your use case. It also provides a richer API with powerful search and bulk-modification operations.
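The encapsulation advice in this answer could be sketched as follows. Dealership and its methods are hypothetical names; the wrapper is the only code that ever creates a list, and callers only ever receive copies:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class Dealership<K, V> {
    // Never exposed: neither the map nor the live lists leave this class.
    private final ConcurrentMap<K, List<V>> itemsByOwner = new ConcurrentHashMap<>();

    void add(K owner, V item) {
        // computeIfAbsent creates the list atomically, so two threads adding
        // for the same new owner share one synchronized list.
        itemsByOwner
                .computeIfAbsent(owner, k -> Collections.synchronizedList(new ArrayList<V>()))
                .add(item);
    }

    List<V> snapshot(K owner) {
        List<V> live = itemsByOwner.get(owner);
        if (live == null) {
            return new ArrayList<>();
        }
        // Hold the list's own monitor while copying, as the synchronizedList
        // documentation asks for any traversal.
        synchronized (live) {
            return new ArrayList<>(live);
        }
    }

    public static void main(String[] args) {
        Dealership<String, String> dealership = new Dealership<>();
        dealership.add("alice", "sedan");
        dealership.add("alice", "coupe");
        System.out.println(dealership.snapshot("alice"));
    }
}
```

Because put() accepts any List, the map cannot enforce this by itself; the guarantee comes entirely from the wrapper being the sole writer.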
Yes, using ArrayList in this case is not thread safe. You can always get the object from the table and operate on it.
CopyOnWriteArrayList is a good substitute for it.
But you still have the case, when one thread takes (saves in a variable) the collection, and the other thread replaces with another one.
If you are not going to replace the lists inside the table, then this is not a problem.

is Treemap inside ConcurrentHashMap thread safe?

I have a case of nested maps as follows:
private final static Map<String, TreeMap<Long,String>> outerConcurrentMap = new ConcurrentHashMap<>();
I know that ConcurrentHashMap is thread safe, but I want to know about the TreeMaps this CHM holding, are they also thread safe inside CHM ?
The operations I am doing are:
1. If a specific key is not found --> create a new TreeMap and put it against the key.
2. If the key is found, get the TreeMap and update it.
3. Retrieve the TreeMap from the CHM using get(K).
4. Retrieve data from the TreeMap using the tailMap(K,boolean) method.
5. clear() the CHM.
I want a thread-safe structure in this scenario. Is the above implementation thread-safe or not? If not then please suggest a solution.
Once you've done TreeMap<?, ?> tm = chm.get(key); you are not in thread safe territory any longer. In particular, if another thread updates the treemap (through the CHM or not) you may or may not see the change. Worse, the copy of the map that you have in tm may be corrupted...
One option would be to use a thread safe map, such as a ConcurrentSkipListMap.
Simple answer: no.
If your map is a ConcurrentHashMap, then all operations that affect the state of your hashmap are thread-safe. That does not at all mean that objects stored in that map become thread-safe.
How would that work; you create any kind of object, and by adding it to such a map, the object itself becomes thread-safe? And when you remove that object from the map, the "thread-unsafety" is restored?!
Assuming you're doing all of this in multiple threads, no, it's not thread-safe.
Ignore the fact that you've accessed the TreeMap via a ConcurrentHashMap - you end up with multiple threads accessing the TreeMap at the same time, including one or more of them writing to the map. That's not safe, because TreeMap isn't thread-safe for that situation:
Note that this implementation is not synchronized. If multiple threads access a map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
Some of your scenarios are thread-safe, some are not:
1. Yes, this is thread-safe, though other threads cannot see the newly created TreeMap until you put it into the CHM. But this should be implemented carefully to avoid race conditions: you should make sure that the check and the insertion are performed atomically:
// putIfAbsent inserts the new map only if the key has no value yet and
// returns the map that was already there (or null if ours was inserted),
// so two threads racing on the same missing key simultaneously
// end up with the same TreeMap.
TreeMap<Long, String> fresh = new TreeMap<>();
TreeMap<Long, String> existing = outerConcurrentMap.putIfAbsent(key, fresh);
TreeMap<Long, String> map = (existing != null) ? existing : fresh;
2, 3, 4. No, you first need to lock this TreeMap by explicit lock or using synchronized. TreeMap is not synchronized by itself.
5. Yes, this operation is performed on the CHM, so it is thread-safe.
If you need a fully thread-safe sorted map, use ConcurrentSkipListMap instead. It is slower than TreeMap, but its internal structure doesn't need to lock the full collection during access, which makes it effective in a concurrent environment.
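A minimal sketch of the five operations from the question with ConcurrentSkipListMap as the inner map might look like this. The class and method names are made up, and computeIfAbsent stands in for the putIfAbsent pattern above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

class NestedSortedStore {
    private final ConcurrentMap<String, ConcurrentSkipListMap<Long, String>> outer =
            new ConcurrentHashMap<>();

    // Operations 1 and 2: create the inner map atomically if missing, then update it.
    void put(String key, long timestamp, String value) {
        outer.computeIfAbsent(key, k -> new ConcurrentSkipListMap<Long, String>())
             .put(timestamp, value);
    }

    // Operations 3 and 4: fetch the inner map and read its tail, inclusive of 'from'.
    List<String> valuesFrom(String key, long from) {
        ConcurrentSkipListMap<Long, String> inner = outer.get(key);
        if (inner == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(inner.tailMap(from, true).values());
    }

    // Operation 5.
    void clear() {
        outer.clear();
    }

    public static void main(String[] args) {
        NestedSortedStore store = new NestedSortedStore();
        store.put("k", 1L, "a");
        store.put("k", 2L, "b");
        store.put("k", 3L, "c");
        System.out.println(store.valuesFrom("k", 2L));
    }
}
```

Each individual call is thread-safe here, but a sequence of calls is still not atomic as a whole: a clear() may land between a put and a later read.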
The TreeMap itself is not thread-safe, since only the methods of the ConcurrentHashMap are affected.
What you could do is the following:
private final static Map<String, SortedMap <Long,String>> outerConcurrentMap= new ConcurrentHashMap<String, SortedMap <Long,String> >();
static {
// Just an example
SortedMap map = Collections.synchronizedSortedMap(new TreeMap(...));
outerConcurrentMap.put("...",map);
}

Difference between CopyOnWriteArrayList and synchronizedList

As per my understanding concurrent collection classes preferred over synchronized collections because the concurrent collection classes don't take a lock on the complete collection object. Instead they take locks on a small segment of the collection object.
But when I checked the add method of CopyOnWriteArrayList, we are acquiring a lock on complete collection object. Then how come CopyOnWriteArrayList is better than a list returned by Collections.synchronizedList? The only difference I see in the add method of CopyOnWriteArrayList is that we are creating copy of that array each time the add method is called.
public boolean add(E e) {
final ReentrantLock lock = this.lock;
lock.lock();
try {
Object[] elements = getArray();
int len = elements.length;
Object[] newElements = Arrays.copyOf(elements, len + 1);
newElements[len] = e;
setArray(newElements);
return true;
} finally {
lock.unlock();
}
}
As per my understanding concurrent collection classes preferred over synchronized collection because concurrent collection classes don't take lock on complete collection object. Instead it takes lock on small segment of collection object.
This is true for some collections but not all. A map returned by Collections.synchronizedMap locks the entire map around every operation, whereas ConcurrentHashMap locks only one hash bucket for some operations, or it might use a non-blocking algorithm for others.
For other collections, the algorithms in use, and thus the tradeoffs, are different. This is particularly true of lists returned by Collections.synchronizedList compared to CopyOnWriteArrayList. As you noted, both synchronizedList and CopyOnWriteArrayList take a lock on the entire array during write operations. So why are they different?
The difference emerges if you look at other operations, such as iterating over every element of the collection. The documentation for Collections.synchronizedList says,
It is imperative that the user manually synchronize on the returned list when iterating over it:
List list = Collections.synchronizedList(new ArrayList());
...
synchronized (list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
Failure to follow this advice may result in non-deterministic behavior.
In other words, iterating over a synchronizedList is not thread-safe unless you do locking manually. Note that when using this technique, all operations by other threads on this list, including iterations, gets, sets, adds, and removals, are blocked. Only one thread at a time can do anything with this collection.
By contrast, the doc for CopyOnWriteArrayList says,
The "snapshot" style iterator method uses a reference to the state of the array at the point that the iterator was created. This array never changes during the lifetime of the iterator, so interference is impossible and the iterator is guaranteed not to throw ConcurrentModificationException. The iterator will not reflect additions, removals, or changes to the list since the iterator was created.
Operations by other threads on this list can proceed concurrently, but the iteration isn't affected by changes made by any other threads. So, even though write operations lock the entire list, CopyOnWriteArrayList still can provide higher throughput than an ordinary synchronizedList. (Provided that there is a high proportion of reads and traversals to writes.)
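The snapshot behaviour quoted above can be observed even on a single thread. In this small sketch (class and method names are hypothetical), the loop appends to the list it is iterating over:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIteratorDemo {

    // Iterates the list and appends one element per visited element.
    // Returns the elements the iterator actually visited.
    static List<String> visitWhileAppending(List<String> list) {
        List<String> visited = new ArrayList<>();
        for (String s : list) {
            visited.add(s);
            list.add(s + "-copy"); // write during iteration
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> list = new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));
        List<String> visited = visitWhileAppending(list);
        System.out.println("visited: " + visited);
        System.out.println("list afterwards: " + list);
    }
}
```

The iterator keeps walking the array it captured when it was created, so it visits only the three original elements, while the list itself grows to six. Running the same method on a plain ArrayList would throw ConcurrentModificationException.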
For a write (add) operation, CopyOnWriteArrayList takes a ReentrantLock and creates a copy of the data; the underlying volatile array reference is updated only via setArray. (Any read operation on the list before setArray is called returns the old data, from before the add.) Moreover, CopyOnWriteArrayList provides a snapshot, fail-safe iterator and doesn't throw ConcurrentModificationException on write/add.
But when I checked add method of CopyOnWriteArrayList.class, we are acquiring lock on complete collection object. Then how come CopyOnWriteArrayList is better than synchronizedList. The only difference I see in add method of CopyOnWriteArrayList is we are creating copy of that array each time add method get called.
No, the lock is not on the entire Collection object. As stated above it is a ReentrantLock and it is different from the intrinsic object lock.
The add method will always create a copy of the existing array, do the modification on the copy, and then finally update the volatile reference of the array to point to this new array. That's why it has the name "CopyOnWriteArrayList": it makes a copy when you write into it. This also avoids the ConcurrentModificationException.
1) get and other read operation on CopyOnWriteArrayList are not synchronized.
2) CopyOnWriteArrayList's iterator never throws ConcurrentModificationException while Collections.synchronizedList's iterator may throw it.

Does map need to be synchronized if for each entry only one thread is accessing it?

I have a map. Let's say:
Map<String, Object> map = new HashMap<String, Object>();
Multiple threads are accessing this map, however each thread accesses only its own entries in the map. This means that if thread T1 inserts object A into the map, it is guaranteed that no other thread will access object A. Finally thread T1 will also remove object A.
It is guaranteed as well that no thread will iterate over the map.
Does this map need to be synchronized? If yes how would you synchronize it? (ConcurrentHashMap, Collections.synchronizedMap() or synchronized block)
Yes, you would need synchronization, or a concurrent map. Just think about the size of the map: two threads could add an element in parallel, and both increment the size. If you don't synchronize the map, you could have a race condition and it would result in an incorrect size. There are many other things that could go wrong.
But you could also use a different map for each thread, couldn't you?
A ConcurrentHashMap is typically faster than a synchronized HashMap. But the choice depends on your requirements.
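As a hedged sketch of the scenario in the question, the code below has each thread insert only its own keys into a shared ConcurrentHashMap and then checks the final size. The class name, key format, and timeout are assumptions:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PerThreadEntriesDemo {

    // Each worker inserts keysPerThread keys that no other worker touches.
    // Returns the final size of the shared map.
    static int insertFromThreads(int threads, int keysPerThread) {
        Map<String, Object> map = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int threadId = t;
            pool.execute(() -> {
                for (int k = 0; k < keysPerThread; k++) {
                    map.put("t" + threadId + "-k" + k, Boolean.TRUE);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(insertFromThreads(4, 1000));
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same code has no such guarantee: even though the keys never overlap, the threads share the bucket array and the size counter.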
If you're sure that there's only one entry per thread and no thread iterates or searches through the map, then why do you need a map?
You can use a ThreadLocal object instead, which will contain thread-specific data. If you need to keep string-object pairs, you can create a special class for this pair and keep it inside a ThreadLocal field.
class Foo {
String key;
Object value;
....
}
//below was your Map declaration
//Map<String, Object> map = ...
//Use here ThreadLocal instead
final ThreadLocal<Foo> threadLocalFoo = new ThreadLocal<Foo>();
...
threadLocalFoo.set(new Foo(...));
threadLocalFoo.get() //returns your object
threadLocalFoo.remove() //clears threadLocal container
You can find more info on ThreadLocal in the ThreadLocal javadocs.
I would say that yes. Getting the data is not the issue, adding the data is.
The HashMap has a series of buckets (lists); when you put data to the HashMap, the hashCode is used to decide in which bucket the item goes, and the item is added to the list.
So it can be that two items are added to the same bucket at the same time and, due to a race condition, only one of them is effectively stored.
You have to synchronize write operations on the map. If, after initializing the map, no thread is going to insert new entries or delete entries in the map, you don't need to synchronize it.
However, in your case (where each thread has its own entry) I'd recommend using ThreadLocal, which allows you to have a "local" object which will have different values per thread.
Hope it helps
For this scenario I think ConcurrentHashMap is the best Map, because both Collections.synchronizedMap() and a synchronized block (which are basically the same) have more overhead.
If you want to insert entries and not only read them in different threads you have to synchronize them because of the way the HashMap works.
- First of all, it's always good practice to write thread-safe code, especially in cases like the one above, though not in all conditions.
- It's better to use Hashtable, which is a synchronized Map, or java.util.concurrent.ConcurrentHashMap<K,V>.
