Multiple threads iterating over the same map - java

I was recently writing a concurrent program in Java and came across the following dilemma: suppose you have a global data structure that is part of a regular non-synchronized, non-concurrent library, such as HashMap. Is it OK to allow multiple threads to iterate through the collection (just reading, no modifications), perhaps at different, interleaved periods, i.e. thread1 might be halfway through iterating when thread2 gets its iterator on the same map?

It is OK. The ability to do this is one of the reasons the Iterator interface exists. Every thread iterating over the collection has its own iterator instance that holds its own state (e.g. where it currently is in the iteration).
This allows several threads to iterate over the same collection simultaneously.
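For illustration, here is a minimal sketch (class and variable names are invented for this example) of two threads iterating the same HashMap read-only; the map is fully populated before the reader threads are started, which also gives the required happens-before relationship:

import java.util.HashMap;
import java.util.Map;

public class SharedMapReaders {
    public static void main(String[] args) throws InterruptedException {
        // Populate the map completely before any reader thread starts.
        Map<String, Integer> scores = new HashMap<>();
        scores.put("alice", 10);
        scores.put("bob", 20);
        scores.put("carol", 30);

        Runnable reader = () -> {
            // The for-each loop asks entrySet() for a new Iterator, so each
            // thread gets its own cursor over the shared map.
            for (Map.Entry<String, Integer> e : scores.entrySet()) {
                System.out.println(Thread.currentThread().getName()
                        + " -> " + e.getKey() + "=" + e.getValue());
            }
        };

        Thread t1 = new Thread(reader, "reader-1");
        Thread t2 = new Thread(reader, "reader-2");
        t1.start();   // start() happens-after the puts above, so the readers see them
        t2.start();
        t1.join();
        t2.join();
    }
}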

It should be fine, as long as there are no writers.
This situation is similar to a readers-writer lock, where multiple readers are allowed to read the data, but not while a writer holds the lock. There is no concurrency issue for multiple reads at the same time; a data race can occur only when there is at least one write.
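If writers do exist, the readers-writer idea can be expressed with java.util.concurrent.locks.ReentrantReadWriteLock. This is only a rough sketch (the wrapper class and its methods are made up for illustration), not a drop-in solution:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GuardedMap {
    private final Map<String, String> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many threads may hold the read lock at the same time.
    public String get(String key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: it waits for all readers and writers.
    public void put(String key, String value) {
        lock.writeLock().lock();
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}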

Problems only arise when you attempt concurrent modifications on a data structure.
For instance, if one thread is iterating over the content of a Map, and another thread deletes elements from that collection, you'll be heading for serious trouble.
If you do need some threads to modify that collection safely, Java provides mechanisms to do so, namely ConcurrentHashMap.
There is also Hashtable, which implements the same Map interface as HashMap but is synchronized. Its use is discouraged nowadays (it is a legacy class, effectively obsolete though not formally deprecated), since its performance suffers as the number of elements grows, compared to ConcurrentHashMap, which doesn't need to lock the entire map.
If you happen to have a collection that is not synchronized and you need several threads reading and writing it, you can use Collections.synchronizedMap(Map) to get a synchronized view of it.
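For example (a sketch with made-up names), keep in mind that the synchronizedMap wrapper synchronizes individual calls for you, but the javadoc still requires you to synchronize on the wrapper manually while iterating:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedMapExample {
    public static void main(String[] args) {
        Map<String, Integer> shared = Collections.synchronizedMap(new HashMap<>());

        // Individual calls such as put/get are synchronized by the wrapper itself.
        shared.put("x", 1);
        shared.put("y", 2);

        // Iteration must still be guarded by the caller, per the javadoc.
        synchronized (shared) {
            for (Map.Entry<String, Integer> e : shared.entrySet()) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
        }
    }
}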

The above answers are certainly good advice. In general, when writing Java with concurrent threads, so long as you do not modify a data structure, you need not worry about multiple threads concurrently reading that structure.
Should you have a similar problem in the future, except that the global data structure could be concurrently modified, I would suggest writing a Java class that all threads use to access and modify the structure. This class could implement its own concurrency methodology, using either synchronized methods or locks. The Java tutorial has a very good explanation of Java's concurrency mechanisms. I have personally done this and it is fairly straightforward.
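One possible shape for such a class (purely illustrative; the names and methods are assumptions, not a prescribed API) is to keep the map private and make every access path a synchronized method:

import java.util.HashMap;
import java.util.Map;

// All threads go through this class; the underlying HashMap never escapes it.
public class SharedRegistry {
    private final Map<String, String> data = new HashMap<>();

    public synchronized void put(String key, String value) {
        data.put(key, value);
    }

    public synchronized String get(String key) {
        return data.get(key);
    }

    // Iteration happens inside the lock; callers receive a private copy.
    public synchronized Map<String, String> snapshot() {
        return new HashMap<>(data);
    }
}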

Related

Are Synchronized Blocks needed for Blocking Queues

public BlockingQueue<Message> Queue;
Queue = new LinkedBlockingQueue<>();
I know that if I use, say, a synchronized List, I need to surround it with synchronized blocks to safely use it across threads.
Is that the same for Blocking Queues?
No, you do not need to surround them with synchronized blocks.
From the JDK javadocs...
BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control. However, the bulk Collection operations addAll, containsAll, retainAll and removeAll are not necessarily performed atomically unless specified otherwise in an implementation. So it is possible, for example, for addAll(c) to fail (throwing an exception) after adding only some of the elements in c.
Just want to point out that, from my experience, the classes in the java.util.concurrent package of the JDK do not need synchronization blocks. Those classes manage the concurrency for you and are typically thread-safe. Whether intentional or not, it seems like java.util.concurrent has superseded the need to use synchronization blocks in modern Java code.
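As a minimal illustration (the message strings and thread structure are invented for this sketch), a producer and a consumer can share a LinkedBlockingQueue with no synchronized blocks at all:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueExample {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    queue.put("message-" + i);   // thread-safe; blocks only if the queue is bounded and full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    System.out.println("got " + queue.take());   // blocks until an element is available
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}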
It depends on the use case; I'll explain two scenarios, one where you don't need synchronized blocks and one where you may.
Case 1: Not required while using queuing methods such as put, take, etc.
Why it is not required is explained here; the important line is below:
BlockingQueue implementations are thread-safe. All queuing methods
achieve their effects atomically using internal locks or other forms
of concurrency control.
Case 2: Required while iterating over blocking queues and most concurrent collections
The iterator (one example from the comments) is weakly consistent, meaning it reflects some but not necessarily all of the changes that have been made to its backing collection since it was created. So if you care about reflecting all changes, you need to use synchronized blocks/locks while iterating.
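A rough sketch of what that external locking could look like (the class and lock names are assumptions for illustration); writers and the iterating reader share one lock so the iteration sees a stable view:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class IterationExample {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Object iterationLock = new Object();

    public void add(String item) {
        synchronized (iterationLock) {   // writers take the same lock...
            queue.add(item);
        }
    }

    public void printAll() {
        synchronized (iterationLock) {   // ...so no element is added or removed mid-iteration
            for (String item : queue) {
                System.out.println(item);
            }
        }
    }
}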
You are thinking about synchronization at too low a level. It doesn't have anything to do with what classes you use. It's about protecting data and objects that are shared between threads.
If one thread is able to modify any single data object or group of related data objects while other threads are able to look at or modify the same object(s) at the same time, then you probably need synchronization. The reason is, it often is not possible for one thread to modify data in a meaningful way without temporarily putting the data into an invalid state.
The purpose of synchronization is to prevent other threads from seeing the invalid state and possibly doing bad things to the same data or to other data as a result.
Java's Collections.synchronizedList(...) gives you a way for two or more threads to share a List in such a way that the list itself is safe from being corrupted by the actions of the different threads. But it does not offer any protection for the data objects that are in the List. If your application needs that protection, then it's up to you to supply it.
If you need the equivalent protection for a queue, you can use any of the several classes that implement java.util.concurrent.BlockingQueue. But beware! The same caveat applies. The queue itself will be protected from corruption, but the protection does not automatically extend to the objects that your threads pass through the queue.
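To illustrate that caveat with a hedged sketch (the Message class here is hypothetical), the queue hands the object over safely, but once both sides hold a reference to the same mutable Message, its fields need their own protection:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class HandOffExample {
    // A mutable payload; the queue does not protect it after hand-off.
    static class Message {
        private String body;
        synchronized void setBody(String body) { this.body = body; }
        synchronized String getBody() { return body; }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Message> queue = new LinkedBlockingQueue<>();

        Message m = new Message();
        m.setBody("hello");
        queue.put(m);                  // safe hand-off: the taker will see "hello"

        Message received = queue.take();
        // If the producer kept its reference to m and continues to mutate it,
        // both sides must rely on the Message's own synchronization (as above);
        // the queue plays no further part once the element has been taken.
        System.out.println(received.getBody());
    }
}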

Is it safe to replace all the occurrences of Hashtable with ConcurrentHashMap?

Our legacy multi-threaded application makes heavy use of Hashtable. Is it safe to replace the Hashtable instances with ConcurrentHashMap instances for a performance gain? Will there be any side effects?
Is it safe to replace the Hashtable instances with ConcurrentHashMap instances for a performance gain?
In most cases it should be safe and yield better performance. The effort of changing depends on whether you used the Map interface or Hashtable directly.
Will there be any side effects?
There might be side effects if your application expects to immediately be able to access elements that were put into the map by another thread.
From the JavaDoc on ConcurrentHashMap:
Retrieval operations (including get) generally do not block, so may overlap
with update operations (including put and remove). Retrievals reflect the
results of the most recently completed update operations holding upon their onset.
Edit: to clarify "immediately", consider that thread 1 adds element A to the map and, while that write is still executing, thread 2 tries to check whether A exists in the map. With Hashtable, thread 2 would be blocked until after the write, so the check would return true; with ConcurrentHashMap it could return false, since thread 2 is not blocked and the write is not yet completed (thread 2 would see an outdated version of the bucket).
Depending on the size of your Hashtable objects you might get some performance gains by switching to ConcurrentHashMap.
ConcurrentHashMap is broken into segments, which allows the table to be only partially locked. This means that you can get more accesses per second than with a Hashtable, which requires you to lock the entire table.
The tables themselves are both thread safe and both implement the Map interface, so replacement should be relatively easy.
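A typical replacement (a sketch; the class and field names are made up) is a one-line change when the code already talks to the Map interface:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Cache {
    // Before: private final Map<String, String> entries = new Hashtable<>();
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    public void store(String key, String value) {
        entries.put(key, value);   // note: neither class accepts null keys or values
    }

    public String lookup(String key) {
        return entries.get(key);
    }
}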

Main difference between HashMap and ConcurrentHashMap in the case of non-select operation?

As far as I know, when performing a non-select operation (such as put) on a HashMap versus a ConcurrentHashMap, does the ConcurrentHashMap block all of its segments (16 by default)?
The difference is that HashMap is not thread-safe, which means that if several threads are using it you need to take care of concurrent operations yourself. You can easily make it thread-safe using Collections.synchronizedMap(map), which creates a wrapper object in which all methods are synchronized (on that wrapper object) and delegate to the wrapped map. As you can imagine, this means poor performance, as only one thread (reader or writer, it doesn't matter) has access to it at a time.
As you pointed out, ConcurrentHashMap is thread-safe and handles synchronization a bit differently than Collections.synchronizedMap(). Instead of a single lock it uses multiple locks, by partitioning the underlying map into buckets and having a lock per bucket. With that, several threads can use the map concurrently without problems, as they may be working on records in different buckets. In this case you don't need to lock the whole map (all buckets), only the bucket in which the record you are interested in resides, which means operations on different buckets can proceed in parallel. Obviously, if all threads are working on the same record (or the same bucket) they will still be synchronized, as with Collections.synchronizedMap(), and will block on writes. This approach works really well when your reader threads outnumber your writer threads.
For instance you might have (very simplified):
ConcurrentHashMap = [ bucket1 = [r1->x], bucket2 = [r2->y], bucket3 = [r3->z] ]
When you are doing a read/write operation for the entry with key r1, the map will choose bucket1, as that record should be there, and lock bucket1 and only bucket1. Then, when another thread comes along, it can safely query for r2, but if it also queries for r1 it has to wait, as bucket1 is locked.
Of course there's a performance penalty behind all of this but it's not terribly high (although I don't have any benchmarks to back this statement).
Does that answer your question?
No. For non-select operations (e.g. put) as well, the lock is always taken at the segment level, never for the whole table.
You can see this in the ConcurrentHashMap source code.
Whenever you call the put() method on a ConcurrentHashMap, it first calculates the hash to locate the particular segment, then calls tryLock() on that segment only, not at the table level.
ConcurrentHashMap has a different structure than a plain HashMap.
A ConcurrentHashMap is divided into a number of segments (16 by default) on initialization.
It allows that many threads to access these segments concurrently, so that each thread works on a specific segment.
While one thread is working on one segment it puts a lock on it; it does not block the rest of the segments.
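As a small sketch of that behaviour (the key ranges here are arbitrary), two writers touching disjoint keys can run in parallel with no external synchronization, because they mostly lock different segments:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SegmentedWrites {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();

        Thread even = new Thread(() -> {
            for (int i = 0; i < 1000; i += 2) map.put(i, i);
        });
        Thread odd = new Thread(() -> {
            for (int i = 1; i < 1000; i += 2) map.put(i, i);
        });

        even.start();
        odd.start();
        even.join();
        odd.join();

        System.out.println("size = " + map.size());   // 1000: no updates were lost
    }
}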

Can a Java LinkedList be read by multiple threads safely?

I would like multiple threads to iterate through the elements in a LinkedList. I do not need to write into the LinkedList. Is it safe to do so? Or do I need a synchronized list to make it work?
Thank you!
They can do this safely, PROVIDED THAT:
they synchronize with (all of) the threads that have written the list BEFORE they start the iterations, and
no threads modify the list during the iterations.
The first point is necessary, because unless there is proper synchronization before you start, there is a possibility that one of the "writing" threads has unflushed changes for the list data structures in local cache memory or registers, or one of the reading threads has stale list state in its cache or registers.
(This is one of those cases where a solid understanding of the Java memory model is needed to know whether the scenario is truly thread-safe.)
Or do I need a synchronized list to make it work
You don't necessarily need to go that far. All you need to do is ensure that there is a "happens-before" relationship at the appropriate point, and there are a variety of ways to achieve that. For instance, it is sufficient if the list is created and populated by the writer thread, and the writer then passes the list to the reader thread objects before calling start() on them.
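A minimal sketch of that hand-over (the names are invented here): the writer populates the list and only then starts the readers, so the call to start() provides the needed happens-before edge:

import java.util.LinkedList;
import java.util.List;

public class ReadOnlyListExample {
    public static void main(String[] args) throws InterruptedException {
        List<String> words = new LinkedList<>();
        words.add("alpha");            // all writes happen before the readers start
        words.add("beta");
        words.add("gamma");

        Runnable reader = () -> {
            for (String w : words) {   // nobody modifies the list from here on
                System.out.println(Thread.currentThread().getName() + ": " + w);
            }
        };

        Thread r1 = new Thread(reader, "reader-1");
        Thread r2 = new Thread(reader, "reader-2");
        r1.start();                    // the writes above happen-before start()
        r2.start();
        r1.join();
        r2.join();
    }
}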
From the Java documentation:
Note that this implementation is not synchronized. If multiple threads access a linked list concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method.
In other words, if you are truly just iterating through then you're alright, just be careful.

Can ConcurrentHashMap guarantee true thread safety and concurrency at the same time?

We know that ConcurrentHashMap can provide concurrent access to multiple threads to boost performance, and inside this class segments are synchronized (am I right?). The question is: can this design guarantee thread safety? Say we have 30+ threads accessing and changing an object mapped by the same key in a ConcurrentHashMap instance; my guess is they still have to line up for that, don't they?
From my recollection, the book "Java Concurrency in Practice" says that ConcurrentHashMap provides concurrent reading and a decent level of concurrent writing. In the aforementioned scenario, if my guess is correct, the performance won't be better than using the Collections static synchronization wrapper API, will it?
Thanks for clarifying,
John
You will still have to synchronize any access to the object being modified, and as you suspect all access to the same key will still have contention. The performance improvement comes in access to different keys, which is of course the more typical case.
All a ConcurrentMap can give you with respect to concurrency is that modifications to the map itself are done atomically, and that any writes happen-before any reads (this is important as it provides safe publication of any reference retrieved from the map).
Safe publication means that any (mutable) object retrieved from the map will be seen with all the writes made to it before it was placed in the map. It won't help for publishing modifications that are made after retrieving it, though.
However, concurrency and thread safety are generally hard to reason about and get right when you have mutable objects that are being modified by multiple parties. Usually you have to lock in order to get it right. A better approach is often to use immutable objects in conjunction with the ConcurrentMap conditional putIfAbsent/replace methods and to linearize your algorithm that way. This lock-free style tends to be easier to reason about.
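A hedged sketch of that style (the counter example is invented for illustration): the value type is immutable, so updates are made by atomically replacing the mapping rather than mutating a shared object under a lock:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LockFreeCounter {
    private final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

    public void increment(String key) {
        for (;;) {
            Integer current = counts.get(key);
            if (current == null) {
                // First writer wins; if another thread beat us to it, retry.
                if (counts.putIfAbsent(key, 1) == null) {
                    return;
                }
            } else {
                // Integer is immutable, so replace the mapping atomically.
                if (counts.replace(key, current, current + 1)) {
                    return;
                }
            }
        }
    }

    public int get(String key) {
        Integer v = counts.get(key);
        return v == null ? 0 : v;
    }
}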
Question is, can this design guarantee the thread safety?
It guarantees the thread safety of the map itself; i.e. access and updates on the map have well-defined and orderly behaviour in the presence of multiple threads performing updates simultaneously.
It does not guarantee thread safety of the key or value objects, and it does not provide any form of higher-level synchronization.
Say we have 30+ threads accessing &changing an object mapped by the same key in a ConcurrentHashMap instance, my guess is, they still have to line up for that, don't they?
If you have multiple threads trying to use the same key, then their operations will inevitably be serialized to some degree. That is unavoidable.
In fact, from briefly looking at the source code, it looks like ConcurrentHashMap falls back to using conventional locks if there is too much contention for a particular segment of the map. And if you have multiple threads trying to access AND update the same key simultaneously, that will trigger locking.
First, remember that a thread-safe tool doesn't guarantee thread-safe usage of it in and of itself.
The if (!map.containsKey(k)) map.put(k, v); construct, as opposed to putIfAbsent, is not thread-safe, for example (see the sketch below).
And each value access/modification still has to be made thread-safe independently.
Reads are concurrent, even for the same key, so performance will be better for typical applications.
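For example (sketch only; the class and method names are made up), the check-then-act version can be interleaved by two threads, while putIfAbsent performs the same logic atomically:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PutIfAbsentExample {
    private final Map<String, String> map = new ConcurrentHashMap<>();

    // NOT thread-safe: another thread can insert between the check and the put.
    public void registerRacy(String key, String value) {
        if (!map.containsKey(key)) {
            map.put(key, value);
        }
    }

    // Thread-safe: the check and the insert happen as one atomic operation.
    public void registerAtomic(String key, String value) {
        map.putIfAbsent(key, value);
    }
}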
