Recently, while exploring ConcurrentSkipListMap, I went through its implementation and it looks to me as if its put method is not thread-safe. It internally calls doPut, which actually adds the item, but this method does not use any kind of lock of the sort ConcurrentHashMap uses.
Therefore, I want to know whether put is thread-safe or not. Looking at the method, it seems that it is not: if it is executed by two threads simultaneously, a problem may occur.
I know ConcurrentSkipListMap internally uses a skip list data structure, but I was expecting the put method to be thread-safe. Am I misunderstanding something? Is ConcurrentSkipListMap really not thread-safe?
Not using a Lock doesn't by itself make it thread-unsafe. A skip list can be implemented lock-free.
You should read the API carefully.
... Insertion, removal, update, and access operations safely execute concurrently by multiple threads. Iterators are weakly consistent, returning elements reflecting the state of the map at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations. ...
The comments in the implementation say:
Given the use of tree-like index nodes, you might wonder why this
doesn't use some kind of search tree instead, which would support
somewhat faster search operations. The reason is that there are no
known efficient lock-free insertion and deletion algorithms for search
trees. The immutability of the "down" links of index nodes (as opposed
to mutable "left" fields in true trees) makes this tractable using
only CAS operations.
So they use some low level programming features with compare-and-swap operations to make changes to the map atomic. With this they ensure thread safety without the need to synchronize access.
You can read it in more detail in the source code.
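To make the compare-and-swap idea concrete, here is a minimal sketch of a lock-free prepend to a linked list. It is not the skip list algorithm from the JDK source; the class CasPrependDemo and its methods are hypothetical names used only for illustration. It shows the same pattern, though: read the current state, build the new state, and publish it with a single CAS, retrying if another thread got there first.

```java
import java.util.concurrent.atomic.AtomicReference;

class CasPrependDemo {
    // Immutable node: once created, its fields never change.
    static final class Node {
        final int value;
        final Node next;

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node> head = new AtomicReference<>();

    // Lock-free prepend: retry until the CAS on head succeeds.
    void push(int value) {
        while (true) {
            Node current = head.get();
            Node updated = new Node(value, current);
            if (head.compareAndSet(current, updated)) {
                return; // nobody changed head between get() and the CAS
            }
            // Another thread won the race; loop and retry against the new head.
        }
    }

    int size() {
        int n = 0;
        for (Node p = head.get(); p != null; p = p.next) {
            n++;
        }
        return n;
    }

    // Runs `threads` workers that each push `perThread` values; returns the final size.
    static int run(int threads, int perThread) {
        CasPrependDemo list = new CasPrependDemo();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.push(i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // 40000: no push is lost
    }
}
```

No thread ever blocks here. A thread that loses the race simply retries, which is why the absence of a Lock says nothing about thread safety.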
We should trust the Java API. This is what the java.util.concurrent package docs say:
Concurrent Collections
Besides Queues, this package supplies Collection implementations designed for use in multithreaded contexts: ConcurrentHashMap, ConcurrentSkipListMap, ConcurrentSkipListSet, CopyOnWriteArrayList, and CopyOnWriteArraySet.
public BlockingQueue<Message> Queue;
Queue = new LinkedBlockingQueue<>();
I know that if I use, say, a synchronized List, I need to surround compound operations on it with synchronized blocks to use it safely across threads.
Is that the same for blocking queues?
No, you do not need to surround it with synchronized blocks.
From the JDK javadocs...
BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control. However, the bulk Collection operations addAll, containsAll, retainAll and removeAll are not necessarily performed atomically unless specified otherwise in an implementation. So it is possible, for example, for addAll(c) to fail (throwing an exception) after adding only some of the elements in c.
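As a rough illustration of the quoted guarantee, the sketch below has one producer and one consumer share a LinkedBlockingQueue with no synchronized blocks at all. The class name BlockingQueueDemo, the Integer payload (instead of the question's Message type) and the POISON end marker are assumptions made for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BlockingQueueDemo {
    private static final int POISON = -1; // hypothetical end-of-stream marker

    // Producer puts 1..n and then the marker; consumer sums everything before the marker.
    static long produceAndConsume(int n) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        long[] sum = new long[1];

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) {
                    queue.put(i); // no synchronized block needed
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int v = queue.take(); v != POISON; v = queue.take()) {
                    sum[0] += v;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0]; // safe to read: join() orders this after the consumer's writes
    }

    public static void main(String[] args) {
        System.out.println(produceAndConsume(100)); // 5050
    }
}
```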
Just want to point out that, in my experience, the classes in the java.util.concurrent package of the JDK do not need synchronized blocks. Those classes manage the concurrency for you and are typically thread-safe. Whether intentional or not, it seems java.util.concurrent has largely superseded the need for synchronized blocks in modern Java code.
It depends on the use case. I will explain two scenarios: one where you don't need synchronized blocks and one where you may.
Case 1: Not required when using the queuing methods, e.g. put, take, etc.
Why it is not required is explained in the Javadoc; the important lines are below:
BlockingQueue implementations are thread-safe. All queuing methods
achieve their effects atomically using internal locks or other forms
of concurrency control.
Case 2: Possibly required while iterating over blocking queues and most concurrent collections
The iterator is weakly consistent, meaning it reflects some but not necessarily all of the changes that have been made to its backing collection since it was created. So if you care about seeing all changes, you need external coordination while iterating, such as a lock that every writer also acquires. A synchronized block on the collection alone is not enough, because the collection's own methods do not necessarily use that monitor.
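A small single-threaded sketch can show what "weakly consistent" means in practice. It adds an element in the middle of an iteration, once on an ArrayList (fail-fast iterator) and once on a LinkedBlockingQueue (weakly consistent iterator). The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.ConcurrentModificationException;
import java.util.concurrent.LinkedBlockingQueue;

class WeakIteratorDemo {
    // Adds an element while iterating and reports whether the iterator failed fast.
    static boolean failsFast(Collection<Integer> c) {
        c.add(1);
        c.add(2);
        c.add(3);
        try {
            for (Integer x : c) {
                if (x == 1) {
                    c.add(99); // structural change in the middle of the iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsFast(new ArrayList<>()));           // true
        System.out.println(failsFast(new LinkedBlockingQueue<>())); // false
    }
}
```

The queue's iterator keeps going without an exception. Whether it also shows the element added mid-iteration is exactly the part the Javadoc leaves unspecified.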
You are thinking about synchronization at too low a level. It doesn't have anything to do with what classes you use. It's about protecting data and objects that are shared between threads.
If one thread is able to modify any single data object or group of related data objects while other threads are able to look at or modify the same object(s) at the same time, then you probably need synchronization. The reason is, it often is not possible for one thread to modify data in a meaningful way without temporarily putting the data into an invalid state.
The purpose of synchronization is to prevent other threads from seeing the invalid state and possibly doing bad things to the same data or to other data as a result.
Java's Collections.synchronizedList(...) gives you a way for two or more threads to share a List such that the list itself is safe from being corrupted by the actions of the different threads. But it does not offer any protection for the data objects that are in the List. If your application needs that protection, then it's up to you to supply it.
If you need the equivalent protection for a queue, you can use any of the several classes that implement java.util.concurrent.BlockingQueue. But beware! The same caveat applies. The queue itself will be protected from corruption, but the protection does not automatically extend to the objects that your threads pass through the queue.
I am just trying to explore what thread-safe means.
Below is my understanding:
To me it looks like it means allowing multiple threads to access a collection at the same time, irrespective of its synchronization. For example, any method without the synchronized keyword is thread-safe in the sense that multiple threads can access it.
It is up to the developer to add more logic (synchronization) around this method to maintain data integrity while multiple threads are accessing it. That is separate from thread safety.
If my statement above is false, just read the Javadoc below for ConcurrentHashMap:
keySet: The view's iterator is a "weakly consistent" iterator that will never throw
ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.
The statement above says the keySet iterator does not guarantee data integrity while multiple threads are modifying the collection.
Could you please tell me: is the keySet iterator of ConcurrentHashMap thread-safe?
And is my understanding of thread safety correct?
keySet: The view's iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction
This by itself explains that the keySet iterator of ConcurrentHashMap is thread-safe.
The general idea behind the java.util.concurrent package is to provide a set of data structures that offer thread-safe access without strong consistency. This way these objects achieve higher concurrency than fully locked objects.
Being thread-safe means that, even without any explicit synchronization, you never corrupt the objects. With Hashtable and HashMap, compound check-then-act sequences are potential problems under multi-threaded access, such as first checking that an element exists and then removing it. ConcurrentHashMap offers such operations as single atomic methods (for example remove(key, value) and putIfAbsent), so you do not need to be afraid of losing data.
However, this does not mean that the class locks itself for each operation. Higher-level operations such as putAll and iteration are not atomic. The class does not provide strong consistency. Whatever the order and timing of your operations, they are guaranteed not to corrupt the object, but they are not guaranteed to give you an exact snapshot.
For example, if you print the object concurrently with a call to putAll, you might see partially populated output. An iterator used concurrently with new insertions also might not reflect all insertions, as you quoted.
This is a different matter from being thread-safe. Even though the results might surprise you, you are assured that nothing is lost or accidentally overwritten; elements are added to and removed from your object without any problem. If this behaviour is sufficient for your requirements, you are advised to use the java.util.concurrent classes. If you need more consistency, then you need to use the synchronized classes from java.util or add synchronization yourself.
By your definition the Set returned by ConcurrentHashMap.keySet() is thread safe.
However, it may act in very strange ways, as pointed out in the quote you included.
As a Set, entries may appear and/or disappear at random. I.e. if you call contains twice on the same object, the two results may differ.
As an Iterable you could begin two iterations of its underlying objects in two different threads and discover that the two iterations enumerate different entries.
Furthermore, contains and iteration may not agree with each other either.
None of this will happen, however, if you somehow lock the underlying Map against modification while you are holding your Set. But the need to do that does not imply that the structure is not thread-safe.
I was recently writing a concurrent program in Java and came across the following dilemma: suppose you have a global data structure from a regular non-synchronized, non-concurrent library, such as HashMap. Is it OK to allow multiple threads to iterate through the collection (just reading, no modifications), perhaps at different, interleaved times, i.e. thread1 might be halfway through iterating when thread2 gets its iterator on the same map?
It is OK. Being able to do this is the reason the Iterator interface exists. Every thread iterating over the collection has its own iterator instance that holds its state (e.g. where it currently is in the iteration).
This allows several threads to iterate over the same collection simultaneously.
It should be fine, as long as there are no writers.
This issue is similar to the readers-writer lock, where multiple readers are allowed to read the data, but not while a writer "has" the lock for it. There is no concurrency issue with multiple reads at the same time. [A data race can occur only when there is at least one write.]
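The read-only case can be sketched as follows. The map is fully populated before any reader thread starts (Thread.start() makes those writes visible to the readers), and the unmodifiable wrapper is an extra precaution I added, not something the answers require. SharedReadDemo and sumInThreads are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SharedReadDemo {
    // Fills a plain HashMap, then lets several threads iterate over it at the same time.
    static long[] sumInThreads(int threads) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 1; i <= 100; i++) {
            map.put(i, i);
        }
        // All writes finish before any reader starts; the wrapper blocks accidental writes.
        Map<Integer, Integer> readOnly = Collections.unmodifiableMap(map);

        long[] sums = new long[threads];
        Thread[] readers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t;
            readers[t] = new Thread(() -> {
                long sum = 0;
                for (int v : readOnly.values()) { // each thread gets its own iterator
                    sum += v;
                }
                sums[slot] = sum;
            });
            readers[t].start();
        }
        for (Thread r : readers) {
            try {
                r.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sumInThreads(3))); // [5050, 5050, 5050]
    }
}
```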
Problems only arise when you attempt concurrent modifications on a data structure.
For instance, if one thread is iterating over the content of a Map, and another thread deletes elements from that collection, you'll be heading for serious trouble.
If you do need some threads to modify that collection safely, Java provides for mechanisms to do so, namely, ConcurrentHashMap.
There is also Hashtable, which has the same interface as HashMap but is synchronized. Its use is no longer advised (it is not formally deprecated, but it is considered legacy), since its performance suffers under contention: every operation locks the whole table, whereas ConcurrentHashMap doesn't need to lock the entire collection.
If you happen to have a Collection that is not synchronized and you need to have several threads reading and writing on top of it, you can use Collections.synchronizedMap(Map) to get a synchronized version of it.
The above answers are certainly good advice. In general, when writing Java with concurrent threads, so long as you do not modify a data structure, you need not worry about multiple threads concurrently reading that structure.
Should you have a similar problem in the future, except that the global data structure could be concurrently modified, I would suggest writing a Java class that all threads use to access and modify the structure. This class could implement its own concurrency strategy, using either synchronized methods or locks. The Java tutorial has a very good explanation of Java's concurrency mechanisms. I have personally done this and it is fairly straightforward.
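One possible shape for such a class is sketched below, assuming a ReentrantReadWriteLock so that readers do not block each other. GuardedCounts and its methods are invented for the example; synchronized methods would work just as well.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedCounts {
    private final Map<String, Integer> counts = new HashMap<>(); // never exposed directly
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Writers take the exclusive lock.
    void increment(String key) {
        lock.writeLock().lock();
        try {
            counts.merge(key, 1, Integer::sum);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock, so they can iterate concurrently with each other.
    int total() {
        lock.readLock().lock();
        try {
            int sum = 0;
            for (int c : counts.values()) {
                sum += c;
            }
            return sum;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Runs `threads` workers that each increment `perThread` times; returns the total.
    static int run(int threads, int perThread) {
        GuardedCounts guarded = new GuardedCounts();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    guarded.increment("k" + (i % 10));
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return guarded.total();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // 4000
    }
}
```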
We know that ConcurrentHashMap can provide concurrent access to multiple threads to boost performance, and inside this class, segments are synchronized (am I right?). The question is, can this design guarantee thread safety? Say we have 30+ threads accessing and changing an object mapped by the same key in a ConcurrentHashMap instance; my guess is that they still have to line up for that, don't they?
From my recollection, the book "Java Concurrency in Practice" says that ConcurrentHashMap provides concurrent reading and a decent level of concurrent writing. In the aforementioned scenario, and if my guess is correct, the performance won't be better than using the Collections static synchronization wrapper API?
Thanks for clarifying,
John
You will still have to synchronize any access to the object being modified, and as you suspect all access to the same key will still have contention. The performance improvement comes in access to different keys, which is of course the more typical case.
All a ConcurrentMap can give you with respect to concurrency is that modifications to the map itself are done atomically, and that a write to the map happens-before any subsequent read of that entry (this is important as it provides safe publishing of any reference from the map).
Safe-publishing means that any (mutable) object retrieved from the map will be seen with all writes to it before it was placed in the map. It won't help for publishing modifications that are made after retrieving it though.
However, concurrency and thread-safety is generally hard to reason about and make correct if you have mutable objects that are being modified by multiple parties. Usually you have to lock in order to get it right. A better approach is often to use immutable objects in conjunction with the ConcurrentMap conditional putIfAbsent/replace methods and linearize your algorithm that way. This lock-free style tends to be easier to reason about.
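Here is a minimal sketch of that lock-free style, assuming a made-up immutable Stats value and a record helper. Each update builds a new Stats and installs it with putIfAbsent or the three-argument replace, retrying when another thread changed the entry in between.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ImmutableValueDemo {
    // Immutable value: an update creates a new instance instead of mutating this one.
    static final class Stats {
        final int hits;
        final long bytes;

        Stats(int hits, long bytes) {
            this.hits = hits;
            this.bytes = bytes;
        }

        Stats plus(long moreBytes) {
            return new Stats(hits + 1, bytes + moreBytes);
        }
    }

    // Retry loop: install the new value only if the entry is still the one we read.
    static void record(ConcurrentMap<String, Stats> map, String key, long bytes) {
        while (true) {
            Stats old = map.get(key);
            if (old == null) {
                if (map.putIfAbsent(key, new Stats(1, bytes)) == null) {
                    return;
                }
            } else if (map.replace(key, old, old.plus(bytes))) {
                return;
            }
            // Lost a race with another thread: re-read and retry.
        }
    }

    // Runs `threads` workers that each record `perThread` hits of 10 bytes on one key.
    static Stats run(int threads, int perThread) {
        ConcurrentMap<String, Stats> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    record(map, "page", 10);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.get("page");
    }

    public static void main(String[] args) {
        Stats s = run(4, 1000);
        System.out.println(s.hits + " " + s.bytes); // 4000 40000
    }
}
```

Because Stats does not override equals, replace compares by identity here, which is what the retry loop wants. On Java 8 and later, compute or merge would express the same update more briefly.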
Question is, can this design guarantee the thread safety?
It guarantees the thread safety of the map; i.e. that access and updates on the map have a well defined and orderly behaviour in the presence of multiple threads performing updates simultaneously.
It does not guarantee thread safety of the key or value objects. And it does not provide any form of higher-level synchronization.
Say we have 30+ threads accessing &changing an object mapped by the same key in a ConcurrentHashMap instance, my guess is, they still have to line up for that, don't they?
If you have multiple threads trying to use the same key, then their operations will inevitably be serialized to some degree. That is unavoidable.
In fact, from briefly looking at the source code, it looks like ConcurrentHashMap falls back to using conventional locks if there is too much contention for a particular segment of the map. And if you have multiple threads trying to access AND update the same key simultaneously, that will trigger locking.
First, remember that a thread-safe tool doesn't by itself guarantee thread-safe usage of it.
For example, the if (!map.containsKey(k)) map.put(k, v); construct, used as a substitute for putIfAbsent, is not thread-safe.
And each access to or modification of a value still has to be made thread-safe independently.
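The difference can be sketched like this: many threads race to register a value under the same key, and with putIfAbsent exactly one of them wins. The racy check-then-act form is left as a comment because its failure depends on timing. PutIfAbsentDemo and countWinners are names made up for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class PutIfAbsentDemo {
    // Returns how many threads believe they were the one that inserted the key.
    static int countWinners(int threads) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        AtomicInteger winners = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                // Racy version (two threads can both pass the check):
                //   if (!map.containsKey("k")) map.put("k", id);
                // Atomic version: the check and the insert are one operation.
                if (map.putIfAbsent("k", id) == null) {
                    winners.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println(countWinners(16)); // 1
    }
}
```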
Reads are concurrent, even for the same key, so performance will be better for typical applications.
I am really confused on how these 2 collections behave in multithreaded environment.
Hashtable is synchronized, which means no two threads will be updating its values simultaneously, right?
Look at ConcurrentHashMaps for Thread safe Maps.
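They offer all the features of Hashtable with performance very close to that of a HashMap.
The performance comes from not using a map-wide lock. In the implementation described here (before Java 8), the map is split into segments, 16 by default, each guarded by its own lock, so writers only contend when they hit the same segment. You can even configure the number of segments through the concurrencyLevel constructor argument :) Tweaking this can help performance depending on your data. (Since Java 8 the implementation locks individual bins instead and treats concurrencyLevel only as a sizing hint.)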
I can't recommend enough Java Concurrency in Practice by Brian Goetz
http://jcip.net/
I still learn something new every time I read it.
Exactly. Hashtable is synchronized, which means it's safe to use in a multi-threaded environment (many threads accessing the same Hashtable). If two threads try to update the Hashtable at the same time, one of them will have to wait until the other thread finishes its update.
HashMap is not synchronized, so it's faster, but you can run into problems in a multi-threaded environment.
Also note that Hashtable and Collections.synchronizedMap are safe only for individual operations. Any operations involving multiple keys or check-then-act that need to be atomic will not be so and additional client side locking will be required.
For example, you cannot write any of the following methods without additional locking:
swap the values at two different keys: swapValues(Map, Object k1, Object k2)
append the parameter to value at a key: appendToValue(Map, Object k1, String suffix)
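A sketch of the two methods with the additional client-side locking is shown below. It assumes the map was created with Collections.synchronizedMap (or is a Hashtable), both of which use the map object itself as the lock, so synchronizing on it excludes every other operation for the duration of the compound action. The generic and String-keyed signatures are my own tightening of the ones listed above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ClientSideLockingDemo {
    // Swaps the values at two keys as one atomic step.
    static <K, V> void swapValues(Map<K, V> syncMap, K k1, K k2) {
        synchronized (syncMap) { // same monitor the wrapper's own methods use
            V v1 = syncMap.get(k1);
            V v2 = syncMap.get(k2);
            syncMap.put(k1, v2);
            syncMap.put(k2, v1);
        }
    }

    // Appends a suffix to the value at a key as one atomic step.
    static void appendToValue(Map<String, String> syncMap, String k1, String suffix) {
        synchronized (syncMap) {
            String old = syncMap.get(k1);
            syncMap.put(k1, old == null ? suffix : old + suffix);
        }
    }

    // Runs `threads` workers that each swap "a" and "b" `swapsPerThread` times.
    static Map<String, String> run(int threads, int swapsPerThread) {
        Map<String, String> map = Collections.synchronizedMap(new HashMap<>());
        map.put("a", "1");
        map.put("b", "2");
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < swapsPerThread; i++) {
                    swapValues(map, "a", "b");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> m = run(4, 1000); // 4000 swaps in total, an even number
        System.out.println(m.get("a") + m.get("b")); // 12
    }
}
```

Without the synchronized blocks, two interleaved swaps could leave both keys holding the same value.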
And yes, all of this is covered in JCIP :-)
Yes, all individual method calls are atomic, but iterating over the collection returned by values() is not (see the docs).
Paul was faster than me in recommending the java.util.concurrent package, which gives you very fine control and data structures for multithreaded environments.
Hashtables are synchronized, but they're an old implementation that you could almost say is deprecated. Also, they don't allow null keys or null values.
One problem is that although every method call is synchronized, most interesting actions require more than one call so you have to synchronize around the several calls.
A similar level of synchronization can be obtained for HashMaps by calling:
Map m = Collections.synchronizedMap(new HashMap());
which wraps a map in synchronized method calls. But this has the same concurrency drawbacks as Hashtable.
As Paul says, ConcurrentHashMaps provide thread safe maps with additional useful methods for atomic updates.
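As one example of those atomic update methods, the sketch below counts words from several threads using ConcurrentHashMap.merge (available since Java 8), with no external locking. AtomicUpdateDemo and countWords are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;

class AtomicUpdateDemo {
    // Every thread counts every word once; merge() makes each increment atomic.
    static ConcurrentHashMap<String, Integer> countWords(String[] words, int threads) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (String w : words) {
                    counts.merge(w, 1, Integer::sum); // atomic read-modify-write per key
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> counts =
                countWords(new String[] {"a", "b", "a"}, 4);
        System.out.println(counts.get("a") + " " + counts.get("b")); // 8 4
    }
}
```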