I have a query regarding ConcurrentHashMap.
ConcurrentHashMap is a map for concurrent access. ConcurrentHashMap implements ConcurrentMap which extends Map.
a) ConcurrentHashMap implements the methods defined in ConcurrentMap (like putIfAbsent etc.), which are atomic.
b) But, how about the methods in the Map interface which ConcurrentMap extends?
How are they now atomic? Have they been reimplemented by ConcurrentHashMap?
If I have a reference of type ConcurrentHashMap and call a method from
the Map interface (e.g. put) or any other method, is that method atomic?
ConcurrentHashMap does not extend HashMap. They are both implementations of a hash table, but ConcurrentHashMap has very different internals to HashMap in order to provide concurrency.
If you provide a ConcurrentHashMap to a method that accepts Map, then that will work, and you will get the concurrent behaviour you expect. Map is simply an interface that describes a set of methods; ConcurrentHashMap implements that interface with concurrent behaviour.
There is a difference between 'concurrent' and 'atomic'. Concurrent means that multiple operations can happen at the same time and the Map (or whatever data structure we are talking about) will ALWAYS BE IN A VALID STATE. This means that you can have multiple threads calling put(), get(), remove(), etc on this map and there will never be any errors (if you try this with a regular HashMap you WILL get errors as it isn't designed to handle concurrency).
Atomic means that an action that takes multiple steps appears to take a single step to other threads: as far as they are aware, it has either completely finished or hasn't even started yet. For ConcurrentHashMap, putIfAbsent() is one such example. From the javadoc,
If the specified key is not already associated with a value, associate it with the given value. This is equivalent to:
if (!map.containsKey(key))
    return map.put(key, value);
else
    return map.get(key);
except that the action is performed atomically.
If you tried the above code with a ConcurrentHashMap, you wouldn't get any errors (since it is concurrent), but there is a good chance that other threads would interleave with the main thread and your entry would get overwritten or removed. ConcurrentMap specifies the atomic putIfAbsent() method to ensure that implementations can perform those steps atomically, without interference from other threads.
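The difference between the check-then-act sequence and the atomic call can be sketched as follows. This is only an illustration: the class name PutIfAbsentDemo and the method names checkThenPut and atomicPut are made up for this example, and the main method runs single-threaded, so it shows the return values rather than the race itself.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PutIfAbsentDemo {
    // Racy under concurrency: another thread may insert or remove the key
    // between containsKey() and put(), even though each call is thread-safe.
    static String checkThenPut(ConcurrentMap<String, String> map, String key, String value) {
        if (!map.containsKey(key)) {
            return map.put(key, value);
        } else {
            return map.get(key);
        }
    }

    // Atomic: the check and the insertion happen as one indivisible step.
    static String atomicPut(ConcurrentMap<String, String> map, String key, String value) {
        return map.putIfAbsent(key, value);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        System.out.println(atomicPut(map, "k", "first"));  // prints null (no previous value)
        System.out.println(atomicPut(map, "k", "second")); // prints first (existing value kept)
        System.out.println(map.get("k"));                  // prints first
    }
}
```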
Map is just an interface. Therefore the ConcurrentHashMap has an atomic implementation of those methods.
The putIfAbsent method is a convenient way, in a concurrent environment, to execute an atomic "if not present, then put", which you cannot do with a separate containsKey() and put() even though the Map is actually of type ConcurrentHashMap.
Methods such as put() and remove() lock internally, so they are safe to call from several threads, and a concurrent retrieval reflects the results of the most recently completed updates. The exact locking scheme is an implementation detail that has changed between Java versions.
In the case of putAll() or clear(), which operate on the whole Map, a concurrent read may reflect the insertion or removal of only some entries.
The two links below may help you to understand:
http://javarevisited.blogspot.in/2013/02/concurrenthashmap-in-java-example-tutorial-working.html
http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap2.shtml
Hashtable is a thread-safe collection, but does using an ArrayList (which is not thread-safe) as its value type endanger the whole thread-safety aspect?
Hashtable <Employee, ArrayList<Car>> carDealership = new Hashtable<>();
Further on, I am planning to wrap every action of ArrayLists in a synchronized block to prevent any race-conditions when operating with any methods.
However, I haven't declared the ArrayLists in the Hashtable as synchronized lists, which would be done with the following code:
Collections.synchronizedList(new ArrayList<>())
This will obviously happen when I add ArrayLists to the Hashtable.
How can I be sure that the ArrayLists in the Hashtable are thread-safe?
Is it enough to pass a thread-safe ArrayList to the put() method of the Hashtable, and I'm good to go (without even worrying about the constructor of the Hashtable)? In other words, does the put() method of the Hashtable not even recognize whether I am passing a thread-safe or an unsafe parameter?
Note: Thread-safety is a requirement. Otherwise I wouldn't have opted for this implementation.
The only way to ensure that the values in the Hashtable or ConcurrentHashMap are thread-safe is to wrap it in a way that prevents anyone from adding something that you don't control. Never expose the Map itself or any of the Lists contained in it to other parts of your code. Provide methods to get snapshot copies if you need them, provide methods to add values to the lists, but make sure the class wrapping the map is the one that will create all lists that can ever get added to it. Iteration over the "live" lists in your map will require external synchronisation (as mentioned in the JavaDocs of synchronizedList).
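A minimal sketch of such a wrapper is shown below, assuming String stands in for the Employee and Car types from the question. The class name CarDealership and the methods addCar and carsOf are hypothetical names chosen for this illustration, and ConcurrentHashMap.computeIfAbsent requires Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CarDealership {
    // Neither the map nor the lists inside it ever escape this class.
    private final ConcurrentMap<String, List<String>> carsByEmployee = new ConcurrentHashMap<>();

    // The wrapper creates every list itself, so each one is a synchronized list.
    void addCar(String employee, String car) {
        carsByEmployee
                .computeIfAbsent(employee, e -> Collections.synchronizedList(new ArrayList<>()))
                .add(car);
    }

    // Callers only ever receive an independent snapshot copy.
    List<String> carsOf(String employee) {
        List<String> live = carsByEmployee.get(employee);
        if (live == null) {
            return new ArrayList<>();
        }
        // Hold the list's own lock while copying so the snapshot is consistent.
        synchronized (live) {
            return new ArrayList<>(live);
        }
    }
}
```

Because callers can only reach the data through addCar and carsOf, no code outside the class can insert an unsynchronized list or iterate over a live one.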
Both Hashtable and ConcurrentHashMap are thread-safe in that concurrent operations will not leave them in an invalid state. This means e.g. that if you invoke put from two threads with the same key, one of them will return the value the other inserted as the "old" value. But of course you can't tell which will be the first and which will be second in advance without some external synchronization.
The implementation is quite different, though: Hashtable and the synchronized Map returned by Collections.synchronizedMap(new HashMap()); are similar in that they basically add synchronized modifiers to most methods. This can be inefficient if you have lots of threads (i.e. high contention for the locks) that mostly read, but only occasionally modify the map. ConcurrentHashMap provides more fine grained locking:
Retrieval operations (including get) generally do not block
which can yield significantly better performance, depending on your use case. It also provides a richer API with powerful search and bulk-modification operations.
Yes, using ArrayList in this case is not thread safe. You can always get the object from the table and operate on it.
CopyOnWriteArrayList is a good substitute for it.
But you still have the case where one thread takes the collection (saves it in a variable) and another thread replaces it in the table with a different one.
If you are not going to replace the lists inside the table, then this is not a problem.
I have a ConcurrentHashMap<String, Object> concurrentMap;
I need to return String[] with keys of the map.
Is the following code:
public String[] listKeys() {
return (String[]) concurrentMap.keySet().toArray();
}
thread safe?
While ConcurrentHashMap is a thread-safe class, the Iterator that is used on the keys is NOT CERTAIN to be in sync with any subsequent changes to the map once it has been created...
From the spec:
public Set<K> keySet()
Returns a Set view of the keys contained in this map......
...........................
The view's iterator is a "weakly consistent" iterator that will
never throw ConcurrentModificationException, and guarantees to
traverse elements as they existed upon construction of the iterator,
and may (but is not guaranteed to) reflect any modifications
subsequent to construction.
Yes and no. Thread safety is only fuzzily defined as soon as you extend the scope.
Generally, concurrent collections implement all their methods in ways that allow concurrent access by multiple threads, or if they can't, provide mechanisms to serialize such accesses (e.g. synchronization) transparently. Thus, they are safe in the sense they ensure they preserve a valid internal structure and method calls give valid results.
The fuzziness starts if you look at the details, e.g. toArray() will return you some kind of snapshot of the collection's contents. There is no guarantee that by the time the method returns the contents will not have already been changed. So while the call is thread safe, the result will not fulfill the usual invariants (e.g. the array contents may not be the same as the collection's).
If you need consistency over the scope of multiple calls to a concurrent collection, you need to provide mechanisms within the code calling the methods to ensure the required consistency.
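Independently of thread safety, the cast in the question's listKeys() would fail at runtime, because the no-argument toArray() returns an Object[] that cannot be cast to String[]. A possible variant is sketched below; the class name KeyLister and its put method are assumptions added only to make the example self-contained.

```java
import java.util.concurrent.ConcurrentHashMap;

class KeyLister {
    private final ConcurrentHashMap<String, Object> concurrentMap = new ConcurrentHashMap<>();

    void put(String key, Object value) {
        concurrentMap.put(key, value);
    }

    // Passing a typed array makes toArray return a String[] directly, so no cast is needed.
    // The result is a snapshot: keys added or removed afterwards are not reflected in it.
    String[] listKeys() {
        return concurrentMap.keySet().toArray(new String[0]);
    }
}
```

The call is safe to make while other threads modify the map, but the returned array may already be out of date by the time the caller looks at it.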
As far as I know, java.util.Hashtable synchronizes each and every method in the java.util.Map interface, while Collections.synchronizedMap(hash_map) returns a wrapper object containing synchronized methods delegating calls to the actual hash_map (correct me if I am wrong).
I have two questions :
What difference does it make to synchronize each and every method and to have a wrapper class? What are the scenarios to choose one over the other?
What happens when we do Collections.synchronizedMap(hash_table)? Will this be equal to simply using a normal java.util.Hashtable?
One more difference that I can find at the implementation of both the classes is as follows:
• The Hashtable class has all its methods synchronized i.e. the locking is done at the method level and hence one can say that the mutex is always at the Hashtable object (this) level.
• The method Collections.synchronizedMap(Map) returns an instance of SynchronizedMap, which is a nested class of the Collections class. This class has all its methods in a synchronized block with a mutex. The difference lies in the mutex here. The class SynchronizedMap has two constructors, one which takes only a Map as an argument and another which takes a Map and an Object (the mutex) as arguments. With the first constructor, the wrapper itself (this) is used as the mutex. Note that SynchronizedMap and its constructors are not public, so the two-argument form is only used inside java.util; the public factory Collections.synchronizedMap(Map) always uses the one-argument form.
• Hence, Hashtable uses method-level synchronization, whereas the wrapper returned by Collections.synchronizedMap(Map) uses synchronized blocks on its mutex field.
Here are the answers I've gotten from a bit of (hopefully correct) research:
Both provide the same degree of synchronization. If you were to wrap a Hashtable through Collections.synchronizedMap() you would have the same degree of synchronization, but with another, redundant layer.
The main difference between Hashtable and Collections.synchronizedMap(HashMap) exists more at the API level. Because Hashtable is part of Java's legacy code, you'll see that the Hashtable API was enhanced to implement the Map interface, to become part of Java's collections framework. This means that if you were to wrap a Hashtable through Collections.synchronizedMap(), the API of the wrapped Hashtable would become limited to the Map API. So if the API of Hashtable is encompassed in your definition of behavior, then it is obviously altered/limited.
The first associative collection class to appear in the Java class
library was Hashtable, which was part of JDK 1.0. Hashtable provided
an easy-to-use, thread-safe, associative map capability, and it was
certainly convenient. However, the thread-safety came at a price --
all methods of Hashtable were synchronized. At that time, uncontended
synchronization had a measurable performance cost. The successor to
Hashtable, HashMap, which appeared as part of the Collections
framework in JDK 1.2, addressed thread-safety by providing an
unsynchronized base class and a synchronized wrapper,
Collections.synchronizedMap. Separating the base functionality from
the thread-safety Collections.synchronizedMap allowed users who needed
synchronization to have it, but users who didn't need it didn't have
to pay for it.
The simple approach to synchronization taken by both Hashtable and
synchronizedMap -- synchronizing each method on the Hashtable or the
synchronized Map wrapper object -- has two principal deficiencies. It
is an impediment to scalability, because only one thread can access
the hash table at a time. At the same time, it is insufficient to
provide true thread safety, in that many common compound operations
still require additional synchronization. While simple operations such
as get() and put() can complete safely without additional
synchronization, there are several common sequences of operations,
such as iteration or put-if-absent, which still require external
synchronization to avoid data races.
The following link is the source and has more information: Concurrent Collections Classes
Another point of difference to note is that Hashtable does not allow null keys or values, whereas HashMap allows one null key and any number of null values. Since synchronizedMap is a wrapper over HashMap, its behavior with respect to null keys and values is the same as HashMap's.
The difference is not only at the obvious API level; there are many subtleties at the implementation level. For example, Hashtable doesn't sport HashMap's advanced recalculation of supplied keys' hashcodes that reduces hash collisions. On the other hand, Hashtable#hashCode() avoids infinite recursion for self-referential hashtables to allow "certain 1.1-era applets with self-referential hash tables to work".
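The null-key difference is easy to check with a small probe. The class NullKeyDemo and its helper acceptsNullKey are hypothetical names used only for this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // Returns true if the map accepts a null key, false if put() throws NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> table = new Hashtable<>();
        Map<String, String> wrapped = Collections.synchronizedMap(new HashMap<>());
        System.out.println(acceptsNullKey(table));   // prints false
        System.out.println(acceptsNullKey(wrapped)); // prints true
    }
}
```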
In general, though, one shouldn't count on Hashtable receiving any further improvements or refinements beyond basic correctness and backward compatibility. It is considered a relic from the deep Java past.
At the risk of stating the obvious (or being plain wrong) isn't the difference that
The synchronization wrappers add automatic synchronization
(thread-safety) to an arbitrary collection
http://docs.oracle.com/javase/tutorial/collections/implementations/wrapper.html and continues to say
A collection created in this fashion is every bit as thread-safe as a
normally synchronized collection, such as a Vector.
You may like to see this thread for issues regarding HashMaps and concurrency - Hashmap concurrency issue (or you are possibly very much aware of them already). A good example is:
The conditions you describe will not be satisfied by HashMap. Since
the process of updating a map is not atomic you may encounter the map
in an invalid state. Multiple writes might leave it in a corrupted
state. ConcurrentHashMap (1.5 or later) does what you want.
https://stackoverflow.com/a/1003071/201648
I guess in terms of "when should I use this" I would tend to use the synchronized collection where concurrency is required; otherwise you may be creating more work for yourself (see below).
In terms of altering the behavior
If an explicit iterator is used, the iterator method must be called
from within the synchronized block. Failure to follow this advice may
result in nondeterministic behavior
There are more consequences of using synchronization given at the (Oracle) link provided.
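As a rough sketch of the advice quoted above, iteration over a synchronized map's views is wrapped in a block that locks on the map returned by Collections.synchronizedMap. The class name SynchronizedIteration and the method sumValues are made up for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedIteration {
    // Sums all values while holding the wrapper's lock for the whole traversal.
    static int sumValues(Map<String, Integer> syncMap) {
        int sum = 0;
        // Lock on the synchronized map itself, not on the values() view.
        synchronized (syncMap) {
            for (int value : syncMap.values()) {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> syncMap = Collections.synchronizedMap(new HashMap<>());
        syncMap.put("a", 1);
        syncMap.put("b", 2);
        System.out.println(sumValues(syncMap)); // prints 3
    }
}
```

Single calls such as put() and get() need no extra block; only the multi-step traversal does.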
Is static initialized unmodifiableCollection.get guaranteed immutable?
For:
static final Map FOO =
Collections.unmodifiableMap(new HashMap());
Can multiple threads use method get and not run into problems?
Even though items cannot be added to or removed from FOO, what's stopping the get method from manipulating FOO's internal state for caching purposes, etc.? If the internal state is modified in any way, then FOO can't be used concurrently. If this is the case, where are the truly immutable collections in Java?
Given the specific example:
static final Map FOO = Collections.unmodifiableMap(new HashMap());
Then FOO will be immutable. It will also never have any elements. Given the more general case of:
static final Map BAR = Collections.unmodifiableMap(getMap());
Then whether or not this is immutable is entirely dependent on whether or not someone else can get to the underlying Map, and what type of Map it is. For example, if it is a LinkedHashMap then the underlying linked list could be modified by access order, and could change by calling get(). The safest way (using non-concurrent classes) to do this would be:
static final Map BAR = Collections.unmodifiableMap(new HashMap(getMap()));
The javadocs for HashMap imply that so long as you make no structural changes to the map, then it is safe to use it concurrently, so this should be safe for any of the accessors that you can use, that is getting the various sets and iterating over them and get() should then be safe.
If you can use the concurrent classes, then you could also do:
static final Map BAR = Collections.unmodifiableMap(new ConcurrentHashMap(getMap()));
This will be explicitly safe to use from multiple threads, since ConcurrentHashMap is explicitly multi-thread access safe. The internal state might be mutable, but the externally visible state will not be, and since the class is guaranteed to be threadsafe, we can safely consider it to be externally immutable.
At the risk of sounding like I'm on an advertising spree, use the Google Immutable Collections and be done with it.
Actually a good question. Think WeakHashMap - that can change without having a mutation operation called on it. LinkedHashMap in access-order mode is much the same.
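The access-order case can be demonstrated with a short, single-threaded sketch: a get() through an unmodifiable wrapper still reorders the underlying LinkedHashMap. The class name AccessOrderDemo and the helpers build and firstKey are assumptions made for this illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class AccessOrderDemo {
    // Builds an unmodifiable view over a LinkedHashMap in access-order mode.
    static Map<String, Integer> build() {
        Map<String, Integer> inner = new LinkedHashMap<>(16, 0.75f, true);
        inner.put("a", 1);
        inner.put("b", 2);
        return Collections.unmodifiableMap(inner);
    }

    // Returns the first key in the map's current iteration order.
    static String firstKey(Map<String, Integer> map) {
        return map.keySet().iterator().next();
    }

    public static void main(String[] args) {
        Map<String, Integer> view = build();
        System.out.println(firstKey(view)); // prints a
        view.get("a");                      // a read, yet it moves "a" to the end
        System.out.println(firstKey(view)); // prints b
    }
}
```

Since get() changes internal state here, concurrent reads of such a map are not safe without synchronization, even though the wrapper rejects every write.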
The API docs for HashMap state:
Note that this implementation is not
synchronized. If multiple threads
access a hash map concurrently, and at
least one of the threads modifies the
map structurally, it must be
synchronized externally. (A structural
modification is any operation that
adds or deletes one or more mappings;
merely changing the value associated
with a key that an instance already
contains is not a structural
modification.)
Presumably that should be if and only if. That means that get does not need to be synchronised if the HashMap is 'effectively immutable'.
There is no truly immutable map in the Java SDK. All of the Maps suggested by Chris are only thread-safe. The unmodifiable Map is not immutable either, since if the underlying Map changes, a ConcurrentModificationException can occur as well.
If you want the truly immutable map, use ImmutableMap from Google Collections / Guava.
I would suggest using ConcurrentHashMap or Hashtable for any threaded operation; both are thread-safe.
Whether a getter on the returned map happens to twiddle with some internal state is unimportant, as long as the object honors its contract (which is to be a map that cannot be modified). So your question is "barking up the wrong tree".
You are right to be cautious of UnmodifiableMap, in the case where you do not have ownership and control over the map it wraps. For example
Map<String,String> wrapped = new HashMap<String,String>();
wrapped.put("pig","oink");
Map<String,String> wrapper = Collections.unmodifiableMap(wrapped);
System.out.println(wrapper.size());
wrapper.put("cow", "moo"); // throws exception
wrapped.put("cow", "moo");
System.out.println(wrapper.size()); // d'oh!
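One way to avoid the surprise above is to wrap a private copy rather than the caller's map, as sketched below. The class name DefensiveCopyDemo and the method snapshotOf are hypothetical names for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class DefensiveCopyDemo {
    // Copies the source first, so later changes to the source cannot show through the wrapper.
    static Map<String, String> snapshotOf(Map<String, String> source) {
        return Collections.unmodifiableMap(new HashMap<>(source));
    }

    public static void main(String[] args) {
        Map<String, String> wrapped = new HashMap<>();
        wrapped.put("pig", "oink");
        Map<String, String> snapshot = snapshotOf(wrapped);
        wrapped.put("cow", "moo");
        System.out.println(snapshot.size()); // prints 1
    }
}
```

Nobody else holds a reference to the copied HashMap, so the wrapper's contents can no longer change after construction.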
If all attributes (or items fields, or data members) of a java collection are thread-safe (CopyOnWriteArraySet,ConcurrentHashMap, BlockingQueue, ...), can we say that this collection is thread-safe ?
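An example:
import java.util.concurrent.CopyOnWriteArraySet;

public class AmIThreadSafe {
    // Initialized here so that add() and clear() never see a null field.
    private final CopyOnWriteArraySet<Object> threadSafeAttribute = new CopyOnWriteArraySet<>();

    public void add(Object o) {
        threadSafeAttribute.add(o);
    }

    public void clear() {
        threadSafeAttribute.clear();
    }
}
In this sample, can we say that AmIThreadSafe is thread-safe?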
Assuming by "attributes" you mean "what the collection holds", then no. Just because the Collection holds thread-safe items does not mean that the Collection's implementation implements add(), clear(), remove(), etc., in a thread-safe manner.
Short answer: No.
Slightly longer answer: because add() and clear() are not in any way synchronized, and HashSet isn't itself synchronized, it's possible for multiple threads to be in them at the same time.
Edit following comment: Ah. Now the short answer is Yes, sorta. :)
The reason for the "sorta" (American slang meaning partially, btw) is that it's possible for two operations to be atomically safe, but to be unsafe when used in combination to make a compound operation.
In your given example, where only add() and clear() are supported, this can't happen.
But in a more complete class, where we would have more of the Collection interface, imagine a caller who needs to add an entry to the set iff the set has no more than 100 entries already.
This caller would want to write a method something like this:
void addIfNotOverLimit (AmIThreadSafe set, Object o, int limit) {
if (set.size() < limit) // ## thread-safe call 1
set.add(o); // ## thread-safe call 2
}
The problem is that while each call is itself thread-safe, two threads could be in addIfNotOverLimit (or, for that matter, adding through another method altogether). Thread A could call size() and get 99, and then be about to call add(), but before that happens it could be interrupted, and thread B could then add an entry, and now the set would be over its limit.
Moral? Compound operations make the definition of 'thread safe' more complex.
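One possible way to close that gap is to perform the size check and the add under a single lock, so the compound operation becomes atomic. The sketch below is an assumption-laden illustration, not the original AmIThreadSafe class: BoundedSet, addIfNotOverLimit as an instance method, and the fillConcurrently helper are names invented here.

```java
import java.util.HashSet;
import java.util.Set;

class BoundedSet {
    private final Set<Object> items = new HashSet<>();
    private final int limit;

    BoundedSet(int limit) {
        this.limit = limit;
    }

    // The check and the add run under the same lock, so no thread can slip in between.
    synchronized boolean addIfNotOverLimit(Object o) {
        if (items.size() < limit) {
            return items.add(o);
        }
        return false;
    }

    synchronized int size() {
        return items.size();
    }

    // Starts several threads that all try to add distinct objects; returns the final size.
    static int fillConcurrently(int threads, int attemptsPerThread, int limit) {
        BoundedSet set = new BoundedSet(limit);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < attemptsPerThread; j++) {
                    set.addIfNotOverLimit(new Object());
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return set.size();
    }
}
```

With 8 threads making 100 attempts each and a limit of 100, the final size is exactly 100, because the limit check can never be separated from the insertion.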
No, because the state of an object is the "sum" of all of its attributes.
For instance, you could have 2 thread-safe collections as attributes in your object. Additionally, your object could depend on some sort of correlation between these 2 collections (e.g. if an object is in one collection, it is in the other collection, and vice versa). Simply using 2 thread-safe collections will not ensure that that correlation is true at all points in time. You would need additional concurrency control in your object to ensure that this constraint holds across the 2 collections.
Since most non-trivial objects have some type of correlation across their attributes, using thread-safe collections as attributes is not sufficient to make an object thread-safe.
What is thread safety?
Thread safety simply means that the
fields of an object or class always
maintain a valid state, as observed by
other objects and classes, even when
used concurrently by multiple threads.
A thread-safe object is one that
always maintains a valid state, as
observed by other classes and objects,
even in a multithreaded environment.
According to the API documentation, you have to use this function to ensure thread-safety:
synchronizedCollection(Collection c)
Returns a synchronized (thread-safe) collection
backed by the specified collection
Reading that, it is my opinion that you have to use the above function to ensure a thread-safe Collection. However, you do not have to use them for all Collections and there are faster Collections that are thread-safe such as ConcurrentHashMap. The underlying nature of CopyOnWriteArraySet ensures thread-safe operations.