How to initialize a hashTable with safe-thread object as value? - java

HashTable is a thread-safe collection but does initializing it with an ArrayList (which is not thread-safe) as value endanger the whole thread-safety aspect?
Hashtable <Employee, ArrayList<Car>> carDealership = new Hashtable<>();
Further on, I am planning to wrap every action of ArrayLists in a synchronized block to prevent any race-conditions when operating with any methods.
Yet I haven't declared the ArrayLists in the HashTable as synchronized lists, this being achieved with the following code
Collections.synchronizedList(new ArrayList<>())
This will happen when I will be adding ArrayLists to the HashTable obviously.
How can I be sure that the ArrayLists in the HashTable are thread-safe?
Is it enough to pass a thread-safe ArrayList to the put() method of the hashTable and I'm good to go? (and not even worry about the constructor of the HashTable?) Therefore the put() method of the HashTable doesn't even recognize if I am passing a thread-safe/unsafe parameter?
Note: Thread-safety is a requirement. Otherwise I wouldn't have opted for this implementation.

The only way to ensure that the values in the Hashtable or ConcurrentHashMap are thread-safe is to wrap it in a way that prevents anyone from adding something that you don't control. Never expose the Map itself or any of the Lists contained in it to other parts of your code. Provide methods to get snapshot-copies if you need them, provide methods to add values to the lists, but make sure the class wrapping the map is the one that will create all lists that can ever get added to it. Iteration over the "live" lists in you map will require external synchronisation (as metioned in the JavaDocs of synchronizedList).
Both Hashtable and ConcurrentHashMap are thread-safe in that concurrent operations will not leave them in an invalid state. This means e.g. that if you invoke put from two threads with the same key, one of them will return the value the other inserted as the "old" value. But of course you can't tell which will be the first and which will be second in advance without some external synchronization.
The implementation is quite different, though: Hashtable and the synchronized Map returned by Collections.synchronizedMap(new HashMap()); are similar in that they basically add synchronized modifiers to most methods. This can be inefficient if you have lots of threads (i.e. high contention for the locks) that mostly read, but only occasionally modify the map. ConcurrentHashMap provides more fine grained locking:
Retrieval operations (including get) generally do not block
which can yield significantly better performance, depending on your use case. I also provides a richer API with powerful search- and bulk-modification-operations.

Yes, using ArrayList in this case is not thread safe. You can always get the object from the table and operate on it.
CopyOnWriteArrayList is a good substitue for it.
But you still have the case, when one thread takes (saves in a variable) the collection, and the other thread replaces with another one.
If you are not going to replace the lists inside the table, then this is not a problem.

Related

understanding java's synchronized collections

I'm reading the java official doc regarding wrappers implementation, which are static methods in Collections used to get synchronized collection, for example : List<Type> list = Collections.synchronizedList(new ArrayList<Type>());
...
the thing that I did not understand is the following (I quote from the java doc ) :
A collection created in this fashion is every bit as thread-safe as a normally synchronized collection, such as a Vector.
In the face of concurrent access, it is imperative that the user manually synchronize on the returned collection when iterating over it. The reason is that iteration is accomplished via multiple calls into the collection, which must be composed into a single atomic operation...
how it could be every bit as thread-safe an need to manually synchronize when iterating ??
It is thread safe in the sense that each of it's individual methods are thread safe, but if you perform compound actions on the collection, then your code is at risk of concurrency issues.
ex:
List<String> synchronizedList = Collections.synchronizedList(someList);
synchronizedList.add(whatever); // this is thread safe
the individual method add() is thread safe but if i perform the following:
List<String> synchronizedList = Collections.synchronizedList(someList);
if(!synchronizedList.contains(whatever))
synchronizedList.add(whatever); // this is not thread safe
the if-then-add operation is not thread safe because some other thread might have added whatever to the list after contains() check.
There is no contradiction here: collections returned from synchronizedXyz suffer the same shortcoming as synchronized collections available to you directly, namely the need to manually synchronize on iterating the collection.
The problem of external iteration cannot be solved by a better class design, because iterating a collection inherently requires making multiple calls to its methods (see this Q&A for detailed explanation).
Note that starting with Java 1.8 you can iterate without additional synchronization using forEach method of your synchronized collection*. This is thread-safe, and comes with additional benefits; see this Q&A for details.
The reason this is different from iterating externally is that forEach implementation inside the collection takes care of synchronizing the iteration for you.

Is concurrentMap.keySet().toArray() thread safe?

I have a ConcurrentHashMap<String, Object> concurrentMap;
I need to return String[] with keys of the map.
Is the following code:
public String[] listKeys() {
return (String[]) concurrentMap.keySet().toArray();
}
thread safe?
While the ConcurrentHashMap is a thread-safe class, the Iterator that is used on the keys is NOT CERTAIN to be in sync with any subsequent HashMap changes, once created...
From the spec:
public Set<K> keySet()
Returns a Set view of the keys contained in this map......
...........................
The view's iterator is a "weakly consistent" iterator that will
never throw ConcurrentModificationException, and guarantees to
traverse elements as they existed upon construction of the iterator,
and may (but is not guaranteed to) reflect any modifications
subsequent to construction.
Yes and No. Threas-safe is only fuzzily defined as soon as you extend to scope.
Generally, concurrent collections implement all their methods in ways that allow concurrent access by multiple threads, or if they can't, provide mechanisms to serialize such accesses (e.g. synchronization) transparently. Thus, they are safe in the sense they ensure they preserve a valid internal structure and method calls give valid results.
The fuzziness starts if you look at the details, e.g. toArray() will return you some kind of snapshot of the collections contents. There is no guarantee that by the time the method returns the contents will not have already been changed. So while the call is thread safe, the result will not fulfill the usual invariants (e.g. the array contents may not be the same as the collections).
If you need consistency over the scope of mupltiple calls to a concurrent collection, you need to provide mechanisms within the code calling the methods to ensure the required consistency.

Is a readonly EnumSet iterator thread safe?

I have an EnumSet which is final and immutable i.e. initialized once in the constructor.
Is the contains() method on this EnumSet thread safe? It is internally using an iterator to make the contains check. Hence if two threads are simultaneously calling the contains() can the iterator position in one call effect other one? Or are the iterators having different instances in these two thread calls?
The contents of an EnumSet can be changed despite the reference to it being final. No EnumSet is immutable. You can, however, wrap your EnumSet via Collections.unmodifiableSet(). If you also avoid retaining any reference to the original EnumSet then the unmodifiable wrapper object is functionally immutable.
Mutability notwithstanding, two iterators operating over the same Set at the same time presents no problem as long as the Set is not modified. This isn't really any different from the case of just one iterator.
In any case, the contains() method of an EnumSet likely doesn't create or use an iterator. The class implements membership via a bit vector, so it uses bit operations to perform the contains() test.
No, if two threads call contains() at the same time, that will call iterator() twice which will create two separate iterators.
If you were trying to share an iterator between two threads, that would not be a good idea.
Note that if you modify the set in one thread while iterating over it (e.g. via contains) in another, then this bit of the docs comes into play:
The returned iterator is weakly consistent: it will never throw ConcurrentModificationException and it may or may not show the effects of any modifications to the set that occur while the iteration is in progress.

Query regarding ConcurrentHashMap in Java

I have a query regarding ConcurrentHashMap.
ConcurrentHashMap is a map for concurrent access. ConcurrentHashMap implements ConcurrentMap which extends Map.
a) ConcurrentHashMap implements the methods defined in ConcurrentMap (like putifAbsent etc) which are atomic.
b) But, how about the methods in the Map interface which ConcurrentMap extends?
How are they now atomic? Have they been reimplemented by ConcurrentHashMap
If I have a reference of type ConcurrentHashMap and call a method from
the Map Interface(e.g put) or any other method, is that method an atomic method?
ConcurrentHashMap does not extend HashMap. They are both implementations of a hash table, but ConcurrentHashMap has very different internals to HashMap in order to provide concurrency.
If you provide a ConcurrentHashMap to a method that accepts Map, then that will work, and you will get the concurrent behaviour you expect. Map is simply an interface that describes a set of methods, ConcurrentHashMap implements that interface with concurrent behaviour.
There is a difference between 'concurrent' and 'atomic'. Concurrent means that multiple operations can happen at the same time and the Map (or whatever data structure we are talking about) will ALWAYS BE IN A VALID STATE. This means that you can have multiple threads calling put(), get(), remove(), etc on this map and there will never be any errors (if you try this with a regular HashMap you WILL get errors as it isn't designed to handle concurrency).
Atomic means that an action that takes multiple steps appears to take a single step to other threads - as fair as they are aware it has completely finished or hasn't even started yet. For ConcurrentHashMap, putIfAbsent() is one such example. From the javadoc,
If the specified key is not already associated with a value, associate it with the given value. This is equivalent to:
if (!map.containsKey(key)) {
return map.put(key, value);
else
return map.get(key);
except that the action is performed atomically.
If you tried the above code with a ConcurrentHashMap, you wouldn't get any errors (since it is Concurrent), but there is a good change that other threads would interleave with the main thread and your entry would get overwritten or removed. ConcurrentMap specifies the atomic putIfAbsent() method to ensure that implementations can perform those steps atomically, without interference from other threads.
Map is just an interface. Therefore the ConcurrentHashMap has an atomic implementation of those methods.
The putIfAbsent method is a convenient way in a concurrent environment to execute an atomic if not contains then put that you cannot do from the Map interface even though the Map is actually of type ConcurrentHashMap.
The implementation of these methods like put() and remove() are getting the lock on the final Sync object, concurrent retrieval will always give the most recent data on the map.
In case of putAll() or clear(), which operates on whole Map, concurrent read may reflect insertion and removal of only some entries.
Below Two links will help you to understand:
http://javarevisited.blogspot.in/2013/02/concurrenthashmap-in-java-example-tutorial-working.html
http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap2.shtml

Difference between Hashtable and Collections.synchronizedMap(HashMap)

As far as I know, java.util.Hashtable synchronizes each and every method in the java.util.Map interface, while Collections.synchronizedMap(hash_map) returns a wrapper object containing synchronized methods delegating calls to the actual hash_map (correct me if I am wrong).
I have two questions :
What difference does it make to synchronize each and every method and to have a wrapper class? What are the scenarios to choose one over the other?
What happens when we do Collections.synchronizedMap(hash_table)? Will this be equal to simply using a normal java.util.Hashtable?
One more difference that I can find at the implementation of both the classes is as follows:
• The Hashtable class has all its methods synchronized i.e. the locking is done at the method level and hence one can say that the mutex is always at the Hashtable object (this) level.
• The method Collections.synchronizedMap(Map) returns an instance of SynchronizedMap which is an inner class to the Collections class. This class has all its methods in a Synchronized block with a mutex. The difference lies in the mutex here. The inner class SynchronizedMap has two constructors, one which takes only Map as an argument and another which takes a Map and an Object (mutex) as an argument. By default if one uses the first constructor of passing only a Map, this is used as a mutex. Though, the developer is allowed to pass another object of mutex as a second argument by which the lock on the Map methods would be only on that Object and hence less restrictive than Hashtable.
• Hence, Hashtable uses method level synchronization but Collections.synchronizedMap(Map) provides a flexibility to developer lock on provided mutex with Synchronized block.
Here are the answers I've gotten from a bit of (hopefully correct) research:
Both provide the same degree of synchronization. If you were to wrap Hashtable through Collections.synchronized you would have the same degree, but with another redundant layer, of synchronization.
The main difference between Hashtable and Collections.synchronizedMap(HashMap) exist more at the API level. Because Hashtable is part of Java's legacy code, you'll see that the Hashtable API is enhanced to implement the Map interface, to become part of Java's collections framework. This means that if you were to wrap Hashtable through Collections.synchronizedMap(), the API of the wrapped Hashtable would become limited to the Map API. So if the API of Hashtable is encompassed in your definition of behavior, then it is obviously altered/limited.
The first associative collection class to appear in the Java class
library was Hashtable, which was part of JDK 1.0. Hashtable provided
an easy-to-use, thread-safe, associative map capability, and it was
certainly convenient. However, the thread-safety came at a price --
all methods of Hashtable were synchronized. At that time, uncontended
synchronization had a measurable performance cost. The successor to
Hashtable, HashMap, which appeared as part of the Collections
framework in JDK 1.2, addressed thread-safety by providing an
unsynchronized base class and a synchronized wrapper,
Collections.synchronizedMap. Separating the base functionality from
the thread-safety Collections.synchronizedMap allowed users who needed
synchronization to have it, but users who didn't need it didn't have
to pay for it.
The simple approach to synchronization taken by both Hashtable and
synchronizedMap -- synchronizing each method on the Hashtable or the
synchronized Map wrapper object -- has two principal deficiencies. It
is an impediment to scalability, because only one thread can access
the hash table at a time. At the same time, it is insufficient to
provide true thread safety, in that many common compound operations
still require additional synchronization. While simple operations such
as get() and put() can complete safely without additional
synchronization, there are several common sequences of operations,
such as iteration or put-if-absent, which still require external
synchronization to avoid data races.
The following link is the source and has more information: Concurrent Collections Classes
Another point of difference to note is that HashTable does not allow null keys or values whereas HashMap allows one null key and any number of null values. Since synchronizedMap is wrapper over HashMap, its behavior with respect to null keys and values is same as HashMap.
The difference is not all at the obvious API level and there are many subtleties at the implementation level. For example, Hashtable doesn't sport HashMap's advanced recalculation of supplied keys' hashcodes that reduces hash collisions. On the other hand, Hashtable#hashCode() avoids infinite recursion for self-referential hashtables to allow "certain 1.1-era applets with self-referential hash tables to work".
In general, though, one shouldn't count on Hashtable receiving any further improvements or refinements beyond basic correctness and backward compatibility. It is considered a relic from the deep Java past.
At the risk of stating the obvious (or being plain wrong) isn't the difference that
The synchronization wrappers add automatic synchronization
(thread-safety) to an arbitrary collection
http://docs.oracle.com/javase/tutorial/collections/implementations/wrapper.html and continues to say
A collection created in this fashion is every bit as thread-safe as a
normally synchronized collection, such as a Vector.
You may like to see this thread for issues regarding HashMaps and concurrency - Hashmap concurrency issue (or you are possibly very much aware of them already). A good example is:
The conditions you describe will not be satisfied by HashMap. Since
the process of updating a map is not atomic you may encounter the map
in an invalid state. Multiple writes might leave it in a corrupted
state. ConcurrentHashMap (1.5 or later) does what you want.
https://stackoverflow.com/a/1003071/201648
I guess in terms of "when should I use this" I would tend to use the syncronised collection where concurrency is required, otherwise you may be creating more work for yourself (see below).
In terms of altering the behavior
If an explicit iterator is used, the iterator method must be called
from within the synchronized block. Failure to follow this advice may
result in nondeterministic behavior
There are more consequences of using synchronization given at the (Oracle) link provided.

Categories

Resources