Difference between Hashtable and Collections.synchronizedMap(HashMap) - java

As far as I know, java.util.Hashtable synchronizes each and every method in the java.util.Map interface, while Collections.synchronizedMap(hash_map) returns a wrapper object containing synchronized methods delegating calls to the actual hash_map (correct me if I am wrong).
I have two questions :
What difference does it make to synchronize each and every method and to have a wrapper class? What are the scenarios to choose one over the other?
What happens when we do Collections.synchronizedMap(hash_table)? Will this be equal to simply using a normal java.util.Hashtable?

One more difference that I can find at the implementation of both the classes is as follows:
• The Hashtable class has all its methods synchronized i.e. the locking is done at the method level and hence one can say that the mutex is always at the Hashtable object (this) level.
• The method Collections.synchronizedMap(Map) returns an instance of SynchronizedMap, a private static nested class of Collections. Every method of this class wraps its delegated call in a synchronized block on a mutex object. SynchronizedMap has two constructors: one takes only a Map and uses the wrapper itself (this) as the mutex, the other takes a Map and an explicit mutex Object. Both constructors are non-public, so code calling Collections.synchronizedMap(Map) always gets the first form; in the OpenJDK sources the two-argument form is only used internally, so that related wrappers can share one lock.
• Hence, Hashtable uses method-level synchronization on the Hashtable instance, while Collections.synchronizedMap(Map) uses synchronized blocks on a separate mutex field, which by default is the wrapper itself.
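As a rough illustration of the locking described above: in both cases the lock is the map object the caller holds, so a caller can extend it over a compound operation with synchronized (map). The class and method names below (LockIdentityDemo, increment, hammer) are made up for this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

final class LockIdentityDemo {
    // Read-modify-write under the map's own monitor, so it cannot interleave
    // with other synchronized calls made through the same map object.
    static void increment(Map<String, Integer> map, String key) {
        synchronized (map) {
            Integer old = map.get(key);
            map.put(key, old == null ? 1 : old + 1);
        }
    }

    // Starts `threads` threads that each increment the same key `perThread` times,
    // waits for them, and returns the final counter value.
    static int hammer(Map<String, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    increment(map, "hits");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(hammer(new Hashtable<String, Integer>(), 4, 10_000));
        System.out.println(hammer(
                Collections.synchronizedMap(new HashMap<String, Integer>()), 4, 10_000));
    }
}
```

Without the synchronized block in increment, the separate get() and put() calls could interleave and lose updates, even though each call is individually synchronized.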

Here are the answers I've gotten from a bit of (hopefully correct) research:
Both provide the same degree of synchronization. If you were to wrap a Hashtable with Collections.synchronizedMap() you would get the same degree of synchronization, but with a second, redundant layer of locking.
The main difference between Hashtable and Collections.synchronizedMap(HashMap) exists at the API level. Because Hashtable is part of Java's legacy code, it was retrofitted to implement the Map interface and so become part of the collections framework, but it also keeps its older methods. This means that if you wrap a Hashtable with Collections.synchronizedMap(), the API of the wrapped Hashtable becomes limited to the Map API. So if the Hashtable-specific API is part of what you mean by behavior, then the behavior is obviously altered/limited.
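A small sketch of that API difference, assuming a Hashtable<String, Integer>: elements() is one of the legacy methods that exist on Hashtable but not on Map, so it is unreachable through the wrapper's Map type. The class and method names are invented for illustration.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Map;

final class LegacyApiDemo {
    // Uses the legacy Enumeration-based API that only Hashtable offers.
    static int sumViaLegacyEnumeration(Hashtable<String, Integer> table) {
        int sum = 0;
        Enumeration<Integer> values = table.elements();
        while (values.hasMoreElements()) {
            sum += values.nextElement();
        }
        return sum;
    }

    // Uses only the Map API, which is all a synchronizedMap wrapper exposes.
    static int sumViaMapApi(Map<String, Integer> map) {
        int sum = 0;
        synchronized (map) { // iteration over a view needs the caller to hold the lock
            for (int v : map.values()) {
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("a", 1);
        table.put("b", 2);
        Map<String, Integer> wrapped = Collections.synchronizedMap(table);
        System.out.println(sumViaLegacyEnumeration(table));
        System.out.println(sumViaMapApi(wrapped));
        // wrapped.elements() would not compile: the static type is Map.
    }
}
```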

The first associative collection class to appear in the Java class
library was Hashtable, which was part of JDK 1.0. Hashtable provided
an easy-to-use, thread-safe, associative map capability, and it was
certainly convenient. However, the thread-safety came at a price --
all methods of Hashtable were synchronized. At that time, uncontended
synchronization had a measurable performance cost. The successor to
Hashtable, HashMap, which appeared as part of the Collections
framework in JDK 1.2, addressed thread-safety by providing an
unsynchronized base class and a synchronized wrapper,
Collections.synchronizedMap. Separating the base functionality from
the thread-safety Collections.synchronizedMap allowed users who needed
synchronization to have it, but users who didn't need it didn't have
to pay for it.
The simple approach to synchronization taken by both Hashtable and
synchronizedMap -- synchronizing each method on the Hashtable or the
synchronized Map wrapper object -- has two principal deficiencies. It
is an impediment to scalability, because only one thread can access
the hash table at a time. At the same time, it is insufficient to
provide true thread safety, in that many common compound operations
still require additional synchronization. While simple operations such
as get() and put() can complete safely without additional
synchronization, there are several common sequences of operations,
such as iteration or put-if-absent, which still require external
synchronization to avoid data races.
The following link is the source and has more information: Concurrent Collections Classes
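The put-if-absent sequence mentioned in the quote can be made safe on a synchronized map with client-side locking. This is a minimal sketch, assuming the map came from Collections.synchronizedMap(Map) so that the wrapper object itself is the lock; PutIfAbsentSketch and putIfAbsentLocked are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class PutIfAbsentSketch {
    // Returns the value already mapped to key, or null after storing the new value.
    // The check and the put happen under one lock, so no other synchronized call
    // on the same map can slip in between them.
    static <K, V> V putIfAbsentLocked(Map<K, V> syncMap, K key, V value) {
        synchronized (syncMap) {
            if (syncMap.containsKey(key)) {
                return syncMap.get(key);
            }
            syncMap.put(key, value);
            return null;
        }
    }

    public static void main(String[] args) {
        Map<String, String> map = Collections.synchronizedMap(new HashMap<String, String>());
        System.out.println(putIfAbsentLocked(map, "k", "first"));
        System.out.println(putIfAbsentLocked(map, "k", "second"));
    }
}
```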

Another point of difference is that Hashtable does not allow null keys or values, whereas HashMap allows one null key and any number of null values. Since synchronizedMap is a wrapper over HashMap, its behavior with respect to null keys and values is the same as HashMap's.
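The null-handling difference can be checked with a few lines. The helper name acceptsNullKey is made up for this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

final class NullHandlingDemo {
    // Returns true if the map accepts a null key, false if put() throws
    // NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new Hashtable<String, String>()));
        System.out.println(acceptsNullKey(
                Collections.synchronizedMap(new HashMap<String, String>())));
    }
}
```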

The difference is not all at the obvious API level and there are many subtleties at the implementation level. For example, Hashtable doesn't sport HashMap's advanced recalculation of supplied keys' hashcodes that reduces hash collisions. On the other hand, Hashtable#hashCode() avoids infinite recursion for self-referential hashtables to allow "certain 1.1-era applets with self-referential hash tables to work".
In general, though, one shouldn't count on Hashtable receiving any further improvements or refinements beyond basic correctness and backward compatibility. It is considered a relic from the deep Java past.

At the risk of stating the obvious (or being plain wrong) isn't the difference that
The synchronization wrappers add automatic synchronization
(thread-safety) to an arbitrary collection
http://docs.oracle.com/javase/tutorial/collections/implementations/wrapper.html and continues to say
A collection created in this fashion is every bit as thread-safe as a
normally synchronized collection, such as a Vector.
You may like to see this thread for issues regarding HashMaps and concurrency - Hashmap concurrency issue (or you are possibly very much aware of them already). A good example is:
The conditions you describe will not be satisfied by HashMap. Since
the process of updating a map is not atomic you may encounter the map
in an invalid state. Multiple writes might leave it in a corrupted
state. ConcurrentHashMap (1.5 or later) does what you want.
https://stackoverflow.com/a/1003071/201648
I guess in terms of "when should I use this" I would tend to use the synchronized collection where concurrency is required, otherwise you may be creating more work for yourself (see below).
In terms of altering the behavior
If an explicit iterator is used, the iterator method must be called
from within the synchronized block. Failure to follow this advice may
result in nondeterministic behavior
There are more consequences of using synchronization given at the (Oracle) link provided.
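The iterator advice quoted above looks roughly like this in practice. The sketch assumes a map obtained from Collections.synchronizedMap(Map); note that the lock is taken on the map, not on the key set view. SafeIterationSketch and snapshotKeys are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SafeIterationSketch {
    // Copies the keys while holding the map's lock, so no other thread can
    // modify the map in the middle of the iteration.
    static List<String> snapshotKeys(Map<String, Integer> syncMap) {
        List<String> keys = new ArrayList<>();
        synchronized (syncMap) {
            for (String key : syncMap.keySet()) {
                keys.add(key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = Collections.synchronizedMap(new HashMap<String, Integer>());
        map.put("a", 1);
        map.put("b", 2);
        System.out.println(snapshotKeys(map).size());
    }
}
```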

Related

How to initialize a Hashtable with a thread-safe object as value?

Hashtable is a thread-safe collection, but does initializing it with an ArrayList (which is not thread-safe) as the value endanger the whole thread-safety aspect?
Hashtable<Employee, ArrayList<Car>> carDealership = new Hashtable<>();
Further on, I am planning to wrap every operation on the ArrayLists in a synchronized block to prevent any race conditions.
I haven't yet declared the ArrayLists in the Hashtable as synchronized lists, which would be done with the following code:
Collections.synchronizedList(new ArrayList<>())
This would happen when I add the ArrayLists to the Hashtable.
How can I be sure that the ArrayLists in the Hashtable are thread-safe?
Is it enough to pass a thread-safe list to the put() method of the Hashtable? (And not even worry about the constructor of the Hashtable?) In other words, does the put() method of the Hashtable not even care whether I pass a thread-safe or unsafe argument?
Note: Thread-safety is a requirement. Otherwise I wouldn't have opted for this implementation.
The only way to ensure that the values in the Hashtable or ConcurrentHashMap are thread-safe is to wrap it in a way that prevents anyone from adding something that you don't control. Never expose the Map itself or any of the Lists contained in it to other parts of your code. Provide methods to get snapshot copies if you need them, provide methods to add values to the lists, but make sure the class wrapping the map is the one that creates all lists that can ever get added to it. Iteration over the "live" lists in your map will require external synchronization (as mentioned in the JavaDocs of synchronizedList).
Both Hashtable and ConcurrentHashMap are thread-safe in that concurrent operations will not leave them in an invalid state. This means e.g. that if you invoke put from two threads with the same key, one of them will return the value the other inserted as the "old" value. But of course you can't tell which will be the first and which will be second in advance without some external synchronization.
The implementation is quite different, though: Hashtable and the synchronized Map returned by Collections.synchronizedMap(new HashMap()); are similar in that they basically add synchronized modifiers to most methods. This can be inefficient if you have lots of threads (i.e. high contention for the locks) that mostly read, but only occasionally modify the map. ConcurrentHashMap provides more fine grained locking:
Retrieval operations (including get) generally do not block
which can yield significantly better performance, depending on your use case. It also provides a richer API with powerful search and bulk-modification operations.
Yes, using ArrayList in this case is not thread safe. You can always get the object from the table and operate on it.
CopyOnWriteArrayList is a good substitute for it.
But you still have the case, when one thread takes (saves in a variable) the collection, and the other thread replaces with another one.
If you are not going to replace the lists inside the table, then this is not a problem.
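One way to follow the advice above is a small class that owns the map, creates every list itself, and only hands out copies. This is a sketch under assumptions: String stands in for the question's Employee and Car types, and CarRegistry, addCar and carsOf are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

final class CarRegistry {
    // Never exposed: callers cannot put a list into it that this class did not create.
    private final Map<String, List<String>> carsByEmployee = new ConcurrentHashMap<>();

    // computeIfAbsent creates the list atomically on first use, and the list
    // itself is a thread-safe CopyOnWriteArrayList.
    void addCar(String employee, String car) {
        carsByEmployee.computeIfAbsent(employee, k -> new CopyOnWriteArrayList<>()).add(car);
    }

    // Returns a snapshot copy, so callers never see or mutate the live list.
    List<String> carsOf(String employee) {
        List<String> cars = carsByEmployee.get(employee);
        if (cars == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(cars);
    }
}
```

CopyOnWriteArrayList copies its backing array on every write, so this choice suits lists that are read far more often than they are modified.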

understanding java's synchronized collections

I'm reading the java official doc regarding wrappers implementation, which are static methods in Collections used to get synchronized collection, for example : List<Type> list = Collections.synchronizedList(new ArrayList<Type>());
...
the thing that I did not understand is the following (I quote from the java doc ) :
A collection created in this fashion is every bit as thread-safe as a normally synchronized collection, such as a Vector.
In the face of concurrent access, it is imperative that the user manually synchronize on the returned collection when iterating over it. The reason is that iteration is accomplished via multiple calls into the collection, which must be composed into a single atomic operation...
How can it be every bit as thread-safe and still need manual synchronization when iterating?
It is thread safe in the sense that each of its individual methods is thread safe, but if you perform compound actions on the collection, then your code is at risk of concurrency issues.
For example:
List<String> synchronizedList = Collections.synchronizedList(someList);
synchronizedList.add(whatever); // this is thread safe
The individual method add() is thread safe, but if I perform the following:
List<String> synchronizedList = Collections.synchronizedList(someList);
if (!synchronizedList.contains(whatever))
    synchronizedList.add(whatever); // this is not thread safe
The if-then-add operation is not thread safe because some other thread might have added whatever to the list after the contains() check.
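A minimal sketch of how the if-then-add sequence can be made atomic, assuming the list came from Collections.synchronizedList(List) and every other thread accesses it only through that wrapper. AddIfAbsentSketch and addIfAbsent are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class AddIfAbsentSketch {
    // Holds the list's lock across both calls, so no other thread can add the
    // item between the contains() check and the add().
    static <T> boolean addIfAbsent(List<T> syncList, T item) {
        synchronized (syncList) {
            if (syncList.contains(item)) {
                return false;
            }
            return syncList.add(item);
        }
    }

    public static void main(String[] args) {
        List<String> list = Collections.synchronizedList(new ArrayList<String>());
        System.out.println(addIfAbsent(list, "x"));
        System.out.println(addIfAbsent(list, "x"));
    }
}
```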
There is no contradiction here: collections returned from synchronizedXyz suffer the same shortcoming as synchronized collections available to you directly, namely the need to manually synchronize on iterating the collection.
The problem of external iteration cannot be solved by a better class design, because iterating a collection inherently requires making multiple calls to its methods (see this Q&A for detailed explanation).
Note that starting with Java 8 you can iterate without additional synchronization by using the forEach method of your synchronized collection. This is thread-safe, and comes with additional benefits; see this Q&A for details.
The reason this is different from iterating externally is that forEach implementation inside the collection takes care of synchronizing the iteration for you.
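The forEach approach looks like the sketch below. It assumes Java 8 or later and relies on the wrapper's forEach holding the collection's lock for the whole traversal, which is how the OpenJDK implementation behaves. ForEachSketch and totalLength are invented names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ForEachSketch {
    // No explicit synchronized block here: the traversal happens inside the
    // wrapper's forEach, which takes the lock itself.
    static int totalLength(List<String> syncList) {
        int[] total = {0}; // one-element array so the lambda can update it
        syncList.forEach(s -> total[0] += s.length());
        return total[0];
    }

    public static void main(String[] args) {
        List<String> list = Collections.synchronizedList(new ArrayList<String>());
        list.add("ab");
        list.add("cde");
        System.out.println(totalLength(list));
    }
}
```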

Hash Code method in Object class [duplicate]

I was wondering why hashCode is implemented in the Object class when its purpose is served only when using collections like HashMap. Shouldn't hashCode instead be declared in the interfaces implemented by maps?
It's not a good idea to say that hashcode implementation is used
in Collections only.
In the Java API documentation, the general contract of hashCode is given as:
Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must
consistently return the same integer, provided no information used
in equals comparisons on the object is modified. This integer need
not remain consistent from one execution of an application to
another execution of the same application.
If two objects are equal according to the equals(Object) method,
then calling the hashCode method on each of the two objects must
produce the same integer result.
It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results.
However, the programmer should be aware that producing distinct
integer results for unequal objects may improve the performance of
hashtables.
So hashcode has to do only with Object. Collections just get benefits
from this feature for their own use cases e.g checking objects having same hashcode, storing objects based on hashcode
etc.
Note:- Collections doesn't use hashcode value to sort the objects.
hashcode() method is used mainly in case of hash based collections like HashMap and HashSet. The hash code returned by this method is used for calculation of hash index or bucket index.
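To make the contract concrete, here is a small sketch of a value class that overrides equals and hashCode together, so that equal instances land in the same bucket of a hash-based collection. GridPoint is a hypothetical class, not part of any library.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof GridPoint)) {
            return false;
        }
        GridPoint other = (GridPoint) o;
        return x == other.x && y == other.y;
    }

    // Derived from the same fields as equals, so equal points have equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Set<GridPoint> seen = new HashSet<>();
        seen.add(new GridPoint(1, 2));
        System.out.println(seen.contains(new GridPoint(1, 2)));
    }
}
```

If hashCode were left at the Object default while equals was overridden, the contains call would usually miss, because the two equal points would almost always hash to different buckets.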
A hashCode function is needed by all classes, POJOs, or beans where we need comparison or equality checks.
Suppose we need to compare two objects irrespective of the Collection API; then there should be a way to achieve it.
If hashCode were not part of the Object class, it would be difficult to calculate the hash every time, and doing so would be a burden.
It's a pragmatic design decision but the question is essentially correct. A purist analysis would say that it's an example of Interface Bloat or a Fat Interface.
The Java java.lang.Object has more methods than are strictly required by (or even meaningful for) all objects. It's not just hashCode() either.
It's arguable that the only method on Object that makes sense for all objects is getClass().
Not all applications are concurrent let alone needing their own monitors. So a purist object model would remove notify(), notifyAll() and 3 versions of wait() to an interface called (say) Monitored and then only permit synchronized to be used with objects implementing that.
It's very common for it to be either invalid or unnecessary to clone() objects - though that method is fortunately protected. Again best off in an interface say interface Cloneable<T>.
Object identity comparisons (are these references to the same object) has been provided as the intrinsic operator ==, so equals(Object) should (still being a purist) be in a ValueComparable<T> interface for objects that have that semantic (many don't).
Being very pure even then you'd push hashCode() into another interface (say) HashCodable.
finalize() could also be put in an interface HasFinalize. Indeed that could make the garbage collectors life a bit easier especially given its use so rare and specialized.
However there is a clear design decision in Java to simplify things and the designers decided to put a number of methods in Object that are apparently 'frequently used' or useful rather than 'strictly part of the minimal nature of being an object' which is (in the Java model at least) 'being an instance of some class of objects (having common methods, interfaces and semantics)'.
IMHO hashCode() is probably the least out of place!
It is totally unnecessary to provide a monitor on every object, and it leaves implementers with the headache of supporting these methods on every object knowing they will be called for a minuscule number of them. Don't underestimate the overhead that might cause, given it may be necessary to allocate things like mutexes a whole cache line (typically tens of bytes) for every one of millions of objects, with no sane way it would ever get used.
I'm not suggesting for a second 'Java is broken' or 'badly designed'. I am not here to knock Java. It is a great language. As with the design of generics it has always chosen to make things simple and been willing to make some compromises on performance for simplicity and as a result produced a very powerful and accessible language in which by great implementation those performance overheads only occasionally grate.
But to repeat the point I think we should recognise those methods are not in the intrinsic nature of all objects.

Query regarding ConcurrentHashMap in Java

I have a query regarding ConcurrentHashMap.
ConcurrentHashMap is a map for concurrent access. ConcurrentHashMap implements ConcurrentMap which extends Map.
a) ConcurrentHashMap implements the methods defined in ConcurrentMap (like putifAbsent etc) which are atomic.
b) But, how about the methods in the Map interface which ConcurrentMap extends?
How are they now atomic? Have they been reimplemented by ConcurrentHashMap
If I have a reference of type ConcurrentHashMap and call a method from
the Map Interface(e.g put) or any other method, is that method an atomic method?
ConcurrentHashMap does not extend HashMap. They are both implementations of a hash table, but ConcurrentHashMap has very different internals to HashMap in order to provide concurrency.
If you provide a ConcurrentHashMap to a method that accepts Map, then that will work, and you will get the concurrent behaviour you expect. Map is simply an interface that describes a set of methods, ConcurrentHashMap implements that interface with concurrent behaviour.
There is a difference between 'concurrent' and 'atomic'. Concurrent means that multiple operations can happen at the same time and the Map (or whatever data structure we are talking about) will ALWAYS BE IN A VALID STATE. This means that you can have multiple threads calling put(), get(), remove(), etc on this map and there will never be any errors (if you try this with a regular HashMap you WILL get errors as it isn't designed to handle concurrency).
Atomic means that an action that takes multiple steps appears to take a single step to other threads: as far as they are aware, it has either completely finished or hasn't even started yet. For ConcurrentHashMap, putIfAbsent() is one such example. From the javadoc,
If the specified key is not already associated with a value, associate it with the given value. This is equivalent to:
if (!map.containsKey(key))
    return map.put(key, value);
else
    return map.get(key);
except that the action is performed atomically.
If you tried the above code with a ConcurrentHashMap, you wouldn't get any errors (since it is concurrent), but there is a good chance that other threads would interleave with the main thread and your entry would get overwritten or removed. ConcurrentMap specifies the atomic putIfAbsent() method to ensure that implementations can perform those steps atomically, without interference from other threads.
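A short sketch of putIfAbsent() used for "first writer wins" registration. The names FirstWriterWins and register are hypothetical; putIfAbsent itself is the standard ConcurrentMap method and returns the previous value, or null if there was none.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class FirstWriterWins {
    // Returns the owner that ends up registered for the resource: the caller's
    // value if the key was free, otherwise the value that was already there.
    static String register(ConcurrentMap<String, String> owners, String resource, String owner) {
        String previous = owners.putIfAbsent(resource, owner);
        return previous == null ? owner : previous;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();
        System.out.println(register(owners, "printer", "alice"));
        System.out.println(register(owners, "printer", "bob"));
    }
}
```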
Map is just an interface. Therefore the ConcurrentHashMap has an atomic implementation of those methods.
The putIfAbsent method is a convenient way in a concurrent environment to execute an atomic if not contains then put that you cannot do from the Map interface even though the Map is actually of type ConcurrentHashMap.
Methods such as put() and remove() acquire the lock on the final Sync object; a concurrent retrieval reflects the most recently completed updates to the map.
In the case of putAll() or clear(), which operate on the whole Map, a concurrent read may reflect the insertion or removal of only some entries.
The two links below will help you understand:
http://javarevisited.blogspot.in/2013/02/concurrenthashmap-in-java-example-tutorial-working.html
http://www.javamex.com/tutorials/synchronization_concurrency_8_hashmap2.shtml

If all collection attributes are thread-safe , can we say that this collection is thread-safe?

If all attributes (or items fields, or data members) of a java collection are thread-safe (CopyOnWriteArraySet,ConcurrentHashMap, BlockingQueue, ...), can we say that this collection is thread-safe ?
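An example:
import java.util.concurrent.CopyOnWriteArraySet;

public class AmIThreadSafe {
    // initialized here so that add() and clear() never hit a null field
    private final CopyOnWriteArraySet<Object> threadSafeAttribute = new CopyOnWriteArraySet<>();
    public void add(Object o) {
        threadSafeAttribute.add(o);
    }
    public void clear() {
        threadSafeAttribute.clear();
    }
}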
in this sample can we say that AmIThreadSafe is thread-safe ?
Assuming by "attributes" you mean "what the collection holds", then no. Just because the Collection holds thread-safe items does not mean that the Collection's implementation implements add(), clear(), remove(), etc., in a thread-safe manner.
Short answer: No.
Slightly longer answer: because add() and clear() are not in any way synchronized, and HashSet isn't itself synchronized, it's possible for multiple threads to be in them at the same time.
Edit following comment: Ah. Now the short answer is Yes, sorta. :)
The reason for the "sorta" (American slang meaning partially, btw) is that it's possible for two operations to be atomically safe, but to be unsafe when used in combination to make a compound operation.
In your given example, where only add() and clear() are supported, this can't happen.
But in a more complete class, where we would have more of the Collection interface, imagine a caller who needs to add an entry to the set iff the set has no more than 100 entries already.
This caller would want to write a method something like this:
void addIfNotOverLimit (AmIThreadSafe set, Object o, int limit) {
if (set.size() < limit) // ## thread-safe call 1
set.add(o); // ## thread-safe call 2
}
The problem is that while each call is itself thread-safe, two threads could be in addIfNotOverLimit (or for that matter, adding through another method altogether). Thread A could call size() and get 99, and then be about to call add(), but before that happens it could be interrupted, thread B could add an entry, and the set would then go over its limit.
Moral? Compound operations make the definition of 'thread safe' more complex.
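One way to close that gap is to move the size check and the add into a single synchronized method of the class that owns the set, and to route every mutation through that lock. This is a sketch with made-up names (BoundedSet, addIfNotOverLimit as a member rather than a free method).

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

final class BoundedSet {
    private final Set<Object> items = new CopyOnWriteArraySet<>();
    private final int limit;

    BoundedSet(int limit) {
        this.limit = limit;
    }

    // The check and the add run under one lock, so two threads cannot both
    // observe size() == limit - 1 and then both add.
    synchronized boolean addIfNotOverLimit(Object o) {
        if (items.size() >= limit) {
            return false;
        }
        return items.add(o);
    }

    synchronized int size() {
        return items.size();
    }
}
```

This only holds as long as no other code path adds to the underlying set without taking the same lock.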
No, because the state of an object is the "sum" of all of its attributes.
For instance, you could have 2 thread-safe collections as attributes in your object. Additionally, your object could depend on some sort of correlation between these 2 collections (e.g. if an object is in 1 collection, it is in the other collection, and vice versa). Simply using 2 thread-safe collections will not ensure that that correlation is true at all points in time. You would need additional concurrency control in your object to ensure that this constraint holds across the 2 collections.
Since most non-trivial objects have some type of correlation relationship across their attributes, using thread-safe collections as attributes is not sufficient to make an object thread-safe.
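A sketch of such a correlation: two maps that must always mirror each other, kept consistent by guarding every access with the owning object's lock. TwoWayIndex and its methods are hypothetical names; plain HashMaps are enough here because the object's own lock does all the work.

```java
import java.util.HashMap;
import java.util.Map;

final class TwoWayIndex {
    private final Map<String, Integer> idByName = new HashMap<>();
    private final Map<Integer, String> nameById = new HashMap<>();

    // Updates both maps under one lock so readers never see them out of step.
    synchronized void link(String name, int id) {
        Integer oldId = idByName.put(name, id);
        if (oldId != null) {
            nameById.remove(oldId); // drop the stale reverse entry for this name
        }
        String oldName = nameById.put(id, name);
        if (oldName != null && !oldName.equals(name)) {
            idByName.remove(oldName); // the id was taken over from another name
        }
    }

    synchronized Integer idOf(String name) {
        return idByName.get(name);
    }

    synchronized String nameOf(int id) {
        return nameById.get(id);
    }
}
```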
What is thread safety?
Thread safety simply means that the
fields of an object or class always
maintain a valid state, as observed by
other objects and classes, even when
used concurrently by multiple threads.
A thread-safe object is one that
always maintains a valid state, as
observed by other classes and objects,
even in a multithreaded environment.
According to the API documentation, you have to use this function to ensure thread-safety:
synchronizedCollection(Collection c)
Returns a synchronized (thread-safe) collection
backed by the specified collection
Reading that, it is my opinion that you have to use the above function to ensure a thread-safe Collection. However, you do not have to use them for all Collections and there are faster Collections that are thread-safe such as ConcurrentHashMap. The underlying nature of CopyOnWriteArraySet ensures thread-safe operations.
