The JavaDoc of ConcurrentHashMap says this:
Like Hashtable but unlike HashMap, this class does not allow null to be used as a key or value.
My question: Why?
2nd question: Why doesn't Hashtable allow null?
I've used HashMaps a lot for storing data, but when I switched to ConcurrentHashMap I ran into trouble several times because of NullPointerExceptions.
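Here is a minimal sketch of the kind of change that bit me (the key name is just an illustration):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullValueDemo {
    public static void main(String[] args) {
        Map<String, String> plain = new HashMap<>();
        plain.put("config.timeout", null);        // fine: HashMap accepts null values

        Map<String, String> concurrent = new ConcurrentHashMap<>();
        concurrent.put("config.timeout", null);   // throws NullPointerException
    }
}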
From the author of ConcurrentHashMap himself (Doug Lea):
The main reason that nulls aren't allowed in ConcurrentMaps
(ConcurrentHashMaps, ConcurrentSkipListMaps) is that ambiguities that
may be just barely tolerable in non-concurrent maps can't be
accommodated. The main one is that if map.get(key) returns null, you
can't detect whether the key explicitly maps to null vs the key isn't
mapped. In a non-concurrent map, you can check this via
map.contains(key), but in a concurrent one, the map might have changed
between calls.
I believe it is, at least in part, to allow you to combine containsKey and get into a single call. If the map could hold nulls, there would be no way to tell whether get is returning null because there was no mapping for that key or because the stored value was null.
Why is that a problem? Because there is no safe way to do that yourself. Take the following code:
if (m.containsKey(k)) {
    return m.get(k);
} else {
    throw new KeyNotPresentException();
}
Since m is a concurrent map, key k may be deleted between the containsKey and get calls, causing this snippet to return a null that was never in the table, rather than the desired KeyNotPresentException.
Normally you would solve that by synchronizing, but with a concurrent map that of course won't work. Hence the signature for get had to change, and the only way to do that in a backwards-compatible way was to prevent the user inserting null values in the first place, and continue using that as a placeholder for "key not found".
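Because a ConcurrentHashMap can never contain a null value, a returned null unambiguously means "not present", so the check and the retrieval collapse into one atomic call. A minimal sketch (KeyNotPresentException is the hypothetical exception from the snippet above, not a JDK class):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SingleCallLookup {
    private final ConcurrentMap<String, String> m = new ConcurrentHashMap<>();

    String lookup(String k) {
        String v = m.get(k);   // one atomic call: no window for another thread to sneak in a removal
        if (v == null) {
            // null can never be a stored value, so this really means "key not present"
            throw new KeyNotPresentException();
        }
        return v;
    }

    static class KeyNotPresentException extends RuntimeException {}
}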
Josh Bloch designed HashMap; Doug Lea designed ConcurrentHashMap. I hope that isn't libelous. Actually I think the problem is that nulls often require wrapping so that the real null can stand for uninitialized. If client code requires nulls then it can pay the (admittedly small) cost of wrapping nulls itself.
You can't synchronize on a null.
Edit: This isn't exactly why in this case. I initially thought there was something fancy going on with locking things against concurrent updates or otherwise using the Object monitor to detect if something was modified, but upon examining the source code it appears I was wrong - they lock using a "segment" based on a bitmask of the hash.
In that case, I suspect they did it to copy Hashtable, and I suspect Hashtable did it because in the relational database world, null != null, so using a null as a key has no meaning.
I guess that the following snippet of the API documentation gives a good hint:
"This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details."
They probably just wanted to make ConcurrentHashMap fully compatible/interchangeable with Hashtable. And since Hashtable does not allow null keys or values...
ConcurrentHashMap is thread-safe. I believe that not allowing null keys and values was a part of making sure that it is thread-safe.
I don't think disallowing null values is the right choice.
In many cases we do want to put a key with a null value into the concurrent map. However, with ConcurrentHashMap we cannot do that.
I suggest that a future version of the JDK support this.
I know that a null key is not allowed in Hashtable because a hash code is required to store an element, and a hash code cannot be computed for a null key. But I don't understand what exact reason the Sun developers had in mind for not allowing null values.
Some say there is a null check on the value inside the put method implementation, and that is why it throws a NullPointerException. But my question is: why is that null check there? Is there a specific reason behind it?
I have read a lot about this but have not found a satisfying answer. Some say that allowing null values creates an ambiguity: if you retrieve a value using get() and it returns null, you cannot tell whether the actual value is null or the key is missing. So I need a precise answer, with evidence.
You will get null back if you call
hashtable.get("key")
and "key" is not in the table, so you don't need to be able to store null values.
If you could store null, you would never know what you had: a mapping to null or a missing mapping.
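A small sketch of that ambiguity, using a HashMap (which does allow null values) for contrast:

import java.util.HashMap;
import java.util.Map;

public class NullAmbiguity {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", null);

        // Both look-ups return null, for two different reasons:
        System.out.println(map.get("a"));          // null: key present, value is null
        System.out.println(map.get("b"));          // null: key absent

        // Only a second call can tell them apart:
        System.out.println(map.containsKey("a"));  // true
        System.out.println(map.containsKey("b"));  // false
    }
}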
Hashtable is considered legacy code. You should use HashMap instead; it allows null values, and one key can also be null.
EDIT
After digging deeper I may have an argument for this decision: Hashtable is synchronized (and HashMap isn't).
From JavaDoc:
Unlike the new collection implementations, Hashtable is synchronized. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use ConcurrentHashMap in place of Hashtable.
As you can see, the successor of Hashtable is not HashMap, as I wrote earlier, but ConcurrentHashMap. I was surprised that ConcurrentHashMap does not allow null either. I started digging and found this:
From the author of ConcurrentHashMap himself (Doug Lea):
The main reason that nulls aren't allowed in ConcurrentMaps (ConcurrentHashMaps, ConcurrentSkipListMaps) is that ambiguities that may be just barely tolerable in non-concurrent maps can't be accommodated. The main one is that if map.get(key) returns null, you can't detect whether the key explicitly maps to null vs the key isn't mapped. In a non-concurrent map, you can check this via map.contains(key), but in a concurrent one, the map might have changed between calls.
So maybe the authors of Hashtable had the same reason as the authors of ConcurrentHashMap.
Having a null value is still considered a bad decision even in HashMap, and the static factory methods added to Map in Java 9 prove that:
Map.of("test", null)
will throw a NullPointerException.
From Java Documentation
To successfully store and retrieve objects from a hash table, the
objects used as keys must implement the hashCode method and the equals
method
null is not an object, so you cannot call .equals() or .hashCode() on it, which means Hashtable cannot compute a hash to use it as a key.
Hashtable's containsValue(Object value) method also throws a NullPointerException if the value is null, so null values are not allowed either.
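A minimal sketch of the behavioural difference (HashMap special-cases a null key, while Hashtable calls hashCode() on the key directly):

import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "ok");                 // fine: HashMap treats a null key specially
        System.out.println(hashMap.get(null));   // prints "ok"

        Map<String, String> table = new Hashtable<>();
        table.put(null, "boom");                 // throws NullPointerException
    }
}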
I have some code that runs a load test against a web service by spinning up multiple threads and hitting the service with a specified transaction at a given rate. The transaction retrieves a list of values from the service, then checks whether those values already exist in a set; it adds them if they don't, or fails the transaction if they do (I'm aware the separate check is not necessary and the return value of the add could be inspected; that's just how the code is written now).
Looking at the code, however, it is not thread safe. The set being checked against/added to is a basic HashSet. The current code also increments a value in a regular HashMap for each transaction, so it looks like this code has been messed up from the beginning when it comes to thread safety.
I believe I solved the Map increment issue using ConcurrentHashMap based solution here: Atomically incrementing counters stored in ConcurrentHashMap, but I'm not sure the best way to handle the duplicate check/modification on the Set in a thread-safe way.
Originally I considered using CopyOnWriteArraySet, but because the expected case is to get no duplicates, reads would occur as frequently as writes, so it doesn't seem ideal. The solution I'm considering now is to use a Set 'view' on ConcurrentHashMap via newKeySet()/keySet(defaultValue), as described here: https://javarevisited.blogspot.com/2017/08/how-to-create-thread-safe-concurrent-hashset-in-java-8.html
If I use this solution and check for duplicates by just adding the value and checking the boolean return value, will this achieve what I want in a thread-safe way? My main concern is that it is important that I DO detect any duplicates. What I don't want is for two threads to try to add at the same time and for both adds to return true because the value was not there when each attempted its add, letting duplicate values received from the service go undetected. For that purpose I thought maybe I should use a List and check for duplicates at the end by converting to a Set and comparing sizes. However, it's still preferable to at least attempt to detect a duplicate during the transaction and fail if one is detected. It's fine to get a false negative sometimes and still pass the transaction as long as we can detect it at the end, but failing the transaction when we can is still valuable.
Any advice is appreciated- thanks!
I believe I solved the Map increment issue using ConcurrentHashMap based solution here: Atomically incrementing counters stored in ConcurrentHashMap, but I'm not sure the best way to handle the duplicate check/modification on the Set in a thread-safe way.
Yes you can certainly use a ConcurrentHashMap in your solution.
If I use this solution checking for duplicates by just adding the value and checking the bool return type, will this achieve what I want in a thread-safe way?
Yes. ConcurrentHashMap is fully thread-safe, so if two threads do a put(...) of the same key at the same instant, one of them will win (and return null, since there was no previous mapping) and the other will replace that mapping and return the previous value, which you can test. It is designed specifically for high-performance multi-threaded applications. You can also use putIfAbsent(...), in which case the second thread (and any others) will get back the value already in the map. This also works if you are using a keySet wrapper to get Set semantics.
With all synchronized classes, you need to be careful about race conditions in your code when you make multiple calls to the class. For example, something like the following is a terrible pattern because there is a race condition because of the multiple calls to the concurrent-map:
// terrible pattern which creates a race condition
if (!concurrentMap.containsKey(key)) {
    concurrentMap.put(key, value);
}
This is the reason why the ConcurrentMap has a number of atomic operations that help with this:
V putIfAbsent(K key, V value); -- put key into map if it is not there already
boolean remove(K key, V value); -- remove the key from the map if it has value
boolean replace(K key, V oldValue, V newValue); -- replaces key with new-value only if it already has old-value
V replace(K key, V value); -- replace the value associated with the key only if key already exists in the map
All of these methods would require multiple, non-atomic calls to the synchronized map to implement from outside which would introduce race conditions.
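As a hedged illustration, the racy snippet above could be rewritten with the atomic variant like this (the key and value are placeholders):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AtomicPutExample {
    public static void main(String[] args) {
        ConcurrentMap<String, String> concurrentMap = new ConcurrentHashMap<>();

        // atomic alternative to "if (!containsKey) put": only one thread can win the insert
        String previous = concurrentMap.putIfAbsent("key", "value");
        if (previous != null) {
            // some other thread (or an earlier call) already mapped "key"
            System.out.println("Already present: " + previous);
        }
    }
}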
My main concern is that it is important that I DO detect any duplicates. What I don't want to happen is two threads try to add at the same time, and both adds return true...
As mentioned above, this won't happen. One of the two puts will return null (a new mapping) and the other will return a non-null previous value, which should be counted as a duplicate.
For that purpose I thought maybe I should use a List and check for duplicates at the end by converting to a set and checking size?
The list would be unnecessary and very hard to get right.
I think a ConcurrentHashSet-like set is your best friend:
Set<Value> values = ConcurrentHashMap.newKeySet();
The set is backed by a ConcurrentHashMap, so your code benefits from both the thread safety and the performance of ConcurrentHashMap.
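For the duplicate check itself, a sketch of how the add() return value can be used (the String element type is just a stand-in for whatever the transactions return; it must implement equals/hashCode properly):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DuplicateDetection {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Returns true if the value is new, false if any thread has already recorded it. */
    boolean record(String value) {
        // add() is atomic: if two threads add an equal value at the same instant,
        // exactly one call returns true and the other returns false.
        return seen.add(value);
    }
}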
Just a little advice:
If your Transaction object (or whatever you put into the Set) has a proper equals implementation, you do not need to check for duplicates in the Set.
A Set always holds only unique values.
If you still need to know whether an object is already in the set, use the contains method.
Then there are multiple ways to do what you need.
You can use a ConcurrentHashMap instead of a Set: just put your objects in as keys. You then have a keySet view there and you can use it. The value can be anything (e.g. the same object). You could use the values() view just as well.
You can use one of the BlockingQueue implementations (e.g. LinkedBlockingQueue) to collect transactions from different threads first, and then apply whatever logic you want after all threads are done (see the sketch after this list).
and there are many other ways...
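A sketch of that collect-then-check approach, assuming the values are plain Strings:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CollectThenCheck {
    // worker threads offer every value they receive; nothing is checked during the run
    private final BlockingQueue<String> received = new LinkedBlockingQueue<>();

    void onValueReceived(String value) {
        received.offer(value);   // never blocks on an unbounded LinkedBlockingQueue
    }

    /** Call after all threads are done: true if any duplicates were received. */
    boolean hasDuplicates() {
        List<String> all = new ArrayList<>();
        received.drainTo(all);
        Set<String> unique = new HashSet<>(all);
        return unique.size() < all.size();
    }
}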
In the following piece of code:
if (map.containsKey(key)) {
    map.remove(key);
}
Looking at performance, is it useful to first do a Map.containsKey() check before trying to remove the value from the map?
Same question goes for retrieving values, is it useful to first do the contains check if you know that the map contains no null values?
if (map.containsKey(key)) {
    Object value = map.get(key);
}
remove returns null if there's no mapping for the key; no exception will be thrown:
public V remove(Object key)
I don't see any reason to perform that if before trying to remove a key, except perhaps if you want to count how many items were removed from the map.
In the second example, you'll get null if the key doesn't exist. Whether to check or not depends on your logic.
Try not to waste your time thinking about performance; containsKey has O(1) time complexity:
This implementation provides constant-time performance for the basic operations (get and put)
is it useful to first do a Map.containsKey() check before trying to remove the value from the map?
No, it is counterproductive:
In the case when the item is not there, you would see no difference
In the case when the item is there, you would end up with two look-ups.
If you want to remove the item unconditionally, simply call map.remove(key).
Same question goes for retrieving values
Same logic applies here. Of course you need to check the result for null, so in this case if stays there.
Note that this cleanup exercise is about readability first, and only then about performance. Accessing a map is a fast operation, so accessing it twice is unlikely to cause major performance issues except for some rather extreme cases. However, removing an extra conditional will make your code more readable, which is very important.
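A sketch of both cleaned-up patterns (the map and key names are placeholders):

import java.util.Map;

public class SingleLookup {
    static void removeIfPresent(Map<String, Object> map, String key) {
        // removal: one call, no conditional; returns the old value or null, which can simply be ignored
        map.remove(key);
    }

    static Object fetch(Map<String, Object> map, String key) {
        // retrieval: one look-up instead of containsKey + get
        Object value = map.get(key);
        if (value != null) {
            // ... use value ...
        }
        return value;
    }
}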
The Java documentation on remove() states that it will remove the element only if the map contains such element. So the contains() check before remove() is redundant.
This is subjective (and entirely a case of style), but for the case where you're retrieving a value, I prefer the contains(key) call to the null check. Boolean comparisons just feel better than null comparisons. I'd probably feel differently if Map<K,V>.get(key) returned Optional<V>.
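If you want that Optional flavour today without changing the map type, a small wrapper does it (just a sketch):

import java.util.Map;
import java.util.Optional;

public class OptionalLookup {
    static Optional<String> find(Map<String, String> map, String key) {
        // a single get(); an absent key (or a null value, if the map allowed one) becomes Optional.empty()
        return Optional.ofNullable(map.get(key));
    }
}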
Also, it's worth noting that the "given no null values" assumption can be fairly hard to prove, depending on the type of the Map (which you might not even know). In general I think the redundant check on retrieval is (or maybe just feels) safer, just in case there's a mistake somewhere else (knocks on wood, checks for black cats, and avoids a ladder on the way out).
For the removal operation you're spot on. The check is useless.
Here is my use case: I have an object that is logically equal to my HashMap key but not the same object (not ==). I need to get the actual key object out of the HashMap so that I can synchronise on it. I am aware that I can iterate over the keySet, but that is slow compared to hashing.
Looking through the java.util.HashMap implementation I see a getEntry(Object key) method that is exactly what I need. Any idea why this has not been exposed?
Can you think of any other way I can get the key out?
I think you would be better off putting in an extra layer of indirection on the value. The key should also be a "pure" value. Instead of:
Map<ReferenceObjectKey,Thing> map;
Use:
Map<ValueObjectKey,ReferenceObject<Thing>> map;
I can't answer your actual question (why is the method not exposed) beyond the rather obvious, "because the authors decided not to expose it."
However your question leads me to believe that you have a rather strange synchronization scheme going on; from my understanding you're only trying to call it to get a canonical representation of equal objects for synchronization. That sounds like a really bad idea, as I noted in my comment to the question.
A better approach would be to revisit how and why you want to synchronize on these key objects, and rework your synchronization to be clearer and saner, preferably at a level higher up or by using an alternative approach altogether.
It might help if you posted a code snippet of what you want to do with this synchronization so that others can give their opinions on a cleaner way to implement it. One example would simply be to use a thread-safe map class (such as ConcurrentHashMap), if this is indeed what you're trying to achieve here.
Edit: Have a look at How To Ask Questions The Smart Way, in particular the bullet point I've linked as this is a classic example of that deficiency. It seems likely that your overall design is a bit off and needs to go in a different direction; so while you're stuck on this specific issue it's a symptom of a larger problem. Giving us the broader context will lead to you getting much better overall answers.
Actually, the method the caller is asking for would have been useful. It was arguably a mistake that it, or something like it, was not included.
As it is, supposing you wish to increment the Integer value that's mapped from key "a" -- you end up having to do a hash lookup on "a" twice. Supposing you want to distinguish between a value being not present and the value being present but mapped to null -- again, two hash lookups.
In practice the world hasn't ended because of this, though.
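One common workaround, sketched here on the assumption that the keys themselves are the objects being locked on, is a map from each key to itself that canonicalizes equal-but-not-identical instances:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class KeyCanonicalizer<K> {
    // maps every key to itself, so an equal-but-not-identical key can be swapped
    // for the single canonical instance before synchronizing on it
    private final ConcurrentMap<K, K> canonical = new ConcurrentHashMap<>();

    K canonicalize(K key) {
        K existing = canonical.putIfAbsent(key, key);
        return existing != null ? existing : key;
    }
}

Callers would then do synchronized (canonicalizer.canonicalize(key)) { ... }, at the cost of keeping every key alive in the extra map.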
I stumbled upon this problem myself recently. When I boiled the problem down, it turned out I was essentially using two different methods to associate data with the part of the key object that was used for determining equality:
With the value the key mapped to, via the Map
With the data contained within the key object but not used in its equals()/hashCode() methods, via composition
I was using a List in the key class to determine equality and hash code, and there were three other fields in it: a boolean and two Strings. In the end, I remade the map as a Map<List<String>, ...> and refactored the other three fields into their own class, then had the original class be a composition of the List and the new class. I felt the code read better after this.
This sounds like a deeper problem you're having. Why do you need such a thing? Why is the key not unique to its object?
What do you mean by "so that I can synchronise on it"?
I'm sorry, but you seem to have a conceptual break here.
If your problem is that you "hold" an equivalent object (.equals() is true but == is false) to a key, and need to find the key, using the Object variant of get would not help you, because the only .equals that Object supports is identity (==).
What you need to do is to implement equals() and of course hashcode() in your key class.
This will make it trivial to obtain the entry.
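A minimal sketch of such a key class (the class and field names are made up for illustration):

import java.util.Objects;

public final class AccountKey {
    private final String id;

    public AccountKey(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AccountKey)) return false;
        return id.equals(((AccountKey) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }
}

With equals and hashCode defined like this, two logically equal AccountKey instances hash to the same bucket, so a look-up with an equivalent instance finds the stored mapping even though the instances differ by ==.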