In the following piece of code:
if (map.containsKey(key)) {
map.remove(key);
}
Looking at performance, is it useful to first do a Map.containsKey() check before trying to remove the value from the map?
Same question goes for retrieving values, is it useful to first do the contains check if you know that the map contains no null values?
if (map.containsKey(key)) {
Object value = map.get(key);
}
remove returns null if there's no mapping for key no exception will be thrown:
public V remove(Object key)
I don't see any reason to perform that if before trying to remove a key, perhaps maybe if you want to count how many items where removed from the map..
In the second example, you'll get null if the key doesn't exist. Whether to check or not, depends on your logic.
Try not to waste your time on thinking about performance, containsKey has O(1) time complexity:
This implementation provides constant-time performance for the basic operations (get and put)
is it useful to first do a Map.containsKey() check before trying to remove the value from the map?
No, it is counterproductive:
In the case when the item is not there, you would see no difference
In the case when the item is there, you would end up with two look-ups.
If you want to remove the item unconditionally, simply call map.remove(key).
Same question goes for retrieving values
Same logic applies here. Of course you need to check the result for null, so in this case if stays there.
Note that this cleanup exercise is about readability first, and only then about performance. Accessing a map is a fast operation, so accessing it twice is unlikely to cause major performance issues except for some rather extreme cases. However, removing an extra conditional will make your code more readable, which is very important.
The Java documentation on remove() states that it will remove the element only if the map contains such element. So the contains() check before remove() is redundant.
This is subjective (and entirely a case of style), but for the case where you're retrieving a value, I prefer the contains(key) call to the null check. Boolean comparisons just feel better than null comparisons. I'd probably feel differently if Map<K,V>.get(key) returned Optional<V>.
Also, it's worth noting the "given no null keys" assertion is one that can be fairly hard to prove, depending on the type of the Map (which you might not even know). In general I think the redundant check on retrieval is (or maybe just feels) safer, just in case there's a mistake somewhere else (knocks on wood, checks for black cats, and avoids a ladder on the way out).
For the removal operation you're spot on. The check is useless.
Related
I have some code that is running a load tests against a web service by spinning up multiple threads and hitting the service with a specified transaction at a given rate. The transaction retrieves a list of values from the service, then checks the list of values to see if they exist in a set, and adds them if they do not or fails the transaction if they do (I'm aware the separte check is not necessary and the return value of the add could be inspected- that's just how the code is written now).
Looking at the code however, it is not thread safe. The set being checked against/added to is a basic HashSet. The current code also increments a value in a regular hashMap for each transation- so it looks like this code has been mesesed up from the beginning when it comes to thread safety.
I believe I solved the Map increment issue using ConcurrentHashMap based solution here: Atomically incrementing counters stored in ConcurrentHashMap, but I'm not sure the best way to handle the duplicate check/modification on the Set in a thread-safe way.
Originally I considered using CopyOnWriteArraySet, but because the expected case is to get no duplicates, the reads would occur as frequently as writes, so it doesn't seem ideal. The solution I'm considering now is to use a Set 'view' on ConcurrentHashMap using newKeySet()/KeySet(defaultVal) as described here: https://javarevisited.blogspot.com/2017/08/how-to-create-thread-safe-concurrent-hashset-in-java-8.html
If I use this solution checking for duplicates by just adding the value and checking the bool return type, will this achieve what I want in a thread-safe way? My main concern is that it is important that I DO detect any duplicates. What I don't want to happen is two threads try to add at the same time, and both adds return true since the value was not there when they attempted to add and the duplicate values received from the service goes undetected. For that purpose I thought maybe I should use a List and check for duplicates at the end by converting to a set and checking size? However it's still preferable to at least attempt to detect a duplicate during the transaction and fail if detected. It's fine to get a false negative sometimes and still pass the transaction if we can detect it at the end, but I think that check/failing transaction when we can is still valuable.
Any advice is appreciated- thanks!
I believe I solved the Map increment issue using ConcurrentHashMap based solution here: Atomically incrementing counters stored in ConcurrentHashMap, but I'm not sure the best way to handle the duplicate check/modification on the Set in a thread-safe way.
Yes you can certainly use a ConcurrentHashMap in your solution.
If I use this solution checking for duplicates by just adding the value and checking the bool return type, will this achieve what I want in a thread-safe way?
Yes. ConcurrentHashMap is a fully reentrant class so if two threads are doing a put(...) of the same key at the same instant, one of them will win and return null as the existing key and the other will replace the key and return the previous value for the key that you can test on. It is designed specifically for high performance multi-threaded applications. You can also do a putIfAbsent(...) in which case the 2nd thread (and any others) will return the value already in the map. This would also work if you are using a keyset wrapper to supply Set mechanics.
With all synchronized classes, you need to be careful about race conditions in your code when you make multiple calls to the class. For example, something like the following is a terrible pattern because there is a race condition because of the multiple calls to the concurrent-map:
// terrible pattern which creates a race condition
if (!concurrentMap.containsKey(key)) {
concurrentMap.put(key, value);
}
This is the reason why the ConcurrentMap has a number of atomic operations that help with this:
V putIfAbsent(K key, V value); -- put key into map if it is not there already
boolean remove(K key, V value); -- remove the key from the map if it has value
boolean replace(K key, V oldValue, V newValue); -- replaces key with new-value only if it already has old-value
V replace(K key, V value); -- replace the value associated with the key only if key already exists in the map
All of these methods would require multiple, non-atomic calls to the synchronized map to implement from outside which would introduce race conditions.
My main concern is that it is important that I DO detect any duplicates. What I don't want to happen is two threads try to add at the same time, and both adds return true...
As mentioned above, this won't happen. One of the 2 puts will return null and the other one should be counted as a duplicate.
For that purpose I thought maybe I should use a List and check for duplicates at the end by converting to a set and checking size?
The list would be unnecessary and very hard to get right.
I think a ConcurrentHashSet-like set is your best friend:
Set<Value> values = ConcurrentHashMap.newKeySet();
The set is backed by an ConcurrentHashMap so your code would both benefit from thread-safety and performance of ConcurrentHashMap
Just a little advise -
if your Transaction object (or whatever you put into Set) has proper equals method implementation you do not need to check duplicates in the Set.
Set always has only unique values.
If you still need to know is object already in the set use contains method.
Then there are multiple ways to do what you need.
You can use ConcurrentHashMap instead of Setjust put your objects as keys. You have a keySet there and you can use it. Value can be anything (e.g. same object). Sure you can use valueSet instead as well.
You can use one of the BlockingQueue (e.g. LinkedBlockingQueue) implementation to collect transactions from different threads first and then apply any logic you want after all threads done
and there are many other ways...
The JavaDoc of ConcurrentHashMap says this:
Like Hashtable but unlike HashMap, this class does not allow null to be used as a key or value.
My question: Why?
2nd question: Why doesn't Hashtable allow null?
I've used a lot of HashMaps for storing data. But when changing to ConcurrentHashMap I got several times into trouble because of NullPointerExceptions.
From the author of ConcurrentHashMap himself (Doug Lea):
The main reason that nulls aren't allowed in ConcurrentMaps
(ConcurrentHashMaps, ConcurrentSkipListMaps) is that ambiguities that
may be just barely tolerable in non-concurrent maps can't be
accommodated. The main one is that if map.get(key) returns null, you
can't detect whether the key explicitly maps to null vs the key isn't
mapped. In a non-concurrent map, you can check this via
map.contains(key), but in a concurrent one, the map might have changed
between calls.
I believe it is, at least in part, to allow you to combine containsKey and get into a single call. If the map can hold nulls, there is no way to tell if get is returning a null because there was no key for that value, or just because the value was null.
Why is that a problem? Because there is no safe way to do that yourself. Take the following code:
if (m.containsKey(k)) {
return m.get(k);
} else {
throw new KeyNotPresentException();
}
Since m is a concurrent map, key k may be deleted between the containsKey and get calls, causing this snippet to return a null that was never in the table, rather than the desired KeyNotPresentException.
Normally you would solve that by synchronizing, but with a concurrent map that of course won't work. Hence the signature for get had to change, and the only way to do that in a backwards-compatible way was to prevent the user inserting null values in the first place, and continue using that as a placeholder for "key not found".
Josh Bloch designed HashMap; Doug Lea designed ConcurrentHashMap. I hope that isn't libelous. Actually I think the problem is that nulls often require wrapping so that the real null can stand for uninitialized. If client code requires nulls then it can pay the (admittedly small) cost of wrapping nulls itself.
You can't synchronize on a null.
Edit: This isn't exactly why in this case. I initially thought there was something fancy going on with locking things against concurrent updates or otherwise using the Object monitor to detect if something was modified, but upon examining the source code it appears I was wrong - they lock using a "segment" based on a bitmask of the hash.
In that case, I suspect they did it to copy Hashtable, and I suspect Hashtable did it because in the relational database world, null != null, so using a null as a key has no meaning.
I guess that the following snippet of the API documentation gives a good hint:
"This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details."
They probably just wanted to make ConcurrentHashMap fully compatible/interchangeable to Hashtable. And as Hashtable does not allow null keys and values..
ConcurrentHashMap is thread-safe. I believe that not allowing null keys and values was a part of making sure that it is thread-safe.
I don't think disallowing null value is a correct option.
In many cases, we do want do put a key with null value into the con-current map. However, by using ConcurrentHashMap, we cannot do that.
I suggest that the coming version of JDK can support that.
I had originally written an ArrayList and stored unique values (usernames, i.e. Strings) in it. I later needed to use the ArrayList to search if a user existed in it. That's O(n) for the search.
My tech lead wanted me to change that to a HashMap and store the usernames as keys in the array and values as empty Strings.
So, in Java -
hashmap.put("johndoe","");
I can see if this user exists later by running -
hashmap.containsKey("johndoe");
This is O(1) right?
My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys.
My question is, is this a good approach? The efficiency beats ArrayList#contains or an array search in general. It works.
My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it.
Since you have a set of unique values, a Set is the appropriate data structure. You can put your values inside HashSet, an implementation of the Set interface.
My lead said this was a more efficient way to do this and it made sense to me, but it just seemed a bit off to put null/empty as values in the hashmap and store elements in it as keys.
The advice of the lead is flawed. Map is not the right abstraction for this, Set is. A Map is appropriate for key-value pairs. But you don't have values, only keys.
Example usage:
Set<String> users = new HashSet<>(Arrays.asList("Alice", "Bob"));
System.out.println(users.contains("Alice"));
// -> prints true
System.out.println(users.contains("Jack"));
// -> prints false
Using a Map would be awkward, because what should be the type of the values? That question makes no sense in your use case,
as you have just keys, not key-value pairs.
With a Set, you don't need to ask that, the usage is perfectly natural.
This is O(1) right?
Yes, searching in a HashMap or a HashSet is O(1) amortized worst case, while searching in a List or an array is O(n) worst case.
Some comments point out that a HashSet is implemented in terms of HashMap.
That's fine, at that level of abstraction.
At the level of abstraction of the task at hand ---
to store a collection of unique usernames,
using a set is a natural choice, more natural than a map.
This is basically how HashSet is implemented, so I guess you can say it's a good approach. You might as well use HashSet instead of your HashMap with empty values.
For example :
HashSet's implementation of add is
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
where map is the backing HashMap and PRESENT is a dummy value.
My worry is, I haven't seen anyone else do this after a search. I may be missing an obvious issue somewhere but I can't see it.
As I mentioned, the developers of the JDK are using this same approach.
If I have a Name object and have an ArrayList of type Name (names), and I want to ascertain whether my list of names contains a given Name object (n), I could do it two ways:
boolean exists = names.contains(n);
or
boolean exists = names.stream().anyMatch(x -> x.equals(n));
I was considering if these two would behave the same and then thought about what happens if n was assigned null?
For contains, as I understand, if the argument is null, then it returns true if the list contains null. How would I achieve this anyMatch - would it be by using Objects.equals(x, n)?
If that is how it works, then which approach is more efficient - is it anyMatch as it can take advantage of laziness and parallelism?
The problem with the stream-based version is that if the collection (and thus its stream) contains null elements, then the predicate will throw a NullPointerException when it tries to call equals on this null object.
This could be avoided with
boolean exists = names.stream().anyMatch(x -> Objects.equals(x, n));
But there is no practical advantage to be expected for the stream-based solution in this case. Parallelism might bring an advantage for really large lists, but one should not casually throw in some parallel() here and there assuming that it may make things faster. First, you should clearly identify the actual bottlenecks.
And in terms of readability, I'd prefer the first, classical solution here. If you want to check whether the list of names.contains(aParticularValue), you should do this - it just reads like prose and makes the intent clear.
EDIT
Another advantage of the contains approach was mentioned in the comments and in the other answer, and that may be worth mentioning here: If the type of the names collection is later changed, for example, to be a HashSet, then you'll get the faster contains-check (with O(1) instead of O(n)) for free - without changing any other part of the code. The stream-based solution would then still have to iterate over all elements, and this could have a significantly lower performance.
They should provide the same result if hashCode() and equals() are written in reasonable way.
But the performance may be completely different. For Lists it wouldn't matter that much but for HashSet contains() will use hashCode() to locate the element and it will be done (most probably) in constant time. While with the second solution it will loop over all items and call a function so will be done in linear time.
If n is null, actually doesn't matter as usually equals() methods are aware of null arguments.
The JavaDoc of ConcurrentHashMap says this:
Like Hashtable but unlike HashMap, this class does not allow null to be used as a key or value.
My question: Why?
2nd question: Why doesn't Hashtable allow null?
I've used a lot of HashMaps for storing data. But when changing to ConcurrentHashMap I got several times into trouble because of NullPointerExceptions.
From the author of ConcurrentHashMap himself (Doug Lea):
The main reason that nulls aren't allowed in ConcurrentMaps
(ConcurrentHashMaps, ConcurrentSkipListMaps) is that ambiguities that
may be just barely tolerable in non-concurrent maps can't be
accommodated. The main one is that if map.get(key) returns null, you
can't detect whether the key explicitly maps to null vs the key isn't
mapped. In a non-concurrent map, you can check this via
map.contains(key), but in a concurrent one, the map might have changed
between calls.
I believe it is, at least in part, to allow you to combine containsKey and get into a single call. If the map can hold nulls, there is no way to tell if get is returning a null because there was no key for that value, or just because the value was null.
Why is that a problem? Because there is no safe way to do that yourself. Take the following code:
if (m.containsKey(k)) {
return m.get(k);
} else {
throw new KeyNotPresentException();
}
Since m is a concurrent map, key k may be deleted between the containsKey and get calls, causing this snippet to return a null that was never in the table, rather than the desired KeyNotPresentException.
Normally you would solve that by synchronizing, but with a concurrent map that of course won't work. Hence the signature for get had to change, and the only way to do that in a backwards-compatible way was to prevent the user inserting null values in the first place, and continue using that as a placeholder for "key not found".
Josh Bloch designed HashMap; Doug Lea designed ConcurrentHashMap. I hope that isn't libelous. Actually I think the problem is that nulls often require wrapping so that the real null can stand for uninitialized. If client code requires nulls then it can pay the (admittedly small) cost of wrapping nulls itself.
You can't synchronize on a null.
Edit: This isn't exactly why in this case. I initially thought there was something fancy going on with locking things against concurrent updates or otherwise using the Object monitor to detect if something was modified, but upon examining the source code it appears I was wrong - they lock using a "segment" based on a bitmask of the hash.
In that case, I suspect they did it to copy Hashtable, and I suspect Hashtable did it because in the relational database world, null != null, so using a null as a key has no meaning.
I guess that the following snippet of the API documentation gives a good hint:
"This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details."
They probably just wanted to make ConcurrentHashMap fully compatible/interchangeable to Hashtable. And as Hashtable does not allow null keys and values..
ConcurrentHashMap is thread-safe. I believe that not allowing null keys and values was a part of making sure that it is thread-safe.
I don't think disallowing null value is a correct option.
In many cases, we do want do put a key with null value into the con-current map. However, by using ConcurrentHashMap, we cannot do that.
I suggest that the coming version of JDK can support that.