Is checking for key existence in HashMap always necessary?
I have a HashMap with, say, 1,000 entries, and I am looking at improving its efficiency.
If the HashMap is being accessed very frequently, then checking for key existence on every access will lead to a large overhead. Instead, if the key is not present and an exception therefore occurs, I can catch it (when I know this will happen rarely). This would cut the number of accesses to the HashMap in half.
This might not be a good programming practice, but it will help me reduce the number of accesses. Or am I missing something here?
[Update] I do not have null values in the HashMap.
Do you ever store a null value? If not, you can just do:
Foo value = map.get(key);
if (value != null) {
    ...
} else {
    // No such key
}
Otherwise, you could just check for existence if you get a null value returned:
Foo value = map.get(key);
if (value != null) {
    ...
} else {
    // Key might be present...
    if (map.containsKey(key)) {
        // Okay, there's a key but the value is null
    } else {
        // Definitely no such key
    }
}
You won't gain anything by checking that the key exists. This is the code of HashMap:
@Override
public boolean containsKey(Object key) {
    Entry<K, V> m = getEntry(key);
    return m != null;
}

@Override
public V get(Object key) {
    Entry<K, V> m = getEntry(key);
    if (m != null) {
        return m.value;
    }
    return null;
}
Just check whether the return value of get() is null. As the source above shows, containsKey() and get() both call getEntry(), so calling containsKey() before get() simply performs the same lookup twice.
A better way is to use the containsKey method of HashMap. Tomorrow somebody will add a null value to the map, and you should be able to differentiate between a key that is present with a null value and a key that is absent.
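For example, a quick sketch of the distinction once null values show up:

Map<String, String> map = new HashMap<>();
map.put("present", null);

map.get("present");          // null
map.get("absent");           // also null, indistinguishable by get() alone
map.containsKey("present");  // true
map.containsKey("absent");   // false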
Do you mean that you've got code like
if (map.containsKey(key)) doSomethingWith(map.get(key))
all over the place? Then you should simply check whether map.get(key) returned null, and that's it.
By the way, HashMap doesn't throw exceptions for missing keys, it returns null instead. The only case where containsKey is needed is when you're storing null values, to distinguish between a null value and a missing value, but this is usually considered bad practice.
Just use containsKey() for clarity. It's fast and keeps the code clean and readable. The whole point of a HashMap is that key lookup is fast; just make sure hashCode() and equals() are properly implemented.
if (map.get(key) != null || (map.get(key) == null && map.containsKey(key))) // equivalent to map.get(key) != null || map.containsKey(key), by short-circuit evaluation
You can also use the computeIfAbsent() method in the HashMap class.
In the following example, map stores a list of transactions (integers) that are applied to the key (the name of the bank account). To add 2 transactions of 100 and 200 to checking_account you can write:
HashMap<String, ArrayList<Integer>> map = new HashMap<>();
// add() returns boolean, so the calls cannot be chained on computeIfAbsent()'s result
ArrayList<Integer> transactions = map.computeIfAbsent("checking_account", key -> new ArrayList<>());
transactions.add(100);
transactions.add(200);
This way you don't have to check whether the key checking_account exists or not.
If it does not exist, the lambda expression creates a new list, which computeIfAbsent() stores and returns.
If it exists, computeIfAbsent() returns the existing value for the key.
Really elegant! 👍
I usually use the idiom
Object value = map.get(key);
if (value == null) {
    value = createValue(key);
    map.put(key, value);
}

This means you only hit the map twice if the key is missing.
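Since Java 8, the same idiom can be collapsed into a single lookup, assuming createValue is a pure factory method as above:

Object value = map.computeIfAbsent(key, k -> createValue(k));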
If the key class is your own, make sure the hashCode() and equals() methods are implemented.
Access to a HashMap should basically be O(1), but a bad hashCode() implementation degrades it towards O(n), because values whose keys share the same hash are stored in a linked list (or, since Java 8, a balanced tree once a bucket grows large enough).
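As a sketch of how that degradation happens, consider a hypothetical key class whose hashCode() sends every instance to the same bucket:

final class BadKey {
    private final int id;

    BadKey(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return 42; // every key collides, so all entries pile into one bucket
    }
}

Every get() on a HashMap<BadKey, V> must now walk that single bucket and call equals() on each entry, so lookups grow linearly with the map's size.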
Jon Skeet's answer addresses the two scenarios (a map with null values and a map without them) in an efficient way.
About the number of entries and the efficiency concern, I would like to add something.
I have a HashMap with, say, 1,000 entries, and I am looking at improving its efficiency. If the HashMap is being accessed very frequently, then checking for key existence on every access will lead to a large overhead.
A map with 1,000 entries is not a huge map, and neither is one with 5,000 or 10,000 entries.
Maps are designed for fast retrieval at such sizes.
Now, that assumes that the hashCode() of the map keys provides a good distribution.
If you can use an Integer as the key type, do it.
Its hashCode() method is very efficient, since collisions are impossible for distinct int values:
public final class Integer extends Number implements Comparable<Integer> {
    ...
    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }

    public static int hashCode(int value) {
        return value;
    }
    ...
}
If for the key you have to use another built-in type, such as String (frequently used as a map key), you may have some collisions; but with a thousand to a few thousand objects in the map, you should have very few of them, as String.hashCode() provides a good distribution.
If you use a custom type, override hashCode() and equals() correctly and ensure overall that hashCode() provides a fair distribution.
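A minimal sketch of such a custom key type (Point and its fields are invented for illustration):

import java.util.Objects;

final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // combines both fields for a fair distribution
    }
}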
You may refer to Item 9 of Effective Java ("Always override hashCode when you override equals"), which details the way.
Since Java 8, you can simply use:

Foo item = mapObject.getOrDefault(key, null);
if (item != null) {
    ...
}

Note that getOrDefault(key, null) behaves exactly like a plain get(key); the method only pays off when you supply a non-null default.
Related
So this is the default algorithm that generates the hashcode for Strings:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
However, I want to use something different and much simpler, like summing the ASCII values of each character.
How do I make it so that it uses the algorithm I created, instead of using the default one when I use the put() method for hashtables?
As of now I don't know what to do other than implementing a hash table from scratch.
Create a new class with a String field in it. For example:

import java.util.Objects;

public class MyString {

    private final String value;

    public MyString(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        MyString myString = (MyString) o;
        return Objects.equals(value, myString.value);
    }

    @Override
    public int hashCode() {
        // use your own implementation
        return value.codePoints().sum();
    }
}
Add equals() and hashCode() methods with the @Override annotation.
Note: this hashCode() sums the string's code points; for ASCII strings that is exactly the sum of their ASCII values.
After that, you will be able to use the new class's objects in the desired data structure. Here you can find a detailed explanation of these methods and the contract between equals() and hashCode().
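Usage might then look like this (a small sketch; the entries are just for illustration):

Map<MyString, Integer> counts = new HashMap<>();
counts.put(new MyString("abc"), 1);
counts.put(new MyString("cba"), 2); // same code-point sum as "abc": the keys collide, but equals() keeps them distinct
System.out.println(counts.size()); // prints 2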
However, I want to use something different and much simpler, like summing the ASCII values of each character.
This is an extremely bad idea if you care at all about hash table efficiency. What you're thinking of as an overly-complicated hashing function is actually designed to give a uniform distribution of hash values throughout the entire 32-bit (or whatever) range. That gives the best possibility of uniformly distributing the hash keys (after you mod by the hash table size) in your buckets.
Your simple method of adding up the ASCII values of the individual characters has multiple flaws. First, you're limited in the range of values you can reasonably expect to generate. The highest value you can create is 255*n, where n is the length of the key. If your key is 10 characters in length, then you can't possibly generate more than 2,550 unique hash values. But there are 255^10 possible 10-character strings. Your collision rate will be very high.
The second problem is that anagrams generate the same hash value. "stop," "spot," and "tops" all generate the same hash value and will hash to the same bucket. Again, this will greatly affect your collision rate.
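You can verify the anagram problem in a couple of lines (a sketch of the proposed summing scheme):

static int asciiSum(String s) {
    int sum = 0;
    for (char c : s.toCharArray()) sum += c;
    return sum;
}

// asciiSum("stop") == asciiSum("spot") == asciiSum("tops") == 454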
It's unclear to me why you want to replace the hashing function. If you're thinking it will result in better performance, you should think again. Sure, it will make generating the hash value faster, but it will result in very skewed key distribution, and correspondingly terrible hash table performance.
I've created a loop which goes through my HashMap, and in it I check whether the name of the current key (A) is equal to the key that might be added (B). The hash codes of key A and key B aren't necessarily equal when their names are. Therefore I check whether they are equal by transforming them into strings (with an override of .equals()). The code is working, but there must be a cleaner and easier way to do this.
This is my current code:
for (HashMap.Entry<Identifier, SetInterface<BigInteger>> entry : idAndSet.entrySet()) {
    if (entry.getKey().isEqual(identifier)) {
        factor = entry.getValue();
        return factor;
    }
}
The hash codes of key A and key B aren't necessarily equal when their names are.
That is not a good idea. Any class that acts as a key should override equals and hashCode. And it would be a good idea to make the class immutable as well (otherwise you could end up with some difficult debugging to do).
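A minimal sketch of what that key class could look like, assuming Identifier wraps the name it is compared by:

public final class Identifier {
    private final String name;

    public Identifier(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Identifier)) return false;
        return name.equals(((Identifier) o).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}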
Once you do that, you can just do
Map<Identifier, Object> map = ...;
Object value = map.get(id);

// or, as of Java 8+
Object value = map.getOrDefault(id, someDefaultValue);
You could misuse the Map.computeIfPresent method.
factor = map.computeIfPresent(identifier, (k,v) -> v);
return factor;
The method returns the value associated with the specified key, or null if there is none.
I think I may have found a bug in Java.
I have a TreeMap in which I use a custom comparator. However, it seems that when I put(key, value) with a key that already exists, it does not replace the existing entry, thus creating duplicate keys. I think I have verified this because I tried:
System.out.println(testMap.firstKey().equals(testMap.lastKey()));
And this prints out true. Anyone know why this is happening?
This is the comparator code:
private class TestComp implements Comparator<String> {

    @Override
    public int compare(String s1, String s2) {
        if (s1.equals(s2)) {
            return 0;
        }
        int temp = otherMap.get(s1).compareTo(otherMap.get(s2));
        if (temp > 0) {
            return 1;
        }
        return -1;
    }
}
A comparator always needs to return consistent results, and when used in a TreeMap, be consistent with equals.
In this case your comparator violates the first constraint since it does not necessarily give consistent results.
Example: if otherMap maps
"a" -> "someString"
"b" -> "someString"
then both compare("a", "b") and compare("b", "a") will return -1.
Note that if you change the implementation to
if (s1.equals(s2)) {
    return 0;
}
return otherMap.get(s1).compareTo(otherMap.get(s2));
you break the other criterion, consistency with equals, since otherMap.get(s1).compareTo(otherMap.get(s2)) might return 0 even though s1 does not equal s2.
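One way to satisfy both constraints (a sketch, assuming the String keys' natural order is an acceptable tie-break) is to compare the mapped values first and fall back to the keys themselves:

private class TestComp implements Comparator<String> {
    @Override
    public int compare(String s1, String s2) {
        int byValue = otherMap.get(s1).compareTo(otherMap.get(s2));
        // Tie-break on the keys so compare() returns 0 exactly when the keys are equal
        return byValue != 0 ? byValue : s1.compareTo(s2);
    }
}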
I've elaborated on this in a self-answered follow-up question here.
From the comments:
Even if a comparator gives inconsistent results, shouldn't the Java language still not allow duplicate keys?
No, when you insert a key, the TreeMap will use the comparator to search the data structure to see if the key already exists. If the comparator gives inconsistent results, the TreeMap might look in the wrong place and conclude that the key does not exist, leading to undefined behavior.
The HashSet class has an add(Object o) method, which is not inherited from another class. The Javadoc for that method says the following:
Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.
In other words, if two objects are equal, then the second object will not be added and the HashSet will remain the same. However, I've discovered that this is not true if objects e and e2 have different hashcodes, despite the fact that e.equals(e2). Here is a simple example:
import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;
public class BadHashCodeClass {

    /**
     * A hashCode that returns a random integer, so it is unlikely to be the same twice
     */
    @Override
    public int hashCode() {
        return new Random().nextInt();
    }

    /**
     * An equals method that always returns true
     */
    @Override
    public boolean equals(Object o) {
        return true;
    }

    public static void main(String... args) {
        HashSet<BadHashCodeClass> hashSet = new HashSet<>();
        BadHashCodeClass instance = new BadHashCodeClass();
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Elements in hashSet: " + hashSet.size());
        Iterator<BadHashCodeClass> iterator = hashSet.iterator();
        BadHashCodeClass e = iterator.next();
        BadHashCodeClass e2 = iterator.next();
        System.out.println("Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): "
                + (e == null ? e2 == null : e.equals(e2)));
    }
}
The results from the main method are:
Instance was added: true
Instance was added: true
Elements in hashSet: 2
Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): true
As the example above clearly shows, HashSet was able to add two elements where e.equals(e2).
I'm going to assume that this is not a bug in Java and that there is in fact some perfectly rational explanation for why this is. But I can't figure out what exactly. What am I missing?
I think what you're really trying to ask is:
"Why does a HashSet add objects with inequal hash codes even if they claim to be equal?"
The distinction between my question and the question you posted is that you're assuming this behavior is a bug, and therefore you're getting grief for coming at it from that perspective. I think the other posters have done a thoroughly sufficient job of explaining why this is not a bug, however they have not addressed the underlying question.
I will try to do so here; I would suggest rephrasing your question to remove the accusations of poor documentation / bugs in Java so you can more directly explore why you're running into the behavior you're seeing.
The equals() documentations states (emphasis added):
Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
The contract between equals() and hashCode() isn't just an annoying quirk in the Java specification. It provides some very valuable benefits in terms of algorithm optimization. By being able to assume that a.equals(b) implies a.hashCode() == b.hashCode() we can do some basic equivalence tests without needing to call equals() directly. In particular, the invariant above can be turned around - a.hashCode() != b.hashCode() implies a.equals(b) will be false.
If you look at the code for HashMap (which HashSet uses internally), you'll notice an inner static class Entry, defined like so:
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;
    ...
}
HashMap stores the key's hash code along with the key and value. Because a hash code is expected to not change over the time a key is stored in the map (see Map's documentation, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.") it is safe for HashMap to cache this value. By doing so, it only needs to call hashCode() once for each key in the map, as opposed to every time the key is inspected.
Now let's look at the implementation of put(), where we see these cached hashes being taken advantage of, along with the invariant above:

public V put(K key, V value) {
    ...
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            // Replace existing element and return
        }
    }
    // Insert new element
}
In particular, notice that the conditional only ever calls key.equals(k) if the hash codes are equal and the key isn't the exact same object, due to short-circuit evaluation. By the contract of these methods, it should be safe for HashMap to skip this call. If your objects are incorrectly implemented, these assumptions being made by HashMap are no longer true, and you will get back unusable results, including "duplicates" in your set.
Note that your claim "HashSet ... has an add(Object o) method, which is not inherited from another class" is not quite correct. While its parent class, AbstractSet, does not implement this method, the parent interface, Set, does specify the method's contract. The Set interface is not concerned with hashes, only equality, therefore it specifies the behavior of this method in terms of equality with (e==null ? e2==null : e.equals(e2)). As long as you follow the contracts, HashSet works as documented, but avoids actually doing wasteful work whenever possible. As soon as you break the rules however, HashSet cannot be expected to behave in any useful way.
Consider also that if you attempted to store objects in a TreeSet with an incorrectly implemented Comparator, you would similarly see nonsensical results. I documented some examples of how a TreeSet behaves when using an untrustworthy Comparator in another question: how to implement a comparator for StringBuffer class in Java for use in TreeSet?
You've violated the contract of equals/hashCode basically:
From the hashCode() docs:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
and from equals:
Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
HashSet relies on equals and hashCode being implemented consistently - the Hash part of the name HashSet basically implies "This class uses hashCode for efficiency purposes." If the two methods are not implemented consistently, all bets are off.
This shouldn't happen in real code, because you shouldn't be violating the contract in real code...
@Override
public int hashCode() {
    return new Random().nextInt();
}
You are returning different hash codes for the same object every time it is evaluated, so obviously you will get wrong results. The add() method is as follows:
public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}
and put() is
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
Notice that the hash is calculated first, and in your case it differs on every call, which is why the object is added again. equals() comes into the picture only if two objects have the same hash, i.e. a collision has occurred. Since the hashes are different here, equals() is never executed:

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))

This is short-circuit evaluation: since e.hash == hash is false, nothing else is evaluated.
I hope this helps.
Because hashCode() is implemented so badly here, each add() probes a different, effectively random bucket. If you returned a constant value from hashCode() instead, the set wouldn't let you add any second element, since the always-true equals() would report it as a duplicate.
It is not required that hash codes be different for all elements! It is only required that equal elements have equal hash codes.
The hash code is used first, to find the hash bucket the object should occupy. If the hash codes are different, the objects are assumed to be unequal. If the hash codes are equal, then the equals() method is used to determine equality. The use of hashCode is an efficiency mechanism.
And...
Your hashCode() implementation violates the contract that the hash code should not change unless the object's identifying fields change.
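If you really wanted an equals() that always returns true, the only hashCode() consistent with it is a constant, for example:

@Override
public int hashCode() {
    return 0; // all instances are "equal", so all must share one hash code
}

With that change, the second hashSet.add(instance) in the example above would return false.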
I want to compare two Java Maps by a simple hash.
Each object is on a different computer, so sending a hash over the network will be cheaper than sending the whole object to compare.
For example, I have two HashMaps of an ExampleClass:
Map<String, ExampleClass> one = new ...;
Map<String, ExampleClass> other = new ...;
I don't need to be sure that all elements are equal,
it's enough for me to trust in a hash.
I was about to iterate on each side, create a "homemade hash", and send it over the network to finally compare, for example, a single int. It would be great if this "hash" were updated every time an object is added to or deleted from the collection, saving me from iterating over the whole map; I would have to encapsulate every add/delete on the Map. Is there a Java library that does this?
If all your classes implement hashCode() (rather than inheriting the "default" memory-address-based implementation), you can use the map's own hashCode().
The caveat here is that if your ExampleClass does not implement hashCode(), then equal items might have different hashes on the two different machines, which will result in different hashes for the maps.
To clarify:
Map implements a hashCode() that is defined as the sum of its Map.Entry hashCode()s.
Map.Entry's hashCode() is defined to be the xor of the key's hashCode() and the value's hashCode().
Your keys are Strings -- they have a well defined hashCode() (two equal strings always have the same hashCode()).
Your values are ExampleClass instances -- they also need a well-defined hashCode().
In summary, a map that contains { s1 -> ec1, s2 -> ec2 } will have a hashcode equal to:
(s1.hashCode() ^ ec1.hashCode()) + (s2.hashCode() ^ ec2.hashCode())
meaning that it depends on ExampleClass's hashCode().
If ExampleClass implements hashCode() in a way that equal instances produce equal hashCode()s, everything will work well.
If it does not, it will use Object's hashCode(), which will almost always give you different hash codes for equivalent objects on different machines.
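A sketch of a content-based hashCode() for ExampleClass (the name and amount fields are invented for illustration):

import java.util.Objects;

public final class ExampleClass {
    private final String name;
    private final int amount;

    public ExampleClass(String name, int amount) {
        this.name = name;
        this.amount = amount;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ExampleClass)) return false;
        ExampleClass e = (ExampleClass) o;
        return amount == e.amount && Objects.equals(name, e.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, amount); // depends only on content, so both machines agree
    }
}

With that in place, comparing one.hashCode() to other.hashCode() across the network becomes a meaningful check (modulo hash collisions).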
A simple solution is just to xor the hash of every object in the map, or some simple derivation thereof. Because a ^ a = 0 and a ^ b ^ a = b for all a and b, (xor is commutative, associative, and its own inverse), and since xor is cheap, your add and remove can just xor the (possibly derived) hash code of the added or deleted item.
You may want to use a derived hash value to avoid cases where your map has all the same keys and values, but some of the mappings between them are transposed. A simple derived hash might be key.hashCode() - value.hashCode(), which would avoid most of these cases.
So, your code might look like:
public class MyMap<K, V> extends HashMap<K, V> {

    private int hash = 0;

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public V put(K key, V value) {
        V old = super.put(key, value);
        // xor out the replaced pair's contribution before xoring in the new one
        if (old != null) this.hash ^= key.hashCode() - old.hashCode();
        this.hash ^= key.hashCode() - value.hashCode();
        return old;
    }

    @Override
    public V remove(Object key) { // Map.remove takes Object, not K
        V ret = super.remove(key);
        if (ret != null) this.hash ^= key.hashCode() - ret.hashCode();
        return ret;
    }
}
Note that some of the more advanced methods (e.g. adding multiple items from a collection) may or may not be safe depending on the implementation.
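For instance, depending on the JDK version, HashMap's putAll() may not route through put(), and clear() would leave the running hash stale. A sketch of two additional overrides inside MyMap:

@Override
public void putAll(Map<? extends K, ? extends V> m) {
    // Route every entry through our put() so the running hash stays correct
    for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
        put(e.getKey(), e.getValue());
    }
}

@Override
public void clear() {
    super.clear();
    hash = 0; // empty map, empty hash
}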