I have this test code:
import java.util.*;
class MapEQ {
public static void main(String[] args) {
Map<ToDos, String> m = new HashMap<ToDos, String>();
ToDos t1 = new ToDos("Monday");
ToDos t2 = new ToDos("Monday");
ToDos t3 = new ToDos("Tuesday");
m.put(t1, "doLaundry");
m.put(t2, "payBills");
m.put(t3, "cleanAttic");
System.out.println(m.size());
} }
class ToDos{
String day;
ToDos(String d) { day = d; }
public boolean equals(Object o) {
return ((ToDos)o).day == this.day;
}
// public int hashCode() { return 9; }
}
When // public int hashCode() { return 9; } is uncommented m.size() returns 2, when it's left commented it returns three. Why?
HashMap uses hashCode(), == and equals() for entry lookup. The lookup sequence for a given key k is as follows:
Use k.hashCode() to determine which bucket the entry would be stored in, if any
If that bucket exists, for each entry's key k1 in it, if k == k1 || k.equals(k1), then return k1's entry
In any other case, there is no corresponding entry
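The lookup sequence above can be sketched as a toy map. This is only an illustration of the order of the checks, not the real java.util.HashMap source: the TinyMap and Entry names and the fixed count of 16 buckets are assumptions made for the example, and null keys are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Toy hash map showing the lookup order: hashCode() first, then == / equals().
// TinyMap, Entry and the fixed bucket count are illustrative assumptions.
class TinyMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry<K, V>>> buckets = new ArrayList<>();

    TinyMap() {
        for (int i = 0; i < 16; i++) buckets.add(new ArrayList<>());
    }

    // Step 1: the hash code alone decides which bucket is searched.
    private List<Entry<K, V>> bucketFor(K key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    // Steps 2 and 3: scan that bucket with == and equals(); otherwise no entry.
    private Entry<K, V> find(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key == key || key.equals(e.key)) return e;
        }
        return null;
    }

    void put(K key, V value) {
        Entry<K, V> e = find(key);
        if (e != null) e.value = value;                     // same logical key: replace value
        else bucketFor(key).add(new Entry<>(key, value));   // new logical key: add entry
    }

    V get(K key) {
        Entry<K, V> e = find(key);
        return e == null ? null : e.value;
    }

    public static void main(String[] args) {
        TinyMap<String, Integer> map = new TinyMap<>();
        map.put("Monday", 1);
        map.put(new String("Monday"), 2);        // different instance, equal content
        System.out.println(map.get("Monday"));   // 2
        System.out.println(map.get("Tuesday"));  // null
    }
}
```

If hashCode() sends two equal keys to different buckets, find() never gets the chance to compare them, which is the situation the scenarios below walk through.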
To demonstrate using an example, assume that we want to create a HashMap where keys are something which is 'logically equivalent' if they have same integer value, represented by AmbiguousInteger class. We then construct a HashMap, put in one entry, then attempt to override its value and retrieve value by key.
class AmbiguousInteger {
private final int value;
AmbiguousInteger(int value) {
this.value = value;
}
}
HashMap<AmbiguousInteger, Integer> map = new HashMap<>();
// logically equivalent keys
AmbiguousInteger key1 = new AmbiguousInteger(1),
key2 = new AmbiguousInteger(1),
key3 = new AmbiguousInteger(1);
map.put(key1, 1); // put in value for entry '1'
map.put(key2, 2); // attempt to override value for entry '1'
System.out.println(map.get(key1));
System.out.println(map.get(key2));
System.out.println(map.get(key3));
Expected: 2, 2, 2
Don't override hashCode() and equals(): by default Java generates different hashCode() values for different objects, so HashMap uses these values to map key1 and key2 into different buckets. key3 has no corresponding bucket so it has no value.
class AmbiguousInteger {
private final int value;
AmbiguousInteger(int value) {
this.value = value;
}
}
map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 2, set as entry 2[1]
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 2, get as entry 2[1]
map.get(key3); // map to no bucket
Expected: 2, 2, 2
Output: 1, 2, null
Override hashCode() only: HashMap maps key1 and key2 into the same bucket, but they remain different entries because both the key1 == key2 and key1.equals(key2) checks fail: by default equals() uses the == check, and they refer to different instances. key3 fails both the == and equals() checks against key1 and key2 and thus has no corresponding value.
class AmbiguousInteger {
private final int value;
AmbiguousInteger(int value) {
this.value = value;
}
@Override
public int hashCode() {
return value;
}
}
map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 1, set as entry 1[2]
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 1, get as entry 1[2]
map.get(key3); // map to bucket 1, no corresponding entry
Expected: 2, 2, 2
Output: 1, 2, null
Override equals() only: HashMap maps all keys into different buckets because of default different hashCode(). == or equals() check is irrelevant here as HashMap never reaches the point where it needs to use them.
class AmbiguousInteger {
private final int value;
AmbiguousInteger(int value) {
this.value = value;
}
@Override
public boolean equals(Object obj) {
return obj instanceof AmbiguousInteger && value == ((AmbiguousInteger) obj).value;
}
}
map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 2, set as entry 2[1]
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 2, get as entry 2[1]
map.get(key3); // map to no bucket
Expected: 2, 2, 2
Actual: 1, 2, null
Override both hashCode() and equals(): HashMap maps key1, key2 and key3 into the same bucket. == checks fail when comparing different instances, but equals() checks pass as they all have the same value, and deemed 'logically equivalent' by our logic.
class AmbiguousInteger {
private final int value;
AmbiguousInteger(int value) {
this.value = value;
}
@Override
public int hashCode() {
return value;
}
@Override
public boolean equals(Object obj) {
return obj instanceof AmbiguousInteger && value == ((AmbiguousInteger) obj).value;
}
}
map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 1, set as entry 1[1], override value
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 1, get as entry 1[1]
map.get(key3); // map to bucket 1, get as entry 1[1]
Expected: 2, 2, 2
Actual: 2, 2, 2
What if hashCode() is random?: HashMap will assign a different bucket for each operation, and thus you never find the same entry that you put in earlier.
class AmbiguousInteger {
private static int staticInt;
private final int value;
AmbiguousInteger(int value) {
this.value = value;
}
@Override
public int hashCode() {
return ++staticInt; // every subsequent call gets different value
}
@Override
public boolean equals(Object obj) {
return obj instanceof AmbiguousInteger && value == ((AmbiguousInteger) obj).value;
}
}
map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 2, set as entry 2[1]
map.get(key1); // map to no bucket, no corresponding value
map.get(key2); // map to no bucket, no corresponding value
map.get(key3); // map to no bucket, no corresponding value
Expected: 2, 2, 2
Actual: null, null, null
What if hashCode() is always the same?: HashMap maps all keys into one big bucket. In this case, your code is functionally correct, but the use of HashMap is practically redundant, as any retrieval would need to iterate through all entries in that single bucket in O(N) time (or O(log N) in Java 8 and later, where large buckets are converted to trees, provided the keys can be ordered), equivalent to the use of a List.
class AmbiguousInteger {
private final int value;
AmbiguousInteger(int value) {
this.value = value;
}
@Override
public int hashCode() {
return 0;
}
@Override
public boolean equals(Object obj) {
return obj instanceof AmbiguousInteger && value == ((AmbiguousInteger) obj).value;
}
}
map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 1, set as entry 1[1]
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 1, get as entry 1[1]
map.get(key3); // map to bucket 1, get as entry 1[1]
Expected: 2, 2, 2
Actual: 2, 2, 2
And what if equals is always false?: the == check passes when we compare the same instance with itself, but fails otherwise. The equals check always fails, so key1, key2 and key3 are deemed 'logically different' and map to different entries, though they are still in the same bucket due to the same hashCode().
class AmbiguousInteger {
private final int value;
AmbiguousInteger(int value) {
this.value = value;
}
@Override
public int hashCode() {
return 0;
}
@Override
public boolean equals(Object obj) {
return false;
}
}
map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 1, set as entry 1[2]
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 1, get as entry 1[2]
map.get(key3); // map to bucket 1, no corresponding entry
Expected: 2, 2, 2
Actual: 1, 2, null
Okay what if equals is always true now?: you're basically saying that all objects are deemed 'logically equivalent' to another, so they all map to the same bucket (due to same hashCode()), same entry.
class AmbiguousInteger {
private final int value;
AmbiguousInteger(int value) {
this.value = value;
}
@Override
public int hashCode() {
return 0;
}
@Override
public boolean equals(Object obj) {
return true;
}
}
map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 1, set as entry 1[1], override value
map.put(new AmbiguousInteger(100), 100); // map to bucket 1, set as entry 1[1], override value
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 1, get as entry 1[1]
map.get(key3); // map to bucket 1, get as entry 1[1]
Expected: 2, 2, 2
Actual: 100, 100, 100
You have overridden equals without overriding hashCode. You must ensure that for all cases where equals returns true for two objects, hashCode returns the same value. The hash code is a code that must be equal if two objects are equal (the converse need not be true). When you put your hard-coded value of 9 in, you satisfy the contract again.
In your hash map, equality is only tested within a hash bucket. Your two Monday objects should be equal, but because they are returning different hash codes, the equals method isn't even called to determine their equality - they are put straight into different buckets, and the possibility that they are equal isn't even considered.
I cannot emphasize enough that you should read Chapter 3 in Effective Java (warning: pdf link). In that chapter you will learn everything you need to know about overriding methods in Object, and in particular, about the equals contract. Josh Bloch has a great recipe for overriding the equals method that you should follow. And it will help you understand why you should be using equals and not == in your particular implementation of the equals method.
Hope this helps. PLEASE READ IT. (At least the first couple items... and then you will want to read the rest :-).
-Tom
When you don't override the hashCode() method, your ToDos class inherits the default hashCode() method from Object, which typically gives every object a distinct hash code. This means that t1 and t2 have two different hash codes, even though were you to compare them, they would be equal. Depending on the particular hashmap implementation, the map is free to store them separately (and this is in fact what happens).
When you do correctly override the hashCode() method to ensure that equal objects get equal hash codes, the hashmap is able to find the two equal objects and place them in the same hash bucket.
A better implementation would give objects that are not equal different hash codes, like this:
public int hashCode() {
return (day != null) ? day.hashCode() : 0;
}
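Putting the pieces together, a version of the question's class with a consistent equals()/hashCode() pair might look like the sketch below. It is one possible variant, not the only correct one: the ToDosFixedDemo wrapper and the sizeAfterPuts helper are names invented for this example, and equals() compares content with Objects.equals rather than ==.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class ToDosFixedDemo {
    // Variant of the question's ToDos class with a consistent equals()/hashCode() pair.
    static final class ToDos {
        final String day;
        ToDos(String d) { day = d; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ToDos)) return false;
            return Objects.equals(day, ((ToDos) o).day); // content comparison, null-safe
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(day); // same field as equals(); 0 when day is null
        }
    }

    // Hypothetical helper: repeats the question's three puts and reports the map size.
    static int sizeAfterPuts() {
        Map<ToDos, String> m = new HashMap<>();
        m.put(new ToDos("Monday"), "doLaundry");
        m.put(new ToDos(new String("Monday")), "payBills"); // equal content, distinct String
        m.put(new ToDos("Tuesday"), "cleanAttic");
        return m.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterPuts()); // 2
    }
}
```

Because hashCode() is derived from the same field that equals() compares, equal keys always land in the same bucket, while unequal days usually spread across different ones.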
When you comment it out, it returns 3,
because only the hashCode() inherited from Object is called, which returns 3 different hash codes for the 3 ToDos objects. Because the hash codes differ, the 3 objects are treated as distinct keys (typically landing in different buckets), and equals() is never called on them.
If the hashCodes are different it is understood in advance that the objects are unequal.
They will go in different buckets.
when you uncomment, it returns 2;
because here the overridden hashCode() is called, which returns the same value for all the ToDos, so they all have to go into one bucket, chained linearly.
Equal hash codes don't promise anything about the equality or inequality of objects.
hashCode() for t1 is 9 and, as it is the first entrant, no equals() call is needed; t1 is inserted into the bucket, say bucket0.
Then t2, also with hashCode() 9, is destined for the same bucket0. A subsequent equals() against the already residing t1 returns true by the definition of the overridden equals(), so t2 does not enter the map as a new key; only the value is replaced ("payBills" overwrites "doLaundry").
Now t3, with hashCode() 9, is also destined for bucket0, and equals() returns false when compared with the pre-existing t1, so t3 is added as a second entry.
So the net size of the map is 2 -> {ToDos@9=payBills, ToDos@9=cleanAttic}
This explains the importance of implementing both equals() and hashCode(), and in such a way that the fields taken up in determining equals() must also be taken when determining hashCode().
This will guarantee that if two objects are equal they will always have same hashCodes. hashCodes should not be perceived as pseudo-random numbers as they must be consistent with equals()
According to Effective Java,
Always override hashCode() when you override equals()
well, why? Simple, because different objects (content, not references) should get different hash codes; on the other hand, equal objects should get the same hash code.
Accordingly, Java's hash-based associative data structures use the results of hashCode() and equals() invocations to place keys in buckets. If the hash codes match and equals() returns true, the objects are treated as equal; otherwise they are not.
In the specific case (i.e. the one presented above), when hashCode() is commented out, each instance gets the default hash code inherited from Object (typically different for every instance). The equals() method compares the String references (remember the Java String pool), so equals() would return true for the two "Monday" keys but their hash codes differ; the result is 3 different objects stored.
Let's see what happens in case the hashCode() respecting the contract but returning always 9 is uncommented. Well, hashCode() is constantly the same, the equals() returns true for the two Strings in the Pool (i.e. "Monday"), and for them the bucket will be the same resulting in only 2 elements stored.
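The role of the String pool can be checked with a small experiment. The sketch below reuses the question's equals() (reference comparison with ==) and the constant hashCode(); the StringPoolDemo wrapper and the sizeWith helper are names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

class StringPoolDemo {
    // Same equals() as in the question: compares String references with ==.
    static final class ToDos {
        final String day;
        ToDos(String d) { day = d; }
        @Override public boolean equals(Object o) { return ((ToDos) o).day == this.day; }
        @Override public int hashCode() { return 9; }
    }

    // Hypothetical helper: puts two keys built from the given strings, returns the map size.
    static int sizeWith(String first, String second) {
        Map<ToDos, String> m = new HashMap<>();
        m.put(new ToDos(first), "doLaundry");
        m.put(new ToDos(second), "payBills");
        return m.size();
    }

    public static void main(String[] args) {
        // Two literals are interned, so both keys share one String instance.
        System.out.println(sizeWith("Monday", "Monday"));             // 1
        // new String(...) creates a separate instance, so == is false.
        System.out.println(sizeWith("Monday", new String("Monday"))); // 2
    }
}
```

The second call shows why comparing with == inside equals() is fragile: it only works while every "Monday" happens to be the same pooled instance.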
Therefore, it's definitely needed to be careful in using the hashCode() and equals() overriding, in particular when compound data types are user defined and they are used with Java associative data structures.
When hashCode is uncommented, HashMap sees t1 and t2 as being the same thing; thus, t2's value clobbers that of t1. To understand how this works, note that when hashCode returns the same thing for two instances, they end up going to the same HashMap bucket. When you try to insert a second thing into the same bucket (in this case t2 is being inserted when t1 is already present), HashMap scans the bucket for another key that is equal. In your case, t1 and t2 are equal because they have the same day. At that point, "payBills" clobbers "doLaundry". As for whether t2 clobbers t1 as the key, the Map contract only promises that the old value is replaced; in practice, java.util.HashMap keeps the original key object (t1).
There are a few important things to think about here:
Are two ToDos instances really equal just because they have the same day of the week?
Whenever you implement equals, you should implement hashCode so that any two objects that are equal also have the same hashCode values. This is a fundamental assumption that HashMap makes. This is probably also true of anything else that relies on the hashCode method.
Design your hashCode method so that the hash codes are evenly distributed; otherwise, you won't get the performance benefits of hashing. From this perspective, returning 9 is one of the worst things you can do.
Rather than thinking of hashCode in terms of hash-bucket mapping, I think it's more helpful to think somewhat more abstractly: an observation that two objects have different hash codes constitutes an observation that the objects are not equal. As a consequence of that, an observation that none of the objects in a collection have a particular hash code constitutes an observation that none of the objects in a collection are equal to any object which has that hash code. Further, an observation that none of the objects in a collection have a hash code with some trait constitutes an observation that none of them are equal to any object which does.
Hash tables generally work by defining a family of traits, exactly one of which will be applicable to each object's hash code (e.g. "being congruent to 0 mod 47", "being congruent to 1 mod 47", etc.), and then having a collection of objects with each trait. If one is then given an object and can determine which trait applies to it, one can know that it must be in a collection of things with that trait.
That hash tables generally use a sequence of numbered buckets is an implementation detail; what is essential is that an object's hash code is quickly used to identify many things which it cannot possibly be equal to, and with which it thus will not have to be compared.
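The "family of traits" view can be made concrete with a small sketch. It assumes the mod-47 family from the example above; the TraitTable class and its comparisons counter are invented here purely to show how many candidates a lookup skips.

```java
import java.util.ArrayList;
import java.util.List;

// Groups strings by the trait "hash code is congruent to r mod 47".
// TraitTable and its comparisons counter are illustrative assumptions.
class TraitTable {
    static final int TRAITS = 47;
    private final List<List<String>> groups = new ArrayList<>();
    int comparisons = 0; // counts equals() calls made by contains()

    TraitTable() {
        for (int i = 0; i < TRAITS; i++) groups.add(new ArrayList<>());
    }

    static int traitOf(Object o) {
        return Math.floorMod(o.hashCode(), TRAITS);
    }

    void add(String s) {
        groups.get(traitOf(s)).add(s);
    }

    // Only objects sharing the probe's trait can possibly be equal to it,
    // so every other group is skipped without a single comparison.
    boolean contains(String probe) {
        for (String candidate : groups.get(traitOf(probe))) {
            comparisons++;
            if (candidate.equals(probe)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TraitTable table = new TraitTable();
        for (int i = 0; i < 1000; i++) table.add("item" + i);
        System.out.println(table.contains("item500"));
        System.out.println(table.comparisons + " equals() calls for 1000 stored items");
    }
}
```

Nothing here depends on the groups being numbered buckets in an array; any structure that lets the lookup jump to "things with this trait" would serve the same purpose.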
Whenever you create a new object in Java, the JVM itself assigns it a default hash code, which is typically (though not guaranteed to be) different for every object. If you don't override the hashCode method, each object gets its own hash code and hence, usually, its own bucket (imagine a bucket as nothing but a place in memory where the map will go to find an object).
(You can check this by calling the hashCode method on each object and printing the values to the console.)
In your case, when you uncomment the hashCode method, the HashMap first looks for the bucket matching the hash code that the method returns, and every time you return the same hash code. When the HashMap finds that bucket, it compares the current object with the objects residing in the bucket using the equals method. Here it finds "Monday", so the HashMap implementation does not add it again, because there is already an object with the same hash code that is equal according to equals.
When you comment out the hashCode method, the JVM simply returns a different hash code for each of the three objects, and hence the map never even bothers comparing the objects using the equals method. So there will be three different objects in the Map.
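The console check suggested above could look like the following sketch. The Plain class is a made-up stand-in for ToDos without any overrides; the printed hash values differ from run to run, so no particular numbers are shown.

```java
class DefaultHashDemo {
    // Hypothetical class that overrides neither equals() nor hashCode().
    static final class Plain {
        final String day;
        Plain(String d) { day = d; }
    }

    public static void main(String[] args) {
        Plain a = new Plain("Monday");
        Plain b = new Plain("Monday");
        // Default hash codes: values vary between runs and are usually different.
        System.out.println(a.hashCode());
        System.out.println(b.hashCode());
        // Without an override, hashCode() is the identity hash code.
        System.out.println(a.hashCode() == System.identityHashCode(a)); // true
        // Without an override, equals() is reference comparison.
        System.out.println(a.equals(b)); // false
    }
}
```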
Related
Is it bad practice to use mutable objects as Hashmap keys? What happens when you try to retrieve a value from a Hashmap using a key that has been modified enough to change its hashcode?
For example, given
class Key
{
int a; //mutable field
int b; //mutable field
public int hashCode() {
return foo(a, b); // foo stands for some unspecified hash-combining function
}
// setters setA and setB omitted for brevity
}
with code
HashMap<Key, Value> map = new HashMap<Key, Value>();
Key key1 = new Key(0, 0);
map.put(key1, value1); // value1 is an instance of Value
key1.setA(5);
key1.setB(10);
What happens if we now call map.get(key1)? Is this safe or advisable? Or is the behavior dependent on the language?
It has been noted by many well respected developers such as Brian Goetz and Josh Bloch that :
If an object’s hashCode() value can change based on its state, then we
must be careful when using such objects as keys in hash-based
collections to ensure that we don’t allow their state to change when
they are being used as hash keys. All hash-based collections assume
that an object’s hash value does not change while it is in use as a
key in the collection. If a key’s hash code were to change while it
was in a collection, some unpredictable and confusing consequences
could follow. This is usually not a problem in practice — it is not
common practice to use a mutable object like a List as a key in a
HashMap.
This is not safe or advisable. The value mapped to by key1 can never be retrieved. When doing a retrieval, most hash maps will do something like
Object get(Object key) {
int hash = key.hashCode();
//simplified, ignores hash collisions,
Entry entry = getEntry(hash);
if(entry != null && entry.getKey().equals(key)) {
return entry.getValue();
}
return null;
}
In this example, key1.hashCode() now points to the wrong bucket of the hash table, and you will not be able to retrieve value1 with key1.
If you had done something like,
Key key1 = new Key(0, 0);
map.put(key1, value1);
key1.setA(5);
Key key2 = new Key(0, 0);
map.get(key2);
This will also not retrieve value1, as key1 and key2 are no longer equal, so this check
if(entry != null && entry.getKey().equals(key))
will fail.
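Both failure modes can be reproduced with a concrete key class. The sketch below is an assumption-laden stand-in for the question's Key: Objects.hash(a, b) replaces the unspecified foo(a, b), and the constructor, setters and equals() are filled in for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class MutableKeyDemo {
    // Stand-in for the question's Key; Objects.hash replaces the unspecified foo(a, b).
    static final class Key {
        int a, b;
        Key(int a, int b) { this.a = a; this.b = b; }
        void setA(int a) { this.a = a; }
        void setB(int b) { this.b = b; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && a == ((Key) o).a && b == ((Key) o).b;
        }
        @Override public int hashCode() { return Objects.hash(a, b); }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();
        Key key1 = new Key(0, 0);
        map.put(key1, "value1");
        key1.setA(5);
        key1.setB(10);
        System.out.println(map.get(key1));          // null: the hash no longer matches the stored entry
        System.out.println(map.get(new Key(0, 0))); // null: hash matches, but equals() fails against the mutated key
        System.out.println(map.size());             // 1: the entry still exists, it is just unreachable by lookup
        key1.setA(0);
        key1.setB(0);
        System.out.println(map.get(key1));          // value1: restoring the state makes it reachable again
    }
}
```

The entry is never removed; it simply sits under a hash that no longer corresponds to the key's current state.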
Hash maps use hash code and equality comparisons to identify a certain key-value pair with a given key. If the hash map keeps the key as a reference to the mutable object, it would work in the cases where the same instance is used to retrieve the value. Consider however, the following case:
T keyOne = ...;
T keyTwo = ...;
// At this point keyOne and keyTwo are different instances and
// keyOne.equals(keyTwo) is true.
HashMap myMap = new HashMap();
myMap.put(keyOne, "Hello");
String s1 = (String) myMap.get(keyOne); // s1 is "Hello"
String s2 = (String) myMap.get(keyTwo); // s2 is "Hello"
// because keyOne equals keyTwo
mutate(keyOne);
s1 = (String) myMap.get(keyOne); // returns "Hello"
s2 = (String) myMap.get(keyTwo); // not found
The above is true if the key is stored as a reference. In Java usually this is the case. In .NET for instance, if the key is a value type (always passed by value), the result will be different:
T keyOne = ...;
T keyTwo = ...;
// At this point keyOne and keyTwo are different instances
// and keyOne.equals(keyTwo) is true.
Dictionary myMap = new Dictionary();
myMap.Add(keyOne, "Hello");
String s1 = (String) myMap[keyOne]; // s1 is "Hello"
String s2 = (String) myMap[keyTwo]; // s2 is "Hello"
// because keyOne equals keyTwo
mutate(keyOne);
s1 = myMap[keyOne]; // not found
s2 = myMap[keyTwo]; // returns "Hello"
Other technologies might have other different behaviors. However, almost all of them would come to a situation where the result of using mutable keys is not deterministic, which is very very bad situation in an application - a hard to debug and even harder to understand.
If a key's hash code changes after the key-value pair (Entry) is stored in a HashMap, the map will not be able to retrieve the Entry.
A key's hash code can change if the key object is mutable. Mutable keys in a HashMap can result in data loss.
This will not work. You are changing the key value, so you are basically throwing it away. It's like creating a real-life key and lock, and then changing the key and trying to put it back in the lock.
As others explained, it is dangerous.
A way to avoid that is to have a const field giving explicitly the hash in your mutable objects (so you would hash on their "identity", not their "state"). You might even initialize that hash field more or less randomly.
Another trick would be to use the address, e.g. (intptr_t) reinterpret_cast<void*>(this) as a basis for hash.
In all cases, you have to give up hashing the changing state of the object.
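In Java, the same "hash on identity, not state" idea might be sketched as follows. The Counter class, its id field and the NEXT_ID generator are invented for this example; the standard library's java.util.IdentityHashMap and System.identityHashCode are alternatives when identity semantics are wanted without touching the key class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class IdentityKeyDemo {
    // Mutable object whose hash depends on a fixed id, never on its changing state.
    static final class Counter {
        private static final AtomicInteger NEXT_ID = new AtomicInteger();
        private final int id = NEXT_ID.incrementAndGet(); // assigned once at construction
        int count;                                         // mutable state, ignored below

        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) { return this == o; } // identity semantics
    }

    public static void main(String[] args) {
        Map<Counter, String> map = new HashMap<>();
        Counter c = new Counter();
        map.put(c, "first");
        c.count = 42;                             // state changes, hash does not
        System.out.println(map.get(c));           // first
        System.out.println(map.get(new Counter())); // null: a different identity
    }
}
```

The price is that two objects with identical state are no longer interchangeable as keys, which is exactly the trade-off described above.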
There are two very different issues that can arise with a mutable key depending on your expectation of behavior.
First Problem: (probably most trivial--but hell it gave me problems that I didn't think about!)
You are attempting to place key-value pairs into a map by updating and modifying the same key object. You might do something like Map<Integer, String> and simply say:
int key = 0;
loop {
map.put(key++, newString);
}
I'm reusing the "object" key to create a map. This works fine in Java because of autoboxing where each new value of key gets autoboxed to a new Integer object. What would not work is if I created my own (mutable) Integer object:
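class MyInteger {
int value;
MyInteger(int value) { this.value = value; }
// Mutates this object in place and returns the same instance.
MyInteger plusOne() {
value++;
return this;
}
}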
Then tried the same approach:
MyInteger key = new MyInteger(0);
loop{
map.put(key.plusOne(), newString)
}
My expectation is that, for instance, I map 0 -> "a" and 1 -> "b". In the first example, if I set int key = 0, the map will (correctly) give me "a". For simplicity let's assume MyInteger just always returns the same hashCode() (if you can somehow manage to create unique hashCode values for all possible states of an object, this will not be an issue, and you deserve an award). In this case, I put 0 -> "a", so now the map holds my key and maps it to "a". I then modify key = 1 and try to put 1 -> "b". We have a problem! The hashCode() is the same, and the only key in the HashMap is my MyInteger key object, which has just been modified to be equal to 1, so the put overwrites that key's value. Now, instead of a map with 0 -> "a" and 1 -> "b", I have 1 -> "b" only! Even worse, if I change back to key = 0, the hashCode points to 1 -> "b", and since the HashMap's only key is my key object, it satisfies the equality check and returns "b", not "a" as expected.
If, like me, you fall prey to this type of issue, it's incredibly difficult to diagnose. Why? Because if you have a decent hashCode() function it will generate (mostly) unique values. The hash value will largely take care of the inequality problem when structuring the map but if you have enough values, eventually you'll get a collision on the hash value and then you get unexpected and largely inexplicable results. The resultant behavior is that it works for small runs but fails for larger ones.
Advice:
To find this type of issue, modify the hashCode() method, even trivially (i.e. = 0--obviously when doing this, keep in mind that the hash values should be the same for two equal objects*), and see if you get the same results--because you should and if you don't, there's likely a semantic error with your implementation that's using a hash table.
*There should be no danger (if there is--you have a semantic problem) in always returning 0 from a hashCode() (although it would defeat the purpose of a Hash Table). But that's sort of the point: the hashCode is a "quick and easy" equality measure that's not exact. So two very different objects could have the same hashCode() yet not be equal. On the other hand, two equal objects must always have the same hashCode() value.
p.s. In Java (8 and later), from my understanding, if you do such a terrible thing (as have many hashCode() collisions), a bucket will switch from a linked list to a red-black tree. So when you expect O(1) lookup, you'll get roughly O(log(n)), at least when the keys are Comparable, which is better than the O(n) a linked list would give.
Second Problem:
This is the one that most others seem to be focusing on, so I'll try to be brief. In this use case, I try to map a key-value pair and then I do some work on the key and then want to come back and get my value.
Expectation: key -> value is mapped, I then modify key and try to get(key). I expect that will give me value.
It seems kind of obvious to me that this wouldn't work but I'm not above having tried to use things like Collections as a key before (and quite quickly realizing it doesn't work). It doesn't work because it's quite likely that the hash value of key has changed so you won't even be looking in the correct bucket.
This is why it's very inadvisable to use collections as keys. I would assume, if you were doing this, you're trying to establish a many-to-one relationship. So I have a class (as in teaching) and I want two groups to do two different projects. What I want is that given a group, what is their project? Simple, I divide the class in two, and I have group1 -> project1 and group2 -> project2. But wait! A new student arrives so I place them in group1. The problem is that group1 has now been modified and likely its hash value has changed, therefore trying to do get(group1) is likely to fail because it will look in a wrong or non-existent bucket of the HashMap.
The obvious solution to the above is to chain things--instead of using the groups as keys, give them labels (that don't change) that point to the group and therefore the project: g1 -> group1 and g1 -> project1, etc.
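The classroom example might be sketched like this. The student names, the "g1" label and the two helper methods are all made up for illustration; the first helper shows the fragile version with the list itself as the key, the second the labelled version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupLabelDemo {
    // Fragile: the mutable list itself is the key.
    static String lookupByMutatedGroup() {
        Map<List<String>, String> projectByGroup = new HashMap<>();
        List<String> group1 = new ArrayList<>(Arrays.asList("Ann", "Bob"));
        projectByGroup.put(group1, "project1");
        group1.add("Cy");                        // new student changes the list's hashCode()
        return projectByGroup.get(group1);
    }

    // Sturdier: an immutable String label is the key for both maps.
    static String lookupByLabel() {
        Map<String, List<String>> groupByLabel = new HashMap<>();
        Map<String, String> projectByLabel = new HashMap<>();
        groupByLabel.put("g1", new ArrayList<>(Arrays.asList("Ann", "Bob")));
        projectByLabel.put("g1", "project1");
        groupByLabel.get("g1").add("Cy");        // the group changes, the key does not
        return projectByLabel.get("g1");
    }

    public static void main(String[] args) {
        System.out.println(lookupByMutatedGroup()); // null
        System.out.println(lookupByLabel());        // project1
    }
}
```

The group can now grow or shrink freely, because nothing the map hashes on ever changes.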
p.s.
Please make sure to define a hashCode() and equals(...) method for any object you expect to use as a key (Eclipse and, I'm assuming, most IDEs can do this for you).
Code Example:
Here is a class which exhibits the two different "problem" behaviors. In this case, I attempt to map 0 -> "a", 1 -> "b", and 2 -> "c" (in each case). In the first problem, I do that by modifying the same object, in the second problem, I use unique objects, and in the second problem "fixed" I clone those unique objects. After that I take one of the "unique" keys (k0) and modify it to attempt to access the map. I expect this will give me a, b, c and null when the key is 3.
However, what happens is the following:
map.get(0) map1: 0 -> null, map2: 0 -> a, map3: 0 -> a
map.get(1) map1: 1 -> null, map2: 1 -> b, map3: 1 -> b
map.get(2) map1: 2 -> c, map2: 2 -> a, map3: 2 -> c
map.get(3) map1: 3 -> null, map2: 3 -> null, map3: 3 -> null
The first map ("first problem") fails because it only holds a single key, which was last updated and placed to equal 2, hence why it correctly returns "c" when k0 = 2 but returns null for the other two (the single key doesn't equal 0 or 1). The second map fails twice: the most obvious is that it returns "b" when I asked for k0 (because it's been modified--that's the "second problem" which seems kind of obvious when you do something like this). It fails a second time when it returns "a" after modifying k0 = 2 (which I would expect to be "c"). This is more due to the "first problem": there's a hash code collision and the tiebreaker is an equality check--but the map holds k0, which it (apparently for me--could theoretically be different for someone else) checked first and thus returned the first value, "a" even though had it kept checking, "c" would have also been a match. Finally, the 3rd map works perfectly because I'm enforcing that the map holds unique keys no matter what else I do (by cloning the object during insertion).
I want to make clear that I agree, cloning is not a solution! I simply added that as an example of why a map needs unique keys and how enforcing unique keys "fixes" the issue.
import java.util.HashMap;
import java.util.stream.IntStream;
public class HashMapProblems {
private int value = 0;
public HashMapProblems() {
this(0);
}
public HashMapProblems(final int value) {
super();
this.value = value;
}
public void setValue(final int i) {
this.value = i;
}
@Override
public int hashCode() {
return value % 2;
}
@Override
public boolean equals(final Object o) {
return o instanceof HashMapProblems
&& value == ((HashMapProblems) o).value;
}
@Override
public Object clone() {
return new HashMapProblems(value);
}
public void reset() {
this.value = 0;
}
public static void main(String[] args) {
final HashMapProblems k0 = new HashMapProblems(0);
final HashMapProblems k1 = new HashMapProblems(1);
final HashMapProblems k2 = new HashMapProblems(2);
final HashMapProblems k = new HashMapProblems();
final HashMap<HashMapProblems, String> map1 = firstProblem(k);
final HashMap<HashMapProblems, String> map2 = secondProblem(k0, k1, k2);
final HashMap<HashMapProblems, String> map3 = secondProblemFixed(k0, k1, k2);
for (int i = 0; i < 4; ++i) {
k0.setValue(i);
System.out.printf(
"map.get(%d) map1: %d -> %s, map2: %d -> %s, map3: %d -> %s",
i, i, map1.get(k0), i, map2.get(k0), i, map3.get(k0));
System.out.println();
}
}
private static HashMap<HashMapProblems, String> firstProblem(
final HashMapProblems start) {
start.reset();
final HashMap<HashMapProblems, String> map = new HashMap<>();
map.put(start, "a");
start.setValue(1);
map.put(start, "b");
start.setValue(2);
map.put(start, "c");
return map;
}
private static HashMap<HashMapProblems, String> secondProblem(
final HashMapProblems... keys) {
final HashMap<HashMapProblems, String> map = new HashMap<>();
IntStream.range(0, keys.length).forEach(
index -> map.put(keys[index], "" + (char) ('a' + index)));
return map;
}
private static HashMap<HashMapProblems, String> secondProblemFixed(
final HashMapProblems... keys) {
final HashMap<HashMapProblems, String> map = new HashMap<>();
IntStream.range(0, keys.length)
.forEach(index -> map.put((HashMapProblems) keys[index].clone(),
"" + (char) ('a' + index)));
return map;
}
}
Some Notes:
In the above it should be noted that map1 only holds two values because of the way I set up the hashCode() function to split odds and evens. k = 0 and k = 2 therefore have the same hashCode of 0. So when I modify k = 2 and attempt to k -> "c" the mapping k -> "a" gets overwritten--k -> "b" is still there because it exists in a different bucket.
Also there are a lot of different ways to examine the maps in the above code and I would encourage people that are curious to do things like print out the values of the map and then the key to value mappings (you may be surprised by the results you get). Do things like play with changing the different "unique" keys (i.e. k0, k1, and k2), try changing the single key k. You could also see how even the secondProblemFixed isn't actually fixed because you could also gain access to the keys (for instance via Map::keySet) and modify them.
I won't repeat what others have said. Yes, it's inadvisable. But in my opinion, it's not overly obvious where the documentation states this.
You can find it on the JavaDoc for the Map interface:
Note: great care must be exercised if mutable objects are used as map
keys. The behavior of a map is not specified if the value of an object
is changed in a manner that affects equals comparisons while the
object is a key in the map.
The behaviour of a Map is not specified if the value of a mutable object is changed in a manner that affects equals comparisons while the object is a key. The same applies to a Set: using a mutable object as an element is not a good idea.
Let's see an example here:
public class MapKeyShouldntBeMutable {
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
Map<Employee,Integer> map=new HashMap<Employee,Integer>();
Employee e=new Employee();
Employee e1=new Employee();
Employee e2=new Employee();
Employee e3=new Employee();
Employee e4=new Employee();
e.setName("one");
e1.setName("one");
e2.setName("three");
e3.setName("four");
e4.setName("five");
map.put(e, 24);
map.put(e1, 25);
map.put(e2, 26);
map.put(e3, 27);
map.put(e4, 28);
e2.setName("one");
System.out.println(" is e equals e1 "+e.equals(e1));
System.out.println(map);
for(Employee s:map.keySet())
{
System.out.println("key : "+s.getName()+":value : "+map.get(s));
}
}
}
class Employee{
String name;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
@Override
public boolean equals(Object o){
Employee e=(Employee)o;
if(this.name.equalsIgnoreCase(e.getName()))
{
return true;
}
return false;
}
public int hashCode() {
int sum=0;
if(this.name!=null)
{
for(int i=0;i<this.name.toCharArray().length;i++)
{
sum=sum+(int)this.name.toCharArray()[i];
}
/*System.out.println("name :"+this.name+" code : "+sum);*/
}
return sum;
}
}
Here we are adding a mutable object, Employee, to a map as a key. It works well as long as all the keys added are distinct. I have overridden equals() and hashCode() for the Employee class.
First I added e and then e1. For both of them equals() is true and the hash code is the same, so the map sees the same key being added again and replaces the old value with e1's value. Then we added e2, e3 and e4; we are fine so far.
But when we change the name of an already added key, e2, to "one", it becomes equal to a key added earlier. Now the map behaves weirdly. Ideally e2 should replace the existing equal key, but the map now holds both. You will get this output:
is e equals e1 true
{Employee@1aa=28, Employee@1bc=27, Employee@142=25, Employee@142=26}
key : five:value : 28
key : four:value : 27
key : one:value : 25
key : one:value : 25
See how both keys named "one" show the same value, which is unexpected. Now run the same program again, changing e2.setName("one"); to e2.setName("diffnt");. The output will be this:
is e equals e1 true
{Employee@1aa=28, Employee@1bc=27, Employee@142=25, Employee@27b=26}
key : five:value : 28
key : four:value : 27
key : one:value : 25
key : diffnt:value : null
So changing a mutable key after it has been added to a map is not encouraged.
To make the answer compact:
The root cause is that HashMap calculates an internal hash from the key object's hashCode() only once, at insertion, and stores it for its own needs.
All other operations that navigate the data inside the map use this pre-calculated internal hash.
So if you mutate the key object in a way that changes its hashCode(), the key is still stored inside the map with its changed hash code (you can even observe it via HashMap.keySet() and see the altered hash code).
But the internal hash is not recalculated; it remains the old stored one, and the map won't be able to locate your data from the mutated key's new hash code (e.g. via HashMap.get() or HashMap.containsKey()).
Your key-value pairs are still inside the map, but to get them back you need the old hash code value that was in effect when you put the data into the map.
Notice that you will also be unable to get the data back using the mutated key object taken right from HashMap.keySet().
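The points above can be checked with a small experiment. This is only a sketch: MutableKey and MutatedKeyDemo are hypothetical names introduced for illustration, and the key's hashCode() and equals() are assumed to depend on a single mutable field.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key type whose hashCode()/equals() depend on a mutable field.
final class MutableKey {
    String name;

    MutableKey(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && Objects.equals(name, ((MutableKey) o).name);
    }

    @Override
    public int hashCode() { return Objects.hashCode(name); }
}

final class MutatedKeyDemo {
    // Puts one entry, mutates its key, and reports what the map can still find.
    static boolean[] probe() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey("before");
        map.put(key, "value");

        key.name = "after"; // hashCode() changes; the hash stored in the map does not

        MutableKey fromKeySet = map.keySet().iterator().next();
        return new boolean[] {
            map.containsKey(key),                      // mutated key: new hash, not found
            map.containsKey(fromKeySet),               // same object taken from keySet(): not found
            map.containsKey(new MutableKey("before")), // old hash matches, but equals() now fails
            map.size() == 1                            // the entry is still stored
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probe())); // [false, false, false, true]
    }
}
```

Note the third probe: a fresh key in the old state carries the old hash, but equals() against the mutated stored key fails, so the entry stays unreachable through normal lookups until the key's state is restored.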
How to use an object as a key in a hashmap. If you use an object as key do you need to override equals and hashcode methods for that object?
A simple rule of thumb is to use immutable objects as keys in a HashMap, because:
If the key were mutable, its hashCode() value or equals() condition might change, and you might never be able to retrieve the entry from your HashMap.
More precisely, the class fields that are used to compute equals() and hashCode() should be immutable!
Now, suppose you create your own class:
To compare two objects of your class you will have to override equals()
To use it as a key in any hash-based data structure you will have to override hashCode() (again, keeping immutability in mind)
Remember that if two objects are equal according to equals(), then their hashCode() should be equal as well!
hashCode() - HashMap provides put(key, value) for storing and get(key) for retrieving values. When put(key, value) stores a key-value pair, HashMap calls hashCode() on the key object to calculate a hash that is used to find the bucket where the Entry object is stored. When get() retrieves a value, the key object is again used to calculate a hash, which is then used to find the bucket where that particular key is stored.
equals() - equals() is used to compare objects for equality. In the case of HashMap, the key object is compared against the stored keys using equals(). HashMap knows how to handle hash collisions (more than one key having the same hash value, and thus assigned to the same bucket). In that case the entries are stored in a linked list (refer to the figure for more clarity).
hashCode() helps in finding the bucket where that key is stored, equals() helps in finding the right key as there may be more than one key-value pair stored in a single bucket.
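The two-step lookup described above can be sketched with a toy hash table. This is not how java.util.HashMap is actually written (the real class also spreads hash bits, resizes, and can use tree bins); ToyHashMap and its members are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Toy hash table, for illustration only; all names here are hypothetical.
final class ToyHashMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry<K, V>>> buckets = new ArrayList<>();

    ToyHashMap(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Step 1: hashCode() selects the bucket.
    private List<Entry<K, V>> bucketFor(K key) {
        int hash = Objects.hashCode(key); // 0 for a null key
        return buckets.get(Math.floorMod(hash, buckets.size()));
    }

    // Step 2: equals() finds the matching key inside that bucket.
    V put(K key, V value) {
        List<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> e : bucket) {
            if (Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value; // existing key: replace the value
                return old;
            }
        }
        bucket.add(new Entry<>(key, value)); // new key: append to the bucket
        return null;
    }

    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null; // no entry in the bucket matched
    }
}
```

For example, with `ToyHashMap<String, Integer> m = new ToyHashMap<>(4)`, calling `m.put("a", 1)` and then `m.put("a", 2)` leaves one entry, and `m.get("a")` returns 2.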
You can use any object in a HashMap as long as it has properly defined hashCode and equals methods - those are absolutely crucial because the hashing mechanism depends on them.
Answer to your question is yes, objects of custom classes can be used as a key in a HashMap. But in order to retrieve the value object back from the map without failure, there are certain guidelines that need to be followed.
1)Custom class should follow the contract between hashCode() and equals().
The contract states that:
If two objects are equal according to the equals(Object) method, then calling
the hashCode method on each of the two objects must produce the same integer result.
This can be done by implementing hashcode() and equals() in your custom class.
2) Make custom class immutable.
Hint: use final, remove setters, use deep copy to set fields
package com.java.demo.map;
import java.util.HashMap;
public class TestMutableKey
{
public static void main(String[] args)
{
//Create a HashMap with mutable key
HashMap<Account, String> map = new HashMap<Account, String>();
//Create key 1
Account a1 = new Account(1);
a1.setHolderName("A_ONE");
//Create key 2
Account a2 = new Account(2);
a2.setHolderName("A_TWO");
//Put mutable key and value in map
map.put(a1, a1.getHolderName());
map.put(a2, a2.getHolderName());
//Change the keys' state so the hash code should be calculated again
a1.setHolderName("Defaulter");
a2.setHolderName("Bankrupt");
//Success !! We are able to get back the values
System.out.println(map.get(a1)); //Prints A_ONE
System.out.println(map.get(a2)); //Prints A_TWO
//Try with newly created key with same account number
Account a3 = new Account(1);
a3.setHolderName("A_THREE");
//Success !! We are still able to get back the value for account number 1
System.out.println(map.get(a3)); //Prints A_ONE
}
}
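The Account class is not shown above. For the printed results to hold, its equals() and hashCode() must ignore the holder name. Below is a minimal sketch of what it might look like, assuming identity is based only on a final account number; the field names accountNumber and holderName are my assumptions, not the original author's code.

```java
import java.util.HashMap;
import java.util.Map;

// Assumed shape of Account: only the final accountNumber defines identity.
class Account {
    private final int accountNumber; // immutable, used by equals()/hashCode()
    private String holderName;       // mutable, deliberately ignored by both

    Account(int accountNumber) {
        this.accountNumber = accountNumber;
    }

    public String getHolderName() { return holderName; }

    public void setHolderName(String holderName) { this.holderName = holderName; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Account)) return false;
        return accountNumber == ((Account) o).accountNumber;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(accountNumber);
    }
}

class AccountKeyCheck {
    public static void main(String[] args) {
        Map<Account, String> map = new HashMap<>();
        Account a1 = new Account(1);
        a1.setHolderName("A_ONE");
        map.put(a1, a1.getHolderName());

        a1.setHolderName("Defaulter"); // does not affect equals()/hashCode()
        System.out.println(map.get(a1));             // A_ONE
        System.out.println(map.get(new Account(1))); // A_ONE
    }
}
```

Because the mutable field takes no part in equality or hashing, changing it after insertion is harmless; this is the "fields used by equals() and hashCode() should be immutable" guideline in practice.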
Yes, you should override equals() and hashCode() for the code to function properly; otherwise you won't be able to get the value for a key you have inserted in the map unless you still hold that exact key instance.
e.g.
map.put(new Object(), "value");
When you want to get that value:
map.get(new Object()); // This will always return null
Because each new Object() gets its own identity-based hash code, the lookup usually does not point to the bucket where the value is saved; and even if the bucket happens to be the same, the default equals() compares references, so the keys never match and get() always returns null.
Yes, we can use any object as a key in a Map in Java, but we need to override the equals() and hashCode() methods of that object's class. Please refer to the example below, in which I store objects of a Pair class as keys in a HashMap whose values are strings. I have overridden the hashCode() and equals() methods of the Pair class so that different Pair objects with the same (x, y) values are treated as one key.
import java.util.*;
import java.util.Map.Entry;
class App { // Case-sensitive
private class Pair {
private int x, y;
public Pair(int x, int y) {
this.x = x;
this.y = y;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + x;
result = prime * result + y;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Pair other = (Pair) obj;
if (x != other.x)
return false;
if (y != other.y)
return false;
return true;
}
}
public static void main(String[] args) {
App obj = new App();
obj.show();
}
private void show() {
Map<Pair, String> map = new HashMap<>();
Pair obj1 = new Pair(10, 20);
Pair obj2 = new Pair(40, 50);
Pair obj3 = new Pair(10, 20);
// obj1 and obj3 have the same values, so we want to store them as one key.
// To achieve that, we have overridden equals() and hashCode() in Pair.
map.put(obj1, "First");
map.put(obj2, "Second");
map.put(obj3, "Third");
System.out.printf("Size of Map is :%d \n", map.size());
for (Entry<App.Pair, String> p : map.entrySet()) {
Pair pair = p.getKey();
System.out.printf("Map key-value pair is (%d,%d)->%s \n", pair.x, pair.y, p.getValue());
}
// output -
// Size of Map is :2
// Map key-value pair is (10,20)->Third
// Map key-value pair is (40,50)->Second
}
}
I was asked this question in a job interview and I still don't know the answer, so I am asking here. Let's say hashCode() of the key object returns a fixed integer, so the HashMap would look like a LinkedList.
How would a duplicate element be found and replaced by the new value in the map?
e.g. if the following 1001 puts are performed in the order listed below,
put(1000,1000), put(1,1), put(2,2), put(3,3) ... put(999,999), put(1000,1000)
Would the map be traversed all the way to the end and the new entry be inserted at the head when the last put(1000,1000) is performed?
OR
Does the map have some other way to locate and replace duplicate keys?
First case is correct.
In your case hashCode() returns the same hash value for all the keys. In the Java HashMap, key and value are both stored in the bucket as a Map.Entry object. When the second and later put() operations are performed, the map traverses the elements in the bucket to check whether the key is already present. If the key is not found, the new key-value pair is added to the linked list. If the key is found in the list, the value for that pair is updated.
Detailed explanation of how the Java HashMap works: How HashMap works in Java
Take this sample code, run it in debug mode, and observe how the new key-value pairs are inserted into the map.
In the class you will need to override hashCode() (we want to control how the hash codes are generated for Node), toString() (just to output the Node value in SOUT) and equals() (defines the equality of keys based on the Node member variable Integer n, for updating the values) to get it working.
public class HashMapTest {
static class Node {
Integer n;
public Node(int n) {
this.n = n;
}
@Override
public int hashCode() {
return n%3;
}
@Override
public boolean equals(Object object) {
Node node = (Node)object;
return this.n.equals(node.n);
}
@Override
public String toString() {
return n.toString();
}
}
public static void main(String[] args) {
Map<Node, String> map = new HashMap<>();
for (int i = 0; i<6; i++) {
map.put(new Node(i), ""+i); // <-- Debug Point
}
map.put(new Node(0), "xxx");
} // <-- Debug Point
}
First 3 entries in the map: (hash code is n%3)
Three more values: (hash code is n%3)
Don't be confused by the ordering of the nodes. I ran this on Java 1.8, where HashMap can use TreeNode, a red-black tree implementation, for crowded buckets, as per the code documentation. This can differ between Java versions.
Now let's update the value of key 0:
When the hash code is the same, the hash map compares objects using the equals method.
For example, let's say you put a bunch of elements in the hash map:
put(1000,1000), put(1,1), put( 2, 2), put ( 3,3 ) ....put(999,999)
And then you do this:
put(1000,1000 )
1000 is already in the map, the hash code is the same, it is also the same in terms of the equals method, so the element will be replaced, no need to iterate further.
Now if you do:
put(1234, 1234)
1234 is not yet in the map. All the elements are in a linked list, due to the fixed hash code. The hash map will iterate over the elements, comparing them using equals. It will be false for all elements, the end of the list will be reached, and the entry will be appended.
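The scenario from the question can be tried out directly. This is a sketch under the assumption of a key class with a constant hashCode(); ConstantHashKey and ConstantHashDemo are hypothetical names. It only checks the visible behaviour (size and values), not how many entries get traversed, since that depends on the JDK version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key whose hashCode() is constant, so every key collides.
final class ConstantHashKey {
    final int id;

    ConstantHashKey(int id) { this.id = id; }

    @Override
    public int hashCode() { return 42; } // same bucket for every key

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHashKey && ((ConstantHashKey) o).id == id;
    }
}

final class ConstantHashDemo {
    // Performs the first 1000 puts from the question: 1000, then 1..999.
    static Map<ConstantHashKey, Integer> build() {
        Map<ConstantHashKey, Integer> map = new HashMap<>();
        map.put(new ConstantHashKey(1000), 1000);
        for (int i = 1; i <= 999; i++) {
            map.put(new ConstantHashKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<ConstantHashKey, Integer> map = build();
        System.out.println(map.size()); // 1000

        // The duplicate put is matched via equals() and replaces the value.
        Integer previous = map.put(new ConstantHashKey(1000), -1);
        System.out.println(previous);   // 1000
        System.out.println(map.size()); // 1000
        System.out.println(map.get(new ConstantHashKey(1000))); // -1
    }
}
```

Whatever the internal layout of the bucket, the duplicate key is found through equals() and no second entry is created.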
JDK implementations change over time!
In JDK 8, if hashCode() is a constant value, the implementation can turn the overcrowded bucket into a tree instead of a linked list, in order to protect against denial-of-service attacks.
Apologies for the fairly naive question, but I believe my own answer to be naive. I think keys (in HashTables) are immutable because we wouldn't want to somehow accidentally alter a key and therefore mess with the sorting of the HashTable. Is this a correct explanation? If so, how can it be more correct?
During HashTable.put the key is hashed, and it and its value are stored in one of a number of buckets (which are lists of key-value pairs) based on the hash, for example something like:
bucket[key.hashCode() % numberOfBuckets].add(key, value)
If the key's hashcode changes after insertion it could then be in the wrong bucket and you would then not be able to find it and the hashtable would incorrectly return null on any get for that key.
Aside: Understanding the inner workings of a hashtable helps you understand the importance of a good quality hashCode function for your keys, as a poor hashCode function can result in a poor distribution of keys across buckets. And as buckets are just lists, this results in a lot of linear searches, which greatly reduces the effectiveness of the hashtable. E.g. this terrible hashCode function puts everything in one bucket, so it's effectively just one list.
public int hashCode() { return 42; /*terrible hashCode example, don't use!*/ }
This is also one reason why prime numbers appear in good hashCode functions, e.g.:
public int hashCode() {
int hash = field1.hashCode();
hash = hash*31 + field2.hashCode(); //note the prime 31
hash = hash*31 + field3.hashCode();
return hash;
}
The general idea is correct, but not its details.
Keys in a HashTable need not be immutable, it's the result of a call to their hashCode() (and equals) method that needs to stay immutable and consistent (for the hash table to behave predictably, that is).
From a high-level point of view, it's because of the way hash tables work : when a (key, value) pair is inserted, the key's hashCode is used internally to figure out a "bucket" where the value will be put. And when the value is retrieved by key, the hashCode is computed once more, to find the bucket back.
Now if at any point in time between the insertion and the retrieval the result of calling hashCode changes, the "lookup" bucket will be different from the "insertion" bucket, and things will not behave predictably.
To sum up, given a Key object that looks like this (two internal Strings compose the object, but only one, partOfHashCode, is taken into account in hashCode / equals):
public static class Key {
private String partOfHashCode;
private String notPartOfHashCode;
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((partOfHashCode == null) ? 0 : partOfHashCode.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Key other = (Key) obj;
if (partOfHashCode == null) {
if (other.partOfHashCode != null)
return false;
} else if (!partOfHashCode.equals(other.partOfHashCode))
return false;
return true;
}
}
Then it is fine to use it this way:
public static void main(String[] args) {
Map<Key, String> myMap = new HashMap<>();
Key key = new Key();
key.partOfHashCode = "myHash";
myMap.put(key, "value");
key.notPartOfHashCode = "mutation of the key, but not of its hash/equals definition";
System.out.println(myMap.get(key));
}
(This logs the "value" object in the console).
But it is NOT fine to use it this way
public static void main(String[] args) {
Map<Key, String> myMap = new HashMap<>();
Key key = new Key();
key.partOfHashCode = "myHash";
myMap.put(key, "value");
key.partOfHashCode = "mutation of the hashCode of the key";
System.out.println(myMap.get(key));
}
(This last example could log "null" in the console).
For more on this subject, you should read also on hashCode / equals consistency.
There is no inherent guarantee in Java that HashTable-keys are immutable. It is not even guaranteed that their hashcode remains the same. But if you add keys that have a mutable hashCode you are in trouble. Assume you are inserting a key with a hashCode of 1. Then it is inserted in a hash-bucket corresponding to 1. Then alter the object to have a hashCode of 2 and call hashMap.get(key). While the object is still in the hashTable the system will look in the bucket corresponding to 2, but won't find it there. You won't even be able to remove the entry since it won't be found.
tl;dr For your application to work properly HashTable-keys need to have immutable hashcodes, but you have to take care of that fact for yourself.
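The point about not being able to remove the entry can be illustrated as follows. This is a minimal sketch; Label and StrandedEntryDemo are hypothetical names, and the key's hash is assumed to depend on one mutable, non-null String field.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical key with a mutable field that drives hashCode() and equals().
final class Label {
    String text;

    Label(String text) { this.text = text; }

    @Override
    public int hashCode() { return text.hashCode(); }

    @Override
    public boolean equals(Object o) {
        return o instanceof Label && ((Label) o).text.equals(text);
    }
}

final class StrandedEntryDemo {
    // Returns what remove() yields while the key is mutated and after it is restored.
    static String[] run() {
        Map<Label, String> map = new HashMap<>();
        Label key = new Label("one");
        map.put(key, "value");

        key.text = "two";                             // hash changes after insertion
        String removedWhileMutated = map.remove(key); // looks under the new hash: nothing found

        key.text = "one";                             // restore the original state
        String removedAfterRestore = map.remove(key); // hash matches again: entry removed

        return new String[] { removedWhileMutated, removedAfterRestore };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // [null, value]
    }
}
```

Restoring the key's original state makes the entry reachable again, which also shows that the entry was never lost, only filed under a hash that no longer matched the key.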
I understand that HashSet is based on HashMap implementation but is used when you need unique set of elements. So why in the next code when putting same objects into the map and set we have size of both collections equals to 1? Shouldn't map size be 2? Because if size of both collection is equal I don't see any difference of using this two collections.
Set testSet = new HashSet<SimpleObject>();
Map testMap = new HashMap<Integer, SimpleObject>();
SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
SimpleObject simplObject2 = new SimpleObject("Igor", 1);
testSet.add(simpleObject1);
testSet.add(simplObject2);
Integer key = new Integer(10);
testMap.put(key, simpleObject1);
testMap.put(key, simplObject2);
System.out.println(testSet.size());
System.out.println(testMap.size());
The output is 1 and 1.
SimpleObject code
public class SimpleObject {
private String dataField1;
private int dataField2;
public SimpleObject(){}
public SimpleObject(String data1, int data2){
this.dataField1 = data1;
this.dataField2 = data2;
}
public String getDataField1() {
return dataField1;
}
public int getDataField2() {
return dataField2;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result
+ ((dataField1 == null) ? 0 : dataField1.hashCode());
result = prime * result + dataField2;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
SimpleObject other = (SimpleObject) obj;
if (dataField1 == null) {
if (other.dataField1 != null)
return false;
} else if (!dataField1.equals(other.dataField1))
return false;
if (dataField2 != other.dataField2)
return false;
return true;
}
}
The map holds unique keys. When you invoke put with a key that exists in the map, the object under that key is replaced with the new object. Hence the size 1.
The difference between the two should be obvious:
in a Map you store key-value pairs
in a Set you store only the keys
In fact, a HashSet has a HashMap field, and whenever add(obj) is invoked, the put method is invoked on the underlying map: map.put(obj, PRESENT), where the dummy value is a private static final Object PRESENT = new Object() (that is its name in OpenJDK). So the map is populated with your object as key, and a value that is of no interest.
A key in a Map can only map to a single value. So the second time you put in to the map with the same key, it overwrites the first entry.
In case of the HashSet, adding the same object will be more or less a no-op. In case of a HashMap, putting a new key,value pair with an existing key will overwrite the existing value to set a new value for that key. Below I've added equals() checks to your code:
SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
SimpleObject simplObject2 = new SimpleObject("Igor", 1);
//If the below prints true, the 2nd add will not add anything
System.out.println("Are the objects equal? " + simpleObject1.equals(simplObject2));
testSet.add(simpleObject1);
testSet.add(simplObject2);
Integer key = new Integer(10);
//This is a no-brainer as you've the exact same key, but lets keep it consistent
//If this returns true, the 2nd put will overwrite the 1st key-value pair.
testMap.put(key, simpleObject1);
testMap.put(key, simplObject2);
System.out.println("Are the keys equal? " + key.equals(key));
System.out.println(testSet.size());
System.out.println(testMap.size());
I just wanted to add to these great answers the answer to your last dilemma. You wanted to know what the difference between these two collections is if they return the same size after your insertion. Well, you can't really see the difference here, because you are inserting two values into the map with the same key, and hence replacing the first value with the second. You would see the real difference (among others) had you inserted the same value into the map under different keys. Then you would see that you can have duplicate values in the map but not duplicate keys, and that in the set you can't have duplicate values. This is the main difference here.
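A short sketch of that experiment, using plain strings and integers instead of SimpleObject to keep it self-contained (SetVsMapDemo is a made-up name):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SetVsMapDemo {
    // Returns { set size, map size, number of distinct map values }.
    static int[] sizes() {
        String value = "Igor";

        Set<String> set = new HashSet<>();
        set.add(value);
        set.add(value);     // duplicate element: ignored

        Map<Integer, String> map = new HashMap<>();
        map.put(10, value);
        map.put(20, value); // different key, same value: a second entry

        return new int[] {
            set.size(),
            map.size(),
            new HashSet<>(map.values()).size()
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizes())); // [1, 2, 1]
    }
}
```

The map ends up with two entries that share one value, while the set holds that value only once.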
The answer is simple: it is the nature of HashSet.
Internally, HashSet uses a HashMap with a dummy object named PRESENT as the value; the key of this HashMap is your object.
hash(simpleObject1) and hash(simplObject2) will return the same int. So?
When you add simpleObject1 to hashset it will put this to its internal hashmap with simpleObject1 as a key. Then when you add(simplObject2) you will get false because it is available in the internal hashmap already as key.
As a little extra info, HashSet uses hashing to provide O(1) performance for the basic operations, relying on the equals() and hashCode() contract of its elements. Note that HashSet does permit a null element even though equals() and hashCode() cannot be called on null: the backing HashMap treats a null key specially and uses hash 0 for it.
I think the major difference is:
HashSet is stable in the sense that it doesn't replace a duplicate element (once the first unique element is inserted, later duplicates are discarded and add() returns false), whereas HashMap replaces the old value with the new one when put() is called with a duplicate key (the stored key object itself is kept). So a duplicate put() changes the HashMap's contents, while a duplicate add() leaves the HashSet unchanged.
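What a duplicate insert does in each collection can be observed through the return values of add() and put(). A small sketch, with DuplicateInsertDemo as a made-up name and two equal but distinct String instances standing in for the duplicate objects:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class DuplicateInsertDemo {
    static Object[] run() {
        String first = new String("Igor");
        String second = new String("Igor"); // equal to first, but a distinct instance

        Set<String> set = new HashSet<>();
        boolean addedFirst = set.add(first);   // true: element was new
        boolean addedSecond = set.add(second); // false: an equal element is already present
        boolean setKeepsFirst = set.iterator().next() == first;

        Map<String, Integer> map = new HashMap<>();
        Integer previous1 = map.put(first, 1);  // null: no earlier mapping
        Integer previous2 = map.put(second, 2); // 1: the old value is returned and replaced
        boolean mapKeepsFirstKey = map.keySet().iterator().next() == first;

        return new Object[] {
            addedFirst, addedSecond, setKeepsFirst,
            previous1, previous2, mapKeepsFirstKey, map.get(first)
        };
    }

    public static void main(String[] args) {
        // [true, false, true, null, 1, true, 2]
        System.out.println(Arrays.toString(run()));
    }
}
```

In both collections the originally stored object stays in place; only the map's value changes on a duplicate insert.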
public class HashSet<E>
extends AbstractSet<E>
implements Set<E>, Cloneable, Serializable
This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set.