Directly from this java doc:
A special case of this prohibition is that it is not permissible for a
map to contain itself as a key. While it is permissible for a map to
contain itself as a value, extreme caution is advised: the equals and
hashCode methods are no longer well defined on such a map.
Why would the hashcode and equals no longer be well defined on such a map?
The relevant part from AbstractMap.equals, which is used by most Map implementations:
Iterator<Entry<K,V>> i = entrySet().iterator();
while (i.hasNext()) {
Entry<K,V> e = i.next();
K key = e.getKey();
V value = e.getValue();
if (value == null) {
if (!(m.get(key)==null && m.containsKey(key)))
return false;
} else {
if (!value.equals(m.get(key))) // would call equals on itself.
return false;
}
}
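Adding the map as a value can therefore lead to infinite recursion (in practice a StackOverflowError rather than an endless loop): AbstractMap.equals does return early when the argument is the very same object, but comparing the map with a different map that holds itself under the same key keeps calling equals with the same two arguments.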
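A minimal sketch of the effect, assuming a HashMap as found in OpenJDK (it inherits equals and hashCode from AbstractMap). The class and method names are made up for illustration, and StackOverflowError is caught only so the outcome can be reported:

```java
import java.util.HashMap;
import java.util.Map;

class SelfContainingMapDemo {
    // Builds a map that contains itself as a value under the key "self".
    static Map<String, Object> selfContaining() {
        Map<String, Object> m = new HashMap<>();
        m.put("self", m);
        return m;
    }

    // The map's hash code needs the hash code of its value, which is the map itself.
    static boolean hashCodeOverflows() {
        Map<String, Object> m = selfContaining();
        try {
            m.hashCode();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    // Comparing two distinct self-containing maps: a.equals(b) ends up calling a.equals(b) again.
    static boolean equalsOverflows() {
        Map<String, Object> a = selfContaining();
        Map<String, Object> b = selfContaining();
        try {
            a.equals(b);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("hashCode overflows: " + hashCodeOverflows());
        System.out.println("equals overflows:   " + equalsOverflows());
    }
}
```

Note that toString() is not affected in the same way: AbstractMap prints "(this Map)" for a self-reference.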
The full quote of the paragraph from the Java Docs is:
Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map. A special case of this prohibition is that it is not permissible for a map to contain itself as a key. While it is permissible for a map to contain itself as a value, extreme caution is advised: the equals and hashCode methods are no longer well defined on such a map.
The AbstractMap.hashCode() method uses the hash code of the key value pairs in the map to compute a hash code. Therefore the hash code generated from this method would change every time the map is modified.
The hash code is used to compute the bucket in which a new entry is placed. If the map were used as a key within itself, the computed bucket would be different every time an entry is added, removed or modified. Future lookups with the map as a key would therefore most likely fail, because a different bucket is computed from the hash code. Future puts might not detect that the key is already present and could then allow multiple entries with the same key (but in different buckets).
Two maps are equal if the same keys map to the same values. (In some implementations.) So to check equality, the equality of every member has to be checked.
Therefore, if a map contains itself, you can get an infinite recursion of equality checks.
The same goes for hashes, as these can be calculated depending on the hashes of the elements in the map.
Example:
Map<Integer, Object> ma = new HashMap<>();
Map<Integer, Object> mb = new HashMap<>();
Map<Integer, Object> mc = new HashMap<>();
ma.put(1, ma); // ma contains itself as a value
ma.put(2, mb);
mc.put(1, ma);
mc.put(2, mb);
As humans, we can see that ma and mc are equal by definition. A computer that compares them naively sees that 2 maps to mb (an empty map) in both maps, which is good. It sees that 1 maps to another map in both mc and ma, so it checks whether those maps are equal. To determine this, it checks again whether the two values for 1 are equal. And again.
Note that this is not the case for all implementations. Some implementations first check whether both references point to the same object; AbstractMap.equals does this, so with java.util maps this particular comparison terminates. But any check that recurses without such a shortcut, such as hashCode() on ma, never finishes normally.
To try to explain it:
The equals method will iterate over both Maps and call the equals method of each key and value of the map. So, if a map contains itself, you would keep calling the equals method indefinitely.
The same thing happens with the hash code.
Source: source code of the class AbstractMap
Related
I'm trying to create a collision intentionally.
fun main(args: Array<String>) {
val india = Country("India1", 1000)
val india2 = Country("India2", 1000)
val countryCapitalMap: HashMap<Country, String> = hashMapOf()
countryCapitalMap.put(india, "Delhi1")
countryCapitalMap.put(india2, "Delhi2")
}
class Country(var name: String, var population: Long) {
override fun hashCode(): Int {
return if (name.length % 2 == 0) 31 else 95
}
override fun equals(other: Any?): Boolean {
// Type check instead of an unchecked cast, which would throw for null or for a non-Country argument.
return other is Country && name.equals(other.name, ignoreCase = true)
}
}
So, I have india and india2 objects. I've overridden equals() and hashCode() methods for Country so that:
india.hashCode() == india2.hashCode() --> true
india.equals(india2) --> false
According to Collision resolution in Java HashMap and part of the article "Lets put this Country objects in hashmap", if key1 has result of hash(key.hashCode()) equal to the same operation on the key2 then there should be collision.
So, I put a breakpoint to inspect the content of countryCapitalMap and saw that its size is 2, i.e. it contains two different entries and there is no linked list. Hence, there is no collision.
My questions are:
Why does countryCapitalMap have a size of 2? Why is there no collision?
Why doesn't HashMap create a linked list with two entries whose keys are not equal but have the same hashCode?
You are confusing a collision (the case when the hashes of keys are the same or, to be precise, when the hashes fall into the same bucket of the HashMap) with duplicated keys (i.e. keys that are identical according to the equals() method). These are two completely different situations.
Under the hood, HashMap maintains an array of buckets. Each bucket corresponds to a set of hash values, namely all hashes that reduce to the same array index.
If hashes of the keys collide (but keys are not equal) entries (Nodes to be precise) will be mapped to the same bucket and form a linked list, which after a certain threshold will turn into a tree.
Conversely, an attempt to add a new entry with a key that is already present in a map (i.e. duplicated key according to the equals() method) will result in updating the value of the existing entry.
Because as you've already observed, india.equals(india2) will return false, HashMap will contain two entries.
And since the hashes of india and india2 are identical, you've succeeded in your intention to create collision. Both entries will be added, both will end up in the same bucket (i.e. their nodes will form a linked list).
If you wonder what exactly this linked list looks like, take a look at the source code of the HashMap class. You'll find that there's a field Node<K,V>[] table, an array of buckets, and an inner class Node:
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
}
And each node holds a reference next pointing to the next node mapped to the same bucket (if any).
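The difference between a collision and a duplicate key can be sketched in Java as follows. The Key class is a hypothetical stand-in for Country with a deliberately constant hash code; it is an illustration, not the code from the question:

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hypothetical key type: every instance has the same hash code,
    // but equality depends on the name.
    static final class Key {
        final String name;
        Key(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    static Map<Key, String> build() {
        Map<Key, String> capitals = new HashMap<>();
        capitals.put(new Key("India1"), "Delhi1");
        capitals.put(new Key("India2"), "Delhi2"); // same hash, not equal: second node in the same bucket
        capitals.put(new Key("India1"), "Delhi3"); // equal key: the existing value is replaced
        return capitals;
    }

    public static void main(String[] args) {
        Map<Key, String> capitals = build();
        System.out.println(capitals.size());                 // 2
        System.out.println(capitals.get(new Key("India1"))); // Delhi3
        System.out.println(capitals.get(new Key("India2"))); // Delhi2
    }
}
```

Both colliding entries stay retrievable because equals() tells them apart inside the shared bucket.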
Sidenotes:
The hash function that you've provided in the example is a bad one. Sure, it's clear that you did that on purpose (to ensure that the two keys collide). But it's important to point out that a proper implementation of the equals/hashCode contract is crucial for every object that is intended to be used with the Collections framework. And for hash-based collections like HashMap and HashSet, the quality of hashCode() determines how well they perform. If the hash function generates a lot of collisions, many entries end up in the same few buckets while a large number of buckets remain unoccupied. Resizing does not fix this: the load factor compares the number of entries (not the number of occupied buckets) with the capacity, so the table still grows, but keys with identical hashes stay together in one bucket and performance degrades.
Another thing related to hashes that is worth mentioning is that the hash of a stored key is computed only once, when its node is created. Take a look at the code above, and you'll notice that the field hash is marked as final. That's done on purpose, because computing the hash of some objects can be costly. It also means that if the state of the key changes, the HashMap will be unaware of that change and will be unable to recognize the same key: if the previous and the new hash differ, equals() will not be invoked and the HashMap will allow the same key to appear twice. That leads to another rule of thumb: an object that is used as a key should be immutable.
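A small sketch of what goes wrong with a mutable key. The Name class is hypothetical and exists only for this illustration; its hash code follows its single mutable field:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class MutableKeyDemo {
    // Hypothetical mutable key: equals and hashCode both depend on the mutable field.
    static final class Name {
        String value;
        Name(String value) { this.value = value; }
        @Override public int hashCode() { return Objects.hashCode(value); }
        @Override public boolean equals(Object o) {
            return o instanceof Name && Objects.equals(((Name) o).value, value);
        }
    }

    // The map stored the hash of "a"; after the change the lookup uses the hash of "b".
    static boolean foundAfterMutation() {
        Map<Name, String> map = new HashMap<>();
        Name key = new Name("a");
        map.put(key, "first");
        key.value = "b";
        return map.containsKey(key);
    }

    // Putting the very same object again creates a second entry instead of replacing the first.
    static int sizeAfterSecondPut() {
        Map<Name, String> map = new HashMap<>();
        Name key = new Name("a");
        map.put(key, "first");
        key.value = "b";
        map.put(key, "second");
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false
        System.out.println(sizeAfterSecondPut()); // 2
    }
}
```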
I was just wondering, when having a HashMap<HashMap<Integer, Integer>, String> and I add as key a new HashMap, does it get treated as a duplicate key or we have a call by reference and the value is not looked at at all?
Thanks :)
@Stephen has given a very good conceptual explanation.
So, just in brief:
When we "put" something in a HashMap, it is stored internally in a "table", which is an array of "Entry" objects.
The position (bucketIndex) in that table is decided based on the hash code of the key to be stored. Before storing, HashMap checks whether any Entry is already present at that bucketIndex. This is possible when two keys have the same hash code. If there is an Entry, HashMap further checks whether the new key is identical to the key already present (to find out, it calls equals on that key). If yes, it considers the key a duplicate and updates the value in the existing Entry. If no, it adds another entry at the same bucketIndex.
When "getting" a value from the map by passing a key, a similar process runs. First the hash code of the key is computed, and from it the position (bucketIndex) of the table to look at is determined. If there is nothing at that index, null is returned. Otherwise...
HashMap goes to that index, where there may be more than one Entry, because more than one object can have the same hash code. HashMap then calls the equals method to check whether a key present in the table is the same as the key passed in for retrieval. If equals returns true, we get the desired value.
So, in general, a key is considered a duplicate if and only if hashCode() returns the same value for both keys and equals() returns true.
Now coming to your question, "I add as key a new HashMap, does it get treated as a duplicate key": the answer is yes if and only if your new HashMap has exactly the same entries, i.e. key/value pairs, as an existing HashMap key.
Because:
If you have a look at the hashCode() implementation of HashMap, you will see that its hash code is calculated from the keys and values. So if two HashMaps have the same set of entries, they have the same hashCode().
The equals() of HashMap checks whether the entries are identical. So again, if two HashMaps have the same set of entries, they are equal.
Now see the below code demonstrating this concept :
public static void main(String[] args) {
HashMap<Integer,Integer> keyMap=new HashMap<Integer, Integer>();
keyMap.put(2, 1020);
keyMap.put(3, 1352);
keyMap.put(23,1256);
System.out.println("hashcode keymap1:"+keyMap.hashCode());
HashMap<Integer,Integer> keyMap2=new HashMap<Integer, Integer>();
keyMap2.put(1, 100);
keyMap2.put(4, 152);
keyMap2.put(43,156);
System.out.println("hashcode keymap2:"+keyMap2.hashCode());
HashMap<HashMap<Integer,Integer>,String> mainMap=new HashMap<HashMap<Integer,Integer>,String>();
mainMap.put(keyMap, "1st value");
mainMap.put(keyMap2, "2ndvalue");
System.out.println(mainMap);
HashMap<Integer,Integer> keyMap3=new HashMap<Integer, Integer>();
keyMap3.put(23,1256);
keyMap3.put(3, 1352);
keyMap3.put(2, 1020);
System.out.println("hashcode keymap3:"+keyMap3.hashCode());
mainMap.put(keyMap3, "3rd value");
System.out.println(mainMap);
if(mainMap.containsKey(keyMap3))
System.out.println(" value retrieved is :"+mainMap.get(keyMap3));
else
System.out.println("key not found");
}
Here you can observe that keyMap and keyMap3 have the same set of key/value pairs and the same hash code. They are therefore duplicate keys, so the value stored for keyMap is replaced by the value of keyMap3.
The contract for the HashMap.equals(Object) method is:
"Compares the specified object with this map for equality. Returns true if the given object is also a map and the two maps represent the same mappings. More formally, two maps m1 and m2 represent the same mappings if m1.entrySet().equals(m2.entrySet())."
Now the standard behavior of a Map is to treat keys as the same if equals(Object) says they are equal.
So the answer to your question is that if you have
HashMap<Integer, Integer> k1 = // some map
HashMap<Integer, Integer> k2 = // another map
HashMap<HashMap<Integer, Integer>, String> map = // some
then using k1 and k2 as keys in map would give you one entry if k1.equals(k2) and two entries otherwise.
And given that k1 and k2 are maps, we determine if they are equal by comparing their respective sets of map entries.
This has two obvious problems:
If you change k1 or k2 while they are keys for entries in map, then you break a fundamental invariant for map. When that happens you will find that operations on map give incorrect results; e.g. map.get(k1) will give the wrong answer.
Whenever you do an operation involving a lookup on map, you will call HashMap.hashCode() for a key object. Calculating the hash code for the key entails calculating the hashcode for each and every key and value in the map HashMap<Integer, Integer>. That is expensive, especially since this HashMap.hashCode() does not (cannot) cache anything.
In short, using a HashMap as a key for another HashMap is a bad idea.
So, to answer your question:
I was just wondering, when having a HashMap<HashMap<Integer, Integer>, String> and I add as key a new HashMap, does it get treated as a duplicate key or we have a call by reference [a] and the value is not looked at at all?
It will not be a duplicate key [b] unless the respective keys are maps with the same set of entries. This is what HashMap.equals(Object) tests.
It is not compared as a reference; i.e. it is not compared with == semantics. The HashMap.equals(Object) method is used for comparisons.
[a] - Note that "call by reference" terminology is not applicable to this situation. Call by reference / call by value is about how parameters are passed when a method is called.
[b] - .... provided that you don't violate the invariant.
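The first problem can be sketched as below. The second method shows one possible mitigation, keying on a private unmodifiable copy; that mitigation is an assumption added for illustration and is not part of the answer above. All names are hypothetical:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class MapAsKeyDemo {
    // Mutating a map while it is a key makes its entry unreachable.
    static String lookupAfterMutation() {
        Map<Map<Integer, Integer>, String> map = new HashMap<>();
        Map<Integer, Integer> k1 = new HashMap<>();
        k1.put(1, 2);
        map.put(k1, "value");
        k1.put(3, 5); // changes k1.hashCode() while k1 is a key in map
        return map.get(k1);
    }

    // Possible mitigation (assumption): store an unmodifiable snapshot as the key.
    static String lookupWithSnapshot() {
        Map<Map<Integer, Integer>, String> map = new HashMap<>();
        Map<Integer, Integer> k1 = new HashMap<>();
        k1.put(1, 2);
        map.put(Collections.unmodifiableMap(new HashMap<>(k1)), "value");
        k1.put(3, 5); // the snapshot stored in map is unaffected

        Map<Integer, Integer> probe = new HashMap<>();
        probe.put(1, 2); // same entries as the snapshot, so it equals the stored key
        return map.get(probe);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation()); // null
        System.out.println(lookupWithSnapshot());  // value
    }
}
```

The snapshot does nothing about the cost of hashing a whole map on every lookup, so the conclusion that a map makes a poor key still stands.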
I want to maintain a list of objects such that each object in the list is unique. I also want to be able to retrieve them at some point. The objects number in the thousands and I can't modify their source to add a unique id. Also, hash codes are unreliable.
My approach was to utilize the key uniqueness of a map.
Say I maintain a map like:
HashMap<Object, Integer> uniqueObjectMap;
I will add each object to the map as a key and set a random int as its value. But how does Java determine whether the object is unique when used as a key?
Say,
List<Object> listOne = new ArrayList<>();
List<Object> listTwo = new ArrayList<>();
Object o = new Object();
listOne.add(o);
listTwo.add(o);
uniqueObjectMap.put(listOne.get(0), randomInt()); // --> line 1
uniqueObjectMap.put(listTwo.get(0), randomInt()); // --> line 2
Will line 2 give an unique key violation error since both are referring to the same object o ?
Edit
So will uniqueObjectMap.containsKey(listTwo.get(0)) return true ? How are objects determined to be equal here ? Is a field by field comparison done ? Can I rely on this to make sure only one copy of ANY type of object is maintained in the map as key ?
Will line 2 give an unique key violation error since both are referring to the same object o ?
- No. If a key is found to be already present, then its value will be overwritten with the new one.
Next, HashMap has a separate hash() method which applies a supplemental hash function to a given hashCode (of key objects), which defends against poor quality hash functions.
It does so by calling the object's hashCode() method.
The default implementation is roughly equivalent to the object's unique identifier (much like a memory address); however, there are objects that are compare-by-value. For a compare-by-value object, hashCode() is overridden to compute a number based on the values, such that two identical values yield the same hashCode() number.
As for hash-based collections, the put(...) operation is fine with putting something over the original location. In short, if two objects yield the same hashCode() and a positive equals(...) result, then operations will assume that they are for all practical purposes the same object. Thus, put may replace the old with the new, or do nothing, as the object is considered the same.
It will not store two copies in the same "place", as it makes no sense to store two copies at the same location. Thus, sets will only contain one copy, as will map keys; however, lists may contain two copies, depending on how you added the second copy.
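A runnable version of the scenario from the question, as a minimal sketch. Fixed values 1 and 2 stand in for randomInt(), which the question does not define:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SameObjectKeyDemo {
    // Puts the same object twice, reached through two different lists.
    static Map<Object, Integer> build() {
        List<Object> listOne = new ArrayList<>();
        List<Object> listTwo = new ArrayList<>();
        Object o = new Object();
        listOne.add(o);
        listTwo.add(o);

        Map<Object, Integer> uniqueObjectMap = new HashMap<>();
        uniqueObjectMap.put(listOne.get(0), 1); // new key: put returns null
        uniqueObjectMap.put(listTwo.get(0), 2); // same key: no error, value 1 is replaced
        return uniqueObjectMap;
    }

    public static void main(String[] args) {
        Map<Object, Integer> m = build();
        System.out.println(m.size());   // 1
        System.out.println(m.values()); // [2]
    }
}
```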
How are objects determined to be equal here ?
By using the equals() and hashCode() methods of the key, which are inherited from the Object class unless the key's class overrides them.
Is a field by field comparison done ?
No. If you don't implement equals and hashCode, Java will compare the references of your objects.
Can I rely on this to make sure only one copy of ANY type of object is maintained in the map as key ?
No.
Using a Set is a better approach than using a Map because it removes duplicates on its own, but in this case it won't work either, because a Set determines duplicates the same way a Map does with its keys.
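The following sketch contrasts the three cases. The Plain class is hypothetical. The last method uses an identity-based set built from java.util.IdentityHashMap, which compares with == only; it is mentioned here as one option to consider when uniqueness by reference is what is wanted, not as something the answers above propose:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

class UniquenessDemo {
    // Hypothetical class that does not override equals/hashCode.
    static final class Plain {
        final int id;
        Plain(int id) { this.id = id; }
    }

    // Two Plain instances with equal fields are still two different elements.
    static int plainObjectsInHashSet() {
        Set<Plain> set = new HashSet<>();
        set.add(new Plain(1));
        set.add(new Plain(1));
        return set.size();
    }

    // String overrides equals/hashCode, so equal content collapses to one element.
    static int equalStringsInHashSet() {
        Set<String> set = new HashSet<>();
        set.add(new String("x"));
        set.add(new String("x"));
        return set.size();
    }

    // An identity-based set ignores equals/hashCode overrides and compares references.
    static int equalStringsInIdentitySet() {
        Set<String> set = Collections.newSetFromMap(new IdentityHashMap<>());
        String a = new String("x");
        String b = new String("x");
        set.add(a);
        set.add(b);
        set.add(a); // same reference as before, not added again
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(plainObjectsInHashSet());     // 2
        System.out.println(equalStringsInHashSet());     // 1
        System.out.println(equalStringsInIdentitySet()); // 2
    }
}
```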
If both refer to the same object, no error is thrown: when a HashMap is given a key it already contains, the related value is overwritten.
If the same key already exists in the HashMap, its value will be overwritten.
If you want to check whether a key or a value already exists, you can use:
containsKey() and containsValue().
Example:
hashMap.containsKey(0);
This will return true if the key 0 already exists, otherwise false.
By getting hashcode value using hash(key.hashCode())
HashMap has an inner class Entry with attributes
final K key;
V value;
Entry<K ,V> next;
final int hash;
The hash value is used to calculate the index in the array at which the Entry object is stored; it can happen that two unequal objects have the same hash value.
Entry objects are stored in linked-list form. In case of a collision, all Entry objects that map to the same bucket are stored in the same linked list, and the equals method tests for true equality. In this way, HashMap ensures the uniqueness of keys.
I know this has been asked many times, but I can't find an exact answer to my question.
In chapter 3 of Effective Java, there is a scenario that shows and explains why hashCode should be overridden together with the equals method. I get most of it, but there is a part that I can't understand.
There is a class there that overrides the equals method but not the hashCode method. An object of it is put as a key in a map:
Map<PhoneNumber, String> m = new HashMap<PhoneNumber, String>();
m.put(new PhoneNumber(707, 867, 5309), "Jenny");
I understand that if we call get with another equal object (m.get(new PhoneNumber(707, 867, 5309))), it will return null simply because hashCode is not overridden to return equal values for equal objects (so it will search for the object in another bucket because of the different hashcode).
But according to my understanding, in that situation, there is no guarantee that the hashcodes of the two objects will always return distinct. What if they happen to return the same hashcode?
I think it is explained in this part
Even if the two instances happen to hash to the same bucket, the get
method will almost certainly return null, as HashMap has an
optimization that caches the hash code associated with each entry and
doesn’t bother checking for object equality if the hash codes don’t
match.
I just don't get the cache thing. Can someone explain this elaborately?
Also, I already did my home work, and found a related question
Influence of HashMap optimization that caches the hash code associated with each entry to its get method
But I'm not that satisfied with the answer accepted, also, the answerer says in the comment that
A hash code can be an arbitrary int, thus each hash code can't have
its own bucket. Consequently, some objects with different hash codes
end up in the same bucket.
With which I completely disagree. To my understanding, different hashcodes will never end up in the same bucket.
Take a look at how java.util.HashMap calculates a bucket number for a key by hashCode:
/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
return h & (length-1);
}
If hashtable length = 16 then both 128 and 256 will get in bucket #0. Hashtable is an array of entries:
Entry<K,V>[] table
...
class Entry<K,V> {
K key;
V value;
Entry<K,V> next;
int hash;
...
Entries may form a chain (a linked list). If bucket #0 (table[0]) is empty (null), the new entry is placed directly there; otherwise it is linked into the existing chain (at the head in the Java 6/7 code shown here, at the tail since Java 8).
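To make the bucket arithmetic concrete, here is a small sketch that reuses the indexFor formula quoted above. The surrounding class is made up for illustration, and the formula assumes the table length is a power of two:

```java
class BucketIndexDemo {
    // Same formula as the quoted indexFor: keeps only the low bits of the hash.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(128, 16)); // 0
        System.out.println(indexFor(256, 16)); // 0, different hash code but same bucket as 128
        System.out.println(indexFor(1, 16));   // 1
        System.out.println(indexFor(17, 16));  // 1, different hash code but same bucket as 1
    }
}
```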
When it says "Even if the two instances happen to hash to the same bucket", it doesn't mean that they have the same hashcode. Even different hashcodes can map to the same bucket [read about hashing].
So even if the keys hash to the same bucket, .equals may not be invoked (due to the caching optimization) for the relevant element (since not even the hash codes match). Thus, even if the relevant element resides in the same bucket, it may never be compared through .equals, and thus not "found".
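The optimization can be observed with a small experiment. This is a sketch under stated assumptions: the Phone class is hypothetical, its explicit fakeIdentityHash field stands in for the per-instance default Object.hashCode() so that the run is deterministic, and the behaviour relies on how OpenJDK 8+ implements HashMap (default table of 16 buckets, stored hash compared before equals):

```java
import java.util.HashMap;
import java.util.Map;

class CachedHashDemo {
    static int equalsCalls = 0; // counts how often equals is invoked

    // Hypothetical class: equals is overridden, but hash codes differ per instance.
    static final class Phone {
        final int number;
        final int fakeIdentityHash; // plays the role of the default Object.hashCode()
        Phone(int number, int fakeIdentityHash) {
            this.number = number;
            this.fakeIdentityHash = fakeIdentityHash;
        }
        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Phone && ((Phone) o).number == number;
        }
        @Override public int hashCode() { return fakeIdentityHash; }
    }

    static String lookup() {
        equalsCalls = 0;
        Map<Phone, String> m = new HashMap<>();
        m.put(new Phone(8675309, 1), "Jenny");
        // Hashes 1 and 17 differ, yet 1 & 15 == 17 & 15, so both keys select bucket 1.
        return m.get(new Phone(8675309, 17));
    }

    public static void main(String[] args) {
        System.out.println(lookup());    // null
        System.out.println(equalsCalls); // 0
    }
}
```

The equal key sits in the bucket that is searched, but because the stored hash (1) differs from the lookup hash (17), equals is never consulted and the lookup misses.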
In Java, I understand that if two keys map to one value, linear chaining occurs due to collision.
For Example:
 Map myMap= new HashMap(); //Lets says both of them get mapped to same bucket-A and
myMap.put("John", "Sydney");//linear chaining has occured.
myMap.put("Mary","Mumbai"); //{key1=John}--->[val1=Sydney]--->[val2=Mumbai]
So when I do:
myMap.get("John"); // or myMap.get("Mary")
What does the JVM return since bucket-A contains two values?
Does it return the ref to "chain"? Does it return "Sydney"? Or does it return "Mumbai"?
Linear chaining happens when your keys land in the same bucket (for example because they have the same hashcode), not when two keys map to one value.
So when I do: myMap.get("John"); // or myMap.get("Mary")
map.get("John") gives you Sydney
map.get("Mary") gives you Mumbai
What does the JVM return since bucket-A contains two values?
If the same bucket contains two values, then the equals method of the key is used to determine the correct value to return.
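"John" and "Mary" do not necessarily share a bucket, so the sketch below swaps in the keys "Aa" and "BB", two strings known to have the same hashCode() (2112), to force a real collision. The class name is made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

class StringCollisionDemo {
    static Map<String, String> build() {
        Map<String, String> myMap = new HashMap<>();
        myMap.put("Aa", "Sydney");
        myMap.put("BB", "Mumbai"); // same hash code as "Aa", so same bucket
        return myMap;
    }

    public static void main(String[] args) {
        Map<String, String> myMap = build();
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(myMap.get("Aa")); // Sydney
        System.out.println(myMap.get("BB")); // Mumbai
    }
}
```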
It is worthwhile mentioning the worst-case scenario of storing (K,V) pairs all having the same hashCode for Key. Your hashmap degrades to a linked list in that scenario.
The hashCode of your key determines what 'bucket' (aka list, aka 'linear chain') it will be put in. The equals method determines which object will actually be picked from the 'bucket' in the case of a collision. This is why it's important to properly implement both methods on all objects you intend to use as keys in any kind of hash map.
Your keys are different.
First some terminology
key: the first parameter in the put
value: the second parameter in the put
entry: an Object that holds both the key & the value
When you put into a HashMap the map will call hashCode() on the key and work out which hash bucket the entry needs to go into. If there is something in this bucket already, then a linked list of the entries in the bucket is formed.
When you get from a HashMap the map will call hashCode() on the key and work out which hash bucket to get the entry from. If there is more than one entry in the bucket, the map will walk along the linked list until it finds an entry with a key that equals() the key supplied.
A map will always return the Object tied to that key, the value from the entry. Map performance degrades rapidly if hashCode() returns the same (or similar) values for different keys.
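The put and get steps described above can be sketched as a toy map. This is a deliberately simplified illustration with a fixed number of buckets and no resizing, not how java.util.HashMap is actually written; every name in it is hypothetical:

```java
import java.util.Objects;

class TinyChainedMap<K, V> {
    // One link in a bucket's chain: holds the key, the value and the next entry.
    private static final class Entry<K, V> {
        final K key;
        V value;
        final Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] buckets = (Entry<K, V>[]) new Entry[16];

    // hashCode() picks the bucket; the mask keeps the index within the array.
    private int indexOf(Object key) {
        return (key == null ? 0 : key.hashCode()) & (buckets.length - 1);
    }

    // Replaces the value if an equal key is found in the chain, otherwise adds a new entry.
    V put(K key, V value) {
        int i = indexOf(key);
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        buckets[i] = new Entry<>(key, value, buckets[i]);
        return null;
    }

    // Walks the chain of the selected bucket until equals() finds the key.
    V get(Object key) {
        for (Entry<K, V> e = buckets[indexOf(key)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

For example, after put("Aa", "Sydney") and put("BB", "Mumbai"), two keys with the same hash code, get("Aa") still returns "Sydney", because the chain walk compares keys with equals().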
You should use Java generics, so your code should really read
Map<String, String> myMap = new HashMap<String, String>();
This will tell the map that you want it to store String keys and values.
From my understanding, the Map first resolves the correct bucket (identified by the hashcode of the key). If there's more than one key in the same bucket, the equals method is used to find the right value in the bucket.
Looking at your example, what confuses you is that you think values are chained for a given key. In fact, Map.Entry objects are chained for a given bucket. The hashCode of the key gives you the bucket, then you look at the chained entries to find the one with the equal key.