I am trying to understand the implementation of Hashtable in Java. Below is my code:
Hashtable<Integer, String> hTab = new Hashtable<Integer, String>();
hTab.put(1, "A");
hTab.put(1, "B");
hTab.put(2, "C");
hTab.put(3, "D");
Iterator<Map.Entry<Integer, String>> itr = hTab.entrySet().iterator();
Entry<Integer, String> entry;
while(itr.hasNext()){
entry = itr.next();
System.out.println(entry.getValue());
}
When I run it, I get the below output:
D
C
B
This means there has been a collision for key = 1. As I understand the implementation:
"Whenever a collision happens in the Hashtable, a new node is created in the linked list for that bucket, the (key, value) entries are stored as nodes in the list, and the new entry is inserted at the beginning of the list for that bucket." I completely agree with this implementation.
But if this is true, where did "A" go when I retrieve the entry set from the Hashtable?
I then tried the code below, implementing my own hashCode and equals methods, to understand this. Surprisingly, it works perfectly and matches the Hashtable implementation described above. Below is my code:
public class Hash {
private int key;
public Hash(int key){
this.key = key;
}
public int hashCode(){
return key;
}
public boolean equals(Hash o){
return this.key == o.key;
}
}
public class HashTable1 {
public static void main(String[] args) {
// TODO Auto-generated method stub
Hashtable<Hash, String> hTab = new Hashtable<Hash, String>();
hTab.put(new Hash(1), "A");
hTab.put(new Hash(1), "B");
hTab.put(new Hash(2), "C");
hTab.put(new Hash(3), "D");
Iterator<Map.Entry<Hash, String>> itr = hTab.entrySet().iterator();
Entry<Hash, String> entry;
while(itr.hasNext()){
entry = itr.next();
System.out.println(entry.getValue());
}
}
}
Output :
D
C
B
A
Which is perfect. I am not able to understand this ambiguity in the behavior of HashTable in Java.
Update
@garrytan and @Brian: thanks for responding. But I still have a small doubt.
In my second example, which works fine, I created two objects as new keys, and since they are two distinct objects, no key collision happens. I agree with your explanation. However, if in the first example I use "new Integer(1)" instead of simply "1", it still doesn't work, although I am now creating two objects and they should be different. I cross-checked by writing the simple lines below:
Integer int1 = new Integer(1);
Integer int2 = new Integer(1);
System.out.println(int1 == int2);
which prints "false". That means the key collision should have been resolved, but it still doesn't work. Why is this?
By design hashtable is not meant to store duplicate keys.
I think you are mixing up 'hash collision' and 'key collision'. Put simply, a hash table consists of a collection of linked lists (i.e. buckets). When you add a new key-value pair (KVP), it is placed into a bucket chosen by the key's hash value. A 'hash collision' happens when two different keys produce the same hash (hence they are put into the same bucket). A 'key collision' happens when two keys are equal, in which case the new value replaces the old one.
A good hash function distributes the keys evenly across the buckets, which improves key lookup performance.
The second example gives the behaviour you want because your implementation of equals is incorrect.
The signature is
public boolean equals(Object o) {}
not
public boolean equals(Hash h) {}
So what you have created is a hash collision, where two objects have the same hash code (key) but are not equal according to the equals method (because your signature is wrong, the map still uses Object's equals, which compares with ==, and not your this.key == h.key code). This is as opposed to a key collision, where both objects have the same hashCode and are also equal, as in your first example. If you fix the second example to implement the actual equals(Object o) method, you will see that 'A' is again missing from the values.
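To make the point above concrete, here is a minimal sketch of the second example with equals(Object) actually overridden. The class name FixedEqualsDemo and the helper build() are made up for illustration; only Hash comes from the question.

```java
import java.util.Hashtable;
import java.util.Map;

class FixedEqualsDemo {
    // Same key class as in the question, but equals now overrides Object.equals.
    static final class Hash {
        private final int key;

        Hash(int key) {
            this.key = key;
        }

        @Override
        public int hashCode() {
            return key;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Hash && this.key == ((Hash) o).key;
        }
    }

    // Hypothetical helper: repeats the four put calls from the question.
    static Map<Hash, String> build() {
        Map<Hash, String> hTab = new Hashtable<>();
        hTab.put(new Hash(1), "A");
        hTab.put(new Hash(1), "B"); // equal key: replaces "A"
        hTab.put(new Hash(2), "C");
        hTab.put(new Hash(3), "D");
        return hTab;
    }

    public static void main(String[] args) {
        Map<Hash, String> hTab = build();
        System.out.println(hTab.size());             // 3
        System.out.println(hTab.get(new Hash(1)));   // B
        System.out.println(hTab.containsValue("A")); // false
    }
}
```

With @Override in place the compiler would also have rejected the equals(Hash) signature, which is a cheap way to catch this mistake.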
In your second example you are not overriding the original equals function because you use the following signature:
public boolean equals(Hash h) {}
Thus the original equals function with Object as a parameter is still used, and since you create a new Hash object for each insert, each object is different from the others, so your keys for A and B are not equal.
Furthermore, a Hashtable is designed to hold ONE value for EACH key, and keys are compared using the equals function.
About your example with two new Integers, try comparing them with .equals(). You could also override the hashCode function to generate different hash codes for each object, e.g. depending on time, but that would not be a good coding practice. Objects that are equal should hash to the same code.
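As a rough illustration of the Integer case, the sketch below compares two separately obtained Integer objects with equals() and hashCode(), which is what the table uses. It uses Integer.valueOf(1000) rather than new Integer(1) only because the constructor is deprecated in recent Java versions; the class name IntegerKeyDemo and the helper build() are assumptions for this example.

```java
import java.util.Hashtable;
import java.util.Map;

class IntegerKeyDemo {
    // Hypothetical helper: puts two values under two Integer keys
    // that hold the same number.
    static Map<Integer, String> build() {
        Map<Integer, String> hTab = new Hashtable<>();
        Integer int1 = Integer.valueOf(1000);
        Integer int2 = Integer.valueOf(1000);
        hTab.put(int1, "A");
        hTab.put(int2, "B"); // int2.equals(int1), so this replaces "A"
        return hTab;
    }

    public static void main(String[] args) {
        Integer int1 = Integer.valueOf(1000);
        Integer int2 = Integer.valueOf(1000);
        System.out.println(int1.equals(int2));                  // true
        System.out.println(int1.hashCode() == int2.hashCode()); // true
        System.out.println(build());                            // {1000=B}
    }
}
```

Whether int1 == int2 is true or false does not matter to the table; it never compares keys by reference alone.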
Related
Is it bad practice to use mutable objects as Hashmap keys? What happens when you try to retrieve a value from a Hashmap using a key that has been modified enough to change its hashcode?
For example, given
class Key
{
int a; //mutable field
int b; //mutable field
public int hashCode() {
return foo(a, b);
}
// setters setA and setB omitted for brevity
}
with code
HashMap<Key, Value> map = new HashMap<Key, Value>();
Key key1 = new Key(0, 0);
map.put(key1, value1); // value1 is an instance of Value
key1.setA(5);
key1.setB(10);
What happens if we now call map.get(key1)? Is this safe or advisable? Or is the behavior dependent on the language?
It has been noted by many well respected developers such as Brian Goetz and Josh Bloch that :
If an object’s hashCode() value can change based on its state, then we
must be careful when using such objects as keys in hash-based
collections to ensure that we don’t allow their state to change when
they are being used as hash keys. All hash-based collections assume
that an object’s hash value does not change while it is in use as a
key in the collection. If a key’s hash code were to change while it
was in a collection, some unpredictable and confusing consequences
could follow. This is usually not a problem in practice — it is not
common practice to use a mutable object like a List as a key in a
HashMap.
This is not safe or advisable. The value mapped to by key1 can never be retrieved. When doing a retrieval, most hash maps will do something like
Object get(Object key) {
int hash = key.hashCode();
//simplified, ignores hash collisions,
Entry entry = getEntry(hash);
if(entry != null && entry.getKey().equals(key)) {
return entry.getValue();
}
return null;
}
In this example, key1.hashCode() now points to the wrong bucket of the hash table, and you will not be able to retrieve value1 with key1.
If you had done something like,
Key key1 = new Key(0, 0);
map.put(key1, value1);
key1.setA(5);
Key key2 = new Key(0, 0);
map.get(key2);
This will also not retrieve value1, as key1 and key2 are no longer equal, so this check
if(entry != null && entry.getKey().equals(key))
will fail.
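Both failure cases above can be reproduced with a small sketch. It assumes a Key class whose hashCode() and equals() depend on both mutable fields (the question leaves foo unspecified, so Objects.hash stands in for it); MutableKeyDemo and run() are names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class MutableKeyDemo {
    static final class Key {
        private int a; // mutable field
        private int b; // mutable field

        Key(int a, int b) {
            this.a = a;
            this.b = b;
        }

        void setA(int a) { this.a = a; }
        void setB(int b) { this.b = b; }

        @Override
        public int hashCode() {
            return Objects.hash(a, b); // stand-in for foo(a, b)
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && a == ((Key) o).a && b == ((Key) o).b;
        }
    }

    // Returns: lookup before mutation, lookup with the mutated key,
    // lookup with a fresh Key(0, 0), and the map size.
    static String[] run() {
        Map<Key, String> map = new HashMap<>();
        Key key1 = new Key(0, 0);
        map.put(key1, "value1");
        String before = map.get(key1);

        key1.setA(5);
        key1.setB(10);
        String withMutatedKey = map.get(key1);      // hash no longer matches
        String withFreshKey = map.get(new Key(0, 0)); // hash matches, equals fails

        return new String[] {before, withMutatedKey, withFreshKey,
                String.valueOf(map.size())};
    }

    public static void main(String[] args) {
        for (String s : run()) {
            System.out.println(s); // value1, null, null, 1
        }
    }
}
```

The entry is still counted in size(), it is just unreachable through either key.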
Hash maps use hash code and equality comparisons to identify the key-value pair for a given key. If the hash map keeps the key as a reference to the mutable object, it would work in the cases where the same instance is used to retrieve the value (as long as the mutation does not change its hash code). Consider, however, the following case:
T keyOne = ...;
T keyTwo = ...;
// At this point keyOne and keyTwo are different instances and
// keyOne.equals(keyTwo) is true.
HashMap myMap = new HashMap();
myMap.put(keyOne, "Hello");
String s1 = (String) myMap.get(keyOne); // s1 is "Hello"
String s2 = (String) myMap.get(keyTwo); // s2 is "Hello"
// because keyOne equals keyTwo
mutate(keyOne);
s1 = (String) myMap.get(keyOne); // returns "Hello" if keyOne's hash code is unchanged
s2 = (String) myMap.get(keyTwo); // not found
The above is true if the key is stored as a reference. In Java usually this is the case. In .NET for instance, if the key is a value type (always passed by value), the result will be different:
T keyOne = ...;
T keyTwo = ...;
// At this point keyOne and keyTwo are different instances
// and keyOne.equals(keyTwo) is true.
Dictionary myMap = new Dictionary();
myMap.Add(keyOne, "Hello");
String s1 = (String) myMap[keyOne]; // s1 is "Hello"
String s2 = (String) myMap[keyTwo]; // s2 is "Hello"
// because keyOne equals keyTwo
mutate(keyOne);
s1 = myMap[keyOne]; // not found
s2 = myMap[keyTwo]; // returns "Hello"
Other technologies might behave differently. However, almost all of them end up in a situation where the result of using mutable keys is not deterministic, which is a very bad situation in an application: hard to debug and even harder to understand.
If key’s hash code changes after the key-value pair (Entry) is stored in HashMap, the map will not be able to retrieve the Entry.
A key’s hash code can change if the key object is mutable. Mutable keys in a HashMap can result in data loss.
This will not work. You are changing the key value, so you are basically throwing it away. It's like creating a real-life key and lock, then changing the key and trying to put it back in the lock.
As others explained, it is dangerous.
A way to avoid that is to have a const field giving explicitly the hash in your mutable objects (so you would hash on their "identity", not their "state"). You might even initialize that hash field more or less randomly.
Another trick would be to use the address, e.g. (intptr_t) reinterpret_cast<void*>(this) as a basis for hash.
In all cases, you have to give up hashing the changing state of the object.
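In Java, the "hash on identity, not state" idea might look like the sketch below. The Account class, its id counter and the method lookupAfterMutation() are all hypothetical; the only point is that hashCode() and equals() read a final field that never changes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class IdentityKeyDemo {
    static final class Account {
        private static final AtomicInteger NEXT_ID = new AtomicInteger();

        private final int id = NEXT_ID.getAndIncrement(); // fixed identity
        private String owner;                             // mutable state, never hashed

        Account(String owner) {
            this.owner = owner;
        }

        void setOwner(String owner) {
            this.owner = owner;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Account && id == ((Account) o).id;
        }
    }

    // Mutates the key after insertion and looks it up again.
    static String lookupAfterMutation() {
        Map<Account, String> map = new HashMap<>();
        Account acc = new Account("alice");
        map.put(acc, "gold");
        acc.setOwner("bob"); // does not affect hashCode() or equals()
        return map.get(acc);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation()); // gold
    }
}
```

Simply not overriding hashCode() and equals() gives a similar effect, since the defaults inherited from Object are identity-based.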
There are two very different issues that can arise with a mutable key depending on your expectation of behavior.
First Problem: (probably most trivial--but hell it gave me problems that I didn't think about!)
You are attempting to place key-value pairs into a map by updating and modifying the same key object. You might do something like Map<Integer, String> and simply say:
int key = 0;
loop {
map.put(key++, newString);
}
I'm reusing the "object" key to create a map. This works fine in Java because of autoboxing where each new value of key gets autoboxed to a new Integer object. What would not work is if I created my own (mutable) Integer object:
MyInteger {
int value;
plusOne(){
value++;
}
}
Then tried the same approach:
MyInteger key = new MyInteger(0);
loop{
map.put(key.plusOne(), newString)
}
My expectation is that, for instance, I map 0 -> "a" and 1 -> "b". In the first example, if I set int key = 0 again, the map will (correctly) give me "a". For simplicity, let's assume MyInteger always returns the same hashCode() (if you can somehow manage to create unique hashCode values for all possible states of an object, this will not be an issue, and you deserve an award). In this case, I put 0 -> "a", so now the map holds my key and maps it to "a". I then modify key = 1 and try to put 1 -> "b". We have a problem! The hashCode() is the same, and the only key in the HashMap is my MyInteger key object, which has just been modified to be equal to 1, so it overwrites that key's value. Now, instead of a map with 0 -> "a" and 1 -> "b", I have 1 -> "b" only! Even worse, if I change back to key = 0, the hashCode points to 1 -> "b", and since the HashMap's only key is my key object, it satisfies the equality check and returns "b", not "a" as expected.
If, like me, you fall prey to this type of issue, it's incredibly difficult to diagnose. Why? Because if you have a decent hashCode() function it will generate (mostly) unique values. The hash value will largely take care of the inequality problem when structuring the map but if you have enough values, eventually you'll get a collision on the hash value and then you get unexpected and largely inexplicable results. The resultant behavior is that it works for small runs but fails for larger ones.
Advice:
To find this type of issue, modify the hashCode() method, even trivially (i.e. = 0--obviously when doing this, keep in mind that the hash values should be the same for two equal objects*), and see if you get the same results--because you should and if you don't, there's likely a semantic error with your implementation that's using a hash table.
*There should be no danger (if there is--you have a semantic problem) in always returning 0 from a hashCode() (although it would defeat the purpose of a Hash Table). But that's sort of the point: the hashCode is a "quick and easy" equality measure that's not exact. So two very different objects could have the same hashCode() yet not be equal. On the other hand, two equal objects must always have the same hashCode() value.
p.s. In Java, from my understanding, if you do such a terrible thing (as have many hashCode() collisions), a bucket will start using a red-black tree instead of a linked list. So where you expect O(1) lookup, you'll get O(log(n)), which is still better than the O(n) a list would give.
Second Problem:
This is the one that most others seem to be focusing on, so I'll try to be brief. In this use case, I try to map a key-value pair and then I do some work on the key and then want to come back and get my value.
Expectation: key -> value is mapped, I then modify key and try to get(key). I expect that will give me value.
It seems kind of obvious to me that this wouldn't work but I'm not above having tried to use things like Collections as a key before (and quite quickly realizing it doesn't work). It doesn't work because it's quite likely that the hash value of key has changed so you won't even be looking in the correct bucket.
This is why it's very inadvisable to use collections as keys. I would assume, if you were doing this, you're trying to establish a many-to-one relationship. So I have a class (as in teaching) and I want two groups to do two different projects. What I want is that given a group, what is their project? Simple, I divide the class in two, and I have group1 -> project1 and group2 -> project2. But wait! A new student arrives so I place them in group1. The problem is that group1 has now been modified and likely its hash value has changed, therefore trying to do get(group1) is likely to fail because it will look in a wrong or non-existent bucket of the HashMap.
The obvious solution to the above is to chain things--instead of using the groups as keys, give them labels (that don't change) that point to the group and therefore the project: g1 -> group1 and g1 -> project1, etc.
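A small sketch of the group example, under the assumption that groups are plain lists of student names. The names GroupLabelDemo, lookupByGroup() and lookupByLabel() are invented here; the first method keys the map by the mutable group, the second by a fixed label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupLabelDemo {
    // Uses the mutable group itself as the key: the lookup breaks after a change.
    static String lookupByGroup() {
        Map<List<String>, String> projectByGroup = new HashMap<>();
        List<String> group1 = new ArrayList<>(Arrays.asList("ann", "ben"));
        projectByGroup.put(group1, "project1");
        group1.add("cat"); // a new student arrives, group1.hashCode() changes
        return projectByGroup.get(group1);
    }

    // Uses a fixed label as the key for both the group and the project.
    static String lookupByLabel() {
        Map<String, List<String>> groupByLabel = new HashMap<>();
        Map<String, String> projectByLabel = new HashMap<>();
        groupByLabel.put("g1", new ArrayList<>(Arrays.asList("ann", "ben")));
        projectByLabel.put("g1", "project1");
        groupByLabel.get("g1").add("cat"); // the label "g1" is unaffected
        return projectByLabel.get("g1");
    }

    public static void main(String[] args) {
        System.out.println(lookupByGroup()); // null
        System.out.println(lookupByLabel()); // project1
    }
}
```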
p.s.
Please make sure to define hashCode() and equals(...) methods for any object you expect to use as a key (Eclipse and, I'm assuming, most IDEs can generate these for you).
Code Example:
Here is a class which exhibits the two different "problem" behaviors. In this case, I attempt to map 0 -> "a", 1 -> "b", and 2 -> "c" (in each case). In the first problem, I do that by modifying the same object, in the second problem, I use unique objects, and in the second problem "fixed" I clone those unique objects. After that I take one of the "unique" keys (k0) and modify it to attempt to access the map. I expect this will give me a, b, c and null when the key is 3.
However, what happens is the following:
map.get(0) map1: 0 -> null, map2: 0 -> a, map3: 0 -> a
map.get(1) map1: 1 -> null, map2: 1 -> b, map3: 1 -> b
map.get(2) map1: 2 -> c, map2: 2 -> a, map3: 2 -> c
map.get(3) map1: 3 -> null, map2: 3 -> null, map3: 3 -> null
The first map ("first problem") fails because it only holds a single key, which was last updated and placed to equal 2, hence why it correctly returns "c" when k0 = 2 but returns null for the other two (the single key doesn't equal 0 or 1). The second map fails twice: the most obvious is that it returns "b" when I asked for k0 (because it's been modified--that's the "second problem" which seems kind of obvious when you do something like this). It fails a second time when it returns "a" after modifying k0 = 2 (which I would expect to be "c"). This is more due to the "first problem": there's a hash code collision and the tiebreaker is an equality check--but the map holds k0, which it (apparently for me--could theoretically be different for someone else) checked first and thus returned the first value, "a" even though had it kept checking, "c" would have also been a match. Finally, the 3rd map works perfectly because I'm enforcing that the map holds unique keys no matter what else I do (by cloning the object during insertion).
I want to make clear that I agree, cloning is not a solution! I simply added that as an example of why a map needs unique keys and how enforcing unique keys "fixes" the issue.
import java.util.HashMap;
import java.util.stream.IntStream;
public class HashMapProblems {
private int value = 0;
public HashMapProblems() {
this(0);
}
public HashMapProblems(final int value) {
super();
this.value = value;
}
public void setValue(final int i) {
this.value = i;
}
@Override
public int hashCode() {
return value % 2;
}
@Override
public boolean equals(final Object o) {
return o instanceof HashMapProblems
&& value == ((HashMapProblems) o).value;
}
@Override
public Object clone() {
return new HashMapProblems(value);
}
public void reset() {
this.value = 0;
}
public static void main(String[] args) {
final HashMapProblems k0 = new HashMapProblems(0);
final HashMapProblems k1 = new HashMapProblems(1);
final HashMapProblems k2 = new HashMapProblems(2);
final HashMapProblems k = new HashMapProblems();
final HashMap<HashMapProblems, String> map1 = firstProblem(k);
final HashMap<HashMapProblems, String> map2 = secondProblem(k0, k1, k2);
final HashMap<HashMapProblems, String> map3 = secondProblemFixed(k0, k1, k2);
for (int i = 0; i < 4; ++i) {
k0.setValue(i);
System.out.printf(
"map.get(%d) map1: %d -> %s, map2: %d -> %s, map3: %d -> %s",
i, i, map1.get(k0), i, map2.get(k0), i, map3.get(k0));
System.out.println();
}
}
private static HashMap<HashMapProblems, String> firstProblem(
final HashMapProblems start) {
start.reset();
final HashMap<HashMapProblems, String> map = new HashMap<>();
map.put(start, "a");
start.setValue(1);
map.put(start, "b");
start.setValue(2);
map.put(start, "c");
return map;
}
private static HashMap<HashMapProblems, String> secondProblem(
final HashMapProblems... keys) {
final HashMap<HashMapProblems, String> map = new HashMap<>();
IntStream.range(0, keys.length).forEach(
index -> map.put(keys[index], "" + (char) ('a' + index)));
return map;
}
private static HashMap<HashMapProblems, String> secondProblemFixed(
final HashMapProblems... keys) {
final HashMap<HashMapProblems, String> map = new HashMap<>();
IntStream.range(0, keys.length)
.forEach(index -> map.put((HashMapProblems) keys[index].clone(),
"" + (char) ('a' + index)));
return map;
}
}
Some Notes:
In the above it should be noted that map1 only holds two values because of the way I set up the hashCode() function to split odds and evens. k = 0 and k = 2 therefore have the same hashCode of 0. So when I modify k = 2 and attempt to k -> "c" the mapping k -> "a" gets overwritten--k -> "b" is still there because it exists in a different bucket.
Also there are a lot of different ways to examine the maps in the above code and I would encourage people that are curious to do things like print out the values of the map and then the key to value mappings (you may be surprised by the results you get). Do things like play with changing the different "unique" keys (i.e. k0, k1, and k2), try changing the single key k. You could also see how even the secondProblemFixed isn't actually fixed because you could also gain access to the keys (for instance via Map::keySet) and modify them.
I won't repeat what others have said. Yes, it's inadvisable. But in my opinion, it's not overly obvious where the documentation states this.
You can find it on the JavaDoc for the Map interface:
Note: great care must be exercised if mutable objects are used as map
keys. The behavior of a map is not specified if the value of an object
is changed in a manner that affects equals comparisons while the
object is a key in the map
The behaviour of a Map is not specified if the value of an object is changed in a manner that affects equals comparisons while the (mutable) object is a key. Using a mutable object as an element of a Set is not a good idea either.
Let's see an example here:
public class MapKeyShouldntBeMutable {
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
Map<Employee,Integer> map=new HashMap<Employee,Integer>();
Employee e=new Employee();
Employee e1=new Employee();
Employee e2=new Employee();
Employee e3=new Employee();
Employee e4=new Employee();
e.setName("one");
e1.setName("one");
e2.setName("three");
e3.setName("four");
e4.setName("five");
map.put(e, 24);
map.put(e1, 25);
map.put(e2, 26);
map.put(e3, 27);
map.put(e4, 28);
e2.setName("one");
System.out.println(" is e equals e1 "+e.equals(e1));
System.out.println(map);
for(Employee s:map.keySet())
{
System.out.println("key : "+s.getName()+":value : "+map.get(s));
}
}
}
class Employee{
String name;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
@Override
public boolean equals(Object o){
Employee e=(Employee)o;
if(this.name.equalsIgnoreCase(e.getName()))
{
return true;
}
return false;
}
public int hashCode() {
int sum=0;
if(this.name!=null)
{
for(int i=0;i<this.name.toCharArray().length;i++)
{
sum=sum+(int)this.name.toCharArray()[i];
}
/*System.out.println("name :"+this.name+" code : "+sum);*/
}
return sum;
}
}
Here we are adding the mutable object "Employee" to a map as a key. It works well if all the keys added are distinct. I have overridden equals and hashCode for the Employee class.
First I added "e" and then "e1". For both of them equals() returns true and the hash code is the same, so the map treats them as the same key and replaces the old value with e1's value. Then we added e2, e3 and e4, and we are fine so far.
But when we change the value of an already added key, i.e. set "e2" to "one", it becomes equal to a key added earlier. Now the map behaves weirdly. Ideally e2 should replace the existing equal key, but the map keeps both. You will get this output:
is e equals e1 true
{Employee@1aa=28, Employee@1bc=27, Employee@142=25, Employee@142=26}
key : five:value : 28
key : four:value : 27
key : one:value : 25
key : one:value : 25
See that both keys named "one" show the same value, which is unexpected. Now run the same program again, changing e2.setName("one"); to e2.setName("diffnt");. The output will be:
is e equals e1 true
{Employee@1aa=28, Employee@1bc=27, Employee@142=25, Employee@27b=26}
key : five:value : 28
key : four:value : 27
key : one:value : 25
key : diffnt:value : null
So changing a mutable key after adding it to a map is not encouraged.
To make the answer compact:
The root cause is that HashMap calculates an internal hash from the key object's hashCode only once and stores it for its own needs.
All other operations for navigating data inside the map use this pre-calculated internal hash.
So if you change the hash code of the key object (by mutating it), the entry is still stored inside the map with the changed key object (you can even observe it via HashMap.keySet() and see the altered hash code).
But the HashMap's internal hash is of course not recalculated; it remains the old stored one, so the map cannot locate your data using the mutated key object's new hash code (e.g. via HashMap.get() or HashMap.containsKey()).
Your key-value pairs are still inside the map, but to get them back you need the old hash code value that was in effect when you put your data into the map.
Notice that you will also be unable to get the data back using the mutated key object taken directly from HashMap.keySet().
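The behaviour described above can be observed with a sketch like the following. The Name class and the StaleHashDemo.run() helper are assumptions made for the example; Name hashes on a single mutable String field.

```java
import java.util.HashMap;
import java.util.Map;

class StaleHashDemo {
    static final class Name {
        String value; // mutable on purpose

        Name(String value) {
            this.value = value;
        }

        @Override
        public int hashCode() {
            return value.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Name && value.equals(((Name) o).value);
        }
    }

    // Returns four observations, in order:
    // entry still stored, found by mutated key, found via keySet, found after restoring.
    static boolean[] run() {
        Map<Name, String> map = new HashMap<>();
        Name key = new Name("one");
        map.put(key, "data");

        key.value = "two"; // hashCode() changes, the stored hash does not

        boolean stillStored = map.size() == 1;
        boolean foundByMutatedKey = map.containsKey(key);

        boolean foundViaKeySet = false;
        for (Name k : map.keySet()) {
            if (map.get(k) != null) {
                foundViaKeySet = true;
            }
        }

        key.value = "one"; // back to the state the stored hash was computed from
        boolean foundAfterRestore = "data".equals(map.get(key));

        return new boolean[] {stillStored, foundByMutatedKey,
                foundViaKeySet, foundAfterRestore};
    }

    public static void main(String[] args) {
        for (boolean b : run()) {
            System.out.println(b); // true, false, false, true
        }
    }
}
```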
I have a class, Employee, let's say, and my hashCode function for this class is really bad (let's say it always returns a constant). My code looks like the following.
public class Employee {
private String name;
public Employee(String name) {
this.name = name;
}
@Override
public int hashCode() { return 1; }
@Override
public boolean equals(Object object) {
if(null == object || !(object instanceof Employee)) {
return false;
}
Employee other = (Employee)object;
return this.name.equals(other.name);
}
}
Let's say I want to use Employee as the key in a Map, and so I can do something like the following.
public static void main(String[] args) {
Map<Employee, Long> map = new HashMap<>();
for(int i=0; i < 1000; i++) {
map.put(new Employee("john"+i), 1L);
}
System.out.println(map.size());
}
How come when I run this code, I always get 1,000 as the size?
Using Employee as a key seems to be "good" in the following sense.
It is immutable
Two employees that are equals always generate the same hash code
What I expected was that since the output of hashCode is always 1, then map.size() should always be 1. But it is not. Why? If I have a Map<Integer,Integer>, and I do map.put(1, 1) followed by map.put(1, 2), I would only expect the size to be 1.
The equals method must somehow be coming into play here, but I'm not sure how.
Any pointers are appreciated.
Your loop
for(int i=0; i < 1000; i++) {
map.put(new Employee("john"+System.currentTimeMillis()), 1L);
}
executes within a couple of milliseconds, so System.currentTimeMillis() will be returning the same value for the vast majority of the iterations of your loop. So, several hundred of your johns will have the exact same name + number.
Then there is Java's Map, which does not have an add() method (which one would reasonably expect to throw an exception if the item already exists); instead it only has a put() method, which will either add or replace items without failing. So most of your johns get overwritten by subsequent johns, without any increase in the map size and without any exception being thrown to hint at what you are doing wrong.
Furthermore, you seem to be a bit confused as to exactly what the effect of a bad hashCode() function is on a map. A bad hashCode() simply results in collisions. Collisions in a hashmap do not cause items to be lost; they only cause the internal structure of the map to not be very efficient. Essentially, a constant hashCode() will result in a degenerate map which internally looks like a linked list. It will be inefficient both for insertions and for deletions, but no items will be lost due to that.
Items will be lost due to a bad equals() method, or due to overwriting them with newer items. (Which is the case in your code.)
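A quick way to see the difference between colliding and overwriting is the sketch below. It reuses the Employee class from the question (constant hashCode, equals on name); the wrapper class ConstantHashDemo and its two helper methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {
    static final class Employee {
        private final String name;

        Employee(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 1; // every key collides
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Employee && name.equals(((Employee) o).name);
        }
    }

    // 1000 keys that collide but are not equal: nothing is lost.
    static int sizeWithDistinctNames() {
        Map<Employee, Long> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(new Employee("john" + i), 1L);
        }
        return map.size();
    }

    // 1000 keys that are all equal: each put replaces the previous value.
    static int sizeWithSameName() {
        Map<Employee, Long> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(new Employee("john"), 1L);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithDistinctNames()); // 1000
        System.out.println(sizeWithSameName());      // 1
    }
}
```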
Mike's answer is right about what is causing this. But the real reason that it's happening is this:
In the put method of HashMap, it first compares the hash code of each entry in the bucket. If an entry's hash code equals the hash code of your new key, it then checks .equals(). If equals() returns true, it just replaces the existing value with the new one; otherwise it adds a new key-value pair. That's where it goes wrong: sometimes your equals() function returns true because of currentTimeMillis() and sometimes it doesn't, hence the different sizes every time.
Just pay attention to the equals in the code below (java HashMap).
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
If your hashcode is the same for every entry then your time complexity will be O(n) because the hashcode creates buckets to store your elements. If you only create a single bucket then you have to traverse the entire bucket to get your element.
If however, your hashcode is unique for every element then you will have a unique bucket and will only have to traverse a single element.
Bucket lookups (Hash) are O(1) so the better the hashcode the better the time complexity.
I think you have a misconception about what hash buckets in a HashMap are for.
When you put two objects which are not equal but have the same hashCode into a HashMap, both elements will be present in the HashMap, in the same hash bucket. An element is only overwritten when the HashMap already contains an element which has the same hashCode and is equal to it.
The hash buckets make the HashMap fast at lookup, because when searching for an element, only the elements in the hash bucket corresponding to the hashCode need to be considered. This is why it is generally a bad idea to write a hash function which is constant.
Your hashCode has to comply with certain requirements, e.g. equal objects must return equal hash codes.
But if your implementation is not solid, it will cause performance problems: if many of your objects have the same hash code, your lookups may simply become O(N) instead of O(1). In your case it is simply putting all the items in one list, so the size is 1000.
Apologies for the fairly naive question, but I believe my own answer to be naive. I think keys (in HashTables) are immutable because we wouldn't want to somehow accidentally alter a key and therefore mess with the sorting of the HashTable. Is this a correct explanation? If so, how can it be more correct?
During HashTable.put the key is hashed, and it and its value are stored in one of a number of buckets (which are lists of key-value pairs) based on the hash, for example something like:
bucket[key.hashcode() % numberOfBuckets].add(key, value)
If the key's hashcode changes after insertion it could then be in the wrong bucket and you would then not be able to find it and the hashtable would incorrectly return null on any get for that key.
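To illustrate the bucket idea, here is a toy chained hash table. It is a deliberately simplified sketch, not how java.util.Hashtable is actually written; ToyHashTable and all of its members are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class ToyHashTable {
    private static final class Entry {
        final Object key;
        String value;

        Entry(Object key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<List<Entry>> buckets = new ArrayList<>();

    ToyHashTable(int numberOfBuckets) {
        for (int i = 0; i < numberOfBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    private List<Entry> bucketFor(Object key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    void put(Object key, String value) {
        List<Entry> bucket = bucketFor(key);
        for (Entry e : bucket) {
            if (e.key.equals(key)) {
                e.value = value; // equal key: replace
                return;
            }
        }
        bucket.add(new Entry(key, value)); // new key: chain in the bucket
    }

    String get(Object key) {
        for (Entry e : bucketFor(key)) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    // Stores a value under a mutable List key, then mutates the key.
    static String[] demo() {
        ToyHashTable table = new ToyHashTable(8);
        List<Integer> key = new ArrayList<>();
        key.add(1);                     // hashCode 32  -> bucket 0
        table.put(key, "x");
        String before = table.get(key);
        key.add(2);                     // hashCode 994 -> bucket 2
        String after = table.get(key);  // looks in the wrong bucket
        return new String[] {before, after};
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0]); // x
        System.out.println(r[1]); // null
    }
}
```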
Aside: Understanding the inner workings of a hashtable helps you understand the importance of a good-quality hashCode function for your keys, as a poor hashCode function can result in a poor distribution of keys across buckets. And as buckets are just lists, this results in a lot of linear searches, which greatly reduces the effectiveness of the hashtable. For example, this terrible hashCode function puts everything in one bucket, so it's effectively just one list.
public int hashCode() { return 42; /*terrible hashCode example, don't use!*/ }
This is also one reason why prime numbers appear in good hashcode functions, e.g.:
public int hashCode() {
int hash = field1.hashCode();
hash = hash*31 + field2.hashCode(); //note the prime 31
hash = hash*31 + field3.hashCode();
return hash;
}
The general idea is correct, but not its details.
Keys in a HashTable need not be immutable, it's the result of a call to their hashCode() (and equals) method that needs to stay immutable and consistent (for the hash table to behave predictably, that is).
From a high-level point of view, it's because of the way hash tables work : when a (key, value) pair is inserted, the key's hashCode is used internally to figure out a "bucket" where the value will be put. And when the value is retrieved by key, the hashCode is computed once more, to find the bucket back.
Now if, at any point between the insertion and the retrieval, the result of calling hashCode changes, the "lookup" bucket will be different from the "insertion" bucket, and things will not behave predictably.
To sum up, given a Key object that looks like this (two internal Strings compose the object, but only one, partOfHashCode, is taken into account in hashCode / equals):
public static class Key {
private String partOfHashCode;
private String notPartOfHashCode;
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((partOfHashCode == null) ? 0 : partOfHashCode.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Key other = (Key) obj;
if (partOfHashCode == null) {
if (other.partOfHashCode != null)
return false;
} else if (!partOfHashCode.equals(other.partOfHashCode))
return false;
return true;
}
}
Then it is fine to use it this way:
public static void main(String[] args) {
Map<Key, String> myMap = new HashMap<>();
Key key = new Key();
key.partOfHashCode = "myHash";
myMap.put(key, "value");
key.notPartOfHashCode = "mutation of the key, but not of its hash/equals definition";
System.out.println(myMap.get(key));
}
(This logs the "value" object in the console).
But it is NOT fine to use it this way:
public static void main(String[] args) {
Map<Key, String> myMap = new HashMap<>();
Key key = new Key();
key.partOfHashCode = "myHash";
myMap.put(key, "value");
key.partOfHashCode = "mutation of the hashCode of the key";
System.out.println(myMap.get(key));
}
(This last example could log "null" in the console).
For more on this subject, you should also read up on hashCode / equals consistency.
There is no inherent guarantee in Java that HashTable-keys are immutable. It is not even guaranteed that their hashcode remains the same. But if you add keys that have a mutable hashCode you are in trouble. Assume you are inserting a key with a hashCode of 1. Then it is inserted in a hash-bucket corresponding to 1. Then alter the object to have a hashCode of 2 and call hashMap.get(key). While the object is still in the hashTable the system will look in the bucket corresponding to 2, but won't find it there. You won't even be able to remove the entry since it won't be found.
tl;dr For your application to work properly HashTable-keys need to have immutable hashcodes, but you have to take care of that fact for yourself.
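The scenario described above can be reproduced with a small sketch. The Key class below is hypothetical: its hash code is simply a mutable id field, so changing the field changes the hash code after insertion.

```java
import java.util.HashMap;
import java.util.Map;

class MutableKeyDemo {
    // Hypothetical key whose hash code depends on a mutable field.
    static final class Key {
        int id;

        Key(int id) { this.id = id; }

        @Override
        public int hashCode() { return id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();
        Key key = new Key(1);
        map.put(key, "value");

        key.id = 2; // the hash code changes while the key is in the map

        System.out.println(map.get(key));    // null
        System.out.println(map.remove(key)); // null
        System.out.println(map.size());      // 1
    }
}
```

The entry is still stored (the size stays 1), but neither get nor remove can reach it through the mutated key.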
Code:
public static void main(String[] args) {
Map<String,String> map= new HashMap<String,String>();
map.put("a", "s");
map.put("a", "v");
System.out.println(map.get("a"));
}
Now, as per my understanding, since the key values in both the put case is the same i.e. a, collision is bound to happen, and hence chaining occurs. [Correct me if I am wrong].
Now if I want to retrieve the list of all the values mapped to key value a, how do I get it?
Right now my println prints v only.
This has nothing to do with collision or chaining: you're replacing the old value of a with a new value.
A map keeps unique keys. Collision/chaining occurs in a hash data structure when two distinct keys happen to get the same hash value from the particular hash function. Or, in Java, you can deliberately create objects that return the same value from hashCode().
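For a concrete case of two distinct keys with the same hash value, the strings "Aa" and "BB" are a well-known pair: both have the hash code 2112. A minimal sketch showing that such a collision does not lose either entry:

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" are different strings with the same hash code.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println("Aa".equals("BB"));                  // false

        Map<String, String> map = new HashMap<>();
        map.put("Aa", "s");
        map.put("BB", "v");

        // Both entries survive, because the keys are not equal.
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // s
        System.out.println(map.get("BB")); // v
    }
}
```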
If you want mapping with multiple values for a key, then you'll need to use a different data structure/class.
As other people have already pointed out, there is no collision in your case.
The first value disappears simply because a HashMap only keeps unique keys.
However, there are alternatives if you want a key to map to several values, for example Google Guava's Multimap or the Apache Commons Collections MultiMap.
Example using Google lib:
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;
import java.util.Collection;

public class MultimapTest {
    public static void main(String... args) {
        Multimap<String, String> myMultimap = ArrayListMultimap.create();
        // Adding some key/value pairs
        myMultimap.put("Fruits", "Banana");
        myMultimap.put("Fruits", "Apple");
        myMultimap.put("Fruits", "Pear");
        myMultimap.put("Vegetables", "Carrot");
        // Getting the size (counts every key/value pair)
        int size = myMultimap.size();
        System.out.println(size); // 4
        // Getting values
        Collection<String> fruits = myMultimap.get("Fruits");
        System.out.println(fruits); // [Banana, Apple, Pear]
        Collection<String> vegetables = myMultimap.get("Vegetables");
        System.out.println(vegetables); // [Carrot]
        // Iterating over the entire Multimap
        for (String value : myMultimap.values()) {
            System.out.println(value);
        }
        // Removing a single value
        myMultimap.remove("Fruits", "Pear");
        System.out.println(myMultimap.get("Fruits")); // [Banana, Apple]
        // Removing all values for a key
        myMultimap.removeAll("Fruits");
        System.out.println(myMultimap.get("Fruits")); // [] (empty collection!)
    }
}
See the Javadoc for put:
Associates the specified value with the specified key in this map (optional operation). If the map previously contained a mapping for the key, the old value is replaced by the specified value. (A map m is said to contain a mapping for a key k if and only if m.containsKey(k) would return true.)
A collision happens when two different keys produce the same hash code, not when the same key is used twice.
class StringKey {
String text;
public StringKey() {
text = "";
}
public StringKey(String text) {
this.text = text;
}
public String getText() {
return text;
}
public void setText(String text) {
this.text = text;
}
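@Override
public int hashCode() {
// Note: the substring's hash is computed but never returned, so this
// method always returns 0 and every key ends up in the same bucket.
if (text != null) {
text.substring(0, 1).hashCode();
}
return 0;
}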
@Override
public boolean equals(Object o) {
if (o instanceof StringKey) {
return ((StringKey) o).getText().equals(this.getText());
}
return false;
}
public static void main(String[] args) {
Map<StringKey, String> map = new HashMap<StringKey, String>();
StringKey key1 = new StringKey("a");
StringKey key2 = new StringKey("b");
map.put(key1, "s");
map.put(key2, "v");
System.out.println(map.get(key1));
System.out.println(key1.hashCode() + " " + key2.hashCode() + " " + key1.equals(key2));
}
}
The output is
s
0 0 false
Now this will cause a collision, but you cannot tell that from the map's keys and values in the output.
The second put() simply overwrites what the first put() wrote. There is no chaining.
Second put replaces first put, so you will have only one value with key "a" in Hashmap.
So your map just contains
map.put("a", "v");
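One way to observe the replacement directly is the return value of put, which is the previous value for the key (or null if there was none). A minimal sketch using the same key and values as the question:

```java
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();

        String first = map.put("a", "s");  // no previous mapping for "a"
        String second = map.put("a", "v"); // replaces "s" and returns it

        System.out.println(first);  // null
        System.out.println(second); // s
        System.out.println(map);    // {a=v}
    }
}
```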
Now,as per my understanding, since the key values in both the put case
is the same i.e. a, collision is bound to happen, and hence chaining
occurs. [Correct me if i am wrong].
You're wrong. That's not how a Map works. Consider using a Multimap from Google's Guava library.
You can always roll your own:
Map<String, List<String>> map = new HashMap<String, List<String>>();
You will have to make your HashMap as follows
public static void main(String[] args) {
HashMap<String, ArrayList<String>> map = new HashMap<String, ArrayList<String>>();
if ( map.get("a") == null ){
map.put("a", new ArrayList<String>());
}
ArrayList<String> innerList = map.get("a");
innerList.add("s");
innerList.add("v");
map.put("a",innerList);
System.out.println(map.get("a"));
}
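On Java 8 and later, the null check and the extra put can be folded into Map.computeIfAbsent. This is a shorter sketch of the same list-valued map idea, not a change to the approach above:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ListValuedMapDemo {
    public static void main(String[] args) {
        Map<String, List<String>> map = new HashMap<>();

        // computeIfAbsent creates the list on first use and returns
        // the existing list on every later call for the same key.
        map.computeIfAbsent("a", k -> new ArrayList<>()).add("s");
        map.computeIfAbsent("a", k -> new ArrayList<>()).add("v");

        System.out.println(map.get("a")); // [s, v]
    }
}
```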
The hashing used by HashMap can seem opaque at first. Internally, a HashMap is essentially an array of buckets addressed by index. That index is referred to here as the 'hashValue'. Because the hashValue is an index into the array, it has to be less than the length of that array (the map's capacity). The HashMap's hashing algorithm converts the key's hashcode into the hashValue. This is where the Map stores the Entry (key-value pair).
When an element is put into a Map, the Map generates the hashValue from the key's hashcode and stores the Entry in the array at that index.
Now, a hashing algorithm can only be efficient to a certain extent; that is, we can never guarantee that the hashValues generated for two different keys are always different. They can be the same under two conditions:
1) The keys are the same (as in your case)
2) The keys are different, but the hashValues generated for both keys are the same.
We cannot simply replace the value of the Entry at the hashValue position in the array, because that would mishandle the second condition, which is perfectly valid. This is where equals() comes into the picture. The HashMap checks for equality between the new key and the key already stored in that index's Entry. If the keys are equal, the value is replaced; otherwise it is a collision, and the HashMap applies its collision-handling technique.
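The conversion from hashcode to hashValue can be sketched as follows. This is an assumption modelled on the spreading and masking steps in OpenJDK 8+ HashMap; the method names spread and indexFor are hypothetical, and other implementations may compute the index differently.

```java
class BucketIndexSketch {
    // Assumption: mirrors the bit-spreading step of OpenJDK 8+ HashMap.
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // Maps a hash code to an array index; valid only when
    // capacity is a power of two.
    static int indexFor(int hashCode, int capacity) {
        return spread(hashCode) & (capacity - 1);
    }

    public static void main(String[] args) {
        int capacity = 16;
        // "a" and "q" have different hash codes (97 and 113)
        // but land on the same index in a 16-slot array.
        System.out.println(indexFor("a".hashCode(), capacity)); // 1
        System.out.println(indexFor("q".hashCode(), capacity)); // 1
        System.out.println(indexFor("b".hashCode(), capacity)); // 2
    }
}
```

This shows the second condition in practice: the keys "a" and "q" differ, yet they share a hashValue, so the map has to fall back on equals() to tell them apart.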
Now, if you want the list of all the values that you put for a particular key, consider using a composite map
HashMap<String, List<String>>.
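Both of the keys you tried to put in the HashMap are equal: they have the same hashCode and equals() returns true for them. Thus the first value gets overwritten and you end up with only one value in the HashMap.
Two similar objects can be stored as separate keys in the same HashMap only if the map does not consider them equal, which you control by overriding their equals() and hashCode() methods.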
Further notes on when Chaining actually takes place when a HashMap is used:
The Java implementation of HashMap will either overwrite the entry for a key or chain a new entry to it, depending on the following:
1. You put an object foo as key, with hash code X, into the map.
2. You put another object bar (as key) that has the same hash code X into the map.
3. Since the hashes are the same, the algorithm needs to put bar at the same index where foo is already stored. It then consults the equals method of foo to determine whether it should chain bar to foo (i.e. foo.next will point to bar) or overwrite foo with bar:
3.1. If equals returns true, foo and bar are either the same object or semantically the same, and overwriting takes place rather than chaining.
3.2. If equals returns false, foo and bar are treated as two distinct entities and chaining takes place. If you then print your HashMap, you'll see both foo and bar.
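The decision in steps 3.1 and 3.2 can be sketched with a toy chained table. This is a simplified illustration under stated assumptions (fixed 16 buckets, no resizing, no null keys, new nodes linked at the head of the chain), not the actual java.util.HashMap source; the class name ChainingSketch is hypothetical.

```java
class ChainingSketch {
    // One node in a bucket's linked list.
    static final class Node {
        final Object key;
        Object value;
        Node next;

        Node(Object key, Object value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Node[] buckets = new Node[16];
    private int size;

    // Returns the previous value for an equal key, or null if none existed.
    Object put(Object key, Object value) {
        int index = (key.hashCode() & 0x7fffffff) % buckets.length;
        for (Node n = buckets[index]; n != null; n = n.next) {
            if (n.key.equals(key)) { // 3.1: equal key, so overwrite
                Object old = n.value;
                n.value = value;
                return old;
            }
        }
        // 3.2: no equal key in this bucket, so chain a new node
        Node node = new Node(key, value);
        node.next = buckets[index];
        buckets[index] = node;
        size++;
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        ChainingSketch map = new ChainingSketch();
        map.put("Aa", "foo");
        map.put("BB", "bar"); // same hash code as "Aa" but not equal: chained
        map.put("Aa", "baz"); // equal key: value overwritten
        System.out.println(map.size()); // 2
    }
}
```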
I have a Hashmap which has X number of elements
I need to move this map into another map
This is what my code looks like
Map<K, V> originMap = initialize();
Map<K, V> destMap = new HashMap<K, V>();
int originMapSize = originMap.size();
Set<Map.Entry<K, V>> entries = originMap.entrySet();
for (Map.Entry<K, V> mapEntry : entries) {
    K key = mapEntry.getKey();
    V value = mapEntry.getValue();
    destMap.put(key, value);
}
// Shouldn't this be equal to originMapSize?
int destMapSize = destMap.size();
What I am observing is that originMapSize is NOT equal to destMapSize.
It seems that when we put the elements into destMap, some of them are being overwritten.
We have overridden the hashCode and equals methods, and the implementation is suspicious.
However, if originMap allowed the elements to be added, why would destMap overwrite an existing element instead of adding a new one?
This could happen if the equals method was asymmetric. Suppose there are two keys a and b such that:
a.hashCode() == b.hashCode()
a.equals(b) returns false
b.equals(a) returns true
Then suppose that the HashMap implementation searches for an existing key by calling existingKey.equals(newKey) for each existing key with the same hash code as the new key.
Now suppose we originally add them in the order { a, b }.
The first key (a) obviously goes in with no problems. The second key (b) insertion ends up calling a.equals(b) - which is false, so we get two keys.
Now building the second HashMap, we may end up getting the entries in the order { b, a }.
This time we add b first, which is fine... but when we insert the second key (a) we end up calling b.equals(a), which returns true, so we overwrite the entry.
That may not be what's going on, but it could explain things - and shows the dangers of an asymmetric equals method.
EDIT: Here's a short but complete program demonstrating this situation. (The exact details of a and b may not be the same, but the asymmetry is.)
import java.util.*;
public class Test {
private final String name;
public Test(String name)
{
this.name = name;
}
public static void main(String[] args)
{
Map<Test, String> firstMap = new HashMap<Test, String>();
Test a = new Test("a");
Test b = new Test("b");
firstMap.put(b, "b");
firstMap.put(a, "a");
Map<Test, String> secondMap = new HashMap<Test, String>();
for (Map.Entry<Test, String> entry : firstMap.entrySet())
{
System.out.println("Adding " + entry.getKey().name);
secondMap.put(entry.getKey(), entry.getValue());
}
System.out.println(secondMap.size());
}
@Override public int hashCode()
{
return 0;
}
@Override public boolean equals(Object other)
{
return this.name.equals("b");
}
}
Output on my machine:
Adding a
Adding b
1
You may not get the output that way round - it depends on:
The way that equals is called (candidateKey.equals(newKey) or vice versa)
The order in which entries are returned from the set
It may even work differently on different runs.
Those values should be equal, but the problem is you are iterating over a different Map object.
for (Map.Entry mapEntry : entries)
is not the same as
for (Map.Entry mapEntry : originMap)
I suspect the order of the elements being added to the first hashmap is not the same as the order added to the second. This combined with the sketchy hashCode method is causing duplicates to be added to the first.
Try changing hashCode to always return the same value to see if your problem goes away.
Why don't you use destMap.putAll(originMap) ?
Map has a putAll method. Try something like this:
Map<String, String> destination = new HashMap<String, String>();
Map<String, String> original = new HashMap<String, String>();
destination.putAll(original);
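A small sketch of both copying options, assuming well-behaved keys (plain Strings here). With a consistent hashCode and equals, the copy has the same size as the original; the sample keys and values are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class CopyMapDemo {
    public static void main(String[] args) {
        Map<String, String> original = new HashMap<>();
        original.put("k1", "v1");
        original.put("k2", "v2");

        // Option 1: putAll into an existing map
        Map<String, String> viaPutAll = new HashMap<>();
        viaPutAll.putAll(original);

        // Option 2: the copy constructor
        Map<String, String> viaConstructor = new HashMap<>(original);

        System.out.println(viaPutAll.equals(original)); // true
        System.out.println(viaConstructor.size());      // 2
    }
}
```

Neither option works around a broken equals or hashCode on the keys; both still rely on those methods to decide whether two keys are the same.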
It depends on how the first HashMap is initialized. Also, every time you add an object to a HashMap and the number of entries passes the load-factor threshold (75% of the capacity by default), the map doubles its capacity to accommodate new values. A HashMap has a default initial capacity of 16: once it holds more than 12 entries (75% of 16), it grows to 32.
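The arithmetic behind those numbers can be sketched as below. HashMap does not expose its capacity, so this only models the documented defaults (initial capacity 16, load factor 0.75, capacity doubling); the threshold helper is hypothetical.

```java
class ResizeThresholdSketch {
    // Number of entries a table of the given capacity can hold
    // before a resize is triggered, under the given load factor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f; // documented default
        // Capacity doubles on each resize: 16, 32, 64.
        for (int capacity = 16; capacity <= 64; capacity *= 2) {
            System.out.println(capacity + " -> " + threshold(capacity, loadFactor));
        }
        // Prints "16 -> 12", "32 -> 24" and "64 -> 48" on separate lines.
    }
}
```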