This question was asked to me in a job interview and I still don't know answer so ask here. Lets say hashCode() of key object returns a fixed integer so HashMap would look like a LinkedList.
How would a duplicate element be found and replaced by new value in map?
e.g. if following 1001 puts are performed in order listed below,
put(1000,1000), put(1,1), put( 2, 2), put ( 3,3 ) ....put(999,999), put(1000,1000 )
Would map be traversed all the way to end and then new one be inserted at head when last put(1000,1000) is performed?
OR
Map has some other way to locate and replace duplicate keys?
First case is correct.
In your case when hashCode() is returning same hash value for all the keys. In the java HashMap, Key and Value both are stored in the bucket as Map.Entry object. When perform the second or further put() operations into the map, it will traverse all the element to check whether Key is already present in the Map. If Key is not found then new Key and Value pair will be added into the linked list. If Key is found in the list then it update the Value for the pair.
Details explanation about java HashMap working: How HashMap works in Java
Take this sample code and run in the debug mode and observe how the new Key and Value pair are inserted into the Map.
In the class you will need to hashCode() (we want to control how the hash codes are generated for Node), toString() (just to output the Node value in SOUT) and equals() (defines the equality of the keys based on the value of Node member variable Integer, for updating the values.) methods for getting it working.
public class HashMapTest {
static class Node {
Integer n;
public Node(int n) {
this.n = n;
}
#Override
public int hashCode() {
return n%3;
}
#Override
public boolean equals(Object object) {
Node node = (Node)object;
return this.n.equals(node.n);
}
#Override
public String toString() {
return n.toString();
}
}
public static void main(String[] args) {
Map<Node, String> map = new HashMap<>();
for (int i = 0; i<6; i++) {
map.put(new Node(i), ""+i); // <-- Debug Point
}
map.put(new Node(0), "xxx");
} // <-- Debug Point
}
First 3 entries in the map: (hash code is n%3)
Three more values: (hash code is n%3)
Now don't confused about the ordering of the node, I have executed them on java 1.8 and HashMap uses TreeNode, an implementation of Red-Black tree as per the code documentation. This can be different in different versions of the java.
Now lets update the Value of Key 0:
When the hash code is the same, the hash map compares objects using the equals method.
For example, let's say you put a bunch of elements in the hash map:
put(1000,1000), put(1,1), put( 2, 2), put ( 3,3 ) ....put(999,999)
And then you do this:
put(1000,1000 )
1000 is already in the map, the hash code is the same, it is also the same in terms of the equals method, so the element will be replaced, no need to iterate further.
Now if you do:
put(1234, 1234)
1234 is not yet in the map. All the elements are in a linked list, due to the fixed hash code. The hash map will iterate over the elements, comparing them using equals. It will be false for all elements, the end of the list will be reached, and the entry will be appended.
JDK implementations changes over time !
In JDK8, if the hashCode() is a constant value, the implementation creates a tree not a linked list in order to protect against DDOS attack 1.
Related
Is it bad practice to use mutable objects as Hashmap keys? What happens when you try to retrieve a value from a Hashmap using a key that has been modified enough to change its hashcode?
For example, given
class Key
{
int a; //mutable field
int b; //mutable field
public int hashcode()
return foo(a, b);
// setters setA and setB omitted for brevity
}
with code
HashMap<Key, Value> map = new HashMap<Key, Value>();
Key key1 = new Key(0, 0);
map.put(key1, value1); // value1 is an instance of Value
key1.setA(5);
key1.setB(10);
What happens if we now call map.get(key1)? Is this safe or advisable? Or is the behavior dependent on the language?
It has been noted by many well respected developers such as Brian Goetz and Josh Bloch that :
If an object’s hashCode() value can change based on its state, then we
must be careful when using such objects as keys in hash-based
collections to ensure that we don’t allow their state to change when
they are being used as hash keys. All hash-based collections assume
that an object’s hash value does not change while it is in use as a
key in the collection. If a key’s hash code were to change while it
was in a collection, some unpredictable and confusing consequences
could follow. This is usually not a problem in practice — it is not
common practice to use a mutable object like a List as a key in a
HashMap.
This is not safe or advisable. The value mapped to by key1 can never be retrieved. When doing a retrieval, most hash maps will do something like
Object get(Object key) {
int hash = key.hashCode();
//simplified, ignores hash collisions,
Entry entry = getEntry(hash);
if(entry != null && entry.getKey().equals(key)) {
return entry.getValue();
}
return null;
}
In this example, key1.hashcode() now points to the wrong bucket of the hash table, and you will not be able to retrieve value1 with key1.
If you had done something like,
Key key1 = new Key(0, 0);
map.put(key1, value1);
key1.setA(5);
Key key2 = new Key(0, 0);
map.get(key2);
This will also not retrieve value1, as key1 and key2 are no longer equal, so this check
if(entry != null && entry.getKey().equals(key))
will fail.
Hash maps use hash code and equality comparisons to identify a certain key-value pair with a given key. If the has map keeps the key as a reference to the mutable object, it would work in the cases where the same instance is used to retrieve the value. Consider however, the following case:
T keyOne = ...;
T keyTwo = ...;
// At this point keyOne and keyTwo are different instances and
// keyOne.equals(keyTwo) is true.
HashMap myMap = new HashMap();
myMap.push(keyOne, "Hello");
String s1 = (String) myMap.get(keyOne); // s1 is "Hello"
String s2 = (String) myMap.get(keyTwo); // s2 is "Hello"
// because keyOne equals keyTwo
mutate(keyOne);
s1 = myMap.get(keyOne); // returns "Hello"
s2 = myMap.get(keyTwo); // not found
The above is true if the key is stored as a reference. In Java usually this is the case. In .NET for instance, if the key is a value type (always passed by value), the result will be different:
T keyOne = ...;
T keyTwo = ...;
// At this point keyOne and keyTwo are different instances
// and keyOne.equals(keyTwo) is true.
Dictionary myMap = new Dictionary();
myMap.Add(keyOne, "Hello");
String s1 = (String) myMap[keyOne]; // s1 is "Hello"
String s2 = (String) myMap[keyTwo]; // s2 is "Hello"
// because keyOne equals keyTwo
mutate(keyOne);
s1 = myMap[keyOne]; // not found
s2 = myMap[keyTwo]; // returns "Hello"
Other technologies might have other different behaviors. However, almost all of them would come to a situation where the result of using mutable keys is not deterministic, which is very very bad situation in an application - a hard to debug and even harder to understand.
If key’s hash code changes after the key-value pair (Entry) is stored in HashMap, the map will not be able to retrieve the Entry.
Key’s hashcode can change if the key object is mutable. Mutable keys in HahsMap can result in data loss.
This will not work. You are changing the key value, so you are basically throwing it away. Its like creating a real life key and lock, and then changing the key and trying to put it back in the lock.
As others explained, it is dangerous.
A way to avoid that is to have a const field giving explicitly the hash in your mutable objects (so you would hash on their "identity", not their "state"). You might even initialize that hash field more or less randomly.
Another trick would be to use the address, e.g. (intptr_t) reinterpret_cast<void*>(this) as a basis for hash.
In all cases, you have to give up hashing the changing state of the object.
There are two very different issues that can arise with a mutable key depending on your expectation of behavior.
First Problem: (probably most trivial--but hell it gave me problems that I didn't think about!)
You are attempting to place key-value pairs into a map by updating and modifying the same key object. You might do something like Map<Integer, String> and simply say:
int key = 0;
loop {
map.put(key++, newString);
}
I'm reusing the "object" key to create a map. This works fine in Java because of autoboxing where each new value of key gets autoboxed to a new Integer object. What would not work is if I created my own (mutable) Integer object:
MyInteger {
int value;
plusOne(){
value++;
}
}
Then tried the same approach:
MyInteger key = new MyInteger(0);
loop{
map.put(key.plusOne(), newString)
}
My expectation is that, for instance, I map 0 -> "a" and 1 -> "b". In the first example, if I change int key = 0, the map will (correctly) give me "a". For simplicity let's assume MyInteger just always returns the same hashCode() (if you can somehow manage to create unique hashCode values for all possible states of an object, this will not be an issue, and you deserve an award). In this case, I call 0 -> "a", so now the map holds my key and maps it to "a", I then modify key = 1 and try to put 1 -> "b". We have a problem! The hashCode() is the same, and the only key in the HashMap is my MyInteger key object which has just been modified to be equal to 1, so It overwrites that key's value so that now, instead of a map with 0 -> "a" and 1 -> "b", I have 1 -> "b" only! Even worse, if I change back to key = 0, the hashCode points to 1 -> "b", but since the HashMap's only key is my key object, it satisfied the equality check and returns "b", not "a" as expected.
If, like me, you fall prey to this type of issue, it's incredibly difficult to diagnose. Why? Because if you have a decent hashCode() function it will generate (mostly) unique values. The hash value will largely take care of the inequality problem when structuring the map but if you have enough values, eventually you'll get a collision on the hash value and then you get unexpected and largely inexplicable results. The resultant behavior is that it works for small runs but fails for larger ones.
Advice:
To find this type of issue, modify the hashCode() method, even trivially (i.e. = 0--obviously when doing this, keep in mind that the hash values should be the same for two equal objects*), and see if you get the same results--because you should and if you don't, there's likely a semantic error with your implementation that's using a hash table.
*There should be no danger (if there is--you have a semantic problem) in always returning 0 from a hashCode() (although it would defeat the purpose of a Hash Table). But that's sort of the point: the hashCode is a "quick and easy" equality measure that's not exact. So two very different objects could have the same hashCode() yet not be equal. On the other hand, two equal objects must always have the same hashCode() value.
p.s. In Java, from my understanding, if you do such a terrible thing (as have many hashCode() collisions), it will start using a red-black-tree as opposed to ArrayList. So when you expect O(1) lookup, you'll get O(log(n))--which is better than the ArrayList which would give O(n).
Second Problem:
This is the one that most others seem to be focusing on, so I'll try to be brief. In this use case, I try to map a key-value pair and then I do some work on the key and then want to come back and get my value.
Expectation: key -> value is mapped, I then modify key and try to get(key). I expect that will give me value.
It seems kind of obvious to me that this wouldn't work but I'm not above having tried to use things like Collections as a key before (and quite quickly realizing it doesn't work). It doesn't work because it's quite likely that the hash value of key has changed so you won't even be looking in the correct bucket.
This is why it's very inadvisable to use collections as keys. I would assume, if you were doing this, you're trying to establish a many-to-one relationship. So I have a class (as in teaching) and I want two groups to do two different projects. What I want is that given a group, what is their project? Simple, I divide the class in two, and I have group1 -> project1 and group2 -> project2. But wait! A new student arrives so I place them in group1. The problem is that group1 has now been modified and likely its hash value has changed, therefore trying to do get(group1) is likely to fail because it will look in a wrong or non-existent bucket of the HashMap.
The obvious solution to the above is to chain things--instead of using the groups as keys, give them labels (that don't change) that point to the group and therefore the project: g1 -> group1 and g1 -> project1, etc.
p.s.
Please make sure to define a hashCode() and equals(...) method for any object you expect to use as a key (eclipse and, I'm assuming, most IDE's can do this for you).
Code Example:
Here is a class which exhibits the two different "problem" behaviors. In this case, I attempt to map 0 -> "a", 1 -> "b", and 2 -> "c" (in each case). In the first problem, I do that by modifying the same object, in the second problem, I use unique objects, and in the second problem "fixed" I clone those unique objects. After that I take one of the "unique" keys (k0) and modify it to attempt to access the map. I expect this will give me a, b, c and null when the key is 3.
However, what happens is the following:
map.get(0) map1: 0 -> null, map2: 0 -> a, map3: 0 -> a
map.get(1) map1: 1 -> null, map2: 1 -> b, map3: 1 -> b
map.get(2) map1: 2 -> c, map2: 2 -> a, map3: 2 -> c
map.get(3) map1: 3 -> null, map2: 3 -> null, map3: 3 -> null
The first map ("first problem") fails because it only holds a single key, which was last updated and placed to equal 2, hence why it correctly returns "c" when k0 = 2 but returns null for the other two (the single key doesn't equal 0 or 1). The second map fails twice: the most obvious is that it returns "b" when I asked for k0 (because it's been modified--that's the "second problem" which seems kind of obvious when you do something like this). It fails a second time when it returns "a" after modifying k0 = 2 (which I would expect to be "c"). This is more due to the "first problem": there's a hash code collision and the tiebreaker is an equality check--but the map holds k0, which it (apparently for me--could theoretically be different for someone else) checked first and thus returned the first value, "a" even though had it kept checking, "c" would have also been a match. Finally, the 3rd map works perfectly because I'm enforcing that the map holds unique keys no matter what else I do (by cloning the object during insertion).
I want to make clear that I agree, cloning is not a solution! I simply added that as an example of why a map needs unique keys and how enforcing unique keys "fixes" the issue.
public class HashMapProblems {
private int value = 0;
public HashMapProblems() {
this(0);
}
public HashMapProblems(final int value) {
super();
this.value = value;
}
public void setValue(final int i) {
this.value = i;
}
#Override
public int hashCode() {
return value % 2;
}
#Override
public boolean equals(final Object o) {
return o instanceof HashMapProblems
&& value == ((HashMapProblems) o).value;
}
#Override
public Object clone() {
return new HashMapProblems(value);
}
public void reset() {
this.value = 0;
}
public static void main(String[] args) {
final HashMapProblems k0 = new HashMapProblems(0);
final HashMapProblems k1 = new HashMapProblems(1);
final HashMapProblems k2 = new HashMapProblems(2);
final HashMapProblems k = new HashMapProblems();
final HashMap<HashMapProblems, String> map1 = firstProblem(k);
final HashMap<HashMapProblems, String> map2 = secondProblem(k0, k1, k2);
final HashMap<HashMapProblems, String> map3 = secondProblemFixed(k0, k1, k2);
for (int i = 0; i < 4; ++i) {
k0.setValue(i);
System.out.printf(
"map.get(%d) map1: %d -> %s, map2: %d -> %s, map3: %d -> %s",
i, i, map1.get(k0), i, map2.get(k0), i, map3.get(k0));
System.out.println();
}
}
private static HashMap<HashMapProblems, String> firstProblem(
final HashMapProblems start) {
start.reset();
final HashMap<HashMapProblems, String> map = new HashMap<>();
map.put(start, "a");
start.setValue(1);
map.put(start, "b");
start.setValue(2);
map.put(start, "c");
return map;
}
private static HashMap<HashMapProblems, String> secondProblem(
final HashMapProblems... keys) {
final HashMap<HashMapProblems, String> map = new HashMap<>();
IntStream.range(0, keys.length).forEach(
index -> map.put(keys[index], "" + (char) ('a' + index)));
return map;
}
private static HashMap<HashMapProblems, String> secondProblemFixed(
final HashMapProblems... keys) {
final HashMap<HashMapProblems, String> map = new HashMap<>();
IntStream.range(0, keys.length)
.forEach(index -> map.put((HashMapProblems) keys[index].clone(),
"" + (char) ('a' + index)));
return map;
}
}
Some Notes:
In the above it should be noted that map1 only holds two values because of the way I set up the hashCode() function to split odds and evens. k = 0 and k = 2 therefore have the same hashCode of 0. So when I modify k = 2 and attempt to k -> "c" the mapping k -> "a" gets overwritten--k -> "b" is still there because it exists in a different bucket.
Also there are a lot of different ways to examine the maps in the above code and I would encourage people that are curious to do things like print out the values of the map and then the key to value mappings (you may be surprised by the results you get). Do things like play with changing the different "unique" keys (i.e. k0, k1, and k2), try changing the single key k. You could also see how even the secondProblemFixed isn't actually fixed because you could also gain access to the keys (for instance via Map::keySet) and modify them.
I won't repeat what others have said. Yes, it's inadvisable. But in my opinion, it's not overly obvious where the documentation states this.
You can find it on the JavaDoc for the Map interface:
Note: great care must be exercised if mutable objects are used as map
keys. The behavior of a map is not specified if the value of an object
is changed in a manner that affects equals comparisons while the
object is a key in the map
Behaviour of a Map is not specified if value of an object is changed in a manner that affects equals comparision while object(Mutable) is a key. Even for Set also using mutable object as key is not a good idea.
Lets see a example here :
public class MapKeyShouldntBeMutable {
/**
* #param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
Map<Employee,Integer> map=new HashMap<Employee,Integer>();
Employee e=new Employee();
Employee e1=new Employee();
Employee e2=new Employee();
Employee e3=new Employee();
Employee e4=new Employee();
e.setName("one");
e1.setName("one");
e2.setName("three");
e3.setName("four");
e4.setName("five");
map.put(e, 24);
map.put(e1, 25);
map.put(e2, 26);
map.put(e3, 27);
map.put(e4, 28);
e2.setName("one");
System.out.println(" is e equals e1 "+e.equals(e1));
System.out.println(map);
for(Employee s:map.keySet())
{
System.out.println("key : "+s.getName()+":value : "+map.get(s));
}
}
}
class Employee{
String name;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
#Override
public boolean equals(Object o){
Employee e=(Employee)o;
if(this.name.equalsIgnoreCase(e.getName()))
{
return true;
}
return false;
}
public int hashCode() {
int sum=0;
if(this.name!=null)
{
for(int i=0;i<this.name.toCharArray().length;i++)
{
sum=sum+(int)this.name.toCharArray()[i];
}
/*System.out.println("name :"+this.name+" code : "+sum);*/
}
return sum;
}
}
Here we are trying to add mutable object "Employee" to a map. It will work good if all keys added are distinct.Here I have overridden equals and hashcode for employee class.
See first I have added "e" and then "e1". For both of them equals() will be true and hashcode will be same. So map sees as if the same key is getting added so it should replace the old value with e1's value. Then we have added e2,e3,e4 we are fine as of now.
But when we are changing the value of an already added key i.e "e2" as one ,it becomes a key similar to one added earlier. Now the map will behave wired. Ideally e2 should replace the existing same key i.e e1.But now map takes this as well. And you will get this in o/p :
is e equals e1 true
{Employee#1aa=28, Employee#1bc=27, Employee#142=25, Employee#142=26}
key : five:value : 28
key : four:value : 27
key : one:value : 25
key : one:value : 25
See here both keys having one showing same value also. So its unexpected.Now run the same programme again by changing e2.setName("diffnt"); which is e2.setName("one"); here ...Now the o/p will be this :
is e equals e1 true
{Employee#1aa=28, Employee#1bc=27, Employee#142=25, Employee#27b=26}
key : five:value : 28
key : four:value : 27
key : one:value : 25
key : diffnt:value : null
So by adding changing the mutable key in a map is not encouraged.
To make the answer compact:
The root cause is that HashMap calculates an internal hash of the user's key object hashcode only once and stores it inside for own needs.
All other operations for data navigation inside the map are doing by this pre-calculated internal hash.
So if you change the hashcode of the key object (mutate) it will be still stored nicely inside the map with the changed key object's hashcode (you could even observe it via HashMap.keySet() and see the altered hashcode).
But HashMap internal hash will not be recalculated of course and it will be the old stored one and the map won't be able to locate your data by the provided mutated key object new hashcode. (e.g. by HashMap.get() or HashMap.containsKey()).
Your key-value pairs will be still inside the map but to get it back you will need that old hash code value that was given when you put your data into the map.
Notice that you also will be unable to get data back by the mutated key object taken right from the HashMap.keySet().
I have a class, Employee, let's say, and my hashCode function for this class is really bad (let's say it always return a constant). My code looks like the following.
public class Employee {
private String name;
public Employee(String name) {
this.name = name;
}
#Override
public int hashCode() { return 1; }
#Override
public boolean equals(Object object) {
if(null == object || !(object instanceof Employee)) {
return false;
}
Employee other = (Employee)object;
return this.name.equals(other.name);
}
}
Let's say I want to use Employee as the key in a Map, and so I can do something like the following.
public static void main(String[] args) {
Map<Employee, Long> map = new HashMap<>();
for(int i=0; i < 1000; i++) {
map.put(new Employee("john"+i, 1L));
}
System.out.println(map.size());
}
How come when I run this code, I always get 1,000 as the size?
Using Employee as a key seems to be "good" in the following sense.
It is immutable
Two employees that are equals always generate the same hash code
What I expected was that since the output of hashCode is always 1, then map.size() should always be 1. But it is not. Why? If I have a Map<Integer,Integer>, and I do map.put(1, 1) followed by map.put(1, 2), I would only expect the size to be 1.
The equals method must somehow be coming into play here, but I'm not sure how.
Any pointers are appreciated.
Your loop
for(int i=0; i < 1000; i++) {
map.put(new Employee("john"+System.currentTimeMillis(), 1L));
}
executes within a couple of milliseconds, so System.currentTimeMillis() will be returning the same value for the vast majority of the iterations of your loop. So, several hundred of your johns will have the exact same name + number.
Then, we have java's retarded Map which does not have an add() method, (which one would reasonably expect to throw an exception if the item already exists,) but instead it only has a put() method which will either add or replace items, without failing. So, most of your johns get overwritten by subsequent johns, without any increase in the map size, and without any exception being thrown to give you a hint about what you are doing wrong.
Furthermore, you seem to be a bit confused as to exactly what the effect of a bad hashCode() function is on a map. A bad hashCode() simply results in collisions. Collisions in a hashmap do not cause items to be lost; they only cause the internal structure of the map to not be very efficient. Essentially, a constant hashCode() will result in a degenerate map which internally looks like a linked list. It will be inefficient both for insertions and for deletions, but no items will be lost due to that.
Items will be lost due to a bad equals() method, or due to overwriting them with newer items. (Which is the case in your code.)
Mike's answer is right about what is causing this. But the real reason that it's happening is this:
In the put method of HashMap it first checks the hashcode for each entry. If the hash code is equal to the hashcode of your new key then it checks for .equals(). If equals() returns true it just replaces the existing object with the new one otherwise adds a new key value pair. That's where it gets busted. Because somethings your equals() function will return true because of the currentMilliSeconds and sometimes it won't hence different sizes every time.
Just pay attention to the equals in the code below (java HashMap).
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
If your hashcode is the same for every entry then your time complexity will be O(n) because the hashcode creates buckets to store your elements. If you only create a single bucket then you have to traverse the entire bucket to get your element.
If however, your hashcode is unique for every element then you will have a unique bucket and will only have to traverse a single element.
Bucket lookups (Hash) are O(1) so the better the hashcode the better the time complexity.
I think you have a misconecption what HashBuckets in a HashMap are for.
When you put two Objectswhich are not equal but have the same hashCode in a HashMap, both elements will be present in the Hashmap in the same HashBucket. An element is only overwritten when an element exists in the HashMapwhich has the same hashCode and is equals to an existing element.
The HashBuckets make the HashMap fast at lookup, because when searching for an element, only elements in the HahsBucket corresponding to the hashCode need to be considered. This is why it is generally a bad idea to wirte a HashFunction which is constant.
Your hashcode has to comply with certain requirements e.g equal objects should return equal hashcode.
But your implementation is not solid then it will give performance problem, if many of your objects have the same hashcode your look up some simply become O(N) instead of O(1). In your case it simply putting all items in a List. So the size is 1000.
I am trying to understand the implementation HashTables in Java. Below is my code:
Hashtable<Integer, String> hTab = new Hashtable<Integer, String>();
hTab.put(1, "A");
hTab.put(1, "B");
hTab.put(2, "C");
hTab.put(3, "D");
Iterator<Map.Entry<Integer, String>> itr = hTab.entrySet().iterator();
Entry<Integer, String> entry;
while(itr.hasNext()){
entry = itr.next();
System.out.println(entry.getValue());
}
When I run it, I get the below output:
D
C
B
Which means that there has been a collision for the Key = 1; and as per the implementation:
"Whenever a collision happens in the hashTable, a new node is created in the linkedList corresponding for the particular bucket and the EntrySet(Key, Value) pairs are stored as nodes in the list, the new value is inserted in the beginning of the list for the particular bucket". And I completely agree to this implementation.
But if this is true, then where did "A" go when I try to retrieve the entrysets from the hashTable?
Again, I tried with the below code to understand this by implementing my own HashCode and equals method. And surprisingly, this works perfect and as per the HashTable implementation. Below is my code:
public class Hash {
private int key;
public Hash(int key){
this.key = key;
}
public int hashCode(){
return key;
}
public boolean equals(Hash o){
return this.key == o.key;
}
}
public class HashTable1 {
public static void main(String[] args) {
// TODO Auto-generated method stub
Hashtable<Hash, String> hTab = new Hashtable<Hash, String>();
hTab.put(new Hash(1), "A");
hTab.put(new Hash(1), "B");
hTab.put(new Hash(2), "C");
hTab.put(new Hash(3), "D");
Iterator<Map.Entry<Hash, String>> itr = hTab.entrySet().iterator();
Entry<Hash, String> entry;
while(itr.hasNext()){
entry = itr.next();
System.out.println(entry.getValue());
}
}
}
Output :
D
C
B
A
Which is perfect. I am not able to understand this ambiguity in the behavior of HashTable in Java.
Update
#garrytan and #Brian: thanks for responding. But I still have a small doubt.
In my second code, where it works fine. I have created two objects which are new keys and since they are 2 objects, Key collision does not happens in this case and it works fine. I agree with your explanation. However, if in the first set of code I use "new Integer(1)" instead of simply "1", it still doesn't work although now I am creating 2 objects now and they should be different. I cross checked by writing the simple line below:
Integer int1 = new Integer(1);
Integer int2 = new Integer(1);
System.out.println(int1 == int2);
which gives "False". it means now, the Key collision should have been resolved. But still it doesn't work. Why is this?
By design hashtable is not meant to store duplicate keys.
I think you get mixed up between 'hash collision' and 'key collision'. Put it simply, hash table consist of a collection of linked lists (ie: buckets). When you add a new key value pairs (KVPs), it is distributed into the buckets by the key's hash value. 'hash collision' happen when two keys result in the same hash (hence they get put into the same bucket)
A good hash function is one that distributes the key evenly into a number of buckets, hence improving key searching performance.
The second example gives the behaviour you want because your implementation of equals is incorrect.
The signature is
public boolean equals(Object o) {}
not
public boolean equals(Hash h) {}
So what you have created is a hash Collision, where two objects have the same hash code (key), but they are not equal according to the equals method (because your signature is wrong, it's still using the == operator and not your this.key == h.key code). As opposed to a key collision, where the objects both have the same hashCode and are also equals, as in your first example. If you fix the code in the second example to implement the actual equals(Object o) method you will see 'A' will again be missing from the values.
In your second example you are not overriding the original equals function because you use the following signature:
public boolean equals(Hash h) {}
Thus the original equals function with Object as a parameter is still used and as you create a new object Hash for each insert that Object is different from the other one and thus your keys for A and B are not equal.
Furthermore a HashTable is designed to have ONE value for EACH key. And keys are indeed relying on the equals functions to be compared.
About your example with two new Integers, try comparing them with .equals(). You could also override the hashCode function to generate different hashCodes or not for each object, i.e. depending on time, but that would be not a good coding principle. Objects which are the same should hash to the same code.
Code:
public static void main(String[] args) {
Map<String,String> map= new HashMap<String,String>();
map.put("a", "s");
map.put("a", "v");
System.out.println(map.get("a"));
}
Now, as per my understanding, since the key values in both the put case is the same i.e. a, collision is bound to happen, and hence chaining occurs. [Correct me if I am wrong].
Now if I want to retrieve the list of all the values mapped to key value a, how do i get it?
Right now my println prints v only.
This has nothing to do with collision or chaining: you're replacing the old value of a with a new value.
A map keeps unique keys. collision/chaining will occur in a hash data structure when two distinct keys happen to get the same hash value based on the particular hash function. Or in java, you can explicitly create an object that returns the same value for hashCode().
If you want mapping with multiple values for a key, then you'll need to use a different data structure/class.
Like other people already suggested, there is no such thing as Collision for your case.
It's simply because Hashmap only accepts an unique key.
However you can have an alternative if you want the key to be not unique, for example Google Guava Multimap or Apache Multimap
Example using Google lib:
public class MutliMapTest {
public static void main(String... args) {
Multimap<String, String> myMultimap = ArrayListMultimap.create();
// Adding some key/value
myMultimap.put("Fruits", "Bannana");
myMultimap.put("Fruits", "Apple");
myMultimap.put("Fruits", "Pear");
myMultimap.put("Vegetables", "Carrot");
// Getting the size
int size = myMultimap.size();
System.out.println(size); // 4
// Getting values
Collection<string> fruits = myMultimap.get("Fruits");
System.out.println(fruits); // [Bannana, Apple, Pear]
Collection<string> vegetables = myMultimap.get("Vegetables");
System.out.println(vegetables); // [Carrot]
// Iterating over entire Mutlimap
for(String value : myMultimap.values()) {
System.out.println(value);
}
// Removing a single value
myMultimap.remove("Fruits","Pear");
System.out.println(myMultimap.get("Fruits")); // [Bannana, Pear]
// Remove all values for a key
myMultimap.removeAll("Fruits");
System.out.println(myMultimap.get("Fruits")); // [] (Empty Collection!)
}
}
See the java doc for put
Associates the specified value with the specified key in this map (optional operation). If the map previously contained a mapping for the key, the old value is replaced by the specified value. (A map m is said to contain a mapping for a key k if and only if m.containsKey(k) would return true.)
The collision happens when two different keys comes up with the same hashcode and not when two same keys.
class StringKey {
String text;
public StringKey() {
text = "";
}
public StringKey(String text) {
this.text = text;
}
public String getText() {
return text;
}
public void setText(String text) {
this.text = text;
}
#Override
public int hashCode() {
if (text != null) {
text.substring(0, 1).hashCode();
}
return 0;
}
#Override
public boolean equals(Object o) {
if (o instanceof StringKey) {
return ((StringKey) o).getText().equals(this.getText());
}
return false;
}
public static void main(String[] args) {
Map<StringKey, String> map = new HashMap<StringKey, String>();
StringKey key1 = new StringKey("a");
StringKey key2 = new StringKey("b");
map.put(key1, "s");
map.put(key2, "v");
System.out.println(map.get(key1));
System.out.println(key1.hashCode() + " " + key2.hashCode() + " " + key1.equals(key2));
}
}
The output is
s
0 0 false
now this will cause a collision; but you can not interpret this from the output of map keys and values.
The second put() simply overwrites what the first put() wrote. There is no chaining.
Second put replaces first put, so you will have only one value with key "a" in Hashmap.
So your map just contains
map.put("a", "v");
Now,as per my understanding, since the key values in both the put case
is the same i.e. a, collision is bound to happen, and hence chaining
occurs. [Correct me if i am wrong].
You're wrong. Thats not how a Map works. Consider using a MultiMap from Google's Guava library.
You can always roll your own:
Map<String, ArrayList<String>>();
You will have to make your HashMap as follows
public static void main(String[] args) {
HashMap<String, ArrayList<String>> map = new HashMap<String, ArrayList<String>>();
if ( map.get("a") == null ){
map.put("a", new ArrayList<String>());
}
ArrayList<String> innerList = map.get("a");
innerList.add("s");
innerList.add("v");
map.put("a",innerList);
System.out.println(map.get("a"));
}
Hashing algorithm used in HashMaps are pretty vague in the first go. Internally a HashMap is nothing but an array with indices. The index here is usually referred to as 'hashValue'. As the hashValue is the index of an element in the array, it has to be less than the size of the HashMap itself.The HashMap's hashing algorithm converts the key's hashcode into the hashValue. This is where the Map stores the Entry (key-value pair).
When an element is put into a Map, it generates the hashValue from the element key's hashcode, and stores the Entry into the array at this index, which is nothing but the hashValue.
Now, hashing algorithm can be efficient to a certain extent only, that is we can never assure that the hashValue generated for two different keys are always different. It could be same under two conditions:
1) The keys are same (as in your case)
2) The Keys are different, but the hashValue generated for both the keys are same.
We simply cannot replace the value of the Entry at the hashValue position in the array, as this will violate the second condition, which is very valid. This is where the equals() comes into picture. Now, the HashMap checks for the equality between the new key and the key that exists in that index's Entry. If both the keys are same it means replacement, else it's collision and the HashMap uses the appropriate collision technique.
Now, if you want the list of all the values that you put for a particular key, consider using a composite map
HashMap<String, List<String>>.
Both the keys you tried to put in the HashMap has the same HashCode. Thus the first value gets overwritten an you will end up having only one value in the HashMap.
You can put Two similar objects in the same HashMap by overriding thier hashCode() Method.
Further notes on when Chaining actually takes place when a HashMap is used:
The Java implementation for HashMap will either override a key or chain an object to it depending on the following:
You put an object foo as key, with hash code X into the map
You put another object bar (as key..) that has the same hash code X
into the map
Since the hashes are the same, the algorithm would need to put the
object bar on the same index where foo is already stored. It would then consult the equals method of foo, to determine whether it should chain bar to foo (i.e foo.next() will become bar) or override foo with bar:
3.1.If equals returns true, foo & bar are either the same object, or they are semantically the same, and overriding will take place rather than chaining.
3.2. If equals returns false, foo & bar are treated as two distinct entities and chaining will take place. If you then print your HashMap, you'll be seeing both foo and bar.
I understand that HashSet is based on HashMap implementation but is used when you need unique set of elements. So why in the next code when putting same objects into the map and set we have size of both collections equals to 1? Shouldn't map size be 2? Because if size of both collection is equal I don't see any difference of using this two collections.
Set testSet = new HashSet<SimpleObject>();
Map testMap = new HashMap<Integer, SimpleObject>();
SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
SimpleObject simplObject2 = new SimpleObject("Igor", 1);
testSet.add(simpleObject1);
testSet.add(simplObject2);
Integer key = new Integer(10);
testMap.put(key, simpleObject1);
testMap.put(key, simplObject2);
System.out.println(testSet.size());
System.out.println(testMap.size());
The output is 1 and 1.
SimpleObject code
public class SimpleObject {
private String dataField1;
private int dataField2;
public SimpleObject(){}
public SimpleObject(String data1, int data2){
this.dataField1 = data1;
this.dataField2 = data2;
}
public String getDataField1() {
return dataField1;
}
public int getDataField2() {
return dataField2;
}
#Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result
+ ((dataField1 == null) ? 0 : dataField1.hashCode());
result = prime * result + dataField2;
return result;
}
#Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
SimpleObject other = (SimpleObject) obj;
if (dataField1 == null) {
if (other.dataField1 != null)
return false;
} else if (!dataField1.equals(other.dataField1))
return false;
if (dataField2 != other.dataField2)
return false;
return true;
}
}
The map holds unique keys. When you invoke put with a key that exists in the map, the object under that key is replaced with the new object. Hence the size 1.
The difference between the two should be obvious:
in a Map you store key-value pairs
in a Set you store only the keys
In fact, a HashSet has a HashMap field, and whenever add(obj) is invoked, the put method is invoked on the underlying map map.put(obj, DUMMY) - where the dummy object is a private static final Object DUMMY = new Object(). So the map is populated with your object as key, and a value that is of no interest.
A key in a Map can only map to a single value. So the second time you put in to the map with the same key, it overwrites the first entry.
In case of the HashSet, adding the same object will be more or less a no-op. In case of a HashMap, putting a new key,value pair with an existing key will overwrite the existing value to set a new value for that key. Below I've added equals() checks to your code:
SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
SimpleObject simplObject2 = new SimpleObject("Igor", 1);
//If the below prints true, the 2nd add will not add anything
System.out.println("Are the objects equal? " , (simpleObject1.equals(simpleObject2));
testSet.add(simpleObject1);
testSet.add(simplObject2);
Integer key = new Integer(10);
//This is a no-brainer as you've the exact same key, but lets keep it consistent
//If this returns true, the 2nd put will overwrite the 1st key-value pair.
testMap.put(key, simpleObject1);
testMap.put(key, simplObject2);
System.out.println("Are the keys equal? ", (key.equals(key));
System.out.println(testSet.size());
System.out.println(testMap.size());
I just wanted to add to these great answers, the answer to your last dilemma. You wanted to know what is the difference between these two collections, if they are returning the same size after your insertion. Well, you can't really see the difference here, because you are inserting two values in the map with the same key, and hence changing the first value with the second. You would see the real difference (among the others) should you have inserted the same value in the map, but with the different key. Then, you would see that you can have duplicate values in the map, but you can't have duplicate keys, and in the set you can't have duplicate values. This is the main difference here.
Answer is simple because it is nature of HashSets.
HashSet uses internally HashMap with dummy object named PRESENT as value and KEY of this hashmap will be your object.
hash(simpleObject1) and hash(simplObject2) will return the same int. So?
When you add simpleObject1 to hashset it will put this to its internal hashmap with simpleObject1 as a key. Then when you add(simplObject2) you will get false because it is available in the internal hashmap already as key.
As a little extra info, HashSet use effectively hashing function to provide O(1) performance by using object's equals() and hashCode() contract. That's why hashset does not allow "null" which cannot be implemented equals() and hashCode() to non-object.
I think the major difference is,
HashSet is stable in the sense, it doesn't replace duplicate value (if found after inserting first unique key, just discard all future duplicates), and HashMap will make the effort to replace old with new duplicate value. So there must be overhead in HashMap of inserting new duplicate item.
public class HashSet<E>
extends AbstractSet<E>
implements Set<E>, Cloneable, Serializable
This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set
More Details