Related
In the below code (actual implementation of put method inside HashMap), I see the modCount is incremented only in case of new entry getting added but not sure whether it is getting incremented in normal case of replacement of old value and inserting new value for a given key. But I have an understanding that mod count increases in case of the hashmap structure changes, i.e in case of add, remove or update the value inside a map. Can someone explain this, as I do not see here in the code the modCount getting incremented inside the if part rather outside the for loop (so is it only for adding anew value, this modCount changes)?
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
Copied below directly from the javadoc of modcount of hashmap.java class :-
/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash). This field is used to make iterators on Collection-views of
* the HashMap fail-fast. (See ConcurrentModificationException).
*/
So Modcount will not be changed if you replace the old value for a key and i am using java-8 and below is the piece of code which replace the value for a existing key :-
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
And modcount++ is increase after above line but note here we are returning and see the comment // existing mapping for key.
So to answer your question modcount is not increase in case of value is replace for existing key as it will neither change the structure of hashmap nor it will cause the rehashing of map.
Note:- Even the code sample which you have provided there also you can notice that if its for existing key its replacing the value and returning the old value, which is the return for a put method. so line modCount++; will not be executed.
Hope i am clear and let me know if you have some doubts.
Java 8 HashMap docs literally say (emphasis mine):
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)
So, as modCount keeps the count of structural modifications to the map, the answer to your question is: no, modCount is not increased in case of replacing the old value for a given key.
As per my understanding I think:
It is perfectly legal for two objects to have the same hashcode.
If two objects are equal (using the equals() method) then they have the same hashcode.
If two objects are not equal then they cannot have the same hashcode
Am I correct?
Now if am correct, I have the following question:
The HashMap internally uses the hashcode of the object. So if two objects can have the same hashcode, then how can the HashMap track which key it uses?
Can someone explain how the HashMap internally uses the hashcode of the object?
A hashmap works like this (this is a little bit simplified, but it illustrates the basic mechanism):
It has a number of "buckets" which it uses to store key-value pairs in. Each bucket has a unique number - that's what identifies the bucket. When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key. For example: The hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more then one key-value pair).
When you lookup a value in the hashmap, by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with equals().
Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.
Looking at the above mechanism, you can also see what requirements are necessary on the hashCode() and equals() methods of keys:
If two keys are the same (equals() returns true when you compare them), their hashCode() method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).
If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use equals() to tell them apart.
Your third assertion is incorrect.
It's perfectly legal for two unequal objects to have the same hash code. It's used by HashMap as a "first pass filter" so that the map can quickly find possible entries with the specified key. The keys with the same hash code are then tested for equality with the specified key.
You wouldn't want a requirement that two unequal objects couldn't have the same hash code, as otherwise that would limit you to 232 possible objects. (It would also mean that different types couldn't even use an object's fields to generate hash codes, as other classes could generate the same hash.)
HashMap is an array of Entry objects.
Consider HashMap as just an array of objects.
Have a look at what this Object is:
static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
final int hash;
…
}
Each Entry object represents a key-value pair. The field next refers to another Entry object if a bucket has more than one Entry.
Sometimes it might happen that hash codes for 2 different objects are the same. In this case, two objects will be saved in one bucket and will be presented as a linked list.
The entry point is the more recently added object. This object refers to another object with the next field and so on. The last entry refers to null.
When you create a HashMap with the default constructor
HashMap hashMap = new HashMap();
The array is created with size 16 and default 0.75 load balance.
Adding a new key-value pair
Calculate hashcode for the key
Calculate position hash % (arrayLength-1) where element should be placed (bucket number)
If you try to add a value with a key which has already been saved in HashMap, then value gets overwritten.
Otherwise element is added to the bucket.
If the bucket already has at least one element, a new one gets added and placed in the first position of the bucket. Its next field refers to the old element.
Deletion
Calculate hashcode for the given key
Calculate bucket number hash % (arrayLength-1)
Get a reference to the first Entry object in the bucket and by means of equals method iterate over all entries in the given bucket. Eventually we will find the correct Entry.
If a desired element is not found, return null
You can find excellent information at http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html
To Summarize:
HashMap works on the principle of hashing
put(key, value): HashMap stores both key and value object as Map.Entry. Hashmap applies hashcode(key) to get the bucket. if there is collision ,HashMap uses LinkedList to store object.
get(key): HashMap uses Key Object's hashcode to find out bucket location and then call keys.equals() method to identify correct node in LinkedList and return associated value object for that key in Java HashMap.
Here is a rough description of HashMap's mechanism, for Java 8 version, (it might be slightly different from Java 6).
Data structures
Hash table
Hash value is calculated via hash() on key, and it decide which bucket of the hashtable to use for a given key.
Linked list (singly)
When count of elements in a bucket is small, a singly linked list is used.
Red-Black tree
When count of elements in a bucket is large, a red-black tree is used.
Classes (internal)
Map.Entry
Represent a single entity in map, the key/value entity.
HashMap.Node
Linked list version of node.
It could represent:
A hash bucket.
Because it has a hash property.
A node in singly linked list, (thus also head of linkedlist).
HashMap.TreeNode
Tree version of node.
Fields (internal)
Node[] table
The bucket table, (head of the linked lists).
If a bucket don't contains elements, then it's null, thus only take space of a reference.
Set<Map.Entry> entrySet
Set of entities.
int size
Number of entities.
float loadFactor
Indicate how full the hash table is allowed, before resizing.
int threshold
The next size at which to resize.
Formula: threshold = capacity * loadFactor
Methods (internal)
int hash(key)
Calculate hash by key.
How to map hash to bucket?
Use following logic:
static int hashToBucket(int tableSize, int hash) {
return (tableSize - 1) & hash;
}
About capacity
In hash table, capacity means the bucket count, it could be get from table.length.
Also could be calculated via threshold and loadFactor, thus no need to be defined as a class field.
Could get the effective capacity via: capacity()
Operations
Find entity by key.
First find the bucket by hash value, then loop linked list or search sorted tree.
Add entity with key.
First find the bucket according to hash value of key.
Then try find the value:
If found, replace the value.
Otherwise, add a new node at beginning of linked list, or insert into sorted tree.
Resize
When threshold reached, will double hashtable's capacity(table.length), then perform a re-hash on all elements to rebuild the table.
This could be an expensive operation.
Performance
get & put
Time complexity is O(1), because:
Bucket is accessed via array index, thus O(1).
Linked list in each bucket is of small length, thus could view as O(1).
Tree size is also limited, because will extend capacity & re-hash when element count increase, so could view it as O(1), not O(log N).
The hashcode determines which bucket for the hashmap to check. If there is more than one object in the bucket then a linear search is done to find which item in the bucket equals the desired item (using the equals()) method.
In other words, if you have a perfect hashcode then hashmap access is constant, you will never have to iterate through a bucket (technically you would also have to have MAX_INT buckets, the Java implementation may share a few hash codes in the same bucket to cut down on space requirements). If you have the worst hashcode (always returns the same number) then your hashmap access becomes linear since you have to search through every item in the map (they're all in the same bucket) to get what you want.
Most of the time a well written hashcode isn't perfect but is unique enough to give you more or less constant access.
You're mistaken on point three. Two entries can have the same hash code but not be equal. Take a look at the implementation of HashMap.get from the OpenJdk. You can see that it checks that the hashes are equal and the keys are equal. Were point three true, then it would be unnecessary to check that the keys are equal. The hash code is compared before the key because the former is a more efficient comparison.
If you're interested in learning a little more about this, take a look at the Wikipedia article on Open Addressing collision resolution, which I believe is the mechanism that the OpenJdk implementation uses. That mechanism is subtly different than the "bucket" approach one of the other answers mentions.
import java.util.HashMap;
public class Students {
String name;
int age;
Students(String name, int age ){
this.name = name;
this.age=age;
}
#Override
public int hashCode() {
System.out.println("__hash__");
final int prime = 31;
int result = 1;
result = prime * result + age;
result = prime * result + ((name == null) ? 0 : name.hashCode());
return result;
}
#Override
public boolean equals(Object obj) {
System.out.println("__eq__");
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Students other = (Students) obj;
if (age != other.age)
return false;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
return true;
}
public static void main(String[] args) {
Students S1 = new Students("taj",22);
Students S2 = new Students("taj",21);
System.out.println(S1.hashCode());
System.out.println(S2.hashCode());
HashMap<Students,String > HM = new HashMap<Students,String > ();
HM.put(S1, "tajinder");
HM.put(S2, "tajinder");
System.out.println(HM.size());
}
}
Output:
__ hash __
116232
__ hash __
116201
__ hash __
__ hash __
2
So here we see that if both the objects S1 and S2 have different content, then we are pretty sure that our overridden Hashcode method will generate different Hashcode(116232,11601) for both objects. NOW since there are different hash codes, so it won't even bother to call EQUALS method. Because a different Hashcode GUARANTEES DIFFERENT content in an object.
public static void main(String[] args) {
Students S1 = new Students("taj",21);
Students S2 = new Students("taj",21);
System.out.println(S1.hashCode());
System.out.println(S2.hashCode());
HashMap<Students,String > HM = new HashMap<Students,String > ();
HM.put(S1, "tajinder");
HM.put(S2, "tajinder");
System.out.println(HM.size());
}
}
Now lets change out main method a little bit. Output after this change is
__ hash __
116201
__ hash __
116201
__ hash __
__ hash __
__ eq __
1
We can clearly see that equal method is called. Here is print statement __eq__, since we have same hashcode, then content of objects MAY or MAY not be similar. So program internally calls Equal method to verify this.
Conclusion
If hashcode is different , equal method will not get called.
if hashcode is same, equal method will get called.
Thanks , hope it helps.
two objects are equal, implies that they have same hashcode, but not vice versa.
2 equal objects ------> they have same hashcode
2 objects have same hashcode ----xxxxx--> they are NOT equal
Java 8 update in HashMap-
you do this operation in your code -
myHashmap.put("old","old-value");
myHashMap.put("very-old","very-old-value");
so, suppose your hashcode returned for both keys "old" and "very-old" is same. Then what will happen.
myHashMap is a HashMap, and suppose that initially you didn't specify its capacity. So default capacity as per java is 16. So now as soon as you initialised hashmap using the new keyword, it created 16 buckets. now when you executed first statement-
myHashmap.put("old","old-value");
then hashcode for "old" is calculated, and because the hashcode could be very large integer too, so, java internally did this - (hash is hashcode here and >>> is right shift)
hash XOR hash >>> 16
so to give as a bigger picture, it will return some index, which would be between 0 to 15. Now your key value pair "old" and "old-value" would be converted to Entry object's key and value instance variable. and then this entry object will be stored in the bucket, or you can say that at a particular index, this entry object would be stored.
FYI- Entry is a class in Map interface- Map.Entry, with these signature/definition
class Entry{
final Key k;
value v;
final int hash;
Entry next;
}
now when you execute next statement -
myHashmap.put("very-old","very-old-value");
and "very-old" gives same hashcode as "old", so this new key value pair is again sent to the same index or the same bucket. But since this bucket is not empty, then the next variable of the Entry object is used to store this new key value pair.
and this will be stored as linked list for every object which have the same hashcode, but a TRIEFY_THRESHOLD is specified with value 6. so after this reaches, linked list is converted to the balanced tree(red-black tree) with first element as the root.
Each Entry object represents key-value pair. Field next refers to other Entry object if a bucket has more than 1 Entry.
Sometimes it might happen that hashCodes for 2 different objects are the same. In this case 2 objects will be saved in one bucket and will be presented as LinkedList. The entry point is more recently added object. This object refers to other object with next field and so one. Last entry refers to null.
When you create HashMap with default constructor
Array is gets created with size 16 and default 0.75 load balance.
(Source)
Hash map works on the principle of hashing
HashMap get(Key k) method calls hashCode method on the key object and applies returned hashValue to its own static hash function to find a bucket location(backing array) where keys and values are stored in form of a nested class called Entry (Map.Entry) . So you have concluded that from the previous line that Both key and value is stored in the bucket as a form of Entry object . So thinking that Only value is stored in the bucket is not correct and will not give a good impression on the interviewer .
Whenever we call get( Key k ) method on the HashMap object . First it checks that whether key is null or not . Note that there can only be one null key in HashMap .
If key is null , then Null keys always map to hash 0, thus index 0.
If key is not null then , it will call hashfunction on the key object , see line 4 in above method i.e. key.hashCode() ,so after key.hashCode() returns hashValue , line 4 looks like
int hash = hash(hashValue)
and now ,it applies returned hashValue into its own hashing function .
We might wonder why we are calculating the hashvalue again using hash(hashValue). Answer is It defends against poor quality hash functions.
Now final hashvalue is used to find the bucket location at which the Entry object is stored . Entry object stores in the bucket like this (hash,key,value,bucketindex)
I will not get into the details of how HashMap works, but will give an example so we can remember how HashMap works by relating it to reality.
We have Key, Value ,HashCode and bucket.
For sometime, we will relate each of them with the following:
Bucket -> A Society
HashCode -> Society's address(unique always)
Value -> A House in the Society
Key -> House address.
Using Map.get(key) :
Stevie wants to get to his friend's(Josse) house who lives in a villa in a VIP society, let it be JavaLovers Society.
Josse's address is his SSN(which is different for everyone).
There's an index maintained in which we find out the Society's name based on SSN.
This index can be considered to be an algorithm to find out the HashCode.
SSN Society's Name
92313(Josse's) -- JavaLovers
13214 -- AngularJSLovers
98080 -- JavaLovers
53808 -- BiologyLovers
This SSN(key) first gives us a HashCode(from the index table) which is nothing but Society's name.
Now, mulitple houses can be in the same society, so the HashCode can be common.
Suppose, the Society is common for two houses, how are we going to identify which house we are going to, yes, by using the (SSN)key which is nothing but the House address
Using Map.put(key,Value)
This finds a suitable society for this Value by finding the HashCode and then the value is stored.
I hope this helps and this is open for modifications.
Bearing in mind the explanations here for the structure of a hashmap, perhaps someone could explain the following paragraph on Baeldung :-
Java has several implementations of the interface Map, each one with its own particularities.
However, none of the existing Java core Map implementations allow a Map to handle multiple values for a single key.
As we can see, if we try to insert two values for the same key, the second value will be stored, while the first one will be dropped.
It will also be returned (by every proper implementation of the put(K key, V value) method):
Map<String, String> map = new HashMap<>();
assertThat(map.put("key1", "value1")).isEqualTo(null);
assertThat(map.put("key1", "value2")).isEqualTo("value1");
assertThat(map.get("key1")).isEqualTo("value2");
It gonna be a long answer , grab a drink and read on …
Hashing is all about storing a key-value pair in memory that can be read and written faster. It stores keys in an array and values in a LinkedList .
Lets Say I want to store 4 key value pairs -
{
“girl” => “ahhan” ,
“misused” => “Manmohan Singh” ,
“horsemints” => “guess what”,
“no” => “way”
}
So to store the keys we need an array of 4 element . Now how do I map one of these 4 keys to 4 array indexes (0,1,2,3)?
So java finds the hashCode of individual keys and map them to a particular array index .
Hashcode Formulae is -
1) reverse the string.
2) keep on multiplying ascii of each character with increasing power of 31 . then add the components .
3) So hashCode() of girl would be –(ascii values of l,r,i,g are 108, 114, 105 and 103) .
e.g. girl = 108 * 31^0 + 114 * 31^1 + 105 * 31^2 + 103 * 31^3 = 3173020
Hash and girl !! I know what you are thinking. Your fascination about that wild duet might made you miss an important thing .
Why java multiply it with 31 ?
It’s because, 31 is an odd prime in the form 2^5 – 1 . And odd prime reduces the chance of Hash Collision
Now how this hash code is mapped to an array index?
answer is , Hash Code % (Array length -1) . So “girl” is mapped to (3173020 % 3) = 1 in our case . which is second element of the array .
and the value “ahhan” is stored in a LinkedList associated with array index 1 .
HashCollision - If you try to find hasHCode of the keys “misused” and “horsemints” using the formulae described above you’ll see both giving us same 1069518484. Whooaa !! lesson learnt -
2 equal objects must have same hashCode but there is no guarantee if
the hashCode matches then the objects are equal . So it should store
both values corresponding to “misused” and “horsemints” to bucket 1
(1069518484 % 3) .
Now the hash map looks like –
Array Index 0 –
Array Index 1 - LinkedIst (“ahhan” , “Manmohan Singh” , “guess what”)
Array Index 2 – LinkedList (“way”)
Array Index 3 –
Now if some body tries to find the value for the key “horsemints” , java quickly will find the hashCode of it , module it and start searching for it’s value in the LinkedList corresponding index 1 . So this way we need not search all the 4 array indexes thus making data access faster.
But , wait , one sec . there are 3 values in that linkedList corresponding Array index 1, how it finds out which one was was the value for key “horsemints” ?
Actually I lied , when I said HashMap just stores values in LinkedList .
It stores both key value pair as map entry. So actually Map looks like this .
Array Index 0 –
Array Index 1 - LinkedIst (<”girl” => “ahhan”> , <” misused” => “Manmohan Singh”> , <”horsemints” => “guess what”>)
Array Index 2 – LinkedList (<”no” => “way”>)
Array Index 3 –
Now you can see While traversing through the linkedList corresponding to ArrayIndex1 it actually compares key of each entry to of that LinkedList to “horsemints” and when it finds one it just returns the value of it .
Hope you had fun while reading it :)
As it is said, a picture is worth 1000 words. I say: some code is better than 1000 words. Here's the source code of HashMap. Get method:
/**
* Implements Map.get and related methods
*
* #param hash hash for key
* #param key the key
* #return the node, or null if none
*/
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
So it becomes clear that hash is used to find the "bucket" and the first element is always checked in that bucket. If not, then equals of the key is used to find the actual element in the linked list.
Let's see the put() method:
/**
* Implements Map.put and related methods
*
* #param hash hash for key
* #param key the key
* #param value the value to put
* #param onlyIfAbsent if true, don't change existing value
* #param evict if false, the table is in creation mode.
* #return previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
It's slightly more complicated, but it becomes clear that the new element is put in the tab at the position calculated based on hash:
i = (n - 1) & hash here i is the index where the new element will be put (or it is the "bucket"). n is the size of the tab array (array of "buckets").
First, it is tried to be put as the first element of in that "bucket". If there is already an element, then append a new node to the list.
This question already has answers here:
How does Java HashMap store entries internally
(3 answers)
Closed 9 years ago.
quick question to be sure I understood well how HashMap in java works.
Here is an example of code:
//String key = new String("key");
//String val = new String("value");
String key = "key";
String val = "value";
HashMap map = new HashMap();
map.put(key, val);
System.out.println("hashmap object created. Its key hashcode = "+key.hashCode());
// the hashcode is 106079
System.out.println("hashmap object value for key = "+map.get(key));
// Let's store using a key with same hashcode
Integer intkey = new Integer(106079);
val = "value2";
map.put(intkey, val);
System.out.println("hashmap object created. Its intkey hashcode = "+intkey.hashCode());
// this returns 106079 once again. So both key and intkey have the same hashcode
// Let's get the values
System.out.println("hashmap object value for intkey = "+map.get(intkey));
System.out.println("hashmap object value for key = "+map.get(key));
the returned values are as expected. I read that behind the scene, HashMap works as follows:
get the key/value.
make a hashcode from the key.
store in bucket the key and value objects (in my case bucket number 106079)
same again for the second one.
stores it within the same bucket but as this is at this point a LinkedList, I suppose store it at "next available allocation".
To get it:
pick up the key, get hashcode,
look at the bucket,
then look at the key from the first element in the LinkedList,
then check if key passed and key from element match, if not then continue to the next, and so on until value can be retrieved.
Am I understanding this concept correctly?
Many thanks!
EDIT:
many thanks, so to complete:
- How does Java HashMap store entries internally
- How does a Java HashMap handle different objects with the same hash code?
And:
There shouldn't be that many "buckets"
A bucket is not the same as an entry. A bucket is a logical collection of all entries sharing the same bucket# (which in the Java case is a function of the key's hashCode()). As stated in other answers, bucket overflow could be implemented several ways, not necessarily in a List.
There are other existing collision resolutions: http://en.wikipedia.org/wiki/Hash_table#Collision_resolution
it uses Object's equals method to compare and retrieve objects from the same bucket.
Please also note, there are several ways HashMap can implement hash codes collision resolution, not only utilizing linked list as you mentioned
Java's HashMap implementation does not only use LinkedList strategy to handle key-values with same key.hashCode() values.
Also, you may want to read this article
Yes, your understanding is correct. Just note that a single bucket is assigned many hashcodes: in a fresh HashMap there are altogether 16 buckets, each assigned a total of 232/16 = 228 hashcodes.
Your understandings are OK, but take into account that there are several implementations. The actual hashcode HashMap uses to store the value may not be 106079. Here's one implementation (java-6-openjdk):
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
Notice the hash method, which consists of the following:
/**
* Applies a supplemental hash function to a given hashCode, which
* defends against poor quality hash functions. This is critical
* because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits. Note: Null keys always map to hash 0, thus index 0.
*/
static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
So in this JVM's example, it does not use 106079 as a hash, since HashMap recreates a hash to 'harden' it.
I hope that helps
This question already has answers here:
How does a Java HashMap handle different objects with the same hash code?
(15 answers)
Closed 9 years ago.
In Java, if I have HashMap<Integer, int[]> map and want to lookup for a given int key like map.get(key) then the algorithm will compute key.hashCode(), go to the corresponding bucket and search linearly for objects of type int[] and compare them by using equals() ? So those int[] objects in a bucket will have the same key (computed by hashCode) and they will be compared by equals(). Is that right?
I can not find on the web an example, where it is shown clearly. Only words.
What you are redirecting me at does not contain a normal understandable example, I do not need theory.
Correction: ...go to the corresponding bucket and search linearly for an entry with the key (Integer) equal to the given key. And this is how this search actually implemented in HashMap
final Entry<K,V> getEntry(Object key) {
int hash = (key == null) ? 0 : hash(key);
for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
return e;
}
return null;
}
I do not think this is a good solution to intermix an array as a value when you can clearly use a data structure seperately. The idea of a hash map is to condencs index relative to a table. You can easyly keep all the ints in a seperate data structure and enumerate with a for loop and pair the key with put. This underlying construction will give you the reference regardless of where the reference is stored.
There are 2^32 possible hash codes, but the maximum number of buckets is Integer.MAX_VALUE, the largest possible int. That means HashMap must map multiple hash codes to the same bucket.
To look up a probe key, in your case an Integer, it first computes the probes's hash code. It goes to the bucket containing that hash code. It scans the keys in the (key,value) pairs in the bucket for keys whose hash code matches the probe hash code. It only runs the equals test for those keys that do have the required hash code.
If it finds a (key, value) pair whose key is equal to the probe, it returns the value, in your case an int[] reference.
See the code quoted in #Evgeniy's answer for details of handling of null.
As per my understanding I think:
It is perfectly legal for two objects to have the same hashcode.
If two objects are equal (using the equals() method) then they have the same hashcode.
If two objects are not equal then they cannot have the same hashcode
Am I correct?
Now if am correct, I have the following question:
The HashMap internally uses the hashcode of the object. So if two objects can have the same hashcode, then how can the HashMap track which key it uses?
Can someone explain how the HashMap internally uses the hashcode of the object?
A hashmap works like this (this is a little bit simplified, but it illustrates the basic mechanism):
It has a number of "buckets" which it uses to store key-value pairs in. Each bucket has a unique number - that's what identifies the bucket. When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key. For example: The hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more then one key-value pair).
When you lookup a value in the hashmap, by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with equals().
Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.
Looking at the above mechanism, you can also see what requirements are necessary on the hashCode() and equals() methods of keys:
If two keys are the same (equals() returns true when you compare them), their hashCode() method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).
If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use equals() to tell them apart.
Your third assertion is incorrect.
It's perfectly legal for two unequal objects to have the same hash code. It's used by HashMap as a "first pass filter" so that the map can quickly find possible entries with the specified key. The keys with the same hash code are then tested for equality with the specified key.
You wouldn't want a requirement that two unequal objects couldn't have the same hash code, as otherwise that would limit you to 232 possible objects. (It would also mean that different types couldn't even use an object's fields to generate hash codes, as other classes could generate the same hash.)
HashMap is an array of Entry objects.
Consider HashMap as just an array of objects.
Have a look at what this Object is:
static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
final int hash;
…
}
Each Entry object represents a key-value pair. The field next refers to another Entry object if a bucket has more than one Entry.
Sometimes it might happen that hash codes for 2 different objects are the same. In this case, two objects will be saved in one bucket and will be presented as a linked list.
The entry point is the more recently added object. This object refers to another object with the next field and so on. The last entry refers to null.
When you create a HashMap with the default constructor
HashMap hashMap = new HashMap();
The array is created with size 16 and default 0.75 load balance.
Adding a new key-value pair
Calculate hashcode for the key
Calculate position hash % (arrayLength-1) where element should be placed (bucket number)
If you try to add a value with a key which has already been saved in HashMap, then value gets overwritten.
Otherwise element is added to the bucket.
If the bucket already has at least one element, a new one gets added and placed in the first position of the bucket. Its next field refers to the old element.
Deletion
Calculate hashcode for the given key
Calculate bucket number hash % (arrayLength-1)
Get a reference to the first Entry object in the bucket and by means of equals method iterate over all entries in the given bucket. Eventually we will find the correct Entry.
If a desired element is not found, return null
You can find excellent information at http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html
To Summarize:
HashMap works on the principle of hashing
put(key, value): HashMap stores both key and value object as Map.Entry. Hashmap applies hashcode(key) to get the bucket. if there is collision ,HashMap uses LinkedList to store object.
get(key): HashMap uses Key Object's hashcode to find out bucket location and then call keys.equals() method to identify correct node in LinkedList and return associated value object for that key in Java HashMap.
Here is a rough description of HashMap's mechanism, for Java 8 version, (it might be slightly different from Java 6).
Data structures
Hash table
Hash value is calculated via hash() on key, and it decide which bucket of the hashtable to use for a given key.
Linked list (singly)
When count of elements in a bucket is small, a singly linked list is used.
Red-Black tree
When count of elements in a bucket is large, a red-black tree is used.
Classes (internal)
Map.Entry
Represent a single entity in map, the key/value entity.
HashMap.Node
Linked list version of node.
It could represent:
A hash bucket.
Because it has a hash property.
A node in singly linked list, (thus also head of linkedlist).
HashMap.TreeNode
Tree version of node.
Fields (internal)
Node[] table
The bucket table, (head of the linked lists).
If a bucket don't contains elements, then it's null, thus only take space of a reference.
Set<Map.Entry> entrySet
Set of entities.
int size
Number of entities.
float loadFactor
Indicate how full the hash table is allowed, before resizing.
int threshold
The next size at which to resize.
Formula: threshold = capacity * loadFactor
Methods (internal)
int hash(key)
Calculate hash by key.
How to map hash to bucket?
Use following logic:
static int hashToBucket(int tableSize, int hash) {
return (tableSize - 1) & hash;
}
About capacity
In hash table, capacity means the bucket count, it could be get from table.length.
Also could be calculated via threshold and loadFactor, thus no need to be defined as a class field.
Could get the effective capacity via: capacity()
Operations
Find entity by key.
First find the bucket by hash value, then loop linked list or search sorted tree.
Add entity with key.
First find the bucket according to hash value of key.
Then try find the value:
If found, replace the value.
Otherwise, add a new node at beginning of linked list, or insert into sorted tree.
Resize
When threshold reached, will double hashtable's capacity(table.length), then perform a re-hash on all elements to rebuild the table.
This could be an expensive operation.
Performance
get & put
Time complexity is O(1), because:
Bucket is accessed via array index, thus O(1).
Linked list in each bucket is of small length, thus could view as O(1).
Tree size is also limited, because will extend capacity & re-hash when element count increase, so could view it as O(1), not O(log N).
The hashcode determines which bucket for the hashmap to check. If there is more than one object in the bucket then a linear search is done to find which item in the bucket equals the desired item (using the equals()) method.
In other words, if you have a perfect hashcode then hashmap access is constant, you will never have to iterate through a bucket (technically you would also have to have MAX_INT buckets, the Java implementation may share a few hash codes in the same bucket to cut down on space requirements). If you have the worst hashcode (always returns the same number) then your hashmap access becomes linear since you have to search through every item in the map (they're all in the same bucket) to get what you want.
Most of the time a well written hashcode isn't perfect but is unique enough to give you more or less constant access.
You're mistaken on point three. Two entries can have the same hash code but not be equal. Take a look at the implementation of HashMap.get from the OpenJdk. You can see that it checks that the hashes are equal and the keys are equal. Were point three true, then it would be unnecessary to check that the keys are equal. The hash code is compared before the key because the former is a more efficient comparison.
If you're interested in learning a little more about this, take a look at the Wikipedia article on Open Addressing collision resolution, which I believe is the mechanism that the OpenJdk implementation uses. That mechanism is subtly different than the "bucket" approach one of the other answers mentions.
import java.util.HashMap;
public class Students {
String name;
int age;
Students(String name, int age ){
this.name = name;
this.age=age;
}
#Override
public int hashCode() {
System.out.println("__hash__");
final int prime = 31;
int result = 1;
result = prime * result + age;
result = prime * result + ((name == null) ? 0 : name.hashCode());
return result;
}
#Override
public boolean equals(Object obj) {
System.out.println("__eq__");
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Students other = (Students) obj;
if (age != other.age)
return false;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
return true;
}
public static void main(String[] args) {
Students S1 = new Students("taj",22);
Students S2 = new Students("taj",21);
System.out.println(S1.hashCode());
System.out.println(S2.hashCode());
HashMap<Students,String > HM = new HashMap<Students,String > ();
HM.put(S1, "tajinder");
HM.put(S2, "tajinder");
System.out.println(HM.size());
}
}
Output:
__ hash __
116232
__ hash __
116201
__ hash __
__ hash __
2
So here we see that if both the objects S1 and S2 have different content, then we are pretty sure that our overridden Hashcode method will generate different Hashcode(116232,11601) for both objects. NOW since there are different hash codes, so it won't even bother to call EQUALS method. Because a different Hashcode GUARANTEES DIFFERENT content in an object.
public static void main(String[] args) {
Students S1 = new Students("taj",21);
Students S2 = new Students("taj",21);
System.out.println(S1.hashCode());
System.out.println(S2.hashCode());
HashMap<Students,String > HM = new HashMap<Students,String > ();
HM.put(S1, "tajinder");
HM.put(S2, "tajinder");
System.out.println(HM.size());
}
}
Now lets change out main method a little bit. Output after this change is
__ hash __
116201
__ hash __
116201
__ hash __
__ hash __
__ eq __
1
We can clearly see that equal method is called. Here is print statement __eq__, since we have same hashcode, then content of objects MAY or MAY not be similar. So program internally calls Equal method to verify this.
Conclusion
If hashcode is different , equal method will not get called.
if hashcode is same, equal method will get called.
Thanks , hope it helps.
two objects are equal, implies that they have same hashcode, but not vice versa.
2 equal objects ------> they have same hashcode
2 objects have same hashcode ----xxxxx--> they are NOT equal
Java 8 update in HashMap-
you do this operation in your code -
myHashmap.put("old","old-value");
myHashMap.put("very-old","very-old-value");
so, suppose your hashcode returned for both keys "old" and "very-old" is same. Then what will happen.
myHashMap is a HashMap, and suppose that initially you didn't specify its capacity. So default capacity as per java is 16. So now as soon as you initialised hashmap using the new keyword, it created 16 buckets. now when you executed first statement-
myHashmap.put("old","old-value");
then hashcode for "old" is calculated, and because the hashcode could be very large integer too, so, java internally did this - (hash is hashcode here and >>> is right shift)
hash XOR hash >>> 16
so to give as a bigger picture, it will return some index, which would be between 0 to 15. Now your key value pair "old" and "old-value" would be converted to Entry object's key and value instance variable. and then this entry object will be stored in the bucket, or you can say that at a particular index, this entry object would be stored.
FYI- Entry is a class in Map interface- Map.Entry, with these signature/definition
class Entry{
final Key k;
value v;
final int hash;
Entry next;
}
now when you execute next statement -
myHashmap.put("very-old","very-old-value");
and "very-old" gives same hashcode as "old", so this new key value pair is again sent to the same index or the same bucket. But since this bucket is not empty, then the next variable of the Entry object is used to store this new key value pair.
and this will be stored as linked list for every object which have the same hashcode, but a TRIEFY_THRESHOLD is specified with value 6. so after this reaches, linked list is converted to the balanced tree(red-black tree) with first element as the root.
Each Entry object represents key-value pair. Field next refers to other Entry object if a bucket has more than 1 Entry.
Sometimes it might happen that hashCodes for 2 different objects are the same. In this case 2 objects will be saved in one bucket and will be presented as LinkedList. The entry point is more recently added object. This object refers to other object with next field and so one. Last entry refers to null.
When you create HashMap with default constructor
Array is gets created with size 16 and default 0.75 load balance.
(Source)
Hash map works on the principle of hashing
HashMap get(Key k) method calls hashCode method on the key object and applies returned hashValue to its own static hash function to find a bucket location(backing array) where keys and values are stored in form of a nested class called Entry (Map.Entry) . So you have concluded that from the previous line that Both key and value is stored in the bucket as a form of Entry object . So thinking that Only value is stored in the bucket is not correct and will not give a good impression on the interviewer .
Whenever we call get( Key k ) method on the HashMap object . First it checks that whether key is null or not . Note that there can only be one null key in HashMap .
If key is null , then Null keys always map to hash 0, thus index 0.
If key is not null then , it will call hashfunction on the key object , see line 4 in above method i.e. key.hashCode() ,so after key.hashCode() returns hashValue , line 4 looks like
int hash = hash(hashValue)
and now ,it applies returned hashValue into its own hashing function .
We might wonder why we are calculating the hashvalue again using hash(hashValue). Answer is It defends against poor quality hash functions.
Now final hashvalue is used to find the bucket location at which the Entry object is stored . Entry object stores in the bucket like this (hash,key,value,bucketindex)
I will not get into the details of how HashMap works, but will give an example so we can remember how HashMap works by relating it to reality.
We have Key, Value ,HashCode and bucket.
For sometime, we will relate each of them with the following:
Bucket -> A Society
HashCode -> Society's address(unique always)
Value -> A House in the Society
Key -> House address.
Using Map.get(key) :
Stevie wants to get to his friend's(Josse) house who lives in a villa in a VIP society, let it be JavaLovers Society.
Josse's address is his SSN(which is different for everyone).
There's an index maintained in which we find out the Society's name based on SSN.
This index can be considered to be an algorithm to find out the HashCode.
SSN Society's Name
92313(Josse's) -- JavaLovers
13214 -- AngularJSLovers
98080 -- JavaLovers
53808 -- BiologyLovers
This SSN(key) first gives us a HashCode(from the index table) which is nothing but Society's name.
Now, mulitple houses can be in the same society, so the HashCode can be common.
Suppose, the Society is common for two houses, how are we going to identify which house we are going to, yes, by using the (SSN)key which is nothing but the House address
Using Map.put(key,Value)
This finds a suitable society for this Value by finding the HashCode and then the value is stored.
I hope this helps and this is open for modifications.
Bearing in mind the explanations here for the structure of a hashmap, perhaps someone could explain the following paragraph on Baeldung :-
Java has several implementations of the interface Map, each one with its own particularities.
However, none of the existing Java core Map implementations allow a Map to handle multiple values for a single key.
As we can see, if we try to insert two values for the same key, the second value will be stored, while the first one will be dropped.
It will also be returned (by every proper implementation of the put(K key, V value) method):
Map<String, String> map = new HashMap<>();
assertThat(map.put("key1", "value1")).isEqualTo(null);
assertThat(map.put("key1", "value2")).isEqualTo("value1");
assertThat(map.get("key1")).isEqualTo("value2");
It gonna be a long answer , grab a drink and read on …
Hashing is all about storing a key-value pair in memory that can be read and written faster. It stores keys in an array and values in a LinkedList .
Lets Say I want to store 4 key value pairs -
{
“girl” => “ahhan” ,
“misused” => “Manmohan Singh” ,
“horsemints” => “guess what”,
“no” => “way”
}
So to store the keys we need an array of 4 element . Now how do I map one of these 4 keys to 4 array indexes (0,1,2,3)?
So java finds the hashCode of individual keys and map them to a particular array index .
Hashcode Formulae is -
1) reverse the string.
2) keep on multiplying ascii of each character with increasing power of 31 . then add the components .
3) So hashCode() of girl would be –(ascii values of l,r,i,g are 108, 114, 105 and 103) .
e.g. girl = 108 * 31^0 + 114 * 31^1 + 105 * 31^2 + 103 * 31^3 = 3173020
Hash and girl !! I know what you are thinking. Your fascination about that wild duet might made you miss an important thing .
Why java multiply it with 31 ?
It’s because, 31 is an odd prime in the form 2^5 – 1 . And odd prime reduces the chance of Hash Collision
Now how this hash code is mapped to an array index?
answer is , Hash Code % (Array length -1) . So “girl” is mapped to (3173020 % 3) = 1 in our case . which is second element of the array .
and the value “ahhan” is stored in a LinkedList associated with array index 1 .
HashCollision - If you try to find hasHCode of the keys “misused” and “horsemints” using the formulae described above you’ll see both giving us same 1069518484. Whooaa !! lesson learnt -
2 equal objects must have same hashCode but there is no guarantee if
the hashCode matches then the objects are equal . So it should store
both values corresponding to “misused” and “horsemints” to bucket 1
(1069518484 % 3) .
Now the hash map looks like –
Array Index 0 –
Array Index 1 - LinkedIst (“ahhan” , “Manmohan Singh” , “guess what”)
Array Index 2 – LinkedList (“way”)
Array Index 3 –
Now if some body tries to find the value for the key “horsemints” , java quickly will find the hashCode of it , module it and start searching for it’s value in the LinkedList corresponding index 1 . So this way we need not search all the 4 array indexes thus making data access faster.
But , wait , one sec . there are 3 values in that linkedList corresponding Array index 1, how it finds out which one was was the value for key “horsemints” ?
Actually I lied , when I said HashMap just stores values in LinkedList .
It stores both key value pair as map entry. So actually Map looks like this .
Array Index 0 –
Array Index 1 - LinkedIst (<”girl” => “ahhan”> , <” misused” => “Manmohan Singh”> , <”horsemints” => “guess what”>)
Array Index 2 – LinkedList (<”no” => “way”>)
Array Index 3 –
Now you can see While traversing through the linkedList corresponding to ArrayIndex1 it actually compares key of each entry to of that LinkedList to “horsemints” and when it finds one it just returns the value of it .
Hope you had fun while reading it :)
As it is said, a picture is worth 1000 words. I say: some code is better than 1000 words. Here's the source code of HashMap. Get method:
/**
* Implements Map.get and related methods
*
* #param hash hash for key
* #param key the key
* #return the node, or null if none
*/
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
So it becomes clear that hash is used to find the "bucket" and the first element is always checked in that bucket. If not, then equals of the key is used to find the actual element in the linked list.
Let's see the put() method:
/**
* Implements Map.put and related methods
*
* #param hash hash for key
* #param key the key
* #param value the value to put
* #param onlyIfAbsent if true, don't change existing value
* #param evict if false, the table is in creation mode.
* #return previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
It's slightly more complicated, but it becomes clear that the new element is put in the tab at the position calculated based on hash:
i = (n - 1) & hash here i is the index where the new element will be put (or it is the "bucket"). n is the size of the tab array (array of "buckets").
First, it is tried to be put as the first element of in that "bucket". If there is already an element, then append a new node to the list.