Java Hashmap - Please explain how hash maps work


I am currently preparing for interviews, Java in particular.
A common question is to explain hash maps.
Every explanation says that in case there is more than a single value per key, the values are being linked listed to the bucket.
Now, in the HashMap class, when we use put() and the key is already in the map, the value is not linked to the existing one (at least as I understand it) but replaces it:
Map<String, Integer> map = new HashMap<>();
map.put("a", 1);
// map now has the pair ["a", 1]
map.put("a", 2);
// map now has the pair ["a", 2]
// According to all the hash map tutorials, it should have been: ["a", 1->2]
From the docs:
If the map previously contained a mapping for the key, the old value
is replaced.
What am I missing here? I am a little confused...
Thanks

You're confusing the behaviour of a Map with the implementation of a HashMap.
In a Map, there is only one value for a key -- if you put a new value for the same key, the old value will be replaced.
HashMaps are implemented using 'buckets' -- an array of cells of finite size, indexed by a value derived from the hashCode of the key.
It's possible for two different keys to hash to the same bucket, a 'hash collision'. In case of a collision, one solution is to put the (key, value) pairs into a list, which is searched when getting a value from that bucket. This list is part of the internal implementation of the HashMap and is not visible to the user of the HashMap.
This is probably what you are thinking of.
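A minimal sketch of the Map-level behaviour described above (the class and method names are only illustrative): put() on an existing key replaces the value, and the map still holds a single entry for that key.

```java
import java.util.HashMap;
import java.util.Map;

class PutReplacesDemo {
    // Puts two values under the same key and returns the resulting map.
    static Map<String, Integer> putTwice() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("a", 2); // same key: the value 1 is replaced, not chained
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = putTwice();
        System.out.println(map.size());   // 1
        System.out.println(map.get("a")); // 2
    }
}
```

Whatever list exists inside a bucket holds entries for different keys, which is why it never shows up here.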

Your basic understanding is correct: maps in general and hashmaps in particular only support one value per key. That value could be a list but that's something different. put("a", 2) will replace any value for key "a" that's already in the map.
So what are you missing?
Every explanation says that in case there is more than a single value per key, the values are being linked listed to the bucket.
Yes, that's basically the case (unless the list is replaced by a tree for efficiency reasons but let's ignore that here).
This is not about putting values for the same key into a list, however, but for the same bucket. Since keys could be mapped to the same bucket due to their hash code you'd need to handle that "collision".
Example:
Key "A" has a hash code of 65, key "Q" has a hash code of 81 (assuming hashCode() just returns the ASCII codes).
Let's assume our hashmap currently has 16 buckets. So when I put "A" into the map, which bucket does it go to? We calculate bucketIndex = hashCode % numBuckets (so 65 % 16) and we get the index 1 for the 2nd bucket.
Now we want to put "Q" into the map. bucketIndex = hashCode % numBuckets also yields 1 (81 % 16), so the value for a different key goes to the same bucket at index 1.
To solve that, a simple implementation is to use a linked list, i.e. the entry for "A" points to the next entry for "Q" in the same bucket.
Any get("Q") will then look for the bucket index first using the same calculation, so it gets 1. Then it iterates the list and calls equals() on each entry's key until it hits the one that matches (or none if none match).
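The arithmetic can be checked with a small sketch. It uses the simplified hashCode % numBuckets formula from this answer, not HashMap's real index computation, and the two single-character strings whose Java hash codes are 65 and 81 ("A" and "Q"). The class and method names are made up for illustration.

```java
class BucketIndexDemo {
    // Simplified bucket selection as described in the answer.
    // Plain % is fine here only because both hash codes are positive.
    static int bucketIndex(Object key, int numBuckets) {
        return key.hashCode() % numBuckets;
    }

    public static void main(String[] args) {
        System.out.println("A".hashCode());       // 65
        System.out.println("Q".hashCode());       // 81
        System.out.println(bucketIndex("A", 16)); // 1
        System.out.println(bucketIndex("Q", 16)); // 1
    }
}
```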

in case there is more than a single value per key, the values are being linked listed to the bucket.
Maybe you are confusing that with this: multiple keys can have the same hashCode value (a collision).
For example let's consider 2 keys(key1, key2). Key1 references value1 and Key2 references value2.
If
hashcode(key1) = 1
hashcode(key2) = 1

The objects might have the same hashCode but not be equal (a collision). In that situation both entries are put into a list in the same bucket. The bucket is found by hashCode, and then your value is picked out of that list by calling equals on the keys.
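As a concrete case, the Java strings "Aa" and "BB" both have the hash code 2112 but are not equal. The sketch below (illustrative names) shows that a HashMap still keeps them apart:

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Stores two keys that collide on hashCode() but differ under equals().
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put("Aa", "value1");
        map.put("BB", "value2");
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println("Aa".equals("BB"));                  // false
        Map<String, String> map = build();
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // value1
        System.out.println(map.get("BB")); // value2
    }
}
```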

Related

If Hashtables use separate chaining, why are duplicate keys not possible?

So I'm a bit confused about this one.
If Hashtables use separate chaining (or linear probing), why won't the following print out both values?
Hashtable<Character, Integer> map = new Hashtable<>();
map.put('h', 0);
map.put('h', 1);
System.out.println(map.remove('h')); // outputs 1
System.out.println(map.get('h')); // outputs null
I'm trying to understand why, given 2 identical keys, the hashtable won't use separate chaining in order to store both values. Did I understand this somewhat incorrectly or has Java just not implemented collision handling in their hashtable class?
Another question that might tie together would be, how does a hashtable using linear probing, given a key, know which value is the one we are looking for?
Thanks in advance!

I'm trying to understand why, given 2 identical keys, the hashtable won't use separate chaining in order to store both values.
The specification for Map (i.e. the javadoc) says that only one value is stored for each key. So that's what the Hashtable and HashMap implementations do.
Certainly the separate chaining doesn't stop someone from implementing a hash table with that property. The pseudo-code for put(key, value) on a hash table with separate chaining is broadly as follows:
1. Compute the hash value for the key.
2. Compute an index in the array of hash chains from the hash value. (The computation is index = hash % array.length or something similar.)
3. Search the hash chain at the computed index for an entry that matches the key.
4. If you found the entry for the key on the chain, update the value in the entry.
5. If you didn't find the entry, create an entry and add it to the chain.
If you repeat that for the same key, you will compute the same hash value, search the same chain, and find the same entry. You then update it, and there is still only one entry for that key ... as required by the specification.
In short, the above algorithm has no problem meeting the Map.put API requirements.
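The pseudo-code above could be turned into Java roughly as follows. This is a teaching sketch, not how java.util.Hashtable is actually written: ChainedMap and its members are hypothetical names, there is no resizing, and null keys are not supported.

```java
import java.util.ArrayList;
import java.util.List;

class ChainedMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry<K, V>>> chains = new ArrayList<>();
    private int size;

    ChainedMap(int capacity) {
        for (int i = 0; i < capacity; i++) chains.add(new ArrayList<>());
    }

    // Hash the key and reduce it to a chain index (floorMod avoids negatives).
    private List<Entry<K, V>> chainFor(K key) {
        return chains.get(Math.floorMod(key.hashCode(), chains.size()));
    }

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        List<Entry<K, V>> chain = chainFor(key);
        for (Entry<K, V> e : chain) {      // search the chain for the key
            if (e.key.equals(key)) {       // found: update in place
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        chain.add(new Entry<>(key, value)); // not found: add a new entry
        size++;
        return null;
    }

    V get(K key) {
        for (Entry<K, V> e : chainFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    int size() { return size; }
}
```

With this sketch, put('h', 0) followed by put('h', 1) leaves size() at 1 and get('h') returning 1, because the second call finds the existing entry on the chain and updates it.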

I think you are misunderstanding how hash tables work. Imagine I am looking for someone with an id of 227828. Say I have 1000 such people. I can search all 1000 and eventually find that ID and the person to whom it belongs.
But if their ids are used as keys in a hash table it is easier. Using the id as the key, say the hash function returns 0 for an even id and 1 for an odd id. Then all I have to do is find the box that contains even ids. Ideally I would then only have to search through 500 entries to find the key (i.e. the id) and return the value associated with it.
Real hash functions are more sophisticated, and there are many such boxes or buckets. The appropriate bucket is identified first, then searched for the proper key, and the value associated with that key is returned.

Problems using HashMap

I'm trying to solve an analytical problem just to study data structures. My doubt is about hash tables in Java.
I have a HashMap
HashMap<String, Integer> map = new HashMap<>();
This HashMap has a few keys, but some of these keys are duplicated.
map.put('k', 1);
map.put('k', 2);
And so on...
My question is what happens when I remove a key from the HashMap, given that some of these keys are duplicated.
Let's see.
map.remove('k');
I suppose that in this case it will either remove all the values with the key 'k' or remove only the first one it finds.
What's going to happen in this case? I'm a little confused.
Thanks for your help!!

In a HashMap (or Hashtable) you can only have UNIQUE KEYS; you cannot have different values assigned to the same key. In your code you attempt to put 2 different values with the same key:
map.put('k', 1);
map.put('k', 2);
Guess what: there will not be 2 entries but only 1, the last, which REPLACES the previous one, since they have the same key 'k'. Hence map.remove('k'); removes everything, which is just that single entry, not two.

There are multiple things you are asking. Let's answer all of them.
Hashtable is not the same as HashMap, although the two are very similar. The biggest difference between them is that in Hashtable every method is synchronized, which adds locking overhead to every read and write. HashMap's methods are not synchronized. Hashtable is more or less obsolete, and people writing new code should avoid using it.
In a HashMap, keys are always unique. i.e., there cannot be 2 entries with the same key. In your example,
map.put('k', 1);
This will create an entry in the map whose key is 'k' and value is 1.
Then, you do
map.put('k', 2);
This will not create another entry with key 'k' and value 2. It will overwrite the value of the first entry. So you will only have a single entry for key 'k', whose value is now 2 (and not 1).
Now I guess remove() is easy to understand. When you call remove(key), it removes the only entry for that key.

In a HashMap, keys are unique, so it won't add the key 'k' multiple times. When you remove the key 'k', it removes the single entry for 'k' from the map.

By definition a HashMap in Java stores all keys uniquely.
Simply put, when we put more entries with the same key, each one overwrites the previously stored value for that key. Hence, deleting a key after putting multiple values for it means the key no longer exists in the HashMap.
For more details you can read the documentation here.

use
map.putIfAbsent('k',2);
instead of
map.put('k', 2);
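A short sketch of the difference, using String keys so that it compiles against the declared HashMap<String, Integer> (the class and method names are illustrative): putIfAbsent() leaves an existing mapping untouched, and a single remove() still deletes the key's only entry.

```java
import java.util.HashMap;
import java.util.Map;

class PutIfAbsentDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("k", 1);
        map.putIfAbsent("k", 2); // "k" is already present: value stays 1
        map.putIfAbsent("j", 3); // "j" is absent: the mapping is added
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.get("k"));         // 1
        System.out.println(map.get("j"));         // 3
        map.remove("k");
        System.out.println(map.containsKey("k")); // false
    }
}
```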

Iterating over hashmap in case of collision in Java

Let us say we have a collision but the keys are different, so by definition HashMap will create a linked list at that bucket and add the new key-value pair as the next of the existing key-value entry.
My question is how we iterate over the HashMap in this case. Does the default iteration mechanism change in order to retrieve all the key-value pairs that collided and were stored in the same bucket?

There are no changes. Iteration visits every element in bucket 0, then moves on to bucket 1, and so on, so colliding entries are simply visited one after another within their bucket.
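A small sketch to illustrate this (illustrative names; "Aa" and "BB" are Java strings with the same hash code): ordinary iteration over entrySet() yields every entry, including the ones that collided. The iteration order is unspecified, so the sketch only checks which keys were seen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IterationDemo {
    // Iterates a map containing two colliding keys and collects what it sees.
    static List<String> keysSeen() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1); // same hash code as "BB"
        map.put("BB", 2);
        map.put("C", 3);
        List<String> seen = new ArrayList<>();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            seen.add(e.getKey());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(keysSeen().size()); // 3
    }
}
```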

How HashMap retrieves different values if Key's hashcode is same but equals method return false

I'm not able to understand how HashMap works. Kindly help me understand it.
Say we have two objects, Obj1 and Obj2, with the same hash code, 1212. When we compare them with "==" and equals(), both return false.
Now I use ValueObj1 and ValueObj2 as values in a HashMap with keys Obj1 and Obj2 respectively. I believe both values will be saved in the same bucket as a list.
My question is how HashMap picks ValueObj2 for Obj2 and ValueObj1 for Obj1. Say there are n such objects and values.
How does this key --> value association work internally even though the hash code is the same but the values are different?
Assume both cases: equals is not overridden, and equals is overridden.

A HashMap/HashSet keeps a list of keys per bucket (in the case of a Map, together with the values). If multiple keys share the same hashCode value, they are placed in the same list.
So the search method first computes the hashCode of the queried key and then iterates over the corresponding list until a call to equals succeeds. In the case of a HashSet this means the key is found; in the case of a HashMap, it returns the other side of the tuple: the value.
The memory of a HashMap thus works like:
+--------+--------+--------+--------+
|   00   |   01   |   10   |   11   |
+--------+--------+--------+--------+
    |        |        |        |
 k00/v00     _     k06/v06     _
    |                 |
 k08/v08           k14/v14
    |                 |
 k04/v04              _
    |
    _
What you see on top are the four buckets. Each bucket has a list (the items underneath) that stores tuples of keys (k) and values (v). Since there are only four buckets here, the hash algorithm uses a modulo 4 operation, so a key k06 with value v06 is placed in bucket 06 mod 4 = 02, thus 10. If a second key k14 is added, with 14 mod 4 = 02, thus 10, it is simply added to the same list.
Since the value is stored together with the key, one can perform a fast lookup operation.
You noticed that iterating over the (linked) list is an expensive operation. But the point of a HashMap is the hope that the number of hash collisions (keys sharing the same bucket) is very low. In general one might expect two or three elements per bucket. The performance boost is thus achieved by selecting the correct bucket in constant time; searching the bucket, however, requires linear time (or, if there is a total ordering on the keys, one might implement a (balanced) binary tree to search in logarithmic time). In the worst case a HashMap performs no better than an ArrayList/LinkedList of entries, but if the hash function has been designed decently, the odds of that are very low.

You can always look at the code.
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}
So it first gets the hash for the given key.
Using that hash it locates the slot in the table (referred to in other answers as a bucket).
For every entry in the bucket it tests whether the given key equals the entry's key, and if so it has found the correct item.
Partitioning keys by hash into buckets reduces the size of the linear search that uses equals comparisons. So you can see how harmful it is to return a fixed value from hashCode(). See this for tips on good hashcode calculation.
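To make the last point concrete, here is a sketch with a hypothetical key class, BadKey, whose hashCode() always returns 1212 (the value from the question). Lookups still return the right values because equals() tells the keys apart, but every entry lands in the same bucket, so the search within it has to compare keys one by one.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {
    // Hypothetical key type: correct equals(), deliberately poor hashCode().
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }

        @Override
        public int hashCode() { return 1212; } // same bucket for every key

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    static Map<BadKey, String> build() {
        Map<BadKey, String> map = new HashMap<>();
        map.put(new BadKey("obj1"), "valueObj1");
        map.put(new BadKey("obj2"), "valueObj2");
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = build();
        System.out.println(map.get(new BadKey("obj1"))); // valueObj1
        System.out.println(map.get(new BadKey("obj2"))); // valueObj2
    }
}
```

If equals() were not overridden, BadKey would fall back to identity comparison, and only the exact key instances used in put() would find their values.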

HashMap works by dividing its content into buckets based on the hash value of the key. Each bucket in turn contains a list of entries, an entry consisting of the key and the value.
Let's say we want to look up x in the map. We calculate x.hashCode() and pick the appropriate bucket. We then iterate through the bucket's list and pick the entry e where e.key equals x. We then return e.value.
Pseudocode:
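class Map {
    class Entry {
        Object key, value;
    }
    List<List<Entry>> buckets;
    Object get(Object key) {
        // Pick the bucket from the key's hash code (simplified).
        List<Entry> bucket = buckets.get(key.hashCode() % buckets.size());
        // Scan the bucket for an entry whose key equals the requested key.
        for (Entry entry : bucket) {
            if (Objects.equals(key, entry.key)) return entry.value; // java.util.Objects
        }
        return null;
    }
}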
(Disclaimer: Using % to calculate a bucket index is an oversimplification and wouldn't work as-is; it's just there to convey the general idea)
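One reason plain % would not work as-is is that hashCode() may be negative, and Java's % then yields a negative index. The sketch below (illustrative names) compares it with Math.floorMod and with the bit mask commonly used when the table length is a power of two; the mask form is my recollection of what HashMap's indexFor does, so treat that part as an assumption rather than a quote from the JDK.

```java
class IndexDemo {
    // Naive index: can be negative when the hash is negative.
    static int naive(int hash, int n) { return hash % n; }

    // Always in the range [0, n).
    static int floorMod(int hash, int n) { return Math.floorMod(hash, n); }

    // Equivalent to floorMod when n is a power of two.
    static int masked(int hash, int n) { return (n - 1) & hash; }

    public static void main(String[] args) {
        int hash = Integer.valueOf(-7).hashCode(); // -7
        System.out.println(naive(hash, 4));    // -3
        System.out.println(floorMod(hash, 4)); // 1
        System.out.println(masked(hash, 4));   // 1
    }
}
```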

The hashCode() method is called on the key and the hash code is calculated. This hash code is used to find the array index at which the Entry object is stored.
indexFor(hash, table.length) calculates the exact index in the table array for the Entry object.
Two key objects can have the same hash code (which is known as a collision).
In a hash map, each bucket uses a simple linked list to store its entries.
If two keys have the same hash code, the new key-value pair is stored in the same bucket as the existing key.
How do you retrieve the value object when two keys with the same hash code are stored in the HashMap?
Using the hash code we go to the right bucket, and using equals() we find the right element in the bucket and return it.
HashMap get() function:
If the key is not null, it calls the hash function on the key's hash code:
int hash = hash(hashValue)
The resulting hash is used to find the bucket in which the Entry object is stored. An Entry is stored in the bucket as (hash, key, value, next), where next points to the following entry in the same bucket.
For details, read here and here.

Comparing two objects with == is not a good idea, since it only checks whether the two references point to the same object in memory.
There is a good article on Wikipedia about hash tables. A HashMap in Java holds an array of "buckets" internally.
When you put a new pair <key, value> (or in your case <obj1, valueObj1>), the bucket number is calculated from obj1.hashCode(). The pair is added to the selected bucket, which is internally a linked list that stores the actual <key, value> pairs.
When you search for valueObj1 with the key object obj1, the HashMap calculates the number of the bucket where that pair is located and iterates over the elements of that linked list, comparing keys with equals(). As soon as equals() returns true, the element we are looking for has been found.

Why doesn't my HashTable allow key collisions?

I read that a Hashtable can map the same key to multiple values. That's what a collision is.
Now I run the program like this:
Dictionary<String,String> hTable = new Hashtable<String,String>();
hTable.put("a", "aa");
hTable.put("a", "ab");
System.out.println(""+hTable.get("a"));
My thinking says I should get aa and ab.
But actual output is ab
Why is it so? Where is the collision then?

There is no collision. A Hashtable entry maps a key to only one value.
The third line in your sample:
hTable.put("a", "ab");
replaces the mapping from a to aa with a mapping from a to ab.
After your four lines of code complete execution, hTable has only one mapping: a to ab.

A collision only happens internally. To the user, collisions are resolved transparently.
That's why a hashtable can be a dictionary -- it maps each key to exactly 1 value. If it mapped to more than 1 value then it wouldn't be a dictionary.

Hashtable doesn't map the same key to multiple values. Collision is that multiple keys might be mapped to the same hash value. It is resolved by the data structure itself which is transparent to you.
If you want to get aa and ab from hTable.get("a"), you need to create a Dictionary<String,List<String>> and append every value for the same key to that key's list.
In your code
hTable.put("a", "aa");
hTable.put("a", "ab");
The keys are the same. So the second operation used "ab" to override "aa". That's why you only get "ab".
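One possible way to write that append step is sketched below. It uses HashMap and Map.computeIfAbsent rather than the Dictionary/Hashtable from the question, and the class and method names are only illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultiValueDemo {
    // Appends a value to the list stored under key, creating the list on first use.
    static void add(Map<String, List<String>> table, String key, String value) {
        table.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    static Map<String, List<String>> build() {
        Map<String, List<String>> table = new HashMap<>();
        add(table, "a", "aa");
        add(table, "a", "ab");
        return table;
    }

    public static void main(String[] args) {
        System.out.println(build().get("a")); // [aa, ab]
    }
}
```

The map still holds exactly one value per key; that value just happens to be a list.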

Hashtable is a Key -> Value mapping. That means you cannot have multiple values for one key. You need to combine two data structures to store multiple values under one key.
For example,
you can put a LinkedList inside your Hashtable:
Hashtable<String, LinkedList<String>> table = new Hashtable<>();
LinkedList<String> list = new LinkedList<>();
list.add("aa");
list.add("ab");
table.put("a", list);
Now you can get both the aa and ab values:
table.get("a").get(0); // returns aa
table.get("a").get(1); // returns ab
I strongly recommend you to go through the basics of data structure and algorithm.

You want to retrieve values by their keys. An array serves this purpose but is restricted to using integer keys and may use too much space (think about storing values at position 0 and 1000 only, you have to allocate the entire array for 2 elements).
HashTables solve both of these problems with:
a dispersive, non-injective function that converts a variable-length array of bytes into a fixed-length array of bytes. Non-injective means that you can have hash(bytes_1) == hash(bytes_2) for different inputs, but it doesn't happen too often, and if bytes_1 ~ bytes_2 the hashes are different;
an index of used hashes. If the function returns an array of 10 bytes you have 2^80 possibilities, so you need to keep a sorted list of the hashes that you already encountered;
an array of linked lists. The index of hashes maps the hash with the position in the array.
The collision means that two keys have the same hash: map.put("key1", "value1"); map.put("key2", "value2") key1 and key2 might wind up in the same linked list.
