I know that a hash map can use a linked list to resolve collisions by chaining. However, Java's HashMap implementation uses an array, and I am curious how Java implements chained collision resolution on top of it. I did find this post: Collision resolution in Java HashMap. However, it is not the answer I am looking for.
Thanks a lot.
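HashMap contains an array of Entry objects. Each array slot is a bucket, and each bucket holds a singly linked list of entries chained through their next field. The bucket index is derived from the key's hashCode, so two keys whose hashes map to the same index land in the same bucket. When that happens, put walks the bucket's list; if no entry has an equal key, the new entry is added to that bucket's list. (In the older implementation shown below, addEntry links it in at the head of the list; Java 8 and later append it at the tail.)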
Look at this code :
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length); // get table/bucket index
    for (Entry<K,V> e = table[i]; e != null; e = e.next) { // walk through the list of nodes
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue; // return old value if found
        }
    }
    modCount++;
    addEntry(hash, key, value, i); // add new value if not found
    return null;
}
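To make the chaining idea concrete, here is a minimal sketch of a separate-chaining map. It is not the JDK code: ChainedMap and Node are hypothetical names, the table is fixed at 16 buckets and never resized, null keys are rejected, and new nodes are linked at the head of the bucket the way the older addEntry does.

```java
import java.util.Objects;

// Hypothetical teaching sketch, not java.util.HashMap: fixed capacity, no resize, no null keys.
class ChainedMap<K, V> {
    // One link in a bucket's singly linked list.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        final Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];
    private int size;

    private int indexFor(int hash) {
        return hash & (table.length - 1); // valid because the length is a power of two
    }

    public V put(K key, V value) {
        int hash = Objects.requireNonNull(key).hashCode();
        int i = indexFor(hash);
        // Walk the chain: an equal key means "replace the value", not "add a node".
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // Collision or empty bucket: the new node becomes the head of the chain.
        table[i] = new Node<>(hash, key, value, table[i]);
        size++;
        return null;
    }

    public V get(K key) {
        int hash = Objects.requireNonNull(key).hashCode();
        for (Node<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public int size() {
        return size;
    }
}
```

The strings "Aa" and "BB" have the same hashCode, so putting both into a ChainedMap exercises the chain: both end up in one bucket and both remain retrievable.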
Question
How is the HashMap method putIfAbsent able to perform a conditional put faster than calling containsKey(x) first?
For example, if you didn't use putIfAbsent you could use:
if (!map.containsKey(x)) {
    map.put(x, someValue);
}
I had previously thought putIfAbsent was a convenience method for calling containsKey followed by a put on a HashMap. But after running a benchmark, putIfAbsent is significantly faster than containsKey followed by put. I looked at the java.util source code to try to see how this is possible, but it's a bit too cryptic for me to figure out. Does anyone know how putIfAbsent works internally to achieve what looks like a better time complexity? That's my assumption based on a few tests in which my code ran 50% faster when using putIfAbsent. It seems to avoid calling get(), but how?
Example
if (!map.containsKey(x)) {
    map.put(x, someValue);
}
VS
map.putIfAbsent(x, someValue);
Java Source Code for HashMap.putIfAbsent
@Override
public V putIfAbsent(K key, V value) {
    return putVal(hash(key), key, value, true, true);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
The HashMap implementation of putIfAbsent searches for the key just once, and if it doesn't find the key, it puts the value in the relevant bin (which was already located). That's what putVal does.
On the other hand, using map.containsKey(x) followed by map.put(x,someValue) performs two lookups for the key in the Map, which takes more time.
Note that put also calls putVal (put calls putVal(hash(key), key, value, false, true) while putIfAbsent calls putVal(hash(key), key, value, true, true)), so putIfAbsent has the same performance as calling just put, which is faster than calling both containsKey and put.
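One way to see the single lookup without reading putVal is to count how often the map asks the key for its hash. The sketch below assumes the OpenJDK HashMap shown above; LookupCount and CountingKey are hypothetical names introduced only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: counts hashCode() calls as a proxy for key lookups.
class LookupCount {
    static final class CountingKey {
        static int hashCalls = 0;
        final String name;

        CountingKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            hashCalls++; // every lookup in a HashMap starts by hashing the key
            return name.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CountingKey && ((CountingKey) o).name.equals(name);
        }
    }

    // containsKey hashes the key once, put hashes it again.
    static int hashCallsForContainsThenPut() {
        Map<CountingKey, String> map = new HashMap<>();
        CountingKey key = new CountingKey("x");
        CountingKey.hashCalls = 0;
        if (!map.containsKey(key)) {
            map.put(key, "value");
        }
        return CountingKey.hashCalls;
    }

    // putIfAbsent hashes the key once and reuses the bucket it found.
    static int hashCallsForPutIfAbsent() {
        Map<CountingKey, String> map = new HashMap<>();
        CountingKey key = new CountingKey("x");
        CountingKey.hashCalls = 0;
        map.putIfAbsent(key, "value");
        return CountingKey.hashCalls;
    }
}
```

The first method reports two hash computations and the second reports one. Both variants are still O(1) on average; the saving is a constant factor, not a change in time complexity.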
See Eran's answer... I'd like to also answer it more succinctly. put and putIfAbsent both use the same helper method putVal. But clients calling put can't take advantage of the parameters that enable put-if-absent behavior. The public method putIfAbsent exposes this. So putIfAbsent has the same underlying time complexity as the put you were already going to use in conjunction with containsKey, which makes the containsKey call wasted work.
So the core of this is that the internal helper putVal is used by both put and putIfAbsent.
For example see the following code snippet:
Map<String,String> unsafemap = new HashMap<>();
unsafemap.put("hello", null);
unsafemap.put(null, null);
unsafemap.put("world", "hello");
unsafemap.put("foo", "hello");
unsafemap.put("bar", "hello");
unsafemap.put("john", "hello");
unsafemap.put("doe", "hello");
System.out.println("changing null values");
for (Iterator<Map.Entry<String,String>> i = unsafemap.entrySet().iterator(); i.hasNext();) {
    Map.Entry<String,String> e = i.next();
    System.out.println("key : " + e.getKey() + " value :" + e.getValue());
    if (e.getValue() == null) {
        // why is the below line not throwing ConcurrentModificationException
        unsafemap.put(e.getKey(), "no data");
        // same result, no ConcurrentModificationException thrown
        e.setValue("no data");
    }
    // throws ConcurrentModificationException
    unsafemap.put("testKey", "testData");
}
System.out.println("---------------------------------");
for (Map.Entry<String,String> e : unsafemap.entrySet()) {
    System.out.println(e);
}
Modifying the map during iteration is supposed to result in an exception unless it is done through the iterator, e.g. iterator.remove(). So adding a new key during iteration throws the exception as expected, but why is it not thrown when the value of an existing key/value pair is modified?
In your first case the Entry object already exists, so put just overwrites the value with e.value = value; and returns before reaching modCount++. No new Entry is created, so this is not a structural modification and no exception is thrown.
In the second case, e.setValue only replaces the value held by an existing entry; it does not change the structure of the map either, so there is no exception there.
From HashMap source code:
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
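The difference between overwriting a value and inserting a key can be checked with a small sketch. CmeDemo and its method names are hypothetical, and the behaviour described assumes java.util.HashMap, whose iterators are fail-fast on a best-effort basis.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of which writes trip a HashMap iterator.
class CmeDemo {
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("a", null);
        map.put("b", "1");
        map.put("c", "2");
        return map;
    }

    // Overwrites values of keys that already exist: modCount is not touched.
    static boolean updatingExistingKeysThrows() {
        Map<String, String> map = sample();
        try {
            for (Map.Entry<String, String> e : map.entrySet()) {
                map.put(e.getKey(), "via put");
                e.setValue("via setValue");
            }
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    // Inserts a new key on the first pass: modCount changes, so the
    // iterator's following next() call fails.
    static boolean insertingNewKeyThrows() {
        Map<String, String> map = sample();
        try {
            for (Map.Entry<String, String> e : map.entrySet()) {
                map.put("new-" + e.getKey(), "x");
            }
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }
}
```

Note that the exception comes from the iterator's next(), not from put itself, so an insertion made while visiting the very last entry would go unnoticed.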
public V put(K key, V value) {
    Entry<K,V> t = root;
    if (t == null) {
        compare(key, key); // type (and possibly null) check
        root = new Entry<>(key, value, null);
        size = 1;
        modCount++;
        return null;
    }
    int cmp;
    ...
}
final int compare(Object k1, Object k2) {
    return comparator==null ? ((Comparable<? super K>)k1).compareTo((K)k2)
        : comparator.compare((K)k1, (K)k2);
}
After running into a bug in my application, I had to debug TreeMap's put method. My issue was in comparing the objects that were put in the map. What is odd is that when I put the FIRST element into the map, its key gets compared with itself. I can't understand why it would work like that. Any insights (besides the commented "type (and possibly null) check")? Why wouldn't they just check whether the key is null? What kind of "type" check is made there, and what for?
As mentioned in the comment, https://bugs.openjdk.java.net/browse/JDK-5045147 is the issue where this was introduced. From the discussion in that issue, the original fix was the following:
BT2:SUGGESTED FIX
Doug Lea writes:
"Thanks! I have a strong sense of deja vu that I've
added this before(!) but Treemap.put should have the
following trap added."
  public V put(K key, V value) {
      Entry<K,V> t = root;
      if (t == null) {
+         if (key == null) {
+             if (comparator == null)
+                 throw new NullPointerException();
+             comparator.compare(key, key);
+         }
          incrementSize();
          root = new Entry<K,V>(key, value, null);
          return null;
      }
The intention seems to be to throw an NPE when a null key is put into an empty TreeMap whose comparator is null, or whose comparator does not accept null keys (which conforms to the API specification). It seems the fix was shortened to one line:
compare(key, key);
which is defined as:
@SuppressWarnings("unchecked")
final int compare(Object k1, Object k2) {
    return comparator==null ? ((Comparable<? super K>)k1).compareTo((K)k2)
        : comparator.compare((K)k1, (K)k2);
}
Hence this test will do both the null check and the type check, namely the cast to Comparable.
I believe this is the place where TreeMap<K,V> checks whether K implements Comparable when no Comparator is supplied. You get a ClassCastException otherwise.
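Both checks can be observed on the very first put into an empty map. The sketch below assumes Java 7 or later, where the compare(key, key) call is present; TreeMapFirstPut and firstPut are hypothetical names, and the null-tolerant comparator is written by hand for the example.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical illustration of the self-comparison done on the first put.
class TreeMapFirstPut {
    // Returns "ok" or the simple name of the exception the first put raised.
    static String firstPut(TreeMap<Object, String> emptyMap, Object key) {
        try {
            emptyMap.put(key, "value");
            return "ok";
        } catch (ClassCastException e) {
            return "ClassCastException"; // key is not Comparable and no comparator was given
        } catch (NullPointerException e) {
            return "NullPointerException"; // null key with natural ordering
        }
    }

    // A comparator that tolerates null by comparing string forms.
    static Comparator<Object> byStringForm() {
        return (a, b) -> String.valueOf(a).compareTo(String.valueOf(b));
    }
}
```

With natural ordering, a plain new Object() is rejected by the cast to Comparable and a null key by the compareTo call; with the null-tolerant comparator the same null key is accepted.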
The following piece of code is used to add an element to a HashMap (from the Android 5.1.1 source tree). I'm confused by this statement: int index = hash & (tab.length - 1);. How can the map guarantee the same index when a duplicate key is added after tab.length has changed?
For example, assume that we have a new empty HashMap hMap. First we add the pair ("1","1") to it, and assume tab.length equals 1 at this time. Then we add many pairs to the map, so tab.length now equals some "x". Now we add a duplicate pair ("1","1"). Note that tab.length has changed, so the value of int index = hash & (tab.length - 1); may have changed as well.
/**
 * Maps the specified key to the specified value.
 *
 * @param key
 *            the key.
 * @param value
 *            the value.
 * @return the value of any previous mapping with the specified key or
 *         {@code null} if there was no such mapping.
 */
@Override public V put(K key, V value) {
    if (key == null) {
        return putValueForNullKey(value);
    }
    int hash = Collections.secondaryHash(key);
    HashMapEntry<K, V>[] tab = table;
    int index = hash & (tab.length - 1);
    for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
        if (e.hash == hash && key.equals(e.key)) {
            preModify(e);
            V oldValue = e.value;
            e.value = value;
            return oldValue;
        }
    }
    // No entry for (non-null) key is present; create one
    modCount++;
    if (size++ > threshold) {
        tab = doubleCapacity();
        index = hash & (tab.length - 1);
    }
    addNewEntry(key, value, hash, index);
    return null;
}
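When the table has to be rebuilt, the index of every existing entry is recomputed against the new table length first, so each entry's index follows the change in the table's length. A later put of a duplicate key computes its index with the same new length and therefore searches the bucket the entry was moved to.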
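The effect of the mask can be worked through with plain arithmetic. This is only a sketch of the index computation, with IndexMask and indexFor as hypothetical names; it assumes table lengths that are powers of two, as the hash & (tab.length - 1) expression requires.

```java
// Hypothetical illustration of how the bucket index follows the table length.
class IndexMask {
    // Keeps the low bits of the hash; tableLength must be a power of two.
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    // After doubling, an entry either stays at its old index or moves up by
    // exactly the old length, depending on one extra bit of its hash.
    static boolean movesOnDoubling(int hash, int oldLength) {
        return indexFor(hash, oldLength * 2) != indexFor(hash, oldLength);
    }
}
```

For a hash of 21 (binary 10101), the index is 5 in a table of length 16 and 21 in a table of length 32, that is 5 plus the old length. The stored hash does not change, so recomputing the index for the entry during the resize and for a duplicate key afterwards gives the same bucket.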
I am looking for better insight into the hashtable/hash-map data structure.
Going through the API, I gathered that the inner Entry class is referred to as a bucket. Please correct me if I am wrong.
Please see the following method:
public synchronized V put(K key, V value) {
    // Make sure the value is not null
    if (value == null) {
        throw new NullPointerException();
    }
    // Makes sure the key is not already in the hashtable.
    Entry tab[] = table;
    int hash = hash(key);
    int index = (hash & 0x7FFFFFFF) % tab.length;
    for (Entry<K,V> e = tab[index] ; e != null ; e = e.next) {
        if ((e.hash == hash) && e.key.equals(key)) {
            V old = e.value;
            e.value = value;
            return old;
        }
    }
    modCount++;
    if (count >= threshold) {
        // Rehash the table if the threshold is exceeded
        rehash();
        tab = table;
        hash = hash(key);
        index = (hash & 0x7FFFFFFF) % tab.length;
    }
    // Creates the new entry.
    Entry<K,V> e = tab[index]; // <-- are we assigning null to this entry?
    tab[index] = new Entry<>(hash, key, value, e);
    count++;
    return null;
}
By the following line of code
Entry<K,V> e = tab[index];
I assume that we are assigning null to this new Entry object. Please correct me here as well.
So my other question is:
Why are we not doing this directly
Entry<K,V> e = null
instead of
Entry<K,V> e = tab[index];
Please find below the screenshot from the debugger as well:
Please share your valuable insights on this.
Entry<K,V> is an instance that can represent a link in a linked list. Note that the next member refers to the next Entry on the list.
A bucket contains a linked list of entries that were mapped to the same index.
Entry<K,V> e = tab[index] will return null only if there's no Entry stored in that index yet. Otherwise it will return the first Entry in the linked list of that bucket.
tab[index] = new Entry<>(hash, key, value, e); creates a new Entry and stores it as the first Entry in the bucket. The previous first Entry is passed to the Entry constructor, in order to become the next (second) Entry in the list.
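The head insertion described above can be sketched in isolation. HeadInsert, push and chain are hypothetical names, and the Entry here is a stripped-down stand-in for Hashtable's inner class that keeps only a key and the next reference.

```java
// Hypothetical illustration of inserting at the head of a bucket's list.
class HeadInsert {
    static final class Entry {
        final String key;
        final Entry next; // the previous head of the bucket, or null

        Entry(String key, Entry next) {
            this.key = key;
            this.next = next;
        }
    }

    // Mirrors: e = tab[index]; tab[index] = new Entry<>(hash, key, value, e);
    static Entry push(Entry head, String key) {
        return new Entry(key, head);
    }

    // Renders the chain from head to tail.
    static String chain(Entry head) {
        StringBuilder sb = new StringBuilder();
        for (Entry e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(e.key);
        }
        return sb.toString();
    }

    // Hashtable-style index: clear the sign bit, then take the remainder.
    static int indexFor(int hash, int tableLength) {
        return (hash & 0x7FFFFFFF) % tableLength;
    }
}
```

Starting from an empty bucket (null) and pushing "a", "b" and "c" gives the chain c -> b -> a. Writing Entry<K,V> e = null instead of reading tab[index] would drop whatever was already chained in that bucket, which is why the current head is passed to the constructor.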