Stuck on doubling size of hashtable - java

I can't figure out how to double the size of my hash table. Here is the code:
private void doubleLength () {
    //Remember the old hash table array and allocate a new one 2 times as big
    HashMap<K,V> resizedMap = new HashMap<K,V>(map.length * 2);
    /*Traverse the old hash table adding each value to the new hash table.
      Instead, add it by applying hashing/compression again (compression will be
      DIFFERENT, because the length of the table is doubled, so we compute the same
      hash value but compute the remainder using the DIFFERENT TABLE LENGTH).*/
    for (int i = 0; i < map.length; i++) {
        for (K key : map[i].entry) { //iterator does not work here
            resizedMap.put(key, map[i].get(key)); //should go here
        }
    }
}
The hash table is an array of LN objects where LN is defined by:
public static class LN<K1,V1> {
    public Map.Entry<K1,V1> entry;
    public LN<K1,V1> next;
    public LN (Map.Entry<K1,V1> e, LN<K1,V1> n) { entry = e; next = n; }
}
I have an iterable within my class but it doesn't allow for map[i].entry.entries().
public Iterable<Map.Entry<K,V>> entries () {
    return new Iterable<Map.Entry<K,V>>() {
        public Iterator<Map.Entry<K,V>> iterator() {
            return new MapEntryIterator();
        }
    };
}
I'm very lost on how I can double the size of public LN[] map;

The HashMap already resizes itself when the hash table gets too full. You do not have to resize it.

Your code will not compile. If you want to initialize a map at double the size, it is easier to do this (assuming map is also a Map):
private void doubleLength () {
    //Remember the old hash table array and allocate a new one 2 times as big
    HashMap<K,V> resizedMap = new HashMap<K,V>(map.size() * 2);
    resizedMap.putAll(map);
}
Also, you seem to be accessing things strangely. If you need to loop through a map, it should look like:
for (K key : map.keySet()) {
    V value = map.get(key); //get the value from the map
    //do what you need to do
}
Now as already stated, you do not need to resize the HashMap. It already does that. From the JavaDocs:
An instance of HashMap has two parameters that affect its performance:
initial capacity and load factor. The capacity is the number of
buckets in the hash table, and the initial capacity is simply the
capacity at the time the hash table is created. The load factor is a
measure of how full the hash table is allowed to get before its
capacity is automatically increased. When the number of entries in the
hash table exceeds the product of the load factor and the current
capacity, the hash table is rehashed (that is, internal data
structures are rebuilt) so that the hash table has approximately twice
the number of buckets.
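That said, if the assignment really does require you to double the backing array of LN nodes yourself, a minimal sketch would look like the following. It assumes your class keeps the table in a field LN<K,V>[] map and walks each chain via the next links, recomputing the bucket with the new length (same hash, different modulus):
private void doubleLength () {
    LN<K,V>[] oldMap = map;                          //remember the old table
    map = (LN<K,V>[]) new LN[oldMap.length * 2];     //allocate a new one 2 times as big
    for (int i = 0; i < oldMap.length; i++) {
        for (LN<K,V> node = oldMap[i]; node != null; node = node.next) {
            Map.Entry<K,V> e = node.entry;
            //recompute the bucket using the NEW table length
            int bucket = (e.getKey().hashCode() & 0x7fffffff) % map.length;
            map[bucket] = new LN<K,V>(e, map[bucket]); //push onto the new chain
        }
    }
}
The compression step shown (hashCode masked to be non-negative, then mod the table length) is only a stand-in for whatever hashing/compression scheme your class already uses.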

Related

Re-hashing a hash map inside put method

I'm trying to implement a separate-chaining hash map in Java. Inside the put() method I want to re-hash the map if the load factor (number of elements / size of array) gets too large. For this I have written another method rehash() that rehashes the map by doubling the size of the array/capacity and then adding all the entries again (at least this is what I want it to do). The problem is that when I test it I get a "java.lang.OutOfMemoryError: Java heap space", and I'm guessing this is because I'm calling the put() method inside the rehash() method as well. The problem is that I don't really know how to fix this. I wonder if someone can check my code and give me feedback or give me a hint on how to proceed.
The Entry<K,V> in the code below is a nested private class in the hash map class.
Thanks in advance!
The put()-method:
public V put(K key, V value) {
    int idx = key.hashCode() % capacity; //Calculate index based on hash code.
    if (idx < 0) {
        idx += this.capacity; //if index is less than 0, add the length of the array table
    }
    if (table[idx] == null) { //If the list at idx is empty, just add the Entry node
        table[idx] = new Entry<K,V>(key, value);
        nr_of_keys += 1;
        if (this.load() >= this.load_factor) { //Check if the load factor exceeds the maximum load. If so, rehash.
            rehash();
        }
        return null;
    } else {
        Entry<K,V> p = table[idx]; //dummy pointer
        while (p.next != null) { //while the next node isn't null, move the pointer forward
            if (p.getKey().equals(key)) { //if the key matches:
                if (!p.getValue().equals(value)) { //if the value doesn't match, replace the old value.
                    V oldVal = p.getValue();
                    p.setValue(value);
                    return oldVal;
                }
            } else {
                p = p.next;
            }
        }
        if (p.getKey().equals(key)) { //if the key of the last node matches the given key:
            if (!p.getValue().equals(value)) {
                V oldVal = p.getValue();
                p.setValue(value);
                return oldVal;
            } else {
                return null;
            }
        }
        p.next = new Entry<K,V>(key, value); //the key doesn't exist, so add (key,value) at the end of the list.
        nr_of_keys += 1;
        if (this.load() >= this.load_factor) { //if the load is too large, rehash()
            rehash();
        }
        return null;
    }
}
The rehash()-method:
public void rehash() {
    Entry<K,V>[] tmp = table; //create temporary table
    int old_capacity = this.capacity; //store old capacity/length of array.
    this.capacity = 2 * capacity; //New capacity is twice as large
    this.nr_of_keys = 0; //reset nr. of keys to zero.
    table = (Entry<K,V>[]) new Entry[capacity]; //make this.table twice as large
    for (int i = 0; i < old_capacity; i++) { //go through the array
        Entry<K,V> p = tmp[i]; //points to first element of list at position i.
        while (p != null) {
            put(p.getKey(), p.getValue());
            p = p.next;
        }
    }
}
The load()-method:
public double load() {
    return ((double) this.size()) / ((double) this.capacity);
}
where size() returns the number of (key,value) pairs in the map and capacity is the size of the array table (where the linked lists are stored).
Once you rehash your map, nothing will be the same: the buckets, the entry sets, etc.
So.
create your temporary table.
get the values normally using your current get methods.
then create new buckets based on rehashing to the new bucket size, with the new capacity, and add them to the table (DO NOT USE PUT; see the sketch after this list).
Then replace the existing table with the just-created one. Make certain that all values pertinent to the new table size are also changed, such as bucket selection methods based on thresholds, capacity, etc.
Finally, use print statements to track the new buckets and the movement of items between buckets.
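A minimal sketch of that approach, reusing the field names from the question (table, capacity, Entry.next), might look like this; it re-links the existing nodes into the new buckets instead of calling put(), so nr_of_keys never changes and no recursion into put()/rehash() is possible:
public void rehash() {
    Entry<K,V>[] old = table;                        //keep the old table
    capacity = 2 * capacity;                         //double the capacity
    table = (Entry<K,V>[]) new Entry[capacity];      //fresh, empty bucket array
    for (int i = 0; i < old.length; i++) {
        Entry<K,V> p = old[i];
        while (p != null) {
            Entry<K,V> next = p.next;                //remember the rest of the chain
            int idx = p.getKey().hashCode() % capacity;
            if (idx < 0) {
                idx += capacity;                     //same index fix-up as in put()
            }
            p.next = table[idx];                     //push the node onto its new bucket
            table[idx] = p;
            p = next;
        }
    }
}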
You have added the rehash(), but the load() implementation is still missing (or, inside load(), the size()).
The pattern looks clear though, and allows a guess, waiting for this additional info.
You tell us that when the load factor reaches a certain point inside a put, you rehash. That rehash doubles the internal array and calls put again. And in the end you have no memory.
Here, my bet would be that there is some subtle or not-so-subtle recursion taking place: you put, it rehashes by doubling the memory usage, then re-puts, which somehow triggers another rehash...
A first possibility would be that some internal variable tracking the array's state is not properly reset (e.g. the number of occupied entries, ...). Confusing the "old" array data with that of the new one being built would be a likely culprit.
Another possibility is with your put implementation, but it would require a step-by-step debug, which I'd advise you to perform.

HashMap bucket is unused when we have a linked list of Entry objects with the same hashCode and different equals

I have a Person class whose hashCode will always return the same int value, e.g. 101, and whose equals always returns false, i.e.:
@Override
public int hashCode() {
    return 101;
}
@Override
public boolean equals(Object obj) {
    return false;
}
Now, I have put 100 Person object in a HashMap i.e.
Map<Person, Integer> personMap = new HashMap<Person, Integer>();
for (int i = 1; i <= 100; i++) {
    Person p = new Person();
    p.setId(i);
    personMap.put(p, i);
}
HashMap's put method will create a new Entry object every time. Since the hashCode is the same for every Person object and equals returns false, the Entries maintain a linked list of entries with the same hash code; but alongside that, the HashMap's bucket count still grows. Now I am wondering: why do we increase the HashMap's size and bucket count when the Entry itself maintains the linked list?
I'm looking for an example and explanation that justify why HashMap needs to do so.
We are increasing the HashMap capacity to maintain a low average number of Entries per bucket, which allows us to put and get entries to/from the HashMap in expected constant time (i.e. O(1)).
Your example is a bad usage of HashMap, since you force all the entries into the same bucket, but in normal use cases, most buckets will have 0 or 1 entries, and few buckets will have more than 1 (assuming a low load factor is used).
The number of buckets in the HashMap is increased when the total number of Entries in the HashMap reaches capacity * loadFactor, where capacity is the current number of buckets. Therefore, if loadFactor < 1 (the default is 0.75), each bucket will contain less than 1 Entry on average.
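For example, with the default initial capacity of 16 and the default load factor of 0.75, the resize threshold is 16 * 0.75 = 12, so the table is doubled to 32 buckets when the 13th entry is inserted, regardless of how the keys hash.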
As the number of entries in the Map increases, the load factor comes into the picture to maintain an effective distribution of the keys within the Map. Remember that the HashMap does NOT know about your implementation of the hashCode function, and the load factor works on the effective size of the Map.
HashMap is simply not designed for this use case. It assumes that hash codes will be randomly distributed.

How does the put() method of LinkedHashMap work internally?

I am trying to understand the internal working of LinkedHashMap.
So when we call put(key, value) on a LinkedHashMap, internally it calls createEntry:
void createEntry(int hash, K key, V value, int bucketIndex) {
    HashMap.Entry<K,V> old = table[bucketIndex];
    Entry<K,V> e = new Entry<K,V>(hash, key, value, old);
    table[bucketIndex] = e;
    e.addBefore(header);
    size++;
}
Here I am not able to understand the use of the old variable.
Why is the new entry added before the header? It should be added to the end of the LinkedHashMap.
Can somebody explain this code?
Why is the new entry added before the header?
It is simpler to implement this way. It is the same for HashMap. This is the collision linked list, not the one the iterator uses.
It should be added to the end of the LinkedHashMap.
Where does it say that?
First, know how the HashMap put() method works:
Adding key-value pair to the map using put method:
public V put(K var1, V var2) {
    return this.putVal(hash(var1), var1, var2, false, true);
}
Here we pass the key, K var1, and the value, V var2.
The put() method internally calls the putVal() method, which takes the following arguments:
hash(var1) - hashCode of the key
var1 – key
var2 – value
boolean false value
boolean true value
putVal() first checks the HashMap table; if the table is null or empty, it calls the resize() method, and resize() creates a new table for the HashMap.
The resize() call: initially the table is null, so the first time around a new table is created with the default size. If an initial capacity was set using the HashMap(int initialCapacity) or HashMap(int initialCapacity, float loadFactor) constructor, it was stored in the threshold field, and the initial capacity is taken from that threshold.
Otherwise it picks the default initial capacity for the table size.
Default initial capacity of a HashMap is 16 which is defined using the below HashMap field.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
In case an initial capacity was defined:
int oldThr = threshold; //the threshold is copied into the local variable oldThr
Then, depending on the condition below, if the threshold is greater than 0 it is used as newCap (the new capacity for the table):
else if (oldThr > 0) // initial capacity was placed in threshold
    newCap = oldThr;
Otherwise the defaults are used:
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
A new table is then created using the new capacity:
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
In other cases it checks against MAXIMUM_CAPACITY; once the current capacity has reached MAXIMUM_CAPACITY, the table is not doubled any further.
After returning to putVal():
Then, if the table bucket at index (n - 1) & hash is empty, the newNode() method is called and the newly created entry node is placed at the calculated index in the HashMap table.
if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);
It then checks the bucket for that hash index in the table: if there is no existing node, it creates a new Node containing the hash code, key, value, and a next-node reference (initially null). When another node is later added at the same hash index, a linked list is formed: the new node with the same index is appended after the previously added node via the next field reference.
To know more about HashMap go through :
http://techmastertutorial.in/java-collection-internal-hashmap.html
LinkedHashMap put() method:
Adding a key-value pair is the same as in HashMap; in the case of LinkedHashMap we keep additional details with the help of the before and after references.
Since we add an entry for each key-value pair, while adding the entry we also update the before and after references. A doubly linked list is formed with these before and after references, and using them we can traverse the entries in the order of insertion.
We can start at the head node (the first node added to the LinkedHashMap) and traverse using the after pointer until after is null.
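A small illustration of that insertion-order traversal (class and key names are made up for the example):
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkedHashMapOrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("first", 1);
        map.put("second", 2);
        map.put("third", 3);
        //The before/after doubly linked list guarantees this prints the
        //entries in the order they were inserted, unlike a plain HashMap.
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}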
LinkedHashMap Internal Working :
http://techmastertutorial.in/java-collection-internal-linked-hashmap.html

How does the get(key) still work after the hashtable has grown in size?

If a Hashtable is originally of size 8 and we hit the load factor, it grows to double the size. How is get still able to retrieve the original values? Say we have a hash function where key(8) transforms into 12345 as the hash value, which we mod by 8 to get the index 7. Now when the hash table grows to 16, for key(8) we still get 12345, but if we mod it by 16 we will get a different answer! So how do I still retrieve the original key(8)?
This isn't Java specific - when a hash table grows (in most implementations I know of), it has to reassess the keys of all hashed objects, and place them into their new, correct bucket based on the number of buckets now available.
This is also why resizing a hashtable is generally considered to be an "expensive" operation (compared to many others) - because it has to visit all of the stored items within it.
The hash value used to look up the value comes from the key object itself, not the container.
That's why objects used as keys in a Map must be immutable. If the hashCode() changes, you won't be able to find your key or value again.
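A small example of how a mutated key gets "lost" (the MutableKey class here is purely illustrative, not from the question):
import java.util.HashMap;
import java.util.Map;

class MutableKey {
    int id;
    MutableKey(int id) { this.id = id; }
    @Override public int hashCode() { return id; }
    @Override public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).id == id;
    }
}

public class LostKeyDemo {
    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey k = new MutableKey(1);
        map.put(k, "value");
        k.id = 2;                        //hashCode() now selects a different bucket
        System.out.println(map.get(k));  //prints null here: the entry can no longer be found
    }
}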
It is all implementation dependent, but a rehash will occur when it is necessary.
Take a look at the source for the HashMap class, in the transfer() method, which is called by the resize() method.
/**
* Transfers all entries from current table to newTable.
*/
void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}
In this HashMap implementation you can follow exactly how each entry is stored in the new (twice as big) storage array. The capacity of the new array is used to determine which slot each item will be stored in. The hash code of the keys does not change (in fact it is not even recomputed, but retrieved from the field named hash in each Entry object, where it is stored); what changes is the result of the indexFor() call:
/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
    return h & (length-1);
}
which takes the hash code and the new storage array's length and returns the index in the new array.
So a client's new call to get() will go through the same indexFor() call, which will also use the new storage array's length, and all will be well.
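Using the numbers from the question as a worked example (indexFor's bitmask is equivalent to mod for power-of-two lengths; note that 12345 % 8 is actually 1 rather than 7, though the exact value doesn't change the point):
int h = 12345;
int oldIndex = h & (8 - 1);   // 12345 % 8  == 1 -> bucket in the old, 8-slot table
int newIndex = h & (16 - 1);  // 12345 % 16 == 9 -> bucket in the new, 16-slot table
//During resize the entry is moved to bucket 9; a later get(key) recomputes the
//index with the new length (16), so it also looks in bucket 9 and finds it.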

Java HashSet vs HashMap

I understand that HashSet is based on the HashMap implementation but is used when you need a unique set of elements. So why, in the following code, when putting the same objects into the map and the set, do we have the size of both collections equal to 1? Shouldn't the map size be 2? Because if the size of both collections is equal, I don't see any difference between using these two collections.
Set testSet = new HashSet<SimpleObject>();
Map testMap = new HashMap<Integer, SimpleObject>();
SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
SimpleObject simplObject2 = new SimpleObject("Igor", 1);
testSet.add(simpleObject1);
testSet.add(simplObject2);
Integer key = new Integer(10);
testMap.put(key, simpleObject1);
testMap.put(key, simplObject2);
System.out.println(testSet.size());
System.out.println(testMap.size());
The output is 1 and 1.
SimpleObject code
public class SimpleObject {

    private String dataField1;
    private int dataField2;

    public SimpleObject() {}

    public SimpleObject(String data1, int data2) {
        this.dataField1 = data1;
        this.dataField2 = data2;
    }

    public String getDataField1() {
        return dataField1;
    }

    public int getDataField2() {
        return dataField2;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result
                + ((dataField1 == null) ? 0 : dataField1.hashCode());
        result = prime * result + dataField2;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        SimpleObject other = (SimpleObject) obj;
        if (dataField1 == null) {
            if (other.dataField1 != null)
                return false;
        } else if (!dataField1.equals(other.dataField1))
            return false;
        if (dataField2 != other.dataField2)
            return false;
        return true;
    }
}
The map holds unique keys. When you invoke put with a key that exists in the map, the object under that key is replaced with the new object. Hence the size 1.
The difference between the two should be obvious:
in a Map you store key-value pairs
in a Set you store only the keys
In fact, a HashSet has a HashMap field, and whenever add(obj) is invoked, the put method is invoked on the underlying map map.put(obj, DUMMY) - where the dummy object is a private static final Object DUMMY = new Object(). So the map is populated with your object as key, and a value that is of no interest.
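A stripped-down sketch of that delegation (names are illustrative; the real HashSet has more constructors and methods, and calls its placeholder PRESENT):
import java.util.HashMap;

public class MiniHashSet<E> {
    private static final Object DUMMY = new Object();    //shared placeholder value
    private final HashMap<E, Object> map = new HashMap<>();

    public boolean add(E e) {
        //put returns the previous value, or null if the key was absent, so
        //add returns true only when the element was not already in the set.
        return map.put(e, DUMMY) == null;
    }

    public boolean contains(E e) { return map.containsKey(e); }
    public int size()            { return map.size(); }
}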
A key in a Map can only map to a single value. So the second time you put in to the map with the same key, it overwrites the first entry.
In case of the HashSet, adding the same object will be more or less a no-op. In case of a HashMap, putting a new key,value pair with an existing key will overwrite the existing value to set a new value for that key. Below I've added equals() checks to your code:
SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
SimpleObject simplObject2 = new SimpleObject("Igor", 1);
//If the below prints true, the 2nd add will not add anything
System.out.println("Are the objects equal? " + simpleObject1.equals(simplObject2));
testSet.add(simpleObject1);
testSet.add(simplObject2);
Integer key = new Integer(10);
//This is a no-brainer as you've the exact same key, but let's keep it consistent
//If this returns true, the 2nd put will overwrite the 1st key-value pair.
testMap.put(key, simpleObject1);
testMap.put(key, simplObject2);
System.out.println("Are the keys equal? " + key.equals(key));
System.out.println(testSet.size());
System.out.println(testMap.size());
I just wanted to add to these great answers the answer to your last dilemma: you wanted to know what the difference is between these two collections if they return the same size after your insertion. Well, you can't really see the difference here, because you are inserting two values into the map with the same key, so the second value replaces the first. You would see the real difference (among others) if you had inserted the same value into the map but with a different key. Then you would see that you can have duplicate values in the map, but you can't have duplicate keys, while in the set you can't have duplicate values at all. This is the main difference here.
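For instance, continuing the snippet from the question (testMap2 is a new map introduced just for this illustration), the same value stored under two different keys is kept twice, while adding the equal objects to the set still leaves only one element:
Map<Integer, SimpleObject> testMap2 = new HashMap<Integer, SimpleObject>();
testMap2.put(10, simpleObject1);
testMap2.put(20, simplObject2);       //an equal value, but under a different key
System.out.println(testMap2.size());  //2 - duplicate values are allowed
System.out.println(testSet.size());   //still 1 - duplicate elements are not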
The answer is simple: it is the nature of HashSet.
HashSet internally uses a HashMap with a dummy object named PRESENT as the value; the key of this HashMap will be your object.
hash(simpleObject1) and hash(simplObject2) will return the same int. So?
When you add simpleObject1 to the HashSet, it puts it into its internal HashMap with simpleObject1 as the key. Then when you add(simplObject2), you get false because an equal key is already present in the internal HashMap.
As a little extra info, HashSet effectively uses the hashing function to provide O(1) performance, relying on the objects' equals() and hashCode() contract. (Strictly speaking, HashSet does permit a single null element, as the documentation quoted below notes; null is simply handled as a special case.)
I think the major difference is:
HashSet is stable in the sense that it doesn't replace a duplicate value (if a duplicate is found after the first unique element was inserted, it just discards all future duplicates), whereas HashMap will make the effort to replace the old value with the new duplicate value. So there must be an overhead in HashMap when inserting the new duplicate item.
public class HashSet<E>
extends AbstractSet<E>
implements Set<E>, Cloneable, Serializable
This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set
