Re-hashing a hash map inside put method - java

I'm trying to implement a separate-chaining hash map in Java. Inside the put() method I want to rehash the map if the load factor (number of elements / size of the array) gets too large. For this I have written another method, rehash(), that doubles the size of the array (the capacity) and then adds all the entries again (at least this is what I want it to do). The problem is that when I test it I get a "java.lang.OutOfMemoryError: Java heap space", and I'm guessing this is because I'm calling the put() method inside the rehash() method as well. I don't really know how to fix this. Could someone check my code and give me feedback or a hint on how to proceed?
The Entry<K,V> in the code below is a nested private class in the hash map class.
Thanks in advance!
The put()-method:
public V put(K key,V value) {
int idx = key.hashCode()%capacity; //Calculate index based on hash code.
if(idx<0) {
idx+=this.capacity; //if index is less than 0 add the length of the array table
}
if(table[idx]==null) { //If list at idx is empty just add the Entry-node
table[idx] = new Entry<K,V>(key,value);
nr_of_keys +=1;
if(this.load()>=this.load_factor) { //Check if load-factor is greater than maximum load. If this is the case rehash.
rehash();
}
return null;
} else {
Entry<K,V> p = table[idx]; //dummy pointer
while(p.next!=null) { //while next node isn't null move the pointer forward
if(p.getKey().equals(key)) { //if key matches:
if(!p.getValue().equals(value)) { //if value don't match replace the old value.
V oldVal = p.getValue();
p.setValue(value);
return oldVal;
}
} else {
p=p.next;
}
}
if(p.getKey().equals(key)) { //if the key of the last node matches the given key:
if(!p.getValue().equals(value)) {
V oldVal = p.getValue();
p.setValue(value);
return oldVal;
} else {
return null;
}
}
p.next = new Entry<K,V>(key,value); //key doesn't exist so add (key,value) at the end of the list.
nr_of_keys +=1;
if(this.load()>=this.load_factor) { //if load is to large rehash()
rehash();
}
return null;
}
}
Rehash()-method:
public void rehash() {
Entry<K,V>[] tmp = table; //create temporary table
int old_capacity = this.capacity; //store old capacity/length of array.
this.capacity = 2*capacity; //New capacity is twice as large
this.nr_of_keys=0; //reset nr. of keys to zero.
table = (Entry<K, V>[]) new Entry[capacity]; //make this.table twice as large
for(int i=0; i<old_capacity;i++) { //go through the array
Entry<K,V> p = tmp[i]; //points to first element of list at position i.
while(p!=null) {
put(p.getKey(), p.getValue());
p=p.next;
}
}
}
The load()-method:
public double load() {
return((double) this.size())/((double)this.capacity);
}
where size() returns the number of (key,value) pairs in the map and capacity is the size of the array table (where the linked lists are stored).

Once you rehash your map, nothing stays where it was: the buckets, the entry chains, and so on all change.
So:
Create your temporary table.
Read the entries out of the old table as you normally would, using your existing getter methods.
Then compute each entry's bucket for the new capacity and add it to the new table directly (DO NOT USE PUT).
Then replace the existing table with the newly created one. Make certain that every value that depends on the table size is updated as well, such as the capacity, the resize threshold, and the bucket-selection logic.
Finally, use print statements to track the new buckets and the movement of items between buckets.
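The steps above can be sketched roughly as follows. This is a minimal illustration, not the asker's class: ChainedMap, Node, indexFor, MAX_LOAD and the initial capacity of 4 are all assumed names and values. The point is that rehash() relinks the existing nodes into a larger array and never calls put(), so it cannot trigger another rehash or touch the element count.

```java
// Minimal separate-chaining map whose rehash() never calls put().
// All names here (ChainedMap, Node, indexFor, MAX_LOAD) are illustrative assumptions.
class ChainedMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private static final double MAX_LOAD = 0.75; // assumed threshold
    private Node<K, V>[] table = newTable(4);     // assumed initial capacity
    private int size;

    @SuppressWarnings("unchecked")
    private static <K, V> Node<K, V>[] newTable(int capacity) {
        return (Node<K, V>[]) new Node[capacity];
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    private static int indexFor(Object key, int capacity) {
        return Math.floorMod(key.hashCode(), capacity);
    }

    public V put(K key, V value) {
        int idx = indexFor(key, table.length);
        for (Node<K, V> p = table[idx]; p != null; p = p.next) {
            if (p.key.equals(key)) { // existing key: replace value, size unchanged
                V old = p.value;
                p.value = value;
                return old;
            }
        }
        table[idx] = new Node<>(key, value, table[idx]); // new key: prepend to chain
        size++;
        if ((double) size / table.length >= MAX_LOAD) {
            rehash();
        }
        return null;
    }

    public V get(K key) {
        for (Node<K, V> p = table[indexFor(key, table.length)]; p != null; p = p.next) {
            if (p.key.equals(key)) {
                return p.value;
            }
        }
        return null;
    }

    public int size() { return size; }

    public int capacity() { return table.length; }

    // Moves every existing node into a table twice as large.
    // No nodes are created and size is not modified.
    private void rehash() {
        Node<K, V>[] old = table;
        Node<K, V>[] bigger = newTable(old.length * 2);
        for (Node<K, V> head : old) {
            while (head != null) {
                Node<K, V> next = head.next;          // remember the rest of the old chain
                int idx = indexFor(head.key, bigger.length);
                head.next = bigger[idx];              // relink node at the front of its new chain
                bigger[idx] = head;
                head = next;
            }
        }
        table = bigger;
    }
}
```

Because the nodes are reused, the rehash allocates only the new array, and the load check in put() is the single place where a resize can start.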

You have added the rehash(), but the load() implementation is still missing (or, inside load, the size()).
The pattern looks clear though, and allows a guess while waiting for this additional info.
You tell us that when the load factor reaches a certain point inside a put, you rehash. That rehash doubles the internal array and calls put again. And in the end you have no memory.
My bet would be that there is some subtle or not-so-subtle recursion taking place: you put, it rehashes by doubling the memory usage, then re-puts, which somehow triggers another rehash...
A first possibility would be that some internal variables tracking the array's state are not properly reset (e.g. the number of occupied entries). Confusing the "old" array data with that of the new one being built would be a likely culprit.
Another possibility lies in your put implementation, but finding it would require step-by-step debugging, which I'd advise you to perform.

Related

Is there a hashmap implementation that uses a caching scheme?

I swear that in the past I had seen something about a hashmap implementation using some type of caching but when I was reading up today on how hashmaps are implemented in java, it was just simply a table with linked lists. Let me go deeper in what I mean.
From what I read today, hashmap in java is essentially like so
There exists an array simply called "table" where each index of the array is the hashcode. The value of the array is the first element of the linked list for that hashcode.
When you try to retrieve an object from the hashmap using the key, the key is transformed into a hashcode which is applied as the index of the "table" to then go to the linkedlist to iterate and find the correct object corresponding to the key.
But what I had read before was something different than this. What I had read was when you retrieve an object from the hashmap using the key, the corresponding bucket is cached so that when you retrieve another object from the hashmap from the same bucket, you are using the cache. But when you retrieve an object from a different bucket, the other bucket is cached instead.
Did I just completely misunderstand something in the past and invent something in my head, or is there something like this that might have confused me?
Never heard of that.
First of all: the "table" array has only a certain size, so the hash code is not used directly as the index. From the HashMap source: tab[i = (n - 1) & hash]
Maybe you are mixing it up with LinkedHashMap, which keeps track of element accesses:
void afterNodeAccess(Node<K,V> e) { // move node to last
LinkedHashMap.Entry<K,V> last;
if (accessOrder && (last = tail) != e) {
LinkedHashMap.Entry<K,V> p =
(LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
p.after = null;
if (b == null)
head = a;
else
b.after = a;
if (a != null)
a.before = b;
else
last = b;
if (last == null)
head = p;
else {
p.before = last;
last.after = p;
}
tail = p;
++modCount;
}
}
This behavior is useful if you are implementing an LRU (least recently used) cache. An LRU cache removes the elements that have not been requested for the longest period once the cache has reached its maximum size.
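As a rough sketch of that idea, an LRU cache can be built on LinkedHashMap by enabling access order in the constructor and overriding removeEldestEntry. LruCache and maxEntries are illustrative names chosen here, not part of the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Small LRU cache built on LinkedHashMap's access-order mode.
// LruCache and maxEntries are hypothetical names for this sketch.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // Third argument true = iteration order follows access order,
        // so the least recently used entry is always the eldest.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by put() after an insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was touched more recently.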

Implementing a put (or add) method for a Map/Associative Array Data Structure

I am given a Map Data Structure with the Generic Types (K,V) where the K represents the key, and the V represents the value.
Now I am asked to implement the classic put(K key, V value) method, which comes along with the collection. The method is pre-implemented, and overrides the one from an interface which is used:
@Override
public void put(K key, V value) {
for (int i = 0; i <= this.entries.length; i++) {
if (this.entries[i] == null || this.entries[i].getKey().equals(key)) {
this.entries[i] = new Entry<K, V>(key, value);
return;
} else {
this.entries = GenericArrayHelper.copyArrayWithIncreasedSize(this.entries, (this.entries.length) * 2);
/* replace [...]
this.entries[...] = new Entry<K, V>(key, value);
*/
}
}
}
With the exception of the part that's commented out, where I would need to replace the [...] with the correct position of the array. Namely:
If an Entry<K,V> at index i in the map is null, or the key of the entry at index i equals the key that was passed over, replace the entry with a new entry containing the key and value that are passed, and finish the method.
If such an Entry<K,V> cannot be found, the map (which is an array of the form entries<K,V>[ ]) which the method is called on, shall be copied in full and its array size shall be doubled (with the aiding method GenericArrayHelper.copyArrayWithIncreasedSize). Successively, at the very first vacant slot of the copied and resized array, put in a new Entry with the passed key and value.
And this is where the confusion arose. I have tried to replace [...] with all kinds of indices, but have never gotten a satisfactory result.
When I try to put in these several entries, with putInside being a Map:
putInside.put("sizeInMB", 42);
putInside.put("version", 4);
putInside.put("yearOfRelease", 2015);
I shall get the following result, when printing ("stringified" with a separate toString method):
yearOfRelease : 2015
sizeInMB : 42
version : 4
yearOfRelease : 2015
sizeInMB : 42
version : 4
but when I, say, use the array index of entries.length-1 for [...], which is the closest I got to after hours of trying, and watch the debugger, it looks abysmal:
with the first three entries being correct, but the other three getting mashed up... and when I print the whole thing I merely get an output of the first three entries, since the other three seem to be ignored completely (perhaps because the larger arrays in the for loop are merely defined in the loop itself?)
My question is: How do I define a suitable replacement for the index [...], or, maybe also for better understanding: Why on earth would we need to double the size of the array first? I have never implemented any Java Collection data structure before, and also haven't taken the data structures class at my uni yet... How do I get to a "doubled" output?
Any form of help would be really really appreciated!
EDIT: To make things clearer, here is the toString method:
public static void toString(Map<String, Integer> print) {
for(String key : print.keysAsSet()){
System.out.println(key + ": " + print.getValueFor(key));
}
}
with keysAsSet() returning a HashSet of the Map, which merely contains the keys, but not the values.
public Set<K> keysAsSet() {
HashSet<K> current = new HashSet<>();
for(Entry<K,V> entry : entries){
if(entry != null) {
current.add(entry.getKey());
}
}
return current;
}
The code doesn’t match the description. To wit, you are resizing the array inside the loop, unless the very first element in entries is a hit (i.e. it’s null or it equals the key).
You need to put the array resizing after the loop (plus fix the loop condition check):
@Override
public void put(K key, V value) {
int oldLength = this.entries.length;
for (int i = 0; i < oldLength; i++) {
if (this.entries[i] == null || this.entries[i].getKey().equals(key)) {
this.entries[i] = new Entry<K, V>(key, value);
return;
}
}
this.entries = GenericArrayHelper.copyArrayWithIncreasedSize(this.entries, oldLength * 2);
this.entries[oldLength] = new Entry<K, V>(key, value);
}
… I assume you’re aware that this is a very inefficient map implementation: each search and insertion take O(n) tries. In reality you’d either use a hash table or some sort of search tree to speed up insertion and lookup.

Stuck on doubling size of hashtable

I can't figure out how to double the size of my hash table. Here is the code:
private void doubleLength () {
//Remember the old hash table array and allocate a new one 2 times as big
HashMap<K,V> resizedMap = new HashMap<K,V>(map.length * 2);
/*Traverse the old hash table adding each value to the new hash table.
Instead, add it to by applying
hashing/compression again (compression will be DIFFERENT, because the
length of the table is doubled, so we compute the same hash value but
compute the remainder using the DIFFERENT TABLE LENGTH).*/
for (int i = 0; i < map.length; i++) {
for (K key : map[i].entry) { //iterator does not work here
resizedMap.put(key, map[i].get(key)); //should go here
}
}
The hash table is an array of LN objects where LN is defined by:
public static class LN<K1,V1> {
public Map.Entry<K1,V1> entry;
public LN<K1,V1> next;
public LN (Map.Entry<K1,V1> e, LN<K1,V1> n)
{entry = e; next = n;}
}
I have an iterable within my class but it doesn't allow for map[i].entry.entries().
public Iterable<Map.Entry<K,V>> entries () {
return new Iterable<Map.Entry<K,V>>() {
public Iterator<Map.Entry<K,V>> iterator() {
return new MapEntryIterator();
}
};
}
I'm very lost on how I can double the size of public LN[] map;
The HashMap already resizes itself when the hash table gets too full. You do not have to resize it.
Your code will not compile. If you want to initialize a map to double the size it is easier to do this (assuming map is a Map also):
private void doubleLength () {
//Remember the old hash table array and allocate a new one 2 times as big
HashMap<K,V> resizedMap = new HashMap<K,V>(map.size()* 2);
resizedMap.putAll(map);
}
Also, you seem to be accessing things strangely. If you need to loop through a map, it should look like:
for (K key : map.keySet()){
V value = map.get(key); //get the value from the map
//do what you need to do
}
Now as already stated, you do not need to resize the HashMap. It already does that. From the JavaDocs:
An instance of HashMap has two parameters that affect its performance:
initial capacity and load factor. The capacity is the number of
buckets in the hash table, and the initial capacity is simply the
capacity at the time the hash table is created. The load factor is a
measure of how full the hash table is allowed to get before its
capacity is automatically increased. When the number of entries in the
hash table exceeds the product of the load factor and the current
capacity, the hash table is rehashed (that is, internal data
structures are rebuilt) so that the hash table has approximately twice
the number of buckets.
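To make the quoted rule concrete: with the default load factor of 0.75, a table of 16 buckets is resized once it holds more than 12 entries. If the number of entries is known up front, the initial capacity can be chosen so that no resize happens while filling the map. The helper below is only a sketch; Presize, capacityFor and presized are hypothetical names, and 0.75 is HashMap's documented default load factor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for choosing an initial capacity; not a JDK class.
class Presize {
    // Smallest capacity c such that expectedEntries <= c * loadFactor.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    // Builds a HashMap that should hold expectedEntries without rehashing.
    static <K, V> Map<K, V> presized(int expectedEntries) {
        float loadFactor = 0.75f; // HashMap's documented default
        return new HashMap<>(capacityFor(expectedEntries, loadFactor), loadFactor);
    }
}
```

For example, capacityFor(12, 0.75f) gives 16, matching the default table: 16 buckets times 0.75 allows 12 entries before the next insertion forces a rehash.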

How to randomly access data from a Java Iterator

I get an Iterator back from a class and I would like to get the xth element of that iterator. I know I could load it into an ArrayList in a loop or check a counter, but that seems inefficient/ugly. What is the best way to do this?
I thought something like,
List al = new ArrayList(myIterator);
myval = al.get(6);
But that constructor is undefined. thanks.
The definition of an Iterator does not allow arbitrary indexing to a position. That's just the way it's defined. If you need that capability you will have to load the data into an indexable collection.
The point of the Iterator interface is to allow sequential access to a collection without knowing anything about its size. Iterators can be forward-only or bi-directional (ListIterator). It's just one specific model for accessing elements of a collection. One advantage is that it implies nothing about the collection size, which could be much too large to fit completely into memory. By allowing only sequential access, the implementation is free to keep only part of the collection in memory at any given moment.
If you need to load the iterator contents into a list you need to do it yourself with a loop.
Nope, you've named the only approaches. You're going to end up needing a for loop to iterate to the appropriate position.
A few utility libraries have shortcuts to load an Iterator's contents into an ArrayList, or to get the nth element of an Iterator, but there's nothing built into the JDK.
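For the "load it into a list yourself" route, a small generic helper keeps the loop in one place. This is a sketch with made-up names (IteratorUtil, toList); it relies only on Iterator.forEachRemaining, which is available from Java 8 onward. Note that it consumes the iterator and holds every element in memory, so it only makes sense when the data is known to fit.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical utility class; the names are illustrative only.
class IteratorUtil {
    // Drains the remaining elements of the iterator into a new ArrayList.
    static <T> List<T> toList(Iterator<T> iterator) {
        List<T> result = new ArrayList<>();
        iterator.forEachRemaining(result::add);
        return result;
    }
}
```

After that, IteratorUtil.toList(myIterator).get(6) gives indexed access, at the cost of one full pass over the iterator.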
As a workaround I have this method in my Utilities class:
/**
* Retrieve the object at position n of the iterator
* @param iterator
* @param position
* @return Object at position
* @throws Exception
*/
public static Object iterateTo(Iterator iterator, int position) throws Exception{
if(iterator == null){
throw new Exception("iterator == null");
}
if(position < 0){
throw new Exception("position < 0");
}
while(iterator.hasNext() && position > 0){
position--;
iterator.next();
}
if(position != 0){
throw new Exception("position out of limit");
}
return iterator.next();
}
so instead of this
{
List al = new ArrayList(myIterator);
myval = al.get(6);
}
you will have to
{
Object myVal = Utilities.iterateTo(myIterator, 6);
}

How does the get(key) work even after the hashtable has grown in size!

Suppose a Hashtable originally has size 8, we hit the load factor, and it grows to double the size. How is get still able to retrieve the original values? Say we have a key, 8, that the hash function transforms into the hash value 12345, which we mod by 8 to get index 1. Now when the hash table size grows to 16, the same key still gives 12345, but if we mod it by 16 we get a different answer (9)! So how do I still retrieve the original key?
This isn't Java specific - when a hash table grows (in most implementations I know of), it has to reassess the keys of all hashed objects, and place them into their new, correct bucket based on the number of buckets now available.
This is also why resizing a hashtable is generally considered to be an "expensive" operation (compared to many others) - because it has to visit all of the stored items within it.
The hash value used to look up the value comes from the key object itself, not the container.
That's why objects used as keys in a Map should be immutable, at least in the fields that feed hashCode() and equals(). If the hashCode() changes while the key is in the map, you won't be able to find your key or value again.
It is all implementation dependent, but a rehash will occur when it is necessary.
Take a look at the source for the HashMap class (JDK 7 and earlier; Java 8 folded this logic into resize() itself), in the transfer() method, which is called by the resize() method.
/**
* Transfers all entries from current table to newTable.
*/
void transfer(Entry[] newTable) {
Entry[] src = table;
int newCapacity = newTable.length;
for (int j = 0; j < src.length; j++) {
Entry<K,V> e = src[j];
if (e != null) {
src[j] = null;
do {
Entry<K,V> next = e.next;
int i = indexFor(e.hash, newCapacity);
e.next = newTable[i];
newTable[i] = e;
e = next;
} while (e != null);
}
}
}
In this HashMap implementation you can follow exactly how each entry is stored in the new (twice as big) storage array. The capacity of the new array is used in determining which slot each item will be stored in. The hash code of the keys does not change (it is in fact not even recomputed, but retrieved from the field named hash in each Entry object, where it is stored); what changes is the result of the indexFor() call:
/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
return h & (length-1);
}
which takes the hash code and the new storage array's length and returns the index in the new array.
So a client's new call to get() will go through the same indexFor() call, which will also use the new storage array's length, and all will be well.
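A tiny demonstration of that point, using the hash value 12345 from the question: the stored hash stays the same, and only the index derived from it changes with the table length. IndexDemo is an illustrative name; indexFor copies the one-line method quoted above.

```java
// Shows how one unchanged hash value maps to different slots as the table grows.
// IndexDemo is a hypothetical class name for this sketch.
class IndexDemo {
    // Same formula as the quoted indexFor(); for power-of-two lengths it
    // matches Math.floorMod(h, length).
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int hash = 12345; // computed once from the key, never from the table
        System.out.println(indexFor(hash, 8));  // prints 1
        System.out.println(indexFor(hash, 16)); // prints 9
    }
}
```

During the resize the entry is moved from slot 1 of the old array to slot 9 of the new one, and a later get() computes 9 as well because it uses the new length.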
