After I check to see if the load factor signals the backing array to be resized, how do I actually do the resizing with quadratic probing?
Here is the code.
It's only part of the class. Also, could you check if I'm implementing the add method correctly?
import java.util.*;
public class HashMap<K, V> implements HashMapInterface<K, V> {
// Do not make any new instance variables.
private MapEntry<K, V>[] table;
private int size;
/**
* Create a hash map with no entries.
*/
public HashMap() {
table = new MapEntry[STARTING_SIZE];
size = 0;
}
@Override
public V add(K key, V value) {
if (key == null || value == null) {
throw new IllegalArgumentException("Passed in null arguments.");
}
if (getNextLoadFactor() > MAX_LOAD_FACTOR) {
resize();
}
MapEntry<K, V> entry = new MapEntry<>(key, value);
V val = null;
int index = Math.abs(key.hashCode()) % table.length;
int temp = index;
int q = 1;
do {
if (table[index] == null) {
table[index] = entry;
} else if (table[index].getKey().equals(key)) {
val = table[index].getValue();
table[index].setValue(value);
}
index = index + q*q % table.length;
q++;
} while (temp != index);
size++;
return val;
}
private double getNextLoadFactor() {
return (double) size / (double) table.length;
}
private void resize() {
MapEntry<K, V>[] temp = table;
table = new MapEntry[table.length * 2 + 1];
for (int i = 0; i < temp.length; i++) {
// how do I re-add the old entries here?
}
}
Following these steps from the wiki:
1. Get the key k
2. Set counter j = 0
3. Compute hash function h[k] = k % SIZE
4. If hashtable[h[k]] is empty
(4.1) Insert key k at hashtable[h[k]]
(4.2) Stop
Else
(4.3) The key space at hashtable[h[k]] is occupied, so we need to find the next available key space
(4.4) Increment j
(4.5) Compute new hash function h[k] = ( k + j * j ) % SIZE
(4.6) Repeat Step 4 till j is equal to the SIZE of hash table
5. The hash table is full
6. Stop
According to the above, it seems to me that there is a problem in your add method. Notice steps (4.1) and (4.2): if table[index] == null, a position for the key has been found and you can stop. Your do-while will execute again, because right after the insert you update the index, so temp != index will be true.
You are also calculating the next index incorrectly (% binds tighter than +, and the base index keeps accumulating instead of restarting from the hash); change
index = index + q*q % table.length;
to
index = (Math.abs(key.hashCode()) + q*q) % table.length;
The add will thus change to:
MapEntry<K, V> entry = new MapEntry<>(key, value);
int hash = Math.abs(key.hashCode());
int index = hash % table.length;
int q = 0;
while (table[index] != null) {
if (table[index].getKey().equals(key)) { // key already present: replace and return the old value
V val = table[index].getValue();
table[index].setValue(value);
return val;
}
q++;
index = (hash + q * q) % table.length; // next quadratic probe
}
table[index] = entry;
size++;
return null;
It can be proven that for a table of size b, with b > 3, the first b/2 probe positions are distinct, so it is safe to assume you will find an empty position as long as the table is less than half full (at most b/2 - 1 entries). Whether that is guaranteed depends on your MAX_LOAD_FACTOR.
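As a quick sanity check of that claim, here is a small sketch I wrote, assuming a prime table size (note that table.length * 2 + 1 is not guaranteed to be prime, so a production table would pick the next prime instead):
import java.util.LinkedHashSet;
import java.util.Set;

public class ProbeCheck {
    public static void main(String[] args) {
        int b = 23; // prime table size
        Set<Integer> offsets = new LinkedHashSet<>();
        for (int q = 0; q <= b / 2; q++) {
            offsets.add((q * q) % b); // quadratic probe offsets
        }
        // prints 12 distinct offsets: the first b/2 + 1 probes never collide
        System.out.println(offsets.size() + " distinct offsets: " + offsets);
    }
}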
For resizing, you will need to rehash each entry into the new table. This is because your hash function uses the length of the table as the modulus: the hash function has effectively changed, so you need to create the new array of length table.length * 2 + 1 and re-add every element to it.
private void resize() {
MapEntry<K, V>[] temp = table;
table = new MapEntry[temp.length * 2 + 1];
size = 0; // add() will re-increment size for each re-inserted entry
for (MapEntry<K, V> entry : temp) {
if (entry != null) { // skip empty slots to avoid a NullPointerException
this.add(entry.getKey(), entry.getValue());
}
}
}
Note: I did not test this and only used the theory behind quadratic probing and hash tables to debug your code. Hope it helps!
Related
I'm working on a data structures assignment and my attempt to increment a double hash function is stuck in an infinite loop.
My book defines a strategy to double hash as
h′(k) = q−(k mod q), for some prime number q < N. Also, N
should be a prime.
I've identified that the double hash increment is causing the issue, as switching to linear probing runs fine.
private int findSlot(int h, K k) {
totalProbes = 0;
int avail = -1; // no slot available (thus far)
int j = h; // index while scanning table
do {
totalProbes++;
if (totalProbes > maxProbes) maxProbes = totalProbes;
if (isAvailable(j)) { // may be either empty or defunct
if (avail == -1) avail = j; // this is the first available slot!
if (table[j] == null) {
break;
} // if empty, search fails immediately
} else if (table[j].getKey().equals(k))
return j; // successful match
//j = (j + 1) % capacity; // keep looking (cyclically)
j = hashTwo(k); //increment using double hash
} while (j != h); // stop if we return to the start
return -(avail + 1); // search has failed
}
private int hashTwo(K key) {
String keyString = key.toString(); //convert generic -> string -> int
int keyInt = Integer.parseInt(keyString);
return 7 - (keyInt % 7);
}
There is some ugliness with the hashTwo function, namely converting from a generic to an integer, but besides that it follows the same instructions as the book.
1. Error in your code: j = hashTwo(k) always produces the same value for a given key, so j never changes and the loop never terminates. Double hashing advances by the second hash on every probe: j = (j + hashTwo(k)) % capacity.
2. Error in the formula: for k a multiple of q, h′(k) = q − (k mod q) = q − 0 = q.
(You had better look at https://en.wikipedia.org/wiki/Double_hashing for a valid formula.)
Instead of "Integer.parseInt" you should start iteration with
private int findSlot(K k, int size) {
return findSlot(hashTwo(k.hashCode()), size);
}
private int findSlot(int h, int size) {...}
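For reference, here is a self-contained sketch of the textbook probe sequence h(k, i) = (h1(k) + i * h2(k)) mod N; the class and helper names are mine, and N = 11, q = 7 are arbitrary primes chosen for the demo:
import java.util.Arrays;

public class DoubleHashDemo {
    static final int N = 11; // table capacity, prime
    static int h1(int k) { return Math.floorMod(k, N); }
    static int h2(int k) { return 7 - Math.floorMod(k, 7); } // step in 1..7, never 0

    public static void main(String[] args) {
        int k = 24; // h1 = 2, h2 = 4
        int[] probes = new int[N];
        for (int i = 0; i < N; i++) {
            probes[i] = (h1(k) + i * h2(k)) % N; // advance by h2 on every probe
        }
        // prints [2, 6, 10, 3, 7, 0, 4, 8, 1, 5, 9]: every slot is visited
        System.out.println(Arrays.toString(probes));
    }
}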
I am reading Schildt's Java: The Complete Reference and I am wondering about one piece of code that looks very simple, but I can't understand how it works.
// A generic interface example.
// A Min/Max interface.
interface MinMax<T extends Comparable<T>> {
T min();
T max();
}
// Now, implement MinMax
class MyClass<T extends Comparable<T>> implements MinMax<T> {
T[] vals;
MyClass(T[] o) {
vals = o;
}
// Return the minimum value in vals.
public T min() {
T v = vals[0];
for (int i = 1; i < vals.length; i++)
if (vals[i].compareTo(v) < 0) v = vals[i];
return v;
}
// Return the maximum value in vals.
public T max() {
T v = vals[0];
for (int i = 1; i < vals.length; i++)
if (vals[i].compareTo(v) > 0) v = vals[i];
return v;
}
}
class GenIFDemo {
public static void main(String args[]) {
Integer inums[] = {3, 6, 2, 8, 6};
Character chs[] = {'b', 'r', 'p', 'w'};
MyClass<Integer> iob = new MyClass<Integer>(inums);
MyClass<Character> cob = new MyClass<Character>(chs);
System.out.println("Max value in inums: " + iob.max());
System.out.println("Min value in inums: " + iob.min());
System.out.println("Max value in chs: " + cob.max());
System.out.println("Min value in chs: " + cob.min());
}
}
//The output is shown here:
//Max value in inums: 8
//Min value in inums: 2
//Max value in chs: w
//Min value in chs: b
I can't understand this one and its output:
// Return the maximum value in vals.
public T max() {
T v = vals[0];
for (int i = 1; i < vals.length; i++)
if (vals[i].compareTo(v) > 0) v = vals[i];
return v;
}
Why is the output 8? According to the condition,
vals[1].compareTo(vals[0]) (6 > 3) is already > 0,
so v = 6, not 8.
I can't understand how it finds the max and min values here.
Could you please explain it? Thanks!
The vals[1].compareTo(vals[0]) comparison only happens in the iteration where i=1, and in that case v becomes 6. Now consider the case i=3 of the same loop: vals[3], which is 8, is compared with the current value of v (which is 6, since it was updated before). Since 8 is larger than 6, v is updated to 8 in this iteration.
v holds the max value and starts with the first one.
The loop iterates over the remaining elements and sets the value to v if vals[i].compareTo(v) > 0 (which means vals[i] is greater than v).
So it is not just the first two elements that are compared.
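A quick trace of the same loop over inums makes this concrete (the println is mine, added only for illustration):
Integer[] vals = {3, 6, 2, 8, 6};
Integer v = vals[0]; // v starts at 3
for (int i = 1; i < vals.length; i++) {
    if (vals[i].compareTo(v) > 0) v = vals[i];
    System.out.println("i=" + i + ": vals[i]=" + vals[i] + ", v=" + v);
}
// i=1: vals[i]=6, v=6
// i=2: vals[i]=2, v=6
// i=3: vals[i]=8, v=8
// i=4: vals[i]=6, v=8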
According to this reference,
a.compareTo(b)
returns a positive value if and only if a is to be regarded strictly larger than b, which means that
// Return the maximum value in vals.
public T max() {
T v = vals[0];
for (int i = 1; i < vals.length; i++)
if (vals[i].compareTo(v) > 0) v = vals[i];
return v;
}
actually determines the maximum, where v is the current candidate for the maximum. If vals[i].compareTo(v) is larger than zero, this means that vals[i] is larger than the candidate. Consequently, v is assigned vals[i], as vals[i] is the new candidate for the maximum.
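The contract in one small example, using Integer's compareTo:
System.out.println(Integer.valueOf(8).compareTo(6)); // positive: 8 is larger
System.out.println(Integer.valueOf(6).compareTo(6)); // zero: they are equal
System.out.println(Integer.valueOf(2).compareTo(6)); // negative: 2 is smaller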
As in the title, I want to use the Knuth-Fisher-Yates shuffle algorithm to select N random elements from a List, but without using List.toArray and without changing the list. Here is my current code:
public List<E> getNElements(List<E> list, Integer n) {
List<E> rtn = null;
if (list != null && n != null && n > 0) {
int lSize = list.size();
if (lSize > n) {
rtn = new ArrayList<E>(n);
E[] es = (E[]) list.toArray();
//Knuth-Fisher-Yates shuffle algorithm
for (int i = es.length - 1; i > es.length - n - 1; i--) {
int iRand = rand.nextInt(i + 1);
E eRand = es[iRand];
es[iRand] = es[i];
//This is not necessary here as we do not really need the final shuffle result.
//es[i] = eRand;
rtn.add(eRand);
}
} else if (lSize == n) {
rtn = new ArrayList<E>(n);
rtn.addAll(list);
} else {
log("list.size < nSub! ", lSize, n);
}
}
return rtn;
}
It uses list.toArray() to make a new array so the original list is not modified. However, my problem is that my list can be very big, with up to 1 million elements, and then list.toArray() is too slow. My n can range from 1 to 1 million. When n is small (say 2), the function is very inefficient, as it still needs to do list.toArray() on a list of 1 million elements.
Can someone help improve the above code to make it more efficient when dealing with large lists? Thanks.
Here I assume the Knuth-Fisher-Yates shuffle is the best algorithm for selecting n random elements from a list. Am I right? I would be very glad if there are algorithms that beat the Knuth-Fisher-Yates shuffle at this job, in terms of both speed and the quality of the results (guaranteed real randomness).
Update:
Here is some of my test results:
When selecting n from 1,000,000 elements:
When n < 1000000/4, the fastest way is to use Daniel Lemire's bitmap function to select n random ids first, then fetch the elements with those ids:
public List<E> getNElementsBitSet(List<E> list, int n) {
List<E> rtn = new ArrayList<E>(n);
int[] ids = genNBitSet(n, 0, list.size());
for (int i = 0; i < ids.length; i++) {
rtn.add(list.get(ids[i]));
}
return rtn;
}
The genNBitSet is using the code generateUniformBitmap from https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2013/08/14/java/UniformDistinct.java
When n > 1000000/4, the Reservoir Sampling method is faster.
So I have built a function to combine these two methods.
You are probably looking for something like Reservoir Sampling.
Start with an initial array containing the first k elements, and replace its entries with new elements with decreasing probability:
Java-like pseudocode:
E[] r = new E[k]; //not really, cannot create an array of a generic type, but this is just pseudocode
int i = 0;
for (E e : list) {
//assign first k elements:
if (i < k) { r[i++] = e; continue; }
//keep the current element with decreasing probability k/i:
i++;
int j = random(i); //a number from 0 to i-1 inclusive
if (j < k) r[j] = e;
}
return r;
This requires a single pass on the data, with very cheap ops every iteration, and the space consumption is linear with the required output size.
If n is very small compared to the length of the list, take an empty set of ints and keep adding a random index until the set has the right size.
If n is comparable to the length of the list, do the same, but then return items in the list that don't have indexes in the set.
In the middle ground, you can iterate through the list, and randomly select items based on how many items you've seen, and how many items you've already returned. In pseudo-code, if you want k items from N:
for i = 0 to N-1
if random(N-i) < k
add item[i] to the result
k -= 1
end
end
Here random(x) returns a random number between 0 (inclusive) and x (exclusive).
This produces a uniformly random sample of k elements. You could also consider making an iterator to avoid building the results list to save memory, assuming the list is unchanged as you're iterating over it.
By profiling, you can determine the transition point where it makes sense to switch from the naive set-building method to the iteration method.
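A Java sketch of that middle-ground loop (this is selection sampling, Knuth's Algorithm S; the class and method names are mine, and random(x) maps to rand.nextInt(x)):
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SelectionSampling {
    public static <T> List<T> sampleK(List<T> items, int k, Random rand) {
        List<T> result = new ArrayList<>(k);
        int remaining = items.size(); // N - i in the pseudo-code
        for (T item : items) {
            if (k == 0) break; // sample complete
            if (rand.nextInt(remaining) < k) { // keep with probability k/remaining
                result.add(item);
                k--;
            }
            remaining--;
        }
        return result;
    }
}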
Let's assume that you can generate n random indices out of m that are pairwise distinct and then look them up efficiently in the collection. If you don't need the order of the elements to be random, then you can use an algorithm due to Robert Floyd.
Random r = new Random();
Set<Integer> s = new HashSet<Integer>();
for (int j = m - n; j < m; j++) {
int t = r.nextInt(j + 1); // uniform in [0, j], inclusive of j, as Floyd's algorithm requires
s.add(s.contains(t) ? j : t);
}
If you do need the order to be random, then you can run Fisher--Yates where, instead of using an array, you use a HashMap that stores only those mappings where the key and the value are distinct. Assuming that hashing is constant time, both of these algorithms are asymptotically optimal (though clearly, if you want to randomly sample most of the array, then there are data structures with better constants).
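A sketch of that map-backed Fisher-Yates (my own helper names; it stores only the positions that have actually been touched, so n picks cost O(n) time and space):
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class PartialShuffle {
    public static <E> List<E> sample(List<E> list, int n, Random r) {
        Map<Integer, Integer> virtual = new HashMap<>(); // slot -> index currently stored there
        List<E> result = new ArrayList<>(n);
        int m = list.size();
        for (int i = 0; i < n; i++) {
            int j = i + r.nextInt(m - i); // pick a slot in [i, m)
            int vi = virtual.getOrDefault(i, i);
            int vj = virtual.getOrDefault(j, j);
            result.add(list.get(vj)); // output the element in slot j
            virtual.put(j, vi);       // move slot i's element into slot j
        }
        return result; // n distinct elements, in random order
    }
}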
Just for convenience: an MCVE with an implementation of the Reservoir Sampling proposed by amit (possible upvotes should go to him; I'm just hacking some code).
It seems like this is indeed an algorithm that nicely covers both the case where the number of elements to select is low compared to the list size and the case where it is high (assuming that the properties about the randomness of the result stated on the Wikipedia page are correct).
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Random;
import java.util.TreeMap;
public class ReservoirSampling
{
public static void main(String[] args)
{
example();
//test();
}
private static void test()
{
List<String> list = new ArrayList<String>();
list.add("A");
list.add("B");
list.add("C");
list.add("D");
list.add("E");
int size = 2;
int runs = 100000;
Map<String, Integer> counts = new TreeMap<String, Integer>();
for (int i=0; i<runs; i++)
{
List<String> sample = sample(list, size);
String s = createString(sample);
Integer count = counts.get(s);
if (count == null)
{
count = 0;
}
counts.put(s, count+1);
}
for (Entry<String, Integer> entry : counts.entrySet())
{
System.out.println(entry.getKey()+" : "+entry.getValue());
}
}
private static String createString(List<String> list)
{
Collections.sort(list);
StringBuilder sb = new StringBuilder();
for (String s : list)
{
sb.append(s);
}
return sb.toString();
}
private static void example()
{
List<String> list = new ArrayList<String>();
for (int i=0; i<26; i++)
{
list.add(String.valueOf((char)('A'+i)));
}
for (int i=1; i<=26; i++)
{
printExample(list, i);
}
}
private static <T> void printExample(List<T> list, int size)
{
System.out.printf("%3d elements: "+sample(list, size)+"\n", size);
}
private static final Random random = new Random(0);
private static <T> List<T> sample(List<T> list, int size)
{
List<T> result = new ArrayList<T>(Collections.nCopies(size, (T) null));
int i = 0;
for (T element : list)
{
if (i < size)
{
result.set(i, element);
i++;
continue;
}
i++;
int j = random.nextInt(i);
if (j < size)
{
result.set(j, element);
}
}
return result;
}
}
If n is way smaller than size, you could use this algorithm, which is unfortunately quadratic in n, but doesn't depend on the size of the array at all.
Example with size = 100 and n = 4.
choose a random number from 0 to 99, let's say 42, and add it to the result.
choose a random number from 0 to 98, let's say 39, and add it to the result.
choose a random number from 0 to 97, let's say 41; since 41 is greater than or equal to the already-chosen 39, increment it by 1 to get 42; but 42 is greater than or equal to the already-chosen 42, so increment it again, and you have 43.
...
In short, you choose from the remaining numbers and then compute which number you have actually chosen. I would use a linked list for this, but maybe there are better data structures.
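A sketch of this idea (a sorted ArrayList stands in for the linked list; the names are mine):
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SmallSample {
    public static List<Integer> chooseIndices(int size, int n, Random rand) {
        List<Integer> chosen = new ArrayList<>(); // kept in ascending order
        for (int i = 0; i < n; i++) {
            int r = rand.nextInt(size - i); // draw from the remaining count
            int pos = 0;
            while (pos < chosen.size() && chosen.get(pos) <= r) {
                r++; // skip past each already-chosen value
                pos++;
            }
            chosen.add(pos, r); // insert, keeping the list sorted
        }
        return chosen; // n distinct indices in [0, size)
    }
}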
Summarizing Changwang's update: if you want more than 250,000 items, use amit's answer. Otherwise use Daniel Lemire's bitmap method, shown in its entirety here:
NOTE: The result is always in the original order as well
public static <T> List<T> getNRandomElements(int n, List<T> list) {
List<T> subList = new ArrayList<>(n);
int[] ids = generateUniformBitmap(n, list.size());
for (int id : ids) {
subList.add(list.get(id));
}
return subList;
}
// https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2013/08/14/java/UniformDistinct.java
private static int[] generateUniformBitmap(int num, int max) {
if (num > max) {
// Logging alone would fall through into an infinite loop below, so fail fast.
throw new IllegalArgumentException("Can't generate " + num + " distinct ints below " + max);
}
int[] ans = new int[num];
if (num == max) {
for (int k = 0; k < num; ++k) {
ans[k] = k;
}
return ans;
}
BitSet bs = new BitSet(max);
int cardinality = 0;
Random random = new Random();
while (cardinality < num) {
int v = random.nextInt(max);
if (!bs.get(v)) {
bs.set(v);
cardinality += 1;
}
}
int pos = 0;
for (int i = bs.nextSetBit(0); i >= 0; i = bs.nextSetBit(i + 1)) {
ans[pos] = i;
pos += 1;
}
return ans;
}
If you want them randomized, I use:
public static <T> List<T> getNRandomShuffledElements(int n, List<T> list) {
List<T> randomElements = getNRandomElements(n, list);
Collections.shuffle(randomElements);
return randomElements;
}
I needed something for this in C#, here's my solution which works on a generic List.
It selects N random elements of the list and places them at the front of the list.
So upon returning, the first N elements of the list are randomly selected. It is fast and efficient even when you're dealing with a very large number of elements.
static void SelectRandom<T>(List<T> list, int n)
{
if (n >= list.Count)
{
// n should be less than list.Count
return;
}
int max = list.Count;
var random = new Random();
for (int i = 0; i < n; i++)
{
int r = random.Next(max);
max = max - 1;
int irand = i + r;
if (i != irand)
{
T rand = list[irand];
list[irand] = list[i];
list[i] = rand;
}
}
}
I'm attempting to resize my hash table; however, I keep getting a NullPointerException.
I know that if the load factor goes above 0.75 the table size has to double, and if it drops below 0.50 the table size is halved. So far I have this:
public boolean add(Object x)
{
int h = x.hashCode();
if (h < 0) { h = -h; }
h = h % buckets.length;
Node current = buckets[h];
while (current != null)
{
if (current.data.equals(x)) { return false; }
// Already in the set
current = current.next;
}
Node newNode = new Node();
newNode.data = x;
newNode.next = buckets[h];
buckets[h] = newNode;
currentSize++;
double factor1 = currentSize * load1; //load1 = 0.75
double factor2 = currentSize * load2; //load2 = 0.50
if (currentSize > factor1) { resize(buckets.length*2); }
if (currentSize < factor2) { resize(buckets.length/2); }
return true;
}
Example: Size = 3, Max Size = 5.
If we take the Max Size and multiply by 0.75 we get 3.75.
This is the threshold: once we pass it, the Max Size must double.
So if we add an extra element into the table, the size is 4, which is > 3.75, thus the new Max Size is 10.
However, once we increase the Max Size, the bucket index for each element changes (the hash code is taken modulo the new length), so we call resize(int newSize):
private void resize(int newLength)
{
//
HashSet newTable = new HashSet(newLength);
for (int i = 0; i < buckets.length; i++) {
newTable.add(buckets[i]);
}
}
Here is my constructor, in case buckets[i] confuses anyone.
public HashSet(int bucketsLength)
{
buckets = new Node[bucketsLength];
currentSize = 0;
}
I feel that the logic is correct, unless my resize method is not retrieving the elements.
If that is all your code for resize(), then you are failing to assign newTable to a class attribute, i.e. your old table. Right now you fill it with data and then don't do anything with it, since it is defined inside resize and therefore not available outside of it.
So you end up thinking you have a larger table now, but in fact you are still using the old one ;-)
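In code, the fix might look like the sketch below (assuming the Node and buckets fields from the question, and that the load-factor checks in add() are adjusted so re-adding does not recurse into another resize). Note that it also walks each bucket's chain and re-adds the stored data rather than the Node objects, which avoids the NullPointerException on empty buckets:
private void resize(int newLength)
{
    Node[] oldBuckets = buckets;
    buckets = new Node[newLength]; // assign the new table to the field
    currentSize = 0;               // add() re-counts each re-inserted element
    for (Node node : oldBuckets) {
        while (node != null) {     // walk the chain; empty buckets are skipped
            add(node.data);        // re-hash the data, not the Node itself
            node = node.next;
        }
    }
}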
I was trying to do research on hashmap and came up with the following analysis:
https://stackoverflow.com/questions/11596549/how-does-javas-hashmap-work-internally/18492835#18492835
Q1: Can you show me, with a simple map, how the hash code for a given key is calculated in detail, and how the position hash % (arrayLength-1) where the element should be placed (the bucket number) is computed? Say I have this HashMap:
HashMap map=new HashMap();//HashMap key random order.
map.put("Amit","Java");
map.put("Saral","J2EE");
Q2: Sometimes it might happen that the hashCodes for 2 different objects are the same. In this case the 2 objects will be saved in one bucket and will be presented as a linked list. The entry point is the most recently added object. This object refers to another object with the next field, and so on. The last entry refers to null. Can you show me this with a real example?
"Amit" will be distributed to the 10th bucket, because of the bit twiddeling. If there were no bit twiddeling it would go to the 7th bucket, because 2044535 & 15 = 7. how this is possible please explanin detail the whole calculation..?
How is the hashcode for the given key calculated in detail by using this formula?
In case of String this is calculated by String#hashCode(); which is implemented as follows:
public int hashCode() {
int h = hash;
int len = count;
if (h == 0 && len > 0) {
int off = offset;
char val[] = value;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
Basically following the equation in the Javadoc
hashcode = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
One interesting thing to note on this implementation is that String actually caches its hash code. It can do this, because String is immutable.
If I calculate the hashcode of the String "Amit", it will yield to this integer:
System.out.println("Amit".hashCode());
> 2044535
Let's get through a simple put to a map, but first we have to determine how the map is built.
The most interesting fact about a Java HashMap is that it always has 2^n buckets. With the default constructor, the number of buckets is 16, which is obviously 2^4.
Doing a put operation on this map, it will first get the hashcode of the key. Then some fancy bit twiddling happens on this hashcode to ensure that poor hash functions (especially those that do not differ in the lower bits) don't "overload" a single bucket.
The real function that is actually responsible for distributing your key to the buckets is the following:
h & (length-1); // length is the current number of buckets, h the hashcode of the key
This only works for power of two bucket sizes, because it uses & to map the key to a bucket instead of a modulo.
"Amit" will be distributed to the 10th bucket, because of the bit twiddeling. If there were no bit twiddeling it would go to the 7th bucket, because 2044535 & 15 = 7.
Now that we have an index for it, we can find the bucket. If the bucket contains elements, we have to iterate over them and replace an equal entry if we find it.
If no item has been found in the linked list, we just add the new entry at the beginning of the linked list.
The next important thing in HashMap is resizing: if the actual size of the map goes above a threshold (determined by the current number of buckets and the load factor, in our case 16 * 0.75 = 12), it will resize the backing array.
Resizing always doubles the current number of buckets, which keeps the count a power of two and is guaranteed not to break the function that finds the buckets.
Since the number of buckets change, we have to rehash all the current entries in our table.
This is quite costly, so if you know how many items there will be, you should initialize the HashMap with that count so it does not have to resize all the time.
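For instance (the counts here are just illustrative), requesting enough capacity up front keeps the expected entries below the resize threshold:
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    public static void main(String[] args) {
        // Capacity chosen so that 1000 entries stay below threshold = capacity * 0.75.
        Map<String, String> map = new HashMap<>((int) (1000 / 0.75f) + 1);
        for (int i = 0; i < 1000; i++) {
            map.put("key" + i, "value" + i); // no intermediate resizing happens
        }
        System.out.println(map.size()); // 1000
    }
}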
Q1: Look at the hashCode() method implementation for the String object.
Q2: Create a simple class and implement its hashCode() method as return 1;. That means every object of that class will have the same hashCode and will therefore be saved in the same bucket in the HashMap.
Understand that there are two basic requirements for a hash code:
When the hash code is recalculated for a given object (that has not been changed internally in a way that would alter its identity) it must produce the same value as the previous calculation. Similarly, two "identical" objects must produce the same hash codes.
When the hash code is calculated for two different objects (which are not considered "identical" from the standpoint of their internal content) there should be a high probability that the two hash codes would be different.
How these goals are accomplished is the subject of much interest to the math nerds who work on such things, but understanding the details is not at all important to understanding how hash tables work.
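A minimal illustration of both requirements (a made-up Point class, not from the question):
// equals() and hashCode() agree: equal points always share a hash code,
// and distinct points usually get different ones.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    @Override public boolean equals(Object o) {
        return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
    }
    @Override public int hashCode() { return 31 * x + y; }
}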
import java.util.Arrays;
public class Test2 {
public static void main(String[] args) {
Map<Integer, String> map = new Map<Integer, String>();
map.put(1, "A");
map.put(2, "B");
map.put(3, "C");
map.put(4, "D");
map.put(5, "E");
System.out.println("Iterate");
for (int i = 0; i < map.size(); i++) {
System.out.println(map.values()[i].getKey() + " : " + map.values()[i].getValue());
}
System.out.println("Get-> 3");
System.out.println(map.get(3));
System.out.println("Delete-> 3");
map.delete(3);
System.out.println("Iterate again");
for (int i = 0; i < map.size(); i++) {
System.out.println(map.values()[i].getKey() + " : " + map.values()[i].getValue());
}
}
}
class Map<K, V> {
private int size;
private Entry<K, V>[] entries = new Entry[16];
public void put(K key, V value) {
boolean flag = true;
for (int i = 0; i < size; i++) {
if (entries[i].getKey().equals(key)) {
entries[i].setValue(value);
flag = false;
break;
}
}
if (flag) {
this.ensureCapacity();
entries[size++] = new Entry<K, V>(key, value);
}
}
public V get(K key) {
V value = null;
for (int i = 0; i < size; i++) {
if (entries[i].getKey().equals(key)) {
value = entries[i].getValue();
break;
}
}
return value;
}
public boolean delete(K key) {
boolean flag = false;
Entry<K, V>[] entry = new Entry[size];
int j = 0;
int total = size;
for (int i = 0; i < total; i++) {
if (!entries[i].getKey().equals(key)) {
entry[j++] = entries[i];
} else {
flag = true;
size--;
}
}
entries = flag ? entry : entries;
return flag;
}
public int size() {
return size;
}
public Entry<K, V>[] values() {
return entries;
}
private void ensureCapacity() {
if (size == entries.length) {
entries = Arrays.copyOf(entries, size * 2);
}
}
@SuppressWarnings("hiding")
public class Entry<K, V> {
private K key;
private V value;
public K getKey() {
return key;
}
public V getValue() {
return value;
}
public void setValue(V value) {
this.value = value;
}
public Entry(K key, V value) {
super();
this.key = key;
this.value = value;
}
}
}