Regarding HashMap implementation in Java

I was trying to do research on hashmap and came up with the following analysis:
https://stackoverflow.com/questions/11596549/how-does-javas-hashmap-work-internally/18492835#18492835
Q1 Can you show me, with a simple map, how the hashcode for a given key is calculated in detail, and how the bucket number where the element should be placed is computed using the formula hash & (arrayLength - 1)? Let's say I have this HashMap:
HashMap map=new HashMap();//HashMap key random order.
map.put("Amit","Java");
map.put("Saral","J2EE");
Q2 Sometimes the hashCodes for two different objects are the same. In this case the two objects are saved in one bucket and stored as a linked list. The entry point is the most recently added object; it refers to the next object via its next field, and so on, and the last entry refers to null. Can you show me this with a real example?
"Amit" will be distributed to the 10th bucket, because of the bit twiddeling. If there were no bit twiddeling it would go to the 7th bucket, because 2044535 & 15 = 7. how this is possible please explanin detail the whole calculation..?

"how hashcode for the given key is calculated in detail by using this formula"
In the case of String this is calculated by String#hashCode(), which is implemented as follows:
public int hashCode() {
int h = hash;
int len = count;
if (h == 0 && len > 0) {
int off = offset;
char val[] = value;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
Basically it follows the equation given in the Javadoc:
hashcode = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
One interesting thing to note on this implementation is that String actually caches its hash code. It can do this, because String is immutable.
If I calculate the hashcode of the String "Amit", it yields this integer:
System.out.println("Amit".hashCode());
> 2044535
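Expanding the Javadoc formula for "Amit" (character values 'A' = 65, 'm' = 109, 'i' = 105, 't' = 116, with n = 4) reproduces exactly that number:
"Amit".hashCode() = 65*31^3 + 109*31^2 + 105*31 + 116
                  = 1936415 + 104749 + 3255 + 116
                  = 2044535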
Let's walk through a simple put on a map, but first we have to look at how the map is built.
The most interesting fact about a Java HashMap is that it always has 2^n buckets. With the default constructor the number of buckets is 16, which is obviously 2^4.
A put operation on this map first gets the hashcode of the key. Some fancy bit twiddling is then applied to this hashcode to ensure that poor hash functions (especially those that do not differ in the lower bits) don't "overload" a single bucket.
The real function that is actually responsible for distributing your key to the buckets is the following:
h & (length-1); // length is the current number of buckets, h the hashcode of the key
This only works for power of two bucket sizes, because it uses & to map the key to a bucket instead of a modulo.
"Amit" will be distributed to the 10th bucket, because of the bit twiddeling. If there were no bit twiddeling it would go to the 7th bucket, because 2044535 & 15 = 7.
Now that we have an index, we can find the bucket. If the bucket already contains elements, we have to iterate over them and replace an equal entry if we find one.
If no matching item is found in the linked list, the new entry is simply added at the beginning of the list.
The next important thing in HashMap is resizing: if the actual size of the map rises above a threshold (determined by the current number of buckets and the load factor, in our case 16 * 0.75 = 12), it will resize the backing array.
A resize always doubles the current number of buckets, which guarantees the count stays a power of two and does not break the function that finds the buckets.
Since the number of buckets changes, we have to rehash all the current entries in the table.
This is quite costly, so if you know how many items there will be, you should initialize the HashMap with that count so it does not have to resize over and over.
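For example, if you know you will store roughly 1,000 entries, a small sketch of pre-sizing (the constructor rounds the capacity up to a power of two and multiplies it by the load factor to get the resize threshold):
Map<String, String> map = new HashMap<>((int) (1000 / 0.75f) + 1); // threshold ends up >= 1000, so no resizing while filling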

Q1: look at the hashCode() method implementation of String.
Q2: Create a simple class and implement its hashCode() method as return 1. That means every object of that class will have the same hashCode and will therefore be saved in the same bucket in a HashMap.
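A minimal sketch of such a class (BadKey is just an illustrative name; assumes the usual java.util imports):
class BadKey {
    private final String name;
    BadKey(String name) { this.name = name; }
    @Override public int hashCode() { return 1; } // every instance gets the same hash code
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).name.equals(name);
    }
}
// all of these collide and end up chained in a single bucket:
Map<BadKey, String> map = new HashMap<>();
map.put(new BadKey("Amit"), "Java");
map.put(new BadKey("Saral"), "J2EE");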

Understand that there are two basic requirements for a hash code:
When the hash code is recalculated for a given object (that has not been changed internally in a way that would alter its identity) it must produce the same value as the previous calculation. Similarly, two "identical" objects must produce the same hash codes.
When the hash code is calculated for two different objects (which are not considered "identical" from the standpoint of their internal content) there should be a high probability that the two hash codes would be different.
How these goals are accomplished is the subject of much interest to the math nerds who work on such things, but understanding the details is not at all important to understanding how hash tables work.
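A common way to satisfy both requirements is to derive the hash code from exactly the fields that equals() compares; a minimal sketch:
class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    @Override public boolean equals(Object o) {
        return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
    }
    @Override public int hashCode() {
        return 31 * x + y; // built from the same fields as equals(), so equal points hash equally
    }
}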

import java.util.Arrays;
public class Test2 {
public static void main(String[] args) {
Map<Integer, String> map = new Map<Integer, String>();
map.put(1, "A");
map.put(2, "B");
map.put(3, "C");
map.put(4, "D");
map.put(5, "E");
System.out.println("Iterate");
for (int i = 0; i < map.size(); i++) {
System.out.println(map.values()[i].getKey() + " : " + map.values()[i].getValue());
}
System.out.println("Get-> 3");
System.out.println(map.get(3));
System.out.println("Delete-> 3");
map.delete(3);
System.out.println("Iterate again");
for (int i = 0; i < map.size(); i++) {
System.out.println(map.values()[i].getKey() + " : " + map.values()[i].getValue());
}
}
}
class Map<K, V> {
private int size;
private Entry<K, V>[] entries = new Entry[16];
public void put(K key, V value) {
boolean flag = true;
for (int i = 0; i < size; i++) {
if (entries[i].getKey().equals(key)) {
entries[i].setValue(value);
flag = false;
break;
}
}
if (flag) {
this.ensureCapacity();
entries[size++] = new Entry<K, V>(key, value);
}
}
public V get(K key) {
V value = null;
for (int i = 0; i < size; i++) {
if (entries[i].getKey().equals(key)) {
value = entries[i].getValue();
break;
}
}
return value;
}
public boolean delete(K key) {
boolean flag = false;
Entry<K, V>[] entry = new Entry[size];
int j = 0;
int total = size;
for (int i = 0; i < total; i++) {
if (!entries[i].getKey().equals(key)) {
entry[j++] = entries[i];
} else {
flag = true;
size--;
}
}
entries = flag ? entry : entries;
return flag;
}
public int size() {
return size;
}
public Entry<K, V>[] values() {
return entries;
}
private void ensureCapacity() {
if (size == entries.length) {
entries = Arrays.copyOf(entries, size * 2);
}
}
@SuppressWarnings("hiding")
public class Entry<K, V> {
private K key;
private V value;
public K getKey() {
return key;
}
public V getValue() {
return value;
}
public void setValue(V value) {
this.value = value;
}
public Entry(K key, V value) {
super();
this.key = key;
this.value = value;
}
}
}


finding the non-duplicate integer from a given array with an odd number of items

Trying to figure out the error in this code. It works for small samples but fails for huge inputs (I don't have a large sample at hand, though).
The solution worked for the following tests.
private static final int[] A = {9,3,9,3,9,7,9};
private static final int[] A2 = {9,3,9};
private static final int[] A3 = {9,3,9,3,9,7,7,2,2,11,9};
@Test
public void test(){
OddOccurance oddOccurance =new OddOccurance();
int odd=oddOccurance.solution(A);
assertEquals(7,odd);
}
@Test
public void test2(){
OddOccurance oddOccurance =new OddOccurance();
int odd=oddOccurance.solution(A2);
assertEquals(3,odd);
}
@Test
public void test3(){
OddOccurance oddOccurance =new OddOccurance();
int odd=oddOccurance.solution(A3);
assertEquals(11,odd);
}
An array is given with an odd number of integers; except for one integer, all the others are repeated. The task is to find the non-repeating integer. Any better ideas (time- and space-optimized) for implementing this are welcome as well.
public int solution(int[] A) {
// write your code in Java SE 8
Map<Integer, List<Integer>> map = new HashMap<>();
int value = 0;
// iterate through the array; each array value becomes a key in the map,
// and the size of the list stored under it records how often it appears
for (int key : A) {
if (map.containsKey(key)) {
map.get(key).add(value);
} else {
List<Integer> valueList = new ArrayList<>();
valueList.add(value);
map.put(key, valueList);
}
}
Set<Map.Entry<Integer, List<Integer>>> entrySet = map.entrySet();
for (Map.Entry<Integer, List<Integer>> entry : entrySet) {
if (entry.getValue().size() == 1) {
return entry.getKey();
}
}
return 0;
}
Update
Looking at failed outputs
WRONG ANSWER, got 0 expected 42
WRONG ANSWER, got 0 expected 700
It seems it didn't even get to the second for loop and just returned 0.
It's a standard problem, if the actual statement is the following:
each number except one appears an even number of times; the remaining number appears once.
The solution is to take the XOR of all the numbers. Since every repeating number occurs an even number of times, it cancels itself out. This works because XOR is commutative and associative:
a xor b xor c = a xor c xor b = c xor b xor a = etc.
For example, in case of 1, 2, 3, 1, 2
1 xor 2 xor 3 xor 1 xor 2 =
(1 xor 1) xor (2 xor 2) xor 3 =
0 xor 0 xor 3 =
3
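A minimal sketch of that approach in Java (it assumes the input really does contain exactly one value occurring an odd number of times):
public int solution(int[] a) {
    int result = 0;
    for (int x : a) {
        result ^= x; // values that appear an even number of times cancel out
    }
    return result; // the value left over is the one with an odd count
}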
One approach would be to create a new array containing the frequency of each value. You could start by looping through your initial array to calculate the maximum value in it.
For example, the array {9,3,9,3,9,7,7,2,2,11,9} has a maximum value of 11. With this information, create a new array that can store the frequency of every possible value in your initial array. Then, assuming there is only one integer that appears exactly once, return the index of the new array that has a frequency of 1. This method runs in O(n + m), where n is the size of the input array and m is its maximum value.
Here's an implementation:
public int solution(int[] inp)
{
int max = inp[0];
for(int i = 1; i < inp.length; i++)
{
if(inp[i] > max)
max = inp[i];
}
int[] histogram = new int[max + 1]; //We add 1 so we have an index for our max value
for(int i = 0; i < inp.length; i++)
histogram[inp[i]]++; //Update the frequency
for(int i = 0; i < histogram.length; i++)
{
if(histogram[i] == 1)
return i;
}
return -1; //Hopefully this doesn't happen
}
Hope this helps
It's hard to know why yours failed without the actual error message. Regardless, as your array input gets very large, your internal data structure grows accordingly, but it doesn't need to. Instead of a list of Integers as the value, we can just use a single Integer:
public int solution(int[] a) {
Integer ONE = 1;
Map<Integer, Integer> map = new HashMap<>();
for (int key : a) {
Integer value = (map.containsKey(key)) ? map.get(key) + ONE : ONE;
map.put(key, value);
}
for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
if (entry.getValue().equals(ONE)) {
return entry.getKey();
}
}
return -1;
}
I'm assuming the odd array length requirement is to avoid an array of length two, where the items would either both be unduplicated or both duplicated.
Since we don't need the actual total, we can simplify this further and just track parity. Here's a rework that does so, using the evolving rules of this question, looking for the odd man out:
public int solution(int[] a) {
Map<Integer, Boolean> odd = new HashMap<>();
for (int key : a) {
odd.put(key, (odd.containsKey(key)) ? ! odd.get(key) : Boolean.TRUE);
}
for (Map.Entry<Integer, Boolean> entry : odd.entrySet()) {
if (entry.getValue()) {
return entry.getKey();
}
}
return 0;
}
Returns zero on failure, since we now know:
each element of A is an integer within the range [1..1,000,000,000]

Resizing a HashMap with quadratic probing (backing array implementation)

After I check to see if the load factor signals the backing array to be resized, how do I actually do the resizing with quadratic probing?
Here is the code.
It's only part of the class. Also, could you check if I'm implementing the add method correctly?
import java.util.*;
public class HashMap<K, V> implements HashMapInterface<K, V> {
// Do not make any new instance variables.
private MapEntry<K, V>[] table;
private int size;
/**
* Create a hash map with no entries.
*/
public HashMap() {
table = new MapEntry[STARTING_SIZE];
size = 0;
}
@Override
public V add(K key, V value) {
if (key == null || value == null) {
throw new IllegalArgumentException("Passed in null arguments.");
}
if (getNextLoadFactor() > MAX_LOAD_FACTOR) {
resize();
}
MapEntry<K, V> entry = new MapEntry<>(key, value);
V val = null;
int index = Math.abs(key.hashCode()) % table.length;
int temp = index;
int q = 1;
do {
if (table[index] == null) {
table[index] = entry;
} else if (table[index].getKey().equals(key)) {
val = table[index].getValue();
table[index].setValue(value);
}
index = index + q*q % table.length;
q++;
} while (temp != index);
size++;
return val;
}
private double getNextLoadFactor() {
return (double) size / (double) table.length;
}
private void resize() {
MapEntry<K, V>[] temp = table;
table = new MapEntry[table.length * 2 + 1];
for (int i = 0; i < table.length; i++) {
}
}
Following this outline from the wiki:
1. Get the key k
2. Set counter j = 0
3. Compute hash function h[k] = k % SIZE
4. If hashtable[h[k]] is empty
(4.1) Insert key k at hashtable[h[k]]
(4.2) Stop
Else
(4.3) The key space at hashtable[h[k]] is occupied, so we need to find the next available key space
(4.4) Increment j
(4.5) Compute new hash function h[k] = ( k + j * j ) % SIZE
(4.6) Repeat Step 4 till j is equal to the SIZE of hash table
5. The hash table is full
6. Stop
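As a small worked example of that probe sequence, assuming SIZE = 7 and k = 10:
j = 0: h = 10 % 7        = 3
j = 1: h = (10 + 1) % 7  = 4
j = 2: h = (10 + 4) % 7  = 0
j = 3: h = (10 + 9) % 7  = 5
j = 4: h = (10 + 16) % 7 = 5   (positions start repeating after roughly SIZE/2 probes)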
According to the above, it seems to me that there is a problem in your add method. Notice steps (4.1) and (4.2): if table[index] == null, a position for the key has been found and you can stop. Your do-while will execute again, because right after the insert you update the index, so temp != index will be true.
You are also calculating the next index incorrectly, change
index = index + q*q % table.length;
to
index = (Math.abs(key.hashCode()) + q*q) % table.length;
The add will thus change to:
MapEntry<K, V> entry = new MapEntry<>(key, value);
V val = null;
int index = Math.abs(key.hashCode()) % table.length;
int q = 0;
while (table[(index = (Math.abs(key.hashCode()) + q*q++) % table.length)] != null);
table[index] = entry;
size++;
return val;
It can be proven that, for a prime table size b with b > 3, the first b/2 probe positions are distinct, so it is safe to assume that if the table is less than half full (at most b/2 - 1 entries) you will find an empty position. Whether that holds depends on your MAX_LOAD_FACTOR.
For resizing, you will need to rehash each value into the new table. This is because your hash function uses the size of the table as the modulus; the hash function has effectively changed, so you need to create the new, larger array and re-add every element to it.
private void resize() {
MapEntry<K, V>[] temp = table;
table = new MapEntry[table.length * 2 + 1];
size = 0; // add() below increments size again for every re-added entry
for (MapEntry<K, V> entry : temp) {
if (entry != null) { // skip the empty slots of the old table
this.add(entry.getKey(), entry.getValue());
}
}
}
Note: I did not test this and only used the theory behind quadratic probing and hash tables to debug your code. Hope it helps!

extracting data from the list on a certain condition

I have a map as shown below:
Key  Value
23   20
32   20    (20 + 20 = 40, so min = 23 and max = 32)
43   18
45   24    (18 + 24 = 42; since 42 > 40, min and max here are both 43)
47   10
56    6    (24 + 10 + 6 = 40, so here min = 45 and max = 56)
43   50    (how do we handle the case where the value is greater than the key, 50 > 43?)
So I have implemented logic that handles:
1) the case where the running total of the map values reaches 40
2) the case where the running total becomes greater than 40
3) **not implemented yet: the scenario where, at the very first instance, the value of the map entry is already greater; say, as shown above, the key is 43 and the value is 50**
Now please advise how to handle the third scenario. What I have implemented is:
create a Pair class that will hold the key and the value.
class Pair {
public int key;
public int value;
public Pair(int key, int value){
this.key = key;
this.value = value;
}
}
Then create a list of Pair and iterate through it. If the sum is 0, initialize the min and the max. Then, for each pair iterated, add its value to the sum. If the sum is below the limit, continue the loop and update the max key; otherwise there are two possible cases:
The sum is equal to the limit, so update the max key
The sum is above the limit, so decrement the index and don't update the max key
public static void main(String[] arg) {
final int LIMIT = 40;
Map<Integer, Integer> m = new LinkedHashMap<>();
//fill your map here
List<Pair> pairList = new ArrayList<>();
for(Map.Entry<Integer, Integer> entry : m.entrySet()){
pairList.add(new Pair(entry.getKey(), entry.getValue()));
}
//Now you have a list of Pair
int sum = 0;
int min = -1;
int max = -1;
for(int i = 0; i < pairList.size(); i++){
Pair p = pairList.get(i);
if(sum == 0){
min = p.key;
max = p.key;
}
sum += p.value;
if(sum < LIMIT){
max = p.key;
} else {
if(sum > LIMIT){
i--;
} else {
max = p.key;
}
System.out.println(min+"_"+max);
sum = 0;
}
}
}
Which prints:
23_32
43_43
45_56
Can you please advise how to handle the third scenario, where at the very first instance the value is already greater; say, as shown above, the key is 43 and the value is 50?
You still did not really specify clearly what you want to achieve. You did not say whether the keys of your input map are in ascending order (that is, whether it is a TreeMap). In the original question (linked in the first comment), your input was two lists. The example that you posted does not make sense as a map at all, because it contains the key 43 twice - so it can't be a map. Your description sounds like there is some constraint between the keys and the values (the value 50 being greater than the key 43), but maybe this is just an artifact of your explanation.
Your solution approach, and how you updated some variables in your implementation, may be an attempt to give precise information. But a verbal or semi-formal description of your actual goal would be more helpful here. Maybe you just cannot explain what you want to achieve; in the worst case, you don't know it yourself.
At the moment, my interpretation is roughly the following: Your input consists of two lists (!). And you are trying to find ranges of the first list, so that the sum of the corresponding values in the second list is at least 40.
IF this is the case, you can just compute the indices for the list of values where the summation should start and where the summation should end. When you have these indices, you can obtain the corresponding "keys" from the first list.
import java.util.ArrayList;
import java.util.List;
public class SumSplit
{
public static void main(String[] args)
{
List<Integer> keys = new ArrayList<Integer>();
keys.add(23);
keys.add(32);
keys.add(43);
keys.add(45);
keys.add(47);
keys.add(56);
keys.add(43);
List<Integer> values = new ArrayList<Integer>();
values.add(20);
values.add(20);
values.add(18);
values.add(24);
values.add(10);
values.add( 6);
values.add(50);
final int SPLIT_VALUE = 40;
List<Integer> minIndices = new ArrayList<Integer>();
List<Integer> maxIndices = new ArrayList<Integer>();
int sum = 0;
int minIndex = -1;
for (int i=0; i<keys.size(); i++)
{
Integer value = values.get(i);
sum += value;
if (minIndex == -1)
{
minIndex = i;
}
if (sum >= SPLIT_VALUE)
{
minIndices.add(minIndex);
maxIndices.add(i);
minIndex = -1;
sum = 0;
}
}
for (int i=0; i<minIndices.size(); i++)
{
Integer min = minIndices.get(i);
Integer max = maxIndices.get(i);
System.out.println("min: "+keys.get(min)+", max "+keys.get(max));
printInfo(min, max, keys, values);
}
}
private static void printInfo(int min, int max, List<Integer> keys, List<Integer> values)
{
int sum = 0;
for (int i=min; i<=max; i++)
{
Integer key = keys.get(i);
Integer value = values.get(i);
sum += value;
System.out.println(" "+key+" : "+value+" (sum until now: "+sum+")");
}
System.out.println("Sum: "+sum);
}
}
If this is not what you are going to achieve, please describe clearly what your actual goal is.

Java On-Memory Efficient Key-Value Store

I have to store 111 million key-value pairs (one key can have multiple values, at most 2 or 3) where the keys are 50-bit integers and the values are 32-bit (at most) integers. Now, my requirements are:
Fast insertion of (key, value) pairs [allowing duplicates]
Fast retrieval of value(s) based on a key.
A nice solution is given here, based on MultiMap. However, I want to store more key-value pairs in main memory with no or only a small performance penalty. From web articles I learned that B+ trees, R+ trees, B trees, compact multimaps, etc. can be a nice solution for that. Can anybody help me:
Is there any Java library which satisfies all those needs properly (the data structures mentioned above, or others, are also acceptable)?
Essentially, I want an efficient Java library data structure to store/retrieve key-value(s) pairs with a small memory footprint, and it must be built in memory.
NB: I have tried HashMultimap (Guava with some modification using Trove) as mentioned by Louis Wasserman, Kyoto/Tokyo Cabinet, etc. My experience is not good with disk-backed solutions, so please avoid those :). Another important point when choosing a library/data structure: the keys are 50-bit (so if we assign 64 bits, 14 bits are wasted) and the values are 32-bit ints at most - mostly they need only 10-14 bits - so we can save space there as well.
I don't think there's anything in the JDK which will do this.
However, implementing such a thing is a simple matter of programming. Here is an open-addressed hashtable with linear probing, with keys and values stored in parallel arrays:
public class LongIntParallelHashMultimap {
private static final long NULL = 0L;
private final long[] keys;
private final int[] values;
private int size;
public LongIntParallelHashMultimap(int capacity) {
keys = new long[capacity];
values = new int[capacity];
}
public void put(long key, int value) {
if (key == NULL) throw new IllegalArgumentException("key cannot be " + NULL);
if (size == keys.length) throw new IllegalStateException("map is full");
int index = indexFor(key);
while (keys[index] != NULL) {
index = successor(index);
}
keys[index] = key;
values[index] = value;
++size;
}
public int[] get(long key) {
if (key == NULL) throw new IllegalArgumentException("key cannot be " + NULL);
int index = indexFor(key);
int count = countHits(key, index);
int[] hits = new int[count];
int hitIndex = 0;
while (keys[index] != NULL) {
if (keys[index] == key) {
hits[hitIndex] = values[index];
++hitIndex;
}
index = successor(index);
}
return hits;
}
private int countHits(long key, int index) {
int numHits = 0;
while (keys[index] != NULL) {
if (keys[index] == key) ++numHits;
index = successor(index);
}
return numHits;
}
private int indexFor(long key) {
// the hashing constant is (the golden ratio * Long.MAX_VALUE) + 1
// see The Art of Computer Programming, section 6.4
// the constant has two important properties:
// (1) it is coprime with 2^64, so multiplication by it is a bijective function, and does not generate collisions in the hash
// (2) it has a 1 in the bottom bit, so it does not add zeroes in the bottom bits of the hash, and does not generate (gratuitous) collisions in the index
long hash = key * 5700357409661598721L;
return Math.abs((int) (hash % keys.length));
}
private int successor(int index) {
return (index + 1) % keys.length;
}
public int size() {
return size;
}
}
Note that this is a fixed-size structure. You will need to create it big enough to hold all your data - 110 million entries for me takes up 1.32 GB. The bigger you make it, in excess of what you need to store the data, the faster insertions and lookups will be. I found that for 110 million entries, with a load factor of 0.5 (2.64 GB, twice as much space as needed), it took on average 403 nanoseconds to look up a key, but with a load factor of 0.75 (1.76 GB, a third more space than is needed), it took 575 nanoseconds. Decreasing the load factor below 0.5 usually doesn't make much difference, and indeed, with a load factor of 0.33 (4.00 GB, three times more space than needed), I get an average time of 394 nanoseconds. So, even though you have 5 GB available, don't use it all.
Note also that zero is not allowed as a key. If this is a problem, change the null value to be something else, and pre-fill the keys array with that on creation.
Is there any Java library which satisfies all those needs properly?
AFAIK no. Or at least, not one that minimizes the memory footprint.
However, it should be easy to write a custom map class that is specialized to these requirements.
It's a good idea to look for databases, because problems like these are what they are designed for. In recent years Key-Value databases became very popular, e.g. for web services (keyword "NoSQL"), so you should find something.
The choice of a custom data structure also depends on whether you want to use a hard drive to store your data (and how safe that has to be) or whether it is completely lost on program exit.
If implementing it manually and the whole DB fits into memory fairly easily, I'd just implement a hashmap in C. Create a hash function that gives a (well-spread) memory address from a value. Insert there, or next to it if that slot is already assigned. Assignment and retrieval are then O(1). If you implement it in Java, you'll have the 4-byte overhead for each (primitive) object.
Based on @Tom Anderson's solution I removed the need to allocate objects, and added a performance test.
import java.util.Arrays;
import java.util.Random;
public class LongIntParallelHashMultimap {
private static final long NULL = Long.MIN_VALUE;
private final long[] keys;
private final int[] values;
private int size;
public LongIntParallelHashMultimap(int capacity) {
keys = new long[capacity];
values = new int[capacity];
Arrays.fill(keys, NULL);
}
public void put(long key, int value) {
if (key == NULL) throw new IllegalArgumentException("key cannot be " + NULL);
if (size == keys.length) throw new IllegalStateException("map is full");
int index = indexFor(key);
while (keys[index] != NULL) {
index = successor(index);
}
keys[index] = key;
values[index] = value;
++size;
}
public int get(long key, int[] hits) {
if (key == NULL) throw new IllegalArgumentException("key cannot be " + NULL);
int index = indexFor(key);
int hitIndex = 0;
while (keys[index] != NULL) {
if (keys[index] == key) {
hits[hitIndex] = values[index];
++hitIndex;
if (hitIndex == hits.length)
break;
}
index = successor(index);
}
return hitIndex;
}
private int indexFor(long key) {
return Math.abs((int) (key % keys.length));
}
private int successor(int index) {
index++;
return index >= keys.length ? index - keys.length : index;
}
public int size() {
return size;
}
public static class PerfTest {
public static void main(String... args) {
int values = 110* 1000 * 1000;
long start0 = System.nanoTime();
long[] keysValues = generateKeys(values);
LongIntParallelHashMultimap map = new LongIntParallelHashMultimap(222222227);
long start = System.nanoTime();
addKeyValues(values, keysValues, map);
long mid = System.nanoTime();
int sum = lookUpKeyValues(values, keysValues, map);
long time = System.nanoTime();
System.out.printf("Generated %.1f M keys/s, Added %.1f M/s and looked up %.1f M/s%n",
values * 1e3 / (start - start0), values * 1e3 / (mid - start), values * 1e3 / (time - mid));
System.out.println("Expected " + values + " got " + sum);
}
private static long[] generateKeys(int values) {
Random rand = new Random();
long[] keysValues = new long[values];
for (int i = 0; i < values; i++)
keysValues[i] = rand.nextLong();
return keysValues;
}
private static void addKeyValues(int values, long[] keysValues, LongIntParallelHashMultimap map) {
for (int i = 0; i < values; i++) {
map.put(keysValues[i], i);
}
assert map.size() == values;
}
private static int lookUpKeyValues(int values, long[] keysValues, LongIntParallelHashMultimap map) {
int[] found = new int[8];
int sum = 0;
for (int i = 0; i < values; i++) {
sum += map.get(keysValues[i], found);
}
return sum;
}
}
}
prints
Generated 34.8 M keys/s, Added 11.1 M/s and looked up 7.6 M/s
Run on a 3.8 GHz i7 with Java 7 update 3.
This is much slower than the previous test because you are accessing main memory at random, rather than the cache. This is really a test of the speed of your memory. The writes are faster because they can be performed asynchronously to main memory.
Using this collection
final SetMultimap<Long, Integer> map = Multimaps.newSetMultimap(
TDecorators.wrap(new TLongObjectHashMap<Collection<Integer>>()),
new Supplier<Set<Integer>>() {
public Set<Integer> get() {
return TDecorators.wrap(new TIntHashSet());
}
});
Running the same test with 50 million entries (which used about 16 GB) and -mx20g, I got the following result.
Generated 47.2 M keys/s, Added 0.5 M/s and looked up 0.7 M/s
For 110 M entries you will need about 35 GB of memory and a machine 10 x faster than mine (3.8 GHz) to perform 5 million adds per second.
If you must use Java, then implement your own hashtable/hashmap. An important property of your table is to use a linked list to handle collisions; then, when you do a lookup, you can return all the elements on the list.
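A hedged sketch of that idea (the class and method names are purely illustrative): each bucket is a singly linked chain, and get() walks the chain collecting every value stored under the key.
import java.util.ArrayList;
import java.util.List;

class ChainedLongIntMultimap {
    private static final class Node {
        final long key;
        final int value;
        final Node next;
        Node(long key, int value, Node next) { this.key = key; this.value = value; this.next = next; }
    }
    private final Node[] buckets;

    ChainedLongIntMultimap(int capacity) {
        buckets = new Node[capacity];
    }

    private int indexFor(long key) {
        int h = (int) (key ^ (key >>> 32));       // fold the 50-bit key into an int
        return (h & 0x7fffffff) % buckets.length; // strip the sign bit, then take the modulus
    }

    void put(long key, int value) {
        int i = indexFor(key);
        buckets[i] = new Node(key, value, buckets[i]); // prepend to the bucket's chain
    }

    List<Integer> get(long key) {
        List<Integer> hits = new ArrayList<>();
        for (Node n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (n.key == key) hits.add(n.value); // collect every value stored under this key
        }
        return hits;
    }
}
The trade-off versus the parallel-array versions above is one small node object per entry, in exchange for simpler collision handling and no fixed limit on values per key.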
I might be late in answering this question, but Elasticsearch will solve your problem.

how to Compute the average probe length for success and failure - Linear probe (Hash Tables) [closed]

I'm doing an assignment for my Data Structures class. We were asked to study linear probing with load factors of .1, .2, .3, ..., and .9. The formulas for testing are:
The average probe length using linear probing is roughly
Success --> (1 + 1/(1-L)) / 2
or
Failure --> (1 + 1/(1-L)^2) / 2
We are required to find the theoretical values using the formulas above, which I did (just plug the load factor into the formula); then we have to calculate the empirical values (which I am not quite sure how to do). Here is the rest of the requirements:
For each load factor, 10,000 randomly generated positive ints between 1 and 50000 (inclusive) will be inserted into a table of the "right" size, where "right" is strictly based upon the load factor you are testing. Repeats are allowed. Be sure that your formula for randomly generated ints is correct. There is a class called Random in java.util. USE it! After a table of the right (based upon L) size is loaded with 10,000 ints, do 100 searches of newly generated random ints from the range of 1 to 50000. Compute the average probe length for each of the two formulas and indicate the denominators used in each calculation.
So, for example, each test for a .5 load would have a table of size approximately 20,000 (adjusted to be prime), and similarly each test for a .9 load would have a table of approximate size 10,000/.9 (again adjusted to be prime).
The program should run displaying the various load factors tested, the average probe for each search (the two denominators used to compute the averages will add to 100), and the theoretical answers using the formula above.
how do I calculate the empirical success?
here is my code so far:
import java.util.Random;
/**
*
* @author Johnny
*/
class DataItem
{
private int iData;
public DataItem(int it)
{iData = it;}
public int getKey()
{
return iData;
}
}
class HashTable
{
private DataItem[] hashArray;
private int arraySize;
public HashTable(int size)
{
arraySize = size;
hashArray = new DataItem[arraySize];
}
public void displayTable()
{
int sp=0;
System.out.print("Table: ");
for(int j=0; j<arraySize; j++)
{
if(sp>50){System.out.println("");sp=0;}
if(hashArray[j] != null){
System.out.print(hashArray[j].getKey() + " ");sp++;}
else
{System.out.print("** "); sp++;}
}
System.out.println("");
}
public int hashFunc(int key)
{
return key %arraySize;
}
public void insert(DataItem item)
{
int key = item.getKey();
int hashVal = hashFunc(key);
while(hashArray[hashVal] != null &&
hashArray[hashVal].getKey() != -1)
{
++hashVal;
hashVal %= arraySize;
}
hashArray[hashVal]=item;
}
public int hashFunc1(int key)
{
return key % arraySize;
}
public int hashFunc2(int key)
{
// non-zero, less than array size, different from hF1
// array size must be relatively prime to 5, 4, 3, and 2
return 5 - key % 5;
}
public DataItem find(int key) // find item with key
// (assumes table not full)
{
int hashVal = hashFunc1(key); // hash the key
int stepSize = hashFunc2(key); // get step size
while(hashArray[hashVal] != null) // until empty cell,
{ // is correct hashVal?
if(hashArray[hashVal].getKey() == key)
return hashArray[hashVal]; // yes, return item
hashVal += stepSize; // add the step
hashVal %= arraySize; // for wraparound
}
return null; // can’t find item
}
}
public class n00645805 {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
double b=1;
double L;
double[] tf = new double[9];
double[] ts = new double[9];
double d=0.1;
DataItem aDataItem;
int aKey;
HashTable h1Table = new HashTable(100003); //L=.1
HashTable h2Table = new HashTable(50051); //L=.2
HashTable h3Table = new HashTable(33343); //L=.3
HashTable h4Table = new HashTable(25013); //L=.4
HashTable h5Table = new HashTable(20011); //L=.5
HashTable h6Table = new HashTable(16673); //L=.6
HashTable h7Table = new HashTable(14243); //L=.7
HashTable h8Table = new HashTable(12503); //L=.8
HashTable h9Table = new HashTable(11113); //L=.9
fillht(h1Table);
fillht(h2Table);
fillht(h3Table);
fillht(h4Table);
fillht(h5Table);
fillht(h6Table);
fillht(h7Table);
fillht(h8Table);
fillht(h9Table);
pm(h1Table);
pm(h2Table);
pm(h3Table);
pm(h4Table);
pm(h5Table);
pm(h6Table);
pm(h7Table);
pm(h8Table);
pm(h9Table);
for (int j=1;j<10;j++)
{
//System.out.println(j);
L=Math.round((b-d)*100.0)/100.0;
System.out.println(L);
System.out.println("ts "+(1+(1/(1-L)))/2);
System.out.println("tf "+(1+(1/((1-L)*(1-L))))/2);
tf[j-1]=(1+(1/(1-L)))/2;
ts[j-1]=(1+(1/((1-L)*(1-L))))/2;
d=d+.1;
}
display(ts,tf);
}
public static void fillht(HashTable a)
{
Random r = new Random();
for(int j=0; j<10000; j++)
{
int aKey;
DataItem y;
aKey =1+Math.round(r.nextInt(50000));
y = new DataItem(aKey);
a.insert(y);
}
}
public static void pm(HashTable a)
{
DataItem X;
int numsuc=0;
int numfail=0;
int aKey;
Random r = new Random();
for(int j=0; j<100;j++)
{
aKey =1+Math.round(r.nextInt(50000));
X = a.find(aKey);
if(X != null)
{
//System.out.println("Found " + aKey);
numsuc++;
}
else
{
//System.out.println("Could not find " + aKey);
numfail++;
}
}
System.out.println("# of succ is "+ numsuc+" # of failures is "+ numfail);
}
public static void display(double[] s, double[] f)
{
}
}
You should take into account that Java's Hashtable uses a closed addressing (no probing) implementation, so you have separate buckets in which many items can be placed. This is not what you are looking for in your benchmarks. I'm not sure about the HashMap implementation, but I think it uses closed addressing too.
So forget about the JDK classes. Since you want to calculate empirical values, you should write your own version of a hash table that uses open addressing with linear probing, and take care of counting the probe length whenever you try to get a value from it.
For example, you can write your hashmap and have it expose something like:
class YourHashMap
{
int empiricalGet(K key)
{
// search for the key but store the probe length of this get operation
return probeLength;
}
}
Then you can easily benchmark it by searching for however many keys you want and calculating the average probe length.
Otherwise, you can just give the hashmap the ability to store the total probe length and the number of gets requested, and read them back after the benchmark run to compute the average.
This kind of exercise is meant to show that the empirical value agrees with the theoretical one. So also take into account that you may need many benchmark runs, then average them all, making sure the variance is not too high.
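For the HashTable class posted in the question, a hedged sketch of such an instrumented lookup (findProbeLength is a hypothetical extra method; note that it probes linearly, the same way insert() does, whereas the posted find() steps by hashFunc2):
public int findProbeLength(int key)
{
    int hashVal = hashFunc(key);
    int probes = 1; // the first slot examined counts as one probe
    while (hashArray[hashVal] != null)
    {
        if (hashArray[hashVal].getKey() == key)
            return probes; // successful search
        ++hashVal;         // linear probing: step to the next slot
        hashVal %= arraySize; // wrap around
        ++probes;
    }
    return probes; // unsuccessful search: probes until the first empty slot
}
The benchmark loop would then keep two running totals (and two counters), one for searches that find the key and one for searches that do not, and divide each total by its own counter; those two counters are the denominators that add up to 100.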
