Reallocating method for HashSet - java

I need to create a reallocate method for a HashSet. I will call it from an add and a remove method, so that the table grows when the load factor is greater than 1 and shrinks when it is less than 0.5. Here loadFactor = number of elements / number of buckets. This is my work so far, but I start losing elements whenever I need to increase the size of my set.
public void reallocate() {
    double loadFactor = (double) currentSize / buckets.length;
    Node[] newBuckets;
    if (loadFactor > 1) {
        newBuckets = new Node[buckets.length * 2];
        for (Node bucket : buckets) {
            if (bucket != null) {
                int h = bucket.hashCode();
                h = Math.abs(h % newBuckets.length);
                newBuckets[h] = bucket;
            }
        }
        buckets = newBuckets;
    } else if (loadFactor < 0.5) {
        newBuckets = new Node[buckets.length / 2];
        for (Node bucket : buckets) {
            if (bucket != null) {
                int h = bucket.hashCode();
                h = Math.abs(h % newBuckets.length);
                newBuckets[h] = bucket;
            }
        }
        buckets = newBuckets;
    }
}
The original array is buckets, and I create newBuckets with the selected size. I use the loop to copy each element to newBuckets and then set buckets = newBuckets. I would prefer some advice instead of being handed the solution, since I want to learn how to do this.

Related

Resizing a HashMap with quadratic probing (backing array implementation)

After I check to see if the load factor signals the backing array to be resized, how do I actually do the resizing with quadratic probing?
Here is the code.
It's only part of the class. Also, could you check if I'm implementing the add method correctly?
import java.util.*;

public class HashMap<K, V> implements HashMapInterface<K, V> {
    // Do not make any new instance variables.
    private MapEntry<K, V>[] table;
    private int size;

    /**
     * Create a hash map with no entries.
     */
    public HashMap() {
        table = new MapEntry[STARTING_SIZE];
        size = 0;
    }

    @Override
    public V add(K key, V value) {
        if (key == null || value == null) {
            throw new IllegalArgumentException("Passed in null arguments.");
        }
        if (getNextLoadFactor() > MAX_LOAD_FACTOR) {
            resize();
        }
        MapEntry<K, V> entry = new MapEntry<>(key, value);
        V val = null;
        int index = Math.abs(key.hashCode()) % table.length;
        int temp = index;
        int q = 1;
        do {
            if (table[index] == null) {
                table[index] = entry;
            } else if (table[index].getKey().equals(key)) {
                val = table[index].getValue();
                table[index].setValue(value);
            }
            index = index + q*q % table.length;
            q++;
        } while (temp != index);
        size++;
        return val;
    }

    private double getNextLoadFactor() {
        return (double) size / (double) table.length;
    }

    private void resize() {
        MapEntry<K, V>[] temp = table;
        table = new MapEntry[table.length * 2 + 1];
        for (int i = 0; i < table.length; i++) {
        }
    }
I am following these steps from the wiki:
1. Get the key k
2. Set counter j = 0
3. Compute hash function h[k] = k % SIZE
4. If hashtable[h[k]] is empty
(4.1) Insert key k at hashtable[h[k]]
(4.2) Stop
Else
(4.3) The key space at hashtable[h[k]] is occupied, so we need to find the next available key space
(4.4) Increment j
(4.5) Compute new hash function h[k] = ( k + j * j ) % SIZE
(4.6) Repeat Step 4 till j is equal to the SIZE of hash table
5. The hash table is full
6. Stop
According to the above, it seems to me that there is a problem in your add method. Notice steps (4.1) and (4.2): if table[index] == null, a position for the key has been found and you can stop. Your do-while loop will execute again, because right after the insert you update the index, so temp != index will be true.
You are also calculating the next index incorrectly, change
index = index + q*q % table.length;
to
index = (Math.abs(key.hashCode()) + q*q) % table.length;
The add will thus change to:
MapEntry<K, V> entry = new MapEntry<>(key, value);
V val = null;
int index = Math.abs(key.hashCode()) % table.length;
int q = 0;
// note: q*q++ squares the current value of q and then increments it
while (table[(index = (Math.abs(key.hashCode()) + q*q++) % table.length)] != null);
table[index] = entry;
size++;
return val;
It can be proven that if the table size b is a prime greater than 3, the first b/2 probe positions will be unique, so it is safe to assume that if the table is less than half full (fewer than b/2 - 1 entries), you will find an empty position. This depends on your MAX_LOAD_FACTOR.
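As a quick sanity check of that claim, here is a standalone snippet (added for illustration; it is not part of the original answer):
import java.util.HashSet;
import java.util.Set;

public class QuadraticProbeCheck {
    public static void main(String[] args) {
        int b = 11; // a small prime table size
        Set<Integer> offsets = new HashSet<>();
        for (int q = 0; q < b / 2; q++) {
            offsets.add((q * q) % b); // the quadratic probe offsets
        }
        // for a prime b, all b/2 offsets are pairwise distinct
        System.out.println(offsets.size() == b / 2); // prints true
    }
}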
For resizing, you will need to rehash each value into the new table. This is because your hash function uses the size of the table as the modulus: the hash function has effectively changed, so you need to create the new array (here of length 2 * table.length + 1) and re-add every element to it.
private void resize() {
    MapEntry<K, V>[] temp = table;
    table = new MapEntry[table.length * 2 + 1];
    size = 0; // add() below will count the entries again
    for (MapEntry<K, V> entry : temp) {
        if (entry != null) { // skip empty slots in the old table
            this.add(entry.getKey(), entry.getValue());
        }
    }
}
Note: I did not test this and only used the theory behind quadratic probing and hash tables to debug your code. Hope it helps!

Select N random elements from a List efficiently (without toArray and change the list)

As in the title, I want to use the Knuth-Fisher-Yates shuffle algorithm to select N random elements from a List, but without using List.toArray and without changing the list. Here is my current code:
public List<E> getNElements(List<E> list, Integer n) {
    List<E> rtn = null;
    if (list != null && n != null && n > 0) {
        int lSize = list.size();
        if (lSize > n) {
            rtn = new ArrayList<E>(n);
            E[] es = (E[]) list.toArray();
            // Knuth-Fisher-Yates shuffle algorithm
            for (int i = es.length - 1; i > es.length - n - 1; i--) {
                int iRand = rand.nextInt(i + 1);
                E eRand = es[iRand];
                es[iRand] = es[i];
                // This is not necessary here as we do not really need the final shuffle result.
                //es[i] = eRand;
                rtn.add(eRand);
            }
        } else if (lSize == n) {
            rtn = new ArrayList<E>(n);
            rtn.addAll(list);
        } else {
            log("list.size < nSub! ", lSize, n);
        }
    }
    return rtn;
}
It uses list.toArray() to make a new array so the original list is not modified. However, my problem is that my list can be very big, up to 1 million elements, and list.toArray() is then too slow. My n can also range from 1 to 1 million. When n is small (say 2), the function is very inefficient, as it still needs to call list.toArray() on a list of 1 million elements.
Can someone help improve the above code to make it more efficient when dealing with large lists? Thanks.
Here I assume the Knuth-Fisher-Yates shuffle is the best algorithm for selecting n random elements from a list. Am I right? I would be very glad to hear about other algorithms that beat the Knuth-Fisher-Yates shuffle in terms of speed and the quality of the results (guaranteed real randomness).
Update:
Here is some of my test results:
When selecting n from 1,000,000 elements:
When n < 1000000/4, the fastest way is to use Daniel Lemire's bitmap function to select n random ids first and then get the elements with those ids:
public List<E> getNElementsBitSet(List<E> list, int n) {
    List<E> rtn = new ArrayList<E>(n);
    int[] ids = genNBitSet(n, 0, list.size());
    for (int i = 0; i < ids.length; i++) {
        rtn.add(list.get(ids[i]));
    }
    return rtn;
}
genNBitSet uses the generateUniformBitmap code from https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2013/08/14/java/UniformDistinct.java
When n>1000000/4 the Reservoir Sampling method is faster.
So I have built a function to combine these two methods.
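A sketch of what such a combined function might look like (the threshold and getNElementsReservoir are assumptions based on the measurements above, not the poster's actual code):
public List<E> getNElementsCombined(List<E> list, int n) {
    // hypothetical dispatcher: bitmap selection below the measured
    // threshold, reservoir sampling above it
    if (n < list.size() / 4) {
        return getNElementsBitSet(list, n); // faster for small n
    }
    return getNElementsReservoir(list, n); // hypothetical reservoir-based variant
}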
You are probably looking for something like Reservoir Sampling.
Start with an initial array holding the first k elements, and then replace elements with new ones with decreasing probability:
Java-like pseudocode:
E[] r = new E[k]; // not really, you cannot create an array of a generic type, but it's just pseudocode
int i = 0;
for (E e : list) {
    // assign the first k elements:
    if (i < k) { r[i++] = e; continue; }
    // replace a random slot with decreasing probability:
    i++;
    int j = random(i); // a number from 0 to i-1 inclusive
    if (j < k) r[j] = e;
}
return r;
This requires a single pass over the data, with very cheap operations in every iteration, and the space consumption is linear in the required output size.
If n is very small compared to the length of the list, take an empty set of ints and keep adding a random index until the set has the right size.
If n is comparable to the length of the list, do the same, but then return items in the list that don't have indexes in the set.
In the middle ground, you can iterate through the list, and randomly select items based on how many items you've seen, and how many items you've already returned. In pseudo-code, if you want k items from N:
for i = 0 to N-1
    if random(N-i) < k
        add item[i] to the result
        k -= 1
    end
end
Here random(x) returns a random number between 0 (inclusive) and x (exclusive).
This produces a uniformly random sample of k elements. You could also consider making an iterator to avoid building the results list to save memory, assuming the list is unchanged as you're iterating over it.
By profiling, you can determine the transition point where it makes sense to switch from the naive set-building method to the iteration method.
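For reference, a runnable Java rendering of the pseudocode above (the class and method names are illustrative):
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SelectionSampling {
    // returns k items chosen uniformly at random, in their original order
    static <T> List<T> sample(List<T> items, int k, Random rnd) {
        List<T> result = new ArrayList<T>(k);
        int remaining = items.size();
        for (T item : items) {
            // select this item with probability k / remaining
            if (rnd.nextInt(remaining) < k) {
                result.add(item);
                k--;
            }
            remaining--;
        }
        return result;
    }
}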
Let's assume that you can generate n random indices out of m that are pairwise distinct and then look them up efficiently in the collection. If you don't need the order of the elements to be random, then you can use an algorithm due to Robert Floyd.
Random r = new Random();
Set<Integer> s = new HashSet<Integer>();
for (int j = m - n; j < m; j++) {
    int t = r.nextInt(j + 1); // uniform on 0..j inclusive
    s.add(s.contains(t) ? j : t);
}
If you do need the order to be random, then you can run Fisher-Yates where, instead of using an array, you use a HashMap that stores only those mappings where the key and the value are distinct. Assuming that hashing is constant time, both of these algorithms are asymptotically optimal (though clearly, if you want to randomly sample most of the array, then there are data structures with better constants).
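For the order-sensitive case, a minimal sketch of that HashMap-backed Fisher-Yates (untested; the names are illustrative):
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class SparseFisherYates {
    // n distinct indices from 0..m-1, in random order, using O(n) space
    static List<Integer> sample(int m, int n, Random r) {
        Map<Integer, Integer> swapped = new HashMap<>(); // only touched slots are stored
        List<Integer> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            int j = i + r.nextInt(m - i);        // position in the virtual array
            int vj = swapped.getOrDefault(j, j); // current value at position j
            int vi = swapped.getOrDefault(i, i); // current value at position i
            result.add(vj);                      // take the value at j
            swapped.put(j, vi);                  // virtually swap positions i and j
        }
        return result;
    }
}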
Just for convenience: an MCVE with an implementation of the Reservoir Sampling proposed by amit (possible upvotes should go to him; I'm just hacking some code).
It seems like this is indeed an algorithm that nicely covers both the case where the number of elements to select is low compared to the list size and the case where it is high (assuming that the properties about the randomness of the result that are stated on the Wikipedia page are correct).
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Random;
import java.util.TreeMap;

public class ReservoirSampling
{
    public static void main(String[] args)
    {
        example();
        //test();
    }

    private static void test()
    {
        List<String> list = new ArrayList<String>();
        list.add("A");
        list.add("B");
        list.add("C");
        list.add("D");
        list.add("E");
        int size = 2;
        int runs = 100000;
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (int i=0; i<runs; i++)
        {
            List<String> sample = sample(list, size);
            String s = createString(sample);
            Integer count = counts.get(s);
            if (count == null)
            {
                count = 0;
            }
            counts.put(s, count+1);
        }
        for (Entry<String, Integer> entry : counts.entrySet())
        {
            System.out.println(entry.getKey()+" : "+entry.getValue());
        }
    }

    private static String createString(List<String> list)
    {
        Collections.sort(list);
        StringBuilder sb = new StringBuilder();
        for (String s : list)
        {
            sb.append(s);
        }
        return sb.toString();
    }

    private static void example()
    {
        List<String> list = new ArrayList<String>();
        for (int i=0; i<26; i++)
        {
            list.add(String.valueOf((char)('A'+i)));
        }
        for (int i=1; i<=26; i++)
        {
            printExample(list, i);
        }
    }

    private static <T> void printExample(List<T> list, int size)
    {
        System.out.printf("%3d elements: "+sample(list, size)+"\n", size);
    }

    private static final Random random = new Random(0);

    private static <T> List<T> sample(List<T> list, int size)
    {
        List<T> result = new ArrayList<T>(Collections.nCopies(size, (T) null));
        int i = 0;
        for (T element : list)
        {
            if (i < size)
            {
                result.set(i, element);
                i++;
                continue;
            }
            i++;
            int j = random.nextInt(i);
            if (j < size)
            {
                result.set(j, element);
            }
        }
        return result;
    }
}
If n is much smaller than size, you could use this algorithm, which is unfortunately quadratic in n but does not depend on the size of the array at all.
Example with size = 100 and n = 4:
Choose a random number from 0 to 99, let's say 42, and add it to the result.
Choose a random number from 0 to 98, let's say 39, and add it to the result.
Choose a random number from 0 to 97, let's say 41. Since 41 is greater than or equal to 39, increment it by 1, giving 42; but 42 is greater than or equal to the already-chosen 42, so increment again, giving 43.
...
In short, you choose from the remaining numbers and then compute which number you have actually chosen. I would use a linked list for this, but maybe there are better data structures.
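A possible Java rendering of that idea (a sketch; it keeps the chosen numbers in a sorted ArrayList rather than a linked list):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SkipSelect {
    // n distinct indices from 0..size-1; quadratic in n, independent of size
    static List<Integer> chooseIndices(int size, int n, Random rnd) {
        List<Integer> chosen = new ArrayList<>(); // kept sorted ascending
        for (int i = 0; i < n; i++) {
            int r = rnd.nextInt(size - i); // choose among the remaining numbers
            // skip past already-chosen values, smallest first
            for (int c : chosen) {
                if (r >= c) r++;
                else break;
            }
            int pos = Collections.binarySearch(chosen, r); // always negative: r is free
            chosen.add(-pos - 1, r); // insert, keeping the list sorted
        }
        return chosen;
    }
}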
Summarizing Changwang's update: if you want more than 250,000 items, use amit's answer; otherwise use Daniel Lemire's bitmap method, shown in its entirety here.
NOTE: The result is always in the original order as well.
public static <T> List<T> getNRandomElements(int n, List<T> list) {
    List<T> subList = new ArrayList<>(n);
    int[] ids = generateUniformBitmap(n, list.size());
    for (int id : ids) {
        subList.add(list.get(id));
    }
    return subList;
}

// https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2013/08/14/java/UniformDistinct.java
private static int[] generateUniformBitmap(int num, int max) {
    if (num > max) {
        DebugUtil.e("Can't generate n ints");
    }
    int[] ans = new int[num];
    if (num == max) {
        for (int k = 0; k < num; ++k) {
            ans[k] = k;
        }
        return ans;
    }
    BitSet bs = new BitSet(max);
    int cardinality = 0;
    Random random = new Random();
    while (cardinality < num) {
        int v = random.nextInt(max);
        if (!bs.get(v)) {
            bs.set(v);
            cardinality += 1;
        }
    }
    int pos = 0;
    for (int i = bs.nextSetBit(0); i >= 0; i = bs.nextSetBit(i + 1)) {
        ans[pos] = i;
        pos += 1;
    }
    return ans;
}
If you want them randomized, I use:
public static <T> List<T> getNRandomShuffledElements(int n, List<T> list) {
    List<T> randomElements = getNRandomElements(n, list);
    Collections.shuffle(randomElements);
    return randomElements;
}
I needed something for this in C#; here's my solution, which works on a generic List.
It selects N random elements of the list and places them at the front of the list.
So upon returning, the first N elements of the list are randomly selected. It is fast and efficient even when you're dealing with a very large number of elements.
static void SelectRandom<T>(List<T> list, int n)
{
    if (n >= list.Count)
    {
        // n should be less than list.Count
        return;
    }

    int max = list.Count;
    var random = new Random();
    for (int i = 0; i < n; i++)
    {
        int r = random.Next(max);
        max = max - 1;
        int irand = i + r;
        if (i != irand)
        {
            T rand = list[irand];
            list[irand] = list[i];
            list[i] = rand;
        }
    }
}

Why is my Vector size 0?

As you can see in the screenshot, new_mean's size is 0 even though I created it with an initial capacity of 2, so I'm getting an IndexOutOfBoundsException.
Does anyone know what I'm doing wrong?
Update: Here's the code
private static Vector<Double> get_new_mean(
        Tuple<Set<Vector<Double>>, Vector<Double>> cluster,
        Vector<Double> v, boolean is_being_added) {
    Vector<Double> previous_mean = cluster.y;
    int n = previous_mean.size(), set_size = cluster.x.size();
    Vector<Double> new_mean = new Vector<Double>(n);
    if (is_being_added) {
        for (int i = 0; i < n; ++i) {
            double temp = set_size * previous_mean.get(i);
            double updated_mean = (temp + v.get(i)) / (set_size + 1);
            new_mean.set(i, updated_mean);
        }
    } else {
        if (set_size > 1) {
            for (int i = 0; i < n; ++i) {
                double temp = set_size * previous_mean.get(i);
                double updated_mean = (temp - v.get(i)) / (set_size - 1);
                new_mean.set(i, updated_mean);
            }
        } else {
            new_mean = null;
        }
    }
    return new_mean;
}
Capacity is the total number of elements you could store.
Size is the number of elements you have actually stored.
In your code, there is nothing stored in the Vector, so you get an IndexOutOfBoundsException when you try to access element 0.
Use set(int, object) to change an EXISTING element. Use add(int, object) to add a NEW element.
This is explained in the javadoc for Vector: elementCount is 0 (the vector is empty), and capacityIncrement is 0 by default; it is only relevant if you're going to go over the capacity you specified (2).
You need to fill your Vector with null values to make its size equal to the capacity. Capacity is an optimization hint for the collection; it makes no difference to how you use the collection. The collection automatically grows as you add elements, so initializing with a higher capacity simply requires fewer expansions and fewer memory allocations.
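For illustration, a minimal sketch of both working options (assuming the java.util imports; here the pre-fill uses zeros rather than nulls):
// Option 1: build the Vector with add(), letting the size grow.
int n = 2; // as in the question
Vector<Double> grown = new Vector<Double>(n); // capacity 2, size 0
for (int i = 0; i < n; i++) {
    grown.add(0.0); // size increases with every add
}

// Option 2: pre-fill the Vector so set(int, E) has elements to replace.
Vector<Double> preFilled = new Vector<Double>(Collections.nCopies(n, 0.0));
preFilled.set(0, 1.5); // valid now, because index 0 exists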

Re-Sizing Hash Table

I'm attempting to resize my hash table; however, I keep getting a NullPointerException.
I know that if the load factor is greater than 0.75 the table size has to double, and if it's less than 0.50 the table size is halved. So far I have this:
public boolean add(Object x)
{
    int h = x.hashCode();
    if (h < 0) { h = -h; }
    h = h % buckets.length;
    Node current = buckets[h];
    while (current != null)
    {
        if (current.data.equals(x)) { return false; }
        // Already in the set
        current = current.next;
    }
    Node newNode = new Node();
    newNode.data = x;
    newNode.next = buckets[h];
    buckets[h] = newNode;
    currentSize++;
    double factor1 = currentSize * load1; //load1 = 0.75
    double factor2 = currentSize * load2; //load2 = 0.50
    if (currentSize > factor1) { resize(buckets.length*2); }
    if (currentSize < factor2) { resize(buckets.length/2); }
    return true;
}
Example: size = 3, max size = 5.
If we take the max size and multiply it by 0.75 we get 3.75.
This is the threshold: once the size passes it, the max size must double.
So if we add an extra element into the table, the size is 4, which is > 3.75, and thus the new max size is 10.
However, once we increase the table length, the bucket index computed from each element's hashcode changes, so we call resize(int newSize):
private void resize(int newLength)
{
    HashSet newTable = new HashSet(newLength);
    for (int i = 0; i < buckets.length; i++) {
        newTable.add(buckets[i]);
    }
}
Here is my constructor if the buckets[i] confuses anyone.
public HashSet(int bucketsLength)
{
    buckets = new Node[bucketsLength];
    currentSize = 0;
}
I feel that the logic is correct, unless my resize method is not retrieving the elements.
If that is all your code for resize(), then you are failing to assign newTable to a class attribute, i.e. your old table. Right now you fill it with data and then don't do anything with it, since it is defined inside resize and therefore not available outside of it.
So you end up thinking you have a larger table now, but in fact you are still using the old one ;-)
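For reference, here is one way resize() could look once it rehashes node by node and reassigns the field (a sketch based on the Node/buckets structure shown above; untested):
private void resize(int newLength)
{
    Node[] newBuckets = new Node[newLength];
    for (int i = 0; i < buckets.length; i++) {
        Node current = buckets[i];
        while (current != null) {         // walk the whole chain
            Node next = current.next;     // remember before relinking
            int h = current.data.hashCode();
            if (h < 0) { h = -h; }
            h = h % newLength;            // recompute for the new length
            current.next = newBuckets[h]; // push onto the new chain
            newBuckets[h] = current;
            current = next;
        }
    }
    buckets = newBuckets;                 // actually replace the old table
}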

Binary Tree Max sum level - Better Design?

I have written code for finding the level in a binary tree that has the maximum sum of elements. I have a few questions.
Is it a good design? I have used 2 queues, but the total number of elements both queues store will be less than n, so I think it should be OK.
Can there be a better design?
public class MaxSumLevel {

    public static int findLevel(BinaryTreeNode root) {
        Queue mainQ = new Queue();
        Queue tempQ = new Queue();
        int maxlevel = 0;
        int maxVal = 0;
        int tempSum = 0;
        int tempLevel = 0;
        if (root != null) {
            mainQ.enqueue(root);
            maxlevel = 1;
            tempLevel = 1;
            maxVal = root.getData();
        }
        while (!mainQ.isEmpty()) {
            BinaryTreeNode head = (BinaryTreeNode) mainQ.dequeue();
            BinaryTreeNode left = head.getLeft();
            BinaryTreeNode right = head.getRight();
            if (left != null) {
                tempQ.enqueue(left);
                tempSum = tempSum + left.getData();
            }
            if (right != null) {
                tempQ.enqueue(right);
                tempSum = tempSum + right.getData();
            }
            if (mainQ.isEmpty()) {
                mainQ = tempQ;
                tempQ = new Queue();
                tempLevel++;
                if (tempSum > maxVal) {
                    maxVal = tempSum;
                    maxlevel = tempLevel;
                    tempSum = 0;
                }
            }
        }
        return maxlevel;
    }
}
I like recursion (note, untested code):
public static int maxLevel(BinaryTreeNode tree) {
    ArrayList<Integer> levels = new ArrayList<Integer>();
    findLevels(tree, 0, levels);
    // now just return the index in levels with the maximal value,
    // bearing in mind that levels could be empty.
}

private static void findLevels(BinaryTreeNode tree, int level,
        ArrayList<Integer> levels) {
    if (tree == null) {
        return;
    }
    if (levels.size() <= level) {
        levels.add(0);
    }
    levels.set(level, levels.get(level) + tree.getData());
    findLevels(tree.getLeft(), level+1, levels);
    findLevels(tree.getRight(), level+1, levels);
}
If I was feeling really mean to the garbage collector, I'd make findLevels return a list of (level, value) pairs and sum over those. That makes a lot more sense in co-routiney sorts of languages, though; it'd be weird in Java.
Obviously you can take the strategy in the recursive function and do it with an explicit stack of nodes to be processed. The key difference between my way and yours is that mine takes memory proportional to the height of the tree; yours takes memory proportional to its width.
Looking at your code, it seems pretty reasonable for the approach. I'd rename tempLevel to currentLevel, and I'd be inclined to pull the inner loop out into a function sumLevel that takes a queue and returns an int and a queue (except actually the queue would be an argument, because you can only return one value, grrr). But it seems okay as is.
It depends on how many nodes your trees have and how deep they are. Since you're performing breadth-first search, your queues will take O(n) memory, which is OK for most applications.
The following solution has O(l) space complexity and O(n) time complexity (l is the depth of the tree and n the number of its vertices):
public List<Integer> levelsSum(BinaryTreeNode tree) {
    List<Integer> sums = new ArrayList<Integer>();
    levelsSum(tree, sums, 0);
    return sums;
}

protected void levelsSum(BinaryTreeNode tree, List<Integer> levelSums, int level) {
    if (tree == null)
        return;
    // add a new element into the list if needed
    if (levelSums.size() <= level)
        levelSums.add(Integer.valueOf(0));
    // add this node's value to the appropriate level
    levelSums.set(level, levelSums.get(level) + tree.getData());
    // process subtrees
    levelsSum(tree.getLeft(), levelSums, level + 1);
    levelsSum(tree.getRight(), levelSums, level + 1);
}
Now just call levelsSum on a tree and scan the returned list to find the maximum value.
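For example, the scan could look like this (an illustrative snippet, assuming a non-empty tree):
List<Integer> sums = levelsSum(root);
int maxLevel = 0;
for (int i = 1; i < sums.size(); i++) {
    if (sums.get(i) > sums.get(maxLevel)) {
        maxLevel = i; // remember the level with the largest sum
    }
}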
Are you sure the elements will all be non-negative?
I would make it callable like new MaxSumLevel(root).getLevel(). Otherwise, what will you do when you sometimes have to return maxSum?
I would structure this as 2 nested loops:
while (!mainQ.isEmpty()) {
    while (!mainQ.isEmpty()) {
        BinaryTreeNode head = (BinaryTreeNode) mainQ.dequeue();
        BinaryTreeNode left = head.getLeft();
        BinaryTreeNode right = head.getRight();
        if (left != null) {
            tempQ.enqueue(left);
            tempSum = tempSum + left.getData();
        }
        if (right != null) {
            tempQ.enqueue(right);
            tempSum = tempSum + right.getData();
        }
    }
    mainQ = tempQ;
    tempQ = new Queue();
    tempLevel++;
    if (tempSum > maxVal) {
        maxVal = tempSum;
        maxlevel = tempLevel;
    }
    tempSum = 0; // reset for the next level (even when no new max was found)
}
This recursive approach works for me:
public int findMaxSumRootLeaf(TreeNode node, int currSum) {
    if (node == null)
        return 0;
    return Math.max(findMaxSumRootLeaf(node.leftChild, currSum) + node.data,
                    findMaxSumRootLeaf(node.rightChild, currSum) + node.data);
}
You can represent end of a level using null in the queue and calculating the maximum sum for each level.
public int maxLevelSum(BinaryTreeNode root) {
    if (root == null) // if empty tree
        return 0;
    else {
        int current_sum = 0;
        int max_sum = 0;
        Queue<BinaryTreeNode> queue = new LinkedList<BinaryTreeNode>(); // initialize a queue
        queue.offer(root); // add root to the queue
        queue.offer(null); // null in the queue represents the end of a level
        while (!queue.isEmpty()) {
            BinaryTreeNode temp = queue.poll();
            if (temp != null) {
                if (temp.getLeft() != null) // if left is not null
                    queue.offer(temp.getLeft());
                if (temp.getRight() != null) // if right is not null
                    queue.offer(temp.getRight());
                current_sum = current_sum + temp.getData(); // add to the current level sum
            } else { // we reached the end of a level
                if (current_sum > max_sum) // check if current level sum is greater than max
                    max_sum = current_sum;
                current_sum = 0; // reset current_sum for the new level
                if (!queue.isEmpty())
                    queue.offer(null); // mark completion of a level
            }
        }
        return max_sum; // return the max sum
    }
}
