I am writing my own hashmap. I am able to define put/get methods. When I am trying to resize the array, I am getting null values:
public void resize() {
if (100 * size >= 75 * capacity) {
int oldcap = capacity;
capacity = capacity * 2;
Entry[] resizedBuckets = new Entry[capacity];
for (int i = 0; i < oldcap; i++) {
resizedBuckets[i] = this.buckets[i];
}
this.buckets = resizedBuckets;
}
}
My whole code is as below:
package lib;
class Entry<K, V> {
K key;
V value;
Entry<K, V> next;
Entry(K key, V value, Entry next) {
this.key = key;
this.value = value;
this.next = next;
}
}
public class MyMap<K, V> {
Entry[] buckets;
int size = 0;
static int capacity = 2;
MyMap(int capacity) {
this.buckets = new Entry[capacity];
}
public MyMap() {
this(capacity);
}
public void resize() {
if (100 * size >= 75 * capacity) {
int oldcap = capacity;
capacity = capacity * 2;
Entry[] resizedBuckets = new Entry[capacity];
for (int i = 0; i < oldcap; i++) {
resizedBuckets[i] = this.buckets[i];
}
this.buckets = resizedBuckets;
}
}
public void put(K key, V value) {
int bucket = key.hashCode() % capacity;
// I have the bucket index
Entry newEntry = new Entry<K, V>(key, value, null);
if (this.buckets[bucket] == null) {
this.buckets[bucket] = newEntry;
size++;
} else {
Entry currentNode = this.buckets[bucket];
Entry prevNode = null;
while (currentNode.next != null && currentNode.key != key) {
prevNode = currentNode;
currentNode = currentNode.next;
}
if (currentNode.key == key) {
if (prevNode == null) {
newEntry.next = currentNode.next;
this.buckets[bucket] = newEntry;
} else {
newEntry.next = currentNode.next;
prevNode.next = newEntry;
}
} else {
currentNode.next = newEntry;
size++;
}
}
resize();
}
public V get(K key) {
int bucket = key.hashCode() % capacity;
Entry currentBucket = this.buckets[bucket];
while (currentBucket != null) {
if (currentBucket.key == key)
return (V) currentBucket.value;
currentBucket = currentBucket.next;
}
return null;
}
}
class MyMain {
public static void main(String[] args) {
MyMap<String, Integer> myMap = new MyMap<>();
myMap.put("name", 1);
myMap.put("2name", 2);
myMap.put("3name", 2);
myMap.put("4name", 3);
myMap.put("5name", 2);
myMap.put("6name", 2);
myMap.put("3name", 3);
System.out.println(myMap);
System.out.println(myMap.get("name"));
System.out.println(myMap.get("2name"));
System.out.println(myMap.get("3name"));
}
}
I am getting the following output:
lib.MyMap@30f39991
null
null
3
I should be getting:
1
2
3
What's the reason? I think the issue is with the resize method, but I am unable to figure it out.
As mentioned in the comments, your bucket index changes when the capacity is increased during the resize operation. Thus int bucket = key.hashCode() % capacity; inside put and get returns a different bucket index than before.
I haven't tested it, but I think, if you do
resizedBuckets[this.buckets[i].key.hashCode() % capacity] = this.buckets[i];
inside the resize method, this might already work.
As @Voo pointed out, the above change only works as long as you don't have any hash collisions before resizing. For a complete solution you will need to recalculate the index for every key!
Have a look at your code:
public void put(K key, V value) {
int bucket = key.hashCode() % capacity;
...
resize(); // this will modify capacity;
}
public V get(K key) {
int bucket = key.hashCode() % capacity;
...
}
You resize() the map, but the entries stay in the buckets computed from the old capacity. This invalidates the keys' positions for your get() method: if you were to put the same key-value pair again with the modified map size, it would not land in the same bucket as the old pair.
You have to store the keys in your resized map with the new capacity instead of simply copying them.
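To make that concrete, here is an untested sketch of a resize that rehashes every entry, reusing the Entry and buckets fields from the question (and, like the rest of the question's code, assuming key.hashCode() is non-negative). Each node is unlinked from its old chain and pushed onto the chain of the bucket computed from the new capacity:

public void resize() {
    if (100 * size >= 75 * capacity) {
        int oldcap = capacity;
        capacity = capacity * 2;
        Entry[] resizedBuckets = new Entry[capacity];
        for (int i = 0; i < oldcap; i++) {
            Entry current = this.buckets[i];
            while (current != null) {
                Entry next = current.next;                      // remember the rest of the old chain
                int bucket = current.key.hashCode() % capacity; // index under the NEW capacity
                current.next = resizedBuckets[bucket];          // push onto the new chain
                resizedBuckets[bucket] = current;
                current = next;
            }
        }
        this.buckets = resizedBuckets;
    }
}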
In advance, I apologize for my lack of experience; these are advanced concepts that are difficult to wrap my head around. From what I understand, linear probing is circular: it won't stop until it finds an empty cell.
However, I am not sure how to implement it. An example of how to do so would be greatly appreciated. Sorry again for the inexperience; I'm not some vetted programmer, I'm picking this up very slowly.
public boolean ContainsElement(V element)
{
for(int i = 0; i < capacity; i++)
{
if(table[i] != null)
{
LinkedList<Entry<K, V>> bucketMethod = table[i];
for(Entry<K, V> entry : bucketMethod)
{
if(entry.getElement().equals(element))
{
return true;
}
}
}
}
return false;
}
Here's a working hash table based on the pseudocode examples found in the Wikipedia article for open addressing.
I think the main differences between the Wikipedia example and mine are:
Massaging the hashCode() a little, due to the way Java's modulo (%) behaves with negative numbers.
Implemented simple resizing logic.
Changed the logic in the remove method a little bit because Java doesn't have goto.
Otherwise, it's more or less just a direct translation.
package mcve;
import java.util.*;
import java.util.stream.*;
public class OAHashTable {
private Entry[] table = new Entry[16]; // Must be >= 4. See findSlot.
private int size = 0;
public int size() {
return size;
}
private int hash(Object key) {
int hashCode = Objects.hashCode(key)
& 0x7F_FF_FF_FF; // <- This is like abs, but it works
// for Integer.MIN_VALUE. We do this
// so that hash(key) % table.length
// is never negative.
return hashCode;
}
private int findSlot(Object key) {
int i = hash(key) % table.length;
// Search until we either find the key, or find an empty slot.
//
// Note: this becomes an infinite loop if the key is not already
// in the table AND every element in the array is occupied.
// With the resizing logic (below), this will only happen
// if the table is smaller than length=4.
while ((table[i] != null) && !Objects.equals(table[i].key, key)) {
i = (i + 1) % table.length;
}
return i;
}
public Object get(Object key) {
int i = findSlot(key);
if (table[i] != null) { // Key is in table.
return table[i].value;
} else { // Key is not in table
return null;
}
}
private boolean tableIsThreeQuartersFull() {
return ((double) size / (double) table.length) >= 0.75;
}
private void resizeTableToTwiceAsLarge() {
Entry[] old = table;
table = new Entry[2 * old.length];
size = 0;
for (Entry e : old) {
if (e != null) {
put(e.key, e.value);
}
}
}
public void put(Object key, Object value) {
int i = findSlot(key);
if (table[i] != null) { // We found our key.
table[i].value = value;
return;
}
if (tableIsThreeQuartersFull()) {
resizeTableToTwiceAsLarge();
i = findSlot(key);
}
table[i] = new Entry(key, value);
++size;
}
public void remove(Object key) {
int i = findSlot(key);
if (table[i] == null) {
return; // Key is not in the table.
}
int j = i;
table[i] = null;
--size;
while (true) {
j = (j + 1) % table.length;
if (table[j] == null) {
break;
}
int k = hash(table[j].key) % table.length;
// Determine if k lies cyclically in (i,j]
// | i.k.j |
// |....j i.k.| or |.k..j i...|
if ( (i<=j) ? ((i<k)&&(k<=j)) : ((i<k)||(k<=j)) ) {
continue;
}
table[i] = table[j];
i = j;
table[i] = null;
}
}
public Stream<Entry> entries() {
return Arrays.stream(table).filter(Objects::nonNull);
}
@Override
public String toString() {
return entries().map(e -> e.key + "=" + e.value)
.collect(Collectors.joining(", ", "{", "}"));
}
public static class Entry {
private Object key;
private Object value;
private Entry(Object key, Object value) {
this.key = key;
this.value = value;
}
public Object getKey() { return key; }
public Object getValue() { return value; }
}
public static void main(String[] args) {
OAHashTable t = new OAHashTable();
t.put("A", 1);
t.put("B", 2);
t.put("C", 3);
System.out.println("size = " + t.size());
System.out.println(t);
t.put("X", 4);
t.put("Y", 5);
t.put("Z", 6);
t.remove("C");
t.remove("B");
t.remove("A");
t.entries().map(e -> e.key)
.map(key -> key + ": " + t.get(key))
.forEach(System.out::println);
}
}
java.util.HashMap, the standard implementation of java.util.Map, resolves hash collisions internally by separate chaining (a linked list per bucket, converted to a tree in newer versions), not by linear probing.
Least Frequently Used (LFU) is a type of cache algorithm used to manage memory within a computer. The standard characteristics of this method involve the system keeping track of the number of times a block is referenced in memory. When the cache is full and requires more room the system will purge the item with the lowest reference frequency.
What would be the best way to implement a least-frequently-used (LFU) cache of objects, say in Java?
I've already implemented one using LinkedHashMap (by maintaining the number of times objects are accessed), but I'm curious if any of the new concurrent collections would be better candidates.
Consider this case: suppose the cache is full and we need to make space for another entry. Say two objects in the cache have each been accessed only once. Which one do we remove if we know that another object (not currently in the cache) is being accessed more than once?
Thanks!
You might benefit from the LFU implementation of ActiveMQ: LFUCache
They have provided some good functionality.
I think the LFU data structure must combine a priority queue (to maintain fast access to the LFU item) and a hash map (to provide fast access to any item by its key); I would suggest the following node definition for each object stored in the cache:
class Node<T> {
// access key
private int key;
// counter of accesses
private int numAccesses;
// current position in pq
private int currentPos;
// item itself
private T item;
//getters, setters, constructors go here
}
You need key for referring to an item.
You need numAccesses as a key for the priority queue.
You need currentPos to be able to quickly find the pq position of an item by key.
Now you organize a hash map (key(Integer) -> node(Node<T>)) to quickly access items, and a min-heap-based priority queue using the number of accesses as priority. Now you can quickly perform all operations (access, add a new item, update the number of accesses, remove the LFU item). You need to write each operation carefully, so that it keeps all nodes consistent (their number of accesses, their position in the pq, and their existence in the hash map). Hash-map accesses run in constant average time; the heap updates add an O(log n) step per operation, which is still fast enough for a cache.
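To make the bookkeeping concrete, here is a minimal, untested Java sketch of this layout. The names Node, numAccesses and currentPos follow the definition above; everything else (int keys, a fixed capacity, no concurrency) is my own simplifying assumption:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LfuCacheSketch<T> {
    private static final class Node<T> {
        final int key;
        int numAccesses = 1;
        int currentPos;      // index of this node inside the heap array
        T item;
        Node(int key, T item, int pos) { this.key = key; this.item = item; this.currentPos = pos; }
    }

    private final int capacity;                                  // assumed > 0
    private final Map<Integer, Node<T>> byKey = new HashMap<>(); // key -> node
    private final List<Node<T>> heap = new ArrayList<>();        // min-heap on numAccesses

    LfuCacheSketch(int capacity) { this.capacity = capacity; }

    T get(int key) {
        Node<T> n = byKey.get(key);
        if (n == null) return null;
        n.numAccesses++;
        siftDown(n.currentPos); // its priority grew, so it may have to move down
        return n.item;
    }

    void put(int key, T item) {
        Node<T> existing = byKey.get(key);
        if (existing != null) { existing.item = item; get(key); return; }
        if (byKey.size() == capacity) {       // evict the LFU item: the heap root
            Node<T> lfu = heap.get(0);
            swap(0, heap.size() - 1);
            heap.remove(heap.size() - 1);
            byKey.remove(lfu.key);
            if (!heap.isEmpty()) siftDown(0);
        }
        Node<T> n = new Node<>(key, item, heap.size());
        heap.add(n);
        byKey.put(key, n);
        siftUp(n.currentPos);
    }

    private void siftUp(int i) {
        while (i > 0 && heap.get((i - 1) / 2).numAccesses > heap.get(i).numAccesses) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        for (;;) {
            int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < heap.size() && heap.get(l).numAccesses < heap.get(smallest).numAccesses) smallest = l;
            if (r < heap.size() && heap.get(r).numAccesses < heap.get(smallest).numAccesses) smallest = r;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int a, int b) {
        Node<T> x = heap.get(a), y = heap.get(b);
        heap.set(a, y);
        heap.set(b, x);
        x.currentPos = b;  // keep the position index consistent with the heap
        y.currentPos = a;
    }
}

Note that get and put cost O(log n) because of the heap sifts; only the hash-map lookup itself is O(1) on average.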
In my opinion, the best way to implement such a cache would be to include a new variable 'latestTS' for each object. TS stands for timestamp.
// A static method that returns the current date and time as milliseconds since January 1st 1970
long latestTS = System.currentTimeMillis();
ConcurrentLinkedHashMap is not yet implemented in Concurrent Java Collections.
(Ref: Java Concurrent Collection API). However, you can try using ConcurrentHashMap together with a doubly linked list.
About the case to be considered: in such a case, as I have said, you can declare a latestTS variable; based upon the value of latestTS, you can remove an entry and add the new object. (Don't forget to update the frequency and latestTS of the newly added object.)
As you have mentioned, you can use LinkedHashMap, as it gives element access in O(1) and also lets you traverse the entries in order.
Please, find the below code for LFU Cache:
(PS: The below code is the answer for the question in the title i.e. "How to implement LFU cache")
import java.util.LinkedHashMap;
import java.util.Map;
public class LFUCache {
class CacheEntry
{
private String data;
private int frequency;
// default constructor
private CacheEntry()
{}
public String getData() {
return data;
}
public void setData(String data) {
this.data = data;
}
public int getFrequency() {
return frequency;
}
public void setFrequency(int frequency) {
this.frequency = frequency;
}
}
private int initialCapacity = 10;
private LinkedHashMap<Integer, CacheEntry> cacheMap = new LinkedHashMap<Integer, CacheEntry>();
/* LinkedHashMap is used because it has features of both HashMap and LinkedList.
* Thus, we can get an entry in O(1) and also, we can iterate over it easily.
*/
public LFUCache(int initialCapacity)
{
this.initialCapacity = initialCapacity;
}
public void addCacheEntry(int key, String data)
{
if(!isFull())
{
CacheEntry temp = new CacheEntry();
temp.setData(data);
temp.setFrequency(0);
cacheMap.put(key, temp);
}
else
{
int entryKeyToBeRemoved = getLFUKey();
cacheMap.remove(entryKeyToBeRemoved);
CacheEntry temp = new CacheEntry();
temp.setData(data);
temp.setFrequency(0);
cacheMap.put(key, temp);
}
}
public int getLFUKey()
{
int key = 0;
int minFreq = Integer.MAX_VALUE;
for(Map.Entry<Integer, CacheEntry> entry : cacheMap.entrySet())
{
if(minFreq > entry.getValue().frequency)
{
key = entry.getKey();
minFreq = entry.getValue().frequency;
}
}
return key;
}
public String getCacheEntry(int key)
{
if(cacheMap.containsKey(key)) // cache hit
{
CacheEntry temp = cacheMap.get(key);
temp.frequency++;
cacheMap.put(key, temp);
return temp.data;
}
return null; // cache miss
}
public boolean isFull()
{
return cacheMap.size() == initialCapacity;
}
}
Here's the O(1) implementation for LFU - http://dhruvbird.com/lfu.pdf
I have tried to implement the LFU cache below, taking reference from this LFU paper. My implementation is working nicely.
If anyone wants to provide any further suggestions to improve it, please let me know.
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
public class LFUCache {
private Map<Integer, Node> cache = new HashMap<>();
private Map<Integer, Integer> counts = new HashMap<>();
private TreeMap<Integer, DoublyLinkedList> frequencies = new TreeMap<>();
private final int CAPACITY;
public LFUCache(int capacity) {
this.CAPACITY = capacity;
}
public int get(int key) {
if (!cache.containsKey(key)) {
return -1;
}
Node node = cache.get(key);
int frequency = counts.get(key);
frequencies.get(frequency).remove(new Node(node.key(), node.value()));
removeFreq(frequency);
frequencies.computeIfAbsent(frequency + 1, k -> new DoublyLinkedList()).add(new Node(node.key(), node.value()));
counts.put(key, frequency + 1);
return cache.get(key).value();
}
public void set(int key, int value) {
if (!cache.containsKey(key)) {
Node node = new Node(key, value);
if (cache.size() == CAPACITY) {
int l_count = frequencies.firstKey();
Node deleteThisNode = frequencies.get(l_count).head();
frequencies.get(l_count).remove(deleteThisNode);
int deleteThisKey = deleteThisNode.key();
removeFreq(l_count);
cache.remove(deleteThisKey);
counts.remove(deleteThisKey);
}
cache.put(key, node);
counts.put(key, 1);
frequencies.computeIfAbsent(1, k -> new DoublyLinkedList()).add(node);
}
}
private void removeFreq(int frequency) {
if (frequencies.get(frequency).size() == 0) {
frequencies.remove(frequency);
}
}
public Map<Integer, Node> getCache() {
return cache;
}
public Map<Integer, Integer> getCounts() {
return counts;
}
public TreeMap<Integer, DoublyLinkedList> getFrequencies() {
return frequencies;
}
}
class Node {
private int key;
private int value;
private Node next;
private Node prev;
public Node(int key, int value) {
this.key = key;
this.value = value;
}
public Node getNext() {
return next;
}
public void setNext(Node next) {
this.next = next;
}
public Node getPrev() {
return prev;
}
public void setPrev(Node prev) {
this.prev = prev;
}
public int key() {
return key;
}
public int value() {
return value;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof Node)) return false;
Node node = (Node) o;
return key == node.key &&
value == node.value;
}
@Override
public int hashCode() {
return Objects.hash(key, value);
}
@Override
public String toString() {
return "Node{" +
"key=" + key +
", value=" + value +
'}';
}
}
class DoublyLinkedList {
private int size;
private Node head;
private Node tail;
public void add(Node node) {
if (null == head) {
head = node;
} else {
tail.setNext(node);
node.setPrev(tail);
}
tail = node;
size++;
}
public void remove(Node node) {
if(null == head || null == node) {
return;
}
if(this.size() == 1 && head.equals(node)) {
head = null;
tail = null;
} else if (head.equals(node)) {
head = node.getNext();
head.setPrev(null);
} else if (tail.equals(node)) {
Node prevToTail = tail.getPrev();
prevToTail.setNext(null);
tail = prevToTail;
} else {
Node current = head.getNext();
while(!current.equals(tail)) {
if(current.equals(node)) {
Node prevToCurrent = current.getPrev();
Node nextToCurrent = current.getNext();
prevToCurrent.setNext(nextToCurrent);
nextToCurrent.setPrev(prevToCurrent);
break;
}
current = current.getNext();
}
}
size--;
}
public Node head() {
return head;
}
public int size() {
return size;
}
}
Client code to use the above cache implementation -
import java.util.Map;
public class Client {
public static void main(String[] args) {
Client client = new Client();
LFUCache cache = new LFUCache(4);
cache.set(11, function(11));
cache.set(12, function(12));
cache.set(13, function(13));
cache.set(14, function(14));
cache.set(15, function(15));
client.print(cache.getFrequencies());
cache.get(13);
cache.get(13);
cache.get(13);
cache.get(14);
cache.get(14);
cache.get(14);
cache.get(14);
client.print(cache.getCache());
client.print(cache.getCounts());
client.print(cache.getFrequencies());
}
public void print(Map<Integer, ? extends Object> map) {
for(Map.Entry<Integer, ? extends Object> entry : map.entrySet()) {
if(entry.getValue() instanceof Node) {
System.out.println("Cache Key => "+entry.getKey()+", Cache Value => "+((Node) entry.getValue()).toString());
} else if (entry.getValue() instanceof DoublyLinkedList) {
System.out.println("Frequency Key => "+entry.getKey()+" Frequency Values => [");
Node head = ((DoublyLinkedList) entry.getValue()).head();
while(null != head) {
System.out.println(head.toString());
head = head.getNext();
}
System.out.println(" ]");
} else {
System.out.println("Count Key => "+entry.getKey()+", Count Value => "+entry.getValue());
}
}
}
public static int function(int key) {
int prime = 31;
return key*prime;
}
}
How about a priority queue? You can keep the elements sorted there, with keys representing the frequency. Just update an object's position in the queue after visiting it. You can update only from time to time to optimize performance (at the cost of reduced precision), as sketched below.
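For illustration, a rough, untested sketch of this idea with java.util.PriorityQueue (the names are mine). PriorityQueue.remove(Object) scans the queue in O(n), so this trades speed for simplicity:

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class PqLfuSketch<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        int freq = 1;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final int capacity; // assumed > 0
    private final Map<K, Entry<K, V>> map = new HashMap<>();
    private final PriorityQueue<Entry<K, V>> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a.freq, b.freq));

    PqLfuSketch(int capacity) { this.capacity = capacity; }

    V get(K key) {
        Entry<K, V> e = map.get(key);
        if (e == null) return null;
        pq.remove(e); // O(n), acceptable only for small caches
        e.freq++;
        pq.add(e);    // re-enqueue with the updated frequency
        return e.value;
    }

    void put(K key, V value) {
        Entry<K, V> e = map.get(key);
        if (e != null) { e.value = value; get(key); return; }
        if (map.size() == capacity) {
            Entry<K, V> lfu = pq.poll(); // least frequently used entry
            map.remove(lfu.key);
        }
        Entry<K, V> fresh = new Entry<>(key, value);
        map.put(key, fresh);
        pq.add(fresh);
    }
}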
Many implementations I have seen have runtime complexity O(log(n)). This means that when the cache size is n, the time needed to insert/remove an element into/from the cache is logarithmic. Such implementations usually use a min heap to maintain the usage frequencies of the elements. The root of the heap contains the element with the lowest frequency, and it can be accessed in O(1) time. But to maintain the heap property we have to move an element inside the heap every time it is used (and its frequency is incremented) to place it into its proper position, or when we have to insert a new element into the cache (and so put it into the heap).
But the runtime complexity can be reduced to O(1) when we maintain a hashmap (Java) or unordered_map (C++) with the element as key. Additionally we need two sorts of lists, a frequency list and element lists. The element lists contain elements that have the same frequency, and the frequency list contains the element lists.
frequency list
1 -> (a, c, m)
3 -> (k, l, n)
6 -> (y)
7 -> (x, z)
In this example we see a frequency list with four element lists: the list for frequency 1 contains the elements (a, c, m), the list for frequency 3 contains (k, l, n), and so on.
Now, say we use element y: we have to increment its frequency and put it into the next list. Because the element list with frequency 6 becomes empty, we delete it. The result is:
frequency list
1 -> (a, c, m)
3 -> (k, l, n)
7 -> (y, x, z)
We place element y at the beginning of the element list for frequency 7. When we later have to remove elements from that list, we will start from the end (first z, then x and then y).
Now, when we use element n, we have to increment its frequency and put it into a new list with frequency 4:
frequency list
1 -> (a, c, m)
3 -> (k, l)
4 -> (n)
7 -> (y, x, z)
I hope the idea is clear. I now provide my C++ implementation of the LFU cache, and will add a Java implementation later.
The class has just 2 public methods, void set(key k, value v)
and bool get(key k, value &v). In the get method the value to retrieve is set via the reference when the element is found, and the method returns true. When the element is not found, the method returns false.
#include <unordered_map>
#include <list>
#include <string>   // for the usage example below
#include <cassert>  // for the asserts in the usage example
using namespace std;
typedef unsigned uint;
template<typename K, typename V = K>
struct Entry
{
K key;
V value;
};
template<typename K, typename V = K>
class LFUCache
{
typedef list<Entry<K, V>> ElementList;
typedef list<pair<uint, ElementList>> FrequencyList;
private:
unordered_map <K, pair<typename FrequencyList::iterator, typename ElementList::iterator>> cacheMap;
FrequencyList elements;
uint maxSize;
uint curSize;
void incrementFrequency(pair<typename FrequencyList::iterator, typename ElementList::iterator> p) {
if (p.first == prev(elements.end())) {
// p.first is the last (highest-frequency) list: create a new list with incremented frequency (p.first->first + 1)
elements.push_back({ p.first->first + 1, { {p.second->key, p.second->value} } });
// erase and insert the key with new iterator pair
cacheMap[p.second->key] = { prev(elements.end()), prev(elements.end())->second.begin() };
}
else {
// there exist element(s) with higher frequency
auto pos = next(p.first);
if (p.first->first + 1 == pos->first)
// same frequency in the next list, add the element in the begin
pos->second.push_front({ p.second->key, p.second->value });
else
// insert new list before next list
pos = elements.insert(pos, { p.first->first + 1 , {{p.second->key, p.second->value}} });
// update cacheMap iterators
cacheMap[p.second->key] = { pos, pos->second.begin() };
}
// if the element list with the old frequency contained only this single element, erase the list from the frequency list
if (p.first->second.size() == 1)
elements.erase(p.first);
else
// erase only the element with updated frequency from the old list
p.first->second.erase(p.second);
}
void eraseOldElement() {
if (elements.size() > 0) {
auto key = prev(elements.begin()->second.end())->key;
if (elements.begin()->second.size() < 2)
elements.erase(elements.begin());
else
elements.begin()->second.erase(prev(elements.begin()->second.end()));
cacheMap.erase(key);
curSize--;
}
}
public:
LFUCache(uint size) {
if (size > 0)
maxSize = size;
else
maxSize = 10;
curSize = 0;
}
void set(K key, V value) {
auto entry = cacheMap.find(key);
if (entry == cacheMap.end()) {
if (curSize == maxSize)
eraseOldElement();
if (elements.begin() == elements.end()) {
elements.push_front({ 1, { {key, value} } });
}
else if (elements.begin()->first == 1) {
elements.begin()->second.push_front({ key,value });
}
else {
elements.push_front({ 1, { {key, value} } });
}
cacheMap.insert({ key, {elements.begin(), elements.begin()->second.begin()} });
curSize++;
}
else {
entry->second.second->value = value;
incrementFrequency(entry->second);
}
}
bool get(K key, V &value) {
auto entry = cacheMap.find(key);
if (entry == cacheMap.end())
return false;
value = entry->second.second->value;
incrementFrequency(entry->second);
return true;
}
};
Here are examples of usage:
int main()
{
int r;     // receives values from get
bool rc;   // receives the get return code
LFUCache<int> cache(3); // cache of size 3
cache.set(1, 1);
cache.set(2, 2);
cache.set(3, 3);
cache.set(2, 4);
rc = cache.get(1, r);
assert(rc);
assert(r == 1);
// evict old element, in this case 3
cache.set(4, 5);
rc = cache.get(3, r);
assert(!rc);
rc = cache.get(4, r);
assert(rc);
assert(r == 5);
LFUCache<int, string>cache2(2);
cache2.set(1, "one");
cache2.set(2, "two");
string val;
rc = cache2.get(1, val);
if (rc)
assert(val == "one");
else
assert(false);
cache2.set(3, "three"); // evict 2
rc = cache2.get(2, val);
assert(rc == false);
rc = cache2.get(3, val);
assert(rc);
assert(val == "three");
}
Here is a simple implementation of LFU cache in Go/Golang based on here.
import "container/list"
type LFU struct {
cache map[int]*list.Element
freqQueue map[int]*list.List
cap int
maxFreq int
lowestFreq int
}
type entry struct {
key, val int
freq int
}
func NewLFU(capacity int) *LFU {
return &LFU{
cache: make(map[int]*list.Element),
freqQueue: make(map[int]*list.List),
cap: capacity,
maxFreq: capacity - 1,
lowestFreq: 0,
}
}
// O(1)
func (c *LFU) Get(key int) int {
if e, ok := c.cache[key]; ok {
val := e.Value.(*entry).val
c.updateEntry(e, val)
return val
}
return -1
}
// O(1)
func (c *LFU) Put(key int, value int) {
if e, ok := c.cache[key]; ok {
c.updateEntry(e, value)
} else {
if len(c.cache) == c.cap {
c.evict()
}
if c.freqQueue[0] == nil {
c.freqQueue[0] = list.New()
}
e := c.freqQueue[0].PushFront(&entry{key, value, 0})
c.cache[key] = e
c.lowestFreq = 0
}
}
func (c *LFU) updateEntry(e *list.Element, val int) {
key := e.Value.(*entry).key
curFreq := e.Value.(*entry).freq
c.freqQueue[curFreq].Remove(e)
delete(c.cache, key)
nextFreq := curFreq + 1
if nextFreq > c.maxFreq {
nextFreq = c.maxFreq
}
if c.lowestFreq == curFreq && c.freqQueue[curFreq].Len() == 0 {
c.lowestFreq = nextFreq
}
if c.freqQueue[nextFreq] == nil {
c.freqQueue[nextFreq] = list.New()
}
newE := c.freqQueue[nextFreq].PushFront(&entry{key, val, nextFreq})
c.cache[key] = newE
}
func (c *LFU) evict() {
back := c.freqQueue[c.lowestFreq].Back()
delete(c.cache, back.Value.(*entry).key)
c.freqQueue[c.lowestFreq].Remove(back)
}
So I have a HashTable implementation here that I wrote using only arrays, and I had a little bit of help with the code. Unfortunately, I don't quite understand one of the lines someone added to the "get" and "put" methods. What exactly is happening in the while loop below? Is it a method for linear probing? Also, why does the loop check the conditions it checks?
Specifically,
int hash = hashThis(key);
while(data[hash] != AVAILABLE && data[hash].key() != key) {
hash = (hash + 1) % capacity;
}
Here's the whole Java class below for full reference.
public class Hashtable2 {
private Node[] data;
private int capacity;
private static final Node AVAILABLE = new Node("Available", null);
public Hashtable2(int capacity) {
this.capacity = capacity;
data = new Node[capacity];
for(int i = 0; i < data.length; i++) {
data[i] = AVAILABLE;
}
}
public int hashThis(String key) {
return key.hashCode() % capacity;
}
public Object get(String key) {
int hash = hashThis(key);
while(data[hash] != AVAILABLE && data[hash].key() != key) {
hash = (hash + 1) % capacity;
}
return data[hash].element();
}
public void put(String key, Object element) {
if(key != null) {
int hash = hashThis(key);
while(data[hash] != AVAILABLE && data[hash].key() != key) {
hash = (hash + 1) % capacity;
}
data[hash] = new Node(key, element);
}
}
public String toString(){
String s="<";
for (int i=0;i<this.capacity;i++)
{
s+=data[i]+", ";
}
s+=">";
return s;
}
}
Thank you.
I just rewrote part of the code and added a findHash method. Try to avoid code duplication!
private int findHash(String key) {
int hash = hashThis(key);
// search for the next available element or for the next matching key
while(data[hash] != AVAILABLE && data[hash].key() != key) {
hash = (hash + 1) % capacity;
}
return hash;
}
public Object get(String key) {
return data[findHash(key)].element();
}
public void put(String key, Object element) {
data[findHash(key)] = new Node(key, element);
}
What does this findHash loop actually do? The data array was initialized with AVAILABLE, meaning the table does not (yet) contain any actual data. Now, when we add an element with put, a hash value is calculated first; that is just the index in the data array where the data should be put. If that position has already been taken by another element with the same hash value but a different key, we try to find the next AVAILABLE position. The get method essentially works the same way: if a data element with a different key is found at the position, the next element is probed, and so on.
The data array itself is a so-called ring buffer: it is searched up to the end of the array and then the search continues at the beginning, starting with index 0. This is done with the modulo operator (%).
Alright?
Sample Hashtable implementation using generics and linear probing for collision resolution. Some assumptions are made during the implementation, and they are documented in the javadoc above the class and methods.
This implementation doesn't have all the methods of Hashtable, like keySet, putAll etc., but it covers the most frequently used methods: get, put, remove, size etc.
There is repetition of code in get, put and remove to find the index; it can be improved by extracting a method that finds the index, as sketched below.
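For example, here is a possible (untested) extraction of the shared probe loop, using the same table, capacity and hashCode members as the class below; it returns -1 when every slot has been probed:

private int findIndex(K key) {
    int probeCount = 0;
    int hash = this.hashCode(key);
    while (table[hash] != null && !table[hash].getKey().equals(key) && probeCount <= this.capacity) {
        hash = (hash + 1) % this.capacity;
        probeCount++;
    }
    return (probeCount <= this.capacity) ? hash : -1; // -1: probed the whole table
}

get would then reduce to something like:

public V get(K key) {
    int i = findIndex(key);
    return (i >= 0 && table[i] != null) ? table[i].getValue() : null;
}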
class HashEntry<K, V> {
private K key;
private V value;
public HashEntry(K key, V value) {
this.key = key;
this.value = value;
}
public void setKey(K key) { this.key = key; }
public K getKey() { return this.key; }
public void setValue(V value) { this.value = value; }
public V getValue() { return this.value; }
}
/**
* Hashtable implementation ...
* - with linear probing
* - without loadfactor & without rehash implementation.
* - throws exception when table is full
* - returns null when trying to remove non existent key
*
* @param <K>
* @param <V>
*/
public class Hashtable<K, V> {
private final static int DEFAULT_CAPACITY = 16;
private int count;
private int capacity;
private HashEntry<K, V>[] table;
public Hashtable() {
this(DEFAULT_CAPACITY);
}
public Hashtable(int capacity) {
super();
this.capacity = capacity;
table = new HashEntry[capacity];
}
public boolean isEmpty() { return (count == 0); }
public int size() { return count; }
public void clear() { table = new HashEntry[this.capacity]; count = 0; }
/**
* Returns null if the probe count exceeds the capacity or if the element couldn't be found.
*
* @param key
* @return
*/
public V get(K key) {
V value = null;
int probeCount = 0;
int hash = this.hashCode(key);
while (table[hash] != null && !table[hash].getKey().equals(key) && probeCount <= this.capacity) {
hash = (hash + 1) % this.capacity;
probeCount++;
}
if (table[hash] != null && probeCount <= this.capacity) {
value = table[hash].getValue();
}
return value;
}
/**
* Checks the number of probes done and terminates if the probe count reaches the capacity.
*
* Throws an Exception if the table is full.
*
* @param key
* @param value
* @return
* @throws Exception
*/
public V put(K key, V value) throws Exception {
int probeCount = 0;
int hash = this.hashCode(key);
while (table[hash] != null && !table[hash].getKey().equals(key) && probeCount <= this.capacity) {
hash = (hash + 1) % this.capacity;
probeCount++;
}
if (probeCount <= this.capacity) {
if (table[hash] != null) {
table[hash].setValue(value);
} else {
table[hash] = new HashEntry(key, value);
count++;
}
return table[hash].getValue();
} else {
throw new Exception("Table Full!!");
}
}
/**
* If the key is present, mark table[hash] = null and return the value; else return null.
*
* @param key
* @return
*/
public V remove(K key) {
V value = null;
int probeCount = 0;
int hash = this.hashCode(key);
while (table[hash] != null && !table[hash].getKey().equals(key) && probeCount <= this.capacity) {
hash = (hash + 1) % this.capacity;
probeCount++;
}
if (table[hash] != null && probeCount <= this.capacity) {
value = table[hash].getValue();
table[hash] = null;
count--;
}
return value;
}
public boolean contains(Object value) {
return this.containsValue(value);
}
public boolean containsKey(Object key) {
for (HashEntry<K, V> entry : table) {
if (entry != null && entry.getKey().equals(key)) {
return true;
}
}
return false;
}
public boolean containsValue(Object value) {
for (HashEntry<K, V> entry : table) {
if (entry != null && entry.getValue().equals(value)) {
return true;
}
}
return false;
}
@Override
public String toString() {
StringBuilder data = new StringBuilder();
data.append("{");
for (HashEntry<K, V> entry : table) {
if (entry != null) {
data.append(entry.getKey()).append("=").append(entry.getValue()).append(", ");
}
}
if (data.toString().endsWith(", ")) {
data.delete(data.length() - 2, data.length());
}
data.append("}");
return data.toString();
}
private int hashCode(K key) { return (key.hashCode() % this.capacity); }
public static void main(String[] args) throws Exception {
Hashtable<Integer, String> table = new Hashtable<Integer, String>(2);
table.put(1, "1");
table.put(2, "2");
System.out.println(table);
table.put(1, "3");
table.put(2, "4");
System.out.println(table);
table.remove(1);
System.out.println(table);
table.put(1, "1");
System.out.println(table);
System.out.println(table.get(1));
System.out.println(table.get(3));
// table is full so below line
// will throw an exception
table.put(3, "2");
}
}
Sample run of the above code.
{2=2, 1=1}
{2=4, 1=3}
{2=4}
{2=4, 1=1}
1
null
Exception in thread "main" java.lang.Exception: Table Full!!
at Hashtable.put(Hashtable.java:95)
at Hashtable.main(Hashtable.java:177)
I have a Java program that stores a lot of mappings from Strings to various objects.
Right now, my options are either to rely on hashing (via HashMap) or on binary searches (via TreeMap). I am wondering if there is an efficient and standard trie-based map implementation in a popular and quality collections library?
I've written my own in the past, but I'd rather go with something standard, if available.
Quick clarification: While my question is general, in the current project I am dealing with a lot of data that is indexed by fully-qualified class name or method signature. Thus, there are many shared prefixes.
You might want to look at the Trie implementation that Limewire is contributing to Google Guava.
There is no trie data structure in the core Java libraries.
This may be because tries are usually designed to store character strings, while Java data structures are more general, usually holding any Object (defining equality and a hash operation), though they are sometimes limited to Comparable objects (defining an order). There's no common abstraction for "a sequence of symbols," although CharSequence is suitable for character strings, and I suppose you could do something with Iterable for other types of symbols.
Here's another point to consider: when trying to implement a conventional trie in Java, you are quickly confronted with the fact that Java supports Unicode. To have any sort of space efficiency, you have to restrict the strings in your trie to some subset of symbols, or abandon the conventional approach of storing child nodes in an array indexed by symbol. This might be another reason why tries are not considered general-purpose enough for inclusion in the core library, and something to watch out for if you implement your own or use a third-party library.
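For illustration, here are minimal sketches of the two node layouts just mentioned (the class names are mine): an array indexed by symbol, which is compact only for a small fixed alphabet, and a map of children, which handles arbitrary characters at the cost of per-entry overhead.

class ArrayTrieNode {
    ArrayTrieNode[] children = new ArrayTrieNode[26]; // 'a'..'z' only; index = c - 'a'
    boolean isWord;
}

class MapTrieNode {
    java.util.Map<Character, MapTrieNode> children = new java.util.HashMap<>();
    boolean isWord;
}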
Apache Commons Collections v4.0 now supports trie structures.
See the org.apache.commons.collections4.trie package info for more information. In particular, check the PatriciaTrie class:
Implementation of a PATRICIA Trie (Practical Algorithm to Retrieve Information Coded in Alphanumeric).
A PATRICIA Trie is a compressed Trie. Instead of storing all data at the edges of the Trie (and having empty internal nodes), PATRICIA stores data in every node. This allows for very efficient traversal, insert, delete, predecessor, successor, prefix, range, and select(Object) operations. All operations are performed at worst in O(K) time, where K is the number of bits in the largest item in the tree. In practice, operations actually take O(A(K)) time, where A(K) is the average number of bits of all items in the tree.
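A small usage sketch (assuming commons-collections4 is on the classpath); prefixMap is the operation a plain HashMap cannot offer, and it matches the fully-qualified-name use case from the question:

import org.apache.commons.collections4.trie.PatriciaTrie;
import java.util.SortedMap;

public class PatriciaTrieDemo {
    public static void main(String[] args) {
        PatriciaTrie<Integer> trie = new PatriciaTrie<>();
        trie.put("com.example.Foo", 1);
        trie.put("com.example.Bar", 2);
        trie.put("org.other.Baz", 3);
        // All entries whose key starts with the given prefix:
        SortedMap<String, Integer> sub = trie.prefixMap("com.example.");
        System.out.println(sub.keySet()); // [com.example.Bar, com.example.Foo]
    }
}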
Also check out concurrent-trees. They support both Radix and Suffix trees and are designed for high concurrency environments.
I wrote and published a simple and fast implementation here.
What you need is org.apache.commons.collections.FastTreeMap, I think.
Below is a basic HashMap-based implementation of a trie. Some people might find this useful...
import java.util.HashMap;

class Trie {
HashMap<Character, HashMap> root;
public Trie() {
root = new HashMap<Character, HashMap>();
}
public void addWord(String word) {
HashMap<Character, HashMap> node = root;
for (int i = 0; i < word.length(); i++) {
Character currentLetter = word.charAt(i);
if (node.containsKey(currentLetter) == false) {
node.put(currentLetter, new HashMap<Character, HashMap>());
}
node = node.get(currentLetter);
}
}
public boolean containsPrefix(String word) {
HashMap<Character, HashMap> node = root;
for (int i = 0; i < word.length(); i++) {
Character currentLetter = word.charAt(i);
if (node.containsKey(currentLetter)) {
node = node.get(currentLetter);
} else {
return false;
}
}
return true;
}
}
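Possible usage (a hypothetical driver, not part of the original answer):

Trie trie = new Trie();
trie.addWord("hello");
System.out.println(trie.containsPrefix("hel"));  // true
System.out.println(trie.containsPrefix("help")); // false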
Apache's commons collections:
org.apache.commons.collections4.trie.PatriciaTrie
You can try the Completely Java library, it features a PatriciaTrie implementation. The API is small and easy to get started, and it's available in the Maven central repository.
You might look at this TopCoder one as well (registration required...).
If you require a sorted map, then tries are worthwhile.
If you don't, then a hashmap is better.
A hashmap with string keys can be improved over the standard Java implementation:
Array hash map
If you're not worried about pulling in the Scala library, you can use this space efficient implementation I wrote of a burst trie.
https://github.com/nbauernfeind/scala-burst-trie
Here is my implementation; enjoy it via: GitHub - MyTrie.java
/* usage:
MyTrie trie = new MyTrie();
trie.insert("abcde");
trie.insert("abc");
trie.insert("sadas");
trie.insert("abc");
trie.insert("wqwqd");
System.out.println(trie.contains("abc"));
System.out.println(trie.contains("abcd"));
System.out.println(trie.contains("abcdefg"));
System.out.println(trie.contains("ab"));
System.out.println(trie.getWordCount("abc"));
System.out.println(trie.getAllDistinctWords());
*/
import java.util.*;
public class MyTrie {
private class Node {
public int[] next = new int[26];
public int wordCount;
public Node() {
for(int i=0;i<26;i++) {
next[i] = NULL;
}
wordCount = 0;
}
}
private int curr;
private Node[] nodes;
private List<String> allDistinctWords;
public final static int NULL = -1;
public MyTrie() {
nodes = new Node[100000];
nodes[0] = new Node();
curr = 1;
}
private int getIndex(char c) {
return (int)(c - 'a');
}
private void depthSearchWord(int x, String currWord) {
for(int i=0;i<26;i++) {
int p = nodes[x].next[i];
if(p != NULL) {
String word = currWord + (char)(i + 'a');
if(nodes[p].wordCount > 0) {
allDistinctWords.add(word);
}
depthSearchWord(p, word);
}
}
}
public List<String> getAllDistinctWords() {
allDistinctWords = new ArrayList<String>();
depthSearchWord(0, "");
return allDistinctWords;
}
public int getWordCount(String str) {
int len = str.length();
int p = 0;
for(int i=0;i<len;i++) {
int j = getIndex(str.charAt(i));
if(nodes[p].next[j] == NULL) {
return 0;
}
p = nodes[p].next[j];
}
return nodes[p].wordCount;
}
public boolean contains(String str) {
int len = str.length();
int p = 0;
for(int i=0;i<len;i++) {
int j = getIndex(str.charAt(i));
if(nodes[p].next[j] == NULL) {
return false;
}
p = nodes[p].next[j];
}
return nodes[p].wordCount > 0;
}
public void insert(String str) {
int len = str.length();
int p = 0;
for(int i=0;i<len;i++) {
int j = getIndex(str.charAt(i));
if(nodes[p].next[j] == NULL) {
nodes[curr] = new Node();
nodes[p].next[j] = curr;
curr++;
}
p = nodes[p].next[j];
}
nodes[p].wordCount++;
}
}
I have just tried my own concurrent trie implementation, but it is not based on characters; it is based on hash codes. Still, we can use this approach by keeping a Map of Maps for each character's hash code.
You can test it using the code at https://github.com/skanagavelu/TrieHashMap/blob/master/src/TrieMapPerformanceTest.java
https://github.com/skanagavelu/TrieHashMap/blob/master/src/TrieMapValidationTest.java
import java.util.concurrent.atomic.AtomicReferenceArray;
public class TrieMap {
public static int SIZEOFEDGE = 4;
public static int OSIZE = 5000;
}
abstract class Node {
public Node getLink(String key, int hash, int level){
throw new UnsupportedOperationException();
}
public Node createLink(int hash, int level, String key, String val) {
throw new UnsupportedOperationException();
}
public Node removeLink(String key, int hash, int level){
throw new UnsupportedOperationException();
}
}
class Vertex extends Node {
String key;
volatile String val;
volatile Vertex next;
public Vertex(String key, String val) {
this.key = key;
this.val = val;
}
@Override
public boolean equals(Object obj) {
Vertex v = (Vertex) obj;
return this.key.equals(v.key);
}
@Override
public int hashCode() {
return key.hashCode();
}
@Override
public String toString() {
return key +"#"+key.hashCode();
}
}
class Edge extends Node {
volatile AtomicReferenceArray<Node> array; //This is needed to ensure array elements are volatile
public Edge(int size) {
array = new AtomicReferenceArray<Node>(8);
}
@Override
public Node getLink(String key, int hash, int level){
int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
Node returnVal = array.get(index);
for(;;) {
if(returnVal == null) {
return null;
}
else if((returnVal instanceof Vertex)) {
Vertex node = (Vertex) returnVal;
for(;node != null; node = node.next) {
if(node.key.equals(key)) {
return node;
}
}
return null;
} else { //instanceof Edge
level = level + 1;
index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
Edge e = (Edge) returnVal;
returnVal = e.array.get(index);
}
}
}
@Override
public Node createLink(int hash, int level, String key, String val) { //Remove size
for(;;) { //Repeat the work on the current node, since some other thread modified this node
int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
Node nodeAtIndex = array.get(index);
if ( nodeAtIndex == null) {
Vertex newV = new Vertex(key, val);
boolean result = array.compareAndSet(index, null, newV);
if(result == Boolean.TRUE) {
return newV;
}
//continue; since new node is inserted by other thread, hence repeat it.
}
else if(nodeAtIndex instanceof Vertex) {
Vertex vrtexAtIndex = (Vertex) nodeAtIndex;
int newIndex = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, vrtexAtIndex.hashCode(), level+1);
int newIndex1 = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level+1);
Edge edge = new Edge(Base10ToBaseX.Base.BASE8.getLevelZeroMask()+1);
if(newIndex != newIndex1) {
Vertex newV = new Vertex(key, val);
edge.array.set(newIndex, vrtexAtIndex);
edge.array.set(newIndex1, newV);
boolean result = array.compareAndSet(index, vrtexAtIndex, edge); //REPLACE vertex to edge
if(result == Boolean.TRUE) {
return newV;
}
//continue; since vrtexAtIndex may be removed or changed to Edge already.
} else if(vrtexAtIndex.key.hashCode() == hash) {//vrtex.hash == hash) { HERE newIndex == newIndex1
synchronized (vrtexAtIndex) {
boolean result = array.compareAndSet(index, vrtexAtIndex, vrtexAtIndex); //Double check this vertex is not removed.
if(result == Boolean.TRUE) {
Vertex prevV = vrtexAtIndex;
for(;vrtexAtIndex != null; vrtexAtIndex = vrtexAtIndex.next) {
prevV = vrtexAtIndex; // prevV is used to handle when vrtexAtIndex reached NULL
if(vrtexAtIndex.key.equals(key)){
vrtexAtIndex.val = val;
return vrtexAtIndex;
}
}
Vertex newV = new Vertex(key, val);
prevV.next = newV; // Within SYNCHRONIZATION since prevV.next may be added with some other.
return newV;
}
//Continue; vrtexAtIndex got changed
}
} else { //HERE newIndex == newIndex1 BUT vrtex.hash != hash
edge.array.set(newIndex, vrtexAtIndex);
boolean result = array.compareAndSet(index, vrtexAtIndex, edge); //REPLACE vertex to edge
if(result == Boolean.TRUE) {
return edge.createLink(hash, (level + 1), key, val);
}
}
}
else { //instanceof Edge
return nodeAtIndex.createLink(hash, (level + 1), key, val);
}
}
}
@Override
public Node removeLink(String key, int hash, int level){
for(;;) {
int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
Node returnVal = array.get(index);
if(returnVal == null) {
return null;
}
else if((returnVal instanceof Vertex)) {
synchronized (returnVal) {
Vertex node = (Vertex) returnVal;
if(node.next == null) {
if(node.key.equals(key)) {
boolean result = array.compareAndSet(index, node, null);
if(result == Boolean.TRUE) {
return node;
}
continue; //Vertex may be changed to Edge
}
return null; //Nothing found; This is not the same vertex we are looking for. Here hashcode is same but key is different.
} else {
if(node.key.equals(key)) { //Removing the first node in the link
boolean result = array.compareAndSet(index, node, node.next);
if(result == Boolean.TRUE) {
return node;
}
continue; //Vertex(node) may be changed to Edge, so try again.
}
Vertex prevV = node; // prevV is used to handle when vrtexAtIndex is found and to be removed from its previous
node = node.next;
for(;node != null; prevV = node, node = node.next) {
if(node.key.equals(key)) {
prevV.next = node.next; //Removing other than first node in the link
return node;
}
}
return null; //Nothing found in the linked list.
}
}
} else { //instanceof Edge
return returnVal.removeLink(key, hash, (level + 1));
}
}
}
}
class Base10ToBaseX {
public static enum Base {
/**
* A Java int is represented in 32 bits (regardless of machine).
* We can split those 32 bits into groups of 1, 2, 4, 8 or 16 bits.
*/
BASE2(1,1,32), BASE4(3,2,16), BASE8(7,3,11)/* OCTAL*/, /*BASE10(3,2),*/
BASE16(15, 4, 8){
public String getFormattedValue(int val){
switch(val) {
case 10:
return "A";
case 11:
return "B";
case 12:
return "C";
case 13:
return "D";
case 14:
return "E";
case 15:
return "F";
default:
return "" + val;
}
}
}, /*BASE32(31,5,1),*/ BASE256(255, 8, 4), /*BASE512(511,9),*/ Base65536(65535, 16, 2);
private int LEVEL_0_MASK;
private int LEVEL_1_ROTATION;
private int MAX_ROTATION;
Base(int levelZeroMask, int levelOneRotation, int maxPossibleRotation) {
this.LEVEL_0_MASK = levelZeroMask;
this.LEVEL_1_ROTATION = levelOneRotation;
this.MAX_ROTATION = maxPossibleRotation;
}
int getLevelZeroMask(){
return LEVEL_0_MASK;
}
int getLevelOneRotation(){
return LEVEL_1_ROTATION;
}
int getMaxRotation(){
return MAX_ROTATION;
}
String getFormattedValue(int val){
return "" + val;
}
}
public static int getBaseXValueOnAtLevel(Base base, int on, int level) {
if(level > base.getMaxRotation() || level < 1) {
return 0; //INVALID Input
}
int rotation = base.getLevelOneRotation();
int mask = base.getLevelZeroMask();
if(level > 1) {
rotation = (level-1) * rotation;
mask = mask << rotation;
} else {
rotation = 0;
}
return (on & mask) >>> rotation;
}
}