How to implement a Least Frequently Used (LFU) cache in Java?
Least Frequently Used (LFU) is a type of cache algorithm used to manage memory within a computer. The standard characteristic of this method is that the system keeps track of the number of times a block is referenced in memory. When the cache is full and requires more room, the system purges the item with the lowest reference frequency.
What would be the best way to implement a least-frequently-used cache of objects, say in Java?
I've already implemented one using LinkedHashMap (by maintaining the number of times objects are accessed), but I'm curious whether any of the newer concurrent collections would be better candidates.
Consider this case: suppose the cache is full and we need to make room for another entry. Say two objects in the cache have each been accessed only once. Which one should be removed if we learn that another object (which is not in the cache) has been accessed more than once?
Thanks!
You might benefit from the LFU implementation of ActiveMQ: LFUCache
They have provided some good functionality.
I think an LFU data structure must combine a priority queue (for fast access to the LFU item) with a hash map (for fast access to any item by its key). I would suggest the following node definition for each object stored in the cache:
class Node<T> {
// access key
private int key;
// counter of accesses
private int numAccesses;
// current position in pq
private int currentPos;
// item itself
private T item;
//getters, setters, constructors go here
}
You need key to refer to an item.
You need numAccesses as the key for the priority queue.
You need currentPos to quickly find an item's position in the priority queue by its key.
Now you organize a hash map (key (Integer) -> node (Node<T>)) for fast access to items, and a min-heap-based priority queue using the number of accesses as the priority. You can then perform all operations quickly (access, add a new item, update the number of accesses, remove the LFU item). You need to write each operation carefully so that it keeps all nodes consistent (their number of accesses, their position in the priority queue, and their existence in the hash map). Hash-map lookups take O(1) average time, and each heap update takes O(log n), which is what you expect from a cache.
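For concreteness, here is a minimal sketch of the access operation under this design. The map field (a HashMap<Integer, Node<T>>) and the siftDown heap helper, which must keep each node's currentPos up to date as it swaps entries, are assumed rather than shown:
// Sketch of the access operation in a hypothetical LFU cache built
// from the hash map and min heap described above.
public T access(int key) {
    Node<T> node = map.get(key);    // O(1) lookup in the hash map
    if (node == null) return null;  // cache miss
    node.setNumAccesses(node.getNumAccesses() + 1);
    // In a min heap keyed by numAccesses, an increased key can only violate
    // the heap property towards the children, so sift the node down.
    siftDown(node.getCurrentPos()); // O(log n)
    return node.getItem();
}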
In my opinion, the best way to implement a least-frequently-used cache of objects would be to include a new variable, 'latestTS', for each object. TS stands for timestamp.
// A static method that returns the current date and time as milliseconds since January 1st 1970
long latestTS = System.currentTimeMillis();
ConcurrentLinkedHashMap is not yet part of the standard Java concurrent collections (ref: the Java concurrent collections API). However, you can try using a ConcurrentHashMap together with a doubly linked list.
About the case to be considered: in such a case, as I said, you can declare a latestTS variable; based on its value, you can remove an entry and add the new object. (Don't forget to update the frequency and latestTS of the newly added object.)
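As an illustrative sketch of that rule (getLatestTS is a hypothetical accessor for the latestTS field described above; it is not part of the CacheEntry class below): evict the minimum by frequency, breaking ties with the oldest timestamp.
import java.util.Comparator;
class EvictionOrder {
    // Hypothetical ordering: lowest frequency first;
    // among equally frequent entries, oldest latestTS first.
    static final Comparator<CacheEntry> BY_FREQ_THEN_AGE =
            Comparator.comparingInt(CacheEntry::getFrequency)
                      .thenComparingLong(CacheEntry::getLatestTS);
}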
As you mentioned, you can use LinkedHashMap, since it gives element access in O(1) and you also get order traversal.
Please find the code for an LFU cache below:
(PS: the code below answers the question in the title, i.e. "How to implement an LFU cache".)
import java.util.LinkedHashMap;
import java.util.Map;
public class LFUCache {
class CacheEntry
{
private String data;
private int frequency;
// default constructor
private CacheEntry()
{}
public String getData() {
return data;
}
public void setData(String data) {
this.data = data;
}
public int getFrequency() {
return frequency;
}
public void setFrequency(int frequency) {
this.frequency = frequency;
}
}
private int initialCapacity = 10;
private LinkedHashMap<Integer, CacheEntry> cacheMap = new LinkedHashMap<Integer, CacheEntry>();
/* LinkedHashMap is used because it has features of both HashMap and LinkedList.
* Thus, we can get an entry in O(1) and also, we can iterate over it easily.
* */
public LFUCache(int initialCapacity)
{
this.initialCapacity = initialCapacity;
}
public void addCacheEntry(int key, String data)
{
    if(isFull())
    {
        int entryKeyToBeRemoved = getLFUKey();
        cacheMap.remove(entryKeyToBeRemoved);
    }
    CacheEntry temp = new CacheEntry();
    temp.setData(data);
    temp.setFrequency(0);
    cacheMap.put(key, temp);
}
public int getLFUKey() // linear scan: O(n) in the number of cache entries
{
int key = 0;
int minFreq = Integer.MAX_VALUE;
for(Map.Entry<Integer, CacheEntry> entry : cacheMap.entrySet())
{
if(minFreq > entry.getValue().frequency)
{
key = entry.getKey();
minFreq = entry.getValue().frequency;
}
}
return key;
}
public String getCacheEntry(int key)
{
if(cacheMap.containsKey(key)) // cache hit
{
CacheEntry temp = cacheMap.get(key);
temp.frequency++;
cacheMap.put(key, temp);
return temp.data;
}
return null; // cache miss
}
public boolean isFull()
{
    return cacheMap.size() >= initialCapacity;
}
}
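A quick usage sketch of the class above (hand-traced: with capacity 2, reading key 1 raises its frequency, so key 2 is the LFU victim when key 3 arrives):
public class LFUCacheDemo {
    public static void main(String[] args) {
        LFUCache cache = new LFUCache(2);
        cache.addCacheEntry(1, "one");
        cache.addCacheEntry(2, "two");
        cache.getCacheEntry(1);                      // frequency of key 1 becomes 1
        cache.addCacheEntry(3, "three");             // cache full: evicts key 2 (frequency 0)
        System.out.println(cache.getCacheEntry(2));  // null (cache miss)
        System.out.println(cache.getCacheEntry(3));  // three
    }
}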
Here's the O(1) implementation for LFU: http://dhruvbird.com/lfu.pdf
I have tried the LFU cache implementation below, taking the LFU paper as a reference. My implementation works nicely.
If anyone wants to provide any further suggestions to improve it, please let me know.
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
public class LFUCache {
private Map<Integer, Node> cache = new HashMap<>();
private Map<Integer, Integer> counts = new HashMap<>();
private TreeMap<Integer, DoublyLinkedList> frequencies = new TreeMap<>();
private final int CAPACITY;
public LFUCache(int capacity) {
this.CAPACITY = capacity;
}
public int get(int key) {
if (!cache.containsKey(key)) {
return -1;
}
Node node = cache.get(key);
int frequency = counts.get(key);
frequencies.get(frequency).remove(new Node(node.key(), node.value()));
removeFreq(frequency);
frequencies.computeIfAbsent(frequency + 1, k -> new DoublyLinkedList()).add(new Node(node.key(), node.value()));
counts.put(key, frequency + 1);
return cache.get(key).value();
}
public void set(int key, int value) {
if (!cache.containsKey(key)) { // note: a set() on an existing key is ignored
Node node = new Node(key, value);
if (cache.size() == CAPACITY) {
int l_count = frequencies.firstKey();
Node deleteThisNode = frequencies.get(l_count).head();
frequencies.get(l_count).remove(deleteThisNode);
int deleteThisKey = deleteThisNode.key();
removeFreq(l_count);
cache.remove(deleteThisKey);
counts.remove(deleteThisKey);
}
cache.put(key, node);
counts.put(key, 1);
frequencies.computeIfAbsent(1, k -> new DoublyLinkedList()).add(node);
}
}
private void removeFreq(int frequency) {
if (frequencies.get(frequency).size() == 0) {
frequencies.remove(frequency);
}
}
public Map<Integer, Node> getCache() {
return cache;
}
public Map<Integer, Integer> getCounts() {
return counts;
}
public TreeMap<Integer, DoublyLinkedList> getFrequencies() {
return frequencies;
}
}
class Node {
private int key;
private int value;
private Node next;
private Node prev;
public Node(int key, int value) {
this.key = key;
this.value = value;
}
public Node getNext() {
return next;
}
public void setNext(Node next) {
this.next = next;
}
public Node getPrev() {
return prev;
}
public void setPrev(Node prev) {
this.prev = prev;
}
public int key() {
return key;
}
public int value() {
return value;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof Node)) return false;
Node node = (Node) o;
return key == node.key &&
value == node.value;
}
@Override
public int hashCode() {
return Objects.hash(key, value);
}
@Override
public String toString() {
return "Node{" +
"key=" + key +
", value=" + value +
'}';
}
}
class DoublyLinkedList {
private int size;
private Node head;
private Node tail;
public void add(Node node) {
if (null == head) {
head = node;
} else {
tail.setNext(node);
node.setPrev(tail);
}
tail = node;
size++;
}
public void remove(Node node) {
if(null == head || null == node) {
return;
}
if(this.size() == 1 && head.equals(node)) {
head = null;
tail = null;
} else if (head.equals(node)) {
head = node.getNext();
head.setPrev(null);
} else if (tail.equals(node)) {
Node prevToTail = tail.getPrev();
prevToTail.setNext(null);
tail = prevToTail;
} else {
Node current = head.getNext();
while(!current.equals(tail)) {
if(current.equals(node)) {
Node prevToCurrent = current.getPrev();
Node nextToCurrent = current.getNext();
prevToCurrent.setNext(nextToCurrent);
nextToCurrent.setPrev(prevToCurrent);
break;
}
current = current.getNext();
}
}
size--;
}
public Node head() {
return head;
}
public int size() {
return size;
}
}
Client code to use the above cache implementation -
import java.util.Map;
public class Client {
public static void main(String[] args) {
Client client = new Client();
LFUCache cache = new LFUCache(4);
cache.set(11, function(11));
cache.set(12, function(12));
cache.set(13, function(13));
cache.set(14, function(14));
cache.set(15, function(15));
client.print(cache.getFrequencies());
cache.get(13);
cache.get(13);
cache.get(13);
cache.get(14);
cache.get(14);
cache.get(14);
cache.get(14);
client.print(cache.getCache());
client.print(cache.getCounts());
client.print(cache.getFrequencies());
}
public void print(Map<Integer, ? extends Object> map) {
for(Map.Entry<Integer, ? extends Object> entry : map.entrySet()) {
if(entry.getValue() instanceof Node) {
System.out.println("Cache Key => "+entry.getKey()+", Cache Value => "+((Node) entry.getValue()).toString());
} else if (entry.getValue() instanceof DoublyLinkedList) {
System.out.println("Frequency Key => "+entry.getKey()+" Frequency Values => [");
Node head = ((DoublyLinkedList) entry.getValue()).head();
while(null != head) {
System.out.println(head.toString());
head = head.getNext();
}
System.out.println(" ]");
} else {
System.out.println("Count Key => "+entry.getKey()+", Count Value => "+entry.getValue());
}
}
}
public static int function(int key) {
int prime = 31;
return key*prime;
}
}
How about a priority queue? You can keep the elements sorted there, with keys representing the frequency. Just update an object's position in the queue after visiting it. You can also update only from time to time to improve performance (at the cost of precision).
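A minimal sketch of this suggestion with java.util.PriorityQueue (class and field names are illustrative). Note that PriorityQueue has no decrease-key operation, so repositioning is done with an O(n) remove followed by an O(log n) add, trading simplicity for performance:
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative LFU using a PriorityQueue ordered by access count.
class PqLfuCache<K, V> {
    private static class Entry<K, V> {
        final K key; V value; int accesses;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final int capacity;
    private final Map<K, Entry<K, V>> index = new HashMap<>();
    private final PriorityQueue<Entry<K, V>> queue =
        new PriorityQueue<>((a, b) -> Integer.compare(a.accesses, b.accesses));

    PqLfuCache(int capacity) { this.capacity = capacity; }

    public V get(K key) {
        Entry<K, V> e = index.get(key);
        if (e == null) return null;
        queue.remove(e);      // O(n): reposition by remove + re-add
        e.accesses++;
        queue.add(e);         // O(log n)
        return e.value;
    }

    public void put(K key, V value) {
        if (capacity <= 0) return;
        Entry<K, V> e = index.get(key);
        if (e != null) { e.value = value; return; }
        if (index.size() == capacity) {
            Entry<K, V> lfu = queue.poll();   // least frequently used is at the head
            index.remove(lfu.key);
        }
        e = new Entry<>(key, value);
        index.put(key, e);
        queue.add(e);
    }
}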
Many implementations I have seen have runtime complexity O(log n). This means that when the cache size is n, the time needed to insert or remove an element is logarithmic. Such implementations usually use a min heap to maintain the usage frequencies of the elements. The root of the heap contains the element with the lowest frequency, and can be accessed in O(1) time. But to maintain the heap property we have to move an element inside the heap every time it is used (and its frequency is incremented) to place it into its proper position, and likewise when we have to insert a new element into the cache (and so put it into the heap).
But the runtime complexity can be reduced to O(1) when we maintain a hashmap (Java) or unordered_map (C++) with the element as key. Additionally we need two sorts of lists, a frequency list and element lists. Each element list contains elements that have the same frequency, and the frequency list contains the element lists.
frequency list:
  1 -> a, c, m
  3 -> k, l, n
  6 -> y
  7 -> x, z
Here in the example we see a frequency list that has 4 entries (4 element lists). The element list for frequency 1 contains the elements (a, c, m), the element list for frequency 3 contains (k, l, n), and so on.
Now, when we use, say, element y, we have to increment its frequency and move it to the next list. Because the element list with frequency 6 becomes empty, we delete it. The result is:
frequency list:
  1 -> a, c, m
  3 -> k, l, n
  7 -> y, x, z
We place element y at the beginning of element list 7. When we later have to remove elements from a list, we start from the end (first z, then x, and then y).
Now, when we use element n, we have to increment its frequency and put it into a new list with frequency 4:
frequency list:
  1 -> a, c, m
  3 -> k, l
  4 -> n
  7 -> y, x, z
I hope the idea is clear. Below is my C++ implementation of the LFU cache; I will add a Java implementation later.
The class has just two public methods, void set(key k, value v) and bool get(key k, value &v). In the get method, the value to retrieve is set by reference when the element is found, in which case the method returns true. When the element is not found, the method returns false.
#include <unordered_map>
#include <list>
#include <string>  // for the usage example below
#include <cassert> // for the usage example below
using namespace std;
typedef unsigned uint;
template<typename K, typename V = K>
struct Entry
{
K key;
V value;
};
template<typename K, typename V = K>
class LFUCache
{
typedef list<Entry<K, V>> ElementList;
typedef list<pair<uint, ElementList>> FrequencyList;
private:
unordered_map <K, pair<typename FrequencyList::iterator, typename ElementList::iterator>> cacheMap;
FrequencyList elements;
uint maxSize;
uint curSize;
void incrementFrequency(pair<typename FrequencyList::iterator, typename ElementList::iterator> p) {
if (p.first == prev(elements.end())) {
// the element's list is the last (highest-frequency) one: append a new list with incremented frequency (p.first->first + 1)
elements.push_back({ p.first->first + 1, { {p.second->key, p.second->value} } });
// erase and insert the key with new iterator pair
cacheMap[p.second->key] = { prev(elements.end()), prev(elements.end())->second.begin() };
}
else {
// there exist element(s) with higher frequency
auto pos = next(p.first);
if (p.first->first + 1 == pos->first)
// same frequency in the next list, add the element in the begin
pos->second.push_front({ p.second->key, p.second->value });
else
// insert new list before next list
pos = elements.insert(pos, { p.first->first + 1 , {{p.second->key, p.second->value}} });
// update cacheMap iterators
cacheMap[p.second->key] = { pos, pos->second.begin() };
}
// if the element list with the old frequency contained only this single element, erase that list from the frequency list
if (p.first->second.size() == 1)
elements.erase(p.first);
else
// erase only the element with updated frequency from the old list
p.first->second.erase(p.second);
}
void eraseOldElement() {
if (elements.size() > 0) {
auto key = prev(elements.begin()->second.end())->key;
if (elements.begin()->second.size() < 2)
elements.erase(elements.begin());
else
elements.begin()->second.erase(prev(elements.begin()->second.end()));
cacheMap.erase(key);
curSize--;
}
}
public:
LFUCache(uint size) {
if (size > 0)
maxSize = size;
else
maxSize = 10;
curSize = 0;
}
void set(K key, V value) {
auto entry = cacheMap.find(key);
if (entry == cacheMap.end()) {
if (curSize == maxSize)
eraseOldElement();
if (elements.begin() == elements.end()) {
elements.push_front({ 1, { {key, value} } });
}
else if (elements.begin()->first == 1) {
elements.begin()->second.push_front({ key,value });
}
else {
elements.push_front({ 1, { {key, value} } });
}
cacheMap.insert({ key, {elements.begin(), elements.begin()->second.begin()} });
curSize++;
}
else {
entry->second.second->value = value;
incrementFrequency(entry->second);
}
}
bool get(K key, V &value) {
auto entry = cacheMap.find(key);
if (entry == cacheMap.end())
return false;
value = entry->second.second->value;
incrementFrequency(entry->second);
return true;
}
};
Here are examples of usage:
int main()
{
LFUCache<int> cache(3); // cache of size 3
bool rc;                // result of get
int r;                  // retrieved value
cache.set(1, 1);
cache.set(2, 2);
cache.set(3, 3);
cache.set(2, 4);
rc = cache.get(1, r);
assert(rc);
assert(r == 1);
// evict old element, in this case 3
cache.set(4, 5);
rc = cache.get(3, r);
assert(!rc);
rc = cache.get(4, r);
assert(rc);
assert(r == 5);
LFUCache<int, string> cache2(2);
cache2.set(1, "one");
cache2.set(2, "two");
string val;
rc = cache2.get(1, val);
if (rc)
assert(val == "one");
else
assert(false);
cache2.set(3, "three"); // evict 2
rc = cache2.get(2, val);
assert(rc == false);
rc = cache2.get(3, val);
assert(rc);
assert(val == "three");
}
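Until the promised Java version arrives, here is a minimal Java sketch of the same O(1) frequency-list idea. It uses a LinkedHashSet per frequency bucket so that insertion order determines the eviction order within a bucket; all names are illustrative, not the author's code:
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

// Sketch of an O(1) LFU: values map, per-key frequency, and per-frequency buckets.
class O1LfuCache<K, V> {
    private final int capacity;
    private int minFreq = 0;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> freqOf = new HashMap<>();
    private final Map<Integer, LinkedHashSet<K>> buckets = new HashMap<>();

    O1LfuCache(int capacity) { this.capacity = capacity; }

    public V get(K key) {
        if (!values.containsKey(key)) return null;
        touch(key);
        return values.get(key);
    }

    public void put(K key, V value) {
        if (capacity <= 0) return;
        if (values.containsKey(key)) {
            values.put(key, value);
            touch(key);
            return;
        }
        if (values.size() == capacity) {
            // Evict the oldest key in the lowest-frequency bucket.
            LinkedHashSet<K> bucket = buckets.get(minFreq);
            K evict = bucket.iterator().next();
            bucket.remove(evict);
            if (bucket.isEmpty()) buckets.remove(minFreq);
            values.remove(evict);
            freqOf.remove(evict);
        }
        values.put(key, value);
        freqOf.put(key, 1);
        buckets.computeIfAbsent(1, f -> new LinkedHashSet<>()).add(key);
        minFreq = 1;
    }

    // Move key from its current frequency bucket to the next one, all in O(1).
    private void touch(K key) {
        int f = freqOf.get(key);
        buckets.get(f).remove(key);
        if (buckets.get(f).isEmpty()) {
            buckets.remove(f);
            if (minFreq == f) minFreq = f + 1; // the key just moved to f + 1
        }
        freqOf.put(key, f + 1);
        buckets.computeIfAbsent(f + 1, x -> new LinkedHashSet<>()).add(key);
    }
}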
Here is a simple implementation of an LFU cache in Go/Golang, based on here.
import "container/list"
type LFU struct {
cache map[int]*list.Element
freqQueue map[int]*list.List
cap int
maxFreq int
lowestFreq int
}
type entry struct {
key, val int
freq int
}
func NewLFU(capacity int) *LFU {
return &LFU{
cache: make(map[int]*list.Element),
freqQueue: make(map[int]*list.List),
cap: capacity,
maxFreq: capacity - 1, // frequencies are capped, bounding the number of buckets in freqQueue
lowestFreq: 0,
}
}
// O(1)
func (c *LFU) Get(key int) int {
if e, ok := c.cache[key]; ok {
val := e.Value.(*entry).val
c.updateEntry(e, val)
return val
}
return -1
}
// O(1)
func (c *LFU) Put(key int, value int) {
if e, ok := c.cache[key]; ok {
c.updateEntry(e, value)
} else {
if len(c.cache) == c.cap {
c.evict()
}
if c.freqQueue[0] == nil {
c.freqQueue[0] = list.New()
}
e := c.freqQueue[0].PushFront(&entry{key, value, 0})
c.cache[key] = e
c.lowestFreq = 0
}
}
func (c *LFU) updateEntry(e *list.Element, val int) {
key := e.Value.(*entry).key
curFreq := e.Value.(*entry).freq
c.freqQueue[curFreq].Remove(e)
delete(c.cache, key)
nextFreq := curFreq + 1
if nextFreq > c.maxFreq {
nextFreq = c.maxFreq
}
if c.lowestFreq == curFreq && c.freqQueue[curFreq].Len() == 0 {
c.lowestFreq = nextFreq
}
if c.freqQueue[nextFreq] == nil {
c.freqQueue[nextFreq] = list.New()
}
newE := c.freqQueue[nextFreq].PushFront(&entry{key, val, nextFreq})
c.cache[key] = newE
}
func (c *LFU) evict() {
back := c.freqQueue[c.lowestFreq].Back()
delete(c.cache, back.Value.(*entry).key)
c.freqQueue[c.lowestFreq].Remove(back)
}