I'm trying to implement a simplified in-memory cached "table" with two types of indexes: primary and secondary.
The primary index maps a single key (the primary key) to a unique value (the Map interface fits this).
A secondary index maps a single key to a Collection of values (a Multimap fits the bill).
This is very similar to a table in the RDBMS world with several lookup columns: sometimes you want to search by PK, sometimes you want a list of rows that share a common property. Right now there is no need for operations other than equality (=), i.e. no range queries or pattern matching.
Add cache semantics to the above data structure (eviction, data population/cache loader, refresh, etc.) and that's pretty much what is needed.
I would like to ask your advice on the best approach to this problem. Should it be a Cache per index, or a Cache (for the PK) plus a (synchronized) Multimap for secondary indexes?
Any help is much appreciated.
Regards.
You can replace a Map with a Guava com.google.common.cache.Cache. It doesn't support Multimap-style semantics, so you'd have to use
Cache<K, ? extends List<V>>
in that case.
For the sake of simplicity I would make the 'primary index' a subset of the secondary index, i.e. you have a single index that returns a list of values for a given key, and primary keys just return a list with a single value.
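A minimal sketch of that single-index approach, assuming Guava's CacheBuilder/CacheLoader; Row and loadRows(key) are placeholders for your row type and backing-store query:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.collect.ImmutableList;
import java.util.concurrent.TimeUnit;

// One cache keyed by the indexed column value; a primary-key lookup
// simply yields a one-element list.
LoadingCache<String, ImmutableList<Row>> index = CacheBuilder.newBuilder()
        .maximumSize(10_000)
        .refreshAfterWrite(5, TimeUnit.MINUTES)
        .build(new CacheLoader<String, ImmutableList<Row>>() {
            @Override
            public ImmutableList<Row> load(String key) {
                return ImmutableList.copyOf(loadRows(key));
            }
        });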
The challenge here is to maintain the integrity of the two indexes regardless of whether you use two caches or one cache for the PK plus a multimap.
Maybe you should create a new cache class (say TableCache). Since com.google.common.cache.Cache is an interface, this class can extend Guava's ForwardingCache and internally maintain a multimap instance variable for the secondary index (which can be backed by a ConcurrentHashMap).
Then you can override the Cache methods (put, get, invalidate, etc.) to keep the secondary index in sync.
Of course, you have to provide a get function to retrieve values based on the secondary index.
This approach gives you the ability to maintain the integrity of the primary and secondary indexes.
public class TableCache<K, V> extends ForwardingCache<K, V> {
    private final Cache<K, V> delegate = CacheBuilder.newBuilder().build();
    private final Map<K, List<V>> secondaryIndex = new ConcurrentHashMap<>();

    @Override
    protected Cache<K, V> delegate() { return delegate; }

    @Override
    public void put(K key, V value) {
        super.put(key, value);
        // Update secondaryIndex under the secondary key derived from value
    }
}
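The secondary-index get mentioned above could then be exposed along these lines (a sketch; it assumes put keeps secondaryIndex populated):

// Sketch: return all values that share a secondary key, or an
// empty list when the secondary key is unknown.
public List<V> getBySecondaryKey(K secondaryKey) {
    List<V> values = secondaryIndex.get(secondaryKey);
    if (values == null) {
        return Collections.emptyList();
    }
    return values;
}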
I have had this problem many times myself.
What would fix this problem is if Java had better STM support. It is very difficult to make non-blocking atomic data structures. The best I have seen is Multiverse.
Thus @vladimir's answer is probably the best, but I would say that the stored collections should be immutable, and you will have to retrieve the whole collection on refresh/cache miss, etc. Also, if you change one of the members of the multimap, you're going to have a tough time knowing how to update its parent and invalidate the cache.
Otherwise I would consider something like Redis for larger data sets, which does support atomic operations on combinations of maps and lists.
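To illustrate the immutable, copy-on-write style this implies, a sketch assuming Guava's Cache with ImmutableList values (Row and newRow are placeholders, and compute on the asMap() view needs a reasonably recent Guava):

Cache<String, ImmutableList<Row>> cache = CacheBuilder.newBuilder().build();
ConcurrentMap<String, ImmutableList<Row>> view = cache.asMap();

// Never mutate the stored list; build a new one and swap it in atomically.
view.compute("someKey", (k, old) -> old == null
        ? ImmutableList.of(newRow)
        : ImmutableList.<Row>builder().addAll(old).add(newRow).build());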
I would like to use a collection of "key - value" pairs:
Which is the basis for a JTable data model (a TableModel implementation), so I need to access elements by their index/position (e.g. to implement Object getValueAt(int rowIndex, int columnIndex)). The collection must preserve the order of the elements. An ArrayList would be good for that.
Which allows retrieving elements by the value of the key. A HashMap would be good for that.
The closest usable collection I've found is LinkedHashMap. It preserves the order and allows retrieval by key. However, access to an item by index/position is not possible; I must iterate over the items until the right one is found. This is not time-efficient.
Is there a better one than this one?
Thanks.
(The question is similar to this one, but the solution provided uses the toArray() conversion, which is not time-efficient: if the set of pairs changes, the conversion needs to be done again.)
There is no such collection implementation in the JRE.
But you can easily overcome this issue by using a Map<K,V> for storing the key-value pairs and an additional List<K> to store the keys in sequential order. You can then access a value either by key or by index using: keyValueMap.get(keys.get(index)).
You must make sure that modifications are always synchronized on both collections:
Add/Change an entry: if (keyValueMap.put(key, value) == null) keys.add(key)
Remove an entry: if (keyValueMap.remove(key) != null) keys.remove(key)
Note that this implementation assumes that values are never null. In case null values are required too, the code gets slightly more complex as we have to check for existence of an entry using keyValueMap.contains(key).
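A minimal sketch of such a wrapper (assuming non-null values, as noted above; the class name IndexedMap is just illustrative):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexedMap<K, V> {
    private final Map<K, V> keyValueMap = new HashMap<>();
    private final List<K> keys = new ArrayList<>();

    // Adds a new entry or changes an existing one, keeping both views in sync.
    public void put(K key, V value) {
        if (keyValueMap.put(key, value) == null) {
            keys.add(key); // key was new: remember its position
        }
    }

    public V getByKey(K key) {
        return keyValueMap.get(key);
    }

    public V getByIndex(int index) {
        return keyValueMap.get(keys.get(index));
    }

    public void remove(K key) {
        if (keyValueMap.remove(key) != null) {
            keys.remove(key); // note: List.remove(Object) is O(n)
        }
    }

    public int size() {
        return keys.size();
    }
}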
Without implementing your own collection, it might be easiest to just have two collections, with the key-value reversed in the second one but otherwise ordered the same.
(wanted this to be a comment, but not enough rep yet)
Apache Commons Collections has a ListOrderedMap class that does what you want.
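For example (a sketch against commons-collections4; ListOrderedMap preserves insertion order and also exposes index-based access):

import org.apache.commons.collections4.map.ListOrderedMap;

ListOrderedMap<String, Integer> map = new ListOrderedMap<>();
map.put("a", 1);
map.put("b", 2);

Integer byKey = map.get("a");        // lookup by key: 1
String keyAt1 = map.get(1);          // key at index 1: "b"
Integer valueAt1 = map.getValue(1);  // value at index 1: 2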
Suppose I wish to check a HashMap entry and then replace it:
if( check( hashMap.get(key) ) ) {
hashMap.put(key, newValue);
}
this will cause the search procedure inside the HashMap to run twice: once for the get and again for the put. This looks inefficient. Is it possible to modify the value of an already-found entry of the Map?
UPDATE
I know I can make a wrapper, and I know there are problems with mutating an entry. But the question is WHY? Maybe HashMap remembers the last search to improve a repeated one? Why are there no methods for such an operation?
EDIT: I've just discovered that you can modify the entry, via Map.Entry.setValue (and the HashMap implementation is mutable). It's a pain to get the entry for a particular key though, and I can't remember ever seeing anyone do this. You can get a set of the entries, but you can't get the entry for a single key, as far as I can tell.
There's one evil way of doing it - declare your own subclass of HashMap within the java.util package, and create a public method which just delegates to the package-private existing method:
package java.util;
// Please don't actually do this...
public class BadMap<K, V> extends HashMap<K, V> {
public Map.Entry<K, V> getEntryPublic(K key) {
return getEntry(key);
}
}
That's pretty nasty though.
You wouldn't normally modify the entry - but of course you can change data within the value, if that's a mutable type.
I very much doubt that this is actually a performance bottleneck though, unless you're doing this a heck of a lot. You should profile your application to prove to yourself that this is a real problem before you start trying to fine-tune something which is probably not an issue.
If it does turn out to be an issue, you could change (say) a Map<Integer, String> into a Map<Integer, AtomicReference<String>> and use the AtomicReference<T> as a simple mutable wrapper type.
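A sketch of that wrapper approach (check(...) stands in for the test from the question); the mutation happens inside the value, so only one map lookup is needed:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

Map<Integer, AtomicReference<String>> map = new HashMap<>();
map.put(1, new AtomicReference<>("oldValue"));

// Single lookup; check and replace through the mutable wrapper.
AtomicReference<String> ref = map.get(1);
if (ref != null && check(ref.get())) {
    ref.set("newValue");
}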
Too much information for a comment on your question. Check the documentation for HashMap:
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
Constant time means that the get and put operations always take the same amount of time [O(1)]. The total time required then grows linearly with how many times you need to loop [O(n)].
You can change the entry if it is mutable. One example of where you might do this is:

private final Map<String, List<String>> map = new LinkedHashMap<>();

public void put(String key, String value) {
    List<String> list = map.get(key);
    if (list == null)
        map.put(key, list = new ArrayList<>()); // create the mutable value on first use
    list.add(value); // mutate the value in place; no second lookup of the key
}
This allows you to update a value, but you can't find and replace a value in one operation.
Take a look at Trove (http://trove4j.sourceforge.net/); their maps have several methods that might be what you want:
adjustOrPut
putIfAbsent
I don't know how this is implemented internally, but I would guess that since Trove is built to be highly performant, there will be only one lookup.
I have an Android app in which I use a HashMap to store container objects. During the course of the app, the data structure is accessed continuously.
However, about half the time the reference used is not the key in the map but another variable from the object, so I end up looping over the structure again and again.
Is there an efficient way to have a data structure indexed on two keys in Java?
Why not two maps with different keys, but that both refer to the same values?
Manage two maps, where two sets of keys map to the same underlying set of objects. Wrap them in a class that has methods similar to a normal map, but internally searches on both keys, and synchronizes additions and deletions.
This is efficient because each manipulation costs, in the worst case, only a small constant factor more than managing a single map.
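A minimal sketch of such a wrapper (DualKeyMap is an illustrative name; put takes both keys so the two indexes stay in sync):

import java.util.HashMap;
import java.util.Map;

public class DualKeyMap<K1, K2, V> {
    private final Map<K1, V> byPrimary = new HashMap<>();
    private final Map<K2, V> bySecondary = new HashMap<>();

    // Keep both maps in sync; synchronized so concurrent callers
    // never observe one index updated without the other.
    public synchronized void put(K1 k1, K2 k2, V value) {
        byPrimary.put(k1, value);
        bySecondary.put(k2, value);
    }

    public synchronized V getByPrimary(K1 k1) { return byPrimary.get(k1); }

    public synchronized V getBySecondary(K2 k2) { return bySecondary.get(k2); }

    public synchronized void remove(K1 k1, K2 k2) {
        byPrimary.remove(k1);
        bySecondary.remove(k2);
    }
}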
I'd create a key object that combines the two variables.
You could use one map with both keys:
Map<Object, Person> personMap = new HashMap<Object, Person>();
Person person = ...;
personMap.put(person.getName(), person);
personMap.put(person.getSSN(), person);
Then you can retrieve by either key. This of course assumes that there are no collisions in your key usage. If your two keys are different class types, then this is safe to do. If your keys are the same type (String, for example), the keys could collide, and you may not want to use this single-map approach.
Follow-up: This approach does suffer from losing type safety, but it only impacts put(K, V) and putAll(Map<? extends K, ? extends V>), as get(Object) and containsKey(Object) always accept Object.
So with this limitation I'd wrap this single map or go with the two map solution (also wrapped).
I am looking to implement a timestamp-based cache with multiple keys. What data structure other than hash tables could I use? Any suggestions...
For two values a pair could be used; Java (un)fortunately doesn't have a Pair class.
If it has to be a triplet or quartet, what architecture is advised? Or just naming the best-practice data structure to use would also be sufficient...
Assuming that you want to retrieve the cached value only when given all of the keys, you can simply make a CacheKey object. Maps/hash tables are still a decent candidate here:
map.put(new CacheKey(keyA, keyB, keyC), value);
map.get(new CacheKey(keyA, keyB, keyC));
//etc...
Just make sure to properly implement equals() and hashCode() in the CacheKey class.
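A sketch of such a key class (field names are illustrative; the important part is that equals() and hashCode() cover every component):

import java.util.Objects;

public final class CacheKey {
    private final String keyA;
    private final String keyB;
    private final String keyC;

    public CacheKey(String keyA, String keyB, String keyC) {
        this.keyA = keyA;
        this.keyB = keyB;
        this.keyC = keyC;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CacheKey)) return false;
        CacheKey other = (CacheKey) o;
        return Objects.equals(keyA, other.keyA)
                && Objects.equals(keyB, other.keyB)
                && Objects.equals(keyC, other.keyC);
    }

    @Override
    public int hashCode() {
        return Objects.hash(keyA, keyB, keyC);
    }
}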
However, if you intend to heavily use this map or hashtable as a cache, you should seriously consider re-using an existing caching library, unless you want to deal with things like limiting the number of entries stored in the map, choosing which entries get evicted when you reach the limit, etc. EhCache is incredibly simple to use and has many configuration options: caches can have a maximum number of entries or a maximum memory size, caches can overflow to disk, etc.
Make a hashtable where the value is a reference to the object, so that you don't have to store the object multiple times if it has multiple keys.
Luckily, this is the default in Java.
Just use a plain MultiKeyMap, or even decorate an LRUMap:
MultiKeyMap cache = MultiKeyMap.decorate(new LRUMap());
cache.put(keyA, keyB, value);
Object cached = cache.get(keyA, keyB); // look up with both keys
I need a java data structure/solution that meets these requirements. What best fits these?
1) Objects' insertion order must be kept
2) Objects must be unique (these are database objects that are uniquely identified by a UUID).
3) If a newer object with the same ID is added, the older version of the object should be over-written/removed
4) The solution should be accessible by many threads.
5) When the first object added to the structure is read/used, it should be removed from the data structure
There are a couple of possibilities here. The simplest might be to start with a LinkedHashSet. That will provide you with the uniqueness and predictable ordering that you require. Then, you could wrap the resulting set to make it thread-safe:
Set<T> s = Collections.synchronizedSet(new LinkedHashSet<T>(...));
Note: Since a Set doesn't really define a method for retrieving items from it, your code would have to manually invoke Set.remove(Object).
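For example, delete-on-read of the oldest element might look like this (a sketch; per the Collections.synchronizedSet contract, iteration must happen inside a synchronized block):

import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

Set<String> s = Collections.synchronizedSet(new LinkedHashSet<String>());

// Read and remove the first (oldest) element, if any.
String first = null;
synchronized (s) {
    Iterator<String> it = s.iterator();
    if (it.hasNext()) {
        first = it.next();
        it.remove(); // delete-on-read
    }
}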
Alternatively, you could wrap a LinkedHashMap, which does provide a hook for the delete-on-read semantics you require:
class DeleteOnReadMap<K, V> implements Map<K, V> {

    private Map<K, V> m = new LinkedHashMap<K, V>();

    // implement Map "read" methods with delete-on-read semantics
    public V get(Object key) {
        return m.remove(key); // reading an entry removes it
    }
    // (other read methods here)

    // implement remaining Map methods by forwarding to the inner Map
    public V put(K key, V value) {
        return m.put(key, value);
    }
    // (remaining Map methods here)
}
Finally, wrap an instance of your custom Map to make it thread-safe:
Map<K, V> m = Collections.synchronizedMap(new DeleteOnReadMap<K, V>(...));
My thought is something like the following:
Collections.synchronizedMap(new LinkedHashMap<K, V>());
I think that takes care of everything except requirement 5, but you can do that by using the remove() method instead of get().
This won't be quite as efficient as a ConcurrentMap would be - synchronization locks the entire map on every access - but I think ConcurrentMap implementations can use read-write locks and selective locking on only part of the map to allow multiple non-conflicting accesses to go on simultaneously. If you wanted, you could probably get better performance by writing your own subclass of some existing Map implementation.
1) Objects' insertion order must be kept
This is any "normal" data structure: an array, an ArrayList, a tree. So avoid self-balancing or self-sorting data structures: heaps, hash tables, or move-to-front trees (splay trees, for example). Then again, you could use one of those structures, but then you would have to keep track of the insertion order in each node.
2) Objects must be unique (these are database objects that are uniquely identified by a UUID).
Keep a unique identifier associated with each object. If this were a C program, the pointer to the node would be unique (I guess object references play this role in Java). If the node's pointer is not sufficient to maintain "uniqueness", then you need to add a field to each node which you guarantee to have a unique value.
3) If a newer object with the same ID is added, the older version of the object should be over-written/removed
Where do you want to place the node? Do you want to replace the existing node? Or do you want to delete the old node and then add the new one to the end? This is important because it is related to your requirement #1, where the order of insertion must be preserved.
4) The solution should be accessible by many threads.
The only way I can think of to do this is to implement some sort of locking. Java lets you wrap structures and code within a synchronized block.
5) When the first object added to the structure is read/used, it should be removed from the data structure
Kinda like a "dequeue" operation.
Seems like an ArrayList is a pretty good option for this: simply because of #5. The only problem is that searches are linear. But if you have a relatively small amount of data, then it isn't really that much of a problem.
Otherwise, like others have said: a HashMap or even a Tree of some sort would work - but that will depend on the frequency of accesses. (For example, if the "most recent" element is most likely to be accessed, I'd use a linear structure. But if accesses will be of "random" elements, I'd go with a HashMap or Tree.)
The solutions talking about LinkedHashSet would be a good starting point.
However, you would have to override the equals and hashCode methods on the objects that you are going to be putting in the set in order to satisfy requirement number 3. Note also that Set.add will not replace an existing equal element, so to over-write you would need to remove the old object before adding the new one.
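A sketch of what that UUID-based equality might look like (DatabaseObject is an illustrative name):

import java.util.Objects;
import java.util.UUID;

public class DatabaseObject {
    private final UUID id;

    public DatabaseObject(UUID id) { this.id = id; }

    // Two objects are "the same" exactly when their UUIDs match.
    @Override
    public boolean equals(Object o) {
        return o instanceof DatabaseObject
                && Objects.equals(id, ((DatabaseObject) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}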
Sounds like you have to create your own data structure, but it sounds like a pretty easy class assignment.
Basically you start with anything like an array or a Stack, but then you have to extend it for the rest of the functionality.
You can look at the contains method, as you will need that.