Cache that has access to all existing items - java

I have a system where objects (for the purposes of this question they are immutable) are created based on a request object (which could be as simple as a URL or a long). They are created with a factory method, not with new.
If an object for a request already exists, it is more efficient to hand out a reference to the existing instance than to create a new object.
To that end I have created a class called UniversalCache<K, V>, for lack of a better name at this time. It has an LruCache so that a fixed number of strong references are kept, and a HashMap<K, SoftReference<V>> to keep track of all the objects that might still be kept alive via other strong references in the system (I'm not relying on a SoftReference to keep the objects from being GC'd).
When a new object is created that is not already in the cache, it is added to the cache along with its key. To search for it in the cache I use the key to get the reference and check whether it still refers to an object.
The problem I'm having is how to remove these key/reference pairs once the objects get garbage collected. I don't want to scan the whole HashMap for references whose get() returns null. Since the referent is not always available, I can't use it to obtain or generate a key. So I'm extending SoftReference to store the key and use it to remove the pair from the HashMap: KeyedSoftReference<K, Rt> has an additional field of the same type K as the cache's key (and Rt ends up being the same as V). Is this a good idea?
In particular I'd like advice on where to handle the ReferenceQueue (at the moment it's in get; see the sketch after the code) and how to cast the object I get from ReferenceQueue.poll().
This is the code that I have up to now:
package com.frozenkoi.oss;

import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;

import android.util.LruCache;

public class UniversalCache<K, V> {
    private final LruCache<K, V> mStrongCache;
    private final HashMap<K, KeyedSoftReference<K, V>> mSoftCache;
    private final ReferenceQueue<V> mRefQueue;

    private static class KeyedSoftReference<K, Rt> extends SoftReference<Rt> {
        private final K mKey;

        public KeyedSoftReference(K key, Rt r, ReferenceQueue<? super Rt> q) {
            super(r, q);
            mKey = key;
        }

        public K getKey() {
            return mKey;
        }
    }

    public UniversalCache(int strongCacheMaxItemCount) {
        mStrongCache = new LruCache<K, V>(strongCacheMaxItemCount);
        mSoftCache = new HashMap<K, KeyedSoftReference<K, V>>();
        mRefQueue = new ReferenceQueue<V>();
    }

    private void solidify(K key, V value) {
        mStrongCache.put(key, value);
    }

    public void put(K key, V value) {
        solidify(key, value);
        mSoftCache.put(key, new KeyedSoftReference<K, V>(key, value, mRefQueue));
    }

    @SuppressWarnings("unchecked") // only KeyedSoftReference<K, V> is ever enqueued on mRefQueue
    public V get(K key) {
        // If it's in the strong container, it must also be in the soft one,
        // so checking just the soft one is enough.
        KeyedSoftReference<K, ? extends V> tempRef = mSoftCache.get(key);
        final V tempVal = (null != tempRef) ? tempRef.get() : null;
        V retVal = null;
        if (null == tempVal) {
            // Referent already collected: drop the stale entry.
            mSoftCache.remove(key);
        } else {
            // Found via the soft reference: promote it back into the LRU (strong) cache.
            solidify(key, tempVal);
            retVal = tempVal;
        }
        // Remove expired entries.
        while (null != (tempRef = (KeyedSoftReference<K, V>) mRefQueue.poll())) {
            // The key travels with the reference, so no referent is needed.
            K tempKey = tempRef.getKey();
            mSoftCache.remove(tempKey);
        }
        return retVal;
    }
}
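One way to answer the "where" question, as a sketch: factor the queue draining into its own private method and call it at the top of both put() and get(), so stale entries are purged on every cache interaction instead of only on reads. This also confines the unchecked cast to a single place (the cast is safe under the assumption, true for the class above, that only KeyedSoftReference instances are ever registered with mRefQueue):

    @SuppressWarnings("unchecked") // only KeyedSoftReference<K, V> is ever enqueued on mRefQueue
    private void processQueue() {
        KeyedSoftReference<K, V> ref;
        while (null != (ref = (KeyedSoftReference<K, V>) mRefQueue.poll())) {
            mSoftCache.remove(ref.getKey());
        }
    }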

Related

Synchronize a method while achieving better performance?

I have a class that is being called by multiple threads on a multi-core machine. I want to make it thread-safe.
The add method will be called by multiple threads. If the key exists, it should add the given value to the current one; otherwise it should just put the key and value in the map.
Now, to make it thread-safe, I was planning to synchronize the add method, but that would destroy performance. Is there any better way to achieve better performance without synchronizing the add method?
class Test {
    private final Map<Integer, Integer> map = new ConcurrentHashMap<>();

    public void add(int key, int value) {
        if (map.containsKey(key)) {
            int val = map.get(key);
            map.put(key, val + value);
            return;
        }
        map.put(key, value);
    }

    public Object getResult() {
        return map.toString();
    }
}
but it will destroy performance
It likely wouldn't destroy performance. It will reduce it some, with further reduction if there is a high collision rate.
Is there any better way by which we can achieve better performance?
Yes, use merge() (Java 8+). Quoting the javadoc:
If the specified key is not already associated with a value or is associated with null, associates it with the given non-null value. Otherwise, replaces the associated value with the results of the given remapping function, or removes if the result is null.
Example:
public void add(int key, int value) {
    map.merge(key, value, (a, b) -> a + b);
}
Or using a method reference to Integer.sum(int a, int b) instead of a lambda expression:
public void add(int key, int value) {
    map.merge(key, value, Integer::sum);
}
Use merge:
class Test {
    final Map<Integer, Integer> map = new ConcurrentHashMap<>();

    public void add(int key, int value) {
        map.merge(key, value, Integer::sum);
    }

    public Object getResult() {
        return map.toString();
    }
}
Java 7 solution if you absolutely can't use synchronized (or, you absolutely cannot lock explicitly):
class Test {
    final Map<Integer, AtomicInteger> map = new ConcurrentHashMap<>();

    public void add(int key, int value) {
        get(key).addAndGet(value);
    }

    private AtomicInteger get(int key) {
        AtomicInteger current = map.get(key);
        if (current == null) {
            AtomicInteger ai = new AtomicInteger();
            // putIfAbsent returns the existing counter if another thread won the race
            current = map.putIfAbsent(key, ai);
            if (current == null) {
                current = ai;
            }
        }
        return current;
    }

    public Object getResult() {
        return map.toString();
    }
}
synchronized causes a bottleneck only when you run an expensive operation holding a lock.
In your case, by adding synchronized you are guarding:
1. a check of the hashmap for the existence of a key
2. a get of the value mapped to that key
3. an addition, putting the result back into the hashmap.
All of these operations are cheap, O(1), and unless you are using some strange pattern for the keys (which are integers), it is very unlikely that you will see degenerate performance due to collisions.
If you can't use merge as the other answers point out, I would suggest just synchronizing, as sketched below. You should only be this concerned about performance in critical hot paths, and only after you have actually profiled that there is an issue there.
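For scale, here is what the synchronized variant looks like (a sketch; once every access goes through the lock, a plain HashMap suffices):

class Test {
    private final Map<Integer, Integer> map = new HashMap<>();

    public synchronized void add(int key, int value) {
        // merge() is available on plain Map since Java 8
        map.merge(key, value, Integer::sum);
    }

    public synchronized Object getResult() {
        return map.toString();
    }
}

Note that getResult() is synchronized too, so readers always see a consistent snapshot under the same lock.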

Infinite Loop in Hazelcast IMap for compute method

I'm trying to use the Set interface as the value type for a Hazelcast IMap instance, and when I run my test I found that it hangs inside the ConcurrentMap#compute method.
Why do I get an infinite loop when I use a Hazelcast IMap in this code:
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.IMap;
import java.io.Serializable;
import java.util.*;
public class Main {
    public static void main(String[] args) {
        IMap<String, HashSet<StringWrapper>> store = Hazelcast.newHazelcastInstance(
                new Config().addMapConfig(new MapConfig("store"))
        ).getMap("store");
        store.compute("user", (k, value) -> {
            HashSet<StringWrapper> newValues = Objects.isNull(value) ? new HashSet<>() : new HashSet<>(value);
            newValues.add(new StringWrapper("user"));
            return newValues;
        });
        store.compute("user", (k, value) -> {
            HashSet<StringWrapper> newValues = Objects.isNull(value) ? new HashSet<>() : new HashSet<>(value);
            newValues.add(new StringWrapper("user"));
            return newValues;
        });
        System.out.println(store.keySet());
    }

    // Data class
    public static class StringWrapper implements Serializable {
        String value;

        public StringWrapper() {}

        public StringWrapper(String value) {
            this.value = value;
        }

        public String getValue() {
            return value;
        }

        public void setValue(String value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            if (!super.equals(o)) return false;
            StringWrapper value = (StringWrapper) o;
            return Objects.equals(this.value, value.value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(super.hashCode(), value);
        }
    }
}
Hazelcast: 3.9.3
Java: build 1.8.0_161-b12
Operating system: macOS High Sierra 10.13.3
@Alykoff I reproduced the issue based on the above example & an ArrayList version, which is reported as a GitHub issue: https://github.com/hazelcast/hazelcast/issues/12557.
There are 2 separate problems:
1 - When using HashSet, the problem is how Java deserializes the HashSet/ArrayList (collections) & how the compute method works. Inside the compute method (since Hazelcast is compiled against Java 6, there is no compute method to override, so the default implementation from ConcurrentMap is called), this block causes the infinite loop:
// replace
if (replace(key, oldValue, newValue)) {
    // replaced as expected.
    return newValue;
}

// some other value replaced old value. try again.
oldValue = get(key);
This replace call goes to IMap's replace method. IMap checks whether the current value is equal to the user-supplied value. But because of a Java serialization optimization, the check fails. Please check the HashSet.readObject method. You'll see that when deserializing the HashSet, since the element size is known, it creates the inner HashMap with a capacity:
// Set the capacity according to the size and load factor ensuring that
// the HashMap is at least 25% full but clamping to maximum capacity.
capacity = (int) Math.min(size * Math.min(1 / loadFactor, 4.0f),
        HashMap.MAXIMUM_CAPACITY);
But your HashSet, created without an initial capacity, has the default capacity of 16, while the deserialized one has an initial capacity of 1. This changes the serialized form: index 51 of the byte array contains the current capacity, and the JDK apparently recalculates it from the size when deserializing the object, to minimize the size.
Please see below example:
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
IMap<String, Collection<String>> store = hz.getMap("store");
Collection<String> val = new HashSet<>();
val.add("a");
store.put("a", val);
Collection<String> oldVal = store.get("a");
byte[] dataOld = ((HazelcastInstanceProxy) hz).getSerializationService().toBytes(oldVal);
byte[] dataNew = ((HazelcastInstanceProxy) hz).getSerializationService().toBytes(val);
System.out.println(Arrays.equals(dataNew, dataOld));
This code prints false. But if you create the HashSet with the initial size 1, then both byte arrays are equal. And in your case, you won't get an infinite loop.
2 - When using ArrayList, or any other collection, there's another problem, which you pointed out above. Due to how the compute method is implemented in ConcurrentMap, when you assign the old value to newValue & add a new element, you actually modify oldValue, which makes the replace method fail. But when you change the code to new ArrayList(value), you create a new ArrayList & the value collection is not modified. It's a best practice to wrap a collection before using it if you don't want to modify the original one. The same works for HashSet if you create it with size 1, due to the first issue I explained.
So in your case, you should use
Collection<String> newValues = Objects.isNull(value) ? new HashSet<>(1) : new HashSet<>(value);
or
Collection<String> newValues = Objects.isNull(value) ? new ArrayList<>() : new ArrayList<>(value);
The HashSet case seems to be a JDK issue rather than an optimization. I don't see how either of these cases can be solved/fixed in Hazelcast, unless Hazelcast overrides the HashXXX collection serialization & overrides the compute method.

Why does TreeSet's add method behave differently in different JREs?

I am trying to add objects of an Employee class to a TreeSet. I don't implement the Comparable or Comparator interface. But the add method behaves differently on different systems. Why? Code snippet below:
import java.util.Set;
import java.util.TreeSet;
public class TreeSetTest {
    public static void main(String[] args) {
        Set<Employee> set = new TreeSet<Employee>();
        set.add(new Employee());
        // set.add(new Employee());
        // set.add(new Employee());
    }
}
On my current system (Win 10), whether I call set.add() once or three times, it always throws a ClassCastException at runtime.
But in this question - Why does TreeSet throws ClassCastException - the user wrote that he doesn't get the exception when he uses the add method only once.
Also, on another system (Win 7), I tried adding the object 3 times, calling the add method thrice, and there was no ClassCastException! The size of the set remained 1, so it appeared that the additional objects were just NOT getting added to the set.
So what could be the reason for these different kinds of behavior of the add method?
TreeSet.add() delegates to TreeMap.put(), which has differing behavior in Java 6 and Java 8.
Java 6:
public V put(K key, V value) {
    Entry<K,V> t = root;
    if (t == null) {
        // TBD:
        // 5045147: (coll) Adding null to an empty TreeSet should
        // throw NullPointerException
        //
        // compare(key, key); // type check
        root = new Entry<K,V>(key, value, null);
        size = 1;
        modCount++;
        return null;
    }
    ...
Java 8:
public V put(K key, V value) {
    Entry<K,V> t = root;
    if (t == null) {
        compare(key, key); // type (and possibly null) check
        root = new Entry<>(key, value, null);
        size = 1;
        modCount++;
        return null;
    }
    ...
As you can see, the earlier version had the compare() line commented out for some reason, but it was added back in the later version. Hence the exception you're seeing for the first element.
See here also: Why TreeSet can be used as a key for TreeMap in jdk 1.6?
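Either way, the fix is the same on every JRE: give the TreeSet an ordering, either by making Employee implement Comparable or by passing a Comparator. A sketch with a Comparator (it assumes a hypothetical Employee with an int id field, which the original snippet doesn't show):

import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

public class TreeSetTest {
    public static void main(String[] args) {
        // With a Comparator supplied, no ClassCastException can occur,
        // regardless of which JRE's TreeMap.put() is in use.
        Set<Employee> set = new TreeSet<>(Comparator.comparingInt(Employee::getId));
        set.add(new Employee(1));
        set.add(new Employee(1)); // equal by id, so the set keeps only one
        set.add(new Employee(2));
        System.out.println(set.size()); // 2
    }

    static class Employee {
        private final int id;
        Employee(int id) { this.id = id; }
        int getId() { return id; }
    }
}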

Bidirectional multimap equivalent data structure

I know that Guava has a BiMultimap class internally but didn't outsource the code. I need a data structure which is bi-directional, i.e. lookup by key and by value and also accepts duplicates.
i.e. something like this: (in my case, values are unique, but two values can point to the same key)
0 <-> 5
1 <-> 10
2 <-> 7
2 <-> 8
3 <-> 11
I want to be able to get(7) -> returning 2 and get(2) returning [7, 8].
Is there another library out there which has a data structure I can make use of?
If not, what do you suggest is the better option to handle this case? Is keeping two Multimaps in memory, one with <Key, Value> and the other with <Value, Key>, a bad practice?
P.S.: I have read this question: Bidirectional multi-valued map in Java, but considering it is dated 2011, I thought I'd open a more recent question.
What do you mean by
Guava has a BiMultimap class internally but didn't outsource the code
The code of an implementation is here.
I didn't check if this is a working implementation, nor if it made it into a release or if I'm just looking at some kind of snapshot. Everything is out in the open, so you should be able to get it.
From a quick glance at the source code it looks like the implementation does maintain two Multimaps, and this should be fine for the general case; a sketch of that two-map approach follows.
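If you do roll it yourself, a minimal sketch of the two-map approach (the class and method names here are illustrative, not Guava's; SetMultimap is Guava's interface whose get() returns a Set):

import com.google.common.collect.HashMultimap;
import com.google.common.collect.SetMultimap;
import java.util.Set;

public class TwoWayMultimap<K, V> {
    private final SetMultimap<K, V> forward = HashMultimap.create();
    private final SetMultimap<V, K> inverse = HashMultimap.create();

    public void put(K key, V value) {
        // Every write touches both directions, keeping them in sync.
        forward.put(key, value);
        inverse.put(value, key);
    }

    public Set<V> getByKey(K key) {
        return forward.get(key);
    }

    public Set<K> getByValue(V value) {
        return inverse.get(value);
    }
}

With the data from the question, getByValue(7) returns [2] and getByKey(2) returns [7, 8]. The main thing to keep consistent is removal, which also has to touch both maps.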
If you don't need the whole bunch of Guava HashBiMultimap functionality, but just getByKey() and getByValue() as you specified, I can suggest an approach where only one HashMultimap is used as the storage.
The idea is to treat the provided key and value symmetrically and put both of them into the storage map, each serving as a key and as a value.
For example: given multiMap.put(0, 5), the storage map should end up containing something like [[key:0, value:5], [key:5, value:0]].
Since we need our BiMultiMap to be generic, we also need to provide a wrapper class to be used as the storage map's type parameter.
Here is this wrapper class:
public class ObjectHolder {
    public static ObjectHolder newLeftHolder(Object object) {
        return new ObjectHolder(object, false);
    }

    public static ObjectHolder newRightHolder(Object object) {
        return new ObjectHolder(object, true);
    }

    private Object object;
    private boolean flag;

    private ObjectHolder(Object object, boolean flag) {
        this.object = object;
        this.flag = flag;
    }

    public Object getObject() {
        return object;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ObjectHolder)) return false;
        ObjectHolder that = (ObjectHolder) o;
        if (flag != that.flag) return false;
        if (!object.equals(that.object)) return false;
        return true;
    }

    @Override
    public int hashCode() {
        int result = object.hashCode();
        result = 31 * result + (flag ? 1 : 0);
        return result;
    }
}
And here is the MultiMap:
public class BiHashMultiMap<L, R> {
    private Map<ObjectHolder, Set<ObjectHolder>> storage;

    public BiHashMultiMap() {
        storage = new HashMap<ObjectHolder, Set<ObjectHolder>>();
    }

    public void put(L left, R right) {
        ObjectHolder leftObjectHolder = ObjectHolder.newLeftHolder(left);
        ObjectHolder rightObjectHolder = ObjectHolder.newRightHolder(right);
        put(leftObjectHolder, rightObjectHolder);
        put(rightObjectHolder, leftObjectHolder);
    }

    private void put(ObjectHolder key, ObjectHolder value) {
        if (!storage.containsKey(key)) {
            storage.put(key, new HashSet<ObjectHolder>());
        }
        storage.get(key).add(value);
    }

    public Set<R> getRight(L left) {
        return this.get(ObjectHolder.newLeftHolder(left));
    }

    public Set<L> getLeft(R right) {
        return this.get(ObjectHolder.newRightHolder(right));
    }

    @SuppressWarnings("unchecked") // callers guarantee the holders wrap the expected type
    private <V> Set<V> get(ObjectHolder key) {
        Set<ObjectHolder> values = storage.get(key);
        if (values == null || values.isEmpty()) {
            return null;
        }
        Set<V> result = new HashSet<V>();
        for (ObjectHolder value : values) {
            result.add((V) value.getObject());
        }
        return result;
    }
}
One thing that could seem strange is the left- and right-prefixed variables everywhere. Think of left as the original key that was put into the map, and right as the value.
Usage example:
BiHashMultiMap<Integer, Integer> multiMap = new BiHashMultiMap<Integer, Integer>();
multiMap.put(0,5);
multiMap.put(1,10);
multiMap.put(2,7);
multiMap.put(3,7);
multiMap.put(2,8);
multiMap.put(3,11);
Set<Integer> left10 = multiMap.getLeft(10); // [1]
Set<Integer> left7 = multiMap.getLeft(7); // [2, 3]
Set<Integer> right0 = multiMap.getRight(0); // [5]
Set<Integer> right3 = multiMap.getRight(3); // [7, 11]
So to get a left value we need to provide the right value as the key, and to get a right value we need to provide the left value as the key.
And of course, to make the map fully functional we need to provide other methods, like remove(), contains() and so on.

Create and put a map value only if not already present, and get it: thread-safe implementation

What is the best way to make this snippet thread-safe?
private static final Map<A, B> MAP = new HashMap<A, B>();

public static B putIfNeededAndGet(A key) {
    B value = MAP.get(key);
    if (value == null) {
        value = buildB(...);
        MAP.put(key, value);
    }
    return value;
}

private static B buildB(...) {
    // business, can be quite long
}
Here are the few solutions I could think of:
I could use a ConcurrentHashMap, but if I understood correctly, it only makes the individual put and get operations thread-safe, i.e. it does not ensure that the buildB() method is called only once for a given key.
I could use Collections.synchronizedMap(new HashMap<A, B>()), but I would have the same issue as with the first point.
I could make the whole putIfNeededAndGet() method synchronized, but I can have really many threads accessing this method together, so it could be quite expensive.
I could use the double-checked locking pattern, but there is still the related out-of-order-writes issue.
What other solutions may I have?
I know this is quite a common topic on the Web, but I haven't found a clear, full, working example yet.
Use a ConcurrentHashMap and the lazy-init pattern which you used:
public static B putIfNeededAndGet(A key) {
    B value = map.get(key);
    if (value == null) {
        value = buildB(...);
        // putIfAbsent returns the existing value if another thread won the race
        B oldValue = map.putIfAbsent(key, value);
        if (oldValue != null) {
            value = oldValue;
        }
    }
    return value;
}
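Note that this keeps the map consistent, but it does not stop two threads from both calling buildB() for the same key under contention; putIfAbsent only guarantees that a single result is kept. If buildB() must run at most once per key, see the Future-based approach further down.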
This might not be the answer you're looking for, but use the Guava CacheBuilder, it already does all that and more:
private static final LoadingCache<A, B> CACHE = CacheBuilder.newBuilder()
        .maximumSize(100) // if necessary
        .build(
            new CacheLoader<A, B>() {
                public B load(A key) {
                    return buildB(key);
                }
            });
You can also easily add timed expiration and other features as well.
This cache will ensure that load() (or in your case buildB) will not be called concurrently with the same key. If one thread is already building a B, then any other caller will just wait for that thread.
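Usage is then a single call; get() wraps checked exceptions from the loader in an ExecutionException, while getUnchecked() is the convenient variant when buildB() throws no checked exceptions:

B value = CACHE.getUnchecked(key);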
In the above solution it is possible that many threads will call buildB(...) simultaneously, and hence all of them will compute it. In my case I am using a Future, so only the single thread that sees the old value as null computes buildB; the rest wait on f.get().
private static final ConcurrentMap<A, Future<B>> map = new ConcurrentHashMap<A, Future<B>>();

public static B putIfNeededAndGet(final A key) {
    while (true) {
        Future<B> f = map.get(key);
        if (f == null) {
            Callable<B> eval = new Callable<B>() {
                public B call() throws InterruptedException {
                    return buildB(...);
                }
            };
            FutureTask<B> ft = new FutureTask<B>(eval);
            f = map.putIfAbsent(key, ft);
            if (f == null) {
                // we installed the task, so we are the one thread that runs it
                f = ft;
                ft.run();
            }
        }
        try {
            return f.get();
        } catch (CancellationException e) {
            // computation was cancelled: remove the stale entry and retry
            map.remove(key, f);
        } catch (ExecutionException e) {
            // rethrow instead of swallowing, which would retry forever
            throw new RuntimeException(e.getCause());
        }
    }
}
Thought maybe this will be useful for someone else as well: using Java 8 lambdas I created this function, which worked great for me:
private <T> T getOrCreate(Object key, Map<Object, T> map,
        Function<Object, T> creationFunction) {
    T value = map.get(key);
    // if the value doesn't exist yet - create and add it
    if (value == null) {
        value = creationFunction.apply(key);
        map.put(key, value);
    }
    return value;
}
then you can use it like this:
Object o = getOrCreate(key, map, s -> createSpecialObjectWithKey(key));
I created this for something specific, but changed the context and code to look more general; that is why my creationFunction has one parameter (it could also have none).
You can also generify it further by changing Object to a generic type; if that's not clear, let me know and I'll add another example.
UPDATE:
I just found out about Map.computeIfAbsent, which basically does the same; gotta love Java 8 :)
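A minimal sketch against the map from the earlier question (on a ConcurrentHashMap the call is atomic and concurrent callers for the same key block, so buildB() runs at most once per key; on a plain HashMap it is not thread-safe):

private static final ConcurrentMap<A, B> MAP = new ConcurrentHashMap<>();

public static B putIfNeededAndGet(A key) {
    return MAP.computeIfAbsent(key, k -> buildB(k));
}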
