High-performance Concurrent MultiMap Java/Scala

I am looking for a high-performance, concurrent MultiMap. I have searched everywhere, but I cannot find a solution that uses the same approach as ConcurrentHashMap (locking only a segment of the hash array).
The multimap will be read, added to and removed from often.
The multimap key will be a String and its value will be arbitrary.
I need O(1) to find all values for a given key; O(N) is OK for removal, but O(log N) would be preferred.
It is crucial that removing the last value for a given key also removes the container of values for that key, so as not to leak memory.
EDIT: HERE'S THE SOLUTION I BUILT, available under ApacheV2:
Index (multimap)

Why not wrap ConcurrentHashMap[T,ConcurrentLinkedQueue[U]] with some nice Scala-like methods (e.g. implicit conversion to Iterable or whatever it is that you need, and an update method)?
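A minimal sketch of what such a wrapper could look like on Java 8+, written in Java rather than Scala. The class name ConcurrentSetMultimap and its method names are made up for illustration, and it stores values in a concurrent Set instead of a ConcurrentLinkedQueue. It relies on ConcurrentHashMap.compute and computeIfPresent being atomic per key, so the empty container is dropped in the same step that removes the last value:

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper; a sketch, not the asker's Index class.
final class ConcurrentSetMultimap<K, V> {
    private final ConcurrentHashMap<K, Set<V>> map = new ConcurrentHashMap<>();

    // Adds value under key, creating the value set on first use.
    boolean put(K key, V value) {
        boolean[] added = new boolean[1];
        map.compute(key, (k, set) -> {
            if (set == null) {
                set = ConcurrentHashMap.newKeySet();
            }
            added[0] = set.add(value);
            return set;
        });
        return added[0];
    }

    // Removes value; returning null from the remapping function removes
    // the key, so an empty set is never left behind.
    boolean remove(K key, V value) {
        boolean[] removed = new boolean[1];
        map.computeIfPresent(key, (k, set) -> {
            removed[0] = set.remove(value);
            return set.isEmpty() ? null : set;
        });
        return removed[0];
    }

    // Read-only view of the values for key (empty if the key is absent).
    Set<V> get(K key) {
        Set<V> set = map.get(key);
        return set == null ? Collections.emptySet() : Collections.unmodifiableSet(set);
    }

    boolean containsKey(K key) {
        return map.containsKey(key);
    }
}
```

Reads go straight to the underlying map without locking; writes only contend with other writes that hit the same hash bin.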

Have you tried Google Collections? They have various Multimap implementations.

There is one in Akka, although I haven't used it.

I made a ConcurrentMultiMap mixin which extends the mutable.MultiMap mixin and has a concurrent.Map[A, Set[B]] self type. It locks per key, which has O(n) space complexity, but its time complexity is pretty good, if you aren't particularly write-heavy.

I had a requirement for a Map<Comparable, Set<Comparable>> where insertion into the Map, and into the corresponding Set, is concurrent, but once a key is consumed from the Map it has to be deleted. Think of it as a job running every two seconds that consumes the whole Set<Comparable> for a specific key, while insertion stays fully concurrent so that most values are buffered by the time the job kicks in. Here is my implementation:
Note: I use Guava's MapMaker and Sets helper classes to create the concurrent maps and sets; this solution also emulates Java Concurrency in Practice, Listing 5.19:
import com.google.common.collect.MapMaker;
import com.google.common.collect.Sets;
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
/**
* A general purpose Multimap implementation for delayed processing and concurrent insertion/deletes.
*
* @param <K> A comparable Key
* @param <V> A comparable Value
*/
public class ConcurrentMultiMap<K extends Comparable, V extends Comparable>
{
private final int size;
private final ConcurrentMap<K, Set<V>> cache;
private final ConcurrentMap<K, Object> locks;
public ConcurrentMultiMap()
{
this(32, 2);
}
public ConcurrentMultiMap(final int concurrencyLevel)
{
this(concurrencyLevel, 2);
}
public ConcurrentMultiMap(final int concurrencyLevel, final int factor)
{
size=concurrencyLevel * factor;
cache=new MapMaker().concurrencyLevel(concurrencyLevel).initialCapacity(concurrencyLevel).makeMap();
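// Caveat: weakKeys() makes the map compare keys by identity (==), so keys
// that are equal but not the same instance get different lock objects.
locks=new MapMaker().concurrencyLevel(concurrencyLevel).initialCapacity(concurrencyLevel).weakKeys().weakValues().makeMap();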
}
private Object getLock(final K key){
final Object object=new Object();
Object lock=locks.putIfAbsent(key, object);
if(lock == null){
lock=object;
}
return lock;
}
public void put(final K key, final V value)
{
synchronized(getLock(key)){
Set<V> set=cache.get(key);
if(set == null){
set=Sets.newHashSetWithExpectedSize(size);
cache.put(key, set);
}
set.add(value);
}
}
public void putAll(final K key, final Collection<V> values)
{
synchronized(getLock(key)){
Set<V> set=cache.get(key);
if(set == null){
set=Sets.newHashSetWithExpectedSize(size);
cache.put(key, set);
}
set.addAll(values);
}
}
public Set<V> remove(final K key)
{
synchronized(getLock(key)){
return cache.remove(key);
}
}
public Set<K> getKeySet()
{
return cache.keySet();
}
public int size()
{
return cache.size();
}
}

You should give Ctries a try. Here is the PDF.

I'm late to the discussion, yet...
When it comes to high-performance concurrent code, one should be prepared to write the solution oneself.
With concurrency, the saying "the devil is in the details" takes on its full meaning.
It's possible to implement the structure as fully concurrent and lock-free.
A starting point would be the NonBlocking Hashtable from http://sourceforge.net/projects/high-scale-lib/ and then, depending on how many values there are per key and how often you need to add/remove, either a copy-on-write Object[] for the values or an array-based Set guarded by a semaphore/spin lock.

I am a bit late on this topic but I think, nowadays, you can use Guava like this:
Multimaps.newSetMultimap(new ConcurrentHashMap<>(), ConcurrentHashMap::newKeySet)

Use Multimaps from Guava.
Multimaps.synchronizedMultimap(HashMultimap.create())

Have you taken a look at Javolution, which is intended for real-time use and, of course, high performance?

Related

How to wrap ConcurrentSkipListSet to keep a fixed capacity of the latest values in a thread-safe way?

I want to wrap ConcurrentSkipListSet to keep a fixed capacity of the latest (according to Comparator) values:
private int capacity = 100;
// using Integer just for an illustration
private ConcurrentSkipListSet<Integer> intSet = new ConcurrentSkipListSet<>();
Therefore, I implemented put() like this:
// This method should be atomic.
public void put(int value) {
intSet.add(value);
if (intSet.size() > capacity)
intSet.pollFirst();
}
However, this put() is not thread-safe.
Note: No other mutation methods. Of course, I need "read-only" methods like getLast() or getBefore(Integer value).
How to wrap ConcurrentSkipListSet to keep a fixed capacity of the latest values in a thread-safe way?

You're not likely to be able to do this and still get the concurrency benefits of ConcurrentSkipListSet. At that point, you might as well use Collections.synchronizedNavigableSet(TreeSet), which lets you write
synchronized (set) {
set.add(value);
if (set.size() > cap) {
set.pollFirst();
}
}
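The same idea can be sketched as a small wrapper class in which every method synchronizes on one lock, so the add-then-evict step is atomic with respect to the read-only methods. The class name BoundedLatestSet is an assumption for illustration; getLast() and getBefore() follow the names mentioned in the question, and natural Integer ordering stands in for the custom Comparator:

```java
import java.util.TreeSet;

// Hypothetical wrapper; all access goes through synchronized methods.
final class BoundedLatestSet {
    private final int capacity;
    private final TreeSet<Integer> set = new TreeSet<>();

    BoundedLatestSet(int capacity) {
        this.capacity = capacity;
    }

    // Atomic: add the value, then evict the smallest element if over capacity.
    synchronized void put(int value) {
        set.add(value);
        if (set.size() > capacity) {
            set.pollFirst();
        }
    }

    // Largest element, or null if the set is empty.
    synchronized Integer getLast() {
        return set.isEmpty() ? null : set.last();
    }

    // Greatest element strictly less than value, or null if there is none.
    synchronized Integer getBefore(Integer value) {
        return set.lower(value);
    }

    synchronized int size() {
        return set.size();
    }
}
```

Because the TreeSet is never exposed, callers cannot bypass the lock.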

TreeSet Comparator

I have a TreeSet and a custom comparator.
I get the values from the server according to changes in the stock.
For example: if time=0, the server sends all the entries in the stock (unsorted);
if time=200, the server sends the entries added or deleted after time 200 (unsorted).
On the client side I sort the entries. My question is which is more efficient:
1> fetch all entries first and then call the addAll method
or
2> add them one by one
There can be millions of entries.
Updated:
private static Map<Integer, KeywordInfo> hashMap = new HashMap<Integer, KeywordInfo>();
// Declared before sortedSet: static initializers run in textual order, and
// sortedSet's initializer may not reference comparator before its declaration.
private static final Comparator<Integer> comparator = new Comparator<Integer>() {
    public int compare(Integer o1, Integer o2) {
        int integerCompareValue = o1.compareTo(o2);
        if (integerCompareValue == 0) return integerCompareValue;
        KeywordInfo k1 = hashMap.get(o1);
        KeywordInfo k2 = hashMap.get(o2);
        // Entries without a keyword sort first; ties fall back to the id order.
        if (null == k1.getKeyword()) {
            if (null == k2.getKeyword())
                return integerCompareValue;
            else
                return -1;
        } else {
            if (null == k2.getKeyword())
                return 1;
            else {
                int compareString = AlphaNumericCmp.COMPARATOR.compare(k1.getKeyword().toLowerCase(), k2.getKeyword().toLowerCase());
                //int compareString = k1.getKeyword().compareTo(k2.getKeyword());
                if (compareString == 0)
                    return integerCompareValue;
                return compareString;
            }
        }
    }
};
private static Set<Integer> sortedSet = new TreeSet<Integer>(comparator);
Now there is an event handler which gives me an ArrayList of updated entries;
after adding them to my hashMap I call
final Map<Integer, KeywordInfo> mapToReturn = new SubMap<Integer, KeywordInfo>(sortedSet, hashMap);

I think your bottleneck is probably more network-related than CPU-related. A bulk operation fetching all the new entries at once would be more network-efficient.
With regard to CPU, the time required to populate a TreeSet does not differ significantly between multiple add() calls and a single addAll(). The reason is that, in the general case, TreeSet relies on AbstractCollection's addAll() (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b27/java/util/AbstractCollection.java#AbstractCollection.addAll%28java.util.Collection%29), which in turn creates an iterator and calls add() repeatedly. (The exception is adding a SortedSet with the same ordering to an empty TreeSet, which takes a linear-time bulk path.)
So, my advice on the CPU side is: choose the way that keeps your code cleaner and more readable. That is probably addAll().
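To make the comparison concrete, here is a small sketch of the two ways of filling a TreeSet from an unsorted list. The class and method names are hypothetical, and the comparator is passed in rather than being the asker's keyword comparator. Both methods produce the same set; for a plain List argument, addAll() ends up calling add() for each element:

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper comparing the two ways of populating a TreeSet.
final class TreeSetFillDemo {
    // Option 1: collect everything first, then one bulk call.
    static TreeSet<Integer> viaAddAll(List<Integer> entries, Comparator<Integer> cmp) {
        TreeSet<Integer> set = new TreeSet<>(cmp);
        set.addAll(entries);
        return set;
    }

    // Option 2: insert the entries one at a time.
    static TreeSet<Integer> viaAdd(List<Integer> entries, Comparator<Integer> cmp) {
        TreeSet<Integer> set = new TreeSet<>(cmp);
        for (Integer entry : entries) {
            set.add(entry);
        }
        return set;
    }
}
```

Timing both on realistic data volumes is still the only way to know which one suits a given workload.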

In general, there is less memory overhead when the data is stored as it is being loaded, rather than collected first and added afterwards. This should be time-efficient too, perhaps using small buffers; memory allocation costs time as well.
However, time both solutions in a separate prototype. You really have to test with huge numbers, as network traffic is costly too. This is a bit like test-driven development, and it gives QA both quantitative statistics and evidence that the implementation is correct.

The actual implementation is a linked list, so adding one by one will be faster if you do it right. I don't think this behaviour will change in the near future.
For your problem a stateful comparator may help.
// Snippet; untested and may not work as-is.
public class NaturalComparator<A> implements Comparator<A> {
    private boolean anarchy = false;
    private final Comparator<A> parentComparator;
    NaturalComparator(Comparator<A> parent) {
        this.parentComparator = parent;
    }
    public void setAnarchy(boolean anarchy) { this.anarchy = anarchy; }
    public int compare(A a, A b) {
        // While anarchy is on, each new element is treated as greater than
        // every existing one, so elements are simply appended at the end.
        if (anarchy) return 1;
        else return parentComparator.compare(a, b);
    }
}
...
NaturalComparator<Integer> naturalComparator = new NaturalComparator<Integer>(comparator);
Set<Integer> sortedSet = new TreeSet<Integer>(naturalComparator);
naturalComparator.setAnarchy(true);
sortedSet.addAll(sorted); // 'sorted' must already be in the parent comparator's order
naturalComparator.setAnarchy(false);

Lambdas and putIfAbsent

I posted an answer here where the code demonstrating use of the putIfAbsent method of ConcurrentMap read:
ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<String, AtomicLong> ();
public long addTo(String key, long value) {
// The final value it became.
long result = value;
// Make a new one to put in the map.
AtomicLong newValue = new AtomicLong(value);
// Insert my new one or get me the old one.
AtomicLong oldValue = map.putIfAbsent(key, newValue);
// Was it already there? putIfAbsent returns null when the key was absent.
if ( oldValue != null ) {
// Update it.
result = oldValue.addAndGet(value);
}
return result;
}
The main downside of this approach is that you have to create a new object to put into the map whether it will be used or not. This can have significant effect if the object is heavy.
It occurred to me that this would be an opportunity to use lambdas. I have not downloaded Java 8, nor will I be able to until it is official (company policy), so I cannot test this, but would something like this be valid and effective?
public long addTo(String key, long value) {
return map.putIfAbsent( key, () -> new AtomicLong(0) ).addAndGet(value);
}
I am hoping to use the lambda to delay the evaluation of the new AtomicLong(0) until it is actually determined that it should be created because it does not exist in the map.
As you can see this is much more succinct and functional.
Essentially I suppose my questions are:
Will this work?
Or have I completely misinterpreted lambdas?
Might something like this work one day?

UPDATE 2015-08-01
The computeIfAbsent method as described below has indeed been added to Java SE 8. The semantics appear to be very close to the pre-release version.
In addition, computeIfAbsent, along with a whole pile of new default methods, has been added to the Map interface. Of course, maps in general can't support atomic updates, but the new methods add considerable convenience to the API.
What you're trying to do is quite reasonable, but unfortunately it doesn't work with the current version of ConcurrentMap. An enhancement is on the way, however. The new version of the concurrency library includes ConcurrentHashMapV8 which contains a new method computeIfAbsent. This pretty much allows you to do exactly what you're looking to do. Using this new method, your example could be rewritten as follows:
public long addTo(String key, long value) {
return map.computeIfAbsent( key, k -> new AtomicLong(0) ).addAndGet(value);
}
For further information about the ConcurrentHashMapV8, see Doug Lea's initial announcement thread on the concurrency-interest mailing list. Several messages down the thread is a followup message that shows an example very similar to what you're trying to do. (Note however the old lambda syntax. That message was from August 2011 after all.) And here is recent javadoc for ConcurrentHashMapV8.
This work is intended to be integrated into Java 8, but it hasn't yet as far as I can see. Also, this is still a work in progress, names and specs may change, etc.
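For reference, a self-contained sketch against the released Java 8 API. The class name Counters is made up for illustration. In the released API the mapping function passed to computeIfAbsent receives the key as its argument, so the lambda takes one parameter:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder class for the map from the question.
final class Counters {
    private final ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<>();

    // Creates the AtomicLong only if the key is absent, then adds to it.
    long addTo(String key, long value) {
        return map.computeIfAbsent(key, k -> new AtomicLong(0)).addAndGet(value);
    }
}
```

The AtomicLong is only constructed when the key is missing, which is the lazy behaviour the question was after.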

AtomicLong is not really a heavy object. For heavier objects I would consider a lazy proxy and provide a lambda to that one to create the object if needed.
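import java.util.function.Supplier;
class MyObject {
    void doSomething() {}
}
// Lazy proxy: defers creating the real object until it is first used.
class MyLazyObject extends MyObject {
    private final Supplier<MyObject> create;
    private MyObject instance;
    MyLazyObject(Supplier<MyObject> create) {
        this.create = create;
    }
    // synchronized so that concurrent callers create the delegate only once
    synchronized MyObject getInstance() {
        if (instance == null)
            instance = create.get();
        return instance;
    }
    @Override void doSomething() { getInstance().doSomething(); }
}
// Assumes map is a ConcurrentMap<String, MyObject>.
public MyObject getOrCreate(String key) {
    MyObject lazy = new MyLazyObject(() -> new MyObject());
    MyObject existing = map.putIfAbsent(key, lazy);
    return existing != null ? existing : lazy;
}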

Unfortunately it's not as easy as that. There are two main problems with the approach you've sketched out:
1. The type of the map would need to change from Map<String, AtomicLong> to Map<String, AtomicLongFunction> (where AtomicLongFunction is some function interface that has a single method that takes no arguments and returns an AtomicLong).
2. When you retrieve the element from the map you'd need to apply the function each time to get the AtomicLong out of it. This would result in creating a new instance each time you retrieve it, which is not likely what you wanted.
The idea of having a map that runs a function on demand to fill up missing values is a good one, though, and in fact Google's Guava library has a map that does exactly that; see their MapMaker. In fact that code would benefit from Java 8 lambda expressions: instead of
ConcurrentMap<Key, Graph> graphs = new MapMaker()
.concurrencyLevel(4)
.weakKeys()
.makeComputingMap(
new Function<Key, Graph>() {
public Graph apply(Key key) {
return createExpensiveGraph(key);
}
});
you'd be able to write
ConcurrentMap<Key, Graph> graphs = new MapMaker()
.concurrencyLevel(4)
.weakKeys()
.makeComputingMap((Key key) -> createExpensiveGraph(key));
or
ConcurrentMap<Key, Graph> graphs = new MapMaker()
.concurrencyLevel(4)
.weakKeys()
.makeComputingMap(this::createExpensiveGraph);

Note that using Java 8 ConcurrentHashMap it's completely unnecessary to have AtomicLong values. You can safely use ConcurrentHashMap.merge:
ConcurrentMap<String, Long> map = new ConcurrentHashMap<String, Long>();
public long addTo(String key, long value) {
return map.merge(key, value, Long::sum);
}
It's much simpler and also significantly faster.
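A rough way to see that merge is safe under contention is to have several threads add to the same key and check the total. The sketch below assumes a made-up class name MergeCounter and a helper method hammer that exists only for this demonstration:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo class around the merge-based addTo.
final class MergeCounter {
    private final ConcurrentMap<String, Long> map = new ConcurrentHashMap<>();

    long addTo(String key, long value) {
        return map.merge(key, value, Long::sum);
    }

    // Starts `threads` threads that each add 1 to the same key `perThread`
    // times, waits for them, and returns the final total.
    static long hammer(int threads, int perThread) {
        MergeCounter counter = new MergeCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.addTo("hits", 1);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Adding 0 reads the current total without changing it.
        return counter.addTo("hits", 0);
    }
}
```

With ConcurrentHashMap, each merge call is atomic, so no increments are lost.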

Should you check if the map containsKey before using ConcurrentMap's putIfAbsent

I have been using Java's ConcurrentMap for a map that can be used from multiple threads. The putIfAbsent is a great method and is much easier to read/write than using standard map operations. I have some code that looks like this:
ConcurrentMap<String, Set<X>> map = new ConcurrentHashMap<String, Set<X>>();
// ...
map.putIfAbsent(name, new HashSet<X>());
map.get(name).add(Y);
Readability wise this is great but it does require creating a new HashSet every time even if it is already in the map. I could write this:
if (!map.containsKey(name)) {
map.putIfAbsent(name, new HashSet<X>());
}
map.get(name).add(Y);
With this change it loses a bit of readability but does not need to create the HashSet every time. Which is better in this case? I tend to side with the first one since it is more readable. The second would perform better and may be more correct. Maybe there is a better way to do this than either of these.
What is the best practice for using a putIfAbsent in this manner?

Concurrency is hard. If you are going to bother with concurrent maps instead of straightforward locking, you might as well go for it. Indeed, don't do lookups more than necessary.
Set<X> set = map.get(name);
if (set == null) {
final Set<X> value = new HashSet<X>();
set = map.putIfAbsent(name, value);
if (set == null) {
set = value;
}
}
(Usual stackoverflow disclaimer: Off the top of my head. Not tested. Not compiled. Etc.)
Update: 1.8 has added computeIfAbsent default method to ConcurrentMap (and Map which is kind of interesting because that implementation would be wrong for ConcurrentMap). (And 1.7 added the "diamond operator" <>.)
Set<X> set = map.computeIfAbsent(name, n -> new HashSet<>());
(Note, you are responsible for the thread-safety of any operations of the HashSets contained in the ConcurrentMap.)
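If the contained sets must also be safe to modify from several threads, one option on Java 8+ is to create them with ConcurrentHashMap.newKeySet() instead of new HashSet<>(). A minimal sketch, with a hypothetical class name NamedSets:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical wrapper: a concurrent map of concurrent sets.
final class NamedSets<X> {
    private final ConcurrentMap<String, Set<X>> map = new ConcurrentHashMap<>();

    // Creates the set on first use; both the lookup and the add are thread-safe.
    boolean add(String name, X value) {
        return map.computeIfAbsent(name, n -> ConcurrentHashMap.newKeySet()).add(value);
    }

    // Returns the set for name, or null if nothing was added under it.
    Set<X> get(String name) {
        return map.get(name);
    }
}
```

This avoids allocating a throwaway set on every call, since the factory only runs when the key is absent.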

Tom's answer is correct as far as API usage goes for ConcurrentMap. An alternative that avoids using putIfAbsent is to use the computing map from the GoogleCollections/Guava MapMaker, which auto-populates the values with a supplied function and handles all the thread-safety for you. It only creates one value per key, and if the create function is expensive, other threads asking for the same key will block until the value becomes available.
Edit: as of Guava 11, MapMaker is deprecated and is being replaced with the Cache/LocalCache/CacheBuilder stuff. This is a little more complicated in its usage but basically isomorphic.

You can use MutableMap.getIfAbsentPut(K, Function0<? extends V>) from Eclipse Collections (formerly GS Collections).
The advantage over calling get(), doing a null check, and then calling putIfAbsent() is that we'll only compute the key's hashCode once, and find the right spot in the hashtable once. In ConcurrentMaps like org.eclipse.collections.impl.map.mutable.ConcurrentHashMap, the implementation of getIfAbsentPut() is also thread-safe and atomic.
import org.eclipse.collections.impl.map.mutable.ConcurrentHashMap;
...
ConcurrentHashMap<String, MyObject> map = new ConcurrentHashMap<>();
map.getIfAbsentPut("key", () -> someExpensiveComputation());
The implementation of org.eclipse.collections.impl.map.mutable.ConcurrentHashMap is truly non-blocking. While every effort is made not to call the factory function unnecessarily, there's still a chance it will be called more than once during contention.
This fact sets it apart from Java 8's ConcurrentHashMap.computeIfAbsent(K, Function<? super K,? extends V>). The Javadoc for this method states:
The entire method invocation is performed atomically, so the function
is applied at most once per key. Some attempted update operations on
this map by other threads may be blocked while computation is in
progress, so the computation should be short and simple...
Note: I am a committer for Eclipse Collections.

By keeping a pre-initialized value for each thread you can improve on the accepted answer:
Set<X> initial = new HashSet<X>();
...
Set<X> set = map.putIfAbsent(name, initial);
if (set == null) {
set = initial;
initial = new HashSet<X>();
}
set.add(Y);
I recently used this with AtomicInteger map values rather than Set.

In 5+ years, I can't believe no one has mentioned or posted a solution that uses ThreadLocal to solve this problem; and several of the solutions on this page are not threadsafe and are just sloppy.
Using ThreadLocals for this specific problem is not only considered best practice for concurrency, but also minimizes garbage/object creation during thread contention. Also, it's incredibly clean code.
For example:
private final ThreadLocal<HashSet<X>>
threadCache = new ThreadLocal<HashSet<X>>() {
@Override
protected
HashSet<X> initialValue() {
return new HashSet<X>();
}
};
private final ConcurrentMap<String, Set<X>>
map = new ConcurrentHashMap<String, Set<X>>();
And the actual logic...
// minimize object creation during thread contention
final Set<X> cached = threadCache.get();
Set<X> data = map.putIfAbsent("foo", cached);
if (data == null) {
// reset the cached value in the ThreadLocal
threadCache.set(new HashSet<X>());
data = cached;
}
// make sure that the access to the set is thread safe
synchronized(data) {
data.add(object);
}

My generic approximation:
public class ConcurrentHashMapWithInit<K, V> extends ConcurrentHashMap<K, V> {
private static final long serialVersionUID = 42L;
public V initIfAbsent(final K key) {
V value = get(key);
if (value == null) {
value = initialValue();
final V x = putIfAbsent(key, value);
value = (x != null) ? x : value;
}
return value;
}
protected V initialValue() {
return null;
}
}
And an example of use:
public static void main(final String[] args) throws Throwable {
ConcurrentHashMapWithInit<String, HashSet<String>> map =
new ConcurrentHashMapWithInit<String, HashSet<String>>() {
private static final long serialVersionUID = 42L;
@Override
protected HashSet<String> initialValue() {
return new HashSet<String>();
}
};
map.initIfAbsent("s1").add("chao");
map.initIfAbsent("s2").add("bye");
System.out.println(map.toString());
}

Easy, simple to use LRU cache in java

I know it's simple to implement, but I want to reuse something that already exists.
The problem I want to solve is that I load configuration (from XML, so I want to cache it) for different pages, roles, ... so the number of input combinations can grow quite large (but in 99% of cases will not). To handle this 1%, I want to cap the number of items in the cache...
Until now I have found org.apache.commons.collections.map.LRUMap in Apache Commons, and it looks fine, but I want to check other options too. Any recommendations?

You can use a LinkedHashMap (Java 1.4+) :
// Create cache
final int MAX_ENTRIES = 100;
Map cache = new LinkedHashMap(MAX_ENTRIES+1, .75F, true) {
// This method is called just after a new entry has been added
public boolean removeEldestEntry(Map.Entry eldest) {
return size() > MAX_ENTRIES;
}
};
// Add to cache
Object key = "key";
cache.put(key, object);
// Get object
Object o = cache.get(key);
if (o == null && !cache.containsKey(key)) {
// Object not in cache. If null is not a possible value in the cache,
// the call to cache.containsKey(key) is not needed
}
// If the cache is to be used by multiple threads,
// the cache must be wrapped with code to synchronize the methods
cache = (Map)Collections.synchronizedMap(cache);
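The same technique can be written as a small generic class. This is a sketch only; the name LruCache is an assumption, and it relies on the access-order constructor of LinkedHashMap together with the removeEldestEntry hook:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generic LRU cache built on LinkedHashMap's access order.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        // The third argument (true) selects access order instead of insertion order.
        super(maxEntries + 1, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by put() after an insertion; returning true evicts the
    // least recently accessed entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

Like the raw-typed version, this class is not thread-safe on its own. In access-order mode even get() modifies the internal ordering, so shared use needs a wrapper such as Collections.synchronizedMap.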

This is an old question, but for posterity I wanted to list ConcurrentLinkedHashMap, which is thread safe, unlike LRUMap. Usage is quite easy:
ConcurrentMap<K, V> cache = new ConcurrentLinkedHashMap.Builder<K, V>()
.maximumWeightedCapacity(1000)
.build();
And the documentation has some good examples, like how to make the LRU cache size-based instead of number-of-items based.

Here is my implementation which lets me keep an optimal number of elements in memory.
The point is that I do not need to keep track of what objects are currently being used since I'm using a combination of a LinkedHashMap for the MRU objects and a WeakHashMap for the LRU objects.
So the cache capacity is no less than MRU size plus whatever the GC lets me keep. Whenever objects fall off the MRU they go to the LRU for as long as the GC will have them.
public class Cache<K,V> {
final Map<K,V> MRUdata;
final Map<K,V> LRUdata;
public Cache(final int capacity)
{
LRUdata = new WeakHashMap<K, V>();
MRUdata = new LinkedHashMap<K, V>(capacity+1, 1.0f, true) {
protected boolean removeEldestEntry(Map.Entry<K,V> entry)
{
if (this.size() > capacity) {
LRUdata.put(entry.getKey(), entry.getValue());
return true;
}
return false;
};
};
}
public synchronized V tryGet(K key)
{
V value = MRUdata.get(key);
if (value!=null)
return value;
value = LRUdata.get(key);
if (value!=null) {
LRUdata.remove(key);
MRUdata.put(key, value);
}
return value;
}
public synchronized void set(K key, V value)
{
LRUdata.remove(key);
MRUdata.put(key, value);
}
}

I also had the same problem and didn't find any good libraries... so I created my own.
simplelrucache provides thread-safe, very simple, non-distributed LRU caching with TTL support. It provides two implementations:
Concurrent based on ConcurrentLinkedHashMap
Synchronized based on LinkedHashMap
You can find it here.

Here is a very simple and easy to use LRU cache in Java.
Although it is short and simple it is production quality.
The code is explained (look at the README.md) and has some unit tests.
