How to populate a ConcurrentHashMap from multiple threads?

I have a ConcurrentHashMap which I am populating from multiple threads.
private static Map<DataCode, Long> errorMap = new ConcurrentHashMap<DataCode, Long>();
public static void addError(DataCode error) {
if (errorMap.keySet().contains(error)) {
errorMap.put(error, errorMap.get(error) + 1);
} else {
errorMap.put(error, 1L);
}
}
My addError method above is called from multiple threads, which populate errorMap. I am not sure whether this is thread safe. Is there anything wrong with what I am doing here?
Any explanation of why it can skip updates will help me to understand better.

Whether this is safe depends on what you mean. It won't throw exceptions or corrupt the map, but it can skip updates. Consider:
Thread1: errorMap.get(error) returns 1
Thread2: errorMap.get(error) returns 1
Thread1: errorMap.put(error, 1+1);
Thread2: errorMap.put(error, 1+1);
A similar race exists around the keySet().contains(error) operation. To fix this you'll need to use atomic operations to update the map.
On Java 8, this is easy:
errorMap.compute(error, (key, oldValue) -> oldValue == null ? 1L : oldValue + 1L);
On older versions of Java you need to use a compare-and-update loop:
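// errorMap must be declared as a ConcurrentMap for putIfAbsent/replace to be available before Java 8
Long prevValue;
boolean done;
do {
prevValue = errorMap.get(error);
if (prevValue == null) {
// putIfAbsent returns the previous value, so null means our insert won
done = errorMap.putIfAbsent(error, 1L) == null;
} else {
// succeeds only if the value is still prevValue
done = errorMap.replace(error, prevValue, prevValue + 1);
}
} while (!done);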
With this code, if two threads race one may end up retrying its update, but they'll get the right value in the end.
Alternately, you can also use Guava's AtomicLongMap which does all the thread-safety magic for you and gets higher performance (by avoiding all those boxing operations, among other things):
errorAtomicLongMap.incrementAndGet(error);
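To check the atomic-update approach end to end, here is a minimal sketch that uses ConcurrentHashMap.merge (Java 8+), which for a simple counter does the same job as compute. The class name ErrorCounter, the String keys standing in for DataCode, and the hammer helper are assumptions made for illustration, not part of the original code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the question's errorMap; String replaces DataCode.
class ErrorCounter {
    private static final Map<String, Long> errorMap = new ConcurrentHashMap<>();

    // merge() performs the read-modify-write atomically for the given key.
    static void addError(String error) {
        errorMap.merge(error, 1L, Long::sum);
    }

    static long count(String error) {
        return errorMap.getOrDefault(error, 0L);
    }

    // Calls addError from several threads and waits until they have all finished.
    static void hammer(String error, int threads, int callsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    addError(error);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling ErrorCounter.hammer("TIMEOUT", 8, 10000) and then ErrorCounter.count("TIMEOUT") should report every increment; swapping merge for the original get-then-put would typically lose some.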

Related

Is following code Thread safe

I have a scenario where i have to maintain a Map which can be populated by multiple threads ,each modifying there respective List (unique identifier/key being thread name) and when the list size for a thread exceeds a fixed batch size we have to persist the records in DB.
Sample code below:
private volatile ConcurrentHashMap<String, List<T>> instrumentMap = new ConcurrentHashMap<String, List<T>>();
private ReadWriteLock lock ;
public void addAll(List<T> entityList, String threadName) {
try {
lock.readLock().lock();
List<T> instrumentList = instrumentMap.get(threadName);
if(instrumentList == null) {
instrumentList = new ArrayList<T>(batchSize);
instrumentMap.put(threadName, instrumentList);
}
if(instrumentList.size() >= batchSize -1){
instrumentList.addAll(entityList);
recordSaver.persist(instrumentList);
instrumentList.clear();
} else {
instrumentList.addAll(entityList);
}
} finally {
lock.readLock().unlock();
}
}
A separate thread also runs every 2 minutes to persist all the records in the map (to make sure something is persisted every 2 minutes and the map does not get too big). When it starts, it blocks all other threads (see the readLock and writeLock usage, where writeLock has higher priority):
if(//Some condition) {
Thread.sleep(//2 minutes);
aggregator.getLock().writeLock().lock();
List<T> instrumentList = instrumentMap .values().stream().flatMap(x->x.stream()).collect(Collectors.toList());
if(instrumentList.size() > 0) {
saver.persist(instrumentList);
instrumentMap .values().parallelStream().forEach(x -> x.clear());
aggregator.getLock().writeLock().unlock();
}
This solution works fine in almost every scenario we tested, except that sometimes some records go missing, i.e. they are never persisted even though they were added to the map.
My question is what is the problem with this code?
Is ConcurrentHashMap not the best solution here?
Does usage of read/write lock has some problem here?
Should i go with sequential processing?

No, it's not thread safe.
The problem is that you are using the read lock of the ReadWriteLock. This doesn't guarantee exclusive access for making updates. You'd need to use the write lock for that.
But you don't really need to use a separate lock at all. You can simply use the ConcurrentHashMap.compute method:
instrumentMap.compute(threadName, (tn, instrumentList) -> {
if (instrumentList == null) {
instrumentList = new ArrayList<>();
}
if(instrumentList.size() >= batchSize -1) {
instrumentList.addAll(entityList);
recordSaver.persist(instrumentList);
instrumentList.clear();
} else {
instrumentList.addAll(entityList);
}
return instrumentList;
});
This allows you to update items in the list whilst also guaranteeing exclusive access to the list for a given key.
I suspect that you could split the compute call into computeIfAbsent (to add the list if one is not there) and then a computeIfPresent (to update/persist the list): the atomicity of these two operations is not necessary here. But there is no real point in splitting them up.
Additionally, instrumentMap almost certainly shouldn't be volatile. Unless you really want to reassign its value (given this code, I doubt that), remove volatile and make it final.
Similarly, non-final locks are questionable too. If you stick with using a lock, make that final too.
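As a rough, self-contained sketch of the compute-based approach, the class below also covers the periodic flush without a ReadWriteLock. BatchBuffer, the Consumer standing in for recordSaver.persist, and the simplified threshold (flush once the list reaches batchSize) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical batching buffer; the Consumer stands in for recordSaver.persist.
class BatchBuffer<T> {
    private final ConcurrentHashMap<String, List<T>> buffers = new ConcurrentHashMap<>();
    private final int batchSize;
    private final Consumer<List<T>> saver;

    BatchBuffer(int batchSize, Consumer<List<T>> saver) {
        this.batchSize = batchSize;
        this.saver = saver;
    }

    // Producer path: compute() gives exclusive access to this key's list
    // while the lambda runs.
    void addAll(String key, List<T> items) {
        buffers.compute(key, (k, list) -> {
            if (list == null) {
                list = new ArrayList<>(batchSize);
            }
            list.addAll(items);
            if (list.size() >= batchSize) {
                saver.accept(new ArrayList<>(list)); // hand over a copy
                list.clear();
            }
            return list;
        });
    }

    // Periodic path: flush each key under the same per-key exclusion,
    // so no records can be added between persisting and clearing.
    void flushAll() {
        for (String key : buffers.keySet()) {
            buffers.computeIfPresent(key, (k, list) -> {
                if (!list.isEmpty()) {
                    saver.accept(new ArrayList<>(list));
                    list.clear();
                }
                return list;
            });
        }
    }
}
```

One caveat: ConcurrentHashMap may block other updates to the map while a compute lambda is running, so slow persistence inside the lambda reduces concurrency. If that matters, copy the batch inside the lambda and persist it after compute returns.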

Caching values in ConcurrentHashmap to avoid database read

Is my code below correct at using a Map as a simple threadsafe cache to avoid reading from the database? I just want to know the correctness of the code below rather than suggestions to use framework X instead.
public class Foo {
private static final Map<String, String> CACHE = new ConcurrentHashMap<>();
public void doWork(String key) {
String value = CACHE.get(key);
if (value == null) {
synchronized (CACHE) {
value = CACHE.get(key);
if (value == null) {
value = database.getValue();
CACHE.put(key, value);
}
}
}
// do work with value
}
}
Other Questions:
Instead of using CACHE in synchronized(), would it be better if I have a Object lock in my class and use synchronized on that instead?
Would using HashMap for CACHE instead work?

There is a fairly standard "pattern" for using ConcurrentHashMap in this way (in this case, you do not want to use a synchronized block or other locking mechanism):
String value = CACHE.get(key);
if (value == null) {
/* 3 */ String newValue = calculateValueForKey(key);
/* 4 */ value = CACHE.putIfAbsent(key, newValue);
if (value == null) {
value = newValue;
}
}
/* Work with 'value' */
This approach works well when calculateValueForKey() runs quickly and doesn't have any side effects - it could be invoked multiple times for the same key depending on timing. The downside is that if calculateValueForKey() takes a long time and is I/O bound (as it is in your case) you could have multiple threads that are all running calculateValueForKey() for the same key at the same time. If there are 3 threads executing line 3 for the same key, 2 of them will "lose" at line 4 and have their results thrown away which is not very efficient. For these situations I would recommend something along these lines which is mostly lifted from the Memoizer example in Java Concurrency in Practice (Goetz, B. (2006)) which I highly recommend:
private static final ConcurrentMap<String, Future<String>> CACHE
= new ConcurrentHashMap<>();
public void doWork(String key)
{
String value;
try {
value = calculateValueForKey(key);
} catch (InterruptedException e) {
// Restore interrupted status and return
Thread.currentThread().interrupt();
return;
}
// do work with value
}
private String calculateValueForKey(final String key)
throws InterruptedException
{
while (true) {
Future<String> f = CACHE.get(key);
if (f == null) {
FutureTask<String> newCalc = new FutureTask<>(new Callable<String>() {
@Override
public String call()
{
return database.getValue(key);
}
});
f = CACHE.putIfAbsent(key, newCalc);
if (f == null) {
f = newCalc;
newCalc.run();
}
}
try {
return f.get();
} catch (CancellationException e) {
CACHE.remove(key, f);
} catch (ExecutionException e) {
Throwable cause = e.getCause();
if (cause instanceof RuntimeException) {
throw (RuntimeException) cause;
} else if (cause instanceof Error) {
throw (Error) cause;
} else {
throw new IllegalStateException("Not unchecked", cause);
}
}
}
}
Obviously this code is more complex, which is why I've extracted the meat of it into another method, but it is very powerful. Rather than putting the value into the map, you are putting a Future that represents the calculation of that value into the map. Calling get() on that future will block until the computation is complete. This means that if 3 threads were simultaneously trying to retrieve the value for a given key, only a single computation would be run while all 3 threads waiting on the same result. Subsequent requests for the same key would return immediately with the calculated result.
To answer your specific questions:
Is my code below correct at using a Map as a simple threadsafe cache to avoid reading from the database? I'm going to say no. Your use of a synchronized block here is unnecessary. Furthermore, if multiple threads are simultaneously trying to access the values for different keys that are not yet in the Map, they will block each other during their respective database queries, meaning that they will run in serial rather than in parallel.
Instead of using CACHE in synchronized(), would it be better if I have a Object lock in my class and use synchronized on that instead? No. You would typically use a surrogate object for synchronization when you want to read/write multiple mutable fields and you don't want consumers of your class to be able to affect the synchronization semantics of your object "from the outside."
Would using HashMap for CACHE instead work? I guess you could? But then you would need to adjust your synchronization policies so that CACHE (or a surrogate lock object) is always synchronized when the Map is read from or written to. I'm not sure why you would want to do that given better alternatives.
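On Java 8 and later, ConcurrentHashMap.computeIfAbsent offers a shorter route to the same "compute once per key" behaviour, because the mapping function is applied at most once per key. The following is a minimal sketch under that assumption; LoadingCache, the loader function standing in for database.getValue(key), and the loadCount helper are hypothetical names used only for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache wrapper; the loader stands in for database.getValue(key).
class LoadingCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader;
    private final AtomicInteger loads = new AtomicInteger(); // only for demonstration

    LoadingCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // The lambda runs only when the key is absent; concurrent callers for the
    // same key wait for the first computation instead of repeating it.
    String get(String key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }
}
```

Two caveats apply: the loader must not modify the same map, and a slow loader can hold up other updates to the map while it runs. If the loader returns null, nothing is cached. The Future-based Memoizer remains the better fit when loads are very slow.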

CACHE.get(key) will throw a NullPointerException if the key is null. Read the manual:
Like Hashtable but unlike HashMap, this class does not allow null to be used as a key or value.
Furthermore it doesn't really make sense to synchronize over your map and try to retrieve the value again. The method should rather return that it cannot get a value for that key and that's it!
Also, no need to synchronize over a ConcurrentHashMap hence the name.
Create an additional method which retrieves the value from the database if the value is not in the map!
I strongly suggest to test your methods with unit tests!

Be careful with custom caches. Sometimes they only make things worse. For example, they are a great source of reference leaks, e.g. when the last reference to an object comes from the cache. WeakReferences or PhantomReferences can solve this problem. Check this post for further details.
Another issue is the synchronization hit that comes from the ConcurrentHashMap. Sometimes it's worth the cost, sometimes not.
You might want to limit the cache size and remove the least used references - but that will cause some overhead too.
So, you'll have to measure performance carefully.

Multiple threads checking map size and concurrency

I have a method that's supposed to feed a map from a queue, and it only does that if the map size does not exceed a certain number. This caused a concurrency problem, as the size each thread sees is not globally consistent. I replicated the problem with this code:
import java.sql.Timestamp;
import java.util.Date;
import java.util.concurrent.ConcurrentHashMap;
public class ConcurrenthashMapTest {
private ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<Integer, Integer>();
private ThreadUx[] tArray = new ThreadUx[999];
public void parallelMapFilling() {
for ( int i = 0; i < 999; i++ ) {
tArray[i] = new ThreadUx( i );
}
for ( int i = 0; i < 999; i++ ) {
tArray[i].start();
}
}
public class ThreadUx extends Thread {
private int seq = 0;
public ThreadUx( int i ) {
seq = i;
}
@Override
public void run() {
while ( map.size() < 2 ) {
map.put( seq, seq );
System.out.println( Thread.currentThread().getName() + " || The size is: " + map.size() + " || " + new Timestamp( new Date().getTime() ) );
}
}
}
public static void main( String[] args ) {
new ConcurrenthashMapTest().parallelMapFilling();
}
}
Normally I should have only one line of output and the size not exceeding 1, but I do have some stuff like this
Thread-1 || The size is: 2 || 2016-06-07 18:32:55.157
Thread-0 || The size is: 2 || 2016-06-07 18:32:55.157
I tried marking the whole run method as synchronized but that didn't work, only when I did this
@Override
public void run() {
synchronized ( map ) {
if ( map.size() < 1 ) {
map.put( seq, seq );
System.out.println( Thread.currentThread().getName() + " || The size is: " + map.size() + " || " + new Timestamp( new Date().getTime() ) );
}
}
}
It worked. Why does only the synchronized block work and not the synchronized method? Also, I don't want to use something as old as a synchronized block as I am working on a Java EE app; is there a Spring or Java EE task executor or annotation that can help?

From Java Concurrency in Practice:
The semantics of methods of ConcurrentHashMap that operate on the entire Map, such as size and isEmpty, have been slightly weakened to reflect the concurrent nature of the collection. Since the result of size could be out of date by the time it is computed, it is really only an estimate, so size is allowed to return an approximation instead of an exact count. While at first this may seem disturbing, in reality methods like size and isEmpty are far less useful in concurrent environments because these quantities are moving targets. So the requirements for these operations were weakened to enable performance optimizations for the most important operations, primarily get, put, containsKey, and remove.
The one feature offered by the synchronized Map implementations but not by ConcurrentHashMap is the ability to lock the map for exclusive access. With Hashtable and synchronizedMap, acquiring the Map lock prevents any other thread from accessing it. This might be necessary in unusual cases such as adding several mappings atomically, or iterating the Map several times and needing to see the same elements in the same order. On the whole, though, this is a reasonable tradeoff: concurrent collections should be expected to change their contents continuously.
Solutions:
Refactor design and do not use size method with concurrent access.
To use methods such as size and isEmpty, you can use a synchronized collection from Collections.synchronizedMap. Synchronized collections achieve their thread safety by serializing all access to the collection's state. The cost of this approach is poor concurrency; when multiple threads contend for the collection-wide lock, throughput suffers. You will also need to synchronize the check-then-put block on the map instance, because it is a compound action.
Use a third-party implementation or write your own:
public class BoundConcurrentHashMap <K,V> {
private final Map<K, V> m;
private final Semaphore semaphore;
public BoundConcurrentHashMap(int size) {
m = new ConcurrentHashMap<K, V>();
semaphore = new Semaphore(size);
}
public V get(Object key) {
return m.get(key);
}
public boolean put(K key, V value) {
boolean hasSpace = semaphore.tryAcquire();
if(hasSpace) {
m.put(key, value);
}
return hasSpace;
}
public void remove(Object key) {
// release a permit only if a mapping was actually removed
if (m.remove(key) != null) {
semaphore.release();
}
}
// approximation, do not trust this method
public int size(){
return m.size();
}
}
BoundConcurrentHashMap is about as efficient as ConcurrentHashMap and almost thread-safe: removing an element and releasing the semaphore in remove do not happen as a single atomic step, as they ideally should, but in this case that is tolerable. The size method still returns an approximation, but put will not let the map grow beyond its bound.

You are using ConcurrentHashMap, and according to the API doc:
Bear in mind that the results of aggregate status methods including
size, isEmpty, and containsValue are typically useful only when a map
is not undergoing concurrent updates in other threads. Otherwise the
results of these methods reflect transient states that may be adequate
for monitoring or estimation purposes, but not for program control.
Which means you cannot get an accurate result unless you explicitly synchronize access to size().
Adding synchronized to the run method does not work because the threads are not synchronizing on the same lock object: each one acquires a lock on itself (its own ThreadUx instance).
Synchronizing on the map itself definitely works, but IMHO it's not a good choice because you then lose the performance advantage ConcurrentHashMap can provide.
In conclusion you need to reconsider the design.
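To make the lock-object point concrete, here is a small sketch in which every worker synchronizes on one shared lock around the check-then-put, so the bound holds. BoundedFill, tryPut and fillFromThreads are hypothetical names; since all access goes through the lock, a plain HashMap is assumed to be sufficient here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bounded map: size check and put form one critical section.
class BoundedFill {
    private final Map<Integer, Integer> map = new HashMap<>();
    private final Object lock = new Object(); // one lock shared by every worker
    private final int limit;

    BoundedFill(int limit) {
        this.limit = limit;
    }

    // Check-then-act is only safe because both steps hold the same lock.
    boolean tryPut(int key, int value) {
        synchronized (lock) {
            if (map.size() >= limit) {
                return false;
            }
            map.put(key, value);
            return true;
        }
    }

    int size() {
        synchronized (lock) {
            return map.size();
        }
    }

    // Starts one thread per key, waits for all of them, and returns the final size.
    static int fillFromThreads(int limit, int threads) {
        BoundedFill fill = new BoundedFill(limit);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int seq = i;
            workers[i] = new Thread(() -> fill.tryPut(seq, seq));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return fill.size();
    }
}
```

A synchronized run() method on each Thread subclass would instead lock a different object per thread, which is why it gives no mutual exclusion.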

When using thread safe collections in Java, what's the best way to handle concurrency issues?

This might be very naive of me, but I was always under the assumption that the code example below would always work and not crash with a NullPointerException while using thread safe collections in Java. Unfortunately it would seem that thread t2 is able to remove the item from the map in between the calls to containsKey() and get() in the loop below the two thread declarations. The commented section shows a way to handle this problem without ever getting a NullPointerException because it simply tests to see if the result from get() is null.
My question is, what's the right way to handle this problem in Java using thread safe collections? Sure I could use a mutex or a synchronized block, but doesn't this sort of defeat a lot of the benefits and ease of use surrounding thread safe collections? If I have to use a mutex or synchronized block, couldn't I just use non-thread safe collection instead? In addition, I've always heard (in academia) that checking code for null value is bad programming practice. Am I just crazy? Is there a simple answer to this problem? Thank you in advance.
package test;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
public class Test {
public static void main(String[] args) {
final Map<Integer, Integer> test = new ConcurrentHashMap<>();
Thread t1 = new Thread(new Runnable() {
@Override
public void run() {
while(true) {
test.put(0, 0);
Thread.yield();
}
}
});
Thread t2 = new Thread(new Runnable() {
@Override
public void run() {
while(true) {
test.remove(0);
Thread.yield();
}
}
});
t1.start();
t2.start();
while(true) {
if (test.containsKey(0)) {
Integer value = test.get(0);
System.out.println(value);
}
Thread.yield();
}
// OR
// while(true) {
// Integer value = test.get(0);
// if (value != null) {
// System.out.println(value);
// }
// Thread.yield();
// }
}
}

what's the right way to handle this problem in Java using thread safe collections?
Only perform one operation so it is atomic. It is faster as well.
Integer value = test.get(0);
if (value != null) {
System.out.println(value);
}
I've always heard (in academia) that checking code for null value is bad programming practice. Am I just crazy?
Possibly. I think checking for null, if a value can be null is best practice.

You're misusing thread-safe collections.
Thread-safe collections cannot possibly prevent other code from running between containsKey() and get().
Instead, they provide you with additional thread-safe methods that will atomically check and get the element, without allowing other threads to interfere.
This means that you should never use a concurrent collection through the base collection interfaces (Map or List).
Instead, declare your field as a ConcurrentMap.
In your case, you can simply call get(), which will atomically return null if the key is not found.
There is no alternative to checking for null here (unlike more elegant functional languages, which use the Maybe monad instead).

This
if (test.containsKey(0)) {
Integer value = test.get(0);
System.out.println(value);
}
is still not atomic. A thread can add/remove after you've checked for containsKey.
You need to synchronize on a shared resource around that snippet. Or check for null after you get.
All operations in a ConcurrentHashMap are thread-safe, but they do not extend past method boundaries.
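As a short illustration of keeping each decision inside a single atomic call, the helpers below replace the containsKey-then-get pair with one lookup and show two other single-call operations from ConcurrentMap. The class and method names are hypothetical; the map methods used (get, getOrDefault, and the two-argument remove) are standard.

```java
import java.util.concurrent.ConcurrentMap;

// Hypothetical helpers; each one touches the map exactly once.
class AtomicLookups {
    // One get() replaces containsKey() + get(). For a ConcurrentHashMap,
    // which rejects null values, null here always means "absent".
    static String describe(ConcurrentMap<Integer, Integer> map, Integer key) {
        Integer value = map.get(key);
        return value != null ? "value=" + value : "absent";
    }

    // Lookup with a fallback, still a single call on the map.
    static Integer valueOr(ConcurrentMap<Integer, Integer> map, Integer key, Integer fallback) {
        return map.getOrDefault(key, fallback);
    }

    // Removes the entry only if it still maps to the expected value.
    static boolean removeIfStill(ConcurrentMap<Integer, Integer> map, Integer key, Integer expected) {
        return map.remove(key, expected);
    }
}
```

Anything that needs two or more of these calls to agree with each other still requires external synchronization or one of the compute-style methods.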

In addition, I've always heard (in academia) that checking code for null value is bad programming practice.
With a generic Map, when you write Integer i = map.get(0); then if i is null, you can't conclude that 0 is not in the map - it could be there but map to a null value.
However, with a ConcurrentHashMap, you have the guarantee that there are no null values:
Like Hashtable but unlike HashMap, this class does not allow null to be used as a key or value.
So using:
Integer i = map.get(0);
if (i != null) ...
is perfectly fine.

Atomically incrementing counters stored in ConcurrentHashMap

I would like to collect some metrics from various places in a web app. To keep it simple, all these will be counters and therefore the only modifier operation is to increment them by 1.
The increments will be concurrent and often. The reads (dumping the stats) is a rare operation.
I was thinking of using a ConcurrentHashMap. The issue is how to increment the counters correctly. Since the map doesn't have an "increment" operation, I need to read the current value first, increment it, then put the new value in the map. Without more code, this is not an atomic operation.
Is it possible to achieve this without synchronization (which would defeat the purpose of the ConcurrentHashMap)? Do I need to look at Guava ?
Thanks for any pointers.
P.S.
There is a related question on SO (Most efficient way to increment a Map value in Java) but focused on performance and not multi-threading
UPDATE
For those arriving here through searches on the same topic: besides the answers below, there's a useful presentation which incidentally covers the same topic. See slides 24-33.

In Java 8:
ConcurrentHashMap<String, LongAdder> map = new ConcurrentHashMap<>();
map.computeIfAbsent("key", k -> new LongAdder()).increment();

Guava's new AtomicLongMap (in release 11) might address this need.

You're pretty close. Why don't you try something like a ConcurrentHashMap<Key, AtomicLong>?
If your Keys (metrics) are unchanging, you could even just use a standard HashMap (they are threadsafe if readonly, but you'd be well advised to make this explicit with an ImmutableMap from Google Collections or Collections.unmodifiableMap, etc.).
This way, you can use map.get(myKey).incrementAndGet() to bump statistics.
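A minimal sketch of the ConcurrentHashMap plus AtomicLong idea, for the case where the keys are not known up front, might look like the following. The Metrics class and its method names are assumptions for illustration; it uses putIfAbsent so it also works before Java 8.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical metrics registry with lazily created counters.
class Metrics {
    private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    void increment(String name) {
        AtomicLong counter = counters.get(name);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong();
            // putIfAbsent returns the existing counter if another thread won the race.
            counter = counters.putIfAbsent(name, fresh);
            if (counter == null) {
                counter = fresh;
            }
        }
        counter.incrementAndGet(); // the increment itself is atomic
    }

    long get(String name) {
        AtomicLong counter = counters.get(name);
        return counter == null ? 0L : counter.get();
    }
}
```

The map only ever gains entries here, so a thread can safely keep using the AtomicLong it looked up; removing counters concurrently would need extra care.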

Other than going with AtomicLong, you can do the usual cas-loop thing:
private final ConcurrentMap<Key,Long> counts =
new ConcurrentHashMap<Key,Long>();
public void increment(Key key) {
if (counts.putIfAbsent(key, 1L) == null) {
return;
}
Long old;
do {
old = counts.get(key);
} while (!counts.replace(key, old, old+1)); // Assumes no removal.
}
(I've not written a do-while loop for ages.)
For small values the Long will probably be "cached". For longer values, it may require allocation. But the allocations are actually extremely fast (and you can cache further) - depends upon what you expect, in the worst case.

Got a necessity to do the same.
I'm using ConcurrentHashMap + AtomicInteger.
Also, ReentrantRW Lock was introduced for atomic flush(very similar behavior).
Tested with 10 Keys and 10 Threads per each Key. Nothing was lost.
I just haven't tried several flushing threads yet, but hope it will work.
Massive singleusermode flush is torturing me...
I want to remove RWLock and break down flushing into small pieces. Tomorrow.
private ConcurrentHashMap<String,AtomicInteger> counters = new ConcurrentHashMap<String, AtomicInteger>();
private ReadWriteLock rwLock = new ReentrantReadWriteLock();
public void count(String invoker) {
rwLock.readLock().lock();
try{
AtomicInteger currentValue = counters.get(invoker);
// if entry is absent - initialize it. If other thread has added value before - we will yield and not replace existing value
if(currentValue == null){
// value we want to init with
AtomicInteger newValue = new AtomicInteger(0);
// try to put and get old
AtomicInteger oldValue = counters.putIfAbsent(invoker, newValue);
// if old value not null - our insertion failed, lets use old value as it's in the map
// if old value is null - our value was inserted - lets use it
currentValue = oldValue != null ? oldValue : newValue;
}
// counter +1
currentValue.incrementAndGet();
}finally {
rwLock.readLock().unlock();
}
}
/**
* @return Map with counting results
*/
public Map<String, Integer> getCount() {
// stop all updates (readlocks)
rwLock.writeLock().lock();
try{
HashMap<String, Integer> resultMap = new HashMap<String, Integer>();
// read all Integers to a new map
for(Map.Entry<String,AtomicInteger> entry: counters.entrySet()){
resultMap.put(entry.getKey(), entry.getValue().intValue());
}
// reset ConcurrentMap
counters.clear();
return resultMap;
}finally {
rwLock.writeLock().unlock();
}
}

I did a benchmark to compare the performance of LongAdder and AtomicLong.
LongAdder had a better performance in my benchmark: for 500 iterations using a map with size 100 (10 concurrent threads), the average time for LongAdder was 1270ms while that for AtomicLong was 1315ms.
