Java synchronization on Collection with expensive operations


I have a map named synchronizedMap that I synchronize on in my function doMapOperation. In this function, I need to add/remove items from the map and perform expensive operations on these objects. I know that I don't want to call an expensive operation in a synchronized block, but I don't know how to make sure that the map is in a consistent state while I do these operations. What is the right way to do this?
This is my initial version, which I am sure is wrong because you want to avoid calling an expensive operation inside a synchronized block:
public void doMapOperation(Object key1, Object key2) {
    synchronized (synchronizedMap) {
        // Remove key1 if it exists.
        if (synchronizedMap.containsKey(key1)) {
            Object value = synchronizedMap.get(key1);
            value.doExpensiveOperation(); // Shouldn't be in synchronized block.
            synchronizedMap.remove(key1);
        }
        // Add key2 if necessary.
        Object value = synchronizedMap.get(key2);
        if (value == null) {
            value = new Object(); // reuse the variable; redeclaring it would not compile
            synchronizedMap.put(key2, value);
        }
        value.doOtherExpensiveOperation(); // Shouldn't be in synchronized block.
    } // End of synchronization.
}
I guess as a continuation of this question, how would you do this in a loop?
public void doMapOperation(Object... keys) {
    synchronized (synchronizedMap) {
        // Loop through the keys.
        for (Object key : keys) {
            // Remove the key if it exists; add it if it doesn't.
            if (synchronizedMap.containsKey(key)) {
                Object value = synchronizedMap.get(key);
                value.doExpensiveOperation(); // Shouldn't be here.
                synchronizedMap.remove(key);
            } else {
                Object value = new Object();
                value.doAnotherExpensiveOperation(); // Shouldn't be here.
                synchronizedMap.put(key, value);
            }
        }
    } // End of synchronization block.
}
Thanks for the help.

You can do the expensive operations outside your synchronized block like so:
public void doMapOperation(Object... keys) {
    List<Object> contained = new ArrayList<Object>();
    List<Object> missing = new ArrayList<Object>();
    synchronized (synchronizedMap) {
        for (Object key : keys) {
            if (synchronizedMap.containsKey(key)) {
                // Remember the removed value; operate on it after releasing the lock.
                contained.add(synchronizedMap.remove(key));
            } else {
                // Create and insert the new value; operate on it after releasing the lock.
                Object value = new Object();
                synchronizedMap.put(key, value);
                missing.add(value);
            }
        }
    }
    // The expensive work happens outside the synchronized block.
    for (Object o : contained)
        o.doExpensiveOperation();
    for (Object o : missing)
        o.doAnotherExpensiveOperation();
}
The only disadvantage is that you may be performing operations on values after they have been removed from synchronizedMap, and on newly added values that other threads can already see in the map.

You can create a wrapper for your synchronizedMap and make sure that operations like containsKey, remove, and put are synchronized methods. Then only access to the map will be synchronized, while your expensive operations can take place outside the synchronized block.
Another advantage is that by keeping your expensive operations outside the synchronized block, you avoid a possible deadlock risk if those operations call another synchronized map method.
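A minimal sketch of such a wrapper, assuming a hypothetical `Value` interface with the two expensive methods from the question and a caller-supplied factory for new values. The lock is held only inside the wrapper's methods; the expensive calls run after those methods return. Unlike separate containsKey and put calls, the hypothetical `getOrCreate` checks and inserts under a single lock, so two threads cannot both decide that a key is missing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical value type standing in for the objects in the question.
interface Value {
    void doExpensiveOperation();
    void doOtherExpensiveOperation();
}

// Hypothetical wrapper: every method holds the lock only while it touches the map.
class GuardedMap<K> {
    private final Map<K, Value> map = new HashMap<>();

    synchronized boolean containsKey(K key) {
        return map.containsKey(key);
    }

    // Removes and returns the value for key, or null if there was none.
    synchronized Value remove(K key) {
        return map.remove(key);
    }

    // Returns the existing value for key, or stores and returns a new one.
    // The check and the put happen under one lock; the factory runs while
    // the lock is held, so it should be cheap.
    synchronized Value getOrCreate(K key, Supplier<? extends Value> factory) {
        Value value = map.get(key);
        if (value == null) {
            value = factory.get();
            map.put(key, value);
        }
        return value;
    }
}

class MapOperations {
    private final GuardedMap<Object> synchronizedMap = new GuardedMap<>();

    void doMapOperation(Object key1, Object key2, Supplier<? extends Value> factory) {
        Value removed = synchronizedMap.remove(key1);
        if (removed != null) {
            removed.doExpensiveOperation();   // no lock is held here
        }
        Value value = synchronizedMap.getOrCreate(key2, factory);
        value.doOtherExpensiveOperation();    // no lock is held here
    }
}
```

The trade-off is the one mentioned in the other answer: by the time an expensive call runs, other threads may already have changed the map again.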

In the first snippet, declare the two values outside the if-clause and only assign them inside it. Make the if-clause synchronized, and invoke the expensive operations outside it.
In the second case, do the same, but inside the loop (with the synchronized block inside the loop). You can, of course, have a single synchronized statement outside the loop that simply fills a List of objects on which to invoke the expensive operation. Then, in a second loop outside the synchronized block, invoke the operations on all values in the list.
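A sketch of the first-snippet approach, assuming a hypothetical `Item` class that merely counts calls to the two expensive methods. The two values are declared before the synchronized block, assigned inside it, and used after it. Here `synchronizedMap.remove(key1)` stands in for the containsKey/get/remove sequence, which assumes the map never stores null values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the question's objects; it only counts the expensive calls.
class Item {
    final AtomicInteger expensiveCalls = new AtomicInteger();
    final AtomicInteger otherExpensiveCalls = new AtomicInteger();

    void doExpensiveOperation() { expensiveCalls.incrementAndGet(); }
    void doOtherExpensiveOperation() { otherExpensiveCalls.incrementAndGet(); }
}

class TwoKeyOperation {
    final Map<Object, Item> synchronizedMap = new HashMap<>();

    void doMapOperation(Object key1, Object key2) {
        Item removed;   // declared outside the synchronized block...
        Item current;
        synchronized (synchronizedMap) {
            // ...and assigned inside it, where only cheap map operations run.
            removed = synchronizedMap.remove(key1); // null if key1 was absent
            current = synchronizedMap.get(key2);
            if (current == null) {
                current = new Item();
                synchronizedMap.put(key2, current);
            }
        }
        // The expensive calls run after the lock has been released.
        if (removed != null) {
            removed.doExpensiveOperation();
        }
        current.doOtherExpensiveOperation();
    }
}
```

The loop version follows the same shape: collect the values inside one synchronized block, then iterate over them afterwards.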

We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil. Yet we should not pass
up our opportunities in that critical 3%. A good programmer will not
be lulled into complacency by such reasoning, he will be wise to look
carefully at the critical code; but only after that code has been
identified. — Donald Knuth
You have a single method, doMapOperation(). What is your performance if this method continues to be block-synchronized? If you don't know then how will you know when you've got a good performing solution? Are you prepared to handle multiple calls to your expensive operations even after they have been removed from the map?
I'm not trying to be condescending, since maybe you understand the problem at hand better than you've conveyed, but it seems like you're jumping into a level of optimization for which you may not be prepared and may not be necessary.

You can actually do it all with only one synchronization hit. The first remove is probably the easiest. If you know the object exists, and you know remove is atomic, why not just remove it and, if what is returned is not null, invoke the expensive operation?
// Remove key1 if it exists.
if (synchronizedMap.containsKey(key1)) {
    Object value = synchronizedMap.remove(key1);
    if (value != null) { // this thread has exclusive access to value
        value.doExpensiveOperation();
    }
}
For the put, since it is expensive and should be atomic, you are pretty much out of luck and need to synchronize access. I would recommend using some kind of computing map. Take a look at google-collections and MapMaker.
You can create a ConcurrentMap that will build the expensive object based on your key, for example:
ConcurrentMap<Key, ExpensiveObject> expensiveObjects = new MapMaker()
        .concurrencyLevel(32)
        .makeComputingMap(
            new Function<Key, ExpensiveObject>() {
                public ExpensiveObject apply(Key key) {
                    return createNewExpensiveObject(key);
                }
            });
This is simply a form of memoization.
In both of these cases, you don't need to use synchronized at all (at least not explicitly).
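`makeComputingMap` belongs to the old google-collections/Guava API, and later Guava versions point users to `CacheBuilder`/`LoadingCache` instead. On Java 8 and later, a rough JDK-only analogue is `ConcurrentHashMap.computeIfAbsent`, sketched below. This swaps in a different API from the one the answer names, and `createNewExpensiveObject` is a hypothetical factory that returns a String so the example is self-contained.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class ExpensiveObjects {
    // Counts how often the factory actually ran, to show it runs once per key.
    final AtomicInteger creations = new AtomicInteger();
    private final ConcurrentMap<String, String> expensiveObjects = new ConcurrentHashMap<>();

    // Hypothetical stand-in for createNewExpensiveObject(key) from the answer.
    private String createNewExpensiveObject(String key) {
        creations.incrementAndGet();
        return "object-for-" + key;
    }

    // Returns the object for key, building it on first use.
    String get(String key) {
        return expensiveObjects.computeIfAbsent(key, this::createNewExpensiveObject);
    }
}
```

`ConcurrentHashMap` documents that the mapping function is applied at most once per key, but also that some other updates to the map may be blocked while it runs, so a truly slow factory can stall unrelated writers.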

If you don't have null values in the Map, you don't need the containsKey() call at all: you can use Map.remove() to both remove the item and tell you whether it was there. So the true content of your synchronized block only needs to be this:
Value value = map.remove(key);
if (value != null)
    value.doExpensiveOperation();
else
{
    value = new Value();
    value.doExpensiveOperation();
    map.put(key, value);
}
If the expensive operation itself doesn't need to be synchronized, i.e. if you don't mind other clients of the Map seeing the value while it is being operated on, you can further simplify to this:
Value value = map.remove(key);
if (value == null)
{
    value = new Value();
    map.put(key, value);
}
value.doExpensiveOperation();
and the synchronized block can terminate before the expensive operation.

Related

Is the following code thread safe?

I have a scenario where I have to maintain a Map that can be populated by multiple threads, each modifying its respective List (the unique identifier/key being the thread name), and when the list size for a thread exceeds a fixed batch size, we have to persist the records in the DB.
Sample code below:
private volatile ConcurrentHashMap<String, List<T>> instrumentMap = new ConcurrentHashMap<String, List<T>>();
private ReadWriteLock lock;
public void addAll(List<T> entityList, String threadName) {
    try {
        lock.readLock().lock();
        List<T> instrumentList = instrumentMap.get(threadName);
        if (instrumentList == null) {
            instrumentList = new ArrayList<T>(batchSize);
            instrumentMap.put(threadName, instrumentList);
        }
        if (instrumentList.size() >= batchSize - 1) {
            instrumentList.addAll(entityList);
            recordSaver.persist(instrumentList);
            instrumentList.clear();
        } else {
            instrumentList.addAll(entityList);
        }
    } finally {
        lock.readLock().unlock();
    }
}
There is one more separate thread that runs every 2 minutes to persist all the records in the Map (to make sure something is persisted every 2 minutes and the map size does not get too big), and when it starts it blocks all other threads (see the readLock and writeLock usage, where writeLock has higher priority):
if (/* some condition */) {
    Thread.sleep(/* 2 minutes */);
    aggregator.getLock().writeLock().lock();
    List<T> instrumentList = instrumentMap.values().stream().flatMap(x -> x.stream()).collect(Collectors.toList());
    if (instrumentList.size() > 0) {
        saver.persist(instrumentList);
        instrumentMap.values().parallelStream().forEach(x -> x.clear());
        aggregator.getLock().writeLock().unlock();
    }
}
This solution works fine for almost every scenario we tested, except that sometimes some of the records go missing, i.e. they are not persisted at all even though they were added to the Map successfully.
My question is: what is the problem with this code?
Is ConcurrentHashMap not the best solution here?
Does the use of the read/write lock have some problem here?
Should I go with sequential processing?

No, it's not thread safe.
The problem is that you are using the read lock of the ReadWriteLock. This doesn't guarantee exclusive access for making updates. You'd need to use the write lock for that.
But you don't really need to use a separate lock at all. You can simply use the ConcurrentHashMap.compute method:
instrumentMap.compute(threadName, (tn, instrumentList) -> {
    if (instrumentList == null) {
        instrumentList = new ArrayList<>();
    }
    if (instrumentList.size() >= batchSize - 1) {
        instrumentList.addAll(entityList);
        recordSaver.persist(instrumentList);
        instrumentList.clear();
    } else {
        instrumentList.addAll(entityList);
    }
    return instrumentList;
});
This allows you to update items in the list whilst also guaranteeing exclusive access to the list for a given key.
I suspect that you could split the compute call into computeIfAbsent (to add the list if one is not there) and then a computeIfPresent (to update/persist the list): the atomicity of these two operations is not necessary here. But there is no real point in splitting them up.
Additionally, instrumentMap almost certainly shouldn't be volatile. Unless you really want to reassign its value (given this code, I doubt that), remove volatile and make it final.
Similarly, non-final locks are questionable too. If you stick with using a lock, make that final too.
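A self-contained sketch of the compute()-based approach, assuming the saver is a hypothetical `Consumer<List<T>>` rather than the question's `recordSaver` object. `addAll` keeps the answer's batching logic unchanged; `flushAll` is an illustrative addition showing how the two-minute flush could drain each list inside its own atomic `computeIfPresent()` call instead of taking a global write lock. The saver receives a copy because the list is cleared immediately afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

class BatchingAggregator<T> {
    private final ConcurrentHashMap<String, List<T>> instrumentMap = new ConcurrentHashMap<>();
    private final int batchSize;
    private final Consumer<List<T>> recordSaver; // hypothetical stand-in for recordSaver.persist(...)

    BatchingAggregator(int batchSize, Consumer<List<T>> recordSaver) {
        this.batchSize = batchSize;
        this.recordSaver = recordSaver;
    }

    // Same logic as the answer: compute() gives exclusive access to this key's list.
    void addAll(List<T> entityList, String threadName) {
        instrumentMap.compute(threadName, (tn, instrumentList) -> {
            if (instrumentList == null) {
                instrumentList = new ArrayList<>();
            }
            if (instrumentList.size() >= batchSize - 1) {
                instrumentList.addAll(entityList);
                recordSaver.accept(new ArrayList<>(instrumentList)); // hand over a copy
                instrumentList.clear();
            } else {
                instrumentList.addAll(entityList);
            }
            return instrumentList;
        });
    }

    // Periodic flush (illustrative): each list is drained inside its own atomic call.
    void flushAll() {
        for (String key : instrumentMap.keySet()) {
            instrumentMap.computeIfPresent(key, (k, list) -> {
                if (!list.isEmpty()) {
                    recordSaver.accept(new ArrayList<>(list));
                    list.clear();
                }
                return list;
            });
        }
    }
}
```

Because the saver runs inside `compute()`, a slow database write holds up other updates that contend for the same part of the map; `ConcurrentHashMap`'s documentation asks for computations to be short.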

Caching values in ConcurrentHashMap to avoid database reads

Is my code below correct at using a Map as a simple threadsafe cache to avoid reading from the database? I just want to know the correctness of the code below rather than suggestions to use framework X instead.
public class Foo {
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();
    public void doWork(String key) {
        String value = CACHE.get(key);
        if (value == null) {
            synchronized (CACHE) {
                value = CACHE.get(key);
                if (value == null) {
                    value = database.getValue();
                    CACHE.put(key, value);
                }
            }
        }
        // do work with value
    }
}
Other Questions:
Instead of using CACHE in synchronized(), would it be better if I had an Object lock in my class and synchronized on that instead?
Would using HashMap for CACHE instead work?

There is a fairly standard "pattern" for using ConcurrentHashMap in this way (in this case, you do not want to use a synchronized block or other locking mechanism):
String value = CACHE.get(key);
if (value == null) {
    /* 3 */ String newValue = calculateValueForKey(key);
    /* 4 */ value = CACHE.putIfAbsent(key, newValue);
    if (value == null) {
        value = newValue;
    }
}
/* Work with 'value' */
This approach works well when calculateValueForKey() runs quickly and doesn't have any side effects, because it could be invoked multiple times for the same key depending on timing. The downside is that if calculateValueForKey() takes a long time and is I/O bound (as it is in your case), you could have multiple threads all running calculateValueForKey() for the same key at the same time. If 3 threads are executing line 3 for the same key, 2 of them will "lose" at line 4 and have their results thrown away, which is not very efficient. For these situations I would recommend something along these lines, which is mostly lifted from the Memoizer example in Java Concurrency in Practice (Goetz, B. (2006)), a book I highly recommend:
private static final ConcurrentMap<String, Future<String>> CACHE
        = new ConcurrentHashMap<>();
public void doWork(String key)
{
    String value;
    try {
        value = calculateValueForKey(key);
    } catch (InterruptedException e) {
        // Restore interrupted status and return
        Thread.currentThread().interrupt();
        return;
    }
    // do work with value
}
private String calculateValueForKey(final String key)
        throws InterruptedException
{
    while (true) {
        Future<String> f = CACHE.get(key);
        if (f == null) {
            FutureTask<String> newCalc = new FutureTask<>(new Callable<String>() {
                @Override
                public String call()
                {
                    return database.getValue(key);
                }
            });
            f = CACHE.putIfAbsent(key, newCalc);
            if (f == null) {
                f = newCalc;
                newCalc.run();
            }
        }
        try {
            return f.get();
        } catch (CancellationException e) {
            CACHE.remove(key, f);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            } else if (cause instanceof Error) {
                throw (Error) cause;
            } else {
                throw new IllegalStateException("Not unchecked", cause);
            }
        }
    }
}
Obviously this code is more complex, which is why I've extracted the meat of it into another method, but it is very powerful. Rather than putting the value into the map, you are putting a Future that represents the calculation of that value into the map. Calling get() on that future will block until the computation is complete. This means that if 3 threads were simultaneously trying to retrieve the value for a given key, only a single computation would run while all 3 threads wait on the same result. Subsequent requests for the same key would return immediately with the calculated result.
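On Java 8 and later, the same "store a future in the map" idea can be sketched with `CompletableFuture` instead of `FutureTask`. This is a swapped-in variant rather than the book's code: the loader is a hypothetical `Function<String, String>` standing in for `database.getValue(key)`, cancellation handling is omitted, and a failed load is removed from the map so a later call can retry.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class FutureCache {
    private final ConcurrentMap<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader; // hypothetical stand-in for database.getValue(key)
    final AtomicInteger loads = new AtomicInteger(); // how many times the loader actually ran

    FutureCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String key) {
        CompletableFuture<String> f = cache.get(key);
        if (f == null) {
            CompletableFuture<String> created = new CompletableFuture<>();
            f = cache.putIfAbsent(key, created);
            if (f == null) {
                // This thread won the race, so it runs the load itself.
                f = created;
                loads.incrementAndGet();
                try {
                    created.complete(loader.apply(key));
                } catch (RuntimeException | Error e) {
                    cache.remove(key, created); // allow a later call to retry
                    created.completeExceptionally(e);
                }
            }
        }
        // Every caller for this key waits on the same future; a failure
        // surfaces as a CompletionException wrapping the original exception.
        return f.join();
    }
}
```

Unlike `computeIfAbsent`, the slow load runs outside any map-internal lock, so lookups of other keys are never held up by it.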
To answer your specific questions:
Is my code below correct at using a Map as a simple threadsafe cache to avoid reading from the database? I'm going to say no. Your use of a synchronized block here is unnecessary. Furthermore, if multiple threads are simultaneously trying to access the values for different keys that are not yet in the Map, they will block each other during their respective database queries, meaning that they will run serially rather than in parallel.
Instead of using CACHE in synchronized(), would it be better if I have a Object lock in my class and use synchronized on that instead? No. You would typically use a surrogate object for synchronization when you want to read/write multiple mutable fields and you don't want consumers of your class to be able to affect the synchronization semantics of your object "from the outside."
Would using HashMap for CACHE instead work? I guess you could? But then you would need to adjust your synchronization policies so that CACHE (or a surrogate lock object) is always synchronized when the Map is read from or written to. I'm not sure why you would want to do that given better alternatives.

CACHE.get(key) will throw a NullPointerException if the key is null. Read the manual:
Like Hashtable but unlike HashMap, this class does not allow null to be used as a key or value.
Furthermore, it doesn't really make sense to synchronize on your map and try to retrieve the value again. The method should rather report that it cannot get a value for that key, and that's it!
Also, there is no need to synchronize on a ConcurrentHashMap, hence the name.
Create an additional method that retrieves the value from the database if the value is not in the map!
I strongly suggest testing your methods with unit tests!

Be careful with custom caches. Sometimes they only make things worse. For example, they are a great source of reference leaks, e.g. when the last reference to an object comes from the cache. WeakReferences or PhantomReferences can solve this problem. Check this post for further details.
Another issue is the synchronization hit that comes from the ConcurrentHashMap. Sometimes it's worth the cost, sometimes not.
You might want to limit the cache size and remove the least used references - but that will cause some overhead too.
So, you'll have to measure performance carefully.
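One simple way to bound a cache's size, assuming least-recently-used eviction is acceptable, is an access-ordered `LinkedHashMap` whose `removeEldestEntry` evicts beyond a fixed limit. This is an illustrative sketch rather than something the answer prescribes, and `LruCache.create` is a hypothetical helper name. The `Collections.synchronizedMap` wrapper is exactly the kind of per-access locking overhead mentioned above, so it is worth measuring.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache {
    // Returns a thread-safe map that keeps at most maxEntries entries,
    // evicting the least recently accessed one when the limit is exceeded.
    static <K, V> Map<K, V> create(int maxEntries) {
        // accessOrder = true: get() moves an entry to the "most recent" end.
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
        // In access order even get() modifies the map, so every call needs the lock.
        return Collections.synchronizedMap(lru);
    }
}
```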

When using thread safe collections in Java, what's the best way to handle concurrency issues?

This might be very naive of me, but I was always under the assumption that the code example below would always work and not crash with a NullPointerException while using thread safe collections in Java. Unfortunately, it would seem that thread t2 is able to remove the item from the map in between the calls to containsKey() and get() below the two thread declarations. The commented section shows a way to handle this problem without ever getting a NullPointerException, because it simply tests whether the result from get() is null.
My question is, what's the right way to handle this problem in Java using thread safe collections? Sure, I could use a mutex or a synchronized block, but doesn't this sort of defeat a lot of the benefits and ease of use surrounding thread safe collections? If I have to use a mutex or synchronized block, couldn't I just use a non-thread-safe collection instead? In addition, I've always heard (in academia) that checking code for null values is bad programming practice. Am I just crazy? Is there a simple answer to this problem? Thank you in advance.
package test;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
public class Test {
    public static void main(String[] args) {
        final Map<Integer, Integer> test = new ConcurrentHashMap<>();
        Thread t1 = new Thread(new Runnable() {
            @Override
            public void run() {
                while(true) {
                    test.put(0, 0);
                    Thread.yield();
                }
            }
        });
        Thread t2 = new Thread(new Runnable() {
            @Override
            public void run() {
                while(true) {
                    test.remove(0);
                    Thread.yield();
                }
            }
        });
        t1.start();
        t2.start();
        while(true) {
            if (test.containsKey(0)) {
                Integer value = test.get(0);
                System.out.println(value);
            }
            Thread.yield();
        }
        // OR
        // while(true) {
        //     Integer value = test.get(0);
        //     if (value != null) {
        //         System.out.println(value);
        //     }
        //     Thread.yield();
        // }
    }
}

what's the right way to handle this problem in Java using thread safe collections?
Only perform one operation so it is atomic. It is faster as well.
Integer value = test.get(0);
if (value != null) {
    System.out.println(value);
}
I've always heard (in academia) that checking code for null value is bad programming practice. Am I just crazy?
Possibly. I think checking for null, if a value can be null, is best practice.

You're misusing thread-safe collections.
The thread-safe collections cannot possibly prevent other code from running between containsKey() and get().
Instead, they provide you with additional thread-safe methods that will atomically check and get the element, without allowing other threads to interfere.
This means that you should never use a concurrent collection through the base collection interfaces (Map or List).
Instead, declare your field as a ConcurrentMap.
In your case, you can simply call get(), which will atomically return null if the key is not found.
There is no alternative to checking for null here (unlike more elegant functional languages, which use the Maybe monad instead).
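A short sketch of the atomic alternatives, using only methods declared on `java.util.concurrent.ConcurrentMap`; each call checks and acts in one step, so no other thread can slip in between. On Java 8 and later, `Optional.ofNullable` wraps the single `get()` and its null check in a Maybe-like type, although the null check still happens underneath. The class name `AtomicMapOps` is illustrative.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AtomicMapOps {
    // One atomic read; absence is handled through Optional instead of an explicit if.
    static String describe(ConcurrentMap<Integer, Integer> map, int key) {
        return Optional.ofNullable(map.get(key))
                .map(v -> "value " + v)
                .orElse("absent");
    }

    public static void main(String[] args) {
        ConcurrentMap<Integer, Integer> test = new ConcurrentHashMap<>();
        test.putIfAbsent(0, 0);               // inserts only if no mapping exists
        test.merge(0, 1, Integer::sum);       // atomic read-modify-write: 0 + 1
        test.replace(0, 1, 5);                // replaces only if the current value is 1
        boolean removed = test.remove(0, 99); // removes only if mapped to 99: false here
        System.out.println(describe(test, 0) + ", removed=" + removed);
        // prints "value 5, removed=false"
    }
}
```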

This
if (test.containsKey(0)) {
    Integer value = test.get(0);
    System.out.println(value);
}
is still not atomic. A thread can add/remove after you've checked for containsKey.
You need to synchronize on a shared resource around that snippet. Or check for null after you get.
All operations in a ConcurrentHashMap are thread-safe, but they do not extend past method boundaries.

In addition, I've always heard (in academia) that checking code for null value is bad programming practice.
With a generic Map, when you write Integer i = map.get(0); and i is null, you can't conclude that 0 is not in the map: it could be there but map to a null value.
However, with a ConcurrentHashMap, you have the guarantee that there are no null values:
Like Hashtable but unlike HashMap, this class does not allow null to be used as a key or value.
So using:
Integer i = map.get(0);
if (i != null) ...
is perfectly fine.

Synchronized collection

Since c is already a synchronized collection, it is thread safe. So why do we have to use synchronized(c) again for the iteration? I'm really confused. Thanks.
" It is imperative that the user manually synchronize on the returned
collection when iterating over it:
Collection c = Collections.synchronizedCollection(myCollection);
...
synchronized(c) {
    Iterator i = c.iterator(); // Must be in the synchronized block
    while (i.hasNext()) {
        foo(i.next());
    }
}
Failure to follow this advice may result in non-deterministic behavior. "
http://docs.oracle.com/javase/6/docs/api/java/util/Collections.

The most any synchronized collection implementation could possibly do is to guarantee that each individual method call is synchronized. But iteration necessarily involves multiple separate method calls, so you, the user of the synchronized collection, have to synchronize on the whole iteration yourself.
For example, if you didn't synchronize on c, the contents of the collection could change between i.hasNext() and i.next() -- it could even go from having elements to having no more elements, in which case i.next() would fail.
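A runnable sketch of the Javadoc's rule, assuming a writer thread that keeps adding while the main thread iterates. `Collections.synchronizedCollection` uses the returned wrapper itself as its lock, so `synchronized (c)` keeps the writer's `add()` calls out for the whole iteration. `SynchronizedIteration` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;

class SynchronizedIteration {
    // Sums the elements while holding the collection's own lock, as the Javadoc requires.
    static int sum(Collection<Integer> c) {
        int total = 0;
        synchronized (c) {
            Iterator<Integer> i = c.iterator(); // must be obtained inside the block
            while (i.hasNext()) {
                total += i.next(); // nothing can modify c between hasNext() and next()
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        Collection<Integer> c = Collections.synchronizedCollection(new ArrayList<>());
        Thread writer = new Thread(() -> {
            for (int n = 0; n < 10_000; n++) {
                c.add(1); // each add() takes the same lock, so it waits while sum() iterates
            }
        });
        writer.start();
        for (int round = 0; round < 100; round++) {
            sum(c); // without synchronized (c), this could throw ConcurrentModificationException
        }
        writer.join();
        System.out.println(sum(c)); // prints 10000
    }
}
```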

Making all the methods on a class individually synchronized doesn't make an aggregation (calling them in a group) of those methods thread safe. By wrapping the Iterator in a synchronized block, you are protecting that particular instance of the iterator from having its individual methods called interspersed with other calls by multiple threads.
If I call .add() once, it is safe. If I need to call .add() multiple times to complete a logical statement, there is no guarantee that someone else hasn't added or removed something between my .add() calls, unless I block everything else from calling .add() (or any other method) by synchronizing on the variable that represents the collection.
The Iterator makes multiple calls to the individual methods on the collection; they all have to be wrapped in a single synchronized block to make them execute as a single transaction of sorts. Examine the source code of an Iterator implementation and you will see what I mean. Here is the source code of iterator() for a List implementation; it makes multiple individual calls to the underlying implementation, so they all need to be executed in uninterrupted order by the same thread to be deterministic.
@Override
public Iterator<A> iterator() {
    if (tail == null)
        return emptyIterator();
    return new Iterator<A>() {
        List<A> elems = List.this;
        public boolean hasNext() {
            return elems.tail != null;
        }
        public A next() {
            if (elems.tail == null)
                throw new NoSuchElementException();
            A result = elems.head;
            elems = elems.tail;
            return result;
        }
        public void remove() {
            throw new UnsupportedOperationException();
        }
    };
}
The source for AbstractList.iterator() shows even more complicated logic that makes multiple calls.
A better wrapper is wrapping them in Immutable collections, then you guarantee that nothing else can alter the underlying collection between calls.
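A minimal sketch of the snapshot idea, assuming the shared list came from `Collections.synchronizedList`: copy it while holding its lock, then hand out a read-only copy that can be iterated with no lock at all. `Collections.unmodifiableList(shared)` on its own would only be a read-only view that still sees concurrent changes; the copy is what isolates the reader. `Snapshots` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Snapshots {
    // Copies the shared list under its lock and returns a read-only snapshot.
    static <E> List<E> snapshot(List<E> shared) {
        synchronized (shared) {
            return Collections.unmodifiableList(new ArrayList<>(shared));
        }
    }
}
```

Iterating the returned list needs no synchronization, at the cost of one copy per snapshot.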

Synchronize access to given key in a ConcurrentMap

I often enough want to access (and possibly add/remove) elements of a given ConcurrentMap so that only one thread can access any single key at a time. What is the best way to do this? Synchronizing on the key itself doesn't work: other threads might access the same key via an equal instance.
It's good enough if the answer only works with the maps built by guava MapMaker.

See a simple solution here: Simple Java name based locks?
EDIT: This solution has a clear happens-before relation from unlock to lock. However, the next solution, now withdrawn, doesn't. The ConcurrentMap javadoc is too light to guarantee that.
(Withdrawn) If you want to reuse your map as a lock pool:
private final V LOCK = ...; // a fake value
// if a key is mapped to LOCK, that means the key is locked
ConcurrentMap<K,V> map = ...;
V lock(key)
    V value;
    while( (value=map.putIfAbsent(key, LOCK))==LOCK )
        // another thread locked it before me
        wait();
    // now putIfAbsent() returns a real value, or null
    // and I just successfully put LOCK in it
    // I am now the lock owner of this key
    return value; // for caller to work on
// only the lock owner of the key should call this method
unlock(key, value)
    // I put a LOCK on the key to stall others
    // now I just need to swap it back with the real value
    if(value!=null)
        map.put(key, value);
    else // map doesn't accept null value
        map.remove(key);
    notifyAll();
test()
    V value = lock(key);
    // work on value
    // unlock.
    // we have a chance to specify a new value here for the next worker
    newValue = ...; // null if we want to remove the key from map
    unlock(key, newValue); // in finally{}
This is quite messy because we reuse the map for two different purposes. It's better to have the lock pool as a separate data structure and leave the map simply as the key-value storage.
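A minimal sketch of a lock pool kept separate from the data map, assuming one `ReentrantLock` per key is acceptable. This is neither the withdrawn design nor the linked solution, and `KeyLocks`/`withLock` are hypothetical names. Equal keys (by `equals`/`hashCode`) share one lock because `ConcurrentHashMap.computeIfAbsent` creates at most one lock per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

class KeyLocks<K> {
    // One lock per key; equal key instances map to the same lock.
    private final ConcurrentMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs action while holding the lock for key; different keys can run in parallel.
    <R> R withLock(K key, Supplier<R> action) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }
}
```

One limitation of this sketch: locks are never removed, so the pool grows with the number of distinct keys, and removing them safely needs extra care.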

private static final Set<String> lockedKeys = new HashSet<>();
private void lock(String key) throws InterruptedException {
    synchronized (lockedKeys) {
        while (!lockedKeys.add(key)) {
            lockedKeys.wait();
        }
    }
}
private void unlock(String key) {
    synchronized (lockedKeys) {
        lockedKeys.remove(key);
        lockedKeys.notifyAll();
    }
}
public void doSynchronouslyOnlyForEqualKeys(String key) throws InterruptedException {
    // Acquire before try, so an interrupted wait never releases another thread's key.
    lock(key);
    try {
        //Put your code here.
        //For different keys it is executed in parallel.
        //For equal keys it is executed synchronously.
    } finally {
        unlock(key);
    }
}
key can be not only a 'String' but any class with correctly overridden 'equals' and 'hashCode' methods.
try-finally - is very important - you must guarantee to unlock waiting threads after your operation even if your operation threw exception.
It will not work if your back-end is distributed across multiple servers/JVMs.

Can't you just create your own ConcurrentMap class?
Override the get(Object key) method so it checks whether the requested key object is already 'checked out' by another thread.
You'll also have to add a new method to your map that 'returns' the items, so they are available again to other threads.
