Synchronize access to given key in a ConcurrentMap - java

I often enough want to access (and possibly add/remove) elements of a given ConcurrentMap so that only one thread can access any single key at a time. What is the best way to do this? Synchronizing on the key itself doesn't work: other threads might access the same key via an equal instance.
It's good enough if the answer only works with maps built by Guava's MapMaker.

See a simple solution here: Simple Java name based locks?
EDIT: This solution has a clear happens-before relation from unlock to lock. However, the next solution, now withdrawn, doesn't. The ConcurrentMap javadoc is too light to guarantee that.
(Withdrawn) If you want to reuse your map as a lock pool,
private final V LOCK = ...; // a sentinel value; a key mapped to LOCK is "locked"
ConcurrentMap<K, V> map = ...;

V lock(key)
    V value;
    while ((value = map.putIfAbsent(key, LOCK)) == LOCK)
        wait(); // another thread locked it before me
    // now putIfAbsent() returned a real value, or null and I just successfully put LOCK in it
    // I am now the lock owner of this key
    return value; // for the caller to work on

// only the lock owner of the key should call this method
unlock(key, value)
    // I put LOCK on the key to stall others;
    // now I just need to swap it back with the real value
    if (value != null)
        map.put(key, value);
    else // the map doesn't accept null values
        map.remove(key);
    notifyAll();

test()
    V value = lock(key);
    // work on value
    // unlock; we have a chance to specify a new value here for the next worker
    newValue = ...; // null if we want to remove the key from the map
    unlock(key, newValue); // in finally{}
This is quite messy because we reuse the map for two different purposes. It's better to keep the lock pool as a separate data structure and leave the map simply as the key-value storage.

private static final Set<String> lockedKeys = new HashSet<>();
private void lock(String key) throws InterruptedException {
synchronized (lockedKeys) {
while (!lockedKeys.add(key)) {
lockedKeys.wait();
}
}
}
private void unlock(String key) {
synchronized (lockedKeys) {
lockedKeys.remove(key);
lockedKeys.notifyAll();
}
}
public void doSynchronouslyOnlyForEqualKeys(String key) throws InterruptedException {
try {
lock(key);
//Put your code here.
//For different keys it is executed in parallel.
//For equal keys it is executed synchronously.
} finally {
unlock(key);
}
}
The key need not be a String; it can be any class with correctly overridden equals and hashCode methods.
The try-finally is very important: you must guarantee that waiting threads are unlocked after your operation, even if the operation throws an exception.
It will not work if your back-end is distributed across multiple servers/JVMs.
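To illustrate the note about keys: a hedged sketch of a hypothetical key class (the name and fields are made up, not from the question) that works with this pattern because equal instances behave as the same key. The Set<String> above would then become a Set<OrderKey> (or a generic Set<K>), with lock/unlock taking that type:
import java.util.Objects;

// Hypothetical key type: any class is fine as long as equals/hashCode
// treat two instances describing the same key as equal.
public final class OrderKey {
    private final String customerId;
    private final long orderNumber;

    public OrderKey(String customerId, long orderNumber) {
        this.customerId = customerId;
        this.orderNumber = orderNumber;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof OrderKey)) return false;
        OrderKey other = (OrderKey) o;
        return orderNumber == other.orderNumber
                && customerId.equals(other.customerId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(customerId, orderNumber);
    }
}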

Can't you just create your own class that extends ConcurrentMap?
Override the get(Object key) method so it checks whether the requested key is already 'checked out' by another thread.
You'll also need a new method in your ConcurrentMap that 'returns' items to the map, so they become available to other threads again.
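A hedged sketch of that idea (class and method names here are made up): rather than extending ConcurrentHashMap and overriding get, it delegates to one and tracks 'checked out' keys with the same wait/notify pattern as the answer above. checkOut blocks while another thread holds the same key; checkIn writes the (possibly updated) value back, or removes the key when null is passed, and then wakes the waiters:
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class CheckOutMap<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<>();
    private final Set<K> checkedOut = new HashSet<>();

    // Blocks until no other thread has this key checked out, then returns
    // the current value (possibly null) for exclusive use by the caller.
    public V checkOut(K key) throws InterruptedException {
        synchronized (checkedOut) {
            while (!checkedOut.add(key)) {
                checkedOut.wait();
            }
        }
        return map.get(key);
    }

    // Writes the new value back (or removes the key if newValue is null)
    // and releases the key so waiting threads can proceed.
    public void checkIn(K key, V newValue) {
        if (newValue != null) {
            map.put(key, newValue);
        } else {
            map.remove(key);
        }
        synchronized (checkedOut) {
            checkedOut.remove(key);
            checkedOut.notifyAll();
        }
    }
}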

Related

How to synchronize multiple threads from accessing some common data

I have three different threads which create three different objects to read/manipulate some data that is common to all the threads. Now, I need to ensure that we give access to only one thread at a time.
The example goes something like this.
public interface CommonData {
public void addData(); // adds data to the cache
public String getDataAccessKey(); // Key that will be common across different threads for each data type
}
/*
* Singleton class
*/
public class CommonDataCache {
private final Map dataMap = new HashMap(); // this takes keys and values as custom objects
}
The implementation class of the interface would look like this
class CommonDataImpl implements CommonData {
private String key;
public CommonDataImpl(String key) {
this.key = key;
}
public void addData() {
// access the singleton cache class and add
}
public String getDataAccessKey() {
return key;
}
}
Each thread will be invoked as follows:
CommonData data = new CommonDataImpl("Key1");
new Thread(() -> data.addData()).start();
CommonData data1 = new CommonDataImpl("Key1");
new Thread(() -> data1.addData()).start();
CommonData data2 = new CommonDataImpl("Key1");
new Thread(() -> data2.addData()).start();
Now, I need to synchronize those threads if and only if the keys of the data objects (passed on to the threads) are the same.
My thought process so far:
I tried to have a class that provides the lock on the fly for a given key which looks something like this.
/*
* Singleton class
*/
public class DataAccessKeyToLockProvider {
private volatile Map<String, ReentrantLock> accessKeyToLockHolder = new ConcurrentHashMap<>();
private DataAccessKeyToLockProvider() {
}
public ReentrantLock getLock(String key) {
return accessKeyToLockHolder.putIfAbsent(key, new ReentrantLock());
}
public void removeLock(BSSKey key) {
ReentrantLock removedLock = accessKeyToLockHolder.remove(key);
}
}
So each thread would call this class, get the lock, use it, and remove it once the processing is done. But this can result in a case where the second thread gets the lock object that was inserted by the first thread and waits for the first thread to release it. Once the first thread removes the lock, the third thread would get a different lock altogether, so the second and third threads are no longer synchronized with each other.
Something like this:
new Thread(() -> {
    ReentrantLock lock = DataAccessKeyToLockProvider.getLock(data.getDataAccessKey());
    lock.lock();
    data.addData();
    lock.unlock();
    DataAccessKeyToLockProvider.removeLock(data.getDataAccessKey());
}).start();
Please let me know if you need any additional details to help me resolve my problem
P.S.: Removing the key from the lock provider is kind of mandatory, as I will be dealing with millions of keys (not necessarily strings), so I don't want the lock provider to eat up my memory.
Inspired by the solution provided by @rzwitserloot, I have tried to put together some generic code that waits for the other thread to complete its processing before giving access to the next thread.
public class GenericKeyToLockProvider<K> {
private volatile Map<K, ReentrantLock> keyToLockHolder = new ConcurrentHashMap<>();
public synchronized ReentrantLock getLock(K key) {
ReentrantLock existingLock = keyToLockHolder.get(key);
try {
if (existingLock != null && existingLock.isLocked()) {
existingLock.lock(); // Waits for the thread that acquired the lock previously to release it
}
return keyToLockHolder.put(key, new ReentrantLock()); // Override with the new lock
} finally {
if (existingLock != null) {
existingLock.unlock();
}
}
}
}
But it looks like the entry made by the last thread is never removed. Any way to solve this?
First, a clarification: you either use ReentrantLock, OR you use synchronized. You don't synchronize on a ReentrantLock instance (you can synchronize on any object you want); or, if you want to go the lock route, you call the lock() method on your lock object, using a try/finally guard to ensure you always call unlock() later (and don't use synchronized at all).
synchronized is a low-level API. Lock, and all the other classes in the java.util.concurrent package, are higher level and offer far more abstractions. It's generally a good idea to peruse the javadoc of the classes in the j.u.c package from time to time; there is very useful stuff in there.
The key issue is to remove all references to a lock object (thus ensuring it can be garbage collected), but not until you are certain there are zero active threads locking on it. Your current approach does not know how many threads are still waiting, and that needs to be fixed. Once you return an instance of a Lock object, it's 'out of your hands' and it is not possible to track whether the caller is ever going to call lock on it. Thus, you can't do that. Instead, call lock as part of the job: the getLock method should actually do the locking as part of the operation. That way, YOU get to control the process flow. However, let's first take a step back:
You say you'll have millions of keys. Okay; but it is somewhat unlikely you'll have millions of threads. After all, a thread requires a stack, and even using the -Xss parameter to reduce the stack size to the minimum of 128k or so, a million threads implies you're using up 128GB of RAM just for stacks; seems unlikely.
So, whilst you might have millions of keys, the number of 'locked' keys is MUCH smaller. Let's focus on those.
You could make a ConcurrentHashMap which maps your string keys to lock objects. Then:
To acquire a lock:
Create a new lock object (literally: Object o = new Object(); - we are going to be using synchronized) and add it to the map using putIfAbsent. putIfAbsent returns the value that was already mapped, or null if there was no mapping; if it returns null, you were the one to add it: you got the lock, go, run the code. Once you're done, acquire the sync lock on your object, send a notification, and remove the entry:
public void doWithLocking(String key, Runnable op) {
    Object locker = new Object();
    Object o = concurrentMap.putIfAbsent(key, locker);
    if (o == null) { // null means no previous mapping: we own the lock
        op.run();
        synchronized (locker) {
            locker.notifyAll(); // wake up everybody waiting.
            concurrentMap.remove(key); // this has to be inside!
        }
    } else {
        ...
    }
}
To wait until the lock is available, first acquire the monitor of the locker object, THEN check whether the concurrentMap still contains that same locker. If not, you're free to retry the whole operation. If it is still there, wait for a notification. Either way we always just retry from scratch. Thus:
public void performWithLocking(String key, Runnable op) throws InterruptedException {
    while (true) {
        Object locker = new Object();
        Object o = concurrentMap.putIfAbsent(key, locker);
        if (o == null) { // no previous mapping: we own the lock
            try {
                op.run();
            } finally {
                // We want to notify and clean up even if the operation throws!
                synchronized (locker) {
                    locker.notifyAll(); // wake up everybody waiting.
                    concurrentMap.remove(key); // this has to be inside!
                }
            }
            return;
        } else {
            synchronized (o) {
                // only wait if the map still holds this exact locker object;
                // otherwise its owner has already finished and we simply retry
                if (concurrentMap.get(key) == o) o.wait();
            }
        }
    }
}
Instead of this setup where you pass the operation to execute along with the lock key, you could have tandem 'lock' and 'unlock' methods but now you run the risk of writing code that forgets to call unlock. Hence why I wouldn't advise it!
You can call this with, for example:
keyedLockSupportThingie.doWithLocking("mykey", () -> {
System.out.println("Hello, from safety!");
});

Correct use of Hazelcast EntryProcessor

We're trying to work out the best way to use Hazelcast's IMap without using pessimistic locking.
EntryProcessor seems like the correct choice, however we need to apply two different types of operations: 'create' when containsKey is false, and 'update' when containsKey is true.
How can I utilise EntryProcessor to support these logic checks?
If two threads hit the containsKey() at the same time and it returns false to both of them, I don't want both of them to create the key. I'd want the second thread to apply an update instead.
This is what we have so far:
public void put(String key, Object value) {
IMap<String, Object> map = getMap();
if (!map.containsKey(key)) {
// create key here
} else {
// update existing value here
// ...
map.executeOnKey(key, new TransactionEntryProcessor({my_new_value}));
}
}
private static class MyEntryProcessor implements
EntryProcessor<String, Object>, EntryBackupProcessor<String, Object>, Serializable {
private static final long serialVersionUID = // blah blah
private static final ThreadLocal<Object> entryToSet = new ThreadLocal<>();
MyEntryProcessor(Object entryToSet) {
MyEntryProcessor.entryToSet.set(entryToSet);
}
@Override
public Object process(Map.Entry<String, Object> entry) {
entry.setValue(entryToSet.get());
return entry.getValue();
}
@Override
public EntryBackupProcessor<String, Object> getBackupProcessor() {
return MyEntryProcessor.this;
}
@Override
public void processBackup(Map.Entry<String, Object> entry) {
entry.setValue(entryToSet.get());
}
}
You can see that two threads can enter the put method and call containsKey at the same time. The second will overwrite the outcome of the first.
An EntryProcessor, by definition, is processing logic that gets executed on the entry itself, eliminating the need to serialize/deserialize the value. Internally, EPs are executed by partition threads, where one partition thread takes care of multiple partitions. When an EP comes to Hazelcast, it is picked up by the owner thread of the partition the key belongs to. Once the processing is completed, the partition thread is ready to accept and execute other tasks (which may well be the same EP for the same key, submitted by another thread). Therefore, even though it may seem that way, EPs should not be used as a replacement for pessimistic locking.
If you are insistent and really keen on using an EP for this, then you could try putting a null check inside the process method. Something like this:
public Object process(Map.Entry<String, Object> entry) {
if(null == entry.getValue()) {
entry.setValue("value123");
}
return entry.getValue();
}
This way, two things will happen:
1. The other thread will wait for the partition thread to become available again.
2. Since the value already exists, you won't overwrite anything.
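For illustration only, here is a hedged sketch of how the put method from the question might delegate entirely to such an entry processor, so that the create-vs-update decision happens inside process() on the partition thread instead of around a caller-side containsKey(). It assumes the same EntryProcessor/EntryBackupProcessor interfaces and executeOnKey call shown in the question; the class name and the update rule are made up. Note that the new value is carried in an ordinary serializable field rather than the question's static ThreadLocal, which is set on the submitting thread and so would not be visible to the partition thread that runs process().
public void put(String key, Object value) {
    IMap<String, Object> map = getMap();
    // No caller-side containsKey(): the processor decides create vs update
    // on the partition thread that owns the key.
    map.executeOnKey(key, new CreateOrUpdateEntryProcessor(value));
}

private static class CreateOrUpdateEntryProcessor implements
        EntryProcessor<String, Object>, EntryBackupProcessor<String, Object>, Serializable {

    private static final long serialVersionUID = 1L;

    // Ordinary serializable field, so the value travels to the partition
    // owner along with the processor (unlike a ThreadLocal).
    private final Object newValue;

    CreateOrUpdateEntryProcessor(Object newValue) {
        this.newValue = newValue;
    }

    @Override
    public Object process(Map.Entry<String, Object> entry) {
        if (entry.getValue() == null) {
            entry.setValue(newValue);                           // "create"
        } else {
            entry.setValue(update(entry.getValue(), newValue)); // "update"
        }
        return entry.getValue();
    }

    private Object update(Object existing, Object incoming) {
        return incoming; // placeholder update rule; replace with real merge logic
    }

    @Override
    public EntryBackupProcessor<String, Object> getBackupProcessor() {
        return this;
    }

    @Override
    public void processBackup(Map.Entry<String, Object> entry) {
        process(entry);
    }
}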

Use of a key to synchronize access to code block

Normally I would lock on a critical section like the following.
public class Cache {
private Object lockObject = new Object();
public Object getFromCache(String key) {
synchronized(lockObject) {
if (cache.containsKey(key)) {
// key found in cache - return cache value
}
else {
// retrieve cache value from source, cache it and return
}
}
}
}
The idea being I avoid a race condition which could result in the data source being hit multiple times and the key being added to the cache multiple times.
Right now if two threads come in at about the same time for different cache keys, I will still block one.
Assuming the keys are unique - will the lock still work by locking on the key?
I think it won't work because I understand that the object reference should be the same for the lock to come into effect. I guess this comes down to how it checks for equality.
public class Cache {
public Object getFromCache(String key) {
synchronized(key) {
if (cache.containsKey(key)) {
// key found in cache - return cache value
}
else {
// retrieve cache value from source, cache it and return
}
}
}
}
public class Cache {
private static final Set<String> lockedKeys = new HashSet<>();
private void lock(String key) throws InterruptedException {
synchronized (lockedKeys) {
while (!lockedKeys.add(key)) {
lockedKeys.wait();
}
}
}
private void unlock(String key) {
synchronized (lockedKeys) {
lockedKeys.remove(key);
lockedKeys.notifyAll();
}
}
public Object getFromCache(String key) throws InterruptedException {
try {
lock(key);
if (cache.containsKey(key)) {
// key found in cache - return cache value
}
else {
// retrieve cache value from source, cache it and return
}
} finally {
unlock(key);
}
}
}
The try-finally is very important: you must guarantee that waiting threads are unlocked after your operation, even if the operation throws an exception.
It will not work if your back-end is distributed across multiple servers/JVMs.
Each object has an implicit monitor upon which synchronization works. A String object may be created on the heap and may be a different instance for the same sequence of characters (if created using new), or it may come from the string pool. Two threads will only exclude each other from the critical section if they synchronize on the same object.
Synchronizing on a String literal is a really bad idea. String literals from the pool are shared. Imagine two different parts of your code each having a synchronized section, synchronizing on two String references that were initialized with the same sequence of characters: if the pooled String is used, both places lock on the same object. Even though the two places may have completely different business contexts, your application may end up hanging, and it will be very difficult to debug.
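A tiny demonstration of that sharing (standard string-pool behaviour, shown only to make the point concrete):
public class StringLockDemo {
    public static void main(String[] args) {
        String a = "config";             // comes from the string pool
        String b = "config";             // same pooled instance as a
        String c = new String("config"); // a distinct object on the heap

        System.out.println(a == b);      // true  -> synchronized(a) and synchronized(b) share one monitor
        System.out.println(a == c);      // false -> synchronized(c) uses a different monitor
        System.out.println(a.equals(c)); // true  -> equal content, yet independent locks
    }
}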
As for the specific question of whether synchronizing on keys serves the purpose:
You want to avoid two threads writing without reading the latest value of the cache, and you have a different key for each entry. Suppose thread1 and thread2 want to access the same key; synchronizing on that same key object prevents both of them from entering the synchronized block at once. Meanwhile, if thread3 wants to access a different key, it can do so. Here reads and writes are faster than with a single common lock object shared across all keys. So far so good, but a problem arises if you keep an array or some other non-thread-safe data structure for storing the cached values: simultaneous writes (for two or more different keys) can result in one write being overwritten by another at the same index.
So it depends upon the implementation of the cache data structure how best you can prepare it for fast reads and writes in a multi-threaded environment.
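On Java 8 and later, one way to get both a thread-safe store and per-key loading without a hand-rolled lock pool is ConcurrentHashMap.computeIfAbsent, which applies the mapping function at most once per absent key while lookups of other keys proceed in parallel. A minimal sketch, with a hypothetical loadFromSource standing in for the real retrieval:
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SourceBackedCache {
    private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();

    public Object getFromCache(String key) {
        // The mapping function runs at most once per absent key; other threads
        // asking for the same key block until it completes, while different
        // keys are served in parallel.
        return cache.computeIfAbsent(key, k -> loadFromSource(k));
    }

    private Object loadFromSource(String key) {
        // hypothetical expensive retrieval from the underlying source
        return new Object();
    }
}
The usual caveat applies: the mapping function should be short, since other updates to the map may block while it is in progress.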

Adding or deleting elements concurrently from a Hashmap and achieving synchronization

I am new to Java and concurrency stuff.
The purpose of the assignment was to learn concurrency.
So when answering this question, please keep in mind that I am supposed to use only HashMap (which is not synchronized by nature) and synchronize it myself. If you provide more knowledge it's appreciated, but not required.
I declared a HashMap like this:
private HashMap<String, Flight> flights = new HashMap<>();
recordID is the key of the flight to be deleted.
Flight flightObj = flights.get(recordID);
synchronized(flightObj){
Flight deletedFlight = flights.remove(recordID);
editResponse = "Flight with flight ID " + deletedFlight.getFlightID() +" deleted successfully";
return editResponse;
}
Now my doubt: is it fine to synchronize on flightObj?
Doubt 2:
Flight newFlight = new Flight(FlightServerImpl.createFlightID());
flights.put(newFlight.getFlightID(),newFlight);
If I create flights using the above code, and more than one thread tries to execute it, will there be any data consistency issues? Why or why not?
Thanks in advance.
To quickly answer your questions:
Neither is okay - you can't remove two different objects in parallel, and you can't add two different objects in parallel.
From java documentation:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
So it's okay for many threads to use get concurrently, and even a put that replaces an existing value.
But if you remove an entry or add a new one, you need to synchronize before calling any HashMap method.
In that case you can do what's suggested in the documentation and use a global lock. But since some limited concurrency is still allowed, you could also get that concurrency by using a read/write lock, as sketched below.
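A hedged sketch of that read/write-lock idea, reusing the Flight type from the question (the class and method names here are mine, and gets are treated as reads while put/remove are writes):
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FlightStore {
    private final Map<String, Flight> flights = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public Flight get(String recordID) {
        lock.readLock().lock();   // many readers may hold this at the same time
        try {
            return flights.get(recordID);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void add(Flight flight) {
        lock.writeLock().lock();  // exclusive: no readers or other writers
        try {
            flights.put(flight.getFlightID(), flight);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public Flight remove(String recordID) {
        lock.writeLock().lock();
        try {
            return flights.remove(recordID);
        } finally {
            lock.writeLock().unlock();
        }
    }
}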
You can do the following
class MySynchronizedHashMap<K, V> implements Serializable {
    private static final long serialVersionUID = 3053995032091335093L;

    private final Map<K, V> m;  // backing map
    private final Object mutex; // object on which to synchronize

    MySynchronizedHashMap(Map<K, V> m) {
        this.m = Objects.requireNonNull(m);
        mutex = this;
    }

    public V put(K key, V value) {
        synchronized (mutex) { return m.put(key, value); }
    }

    public V get(Object key) {
        synchronized (mutex) { return m.get(key); }
    }

    public V remove(Object key) {
        synchronized (mutex) { return m.remove(key); }
    }
}

MySynchronizedHashMap<String, Flight> mshm = new MySynchronizedHashMap<>(new HashMap<String, Flight>());
Flight newFlight = new Flight(FlightServerImpl.createFlightID());
mshm.put(newFlight.getFlightID(), newFlight);

Java synchronization on Collection with expensive operations

I have a map, named synchronizedMap, that I synchronize on in my function doMapOperation. In this function, I need to add/remove items from the map and perform expensive operations on these objects. I know that I don't want to call an expensive operation in a synchronized block, but I don't know how to make sure that the map is in a consistent state while I do these operations. What is the right way to do this?
This is my initial layout which I am sure is wrong because you want to avoid calling an expensive operation in a synchronized block:
public void doMapOperation(Object key1, Object key2) {
synchronized (synchronizedMap) {
// Remove key1 if it exists.
if (synchronizedMap.containsKey(key1)) {
Object value = synchronizedMap.get(key1);
value.doExpensiveOperation(); // Shouldn't be in synchronized block.
synchronizedMap.remove(key1);
}
// Add key2 if necessary.
Object value = synchronizedMap.get(key2);
if (value == null) {
value = new Object();
synchronizedMap.put(key2, value);
}
value.doOtherExpensiveOperation(); // Shouldn't be in synchronized block.
} // End of synchronization.
}
I guess as a continuation of this question, how would you do this in a loop?
public void doMapOperation(Object... keys) {
synchronized (synchronizedMap) {
// Loop through keys and remove them.
for (Object key : keys) {
// Check if map has key, remove if key exists, add if key doesn't.
if (synchronizedMap.containsKey(key)) {
Object value = synchronizedMap.get(key);
value.doExpensiveOperation(); // Shouldn't be here.
synchronizedMap.remove(key);
} else {
Object value = new Object();
value.doAnotherExpensiveOperation(); // Shouldn't be here.
synchronizedMap.put(key, value);
}
}
} // End of synchronization block.
}
Thanks for the help.
You can do the expensive operations outside your synchronized block like so:
public void doMapOperation(Object... keys) {
    ArrayList<Object> contained = new ArrayList<Object>();
    ArrayList<Object> missing = new ArrayList<Object>();
    synchronized (synchronizedMap) {
        for (Object key : keys) {
            if (synchronizedMap.containsKey(key)) {
                contained.add(synchronizedMap.get(key));
                synchronizedMap.remove(key);
            } else {
                Object value = new Object();
                synchronizedMap.put(key, value);
                missing.add(value);
            }
        }
    }
    for (Object o : contained)
        o.doExpensiveOperation();
    for (Object o : missing)
        o.doAnotherExpensiveOperation();
}
The only disadvantage is you may be performing operations on values after they are removed from the synchronizedMap.
You can create a wrapper for your synchronizedMap and make sure the operations like containsKey, remove, and put are synchronized methods. Then only access to the map will be synchronized, while your expensive operations can take place outside the synchronized block.
Another advantage is that by keeping your expensive operations outside the synchronized block, you avoid a possible deadlock risk if the operations call another synchronized map method.
In the first snippet: declare the two values outside the if-clause and just assign them inside it. Make the if-clause synchronized and invoke the expensive operations outside.
In the second case do the same, but inside the loop (synchronized inside the loop). You can, of course, have only one synchronized statement outside the loop, and simply fill a list of objects on which to invoke the expensive operation. Then, in a second loop outside the synchronized block, invoke the operations on all values in the list.
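Applied to the first snippet from the question, that restructuring might look roughly like this (a sketch only, keeping the question's placeholder Object type and its doExpensiveOperation() stand-ins):
public void doMapOperation(Object key1, Object key2) {
    Object removedValue = null;  // declared outside the synchronized block
    Object addedValue;

    synchronized (synchronizedMap) {
        // Only the map reads and mutations happen while holding the lock.
        if (synchronizedMap.containsKey(key1)) {
            removedValue = synchronizedMap.remove(key1);
        }
        addedValue = synchronizedMap.get(key2);
        if (addedValue == null) {
            addedValue = new Object();
            synchronizedMap.put(key2, addedValue);
        }
    }

    // The expensive work happens after the lock has been released.
    if (removedValue != null) {
        removedValue.doExpensiveOperation();
    }
    addedValue.doOtherExpensiveOperation();
}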
We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil. Yet we should not pass
up our opportunities in that critical 3%. A good programmer will not
be lulled into complacency by such reasoning, he will be wise to look
carefully at the critical code; but only after that code has been
identified. — Donald Knuth
You have a single method, doMapOperation(). What is your performance if this method continues to be block-synchronized? If you don't know then how will you know when you've got a good performing solution? Are you prepared to handle multiple calls to your expensive operations even after they have been removed from the map?
I'm not trying to be condescending, since maybe you understand the problem at hand better than you've conveyed, but it seems like you're jumping into a level of optimization for which you may not be prepared and may not be necessary.
You can actually do it all with only one synchronization hit. The first remove is probably the easiest. If you know the object exists, and you know remove is atomic, why not just remove it and if what is returned is not null invoke the expensive operations?
// Remove key1 if it exists.
if (synchronizedMap.containsKey(key1)) {
Object value = synchronizedMap.remove(key1);
if(value != null){ //thread has exclusive access to value
value.doExpensiveOperation();
}
}
For the put, since it is expensive and should be atomic, you are pretty much out of luck and need to synchronize access. I would recommend using some kind of computing map. Take a look at google-collections and MapMaker.
You can create a ConcurrentMap that will build the expensive object based on your key, for example:
ConcurrentMap<Key, ExpensiveObject> expensiveObjects = new MapMaker()
.concurrencyLevel(32)
.makeComputingMap(
new Function<Key, ExpensiveObject>() {
public ExpensiveObject apply(Key key) {
return createNewExpensiveObject(key);
}
});
This is simply a form of memoization.
In both of these cases, you don't need to use synchronized at all (at least not explicitly).
If you don't have null values in the Map, you don't need the containsKey() call at all: you can use Map.remove() to both remove the item and tell you whether it was there. So the true content of your synchronized block only needs to be this:
Object value = map.remove(key);
if (value != null)
value.doExpensiveOperation();
else
{
value = new Value();
value.doExpensiveOperation();
map.put(key,value);
}
If the expensive operation itself doesn't need to be synchronized, i.e. if you don't mind other clients of the Map seeing the value while it is being operated on, you can further simplify to this:
Object value = map.remove(key);
if (value == null)
{
value = new Value();
map.put(key,value);
}
value.doExpensiveOperation();
and the synchronized block can terminate before the expensive operation.
