Execution of `remappingFunction` in ConcurrentHashMap.computeIfPresent - java

This is a follow-up question to my original SO question.
Thanks to the answer on that question, it looks like, according to the ConcurrentMap.computeIfPresent javadoc:
The default implementation may retry these steps when multiple threads
attempt updates including potentially calling the remapping function
multiple times.
My question is:
Does ConcurrentHashMap.computeIfPresent call remappingFunction multiple times only when it is shared between multiple threads, or can it also be called multiple times when it is created and passed from a single thread?
And if it is the latter case, why would it be called multiple times instead of once?

The general contract of the interface method ConcurrentMap.computeIfPresent allows implementations to repeat the evaluation in the case of contention, and that is exactly what happens when a ConcurrentMap implementation inherits the default method, as it would be impossible to provide atomicity atop the general ConcurrentMap interface in a default method.
However, the implementation class ConcurrentHashMap overrides this method and provides a guarantee in its documentation:
If the value for the specified key is present, attempts to compute a new mapping given the key and its current mapped value. The entire method invocation is performed atomically. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.
emphasis mine
So, since your question asks for ConcurrentHashMap.computeIfPresent specifically, the answer is: its argument function will never get evaluated multiple times. This differs from, e.g., ConcurrentSkipListMap.computeIfPresent, where the function may get evaluated multiple times.
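A minimal sketch of that guarantee (the class and method names here are illustrative, not from the original posts): counting invocations shows the function runs exactly once per computeIfPresent call on a ConcurrentHashMap.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ComputeIfPresentOnce {
    // Returns how many times the remapping function ran for a single call.
    static int countInvocations() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("counter", 1);
        AtomicInteger calls = new AtomicInteger();
        map.computeIfPresent("counter", (k, v) -> {
            calls.incrementAndGet(); // runs while the bin is locked, hence once
            return v + 1;
        });
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(countInvocations()); // prints 1
    }
}
```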

Does ConcurrentMap.computeIfPresent call remappingFunction multiple
times when it is shared between multiple threads or can be called
multiple times when created and passed from a single thread?
The documentation does not specify, but the implication is that it is contention of multiple threads to modify the mapping of the same key (not necessarily all via computeIfPresent()) that might cause the remappingFunction to be run multiple times. I would anticipate that an implementation would check whether the value presented to the remapping function is still the one associated with the key before setting the remapping result as that key's new value. If not, it would try again, computing a new remapped value from the new current value.

You can see the code here:
@Override
default V computeIfPresent(K key,
        BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
    Objects.requireNonNull(remappingFunction);
    V oldValue;
    while ((oldValue = get(key)) != null) {
        V newValue = remappingFunction.apply(key, oldValue);
        if (newValue != null) {
            if (replace(key, oldValue, newValue))
                return newValue;
        } else if (remove(key, oldValue))
            return null;
    }
    return oldValue;
}
If thread 1 comes in, calls the remappingFunction, and gets the value X, then thread 2 changes the value while thread 1 is waiting, and only then does thread 1 call replace, the replace method will return false due to the value change.
So thread 1 will loop again and call the remappingFunction once more.
Under sustained contention, this can go on and on, producing "infinite" invocations of the remappingFunction.
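This retry can be replayed deterministically in a single thread by performing the "concurrent" write ourselves between the remapping call and the replace. The following sketch (names are hypothetical) mirrors the default-method loop above:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class RetryLoopDemo {
    // Replays the default-method loop, simulating another thread's write
    // between the remapping call and the replace().
    static int simulateContention() {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("k", 1);
        int calls = 0;
        boolean interleaved = false;
        Integer oldValue;
        while ((oldValue = map.get("k")) != null) {
            calls++;                    // the remapping function runs here
            Integer newValue = oldValue + 1;
            if (!interleaved) {         // simulate a concurrent put by thread 2
                map.put("k", 100);
                interleaved = true;
            }
            if (map.replace("k", oldValue, newValue)) break;
        }
        return calls;                   // 2: the first replace failed
    }

    public static void main(String[] args) {
        System.out.println(simulateContention()); // prints 2
    }
}
```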

Related

Java8: ConcurrentHashMap.compute atomicity

I am trying to use ConcurrentHashMap.compute to implement something, and I cannot figure out if my logic is 100% thread-safe and correct.
The javadoc says about the compute(K key, BiFunction<? super K, ? super V, ? extends V> remappingFunction) method:
Attempts to compute a mapping for the specified key and its current
mapped value (or null if there is no current mapping). The entire
method invocation is performed atomically. Some attempted update
operations on this map by other threads may be blocked while
computation is in progress, so the computation should be short and
simple, and must not attempt to update any other mappings of this Map.
The question is: Can I pass a function having side-effects to this method? Is there any guarantee that the function is applied only once even if there are other threads trying to update the same key?
LE: As requested, this is my update function (it tries to generate an event and remove the cached updates from the map):
final Set<UniqueID> updatedIds = new HashSet<>(updatesMap.keySet());
for (final UniqueID id : updatedIds) {
    updatesMap.compute(id, (modifiedId, updates) -> {
        final Event event = createEvent(modifiedId, updates);
        threadPool.addEvent(event);
        return null;
    });
}
I must be sure that only one event is created for each updated id.
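One subtlety worth checking in code like this: compute invokes the function even when the mapping is absent (the value parameter is then null), whereas computeIfPresent skips absent keys entirely. A minimal sketch (names are hypothetical) illustrating the difference:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class ComputeNullDemo {
    static List<String> run() {
        ConcurrentHashMap<String, List<String>> updatesMap = new ConcurrentHashMap<>();
        updatesMap.put("id1", List.of("update-1"));
        List<String> seen = new ArrayList<>();
        // First call: mapping present; the function sees the updates, returns null to remove.
        updatesMap.compute("id1", (k, v) -> { seen.add(String.valueOf(v)); return null; });
        // Second call: mapping gone, but compute STILL invokes the function with v == null.
        updatesMap.compute("id1", (k, v) -> { seen.add(String.valueOf(v)); return null; });
        // computeIfPresent would skip the absent mapping entirely.
        updatesMap.computeIfPresent("id1", (k, v) -> { seen.add("never"); return null; });
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [[update-1], null]
    }
}
```

If some other code path could remove the entry first, computeIfPresent avoids creating an event from a null updates value.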

java concurrency synchronized on map value

In the following code, I am confused about what happens when two threads compete for the lock on map.get(k). When thread A wins and removes the mapping, does the second thread end up doing synchronized(null)? Or do both threads synchronize on v, because thread B had already evaluated map.get(k) to v before thread A changed it?
synchronized (map.get(k)) {
    map.get(k).notify();
    map.remove(k);
}
This question is similar to another question, except that here the lock object is a value of a map.
UPDATE:
Comparing the discussion in this post with that in the link above: is it true that
synchronized (v) {
    v.notify();
    v = null;
}
would cause the second thread to do synchronized(null), but with synchronized(map.get(k)) the second thread would still do synchronized(v)?
UPDATE:
To answer @Holger's question, the main difference between this post and the other one is:
final V v = new V();
synchronized (map.get(k)) {
    map.get(k).notify();
    map.remove(k);
}
The second thread won't "request" a lock on map.get(k); both threads will request a lock on the result of map.get(k) before the first one starts executing. So the code is roughly similar to:
Object val = map.get(k); // evaluated once, before entering the block
synchronized (val) {
    val.notify();
    map.remove(k);
}
So, when the thread that obtained the lock finishes executing, the second thread will still have a reference to Object val, even if map.get(k) no longer returns it (or returns null).
EDIT: (following many useful comments)
It seems that the lock on map.get(k) is being acquired to ensure that the processing is done only once (map.remove(k) is called after processing). While it's true that 2 threads that compete for the lock on val won't run into null.notify(), the safety of this code is not guaranteed as the second thread may call synchronized(map.get(k)) after the first one has exited the synchronized block.
To ensure that k is processed atomically, a safer approach may be needed. One way to do this is to use a ConcurrentHashMap, like below:
map.computeIfPresent(k, (key, value) -> {
    // process the value here;
    // key is k, value is the value to which k is mapped
    return null; // return null to remove the entry after processing
});
Please note that map in the preceding example is an instance of ConcurrentHashMap. This will ensure that the value is processed once (computeIfPresent runs atomically).
To quote the ConcurrentHashMap.computeIfPresent doc comments:
If the value for the specified key is present, attempts to compute a new mapping given the key and its current mapped value. The entire method invocation is performed atomically. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.
What would happen is that you would lock on the value currently in the hashmap entry for key k.
Problem #1 - if the map.get(k) call returns null, then you would get an NPE.
Problem #2 - since you are not locking on map:
you are likely to get race conditions with other threads, e.g. if some other thread does a map.put(k, v) with a different v than the one you are locking on, and
the map.remove(k) may result in memory anomalies leading (potentially) to corruption of the map data structure.
It is not clear what you are actually trying to achieve by synchronizing on map.get(k) (rather than map). But whatever it is, this code is not thread-safe.
Re your update: yes, that is true ... assuming that the other thread is synchronizing on the value of the same variable v. Note that you always synchronize on an object, so when you do synchronized(v), that means "take the current value of v and synchronize on that object".
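The point that the lock target is captured before the monitor is entered can be shown in a single thread. This sketch (names are hypothetical) removes the mapping inside the block and confirms the captured reference is unaffected:

```java
import java.util.HashMap;
import java.util.Map;

public class LockTargetDemo {
    static boolean[] demo() {
        Map<String, Object> map = new HashMap<>();
        Object v = new Object();
        map.put("k", v);

        // The lock target is whatever map.get("k") evaluates to NOW;
        // it is captured before the monitor is entered.
        Object target = map.get("k");
        synchronized (target) {
            map.remove("k"); // the map no longer references v
        }
        // We still hold a reference to v, but a thread evaluating
        // synchronized(map.get("k")) from now on would get null -> NPE.
        return new boolean[] { map.get("k") == null, target == v };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints true true
    }
}
```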

How to "'validate and lazy swap' atomically" like CAS?

Here is a prototype of the function I want:
atomicReference.validateAndSwap(value -> isInvalid(value), () -> createValid());
It is assumed to be called from multiple threads.
The second lambda is called only when the first returns true.
First (plus second if first returns true) lambda calls should be a single atomic operation.
Is it even possible to implement without synchronized?
Are there ready-made solutions for similar functionality?
Is my way of thinking wrong; am I missing something?
I’m not sure whether you mean the right thing when saying "First (plus second if first returns true) lambda calls should be a single atomic operation." The point of atomic references is that the update function evaluation may overlap and therefore should not have interference, but will act as if atomic: when evaluations overlap, only one can succeed with CAS and the other has to be repeated based on the new value.
If you want truly atomic evaluations, using a Lock or synchronized is unavoidable. If you have appropriate non-interfering functions and want to implement updates as if atomic, it can be implemented like
Value old;
do old = atomicReference.get();
while (isInvalid(old) && !atomicReference.compareAndSet(old, createValid()));
Since in this specific case, the createValid() function does not depend on the old value, we could avoid repeated evaluation in the contended case:
Value old = atomicReference.get();
if (isInvalid(old)) {
    Value newValue = createValid();
    while (!atomicReference.compareAndSet(old, newValue)) {
        old = atomicReference.get();
        if (!isInvalid(old)) break;
    }
}
}
That is all assuming that the validity of an object cannot change in between. Otherwise, locking or synchronizing is unavoidable.
Note that Java 8’s update methods follow the same principle. So you can write
atomicReference.updateAndGet(old -> isInvalid(old) ? createValid() : old);
to achieve the same, but it also isn’t truly atomic; rather, it behaves as if atomic when concurrent evaluations of the update function have no interference.
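A small runnable sketch of the updateAndGet variant (the isInvalid/createValid implementations below are placeholders for the question's functions): the creation function runs only when the current value is invalid, and a second call leaves a valid value untouched.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class ValidateAndSwapDemo {
    static final AtomicInteger created = new AtomicInteger();

    // Placeholder validity check and factory; the real ones come from the question.
    static boolean isInvalid(String v) { return v == null || v.isEmpty(); }
    static String createValid() { created.incrementAndGet(); return "valid"; }

    static String swapIfInvalid(AtomicReference<String> ref) {
        // As-if atomic: under contention the lambda may run more than once,
        // so createValid() must tolerate re-execution (no side effects).
        return ref.updateAndGet(old -> isInvalid(old) ? createValid() : old);
    }

    public static void main(String[] args) {
        AtomicReference<String> ref = new AtomicReference<>("");
        System.out.println(swapIfInvalid(ref)); // prints valid
        System.out.println(swapIfInvalid(ref)); // prints valid, created only once
        System.out.println(created.get());      // prints 1
    }
}
```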

Hashtable: why is get method synchronized?

I know a Hashtable is synchronized, but why is its get() method synchronized?
Isn't it only a read method?
If the read was not synchronized, then the Hashtable could be modified during the execution of read. New elements could be added, the underlying array could become too small and could be replaced by a bigger one, etc. Without sequential execution, it is difficult to deal with these situations.
However, even if get would not crash when the Hashtable is modified by another thread, there is another important aspect of the synchronized keyword, namely cache synchronization. Let's use a simplified example:
class Flag {
    private boolean value;
    boolean get() { return value; } // WARNING: not synchronized
    synchronized void set(boolean value) { this.value = value; }
}
set is synchronized, but get isn't. What happens if two threads A and B simultaneously read and write to this class?
1. A calls get
2. B calls set
3. A calls get
Is it guaranteed at step 3 that A sees the modification made by thread B?
No, it isn't, as A could be running on a different core that uses a separate cache where the old value is still present. Thus, we have to force B to publish the new value to the other cores, and force A to fetch the new data.
How can we enforce this? Every time a thread enters and leaves a synchronized block, an implicit memory barrier is executed. A memory barrier forces the cache to be updated. However, both the writer and the reader have to execute a memory barrier; otherwise, the information is not properly communicated.
In our example, thread B already uses the synchronized method set, so its data modification is communicated at the end of the method. However, A does not see the modified data. The solution is to make get synchronized, so it is forced to fetch the updated data.
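For completeness, here is the fixed version of the example, with both accessors synchronized on the same monitor:

```java
class Flag {
    private boolean value;

    // Both reader and writer now cross the memory barrier on the same lock,
    // so a get() that starts after set() completes sees the written value.
    synchronized boolean get() { return value; }
    synchronized void set(boolean value) { this.value = value; }
}
```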
Have a look at the Hashtable source code and you can think of lots of race conditions that can cause problems in an unsynchronized get().
(I am reading the JDK 6 source code.)
For example, rehash() creates an empty array, assigns it to the instance variable table, and then puts the entries from the old table into the new one. Therefore, if your get occurs after the empty-array assignment but before the entries are actually copied over, you cannot find your key even though it is in the table.
Another example: there is a loop that iterates through the linked list at the table index; if a rehash happens in the middle of your iteration, you may also fail to find an entry even though it exists in the Hashtable.
Hashtable is synchronized, meaning the whole class is thread-safe.
Inside Hashtable, not only the get() method but also many other methods are synchronized, and in particular the put() method is synchronized, as Tom said.
A read method must be synchronized just like a write method, because that is what ensures the visibility and consistency of the variable.

Using putIfAbsent like a short circuit operator

Is it possible to use putIfAbsent or any of its equivalents like a short-circuit operator?
myConcurrentMap.putIfAbsent(key,calculatedValue)
I want that, if there is already a calculated value, it shouldn't be calculated again.
By default, putIfAbsent would still do the calculation every time, even though it will not actually store the value again.
Java doesn't allow any form of short-circuiting save the built-in cases, sadly - all method calls result in the arguments being fully evaluated before control passes to the method. Thus you couldn't do this with "normal" syntax; you'd need to manually wrap up the calculation inside a Callable or similar, and then explicitly invoke it.
In this case I find it difficult to see how it could work anyway, though. putIfAbsent works on the basis of being an atomic, non-blocking operation. If it were to do what you want, the sequence of events would roughly be:
Check if key exists in the map (this example assumes it doesn't)
Evaluate calculatedValue (probably expensive, given the context of the question)
Put result in map
It would be impossible for this to be non-blocking if the value didn't already exist at step two - two different threads calling this method at the same time could only perform correctly if blocking happened. At this point you may as well just use synchronized blocks with the flexibility of implementation that that entails; you can definitely implement what you're after with some simple locking, something like the following:
private final Map<K, V> map = ...;

public void myAdd(K key, Callable<V> valueComputation) throws Exception {
    synchronized (map) {
        if (!map.containsKey(key)) {
            map.put(key, valueComputation.call());
        }
    }
}
You can put Future<V> objects into the map. Using putIfAbsent, only one object will be stored, and the computation of the final value is performed by calling Future.get() (e.g. via the FutureTask and Callable classes). Check out Java Concurrency in Practice for a discussion of this technique. (Example code is also in this question here on SO.)
This way, your value is computed only once, and all threads get the same value. Access to the map isn't blocked, although access to the value (through Future.get()) will block until the value is computed by one of the threads.
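A compact sketch of that Future-based memoizer (modeled on the Memoizer pattern from Java Concurrency in Practice; the expensive function here is a stand-in): putIfAbsent decides a single winner, and every thread reads the winner's Future.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

public class Memoizer {
    static final ConcurrentMap<String, Future<Integer>> cache = new ConcurrentHashMap<>();
    static final AtomicInteger computations = new AtomicInteger();

    static int expensive(String key) { // stand-in for the costly calculation
        computations.incrementAndGet();
        return key.length();
    }

    static int get(String key) {
        FutureTask<Integer> task = new FutureTask<>(() -> expensive(key));
        Future<Integer> f = cache.putIfAbsent(key, task);
        if (f == null) {    // we won the race: run the computation ourselves
            f = task;
            task.run();
        }
        try {
            return f.get(); // losers block here until the winner finishes
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(get("hello"));       // prints 5
        System.out.println(get("hello"));       // prints 5, served from the cache
        System.out.println(computations.get()); // prints 1
    }
}
```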
You could consider using a Guava computing map:
ConcurrentMap<Key, Value> myConcurrentMap = new MapMaker()
    .makeComputingMap(
        new Function<Key, Value>() {
            public Value apply(Key key) {
                Value calculatedValue = calculateValue(key);
                return calculatedValue;
            }
        });
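Note that MapMaker.makeComputingMap was later deprecated in Guava in favor of CacheBuilder. Since Java 8, ConcurrentHashMap.computeIfAbsent covers the same use case directly, provided the mapping function is short and does not touch other keys. A sketch, where calculateValue is a placeholder for the question's calculation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ComputeIfAbsentDemo {
    // Hypothetical stand-in for the question's expensive calculation.
    static Integer calculateValue(String key) { return key.length(); }

    static String demo() {
        ConcurrentMap<String, Integer> myConcurrentMap = new ConcurrentHashMap<>();
        // The mapping function runs only when the key is absent; while it runs,
        // the bin is locked, so at most one thread computes the value per key.
        Integer v1 = myConcurrentMap.computeIfAbsent("key", ComputeIfAbsentDemo::calculateValue);
        Integer v2 = myConcurrentMap.computeIfAbsent("key",
                k -> { throw new AssertionError("not called for a present key"); });
        return v1 + " " + v2;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 3 3
    }
}
```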
