ConcurrentHashMap locking at every read?

I want to understand how locking works in Java's ConcurrentHashMap. According to the source code here, it looks like every read locks the reader using the lock of that particular segment. Have I got it wrong?
V readValueUnderLock(HashEntry<K,V> e) {
    lock();
    try {
        return e.value;
    } finally {
        unlock();
    }
}

Not every read is locked. Below is the documentation of the method readValueUnderLock:
Reads value field of an entry under lock. Called if value field ever
appears to be null. This is possible only if a compiler happens to
reorder a HashEntry initialization with its table assignment, which is
legal under memory model but is not known to ever occur.
A read in a ConcurrentHashMap does not synchronize on the entire map. In fact, traversal does not synchronize at all except under one condition. The internal linked-list implementation is aware of changes to the underlying collection. If it detects any such change during traversal, it synchronizes on the bucket it is traversing and then re-reads the values. This ensures that the values received are always fresh while keeping locking to a minimum, if there is any at all.
Below is the get implementation in this class; readValueUnderLock is called only when v is null:
V get(Object key, int hash) {
    if (count != 0) { // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}

Related

How does ConcurrentHashMap.get() prevent dirty read?

I am looking at the source code of ConcurrentHashMap and wondering how the get() method works without any monitor, here's the code:
public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    int h = spread(key.hashCode());
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) {
        if ((eh = e.hash) == h) {
            if ((ek = e.key) == key || (ek != null && key.equals(ek))) // mark here for possible dirty read
                return e.val;
        }
        else if (eh < 0)
            return (p = e.find(h, key)) != null ? p.val : null;
        while ((e = e.next) != null) {
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek)))) // mark here for possible dirty read
                return e.val;
        }
    }
    return null;
}
The two lines I marked do the same thing: they check whether the key of the current Node<K, V> equals the requested key. If it does, the corresponding value is returned. But what if another thread cuts in before the return and calls remove() on this node? Since the local variable e still holds a reference to the removed node, the GC will leave it alone and the get() method will still return the removed value, thus causing a dirty read.
Did I miss something?
It doesn't:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.)
This is generally not a problem, since get will never return a result that couldn't have happened if the get method acquired a lock, blocking the update operation in the other thread. You just get the result as if the get call happened before the update operation began.
So, if you don't mind whether the get happens before or after the update, you also shouldn't mind it happening during the update, because there is no observable difference between during and before. If you do want the get to appear to happen after the update, then you will need to signal from the updating thread that the update is complete; waiting to acquire a lock wouldn't achieve that anyway, because you might get the lock before the update happens (in which case you'd get the same result as if you didn't acquire the lock).
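To make the "during looks like before" point concrete, here is a minimal sketch. It assumes the Java 8+ ConcurrentHashMap implementation, where get() does not acquire the bin lock that compute() holds; the class name NonBlockingGetDemo and its helpers are made up for this illustration. A writer is held inside compute() while the reader calls get() on the same key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of the JDK.
class NonBlockingGetDemo {

    /** Returns {value seen by get() during the update, value seen after it completed}. */
    static int[] readDuringUpdate() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("k", 1);
        CountDownLatch updateStarted = new CountDownLatch(1);
        CountDownLatch readerDone = new CountDownLatch(1);

        // The writer stays inside compute() until the reader has finished its get().
        Thread writer = new Thread(() -> map.compute("k", (key, old) -> {
            updateStarted.countDown();
            await(readerDone);
            return old + 1;
        }));
        writer.start();

        await(updateStarted);
        int during = map.get("k");   // the update is in progress, yet get() returns
        readerDone.countDown();
        join(writer);
        int after = map.get("k");    // the update has completed by now
        return new int[] {during, after};
    }

    // Bounded wait, so the demo cannot hang forever if an assumption turns out to be wrong.
    private static void await(CountDownLatch latch) {
        try {
            latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] seen = readDuringUpdate();
        System.out.println("during=" + seen[0] + ", after=" + seen[1]);
    }
}
```

The reader sees the old value while the update is still running, which is indistinguishable from a get() that ran just before the update started. Seeing the new value requires some signal that the update has finished (here, joining the writer thread).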

Why readValueUnderLock(e) exist in ConcurrentHashMap’s get method?

When I read the source code of ConcurrentHashMap at JDK1.6, I found that readValueUnderLock(e) can't be reached, because the put method has checked the value: if value is null, it must throw NullPointerException. So I think there may be something wrong but i'm not sure what it is. I'll be grateful if someone can answer me!
some source code here:
V get(Object key, int hash) {
    if (count != 0) { // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}
V readValueUnderLock(HashEntry<K,V> e) {
    lock();
    try {
        return e.value;
    } finally {
        unlock();
    }
}
public V put(K key, V value) {
    if (value == null)
        throw new NullPointerException();
    int hash = hash(key.hashCode());
    return segmentFor(hash).put(key, hash, value, false);
}
v is just a snapshot of Entry.value. The entry may not be fully constructed yet (consider the double-checked locking issue under the old Java Memory Model), so the value could be read as null. Although this is an extreme edge case, the JRE has to make sure it works, hence readValueUnderLock.
PS: It's better to keep up with the times. Java is evolving and Java 9 is coming in several months. There have been some tremendous changes in its codebase. Filling your head with obsolete knowledge may not be a good idea.

Atomic compareAndSet but with callback?

I know that AtomicReference has compareAndSet, but I feel like what I want to do is this
private final AtomicReference<Boolean> initialized = new AtomicReference<>( false );
...
atomicRef.compareSetAndDo( false, true, () -> {
    // stuff that only happens if false
});
This would probably work too, and might be better:
atomicRef.compareAndSet( false, () -> {
    // stuff that only happens if false
    // if I die still false.
    return true;
});
I've noticed there's some new functional constructs but I'm not sure if any of them are what I'm looking for.
Can any of the new constructs do this? if so please provide an example.
update
To attempt to simplify my problem, I'm trying to find a less error prone way to guard code in a "do once for object" or (really) lazy initializer fashion, and I know that some developers on my team find compareAndSet confusing.
guard code in a "do once for object"
How exactly to implement that depends on what you want other threads that attempt to execute the same thing in the meantime to do. If you just let them run past the CAS, they may observe things in an intermediate state while the one thread that succeeded performs its action.
or (really) lazy initializer fashion
That construct is not thread-safe if you're using it for lazy initializers, because one thread may set the "is initialized" boolean to true and then execute the block, while another thread observes the true state but reads an empty result.
You can use AtomicReference::updateAndGet if multiple concurrent/repeated initialization attempts are acceptable, with one object winning in the end and the others being discarded by GC. The update method should be side-effect-free.
Otherwise you should just use the double-checked locking pattern with a volatile reference field.
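For reference, a minimal sketch of double-checked locking for a lazily initialized instance field. The class name LazyValue is made up for this example, and it assumes the initializer never returns null (null is used as the "not yet initialized" marker).

```java
import java.util.function.Supplier;

// Hypothetical helper; assumes the initializer returns a non-null value.
final class LazyValue<T> implements Supplier<T> {
    private final Supplier<T> initializer;
    private volatile T value;            // volatile is what makes the pattern safe

    LazyValue(Supplier<T> initializer) {
        this.initializer = initializer;
    }

    @Override
    public T get() {
        T result = value;                // first check, no lock
        if (result == null) {
            synchronized (this) {
                result = value;          // second check, under the lock
                if (result == null) {
                    result = initializer.get();   // runs at most once
                    value = result;               // publish the finished object
                }
            }
        }
        return result;
    }
}
```

Unlike the updateAndGet approach, concurrent callers block until the single initialization has finished, so the initializer may have side effects.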
Of course you can always package any of these into a higher order function that returns a Runnable or Supplier which you then assign to a final field.
// == FunctionalUtils.java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** @param mayRunMultipleTimes must be side-effect-free */
public static <T> Supplier<T> instantiateOne(Supplier<T> mayRunMultipleTimes) {
    AtomicReference<T> ref = new AtomicReference<>(null);
    return () -> {
        T val = ref.get(); // fast-path if already initialized
        if (val != null)
            return val;
        // slow path: the supplier may run in several threads, but only one result is kept
        return ref.updateAndGet(v -> v == null ? mayRunMultipleTimes.get() : v);
    };
}
// == ClassWithLazyField.java
private final Supplier<Foo> lazyInstanceVal = FunctionalUtils.instantiateOne(() -> new Foo());
public Foo getFoo() {
    return lazyInstanceVal.get();
}
You can easily encapsulate various custom control-flow and locking patterns this way.
compareAndSet returns true if the update was done, and false if the actual value was not equal to the expected value.
So just use
if (ref.compareAndSet(expectedValue, newValue)) {
...
}
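Spelled out for the boolean flag in the question, this could look like the following sketch. OnceGuard is a hypothetical name, and AtomicBoolean is used instead of AtomicReference<Boolean>. Note that threads losing the CAS do not wait for the action to finish; they only learn that someone else has started it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper illustrating the plain compareAndSet guard.
class OnceGuard {
    private final AtomicBoolean started = new AtomicBoolean(false);

    /** Runs the action if this is the first call; returns whether it ran. */
    boolean runOnce(Runnable action) {
        if (started.compareAndSet(false, true)) {
            action.run();    // only the single thread that won the CAS gets here
            return true;
        }
        return false;        // someone else already started (maybe not yet finished)
    }
}
```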
That said, I don't really understand your examples, since you're passing true and false to a method taking object references as argument. And your second example doesn't do the same thing as the first one. If the second is what you want, I think what you're after is
ref.getAndUpdate(value -> {
    if (value.equals(expectedValue)) {
        return someNewValue(value);
    }
    else {
        return value;
    }
});
You’re over-complicating things. Just because there are now lambda expressions, you don’t need to solve everything with lambdas:
private volatile boolean initialized;
…
if(!initialized) synchronized(this) {
    if(!initialized) {
        // stuff to be done exactly once
        initialized=true;
    }
}
Double-checked locking might not have a good reputation, but for non-static properties, there are few alternatives.
If you consider multiple threads accessing it concurrently in the uninitialized state and want a guarantee that the action runs only once, and that it has completed before dependent code is executed, an Atomic… object won’t help you.
There’s only one thread that can successfully perform compareAndSet(false,true), but since failure implies that the flag already has the new value, i.e. is initialized, all other threads will proceed as if the “stuff to be done exactly once” has been done while it might still be running. The alternative would be reading the flag first, conditionally performing the stuff, and calling compareAndSet afterwards, but that allows multiple concurrent executions of “stuff”. This is also what happens with updateAndGet or accumulateAndGet and their provided function.
To guarantee exactly one execution before proceeding, threads must get blocked while the “stuff” is being executed. The code above does this. Note that once the “stuff” has been done, there will be no locking anymore and the performance characteristics of the volatile read are the same as for the Atomic… read.
The only solution that is simpler to program is to use a ConcurrentMap:
private final ConcurrentHashMap<String,Boolean> initialized=new ConcurrentHashMap<>();
…
initialized.computeIfAbsent("dummy", ignore -> {
    // stuff to do exactly once
    return true;
});
It might look a bit oversized, but it provides exactly the required performance characteristics. It will guard the initial computation using synchronized (or well, an implementation dependent exclusion mechanism) but perform a single read with volatile semantics on subsequent queries.
If you want a more lightweight solution, you may stay with the double checked locking shown at the beginning of this answer…
I know this is old, but I've found there is no perfect way to achieve this, more specifically this:
trying to find a less error prone way to guard code in a "do (anything) once..."
I'll add to this "while respecting a happens before behavior." which is required for instantiating singletons in your case.
IMO The best way to achieve this is by means of a synchronized function:
public<T> T transaction(Function<NonSyncObject, T> transaction) {
    synchronized (lock) {
        return transaction.apply(nonSyncObject);
    }
}
This allows you to perform atomic "transactions" on the given object.
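A self-contained sketch of that idea, with the lock and the guarded object made explicit. The class name Guarded and its fields are assumptions for this example; the snippet above leaves lock and nonSyncObject undefined.

```java
import java.util.function.Function;

// Hypothetical wrapper: every access to the wrapped object goes through one lock.
final class Guarded<S> {
    private final Object lock = new Object();
    private final S state;               // not thread-safe on its own

    Guarded(S state) {
        this.state = state;
    }

    /** Runs the function while holding the lock and returns its result. */
    <T> T transaction(Function<S, T> body) {
        synchronized (lock) {
            return body.apply(state);
        }
    }
}
```

For example, `new Guarded<>(new StringBuilder()).transaction(sb -> sb.append("ab").length())` appends and reads the length as one atomic step. This only holds as long as the wrapped object is never leaked out of a transaction.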
Other options are double-check spin-locks:
for (;;) {
    T t = atomicT.get();
    T newT = new T();
    if (atomicT.compareAndSet(t, newT)) return;
}
On this one new T(); will get executed repeatedly until the value is set successfully, so it is not really a "do something once".
This would only work on copy on write transactions, and could help on "instantiating objects once" (which in reality is instantiating many but at the end is referencing the same) by tweaking the code.
The final option is a worse-performing version of the first one, but this one is a true happens-before AND ONCE (as opposed to the double-check spin-lock):
public void doSomething(Runnable r) {
    while (!atomicBoolean.compareAndSet(false, true)) {}
    // Do some heavy stuff ONCE
    r.run();
    atomicBoolean.set(false);
}
The reason why the first one is the better option is that it is doing what this one does, but in a more optimized way.
As a side note, in my projects I've actually used the code below (similar to @the8472's answer), which at the time I thought was safe, and it may be:
public T get() {
    T res = ref.get();
    if (res == null) {
        res = builder.get();
        if (ref.compareAndSet(null, res))
            return res;
        else
            return ref.get();
    } else {
        return res;
    }
}
The thing about this code is that, like the copy-on-write loop, it generates multiple instances, one for each contending thread, but only the first one is cached; all the other instances eventually get GC'd.
Looking at the putIfAbsent method I see the benefit is the skipping of 17 lines of code and then a synchronized body:
/** Implementation for put and putIfAbsent */
final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break;                   // no lock when adding to empty bin
        }
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        else {
            V oldVal = null;
            synchronized (f) {
                if (tabAt(tab, i) == f) {
And then the synchronized body itself is another 34 lines:
synchronized (f) {
    if (tabAt(tab, i) == f) {
        if (fh >= 0) {
            binCount = 1;
            for (Node<K,V> e = f;; ++binCount) {
                K ek;
                if (e.hash == hash &&
                    ((ek = e.key) == key ||
                     (ek != null && key.equals(ek)))) {
                    oldVal = e.val;
                    if (!onlyIfAbsent)
                        e.val = value;
                    break;
                }
                Node<K,V> pred = e;
                if ((e = e.next) == null) {
                    pred.next = new Node<K,V>(hash, key,
                                              value, null);
                    break;
                }
            }
        }
        else if (f instanceof TreeBin) {
            Node<K,V> p;
            binCount = 2;
            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                  value)) != null) {
                oldVal = p.val;
                if (!onlyIfAbsent)
                    p.val = value;
            }
        }
    }
}
The pro(s) of using a ConcurrentHashMap is that it will undoubtedly work.

Synchronized on HashMap value object

I've got a question about synchronization of objects inside a Map (same objects I later change value of). I want to atomically read, do checks and possibly do updates to a value from a map without locking the entire map. Is this a valid way to work with synchronization of objects?
private final Map<String, AtomicInteger> valueMap = new HashMap<>();
public Response addValue(@NotNull String key, @NotNull Integer value) {
    AtomicInteger currentValue = valueMap.get(key);
    if (currentValue == null) {
        synchronized (valueMap) {
            // Doublecheck that value hasn't been changed before entering synchronized
            currentValue = valueMap.get(key);
            if (currentValue == null) {
                currentValue = new AtomicInteger(0);
                valueMap.put(key, currentValue);
            }
        }
    }
    synchronized (valueMap.get(key)) {
        // Check that value hasn't been changed when changing synchronized blocks
        currentValue = valueMap.get(key);
        if (currentValue.get() + value > MAX_LIMIT) {
            return OVERFLOW;
        }
        currentValue.addAndGet(value);
        return OK;
    }
}
I fail to see much of a difference between your approach and that of a standard ConcurrentHashMap, aside from the fact that ConcurrentHashMap has been heavily tested, and can be configured for minimal overhead with the exact number of threads you want to run the code with.
In a ConcurrentHashMap, you would use the replace(K key, V old, V new) method to atomically update key to new only when the old value has not changed.
The space savings due to removing all those AtomicIntegers and the time savings due to lower synchronization overhead will probably compensate for having to wrap the replace(k, old, new) calls within while-loops:
ConcurrentHashMap<String, Integer> valueMap =
    new ConcurrentHashMap<>(16, .75f, expectedConcurrentThreadCount);
public Response addToKey(@NotNull String key, @NotNull Integer value) {
    if (value > MAX_LIMIT) {
        // probably should set value to MAX_LIMIT-1 before failing
        return OVERFLOW;
    }
    boolean updated = false;
    do {
        Integer old = valueMap.putIfAbsent(key, value);
        if (old == null) {
            // it was absent, and now it has been updated to value: ok
            updated = true;
        } else if (old + value > MAX_LIMIT) {
            // probably should set value to MAX_LIMIT-1 before failing
            return OVERFLOW;
        } else {
            // succeeds only if the mapping is still key -> old; otherwise retry
            updated = valueMap.replace(key, old, old+value);
        }
    } while (! updated);
    return OK;
}
Also, on the plus side, this code works even if the key was removed after checking it (yours throws an NPE in this case).
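Here is a compact, runnable variant of the same putIfAbsent/replace retry loop, as a sketch. The class name BoundedCounters, the MAX_LIMIT value of 100 and the boolean return type (instead of the Response type, which is not shown in the question) are assumptions for this example.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical class; MAX_LIMIT is an arbitrary value chosen for the example.
class BoundedCounters {
    static final int MAX_LIMIT = 100;
    private final ConcurrentHashMap<String, Integer> values = new ConcurrentHashMap<>();

    /** Adds delta to the value for key; returns false if that would exceed MAX_LIMIT. */
    boolean add(String key, int delta) {
        if (delta > MAX_LIMIT) {
            return false;
        }
        while (true) {
            Integer old = values.putIfAbsent(key, delta);
            if (old == null) {
                return true;                      // key was absent, now maps to delta
            }
            if (old + delta > MAX_LIMIT) {
                return false;                     // would overflow, leave the value alone
            }
            if (values.replace(key, old, old + delta)) {
                return true;                      // nobody changed the value in between
            }
            // another thread updated or removed the entry: re-read and retry
        }
    }

    Integer current(String key) {
        return values.get(key);
    }
}
```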

CAS and Non Blocking Counter

I have been reading JCIP by Brian Goetz. He explains the implementation of a non-blocking counter using CAS instruction. I could not understand how the increment is happening using CAS instruction. Can anyone help me understand this.
public class CasCounter {
    private SimulatedCAS value;
    public int getValue() {
        return value.get();
    }
    public int increment() {
        int v;
        do {
            v = value.get();
        }
        while (v != value.compareAndSwap(v, v + 1));
        return v + 1;
    }
}
value.compareAndSwap(v, v + 1) is equivalent to the following, except that the entire block is atomic: (see compare-and-swap for details)
int old = value.val;
if (old == v) {
    value.val = v + 1;
}
return old;
Now v = value.get() gets the current value of the counter, and if nobody else is trying to update the counter at the same time, old == v will be true, so the value is set to v+1 (i.e. it is incremented) and old is returned. The loop terminates since v == old.
Suppose someone else incremented the counter just after we did v = value.get(), then old == v would be false, and the method will immediately return old, which is the updated value. Since v != old now, the loop continues.
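To try this out, here is a sketch of what the SimulatedCAS class could look like, together with the counter. It is written from the description above rather than copied from the book, and it uses synchronized methods to stand in for the atomicity that a real CAS instruction gets from the hardware.

```java
// Sketch of a simulated CAS: synchronized methods make each operation atomic.
class SimulatedCAS {
    private int value;

    synchronized int get() {
        return value;
    }

    /** Sets value to newValue only if it equals expected; always returns the old value. */
    synchronized int compareAndSwap(int expected, int newValue) {
        int old = value;
        if (old == expected) {
            value = newValue;
        }
        return old;
    }
}

class CasCounter {
    private final SimulatedCAS value = new SimulatedCAS();

    int getValue() {
        return value.get();
    }

    int increment() {
        int v;
        do {
            v = value.get();                              // read the current value
        } while (v != value.compareAndSwap(v, v + 1));    // retry if someone got in first
        return v + 1;
    }
}
```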
The compareAndSwap() method will perform the following operations atomically:
- determine if `value` is equal to `v`
- if so, it will set `value` to `v+1`
- it returns whatever `value` was when the method was entered (whether or not `value` was updated)
The caller can check to see if value was what they expected it to be when they called compareAndSwap(). If it was, then the caller knows it's been updated. If it wasn't what was expected, the caller knows that it wasn't updated, and will try again, using the 'new' current value of value as what's expected (that's what the loop is doing).
This way, the caller can know that the increment operation doesn't get lost by some other thread that tries to modify value at the same moment.
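In real code the same retry loop is usually written against java.util.concurrent.atomic.AtomicInteger. Its compareAndSet returns a boolean (success or failure) rather than the old value, so the loop condition changes slightly. The class name AtomicCasCounter and the countWithThreads helper are made up for this sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical counter using the JDK's AtomicInteger instead of SimulatedCAS.
class AtomicCasCounter {
    private final AtomicInteger value = new AtomicInteger();

    int increment() {
        int v;
        do {
            v = value.get();
        } while (!value.compareAndSet(v, v + 1));   // false means another thread won; retry
        return v + 1;
    }

    int getValue() {
        return value.get();
    }

    /** Increments one shared counter from several threads and returns the final count. */
    static int countWithThreads(int threads, int perThread) {
        AtomicCasCounter counter = new AtomicCasCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.getValue();
    }
}
```

Because every failed compareAndSet is retried with a fresh read, no increment is lost: four threads doing 1000 increments each end at 4000.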
