I am looking at the source code of ConcurrentHashMap and wondering how the get() method works without any monitor, here's the code:
public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    int h = spread(key.hashCode());
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) {
        if ((eh = e.hash) == h) {
            if ((ek = e.key) == key || (ek != null && key.equals(ek))) // mark here for possible dirty read
                return e.val;
        }
        else if (eh < 0)
            return (p = e.find(h, key)) != null ? p.val : null;
        while ((e = e.next) != null) {
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek)))) // mark here for possible dirty read
                return e.val;
        }
    }
    return null;
}
The two lines I marked do the same thing: they check whether the key of the current Node<K,V> equals the key being looked up, and if so return its corresponding value. But what if another thread cuts in before the return and remove()s this node from the data structure? Since the local variable e still holds a reference to the removed node, the GC will leave it alone and the get() method will still return the removed value, causing a dirty read.
Did I miss something?
It doesn't cause a dirty read. From the ConcurrentHashMap documentation:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.)
This is generally not a problem, since get will never return a result that couldn't have happened if the get method acquired a lock, blocking the update operation in the other thread. You just get the result as if the get call happened before the update operation began.
So, if you don't mind whether the get happens before or after the update, you also shouldn't mind it happening during the update, because there is no observable difference between during and before. If you do want the get to appear to happen after the update, then you will need to signal from the updating thread that the update is complete; waiting to acquire a lock wouldn't achieve that anyway, because you might get the lock before the update happens (in which case you'd get the same result as if you didn't acquire the lock).
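To make this concrete, here is a minimal, self-contained sketch (the key and value are made up) of the two legal outcomes the guarantee allows; neither is a dirty read, because each corresponds to the get happening entirely before or entirely after the remove:

import java.util.concurrent.ConcurrentHashMap;

public class OverlapDemo {
    static final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

    public static void main(String[] args) throws InterruptedException {
        map.put("k", 42);

        Thread remover = new Thread(() -> map.remove("k"));
        Thread reader = new Thread(() -> {
            // Either 42 (as if get() ran before the remove) or null
            // (as if it ran after) is a correct, consistent answer.
            Integer v = map.get("k");
            System.out.println("reader saw: " + v);
        });

        remover.start();
        reader.start();
        remover.join();
        reader.join();
    }
}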
I wanted to know if the allOf method of CompletableFuture does polling or goes into a wait state till all the CompletableFutures passed into the method complete their execution.
I looked at the code of the allOf method in IntelliJ and it is doing some sort of binary search.
Please help me to find out what the allOf method of CompletableFuture actually does.
public static CompletableFuture<Void> allOf(CompletableFuture<?>... cfs) {
    return andTree(cfs, 0, cfs.length - 1);
}

/** Recursively constructs a tree of completions. */
static CompletableFuture<Void> andTree(CompletableFuture<?>[] cfs, int lo, int hi) {
    CompletableFuture<Void> d = new CompletableFuture<Void>();
    if (lo > hi) // empty
        d.result = NIL;
    else {
        CompletableFuture<?> a, b;
        int mid = (lo + hi) >>> 1;
        if ((a = (lo == mid ? cfs[lo] :
                  andTree(cfs, lo, mid))) == null ||
            (b = (lo == hi ? a : (hi == mid+1) ? cfs[hi] :
                  andTree(cfs, mid+1, hi))) == null)
            throw new NullPointerException();
        if (!d.biRelay(a, b)) {
            BiRelay<?,?> c = new BiRelay<>(d, a, b);
            a.bipush(b, c);
            c.tryFire(SYNC);
        }
    }
    return d;
}
/** Pushes completion to this and b unless both done. */
final void bipush(CompletableFuture<?> b, BiCompletion<?,?,?> c) {
    if (c != null) {
        Object r;
        while ((r = result) == null && !tryPushStack(c))
            lazySetNext(c, null); // clear on failure
        if (b != null && b != this && b.result == null) {
            Completion q = (r != null) ? c : new CoCompletion(c);
            while (b.result == null && !b.tryPushStack(q))
                lazySetNext(q, null); // clear on failure
        }
    }
}
final CompletableFuture<V> tryFire(int mode) {
    CompletableFuture<V> d;
    CompletableFuture<T> a;
    CompletableFuture<U> b;
    if ((d = dep) == null ||
        !d.orApply(a = src, b = snd, fn, mode > 0 ? null : this))
        return null;
    dep = null; src = null; snd = null; fn = null;
    return d.postFire(a, b, mode);
}
It doesn't do a binary search -- it's building a balanced binary tree with the input futures at the leaves, and inner nodes that each complete when its two children have both completed.
For some reason that is not apparent from the code, the author of the code must have decided it was most efficient to consider allOf(_,_) between exactly two futures to be his primitive operation, and if he's asked for an allOf(...) between more than two futures, he's manufacturing it as a cascade of these binary primitives.
The tree should be balanced so that no matter which future is the last to complete, there are only a small number of levels left to collapse before the future at the top can complete. This improves performance in some situations, because it ensures that as much work as possible can be handled before we're completely done, at a point where (if we're lucky) the CPU might just be sitting idle, waiting for something asynchronous to complete.
Balancing the tree is done by having the topmost inner node have about as many leaves under its left child as under its right child -- so both children get about half of the original array, and then the code recursively builds a tree from each half of the array. Splitting in halves can look a bit like the index calculations for a binary search.
The basic structure is obscured slightly by special cases that appear to be designed to:
1. use an optimized code path with fewer allocations when some of the original futures are already completed, and
2. make sure that the result of allOf(_) with exactly one element will return a fresh CompletableFuture. For most purposes it would work to just return that single element, but the author must have wanted to ensure that users of the library can rely on the object being fresh, if they are using them as keys in hash maps, or other logic that depends on being able to tell the output from the inputs, and
3. have only one throw new NullPointerException(); by using ?: and inline assignments instead of honest if statements. This probably produces slightly smaller bytecode at the expense of readability. Cannot be recommended as a style to learn from, unless you personally pay for the storage cost of the resulting bytecode ...
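For illustration, here is a hedged sketch of the same idea using only the public API (it is not the JDK's internal BiRelay machinery): the array is split in half recursively and the halves are combined pairwise, so the completions form a balanced binary tree.

import java.util.concurrent.CompletableFuture;

public final class AllOfSketch {
    /** Builds a balanced tree of pairwise combinations over cfs[lo..hi]. */
    static CompletableFuture<Void> allOfTree(CompletableFuture<?>[] cfs, int lo, int hi) {
        if (lo > hi)                      // empty input: already complete
            return CompletableFuture.completedFuture(null);
        if (lo == hi)                     // single leaf: wrap it in a fresh future
            return cfs[lo].thenApply(x -> (Void) null);
        int mid = (lo + hi) >>> 1;        // split in half, like the original andTree
        CompletableFuture<Void> left = allOfTree(cfs, lo, mid);
        CompletableFuture<Void> right = allOfTree(cfs, mid + 1, hi);
        // The inner node completes when both of its children have completed.
        return left.thenCombine(right, (a, b) -> (Void) null);
    }

    public static CompletableFuture<Void> allOfSketch(CompletableFuture<?>... cfs) {
        return allOfTree(cfs, 0, cfs.length - 1);
    }
}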
Here is part of the code for the putVal method:
final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        if (tab == null || (n = tab.length) == 0)
            tab = initTable(); // lazy initialization
        // step1, tabAt(...) is CAS
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            // step2, casTabAt(...) is CAS
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break; // no lock when adding to empty bin
        }
        ...
    return null;
}
Suppose there are two threads, A and B. When A executes step1 it gets true, but at the same time B also executes step1 and gets true as well, and then both A and B execute step2.
In that situation B's Node would replace A's Node; in other words, A's data would be overwritten by B's, which would be wrong.
I don't know whether this reasoning is right or wrong; can anyone help me sort it out?
Here's how casTabAt is implemented:
static final <K,V> boolean casTabAt(Node<K,V>[] tab, int i,
Node<K,V> c, Node<K,V> v) {
return U.compareAndSwapObject(tab, ((long)i << ASHIFT) + ABASE, c, v);
}
Whereas U is declared as follows: private static final sun.misc.Unsafe U;. The methods of this class guarantee atomicity at a low level. And from this usage:
casTabAt(tab, i, null, new Node<K,V>(hash, key, value, null))
we can see that the third parameter of compareAndSwapObject is the expected value. Because the operation is atomic, whichever of threads A and B executes compareAndSwapObject first will still see null there, and its compareAndSwapObject will actually install the new node; the other thread's compareAndSwapObject won't change anything, because the actual value is no longer null, and null was expected as the condition for making the change. That thread's casTabAt returns false, so it simply goes around the loop again and now finds a non-empty bin.
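Here is a minimal sketch of that race, using AtomicReferenceArray as a stand-in for the Unsafe-based casTabAt (the method name is reused for illustration and the String payload is made up): both threads attempt the CAS on the same empty slot, exactly one of them wins, and the loser sees false and, in putVal, would loop again against the now-populated bin.

import java.util.concurrent.atomic.AtomicReferenceArray;

public class CasBinDemo {
    static final AtomicReferenceArray<String> table = new AtomicReferenceArray<>(16);

    // Roughly what casTabAt does: succeed only if the slot still holds `expected`.
    static boolean casTabAt(int i, String expected, String node) {
        return table.compareAndSet(i, expected, node);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable insert = () -> {
            String myNode = "node from " + Thread.currentThread().getName();
            if (casTabAt(5, null, myNode)) {
                System.out.println(Thread.currentThread().getName() + " installed its node");
            } else {
                // The CAS failed because the slot is no longer null; in putVal the
                // losing thread would loop again and take the synchronized/append path.
                System.out.println(Thread.currentThread().getName()
                        + " lost the race, bin holds: " + table.get(5));
            }
        };
        Thread a = new Thread(insert, "A");
        Thread b = new Thread(insert, "B");
        a.start(); b.start();
        a.join(); b.join();
    }
}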
I know that AtomicReference has compareAndSet, but I feel like what I want to do is this
private final AtomicReference<Boolean> initialized = new AtomicReference<>( false );
...
atomicRef.compareSetAndDo( false, true, () -> {
    // stuff that only happens if false
});
This would probably work too, and might be better:
atomicRef.compareAndSet( false, () -> {
    // stuff that only happens if false
    // if I die still false.
    return true;
});
I've noticed there are some new functional constructs, but I'm not sure if any of them are what I'm looking for.
Can any of the new constructs do this? If so, please provide an example.
update
To attempt to simplify my problem, I'm trying to find a less error prone way to guard code in a "do once for object" or (really) lazy initializer fashion, and I know that some developers on my team find compareAndSet confusing.
guard code in a "do once for object"
How exactly to implement that depends on what you want other threads attempting to execute the same thing to do in the meantime. If you just let them run past the CAS, they may observe things in an intermediate state while the one thread that succeeded is still doing its work.
or (really) lazy initializer fashion
That construct is not thread-safe if you're using it for lazy initializers, because the "is initialized" boolean may be set to true by one thread which then goes on to execute the block, while another thread observes the true state but reads a result that has not been published yet.
You can use AtomicReference::updateAndGet if multiple concurrent/repeated initialization attempts are acceptable, with one object winning in the end and the others being discarded by the GC. The update function should be side-effect-free.
Otherwise you should just use the double-checked locking pattern with a volatile reference field.
Of course you can always package any of these into a higher order function that returns a Runnable or Supplier which you then assign to a final field.
// == FunctionalUtils.java

/** @param mayRunMultipleTimes must be side-effect-free */
public static <T> Supplier<T> instantiateOne(Supplier<T> mayRunMultipleTimes) {
    AtomicReference<T> ref = new AtomicReference<>(null);
    return () -> {
        T val = ref.get(); // fast-path if already initialized
        if (val != null)
            return val;
        return ref.updateAndGet(v -> v == null ? mayRunMultipleTimes.get() : v);
    };
}

// == ClassWithLazyField.java

private final Supplier<Foo> lazyInstanceVal = FunctionalUtils.instantiateOne(() -> new Foo());

public Foo getFoo() {
    return lazyInstanceVal.get();
}
You can easily encapsulate various custom control-flow and locking patterns this way. Here are two of my own..
compareAndSet returns true if the update was done, and false if the actual value was not equal to the expected value.
So just use
if (ref.compareAndSet(expectedValue, newValue)) {
    ...
}
That said, I don't really understand your examples, since you're passing true and false to a method taking object references as arguments. And your second example doesn't do the same thing as the first one. If the second is what you want, I think what you're after is:
ref.getAndUpdate(value -> {
    if (value.equals(expectedValue)) {
        return someNewValue(value);
    }
    else {
        return value;
    }
});
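As a hedged, self-contained example of that pattern (the state names are made up), the update function below only replaces the value when it matches what we expect, and the returned previous value tells the caller whether this thread performed the transition:

import java.util.concurrent.atomic.AtomicReference;

public class GetAndUpdateDemo {
    public static void main(String[] args) {
        AtomicReference<String> state = new AtomicReference<>("NEW");

        // Atomically: if the state is still "NEW", move it to "STARTED"; otherwise leave it.
        String previous = state.getAndUpdate(s -> s.equals("NEW") ? "STARTED" : s);

        if (previous.equals("NEW")) {
            // Only a thread whose successful update started from "NEW" gets here.
            System.out.println("performed the NEW -> STARTED transition");
        }
        System.out.println("state is now " + state.get());
    }
}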
You’re over-complicating things. Just because there are now lambda expressions, you don’t need to solve everything with lambdas:
private volatile boolean initialized;
…
if(!initialized) synchronized(this) {
    if(!initialized) {
        // stuff to be done exactly once
        initialized=true;
    }
}
The double checked locking might not have a good reputation, but for non-static properties, there are few alternatives.
If you consider multiple threads accessing it concurrently in the uninitialized state and want a guarantee that the action runs only once, and that it has completed before dependent code is executed, an Atomic… object won’t help you.
There’s only one thread that can successfully perform compareAndSet(false,true), but since failure implies that the flag already has the new value, i.e. is initialized, all other threads will proceed as if the “stuff to be done exactly once” had been done while it might still be running. The alternative would be reading the flag first and conditionally performing the stuff and the compareAndSet afterwards, but that allows multiple concurrent executions of “stuff”. This is also what happens with updateAndGet or accumulateAndGet and its provided function.
To guarantee exactly one execution before proceeding, threads must be blocked while the “stuff” is being executed. The code above does this. Note that once the “stuff” has been done, there will be no locking anymore and the performance characteristics of the volatile read are the same as for the Atomic… read.
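To illustrate the race described above, here is a hedged sketch (names are made up) of the two non-blocking orderings: checking the flag first and CASing afterwards allows the work to run in several threads at once, while CASing first runs the work at most once but lets other threads race ahead of its completion.

import java.util.concurrent.atomic.AtomicBoolean;

public class BrokenOnce {
    private final AtomicBoolean done = new AtomicBoolean(false);

    // Broken: the flag is only set *after* the work, so two threads that both
    // read false before either CAS happens will both execute the work.
    void runOnceBroken(Runnable work) {
        if (!done.get()) {
            work.run();                    // may run concurrently in several threads
            done.compareAndSet(false, true);
        }
    }

    // Reversed order (CAS first): the work runs at most once, but other threads
    // fall through immediately and may observe it half-finished, which is exactly
    // why blocking (synchronized) is needed for a completion guarantee.
    void runOnceNoWait(Runnable work) {
        if (done.compareAndSet(false, true)) {
            work.run();
        }
    }
}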
The only solution which is simpler to program is to use a ConcurrentMap:
private final ConcurrentHashMap<String,Boolean> initialized=new ConcurrentHashMap<>();
…
initialized.computeIfAbsent("dummy", ignore -> {
    // stuff to do exactly once
    return true;
});
It might look a bit oversized, but it provides exactly the required performance characteristics. It will guard the initial computation using synchronized (or well, an implementation dependent exclusion mechanism) but perform a single read with volatile semantics on subsequent queries.
If you want a more lightweight solution, you may stay with the double checked locking shown at the beginning of this answer…
I know this is old, but I've found there is no perfect way to achieve this, more specifically this:
trying to find a less error prone way to guard code in a "do (anything) once..."
I'll add to this "while respecting a happens-before relationship", which is required for instantiating singletons in your case.
IMO the best way to achieve this is by means of a synchronized function:
public <T> T transaction(Function<NonSyncObject, T> transaction) {
    synchronized (lock) {
        return transaction.apply(nonSyncObject);
    }
}
This allows you to perform atomic "transactions" on the given object.
Other options are double-check spin-locks:
for (;;) {
    T t = atomicT.get();
    T newT = new T();
    if (atomicT.compareAndSet(t, newT)) return;
}
With this one, new T() will get executed repeatedly until the value is set successfully, so it is not really "do something once".
This only works for copy-on-write transactions, and could help with "instantiating objects once" (which in reality instantiates many, but in the end everyone references the same one) with some tweaking of the code.
The final option is a worse-performing version of the first one, but this one gives a true happens-before AND ONCE guarantee (as opposed to the double-checked spin-lock):
public void doSomething(Runnable r) {
    while (!atomicBoolean.compareAndSet(false, true)) {}
    // Do some heavy stuff ONCE
    r.run();
    atomicBoolean.set(false);
}
The reason why the first one is the better option is that it is doing what this one does, but in a more optimized way.
As a side note, in my projects I've actually used the code below (similar to @the8472's answer), which at the time I thought was safe, and it may be:
public T get() {
    T res = ref.get();
    if (res == null) {
        res = builder.get();
        if (ref.compareAndSet(null, res))
            return res;
        else
            return ref.get();
    } else {
        return res;
    }
}
The thing about this code is that, like the copy-on-write loop, it can generate multiple instances, one for each contending thread, but only one is cached: the first one. All the other constructions eventually get GC'd.
Looking at the putIfAbsent method I see the benefit is the skipping of 17 lines of code and then a synchronized body:
/** Implementation for put and putIfAbsent */
final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break; // no lock when adding to empty bin
        }
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        else {
            V oldVal = null;
            synchronized (f) {
                if (tabAt(tab, i) == f) {
And then the synchronized body itself is another 34 lines:
synchronized (f) {
    if (tabAt(tab, i) == f) {
        if (fh >= 0) {
            binCount = 1;
            for (Node<K,V> e = f;; ++binCount) {
                K ek;
                if (e.hash == hash &&
                    ((ek = e.key) == key ||
                     (ek != null && key.equals(ek)))) {
                    oldVal = e.val;
                    if (!onlyIfAbsent)
                        e.val = value;
                    break;
                }
                Node<K,V> pred = e;
                if ((e = e.next) == null) {
                    pred.next = new Node<K,V>(hash, key,
                                              value, null);
                    break;
                }
            }
        }
        else if (f instanceof TreeBin) {
            Node<K,V> p;
            binCount = 2;
            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                  value)) != null) {
                oldVal = p.val;
                if (!onlyIfAbsent)
                    p.val = value;
            }
        }
    }
}
The pro of using a ConcurrentHashMap is that it will undoubtedly work.
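To show what that buys the caller, here is a minimal hedged sketch contrasting a hand-rolled check-then-put (which is racy without external locking) with putIfAbsent, which gets all of the bin-level CAS and locking shown above for free:

import java.util.concurrent.ConcurrentHashMap;

public class PutIfAbsentDemo {
    static final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

    // Racy without a lock: two threads can both see "absent" and both put.
    static void checkThenPut(String key, int value) {
        if (!map.containsKey(key)) {
            map.put(key, value);
        }
    }

    // Atomic: the check and the insert happen as one operation inside the map.
    static void atomicPut(String key, int value) {
        Integer previous = map.putIfAbsent(key, value);
        if (previous != null) {
            System.out.println(key + " was already mapped to " + previous);
        }
    }

    public static void main(String[] args) {
        atomicPut("a", 1);
        atomicPut("a", 2); // second call leaves the existing mapping in place
        System.out.println(map); // {a=1}
    }
}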
I wanted to understand how locking works in Java's ConcurrentHashMap. According to the source code here, it looks like every read locks the reader using the lock of that particular segment. Have I got it wrong?
V readValueUnderLock(HashEntry<K,V> e) {
    lock();
    try {
        return e.value;
    } finally {
        unlock();
    }
}
Not every read is locked. Below is the documentation of the readValueUnderLock method:
Reads value field of an entry under lock. Called if value field ever
appears to be null. This is possible only if a compiler happens to
reorder a HashEntry initialization with its table assignment, which is
legal under memory model but is not known to ever occur.
Reads in a ConcurrentHashMap do not synchronize on the entire map. In fact, traversal does not synchronize at all except under one condition. The internal linked-list implementation is aware of changes to the underlying collection. If it detects any such changes during traversal, it synchronizes on the bucket it is traversing and then tries to re-read the values. This ensures that while the values received are always fresh, there is minimal locking, if any.
Below is the get implementation in this class; readValueUnderLock is called only when v is null:
V get(Object key, int hash) {
    if (count != 0) { // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}
I have the following situation: I have many BSTs, and I want to merge isomorphic subtrees to save space.
I am hashing Binary Search Tree nodes into a "unique table" - basically a hash of BST nodes.
Nodes that have the same left and right child and the same key have the same hash code, and I have overridden equals for the node class appropriately.
Everything works, except that computing the hash is expensive - it involves computing the hash for the child nodes.
I would like to cache the hashed value for a node. The problem I have is that the natural way of doing this, a HashMap from nodes to integers, will itself call the hash function on the nodes.
I've gotten around this by declaring a new field in the nodes, which I use to store the hash code. However, I feel this is not the right solution.
What I really want is to map nodes to their hash codes using a hash that uses the node's address. I thought I could do this by making a HashMap and casting the nodes to Object, which would then invoke Object's hashCode method, but this didn't work (inserts into the hash still call the node's hashCode and equals functions).
I would appreciate insight into the best way of implementing the node-to-hash-code cache. I've attached code below illustrating what's going on.
import java.util.Set;
import java.util.HashSet;
import java.util.Map;
import java.util.HashMap;
class Bst {
    int key;
    String name;
    Bst left;
    Bst right;

    public Bst( int k, String name, Bst l, Bst r ) {
        this.key = k;
        this.name = name;
        this.left = l;
        this.right = r;
    }

    public String toString() {
        String l = "";
        String r = "";
        if ( left != null ) {
            l = left.toString();
        }
        if ( right != null ) {
            r = right.toString();
        }
        return key + ":" + name + ":" + l + ":" + r;
    }

    @Override
    public boolean equals( Object o ) {
        System.out.println("calling Bst's equals");
        if ( o == null ) {
            return false;
        }
        if ( !(o instanceof Bst) ) {
            return false;
        }
        Bst n = (Bst) o;
        if ( n == null || n.key != key ) {
            return false;
        } else if ( n.left != null && left == null || n.right != null && right == null ||
                    n.left == null && left != null || n.right == null && right != null ) {
            return false;
        } else if ( n.left != null && n.right == null ) {
            return n.left.equals( left );
        } else if ( n.left != null && n.right != null ) {
            return n.left.equals( left ) && n.right.equals( right );
        } else if ( n.left == null && n.right != null ) {
            return n.right.equals( right );
        } else {
            return true;
        }
    }

    @Override
    public int hashCode() {
        // the real hash function is more complex, entails
        // calling hashCode on children if they are not null
        System.out.println("calling Bst's hashCode");
        return key;
    }
}
public class Hashing {
    static void p(String s) { System.out.println(s); }

    public static void main( String [] args ) {
        Set<Bst> aSet = new HashSet<Bst>();
        Bst a = new Bst(1, "a", null, null );
        Bst b = new Bst(2, "b", null, null );
        Bst c = new Bst(3, "c", null, null );
        Bst d = new Bst(1, "d", null, null );
        a.left = b;
        a.right = c;
        d.left = b;
        d.right = c;
        aSet.add( a );
        if ( aSet.contains( d ) ) {
            p("d is a member of aSet");
        } else {
            p("d is a not member of aSet");
        }
        if ( a.equals( d ) ) {
            p("a and d are equal");
        } else {
            p("a and d are not equal");
        }
        // now try casts to objects to avoid calling Bst's hashCode and equals
        Set<Object> bSet = new HashSet<Object>();
        Object foo = new Bst( a.key, a.name, a.left, a.right );
        Object bar = new Bst( a.key, a.name, a.left, a.right );
        bSet.add( foo );
        p("added foo");
        if ( bSet.contains( bar ) ) {
            p("bar is a member of bSet");
        } else {
            p("bar is a not member of bSet");
        }
    }
}
Storing the hash in a field in the node feels like exactly the right solution to me. It's also what java.lang.String uses for its own hash code. Aside from anything else, it means that you can't possibly end up with cache entries for objects which can otherwise be collected, etc.
If you really want the value of hashCode that would be returned by the implementation in Object, you can use System.identityHashCode. You shouldn't rely on this - or any other hash code - being unique, though.
One other point: your tree is mutable at the moment by virtue of the fields being package access. If you cache the hash code the first time you call it, you won't "notice" if it would have changed due to fields changing. Basically you shouldn't change a node after you've used its hash code.
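As a hedged sketch of that approach (the field and class names are made up, and the combining formula is only an example), the hash is computed on first use and stored in the node, in the same spirit as java.lang.String:

class CachedHashBst {
    final int key;
    final CachedHashBst left, right;
    private int cachedHash;          // computed lazily, like String.hash
    private boolean hashComputed;    // guards against a legitimate hash of 0

    CachedHashBst(int key, CachedHashBst left, CachedHashBst right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }

    @Override
    public int hashCode() {
        if (!hashComputed) {
            int h = key;
            h = 31 * h + (left  == null ? 0 : left.hashCode());
            h = 31 * h + (right == null ? 0 : right.hashCode());
            cachedHash = h;
            hashComputed = true;     // safe only if the node is never mutated afterwards
        }
        return cachedHash;
    }
}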
Java's built-in IdentityHashMap does what you're describing.
That said, Jon Skeet's answer sounds more like the right way to go.
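For reference, here is a minimal sketch (the class and method names are made up) of the cache built that way: IdentityHashMap hashes and compares its keys by identity (System.identityHashCode and ==), so the map itself never calls Bst's hashCode or equals, and the expensive Bst hash is computed at most once per node by the mapping function.

import java.util.IdentityHashMap;
import java.util.Map;

class HashCache {
    private final Map<Bst, Integer> cache = new IdentityHashMap<>();

    int hashOf(Bst node) {
        // computeIfAbsent keys off the node's identity, not its contents.
        return cache.computeIfAbsent(node, n -> n.hashCode());
    }
}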
Storing the hash in a field can actually be equivalent to "caching" the value so that it does not have to be recomputed too frequently.
It's not necessarily a bad practice, but you have to make sure that you are clearing/recomputing it correctly whenever there is a change, which can be daunting if you have to notify of a change up or down a complex graph or tree.
If you want to use a hash code computed by the JVM (roughly based on the "RAM address" of the object, even if its value is implementation-specific), you can use System.identityHashCode(x), which does exactly that, i.e. exactly what Object.hashCode does.
What I really want is to to map nodes to their hash codes using a hash which uses the node's address.
What do you mean by the node's address? There is no such concept in Java, and there is no unique identifier for objects that I know of, like the physical address in non VM based languages e.g. C++. References in Java are not memory addresses, and objects may be relocated in memory anytime by the GC.
I thought I could do this by making HashMap, and casting the nodes to object, which would then invoke the hashCode method on objects, but this didn't work
Indeed: since hashCode is virtual and is overridden in your node class, the subclass implementation will always be called, regardless of the static type of the reference you have.
I'm afraid any attempt to use a map to cache hash values bumps into the same chicken-and-egg problem: as you mention, the map needs the hash value itself first.
I don't see any better way than caching the hash values within the nodes as you did. I originally suggested ensuring that the cached values are invalidated whenever the child nodes change, but that is wrong: as Jon's answer points out, changing the hash code of an object after it is stored in a map breaks the map's internal integrity, so it must not happen.