Concurrent byte array access in Java with as few locks as possible


I'm trying to reduce the memory used by the lock objects for segmented data. See my questions here and here. Or just assume you have a byte array in which every 16 bytes can be (de)serialized into an object. Let us call this a "row" with a row length of 16 bytes. If you modify such a row from a writer thread and read it from multiple threads, you need locking. With a byte array of 1 MB (1024*1024 bytes) this means 65536 rows and the same number of locks.
That is too many, especially since I need much larger byte arrays, and I would like to reduce the number of locks to something roughly proportional to the number of threads. My idea was to create a
ConcurrentHashMap<Integer, LockHelper> concurrentMap;
where Integer is the row index. Before a thread 'enters' a row, it puts a lock object into this map (I got this idea from this answer). But whatever I try, I cannot find an approach that is really thread-safe:
// somewhere else where we need to write or read the row
LockHelper lock1 = new LockHelper();
LockHelper lock = concurrentMap.putIfAbsent(rowIndex, lock1);
if (lock == null)
    lock = lock1; // putIfAbsent returns null when our own lock was inserted
lock.addWaitingThread(); // is too late
synchronized(lock) {
    try {
        // read or write row at rowIndex e.g. writing like
        bytes[rowIndex*16] = 1; // a row starts at byte offset rowIndex*16
        bytes[rowIndex*16 + 1] = 2;
        // ...
    } finally {
        if(lock.noThreadsWaiting())
            concurrentMap.remove(rowIndex);
    }
}
Do you see a possibility to make this thread-safe?
I have the feeling that this will look very similar to the concurrentMap.compute construct (e.g. see this answer). Could I even use this method?
map.compute(rowIndex, (key, value) -> {
if(value == null)
value = new Object();
synchronized (value) {
// access row
return value;
}
});
map.remove(rowIndex);
Are the value and the 'synchronized' necessary at all, given that the compute operation is already atomic?
// null is forbidden so use the key also as the value to avoid creating additional objects
ConcurrentHashMap<Integer, Integer> map = ...;
// now the row access looks really simple:
map.compute(rowIndex, (key, value) -> {
// access row
return key;
});
map.remove(rowIndex);
BTW: Since when has compute been available in Java? Since 1.8? I cannot find it in the JavaDocs.
Update: I found a very similar question here with userIds instead of rowIndices. Note that the question contains an example with several problems, such as a missing final, calling lock inside the try-finally clause, and never shrinking the map. There also seems to be a library, JKeyLockManager, for this purpose, but I don't think it is thread-safe.
Update 2: The solution seems to be really simple, as Nicolas Filotto pointed out how to avoid the removal:
map.compute(rowIndex, (key, value) -> {
// access row
return null;
});
So this really uses less memory, BUT simple segment locking with synchronized is at least 50% faster in my scenario.

Are the value and the synchronized necessary at all, given that the compute operation is already atomic?
I confirm that the synchronized block is not needed in this case, because compute is performed atomically. This is stated in the Javadoc of ConcurrentHashMap#compute(K key, BiFunction<? super K,? super V,? extends V> remappingFunction), which was added together with BiFunction in Java 8. I quote:
Attempts to compute a mapping for the specified key and its current
mapped value (or null if there is no current mapping). The entire
method invocation is performed atomically. Some attempted update
operations on this map by other threads may be blocked while
computation is in progress, so the computation should be short and
simple, and must not attempt to update any other mappings of this Map.
What you are trying to achieve with compute can be fully atomic if your BiFunction always returns null, so that the key is removed atomically as well.
map.compute(
rowIndex,
(key, value) -> {
// access row here
return null;
}
);
This way you rely entirely on the locking mechanism of ConcurrentHashMap to synchronize access to your rows.
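To make the idea concrete, here is a minimal sketch of per-row locking through compute on a 16-byte-per-row array. The class name RowLockSketch, its methods and the demo driver are assumptions made for illustration, not code from the question. Note that the lock taken by compute is per hash bin, so two rows that hash to the same bin also wait for each other.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the map stores nothing, it only lends out its internal locking.
class RowLockSketch {
    static final int ROW_LENGTH = 16;

    private final byte[] bytes;
    private final ConcurrentHashMap<Integer, Boolean> rowLocks = new ConcurrentHashMap<>();

    RowLockSketch(int rows) {
        bytes = new byte[rows * ROW_LENGTH];
    }

    // Adds 1 to every byte of the row while compute() holds the lock for rowIndex.
    void incrementRow(int rowIndex) {
        rowLocks.compute(rowIndex, (key, value) -> {
            int offset = key * ROW_LENGTH;
            for (int i = 0; i < ROW_LENGTH; i++) {
                bytes[offset + i]++;
            }
            return null; // no mapping is stored, so the map stays empty
        });
    }

    // Reads one byte of a row under the same lock.
    byte readByte(int rowIndex, int column) {
        byte[] result = new byte[1];
        rowLocks.compute(rowIndex, (key, value) -> {
            result[0] = bytes[key * ROW_LENGTH + column];
            return null;
        });
        return result[0];
    }

    int lockEntries() {
        return rowLocks.size();
    }

    // Demo driver (an assumption for illustration): several threads hammer row 3.
    static RowLockSketch runDemo(int threads, int incrementsPerThread) {
        RowLockSketch sketch = new RowLockSketch(8);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    sketch.incrementRow(3);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sketch;
    }

    public static void main(String[] args) {
        RowLockSketch sketch = runDemo(4, 25);
        System.out.println("row 3, byte 0: " + sketch.readByte(3, 0));
        System.out.println("entries left in lock map: " + sketch.lockEntries());
    }
}
```

As the quoted Javadoc says, the work done inside compute should stay short, because other updates that land in the same bin are blocked while it runs.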

Related

minimizing lock scope for JDK8 ConcurrentHashMap check-and-set operation

1.
I have multiple threads updating a ConcurrentHashMap.
Each thread appends a list of integers to the Value of a map entry based on the key.
There is no removal operation by any thread.
The main point here is that I want to minimize the scope of locking and synchronization as much as possible.
I saw that the doc of the computeIf...() methods says "Some attempted update operations on this map by other threads may be blocked while computation is in progress", which is not so encouraging. On the other hand, when I looked into the source code, I could not find where it locks or synchronizes on the entire map.
Therefore, I wonder how the theoretical performance of computeIf...() compares with the following home-grown 'method 2'.
2.
Also, I feel that the problem I described here is perhaps one of the simplest check-and-set (or, more generally, 'compound') operations you can carry out on a ConcurrentHashMap.
Yet I'm not confident about it, and I can't find much guidance on how to do even this kind of simple compound operation on a ConcurrentHashMap without locking or synchronizing on the entire map.
So any general good-practice advice on this will be much appreciated.
// shared by both test methods
private final ConcurrentHashMap<String, List<Integer>> myMap = new ConcurrentHashMap<String, List<Integer>>();
// MAP KEY: a Word found by a thread on a page of a book
private final String myKey = "word1";
public void myConcurrentHashMapTest1() {
    // -- Method 1:
    // Step 1.1 first, try to use computeIfPresent(). doc says it may lock the
    // entire myMap.
    myMap.computeIfPresent(myKey, (key, val) -> {
        val.addAll(getMyVals());
        return val; // the function must return the list, not addAll()'s boolean
    });
    // Step 1.2 then use computeIfAbsent(). Again, doc says it may lock the
    // entire myMap.
    myMap.computeIfAbsent(myKey, key -> getMyVals());
}
public void myConcurrentHashMapTest2() {
    // -- Method 2: home-grown lock splitting (kind of). Will it theoretically
    // perform better?
    // Step 2.1: TRY to directly put an empty list for the key
    // This may have no effect if the key is already present in the map
    List<Integer> myEmptyList = new ArrayList<Integer>();
    myMap.putIfAbsent(myKey, myEmptyList);
    // Step 2.2: By now, we should have the key present in the map
    // ASSUMPTION: no thread does removal
    List<Integer> listInMap = myMap.get(myKey);
    // Step 2.3: Synchronize on that list, append all the values
    synchronized(listInMap){
        listInMap.addAll(getMyVals());
    }
}
public List<Integer> getMyVals(){
    // MAP VALUE: e.g. Page Indices where word is found (by a thread)
    List<Integer> myValList = new ArrayList<Integer>();
    myValList.add(1);
    myValList.add(2);
    return myValList;
}

You're basing your assumption (that using ConcurrentHashMap as intended will be too slow for you) on a misinterpretation of the Javadoc. The Javadoc doesn't state that the whole map will be locked. It also doesn't state that each computeIfAbsent() operation performs pessimistic locking.
What could actually be locked is a bin (a.k.a. bucket), which corresponds to a single element of the internal array backing the ConcurrentHashMap. Note that this is not Java 7's map segment containing multiple buckets. When such a bin is locked, the only operations that can be blocked are updates for keys that hash to the same bin.
On the other hand, your solution doesn't mean that all internal locking within ConcurrentHashMap is avoided - computeIfAbsent() is just one of the methods that can degrade to using a synchronized block while updating. Even the putIfAbsent() with which you're initially putting an empty list for some key, can block if it doesn't hit an empty bin.
What's worse, though, is that your solution doesn't by itself guarantee the visibility of your synchronized bulk updates. A putIfAbsent() happens-before any get() that observes the value it stored, but there is no happens-before between your bulk updates and a subsequent get(), so a thread that reads the list without synchronizing on that same list may see stale or partially updated contents.
P.S. You can read further about the locking in ConcurrentHashMap in its OpenJDK implementation: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/concurrent/ConcurrentHashMap.java, lines 313-352.

As already explained by Dimitar Dimitrov, a compute… method doesn’t generally lock the entire map. In the best case, i.e. there’s no need to increase the capacity and there’s no hash collision, only the mapping for the single key is locked.
However, there are still things you can do better:
Generally, avoid performing multiple lookups. This applies to both variants: using computeIfPresent followed by computeIfAbsent, as well as using putIfAbsent followed by get.
It’s still recommended to minimize the code executed while holding a lock, i.e. don’t invoke getMyVals() while holding the lock, as it doesn’t depend on the map’s state.
Putting it together, the update should look like:
// compute without holding a lock
List<Integer> toAdd=getMyVals();
// update the map
myMap.compute(myKey, (key,val) -> {
if(val==null) val=toAdd; else val.addAll(toAdd);
return val;
});
or
// compute without holding a lock
List<Integer> toAdd=getMyVals();
// update the map
myMap.merge(myKey, toAdd, (a,b) -> { a.addAll(b); return a; });
which can be simplified to
myMap.merge(myKey, getMyVals(), (a,b) -> { a.addAll(b); return a; });
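For completeness, a small self-contained sketch of the merge variant under concurrent writers. The class name MergeAppendSketch, the method names and the demo driver are assumptions for illustration only. The values are prepared before merge is called, and the append itself runs inside the map's own locking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of appending to a list value with a single merge() call.
class MergeAppendSketch {
    private final ConcurrentHashMap<String, List<Integer>> pagesByWord = new ConcurrentHashMap<>();

    void record(String word, List<Integer> pages) {
        // Copy outside the lock, so the stored list is mutable and not shared with the caller.
        List<Integer> toAdd = new ArrayList<>(pages);
        pagesByWord.merge(word, toAdd, (existing, added) -> {
            existing.addAll(added);
            return existing;
        });
    }

    // Only meant to be called after all writers have finished (as in runDemo below).
    int countFor(String word) {
        List<Integer> pages = pagesByWord.get(word);
        return pages == null ? 0 : pages.size();
    }

    // Demo driver (an assumption for illustration): every call appends two page indices.
    static int runDemo(int threads, int callsPerThread) {
        MergeAppendSketch sketch = new MergeAppendSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    sketch.record("word1", Arrays.asList(1, 2));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sketch.countFor("word1");
    }

    public static void main(String[] args) {
        System.out.println("entries recorded: " + runDemo(4, 100));
    }
}
```

Readers that run concurrently with the writers would still need their own coordination, because the stored ArrayList is not thread-safe outside the merge call.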

Client/server multithreading and ConcurrentHashMap - why isn't clients.get(id) being locked?

I have the following block of code in my server:
clients.putIfAbsent(id, new Integer(0));
synchronized (clients.get(id)) {
if (o instanceof Integer) {
x = new Integer(((Integer) o).intValue());
value = clients.get(id); // existing value in HashMap
value = new Integer(value.intValue() + x);
clients.put(id, value);
} else if (o instanceof String) {
clients.put(id, new Integer(0));
}
Thread.sleep(SockServer5.sleepTime);
out.writeObject(clients.get(id));
out.flush();
}
clients is a ConcurrentHashMap, while o is the object input being read from the client. For each new client connection, the server spawns a new thread.
I would like to know, is there any particular reason why my clients.get(id) isn't being locked?

It is really hard to understand what you are trying to achieve. clients.get(id) returns an object, and you are synchronizing on it.
Fine.
That does not prevent another thread from accessing the concurrent hash map. I suspect you want to guard access to the map, in which case you should synchronize on a dedicated Object used as a mutex.

In both branches of your if statement, you are putting a new object into the map. Therefore all other threads will find a different object as the result of clients.get(id). Even if two Integers in the map are equal, they are not the same object.
Example: if o is always a String, each execution of the code replaces the value in the map, and each following thread gets a new object from clients.get(id). Some threads may be lucky enough to get the previous object, but most will get a new one, because the value is replaced before the sleep finishes and the time until the replacement is short compared to the stream handling and your sleep.
If you want to synchronize on a non-existing object, on the idea (or id) of the object, check out this github repo: https://github.com/anotheria/idbasedlock

Two things.
First, one should not use new Integer but Integer.valueOf(i).
Second, there isn't even a need to use synchronized. It defeats the purpose of using a concurrent map, because blocking forces the threads to run one after another. Here is a solution that does not use synchronized but is still thread-safe:
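clients.putIfAbsent(id, Integer.valueOf(0));
Integer value;
Integer updated;
do {
    value = clients.get(id);
    updated = value + o; // assumes o has already been cast to Integer
} while (!clients.replace(id, value, updated)); // repeat until no other thread changed the value in between
Thread.sleep(SockServer5.sleepTime);
out.writeObject(updated); // send the value this thread actually stored
out.flush();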
Where clients is defined as (replace String with whatever you use):
final ConcurrentHashMap<String, Integer> clients;
This uses the same concept as compareAndSet in the atomic classes: the update is repeated until no other thread has interfered in between. The big benefit is that no thread has to block, so on a multi-core machine the threads can keep running in parallel and only retry when they actually collide.
Please also note the automatic boxing of Integers in value + o (o must be typed as Integer for this to compile). The compiler inserts the unboxing and boxing conversions itself.
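The retry loop can be tried out in isolation with a sketch like the following. The class name ReplaceLoopSketch, the method names and the demo driver are assumptions for illustration. The second method shows the same update written with merge, which has been available since Java 8 and retries or locks internally.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a lock-free per-key counter on a ConcurrentHashMap.
class ReplaceLoopSketch {
    private final ConcurrentHashMap<String, Integer> clients = new ConcurrentHashMap<>();

    // Compare-and-set style: retry until replace() sees the value we read.
    int add(String id, int x) {
        clients.putIfAbsent(id, 0);
        Integer current;
        Integer updated;
        do {
            current = clients.get(id);
            updated = current + x;
        } while (!clients.replace(id, current, updated));
        return updated;
    }

    // The same update expressed as a single atomic merge() call.
    int addWithMerge(String id, int x) {
        return clients.merge(id, x, Integer::sum);
    }

    int get(String id) {
        return clients.getOrDefault(id, 0);
    }

    // Demo driver (an assumption for illustration): all threads bump the same key.
    static int runDemo(int threads, int addsPerThread) {
        ReplaceLoopSketch sketch = new ReplaceLoopSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    sketch.add("client-1", 1);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sketch.get("client-1");
    }

    public static void main(String[] args) {
        System.out.println("final count: " + runDemo(4, 1000));
    }
}
```

replace(key, oldValue, newValue) compares with equals, so it works for Integer values even when the boxed objects are not identical.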

Updating BigDecimal concurrently within ConcurrentHashMap thread safe

Is the code below thread-safe when multiple threads call the totalBadRecords() method from inside other methods? Both map parameters of this method are ConcurrentHashMap instances. I want to ensure that each call updates the total properly.
If it is not safe, please explain what I have to do to ensure thread safety.
Do I need to synchronize the add/put, or is there a better way?
Do I need to synchronize the get methods in TestVO? TestVO is a simple Java bean with getter/setter methods.
Below is my sample code:
public void totalBadRecords(final Map<Integer, TestVO> sourceMap,
        final Map<String, String> logMap) {
    BigDecimal badCharges = new BigDecimal(0);
    boolean badRecordsFound = false;
    for (Entry<Integer, TestVO> e : sourceMap.entrySet()) {
        // braces keep the flag update inside the "bad record" branch
        if ("Y".equals(e.getValue().getInd())) {
            badCharges = badCharges.add(e.getValue().getAmount());
            badRecordsFound = true;
        }
    }
    if (badRecordsFound)
        logMap.put("badRecordsFound:", badCharges.toPlainString());
}

That depends on how your objects are used in your whole application.
If each call to totalBadRecords takes a different sourceMap, and the map (and its content) is not mutated while counting, it's thread-safe:
badCharges is a local variable, so it can't be shared between threads and is thus thread-safe (no need to synchronize add).
logMap can be shared between calls to totalBadRecords: the put method of ConcurrentHashMap is thread-safe.
If instances of TestVO are not mutated, the values from getInd() and getAmount() are always consistent with one another.
The sourceMap is not mutated, so you can iterate over it.
Actually, in this case, you don't need a concurrent map for sourceMap. You could even make it immutable.
If the instances of TestVO and the sourceMap can change while counting, then of course the count could be wrong.

It depends on what you mean by thread-safe. And that boils down to what the requirements for this method are.
At the data structure level, the method will not corrupt any data structures, because the only data structures that could be shared with other threads are ConcurrentHashMap instances, and they are safe against that kind of problem.
The potential thread-safety issue is that iterating a ConcurrentHashMap is not an atomic operation. The guarantees for the iterators are such that you are not guaranteed to see all entries in the iteration if the map is updated (e.g. by another thread) while you are iterating. That means that the totalBadRecords method may not give an accurate count if some other thread modifies the map during the call. Whether this is a real thread-safety issue depends on whether or not the totalBadRecords is required to give an accurate result in that circumstance.
If you need to get an accurate count, then you have to (somehow) lock out updates to the sourceMap while making the totalBadRecords call. AFAIK, there is no way to do this using (just) the ConcurrentHashMap API, and I can't think of a way to do it that doesn't make the map a concurrency bottleneck.
In fact, if you need to calculate accurate counts, you have to use external locking for (at least) the counting operation, and all operations that could change the outcome of the counting. And even that doesn't deal with the possibility that some thread may modify one of the TestVO objects while you are counting records, and cause the TestVO to change from "good" to "bad" or vice-versa.

You could use something like the following.
That guarantees that after a call to the totalBadRecords method, the String representing the bad charges in the logMap is accurate and no updates are lost. Of course a phantom read can still happen, because you do not lock the sourceMap.
private static final String BAD_RECORDS_KEY = "badRecordsFound:";
public void totalBadRecords(final ConcurrentMap<Integer, TestVO> sourceMap,
final ConcurrentMap<String, String> logMap) {
while (true) {
// get the old value that is going to be replaced.
String oldValue = logMap.get(BAD_RECORDS_KEY);
// calculate new value
BigDecimal badCharges = BigDecimal.ZERO;
for (TestVO e : sourceMap.values()) {
if ("Y".equals(e.getInd()))
badCharges = badCharges.add(e.getAmount());
}
final String newValue = badCharges.toPlainString();
// insert into map if there was no mapping before
if (oldValue == null) {
oldValue = logMap.putIfAbsent(BAD_RECORDS_KEY, newValue);
if (oldValue == null) {
oldValue = newValue;
}
}
// replace the entry in the map
if (logMap.replace(BAD_RECORDS_KEY, oldValue, newValue)) {
// update succeeded -> there were no updates to the logMap while calculating the bad charges.
break;
}
}
}

Using putIfAbsent like a short circuit operator

Is it possible to use putIfAbsent, or an equivalent, like a short-circuit operator?
myConcurrentMap.putIfAbsent(key,calculatedValue)
If a calculatedValue is already present for the key, I don't want it to be calculated again.
By default, putIfAbsent still performs the calculation every time, even though it does not store the value again.

Java doesn't allow any form of short-circuiting save the built-in cases, sadly - all method calls result in the arguments being fully evaluated before control passes to the method. Thus you couldn't do this with "normal" syntax; you'd need to manually wrap up the calculation inside a Callable or similar, and then explicitly invoke it.
In this case I find it difficult to see how it could work anyway, though. putIfAbsent works on the basis of being an atomic, non-blocking operation. If it were to do what you want, the sequence of events would roughly be:
Check if key exists in the map (this example assumes it doesn't)
Evaluate calculatedValue (probably expensive, given the context of the question)
Put result in map
It would be impossible for this to be non-blocking if the value didn't already exist at step two - two different threads calling this method at the same time could only perform correctly if blocking happened. At this point you may as well just use synchronized blocks with the flexibility of implementation that that entails; you can definitely implement what you're after with some simple locking, something like the following:
private final Map<K, V> map = ...;
public void myAdd(K key, Callable<V> valueComputation) throws Exception {
    synchronized(map) {
        if (!map.containsKey(key)) {
            // call() declares a checked Exception, so it is propagated to the caller
            map.put(key, valueComputation.call());
        }
    }
}
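On Java 8 and later, ConcurrentHashMap.computeIfAbsent offers this short-circuit behaviour directly: the mapping function is only invoked when the key has no value yet. A minimal sketch follows; the class name LazyPutSketch and the stand-in expensiveCalculation method are assumptions for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the value is calculated only if the key is absent.
class LazyPutSketch {
    static final ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
    // Counts how often the calculation really runs.
    static final AtomicInteger calculations = new AtomicInteger();

    // Stand-in for an expensive calculation.
    static Integer expensiveCalculation(String key) {
        calculations.incrementAndGet();
        return key.length();
    }

    static Integer lookup(String key) {
        // The method reference is only called when no mapping exists for key.
        return cache.computeIfAbsent(key, LazyPutSketch::expensiveCalculation);
    }

    public static void main(String[] args) {
        System.out.println(lookup("hello"));
        System.out.println(lookup("hello"));
        System.out.println("calculations: " + calculations.get());
    }
}
```

The calculation runs while the map holds its internal lock for that bin, so a long computation can delay other updates that hash to the same bin.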

You can put Future<V> objects into the map. With putIfAbsent, only one such object will ever be stored per key, and the final value is computed by calling Future.get() (e.g. with the FutureTask and Callable classes). Check out Java Concurrency in Practice for a discussion of this technique. (Example code is also in this question here on SO.)
This way, your value is computed only once, and all threads get same value. Access to map isn't blocked, although access to value (through Future.get()) will block until this value is computed by one of the threads.
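A sketch of the Future-based approach described above, loosely following the memoizer pattern discussed in Java Concurrency in Practice. The class name FutureMemoizerSketch and its constructor parameter are assumptions for illustration, and error handling is kept deliberately simple.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

// Hypothetical sketch: the map holds Futures, so each value is computed at most once.
class FutureMemoizerSketch<K, V> {
    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> computation;

    FutureMemoizerSketch(Function<K, V> computation) {
        this.computation = computation;
    }

    V get(K key) {
        Future<V> future = cache.get(key);
        if (future == null) {
            Callable<V> call = () -> computation.apply(key);
            FutureTask<V> task = new FutureTask<>(call);
            future = cache.putIfAbsent(key, task);
            if (future == null) {
                // This thread won the race, so it runs the computation itself.
                future = task;
                task.run();
            }
        }
        try {
            // Threads that lost the race block here until the value is ready.
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        FutureMemoizerSketch<String, Integer> memo = new FutureMemoizerSketch<>(String::length);
        System.out.println(memo.get("abc"));
        System.out.println(memo.get("abc"));
    }
}
```

In this sketch a failed computation stays cached as a failed Future; removing it from the map on failure would be one way to allow a retry.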

You could consider using a Guava computing map (MapMaker.makeComputingMap was later deprecated in favour of CacheBuilder and LoadingCache):
ConcurrentMap<Key, Value> myConcurrentMap = new MapMaker()
.makeComputingMap(
new Function<Key, Value>() {
public Value apply(Key key) {
Value calculatedValue = calculateValue(key);
return calculatedValue;
}
});

How to safely modify values in Java HashMaps concurrently?

I have a block of Java code that looks something like this that I'm trying to parallelize:
value = map.get(key);
if (value == null) {
value = new Value();
map.put(key,value);
}
value.update();
I want to block any other thread from accessing the map with that particular key until after value.update() is called even if key is not in the key set. Accessing with other keys should be allowed. How could I achieve this?

Short answer is there's no safe way to do this without synchronizing the entire block. You could use java.util.concurrent.ConcurrentHashMap though, see this article for more details. The basic idea is to use ConcurrentHashMap.putIfAbsent instead of the normal put.

You cannot parallelize updates to a HashMap, because an update can trigger a resize of the underlying array, which rehashes all entries.
Use another collection, for example java.util.concurrent.ConcurrentHashMap, which according to the Javadoc is "A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates."

I wouldn't use HashMap if you need to be concerned about threading issues. Make use of the Java 5 concurrent package and look into ConcurrentHashMap.

You just described the use case for the Guava computing map. You create it with:
Map<Key, Value> map = new MapMaker().makeComputingMap(new Function<Key, Value>() {
public Value apply(Key key) {
return new Value().update();
}
});
and use it:
Value v = map.get(key);
This guarantees only one thread will call update() and other threads will block and wait until the method completes.
You probably don't actually want your value having a mutable update method on it, but that's another discussion.

private synchronized void functionname() {
value = map.get(key);
if (value == null) {
value = new Value();
map.put(key,value);
}
value.update();
}
You can learn more about synchronized methods here: Synchronized Methods
You might also want to investigate the ConcurrentHashMap class, which might suit your purposes. You can see it on the JavaDoc.

Look into ConcurrentHashMap. It performs well even in single-threaded applications, and it allows the map to be modified concurrently from several threads without locking the whole map.

One possibility is to manage multiple locks. You can keep an array of locks and pick one based on the key's hash code. This should give you better throughput than synchronizing the whole method. You can size the array based on the number of threads that you expect to access the code. Note that the map itself must then be a concurrent map, because threads holding different locks can still modify it at the same time.
private static final int NUM_LOCKS = 16;
Object [] lockArray = new Object[NUM_LOCKS];
...
// Load array with Objects or Reentrant Locks
...
// floorMod keeps the index non-negative even for negative hash codes
Object keyLock = lockArray[Math.floorMod(key.hashCode(), NUM_LOCKS)];
synchronized(keyLock){
    value = map.get(key);
    if (value == null) {
        value = new Value();
        map.put(key,value);
    }
    value.update();
}
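A runnable sketch of this lock-striping idea, under the assumption that the shared map is a ConcurrentHashMap, since threads holding different stripe locks may still touch the map at the same time. The class name StripedLockSketch, the int[] counter values and the demo driver are assumptions for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a fixed array of locks guards get-or-create plus update per key.
class StripedLockSketch {
    private static final int NUM_LOCKS = 16;

    private final Object[] locks = new Object[NUM_LOCKS];
    private final ConcurrentHashMap<String, int[]> map = new ConcurrentHashMap<>();

    StripedLockSketch() {
        for (int i = 0; i < NUM_LOCKS; i++) {
            locks[i] = new Object();
        }
    }

    // Keys with the same stripe index share a lock; floorMod avoids negative indexes.
    private Object lockFor(Object key) {
        return locks[Math.floorMod(key.hashCode(), NUM_LOCKS)];
    }

    void increment(String key) {
        synchronized (lockFor(key)) {
            int[] counter = map.get(key);
            if (counter == null) {
                counter = new int[1];
                map.put(key, counter);
            }
            counter[0]++; // stands in for value.update()
        }
    }

    int get(String key) {
        synchronized (lockFor(key)) {
            int[] counter = map.get(key);
            return counter == null ? 0 : counter[0];
        }
    }

    // Demo driver (an assumption for illustration): each thread bumps two keys.
    static StripedLockSketch runDemo(int threads, int incrementsPerThread) {
        StripedLockSketch sketch = new StripedLockSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    sketch.increment("a");
                    sketch.increment("b");
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sketch;
    }

    public static void main(String[] args) {
        StripedLockSketch sketch = runDemo(4, 1000);
        System.out.println("a=" + sketch.get("a") + ", b=" + sketch.get("b"));
    }
}
```

The number of lock objects is fixed at NUM_LOCKS regardless of how many keys exist, at the cost of occasional contention between unrelated keys that share a stripe.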
