Populating map from multiple threads


I have a ConcurrentHashMap which I am populating from multiple threads as shown below:
private static Map<ErrorData, Long> holder = new ConcurrentHashMap<ErrorData, Long>();
public static void addError(ErrorData error) {
if (holder.keySet().contains(error)) {
holder.put(error, holder.get(error) + 1);
} else {
holder.put(error, 1L);
}
}
Is there any possibility of a race condition in the above code that could cause it to skip updates? Also, how can I use Guava's AtomicLongMap here, if that gives better performance?
I am on Java 7.

Yes, there is a possibility of a race because you are not checking contains and putting atomically.
You can use AtomicLongMap as follows, which does this check atomically:
private static final AtomicLongMap<ErrorData> holder = AtomicLongMap.create();
public static void addError(ErrorData error) {
holder.getAndIncrement(error);
}
As described in the javadoc:
[T]he typical mechanism for writing to this map is addAndGet(K, long), which adds a long to the value currently associated with K. If a key has not yet been associated with a value, its implicit value is zero.
and
All operations are atomic unless otherwise noted.
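If adding Guava is not an option, a similar counter can be sketched on Java 7 with a ConcurrentHashMap whose values are AtomicLong instances. This is only an illustration: AtomicCounterMap, increment and get are hypothetical names, not part of Guava or the JDK.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not a Guava or JDK class): one AtomicLong per key.
final class AtomicCounterMap<K> {
    private final ConcurrentMap<K, AtomicLong> counts =
            new ConcurrentHashMap<K, AtomicLong>();

    /** Atomically adds one to the count for key and returns the new count. */
    long increment(K key) {
        AtomicLong counter = counts.get(key);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong();
            // putIfAbsent returns the existing counter if another thread got there first
            counter = counts.putIfAbsent(key, fresh);
            if (counter == null) {
                counter = fresh;
            }
        }
        return counter.incrementAndGet();
    }

    /** Returns the current count, or 0 if the key was never incremented. */
    long get(K key) {
        AtomicLong counter = counts.get(key);
        return counter == null ? 0L : counter.get();
    }
}
```

The map only ever gains one AtomicLong per key, and every increment goes through that shared counter, so no update can be overwritten by a concurrent put.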

If you are using java 8, you can take advantage of the new merge method:
holder.merge(error, 1L, Long::sum);
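A small way to convince yourself that merge does not lose updates on a ConcurrentHashMap is to hammer one key from several threads. The sketch below assumes Java 8; MergeCounterDemo, countWithMerge and the "timeout" key are made up for the illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class MergeCounterDemo {
    /** Starts `threads` threads that each record `perThread` hits for the same key. */
    static long countWithMerge(int threads, final int perThread) {
        final ConcurrentMap<String, Long> holder = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // ConcurrentHashMap performs the read-add-write step atomically
                    holder.merge("timeout", 1L, Long::sum);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return holder.get("timeout");
    }

    public static void main(String[] args) {
        System.out.println(countWithMerge(4, 10_000)); // prints 40000
    }
}
```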

A 'vanilla' Java 5+ solution (holder must be declared as a ConcurrentMap, because before Java 8 putIfAbsent and replace are not available on the plain Map interface):
public static void addError(final ErrorData errorData) {
    Long previous = holder.putIfAbsent(errorData, 1L);
    // if the error data is already mapped to some value
    if (previous != null) {
        // retry until the replace succeeds, i.e. no other update took place in the meantime
        while (!holder.replace(errorData, previous, previous + 1)) {
            previous = holder.get(errorData);
        }
    }
}

In Java 7 or older versions you need to use a compare-and-update loop:
Long prevValue;
boolean done;
do {
    prevValue = holder.get(error);
    if (prevValue == null) {
        // putIfAbsent returns null only if this thread inserted the mapping
        done = holder.putIfAbsent(error, 1L) == null;
    } else {
        // succeeds only if the value is still prevValue
        done = holder.replace(error, prevValue, prevValue + 1);
    }
} while (!done);
With this code, if two threads race one may end up retrying its update, but they'll get the right value in the end.
Consider how the original code can lose an update:
Thread1: holder.get(error) returns 1
Thread2: holder.get(error) returns 1
Thread1: holder.put(error, 1+1);
Thread2: holder.put(error, 1+1);
To fix this you need to use atomic operations to update the map.
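As a self-contained sketch of such an atomic update on Java 7, the retry loop can be wrapped in a small class and exercised from several threads. String keys stand in for ErrorData here, and RetryLoopCounter and hammer are hypothetical names used only for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class RetryLoopCounter {
    private final ConcurrentMap<String, Long> holder =
            new ConcurrentHashMap<String, Long>();

    void addError(String error) {
        boolean done;
        do {
            Long prev = holder.get(error);
            if (prev == null) {
                // succeeds only if no other thread inserted the key first
                done = holder.putIfAbsent(error, 1L) == null;
            } else {
                // succeeds only if the value is still the one we just read
                done = holder.replace(error, prev, prev + 1);
            }
        } while (!done);
    }

    long get(String error) {
        Long value = holder.get(error);
        return value == null ? 0L : value;
    }

    /** Runs `threads` threads that each call addError("e") `perThread` times. */
    static long hammer(int threads, final int perThread) {
        final RetryLoopCounter counter = new RetryLoopCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int j = 0; j < perThread; j++) {
                        counter.addError("e");
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get("e");
    }
}
```

A thread that loses a race simply re-reads the current value and tries again, so every call to addError is eventually reflected in the count.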

Related

Synchronization on ConcurrentHashMap

In my application I am using a ConcurrentHashMap and I need this type of "custom put-if-absent" method to be executed atomically.
public boolean putIfSameMappingNotExistingAlready(String key, String newValue) {
String value;
synchronized (concurrentHashMap) {
if ((value = concurrentHashMap.putIfAbsent(key, newValue)) == null) {
// There was no mapping for the key
return true;
} else { if (value.equals(newValue)) {
// The mapping <key, newValue> already exists in the map
return false;
} else {
concurrentHashMap.put(key, newValue);
return true;
}
}
}
}
I read (in the concurrent package documentation) that
A concurrent collection is thread-safe, but not governed by a single exclusion lock.
So you can not get an exclusive lock on a ConcurrentHashMap.
My questions are:
Is the code above thread-safe? To me it looks like it is guaranteed that the code in the synchronized block can be executed only by a single thread at the same time, but I want to confirm it.
Wouldn't it be "cleaner" to use Collections.synchronizedMap() instead of ConcurrentHashMap in this case?
Thanks a lot!

The following code uses a compare-and-set loop (as suggested by SlakS) to implement thread safety (Note the infinite loop):
/**
* Updates or adds the mapping for the given key.
* Returns true, if the operation was successful and false,
* if key is already mapped to newValue.
*/
public boolean updateOrAddMapping(String key, String newValue) {
while (true) {
// try to insert if absent
String oldValue = concurrentHashMap.putIfAbsent(key, newValue);
if (oldValue == null) return true;
// test, if the values are equal
if (oldValue.equals(newValue)) return false;
// not equal, so we have to replace the mapping.
// try to replace oldValue by newValue
if (concurrentHashMap.replace(key, oldValue, newValue)) return true;
// someone changed the mapping in the meantime!
// loop and try again from start.
}
}
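To make the three outcomes of this loop concrete, here is a minimal runnable wrapper around the same logic. MappingUpdater and its get method are hypothetical scaffolding added for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class MappingUpdater {
    private final ConcurrentMap<String, String> concurrentHashMap =
            new ConcurrentHashMap<String, String>();

    /** Returns false only if key is already mapped to newValue. */
    boolean updateOrAddMapping(String key, String newValue) {
        while (true) {
            String oldValue = concurrentHashMap.putIfAbsent(key, newValue);
            if (oldValue == null) return true;             // key was absent, inserted
            if (oldValue.equals(newValue)) return false;   // same mapping already present
            // different value: replace it only if nobody changed it in the meantime
            if (concurrentHashMap.replace(key, oldValue, newValue)) return true;
        }
    }

    String get(String key) {
        return concurrentHashMap.get(key);
    }

    public static void main(String[] args) {
        MappingUpdater m = new MappingUpdater();
        System.out.println(m.updateOrAddMapping("k", "a")); // true  (new mapping)
        System.out.println(m.updateOrAddMapping("k", "a")); // false (already k -> a)
        System.out.println(m.updateOrAddMapping("k", "b")); // true  (replaced a with b)
        System.out.println(m.get("k"));                     // b
    }
}
```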

By synchronizing on the entire collection like that you are essentially replacing the fine-grained synchronization within the concurrent collection with your own blunt-force approach.
If you aren't using the concurrency protections elsewhere then you could just use a standard HashMap for this and wrap it in your own synchronization. Using a synchronizedMap may work, but wouldn't cover multi-step operations such as above where you put, check, put.

How to resolve the findbug Sequence of calls to java.util.concurrent.ConcurrentHashMap may not be atomic

Hi, I am getting the bug "Sequence of calls to java.util.concurrent.ConcurrentHashMap may not be atomic" when I run FindBugs on my project for the code below.
public static final ConcurrentHashMap<String,Vector<Person>> personTypeMap = new ConcurrentHashMap<String, Vector<Person>>();
private static void setDefaultPersonGroup() {
PersonDao crud = PersonDao.getInstance();
List<Person> personDBList = crud.retrieveAll();
for (Person person : personDBList) {
Vector<Person> personTypeCollection = personTypeMap.get(person
.getGroupId());
if (personTypeCollection == null) {
personTypeCollection = new Vector<Person>();
personTypeMap.put(person.getGroupId(),
personTypeCollection);
}
personTypeCollection.add(person);
}
}
I am facing the problem at the line
personTypeMap.put(person.getGroupId(),
personTypeCollection);
Can any one help me to resolve the problem.

Compound operations are unsafe in concurrent environment.
What compound operations are you performing?
1) You are checking whether Map contains a vector for a key
2) You are putting a new Vector if no value is found
So this is a two step action and is compound, so it is unsafe.
Why are they not safe?
Because they are not atomic. Think of a scenario in which you have two threads.
Consider this timeline:
Thread 1 --- checks for == null -> true puts a new Vector
Thread 2 --- checks for ==null -> true puts a new Vector
Use putIfAbsent() method on ConcurrentHashMap, which provides you an atomic solution to what you are trying to perform.
ConcurrentHashMap#putIfAbsent()
References:
Proper use of putIfAbsent

That findbugs message is telling you in the case of multi-threaded access it's not safe:
You're fetching something from personTypeMap, checking to see if it's null, then putting a new entry in if that's the case. Two threads could easily interleave here:
Thread1: get from map
Thread2: get from map
Thread1: check returned value for null
Thread1: put new value
Thread2: check returned value for null
Thread2: put new value
(Just as an example; in reality the ordering is not a given - the point is both threads get null then act on it)
You should be creating a new entry, then calling personTypeMap.putIfAbsent() as this guarantees atomicity.

In your case, your code should look like this:
public static final ConcurrentHashMap<String,Vector<Person>> personTypeMap = new ConcurrentHashMap<String, Vector<Person>>();
private static void setDefaultPersonGroup() {
    PersonDao crud = PersonDao.getInstance();
    List<Person> personDBList = crud.retrieveAll();
    for (Person person : personDBList) {
        // putIfAbsent does the same check-then-put as your
        // previous code, but atomically
        Vector<Person> fresh = new Vector<Person>();
        Vector<Person> personTypeCollection = personTypeMap.putIfAbsent(person
                .getGroupId(), fresh);
        if (personTypeCollection == null) {
            // no previous mapping: our new Vector is now in the map
            personTypeCollection = fresh;
        }
        personTypeCollection.add(person);
    }
}

My ConcurrentHashmap's value type is List,how to make appending to that list thread safe?

My class extends ConcurrentHashMap[String, immutable.List[String]]
and it has 2 methods:
def addEntry(key: String, newList: immutable.List[String]) = {
...
//if key exist,appending the newList to the exist one
//otherwise set the newList as the value
}
def resetEntry(key: String): Unit = {
this.remove(key)
}
In order to make the addEntry method thread-safe, I tried:
this.get(key).synchronized{
//append or set here
}
but that raises a NullPointerException if the key does not exist. Calling putIfAbsent(key, new immutable.List()) before synchronizing won't work either, because after putIfAbsent and before entering the synchronized block the key may be removed by resetEntry.
Making addEntry and resetEntry both synchronized methods would work, but the lock is too coarse.
So, what could I do?
P.S. This post is similar to "How to make updating BigDecimal within ConcurrentHashMap thread safe", but please help me figure out the actual code rather than a general guide.
--update--
See https://stackoverflow.com/a/34309186/404145; I solved this more than 3 years later.

Instead of removing the entry, can you simply clear it? You can still use a synchronized list and ensure atomicity.
def resetEntry(key: String, currentBatchSize: Int): Unit = {
this.get(key).clear();
}
This works on the assumption that each key has an entry. If, for example, this.get(key) == null, you would want to insert a new synchronizedList, which acts as a clear as well.

After more than 3 years, I think now I can answer my question.
The original problem is:
I get a ConcurrentHashMap[String, List], many threads are appending values to it, how can I make it thread-safe?
Make addEntry() synchronized will work, right?
synchronize(map.get(key)){
map.append(key, value)
}
In most cases yes except when map.get(key) is null, which will cause NullPointerException.
So what about adding map.putIfAbsent(key, new List) like this:
map.putIfAbsent(key, new List)
synchronize(map.get(key)){
map.append(key, value)
}
Better now, but if another thread calls resetEntry() after putIfAbsent(), we will see a NullPointerException again.
Making addEntry and resetEntry both synchronized methods would work, but the lock is too coarse.
So what about a map-entry-level lock when appending and a map-level lock when resetting?
Here comes the ReentrantReadWriteLock:
When calling addEntry(), we acquire the shared (read) lock of the map, which lets appends run as concurrently as possible; when calling resetEntry(), we acquire the exclusive (write) lock to make sure that no other thread is changing the map at the same time.
The code looks like this:
class MyMap extends ConcurrentHashMap{
val lock = new ReentrantReadWriteLock();
def addEntry(key: String, newList: immutable.List[String]) = {
lock.readLock.lock()
//if key exist,appending the newList to the exist one
//otherwise set the newList as the value
this.putIfAbsent(key, new List())
this(key).synchronized{
this(key) += newList
}
lock.readLock.unlock()
}
def resetEntry(key: String, currentBatchSize: Int): Unit = {
lock.writeLock.lock()
this.remove(key)
lock.writeLock.unlock()
}
}
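The same read-lock-for-append, write-lock-for-reset idea can be sketched in Java. Note one deliberate difference from the Scala code above: this sketch stores a mutable synchronized list per key instead of an immutable list, so an append never has to replace the map value. ResettableListMap and snapshot are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class ResettableListMap {
    private final ConcurrentMap<String, List<String>> map =
            new ConcurrentHashMap<String, List<String>>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void addEntry(String key, List<String> newItems) {
        // shared lock: appenders run concurrently, but never alongside resetEntry
        lock.readLock().lock();
        try {
            List<String> list = map.get(key);
            if (list == null) {
                List<String> fresh = Collections.synchronizedList(new ArrayList<String>());
                list = map.putIfAbsent(key, fresh);
                if (list == null) {
                    list = fresh; // our list won the race and is now in the map
                }
            }
            list.addAll(newItems); // synchronized by the wrapper list itself
        } finally {
            lock.readLock().unlock();
        }
    }

    void resetEntry(String key) {
        // exclusive lock: no appender can be in the middle of an update
        lock.writeLock().lock();
        try {
            map.remove(key);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Returns a copy of the current list for key (empty if the key is absent). */
    List<String> snapshot(String key) {
        lock.readLock().lock();
        try {
            List<String> list = map.get(key);
            if (list == null) {
                return new ArrayList<String>();
            }
            synchronized (list) {
                return new ArrayList<String>(list);
            }
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Because removal needs the write lock, a list obtained inside addEntry cannot disappear from the map before the append finishes, which is the failure the question was worried about.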

You can try a method inspired by the CAS (Compare and Swap) process:
(in pseudo-java-scala-code, as my Scala is still in its infancy)
def addEntry(key: String, newList: immutable.List[String]) = {
val existing = putIfAbsent(key, newList);
if (existing != null) {
synchronized(existing) {
if (get(key) == existing) { // ask again for the value within the synchronized block to ensure consistence. This is the compare part of CAS
return put(key,existing ++ newList); // Swap the old value by the new
} else {
throw new ConcurrentModificationException(); // how else mark failure?
}
}
}
return existing;
}

ConcurrentSkipListSet and replace remove(key)

I am using ConcurrentSkipListSet, which I fill with 20 keys.
I want to replace these keys continuously. However, ConcurrentSkipListSet doesn't seem to have an atomic replace function.
This is what I am using now:
ConcurrentSkipListSet<Long> set = new ConcurrentSkipListSet<Long>();
AtomicLong uniquefier = new AtomicLong(1);
public void fillSet() {
// fills set with 20 unique keys;
}
public void updateSet() {
Long now = Calendar.getInstance().getTimeInMillis();
Long oldestKey = set.first();
if (set.remove(oldestKey)) {
set.add(makeUnique(now));
}
}
private static final long MULTIPLIER = 1024;
public Long makeUnique(long in) {
return (in*MULTIPLIER+uniquefier.getAndSet((uniquefier.incrementAndGet())%(MULTIPLIER/2)));
}
The goal of this whole operation is to keep the list as long as it is, and only update by replacing. updateSet is called some 100 times per ms.
Now, my question is this: does remove return true if the element itself was present before (and isn't after), or does the method return true only if the call was actually responsible for the removal?
I.e.: if multiple threads call remove on the very same key at the very same time, will they /all/ return true, or will only one return true?

set.remove will only return true for the thread that actually caused the object to be removed.
The idea behind the set's concurrency is that multiple threads can be updating multiple objects. However, each individual object can only be updated by one thread at a time.
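A quick way to check this behaviour is to let several threads race to remove the same element and count how many of them get true back. The sketch below is only a test harness; RemoveOnceDemo, successfulRemovals and the value 42 are invented for the example.

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class RemoveOnceDemo {
    /** Lets `threads` threads race to remove the same key; returns how many saw true. */
    static int successfulRemovals(int threads) {
        final ConcurrentSkipListSet<Long> set = new ConcurrentSkipListSet<Long>();
        set.add(42L);
        final AtomicInteger successes = new AtomicInteger();
        final CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        start.await(); // release all threads at roughly the same time
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    if (set.remove(42L)) {
                        successes.incrementAndGet();
                    }
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return successes.get();
    }

    public static void main(String[] args) {
        System.out.println(successfulRemovals(8)); // prints 1
    }
}
```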

Should you check if the map containsKey before using ConcurrentMap's putIfAbsent

I have been using Java's ConcurrentMap for a map that can be used from multiple threads. The putIfAbsent is a great method and is much easier to read/write than using standard map operations. I have some code that looks like this:
ConcurrentMap<String, Set<X>> map = new ConcurrentHashMap<String, Set<X>>();
// ...
map.putIfAbsent(name, new HashSet<X>());
map.get(name).add(Y);
Readability wise this is great but it does require creating a new HashSet every time even if it is already in the map. I could write this:
if (!map.containsKey(name)) {
map.putIfAbsent(name, new HashSet<X>());
}
map.get(name).add(Y);
With this change it loses a bit of readability but does not need to create the HashSet every time. Which is better in this case? I tend to side with the first one since it is more readable. The second would perform better and may be more correct. Maybe there is a better way to do this than either of these.
What is the best practice for using a putIfAbsent in this manner?

Concurrency is hard. If you are going to bother with concurrent maps instead of straightforward locking, you might as well go for it. Indeed, don't do lookups more than necessary.
Set<X> set = map.get(name);
if (set == null) {
final Set<X> value = new HashSet<X>();
set = map.putIfAbsent(name, value);
if (set == null) {
set = value;
}
}
(Usual stackoverflow disclaimer: Off the top of my head. Not tested. Not compiled. Etc.)
Update: 1.8 has added computeIfAbsent default method to ConcurrentMap (and Map which is kind of interesting because that implementation would be wrong for ConcurrentMap). (And 1.7 added the "diamond operator" <>.)
Set<X> set = map.computeIfAbsent(name, n -> new HashSet<>());
(Note, you are responsible for the thread-safety of any operations of the HashSets contained in the ConcurrentMap.)
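One way to take care of that last point, assuming Java 8, is to store a concurrent set as the value instead of a HashSet, for example one created with ConcurrentHashMap.newKeySet(). The class and method names below are hypothetical and the sketch uses String for both the key and element types.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ConcurrentMultimapSketch {
    private final ConcurrentMap<String, Set<String>> map = new ConcurrentHashMap<>();

    /** Adds value under name; returns true if the value was not already present. */
    boolean add(String name, String value) {
        // computeIfAbsent creates the set at most once per key;
        // the set itself is safe for concurrent add calls
        return map.computeIfAbsent(name, n -> ConcurrentHashMap.<String>newKeySet())
                  .add(value);
    }

    /** Returns the number of values stored under name (0 if none). */
    int size(String name) {
        Set<String> set = map.get(name);
        return set == null ? 0 : set.size();
    }
}
```

With this shape, callers never need an external synchronized block around add, because both the map lookup and the set insertion are individually thread-safe.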

Tom's answer is correct as far as API usage goes for ConcurrentMap. An alternative that avoids using putIfAbsent is the computing map from the Google Collections/Guava MapMaker, which auto-populates the values with a supplied function and handles all the thread-safety for you. It creates only one value per key, and if the create function is expensive, other threads asking for the same key will block until the value becomes available.
Edit: from Guava 11, MapMaker is deprecated and being replaced with the Cache/LocalCache/CacheBuilder stuff. This is a little more complicated in its usage but basically isomorphic.

You can use MutableMap.getIfAbsentPut(K, Function0<? extends V>) from Eclipse Collections (formerly GS Collections).
The advantage over calling get(), doing a null check, and then calling putIfAbsent() is that we'll only compute the key's hashCode once, and find the right spot in the hashtable once. In ConcurrentMaps like org.eclipse.collections.impl.map.mutable.ConcurrentHashMap, the implementation of getIfAbsentPut() is also thread-safe and atomic.
import org.eclipse.collections.impl.map.mutable.ConcurrentHashMap;
...
ConcurrentHashMap<String, MyObject> map = new ConcurrentHashMap<>();
map.getIfAbsentPut("key", () -> someExpensiveComputation());
The implementation of org.eclipse.collections.impl.map.mutable.ConcurrentHashMap is truly non-blocking. While every effort is made not to call the factory function unnecessarily, there's still a chance it will be called more than once during contention.
This fact sets it apart from Java 8's ConcurrentHashMap.computeIfAbsent(K, Function<? super K,? extends V>). The Javadoc for this method states:
The entire method invocation is performed atomically, so the function
is applied at most once per key. Some attempted update operations on
this map by other threads may be blocked while computation is in
progress, so the computation should be short and simple...
Note: I am a committer for Eclipse Collections.
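The "at most once per key" guarantee quoted from the java.util.concurrent.ConcurrentHashMap Javadoc can be observed with a small harness that counts how often the mapping function runs under contention. ComputeOnceDemo and factoryCalls are hypothetical names; the sketch assumes Java 8 and uses only the JDK.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ComputeOnceDemo {
    /** Has `threads` threads call computeIfAbsent for one key; returns the factory call count. */
    static int factoryCalls(int threads) {
        final ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();
        final AtomicInteger calls = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() ->
                map.computeIfAbsent("key", k -> {
                    calls.incrementAndGet(); // count each run of the mapping function
                    return new Object();
                }));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(factoryCalls(8)); // prints 1
    }
}
```

The trade-off described above is visible here: the JDK map may block other callers while the function runs, in exchange for never running it twice for the same key.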

By keeping a pre-initialized value for each thread you can improve on the accepted answer:
Set<X> initial = new HashSet<X>();
...
Set<X> set = map.putIfAbsent(name, initial);
if (set == null) {
set = initial;
initial = new HashSet<X>();
}
set.add(Y);
I recently used this with AtomicInteger map values rather than Set.

In 5+ years, I can't believe no one has mentioned or posted a solution that uses ThreadLocal to solve this problem; and several of the solutions on this page are not threadsafe and are just sloppy.
Using ThreadLocals for this specific problem is not only considered best practice for concurrency, it also minimizes garbage/object creation during thread contention. Also, it's incredibly clean code.
For example:
private final ThreadLocal<HashSet<X>>
        threadCache = new ThreadLocal<HashSet<X>>() {
    @Override
    protected HashSet<X> initialValue() {
        return new HashSet<X>();
    }
};
private final ConcurrentMap<String, Set<X>>
map = new ConcurrentHashMap<String, Set<X>>();
And the actual logic...
// minimize object creation during thread contention
final Set<X> cached = threadCache.get();
Set<X> data = map.putIfAbsent("foo", cached);
if (data == null) {
// reset the cached value in the ThreadLocal
threadCache.set(new HashSet<X>());
data = cached;
}
// make sure that the access to the set is thread safe
synchronized(data) {
data.add(object);
}

My generic approximation:
public class ConcurrentHashMapWithInit<K, V> extends ConcurrentHashMap<K, V> {
private static final long serialVersionUID = 42L;
public V initIfAbsent(final K key) {
V value = get(key);
if (value == null) {
value = initialValue();
final V x = putIfAbsent(key, value);
value = (x != null) ? x : value;
}
return value;
}
protected V initialValue() {
return null;
}
}
An example of use:
public static void main(final String[] args) throws Throwable {
ConcurrentHashMapWithInit<String, HashSet<String>> map =
new ConcurrentHashMapWithInit<String, HashSet<String>>() {
private static final long serialVersionUID = 42L;
@Override
protected HashSet<String> initialValue() {
return new HashSet<String>();
}
};
map.initIfAbsent("s1").add("chao");
map.initIfAbsent("s2").add("bye");
System.out.println(map.toString());
}
