How to make updating BigDecimal within ConcurrentHashMap thread safe - java

I am making an application that takes a bunch of journal entries and calculates sums.
Is the way of doing it below thread/concurrency safe when there are multiple threads calling the addToSum() method? I want to ensure that each call updates the total properly.
If it is not safe, please explain what I have to do to ensure thread safety.
Do I need to synchronize the get/put, or is there a better way?
private ConcurrentHashMap<String, BigDecimal> sumByAccount;

public void addToSum(String account, BigDecimal amount) {
    BigDecimal newSum = sumByAccount.get(account).add(amount);
    sumByAccount.put(account, newSum);
}
Thanks so much!
Update:
Thanks everyone for the answers, I already get that the code above is not thread-safe.
Thanks Vint for suggesting AtomicReference as an alternative to synchronized. I was using AtomicInteger to hold integer sums before, and I was wondering whether there is something like that for BigDecimal.
Is there a definitive conclusion on the pros and cons of the two?

You can use synchronized like the others suggested, but if you want a minimally blocking solution you can try AtomicReference as a store for the BigDecimal:
ConcurrentHashMap<String, AtomicReference<BigDecimal>> map;

public void addToSum(String account, BigDecimal amount) {
    AtomicReference<BigDecimal> newSum = map.get(account);
    for (;;) {
        BigDecimal oldVal = newSum.get();
        if (newSum.compareAndSet(oldVal, oldVal.add(amount)))
            return;
    }
}
Edit - I'll explain this more:
An AtomicReference uses CAS to atomically assign a single reference. The loop says this:
If the value currently stored in the AtomicReference == oldVal [compared by reference, i.e. the same object in memory, not by value], then replace the value stored in the AtomicReference with oldVal.add(amount). Any time after the for-loop, newSum.get() will return a BigDecimal that includes the added amount.
You want a loop here because two threads may be trying to add to the same AtomicReference at once. One thread can succeed while the other fails; if that happens, the losing thread simply tries again against the newly updated value.
With moderate thread contention this will be the faster implementation; under high contention you are better off using synchronized.
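For reference, on Java 8 and later the per-key read-modify-write can also be delegated to the map itself via ConcurrentHashMap.merge, which is atomic for a single key. A minimal sketch of that variant (not from the original answers; the class name is illustrative):

import java.math.BigDecimal;
import java.util.concurrent.ConcurrentHashMap;

public class AccountSums {
    private final ConcurrentHashMap<String, BigDecimal> sumByAccount = new ConcurrentHashMap<>();

    public void addToSum(String account, BigDecimal amount) {
        // merge() performs the get-add-put for this key atomically:
        // if the key is absent, amount becomes the initial value;
        // otherwise BigDecimal::add combines the existing sum with amount.
        sumByAccount.merge(account, amount, BigDecimal::add);
    }
}

This keeps updates to different keys independent, like the AtomicReference approach, and also handles accounts that are not yet in the map.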

Your solution is not thread safe. The reason is that a sum can be missed: the get and the put are separate operations, so the new value you put into the map can overwrite a sum that another thread added in between.
The safest way to do what you want to do is to synchronize your method.
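A minimal sketch of that approach (the class and accessor names here are illustrative, not from the question):

import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedSums {
    // A plain HashMap is enough once every access goes through synchronized methods.
    private final Map<String, BigDecimal> sumByAccount = new HashMap<>();

    public synchronized void addToSum(String account, BigDecimal amount) {
        BigDecimal current = sumByAccount.getOrDefault(account, BigDecimal.ZERO);
        sumByAccount.put(account, current.add(amount));
    }

    public synchronized BigDecimal getSum(String account) {
        return sumByAccount.getOrDefault(account, BigDecimal.ZERO);
    }
}

Note that once every access is synchronized, a ConcurrentHashMap underneath no longer adds anything for this particular method; the price is that updates to different accounts now block each other.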

That is not safe, because threads A and B might both call sumByAccount.get(account) at the same time (more or less), so neither one will see the result of the other's add(amount). That is, things might happen in this sequence:
thread A calls sumByAccount.get("accountX") and gets (for example) 10.0.
thread B calls sumByAccount.get("accountX") and gets the same value that thread A did: 10.0.
thread A sets its newSum to (say) 10.0 + 2.0 = 12.0.
thread B sets its newSum to (say) 10.0 + 5.0 = 15.0.
thread A calls sumByAccount.put("accountX", 12.0).
thread B calls sumByAccount.put("accountX", 15.0), overwriting what thread A did.
One way to fix this is to put synchronized on your addToSum method, or to wrap its contents in synchronized(this) or synchronized(sumByAccount). Another way, since the above sequence of events only happens if two threads are updating the same account at the same time, might be to synchronize externally based on some sort of Account object. Without seeing the rest of your program logic, I can't be sure.

Yes, you need to synchronize, since otherwise two threads can each get the same value for the same key, say A; then thread 1 adds B to it and thread 2 adds C to it, and each stores its result back. The result will then not be A+B+C, but either A+B or A+C.
What you need to do is lock on something that is common to the additions. Synchronizing the get and the put separately will not help; you would have to do
synchronized {
    get
    add
    put
}
but if you do that then you will prevent threads from updating values even when they are for different keys. You want to synchronize per account. However, synchronizing on the account String itself seems unsafe, as it could lead to deadlocks (you don't know what else locks that String, and interned Strings are shared across the whole JVM). Can you create an Account object instead and use that for locking?
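One way that could look, sketched under the assumption that an Account class can be introduced (all names here are illustrative):

import java.math.BigDecimal;
import java.util.concurrent.ConcurrentHashMap;

public class Account {
    private BigDecimal sum = BigDecimal.ZERO;

    // Each Account instance is its own lock, so updates to different
    // accounts never block each other.
    public synchronized void add(BigDecimal amount) {
        sum = sum.add(amount);
    }

    public synchronized BigDecimal sum() {
        return sum;
    }
}

class Ledger {
    private final ConcurrentHashMap<String, Account> accounts = new ConcurrentHashMap<>();

    public void addToSum(String accountId, BigDecimal amount) {
        // computeIfAbsent (Java 8+) creates the Account atomically the first time the key is seen.
        accounts.computeIfAbsent(accountId, id -> new Account()).add(amount);
    }
}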

Related

Is it OK to modify items in an ArrayList from multiple threads, if those threads never modify the same item?

A bit of (simplified) context.
Let's say I have an ArrayList<ContentStub> where ContentStub is:
public class ContentStub {
    ContentType contentType;
    Object content;
}
And I have multiple implementations of classes that "inflate" stubs for each ContentType, e.g.
public class TypeAStubInflater {
    public void inflate(List<ContentStub> contentStubs) {
        contentStubs.forEach(stub -> {
            if (stub.contentType == ContentType.TYPE_A) {
                stub.content = someService.getContent();
            }
        });
    }
}
The idea being, there is a TypeAStubInflater which only modifies items with ContentType.TYPE_A, running in one thread, and a TypeBStubInflater which only modifies items with ContentType.TYPE_B, etc. - but each instance's inflate() method is modifying items in the same contentStubs List, in parallel.
However:
No thread ever changes the size of the ArrayList
No thread ever attempts to modify a value that's being modified by another thread
No thread ever attempts to read a value written by another thread
Given all this, it seems that no additional measures to ensure thread-safety are necessary. From a (very) quick look at the ArrayList implementation, it seems that there is no risk of a ConcurrentModificationException - however, that doesn't mean that something else can't go wrong. Am I missing something, or is this safe to do?
In general, that will work, because you are not modifying the state of the List itself, which would throw a ConcurrentModificationException if any iterator is active at the time of looping, but rather are modifying just an object inside the list, which is fine from the list's POV.
I would recommend splitting your list up into a Map<ContentType, List<ContentStub>> and then starting threads with those specific lists.
You could convert your list to a map with this:
Map<ContentType, List<ContentStub>> typeToStubMap = stubs.stream().collect(Collectors.groupingBy(stub -> stub.contentType));
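From there, one thread per content type can run its inflater over its own sub-list. A rough sketch, assuming inflater instances like the TypeAStubInflater from the question already exist as typeAInflater and typeBInflater (InterruptedException handling omitted):

Map<ContentType, List<ContentStub>> byType =
        stubs.stream().collect(Collectors.groupingBy(s -> s.contentType));

// Each thread only ever touches its own per-type list.
Thread a = new Thread(() -> typeAInflater.inflate(byType.getOrDefault(ContentType.TYPE_A, Collections.emptyList())));
Thread b = new Thread(() -> typeBInflater.inflate(byType.getOrDefault(ContentType.TYPE_B, Collections.emptyList())));
a.start();
b.start();
a.join();   // join() also establishes a happens-before edge, so the caller
b.join();   // sees the content written by the inflater threads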
If your List is not that big (< 1000 entries) I would even recommend not using any threading at all, but just iterating with a plain for-i loop, or even .forEach if the couple of extra integers are no concern.
Let's assume the thread A writes TYPE_A content and thread B writes TYPE_B content. The List contentStubs is only used to obtain instances of ContentStub: read-access only. So from the perspective of A, B and contentStubs, there is no problem. However, the updates done by threads A and B will likely never be seen by another thread, e.g. another thread C will likely conclude that stub.content == null for all elements in the list.
The reason for this is the Java Memory Model. If you don't use constructs like locks, synchronization, volatile and atomic variables, the memory model gives no guarantee if and when modifications of an object by one thread are visible for another thread. To make this a little more practical, let's have an example.
Imagine that a thread A executes the following code:
stub.content = someService.getContent(); // happens to be element[17]
List element 17 is a reference to a ContentStub object on the global heap. The VM is allowed to make a private, per-thread copy of that object; all subsequent access through that reference in thread A then uses the copy. The VM is free to decide when, and whether, to update the original object on the global heap.
Now imagine a thread C that executes the following code:
ContentStub stub = contentStubs.get(17);
The VM will likely do the same trick with a private copy in thread C.
If thread C already accessed the object before thread A updated it, thread C will likely use the – not updated – copy and ignore the global original for a long time. But even if thread C accesses the object for the first time after thread A updated it, there is no guarantee that the changes in the private copy of thread A already ended up in the global heap.
In short: without a lock or synchronization, thread C will almost certainly only read null values in each stub.content.
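One minimal way to make those writes visible in this design (a sketch of the general fix, not the poster's actual code) is to publish the content through a volatile field:

public class ContentStub {
    ContentType contentType;
    volatile Object content;   // a write by an inflater thread happens-before
                               // any subsequent read of this field by another thread
}

Alternatively, if the reading thread only starts working after the main thread has called join() on the inflater threads (or awaitTermination on an ExecutorService), those calls already establish the required happens-before relationship and no volatile is needed.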
The reason for this memory model is performance. On modern hardware, there is a trade-off between performance and consistency across all CPUs/cores. If the memory model of a modern language requires consistency, that is very hard to guarantee on all hardware and it will likely impact performance too much. Modern languages therefore embrace low consistency and offer the developer explicit constructs to enforce it when needed. In combination with instruction reordering by both compilers and processors, that makes old-fashioned linear reasoning about your program code … interesting.

Multithreads: lock on get and set

I know that in a program that works with multiple threads it's necessary to synchronize the methods because it's possible to have problems like race conditions.
But I cannot understand why we also need to synchronize the methods that only read a shared variable.
Look at this example:
import java.util.concurrent.locks.ReentrantLock;

public class ConcurrentIntegerArray {
    private final int[] arr;
    // The same lock guards both reads and writes, so every get() sees the latest set().
    private final ReentrantLock lock = new ReentrantLock();

    public ConcurrentIntegerArray(final int size) {
        arr = new int[size];
    }

    public void set(final int index, final int value) {
        lock.lock();
        try {
            arr[index] = value;
        } finally {
            lock.unlock();
        }
    }

    public int get(final int index) {
        lock.lock();
        try {
            return arr[index];
        } finally {
            lock.unlock();
        }
    }
}
They take a lock in the get and also in the set method. For the set method I understand why. For example, Thread1 wants to put the number 5 at index=3, and a few milliseconds later Thread2 has to put the number 6 at index=3. Without synchronization on set, can it happen that my array still holds a 5 at index=3 instead of a 6? Yes: Thread1 can be context-switched, Thread2 enters the same method and writes its value, and afterwards Thread1 writes the value 5 to the same position, so instead of a 6 I have a 5.
But I don't understand why we need (see the example) to synchronize the get method as well. I'm asking this question because get only reads from memory and does not write. So why do we need synchronization on the get method too? Can someone give me a very simple example?
Both methods need to be synchronized. Without synchronization on the get method, this sequence is possible:
get is called, but the old value isn't returned yet.
Another thread calls set and updates the value.
The first thread that called get now examines the value it got back and sees a value that is by now outdated.
Synchronization would disallow this scenario by guaranteeing that another thread can't just call set and invalidate the get value before it even returns. It would force a thread that calls set to wait for the thread that calls get to finish.
If you do not lock in the get method, then a thread might keep a local copy of the array and never refresh it from main memory. So it is possible that a get never sees a value that was updated by a set. The lock forces visibility.
Each thread may maintain its own copy of the value. Synchronization ensures that coherency is maintained between the different threads; without it, you can never be sure whether anyone has modified the value. Alternatively, the variable can be declared volatile, which gives the same memory-visibility effects as synchronized (note that for an array, volatile only applies to the array reference, not to its elements).
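If per-element atomicity and visibility are all that is needed, java.util.concurrent.atomic.AtomicIntegerArray provides the same guarantees as the locked example without an explicit lock; a minimal sketch of that alternative:

import java.util.concurrent.atomic.AtomicIntegerArray;

public class LockFreeIntegerArray {
    private final AtomicIntegerArray arr;

    public LockFreeIntegerArray(final int size) {
        arr = new AtomicIntegerArray(size);
    }

    public void set(final int index, final int value) {
        arr.set(index, value);   // volatile-style write: visible to subsequent get() calls
    }

    public int get(final int index) {
        return arr.get(index);   // volatile-style read
    }
}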
The locking action also guarantees memory visibility. From the Lock doc:
All Lock implementations must enforce the same memory synchronization semantics as provided by the built-in monitor lock, [...]:
A successful lock operation has the same memory synchronization effects as a successful Lock action.
A successful unlock operation has the same memory synchronization effects as a successful Unlock action.
Without acquiring the lock, due to memory consistency errors, there's no reason a call to get needs to see the most updated value. Modern processors are very fast, access to DRAM is comparatively very slow, so processors store values they are working on in a local cache. In concurrent programming this means one thread might write to a variable in memory but a subsequent read from a different thread gets a stale value because it is read from the cache.
The locking guarantees that the value is actually read from memory and not from the cache.

Two threads accessing the same ArrayList at the same time?

I have the following code in thread 1:
synchronized (queues.get(currentQueue)) { //line 1
    queues.get(currentQueue).add(networkEvent); //line 2
}
and the following in thread 2:
synchronized (queues.get(currentQueue)) {
    if (queues.get(currentQueue).size() > 10) {
        currentQueue = 1;
    }
}
Now to my question: the currentQueue variable currently has the value 0. When thread 2 changes the value of currentQueue to 1 while thread 1 is waiting at line 1 (because of the synchronized), does thread 1 then use the updated currentQueue value in line 2 after thread 2 has finished? (That's what I want it to do.)
The answer to the question is that it depends. I assume there is another chunk of code that increments the currentQueue variable. That being the case, the lock is taken not on the currentQueue variable, and not on the queues collection either, but on one of the ten queues (or however many you have) inside the queues collection.
Hence, if both threads happen to lock the same queue (say queue 5), then the answer to your question is yes. However, that is only a one-in-ten chance (one in x, where x is the number of queues in the queues collection). If the threads lock different queues, the answer is no.
The correct answer to your question is: the result is undefined.
Your monitor object is queues.get(currentQueue), but since currentQueue is a variable, your monitor is variable too, so which object is locked is more or less arbitrary. Effectively this code will break eventually.
A simple way to fix it would be a function like this:
protected synchronized QueueType getCurrentQueue() {
    return queues.get(currentQueue);
}
However this is still a bad way of implementing the whole thing. You should either try to eliminate the synchronization completely through the use of a concurrent Queue (like ConcurrentLinkedQueue) or work with a lock/final monitor object.
final Object queueLock = new Object();
...
synchronized (queueLock) {
    queues.get(currentQueue).add(networkEvent);
}
Note that you will have to use that locking every time you access queues or currentQueue as both define the dataset you are using.
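For comparison, a rough sketch of the lock-free route mentioned above, assuming the events have some type NetworkEvent (a name made up here) and that only one thread ever switches currentQueue:

import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

class EventRouter {
    private final List<ConcurrentLinkedQueue<NetworkEvent>> queues;
    private volatile int currentQueue = 0;   // volatile: the switch made by thread 2 is visible to thread 1

    EventRouter(List<ConcurrentLinkedQueue<NetworkEvent>> queues) {
        this.queues = queues;
    }

    // thread 1
    void onEvent(NetworkEvent networkEvent) {
        queues.get(currentQueue).add(networkEvent);   // ConcurrentLinkedQueue.add is thread-safe
    }

    // thread 2
    void maybeSwitch() {
        if (queues.get(currentQueue).size() > 10) {   // note: size() is O(n) here, and the
            currentQueue = 1;                         // check-then-switch is only safe if a single thread does it
        }
    }
}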
Assuming no other thread changes the value of currentQueue, yes, thread 1 will end up using the queue pointed to by the updated value of currentQueue, since you invoke queues.get(currentQueue) again inside the body of the synchronized block. This, however, doesn't mean that your synchronization is sound. You should really synchronize on something that guards currentQueue, since it is the shared state used to pick the current queue.
Also remember that when you use synchronized you are synchronizing on the object a variable currently refers to, not on the variable or its value. So if you reassign a new object to that variable, your synchronization no longer makes sense.

How atomicity is achieved in the classes defined in java.util.concurrent.atomic package?

I was going through the source code of java.util.concurrent.atomic.AtomicInteger to find out how atomicity is achieved by the atomic operations provided by the class. For instance, the AtomicInteger.getAndIncrement() method source is as follows:
public final int getAndIncrement() {
    for (;;) {
        int current = get();
        int next = current + 1;
        if (compareAndSet(current, next))
            return current;
    }
}
I am not able to understand the purpose of writing the sequence of operations inside an infinite for loop. Does it serve any special purpose in the Java Memory Model (JMM)? Please help me reach a clear understanding. Thanks in advance.
I am not able to understand the purpose of writing the sequence of operations inside a infinite for loop.
The purpose of this code is to ensure that the volatile field gets updated appropriately without the overhead of a synchronized lock. Unless there are a large number of threads all competing to update this same field, this will most likely spin a very few times to accomplish this.
The volatile keyword provides visibility and memory-synchronization guarantees, but it does not by itself make a compound operation (test-and-set) atomic. If you test and then set a volatile field, there is a race condition when multiple threads try to perform the same operation at the same time. In this case, if multiple threads try to increment the AtomicInteger at the same time, you might miss one of the increments. The concurrent code here uses the spin loop and the underlying compareAndSet method to make sure that the volatile int is only updated to 4 (for example) if it is still equal to 3.
t1 gets the atomic-int and it is 0.
t2 gets the atomic-int and it is 0.
t1 adds 1 to it
t1 atomically tests to make sure it is 0, it is, and stores 1.
t2 adds 1 to it
t2 atomically tests to make sure it is 0, it is not, so it has to spin and try again.
t2 gets the atomic-int and it is 1.
t2 adds 1 to it
t2 atomically tests to make sure it is 1, it is, and stores 2.
Does it serve any special purpose in Java Memory Model (JMM).
No, it serves the purpose of the class and method definitions and uses the JMM and the language definitions around volatile to achieve its purpose. The JMM defines what the language does with the synchronized, volatile, and other keywords and how multiple threads interact with cached and central memory. This is mostly about native code interactions with operating system and hardware and is rarely, if ever, about Java code.
It is the compareAndSet(...) method which gets closer to the JMM by calling into the Unsafe class which is mostly native methods with some wrappers:
public final boolean compareAndSet(int expect, int update) {
    return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
}
I am not able to understand the purpose of writing the sequence of operations inside a infinite for loop.
To understand why it is in an infinite loop I find it helpful to understand what the compareAndSet does and how it may return false.
Atomically sets the value to the given updated value if the current value == the expected value.
Parameters:
expect - the expected value
update - the new value
Returns:
true if successful. False return indicates that the actual value was not equal to the expected value
So you read the Returns message and ask how is that possible?
If two threads invoke incrementAndGet at close to the same time, they can both enter and see the value current == 1. Both threads will create a thread-local next == 2 and try to set it via compareAndSet. Only one thread will win, as documented, and the thread that loses must try again.
This is how CAS works: you attempt to change the value; if you fail, you try again; if you succeed, you continue on.
Now, simply declaring the field as volatile will not work, because incrementing is not atomic. So something like this is not safe from the scenario I explained:
volatile int count = 0;

public int incrementAndGet() {
    return ++count; // may return the same number more than once
}
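By contrast, a minimal counter built on AtomicInteger (a sketch of the safe counterpart, not code from the post):

import java.util.concurrent.atomic.AtomicInteger;

public class SafeCounter {
    private final AtomicInteger count = new AtomicInteger(0);

    public int incrementAndGet() {
        // Internally this is the CAS loop shown above: read, add one,
        // compareAndSet, retry on failure, so no increment is ever lost.
        return count.incrementAndGet();
    }
}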
Java's compareAndSet is based on CPU compare-and-swap (CAS) instructions; see http://en.wikipedia.org/wiki/Compare-and-swap. It compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value.
In the case of incrementAndGet we read the current value and call compareAndSet(current, current + 1). If it returns false, it means that another thread interfered and changed the current value, which means that our attempt failed and we need to repeat the whole cycle until it succeeds.

Is this incrementAndGet thread-safe? It seems to pull the object from ehcache

This servlet seems to fetch an object from ehCache, from an Element which has this object: http://code.google.com/p/adwhirl/source/browse/src/obj/HitObject.java?repo=servers-mobile
It then goes on to increment the counter which is an atomic long:
http://code.google.com/p/adwhirl/source/browse/src/servlet/MetricsServlet.java?repo=servers-mobile#174
// Atomically record the hit
if (i_hitType == AdWhirlUtil.HITTYPE.IMPRESSION.ordinal()) {
    ho.impressions.incrementAndGet();
} else {
    ho.clicks.incrementAndGet();
}
This doesn't seem thread-safe to me, as multiple threads could be fetching from the cache, and if both increment at the same time you might lose a click/impression count.
Do you agree that this is not thread-safe?
AtomicLong and AtomicInteger use a CAS internally -- compare and set (or compare-and-swap). The idea is that you tell the CAS two things: the value you expect the long/int to have, and the value you want to update it to. If the long/int has the value you say it should have, the CAS will atomically make the update and return true; otherwise, it won't make the update, and it'll return false. Many modern chips support CAS very efficiently at the machine-code level; if the JVM is running in an environment that doesn't have a CAS, it can use mutexes (what Java calls synchronization) to implement the CAS. Regardless, once you have a CAS, you can safely implement an atomic increment via this logic (in pseudocode):
long incrementAndGet(atomicLong, byIncrement)
    do
        oldValue = atomicLong.get()              // 1
        newValue = oldValue + byIncrement
    while ! atomicLong.cas(oldValue, newValue)   // 2
    return newValue
If another thread has come in and done its own increment between lines // 1 and // 2, the CAS will fail and the loop will try again. Otherwise, the CAS will succeed.
There's a gamble in this kind of approach: if there's low contention, a CAS is faster than a synchronized block and isn't as likely to cause a thread context switch. But if there's a lot of contention, some threads will have to go through multiple loop iterations per increment, which obviously amounts to wasted work. Generally speaking, though, the CAS-based incrementAndGet is going to be faster under most common loads.
The increment is thread safe, since AtomicInteger and family guarantee that. But there is a problem with the insertion into and fetching from the cache, where two (or more) HitObjects could be created and inserted. That could cause some hits to be lost the first time a given HitObject is accessed. As @denis.solonenko has pointed out, there is already a TODO in the code to fix this.
However, I'd like to point out that this code only suffers from concurrency issues on the first access to a given HitObject. Once the HitObject is in the cache (and no more threads are creating or inserting it), this code is perfectly thread-safe. So this is only a very limited concurrency problem, and that's probably why they have not yet fixed it.
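The general shape of the fix for that first-access race, sketched with a plain ConcurrentHashMap rather than the actual ehcache API (the class and method names below are illustrative only):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class HitCounter {
    private final ConcurrentHashMap<String, HitObject> hits = new ConcurrentHashMap<>();

    void recordImpression(String key) {
        // computeIfAbsent guarantees only one HitObject is ever created and
        // published per key, so no early increments land on a discarded instance.
        hits.computeIfAbsent(key, k -> new HitObject()).impressions.incrementAndGet();
    }

    static class HitObject {
        final AtomicLong impressions = new AtomicLong();
        final AtomicLong clicks = new AtomicLong();
    }
}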
