Thread-safe implementation of max

I need to implement a global object that collects statistics for a web server. I have a Statistics singleton with a method addSample(long sample), which subsequently calls updateMax. This obviously has to be thread-safe. I have this method for updating the maximum across the whole Statistics object:
AtomicLong max;
private void updateMax(long sample) {
while (true) {
long curMax = max.get();
if (curMax < sample) {
boolean result = max.compareAndSet(curMax, sample);
if (result) break;
} else {
break;
}
}
}
Is this implementation correct? I am using java.util.concurrent, because I believe it would be faster than simple synchronized. Is there some other / better way to implement this?

Java 8 introduced LongAccumulator.
Its documentation recommends it as follows:
This class is usually preferable to AtomicLong when multiple threads
update a common value that is used for purposes such as collecting
statistics, not for fine-grained synchronization control. Under low
update contention, the two classes have similar characteristics. But
under high contention, expected throughput of this class is
significantly higher, at the expense of higher space consumption.
You can use it as follows:
LongAccumulator maxId = new LongAccumulator(Long::max, 0); //replace 0 with desired initial value
maxId.accumulate(newValue); //from each thread
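As a minimal sketch of the LongAccumulator approach under load, the following feeds samples from several threads and reads the maximum at the end. The class name MaxAccumulatorDemo, the method run and the sample values are made up for illustration; Long.MIN_VALUE is used as the identity so that negative samples would also work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAccumulator;

class MaxAccumulatorDemo {
    // Feeds the samples 0 .. threads*perThread-1 from several threads
    // and returns the largest one seen by the accumulator.
    static long run(int threads, int perThread) {
        LongAccumulator max = new LongAccumulator(Long::max, Long.MIN_VALUE);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    max.accumulate(base + i); // safe to call concurrently
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // the largest sample fed in is 3999
    }
}
```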

I think it's correct, but I'd probably rewrite it a little for clarity, and definitely add comments:
private void updateMax(long sample) {
while (true) {
long curMax = max.get();
if (curMax >= sample) {
// Current max is higher, so whatever other threads are
// doing, our current sample can't change max.
break;
}
// Try updating the max value, but only if it's equal to the
// one we've just seen. We don't want to overwrite a potentially
// higher value which has been set since our "get" call.
boolean setSuccessful = max.compareAndSet(curMax, sample);
if (setSuccessful) {
// We managed to update the max value; no other threads
// got in there first. We're definitely done.
break;
}
// Another thread updated the max value between our get and
// compareAndSet calls. Our sample can still be higher than the
// new value though - go round and try again.
}
}
EDIT: Usually I'd at least try the synchronized version first, and only go for this sort of lock-free code when I'd found that it was causing a problem.

With Java 8 you can take advantage of functional interfaces and a simple lambda expression to solve this in one line, with no explicit loop:
private void updateMax(long sample) {
max.updateAndGet(curMax -> (sample > curMax) ? sample : curMax);
}
The solution uses the updateAndGet(LongUnaryOperator) method. The current value is contained in curMax and using the conditional operator a simple test is performed replacing the current max value with the sample value if the sample value is greater than the current max value.
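A closely related option, shown here only as a sketch, is AtomicLong.accumulateAndGet (also added in Java 8), which takes the sample and a binary function, so Math::max can be passed directly. The wrapper class AtomicMax and its method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicMax {
    // Start at Long.MIN_VALUE so the first sample always becomes the maximum.
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    void updateMax(long sample) {
        // Atomically replaces the stored value with Math.max(stored, sample).
        max.accumulateAndGet(sample, Math::max);
    }

    long get() {
        return max.get();
    }

    public static void main(String[] args) {
        AtomicMax m = new AtomicMax();
        m.updateMax(3);
        m.updateMax(7);
        m.updateMax(5);
        System.out.println(m.get()); // 7
    }
}
```

Like updateAndGet, this retries internally if another thread changes the value in between, so the function passed in should be free of side effects.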

as if you didn't have your pick of answers, here's mine:
// while the update appears bigger than the atomic, try to update the atomic.
private void max(AtomicDouble atomicDouble, double update) {
double expect = atomicDouble.get();
while (update > expect) {
atomicDouble.weakCompareAndSet(expect, update);
expect = atomicDouble.get();
}
}
it's more or less the same as the accepted answer, but doesn't use break or while(true) which I personally don't like.
EDIT: just discovered DoubleAccumulator in java 8. the documentation even says this is for summary statistics problems like yours:
DoubleAccumulator max = new DoubleAccumulator(Double::max, Double.NEGATIVE_INFINITY);
parallelStream.forEach(max::accumulate);
max.get();

I believe what you did is correct, but this is a simpler version that I also think is correct.
private void updateMax(long sample){
//This handles the case where another thread updates max between the comparison and the update.
//getAndSet stores our sample and hands back whatever value was there before.
//If another thread had stored a higher value in the meantime, sample now holds that higher value,
//while max briefly holds our lower one.
//On the next iteration the loop condition sees sample > max and puts the higher value back.
//Once sample is no longer greater than max, the loop condition fails and we are done.
while(sample > max.get()){
sample = max.getAndSet(sample);
}
}

Related

Why is i++ not atomic in Java?
To get a bit deeper into Java, I tried to count how often the loops in my threads are executed.
So I used a
private static int total = 0;
in the main class.
I have two threads.
Thread 1: Prints System.out.println("Hello from Thread 1!");
Thread 2: Prints System.out.println("Hello from Thread 2!");
And I count the lines printed by thread 1 and thread 2. But the lines of thread 1 + lines of thread 2 don't match the total number of lines printed out.
Here is my code:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.logging.Level;
import java.util.logging.Logger;
public class Test {
private static int total = 0;
private static int countT1 = 0;
private static int countT2 = 0;
private boolean run = true;
public Test() {
ExecutorService newCachedThreadPool = Executors.newCachedThreadPool();
newCachedThreadPool.execute(t1);
newCachedThreadPool.execute(t2);
try {
Thread.sleep(1000);
}
catch (InterruptedException ex) {
Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
}
run = false;
try {
Thread.sleep(1000);
}
catch (InterruptedException ex) {
Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
}
System.out.println((countT1 + countT2 + " == " + total));
}
private Runnable t1 = new Runnable() {
@Override
public void run() {
while (run) {
total++;
countT1++;
System.out.println("Hello #" + countT1 + " from Thread 1! Total hello: " + total);
}
}
};
private Runnable t2 = new Runnable() {
@Override
public void run() {
while (run) {
total++;
countT2++;
System.out.println("Hello #" + countT2 + " from Thread 2! Total hello: " + total);
}
}
};
public static void main(String[] args) {
new Test();
}
}

i++ is probably not atomic in Java because atomicity is a special requirement which is not present in the majority of the uses of i++. That requirement has a significant overhead: there is a large cost in making an increment operation atomic; it involves synchronization at both the software and hardware levels that need not be present in an ordinary increment.
You could make the argument that i++ should have been designed and documented as specifically performing an atomic increment, so that a non-atomic increment is performed using i = i + 1. However, this would break the "cultural compatibility" between Java, and C and C++. As well, it would take away a convenient notation which programmers familiar with C-like languages take for granted, giving it a special meaning that applies only in limited circumstances.
Basic C or C++ code like for (i = 0; i < LIMIT; i++) would translate into Java as for (i = 0; i < LIMIT; i = i + 1); because it would be inappropriate to use the atomic i++. What's worse, programmers coming from C or other C-like languages to Java would use i++ anyway, resulting in unnecessary use of atomic instructions.
Even at the machine instruction set level, an increment type operation is usually not atomic for performance reasons. In x86, a special instruction "lock prefix" must be used to make the inc instruction atomic: for the same reasons as above. If inc were always atomic, it would never be used when a non-atomic inc is required; programmers and compilers would generate code that loads, adds 1 and stores, because it would be way faster.
In some instruction set architectures, there is no atomic inc or perhaps no inc at all; to do an atomic inc on MIPS, you have to write a software loop which uses the ll and sc: load-linked, and store-conditional. Load-linked reads the word, and store-conditional stores the new value if the word has not changed, or else it fails (which is detected and causes a re-try).

i++ involves two operations :
read the current value of i
increment the value and assign it to i
When two threads perform i++ on the same variable at the same time, they may both get the same current value of i, and then increment and set it to i+1, so you'll get a single incrementation instead of two.
Example :
int i = 5;
Thread 1 : i++;
// reads value 5
Thread 2 : i++;
// reads value 5
Thread 1 : // increments i to 6
Thread 2 : // increments i to 6
// i == 6 instead of 7

Java specification
The important thing is the JLS (Java Language Specification) rather than how various implementations of the JVM may or may not have implemented a certain feature of the language.
The JLS defines the ++ postfix operator in clause 15.14.2, which says, among other things, "the value 1 is added to the value of the variable and the sum is stored back into the variable". Nowhere does it mention or hint at multithreading or atomicity.
For multithreading or atomicity, the JLS provides volatile and synchronized. Additionally, there are the Atomic… classes.
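As a rough illustration of the synchronized route, here is a small counter whose increment is guarded by the object's monitor. The class SynchronizedCounter and the thread counts are invented for this sketch; it is not taken from the question's code.

```java
class SynchronizedCounter {
    private int total = 0;

    // Only one thread at a time can run a synchronized method on this
    // instance, so the read-add-write of total++ cannot interleave.
    synchronized void increment() {
        total++;
    }

    synchronized int get() {
        return total;
    }

    // Starts the given number of threads, each incrementing perThread times.
    static int run(int threads, int perThread) {
        SynchronizedCounter counter = new SynchronizedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // 40000: no increments are lost
    }
}
```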

Why is i++ not atomic in Java?
Let's break the increment operation into multiple statements:
Thread 1 & 2 :
Fetch value of total from memory
Add 1 to the value
Write back to the memory
If there is no synchronization then let's say Thread one has read the value 3 and incremented it to 4, but has not written it back. At this point, the context switch happens. Thread two reads the value 3, increments it and the context switch happens. Though both threads have incremented the total value, it will still be 4 - race condition.

i++ is a statement which involves 3 operations:
Read current value
Compute new value (in a thread-local copy)
Store new value
These three operations are not executed in a single step; in other words, i++ is a compound operation, not an atomic one. As a result, all sorts of things can go wrong when more than one thread is involved in a single but non-atomic operation.
Consider the following scenario:
Time 1:
Thread A fetches i
Thread B fetches i
Time 2:
Thread A overwrites i with a new value say -foo-
Thread B overwrites i with a new value say -bar-
Thread B stores -bar- in i
// At this time thread B seems to be more 'active'. Not only does it overwrite
// its local copy of i but also makes it in time to store -bar- back to
// 'main' memory (i)
Time 3:
Thread A attempts to store -foo- in memory effectively overwriting the -bar-
value (in i) which was just stored by thread B in Time 2.
Thread B has nothing to do here. Its work was done by Time 2. However it was
all for nothing as -bar- was eventually overwritten by another thread.
And there you have it. A race condition.
That's why i++ is not atomic. If it was, none of this would have happened and each fetch-update-store would happen atomically. That's exactly what AtomicInteger is for and in your case it would probably fit right in.
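A minimal sketch of the AtomicInteger fix, assuming two worker threads that each increment a shared total a fixed number of times (the class name AtomicTotalDemo and the loop counts are illustrative, not from the question):

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicTotalDemo {
    // Two threads each increment the shared counter perThread times.
    static int run(int perThread) {
        AtomicInteger total = new AtomicInteger();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                total.incrementAndGet(); // atomic fetch-update-store
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(run(100_000)); // 200000
    }
}
```

With a plain int and total++ in place of the AtomicInteger, the printed value would typically come out lower, because some increments overwrite each other.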
P.S.
An excellent book covering all of those issues and then some is this:
Java Concurrency in Practice

In the JVM, an increment involves a read and a write, so it's not atomic.

If the operation i++ would be atomic you wouldn't have the chance to read the value from it. This is exactly what you want to do using i++ (instead of using ++i).
For example look at the following code:
public static void main(final String[] args) {
int i = 0;
System.out.println(i++);
}
In this case we expect the output to be: 0
(because we post increment, e.g. first read, then update)
This is one of the reasons the operation can't be atomic, because you need to read the value (and do something with it) and then update the value.
The other important reason is that doing something atomically usually takes more time because of locking. It would be silly to have all the operations on primitives take a little bit longer for the rare cases when people want to have atomic operations. That is why they've added AtomicInteger and other atomic classes to the language.

There are two steps:
fetch i from memory
set i+1 to i
so it's not atomic operation.
When thread1 executes i++, and thread2 executes i++, the final value of i may be i+1.

In JVM or any VM, the i++ is equivalent to the following:
int temp = i; // 1. read
i = temp + 1; // 2. increment the value then 3. write it back
that is why i++ is non-atomic.

Concurrency (the Thread class and such) was a feature added in v1.0 of Java. i++ was added in the beta before that, and as such it is still more than likely in its (more or less) original implementation.
It is up to the programmer to synchronize variables. Check out Oracle's tutorial on this.
Edit: To clarify, i++ is a well defined procedure that predates Java, and as such the designers of Java decided to keep the original functionality of that procedure.
The ++ operator was defined in B (1969) which predates java and threading by just a tad.

Using LongAdder in Java

I am reading the book "Core Java I" by Cay S. Horstmann, and on page 580 he mentions LongAdder:
If you anticipate high contention [*1], you should simply use a LongAdder instead of an
AtomicLong. The method names are slightly different. Call increment to increment a counter
or add to add a quantity, and sum to retrieve the total.
var adder = new LongAdder();
for (. . .)
pool.submit(() -> {
while (. . .) {
. . .
if (. . .) adder.increment();
}
});
. . .
long total = adder.sum();
Note
Of course, the increment method does not return the old [*2] value. Doing that would undo
the efficiency gain of splitting the sum into multiple summands.
In [*1], by the word "contention", I assume he means a heavily loaded machine where lots of threads are running the Java code.
In [*2] he mentions the old value. What do "old value" and "new value" mean in this context? Could you please explain briefly?

[*1]: The term "contention" in context of multithreading means that many threads try to access/call/update something at the same time; in this case the LongAdder or counter in general.
[*2]: The old value in this context is the previous value of the LongAdder. Most updating methods of AtomicLong return either the previous or the updated value (getAndIncrement returns the previous one, incrementAndGet the new one), whereas LongAdder#increment returns void. The new value is simply the new value, the one that you can get via sum.
The class LongAdder works differently than AtomicLong to increase throughput, which is why e.g. increment doesn't return anything. You can read about it here: How LongAdder performs better than AtomicLong

LongAdder doesn't maintain a single value. When you increment or add, it applies the update to base or to one of several Cell objects. It doesn't keep a running total.
When you want the actual value, you call the sum() method, which adds up base and all the cells to give you the result.
For better understanding, here's how the sum method is implemented in LongAdder:
public long sum() {
Cell[] cs = cells;
long sum = base;
if (cs != null) {
for (Cell c : cs)
if (c != null)
sum += c.value;
}
return sum;
}
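To make the usage concrete, here is a small sketch that increments one LongAdder from a thread pool and reads the total with sum() once the workers are done. The class name LongAdderDemo, the pool size and the counts are assumptions chosen for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

class LongAdderDemo {
    // Each of the given number of tasks calls increment() perTask times.
    static long run(int tasks, int perTask) {
        LongAdder adder = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        for (int t = 0; t < tasks; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perTask; i++) {
                    adder.increment(); // returns void, no old or new value
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // All updates have completed, so sum() is the exact total here.
        return adder.sum();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 50_000)); // 200000
    }
}
```

Calling sum() while updates are still running is allowed, but the result is then only a moving snapshot rather than an exact count.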

Why is there a loop in getAndIncrement() in Java AtomicInteger?

the source code of getAndIncrement is:
public final int getAndIncrement() {
for (;;) {
int current = get();
int next = current + 1;
if (compareAndSet(current, next))
return current;
}
}
I don't understand why there is a loop. If some other threads have changed the value, then how can it be atomic?
let's say the value is 5, then I call getAndIncrement(), we expect it to be 6, but at the same time some other threads have changed the value to 6, then getAndIncrement() will make the value to 7, which is not expected.
Where am I wrong?

The loop will keep going until it manages to do the get(), the +1, and the compareAndSet without any other thread getting in a compareAndSet first. If another thread does get a compareAndSet in, then this thread's compareAndSet will fail, and the loop will retry.
The end result is that each call to getAndIncrement() will result in exactly one increment to the value. If the value is initially 5, and two threads call getAndIncrement(), then one will return 5 and the other will return 6, leaving the value at 7.
Put another way: one of them will appear to happen fully after the other, which is what "atomic" means.
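The scenario can be tried out with a short sketch. getAndIncrement() returns the value as it was before the increment, so starting from 5, two concurrent calls hand out 5 and 6 (in either order) and leave the counter at 7. The class name GetAndIncrementDemo is made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class GetAndIncrementDemo {
    // Returns {smaller returned value, larger returned value, final value}.
    static int[] run() {
        AtomicInteger value = new AtomicInteger(5);
        int[] returned = new int[2];
        Thread a = new Thread(() -> returned[0] = value.getAndIncrement());
        Thread b = new Thread(() -> returned[1] = value.getAndIncrement());
        a.start();
        b.start();
        try {
            a.join(); // join makes each thread's write to the array visible here
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {
            Math.min(returned[0], returned[1]),
            Math.max(returned[0], returned[1]),
            value.get()
        };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // 5 6 7
    }
}
```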

As already answered,
each call to getAndIncrement() will result in exactly one increment to the value
Confusion seems to stem from your comment
Let's say its original value is 5, now I want to make it 6, but if some other threads have made it 6 , why should it retry to make it 7
Okay, so you want the system to behave one way, but the method you are using is designed to do something different. getAndIncrement is designed to ensure every invocation causes an increment; what you want is for all invocations combined to cause ONE increment. So clearly getAndIncrement should not be used here.
It is worth noting that the behavior you expect is rarely needed in a single-machine system, but frequently in distributed systems. If you are not working on a distributed system, then other people are right to find fault with your approach.

The key to understanding this is to understand what compareAndSet() does:
/**
* Atomically sets the value to the given updated value
* if the current value {@code ==} the expected value.
*
* @param expect the expected value
* @param update the new value
* @return true if successful. False return indicates that
* the actual value was not equal to the expected value.
*/
public final boolean compareAndSet(int expect, int update) {
return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
}
In Unsafe.java:
/**
* Atomically update Java variable to <tt>x</tt> if it is currently
* holding <tt>expected</tt>.
* @return <tt>true</tt> if successful
*/
public final native boolean compareAndSwapInt(Object o, long offset,
int expected,
int x);
So this method uses JVM internals to atomically:
check whether the value has the expected value
if not, do nothing and return false
if so, set to the new value and return true
The loop you've found exits when compareAndSet() returns true.
for (;;) {
int current = get();
int next = current + 1;
if (compareAndSet(current, next))
return current;
}
... is equivalent to:
boolean done = false;
int current = 0; // placeholder; always overwritten because the loop body runs at least once
while(!done) {
current = get();
int next = current + 1;
done = compareAndSet(current, next);
}
return current;
... but slightly terser and cleaner.
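The three-way behaviour of compareAndSet() described above can be seen without any threads at all. This is only a sketch; the class name CompareAndSetDemo is invented, and the second call simulates a thread that is still working with a stale value.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CompareAndSetDemo {
    static String run() {
        AtomicInteger value = new AtomicInteger(5);
        // Expected value matches the current value: the update is applied.
        boolean first = value.compareAndSet(5, 6);
        // Expected value is stale (the counter is now 6): nothing changes.
        boolean second = value.compareAndSet(5, 7);
        return first + " " + second + " " + value.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // true false 6
    }
}
```

In the getAndIncrement() loop, a false result like the second one is what sends the thread around the loop again to re-read the current value.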

@Lily, as @yshavit explains, the compareAndSet will only succeed if current is still valid and the counter was not updated by another thread. So it atomically updates the counter or it will return false. So it will continue iterating until it eventually succeeds. Current and next are recalculated on each iteration. So it will update the counter by exactly 1 or not at all.
This is a form of optimistic locking, which means that instead of having locks where other threads have to check whether they can proceed or have to wait, it does not lock at all and simply keeps trying opportunistically until it succeeds. The rationale is that this is cheaper than having synchronized blocks because typically the overhead for that is not needed and iterating and trying again is cheaper than having locks around code blocks.
Btw. in Oracle java 8, the implementation has changed and it now uses sun.misc.Unsafe internally, which probably calls some native logic to achieve the same goal.

Performance difference between assignment and conditional test

This question is specifically geared towards the Java language, but I would not mind feedback about this being a general concept if so. I would like to know which operation might be faster, or if there is no difference between assigning a variable a value and performing tests for values. For this issue we could have a large series of Boolean values that will have many requests for changes. I would like to know if testing for the need to change a value would be considered a waste when weighed against the speed of simply changing the value during every request.
public static void main(String[] args){
Boolean array[] = new Boolean[veryLargeValue];
for(int i = 0; i < array.length; i++) {
array[i] = randomTrueFalseAssignment;
}
for(int i = 400; i < array.length - 400; i++) {
testAndChange(array, i);
}
for(int i = 400; i < array.length - 400; i++) {
justChange(array, i);
}
}
This could be the testAndChange method
public static void testAndChange(Boolean[] pArray, int ind) {
if(pArray[ind])
pArray[ind] = false;
}
This could be the justChange method
public static void justChange(Boolean[] pArray, int ind) {
pArray[ind] = false;
}
If we were to end up with the very rare case that every value within the range supplied to the methods were false, would there be a point where one method would eventually become slower than the other? Is there a best practice for issues similar to this?
Edit: I wanted to add this to help clarify this question a bit more. I realize that the data type can be factored into the answer, as larger or more efficient data types can be utilized. I am more focused on the task itself. Is the task of a test "if(aConditionalTest)" slower, faster, or indeterminable without additional information (such as data type) compared with the task of an assignment "x=avalue"?

As @TrippKinetics points out, there is a semantic difference between the two methods. Because you use Boolean instead of boolean, it is possible that one of the values is a null reference. In that case the first method (with the if-statement) will throw a NullPointerException, while the second simply assigns values to all the elements in the array.
Assuming you use boolean[] instead of Boolean[]: optimization is an undecidable problem. There are very rare cases where adding an if-statement could result in better performance. For instance, most processors use caches, and the if-statement can happen to make the executed code fit exactly on two cache lines, where without the if it would be spread over more, resulting in cache misses. You might think you save an assignment instruction, but you do so at the cost of a fetch instruction and a conditional branch (which can stall the CPU pipeline). Assigning has more or less the same cost as fetching a value.
In general, however, one can assume that adding an if statement is useless and will nearly always result in slower code. So as a rule of thumb you can safely assume that the if statement will slow down your code.
More specifically on your question, there are faster ways to set a range to false. For instance using bitvectors like:
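long[] data = new long[(veryLargeValue+0x3f)>>0x06];//a long has 64 bits
//assign random values
//bit i lives in data[i>>6] at bit position (i&0x3f); assumes low < high
int from = 400;//first bit to clear
int to = veryLargeValue-400;//one past the last bit to clear
int low = from>>0x06;
int high = to>>0x06;
data[low] &= (1L<<(from&0x3f))-1;//keep only the bits below from
for(int i = low+0x01; i < high; i++) {
data[i] = 0x00;
}
data[high] &= -1L<<(to&0x3f);//keep only the bits at or above to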
The advantage is that a processor can perform operations on 32- or 64-bits at once. Since a boolean is one bit, by storing bits into a long or int, operations are done in parallel.
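If hand-written masks are not required, java.util.BitSet offers the same word-at-a-time idea through its range methods. The following is a sketch rather than a benchmark; the class name BitSetRangeClear, the helper clearMiddle and the sizes are placeholders standing in for veryLargeValue and the 400-element margins.

```java
import java.util.BitSet;

class BitSetRangeClear {
    // Builds a bit set of the given size with every flag true, then clears
    // the flags in [margin, size - margin), leaving both margins untouched.
    static BitSet clearMiddle(int size, int margin) {
        BitSet bits = new BitSet(size);
        bits.set(0, size);                 // toIndex is exclusive
        bits.clear(margin, size - margin); // toIndex is exclusive
        return bits;
    }

    public static void main(String[] args) {
        BitSet bits = clearMiddle(10_000, 400);
        System.out.println(bits.cardinality()); // 800 flags remain set
    }
}
```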

Optimization: replace for loop with ListIterator

It's my first time working on a fairly big project, and I've been asked to get the best performance possible.
So I thought I'd replace my for loops with a ListIterator, because I've got around 180 loops which call list.get(i) on lists with about 5000 elements.
So I've got two questions.
1) Are those 2 snippets equivalent? I mean, do they produce the same output? If not, how can I correct the ListIterator version?
ListIterator<Corsa> ridesIterator = rides.listIterator();
while (ridesIterator.hasNext()) {
ridesIterator.next();
Corsa previous = ridesIterator.previous(); //rides.get(i-1)
Corsa current = ridesIterator.next(); //rides.get(i)
if (current.getOP() < d.getFFP() && previous.getOA() > d.getIP() && current.wait(previous) > DP) {
doSomething();
break;
}
}
__
for (int i = 1; i < rides.size(); i++) {
if (rides.get(i).getOP() < d.getFP() && rides.get(i - 1).getOA() > d.getIP() && rides.get(i).getOP() - rides.get(i - 1).getOA() > DP) {
doSomething();
break;
}
}
2) What would the first snippet look like if I had something like this? (changed i and its exit condition)
for (int i = 0; i < rides.size() - 1; i++) {
if (rides.get(i).getOP() < d.getFP() && rides.get(i + 1).getOA() > d.getIP() && rides.get(i).getOP() - rides.get(i + 1).getOA() > DP) {
doSomething();
break;
}
}
I'm asking because it's the first time that I'm using a ListIterator and I can't try it now!
EDIT: I'm not using an ArrayList, it's a custom List based on a LinkedList
EDIT 2 : I'm adding some more infos.
I can't use a caching system because my data changes on every iteration, and managing the cache would be hard as I'd have to deal with inconsistent data.
I can't merge some of these loops into one big loop either, as they are in different methods because they need to do a lot of different things.
So, sticking to this particular case, what do you think is the best practice?
Is ListIterator the best way to deal with my case? And how can I use the ListIterator if my for loop runs between 0 and size-1?

If you know the maximum size, you will get the best performance if you give up collections such as ArrayList and replace them with simple arrays.
So instead of creating an ArrayList<Corsa> with 5000 elements, do Corsa[] rides = new Corsa[5000]. Instead of hard-coding 5000, declare it as a constant, for example final static int MAX_RIDES = 5000, to avoid a magic number in the code. Then iterate with a normal for loop, referring to rides[i].
Generally, if you are looking for performance, you should code in Java as if it were C/C++ (where you can, of course). The code is not so object-oriented and beautiful, but it's fast. Remember to always do optimization at the end, when you are sure you have found a bottleneck. Otherwise your efforts are futile and only make the code less readable and maintainable. Also use a profiler to make sure your changes are in fact improvements, not regressions.
Another downside of using ListIterator is that it allocates memory internally. So the GC (Garbage Collector) will run more often, which can also have an impact on overall performance.

No, they do not do the same thing.
while (ridesIterator.hasNext()) {
ridesIterator.next();
Corsa previous = ridesIterator.previous(); //rides.get(i-1)
Corsa current = ridesIterator.next(); //rides.get(i)
The variables previous and current would contain the same "Corsa" value, see the ListIterator documentation for details (iterators are "in between" positions).
The correct code would look as follows:
while (ridesIterator.hasNext()) {
Corsa previous = ridesIterator.next(); //rides.get(i-1)
if(!ridesIterator.hasNext())
break; // We are already at the last element
Corsa current = ridesIterator.next(); //rides.get(i)
ridesIterator.previous(); // going back 1, to start correctly next time
For your second question the code would look exactly the same; only the interpretation (as shown in the comments) would be different:
while (ridesIterator.hasNext()) {
Corsa previous = ridesIterator.next(); //rides.get(i)
if(!ridesIterator.hasNext())
break; // We are already at the last element
Corsa current = ridesIterator.next(); //rides.get(i+1)
ridesIterator.previous(); // going back 1, to start correctly next time
From a (premature?) optimization viewpoint the ListIterator implementation is better.
LinkedList is a doubly-linked list, which means that each element links to both its predecessor (previous) and its successor (next). So the ListIterator version follows 3 links per loop iteration. => 3*N
Each get(i) needs to walk through the list to reach index i. That is on average N/4 links per call. (You'd think N/2, but LinkedList starts from the beginning or the end of the list, whichever is closer.) With two get calls per iteration => 2 * N * N/4 == N^2 /2
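The corrected iterator pattern can be checked on a small list. This sketch uses Integer elements in place of Corsa and simply records each (previous, current) pair; the class name AdjacentPairs and the sample values are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class AdjacentPairs {
    // Visits every pair of neighbouring elements using only iterator moves,
    // so no get(i) call has to walk the linked list from one end.
    static List<String> pairs(List<Integer> rides) {
        List<String> out = new ArrayList<>();
        ListIterator<Integer> it = rides.listIterator();
        while (it.hasNext()) {
            Integer previous = it.next();   // like rides.get(i - 1)
            if (!it.hasNext()) {
                break;                      // previous was the last element
            }
            Integer current = it.next();    // like rides.get(i)
            it.previous();                  // step back so current is next time's previous
            out.add(previous + "-" + current);
        }
        return out;
    }

    static List<String> demo() {
        return pairs(new LinkedList<>(Arrays.asList(10, 20, 30, 40)));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [10-20, 20-30, 30-40]
    }
}
```

Keeping the last element in a local variable and calling next() only once per iteration would avoid the extra previous() call; the version above is kept close to the loop shown in the answer.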

Here are some suggestions, hopefully one or two will be applicable to your situation.
Try to do only one rides.get(x) per loop.
Cache method results in local variables as appropriate for your code.
In some cases the compiler can optimize multiple calls to the same thing doing it just once instead, but not always for many subtle reasons. As a programmer, if you know for a fact that these should deliver the same values, then cache them in local variables.
For example,
int sz = rides.size ();
float dFP = d.getFP (); // wasn't sure of the type, so just called it float..
float dIP = d.getIP ();
Corsa lastRide = rides.get ( 0 );
for ( int i = 1; i < sz; i++ ) {
Corsa r = rides.get ( i );
float rOP = r.getOP ();
if ( rOP < dFP ) {
float lastRideOA = lastRide.getOA (); // only get OA if rOP < dFP
if ( lastRideOA > dIP && rOP - lastRideOA > DP ) {
doSomething ();
// maybe break;
}
}
lastRide = r;
}
These are optimizations that may not work in all cases. For example, if your doSomething expands the list, then you need to recompute sz, or maybe go back to calling rides.size() on each iteration. These optimizations also assume that the list is stable, in that the elements don't change during the get..()'s. If doSomething makes changes to the list, then you'd need to cache less. Hopefully you get the idea. You can apply some of these techniques to the iterator form of the loop as well.
