This question relates to the latest version of Java.
I have a primitive two-dimensional array sized as below.
int[][] array = new int[numPasses][n*10]; //n threads write; during the i-th pass, the k-th thread writes to array[i] at locations k*10 to (k+1)*10-1.
//the array above is allocated at the beginning, and constantly rewritten.
During pass i, each of n producer threads writes to its own memory location in array[i], so there are no race conditions during the write process. After writing, m consumer threads read the results of this write. I do not need the consumers to access array[i] at any point in time before all the writes are done.
My first question: Would a structure like the following flush all the producer writes from cache? If not, how would one go about doing this for primitive arrays? (For technical reasons, I cannot use Atomic*Arrays.)
void flush() { // invoked after writes from all producer threads are done
    if (producerThreadID == 0) {
        synchronized (array[i]) { // done at pass i
        }
    }
}
My second question: Is there a better way to do this?
EDIT: Okay, I accept that what I want to do is essentially impossible with the empty synchronized block. Let's say that, instead of the structure above, each producer thread has access to its own pass, i.e.:
int[][] array = new int[numPasses][n*10]; //n = numPasses threads write; during the i-th pass, the i-th thread writes to all elements in array[i].
(This is Zim-Zam's suggestion.)
My (hopefully final) question: Then, would the following structure in the i-th thread ensure visibility for consumer threads after the synchronized block?
// i-th producer thread acquires lock on array[i]
void produce() {
    synchronized (array[i]) {
        // modify array[i][*] here
    }
}
Your algorithm is probably going to create false sharing, which occurs when two threads write to nearby memory locations: if thread1 and thread2 are writing to data that shares a cache line, the cache coherence protocol will force thread2 to block or re-execute until thread1 completes, or vice versa. You can avoid this by using coarser-grained parallelism, e.g. one thread per pass (one thread per array) rather than one thread per array element; this way each thread operates on its own array and there probably won't be any false sharing, as sketched below.
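A minimal sketch of that coarser-grained layout, reusing the question's numPasses and n (the worker body and counts are illustrative):

// one worker thread per pass/row: no two threads share a row,
// so writes from different threads land on different heap objects
static void runPasses(int numPasses, int n) throws InterruptedException {
    int[][] array = new int[numPasses][n * 10];
    Thread[] workers = new Thread[numPasses];
    for (int pass = 0; pass < numPasses; pass++) {
        final int[] row = array[pass]; // this thread's private row
        workers[pass] = new Thread(() -> {
            for (int j = 0; j < row.length; j++) {
                row[j] = j; // placeholder work; only this thread writes row
            }
        });
        workers[pass].start();
    }
    for (Thread t : workers) {
        t.join(); // join also makes the rows' contents visible to the caller
    }
}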
I would study carefully your reasons for not using Atomics, because they are exactly what you need here.
If there is truly a problem, then have you considered using sun.misc.Unsafe like the Atomics do?
Alternatively - use an array of objects holding a volatile field.
class Vint {
    public volatile int i;
}

Vint[] arr = new Vint[10];
{
    for (int i = 0; i < arr.length; i++) {
        arr[i] = new Vint();
    }
}
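Used this way, a producer's write to the volatile field and a consumer's later read of that same field form a happens-before pair, so no locking is needed for visibility. A minimal usage sketch (the index k and the value are illustrative):

// producer thread: the volatile write publishes the value
arr[k].i = 42;

// consumer thread: the volatile read observes the published value
int v = arr[k].i;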
Related
Are there any concurrency problems with one thread reading from one index of an array, while another thread writes to another index of the array, as long as the indices are different?
e.g. (this example not necessarily recommended for real use, only to illustrate my point)
import java.util.concurrent.atomic.AtomicInteger;

class Test1
{
    static final private int N = 4096;
    final private int[] x = new int[N];
    final private AtomicInteger nwritten = new AtomicInteger(0);
    // invariant:
    // all values x[i] where 0 <= i < nwritten.get() are immutable

    // read() is not synchronized since we want it to be fast
    int read(int index) {
        if (index >= nwritten.get())
            throw new IllegalArgumentException();
        return x[index];
    }

    // write() is synchronized to handle multiple writers
    // (using compare-and-set techniques to avoid blocking algorithms
    // is nontrivial)
    synchronized void write(int x_i) {
        int index = nwritten.get();
        if (index >= N)
            throw new IllegalStateException("array is full");
        x[index] = x_i;
        // from this point forward, x[index] is fixed in stone
        nwritten.set(index + 1);
    }
}
edit: critiquing this example is not my question. I literally just want to know if access to one index of an array, concurrent with access to another index, poses concurrency problems; I couldn't think of a simpler example.
While you will not get an invalid state by changing arrays as you mention, you will have the same problem that happens when two threads are viewing a non-volatile integer without synchronization (see the section in the Java Tutorial on Memory Consistency Errors). Basically, the problem is that Thread 1 may write a value at index i, but there is no guarantee when (or if) Thread 2 will see the change.
The class java.util.concurrent.atomic.AtomicIntegerArray does what you want to do.
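A minimal sketch of the per-element semantics, with illustrative index and value names:

import java.util.concurrent.atomic.AtomicIntegerArray;

AtomicIntegerArray x = new AtomicIntegerArray(4096);

// writer thread: set() has volatile-write semantics for that element
x.set(index, value);

// reader thread: get() has volatile-read semantics, so it observes
// the most recent set() on that element
int v = x.get(index);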
The example has a lot of stuff that differs from the prose question.
The answer to that question is that distinct elements of an array are accessed independently, so you don't need synchronization if two threads change different elements.
However, the Java memory model makes no guarantees (that I'm aware of) that a value written by one thread will be visible to another thread, unless you synchronize access.
Depending on what you're really trying to accomplish, it's likely that java.util.concurrent already has a class that will do it for you. And if it doesn't, I still recommend taking a look at the source code for ConcurrentHashMap, since your code appears to be doing the same thing that it does to manage the hash table.
I am not really sure whether synchronizing only the write method, while leaving the read method unsynchronized, would work. I'm not sure what all the consequences are, but at the very least it might lead to read() returning values that have just been overwritten by write().
Yes, as bad cache interleaving can still happen in a multi-cpu/core environment. There are several options to avoid it:
Use the Unsafe Sun-private library to atomically set an element in an array (or the jsr166y feature added in Java 7)
Use an AtomicXYZ[] array
Use a custom object with one volatile field and have an array of that object
Use the ParallelArray from the jsr166y addendum in your algorithm instead
Since read() is not synchronized you could have the following scenario:
Thread A enters the write() method
Thread A writes x[index] while nwritten is still 0
Thread B reads nwritten == 0
Thread A increments nwritten; nwritten == 1
Thread A exits write()
Since you want to guarantee that your variable addresses never conflict, what about something like (discounting array index issues):
private final int[] values = new int[4096]; // backing array; size illustrative
private int i;

synchronized int curr() { return i; }
synchronized int next() { return ++i; }

int read() {
    return values[curr()];
}

void write(int x) {
    values[next()] = x;
}
Consider the following piece of code (which isn't quite what it seems at first glance).
import java.util.ArrayList;
import java.util.List;

// enclosing class added so the snippet compiles; the name is illustrative
public class CounterDemo {

    static class NumberContainer {
        int value = 0;

        void increment() {
            value++;
        }

        int getValue() {
            return value;
        }
    }

    public static void main(String[] args) {
        List<NumberContainer> list = new ArrayList<>();
        int numElements = 100000;
        for (int i = 0; i < numElements; i++) {
            list.add(new NumberContainer());
        }

        int numIterations = 10000;
        for (int j = 0; j < numIterations; j++) {
            list.parallelStream().forEach(NumberContainer::increment);
        }

        list.forEach(container -> {
            if (container.getValue() != numIterations) {
                System.out.println("Problem!!!");
            }
        });
    }
}
My question is: In order to be absolutely certain that "Problem!!!" won't be printed, does the "value" variable in the NumberContainer class need to be marked volatile?
Let me explain how I currently understand this.
In the first parallel stream, NumberContainer-123 (say) is incremented by ForkJoinWorker-1 (say). So ForkJoinWorker-1 will have an up-to-date cache of NumberContainer-123.value, which is 1. (Other fork-join workers, however, will have out-of-date caches of NumberContainer-123.value - they will store the value 0. At some point, these other workers' caches will be updated, but this doesn't happen straight away.)
The first parallel stream finishes, but the common fork-join pool worker threads aren't killed. The second parallel stream then starts, using the very same common fork-join pool worker threads.
Suppose, now, that in the second parallel stream, the task of incrementing NumberContainer-123 is assigned to ForkJoinWorker-2 (say). ForkJoinWorker-2 will have its own cached value of NumberContainer-123.value. If a long period of time has elapsed between the first and second increments of NumberContainer-123, then presumably ForkJoinWorker-2's cache of NumberContainer-123.value will be up-to-date, i.e. the value 1 will be stored, and everything is good. But what if the time elapsed between the first and second increments of NumberContainer-123 is extremely short? Then perhaps ForkJoinWorker-2's cache of NumberContainer-123.value might be out of date, storing the value 0, causing the code to fail!
Is my description above correct? If so, can anyone please tell me what kind of time delay between the two incrementing operations is required to guarantee cache consistency between the threads? Or if my understanding is wrong, then can someone please tell me what mechanism causes the thread-local caches to be "flushed" in between the first parallel stream and the second parallel stream?
It should not need any delay. By the time you're out of ParallelStream's forEach, all the tasks have finished. That establishes a happens-before relation between the increment and the end of forEach. All the forEach calls are ordered by being called from the same thread, and the check, similarly, happens-after all the forEach calls.
int numIterations = 10000;
for (int j = 0; j < numIterations; j++) {
    list.parallelStream().forEach(NumberContainer::increment);
    // here, everything is "flushed", i.e. the ForkJoinTask is finished
}
Back to your question about the threads, the trick here is that the threads are irrelevant. The memory model hinges on the happens-before relation, and the fork-join task ensures a happens-before relation between the call to forEach and the operation body, and between the operation body and the return from forEach (even if the returned value is Void).
See also Memory visibility in Fork-join
As #erickson mentions in comments,
If you can't establish correctness through happens-before relationships,
no amount of time is "enough." It's not a wall-clock timing issue; you
need to apply the Java memory model correctly.
Moreover, thinking about it in terms of "flushing" the memory is wrong, as there are many more things that can affect you. Flushing, for instance, is trivial: I have not checked, but can bet that there's just a memory barrier on task completion; but you can still get wrong data because the compiler decided to optimise non-volatile reads away (the variable is not volatile and is not changed in this thread, so it's not going to change, so we can allocate it to a register, et voilà), reorder the code in any way allowed by the happens-before relation, and so on.
Most importantly, all those optimizations can and will change over time, so even if you went to the generated assembly (which may vary depending on the load pattern) and checked all the memory barriers, it does not guarantee that your code will work unless you can prove that your reads happen-after your writes, in which case Java Memory Model is on your side (assuming there's no bug in JVM).
As for the great pain, it's the very goal of ForkJoinTask to make the synchronization trivial, so enjoy. It was (it seems) done by marking the java.util.concurrent.ForkJoinTask#status volatile, but that's an implementation detail you should not care about or rely upon.
I had a question related to accessing individual elements via an AtomicReference.
If I have an integer array and an atomic reference to it, will reading and writing individual elements of the array via the AtomicReference variable cause data races?
In the code below, num is an integer array and aRnumbers is the atomic reference to it.
In threads 1 and 2, I access aRnumbers.get()[1] and increment it by 1.
I am able to access individual elements via the atomic reference without any apparent data race, getting the accurate result each time: 22 is the output for aRnumbers.get()[1] in the main thread after both threads complete.
But, since the atomic reference is defined on the array and not on the individual elements, shouldn't there be a data race in this case, leading to either 21 or 22 as the output?
Isn't having data races in this case the motivation for having an AtomicIntegerArray data structure, which provides separate atomic access to each element?
Please find below the Java code that I am trying to run. Could anyone kindly let me know where I am going wrong?
import java.util.concurrent.atomic.AtomicReference;

public class AtomicReferenceExample {

    private static int[] num = new int[2];
    private static AtomicReference<int[]> aRnumbers;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(new MyRun1());
        Thread t2 = new Thread(new MyRun2());
        num[0] = 10;
        num[1] = 20;
        aRnumbers = new AtomicReference<int[]>(num);
        System.out.println("In Main before:" + aRnumbers.get()[0] + aRnumbers.get()[1]);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("In Main after:" + aRnumbers.get()[0] + aRnumbers.get()[1]);
    }

    static class MyRun1 implements Runnable {
        public void run() {
            System.out.println("In T1 before:" + aRnumbers.get()[1]);
            aRnumbers.get()[1] = aRnumbers.get()[1] + 1;
        }
    }

    static class MyRun2 implements Runnable {
        public void run() {
            System.out.println("In T2 before:" + aRnumbers.get()[1]);
            aRnumbers.get()[1] = aRnumbers.get()[1] + 1;
        }
    }
}
shouldn't there be a data race in this case, leading to either 21 or 22 as the output?
Indeed there is. Your threads are so short-lived that most likely they are not running at the same time.
Isn't having data races in this case the motivation for having an AtomicIntegerArray data structure, which provides separate atomic access to each element?
Yes, it is.
Could anyone kindly let me know where I am going wrong?
Starting a thread takes 1 - 10 milliseconds.
Incrementing a value like this, even without the code being JITed, is likely to take much less than 50 microseconds; if it were optimised it would take about 50 - 200 nanoseconds per increment.
As starting a thread takes about 20 - 200x longer than the operation, the threads won't be running at the same time, so there is no race condition.
Try incrementing the value a few million times, so you have a race condition because both threads are running at the same time.
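A minimal sketch of that experiment, reusing the question's AtomicReference-to-array setup (the loop count and class name are illustrative):

import java.util.concurrent.atomic.AtomicReference;

public class RaceDemo {
    static final AtomicReference<int[]> aRnumbers =
            new AtomicReference<int[]>(new int[]{10, 20});

    public static void main(String[] args) throws InterruptedException {
        Runnable bump = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                // unsynchronized read-modify-write on one element
                aRnumbers.get()[1] = aRnumbers.get()[1] + 1;
            }
        };
        Thread t1 = new Thread(bump);
        Thread t2 = new Thread(bump);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // almost always prints less than 2000020 because increments are lost
        System.out.println(aRnumbers.get()[1]);
    }
}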
Incrementing an element consists of three steps:
Reading the value.
Incrementing the value.
Writing the value back.
A race condition can occur. Take an example: Thread 1 reads the value (let's say 20). Task switch. Thread 2 reads the value (20 again), increments it and writes it back (21). Task switch. The first thread increments the value and writes it back (21). So while 2 incrementing operations took place, the final value is still incremented only by one.
The data structure does not help in this case. A thread safe collection helps keeping the structure consistent when concurrent threads are adding, accessing and removing elements. But here you need to lock access to an element during the three steps of the increment operation.
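For int elements, for example, AtomicIntegerArray collapses the three steps into one atomic operation; a minimal sketch:

import java.util.concurrent.atomic.AtomicIntegerArray;

AtomicIntegerArray counters = new AtomicIntegerArray(2);

// read, increment, and write happen as a single atomic step,
// so concurrent increments on the same element are never lost
counters.getAndIncrement(1);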
Imagine we have
volatile int publisher = 0;
volatile List<String> list = Arrays.asList("Buenos Aires", "Córdoba", "La Plata");
volatile String[] array = {"Buenos Aires", "Córdoba", "La Plata"};
As far as I understand:
Initial values in list and array are published correctly and are visible to all reading threads.
All values added after initialization are not safely published.
Still, we can read and publish them safely using:
// in Thread 1
list.add("Safe City");
array[2] = "Safe city";
publisher = 1;

// in Thread 2
if (publisher == 1) {
    String city = list.get(3);
    city = array[2];
}
Am I right?
Looking strictly at what the code is doing, and nothing more, and assessing it only in terms of the memory model, you are correct. The write to the volatile variable publisher in thread 1 and the read from the volatile variable in thread 2 establish a happens-before relationship, so all previous writes from thread 1 will be visible to subsequent reads from thread 2.
As CupawnTae noted, it's not necessary for the list and the array to be volatile in order for this to hold. Only publisher needs to be volatile.
Looking at this from a broader perspective, it's very difficult to extend this code to do anything else. (Set aside the fact that the List returned by Arrays.asList cannot have elements added to it; assume it's an ArrayList instead.) Presumably thread 1, or some other thread, will want to continue to add elements to the list. If this happens to cause the ArrayList to reallocate its underlying array, this might occur while thread 2 is still reading results from the previous addition. Thus, inconsistent state might be visible to thread 2.
Suppose further that thread 1 wants to do subsequent updates. It will have to set publisher to some other value, say 2. Now how do reading threads know what the correct value is to test for? Well, they can read the expected value from some other volatile variable....
It's undoubtedly possible to construct a scheme where thread 1 can write to a list (or array) at will, and thread 2 will never see anything but consistent snapshots, but you have to be exceptionally careful about memory visibility at every step of the way. At a certain point it's easier just to use locks.
That is correct, but...
The volatile keyword on the list and array is irrelevant here - the fact that you write a value to the volatile publisher after you write the other values, and read back that value in your if condition before reading the other values in the second thread, guarantees memory consistency between those threads.
If you remove the volatile keyword from the list and array, your code will still be safe.
If you remove the publisher variable write/read, then the add operation* and array assignment are no longer safe.
And yes, the initial assignment to the variables is also safe.
* which is actually invalid on that particular list anyway as pointed out by Stuart Marks, but let's assume it's e.g. an ArrayList
The "publishing" happens between the thread that sets the volatile value and teh thread that gets it.
You need to both
publisher = 1;
in one thread and
int local = publisher;
in the other.
Have you considered using synchronized blocks to provide locking of the data structures that you're trying to read/write to/from?
// in Thread 1
synchronized (someLockingMonitor) {
    list.add("Safe City");
    array[2] = "Safe city";
}

// in Thread 2
synchronized (someLockingMonitor) {
    String city = list.get(3);
    city = array[2];
}
This will however force any thread wishing to enter one of these blocks to wait until any other thread currently executing inside either block has left it.
If concurrency is important to you, i.e. you really want different threads reading and writing at the same time, have a look at the concurrent collections in java.util.concurrent.
The following is one of the main operations in my Java code:

AtomicDoubleArray array1 = new AtomicDoubleArray(25);

for (int i = 0; i < array1.length(); i++) {
    double a = array1.get(i) * 0.001;
    double b = a + array1.get(i);
    array1.set(b);
}
Is the above code thread safe? If not, how can I make it thread safe? I would like not to hold a lock while reading elements, but to lock during setting the value of each of the components. That is, a number of threads can set different components of array1.
Is the above code thread safe?
That depends on what you mean by thread safety. Each individual get() and set() operation should be thread safe, but multiple threads could be calling this method concurrently, so the individual array entries could be reassigned by a second thread before the first thread completes the iteration. There's nothing you can do about that except synchronizing on a common object (which could be either the array or some other dedicated lock object).
I would like not to hold a lock while reading elements, but to lock during setting the value of each of the components. That is, a number of threads can set different components of array1.
If I understand this right, you can use your code as-is without additional locking (see above), except for this part:
array1.set(b);
which needs to read:
array1.set(i, b);
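Putting the corrected call together with a common lock for the whole pass, a minimal sketch (assuming Guava's AtomicDoubleArray; the lock object name is illustrative):

import com.google.common.util.concurrent.AtomicDoubleArray;

AtomicDoubleArray array1 = new AtomicDoubleArray(25);
final Object passLock = new Object(); // shared lock object

// holding the lock makes the whole pass atomic with respect to any
// other thread that synchronizes on the same lock
synchronized (passLock) {
    for (int i = 0; i < array1.length(); i++) {
        double a = array1.get(i) * 0.001;
        double b = a + array1.get(i);
        array1.set(i, b); // set takes (index, value)
    }
}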
You may get different values on the two consecutive calls to array1.get(i). If you want to avoid synchronisation, have a look at copy-on-write data structures, e.g. CopyOnWriteArrayList (http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/CopyOnWriteArrayList.html).
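A minimal sketch of the copy-on-write approach (contents illustrative):

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

List<Double> values = new CopyOnWriteArrayList<>();

// writer thread: each mutation copies the backing array and then
// publishes the new copy with a volatile write
values.add(0.001);

// reader threads: iterate over an immutable snapshot, never blocking
// and never observing a half-applied update
for (double v : values) {
    System.out.println(v);
}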