Is boolean array itself thread safe in Java? [duplicate] - java

Are there any concurrency problems with one thread reading from one index of an array, while another thread writes to another index of the array, as long as the indices are different?
e.g. (this example not necessarily recommended for real use, only to illustrate my point)
import java.util.concurrent.atomic.AtomicInteger;

class Test1
{
    static final private int N = 4096;
    final private int[] x = new int[N];
    final private AtomicInteger nwritten = new AtomicInteger(0);
    // invariant:
    // all values x[i] where 0 <= i < nwritten.get() are immutable

    // read() is not synchronized since we want it to be fast
    int read(int index) {
        if (index >= nwritten.get())
            throw new IllegalArgumentException();
        return x[index];
    }

    // write() is synchronized to handle multiple writers
    // (using compare-and-set techniques to avoid blocking algorithms
    // is nontrivial)
    synchronized void write(int x_i) {
        int index = nwritten.get();
        if (index >= N)
            throw new IllegalStateException("array is full");
        x[index] = x_i;
        // from this point forward, x[index] is fixed in stone
        nwritten.set(index+1);
    }
}
Edit: critiquing this example is not my question. I literally just want to know whether accessing one index of an array, concurrently with access to another index, poses concurrency problems; I couldn't think of a simpler example.

While you will not get an invalid state by changing arrays as you mention, you will have the same problem that occurs when two threads view a non-volatile integer without synchronization (see the section in the Java Tutorial on Memory Consistency Errors). Basically, the problem is that Thread 1 may write a value to element i, but there is no guarantee when (or if) Thread 2 will see the change.
The class java.util.concurrent.atomic.AtomicIntegerArray does what you want to do.
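As a rough illustration of the visibility guarantee this class adds, here is a minimal sketch. The class name AtomicArrayDemo and the two-element layout (index 0 holds the data, index 1 acts as a "ready" flag) are assumptions made for the example, not part of the question. Because get() and set() have volatile semantics, a reader that observes the flag is also guaranteed to observe the data written before it.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical demo class; names and layout are illustrative only.
class AtomicArrayDemo {

    // Publishes `value` from a writer thread and reads it on the calling thread.
    static int publishAndRead(int value) {
        AtomicIntegerArray cells = new AtomicIntegerArray(2);

        Thread writer = new Thread(() -> {
            cells.set(0, value); // the data element
            cells.set(1, 1);     // the "ready" flag, written after the data
        });
        writer.start();

        // Volatile reads: this loop is guaranteed to observe the flag eventually.
        while (cells.get(1) == 0) {
            Thread.yield();
        }
        // The flag write happens-before this read, so the data is visible too.
        return cells.get(0);
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead(42));
    }
}
```

With a plain int[] in place of the AtomicIntegerArray, the spin loop would have no such guarantee and could, in principle, never terminate.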

The example has a lot of stuff that differs from the prose question.
The answer to that question is that distinct elements of an array are accessed independently, so you don't need synchronization if two threads change different elements.
However, the Java memory model makes no guarantees (that I'm aware of) that a value written by one thread will be visible to another thread, unless you synchronize access.
Depending on what you're really trying to accomplish, it's likely that java.util.concurrent already has a class that will do it for you. And if it doesn't, I still recommend taking a look at the source code for ConcurrentHashMap, since your code appears to be doing the same thing that it does to manage the hash table.

I am not really sure that synchronizing only the write method, while leaving the read method unsynchronized, would work. I don't know all of the consequences, but at the very least it might lead to the read method returning a value that has just been overwritten by write.

Yes, as bad cache interleaving can still happen in a multi-CPU/multi-core environment. There are several options to avoid it:
Use the Sun-private Unsafe library to atomically set an element in an array (or the feature added by jsr166y in Java 7).
Use an AtomicXYZ[] array.
Use a custom object with one volatile field, and keep an array of those objects.
Use the ParallelArray from the jsr166y addendum in your algorithm instead.
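For the first option, later Java versions (9 and up) offer java.lang.invoke.VarHandle as a supported alternative to Unsafe. This is a swapped-in technique rather than what the answer used, and the helper class name VolatileArrayAccess is hypothetical. A minimal sketch of volatile and compare-and-set access to the elements of a plain int[]:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical helper; assumes Java 9 or later.
class VolatileArrayAccess {
    // A VarHandle that addresses individual elements of any int[].
    private static final VarHandle INT_ELEMENT =
            MethodHandles.arrayElementVarHandle(int[].class);

    // Writes array[index] with volatile semantics.
    static void setVolatile(int[] array, int index, int value) {
        INT_ELEMENT.setVolatile(array, index, value);
    }

    // Reads array[index] with volatile semantics.
    static int getVolatile(int[] array, int index) {
        return (int) INT_ELEMENT.getVolatile(array, index);
    }

    // Atomically replaces array[index] with `updated` if it currently equals `expected`.
    static boolean compareAndSet(int[] array, int index, int expected, int updated) {
        return INT_ELEMENT.compareAndSet(array, index, expected, updated);
    }
}
```

The array itself stays a primitive int[], which may matter when an Atomic*Array is not an option.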

Since read() is not synchronized, you could have the following scenario:
Thread A enters the write() method.
Thread A writes to nwritten = 0.
Thread B reads from nwritten = 0.
Thread A increments nwritten, so nwritten = 1.
Thread A exits write().
Since you want to guarantee that your variable addresses never conflict, what about something like (discounting array index issues):
int i;
synchronized int curr(){ return i; }
synchronized int next(){ return ++i;}
int read( ) {
return values[curr()];
}
void write(int x){
values[next()]=x;
}

Related

Is it necessary to make a primitive instance variable volatile?

Just to experiment with multithreading concepts, I'm implementing my own version of AtomicInteger that uses pessimistic locking. It looks something like this:
public class ThreadSafeInt {
public int i; // Should this be volatile?
public ThreadSafeInt(int i) {
this.i = i;
}
public synchronized int get() {
return i;
}
public synchronized int getAndIncrement() {
return this.i++;
}
// other synchronized methods for incrementAndGet(), etc...
}
I wrote a test that takes an instance of ThreadSafeInt, gives it to hundreds of threads, and makes each of those threads call getAndIncrement 100,000 times. What I'm seeing is that all the increments happen correctly, with the value of the integer being exactly (number of threads) * (number of increments per thread), even though I'm not using volatile on the primitive instance variable i. I expected that if I did not make i volatile, then I would get lots of visibility problems where, for instance, thread 1 increments i from 0 to 1, but thread 2 still sees the value of 0 and also increments it to only 1, causing a final value that is less than the correct value.
I understand that visibility problems occur randomly and can depend on properties of my environment, so that my test can appear to work fine even though there is inherent potential for visibility problems. So I'm inclined to think the volatile keyword is still necessary.
But is this correct? Or is there some property of my code (maybe the fact that it's just a primitive variable, etc) which I can actually trust to obviate the need for the volatile keyword?
"... even though I'm not using volatile on the primitive instance variable i. I expected that if I did not make i volatile, then I would get lots of visibility problems"
By making your getAndIncrement() and get() methods synchronized, all of the threads that are modifying i are properly locking it for both the updates and the retrieval of the value. The synchronized blocks make it unnecessary for i to be volatile because they also ensure memory synchronization.
That said, you should be using an AtomicInteger instead, which wraps a volatile int field. AtomicInteger's getAndIncrement() method updates the value without resorting to a synchronized block, which is much faster while still being thread-safe.
public final AtomicInteger i = new AtomicInteger();
...
// no need for synchronized here
public int get() {
return i.get();
}
// nor here
public int getAndIncrement() {
return i.getAndIncrement();
}
"I would get lots of visibility problems where, for instance, thread 1 increments i from 0 to 1, but thread 2 still sees the value of 0 and also increments it to only 1, causing a final value that is less than the correct value."
If your get() method were not synchronized, your increments might still be handled correctly, but other threads would not be guaranteed to see the published value of i. With both methods synchronized, memory synchronization is ensured on reads and writes. synchronized also provides the locking that makes the i++ safe. Again, AtomicInteger handles the memory synchronization and the increment race conditions much more efficiently.
More specifically, when a synchronized block is entered, it crosses a read memory barrier, which is the same as reading from a volatile field. When a synchronized block is exited, it crosses a write memory barrier, which is the same as writing to a volatile field. The difference with synchronized blocks is that there is also locking, which ensures that only one thread holds the lock on a particular object at a time.
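The experiment described in the question can be reproduced with a small harness. This is only a sketch: the class name SynchronizedCounterDemo and the run helper are made up for illustration, and the thread counts are arbitrary. The field is a plain int, and correctness comes entirely from every access going through a synchronized method on the same object.

```java
// Hypothetical harness for the experiment described in the question.
class SynchronizedCounterDemo {
    private int i; // plain field: every access goes through a synchronized method

    synchronized int get() {
        return i;
    }

    synchronized int getAndIncrement() {
        return i++;
    }

    // Starts `threads` threads that each increment `perThread` times,
    // waits for all of them, and returns the final counter value.
    static int run(int threads, int perThread) {
        SynchronizedCounterDemo counter = new SynchronizedCounterDemo();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    counter.getAndIncrement();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(8, 10_000));
    }
}
```

Removing synchronized from getAndIncrement() would reintroduce lost updates, and adding volatile alone would not fix that, because i++ is a read-modify-write sequence.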

How to read unique elements from array per thread?

I have an object based on array, which implements the following interface:
public interface PairSupplier<Q, E> {
public int size();
public Pair<Q, E> get(int index);
}
I would like to create a specific iterator over it:
public boolean hasNext(){
return true;
}
public Pair<Q, E> next(){
//some magic
}
In method next I would like to return some element from PairSupplier.
This element should be unique for thread, other threads should not have this element.
Since PairSupplier has a final size, this situation is not always possible, but I would like to approach it.
The order of elements doesn't matter, thread can take same element at a different time.
Example: 2 threads, 5 elements - {1,2,3,4,5}
Thread 1 | Thread 2
   1     |    2
   3     |    4
   5     |    1
   3     |    2
   4     |    5
My solution:
I create AtomicInteger index, which I increment on every next call.
PairSupplier<Q, E> pairs;
AtomicInteger index;
public boolean hasNext(){
    return true;
}
public Pair<Q, E> next(){
    int position = index.incrementAndGet() % pairs.size();
    if (position < 0) {
        // the counter has overflowed to a negative value; map it back into range
        position *= -1;
        position = pairs.size() - position;
    }
    return pairs.get(position);
}
pairs and index are shared among all threads.
I found that this solution does not scale (because all threads compete to increment the same counter). Maybe someone has a better idea?
This iterator will be used by 50-1000 threads.
The details of your question are ambiguous: your example suggests that two threads can be handed the same Pair, but your description says otherwise.
Taking the more difficult of the two to achieve, I will offer an Iterable<Pair<Q,E>> that hands out each Pair to one thread only until the supplier cycles, and then repeats.
public interface Supplier<T> {
public int size();
public T get(int index);
}
public interface PairSupplier<Q, E> extends Supplier<Pair<Q, E>> {
}
public class IterableSupplier<T> implements Iterable<T> {
// The common supplier to use across all threads.
final Supplier<T> supplier;
// The atomic counter.
final AtomicInteger i = new AtomicInteger();
public IterableSupplier(Supplier<T> supplier) {
this.supplier = supplier;
}
@Override
public Iterator<T> iterator() {
/**
* You may create a NEW iterator for each thread; they all share the supplier and
* the counter, and will therefore distribute each Pair between different threads.
*
* You may also share the same iterator across multiple threads.
*
* No two threads will get the same pair twice unless the sequence cycles.
*/
return new ThreadSafeIterator();
}
private class ThreadSafeIterator implements Iterator<T> {
@Override
public boolean hasNext() {
/**
* Always true.
*/
return true;
}
private int pickNext() {
// Just grab one atomically.
int pick = i.incrementAndGet();
// Reset to zero if it has exceeded - but no spin, let "just someone" manage it.
int actual = pick % supplier.size();
if (pick != actual) {
// So long as someone has a success before we overflow int we're good.
i.compareAndSet(pick, actual);
}
return actual;
}
@Override
public T next() {
return supplier.get(pickNext());
}
@Override
public void remove() {
throw new UnsupportedOperationException("Remove not supported.");
}
}
}
NB: I have adjusted the code a little to accommodate both scenarios. You can take an Iterator per thread or share a single Iterator across threads.
You have a piece of information ("has anyone taken this Pair already?") that must be shared between all threads. So for the general case, you're stuck. However, if you have an idea about the size of your array and the number of threads, you could use buckets to make it less painful.
Let's suppose we know that there will be 1,000,000 array elements and 1,000 threads. Assign each thread a range (thread #1 gets elements 0-999, etc). Now instead of 1,000 threads contending for one AtomicInteger, you can have no contention at all!
That works if you can be sure that all your threads will run at about the same pace. If you need to handle the case where sometimes thread #1 is busy doing other things while thread #2 is idle, you can modify your bucket pattern slightly: each bucket has an AtomicInteger. Now threads will generally only contend with themselves, but if their bucket is empty, they can move on to the next bucket.
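The bucket idea can be sketched roughly as follows. The class BucketedIndexPicker, its constructor arguments, and the convention of returning -1 when everything is exhausted are all assumptions made for illustration; the sketch hands out each index once and does not cycle like the iterator in the question.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one counter per bucket instead of one global counter.
class BucketedIndexPicker {
    private final int bucketSize;
    private final AtomicInteger[] cursors; // cursors[b] = next offset inside bucket b

    BucketedIndexPicker(int totalElements, int buckets) {
        if (buckets <= 0 || totalElements % buckets != 0) {
            throw new IllegalArgumentException("elements must divide evenly into buckets");
        }
        this.bucketSize = totalElements / buckets;
        this.cursors = new AtomicInteger[buckets];
        for (int b = 0; b < buckets; b++) {
            cursors[b] = new AtomicInteger();
        }
    }

    // Returns the next unused index in `bucket`, or -1 if that bucket is exhausted.
    private int tryBucket(int bucket) {
        AtomicInteger cursor = cursors[bucket];
        if (cursor.get() >= bucketSize) {
            return -1; // cheap check that also keeps the counter from growing without bound
        }
        int offset = cursor.getAndIncrement();
        return offset < bucketSize ? bucket * bucketSize + offset : -1;
    }

    // Starts at the caller's home bucket and moves on to later buckets when it is empty.
    int next(int homeBucket) {
        for (int step = 0; step < cursors.length; step++) {
            int index = tryBucket((homeBucket + step) % cursors.length);
            if (index >= 0) {
                return index;
            }
        }
        return -1; // every bucket is exhausted
    }
}
```

A thread would typically derive its home bucket from some stable per-thread number, so that threads contend only when one of them has drained its own bucket.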
I'm having some trouble understanding what problem you are trying to solve.
Does each thread process the whole collection?
Is the concern that no two threads can work on the same Pair at the same time? But each thread needs to process each Pair in the collection?
Or do you want the collection processed once by using all of the threads?
There is one key thing which is obscure in your example: what exactly is the meaning of this?
The order of elements doesn't matter, thread can take same element at a different time.
"different time" means what? Within N milliseconds of each other? Does it mean that absolutely two threads will never be touching the same Pair at the same time? I will assume that.
If you want to decrease the probability that threads will block on each other contending for the same Pair, and there is a backing array of Pairs, try this:
Partition your array into numPairs / threadCount sub-arrays (you don't have to actually create sub-arrays, just start at different offsets - but it's easier to think about as sub-array)
Assign each thread to a different sub-array; when a thread exhausts its sub-array, increment the index of its sub array
Say we have 6 Pairs and 2 threads - your assignments look like Thread-1:[0,1,2] Thread-2:[3,4,5]. When Thread-1 starts it will be looking at a different set of Pairs than thread 2, so it is unlikely that they will contend for the same pair
If it is important that two threads really not touch a Pair at the same time, then wrap all of the code which touches a Pair object in synchronized(pair) (synchronize on the instance, not the type!) - there may occasionally be blocking, but you're never blocking all threads on a single thing, as with the AtomicInteger - threads can only block each other because they are really trying to touch the same object
Note this is not guaranteed never to block - for that, all threads would have to run at exactly the same speed, and processing every Pair object would have to take exactly the same amount of time, and the OS's thread scheduler would have to never steal time from one thread but not another. You cannot assume any of those things. What this gives you is a higher probability that you will get better concurrency, by dividing the areas to work in and making the smallest unit of state that is shared be the lock.
But this is the usual pattern for getting more concurrency on a data structure - partition the data between threads so that they rarely are touching the same lock at the same time.
The easiest approach that I can see is to create a HashSet or Map and give every thread a unique hash. After that, just do a simple get by this hash code.
This is standard java semaphore usage problem. The following javadoc gives almost similar example as your problem. http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Semaphore.html
If you need more help, let me know?
I prefer a lock and release process.
When a thread asks for a Pair object, that Pair is removed from the supplier. Before the thread asks for a new Pair, the 'old' Pair is added to the supplier again.
You can take from the front and put back at the end.
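One way to read this lock-and-release suggestion is as a checkout pool backed by a concurrent queue. The class CheckoutPool and its method names are hypothetical, and the sketch returns null instead of blocking when every item is checked out.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of "take from the front, put back at the end".
class CheckoutPool<T> {
    private final ConcurrentLinkedQueue<T> available;

    CheckoutPool(List<T> items) {
        this.available = new ConcurrentLinkedQueue<>(items);
    }

    // Takes an item from the front, or returns null if every item is checked out.
    T acquire() {
        return available.poll();
    }

    // Returns a previously acquired item to the end of the queue.
    void release(T item) {
        available.offer(item);
    }
}
```

While an item is checked out, no other thread can obtain it, so no two threads hold the same Pair at the same time. The queue is still a single shared structure, so this does not remove contention; it only guarantees exclusivity.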

Flush cache after multithreaded mapping writes to primitive arrays

This question relates to the latest version of Java.
I have a primitive two-dimensional array sized as below.
int[][] array = new int[numPasses][n*10]; //n threads write; during the i-th pass, the k-th thread writes to array[i] at locations k*10 to (k+1)*10-1.
//the array above is allocated at the beginning, and constantly rewritten.
During pass i, each of n producer threads writes to its own memory location in array[i], so there are no race conditions during the write process. After writing, m consumer threads read the results of this write. I do not need the consumers to access array[i] at any point in time before all the writes are done.
My first question: Would a structure like the following flush all the producer writes from cache? If not, how would one go about doing this for primitive arrays? (For technical reasons, I cannot use Atomic*Arrays.)
void flush() {//invoked after writes from all producer threads are done.
    if(producerThreadID == 0) {
        synchronized(array[i]) {//done at pass i.
        }
    }
}
My second question: Is there a better way to do this?
EDIT: Okay, I accept that what I want to do is essentially impossible with the empty synchronized block. Let's say that, instead of the structure above, each producer thread has access to its own pass, i.e.:
int[][] array = new int[numPasses][n*10]; //n = numPasses threads write; during the i-th pass, the i-th thread writes to all elements in array[i].
(This is Zim-Zam's suggestion.)
My (hopefully final) question: Then, would the following structure in the i-th thread ensure visibility for consumer threads after the synchronized block?
//i-th producer thread acquires lock on array[i]
void produce() {
    synchronized(array[i]) {
        //modify array[i][*] here
    }
}
Your algorithm is probably going to create false sharing, which occurs when two threads write to nearby memory locations. If thread1 and thread2 are writing to data that shares a cache line, then the cache protocol will force thread2 either to block until thread1 completes or to re-execute afterwards (or vice versa). You can avoid this by using coarser-grained parallelism, e.g. use one thread per pass (one thread per array) rather than one thread per array element. This way each thread operates on its own array and there probably isn't going to be any false sharing.
I would carefully study your reasons for not using Atomics, because they are exactly what you need.
If there is truly a problem then have you considered using sun.misc.Unsafe like the Atomics use?
Alternatively - use an array of objects holding a volatile field.
class Vint {
public volatile int i;
}
Vint[] arr = new Vint[10];
{
for (int i = 0; i < arr.length; i++) {
arr[i] = new Vint();
}
}
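For the pass structure described in the question, another possibility is to let a java.util.concurrent synchronizer provide the visibility guarantee, leaving the int[] elements themselves plain. The sketch below assumes a CountDownLatch can be shared between producers and consumers, which the question does not state; the class PassHandoffDemo and its method are hypothetical. Actions before countDown() happen-before a successful return from await(), so the consumer is guaranteed to see every producer write.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: producers fill disjoint slices, a latch publishes the row.
class PassHandoffDemo {

    // Fills one row using `producers` threads (10 elements each), then sums
    // the row on the calling thread, which plays the consumer role.
    static int fillAndSum(int producers) {
        int[] row = new int[producers * 10];
        CountDownLatch done = new CountDownLatch(producers);

        for (int k = 0; k < producers; k++) {
            final int id = k;
            new Thread(() -> {
                for (int j = id * 10; j < (id + 1) * 10; j++) {
                    row[j] = 1; // each producer writes only to its own slice
                }
                done.countDown(); // publishes the writes made above
            }).start();
        }

        try {
            done.await(); // returns only after every producer has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        int sum = 0;
        for (int value : row) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(fillAndSum(4));
    }
}
```

A latch is single-use, so a design that constantly rewrites the same rows would need a fresh latch per pass or a reusable synchronizer such as CyclicBarrier.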

Lazy initialization without synchronization or volatile keyword

The other day Howard Lewis Ship posted a blog entry called "Things I Learned at Hacker Bed and Breakfast", one of the bullet points is:
A Java instance field that is assigned exactly once via lazy
initialization does not have to be synchronized or volatile (as long
as you can accept race conditions across threads to assign to the
field); this is from Rich Hickey
On the face of it this seems at odds with the accepted wisdom about visibility of changes to memory across threads, and if this is covered in the Java Concurrency in Practice book or in the Java language spec then I have missed it. But this was something HLS got from Rich Hickey at an event where Brian Goetz was present, so it would seem there must be something to it. Could someone please explain the logic behind this statement?
This statement sounds a little bit cryptic. However, I guess HLS refers to the case when you lazily initialize an instance field and don't care if several threads performs this initialization more than once.
As an example, I can point to the hashCode() method of String class:
private int hashCode;
public int hashCode() {
int hash = hashCode;
if (hash == 0) {
if (count == 0) {
return 0;
}
final int end = count + offset;
final char[] chars = value;
for (int i = offset; i < end; ++i) {
hash = 31*hash + chars[i];
}
hashCode = hash;
}
return hash;
}
As you can see access to the hashCode field (which holds cached value of the computed String hash) is not synchronized and the field isn't declared as volatile. Any thread which calls hashCode() method will still receive the same value, though hashCode field may be written more than once by different threads.
This technique has limited usability. IMHO it's usable mostly for the cases like in the example: a cached primitive/immutable object which is computed from the others final/immutable fields, but its computation in the constructor is an overkill.
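The same pattern applied to a user-defined class might look like the sketch below. The class CachedHashPoint is made up for illustration. The important details are that the cached field is read into a local variable exactly once, that the value is computed only from final fields, and that recomputing it on several threads is harmless because every thread arrives at the same result.

```java
// Hypothetical immutable class that caches its hash code lazily.
final class CachedHashPoint {
    private final int x;
    private final int y;
    private int hash; // deliberately neither volatile nor synchronized; 0 means "not computed yet"

    CachedHashPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int hashCode() {
        int h = hash; // read the shared field exactly once
        if (h == 0) {
            h = 31 * x + y; // derived only from final fields, so every thread computes the same value
            hash = h;       // racy write: other threads may or may not see it, both outcomes are fine
        }
        return h;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof CachedHashPoint)) {
            return false;
        }
        CachedHashPoint p = (CachedHashPoint) other;
        return x == p.x && y == p.y;
    }
}
```

Reading the field twice (once for the check, once for the return) would break the idiom, since the second read is allowed to observe the stale default value.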
Hrm. As I read this it is technically incorrect but okay in practice with some caveats. Only final fields can safely be initialized once and accessed in multiple threads without synchronization.
Lazily initialized fields can suffer from synchronization issues in a number of ways. For example, you can have constructor race conditions where the reference to the object has been exported without the object itself being fully initialized.
I think it highly depends on whether or not you have a primitive field or an object. Primitive fields that can be initialized multiple times where you don't mind that multiple threads do the initialization would work fine. However HashMap style initialization in this manner may be problematic. Even long values on some architectures may store the different words in multiple operations so may export half of the value although I suspect that a long would never cross a memory page so therefore it would never happen.
I think it depends highly on whether or not an application has any memory barriers -- any synchronized blocks or access to volatile fields. The devil is certainly in the details here and the code that does the lazy initialization may work fine on one architecture with one set of code and not in a different thread model or with an application that synchronizes rarely.
Here's a good piece on final fields as a comparison:
http://www.javamex.com/tutorials/synchronization_final.shtml
As of Java 5, one particular use of the final keyword is a very important and often overlooked weapon in your concurrency armoury. Essentially, final can be used to make sure that when you construct an object, another thread accessing that object doesn't see that object in a partially-constructed state, as could otherwise happen. This is because when used as an attribute on the variables of an object, final has the following important characteristic as part of its definition:
Now, even if the field is marked final, if it is a class, you can modify the fields within the class. This is a different issue and you must still have synchronization for this.
This works fine under some conditions.
It's okay to try to set the field more than once.
It's okay if individual threads see different values.
Often when you create an object which is not changed e.g. loading a Properties from disk, having more than one copy for a short amount of time is not an issue.
private static Properties prop = null;
public static Properties getProperties() {
if (prop == null) {
prop = new Properties();
try {
prop.load(new FileReader("my.properties"));
} catch (IOException e) {
throw new AssertionError(e);
}
}
return prop;
}
In the short term this is less efficient than using locking, but in the long term it could be more efficient. (Although Properties has a lock of its own, you get the idea ;)
IMHO, it's not a solution which works in all cases.
Perhaps the point is that you can use more relaxed memory consistency techniques in some cases.
I think the statement is untrue. Another thread can see a partially initialized object, so the reference can be visible to another thread even though the constructor hasn't finished running. This is covered in Java Concurrency in Practice, section 3.5.1:
public class Holder {
private int n;
public Holder (int n ) { this.n = n; }
public void assertSanity() {
if (n != n)
throw new AssertionError("This statement is false.");
}
}
This class isn't thread-safe.
If the visible object is immutable, then you are OK, because the semantics of final fields mean that you won't see them until the constructor has finished running (section 3.5.2).
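A minimal sketch of that immutable case, with hypothetical names (Settings, SettingsCache): the reference is published through a data race, but because every field of the published object is final, a thread that sees the reference also sees the fully constructed object. Different threads may briefly end up with different instances, which is the race the original statement asks you to accept.

```java
// Hypothetical immutable value class: every field is final.
final class Settings {
    private final String name;

    Settings(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

// Hypothetical lazy holder without synchronized or volatile.
class SettingsCache {
    private static Settings cached; // deliberately not volatile

    static Settings get() {
        Settings s = cached; // read the shared field exactly once
        if (s == null) {
            s = new Settings("default");
            cached = s; // racy publication; acceptable only because Settings is immutable
        }
        return s;
    }
}
```

If Settings had a non-final field, as Holder does above, this publication would no longer be safe.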

Java: How exactly do synchronized operations relate to volatility?

Sorry this is such a long question.
I've been doing lots of research lately into multi-threading as I slowly implement it in a personal project. However, probably due to an abundance of slightly incorrect examples, the use of synchronized blocks and volatility in certain situations is still a bit unclear to me.
My core question is this: Are changes to references and primitives automatically volatile (that is, performed on the main memory and not a cache) when a thread is inside a synchronized block, or does the read also have to be synchronized for it to work properly?
If so: What is the purpose of synchronizing a simple getter method? (see example 1) Also, are ALL changes sent to main memory as long as the thread has synchronized on anything? E.g. if it is sent off to do loads of work all over the place inside a very high-level sync, will every single change then be made to main memory, and nothing ever to cache, until it's unlocked again?
If not: Does the change have to be explicitly inside a synchronized block, or can Java actually pick up on, for example, uses of the Lock object? (see example 3)
In either case: Does the synchronized object need to be related to the reference/primitive being changed in any way (e.g. the immediate object that contains it)? Can I write by syncing on one object and read with another if it's otherwise safe? (see example 2)
(please note for the following examples that I know that synchronized methods and synchronized(this) are frowned upon and why, but discussion about that is beyond the scope of my question)
Example 1:
class Counter{
int count = 0;
public synchronized void increment(){
count++;
}
public int getCount(){
return count;
}
}
In this example, increment() needs to be synchronized since ++ is not an atomic operation. As such, two threads incrementing at the same time may result in an overall increase of 1 to the count. The count primitive needs to be atomic (e.g. not long/double/reference), and it is, so that's fine.
Does getCount() need to be synchronized here, and why exactly? The explanation I have heard the most is that I will have no guarantee whether the count returned will be the pre- or post-increment. However, this seems like the explanation for something slightly different that has found itself in the wrong place. I mean, if I were to synchronize getCount(), then I still see no guarantee: it's now down to not knowing the locking order, instead of not knowing whether the actual read happens to be before or after the actual write.
Example 2:
Is the following example thread-safe, if you assume that, through trickery not shown here, none of these methods will ever be called at the same time? Will count increment in the expected way if a random increment method is used each time, and then be read properly, or does the lock have to be the same object? (By the way, I fully realise how ridiculous this example is, but I'm more interested in theory than practice.)
class Counter{
private final Object lock1 = new Object();
private final Object lock2 = new Object();
private final Object lock3 = new Object();
int count = 0;
public void increment1(){
synchronized(lock1){
count++;
}
}
public void increment2(){
synchronized(lock2){
count++;
}
}
public int getCount(){
synchronized(lock3){
return count;
}
}
}
Example 3:
Is the happens-before relationship simply a Java concept, or is it an actual thing built into the JVM? Even though I can guarantee a conceptual happens-before relationship for this next example, is Java smart enough to pick it up if it's a built-in thing? I am assuming it is not, but is this example actually thread-safe? If it's thread-safe, what about if getCount() did no locking?
class Counter{
private final Lock lock = new ReentrantLock(); // java.util.concurrent.locks; Lock itself is an interface
int count = 0;
public void increment(){
lock.lock();
count++;
lock.unlock();
}
public int getCount(){
lock.lock();
int count = this.count;
lock.unlock();
return count;
}
}
Yes, the read has to be synchronized as well. This page says:
The results of a write by one thread are guaranteed to be visible to a
read by another thread only if the write operation happens-before the
read operation.
[...]
An unlock (synchronized block or method exit) of a monitor
happens-before every subsequent lock (synchronized block or method
entry) of that same monitor
The same page says:
Actions prior to "releasing" synchronizer methods such as Lock.unlock,
Semaphore.release, and CountDownLatch.countDown happen-before actions
subsequent to a successful "acquiring" method such as Lock.lock
So locks offer the same visibility guarantees as synchronized blocks.
Whether you use synchronized blocks or locks, the visibility is only guaranteed if the reader thread uses the same monitor or lock as the writer thread.
Your Example 1 is incorrect: the getter must be synchronized as well if you want to see the latest value of the count.
Your example 2 is incorrect because it uses different locks to guard the same count.
Your example 3 is OK. If the getter did not lock, you could see an older value of the count. The happens-before is something that is guaranteed by the JVM. The JVM has to respect the rules specified, by flushing caches to the main memory for example.
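A compilable variant of Example 3 might look like the sketch below. It assumes the question meant java.util.concurrent.locks.ReentrantLock (Lock itself is an interface and cannot be instantiated), and the class name LockedCounter is hypothetical. Both methods use the same lock, which is what gives the reader the happens-before edge described above, and the try/finally ensures the lock is released even if the guarded code throws.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical counter guarded by a single explicit lock.
class LockedCounter {
    private final Lock lock = new ReentrantLock();
    private int count; // plain field, always accessed while holding `lock`

    void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    int getCount() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }
}
```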
Try to view it in terms of two distinct, simple operations:
Locking (mutual exclusion),
Memory barrier (cache sync, instruction reordering barrier).
Entering a synchronized block entails both locking and a memory barrier; leaving the synchronized block entails unlocking plus a memory barrier; reading or writing a volatile field entails a memory barrier only. Thinking in these terms, I think you can clarify all of the questions above for yourself.
As for Example 1, the reading thread will not have any kind of memory barrier. It's not just between seeing the value before/after read, it's about never observing any change to the var after a thread is started.
Example 2. is the most interesting issue you raise. You are indeed given no guarantees by the JLS in this case. In practice you won't be given any ordering guarantees (it's as if the locking aspect wasn't there at all), but you'll still have the benefit of the memory barriers so you will observe changes, unlike the first example. Basically, this is exactly the same as removing synchronized and tagging the int as volatile (apart from the runtime costs of acquiring locks).
Regarding Example 3, by "just a Java thing" I feel you have generics with erasure in mind, something that only the static code checking is aware of. This is not like that -- both locks and memory barriers are pure runtime artifacts. In fact, the compiler can't reason about them at all.
