I have read some questions and answers on the visibility of Java array elements from multiple threads, but I still can't really wrap my head around some cases. To demonstrate what I'm having trouble with, I have come up with a simple scenario: assume I have a simple collection that adds each element to one of its n buckets by hashing it into one (a Bucket is something like a list), and each bucket is synchronized separately. For example:
private final Object[] locks = new Object[10];
private final Bucket[] buckets = new Bucket[10];
Here, bucket i is supposed to be guarded by locks[i]. Here is what the add code looks like:
public void add(Object element) {
    int bucketNum = calculateBucket(element); // hashes the element into a bucket
    synchronized (locks[bucketNum]) {
        buckets[bucketNum].add(element);
    }
}
Since 'buckets' is final, this would not have any visibility problem even without synchronization. My guess is, with synchronization, this wouldn't have any visibility problems without the final either, is this correct?
And finally, a bit trickier part. Assume I want to copy out and merge the contents of all buckets and empty the whole data structure, from an arbitrary thread, like this:
public List<Bucket> clear() {
    List<Bucket> allBuckets = new ArrayList<>();
    for (int bucketNum = 0; bucketNum < buckets.length; bucketNum++) {
        synchronized (locks[bucketNum]) {
            allBuckets.add(buckets[bucketNum]);
            buckets[bucketNum] = new Bucket();
        }
    }
    return allBuckets;
}
I basically swap the old bucket with a newly created one and return the old one. This case is different from the add() one because we are not modifying the object referred to by a reference in the array; we are directly changing the array slot (the reference) itself.
Note that I do not care if bucket 2 is modified while I'm holding the lock for bucket 1, I don't need the structure to be fully synchronized and consistent, just visibility and near consistency is enough.
So, assuming every bucket[i] is only ever modified under lock[i], would you say that this code works? I hope to learn the whys and why-nots and get a better grasp of visibility. Thanks.
First question.
Thread safety in this case depends on whether the reference to the object containing locks and buckets (let's call it Container) is properly shared.
Just imagine: one thread is busy instantiating a new Container object (allocating memory, instantiating arrays, etc.), while another thread starts using this half-instantiated object where locks and buckets are still null (they have not been instantiated by the first thread yet). In this case this code:
synchronized (locks[bucketNum]) {
breaks and throws a NullPointerException. The final keyword prevents this: it guarantees that a thread which sees the reference to the Container only after its constructor has finished will also see its final fields fully initialized:
An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields. (JLS 17.5)
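For illustration, here is a minimal sketch of such a Container class (the class name and the constructor body are assumptions; the question does not show them) in which the final fields are fully initialized before the constructor finishes:

// Hypothetical Container class; the constructor is assumed for illustration.
public class Container {
    private final Object[] locks = new Object[10];
    private final Bucket[] buckets = new Bucket[10];

    public Container() {
        // These element writes happen before the constructor finishes, so a thread that
        // sees the Container reference only after construction also sees the elements.
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new Object();
            buckets[i] = new Bucket();
        }
    }
}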
Second question.
Assuming that locks and buckets fields are final and you don't care about consistency of the whole array and "every bucket[i] is only ever modified under lock[i]", this code is fine.
Just to add to Pavel's answer:
In your first question you ask
Since 'buckets' is final, this would not have any visibility problem even without synchronization. My guess is, with synchronization, this wouldn't have any visibility problems without the final either, is this correct?
I'm not sure what you mean by 'visibility problems', but without the synchronized this code would certainly be incorrect if multiple threads accessed buckets[i] with at least one of them modifying it (e.g. writing to it). There would be no guarantee that what one thread has written becomes visible to another. This also applies to the internal structures of the bucket, which are modified by the call to add.
Remember that final on buckets pertains only to the single reference to the array itself, not to its cells.
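To make that concrete, a tiny sketch (the Holder class below is hypothetical, not from the question): final safely publishes the array created during construction, but later element writes still need synchronization to be reliably visible:

class Holder {
    private final Object[] slots = new Object[10]; // the array reference is safely published via final

    void unsafeWrite(Object value) {
        slots[0] = value; // plain element write: no visibility guarantee for other threads
    }

    synchronized void safeWrite(Object value) {
        slots[0] = value; // element write under a lock: visible to readers that synchronize on the same lock
    }
}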
Related
A bit of (simplified) context.
Let's say I have an ArrayList<ContentStub> where ContentStub is:
public class ContentStub {
    ContentType contentType;
    Object content;
}
And I have multiple implementations of classes that "inflate" stubs for each ContentType, e.g.
public class TypeAStubInflater {
    public void inflate(List<ContentStub> contentStubs) {
        contentStubs.forEach(stub -> {
            if (stub.contentType == ContentType.TYPE_A) {
                stub.content = someService.getContent();
            }
        });
    }
}
The idea being, there is a TypeAStubInflater which only modifies items of ContentType.TYPE_A, running in one thread, and a TypeBStubInflater which only modifies items of ContentType.TYPE_B, etc., but each instance's inflate() method modifies items in the same contentStubs List, in parallel.
However:
No thread ever changes the size of the ArrayList
No thread ever attempts to modify a value that's being modified by another thread
No thread ever attempts to read a value written by another thread
Given all this, it seems that no additional measures to ensure thread safety are necessary. From a (very) quick look at the ArrayList implementation, it seems that there is no risk of a ConcurrentModificationException; however, that doesn't mean that something else can't go wrong. Am I missing something, or is this safe to do?
In general, that will work because you are not modifying the state of the List itself (which would throw a ConcurrentModificationException if any iterator were active at the time of looping), but are only modifying an object inside the list, which is fine from the list's point of view.
I would recommend splitting your list up into a Map<ContentType, List<ContentStub>> and then starting threads with those specific lists.
You could convert your list to a map with this:
Map<ContentType, List<ContentStub>> typeToStubsMap = stubs.stream().collect(Collectors.groupingBy(stub -> stub.contentType));
If your List is not that big (fewer than 1000 entries, say) I would even recommend not using any threading at all, and just iterating with a plain indexed for loop, or forEach if the small extra overhead is no concern.
Let's assume thread A writes TYPE_A content and thread B writes TYPE_B content. The List contentStubs is only used to obtain instances of ContentStub: read access only. So from the perspective of A, B and contentStubs, there is no problem. However, the updates done by threads A and B will likely never be seen by other threads; e.g. a third thread C will likely conclude that stub.content == null for all elements in the list.
The reason for this is the Java Memory Model. If you don't use constructs like locks, synchronization, volatile and atomic variables, the memory model gives no guarantee if and when modifications of an object by one thread are visible for another thread. To make this a little more practical, let's have an example.
Imagine that a thread A executes the following code:
stub.content = someService.getContent(); // happens to be element[17]
List element 17 is a reference to a ContentStub object on the shared heap. The VM is allowed to make a thread-private copy of that object. All subsequent accesses through that reference in thread A use the copy. The VM is free to decide when, and whether, to write the copy back to the original object on the shared heap.
Now imagine a thread C that executes the following code:
ContentStub stub = contentStubs.get(17);
The VM will likely do the same trick with a private copy in thread C.
If thread C already accessed the object before thread A updated it, thread C will likely keep using its stale copy and ignore the shared original for a long time. But even if thread C accesses the object for the first time after thread A updated it, there is no guarantee that the changes in thread A's private copy have already reached the shared heap.
In short: without a lock or synchronization, thread C will almost certainly only read null values in each stub.content.
The reason for this memory model is performance. On modern hardware, there is a trade-off between performance and consistency across all CPUs/cores. If the memory model of a modern language requires consistency, that is very hard to guarantee on all hardware and it will likely impact performance too much. Modern languages therefore embrace low consistency and offer the developer explicit constructs to enforce it when needed. In combination with instruction reordering by both compilers and processors, that makes old-fashioned linear reasoning about your program code … interesting.
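One possible fix, sketched here under the assumption that marking the field volatile is acceptable for the use case (it is not the only option; synchronization, or handing the stubs over through a concurrent structure, would also work):

public class ContentStub {
    ContentType contentType;
    volatile Object content; // a volatile write by an inflater thread happens-before a
                             // subsequent volatile read of this field by thread C
}

Because the write of content happens-before a later read of the same field, thread C then also sees the fully constructed object returned by someService.getContent().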
Consider the code snippet
class A {
    private Map<String, Object> taskMap = new HashMap<>();
    private volatile Object[] tasksArray;

    // assume this happens on thread 1
    public void assignTasks() {
        synchronized (taskMap) {
            // put new tasks into the map
            // reassign the map's values to tasksArray as a new array (for direct random access)
        }
    }

    // assume this is invoked on thread 2
    public void action() {
        int someIndex = ...; // some index within tasksArray.length
        Object[] localTasksArray = tasksArray;
        Object oneTask = localTasksArray[someIndex];
        // Question: is the above operation safe with respect to memory visibility for oneTask?
        // Is it possible that oneTask may appear null or in some other state than expected?
    }
}
Question: is the operation Object oneTask = localTasksArray[someIndex]; safe with respect to memory visibility for oneTask?
Is it possible that oneTask may appear null, or in some state other than expected?
My thoughts are these:
It is possible that thread 2 may see oneTask as null or in some state other than expected. This is because, even though tasksArray is volatile and a read of this field ensures proper visibility of the array reference itself, it does not ensure the visibility of the state internal to the object oneTask.
The volatile keyword only protects the field tasksArray, which is a reference to an Object[]. Reads and writes of this field have consistent ordering, but that doesn't extend to the array it references, nor to the objects that the array references.
Most likely you want AtomicReferenceArray instead.
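A hedged sketch of what that could look like (the method bodies below are assumptions filled in from the question's pseudocode):

import java.util.concurrent.atomic.AtomicReferenceArray;

class A {
    private volatile AtomicReferenceArray<Object> tasksArray = new AtomicReferenceArray<>(0);

    // Thread 1
    public void assignTasks(Object[] newTasks) {
        AtomicReferenceArray<Object> fresh = new AtomicReferenceArray<>(newTasks.length);
        for (int i = 0; i < newTasks.length; i++) {
            fresh.set(i, newTasks[i]); // set() has volatile-write semantics per element
        }
        tasksArray = fresh; // publish the freshly populated array
    }

    // Thread 2
    public Object action(int someIndex) {
        return tasksArray.get(someIndex); // get() has volatile-read semantics per element
    }
}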
As far as I recall (I am researching now to confirm), Java only guarantees the volatile object itself is flushed to RAM, not sub parts of that object (as in array entries) or sub-fields of the object (in the case of objects).
However - I believe most (if not all) JVMs implement volatile access as a memory barrier (see here and the referenced page The JSR-133 Cookbook for Compiler Writers). As a memory barrier this therefore means that all previous writes by other threads will be flushed from cache to main memory when the access occurs - thus making all memory consistent at that time.
This should not be relied upon - if it is crucial that you have complete control of what is visible and what is not you should use one of the Atomic classes such as AtomicReferenceArray as suggested by others. These may even be more efficient than volatile for exactly this reason.
Declaring a field volatile ensures that any write of that field synchronizes-with (and is therefore visible to) any subsequent read (according to the synchronization order) of that field. Note that there is no such thing as a volatile object or a volatile array; there are only volatile fields.
Therefore, in your code, there is no happens-before relationship between Thread 1 storing an object into the array, and Thread 2 reading it from there.
I have a few ArrayList<T> instances containing user-defined objects (e.g. List<Student>, List<Teachers>). The objects are immutable in nature, i.e. no setters are provided, and also, due to the nature of the problem, "no one" will ever attempt to modify these objects. Once the ArrayList is populated, no further addition or removal of objects is allowed or possible, so the List will not change dynamically.
Under these conditions, can these data structures (i.e. ArrayList) be safely used by multiple threads simultaneously? Each thread will only read the object properties; no "set" operation is possible.
So, my question is: can I rely on ArrayList? If not, what other less expensive data structures can be used in such a scenario?
You can share any objects or data structures between threads if they are never modified after a safe publication. As mentioned in the comments, there must be a happens-before relationship between the writes that initialize the ArrayList and the read by which the other threads acquire the reference.
E.g. if you setup the ArrayList completely before starting the other threads or before submitting the tasks working on the list to an ExecutorService you are safe.
If the threads are already running you have to use one of the thread safe mechanisms to hand-over the ArrayList reference to the other threads, e.g. by putting it on a BlockingQueue.
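A minimal sketch of such a hand-over (the names below are assumptions): everything written to the list before put() is visible to the thread that later take()s it:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class Handover {
    static final BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>();

    // Producer thread: finish all writes, then publish the list.
    static void publish() throws InterruptedException {
        List<String> students = new ArrayList<>();
        students.add("Alice");
        students.add("Bob");
        queue.put(students); // put() happens-before a later take() of the same element
    }

    // Consumer thread: the fully populated list is visible after take().
    static void consume() throws InterruptedException {
        List<String> students = queue.take();
        System.out.println(students);
    }
}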
Even simplest forms like storing the reference into a static final or volatile field will work.
Keep in mind that your precondition of never modifying the object afterwards must always hold. It’s recommended to enforce that constraint by wrapping the list using Collections.unmodifiableList(…) and forget about the original list reference before publishing:
class Example {
    public static final List<String> THREAD_SAFE_LIST;
    static {
        ArrayList<String> list = new ArrayList<>();
        // do the setup
        THREAD_SAFE_LIST = Collections.unmodifiableList(list);
    }
}
or
class Example {
    public static final List<String> THREAD_SAFE_LIST =
            Collections.unmodifiableList(Arrays.asList("foo", "bar"));
}
Yes, you should be able to pass the list to each thread. There should be no access errors as long as the list is completely written before any thread can possibly read it.
The javadocs are not clear about this: are unmodifiable sets thread safe? Or should I worry about internal state concurrency problems?
Set<String> originalSet = new HashSet<>();
originalSet.add("Will");
originalSet.add("this");
originalSet.add("be");
originalSet.add("thread");
originalSet.add("safe?");
final Set<String> unmodifiableSet = Collections.unmodifiableSet(originalSet);
originalSet = null; // no other references to the originalSet
// Can unmodifiableSet be shared among several threads?
I have stumbled upon a piece of code with a static, read-only Set being shared across multiple threads... The original author wrote something like this:
mySet = Collections.synchronizedSet(Collections.unmodifiableSet(originalSet));
And then every thread access it with code such as:
synchronized (mySet) {
    // Iterate and read operations
}
By this logic only one thread can operate on the Set at once...
So my question is: for an unmodifiable set, when using operations such as forEach, contains, size, etc., do I really need to synchronize access?
If it's an unmodifiable Set<String>, as per your example, then you're fine, because String objects are immutable. But if it's a set of something that's not immutable, you have to be careful about two threads both trying to change the same object inside the set.
You also have to be careful about whether there's a reference somewhere to the underlying Set that's not unmodifiable. A variable can refer to an unmodifiable view while the underlying Set can still be modified via a different variable; but your example seems to have that covered.
Objects that are "de facto" immutable, i.e. objects that never change their state, are thread-safe. That includes objects that could theoretically change but never do.
However all the objects contained inside must also be "de facto" immutable.
Furthermore the object only starts to become thread safe when you stop modifying it.
And it needs to be passed to the other threads in a safe manner. There are 2 ways to do that.
1.) You start the other threads only after you have stopped modifying your object. In that case you don't need any synchronization at all (see the sketch after this list).
2.) The other threads are already running while you are modifying the object, but once you have finished constructing it, you pass it to them through a synchronizing mechanism, e.g. a ConcurrentLinkedDeque. After that you don't need any further synchronization.
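A minimal sketch of option 1 (names are assumptions): everything written before Thread.start() is visible to the started thread:

import java.util.ArrayList;
import java.util.List;

class StartAfterBuild {
    public static void main(String[] args) {
        List<String> data = new ArrayList<>();
        data.add("built");
        data.add("before");
        data.add("start"); // all mutation is finished here

        Thread reader = new Thread(() -> {
            // Thread.start() happens-before the first action of the started thread,
            // so the fully built list is visible without further synchronization.
            data.forEach(System.out::println);
        });
        reader.start();
    }
}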
I know a Hashtable is synchronized, but why is its get() method synchronized?
Isn't it only a read method?
If get were not synchronized, then the Hashtable could be modified during its execution. New elements could be added, the underlying array could become too small and be replaced by a bigger one, etc. Without sequential execution, it is difficult to deal with these situations.
However, even if get did not crash when the Hashtable is modified by another thread, there is another important aspect of the synchronized keyword, namely cache synchronization. Let's use a simplified example:
class Flag {
    boolean value;

    boolean get() { return value; } // WARNING: not synchronized

    synchronized void set(boolean value) { this.value = value; }
}
set is synchronized, but get isn't. What happens if two threads A and B simultaneously read and write to this class?
1. A calls get
2. B calls set
3. A calls get
Is it guaranteed at step 3 that A sees the modification of thread B?
No, it isn't, as A could be running on a different core, which uses a separate cache where the old value is still present. Thus, we have to force B to communicate its write to the other core, and force A to fetch the new data.
How can we enforce it? Every time a thread enters or leaves a synchronized block, an implicit memory barrier is executed. A memory barrier forces the cache to be updated. However, both the writer and the reader have to execute the memory barrier; otherwise, the information is not properly communicated.
In our example, thread B already uses the synchronized method set, so its data modification is communicated at the end of the method. However, A does not see the modified data. The solution is to make get synchronized, so it is forced to get the updated data.
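A minimal sketch of the fixed class, with both accessors synchronized on the same monitor so the write is guaranteed to be visible to the read (for a single boolean, declaring the field volatile would give the same visibility guarantee):

class Flag {
    private boolean value;

    synchronized boolean get() { return value; }

    synchronized void set(boolean value) { this.value = value; }
}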
Have a look at the Hashtable source code and you can think of lots of race conditions that could cause problems in an unsynchronized get().
(I am reading JDK6 source code)
For example, rehash() creates an empty array, assigns it to the instance variable table, and then puts the entries from the old table into the new one. Therefore, if your get occurs after the empty array has been assigned but before the entries have actually been put into it, you cannot find your key even though it is in the table.
Another example: get loops through the linked list at the table index; if a rehash happens in the middle of that iteration, you may also fail to find the entry even though it exists in the hashtable.
Hashtable is synchronized, meaning the whole class is thread-safe.
Inside Hashtable, not only the get() method is synchronized but many other methods are too. In particular, the put() method is synchronized, as Tom said.
A read method must be synchronized just like a write method, because that is what ensures the visibility and consistency of the variable.