Consider the code snippet
class A {

    private Map<String, Object> taskMap = new HashMap<>();
    private volatile Object[] tasksArray;

    // assume this happens on thread1
    public void assignTasks() {
        synchronized (taskMap) {
            // put new tasks into the map
            // reassign the map's values as a new array to tasksArray (for direct random access)
        }
    }

    // assume this is invoked on thread2
    public void action() {
        int someIndex = <init index to some value within tasksArray.length>;
        Object[] localTasksArray = tasksArray;
        Object oneTask = localTasksArray[someIndex];
        // Question: is the above operation safe with respect to memory visibility for oneTask?
        // Is it possible that oneTask may appear null or in some other state than expected?
    }
}
Question: is the operation Object oneTask = localTasksArray[someIndex]; safe with respect to memory visibility for oneTask?
Is it possible that oneTask may appear null or in some other state than expected?
My thoughts are these:
It is possible that thread2 may see oneTask as null or in some state other than expected. This is because, even though tasksArray is volatile and a read of this field ensures proper visibility of the array reference itself, it does not ensure visibility of the state internal to the object oneTask.
The volatile keyword only protects the field tasksArray, which is a reference to an Object[]. Whenever you read or write this field, you get consistent ordering. However, this doesn't extend to the array it references, nor to the objects that array references.
Most likely you want AtomicReferenceArray instead.
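A minimal sketch of how that might look (a hypothetical class, names invented for illustration; it assumes the whole snapshot is replaced on each update, as in the question):

import java.util.concurrent.atomic.AtomicReferenceArray;

class TaskHolder {

    // each get/set of an element has volatile read/write semantics, so element
    // values written by one thread are visible to later readers of that element
    private volatile AtomicReferenceArray<Object> tasks = new AtomicReferenceArray<>(0);

    void assignTasks(Object[] newTasks) {
        tasks = new AtomicReferenceArray<>(newTasks); // publish a fresh snapshot
    }

    Object taskAt(int index) {
        return tasks.get(index); // volatile read of the element
    }
}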
As far as I recall (I am researching now to confirm), Java only guarantees that the volatile field itself is flushed to RAM, not sub-parts of the referenced object (such as array entries) or its sub-fields (in the case of objects).
However - I believe most (if not all) JVMs implement volatile access as a memory barrier (see here and the referenced page, The JSR-133 Cookbook for Compiler Writers). As a memory barrier, this means that all previous writes by other threads will be flushed from cache to main memory when the access occurs, thus making all memory consistent at that time.
This should not be relied upon - if it is crucial that you have complete control of what is visible and what is not you should use one of the Atomic classes such as AtomicReferenceArray as suggested by others. These may even be more efficient than volatile for exactly this reason.
Declaring a field volatile ensures that any write of that field synchronizes-with (and is therefore visible to) any subsequent (according to the synchronization order) read of that field. Note that there is no such thing as a volatile object or a volatile array; there are only volatile fields.
Therefore, in your code, there is no happens-before relationship between Thread 1 storing an object into the array, and Thread 2 reading it from there.
Related
public class Test {

    private MyObj myobj = new MyObj(); // it is not volatile

    public class Updater extends Thread {
        public void run() {
            myobj = getNewObjFromDb(); // now setting a new object
        }
    }

    public MyObj getData() {
        // getting stale data is fine for me
        return myobj;
    }
}
Updater regularly updates myobj.
Other classes fetch the data using getData().
Is this code thread-safe without using the volatile keyword?
I think yes. Can someone confirm?
No, this is not thread safe. (What makes you think it is?)
If you are updating a variable in one thread and reading it from another, you must establish a happens-before relationship between the write and the subsequent read.
In short, this basically means making both the read and write synchronized (on the same monitor), or making the reference volatile.
Without that, there are no guarantees that the reading thread will see the update - and it wouldn't even be as simple as "well, it would either see the old value or the new value". Your reader threads could see some very odd behaviour with the data corruption that would ensue. Look at how lack of synchronization can cause infinite loops, for example (the comments to that article, especially Brian Goetz', are well worth reading):
The moral of the story: whenever mutable data is shared across threads, if you don’t use synchronization properly (which means using a common lock to guard every access to the shared variables, read or write), your program is broken, and broken in ways you probably can’t even enumerate.
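A minimal sketch of the two fixes mentioned above, assuming the MyObj class from the question (either mechanism alone is enough for visibility of the reference):

public class Test {

    // Option 1: a volatile reference - a write in one thread
    // happens-before any subsequent read in another thread
    private volatile MyObj myobj = new MyObj();

    // Option 2: synchronized getter and setter on the same monitor
    private MyObj guarded = new MyObj();

    public synchronized MyObj getGuarded() {
        return guarded;
    }

    public synchronized void setGuarded(MyObj value) {
        guarded = value;
    }
}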
No, it isn't.
Without volatile, calling getData() from a different thread may return a stale cached value.
volatile forces assignments from one thread to be visible on all other threads immediately.
Note that if the object itself is not immutable, you are likely to have other problems.
You may get a stale reference. You may not get an invalid reference.
The reference you get is the value of the reference to an object that the variable points to or pointed to or will point to.
Note that there is no guarantee of how stale the reference may be, but it is still a reference to some object, and that object still exists. In other words, writing a reference is atomic (nothing can happen during the write), but it is not synchronized (it is subject to instruction reordering, thread-local caching, et al.).
If you declare the reference as volatile, you create a synchronization point around the variable. Simply speaking, that means that all cache of the accessing thread is flushed (writes are written and reads are forgotten).
The only types that don't get atomic reads/writes are long and double, because they are 64 bits wide and may be read or written in two 32-bit halves on 32-bit machines.
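A minimal illustration of that caveat (a sketch, not tied to the question's code):

class Counter {

    // without volatile, a reader on a 32-bit JVM could observe a "torn" value
    // assembled from the halves of two different writes; volatile restores
    // atomic reads and writes for long and double
    private volatile long count;

    void set(long value) { count = value; }

    long get() { return count; }
}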
If MyObj is immutable (all fields are final), you don't need volatile.
The big problem with this sort of code is the lazy initialization. Without volatile or synchronized keywords, you could assign a new value to myobj that had not been fully initialized. The Java memory model allows for part of an object construction to be executed after the object constructor has returned. This re-ordering of memory operations is why the memory-barrier is so critical in multi-threaded situations.
Without a memory-barrier limitation, there is no happens-before guarantee so you do not know if the MyObj has been fully constructed. This means that another thread could be using a partially initialized object with unexpected results.
Here are some more details around constructor synchronization:
Constructor synchronization in Java
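To make the reordering concrete, here is a hedged sketch of the hazard, reusing the MyObj class from the question (LazyInitHazard is a hypothetical name):

class LazyInitHazard {

    private MyObj myobj; // NOT volatile

    public void update() {
        // Without a memory barrier, publication may effectively happen as:
        //   1. allocate memory for MyObj
        //   2. store the reference into myobj   <-- other threads can see it here
        //   3. run the constructor's field writes
        // A reader between steps 2 and 3 sees a partially initialized object.
        // Declaring the field volatile forbids moving step 3 after step 2.
        myobj = new MyObj();
    }
}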
Volatile gives you visibility of the reference itself, but an AtomicReference additionally gives you atomic read-modify-write operations on it. Myobj seems to behave like a cached object, so it could work with an AtomicReference. Since your code extracts the value from the DB, I'll let the code stay as is and add the AtomicReference to it.
import java.util.concurrent.atomic.AtomicReference;

public class AtomicReferenceTest {

    private final AtomicReference<MyObj> myobj = new AtomicReference<MyObj>();

    public class Updater extends Thread {

        public void run() {
            MyObj newMyobj = getNewObjFromDb();
            updateMyObj(newMyobj);
        }

        public void updateMyObj(MyObj newMyobj) {
            // replaces the value only if it hasn't changed since we read it;
            // a plain myobj.set(newMyobj) would do for an unconditional update
            myobj.compareAndSet(myobj.get(), newMyobj);
        }
    }

    public MyObj getData() {
        return myobj.get();
    }
}

class MyObj {
}
I have read some questions and answers on the visibility of Java array elements from multiple threads, but I still can't really wrap my head around some cases. To demonstrate what I'm having trouble with, I have come up with a simple scenario: assume I have a simple collection that adds elements into one of its n buckets by hashing them (a Bucket is like a list of some sort), and each bucket is separately synchronized. E.g.:
private final Object[] locks = new Object[10];
private final Bucket[] buckets = new Bucket[10];
Here, bucket i is supposed to be guarded by locks[i]. Here is what the add code looks like:
public void add(Object element) {
    int bucketNum = calculateBucket(element); // hashes the element into a bucket
    synchronized (locks[bucketNum]) {
        buckets[bucketNum].add(element);
    }
}
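(For these snippets to actually run, both arrays must be filled before use; a hypothetical constructor sketch, with the enclosing class name invented:)

public MyHashedCollection() {
    for (int i = 0; i < locks.length; i++) {
        locks[i] = new Object(); // otherwise synchronized (locks[i]) throws NullPointerException
        buckets[i] = new Bucket();
    }
}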
Since 'buckets' is final, this would not have any visibility problem even without synchronization. My guess is that, with synchronization, this wouldn't have any visibility problems even without the final. Is this correct?
And finally, the slightly trickier part. Assume I want to copy out and merge the contents of all buckets and empty the whole data structure, from an arbitrary thread, like this:
public List<Bucket> clear() {
    List<Bucket> allBuckets = new ArrayList<>(); // note: new List<>() would not compile, List is an interface
    for (int bucketNum = 0; bucketNum < buckets.length; bucketNum++) {
        synchronized (locks[bucketNum]) {
            allBuckets.add(buckets[bucketNum]);
            buckets[bucketNum] = new Bucket();
        }
    }
    return allBuckets;
}
I basically swap the old bucket with a newly created one and return the old one. This case is different from the add() one because we are not modifying the object referred to by a reference in the array; we are directly replacing the array element itself.
Note that I do not care if bucket 2 is modified while I'm holding the lock for bucket 1, I don't need the structure to be fully synchronized and consistent, just visibility and near consistency is enough.
So, assuming every buckets[i] is only ever modified under locks[i], would you say that this code works? I hope to learn the whys and why-nots and get a better grasp of visibility. Thanks.
First question.
Thread safety in this case depends on whether the reference to the object containing locks and buckets (let's call it Container) is properly shared.
Just imagine: one thread is busy instantiating a new Container object (allocating memory, instantiating arrays, etc.), while another thread starts using this half-instantiated object where locks and buckets are still null (they have not been instantiated by the first thread yet). In this case this code:
synchronized (locks[bucketNum]) {
becomes broken and throws a NullPointerException. The final keyword prevents this and guarantees that by the time the reference to the Container is visible, its final fields have already been initialized:
An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields. (JLS 17.5)
Second question.
Assuming that locks and buckets fields are final and you don't care about consistency of the whole array and "every bucket[i] is only ever modified under lock[i]", this code is fine.
Just to add to Pavel's answer:
In your first question you ask
Since 'buckets' is final, this would not have any visibility problem even without synchronization. My guess is, with synchronization, this wouldn't have any visibility problems without the final either, is this correct?
I'm not sure what you mean by 'visibility problems', but for sure, without the synchronized this code would be incorrect if multiple threads accessed buckets[i] with one of them modifying it (e.g. writing to it). There would be no guarantee that what one thread has written becomes visible to another. This also involves the internal structures of the bucket, which might be modified by the call to add.
Remember that final on buckets pertains only to the single reference to the array itself, not to its cells.
I have been trying to figure out how immutable objects that are safely published could still be observed through a stale reference.
public final class Helper {

    private final int n;

    public Helper(int n) {
        this.n = n;
    }
}

class Foo {

    private Helper helper;

    public Helper getHelper() {
        return helper;
    }

    public void setHelper(int num) {
        helper = new Helper(num);
    }
}
So far I understand that Helper is immutable and can be safely published: a reading thread either reads null or a fully initialized Helper object, since the object is not made available until it is fully constructed. The suggested solution is to make the helper field in Foo volatile, which is what I don't understand.
The fact that you are publishing a reference to an immutable object is irrelevant here.
If you are reading the value of a reference from multiple threads, you need to ensure that the write happens before a read if you care about all threads using the most up-to-date value.
Happens before is a precisely-defined term in the language spec, specifically the part about the Java Memory Model, which allows threads to make optimisations for example by not always updating things in main memory (which is slow), instead holding them in their local cache (which is much faster, but can lead to threads holding different values for the "same" variable). Happens-before is a relation that helps you to reason about how multiple threads interact when using these optimisations.
Unless you actually create a happens-before relationship, there is no guarantee that you will see the most recent value. In the code you have shown, there is no such relationship between writes and reads of helper, so your threads are not guaranteed to see "new" values of helper. They might, but they likely won't.
The easiest way to make sure that the write happens before the read would be to make the helper member variable final: the writes to values of final fields are guaranteed to happen before the end of the constructor, so all threads always see the correct value of the field (provided this wasn't leaked in the constructor).
Making it final isn't an option here, apparently, because you have a setter. So you have to employ some other mechanism.
Taking the code at face value, the simplest option would be to use a (final) AtomicInteger instead of the Helper class: writes to AtomicInteger are guaranteed to happen before subsequent reads. But I guess your actual helper class is probably more complicated.
So, you have to create that happens-before relationship yourself. Three mechanisms for this are:
Using AtomicReference<Helper>: this has similar semantics to AtomicInteger, but allows you to store a reference-typed value. (Thanks for pointing this out, @Thilo.)
Making the field volatile: this guarantees visibility of the most recently-written value, because it causes writes to be flushed to main memory and reads to come from main memory (rather than from a thread's cache). It effectively stops the JVM from making this particular optimization.
Accessing the field in a synchronized block. The easiest thing to do would be to make the getter and setter methods synchronized. Significantly, you should not synchronize on helper, since this field is being changed.
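A minimal sketch of options 2 and 3 applied to Foo (option 1 would instead make the field a final AtomicReference<Helper>):

class Foo {

    // option 2: a volatile write happens-before any subsequent read of the field
    private volatile Helper helper;

    public Helper getHelper() {
        return helper;
    }

    public void setHelper(int num) {
        helper = new Helper(num);
    }

    // option 3 (alternative to volatile): synchronized accessors on the same
    // monitor (here, this) give the same visibility guarantee:
    // public synchronized Helper getHelper() { return helper; }
    // public synchronized void setHelper(int num) { helper = new Helper(num); }
}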
Quote from Volatile vs Static in Java:
This means that if two threads update a variable of the same Object concurrently, and the variable is not declared volatile, there could be a case in which one of the thread has in cache an old value.
Given your code, the following can happen:
Thread 1 calls getHelper() and gets null
Thread 2 calls getHelper() and gets null
Thread 1 calls setHelper(42)
Thread 2 calls setHelper(24)
And in this case your trouble starts regarding which Helper object will be used in which thread. The keyword volatile will at least solve the caching problem.
The variable helper is being read by multiple threads simultaneously. At the least, you have to make it volatile, or the compiler may cache it in registers local to threads, and updates to the variable may not be reflected in main memory. With volatile, when a thread starts reading a shared variable, it clears its cache and fetches a fresh value from global memory; when it finishes writing, it flushes the contents of its cache to main memory so that other threads may get the updated value.
I was digging inside the source code of hibernate-jpa today and stumbled upon the following code snippet (that you can also find here):
private static class PersistenceProviderResolverPerClassLoader implements PersistenceProviderResolver {

    //FIXME use a ConcurrentHashMap with weak entry
    private final WeakHashMap<ClassLoader, PersistenceProviderResolver> resolvers =
            new WeakHashMap<ClassLoader, PersistenceProviderResolver>();

    private volatile short barrier = 1;

    /**
     * {@inheritDoc}
     */
    public List<PersistenceProvider> getPersistenceProviders() {
        ClassLoader cl = getContextualClassLoader();
        if ( barrier == 1 ) {} //read barrier syncs state with other threads
        PersistenceProviderResolver currentResolver = resolvers.get( cl );
        if ( currentResolver == null ) {
            currentResolver = new CachingPersistenceProviderResolver( cl );
            resolvers.put( cl, currentResolver );
            barrier = 1;
        }
        return currentResolver.getPersistenceProviders();
    }
}
That weird statement if ( barrier == 1 ) {} //read barrier syncs state with other threads disturbed me. I took the time to dig into the volatile keyword specification.
To put it simply, in my understanding, it ensures that any READ or WRITE operation on the corresponding variable will always be performed directly in memory, at the place the value is usually stored. It specifically prevents access through caches or registers that hold a copy of the value and are not necessarily aware that the value has changed or is being modified by a concurrent thread on another core.
As a consequence it causes a drop in performance, because every access goes all the way to memory instead of using the usual (pipelined?) shortcuts. But it also ensures that whenever a thread reads the variable, it will always be up to date.
I provided those details to let you know what my understanding of the keyword is. But now when I re-read the code I am telling myself: "OK, so we are slowing down execution by ensuring that a value which is always 1 is always 1 (and setting it to 1). How does that help?"
Can anybody explain this?
You are misunderstanding volatile.
it ensures that any READ or WRITE operation on the corresponding variable will always be performed directly in memory, at the place the value is usually stored. It specifically prevents access through caches or registers that hold a copy of the value and are not necessarily aware that the value has changed or is being modified by a concurrent thread on another core.
You are talking about the implementation, while the implementation may differ from JVM to JVM.
volatile is more like a specification or rule; it guarantees that
A write to a volatile variable establishes a happens-before relationship with subsequent reads of that same variable. This means that changes to a volatile variable are always visible to other threads. What's more, it also means that when a thread reads a volatile variable, it sees not just the latest change to the volatile, but also the side effects of the code that led up to the change.
and
Using simple atomic variable access is more efficient than accessing these variables through synchronized code, but requires more care by the programmer to avoid memory consistency errors. Whether the extra effort is worthwhile depends on the size and complexity of the application.
In this case, volatile is not used to guarantee that barrier == 1:
if ( barrier == 1 ) {} //read
PersistenceProviderResolver currentResolver = resolvers.get( cl );
if ( currentResolver == null ) {
    currentResolver = new CachingPersistenceProviderResolver( cl );
    resolvers.put( cl, currentResolver );
    barrier = 1; //write
}
It is used to guarantee that the side effects between the read and the write are visible to other threads.
Without it, if you put something into resolvers in Thread1, Thread2 might not notice it.
With it, if Thread2 reads barrier after Thread1 writes it, Thread2 is guaranteed to see the put.
And there are many other synchronization mechanisms, such as:
the synchronized keyword
ReentrantLock
AtomicInteger
....
Usually, they can also build this happens-before relationship between different threads.
This is done to make updates to the resolvers map visible to other threads, by establishing a happens-before relationship (https://www.logicbig.com/tutorials/core-java-tutorial/java-multi-threading/happens-before.html).
In a single thread, the following instructions have a happens-before relation:
resolvers.put( cl, currentResolver );
barrier = 1;
But to make the change in resolvers visible to other threads, we need to read the value of the volatile variable barrier, because a write and a subsequent read of the same volatile variable establish a happens-before relation (which is also transitive). So basically this is the overall result:
1. Update resolvers
2. Write to the volatile barrier
3. Read from the volatile barrier, making the update from step 1 visible to any thread that reads barrier
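The same idiom in miniature (a hedged sketch, not the Hibernate code; names are invented):

class Publish {

    private int data;              // plain field
    private volatile boolean flag; // volatile "barrier" field

    void writer() {                // Thread 1
        data = 42;                 // 1. plain write
        flag = true;               // 2. volatile write publishes everything before it
    }

    void reader() {                // Thread 2
        if (flag) {                // 3. volatile read synchronizes-with the write above...
            System.out.println(data); // ...so this is guaranteed to print 42
        }
    }
}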
Volatile variables are a lightweight form of synchronization in Java.
Declaring a field volatile has the following effects:
The compiler will not reorder the operations
The variable will not be cached in registers
Reads and writes of 64-bit primitives (long, double) will be executed atomically
It affects the visibility of other variables (see the quote below)
Quote from Brian Goetz's Java Concurrency in Practice:
The visibility effects of volatile variables extend beyond the value of the volatile variable itself. When thread A writes to a volatile variable and subsequently thread B reads that same variable, the values of all variables that were visible to A prior to writing to the volatile variable become visible to B after reading the volatile variable.
Okay, so what is the point of writing 1 to barrier, rather than declaring resolvers itself as a volatile WeakHashMap?
This safe publication guarantee applies only to primitive fields and object references. For the purposes of this visibility guarantee, the actual member is the object reference; the objects referred to by volatile object references are beyond the scope of the safe publication guarantee. Consequently, declaring an object reference to be volatile is insufficient to guarantee that changes to the members of the referent are published to other threads. A thread may fail to observe a recent write from another thread to a member field of such an object referent.
Furthermore, when the referent is mutable and lacks thread safety, other threads might see a partially constructed object or an object in an inconsistent state.
The instance of the Map object is mutable because of its put() method.
Interleaved calls to get() and put() may result in the retrieval of internally inconsistent values from the Map object because put() modifies its state. Declaring the object reference volatile is insufficient to eliminate this data race.
Since a volatile variable establishes a happens-before relationship, when one thread has an update, it can simply inform the others that access barrier.
From a memory visibility perspective, writing a volatile variable is like exiting a synchronized block, and reading a volatile variable is like entering a synchronized block.
I am trying to understand these two methods in Java's Unsafe:
public native short getShortVolatile(Object var1, long var2);
vs
public native short getShort(Object var1, long var2);
What is the real difference here? What does volatile here really work for? I found API doc here: http://www.docjar.com/docs/api/sun/misc/Unsafe.html#getShortVolatile(Object,%20long)
But it does not really explain anything for the difference between the two functions.
My understanding is that volatile only matters when we write. To me, it would make sense to call putShortVolatile and then, for reading, simply call getShort(), since the volatile write already guarantees that the new value has been flushed to main memory.
Please kindly correct me if anything is wrong. Thanks!
Here is an article: http://mydailyjava.blogspot.it/2013/12/sunmiscunsafe.html
Unsafe supports all primitive values and can even write values without hitting thread-local caches by using the volatile forms of the methods
getXXX(Object target, long offset): Will read a value of type XXX from target's address at the specified offset.
getXXXVolatile(Object target, long offset): Will read a value of type XXX from target's address at the specified offset and not hit any thread local caches.
putXXX(Object target, long offset, XXX value): Will place value at target's address at the specified offset.
putXXXVolatile(Object target, long offset, XXX value): Will place value at target's address at the specified offset and not hit any thread local caches.
UPDATE:
You can find more information about memory management and volatile fields on this article: http://cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html (it contains also some example of reordering).
In multiprocessor systems, processors generally have one or more layers of memory cache, which improves performance both by speeding access to data (because the data is closer to the processor) and reducing traffic on the shared memory bus (because many memory operations can be satisfied by local caches.) Memory caches can improve performance tremendously, but they present a host of new challenges. What, for example, happens when two processors examine the same memory location at the same time? Under what conditions will they see the same value?
Some processors exhibit a strong memory model, where all processors see exactly the same value for any given memory location at all times. Other processors exhibit a weaker memory model, where special instructions, called memory barriers, are required to flush or invalidate the local processor cache in order to see writes made by other processors or make writes by this processor visible to others.
The issue of when a write becomes visible to another thread is compounded by the compiler's reordering of code. If a compiler defers an operation, another thread will not see it until it is performed; this mirrors the effect of caching. Moreover, writes to memory can be moved earlier in a program; in this case, other threads might see a write before it actually "occurs" in the program.
Java includes several language constructs, including volatile, final, and synchronized, which are intended to help the programmer describe a program's concurrency requirements to the compiler. The Java Memory Model defines the behavior of volatile and synchronized, and, more importantly, ensures that a correctly synchronized Java program runs correctly on all processor architectures.
As you can see in the section What does volatile do?
Volatile fields are special fields which are used for communicating state between threads. Each read of a volatile will see the last write to that volatile by any thread; in effect, they are designated by the programmer as fields for which it is never acceptable to see a "stale" value as a result of caching or reordering. The compiler and runtime are prohibited from allocating them in registers. They must also ensure that after they are written, they are flushed out of the cache to main memory, so they can immediately become visible to other threads. Similarly, before a volatile field is read, the cache must be invalidated so that the value in main memory, not the local processor cache, is the one seen.
There are also additional restrictions on reordering accesses to volatile variables. Accesses to volatile variables cannot be reordered with each other, and it is now no longer so easy to reorder normal field accesses around them. Writing to a volatile field has the same memory effect as a monitor release, and reading from a volatile field has the same memory effect as a monitor acquire. In effect, because the new memory model places stricter constraints on reordering of volatile field accesses with other field accesses, volatile or not, anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f.
So the difference is that setXXX() and getXXX() can be reordered, or can use cached values not yet synchronized between threads, while setXXXVolatile() and getXXXVolatile() won't be reordered and will always use the latest value.
The thread-local cache is temporary storage used by the JVM to improve performance: data is written to/read from the cache before being flushed to memory.
In a single-threaded context you can use either the non-volatile or the volatile version of those methods; there will be no difference. When you write something, it doesn't matter whether it is written immediately to memory or just to the thread-local cache: when you later read it, you'll be on the same thread, so you'll get the latest value for sure (the thread-local cache contains the latest value).
In a multi-threaded context, instead, the cache can give you some trouble.
If you create an unsafe-backed object and share it between two or more threads, each of those threads may hold a copy of its data in its local cache (the threads could run on different processors, each with its own cache).
If you use the setXXX() method in one thread, the new value may be written to that thread's local cache but not yet to memory. So it can happen that just one of the threads sees the new value, while memory and the other threads' local caches still contain the old value. This can lead to unexpected results. The setXXXVolatile() method writes the new value directly to memory, so the other threads will be able to access it (if they use the getXXXVolatile() methods).
If you use the getXXX() method, you'll get the local cache's value. So if another thread has changed the value in memory, the current thread's local cache may still contain the old value, and you'll get unexpected results. If you use the getXXXVolatile() method, you'll access memory directly, and you'll get the latest value for sure.
Using the example from the previous link:
// unsafe is assumed to be a sun.misc.Unsafe instance obtained reflectively, as in the article
class DirectIntArray {

    private final static long INT_SIZE_IN_BYTES = 4;
    private final long startIndex;

    public DirectIntArray(long size) {
        startIndex = unsafe.allocateMemory(size * INT_SIZE_IN_BYTES);
        unsafe.setMemory(startIndex, size * INT_SIZE_IN_BYTES, (byte) 0);
    }

    public void setValue(long index, int value) {
        unsafe.putInt(index(index), value);
    }

    public int getValue(long index) {
        return unsafe.getInt(index(index));
    }

    private long index(long offset) {
        return startIndex + offset * INT_SIZE_IN_BYTES;
    }

    public void destroy() {
        unsafe.freeMemory(startIndex);
    }
}
This class uses putInt and getInt to write values into an array allocated in native memory (so outside the heap space).
As said before, those methods may write the data to the thread-local cache, not immediately to memory. So when you use the setValue() method, the local cache is updated immediately, while the allocated memory may be updated after a while (it depends on the JVM implementation).
In a single-threaded context this class will work without problems.
In a multi-threaded context it could fail.
DirectIntArray directIntArray = new DirectIntArray(maximum);
Runnable t1 = new MyThread(directIntArray);
Runnable t2 = new MyThread(directIntArray);
new Thread(t1).start();
new Thread(t2).start();
Where MyThread is:
public class MyThread implements Runnable {

    DirectIntArray directIntArray;

    public MyThread(DirectIntArray parameter) {
        directIntArray = parameter;
    }

    public void run() {
        call();
    }

    public void call() {
        synchronized (this) {
            // the other thread could have changed this value already; the assert
            // fails if this thread's local cache has been updated, passes otherwise
            assertEquals(0, directIntArray.getValue(0L));
            directIntArray.setValue(0L, 10);
            assertEquals(10, directIntArray.getValue(0L));
        }
    }
}
With putIntVolatile() and getIntVolatile(), one of the two threads is sure to fail (the second thread will get 10 instead of 0).
With putInt() and getInt(), both threads could finish successfully (because each thread's local cache could still contain 0 if the writer's cache hasn't been flushed or the reader's cache hasn't been refreshed).
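For completeness, a hypothetical sketch of volatile variants of the accessors above (note the assumption: passing null as the base object makes Unsafe treat the long argument as an absolute off-heap address; this is how the volatile forms are commonly used off-heap, but it is not formally specified):

public void setValueVolatile(long index, int value) {
    unsafe.putIntVolatile(null, index(index), value); // volatile write: not held back in a local cache
}

public int getValueVolatile(long index) {
    return unsafe.getIntVolatile(null, index(index)); // volatile read: always fetches the latest value
}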
I think that getShortVolatile reads a plain short from an Object but treats it as volatile; it's like reading a plain variable and inserting the needed barriers (if any) yourself.
Much simplified (and to some degree wrong, but just to get the idea). Release/Acquire semantics:
Unsafe.weakCompareAndSetIntAcquire // Acquire
update some int here
Unsafe.weakCompareAndSetIntRelease // Release
As to why this is needed (the example is getIntVolatile, but the case still stands): probably to enforce non-reordering. Again, this is a bit beyond me, and Gil Tene would be FAR more suited to explain it.