Initially I thought a volatile variable was better than the synchronized keyword, as it did not involve BLOCKING or CONTEXT SWITCHING. But reading this I am now confused.
Is volatile implemented in a non-blocking approach using low level atomic locks or no?
Volatile's implementation varies between processors, but it is a non-blocking field load/store: it is usually implemented via memory fences but can also be managed with cache-coherence protocols.
I just read that post. The poster is actually incorrect in his explanation of the volatile vs. synchronized flow, and someone corrected him in a comment. Volatile will not hold a lock. You may read that a volatile store is similar to a synchronized release and a volatile load is similar to a synchronized acquire, but that only pertains to memory visibility, not to the actual implementation details.
Is volatile implemented in a non-blocking approach using low level atomic locks or no?
Use of volatile erects a memory barrier around the field in question. This does not cause a thread to be put into the BLOCKING state. However, when the volatile field is accessed, the program has to flush changes to main memory and update cache memory, which takes cycles. It may result in a context switch, but doesn't necessarily cause one.
It's true that volatile does not cause blocking.
However, the statement
a volatile variable was better than the synchronized keyword as it did not involve BLOCKING or CONTEXT SWITCHING.
is very debatable and depends heavily on what you are trying to do. volatile is not equivalent to a lock, and declaring a variable volatile does not give any guarantees regarding the atomicity of operations in which that variable is involved, e.g. increment.
What volatile does is prevent the compiler and/or CPU from performing instruction reordering or caching of the specific variable. This is known as a memory fence. This nasty little mechanism is required to ensure that in a multithreaded environment all threads reading a specific variable have an up-to-date view of its value. This is called visibility and is different from atomicity.
Atomicity can only be guaranteed in the general case by the use of locks (synchronized) or atomic primitives.
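For example, here is a minimal sketch (class name and iteration counts are mine, purely illustrative) of how a volatile counter still loses increments while an atomic primitive does not:

import java.util.concurrent.atomic.AtomicInteger;

public class VolatileNotAtomic {
    static volatile int volatileCounter = 0;          // visible, but ++ is not atomic
    static final AtomicInteger atomicCounter = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                volatileCounter++;                    // read-modify-write: updates can be lost
                atomicCounter.incrementAndGet();      // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // atomicCounter prints 200000; volatileCounter is almost always less
        System.out.println(volatileCounter + " vs " + atomicCounter.get());
    }
}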
What can, however, be confusing is the fact that using synchronization mechanisms also generates an implicit memory fence, so declaring a variable volatile is redundant if you're only going to read/write it inside synchronized blocks.
volatile is a Java language modifier, and how it provides its guarantees comes down to the JVM implementation. Put simply: if you set a primitive field as volatile, you guarantee that whatever thread reads this field will see the most recent value. It basically prohibits JVM behind-the-scenes optimizations and forces all threads to cross the memory barrier to read the volatile primitive.
BLOCKING
volatile is non-blocking: threads do not wait for each other when reading the same volatile variable; there is no mutual exclusion. They do, however, trigger fences on the hardware level to observe "happens-before" semantics (no memory reordering).
To make this more clear: a volatile variable is non-blocking because whenever it is read by multiple threads concurrently, the CPU cores tied to those threads communicate directly with main memory or via CPU cache coherency (depending on the hardware/JVM implementation), and no locking mechanism is put in place.
CONTEXT-SWITCHING
The volatile keyword does not trigger context switching by its semantics, but it is possible, depending on lower-level implementations.
I was reading about concurrency in Java, including the volatile variable, for example here: http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html
The following quote is very interesting but I don't quite understand it yet:
In effect, because the new memory model places stricter constraints on reordering [by e.g. the processor, for efficiency] of volatile field accesses with other field accesses, volatile or not, anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f.
I already understood that a volatile variable cannot be cached in registers, so any write by any thread will be immediately visible to all other threads. Also, according to this (https://docs.oracle.com/javase/tutorial/essential/concurrency/atomic.html), reads and writes on volatile variables are atomic (not sure if that would include something like x++, but it's beside the point of this post).
But the quote I provided seems to imply something in addition to that. It says that anything visible to thread A will now be visible to thread B.
So just to make sure I have that right, does this mean that when a thread writes to a volatile variable, it does a full dump of its entire processor registers to main memory? Can you give some more context about how and why this happens? It might also help to compare/contrast this with synchronization (does it follow a similar mechanism or different?). Also, examples never hurt with something as complex as this :).
On x64, the JIT produces an instruction with a read or write barrier. The implementation is in hardware, not software.
does this mean that when a thread writes to a volatile variable, it does a full dump of its entire processor registers to main memory?
No, only data written to memory is flushed. Not registers.
Can you give some more context about how and why this happens?
The CPU implements this using an L2 cache coherency protocol (depending on the CPU).
Note: on a single cpu system, it doesn't need to do anything.
It would also help to compare/contrast this with synchronization (does it follow a similar mechanism or different?).
It uses the same instructions.
Also, examples never hurt with something as complex as this :).
When you read, it adds a read barrier.
When you write, it adds a write barrier.
The CPU then ensures the data stored in your L1 & L2 cache is appropriately synchronised with other CPUs.
Yes, you are correct. This is exactly what happens; it is related to passing a so-called memory barrier. More details here: https://dzone.com/articles/memory-barriersfences
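To make the quoted guarantee concrete, here is a common sketch (class and field names are mine, purely illustrative): the ordinary write to data piggybacks on the volatile write/read of ready.

public class Piggyback {
    int data = 0;                     // deliberately non-volatile
    volatile boolean ready = false;

    void writer() {                   // runs on thread A
        data = 42;                    // ordinary write...
        ready = true;                 // ...published by the volatile write
    }

    void reader() {                   // runs on thread B
        if (ready) {                  // volatile read
            System.out.println(data); // guaranteed to print 42, not 0
        }
    }
}

Without the volatile on ready, the reader could legally observe ready == true and data == 0.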
My understanding is that the JSR-133 cookbook is a well-quoted guide on how to implement the Java memory model (or at least its visibility guarantees) using a series of memory barriers.
It is also my understanding, based on the descriptions of the different barrier types, that StoreLoad is the only one that guarantees all CPU buffers are flushed to cache and therefore ensures fresh reads (by avoiding store forwarding) and guarantees observation of the latest value thanks to cache coherency.
I was looking at the table of specific barriers required for different program order inter-leavings of volatile/regular stores/loads and what memory barriers would be required.
From my intuition this table seems incomplete. For example, the Java memory model guarantees visibility, on the acquire action of a monitor, of all actions performed before its release in another thread, even if the values being updated are non-volatile. In the table in the link above, it seems as if the only actions that flush CPU buffers and propagate changes/allow new changes to be observed are a Volatile Store or MonitorExit followed by a Volatile Load or MonitorEnter. I don't see how the barriers could guarantee visibility in my example above, when those operations (according to the table) only use LoadStore and StoreStore, which to my understanding are only concerned with reordering within a thread and cannot enforce the happens-before guarantee (across threads, that is).
Where have I gone wrong in my understanding here? Or does this implementation only enforce happens-before, and not the synchronization guarantees or extra actions on acquiring/releasing monitors?
Thanks
StoreLoad is the only one that guarantees all CPU buffers are flushed to cache and therefore ensure fresh reads (by avoiding store forwarding) and guarantees the observation of the latest value due to cache coherency.
This may be true for x86 architectures, but you shouldn't be thinking at that level of abstraction. Cache coherence can be costly for processors to maintain.
Take mobile devices, for example: one important goal is to reduce the battery use of programs. In that case, a processor may not participate in cache coherence, and StoreLoad loses this feature.
I don't see how the barriers could guarantee visibility in my above example, when those operations (according to the table) only use LoadStore and StoreStore which from my understanding are only concerned with re-ordering in a thread and cannot enforce the happens before guarantee (across threads that is).
Let's just consider a volatile field. How would a volatile load and store look? Well, Aleksey Shipilëv has a great write-up on this, but I will take a piece of it.
A volatile store and then subsequent load would look like:
<other ops>
[StoreStore]
[LoadStore]
x = 1; // volatile store
[StoreLoad] // Case (a): Guard after volatile stores
...
[StoreLoad] // Case (b): Guard before volatile loads
int t = x; // volatile load
[LoadLoad]
[LoadStore]
<other ops>
So, <other ops> can be non-volatile writes, but as you see those writes are committed to memory prior to the volatile store. Then, when we are ready to read, the LoadLoad and LoadStore barriers will force a wait until the volatile store succeeds.
Lastly, the StoreLoad before and after ensures the volatile load and store cannot be reordered if they immediately precede one another.
The barriers in the document are abstract concepts that more-or-less map to different things on different CPUs. But they are only guidelines. The rules that the JVMs actually have to follow are those in JLS Chapter 17.
Barriers as a concept are also "global" in the sense that they order all prior and following instructions.
For example, the Java memory model guarantees visibility on the acquire action of a monitor to all actions performed before it's release in another thread, even if the values being updated are non volatile.
Acquiring a monitor is the monitor-enter in the cookbook, which only needs to be visible to other threads that contend on the lock. The monitor-exit is the release action, which will prevent loads and stores prior to it from moving below it. You can see this in the cookbook tables, where the first operation is a normal load/store and the second is a volatile-store or monitor-exit.
On CPUs with Total Store Order, the store buffers, where available, have no impact on correctness; only on performance.
In any case, it's up to the JVM to use instructions that provide the atomicity and visibility semantics that the JLS demands. And that's the key take-away: If you write Java code, you code against the abstract machine defined in the JLS. You would only dive into the implementation details of the concrete machine, if coding only to the abstract machine doesn't give you the performance you need. You don't need to go there for correctness.
I'm not sure where you got that StoreLoad barriers are the only type that enforce some particular behavior. All of the barriers, abstractly, enforce exactly what they are defined to enforce. For example, LoadLoad prevents any prior load from reordering with any subsequent load.
There may be architecture specific descriptions of how a particular barrier is enforced: for example, on x86 all the barriers other than StoreLoad are no-ops since the chip architecture enforces the other orderings automatically, and StoreLoad is usually implemented as a store buffer flush. Still, all the barriers have their abstract definition which is architecture-independent and the cookbook is defined in terms of that, along with a mapping of the conceptual barriers to actual ISA-specific implementations.
In particular, even if a barrier is "no-op" on a particular platform, it means that the ordering is preserved and hence all the happens-before and other synchronization requirements are satisfied.
What exactly is a situation where you would make use of the volatile keyword? And more importantly: How does the program benefit from doing so?
From what I've read and know already: volatile should be used for variables that are accessed by different threads, because they are slightly faster to read than non-volatile ones. If so, shouldn't there be a keyword to enforce the opposite?
Or are they actually synchronized between all threads? How are normal variables not?
I have a lot of multithreading code and I want to optimize it a bit. Of course I don't hope for huge performance enhancement (I don't have any problems with it atm anyway), but I'm always trying to make my code better. And I'm slightly confused with this keyword.
When a multithreaded program is running and there is a shared variable which isn't declared volatile, each thread may create a local copy of the variable (in a register or CPU cache) and work on that copy instead, so changes made by one thread aren't reflected in the others. The local copy is used because cached memory access is much faster than accessing variables in main memory.
When you declare a variable as volatile, you tell the program NOT to use any local copy of the variable and to work with it directly in main memory.
By declaring a variable volatile, we are telling the system that its value can change unexpectedly from anywhere, so always use the value kept in main memory, always make changes to it in main memory, and never create local copies of the variable.
Note that volatile is not a substitute for synchronization, and when a field is declared volatile, the compiler and runtime are put on notice that this variable is shared and that operations on it should not be reordered with other memory operations. Volatile variables are not cached in registers or in caches where they are hidden from other processors, so a read of a volatile variable always returns the most recent write by any thread.
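A typical situation, as a minimal sketch (the class is illustrative, not taken from any of the answers): a stop flag polled by a worker thread.

public class Worker implements Runnable {
    // Without volatile, the JIT could cache 'running' in a register
    // and the loop might never observe the stop request.
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            // do the actual work here
        }
    }

    public void stop() {              // called from another thread
        running = false;              // volatile write: visible to run() on its next read
    }
}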
volatile makes accessing the variable slower by having every thread actually fetch the value from memory each time, thus getting the newest value.
This is useful when accessing the variable from different threads.
Use a profiler to tune code and read Tips optimizing Java code
The volatile keyword means that the compiler will force a new read of the variable every time it is referenced. This is useful when that variable is something other than standard memory. Take for instance an embedded system where you're reading a hardware register or interface which appears as a memory location to the processor. External system changes which change the value of that register will not be read correctly if the processor is using a cached value that was read earlier. Using volatile forces a new read and keeps everything synchronized.
Here's a good Stack Overflow explanation
and here's a good wiki article:
In computer programming, particularly in the C, C++, C#, and Java programming languages, a variable or object declared with the volatile keyword usually has special properties related to optimization and/or threading. Generally speaking, the volatile keyword is intended to prevent the compiler from applying certain optimizations which it might have otherwise applied because ordinarily it is assumed variables cannot change value "on their own."
In short, it guarantees that all threads access the same copy of the data; any change made in one thread is immediately noticeable in another thread.
volatile concerns memory visibility. The value of the volatile variable becomes visible to all readers after a write operation completes on it. Kind of like turning off caching.
Here is a good stack overflow response: Do you ever use the volatile keyword in Java?
Concerning your specific questions: no, they are not synchronized; you still need to use locking to accomplish that (see the sketch below). Normal variables are neither synchronized nor volatile.
To optimize threaded code it's probably worth reading up on granularity, and on optimistic and pessimistic locking.
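For completeness, a minimal sketch of the locking approach (class name is illustrative):

public class SynchronizedCounter {
    private int count = 0;            // no volatile needed: synchronized also gives visibility

    public synchronized void increment() { // mutual exclusion plus an implicit memory fence
        count++;
    }

    public synchronized int get() {
        return count;
    }
}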
In a recent answer I suggested that it is possible to achieve the functionality of volatile by synchronizing on the object containing the variable we need to be volatile (asker does not have access to the variable in code).
This got me thinking that I actually don't need to block on the containing object, I just need to achieve a memory barrier. As synchronized achieves both synchronisation and a memory barrier, if all I need is the memory barrier (as in this case) would it actually be better to use synchronized(new Object()) to achieve my memory barrier and ensure the lock is never contended?
As explained here: http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#synchronization
synchronized(new Object()) is considered a no-op and may be removed entirely by the compiler. You won't get a memory barrier out of it.
In addition to @assylias's very good point, also consider that synchronized does not achieve a memory barrier by specification. It is only the case that this is how it is implemented on today's typical CPU-memory architectures. The specification only guarantees what happens when two threads acquire the same lock.
Anyway, if you don't care about the specification, but only about real-world implementations, then why not introduce your own volatile variable and simply write to it whenever you want a memory barrier? It is irrelevant which volatile you write to, as long as we're talking about the constrained set of architectures implied by your synchronized(new Object()) idea.
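A minimal sketch of that idea (names are mine; as said above, this leans on how today's JVMs and CPUs behave, not on anything the specification promises about unrelated variables):

public class FenceHack {
    private static volatile int sink; // dummy volatile, never read for its value

    static void storeFence() {
        sink = 1;                     // volatile write: acts as a store barrier on common hardware
    }

    static void loadFence() {
        int ignored = sink;           // volatile read: acts as a load barrier on common hardware
    }
}

Since Java 9, java.lang.invoke.VarHandle offers explicit fences (VarHandle.fullFence(), acquireFence(), releaseFence()) that express this intent without a dummy variable.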
would it actually be better to use synchronized(new Object()) to achieve my memory barrier and ensure the lock is never contended?
Nope. The JVM can easily prove that this lock can't be accessed by two threads (since it is a thread-local object) and will almost certainly turn it into a no-op, i.e. completely remove the synchronized statement.
I know that writing to a volatile variable flushes it from the cache of all the CPUs; however, I want to know whether reads of a volatile variable are as fast as normal reads.
Can volatile variables ever be placed in the CPU cache, or are they always fetched from main memory?
You should really check out this article: http://brooker.co.za/blog/2012/09/10/volatile.html. The blog article argues that volatile reads can be a lot slower than non-volatile reads, even on x86.
Test 1 is a parallel read and write to a non-volatile variable. There is no visibility mechanism and the results of the reads are potentially stale.
Test 2 is a parallel read and write to a volatile variable. This does not address the OP's question specifically, but it is worth noting that a contended volatile can be very slow.
Test 3 is a read of a volatile in a tight loop. It demonstrates that volatile semantics allow the value to change on each loop iteration, so the JVM cannot optimize the read and hoist it out of the loop. In Test 1, it is likely the value was read and stored once, so there is no actual "read" occurring.
Credit to Marc Brooker for running these tests.
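To see the difference between Test 3 and Test 1 in practice, here is a hedged sketch (names and timings are mine) of the hoisting effect on a non-volatile flag:

public class HoistingDemo {
    static boolean plainFlag = false; // non-volatile: the read below may be hoisted

    public static void main(String[] args) throws InterruptedException {
        Thread spinner = new Thread(() -> {
            while (!plainFlag) { }    // JIT may compile this to an infinite loop
            System.out.println("saw the write");
        });
        spinner.setDaemon(true);      // so the JVM can exit even if the loop never ends
        spinner.start();
        Thread.sleep(200);            // give the JIT time to compile the hot loop
        plainFlag = true;             // the spinner may never observe this write
        spinner.join(1000);           // frequently times out with a non-volatile flag
        System.out.println("spinner still spinning: " + spinner.isAlive());
        // Declaring plainFlag volatile forces a fresh read each iteration,
        // and the spinner reliably terminates.
    }
}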
The answer is somewhat architecture dependent. On an x86, there is no additional overhead associated with volatile reads specifically, though there are implications for other optimizations.
See the JMM cookbook from Doug Lea, in particular the architecture table near the bottom.
To clarify: there is no additional overhead associated with the read itself. Memory barriers are used to ensure proper ordering. JSR-133 classifies four barriers: LoadLoad, LoadStore, StoreLoad, and StoreStore. Depending on the architecture, some of these barriers correspond to a no-op, meaning no action is taken; others require a fence. There is no implicit cost associated with the load itself, though one may be incurred if a fence is in place. In the case of x86, only a StoreLoad barrier results in a fence.
As pointed out in a blog post, the fact that the variable is volatile means there are assumptions about its nature that can no longer be made, and some compiler optimizations will not be applied to it.
volatile is not something that should be used glibly, but it should also not be feared. There are plenty of cases where a volatile will suffice in place of more heavy-handed locking.
It is architecture-dependent. What volatile does is tell the compiler not to optimise that variable away. It forces most operations to treat the variable's state as unknown: because it is volatile, it could be changed by another thread or some other hardware operation. So reads need to re-read the variable, and operations are of the read-modify-write kind.
This kind of variable is used for device drivers and also for synchronisation with in-memory mutexes/semaphores.
Volatile reads cannot be as quick, especially on multi-core CPUs (but also on single-core ones).
The executing core has to fetch from the actual memory address to make sure it gets the current value - the variable indeed cannot be cached.
As opposed to one other answer here, volatile variables are not used just for device drivers! They are sometimes essential for writing high performance multi-threaded code!
volatile implies that the compiler cannot optimize the variable by placing its value in a CPU register; it must be accessed from main memory. It may, however, be placed in a CPU cache, and the cache will guarantee consistency with any other CPUs/cores in the system. If the memory is mapped to IO, things are a little more complicated: if the hardware was designed as such, it will prevent that address space from being cached, and all accesses to that memory will go to the hardware. If there isn't such a design, the hardware designers may require extra CPU instructions to ensure that the read/write goes through the caches, etc.
Typically, the 'volatile' keyword is only used for device drivers in operating systems.