In a recent answer I suggested that it is possible to achieve the functionality of volatile by synchronizing on the object containing the variable we need to be volatile (asker does not have access to the variable in code).
This got me thinking that I actually don't need to block on the containing object, I just need to achieve a memory barrier. As synchronized achieves both synchronisation and a memory barrier, if all I need is the memory barrier (as in this case) would it actually be better to use synchronized(new Object()) to achieve my memory barrier and ensure the lock is never contended?
As explained here: http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#synchronization
synchronized(new Object()) is considered a no-op and may be removed entirely by the compiler. You won't get a memory barrier out of it.
In addition to #assylias's very good point, also consider that synchronized does not achieve a memory barrier by specification. It is only the case that this is how it is implemented on today's typical CPU-memory architectures. The specification only guarantees what happens when two threads acquire the same lock.
Anyway, if you don't care about the specification, but only about real-world implementations, then why not introduce your own volatile variable and simply write to it whenever you want a memory barrier? It is irrelevant which volatile you write to, as long as we restrict ourselves to the same constrained set of architectures that your synchronized(new Object()) idea already assumes.
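For illustration only, a minimal sketch of that idea (the Container class and its BARRIER field are made-up names, and this relies on the hardware behaviour discussed above rather than on a JMM guarantee about unrelated volatiles):

class Container {
    int state;                      // the field we would like to behave like a volatile
    static volatile int BARRIER;    // hypothetical "barrier" field, written/read only for its fences

    void update(int newState) {
        state = newState;
        BARRIER = 1;                // volatile write: acts as a store barrier on common hardware
    }

    int read() {
        int ignore = BARRIER;       // volatile read: acts as a load barrier on common hardware
        return state;
    }
}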
would it actually be better to use synchronized(new Object()) to achieve my memory barrier and ensure the lock is never contended?
Nope. The JVM can easily prove that this lock can't be accessed by two threads (since it is a thread-local object) and will almost certainly turn it into a no-op, i.e. completely remove the synchronized statement.
Related
From the book Java in Nutshell 6th edition one can read:
The reason we use the word synchronized as the keyword for “requires temporary
exclusive access” is that in addition to acquiring the monitor, the JVM also rereads
the current state of the object from the main memory when the block is entered. Similarly,
when the synchronized block or method is exited, the JVM flushes any modified
state of the object back to the main memory.
as well as:
Without synchronization, different CPU cores in the system may not see the same view of memory and memory inconsistencies can damage the state of a running program, as we saw in our ATM example.
It suggests that when a synchronized method is entered, the object is loaded from main memory to maintain memory consistency.
But is this the case for objects without the synchronized keyword as well? That is, if a normal object is modified on one core of a CPU, is it synchronized with main memory so that the other cores can see it?
While the other answer talks about the importance of cache synchronisation and the memory hierarchy, the synchronized keyword doesn’t control the state of the object as a whole; rather, it is about the lock associated with that object.
Every object instance in Java can have an associated lock which prevents multiple threads running at the same time for those blocks which are synchronized on the lock. This is either implicit on the this instance or on the argument to the synchronized keyword (or the class if a static method).
While the JMM semantics say that this lock is properly handled across cache levels, that doesn't necessarily mean the object as a whole is protected; fields read from other threads while a single thread is running in a synchronized block or method aren't dealt with, for example.
In addition, the Java memory model defines "happens-before" relationships about how data changes may become visible between threads, which you need to take into account; this is why the "volatile" keyword and the AtomicXxxx types are present, along with the relaxed memory modes offered by VarHandles.
So when you talk about synchronized, you need to be aware that it is only about the state of the object's lock and not the state within the object that it is protecting.
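As a hedged illustration of that last point (class and method names here are invented): synchronizing the writer does not help a reader that never acquires the same lock.

class Account {
    private int balance;                     // plain field inside the "protected" object

    synchronized void deposit(int amount) {  // acquires the lock on 'this'
        balance += amount;
    }

    int peek() {
        return balance;                      // no lock acquired: may observe a stale value, no JMM guarantee
    }

    synchronized int read() {
        return balance;                      // same lock as deposit(): guaranteed to see completed deposits
    }
}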
First, this is similar to other misinformation going around, such as:
Volatile is supposed to make the threads read the values from RAM
disabling thread cache
More detail about why that is not the case can be found in this SO thread.
That can be applied to the statements:
the JVM also rereads the current state of the object from the main
memory when the block is entered
and
when the synchronized block or method is exited, the JVM flushes any
modified state of the object back to the main memory
Citing David Schwarz, who kindly pointed out the following in the comments and allowed me to use it:
That does not happen on any modern system. These are things that the platform might, in theory, have to do to make synchronized work but if they're not necessary on the platform (and they aren't on any platform you're likely to use), they aren't done.
These statements are all in regard to a hypothetical system that has no hardware synchronization and might require these steps. Actual systems have very different hardware designs from this simplified hypothetical system and require very different things to be done. (For example, they typically just require optimization and memory barriers, not flushes to main memory or reads. This is because modern systems are optimized and use caches to avoid having to flush to or re-read from main memory because main memory is very slow and so modern CPUs have hardware optimizations to avoid it.)
Now going back to your question:
But is this the case for objects without the synchronized keyword as well? So if a normal object is modified on one core of a CPU, is it synchronized with main memory so that the other cores can see it?
TL;DR: It might or might not happen; it depends on the hardware and on whether the Object is read from a cache. With synchronized, however, the JVM ensures that it will be visible.
More detailed answer
So if a normal object is modified on one core of a CPU, is it synchronized with main memory so that the other cores can see it?
To keep it simple and concise: without synchronized it depends on the hardware architecture (e.g., the cache coherence protocol) where the code will be executed, and on whether or not the Object is in the cache when it is read/updated.
If the architecture forces the data in each core to always be consistent with the other cores, then yes. Accessing the cache is much faster than accessing main memory, and accessing the first levels of cache (e.g., L1) is also faster than accessing the other levels.
Hence, for performance reasons, normally when the data (e.g., an Object) is loaded from main memory it gets stored in the cache (e.g., L1, L2, and L3) for quicker access in case that same data is needed again.
The first levels of cache tend to be private to each core. Therefore, it might happen that different cores have stored in their private cache (e.g., L1) different states of the "same Object". Consequently, Threads might also be reading/updating different states of the "same Object".
Notice that I wrote "same Object" because conceptually it is the same Object but in practice it is not the same entity but rather a copy of the same Object that was read from the main memory.
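A hedged sketch (the classes below are made up) of two threads working on copies of the "same Object": the writer updates its own cached copy, and nothing obliges the reader's core to refresh its view.

class Settings {
    int refreshInterval = 10;            // plain field: no volatile, no synchronized
}

class StaleReadDemo {
    static final Settings SETTINGS = new Settings();

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            // This core may keep using its cached copy of the field, so it is
            // allowed to keep seeing 10 even after the write below has happened.
            while (SETTINGS.refreshInterval == 10) { /* spin */ }
            System.out.println("saw the update");
        });
        reader.start();

        Thread.sleep(100);
        SETTINGS.refreshInterval = 60;   // written by the main thread; publication to other cores is not guaranteed
        reader.join();                   // with volatile or synchronized, termination would be guaranteed
    }
}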
I was reading about concurrency in Java, including the volatile variable, for example here: http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html
The following quote is very interesting but I don't quite understand it yet:
In effect, because the new memory model places stricter constraints on
reordering [by e.g. the processor for efficiency] of volatile field accesses with other field accesses,
volatile or not, anything that was visible to thread A when it writes
to volatile field f becomes visible to thread B when it reads f.
I already understood that a volatile variable cannot be cached in registers, so any write by any thread will be immediately visible by all other threads. Also according to this (https://docs.oracle.com/javase/tutorial/essential/concurrency/atomic.html) reads and writes on volatile variables are atomic (not sure if that would include something like x++, but it's beside the point to this post).
But the quote I provided seems to imply something in addition to that. It says that anything visible to thread A will now be visible to thread B.
So just to make sure I have that right, does this mean that when a thread writes to a volatile variable, it does a full dump of its entire processor registers to main memory? Can you give some more context about how and why this happens? It might also help to compare/contrast this with synchronization (does it follow a similar mechanism or different?). Also, examples never hurt with something as complex as this :).
On x64, the JIT produces an instruction with a read or write barrier. The implementation is in hardware, not software.
does this mean that when a thread writes to a volatile variable, it does a full dump of its entire processor registers to main memory?
No, only data written to memory is flushed. Not registers.
Can you give some more context about how and why this happens?
The CPU implements this using an L2 cache coherency protocol (depending on the CPU)
Note: on a single-CPU system, it doesn't need to do anything.
It would also help to compare/contrast this with synchronization (does it follow a similar mechanism or different?).
It uses the same instructions.
Also, examples never hurt with something as complex as this :).
When you read, it adds a read barrier.
When you write, it adds a write barrier.
The CPU then ensures the data stored in your L1 & L2 cache is appropriately synchronised with other CPUs.
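A hedged sketch (field names invented here) of the quoted guarantee: everything thread A wrote before the volatile write to f is visible to thread B once B reads f and sees that write.

class Publisher {
    int data;                    // plain field
    volatile boolean ready;      // the volatile field "f"

    void writerThreadA() {
        data = 42;               // plain write, earlier in program order
        ready = true;            // volatile write: write barrier, publishes 'data' along with it
    }

    void readerThreadB() {
        if (ready) {             // volatile read: read barrier
            System.out.println(data);   // guaranteed to print 42, not a stale value
        }
    }
}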
Yes, you are correct. This is exactly what happens. This is related to passing a so-called memory barrier. More details here: https://dzone.com/articles/memory-barriersfences
From this question: AtomicInteger lazySet vs. set and from this link: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/package-summary.html
I could gather the following points:
lazySet could be faster than set
lazySet uses a store-store barrier (writes before it are honored, i.e. cannot be reordered after it, but there is no such guarantee for writes that have yet to happen)
I could find one use-case where it could be applied, from the documentation:
Use lazySet when you want to null out a pointer to aid GC.
Are there any other practical use-cases for lazySet ?
Caffeine uses lazy or relaxed writes in many of its data structures.
When nulling out a field (e.g. ConcurrentLinkedStack)
When writing to volatile fields before publishing (e.g. SingleConsumerQueue)
When publish is safely delayable (e.g. BoundedBuffer)
When races are benign (e.g. cache expiration timestamps)
When inside a lock (e.g. BoundedLocalCache)
ConcurrentLinkedQueue uses relaxed writes prior to publishing a node and may lazily set a node's next field (prior to publishing or to indicate a stale traversal).
You may also enjoy reading the Linux Kernel Memory Barriers paper.
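For the nulling-out case (the first item in the list above, and the use quoted from the documentation), a hedged sketch of what such a relaxed write can look like; the Slots class here is made up:

import java.util.concurrent.atomic.AtomicReferenceArray;

class Slots<E> {
    private final AtomicReferenceArray<E> items = new AtomicReferenceArray<>(16);

    E take(int index) {
        E element = items.get(index);
        // The null does not have to become visible to other threads immediately;
        // it only has to happen eventually so the element can be garbage collected.
        items.lazySet(index, null);
        return element;
    }
}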
TL;DR How to use .lazySet()? With care, if at all.
The main problem here is that AtomicXXX.lazySet() is a low-level performance optimization and it is outside the scope of the current JLS. You can't prove the correctness of your concurrent code with JMM tools if you are using lazySet().
Why is it much faster than a volatile write?
The main difference between set and lazySet is the absence of a StoreLoad barrier.
JSR-133 Cookbook for Compiler Writers:
StoreLoad barriers are needed on nearly all recent multiprocessors, and are usually the most expensive kind.
Moreover, on most popular x86-based hardware StoreLoad is the only explicit barrier (others are just no-op's and cost nothing), so with lazySet you eliminate all (explicit) memory barriers.
Guarantees of lazySet
From the point of view of the JLS, there aren't any.
But actually you can reason about lazySet as a delayed write which cannot be reordered with any previous write and will happen eventually. "Eventually" means in finite time, provided your process makes any progress (e.g., any synchronization action occurs; in addition, the size of the processor's store buffer is finite). If the written value becomes visible to another thread, you can be sure that all previous writes are visible to it as well (although you cannot formally prove it). So you can treat it as a delayed happens-before relationship (but, of course, it's not even close to its strict and formal definition).
Usage
The most practical usage (besides nulling out references) is making writes far cheaper. The simplest example is using lazySet() instead of set() within a synchronized block (although in this case there is no great performance impact), or using it instead of plain volatile writes in single-producer scenarios, where no contention on the write occurs.
The Disruptor developers use lazySet exactly for this purpose in their lock-free implementation. Again, it's very hard to argue about the correctness of such code, but it's a good trick to be aware of.
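A hedged sketch (names invented) of that single-producer pattern, similar in spirit to how the Disruptor publishes its sequence numbers:

import java.util.concurrent.atomic.AtomicLong;

class ProgressCounter {
    private final AtomicLong published = new AtomicLong();

    // Called only by the single producer thread, so no other write can race with this one.
    void advanceTo(long sequence) {
        published.lazySet(sequence);   // ordered store without the expensive StoreLoad barrier
    }

    // Called by any number of consumer threads.
    long lastPublished() {
        return published.get();        // volatile read: observes the write after a short, finite delay
    }
}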
I would think many uses of AtomicBoolean would benefit from the usage of lazySet() because they are often used as flags to indicate whether something is complete or not, or whether an outer loop should finish.
This is because in this case the value is initially one value and it eventually becomes another value and then stays there. Obviously this argument applies to almost any atomic that is used in that way.
import java.util.concurrent.atomic.AtomicBoolean;

public void test() {
    final AtomicBoolean finished = new AtomicBoolean(false);
    new Thread(new Runnable() {
        @Override
        public void run() {
            // Keep working until the (application-specific) completion check says we are done.
            while (!finished.get()) {
                // A long process.
                if (wereAllDone()) {
                    // No reader needs to see this immediately; the loop condition will
                    // observe it eventually, so the cheaper lazySet is sufficient.
                    finished.lazySet(true);
                }
            }
        }
    }).start();
}
Initially I thought a volatile variable was better than synchronized keyword as it did not involve BLOCKING or CONTEXT SWITCHING. But reading this I am now confused.
Is volatile implemented in a non-blocking approach using low level atomic locks or no?
Is volatile implemented in a non-blocking approach using low level atomic locks or no?
Volatile's implementation varies between processors, but it is a non-blocking field load/store: it is usually implemented via memory fences but can also be managed with cache-coherence protocols.
I just read that post. The poster is actually incorrect in his explanation of the volatile vs. synchronized flow, and someone corrected him in a comment. Volatile will not hold a lock. You may read that a volatile store is similar to a synchronized release and a volatile load is similar to a synchronized acquire, but that only pertains to memory visibility, not to the actual implementation details.
Is volatile implemented in a non-blocking approach using low level atomic locks or no?
Use of volatile erects a memory barrier around the field in question. This does not cause a thread to be put into the "BLOCKING" state. However, when the volatile field is accessed, the program has to flush changes to main memory and update the cache, which takes cycles. It may result in a context switch but doesn't necessarily cause one.
It's true that volatile does not cause blocking.
However, the statement
a volatile variable was better than synchronized keyword as it did not
involve BLOCKING or CONTEXT SWITCHING.
is very debatable and depends heavily on what you are trying to do. volatile is not equivalent to a lock and declaring a variable volatile does not give any guarantees regarding the atomicity of operations in which that variable is involved e.g. increment.
What volatile does is prevent the compiler and/or CPU from performing instruction reordering or caching of the specific variable. This is known as a memory fence. This nasty little mechanism is required to ensure that in a multithreaded environment all threads reading a specific variable have an up-to-date view of its value. This is called visibility and is different from atomicity.
Atomicity can only be guaranteed in the general case by the use of locks (synchronized) or atomic primitives.
What can be confusing, however, is the fact that using synchronization mechanisms also generates an implicit memory fence, so declaring a variable volatile is redundant if you're only going to read/write it inside synchronized blocks.
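To make the visibility-versus-atomicity distinction concrete, a hedged sketch (the class is invented): volatile lets every thread see the latest value, but count++ remains a three-step read-modify-write that can lose updates under contention.

import java.util.concurrent.atomic.AtomicInteger;

class Counters {
    volatile int volatileCount;                        // visible to all threads, but ++ is NOT atomic
    final AtomicInteger atomicCount = new AtomicInteger();

    void increment() {
        volatileCount++;                // read, add, write: two threads can interleave and lose an update
        atomicCount.incrementAndGet();  // atomic read-modify-write, no update can be lost
    }
}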
Volatile is a Java language modifier, and how it provides its guarantees comes down to the JVM implementation. Putting it simply, if you mark a primitive field as volatile you guarantee that whatever thread reads this field will read the most recent value. It basically prohibits any behind-the-scenes JVM optimizations and forces all threads to cross the memory barrier to read the volatile primitive.
BLOCKING: threads don't wait for each other when reading the same volatile variable; they do it without mutual exclusion. However, they do trigger fences at the hardware level so that "happens-before" semantics are observed (no memory reordering).
To make this clearer, a volatile variable is non-blocking because whenever it is read or written by multiple threads concurrently, the CPU cores running those threads communicate directly with main memory or via CPU cache coherency (depending on the hardware/JVM implementation), and no locking mechanism is put in place.
CONTEXT-SWITCHING
The volatile keyword does not itself trigger a context switch by its semantics, but one is possible depending on lower-level implementations.
I know that writing to a volatile variable flushes it from the memory of all the cpus, however I want to know if reads to a volatile variable are as fast as normal reads?
Can volatile variables ever be placed in the cpu cache or is it always fetched from the main memory?
You should really check out this article: http://brooker.co.za/blog/2012/09/10/volatile.html. The article argues that volatile reads can be a lot slower than non-volatile reads, even on x86.
Test 1 is a parallel read and write to a non-volatile variable. There
is no visibility mechanism and the results of the reads are
potentially stale.
Test 2 is a parallel read and write to a volatile variable. This does not address the OP's question specifically. However worth noting that a contended volatile can be very slow.
Test 3 is a read of a volatile in a tight loop. It demonstrates that volatile semantics require the value to be re-read on each loop iteration, so the JVM cannot optimize the read by hoisting it out of the loop. In Test 1, it is likely the value was read and stored once, so there is no actual "read" occurring on each iteration.
Credit to Marc Brooker for running these tests.
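Roughly what Test 3 versus Test 1 boils down to, as a hedged sketch rather than the benchmark's actual code:

class LoopReads {
    boolean plainFlag;              // Test 1 style: the JIT may hoist this read out of the loop
    volatile boolean volatileFlag;  // Test 3 style: must be re-read on every iteration

    void spinOnPlain() {
        while (!plainFlag) {
            // may be compiled as if plainFlag were read once before the loop, making the loop "free"
        }
    }

    void spinOnVolatile() {
        while (!volatileFlag) {
            // a real read every iteration; the JVM cannot optimize it away
        }
    }
}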
The answer is somewhat architecture dependent. On an x86, there is no additional overhead associated with volatile reads specifically, though there are implications for other optimizations.
See Doug Lea's JMM cookbook, in particular the architecture table near the bottom.
To clarify: There is not any additional overhead associated with the read itself. Memory barriers are used to ensure proper ordering. JSR-133 classifies four barriers "LoadLoad, LoadStore, StoreLoad, and StoreStore". Depending on the architecture, some of these barriers correspond to a "no-op", meaning no action is taken, others require a fence. There is no implicit cost associated with the Load itself, though one may be incurred if a fence is in place. In the case of the x86, only a StoreLoad barrier results in a fence.
As pointed out in a blog post, the fact that the variable is volatile means there are assumptions about the nature of the variable that can no longer be made and some compiler optimizations would not be applied to a volatile.
Volatile is not something that should be used glibly, but it should also not be feared. There are plenty of cases where a volatile will suffice in place of more heavy handed locking.
It is architecture dependent. What volatile does is tell the compiler not to optimise that variable away. It forces most operations to treat the variable's state as an unknown. Because it is volatile, it could be changed by another thread or some other hardware operation. So, reads will need to re-read the variable and operations will be of the read-modify-write kind.
This kind of variable is used for device drivers and also for synchronisation with in-memory mutexes/semaphores.
Volatile reads cannot be as quick, especially on multi-core CPUs (but also on single-core ones).
The executing core has to fetch from the actual memory address to make sure it gets the current value - the variable indeed cannot be cached.
As opposed to one other answer here, volatile variables are not used just for device drivers! They are sometimes essential for writing high performance multi-threaded code!
volatile implies that the compiler cannot optimize the variable by placing its value in a CPU register. It must be accessed from main memory. It may, however, be placed in a CPU cache. The cache will guaranty consistency between any other CPUs/cores in the system. If the memory is mapped to IO, then things are a little more complicated. If it was designed as such, the hardware will prevent that address space from being cached and all accesses to that memory will go to the hardware. If there isn't such a design, the hardware designers may require extra CPU instructions to insure that the read/write goes through the caches, etc.
Typically, the 'volatile' keyword is only used for device drivers in operating systems.