Does the JVM guarantee that non-volatile variables are cached?
Can a programmer depend on the JVM to always cache non-volatile variables locally for each thread?
Or is the JVM free to do this or not, so that a programmer should not depend on it?
Thanks for the answers in advance.
No. The JVM doesn't guarantee "caching" of non-volatile fields. What the specification does guarantee is how volatile fields behave. Caching of non-volatile fields is unspecified and can vary from one JVM implementation to another. So you shouldn't rely on it (even if you find out, by some means, that some data is being cached by a thread).
The java language spec is pretty clear about volatile:
The Java programming language provides a second mechanism, volatile fields, that is more convenient than locking for some purposes.
A field may be declared volatile, in which case the Java Memory Model ensures that all threads see a consistent value for the variable (§17.4).
That's it. You got a special keyword defining this special semantic. So, when you think the other way round: without that special keyword, you can't rely on any special semantics. Then you get what the Java Memory Model has to offer; but nothing more.
And to be fully correct - there is of course Unsafe, allowing you to tamper with memory in unsafe ways with very special semantics.
The recommended pattern if you need a snapshot of a field is to copy it to a local variable. This is commonly used when writing code that makes heavy use of atomics and read-modify-conditional-write loops.
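As a minimal sketch of that snapshot pattern (the class name BoundedCounter and the "stop at max" rule are made up for illustration, not taken from the answer), a read-modify-conditional-write loop reads the shared value into a local once, decides using only that local, and writes back only if nothing changed in between:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; the name and the "cap at max" rule are illustrative.
class BoundedCounter {
    private final AtomicInteger value = new AtomicInteger();
    private final int max;

    BoundedCounter(int max) {
        this.max = max;
    }

    // Increments the counter unless it has already reached max.
    // Returns true if this call performed the increment.
    boolean tryIncrement() {
        while (true) {
            int snapshot = value.get();   // copy the shared value to a local once
            if (snapshot >= max) {
                return false;             // decide using the snapshot only
            }
            // Write back only if nobody changed the value since the snapshot was taken.
            if (value.compareAndSet(snapshot, snapshot + 1)) {
                return true;
            }
            // Another thread won the race; loop and take a fresh snapshot.
        }
    }

    int get() {
        return value.get();
    }
}
```

The point of the local variable is that every decision inside one loop iteration is based on the same value, regardless of what other threads do to the field in the meantime.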
Related
Volatile eliminates visibility and ordering issues, while the atomic toolkit provides atomicity of operations.
Volatile relies on the happens-before relation, and the atomic classes use compare-and-swap.
Why was there a need to introduce a new layer of abstraction like the atomic toolkit, instead of enhancing the volatile keyword itself? Are there specific cases that can only be solved by the atomic toolkit?
Actually, if you take a closer look at the Atomic* implementations, you'll see that they hold the value in a volatile field (AtomicInteger, for example, wraps a volatile int).
IMHO atomics are already an extension of the volatile mechanism that provides a convenient way to do atomic CAS operations.
There is also a benefit in hiding the CAS implementation. For example, the HotSpot JVM heavily uses intrinsics to squeeze out performance.
Changing an existing language construct like volatile would most likely mean that you break existing applications. So this is typically not an option, in particular not for the Java language. Creating a new library does not affect existing applications and is thus completely backwards compatible.
In addition to that, the classes in the atomic package offer advanced operations like compareAndSet that you cannot just add to the volatile keyword.
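To make the difference concrete, here is a small sketch (the class name CounterDemo and the thread counts are arbitrary choices for illustration). Both counters are visible to all threads, but only the atomic one performs the increment as a single read-modify-write step:

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterDemo {
    static volatile int volatileCount = 0;                 // visible, but ++ is not atomic
    static final AtomicInteger atomicCount = new AtomicInteger();

    // Runs `threads` workers, each incrementing both counters `perThread` times,
    // and returns the final value of the atomic counter.
    static int run(int threads, int perThread) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    volatileCount++;               // read, add, write: updates can be lost
                    atomicCount.incrementAndGet(); // one atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return atomicCount.get();
    }

    public static void main(String[] args) {
        int atomic = run(4, 100_000);
        System.out.println("atomic   = " + atomic);        // always 400000
        System.out.println("volatile = " + volatileCount); // may be less than 400000
    }
}
```

The volatile counter can come out short because two threads may read the same old value and both write back old + 1.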
In Java, how can I explicitly trigger a full memory fence/barrier, equal to the invocation of
System.Threading.Thread.MemoryBarrier();
in C#?
I know that since Java 5 reads and writes to volatile variables have been causing a full memory fence, but maybe there is an (efficient) way without volatile.
Compared to MemoryBarrier(), Java's happens-before is a much sharper tool, leaving more leeway for aggressive optimization while maintaining thread safety.
A sharper tool, as you would expect, also requires more care to use properly, and that is how the semantics of volatile variable access could be described. You must write to a volatile variable at the write site and read from the same volatile variable at each read site. By implication you can have any number of independent, localized "memory barriers", one per volatile variable, and each guards only the state reachable from that variable.
The full idiom is usually referred to as "safe publication" (although this is a more general term) and implies populating an immutable object graph which will be shared between threads, then writing a reference to it to a volatile variable.
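A minimal sketch of that idiom, with made-up names (Settings, SettingsHolder) that are not from the answer: the object is fully built from final fields first, and the only thing shared between threads is one volatile reference.

```java
// Hypothetical immutable object; class and field names are illustrative.
final class Settings {
    final String host;
    final int port;

    Settings(String host, int port) {
        this.host = host;
        this.port = port;
    }
}

class SettingsHolder {
    // The volatile reference is the single publication point.
    private static volatile Settings current = new Settings("localhost", 8080);

    // Writer side: build the object completely, then publish it with one volatile write.
    static void update(String host, int port) {
        current = new Settings(host, port);
    }

    // Reader side: one volatile read, then work with the local snapshot only.
    static String describe() {
        Settings s = current;
        return s.host + ":" + s.port;
    }
}
```

A reader that sees the new reference also sees the host and port that were written before it was published, and it never sees a half-updated pair because the object itself is never modified.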
Java 8, via JEP 171, added another possibility: three fences were added to sun.misc.Unsafe, namely fullFence, loadFence and storeFence.
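For code that should not touch Unsafe, Java 9 and later expose similar fences as static methods on java.lang.invoke.VarHandle (fullFence, acquireFence, releaseFence, loadLoadFence, storeStoreFence). The following is only a sketch under that Java 9+ assumption; the class FenceDemo and its fields are invented for illustration, and in application code a volatile field is usually the simpler choice.

```java
import java.lang.invoke.VarHandle;

// Hypothetical demo class; requires Java 9 or later for the VarHandle fence methods.
class FenceDemo {
    static int payload;   // plain field, deliberately not volatile
    static int flag;      // plain field, deliberately not volatile

    static void writer(int value) {
        payload = value;
        VarHandle.releaseFence();   // keeps the payload store from moving below the flag store
        flag = 1;
    }

    // Takes a single look at the flag; returns the payload if it was set, otherwise -1.
    static int reader() {
        int f = flag;
        VarHandle.acquireFence();   // keeps the payload load from moving above the flag load
        return f == 1 ? payload : -1;
    }

    // Closest counterpart to C#'s Thread.MemoryBarrier(): orders all loads and stores.
    static void fullBarrier() {
        VarHandle.fullFence();
    }
}
```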
There is no direct equivalent. Use a volatile field or higher-level constructs.
What exactly is a situation where you would make use of the volatile keyword? And more importantly: How does the program benefit from doing so?
From what I've read and know already: volatile should be used for variables that are accessed by different threads, because they are slightly faster to read than non-volatile ones. If so, shouldn't there be a keyword to enforce the opposite?
Or are they actually synchronized between all threads? How are normal variables not?
I have a lot of multithreading code and I want to optimize it a bit. Of course I don't hope for huge performance enhancement (I don't have any problems with it atm anyway), but I'm always trying to make my code better. And I'm slightly confused with this keyword.
When a multithreaded program is running and some shared variable isn't declared volatile, each thread is allowed to work on a local copy of the variable (for example in a register or a CPU cache) instead of the shared one. Changes made by one thread may therefore not be seen by the others. Such local copies exist because cached access is much faster than going to main memory.
When you declare a variable as volatile, you tell the program NOT to work on a local copy but to use the variable directly from main memory.
By declaring a variable as volatile, we are telling the system that its value can change unexpectedly from anywhere, so always use the value which is kept in the main memory and always make changes to the value of the variable in the main memory and not create any local copies of the variable.
Note that volatile is not a substitute for synchronization, and when a field is declared volatile, the compiler and runtime are put on notice that this variable is shared and that operations on it should not be reordered with other memory operations. Volatile variables are not cached in registers or in caches where they are hidden from other processors, so a read of a volatile variable always returns the most recent write by any thread.
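A typical use is a stop flag. The sketch below uses invented names (Worker, StopFlagDemo) and arbitrary timings; with the volatile modifier the worker is guaranteed to see the update, whereas without it the loop would be allowed to keep running on a stale value.

```java
// Hypothetical worker that spins until another thread asks it to stop.
class Worker {
    private volatile boolean running = true;  // volatile: the write in stop() becomes visible
    private long iterations;                  // touched only by the worker thread

    void runLoop() {
        while (running) {
            iterations++;
        }
    }

    void stop() {
        running = false;
    }
}

class StopFlagDemo {
    // Starts a worker, asks it to stop, and reports whether it actually terminated.
    static boolean stopsPromptly() {
        Worker worker = new Worker();
        Thread thread = new Thread(worker::runLoop);
        thread.setDaemon(true);               // do not keep the JVM alive if something goes wrong
        thread.start();
        try {
            Thread.sleep(50);                 // let the loop run for a moment
            worker.stop();
            thread.join(2000);                // wait up to two seconds for the loop to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + stopsPromptly());
    }
}
```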
Volatile makes accessing the variable slower, because every thread actually reads the value from memory each time and thus gets the newest value.
This is useful when accessing the variable from different threads.
Use a profiler to tune your code, and read "Tips optimizing Java code".
The volatile keyword means that the compiler will force a new read of the variable every time it is referenced. This is useful when that variable is something other than standard memory. Take for instance an embedded system where you're reading a hardware register or interface which appears as a memory location to the processor. External system changes which change the value of that register will not be read correctly if the processor is using a cached value that was read earlier. Using volatile forces a new read and keeps everything synchronized.
Here's a good Stack Overflow explanation
and here's a good wiki article
In computer programming, particularly in the C, C++, C#, and Java programming languages, a variable or object declared with the volatile keyword usually has special properties related to optimization and/or threading. Generally speaking, the volatile keyword is intended to prevent the compiler from applying certain optimizations which it might have otherwise applied because ordinarily it is assumed variables cannot change value "on their own."
(quoted from the wiki article)
In short, it guarantees that all threads access the same copy of the data. Any change made in one thread is immediately noticeable in another thread.
volatile concerns memory visibility. The value of the volatile variable becomes visible to all readers after a write operation completes on it. Kind of like turning off caching.
Here is a good stack overflow response: Do you ever use the volatile keyword in Java?
Concerning your specific questions: no, they are not synchronized. You still need to use locking to accomplish that. Normal variables are neither synchronized nor volatile.
To optimize threaded code, it's probably worth reading up on lock granularity and on optimistic and pessimistic locking.
When objects are locked in languages like C++ and Java, where (on a low-level scale) is this actually performed? I don't think it has anything to do with the CPU/cache or RAM. My best guesstimate is that this occurs somewhere in the OS. Would it be within the same part of the OS that performs context switching?
I am referring to locking objects, synchronizing on method signatures (Java) etc.
Could it be that the answer depends on the particular locking mechanism?
Locking involves a synchronisation primitive, typically a mutex. While naively speaking a mutex is just a boolean flag that says "locked" or "unlocked", the devil is in the detail: The mutex value has to be read, compared and set atomically, so that multiple threads trying for the same mutex don't corrupt its state.
But apart from that, instructions have to be ordered properly so that the effects of a read and write of the mutex variable are visible to the program in the correct order and that no thread inadvertently enters the critical section when it shouldn't because it failed to see the lock update in time.
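The "boolean flag" view can be sketched as a spin lock. This is only an illustration with invented names (SpinLock, SpinLockDemo), not how the JVM implements synchronized; Thread.onSpinWait assumes Java 9 or later. compareAndSet performs the read, compare and set as one atomic step and also supplies the ordering guarantees described above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical spin lock: a boolean flag that is only ever changed atomically.
class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Atomically flip false -> true; if another thread holds the lock, keep trying.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();   // Java 9+ hint that this is a busy-wait loop
        }
    }

    void unlock() {
        locked.set(false);         // volatile write: releases the lock and publishes our changes
    }
}

class SpinLockDemo {
    private static int counter;    // plain int, protected only by the lock

    // Each of `threads` workers increments the counter `perThread` times under the lock.
    static int run(int threads, int perThread) {
        SpinLock lock = new SpinLock();
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }
}
```

A spin lock burns CPU while waiting, which is why real mutexes usually fall back to putting the waiting thread to sleep.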
There are two aspects to memory access ordering: One is done by the compiler, which may choose to reorder statements if that's deemed more efficient. This is relatively trivial to prevent, since the compiler knows when it must be careful. The far more difficult phenomenon is that the CPU itself, internally, may choose to reorder instructions, and it must be prevented from doing so when a mutex variable is being accessed for the purpose of locking. This requires hardware support (e.g. a "lock bit" which causes a pipeline flush and a bus lock).
Finally, if you have multiple physical CPUs, each CPU will have its own cache, and it becomes important that state updates are propagated to all CPU caches before any executing instructions make further progress. This again requires dedicated hardware support.
As you can see, synchronisation is a (potentially) expensive business that really gets in the way of concurrent processing. That, however, is simply the price you pay for having one single block of memory on which multiple independent contexts perform work.
There is no concept of object locking in C++. You will typically implement your own on top of OS-specific functions or use synchronization primitives provided by libraries (e.g. boost::scoped_lock). If you have access to C++11, you can use the locks provided by its threading library, which has an interface similar to Boost's.
In Java the same is done for you by the JVM.
java.lang.Object has a monitor built into it. That's what the synchronized keyword locks on. JDK 5 added the java.util.concurrent packages, which give you more fine-grained choices.
This has a nice explanation:
http://www.artima.com/insidejvm/ed2/threadsynch.html
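As a rough side-by-side sketch (the class names MonitorCounter and LockCounter are invented for illustration), the first counter uses the monitor built into the object, and the second uses ReentrantLock from java.util.concurrent.locks, which adds options such as a timed lock attempt:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Uses the monitor that every java.lang.Object carries.
class MonitorCounter {
    private int count;

    synchronized void increment() {   // locks the monitor of `this`
        count++;
    }

    synchronized int get() {
        return count;
    }
}

// Uses an explicit lock object instead of the built-in monitor.
class LockCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int count;

    void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock();            // always release in finally
        }
    }

    // Something synchronized cannot express: give up if the lock is not free in time.
    boolean tryIncrement(long timeoutMillis) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            count++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    int get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }
}
```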
I haven't written C++ in a long time, so I can't speak to how to do it in that language. It wasn't supported by the language when I last wrote it. I believe it was all 3rd party libraries or custom code.
It does depend on the particular locking mechanism, typically a semaphore, but you cannot be sure, since it is implementation dependent.
All architectures I know of use an atomic Compare And Swap to implement their synchronization primitives. See, for example, AbstractQueuedSynchronizer, which was used in some JDK versions to implement Semaphore and ReentrantLock.
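To show how a lock can sit on top of AbstractQueuedSynchronizer, here is a minimal non-reentrant mutex. The class SimpleMutex is a made-up illustration and not the JDK's ReentrantLock; the lock state lives in the synchronizer's int state, which is changed with compareAndSetState.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical non-reentrant mutex; state 0 means unlocked, 1 means locked.
class SimpleMutex {
    private static final class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int ignored) {
            // The CAS on the state is the actual "take the lock" step.
            if (compareAndSetState(0, 1)) {
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;
        }

        @Override
        protected boolean tryRelease(int ignored) {
            if (getExclusiveOwnerThread() != Thread.currentThread()) {
                throw new IllegalMonitorStateException();
            }
            setExclusiveOwnerThread(null);
            setState(0);              // volatile write that frees the lock
            return true;
        }

        @Override
        protected boolean isHeldExclusively() {
            return getExclusiveOwnerThread() == Thread.currentThread();
        }

        boolean tryLockOnce() {
            return tryAcquire(1);
        }

        boolean isLocked() {
            return getState() == 1;
        }
    }

    private final Sync sync = new Sync();

    void lock() {
        sync.acquire(1);              // queues and parks the thread if tryAcquire fails
    }

    boolean tryLock() {
        return sync.tryLockOnce();    // single CAS attempt, never blocks
    }

    void unlock() {
        sync.release(1);              // wakes the next queued thread, if any
    }

    boolean isLocked() {
        return sync.isLocked();
    }
}
```

The subclass only decides whether an acquire succeeds; queuing, parking and waking of waiting threads is handled by the base class.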
As a C++ programmer becoming more familiar with Java, it's a little odd to me to see language level support for locking on arbitrary objects without any kind of declaration that the object supports such locking. Creating mutexes for every object seems like a heavy cost to be automatically opted into. Besides memory usage, mutexes are an OS limited resource on some platforms. You could spin lock if mutexes aren't available but the performance characteristics of that are significantly different, which I would expect to hurt predictability.
Is the JVM smart enough in all cases to recognize that a particular object will never be the target of the synchronized keyword and thus avoid creating the mutex? The mutexes could be created lazily, but that poses a bootstrapping problem that itself necessitates a mutex, and even if that were worked around I assume there's still going to be some overhead for tracking whether a mutex has already been created or not. So I assume if such an optimization is possible, it must be done at compile time or startup. In C++ such an optimization would not be possible due to the compilation model (you couldn't know if the lock for an object was going to be used across library boundaries), but I don't know enough about Java's compilation and linking to know if the same limitations apply.
Speaking as someone who has looked at the way that some JVMs implement locks ...
The normal approach is to start out with a couple of reserved bits in the object's header word. If the object is never locked, or if it is locked but there is no contention it stays that way. If and when contention occurs on a locked object, the JVM inflates the lock into a full-blown mutex data structure, and it stays that way for the lifetime of the object.
EDIT - I just noticed that the OP was talking about OS-supported mutexes. In the examples that I've looked at, the uninflated mutexes were implemented directly using CAS instructions and the like, rather than using pthread library functions, etc.
This is really an implementation detail of the JVM, and different JVMs may implement it differently. However, it is definitely not something that can be optimized at compile time, since Java links at runtime, and thus it is possible for previously unknown code to get hold of an object created in older code and start synchronizing on it.
Note that in Java lingo, the synchronization primitive is called "monitor" rather than mutex, and it is supported by special bytecode operations. There's a rather detailed explanation here.
You can never be sure that an object will never be used as a lock (consider reflection). Typically every object has a header with some bits dedicated to the lock. It is possible to implement it such that the header is only added as needed, but that gets a bit complicated, and you probably need some header anyway for the class (the equivalent of the "vtbl" and allocation size in C++), the hash code and garbage collection.
Here's a wiki page on the implementation of synchronisation in the OpenJDK.
(In my opinion, adding a lock to every object was a mistake.)
Can't the JVM use a compare-and-swap instruction directly? Let's say each object has a field lockingThreadId storing the id of the thread that is locking it:
// compare_and_swap is assumed to return true when the swap succeeded
while (!compare_and_swap(obj.lockingThreadId, null, thisThreadId)) {
    // failed, someone else got it
    mark this thread as waiting on obj
    shelve this thread
}
// out of the loop: this thread now holds the lock on the object
do the work
obj.lockingThreadId = null;
wake up threads waiting on obj
This is a toy model, but it doesn't seem too expensive, and it does not rely on the OS.
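The toy model can be written as runnable Java, as a sketch only. The names ToyLock and ToyLockDemo are invented, an AtomicReference to the owning Thread stands in for lockingThreadId, and the waiting list is a ConcurrentLinkedQueue, following the FIFO mutex pattern from the LockSupport documentation. The lock is not reentrant. Note that LockSupport.park typically ends up in an OS-level blocking primitive, so the "shelve" step is where the OS usually comes back in.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Hypothetical CAS-based lock mirroring the toy model; not how any particular JVM does it.
class ToyLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>(); // "lockingThreadId"
    private final Queue<Thread> waiters = new ConcurrentLinkedQueue<>();

    void lock() {
        Thread me = Thread.currentThread();
        boolean wasInterrupted = false;
        waiters.add(me);                          // mark this thread as waiting
        // Wait while we are not first in line or the CAS on the owner field fails.
        while (waiters.peek() != me || !owner.compareAndSet(null, me)) {
            LockSupport.park(this);               // "shelve" this thread
            if (Thread.interrupted()) {           // park can return on interrupt; remember and retry
                wasInterrupted = true;
            }
        }
        waiters.remove();                         // we are the head, so this removes `me`
        if (wasInterrupted) {
            me.interrupt();                       // restore the interrupt status
        }
    }

    void unlock() {
        if (!owner.compareAndSet(Thread.currentThread(), null)) {
            throw new IllegalMonitorStateException("not the owner");
        }
        LockSupport.unpark(waiters.peek());       // wake the next waiter; null is a no-op
    }
}

class ToyLockDemo {
    private static int counter;                   // plain int, protected only by the lock

    static int run(int threads, int perThread) {
        ToyLock lock = new ToyLock();
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter++;                // "do the work"
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }
}
```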