Are Java Lock implementations using synchronized code? [duplicate] - java

Is there a difference between 'ReentrantLock' and 'synchronized' on how it's implemented on CPU level?
Or do they use the same 'CAS' approach?

If we are talking about ReentrantLock vs synchronized (also known as the "intrinsic lock"), then it's a good idea to look at the Lock documentation:
All Lock implementations must enforce the same memory synchronization semantics as provided by the built-in monitor lock:
A successful lock operation acts like a successful monitorEnter action
A successful unlock operation acts like a successful monitorExit action
So in general, consider synchronized to be just an easy-to-use and concise approach to locking. You can achieve exactly the same synchronization effects with ReentrantLock at the cost of a bit more code (and it offers more options and flexibility).
Some time ago, ReentrantLock was way faster under certain conditions (high contention, for example), but now Java uses various optimization techniques (like lock coarsening and adaptive locking) to make the performance differences barely visible to the programmer in many typical scenarios.
A great deal of work has also gone into optimizing the intrinsic lock in low-contention cases (e.g. biased locking). The authors of the Java platform like the synchronized keyword and the intrinsic-locking approach: they want programmers not to fear using this handy tool (and to prevent possible bugs). That's why the synchronized optimizations and the busting of the "synchronization is slow" myth were such a big deal for Sun and Oracle.
"CPU-part" of the question:
synchronized uses a locking mechanism built into the JVM, via the MONITORENTER / MONITOREXIT bytecode instructions. So the underlying implementation is JVM-specific (that is why it is called an intrinsic lock) and, AFAIK, usually (subject to change) uses a pretty conservative strategy: once the lock is "inflated" after a collision between threads acquiring it, synchronized switches to OS-based locking ("fat locking") instead of fast CAS ("thin locking") and is reluctant to go back to CAS soon, even after the contention is gone.
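To make the bytecode part concrete, here is a minimal sketch (the class name is hypothetical, and the comments only approximate what javap -c prints; exact instructions and offsets vary by compiler version):

    public class IntrinsicLockDemo {
        private final Object mutex = new Object();
        private int counter;

        public void increment() {
            // javac compiles this block to, roughly:
            //   monitorenter   <-- acquire the intrinsic lock on mutex
            //   ...body...
            //   monitorexit    <-- release on the normal path
            //   plus an exception-table entry that runs monitorexit on the
            //   exceptional path, like a hidden finally block.
            synchronized (mutex) {
                counter++;
            }
        }
    }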
The ReentrantLock implementation is based on AbstractQueuedSynchronizer and is coded in pure Java (it uses CAS instructions and thread descheduling, which were introduced in Java 5), so it is more stable across platforms, offers more flexibility, and tries the fast CAS approach for acquiring the lock every time (falling back to OS-level descheduling if that fails).
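As a rough illustration of that strategy (a toy sketch only; the real AbstractQueuedSynchronizer also maintains an explicit queue of waiting threads and unparks a specific successor on release), a lock might first try a fast CAS, spin briefly, and only then deschedule the thread:

    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.locks.LockSupport;

    final class SpinThenParkLock {
        private final AtomicBoolean locked = new AtomicBoolean(false);

        void lock() {
            for (int i = 0; i < 100; i++) {            // fast path: optimistic CAS
                if (locked.compareAndSet(false, true)) {
                    return;
                }
                Thread.onSpinWait();                    // spin hint (Java 9+)
            }
            while (!locked.compareAndSet(false, true)) {
                LockSupport.parkNanos(1_000);           // slow path: deschedule briefly
            }
        }

        void unlock() {
            locked.set(false);   // a real lock would also wake up a queued waiter
        }
    }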
So the main difference between these lock implementations, in terms of performance, is the lock-acquisition strategy (a difference which may not even exist in a specific JVM implementation or situation).
And there is no general answer as to which locking is better; it is also subject to change over time and across platforms. You should look at the specific problem and its nature to pick the most suitable solution (as usual in Java).
PS: you're pretty curious, so I highly recommend looking at the HotSpot sources to go deeper (and to find the exact implementation for a specific platform version). It may really help. A starting point is somewhere here: http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/runtime/synchronizer.cpp

The ReentrantLock class, which implements Lock, has the same concurrency and memory semantics as synchronized, but also adds features like lock polling, timed lock waits, and interruptible lock waits. Additionally, it offers far better performance under heavy contention.
Source
The answer above is an extract from Brian Goetz's article. You should read the entire article; it helped me understand the differences between the two.
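For illustration, a minimal sketch of the extra capabilities Goetz lists (polling, timed waits, and interruptible waits), none of which synchronized offers directly:

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.ReentrantLock;

    class LockFeaturesDemo {
        private final ReentrantLock lock = new ReentrantLock();

        void poll() {
            if (lock.tryLock()) {                     // lock polling: never blocks
                try { /* critical section */ }
                finally { lock.unlock(); }
            }
        }

        void timed() throws InterruptedException {
            if (lock.tryLock(1, TimeUnit.SECONDS)) {  // timed lock wait
                try { /* critical section */ }
                finally { lock.unlock(); }
            }
        }

        void interruptible() throws InterruptedException {
            lock.lockInterruptibly();                 // interruptible lock wait
            try { /* critical section */ }
            finally { lock.unlock(); }
        }
    }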

Related

How does StampedLock queue lock requests?

I am investigating locking a cache based on Java 8's StampedLock (javadoc here), but I can't find a convincing implementation on the net to follow, despite reading articles like StampedLock Idioms.
I don't feel very positive about Java's multi-threading and concurrency offerings after being shocked that ReentrantReadWriteLock doesn't allow upgrading a read lock to a write lock, followed by the difficulty of homing in on a reputable alternative solution.
My issue is that there's no definitive statement to allay my fears that StampedLock will block write requests indefinitely while there are read requests queued.
Looking at the documentation, there are two comments that raise my suspicions.
From the Javadoc:
The scheduling policy of StampedLock does not consistently prefer readers over writers or vice versa. All "try" methods are best-effort and do not necessarily conform to any scheduling or fairness policy.
From the source code:
* These rules apply to threads actually queued. All tryLock forms
* opportunistically try to acquire locks regardless of preference
* rules, and so may "barge" their way in. Randomized spinning is
* used in the acquire methods to reduce (increasingly expensive)
* context switching while also ....
So it hints at a queue for read and write locks, but I'd need to read and digest the whole 1500 lines of source code to nail it.
I assume it must be there because I found a good benchmarking article which shows that StampedLock is the way to go for many reads / few writes. However I'm still concerned because of the lack of coverage online.
Fundamentally, I guess I expected an implementation I could plug'n'play with by following the javadoc, but in the end I'm left rooting around the net wondering why there isn't an example anywhere of a looped StampedLock#tryOptimisticRead() - even the code from the benchmark article doesn't do that.
Is Java concurrency this difficult or have I missed something obvious?
"Is Java concurrency this difficult or have I missed something obvious?"
It is a matter of opinion [1] whether Java concurrency is more difficult than (say) C++ or Python concurrency.
But yes, concurrency is difficult [2] in any language that allows different threads to directly update shared data structures. Languages that (only) support CSP-like concurrency are easier to understand and reason about.
Reference:
https://en.wikipedia.org/wiki/Communicating_sequential_processes
To your point about fairness: it is true that most forms of locking in Java do not guarantee fairness, and indeed many things to do with thread scheduling are (deliberately) loosely specified. But it is not difficult to write code that avoids these problems ... once you understand the libraries and how to use them.
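For example, if writer starvation is the specific worry, ReentrantReadWriteLock can be constructed in fair mode, which hands the lock out in roughly arrival order at some throughput cost. A minimal sketch (the FairCache class and its field are hypothetical):

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class FairCache {
        // true = fair mode: a queued writer is not overtaken indefinitely
        // by newly arriving readers.
        private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock(true);
        private Object value;   // guarded by rw

        Object read() {
            rw.readLock().lock();
            try { return value; }
            finally { rw.readLock().unlock(); }
        }

        void write(Object v) {
            rw.writeLock().lock();   // in fair mode this cannot be starved forever
            try { value = v; }
            finally { rw.writeLock().unlock(); }
        }
    }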
To your specific issue about StampedLock behavior.
My issue is that there's no definitive statement to allay my fears that StampedLock will block write requests indefinitely while there are read requests queued.
There is no such statement [3] because such behavior is possible. It follows from a careful reading of the StampedLock API documentation. For example, it says:
"The scheduling policy of StampedLock does not consistently prefer readers over writers or vice versa."
In short, there is nothing there that guarantees that an untimed writeLock will eventually acquire the lock.
If you need to categorically avoid the scenario of readers causing writer starvation, then don't use writeLock and readLock. You could use tryOptimisticRead instead of readLock (the idiom is sketched below). Or you could design and implement a different synchronization mechanism.
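On tryOptimisticRead: the looped idiom the question was looking for is essentially the Point example from the StampedLock class javadoc: read optimistically, validate the stamp, and fall back to a real read lock if a writer intervened. A sketch:

    import java.util.concurrent.locks.StampedLock;

    class OptimisticPoint {
        private final StampedLock sl = new StampedLock();
        private double x, y;

        double distanceFromOrigin() {
            long stamp = sl.tryOptimisticRead();   // no lock held, just a stamp
            double curX = x, curY = y;             // optimistic read of the state
            if (!sl.validate(stamp)) {             // a write happened in between
                stamp = sl.readLock();             // fall back to a full read lock
                try {
                    curX = x;
                    curY = y;
                } finally {
                    sl.unlockRead(stamp);
                }
            }
            return Math.hypot(curX, curY);
        }
    }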
Finally, you seem to be implying that StampedLock should provide a way to deal directly with your scenario and/or that the documentation should specifically explain to non-expert users how to deal with it. I draw people's attention to this:
"StampedLocks are designed for use as internal utilities in the development of thread-safe components."
The fact that you are having difficulty finding pertinent examples is not the fault of the javadocs. If anything, it supports the inference that this API is for experts ...
[1] My opinion is that Java's concurrency support is at least easier to reason about than most other languages of its ilk. The Java Memory Model (Chapter 17.4 of the JLS) is well specified, and "Java Concurrency In Practice" by Goetz et al does a good job of explaining the ins and outs of concurrent programming.
[2] .... for most programmers.
[3] If this is not definitive enough for you, write yourself an example where there is a (simulated) indefinitely large sequence of read requests and multiple reader threads. Run it and see if the writer threads get stalled until the read requests are all drained.
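A minimal version of that experiment might look like the sketch below (the thread count and timings are arbitrary assumptions; run it and see how long the writer actually waits):

    import java.util.concurrent.locks.StampedLock;

    public class WriterStarvationProbe {
        public static void main(String[] args) throws InterruptedException {
            StampedLock sl = new StampedLock();

            // Readers overlap their read locks so the lock is almost never free.
            for (int i = 0; i < 8; i++) {
                Thread reader = new Thread(() -> {
                    while (true) {
                        long s = sl.readLock();
                        try { Thread.onSpinWait(); }
                        finally { sl.unlockRead(s); }
                    }
                });
                reader.setDaemon(true);
                reader.start();
            }

            Thread.sleep(100);                 // let the readers ramp up
            long t0 = System.nanoTime();
            long w = sl.writeLock();           // how long does this block?
            long waitedMs = (System.nanoTime() - t0) / 1_000_000;
            sl.unlockWrite(w);
            System.out.println("writer waited " + waitedMs + " ms");
        }
    }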

ReadWriteLock vs StampedLock

I've been using ReadWriteLocks to implement/maintain locking idioms.
Since JDK 8, StampedLock has been available. And as RWLocks are known for their slowness and poor performance, StampedLock looks like an alternative (it is not reentrant, and so much faster).
However, performance aside, it looks to me that StampedLock is much harder and more complex to maintain and use - e.g. threads can now deadlock against themselves - so corresponding precautions must be taken.
What are the benefits of StampedLock over RWLock?
This article explains the differences in detail.
The ReentrantReadWriteLock had a lot of shortcomings: It suffered from starvation. You could not upgrade a read lock into a write lock. There was no support for optimistic reads. Programmers "in the know" mostly avoided using them.
Doug Lea's new Java 8 StampedLock addresses all these shortcomings. With some clever code idioms we can also get better performance.
Well, yes, ReentrantReadWriteLock had problems (when compared to the traditional synchronized block) in Java 5.0, but they were fixed in Java 6.0.
So if you use Java 6 in production, you can use the Lock API with confidence.
Performance-wise, the Lock API and traditional synchronization give you about the same.
The good thing about the Lock API is that it uses CAS/non-blocking acquisition, so you will not end up in a deadlock unless you forget to unlock in the finally block (the idiom is sketched below).
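For reference, the unlock-in-finally idiom mentioned above, as a minimal sketch:

    import java.util.concurrent.locks.ReentrantLock;

    class Counter {
        private final ReentrantLock lock = new ReentrantLock();
        private int count;

        void increment() {
            lock.lock();
            try {
                count++;           // critical section
            } finally {
                lock.unlock();     // always runs, even if the body throws
            }
        }
    }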

synchronized vs ReentrantLock vs AtomicInteger execution time

I can see that ReentrantLock is around 50% faster than synchronized, and AtomicInteger 100% faster. Why is there such a difference in the execution time of these three synchronization methods: synchronized blocks, ReentrantLock, and AtomicInteger (or any class from the atomic package)?
Are there any other popular and widely used synchronization methods besides these?
A number of factors affect this:
the version of Java. Java 5.0 was much faster for ReentrantLock; Java 7 not so much
the level of contention. synchronized works best (as does locking in general) with low contention rates. ReentrantLock works better with higher contention rates. YMMV
how much optimisation the JIT can do. The JIT optimises synchronized in ways that it cannot optimise ReentrantLock. If this is not possible, you won't see the advantage.
synchronized is GC-free in its actions. ReentrantLock can create garbage, which can make it slower and trigger GCs depending on how it is used.
AtomicInteger uses the same primitives that locking uses but does a busy wait with compareAndSet (also called compare-and-swap); i.e. it is much simpler in what it does (and much more limited as well) - see the sketch after this list.
The ConcurrentXxxx and CopyOnWriteArrayXxxx collections are very popular. These provide concurrency without needing to use a lock directly (and in some cases no locks at all).
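A sketch of what that busy wait looks like; this is roughly what AtomicInteger.incrementAndGet used to do internally (modern JDKs delegate to a single fused hardware instruction where available):

    import java.util.concurrent.atomic.AtomicInteger;

    class CasCounter {
        private final AtomicInteger value = new AtomicInteger();

        int increment() {
            // Retry the CAS until no other thread changed the value in between.
            while (true) {
                int current = value.get();
                int next = current + 1;
                if (value.compareAndSet(current, next)) {
                    return next;
                }
                // CAS failed: another thread won the race; loop and retry.
            }
        }
    }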
AtomicInteger is much faster than the other two synchronization methods on your hardware because it is lock-free. On architectures where the CPU provides basic facilities for lock-free concurrency, AtomicInteger's operations are performed entirely in hardware, with the critical update usually taking a single CPU instruction. In contrast, ReentrantLock and synchronized use multiple instructions to perform their task, so you see considerable overhead associated with them.
I think you are making a common mistake in evaluating those three elements for comparison.
Basically, ReentrantLock gives you more flexibility when synchronizing blocks compared with the synchronized keyword, while the atomic classes adopt a different approach, based on CAS (compare-and-swap), to manage updates in a concurrent context.
I suggest you read, in depth, the bible of concurrency for the Java platform:
Java Concurrency in Practice - Brian Goetz, Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes & Doug Lea
There's a big difference between having deep knowledge of concurrency and merely knowing what a language offers you to solve concurrency problems and take advantage of multithreading.
In terms of performance, it depends on the current scenario.

What is meant by "fast-path" uncontended synchronization?

From the Performance and Scalability chapter of the JCIP book:
The synchronized mechanism is optimized for the uncontended case (volatile is always uncontended), and at this writing, the performance cost of a "fast-path" uncontended synchronization ranges from 20 to 250 clock cycles for most systems.
What does the author mean by fast-path uncontended synchronization here?
There are two distinct concepts here.
Fast-path and Slow-path code
Uncontended and Contended synchronization
Slow-path vs Fast-path code
This distinction is about who produced the machine-specific binary code.
With the HotSpot VM, slow-path code is binary code produced by a C++ implementation, whereas fast-path code is code produced by the JIT compiler.
In a general sense, fast-path code is a lot more optimised. To fully understand JIT compilers, Wikipedia is a good place to start.
Uncontended and Contended synchronization
Java's synchronization constructs (monitors) have the concept of ownership. When a thread tries to lock (gain ownership of) the monitor, the monitor can be either locked (owned by another thread) or unlocked.
Uncontended synchronization happens in two different scenarios:
Unlocked monitor (ownership gained straight away)
Monitor already owned by the same thread
Contended synchronization, on the other hand, means the thread will be blocked until the owner thread releases the monitor lock. Both uncontended scenarios are sketched below.
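Both uncontended scenarios in code form (a minimal sketch):

    class ReentryDemo {
        synchronized void outer() {   // scenario 1: monitor was unlocked,
            inner();                  // ownership gained straight away
        }

        synchronized void inner() {   // scenario 2: monitor already owned by
            // this same thread -> reentrant entry, still uncontended
        }
    }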
Answering the question
By fast-path uncontended synchronization the author means the fastest translation of the bytecode (the fast path) in the cheapest scenario (uncontended synchronization).
I'm not familiar with the topic of the book, but in general a “fast path” is a specific possible control flow branch which is significantly more efficient than the others and therefore preferred, but cannot handle complex cases.
I assume that the book is talking about Java's synchronized block/qualifier. In this case, the fast path is most likely one where it is easy to detect that there are no other threads accessing the same data. What the book is saying, then, is that the implementation of synchronized has been optimized to have the best performance in the case where only one thread is actually using the object, as opposed to the case where multiple threads are and the synchronization must actually mediate among them.
The first step of acquiring a synchronized lock is a single volatile write (of the monitor owner field). If the lock is uncontended, then that is all that will happen.
If the lock is contended, then there will be various context switches and other mechanisms that will increase the clock-cycle count.

C++ (and possibly Java) how are objects locked for synchronization?

When objects are locked in languages like C++ and Java, where (actually, on a low-level scale) is this performed? I don't think it has anything to do with the CPU/cache or RAM. My best guesstimate is that this occurs somewhere in the OS. Would it be within the same part of the OS that performs context switching?
I am referring to locking objects, synchronizing on method signatures (Java), etc.
It could be that the answer depends on which particular locking mechanism is used?
Locking involves a synchronisation primitive, typically a mutex. While naively speaking a mutex is just a boolean flag that says "locked" or "unlocked", the devil is in the detail: The mutex value has to be read, compared and set atomically, so that multiple threads trying for the same mutex don't corrupt its state.
But apart from that, instructions have to be ordered properly so that the effects of a read and write of the mutex variable are visible to the program in the correct order and that no thread inadvertently enters the critical section when it shouldn't because it failed to see the lock update in time.
There are two aspects to memory access ordering: One is done by the compiler, which may choose to reorder statements if that's deemed more efficient. This is relatively trivial to prevent, since the compiler knows when it must be careful. The far more difficult phenomenon is that the CPU itself, internally, may choose to reorder instructions, and it must be prevented from doing so when a mutex variable is being accessed for the purpose of locking. This requires hardware support (e.g. a "lock bit" which causes a pipeline flush and a bus lock).
Finally, if you have multiple physical CPUs, each CPU will have its own cache, and it becomes important that state updates are propagated to all CPU caches before any executing instructions make further progress. This again requires dedicated hardware support.
As you can see, synchronisation is a (potentially) expensive business that really gets in the way of concurrent processing. That, however, is simply the price you pay for having one single block of memory on which multiple independent contexts perform work.
There is no concept of object locking in C++. You will typically implement your own on top of OS-specific functions or use synchronization primitives provided by libraries (e.g. boost::scoped_lock). If you have access to C++11, you can use the locks provided by the threading library, which has a similar interface to Boost; take a look.
In Java the same is done for you by the JVM.
The java.lang.Object has a monitor built into it. That's what is used to lock for the synchronized keyword. JDK 5 added the java.util.concurrent packages, which give you more fine-grained choices.
This has a nice explanation:
http://www.artima.com/insidejvm/ed2/threadsynch.html
I haven't written C++ in a long time, so I can't speak to how to do it in that language. It wasn't supported by the language when I last wrote it; I believe it was all 3rd-party libraries or custom code.
It does depend on the particular locking mechanism, typically a semaphore, but you cannot be sure, since it is implementation-dependent.
All architectures I know of use an atomic compare-and-swap to implement their synchronization primitives. See, for example, AbstractQueuedSynchronizer, which was used in some JDK versions to implement Semaphore and ReentrantLock.
