How does StampedLock queue lock requests? - java

I am investigating locking a cache based on Java 8's StampedLock (javadoc here), but I can't find a convincing implementation on the net to follow, despite reading articles like StampedLock Idioms.
I don't feel very positive about Java's multi-threading and concurrency offerings after being shocked that ReentrantReadWriteLock doesn't allow upgrading of a read lock to a write lock, followed by the difficulty homing in on a reputable alternative solution.
My issue is that there's no definitive statement to allay my fears that StampedLock will block write requests indefinitely while there are read requests queued.
Looking at the documentation, there are two comments that raise my suspicions.
From the Javadoc:
The scheduling policy of StampedLock does not consistently prefer
readers over writers or vice versa. All "try" methods are best-effort
and do not necessarily conform to any scheduling or fairness policy.
From the source code:
* These rules apply to threads actually queued. All tryLock forms
* opportunistically try to acquire locks regardless of preference
* rules, and so may "barge" their way in. Randomized spinning is
* used in the acquire methods to reduce (increasingly expensive)
* context switching while also ....
So it hints at a queue for read and write locks but I'd need to read and digest the whole 1500 lines of source code to nail it.
I assume it must be there because I found a good benchmarking article which shows that StampedLock is the way to go for many reads / few writes. However I'm still concerned because of the lack of coverage online.
Fundamentally, I guess I expected a plug-and-play implementation I could follow from the javadoc, but instead I'm left rooting around the net wondering why there isn't an example anywhere of a looped StampedLock#tryOptimisticRead() - even the code from the benchmark article doesn't do that.
Is Java concurrency this difficult or have I missed something obvious?

"Is Java concurrency this difficult or have I missed something obvious?"
It is a matter of opinion[1] whether Java concurrency is more difficult than (say) C++ or Python concurrency.
But yes, concurrency is difficult[2] in any language that allows different threads to directly update shared data structures. Languages that (only) support CSP-like concurrency are easier to understand and reason about.
Reference:
https://en.wikipedia.org/wiki/Communicating_sequential_processes
To your point about fairness, it is true that most forms of locking in Java do not guarantee fairness. And indeed many things to do with thread scheduling are (deliberately) loosely specified. But it is not difficult to write code that avoids these problems ... once you understand the libraries and how to use them.
To your specific issue about StampedLock behavior.
My issue is that there's no definitive statement to allay my fears that StampedLock will block write requests indefinitely while there are read requests queued.
There is no such statement[3] because such behavior is possible. It follows from a careful reading of the StampedLock API documentation. For example, it says:
"The scheduling policy of StampedLock does not consistently prefer readers over writers or vice versa."
In short, there is nothing there that guarantees that an untimed writeLock will eventually acquire the lock.
If you need to categorically avoid the scenario of readers causing writer starvation, then don't use writeLock and readLock. You could use tryOptimisticRead instead of readLock. Or you could design and implement a different synchronization mechanism.
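For reference, the looped optimistic-read idiom the question asks about is essentially the one shown in the StampedLock javadoc: read optimistically, validate the stamp, and fall back to a pessimistic read lock only if a writer intervened. A condensed sketch (the Point class is just an illustration, following the javadoc's example):

```java
import java.util.concurrent.locks.StampedLock;

// Illustrative holder class, modelled on the StampedLock javadoc example.
class Point {
    private final StampedLock sl = new StampedLock();
    private double x, y;

    void move(double dx, double dy) {
        long stamp = sl.writeLock();          // exclusive write lock
        try {
            x += dx;
            y += dy;
        } finally {
            sl.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = sl.tryOptimisticRead();  // non-blocking; may be invalidated
        double curX = x, curY = y;            // read shared state optimistically
        if (!sl.validate(stamp)) {            // a writer intervened: fall back
            stamp = sl.readLock();            // to a pessimistic read lock
            try {
                curX = x;
                curY = y;
            } finally {
                sl.unlockRead(stamp);
            }
        }
        return Math.sqrt(curX * curX + curY * curY);
    }
}
```

Note that the optimistic path never holds the lock at all, which is exactly why it sidesteps the reader-starves-writer concern in read-mostly workloads.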
Finally, you seem to be implying that StampedLock should provide a way to deal directly with your scenario, and/or that the documentation should specifically explain to non-expert users how to deal with it. I draw people's attention to this:
"StampedLocks are designed for use as internal utilities in the development of thread-safe components."
The fact that you are having difficulty finding pertinent examples is not the fault of the javadocs. If anything, it supports the inference that this API is for experts ...
[1] - My opinion is that Java's concurrency support is at least easier to reason about than that of most other languages of its ilk. The Java Memory Model (JLS §17.4) is well specified, and "Java Concurrency In Practice" by Goetz et al does a good job of explaining the ins and outs of concurrent programming.
[2] - .... for most programmers.
[3] - If this is not definitive enough for you, write yourself an example where there is a (simulated) indefinitely large sequence of read requests and multiple reader threads. Run it and see whether the writer threads stall until the read requests are all drained.
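As a starting point for such an experiment, the single-threaded mechanics are easy to check: tryWriteLock returns a zero stamp (failure) while any read lock is held, and a non-zero stamp once the lock is free. A minimal sketch (the class name is mine):

```java
import java.util.concurrent.locks.StampedLock;

// Minimal single-threaded check of StampedLock's mutual exclusion:
// tryWriteLock() returns 0 (failure) while any read lock is held.
class StampedLockDemo {
    static boolean writerBlockedByReader() {
        StampedLock sl = new StampedLock();
        long read = sl.readLock();
        long write = sl.tryWriteLock();   // 0: a reader holds the lock
        sl.unlockRead(read);
        return write == 0L;
    }

    static boolean writerSucceedsWhenFree() {
        StampedLock sl = new StampedLock();
        long write = sl.tryWriteLock();   // non-zero stamp: acquired
        if (write != 0L) {
            sl.unlockWrite(write);
        }
        return write != 0L;
    }
}
```

For the full experiment, replace the single reader with a pool of threads re-acquiring read locks in a loop, and time how long an untimed writeLock call takes to return.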


Are java Lock implementations using synchronized code? [duplicate]

Is there a difference between 'ReentrantLock' and 'synchronized' in how they are implemented at the CPU level?
Or do they use the same 'CAS' approach?
If we are talking about ReentrantLock vs synchronized (also known as "intrinsic lock") then it's a good idea to look at Lock documentation:
All Lock implementations must enforce the same memory synchronization semantics as provided by the built-in monitor lock:
A successful lock operation acts like a successful monitorEnter action
A successful unlock operation acts like a successful monitorExit action
So in general, consider that synchronized is just an easy-to-use and concise approach to locking. You can achieve exactly the same synchronization effects with ReentrantLock and a bit more code (and it offers more options and flexibility).
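To illustrate the equivalence, here is the same kind of critical section written both ways; the Counter class is hypothetical:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class showing the same critical section written both ways.
class Counter {
    private final Object monitor = new Object();
    private final ReentrantLock lock = new ReentrantLock();
    private int a, b;

    void incrementWithSynchronized() {
        synchronized (monitor) {   // intrinsic lock: monitorenter/monitorexit
            a++;
        }
    }

    void incrementWithLock() {
        lock.lock();               // same memory semantics, explicit API
        try {
            b++;
        } finally {
            lock.unlock();         // must be released manually
        }
    }

    int a() { return a; }
    int b() { return b; }
}
```

The extra code buys you the extra features: tryLock, timed and interruptible acquisition, and multiple Conditions per lock.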
Some time ago, ReentrantLock was way faster under certain conditions (high contention, for example), but now Java uses different optimization techniques (such as lock coarsening and adaptive locking) that make the performance differences barely visible to the programmer in many typical scenarios.
A great deal of work has also gone into optimizing the intrinsic lock in low-contention cases (e.g. biased locking). The authors of the Java platform like the synchronized keyword and the intrinsic-locking approach, and want programmers to use this handy tool without fear (and to prevent possible bugs). That's why optimizing synchronized and busting the "synchronization is slow" myth was such a big deal for Sun and Oracle.
"CPU-part" of the question:
synchronized uses a locking mechanism that is built into the JVM, via the MONITORENTER / MONITOREXIT bytecode instructions. The underlying implementation is therefore JVM-specific (that is why it is called an intrinsic lock) and, AFAIK, usually (subject to change) uses a pretty conservative strategy: once the lock is "inflated" after threads collide on acquisition, synchronized switches to OS-based locking ("fat locking") instead of fast CAS ("thin locking") and does not "like" to revert to CAS soon (even after the contention is gone).
The ReentrantLock implementation is based on AbstractQueuedSynchronizer and coded in pure Java (it uses CAS instructions and the thread-descheduling support introduced in Java 5), so it is more stable across platforms, offers more flexibility, and tries the fast CAS approach for acquiring the lock every time (falling back to OS-level locking if that fails).
So, the main performance difference between these lock implementations is the lock-acquisition strategy (which may not even differ in a specific JVM implementation or situation).
And there is no general answer as to which lock is better; it is also subject to change over time and across platforms. You should look at the specific problem and its nature to pick the most suitable solution (as usual in Java).
PS: if you're curious, I highly recommend looking at the HotSpot sources to go deeper (and to find the exact implementation for a specific platform version). It may really help. A starting point is somewhere here: http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/runtime/synchronizer.cpp
The ReentrantLock class, which implements Lock, has the same concurrency and memory semantics as synchronized, but also adds features like lock polling, timed lock waits, and interruptible lock waits. Additionally, it offers far better performance under heavy contention.
Source
The quote above is an extract from Brian Goetz's article. You should read the entire article; it helped me understand the differences between the two.

ReadWriteLock vs StampedLock

I've been using ReadWriteLocks to implement/maintain locking idioms.
JDK 8 introduced StampedLock. Since RWLocks are known for their slowness and poor performance, StampedLock looks like an alternative (it is not reentrant, so it is much faster).
However, performance aside, it looks to me like StampedLock is much harder and more complex to maintain and use - e.g. threads can now deadlock against themselves - so corresponding precautions must be taken.
What are the benefits of StampedLock over RWLock?
This article explains the differences in detail.
The ReentrantReadWriteLock had a lot of shortcomings: It suffered from starvation. You could not upgrade a read lock into a write lock. There was no support for optimistic reads. Programmers "in the know" mostly avoided using them.
Doug Lea's new Java 8 StampedLock addresses all these shortcomings. With some clever code idioms we can also get better performance.
Well, yes, ReentrantReadWriteLock had problems (compared to the traditional synchronized block) in Java 5.0, but they were fixed in Java 6.0.
So, if you use Java 6 in production, you can use the Lock API with confidence.
Performance-wise, locks and traditional synchronization give you about the same results.
The good thing about the Lock API is that it offers non-blocking acquisition (tryLock, optionally with a timeout), which lets you avoid deadlock by backing out instead of waiting forever - just don't forget to unlock in a finally block.

What is meant by "fast-path" uncontended synchronization?

From the Performance and Scalability chapter of the JCIP book:
The synchronized mechanism is optimized for the uncontended case (volatile is always uncontended), and at this writing, the performance cost of a "fast-path" uncontended synchronization ranges from 20 to 250 clock cycles for most systems.
What does the author mean by fast-path uncontended synchronization here?
There are two distinct concepts here.
Fast-path and Slow-path code
Uncontended and Contended synchronization
Slow-path vs Fast-path code
This is a way of identifying the producer of the machine-specific binary code.
In the HotSpot VM, slow-path code is binary code produced by a C++ implementation, whereas fast-path code is produced by the JIT compiler.
In a general sense, fast-path code is much more optimised. To fully understand JIT compilers, Wikipedia is a good place to start.
Uncontended and Contended synchronization
Java's synchronization constructs (monitors) have a concept of ownership. When a thread tries to lock (gain ownership of) a monitor, the monitor can be either locked (owned by another thread) or unlocked.
Uncontended synchronization happens in two different scenarios:
Unlocked monitor (ownership gained straight away)
Monitor already owned by the same thread
Contended synchronization, on the other hand, means the thread will be blocked until the owner thread releases the monitor lock.
Answering the question
By fast-path uncontended synchronization the author means, the fastest bytecode translation (fast-path) in the cheapest scenario (uncontended synchronization).
I'm not familiar with the topic of the book, but in general a “fast path” is a specific possible control flow branch which is significantly more efficient than the others and therefore preferred, but cannot handle complex cases.
I assume that the book is talking about Java's synchronized block/qualifier. In this case, the fast path is most likely one where it is easy to detect that there are no other threads accessing the same data. What the book is saying, then, is that the implementation of synchronized has been optimized to have the best performance in the case where only one thread is actually using the object, as opposed to the case where multiple threads are and the synchronization must actually mediate among them.
The first step of acquiring a synchronized lock is a single volatile write (to the monitor owner field). If the lock is uncontended, then that is all that happens.
If the lock is contested then there will be various context switches and other mechanisms which will increase clock cycles.
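A toy sketch of that fast path (an illustration of the idea, not HotSpot's actual implementation): the uncontended acquire is a single atomic compare-and-set on an "owner" field, with no OS involvement at all.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy, non-reentrant illustration of a fast-path lock acquire: one CAS on
// an owner field succeeds when the lock is uncontended. Not HotSpot's code.
class ToyLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    boolean tryFastPathLock() {
        // Uncontended case: a single compare-and-set, no context switch.
        // (A real lock would fall back to a slow path on failure.)
        return owner.compareAndSet(null, Thread.currentThread());
    }

    void unlock() {
        owner.compareAndSet(Thread.currentThread(), null);
    }
}
```

The slow path is everything that happens when the CAS fails: queueing, parking the thread, and the context switches mentioned above.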

C++ (and possibly Java) how are objects locked for synchronization?

When objects are locked in languages like C++ and Java, where (on a low level) is this actually performed? I don't think it's anything to do with the CPU/cache or RAM. My best guesstimate is that this occurs somewhere in the OS? Would it be within the same part of the OS that performs context switching?
I am referring to locking objects, synchronizing on method signatures (Java) etc.
It could be that the answer depends on which particular locking mechanism?
Locking involves a synchronisation primitive, typically a mutex. While naively speaking a mutex is just a boolean flag that says "locked" or "unlocked", the devil is in the detail: The mutex value has to be read, compared and set atomically, so that multiple threads trying for the same mutex don't corrupt its state.
But apart from that, instructions have to be ordered properly so that the effects of a read and write of the mutex variable are visible to the program in the correct order and that no thread inadvertently enters the critical section when it shouldn't because it failed to see the lock update in time.
There are two aspects to memory access ordering: One is done by the compiler, which may choose to reorder statements if that's deemed more efficient. This is relatively trivial to prevent, since the compiler knows when it must be careful. The far more difficult phenomenon is that the CPU itself, internally, may choose to reorder instructions, and it must be prevented from doing so when a mutex variable is being accessed for the purpose of locking. This requires hardware support (e.g. a "lock bit" which causes a pipeline flush and a bus lock).
Finally, if you have multiple physical CPUs, each CPU will have its own cache, and it becomes important that state updates are propagated to all CPU caches before any executing instructions make further progress. This again requires dedicated hardware support.
As you can see, synchronisation is a (potentially) expensive business that really gets in the way of concurrent processing. That, however, is simply the price you pay for having one single block of memory on which multiple independent contexts perform work.
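The "boolean flag plus atomic compare-and-set" mutex described above can be sketched in a few lines of Java (a toy spinlock for illustration, not a production lock; Thread.onSpinWait requires Java 9+):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy spinlock: the "boolean mutex flag" from the answer, made safe by an
// atomic compare-and-set, which also provides the memory ordering described
// above. Illustration only - non-reentrant, and busy-waiting wastes CPU.
class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Atomically flip false -> true; spin until this thread wins the race.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();   // CPU hint, Java 9+
        }
    }

    void unlock() {
        locked.set(false);         // volatile write publishes the release
    }

    boolean isLocked() {
        return locked.get();
    }
}
```

Real mutexes add a slow path that parks the thread in the OS instead of spinning, which is where the context-switch cost comes from.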
There is no concept of object locking in C++. You will typically implement your own on top of OS-specific functions or use synchronization primitives provided by libraries (e.g. boost::scoped_lock). If you have access to C++11, you can use the locks provided by the threading library which has a similar interface to boost, take a look.
In Java the same is done for you by the JVM.
The java.lang.Object has a monitor built into it. That's what is used for locking by the synchronized keyword. JDK 5 added the java.util.concurrent packages, which give you more fine-grained choices.
This has a nice explanation:
http://www.artima.com/insidejvm/ed2/threadsynch.html
I haven't written C++ in a long time, so I can't speak to how to do it in that language. It wasn't supported by the language when I last wrote it. I believe it was all 3rd party libraries or custom code.
It does depend on the particular locking mechanism - typically a semaphore - but you cannot be sure, since it is implementation-dependent.
All architectures I know of use an atomic compare-and-swap to implement their synchronization primitives. See, for example, AbstractQueuedSynchronizer, which is used in some JDK versions to implement Semaphore and ReentrantLock.

Dead Lock Thread Check

Can somebody tell me how I can find out how many threads are in a deadlock condition in a Java multi-threaded application? What is the way to find the list of deadlocked threads?
I have heard about thread dumps and stack traces, but I don't know how to obtain them.
I also want to know what new features have been introduced in Java 5 for Threading?
Please let me know with your comments and suggestions.
Ways of obtaining thread dumps:
ctrl-break (Windows) or ctrl-\, possibly ctrl-4 and kill -3 on Linux/UNIX
jstack and your process id (use jps)
jconsole or visualvm
just about any debugger
Major new threading features in J2SE 5.0 (released 2004, now in its End Of Service Life period):
java.util.concurrent
New Java Memory Model.
Use kill -3 on the process id.
This will print a thread dump and an overview of thread contention to the console.
From within your program, the ThreadMXBean class has a method findMonitorDeadlockedThreads(), as well as methods for querying the current stack traces of threads. From the console in Windows, doing Ctrl+Break gives you a list of stack traces and indicates deadlocked threads.
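A sketch of findMonitorDeadlockedThreads in action: deliberately deadlock two (daemon) threads on a pair of monitors, then ask ThreadMXBean for the deadlocked thread ids. The class and latch choreography are mine, added for illustration:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Deliberately deadlock two daemon threads, then detect them via ThreadMXBean.
class DeadlockDetector {
    static long[] provokeAndDetect() {
        final Object a = new Object();
        final Object b = new Object();
        // Both threads must hold their first monitor before either tries
        // the second one; that guarantees the deadlock.
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread t1 = new Thread(() -> {
            synchronized (a) {
                bothHoldFirst.countDown();
                try { bothHoldFirst.await(); } catch (InterruptedException ignored) { }
                synchronized (b) { }   // blocks forever: t2 owns b
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (b) {
                bothHoldFirst.countDown();
                try { bothHoldFirst.await(); } catch (InterruptedException ignored) { }
                synchronized (a) { }   // blocks forever: t1 owns a
            }
        });
        t1.setDaemon(true);            // daemon threads let the JVM exit anyway
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = null;
        // Poll briefly; the deadlock forms as soon as both threads block.
        for (int i = 0; i < 100 && ids == null; i++) {
            try { Thread.sleep(50); } catch (InterruptedException e) { break; }
            ids = mx.findMonitorDeadlockedThreads();  // null if no deadlock
        }
        return ids;
    }
}
```

In production you would of course not provoke the deadlock, only poll for it (or use findDeadlockedThreads, which also covers java.util.concurrent locks on Java 6+).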
As well as some tweaks to the Java Memory Model that tidy up some concurrency "loopholes", the most significant underlying feature in Java 5 is that it exposes compare-and-set (CAS) operations to the programmer. On the back of this, a whole raft of concurrency utilities is provided in the platform. There's really a whole host of stuff, but they include:
concurrent collections
executors, which effectively allow you to implement things such as thread pools
other common concurrency constructs (queues, latches, barriers)
atomic variables
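For instance, the atomic variable classes expose CAS directly; here is a hypothetical lock-free sequence generator built on one of them:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free sequence generator, sketching the CAS primitive
// that the java.util.concurrent utilities are built on.
class SequenceGenerator {
    private final AtomicLong next = new AtomicLong();

    long nextId() {
        // incrementAndGet is a CAS retry loop under the hood.
        return next.incrementAndGet();
    }

    boolean reserve(long expected) {
        // Explicit compare-and-set: succeeds only if no other thread raced us.
        return next.compareAndSet(expected, expected + 1);
    }
}
```

No locks are taken at any point: a losing thread simply retries (or, for reserve, is told it lost), which is what makes these classes suitable for high-performance concurrent algorithms.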
You may be interested in some tutorials I've written on many of the Java 5 concurrency features.
If you want to learn about the new concurrency features in Java 5, you could do a lot worse than getting a copy of Java Concurrency in Practice by Brian Goetz (Goetz and a number of his co-authors designed the Java 5 concurrency libraries). It is both highly readable and authoritative, combining practical examples and theory.
The executive summary of the new concurrent utilities is as follows:
Task Scheduling Framework - The Executor framework is a framework for standardizing invocation, scheduling, execution, and control of asynchronous tasks according to a set of execution policies. Implementations are provided that allow tasks to be executed within the submitting thread, in a single background thread (as with events in Swing), in a newly created thread, or in a thread pool, and developers can create Executors supporting arbitrary execution policies. The built-in implementations offer configurable policies such as queue length limits and saturation policies, which can improve the stability of applications by preventing runaway resource consumption.
Concurrent Collections - Several new Collections classes have been added, including the new Queue and BlockingQueue interfaces, and high-performance, concurrent implementations of Map, List, and Queue.
Atomic Variables - Classes for atomically manipulating single variables (primitive types or references), providing high-performance atomic arithmetic and compare-and-set methods. The atomic variable implementations in java.util.concurrent.atomic offer higher performance than would be available by using synchronization (on most platforms), making them useful for implementing high-performance concurrent algorithms as well as conveniently implementing counters and sequence number generators.
Synchronizers - General purpose synchronization classes, including semaphores, mutexes, barriers, latches, and exchangers, which facilitate coordination between threads.
Locks - While locking is built into the Java language via the synchronized keyword, there are a number of inconvenient limitations to built-in monitor locks. The java.util.concurrent.locks package provides a high-performance lock implementation with the same memory semantics as synchronization, but which also supports specifying a timeout when attempting to acquire a lock, multiple condition variables per lock, non-lexically scoped locks, and support for interrupting threads which are waiting to acquire a lock.
Nanosecond-granularity timing - The System.nanoTime method enables access to a nanosecond-granularity time source for making relative time measurements, and methods which accept timeouts (such as the BlockingQueue.offer, BlockingQueue.poll, Lock.tryLock, Condition.await, and Thread.sleep) can take timeout values in nanoseconds. The actual precision of System.nanoTime is platform-dependent.
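As a small example of those timed methods, a timed BlockingQueue.poll returns null once the timeout elapses instead of blocking indefinitely (the class and method names here are my own illustration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// A timed poll on an empty queue returns null after the timeout rather than
// blocking forever; illustrative sketch.
class TimedPollDemo {
    static String pollBriefly() {
        BlockingQueue<String> q = new ArrayBlockingQueue<>(1);
        try {
            // Wait at most 10 ms for an element; the queue is empty, so
            // this returns null once the timeout elapses.
            return q.poll(10, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            return "interrupted";
        }
    }
}
```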
