I'm working on a class that implements hash-code based atomic locking on multiple objects. The main purpose is to unpark a waiter thread as soon as all of its required locks become available, thus reducing overall waiting time.
Its lockAndGet(List objects) returns a single "composite lock" that covers all listed objects. After the caller finishes its job, it calls unlock().
The main components are: 1) a common lock table, an int[] array recording which locks are currently held; 2) a deque of waiter threads.
The algorithm is quite simple:
When a thread calls lockAndGet(), it is marked for parking; then a new waiter object in the LOCKED state, containing this thread, is created and added to the tail of the queue, after which the makeRound() method is invoked;
makeRound() traverses the deque starting from the head, looking for waiters that can acquire all of their locks. When such a waiter is found, the lock table is updated, the waiter's state is changed to UNLOCKED, the waiter is removed from the deque, and its thread is unparked. After the traversal, if the current thread is marked for parking, it is parked;
When unlock() is called on a composite lock, the lock table is updated and makeRound() is invoked again. (A stripped-down sketch of this lock-based version follows.)
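For concreteness, here is a minimal sketch of the current lock-based version (all names are illustrative, and the real class is more involved):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.locks.LockSupport;

class CompositeLockTable {
    private final int[] held;                     // lock table: 1 = slot held
    private final Deque<Waiter> waiters = new ArrayDeque<>();
    private final Object monitor = new Object();  // the common exclusive lock

    CompositeLockTable(int slots) { held = new int[slots]; }

    final class Waiter {
        final Thread thread = Thread.currentThread();
        final int[] slots;                        // hash-derived slot indexes
        volatile boolean locked = true;           // LOCKED until granted
        Waiter(int[] slots) { this.slots = slots; }
    }

    Waiter lockAndGet(List<Object> objects) {
        int[] slots = objects.stream()
                .mapToInt(o -> Math.floorMod(o.hashCode(), held.length))
                .toArray();
        Waiter w = new Waiter(slots);
        synchronized (monitor) {
            waiters.addLast(w);
            makeRound();
        }
        while (w.locked) {
            LockSupport.park();                   // park until a grant unparks us
        }
        return w;
    }

    void unlock(Waiter w) {
        synchronized (monitor) {
            for (int s : w.slots) held[s] = 0;    // release the slots
            makeRound();
        }
    }

    // Called while holding 'monitor': grants every waiter whose slots are free.
    private void makeRound() {
        waiters.removeIf(w -> {
            for (int s : w.slots) if (held[s] != 0) return false;
            for (int s : w.slots) held[s] = 1;    // table + waiter change together
            w.locked = false;                     // state -> UNLOCKED
            LockSupport.unpark(w.thread);
            return true;                          // remove the granted waiter
        });
    }
}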
Here, to avoid race conditions, the update of the lock table must be performed atomically with the update of a waiter's state. Currently this is achieved by synchronizing on a common exclusive Lock, and it works well, but I'd like to implement a similar mechanism in a lock-free manner using CAS operations, making the waiter queue lock-free. That is pretty simple for step 1, but problematic for step 2.
Since DCAS isn't supported by Java, does anyone know how an update of a queue node (changing the waiter's state and marking it for deletion) can be performed atomically with an update of another object (the lock table), using only CAS? I'm not asking for code, just some hints or approaches that could help.
You can try to implement a multi-word CAS on top of the single-word CAS that is available; however, this depends on how much of a performance hit you are willing to take in order to achieve it.
You can look at Harris, Fraser, and Pratt's paper, "A Practical Multi-Word Compare-and-Swap Operation": http://www.cl.cam.ac.uk/research/srg/netos/papers/2002-casn.pdf
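Short of full MCAS, one common trick, when the two words that must change together can be snapshotted, is to put both behind a single AtomicReference and CAS that reference. A minimal sketch (all names are mine; the cost is one allocation per update):

import java.util.concurrent.atomic.AtomicReference;

final class TwoWordCas {
    // Immutable snapshot of the two logical "words" (hypothetical fields).
    record State(long lockTableWord, int waiterState) {}

    private final AtomicReference<State> state =
            new AtomicReference<>(new State(0L, 0));

    // Succeeds only if both words still hold the expected values; the caller
    // retries on failure, as with any CAS loop.
    boolean transition(long expectTable, int expectWaiter,
                       long newTable, int newWaiter) {
        State cur = state.get();
        if (cur.lockTableWord() != expectTable || cur.waiterState() != expectWaiter) {
            return false;
        }
        return state.compareAndSet(cur, new State(newTable, newWaiter));
    }
}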
Related
I am trying to understand how Guava Cache achieves thread-safety in its get() method, and also how it ensures that if two threads simultaneously invoke get() and the value is not present, only one thread will compute the value while the rest wait. Also, once that thread is done computing the value, the rest of the threads can use it.
This SO answer says Guava does this via a combination of ReentrantLock and ConcurrentHashMap.
I can intuitively understand how to achieve mutual exclusion via the use of locks.
But what I can't understand is how, after the value is computed, the other threads get to use it without locking. The only way I can think of to implement this is via double-checked locking, but that seems too slow, since after the value is computed all threads would still have to lock once.
What I mean is the following:
Check to see if Item X is cached.
If missing, tryForLock() on the underlying data structure (most likely some sort of HashMap).
If the lock is acquired, check whether the value has been computed (i.e., whether another thread already computed it).
If not, compute the value.
Put it in the underlying cache data structure.
Release the lock.
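Spelled out in Java, those steps might look roughly like this (a sketch only: a plain ReentrantLock stands in for tryForLock(), and compute is a placeholder):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

final class DoubleCheckedCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Function<K, V> compute;

    DoubleCheckedCache(Function<K, V> compute) { this.compute = compute; }

    V get(K key) {
        V value = cache.get(key);           // 1. check the cache lock-free
        if (value != null) return value;
        lock.lock();                        // 2. lock the structure
        try {
            value = cache.get(key);         // 3. re-check under the lock
            if (value == null) {
                value = compute.apply(key); // 4. compute
                cache.put(key, value);      // 5. publish
            }
            return value;
        } finally {
            lock.unlock();                  // 6. release
        }
    }
}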
Now let's say the above code is invoked by three threads A, B, and C: only one thread can get the lock when it tries for it; the rest will be waiting for the lock. Once the thread that got the lock has computed the value, it releases the lock.
The other two threads will then acquire the lock, one after the other. Each will check whether the value has been computed, find that it has, use it without computing it themselves, and release the lock. This all happens sequentially.
Is this what Guava Cache does? Or is there a better way to implement this pattern?
Yes, that is basically the pattern, except that no locking is needed if the value is already present. The cost of the computation is much greater than the penalty of locking (millis or seconds vs. nanos).
A lookup of the entry is lock-free, so this quickly identifies whether it is absent, present, or in-flight.
If absent, then the threads race for the segment lock to insert an in-flight entry:
if winner, then insert an in-flight entry, unlock, and compute;
otherwise find the entry, unlock, and wait on it.
If the entry is found:
if in-flight, then the thread blocks on it, waiting for the result;
otherwise the entry's value is returned immediately.
A lock is cheap as long as it is not on a hot path that serializes all of the threads. Since the in-flight entry quickly becomes visible in the map and is looked up lock-free, the segment holding multiple entries is not locked excessively. Descheduling the threads that wait on an entry frees resources for other application work.
ConcurrentHashMap added native support for this with computeIfAbsent. It reuses the hashbin's lock (a fine-grained segment) and follows a similar pattern. Since a computing thread locks the hashbin, which holds multiple entries, there is a higher chance of collisions causing unnecessary stalls. That approach has nice linearization and memory-efficiency benefits, but it can be more problematic for long-running computations. In those cases one can emulate Guava's approach by storing a Future<V> as the map's value. Caffeine, a rewrite of Guava's cache, offers this through its AsyncCache.
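A minimal sketch of that Future-based variant, essentially the Memoizer idiom from Java Concurrency in Practice (names are mine; a production version would also remove failed futures so errors aren't cached):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

final class FutureCache<K, V> {
    private final ConcurrentHashMap<K, Future<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    FutureCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) throws InterruptedException, ExecutionException {
        FutureTask<V> task = new FutureTask<>(() -> loader.apply(key));
        // The in-flight entry is inserted atomically; only one task is stored.
        Future<V> f = map.computeIfAbsent(key, k -> task);
        if (f == task) {
            task.run();  // we won the race: compute outside the hashbin lock
        }
        return f.get();  // losers (and later callers) block until it's ready
    }
}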
The way the Lock interface with ReentrantLock(true) works is that it uses a BlockingQueue to store the threads that want to acquire the lock. That way, a thread that 'came first goes out first', i.e. FIFO. All clear about that.
But where do 'unfair' locks, i.e. ReentrantLock(false), go? What is their internal implementation? How does the OS decide which thread to pick next? And most importantly, are these threads also stored in a queue, or somewhere else? (They must be somewhere.)
The class ReentrantLock does not use a BlockingQueue. It uses a non-public subclass of AbstractQueuedSynchronizer behind the scenes.
The AbstractQueuedSynchronizer class, as its documentation states, maintains “a first-in-first-out (FIFO) wait queue”. This data structure is the same for fair and unfair locks. The unfairness doesn’t imply that the lock would change the order of enqueued waiting threads, as there would be no advantage in doing that.
The key difference is that an unfair lock allows a lock attempt to succeed immediately when the lock just has been released, even when there are other threads waiting for the lock for a longer time. In that scenario, the queue is not even involved for the overtaking thread. This is more efficient than adding the current thread to the queue and putting it into the wait state while removing the longest waiting thread from the queue and changing its state to “runnable”.
When the lock is not available by the time a thread tries to acquire it, the thread will be added to the queue, and at this point there is no difference between fair and unfair locks for it (except that other threads may overtake it without getting enqueued). Since the order has not been specified for an unfair lock, it could use a LIFO data structure behind the scenes, but it's obviously simpler to have just one implementation for both.
For synchronized, on the other hand, which does not support fair acquisition, some JVM implementations use a LIFO structure. This may change from one version to another (or even within the same version, as a side effect of some JVM options or environmental aspects).
Another interesting point in this regard is that the parameterless tryLock() of the ReentrantLock implementation will be unfair, even when the lock is otherwise in fair mode. This demonstrates that unfairness is not a property of the waiting queue here, but of the treatment of an arriving thread that makes a new lock attempt:
Even when this lock has been set to use a fair ordering policy, a call to tryLock() will immediately acquire the lock if it is available, whether or not other threads are currently waiting for the lock. This "barging" behavior can be useful in certain circumstances, even though it breaks fairness.
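A small demonstration of that barging behavior (the outcome is a race, so tryLock() may or may not win on any given run):

import java.util.concurrent.locks.ReentrantLock;

public class BargingDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock fair = new ReentrantLock(true);  // fair ordering policy
        fair.lock();                                   // main holds the lock
        Thread waiter = new Thread(fair::lock);
        waiter.start();
        while (fair.getQueueLength() == 0) {
            Thread.onSpinWait();                       // wait until it is enqueued
        }
        fair.unlock();                                 // waiter becomes eligible
        boolean barged = fair.tryLock();               // may overtake the waiter
        System.out.println("barged ahead of queued thread: " + barged);
        if (barged) fair.unlock();                     // let the waiter proceed
        waiter.join();
    }
}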
I have a situation where, out of multiple instances of a class, two undergo an operation that changes the state of both. Since this is a multithreaded application, I want to ensure that while that particular piece of code is executing, every other thread trying to access either of the two instances is kept waiting.
Using synchronized or a Lock we can acquire a lock on a single instance, but nesting synchronized blocks on two objects isn't a great idea either:
synchronized (obj1) {
    synchronized (obj2) {
        // ...
    }
}
Another potential problem is that there could be cases where, even though the inner object obj2 is free, a thread keeps waiting simply because the outer object is locked.
What could be the best possible solution to this problem?
Nested locks introduce a risk of deadlock when two different threads acquire the locks in different orders.
There is no universal approach to eliminating such deadlocks. Possible solutions include:
Maintain a total order over the lock objects, and acquire locks only in ascending order (or only in descending order) of that ordering; a sketch follows below.
This is a good solution for cases when both objects to be locked are known up front, without needing to lock one of them first.
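For example, ordering by identity hash code with a global tie-breaker for the rare collision, as in the classic induce-a-lock-order recipe (names are mine):

final class OrderedLocking {
    private static final Object TIE_BREAKER = new Object();

    static void doAtomically(Object a, Object b, Runnable action) {
        int ha = System.identityHashCode(a);
        int hb = System.identityHashCode(b);
        if (ha < hb) {
            synchronized (a) { synchronized (b) { action.run(); } }
        } else if (ha > hb) {
            synchronized (b) { synchronized (a) { action.run(); } }
        } else {
            // identityHashCode collision: fall back to a global tie-breaker
            // so two threads can't take a and b in opposite orders
            synchronized (TIE_BREAKER) {
                synchronized (a) { synchronized (b) { action.run(); } }
            }
        }
    }
}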
Use the tryLock mechanism for the inner lock.
tryLock returns false immediately if the object is already locked by another thread, so the program has to handle that case somehow, typically by releasing the outer lock, backing off, and retrying, as in the sketch below.
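A sketch of that retry loop (names are mine; the random back-off reduces the chance of livelock):

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReentrantLock;

final class TryLockBoth {
    static void withBothLocks(ReentrantLock l1, ReentrantLock l2,
                              Runnable action) throws InterruptedException {
        while (true) {
            l1.lockInterruptibly();
            try {
                if (l2.tryLock()) {
                    try {
                        action.run();
                        return;
                    } finally {
                        l2.unlock();
                    }
                }
            } finally {
                l1.unlock();
            }
            // both locks released here; sleep briefly before retrying
            Thread.sleep(ThreadLocalRandom.current().nextInt(1, 10));
        }
    }
}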
I've got one main thread that starts up other threads. Those other threads ask for jobs to be done, and the main thread makes jobs available for them to see and do.
The job to be done is to set indexes in a huge boolean array to true. They are false by default, and the other threads can only ever set them to true, never back to false. Different jobs may involve setting the same indexes to true.
The main thread finds new jobs depending on two things:
The values in the huge boolean array.
Which jobs have already been done.
How do I make sure the main thread reads fresh values from the huge boolean array?
I can't have the updates of the array go through a synchronized method, because that's pretty much all the other threads do, and so I would get essentially sequential performance.
Let's say the other threads update the huge boolean array by setting many of its indexes to true through a non-synchronized function. How can I make sure the main thread reads those updates rather than stale values cached locally by the thread? Is there any way to make the workers "push" the updates? I'm guessing the main thread should just use a synchronized method to "get" the updates?
For the really complete answer to your question, you should open up a copy of the Java Language Spec, and search for "happens before".
When the JLS says that A "happens before" B, it means that in a valid implementation of the Java language, A is required to actually happen before B. The spec says things like:
If some thread updates a field and then releases a lock (e.g., leaves a synchronized block), the update "happens before" the lock is released.
If some thread releases a lock and some other thread subsequently acquires the same lock, the release "happens before" the acquisition.
If some thread acquires a lock and then reads a field, the acquisition "happens before" the read.
Since "happens before" is a transitive relationship, you can infer that if thread A updates some variables inside a synchronized block and then thread B examines the variables in a block that is synchronized on the same object, then thread B will see what thread A wrote.
Besides entering and leaving synchronized blocks, there are lots of other events (constructing objects, wait()ing/notify()ing objects, start()ing and join()ing threads, reading and writing volatile variables) that allow you to establish "happens before" relationships between threads.
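For example, the volatile rule alone gives you flag-style publication (hypothetical fields):

class Publisher {
    int data;                  // plain, non-volatile field
    volatile boolean ready;    // the volatile "guard"

    void writer() {
        data = 42;
        ready = true;          // publishes the write to 'data' as well
    }

    void reader() {
        if (ready) {
            // the write to 'data' happens-before the volatile write to
            // 'ready', which happens-before a volatile read that sees true
            System.out.println(data);  // guaranteed to print 42
        }
    }
}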
It's not a quick solution to your problem, but the chapter is worth reading.
...the main thread will make jobs available for the other threads to see and do...
I can't have the update of the array be through a synchronized method, because that's pretty much all the other threads do, and ...
Sounds like you're saying that each worker thread can only do a trivial amount of work before it must wait for further instructions from the main() thread. If that's true, then most of the workers are going to be waiting most of the time. You'd probably get better performance if you just do all of the work in a single thread.
Assuming that your goal is to make the best use of the available cycles on a multi-processor machine, you will need to partition the work in some way that lets each worker thread go off and do a significant chunk of it before it needs to synchronize with any other thread.
I'd use another design pattern. For instance, you could add the indexes of the boolean values to a Set as they're turned on, and then synchronize access to that Set. Then you can use wait/notify to wake up the main thread.
First of all, don't use boolean arrays in general; use BitSets. See this: boolean[] vs. BitSet: Which is more efficient?
In this case you need an atomic BitSet, so you can't use the java.util.BitSet, but here is one: AtomicBitSet implementation for java
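If you'd rather avoid an external dependency, here is a minimal sketch of the same idea (not the linked implementation): pack the bits into an AtomicLongArray and set them with a CAS loop so no update is ever lost.

import java.util.concurrent.atomic.AtomicLongArray;

final class AtomicBitSet {
    private final AtomicLongArray words;

    AtomicBitSet(int nbits) {
        words = new AtomicLongArray((nbits + 63) >>> 6); // 64 bits per long
    }

    void set(int bit) {
        int word = bit >>> 6;
        long mask = 1L << bit;             // << uses the low 6 bits of 'bit'
        long old;
        do {
            old = words.get(word);
            if ((old & mask) != 0) return; // already set
        } while (!words.compareAndSet(word, old, old | mask));
    }

    boolean get(int bit) {                 // volatile read: always fresh
        return (words.get(bit >>> 6) & (1L << bit)) != 0;
    }
}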
You could instead model this as message passing rather than mutating shared state. In your description the workers never read the boolean array and only write the completion status. Have you considered using a pending job queue that workers consume from and a completion queue that the master reads? The job status fields can be efficiently maintained by the master thread without any shared state concerns. Depending on your needs, you can use either blocking or non-blocking queues.
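A rough sketch of that shape (all names and types are mine): the master owns the boolean array outright, and workers only take jobs and report back which indexes to set, so nothing is shared and no synchronization on the array is needed.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueBasedDesign {
    record Job(int id, int[] candidates) {}
    record Completion(int jobId, int[] indexesToSet) {}

    final BlockingQueue<Job> pending = new LinkedBlockingQueue<>();
    final BlockingQueue<Completion> done = new LinkedBlockingQueue<>();
    final boolean[] state = new boolean[1 << 20]; // touched by the master only

    void workerLoop() throws InterruptedException {
        while (true) {
            Job job = pending.take();             // block until work arrives
            // ... do the real work, deciding which indexes to set ...
            done.put(new Completion(job.id(), job.candidates()));
        }
    }

    void masterStep() throws InterruptedException {
        Completion c = done.take();               // fresh by queue semantics
        for (int i : c.indexesToSet()) {
            state[i] = true;                      // single writer, no races
        }
        // ... inspect 'state' and the finished job, enqueue new Jobs ...
    }
}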
Assuming that I have the following code:
final Catalog catalog = createCatalog();
for (int i = 0; i < 100; i++) {
    new Thread(new CatalogWorker(catalog)).start();
}
"Catalog" is an object structure, and the method createCatalog() and the "Catalog" object structure has not been written with concurrency in mind. There are several non-final, non-volatile references within the product catalog, there may even be mutable state (but that's going to have to be handled)
The way I understand the memory model, this code is not thread-safe. Is there any simple way to make it safe ? (The generalized version of this problem is really about single-threaded construction of shared structures that are created before the threads explode into action)
No, there's no simple way to make it safe. Concurrent use of mutable data types is always tricky. In some situations, making each operation on Catalog synchronized (preferably on a privately-held lock) may work, but usually you'll find that a thread actually wants to perform multiple operations without risking any other threads messing around with things.
Just synchronizing every access to variables should be enough to make the Java memory model problem less relevant - you would always see the most recent values, for example - but the bigger problem itself is still significant.
Any immutable state in Catalog should be fine already: there's a "happens-before" between the construction of the Catalog and the new thread being started. From section 17.4.5 of the spec:
A call to start() on a thread happens-before any actions in the started thread.
(And the construction finishing happens before the call to start(), so the construction happens before any actions in the started thread.)
You need to synchronize every method that changes the state of Catalog to make it thread-safe.
public synchronized <return type> method(<parameter list>) {
    ...
}
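For instance, a hypothetical Catalog fragment, just to make the shape concrete:

import java.util.HashMap;
import java.util.Map;

class Catalog {
    private final Map<String, String> entries = new HashMap<>();

    // every method touching the mutable state synchronizes on the instance
    public synchronized void put(String id, String description) {
        entries.put(id, description);
    }

    public synchronized String find(String id) {
        return entries.get(id);
    }
}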
Assuming you handle the "non-final, non-volatile references [and] mutable state" (presumably by not actually mutating anything while these threads are running) then I believe this is thread-safe. From the JSR-133 FAQ:
When one action happens before another, the first is guaranteed to be ordered before and visible to the second. The rules of this ordering are as follows:
Each action in a thread happens before every action in that thread that comes later in the program's order.
An unlock on a monitor happens before every subsequent lock on that same monitor.
A write to a volatile field happens before every subsequent read of that same volatile.
A call to start() on a thread happens before any actions in the started thread.
All actions in a thread happen before any other thread successfully returns from a join() on that thread.
Since the threads are started after the call to createCatalog, the result of createCatalog will be visible to those threads without any problems. Only changes to the Catalog objects made after start() has been called on the threads would cause trouble.