Is this a proper customized synchronizer? - java

I had a strong need for a synchronizer similar to a CountDownLatch, but the starting number for the countdown is unknown. To add context, if I'm going through a buffered recordset (say from a text file or a query) and kicking off a runnable for each record, but I don't know how many records there will be... I need a synchronizer that signals when the iteration is complete and all runnables are complete.
This is the synchronizer I came up with... a BufferedLatch. A method is called in the iteration loop for each record incrementing the recordSetSize. At the end of each runnable kicked off for each record, the processedRecordSetSize is incremented. When the iteration through all records is complete (but runnables may still be in queue), the setDownloadComplete() method is called letting the BufferedLatch know the recordSetSize is now fixed. The await() method waits for the iterationComplete variable to be true (recordsetSize is now fixed) and recordsetSize == processedRecordSetSize;
Is this an optimal implementation of this synchronizer? Is the synchronization holding back concurrency that could otherwise be exploited? Although testing seems to work fine, are there any gotchas I'm overlooking?
import java.util.concurrent.atomic.AtomicInteger;
public final class BufferedLatch {
/** A customized synchronizer built for concurrent iteration processes where the number of objects to be iterated is unknown
* and a runnable will be kicked off for each object, and the await() method will wait for all runnables to be complete
*/
private final AtomicInteger recordsetSize = new AtomicInteger(0);
private final AtomicInteger processedRecordsetSize = new AtomicInteger(0);
private volatile boolean iterationComplete = false;
public int incrementRecordsetSize() throws Exception {
if (iterationComplete) {
throw new Exception("Cannot increase recordsize after download is flagged complete!");
}
else {
return recordsetSize.incrementAndGet();
}
}
public void incrementProcessedRecordSize() {
synchronized(this) {
processedRecordsetSize.incrementAndGet();
if (iterationComplete) {
if (processedRecordsetSize.get() == recordsetSize.get()) {
this.notifyAll();
}
}
}
}
public void setDownloadComplete() {
synchronized(this) {
iterationComplete = true;
}
}
public void await() throws InterruptedException {
while (! (iterationComplete && (processedRecordsetSize.get() == recordsetSize.get()))) {
synchronized(this) {
while (! (iterationComplete && (processedRecordsetSize.get() == recordsetSize.get()))) {
this.wait();
}
}
}
}
}
UPDATE-- NEW CODE
public final class BufferedLatch {
/** A customized synchronizer built for concurrent iteration processes where the number of objects to be iterated is unknown
* and a runnable will be kicked off for each object, and the await() method will wait for all runnables to be complete
*/
private int recordCount = 0;
private int processedRecordCount = 0;
private boolean iterationComplete = false;
public synchronized void incrementRecordCount() throws Exception {
if (iterationComplete) {
throw new Exception("Cannot increase recordCount after download is flagged complete!");
}
else {
recordCount++;
}
}
public synchronized void incrementProcessedRecordCount() {
processedRecordCount++;
if (iterationComplete && recordCount == processedRecordCount) {
this.notifyAll();
}
}
public synchronized void setIterationComplete() {
iterationComplete = true;
if (iterationComplete && recordCount == processedRecordCount) {
this.notifyAll();
}
}
public synchronized void await() throws InterruptedException {
while (! (iterationComplete && (recordCount == processedRecordCount))) {
this.wait();
}
}
}

Probably not. I think conceptually you're onto something here, as it looks like your application needs something more than just a CountDownLatch. However, the implementation seems to have several problems.
First, I note that it looks odd to mix atomics/volatiles AND ordinary object monitor locks (synchronized). While there may be proper uses that mix these different constructs, mixing in this case I believe will lead to errors.
Consider incrementRecordsetSize() which first checks iterationComplete and only if it's false does it increment recordsetSize. The iterationComplete variable is volatile so updates from other threads will be visible. However, the fact that no locking is done here allows TOCTOU race conditions (time-of-check vs time-of-use). The rule seems to be, recordsetSize must not be incremented if iterationComplete is true. Suppose thread T1 comes along and finds iterationComplete to be false, so it decides to increment recordsetSize. Before it does so, another thread T2 comes along and sets iterationComplete to be true. This would allow T1 to do the increment improperly. Worse, before it does so, suppose another thread T3 came along and called incrementProcessedRecordSize(). It would increment processedRecordsetSize and then find iterationComplete true. It further might find that processedRecordsetSize equals recordsetSize and then notify all waiters, who then proceed as if the processing is complete. But it's not, as T1 then proceeds to increment recordsetSize and presumably continues with its processing.
The problem here is that this object's state consists of the fusion of three independent pieces of state -- two int counters and a boolean -- and all three must be read and written atomically. If certain bits of logic attempt to take advantage of individual volatile or atomic properties, it introduces the possibility of race conditions such as the one I described.
I'd suggest rewriting this as a plain object with two plain ints and a boolean (not atomic, not volatile) and just lock around everything. This should certainly clear up the logic and make things easier to understand.
In incrementProcessedRecordSize I note that the condition essentially duplicates the condition in the await method. A simplifying convention is for all updates to notify and have the condition evaluated only by the waiters. This may result in some unnecessary wakeups. If this is a problem, you might consider minimizing the number of notifies, but you need to think about maintainability. If you're not careful, the wait/notify conditions will become spread across the code and will be very hard to reason about. Alternatively, you could refactor the condition into a method and call it from the different places that do waiting and notification.
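A minimal sketch of that convention, assuming the plain-int, fully locked design: every update simply notifies, and the completion condition lives in one helper that only the waiter evaluates. The class name SimpleBufferedLatch and the isDone() helper are illustrative names, not something from the question.

```java
final class SimpleBufferedLatch {
    private int recordCount = 0;
    private int processedRecordCount = 0;
    private boolean iterationComplete = false;

    // The only place where the completion condition is spelled out (hypothetical helper).
    private boolean isDone() {
        return iterationComplete && recordCount == processedRecordCount;
    }

    public synchronized void incrementRecordCount() {
        if (iterationComplete) {
            throw new IllegalStateException("iteration already flagged complete");
        }
        recordCount++;
    }

    public synchronized void incrementProcessedRecordCount() {
        processedRecordCount++;
        notifyAll(); // always notify; waiters re-check isDone() themselves
    }

    public synchronized void setIterationComplete() {
        iterationComplete = true;
        notifyAll();
    }

    public synchronized void await() throws InterruptedException {
        while (!isDone()) {
            wait();
        }
    }
}
```

The price is a few unnecessary wakeups; in exchange, the condition cannot drift apart between the notifying and the waiting code.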
It looks like await() does a complicated form of double-checked locking. Instead of testing a volatile boolean outside the lock, it tests several separate pieces of information both outside and inside the lock. This seems susceptible to TOCTOU problems (as above) but it might be safe if you can prove the state really latches, that is, that once it becomes true it never returns to false. I'd have to stare at the code for a long time before I'd be able to convince myself it's correct.
On the other hand, what does this buy you? It seems to optimize away just the taking of the lock. If you have a zillion threads that are going to come by after processing is complete, it might be worth it, but it doesn't seem like it. I'd just remove the outer while loop and check the variables within a synchronized block.
Finally, having an object that represents counters and a boolean may very well be sensible for what you're doing, but other things you've said (in the question and in comments) are that some threads are generating a workload (e.g. reading lines from a file) and other threads are retiring that workload. This implies that there is some other data structure like a queue that contains this workload, and you have a producer-consumer problem here. That other structure has to be thread-safe, of course, since multiple threads are interacting over it. But the counters and boolean in this structure need to be updated in lockstep with the updates to the workload structure, otherwise there could be race conditions between checking and updating these separate objects.
It seems to me you could replace the counters in this object with the queue and just put simple locks around everything. The producers would append to the queue until they're done, at which time they set iterationComplete to true which prevents more work from being added. The consumers pull from the queue until iterationComplete is true and the queue is empty, at which point they're done. If they find the queue empty but iterationComplete is false, they know to block while awaiting further work.
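Here is a rough sketch of that idea, assuming a single lock around a queue plus the iterationComplete flag. The class name WorkQueue and its method names are made up for illustration; take() returns null to tell a consumer that no more work will ever arrive.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class WorkQueue<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private boolean iterationComplete = false;

    // Producers append work until they flag the iteration as complete.
    public synchronized void put(T item) {
        if (iterationComplete) {
            throw new IllegalStateException("no more work may be added");
        }
        queue.add(item);
        notifyAll();
    }

    public synchronized void setIterationComplete() {
        iterationComplete = true;
        notifyAll(); // wake consumers so they can observe the end of the work
    }

    // Blocks while the queue is empty but more work may still arrive.
    // Returns null once the queue is drained and the iteration is complete.
    public synchronized T take() throws InterruptedException {
        while (queue.isEmpty() && !iterationComplete) {
            wait();
        }
        return queue.poll();
    }
}
```

Since ArrayDeque rejects null elements, a null result from take() can only mean "done", so consumers can loop until they see it.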
I'd say to stick with simple locking and avoid volatiles/atomics until you get the basics correct. If there are bottlenecks in that code, then apply optimizations selectively while preserving the same invariants.

Related

Multi-threading -- a faster way?

I have a class with a getter getInt() and a setter setInt() on a certain field, say field
Integer Int;
of an object of a class, say SomeClass.
The setInt() here is synchronized-- getInt() isn't.
I am updating the value of Int from within multiple threads.
Each thread is getting the value Int, and setting it appropriately.
The threads aren't sharing any other resources in any way.
The code executed in each thread is as follows.
public void update(SomeClass c) {
while (<condition-1>) // the conditions here and the calculation of
// k below dont have anything to do
// with the members of c
if (<condition-2>) {
// calculate k here
synchronized (c) {
c.setInt(c.getInt()+k);
// System.out.println("in "+this.toString());
}
}
}
The run() method is just invoking the above method on the members updated from within the constructor by the params passed to it:
public void run() { update(c); }
When I run this on large sequences, the threads aren't interleaving much-- i see one thread executing for long without any other thread running in between.
There must be a better way of doing this.
I can't change the internals of SomeClass, or of the class invoking the threads.
How can this be done better?
TIA.
//=====================================
EDIT:
I'm not after manipulating the execution sequence of the threads. They all have the same priority. It's just that what I see in the outcome suggests that the threads aren't sharing the execution time evenly: one of them, once it takes over, keeps executing. However, I can't see why this code should be doing this.
It's just that what I see in the outcome suggests that the threads aren't sharing the execution time evenly
Well, this is exactly what you don't want if you are after efficiency. Yanking a thread from being executed and scheduling another thread is generally very costly. Therefore it's actually advantageous for one thread, once it takes over, to keep executing. Of course, when this is overdone you could see higher throughput but longer response time. In theory. In practice, the JVM's thread scheduling is well tuned for almost all purposes, and you don't want to try changing it in almost all situations. As a rule of thumb, if you are interested in response times on the order of milliseconds, you probably want to stay away from messing with it.
tl;dr: It's not being inefficient, you probably want to leave it as it is.
EDIT:
Having said that, using an AtomicInteger may help in performance, and is in my opinion less error prone than using a lock (synchronized keyword). You need to be hitting that variable really very hard in order to get a measurable benefit though.
The JDK provides a nice solution for multi threaded int access, AtomicInteger:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/AtomicInteger.html
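For illustration only, a small sketch of several threads adding to a shared AtomicInteger without any lock. It assumes you are free to hold the counter in an AtomicInteger, which the question says is not the case for SomeClass; the class and method names are invented for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class AtomicAddDemo {
    // Starts the given number of threads; each adds 1 to the shared total addsPerThread times.
    static int run(int threads, int addsPerThread) throws InterruptedException {
        AtomicInteger total = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < addsPerThread; j++) {
                    total.addAndGet(1); // atomic read-modify-write, no synchronized needed
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join(); // join also makes the workers' updates visible here
        }
        return total.get();
    }
}
```

No update is lost, because addAndGet performs the read, the addition and the write as one atomic step.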
As Enno Shioji has pointed out, letting one thread proceed might be the most efficient way to execute your code in some scenarios.
It depends on how much cost the thread synchronization imposes in relation to the other work of your code (which we don’t know). If you have a loop like:
while (<condition-1>)
if (<condition-2>) {
// calculate k here
synchronized (c) {
c.setInt(c.getInt()+k);
}
}
and the test for condition-1 and condition-2 and the calculation of k is rather cheap compared to the synchronization cost, the Hotspot optimizer might decide to reduce the overhead by transforming the code to something like this:
synchronized (c) {
while (<condition-1>)
if (<condition-2>) {
// calculate k here
c.setInt(c.getInt()+k);
}
}
(or a rather more complicated structure by performing loop unrolling and span the synchronized block over multiple iterations). The bottom line is that the optimized code might block other threads longer but let the one owning the lock finish faster resulting in an overall faster execution.
This does not mean that a single-threaded execution was the fastest way to handle your problem. It also doesn’t mean that using an AtomicInteger here would be the best option to solve the problem. It would create a higher CPU load and possibly a small acceleration but it doesn’t solve your real mistake:
It is completely unnecessary to update c within the loop at a high frequency. After all, your threads do not depend on seeing updates to c timely. It even looks like they are not using it at all. So the correct fix would be to move the update out of the loop:
int kTotal=0;
while (<condition-1>)
if (<condition-2>) {
// calculate k here
kTotal += k;
}
synchronized (c) {
c.setInt(c.getInt()+kTotal);
}
Now, all threads can run in parallel (assuming the code you haven’t posted here doesn’t contain inter-thread dependencies) and the synchronization cost is reduced to a minimum. You could still change it to an AtomicInteger as well but that’s not that important anymore.
Answering to this
i see one thread executing for long without any other thread running in between.
There must be a better way of doing this.
You can not control how threads will be executed. JVM does this for you, and does not like you to interfere in its work.
Still you can look at yield as your option, but that also does not ensure same thread will not be picked again.
The java.lang.Thread.yield() method causes the currently executing thread object to temporarily pause and allow other threads to execute.
I've found it better to use wait() and notify() than yield. Check out this example (seen from a book)-
class Q {
    int n;
    boolean valueSet = false;
    synchronized int get() throws InterruptedException {
        while (!valueSet) // loop guards against spurious wakeups
            wait();
        valueSet = false;
        notifyAll(); // wake a thread waiting in put(), if any
        return n;
    }
    synchronized void put(int n) throws InterruptedException {
        while (valueSet) // wait until the previous value has been consumed
            wait();
        this.n = n;
        valueSet = true;
        notifyAll(); // wake a thread waiting in get(), if any
    }
}
or you could try using sleep() and join the threads in the end in main() but that isn't a foolproof way
Your code has an instance method public void update(SomeClass c) that receives the object as a parameter.
synchronized(c) has no effect if every thread passes a different SomeClass instance, because a lock only excludes threads that synchronize on the same object. Let me show you with an example.
If you create several objects of this class and start them as separate threads, like this,
class A extends Thread {
    SomeClass c = new SomeClass(); // each thread object has its own instance
    public void update(SomeClass c) {
        synchronized (c) {
        }
    }
    public void run() {
        update(c);
    }
    public static void main(String[] args) {
        A t1 = new A();
        A t2 = new A();
        t1.start();
        t2.start();
    }
}
then t1 and t2 each have their own c, so synchronized(c) locks a different object in each thread. t1 calls update() with its own object and t2 with its own, so the two threads never contend for the same lock and the synchronization achieves nothing.
Synchronization will work when you have something common for both the threads.
Something like,
class A extends Thread {
    static SomeClass c = new SomeClass(); // one instance shared by all threads
    public void update() {
        synchronized (c) {
            // every thread locks the same object here
        }
    }
    public void run() {
        update();
    }
    public static void main(String[] args) {
        A t1 = new A();
        A t2 = new A();
        t1.start();
        t2.start();
    }
}
This way the actual concept of synchronization will be applied.

How can I prevent two operations from interleaving with each other whilst still allowing concurrent execution?

I have two methods, foo() and bar(). There will be multiple threads calling these methods, possibly at the same time. It is potentially troublesome if foo() and bar() are run concurrently, as the interleaving of their internal logic can leave the system in an inconsistent state. However, it is perfectly OK, and in fact desirable, for multiple threads to be able to call foo() at the same time, and for multiple threads to be able to call bar() at the same time. The final condition is that foo() is expected to return as soon as possible, whereas there is no hard time constraint on bar().
I have been considering various ways in which it might be best to control this behaviour. Using synchronized in its simplest form doesn't work because this will block concurrent calls to each method. At first I thought ReadWriteLock might be a good fit, but this would only allow one of the methods to be called concurrently with itself. Another possibility I considered was queuing up requests for these methods on two separate queues and having a consumer which will concurrently execute every foo() in the queue, and then every bar() in the queue, but this seems like it would be difficult to tune so as to avoid unnecessary blocking of foo().
Any suggestions?
I think a good solution would be to make a separate class that controlled access to each of the methods. You would create a singleton of this class, and then use it to control when it is OK to proceed with entering either method.
This is the third iteration. This one prevents starvation.
Usage could be external to the foo() call:
em.enterFoo(Thread.currentThread());
foo();
em.exitFoo();
but would probably be cleaner as calls at the entry and exit of foo() instead, if possible.
Code:
public static class EntryManager
{
private int inFoo = 0;
private int inBar = 0;
private Queue<Thread> queue = new LinkedList<>();
public synchronized void enterBar(Thread t) throws InterruptedException
{
// Place the Thread on the queue
queue.add(t);
while(queue.peek() != t)
{
// Wait until the passed Thread is at the head of the queue.
this.wait();
}
while(inFoo > 0)
{
// Wait until there is no one in foo().
this.wait();
}
// There is no one in foo. So this thread can enter bar.
// Remove the thread from the queue.
queue.remove();
inBar++;
// Wakeup everyone.
this.notifyAll();
}
public synchronized void enterFoo(Thread t) throws InterruptedException
{
// Place the thread on the queue
queue.add(t);
while(queue.peek() != t)
{
// Wait until the passed Thread is at the head of the queue.
this.wait();
}
while(inBar > 0)
{
this.wait();
}
// There is no one in bar. So this thread can enter foo.
// Remove the thread from the queue.
queue.remove();
inFoo++;
// Wakeup everyone.
this.notifyAll();
}
public synchronized void exitBar()
{
inBar--;
// Wakeup everyone.
this.notifyAll();
}
public synchronized void exitFoo()
{
inFoo--;
// Wakeup everyone.
this.notifyAll();
}
}
I don't know of a name for that problem, so I would write my own synchronization helper object to deal with it. It sounds a lot like a reader/writer lock, except that where a reader/writer lock allows any number of readers at the same time, or exactly one writer, but not both; your lock would allow any number of foo() or any number of bar(), but not both.
The tricky part is going to be ensuring that the lock is fair. No problem if there's no contention, but what if the lock is in "foo" mode, and there's a steady stream of threads that want to call foo(), and just one or two that want to call bar(). How do the bar() threads ever get to run?
Actually, it reminds me a lot of a traffic light at a busy highway intersection. The traffic light can allow cars to flow on the east/west route, or on the north/south route, but not both. You don't want the light to switch too often and just let one or two cars through per cycle because that would be inefficient. But you also don't want the light to make drivers wait so long that they get angry.
I've got a feeling that the policy may have to be custom-tailored for your particular application. I.e., it may depend on how often the two functions are called, whether they are called in bursts, etc.
I would start from the source code of a reader/writer lock, and try to hack it up until it worked for me.

implement-your-own blocking queue in java

I know this question has been asked and answered many times before, but I just couldn't figure out a trick on the examples found around internet, like this or that one.
Both of these solutions check for emptiness of the blocking queue's array/queue/linkedlist to notifyAll waiting threads in put() method and vice versa in get() methods. A comment in the second link emphasizes this situation and mentions that that's not necessary.
So the question is this: it also seems a bit odd to me to check whether the queue is empty or full before notifying all waiting threads. Any ideas?
Thanks in advance.
I know this is an old question by now, but after reading the question and answers I couldn't help myself. I hope you find this useful.
Regarding checking whether the queue is actually full or empty before notifying other waiting threads: put(T t) and T get() are both synchronized methods of the same object, so they share one monitor and only one thread can be executing inside either of them at a time. A second thread can only get in while the first one is blocked in wait(), which releases the monitor. The check therefore does not protect against interleaving; it merely skips the notifyAll() call when no thread can be waiting, since threads only wait in put() when the queue is full and in get() when it is empty.
A better and a more recommended approach is to use Reentrant Locks and Conditions:
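// I've edited the source code from this link
import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class BQueue<T> {
    private final Queue<T> q = new LinkedList<>();
    private final int limit;
    private final Lock lock;
    private final Condition isFullCondition;  // producers wait here while the queue is full
    private final Condition isEmptyCondition; // consumers wait here while the queue is empty

    public BQueue() {
        this(Integer.MAX_VALUE);
    }

    public BQueue(int limit) {
        this.limit = limit;
        lock = new ReentrantLock();
        isFullCondition = lock.newCondition();
        isEmptyCondition = lock.newCondition();
    }

    private boolean isFull() {
        return q.size() >= limit;
    }

    private boolean isEmpty() {
        return q.isEmpty();
    }

    public void put(T t) {
        lock.lock();
        try {
            while (isFull()) {
                try {
                    isFullCondition.await();
                } catch (InterruptedException ex) {} // ignored here; real code should propagate it
            }
            q.add(t);
            isEmptyCondition.signalAll(); // only consumers are woken
        } finally {
            lock.unlock();
        }
    }

    public T get() {
        T t = null;
        lock.lock();
        try {
            while (isEmpty()) {
                try {
                    isEmptyCondition.await();
                } catch (InterruptedException ex) {} // ignored here; real code should propagate it
            }
            t = q.poll();
            isFullCondition.signalAll(); // only producers are woken
        } finally {
            lock.unlock();
        }
        return t;
    }
}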
With this approach there's no need for the extra check, because each reason for waiting has its own Condition: threads waiting because the queue is full are signalled only when space becomes available, and threads waiting because the queue is empty are signalled only when an element arrives. As with synchronized methods, the shared lock lets only one thread execute put() or get() at a time, but no thread is woken up needlessly, which leads to better CPU utilization.
you can find more detailed example with source code here
I think logically there is no harm doing that extra check before notifyAll().
You can simply notifyAll() once you put/get something from the queue. Everything will still work, and your code is shorter. However, there is also no harm checking if anyone is potentially waiting (by checking if hitting the boundary of queue) before you invoke notifyAll(). This extra piece of logic saves unnecessary notifyAll() invocations.
It just depends on whether you want shorter and cleaner code or code that runs more efficiently. (I haven't looked into notifyAll()'s implementation. If it is a cheap operation when no one is waiting, the performance gain from the extra check may not be noticeable anyway.)
The reason why the authors used notifyAll() is simple: they had no clue whether or not it was necessary, so they decided for the "safer" option.
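In the above example it would be sufficient to just call notify(), since each single element added can serve only one waiting thread, provided that all threads waiting on the monitor are waiting for the same reason. With one monitor shared by producers and consumers, notify() may wake a thread of the wrong kind, which is one reason notifyAll() is the conservative choice.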
This becomes more obvious, if your queue as well has the option to add multiple elements in one step like addAll(Collection<T> list), as in this case more than one thread waiting on an empty list could be served, to be exact: as many threads as elements have been added.
The notifyAll() however causes an extra overhead in the special single-element case, as many threads are woken up unnecessarily and therefore have to be put to sleep again, blocking queue access in the meantime. So replacing notifyAll() with notify() would improve speed in this special case.
Then again, dropping wait/notify and synchronized altogether in favour of the java.util.concurrent package would likely improve speed by a lot more than any smart wait/notify implementation could ever achieve.
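As a hedged illustration of the concurrent-package route, here is a small sketch built on java.util.concurrent.ArrayBlockingQueue, whose put() and take() already block when the queue is full or empty. The demo class, its method name and the capacity of 2 are arbitrary choices for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BlockingQueueDemo {
    // Sends the numbers 1..items through a small bounded queue and sums them on the consumer side.
    static int sumThroughQueue(int items) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2); // tiny capacity forces blocking
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= items; i++) {
                    queue.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        for (int i = 0; i < items; i++) {
            sum += queue.take(); // blocks while the queue is empty
        }
        producer.join();
        return sum;
    }
}
```

There is no wait(), notify() or explicit lock in user code; the queue handles all of the blocking and signalling internally.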
I would like to write a simple blocking queue implementation which will help the people to understand this easily. This is for someone who is novice to this.
class BlockingQueue {
private List queue = new LinkedList();
private int limit = 10;
public BlockingQueue(int limit){
this.limit = limit;
}
public synchronized void enqueue(Object ele) throws InterruptedException {
while(queue.size() == limit)
wait();
if(queue.size() == 0)
notifyAll();
// add
queue.add(ele);
}
public synchronized Object deque() throws InterruptedException {
while (queue.size() == 0)
wait();
if(queue.size() == limit)
notifyAll();
return queue.remove(0);
}
}

Java class as a Monitor

I need to write a Java program, but I need some advice before starting on my own.
The program I will be writing is to do the following:
Simulate a shop that takes advance orders for donuts
The shop would not take further orders once 5000 donuts have been ordered
OK, I am kind of stuck on whether I should write the Java class to act as a monitor or use the Java Semaphore class instead.
Please advise me. Thanks for the help.
Any java object can work as a monitor via the wait/notify methods inherited from Object:
Object monitor = new Object();
// thread 1
synchronized(monitor) {
monitor.wait();
}
// thread 2
synchronized(monitor) {
monitor.notify();
}
Just make sure to hold the lock on the monitor object when calling these methods (don't worry about the wait, the lock is released automatically to allow other threads to acquire it). This way, you have a convenient mechanism for signalling among threads.
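One caveat worth sketching: a bare wait() can miss a notify() that happened earlier, and it may also wake up spuriously. A common pattern, shown here under the assumption that a simple boolean flag describes the event being signalled, is to guard the wait with a condition checked in a loop. The class name Signal and the flag ready are illustrative.

```java
final class Signal {
    private final Object monitor = new Object();
    private boolean ready = false; // the state the waiting thread actually cares about

    // Blocks until signalReady() has been called, even if it was called earlier.
    void awaitReady() throws InterruptedException {
        synchronized (monitor) {
            while (!ready) {
                monitor.wait(); // releases the lock while waiting
            }
        }
    }

    void signalReady() {
        synchronized (monitor) {
            ready = true;
            monitor.notifyAll();
        }
    }
}
```

Because the flag is checked under the lock, it does not matter whether the signalling thread or the waiting thread gets there first.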
It seems to me like you are implementing a bounded producer-consumer queue. In this case:
The producer will keep putting items in a shared queue.
If the queue size reaches 5000, it will call wait on a shared monitor and go to sleep.
When it puts an item, it will call notify on the monitor to wake up the consumer if it's waiting.
The consumer will keep taking items from the queue.
When it takes an item, it will call notify on the monitor to wake up the producer.
If the queue size reaches 0 the consumer calls wait and goes to sleep.
For an even simpler approach, have a look at the various implementations of BlockingQueue, which provide the above features out of the box!
It seems to me that the core of this exercise is updating a counter (number of orders taken), in a thread-safe and atomic fashion. If implemented incorrectly, your shop could end up taking more than 5000 pre-orders due to missed updates and possibly different threads seeing stale values of the counter.
The simplest way to update a counter atomically is to use synchronized methods to get and increment it:
class DonutShop {
private int ordersTaken = 0;
public synchronized int getOrdersTaken() {
return ordersTaken;
}
public synchronized void increaseOrdersBy(int n) {
ordersTaken += n;
}
// Other methods here
}
The synchronized methods mean that only one thread can be calling either method at any time (and they also provide a memory barrier to ensure that different threads see the same value rather than locally cached ones which may be outdated). This ensures a consistent view of the counter across all threads in your application.
(Note that I didn't provide a "set" method but an "increment" method. The problem with "set" is that if a client has to call shop.set(shop.get() + 1);, another thread could have incremented the value between the calls to get and set, so this update would be lost. By making the whole increment operation atomic - because it's in the synchronized block - this situation cannot occur.)
In practice I would probably use an AtomicInteger instead, which is basically a wrapper around an int to allow for atomic queries and updates, just like the DonutShop class above. It also has the advantage that it's more efficient in terms of minimising exclusive blocking, and it's part of the standard library so will be more immediately familiar to other developers than a class you've written yourself.
In terms of correctness, either will suffice.
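A possible AtomicInteger variant, sketched under the assumption that the shop should refuse an order that would push the total past 5000. The class name AtomicDonutShop, the constant MAX_ORDERS and the method tryOrder are made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class AtomicDonutShop {
    private static final int MAX_ORDERS = 5000; // cap taken from the question
    private final AtomicInteger ordersTaken = new AtomicInteger();

    // Tries to book n donuts; returns false if that would exceed the cap.
    boolean tryOrder(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        while (true) {
            int current = ordersTaken.get();
            if (n > MAX_ORDERS - current) {
                return false; // not enough capacity left
            }
            // Succeeds only if no other thread changed the counter in the meantime.
            if (ordersTaken.compareAndSet(current, current + n)) {
                return true;
            }
        }
    }

    int getOrdersTaken() {
        return ordersTaken.get();
    }
}
```

The check and the update form one atomic step through the compareAndSet retry loop, so two threads cannot both claim the last remaining donuts.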
Like Tudor wrote, you can use any object as monitor for general purpose locking and synchronization.
However, if you got the requirement that only x orders (x=5000 for your case) can be processed at any one time, you could use the java.util.concurrent.Semaphore class. It is made specifically for use cases where you can only have fixed number of jobs running - it is called permits in the terminology of Semaphore
If you do the processing immediately, you can go with
private Semaphore semaphore = new Semaphore(5000);
public void process(Order order)
{
if (semaphore.tryAcquire())
{
try
{
//do your processing here
}
finally
{
semaphore.release();
}
}
else
{
throw new IllegalStateException("can't take more orders");
}
}
If it takes more than that (human input required, starting another thread/process, etc.), you need to add a callback for when the processing is over, like:
private Semaphore semaphore = new Semaphore(5000);
public void process(Order order)
{
if (semaphore.tryAcquire())
{
//start a new job to process order
}
else
{
throw new IllegalStateException("can't take more orders");
}
}
//call this from the job you started, once it is finished
public void processingFinished(Order order)
{
semaphore.release();
//any other post-processing for that order
}

Is this java code thread-safe?

I am planning to use this schema in my application, but I was not sure whether this is safe.
To give a little background, a bunch of servers will compute results of sub-tasks that belong to a single task and report them back to the central server. This piece of code is used to register the results, and also to check whether all the subtasks for the task have completed and, if so, report that fact only once.
The important point is that each task must be reported once and only once, as soon as it is completed (all subTaskResults are set).
Can anybody help? Thank you! (Also, if you have a better idea to solve this problem, please let me know!)
*Note that I simplified the code for brevity.
Solution I
class Task {
    //Populate with bunch of (Long, new AtomicReference()) pairs
    //Actual app uses read only HashMap
    Map<Id, AtomicReference<SubTaskResult>> subtasks = populatedMap();
    Semaphore permission = new Semaphore(1);
    public Task set(Id id, SubTaskResult result) {
        //null check omitted
        subtasks.get(id).set(result);
        return check() ? this : null;
    }
    private boolean check() {
        for (AtomicReference<SubTaskResult> ref : subtasks.values()) {
            if (ref.get() == null) {
                return false;
            }
        }//for
        return permission.tryAcquire();
    }
}//class
Stephen C kindly suggested to use a counter. Actually, I have considered that once, but I reasoned that the JVM could reorder the operations and thus, a thread can observe a decremented counter (by another thread) before the result is set in AtomicReference (by that other thread).
*EDIT: I now see this is thread safe. I'll go with this solution. Thanks, Stephen!
Solution II
class Task {
    //Populate with bunch of (Long, new AtomicReference()) pairs
    //Actual app uses read only HashMap
    Map<Id, AtomicReference<SubTaskResult>> subtasks = populatedMap();
    AtomicInteger counter = new AtomicInteger(subtasks.size());
    public Task set(Id id, SubTaskResult result) {
        //null check omitted
        subtasks.get(id).set(result);
        //In the actual app, if !compareAndSet(null, result) return null;
        return check() ? this : null;
    }
    private boolean check() {
        return counter.decrementAndGet() == 0;
    }
}//class
I assume that your use-case is that there are multiple threads calling set, but for any given value of id, the set method will be called only once. I'm also assuming that populatedMap creates the entries for all used id values, and that subtasks and permission are really private.
If so, I think that the code is thread-safe.
Each thread should see the initialized state of the subtasks Map, complete with all keys and all AtomicReference references. This state never changes, so subtasks.get(id) will always give the right reference. The set(result) call operates on an AtomicReference, so the subsequent get() method calls in check() will give the most up-to-date values ... in all threads. Any potential races with multiple threads calling check seem to sort themselves out.
However, this is a rather complicated solution. A simpler solution would be to use a concurrent counter; e.g. replace the Semaphore with an AtomicInteger and use decrementAndGet instead of repeatedly scanning the subtasks map in check.
In response to this comment in the updated solution:
"Actually, I have considered that once, but I reasoned that the JVM could reorder the operations and thus, a thread can observe a decremented counter (by another thread) before the result is set in AtomicReference (by that other thread)."
The AtomicInteger and AtomicReference by definition are atomic. Any thread that tries to access one is guaranteed to see the "current" value at the time of the access.
In this particular case, each thread calls set on the relevant AtomicReference before it calls decrementAndGet on the AtomicInteger. This cannot be reordered. Actions performed by a thread are performed in order, and since both calls have the memory semantics of volatile accesses, their effects will be visible to other threads in that order as well.
In other words, it should be thread-safe ... AFAIK.
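A minimal sketch of that ordering argument, assuming a hypothetical OrderingDemo class that is not part of the question's code: each worker stores its result and only then decrements a shared AtomicInteger, and the single thread that observes zero checks that every slot is already filled. It illustrates the guarantee rather than proving it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical demo class; the names here are illustrative only.
final class OrderingDemo {
    // Starts n worker threads. Returns true if exactly one thread saw the
    // counter reach zero and that thread found every result already set.
    static boolean run(int n) {
        AtomicReferenceArray<String> results = new AtomicReferenceArray<>(n);
        AtomicInteger counter = new AtomicInteger(n);
        AtomicInteger completions = new AtomicInteger(); // threads that saw zero
        AtomicInteger missing = new AtomicInteger();     // null slots seen at completion

        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                results.set(id, "result-" + id);          // 1. publish the result
                if (counter.decrementAndGet() == 0) {     // 2. then count down
                    completions.incrementAndGet();
                    for (int j = 0; j < n; j++) {
                        if (results.get(j) == null) {
                            missing.incrementAndGet();
                        }
                    }
                }
            });
        }
        for (Thread t : workers) {
            t.start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completions.get() == 1 && missing.get() == 0;
    }
}
```

For any n of at least 1, OrderingDemo.run(n) is expected to return true on every run, because each decrement happens after the same thread's set and the decrements on the shared counter are totally ordered.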
The atomicity guaranteed (per class documentation) explicitly for AtomicReference.compareAndSet extends to set and get methods (per package documentation), so in that regard your code appears to be thread-safe.
I am not sure, however, why you have Semaphore.tryAcquire as a side-effect there, but without complementary code to release the semaphore, that part of your code looks wrong.
The second solution does provide a thread-safe latch, but it's vulnerable to calls to set() that provide an ID that's not in the map -- which would trigger a NullPointerException -- or more than one call to set() with the same ID. The latter would mistakenly decrement the counter too many times and falsely report completion when there are presumably other subtask IDs for which no result has been submitted. My criticism isn't with regard to the thread safety, but rather to the invariant maintenance; the same flaw would be present even without the thread-related concern.
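One way to close both gaps, sketched under assumptions: the class name GuardedTask, the Long IDs and the String results are placeholders, not the asker's real types. An unknown ID is rejected explicitly, and the counter is only decremented by the call that wins compareAndSet(null, result), so a repeated ID cannot count twice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical variant of Solution II with the invariants enforced.
final class GuardedTask {
    // Filled in the constructor and never modified afterwards.
    private final Map<Long, AtomicReference<String>> subtasks = new HashMap<>();
    private final AtomicInteger remaining;

    GuardedTask(Iterable<Long> ids) {
        for (Long id : ids) {
            subtasks.put(id, new AtomicReference<>());
        }
        remaining = new AtomicInteger(subtasks.size());
    }

    // Returns true only for the one call that completes the task.
    boolean set(long id, String result) {
        if (result == null) {
            throw new NullPointerException("result");
        }
        AtomicReference<String> slot = subtasks.get(id);
        if (slot == null) {
            throw new IllegalArgumentException("unknown subtask id: " + id);
        }
        if (!slot.compareAndSet(null, result)) {
            return false; // duplicate submission: keep the first result, do not count again
        }
        return remaining.decrementAndGet() == 0;
    }
}
```

Whether a duplicate should be ignored, as here, or treated as an error is a design choice that depends on how the servers retry.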
Another way to solve this problem is with AbstractQueuedSynchronizer, but it's somewhat gratuitous: you can implement a stripped-down counting semaphore, where each call to set() would call releaseShared(), decrementing the counter via a spin on compareAndSetState(), and tryAcquireShared() would only succeed when the count is zero. That's more or less what you implemented above with the AtomicInteger, but you'd be reusing a facility that offers more capabilities you can use for other portions of your design.
To flesh out the AbstractQueuedSynchronizer-based solution requires adding one more operation to justify the complexity: being able to wait on the results from all the subtasks to come back, such that the entire task is complete. That's Task#awaitCompletion() and Task#awaitCompletion(long, TimeUnit) in the code below.
Again, it's possibly overkill, but I'll share it for the purpose of discussion.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

final class Task
{
    private static final class Sync extends AbstractQueuedSynchronizer
    {
        public Sync(int count)
        {
            setState(count);
        }

        // Acquisition succeeds only once the count has reached zero.
        @Override
        protected int tryAcquireShared(int ignored)
        {
            return 0 == getState() ? 1 : -1;
        }

        // Decrements the count; returns true when the count is (or already was) zero.
        @Override
        protected boolean tryReleaseShared(int ignored)
        {
            int current;
            do
            {
                current = getState();
                if (0 == current)
                    return true;
            }
            while (!compareAndSetState(current, current - 1));
            return 1 == current;
        }
    }

    public Task(int count)
    {
        if (count < 0)
            throw new IllegalArgumentException();
        sync_ = new Sync(count);
    }

    public boolean set(int id, Object result)
    {
        // Ensure that "id" refers to an incomplete task. Doing so requires
        // additional synchronization over the structure mapping subtask
        // identifiers to results.
        // Store result somehow.
        return sync_.releaseShared(1);
    }

    public void awaitCompletion()
        throws InterruptedException
    {
        sync_.acquireSharedInterruptibly(0);
    }

    // Returns true if the task completed within the timeout, false otherwise.
    public boolean awaitCompletion(long time, TimeUnit unit)
        throws InterruptedException
    {
        return sync_.tryAcquireSharedNanos(0, unit.toNanos(time));
    }

    private final Sync sync_;
}
I have a weird feeling reading your example program, but it depends on the larger structure of your program what to do about that. A set function that also checks for completion is almost a code smell. :-) Just a few ideas.
If you have synchronous communication with your servers, you might use an ExecutorService with as many threads as there are servers to do the communication. From this you get a bunch of Futures, and you can naturally proceed with your calculation: the get calls will block if a result is needed but not yet there.
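A rough sketch of the ExecutorService idea, assuming a hypothetical blocking call queryServer(int) that stands in for the real communication with one server. Here it just squares its argument so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class; not part of the question's code.
final class FutureGather {
    // Placeholder for a blocking request to server number serverIndex.
    static int queryServer(int serverIndex) {
        return serverIndex * serverIndex;
    }

    // Submits one task per server and collects the results in server order.
    static List<Integer> collect(int servers) {
        ExecutorService pool = Executors.newFixedThreadPool(servers);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < servers; i++) {
                final int server = i;
                Callable<Integer> call = () -> queryServer(server);
                futures.add(pool.submit(call));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // blocks until this subtask's result is available
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Once the loop over the futures finishes, the whole task is complete, so no separate "report exactly once" check is needed in this arrangement.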
If you have asynchronous communication with the servers, you might also use a CountDownLatch after submitting the task to the servers. The await call blocks the main thread until all subtasks have completed, while other threads receive the results and call countDown for each received result.
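The CountDownLatch variant might look roughly like this. The LatchGather class is hypothetical, and each plain thread below only stands in for an asynchronous callback arriving from a server.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper class; not part of the question's code.
final class LatchGather {
    // Waits until all subtasks have reported, then returns the collected results.
    static Map<Integer, String> collect(int subtasks) {
        Map<Integer, String> results = new ConcurrentHashMap<>();
        CountDownLatch done = new CountDownLatch(subtasks);
        for (int i = 0; i < subtasks; i++) {
            final int id = i;
            new Thread(() -> {
                results.put(id, "result-" + id); // store the result first
                done.countDown();                // then signal that this subtask is finished
            }).start();
        }
        try {
            done.await(); // blocks until countDown has been called once per subtask
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }
}
```

Only the thread that calls await learns about completion, which gives the "once and only once" report without any check inside the code that stores results.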
With all these methods you don't need special threadsafety measures other than that the concurrent storing of the results in your structure is threadsafe. And I bet there are even better patterns for this.
