First the code fragments...
final class AddedOrders {
private final Set<Order> orders = Sets.newConcurrentHashSet();
private final Set<String> ignoredItems = Sets.newConcurrentHashSet();
private boolean added = false;
public synchronized void clear() {
added = false;
}
public synchronized void add(Order order) {
added = orders.add(order);
}
public synchronized void remove(Order order) {
if (added) orders.remove(order);
}
public synchronized void ban(String item) {
ignoredItems.add(item);
}
public synchronized boolean has(Order order) {
return orders.contains(order);
}
public synchronized Set<Order> getOrders() {
return orders;
}
public synchronized boolean ignored(String item) {
return ignoredItems.contains(item);
}
}
private final AddedOrders added = new AddedOrders();
...
boolean subscribed;
int i = 10;
synchronized (added) {
while (!(subscribed = client.getSubscribedOrders().containsAll(added.getOrders())) && (i>0)) {
Helper.out("...order not subscribed yet (try: %d)", i);
Thread.sleep(200);
i--;
}
}
What I'd like to know...
Could someone point out which synchronized are not necessary?
Of course this is not the full code but assume that in the full project that all methods are called, and that some combinations of methods are called in the check value first, then modify style
added(the class) is accessed by multipleThreads
client is part of an external Server API, that I'm not entirely sure if it is Thread-Safe yet but I think it must be
ConcurrentHashSet is a google guava Class but it is based on ConcurrentHashMap apparently and the docs say it carries all the same concurrency guarantees.
But I don't really understand completely what those guarantees all are, even though I did some reading. Namely I know it's not ok to just check and set a value in a synchronized HashMap (without synchronizing on the synchronized Map using a synchronized block), however I do not know if you can do that in ConcurrentHashMap or not (without synchronizing on the ConcurrentHashMap using a synchronized block).
The only cases in your code where you really need synchronized are the ones where you test or update the added flag. You need the synchronized block to make sure that changes to the flag are visible across threads, and you also need to make sure that the added flag change is made in step with the change to the orders data structure. The synchronized keyword keeps another thread from barging in and doing something in between checking the flag and changing the data structure (the remove method could be broken like this if you remove the synchronization).
The code toward the end seems problematic because you're locking on the added object and then not letting go of the lock, there's not an opportunity for any other thread to make the changes that the thread is looking for. Although it looks like you're waiting for another object to change, so this criticism may be invalid. Sleeping with a lock held seems dangerous, though. This kind of thing is why Object#wait releases the lock it acquired.
Also note that since you're passing references out to the Orders set, code outside this class can add orders. You should do something to protect this internal data, like returning it wrapped in an immutableSet so callers can't make changes.
In general synchronization is used when you want to impose some granularity on changes, where you have 2 or more changes you want made together, without possibility of interleaving. An example is a check-then-act sequence where you execute some code that makes a change based on the value of something else, and you don't want some other thread to execute in between the check and the action (so the decision to act could be made, then the condition that allowed that action changes, so that the action could be invalid). If individual values are changed but they are unrelated, then you can make them volatile or use Atomic variables, and reduce the amount of locking you have to do.
It's a valid point that the synchronized keyword could be removed in cases like the clear method, where the only thing that changes is the added flag, which could be made volatile. The purpose of the added flag continues to elude me. Anything that enters a value that's already present can turn the flag back to false, it's not apparent that reasoning about any action based on what the current value of the flag makes any sense if this structure is getting modified concurrently.
Without knowing the exact context it's hard to say, but in general, classes created without considering their being used with multiple threads probably need to be reworked extensively before being used in a concurrent environment.
Related
I have two methods as follows:
class A{
void method1(){
someObj.setSomeAttribute(true);
someOtherObj.callMethod(someObj);
}
void method2(){
someObj.setSomeAttribute(false);
someOtherObj.callMethod(someObj);
}
}
where in another place that attribute is evaluated:
class B{
void callMethod(Foo someObj){
if(someObj.getAttribute()){
//do one thing
} else{
//so another thing
}
}
}
Note that A.method1 and A.method2 are updating the attribute of the same object. If those 2 methods are run in 2 threads, will this work or will there be unexpected results?
Will there be unexpected results? Yes, guaranteed, in that if you modify things you wouldn't want to have an impact on your app (such as the phase of the moon, the current song playing in your winamp, whether your dog is cuddling near the CPU, if it's the 5th tuesday of the month, and other such things), that may have an effect on behaviour. Which you don't want.
What you've described is a so-called violation of the java memory model: The end result is that any java implementation is free to return any of multiple values and nevertheless, that VM is operating properly according to the java specification. Even if it does so seemingly arbitrarily.
As a general rule, each thread gets an unfair coin. Unfair, in that it will try to mess with you: It'll flip correctly every time when you test it out, and then in production, and only when you're giving a demo to that crucial customer, it'll get ya.
Every time it reads to or writes from any field, it will flip this mean coin. On heads, it will use the actual field. On tails, it will use a local copy it made.
That's oversimplifying the model quite a bit, but it's a good start to try to get your head around how this works.
The way out is to force so-called 'comes before' relationships: What java will do, is ensure that what you can observe matches these relationships: If event A is defined as having a comes-before relationship vs. event B, then anything A did will be observed, exactly as is, by B, guaranteed. No more coin flips.
Examples of establishing comes-before relationships involve using volatile, synchronized, and any methods that use these things internally.
NB: Of course. if your setSomeAttribute method, which you did not paste, includes some comes-before-establishing act, then there's no problem here, but as a rule a method called setX will not be doing that.
An example of one that doesn't:
class Example {
private String attr;
public void setAttr(String attr) {
this.attr = attr;
}
}
some examples of ones that do:
Let's say method B.callMethod is executed in the same thread as method1 - then you are guaranteed to at least observe the change method1 made, though it's still a coin flip (whether you actually see what method2 did or not). What would not be possible is seeing the value of that attribute before either method1 or method2 runs, because code running in a single thread has comes-before across the entire run (any line that is executed before another in the same thread has a comes-before relationship).
The set method looks like:
class Example {
private String attr;
private final Object lock = new Object();
public void setAttr(String attr) {
synchronized (lock) {
this.attr = attr;
}
}
public String getAttr() {
synchronized (lock) {
return this.attr;
}
}
}
Now the get and set ops lock on the same object, that's one of the ways to establish comes-before. Which thread got to a lock first is observable behaviour; if method1's set got there before B's get, then you are guaranteed to observe method1's set.
More generally, sharing state between threads is extremely tricky and you should endeavour not do so. The alternatives are:
Initialize all state before starting a thread, then do the job, and only when it is finished, relay all results back. Fork/join does this.
Use a messaging system that has great concurrency fundamentals, such as a database, which has transactions, or message queue libraries.
If you must share state, try to write things in terms of the nice classes in j.u.concurrent.
I assume what you expected is when you call A.method1, someObj.getAttribute() will return true in B.callMethod, when you call A.method2, someObj.getAttribute() will return false in B.callMethod.
Unfortunately,this will not work. Because between the line setSomeAttribute and callMethod,other thread may have change the value of the attribute.
If you are only use the attribute in callMethod,why not just pass the attribute instead of the Foo object. Code as follow:
class A{
void method1(){
someOtherObj.callMethod(true);
}
}
class B{
void callMethod(boolean flag){
if(flag){
//do one thing
} else{
//so another thing
}
}
}
If you must use Foo as the parameter, what you can do is to make setAttribute and callMethod atomic.
The easiest way to achieve it is to make it synchronized.Code as follow:
synchronized void method1(){
someObj.setSomeAttribute(true);
someOtherObj.callMethod(someObj);
}
synchronized void method2(){
someObj.setSomeAttribute(false);
someOtherObj.callMethod(someObj);
}
But this may have bad performance, you can achieve it with some more fine-grained lock.
I had a strong need for a synchronizer similar to a CountDownLatch, but the starting number for the countdown is unknown. To add context, if I'm going through a buffered recordset (say from a text file or a query) and kicking off a runnable for each record, but I don't know how many records there will be... I need a synchronizer that signals when the iteration is complete and all runnables are complete.
This is the synchronizer I came up with... a BufferedLatch. A method is called in the iteration loop for each record incrementing the recordSetSize. At the end of each runnable kicked off for each record, the processedRecordSetSize is incremented. When the iteration through all records is complete (but runnables may still be in queue), the setDownloadComplete() method is called letting the BufferedLatch know the recordSetSize is now fixed. The await() method waits for the iterationComplete variable to be true (recordsetSize is now fixed) and recordsetSize == processedRecordSetSize;
Is this an optimal implementation of this synchronizer? Is there more concurrent opportunity that synchronization is holding back? Although testing seems to work fine, are there any gotcha's I'm overlooking?
import java.util.concurrent.atomic.AtomicInteger;
public final class BufferedLatch {
/** A customized synchronizer built for concurrent iteration processes where the number of objects to be iterated is unknown
* and a runnable will be kicked off for each object, and the await() method will wait for all runnables to be complete
*/
private final AtomicInteger recordsetSize = new AtomicInteger(0);
private final AtomicInteger processedRecordsetSize = new AtomicInteger(0);
private volatile boolean iterationComplete = false;
public int incrementRecordsetSize() throws Exception {
if (iterationComplete) {
throw new Exception("Cannot increase recordsize after download is flagged complete!");
}
else {
return recordsetSize.incrementAndGet();
}
}
public void incrementProcessedRecordSize() {
synchronized(this) {
processedRecordsetSize.incrementAndGet();
if (iterationComplete) {
if (processedRecordsetSize.get() == recordsetSize.get()) {
this.notifyAll();
}
}
}
}
public void setDownloadComplete() {
synchronized(this) {
iterationComplete = true;
}
}
public void await() throws InterruptedException {
while (! (iterationComplete && (processedRecordsetSize.get() == recordsetSize.get()))) {
synchronized(this) {
while (! (iterationComplete && (processedRecordsetSize.get() == recordsetSize.get()))) {
this.wait();
}
}
}
}
}
UPDATE-- NEW CODE
public final class BufferedLatch {
/** A customized synchronizer built for concurrent iteration processes where the number of objects to be iterated is unknown
* and a runnable will be kicked off for each object, and the await() method will wait for all runnables to be complete
*/
private int recordCount = 0;
private int processedRecordCount = 0;
private boolean iterationComplete = false;
public synchronized void incrementRecordCount() throws Exception {
if (iterationComplete) {
throw new Exception("Cannot increase recordCount after download is flagged complete!");
}
else {
recordCount++;
}
}
public synchronized void incrementProcessedRecordCount() {
processedRecordCount++;
if (iterationComplete && recordCount == processedRecordCount) {
this.notifyAll();
}
}
public synchronized void setIterationComplete() {
iterationComplete = true;
if (iterationComplete && recordCount == processedRecordCount) {
this.notifyAll();
}
}
public synchronized void await() throws InterruptedException {
while (! (iterationComplete && (recordCount == processedRecordCount))) {
this.wait();
}
}
}
Probably not. I think conceptually you're onto something here, as it looks like your application needs something more than just a CountDownLatch. However, the implementation seems to have several problems.
First, I note that it looks odd to mix atomics/volatiles AND ordinary object monitor locks (synchronized). While there may be proper uses that mix these different constructs, mixing in this case I believe will lead to errors.
Consider incrementRecordsetSize() which first checks iterationComplete and only if it's false does it increment recordsetSize. The iterationComplete variable is volatile so updates from other threads will be visible. However, the fact that no locking is done here allows TOCTOU race conditions (time-of-check vs time-of-use). The rule seems to be, recordsetSize must not be incremented if iterationComplete is true. Suppose thread T1 comes along and finds iterationComplete to be false, so it decides to increment recordsetSize. Before it does so, another thread T2 comes along and sets iterationComplete to be true. This would allow T1 to do the increment improperly. Worse, before it does so, suppose another thread T3 came along and called incrementProcessedRecordSize(). It would increment processedRecordsetSize and then find iterationComplete true. It further might find that processedRecordsetSize equals recordsetSize and then notify all waiters, who then proceed as if the processing is complete. But it's not, as T1 then proceeds to increment recordsetSize and presumably continues with its processing.
The problem here is that this object's state consists of the fusion of three independent pieces of state -- two int counters and a boolean -- and all three must be read and written atomically. If certain bits of logic attempt to take advantage of individual volatile or atomic properties, it introduces the possibility of race conditions such as the one I described.
I'd suggest rewriting this as a plain object with two plain ints and a boolean (not atomic, not volatile) and just lock around everything. This should certainly clear up the logic and make things easier to understand.
In incrementProcessedRecordSize I note that the condition essentially duplicates the condition in the await method. A simplifying convention is for all updates to notify and have the condition evaluated only by the waiters. This may result in some unnecessary wakeups. If this is a problem, you might consider minimizing the number of notifies, but you need to think about maintainability. If you're not careful, the wait/notify conditions will become spread across the code and will be very hard to reason about. Alternatively, you could refactor the condition into a method and call it from the different places that do waiting and notification.
It looks like await() does a complicated form of double-checked locking. Instead of testing a volatile boolean outside the lock, it tests several separate pieces of information both outside and inside the lock. This seems susceptible to TOCTOU problems (as above) but it might be safe if you can prove the state really latches, that is, that once it becomes true it never returns to false. I'd have to stare at the code for a long time before I'd be able to convince myself it's correct.
On the other hand, what does this buy you? It seems to optimize away just the taking of the lock. If you have a zillion threads that are going to come by after processing is complete, it might be worth it, but it doesn't seem like it. I'd just remove the outer while loop and check the variables within a synchronized block.
Finally, having an object that represents counters and a boolean may very well be sensible for what you're doing, but other things you've said (in the question and in comments) are that some threads are generating a workload (e.g. reading lines from a file) and other threads are retiring that workload. This implies that there is some other data structure like a queue that contains this workload, and you have a producer-consumer problem here. That other structure has to be thread-safe, of course, since multiple threads are interacting over it. But the counters and boolean in this structure need to be updated in lockstep with the updates to the workload structure, otherwise there could be race conditions between checking and updating these separate objects.
It seems to me you could replace the counters in this object with the queue and just put simple locks around everything. The producers would append to the queue until they're done, at which time they set iterationComplete to true which prevents more work from being added. The consumers pull from the queue until iterationComplete is true and the queue is empty, at which point they're done. If they find the queue empty but iterationComplete is false, they know to block while awaiting further work.
I'd say to stick with simple locking and avoid volatiles/atomics until you get the basics correct. If there are bottlenecks in that code, then apply optimizations selectively while preserving the same invariants.
So far, I have used double-checked locking as follows:
class Example {
static Object o;
volatile static boolean setupDone;
private Example() { /* private constructor */ }
getInstance() {
if(!setupDone) {
synchronized(Example.class) {
if(/*still*/ !setupDone) {
o = new String("typically a more complicated operation");
setupDone = true;
}
}
}
return o;
}
}// end of class
Now, because we have groups of threads that all share this class, we changed the boolean to a ConcurrentHashMap as follows:
class Example {
static ConcurrentHashMap<String, Object> o = new ConcurrentHashMap<String, Object>();
static volatile ConcurrentHashMap<String, Boolean> setupsDone = new ConcurrentHashMap<String, Boolean>();
private Example() { /* private constructor */ }
getInstance(String groupId) {
if (!setupsDone.containsKey(groupId)) {
setupsDone.put(groupId, false);
}
if(!setupsDone.get(groupId)) {
synchronized(Example.class) {
if(/*still*/ !setupsDone.get(groupId)) {
o.put(groupId, new String("typically a more complicated operation"));
setupsDone.put(groupId, true); // will this still maintain happens-before?
}
}
}
return o.get(groupId);
}
}// end of class
My question now is: If I declare a standard Object as volatile, I will only get a happens-before relationship established when I read or write its reference. Therefore writing an element within that Object (if it is e.g. a standard HashMap, performing a put() operation on it) will not establish such a relationship. Is that correct? (What about reading an element; wouldn't that require to read the reference as well and thus establish the relationship?)
Now, with using a volatile ConcurrentHashMap, will writing an element to it establish the happens-before relationship, i.e. will the above still work?
Update: The reason for this question and why double-checked locking is important:
What we actually set up (instead of an Object) is a MultiThreadedHttpConnectionManager, to which we pass some settings, and which we then pass into an HttpClient, that we set up, too, and that we return. We have up to 10 groups of up to 100 threads each, and we use double-checked locking as we don't want to block each of them whenever they need to acquire their group's HttpClient, as the whole setup will be used to help with performance testing. Because of an awkward design and an odd platform we run this on we cannot just pass objects in from outside, so we hope to somehow make this setup work. (I realise the reason for the question is a bit specific, but I hope the question itself is interesting enough: Is there a way to get that ConcurrentHashMap to use "volatile behaviour", i.e. establish a happens-before relationship, as the volatile boolean did, when performing a put() on the ConcurrentHashMap? ;)
Yes, it is correct. volatile protects only that object reference, but nothing else.
No, putting an element to a volatile HashMap will not create a happens-before relationship, not even with a ConcurrentHashMap.
Actually ConcurrentHashMap does not hold lock for read operations (e.g. containsKey()). See ConcurrentHashMap Javadoc.
Update:
Reflecting your updated question: you have to synchronize on the object you put into the CHM. I recommend to use a container object instead of directly storing the Object in the map:
public class ObjectContainer {
volatile boolean isSetupDone = false;
Object o;
}
static ConcurrentHashMap<String, ObjectContainer> containers =
new ConcurrentHashMap<String, ObjectContainer>();
public Object getInstance(String groupId) {
ObjectContainer oc = containers.get(groupId);
if (oc == null) {
// it's enough to sync on the map, don't need the whole class
synchronized(containers) {
// double-check not to overwrite the created object
if (!containers.containsKey(groupId))
oc = new ObjectContainer();
containers.put(groupId, oc);
} else {
// if another thread already created, then use that
oc = containers.get(groupId);
}
} // leave the class-level sync block
}
// here we have a valid ObjectContainer, but may not have been initialized
// same doublechecking for object initialization
if(!oc.isSetupDone) {
// now syncing on the ObjectContainer only
synchronized(oc) {
if(!oc.isSetupDone) {
oc.o = new String("typically a more complicated operation"));
oc.isSetupDone = true;
}
}
}
return oc.o;
}
Note, that at creation, at most one thread may create ObjectContainer. But at initialization each groups may be initialized in parallel (but at most 1 thread per group).
It may also happen that Thread T1 will create the ObjectContainer, but Thread T2 will initialize it.
Yes, it is worth to keep the ConcurrentHashMap, because the map reads and writes will happen at the same time. But volatile is not required, since the map object itself will not change.
The sad thing is that the double-check does not always work, since the compiler may create a bytecode where it is reusing the result of containers.get(groupId) (that's not the case with the volatile isSetupDone). That's why I had to use containsKey for the double-checking.
Therefore writing an element within that Object (if it is e.g. a standard HashMap, performing a put() operation on it) will not establish such a relationship. Is that correct?
Yes and no. There is always a happens-before relationship when you read or write a volatile field. The issue in your case is that even though there is a happens-before when you access the HashMap field, there is no memory synchronization or mutex locking when you are actually operating on the HashMap. So multiple threads can see different versions of the same HashMap and can create a corrupted data structure depending on race conditions.
Now, with using a volatile ConcurrentHashMap, will writing an element to it establish the happens-before relationship, i.e. will the above still work?
Typically you do not need to mark a ConcurrentHashMap as being volatile. There are memory barriers that are crossed internal to the ConcurrentHashMap code itself. The only time I'd use this is if the ConcurrentHashMap field is being changed often -- i.e. is non-final.
Your code really seems like premature optimization. Has a profiler shown you that it is a performance problem? I would suggest that you just synchronize on the map and me done with it. Having two ConcurrentHashMap to solve this problem seems like overkill to me.
So I've been reading on concurrency and have some questions on the way (guide I followed - though I'm not sure if its the best source):
Processes vs. Threads: Is the difference basically that a process is the program as a whole while a thread can be a (small) part of a program?
I am not exactly sure why there is a interrupted() method and a InterruptedException. Why should the interrupted() method even be used? It just seems to me that Java just adds an extra layer of indirection.
For synchronization (and specifically about the one in that link), how does adding the synchronize keyword even fix the problem? I mean, if Thread A gives back its incremented c and Thread B gives back the decremented c and store it to some other variable, I am not exactly sure how the problem is solved. I mean this may be answering my own question, but is it supposed to be assumed that after one of the threads return an answer, terminate? And if that is the case, why would adding synchronize make a difference?
I read (from some random PDF) that if you have two Threads start() subsequently, you cannot guarantee that the first thread will occur before the second thread. How would you guarantee it, though?
In synchronization statements, I am not completely sure whats the point of adding synchronized within the method. What is wrong with leaving it out? Is it because one expects both to mutate separately, but to be obtained together? Why not just have the two non-synchronized?
Is volatile just a keyword for variables and is synonymous with synchronized?
In the deadlock problem, how does synchronize even help the situation? What makes this situation different from starting two threads that change a variable?
Moreover, where is the "wait"/lock for the other person to bowBack? I would have thought that bow() was blocked, not bowBack().
I'll stop here because I think if I went any further without these questions answered, I will not be able to understand the later lessons.
Answers:
Yes, a process is an operating system process that has an address space, a thread is a unit of execution, and there can be multiple units of execution in a process.
The interrupt() method and InterruptedException are generally used to wake up threads that are waiting to either have them do something or terminate.
Synchronizing is a form of mutual exclusion or locking, something very standard and required in computer programming. Google these terms and read up on that and you will have your answer.
True, this cannot be guaranteed, you would have to have some mechanism, involving synchronization that the threads used to make sure they ran in the desired order. This would be specific to the code in the threads.
See answer to #3
Volatile is a way to make sure that a particular variable can be properly shared between different threads. It is necessary on multi-processor machines (which almost everyone has these days) to make sure the value of the variable is consistent between the processors. It is effectively a way to synchronize a single value.
Read about deadlocking in more general terms to understand this. Once you first understand mutual exclusion and locking you will be able to understand how deadlocks can happen.
I have not read the materials that you read, so I don't understand this one. Sorry.
I find that the examples used to explain synchronization and volatility are contrived and difficult to understand the purpose of. Here are my preferred examples:
Synchronized:
private Value value;
public void setValue(Value v) {
value = v;
}
public void doSomething() {
if(value != null) {
doFirstThing();
int val = value.getInt(); // Will throw NullPointerException if another
// thread calls setValue(null);
doSecondThing(val);
}
}
The above code is perfectly correct if run in a single-threaded environment. However with even 2 threads there is the possibility that value will be changed in between the check and when it is used. This is because the method doSomething() is not atomic.
To address this, use synchronization:
private Value value;
private Object lock = new Object();
public void setValue(Value v) {
synchronized(lock) {
value = v;
}
}
public void doSomething() {
synchronized(lock) { // Prevents setValue being called by another thread.
if(value != null) {
doFirstThing();
int val = value.getInt(); // Cannot throw NullPointerException.
doSecondThing(val);
}
}
}
Volatile:
private boolean running = true;
// Called by Thread 1.
public void run() {
while(running) {
doSomething();
}
}
// Called by Thread 2.
public void stop() {
running = false;
}
To explain this requires knowledge of the Java Memory Model. It is worth reading about in depth, but the short version for this example is that Threads have their own copies of variables which are only sync'd to main memory on a synchronized block and when a volatile variable is reached. The Java compiler (specifically the JIT) is allowed to optimise the code into this:
public void run() {
while(true) { // Will never end
doSomething();
}
}
To prevent this optimisation you can set a variable to be volatile, which forces the thread to access main memory every time it reads the variable. Note that this is unnecessary if you are using synchronized statements as both keywords cause a sync to main memory.
I haven't addressed your questions directly as Francis did so. I hope these examples can give you an idea of the concepts in a better way than the examples you saw in the Oracle tutorial.
I am planning to use this schema in my application, but I was not sure whether this is safe.
To give a little background, a bunch of servers will compute results of sub-tasks that belong to a single task and report them back to the central server. This piece of code is used to register the results, and also check whether all the subtasks for the task has completed and if so, report that fact only once.
The important point is that, all task must be reported once and only once as soon as it is completed (all subTaskResults are set).
Can anybody help? Thank you! (Also, if you have a better idea to solve this problem, please let me know!)
*Note that I simplified the code for brevity.
Solution I
class Task {
//Populate with bunch of (Long, new AtomicReference()) pairs
//Actual app uses read only HashMap
Map<Id, AtomicReference<SubTaskResult>> subtasks = populatedMap();
Semaphore permission = new Semaphore(1);
public Task set(id, subTaskResult){
//null check omitted
subtasks.get(id).set(result);
return check() ? this : null;
}
private boolean check(){
for(AtomicReference ref : subtasks){
if(ref.get()==null){
return false;
}
}//for
return permission.tryAquire();
}
}//class
Stephen C kindly suggested to use a counter. Actually, I have considered that once, but I reasoned that the JVM could reorder the operations and thus, a thread can observe a decremented counter (by another thread) before the result is set in AtomicReference (by that other thread).
*EDIT: I now see this is thread safe. I'll go with this solution. Thanks, Stephen!
Solution II
class Task {
//Populate with bunch of (Long, new AtomicReference()) pairs
//Actual app uses read only HashMap
Map<Id, AtomicReference<SubTaskResult>> subtasks = populatedMap();
AtomicInteger counter = new AtomicInteger(subtasks.size());
public Task set(id, subTaskResult){
//null check omitted
subtasks.get(id).set(result);
//In the actual app, if !compareAndSet(null, result) return null;
return check() ? this : null;
}
private boolean check(){
return counter.decrementAndGet() == 0;
}
}//class
I assume that your use-case is that there are multiple multiple threads calling set, but for any given value of id, the set method will be called once only. I'm also assuming that populateMap creates the entries for all used id values, and that subtasks and permission are really private.
If so, I think that the code is thread-safe.
Each thread should see the initialized state of the subtasks Map, complete with all keys and all AtomicReference references. This state never changes, so subtasks.get(id) will always give the right reference. The set(result) call operates on an AtomicReference, so the subsequent get() method calls in check() will give the most up-to-date values ... in all threads. Any potential races with multiple threads calling check seem to sort themselves out.
However, this is a rather complicated solution. A simpler solution would be to use an concurrent counter; e.g. replace the Semaphore with an AtomicInteger and use decrementAndGet instead of repeatedly scanning the subtasks map in check.
In response to this comment in the updated solution:
Actually, I have considered that once,
but I reasoned that the JVM could
reorder the operations and thus, a
thread can observe a decremented
counter (by another thread) before the
result is set in AtomicReference (by
that other thread).
The AtomicInteger and AtomicReference by definition are atomic. Any thread that tries to access one is guaranteed to see the "current" value at the time of the access.
In this particular case, each thread calls set on the relevant AtomicReference before it calls decrementAndGet on the AtomicInteger. This cannot be reordered. Actions performed by a thread are performed in order. And since these are atomic actions, the efects will be visible to other threads in order as well.
In other words, it should be thread-safe ... AFAIK.
The atomicity guaranteed (per class documentation) explicitly for AtomicReference.compareAndSet extends to set and get methods (per package documentation), so in that regard your code appears to be thread-safe.
I am not sure, however, why you have Semaphore.tryAquire as a side-effect there, but without complimentary code to release the semaphore, that part of your code looks wrong.
The second solution does provide a thread-safe latch, but it's vulnerable to calls to set() that provide an ID that's not in the map -- which would trigger a NullPointerException -- or more than one call to set() with the same ID. The latter would mistakenly decrement the counter too many times and falsely report completion when there are presumably other subtasks IDs for which no result has been submitted. My criticism isn't with regard to the thread safety, but rather to the invariant maintenance; the same flaw would be present even without the thread-related concern.
Another way to solve this problem is with AbstractQueuedSynchronizer, but it's somewhat gratuitous: you can implement a stripped-down counting semaphore, where each call set() would call releaseShared(), decrementing the counter via a spin on compareAndSetState(), and tryAcquireShared() would only succeed when the count is zero. That's more or less what you implemented above with the AtomicInteger, but you'd be reusing a facility that offers more capabilities you can use for other portions of your design.
To flesh out the AbstractQueuedSynchronizer-based solution requires adding one more operation to justify the complexity: being able to wait on the results from all the subtasks to come back, such that the entire task is complete. That's Task#awaitCompletion() and Task#awaitCompletion(long, TimeUnit) in the code below.
Again, it's possibly overkill, but I'll share it for the purpose of discussion.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;
final class Task
{
private static final class Sync extends AbstractQueuedSynchronizer
{
public Sync(int count)
{
setState(count);
}
#Override
protected int tryAcquireShared(int ignored)
{
return 0 == getState() ? 1 : -1;
}
#Override
protected boolean tryReleaseShared(int ignored)
{
int current;
do
{
current = getState();
if (0 == current)
return true;
}
while (!compareAndSetState(current, current - 1));
return 1 == current;
}
}
public Task(int count)
{
if (count < 0)
throw new IllegalArgumentException();
sync_ = new Sync(count);
}
public boolean set(int id, Object result)
{
// Ensure that "id" refers to an incomplete task. Doing so requires
// additional synchronization over the structure mapping subtask
// identifiers to results.
// Store result somehow.
return sync_.releaseShared(1);
}
public void awaitCompletion()
throws InterruptedException
{
sync_.acquireSharedInterruptibly(0);
}
public void awaitCompletion(long time, TimeUnit unit)
throws InterruptedException
{
sync_.tryAcquireSharedNanos(0, unit.toNanos(time));
}
private final Sync sync_;
}
I have a weird feeling reading your example program, but it depends on the larger structure of your program what to do about that. A set function that also checks for completion is almost a code smell. :-) Just a few ideas.
If you have synchronous communication with your servers you might use an ExecutorService with the same number of threads like the number of servers that do the communication. From this you get a bunch of Futures, and you can naturally proceed with your calculation - the get calls will block at the moment the result is needed but not yet there.
If you have asynchronous communication with the servers you might also use a CountDownLatch after submitting the task to the servers. The await call blocks the main thread until the completion of all subtasks, and other threads can receive the results and call countdown on each received result.
With all these methods you don't need special threadsafety measures other than that the concurrent storing of the results in your structure is threadsafe. And I bet there are even better patterns for this.