Multi-threaded array comparison - Pushing events to a thread?

I have a simple multi-threading problem (in Java). I have 2 sets of 4 very large arrays, and I have 4 threads, 1 for each array in the set. I want the threads to check, in parallel, whether the arrays in both sets have identical values. If one of the values in one of the arrays does not match the value at the corresponding index in the other array, then the two sets are not identical, and all threads should stop what they are doing and move on to the next 2 sets of 4 very large arrays. This process continues until all the pairs of array sets have been compared and deemed equal or not equal. I want all the threads to stop when one of the threads finds a mismatch. What is the correct way to implement this?

Here's one simple solution, but I don't know if it's the most efficient: Simply declare an object with a public boolean field.
public class TerminationEvent {
// volatile so that a write by one thread is guaranteed to become visible to the others
public volatile boolean terminated = false;
}
Before starting the threads, create a new TerminationEvent object. Use this object as a parameter when you construct the thread objects, e.g.
public class MyThread implements Runnable {
private TerminationEvent terminationEvent;
public MyThread(TerminationEvent event) {
terminationEvent = event;
}
}
The same object will be passed to every MyThread, so they will all see the same boolean.
Now, the run() method in each MyThread will have something like
if (terminationEvent.terminated) {
break;
}
in the loop, and will set terminationEvent.terminated = true; when the other threads need to stop.
(Normally I wouldn't use public fields like terminated, but you said you wanted efficiency. I think this is a bit more efficient than a getter method, but I haven't benchmarked it. In a simple case like this you don't need locks around reads or writes of the terminated field, but the field should be declared volatile; otherwise the Java memory model does not guarantee that a write by one thread ever becomes visible to the others.)

Stopping other threads is usually done through the use of interrupts. Thread.stop() is deprecated because it is unsafe: it unlocks all monitors held by the thread, possibly allowing other threads to view objects in an inconsistent state (Ref: http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html). An interrupt does not "stop" the thread as such; it only sets a flag that the thread is expected to check.
The thread should check the interrupted flag (infrequently) before performing computations:
if (Thread.interrupted()) {
throw new InterruptedException();
}

Use a volatile variable to set the abort condition. In your check loop that is run by all threads, let those threads check a number N of values uninterrupted so they don't have to fetch the volatile too often, which may be costly compared to the value match test. Benchmark your solution to find the optimum for N on your target hardware.
Another way would be to use a ForkJoin approach where your result is true if a mismatch was found. Divide your array slices down to a minimum size similar to N.
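The abort-flag idea above can be sketched as follows. This is only a minimal illustration, not a tuned implementation: the class name ParallelArrayComparer and the constant CHECK_INTERVAL (the "N" from the answer) are made up for the example, int[] is assumed as the element type, and an AtomicBoolean stands in for the volatile variable because it gives the same visibility guarantee.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: compares two sets of arrays, one thread per array pair.
class ParallelArrayComparer {
    // Elements compared between two reads of the shared flag; tune by benchmarking.
    private static final int CHECK_INTERVAL = 4096;

    static boolean setsEqual(int[][] left, int[][] right) {
        if (left.length != right.length) return false;
        AtomicBoolean mismatch = new AtomicBoolean(false); // reads/writes have volatile semantics
        Thread[] workers = new Thread[left.length];
        for (int t = 0; t < left.length; t++) {
            final int[] a = left[t];
            final int[] b = right[t];
            workers[t] = new Thread(() -> compare(a, b, mismatch));
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return !mismatch.get();
    }

    private static void compare(int[] a, int[] b, AtomicBoolean mismatch) {
        if (a.length != b.length) { mismatch.set(true); return; }
        for (int start = 0; start < a.length; start += CHECK_INTERVAL) {
            if (mismatch.get()) return;            // another thread already found a difference
            int end = Math.min(a.length, start + CHECK_INTERVAL);
            for (int i = start; i < end; i++) {    // uninterrupted inner run of up to N elements
                if (a[i] != b[i]) { mismatch.set(true); return; }
            }
        }
    }
}
```

Calling ParallelArrayComparer.setsEqual(setA, setB) once per pair of sets gives the "stop everything and move on" behaviour; a thread pool would avoid recreating threads for every pair.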

Related

Does Java LongAdder's increment() & sum() prevent getting the same value twice?

Currently I am using AtomicLong as a synchronized counter in my application, but I have found that with high concurrency/contention, e.g. with 8 threads, my throughput is much lower (75% lower) than single-threaded for obvious reasons (e.g. concurrent CAS).
Use case:
A counter variable which
is updated by multiple threads concurrently
has high write contention, basically every usage in a thread will consist of a write with an immediate read afterwards
Requirement is that each read from the counter (immediately after the writing) gets a unique incremented value.
It is not required that each retrieved counter value is increasing in the same order as the different threads(writers) increment the value.
So I tried to replace AtomicLong with a LongAdder, and indeed it looks from my measurements that my throughput with 8 threads is much better - (only) about 20% lower than single-threaded (compared to 75%).
However I'm not sure I correctly understand the way LongAdder works.
The JavaDoc says:
This class is usually preferable to AtomicLong when multiple threads
update a common sum that is used for purposes such as collecting
statistics, not for fine-grained synchronization control.
and for sum()
Returns the current sum. The returned value is NOT an atomic snapshot;
invocation in the absence of concurrent updates returns an accurate
result, but concurrent updates that occur while the sum is being
calculated might not be incorporated.
What is meant by fine-grained synchronization control ...
From looking at this SO question and the source of AtomicLong and Striped64, I think I understand that if the update on a LongAdder is blocked because of a CAS instruction issued by another thread, the update is stored thread-locally and accumulated later to get some eventual consistency. So without further synchronization, and because the incrementAndGet() in LongAdder is not atomic but two instructions, I fear the following is possible:
private static final LongAdder counter = new LongAdder(); // == 0
// no further synchronisation happening in java code
Thread#1 : counter.increment();
Thread#2 : counter.increment(); // CAS T#1 still ongoing, storing +1 thread-locally
Thread#2 : counter.sum(); // == 1
Thread#3 : counter.increment(); // CAS T#1 still ongoing, storing +1 thread-locally
Thread#3 : counter.sum(); // == 1
Thread#1 : counter.sum(); // == 3 (after merging everything)
If this is possible, LongAdder is not really suitable for my use case, which probably then counts as "fine-grained synchronization control".
And then with my write/read^n pattern I probably can't do better than AtomicLong?

LongAdder is definitely not suitable for your use case of unique integer generation, but you don't need to understand the implementation or dig into the intricacies of the java memory model to determine that. Just look at the API: it has no compound "increment and get" type methods that would allow you to increment the value and get the old/new value back, atomically.
In terms of adding values, it only offers void add(long x) and void increment() methods, but these don't return a value. You mention:
the incrementAndGet in LongAdder is not atomic
... but I don't see incrementAndGet at all in LongAdder. Where are you looking?
Your idea of:
usage in a thread will consist of a write with an immediate read afterwards
Requirement is that each read
from the counter (immediately after the writing) gets a unique
incremented value. It is not required that each retrieved counter
value is increasing in the same order as the different
threads(writers) increment the value.
Doesn't work even for AtomicLong, unless by "write followed by a read" you mean calling the incrementAndGet method. I think it goes without saying that two separate calls on an AtomicLong or LongAdder (or any other object really) can never be atomic without some external locking.
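To make the point concrete, here is a small sketch (the class and method names are invented for the example) in which every thread obtains its value from a single incrementAndGet call on a shared AtomicLong. Because the increment and the read happen in one atomic operation, no value can be handed out twice.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class, not part of the original question.
class UniqueCounterDemo {
    // Returns how many distinct counter values were observed across all threads.
    static int distinctValues(int threads, int perThread) {
        AtomicLong counter = new AtomicLong();
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // One atomic read-modify-write: the returned value is unique.
                    seen.add(counter.incrementAndGet());
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return seen.size();
    }
}
```

With 4 threads and 10000 increments each, the set ends up with exactly 40000 entries. Replacing the single call with a separate increment and a separate read removes that guarantee.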
So the Java doc, in my opinion, is a bit confusing. Yes, you should not use sum() for synchronization control, and yes "concurrent updates that occur while the sum is being calculated might not be incorporated"; however, the same is true of AtomicLong and its get() method. Increments that occur while calling get() similarly may or may not be reflected in the value returned by get().
Now there are some guarantees that are weaker with LongAdder compared to AtomicLong. One guarantee you get with AtomicLong is that a series of operations transitions the object through a specific series of values, and while there is no guarantee on what specific value a thread will see, all the values should come from the true set of transition values.
For example, consider starting with an AtomicLong with value zero, and two threads incrementing it concurrently, by 1 and 3 respectively. The final value will always be 4, and only two transition paths are possible: 0 -> 1 -> 4 or 0 -> 3 -> 4. For a given execution, only one of those can have occurred and all concurrent reads will be consistent with that execution. That is, if any thread reads a 1, then no thread may read a 3 and vice versa (of course, there is no guarantee that any thread will see a 1 or 3 at all; they may all see 0 or 4).
LongAdder doesn't provide that guarantee. Since the write process is not locked, and the read process adds together several values in a non-atomic fashion, it is possible for one thread to see a 1 and another to see a 3 in the same execution. Of course, it still doesn't synthesize "fake" values: you should never read a "2", for example.
Now that's a bit of a subtle concept and the Javadoc doesn't get it across well. They go with a pretty weak and not particularly formal statement instead. Finally, I don't think you can observe the behavior above with pure increments (rather than additions) since there is only one path then: 0 -> 1 -> 2 -> 3, etc. So for increments, I think AtomicLong.get() and LongAdder.sum() have pretty much the same guarantees.
Something Useful
OK, so I'll give you something that might be useful. You can still implement what you want efficiently, as long as you don't have strict requirements on the exact relationship between the counter value each thread gets and the order in which the values were read.
Re-purpose the LongAdder Idea
You could make the LongAdder idea work fine for unique counter generation. The underlying idea of LongAdder is to spread the counter into N distinct counters (which live on separate cache lines). Any given call updates one of those counters based on the current thread ID [2], and a read needs to sum the values from all counters. This means that writes have low contention, at the cost of a bit more complexity, and at a large cost to reads.
Now, the way the write works by design doesn't let you read the full LongAdder value, but since you just want a unique value you could use the same code except with the top or bottom N bits [3] set uniquely per counter.
Now the write can return the prior value, like getAndIncrement, and it will be unique because the fixed bits keep it unique among all counters in that object.
Thread-local Counters
A very fast and simple way is to use a unique value per thread, and a thread-local counter. When the thread local is initialized, it gets a unique ID from a shared counter (only once per thread), and then you combine that ID with a thread-local counter: for example, the bottom 24 bits for the ID, and the top 40 bits for the local counter [1]. This should be very fast and, more importantly, have essentially zero contention.
The downside is that the values of the counters won't have any specific relationship among threads (although they may still be strictly increasing within a thread). For example, a thread which has recently requested a counter value may get a much smaller one than a long existing value. You haven't described how you'll use these so I don't know if it is a problem.
Also, you don't have a single place to read the "total" number of counters allocated - you have to examine all the local counters to do that. This is doable if your application requires it (and has some of the same caveats as the LongAdder.sum() function).
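A minimal sketch of the thread-local scheme might look like this. The class name ThreadLocalIdGenerator and the 24/40 bit split are illustrative assumptions taken from the example in the text; overflow of either field is deliberately not handled here.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical generator: bottom 24 bits = per-thread ID, top 40 bits = local counter.
class ThreadLocalIdGenerator {
    private static final int THREAD_BITS = 24; // assumption: fewer than 2^24 threads ever created

    // Shared counter, touched only once per thread, when its state is first created.
    private static final AtomicInteger NEXT_THREAD_ID = new AtomicInteger();

    // Per-thread state: [0] = thread ID, [1] = next local counter value.
    private static final ThreadLocal<long[]> STATE = ThreadLocal.withInitial(
            () -> new long[] { NEXT_THREAD_ID.getAndIncrement(), 0L });

    static long nextId() {
        long[] s = STATE.get();
        long local = s[1]++;                   // no contention: only this thread touches s
        // Not handled in this sketch: local counter exceeding 40 bits, thread ID exceeding 24 bits.
        return (local << THREAD_BITS) | s[0];
    }
}
```

Two consecutive calls on the same thread share the low 24 bits and differ by one in the upper bits, so IDs are strictly increasing within a thread but unordered across threads, as described above.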
A different solution, if you want the numbers to be "generally increasing with time" across threads, and know that every thread requests counter values reasonably frequently, is to use a single global counter from which each thread requests a local "allocation" of a number of IDs; the thread then hands out individual IDs from that allocation in a thread-local manner. For example, threads may request 10 IDs at a time, so that three threads will be allocated the ranges 0-9, 10-19, and 20-29. They then allocate out of that range until it is exhausted, at which point they go back to the global counter. This is similar to how memory allocators carve out chunks of a common pool which can then be allocated thread-locally.
The example above will keep the IDs roughly in increasing order over time, and each thread's IDs will be strictly increasing as well. It doesn't offer any strict guarantees though: a thread that is allocated the range 0-9 could very well sleep for hours after using 0, and then use "1" when the counters on other threads are much higher. It would reduce contention on the global counter by a factor of 10.
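The range-allocation scheme can be sketched roughly as below. The class name BlockIdAllocator and the block size of 10 are assumptions chosen to mirror the example in the text; a real application would pick the block size by measuring contention.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical allocator: threads claim blocks of IDs from one shared counter.
class BlockIdAllocator {
    private static final int BLOCK_SIZE = 10;
    private static final AtomicLong NEXT_BLOCK_START = new AtomicLong();

    // Per-thread range: [0] = next ID to hand out, [1] = exclusive end of the current block.
    private static final ThreadLocal<long[]> RANGE =
            ThreadLocal.withInitial(() -> new long[] { 0L, 0L });

    static long nextId() {
        long[] r = RANGE.get();
        if (r[0] == r[1]) {                                 // block exhausted (or none claimed yet)
            r[0] = NEXT_BLOCK_START.getAndAdd(BLOCK_SIZE);  // the only contended operation
            r[1] = r[0] + BLOCK_SIZE;
        }
        return r[0]++;
    }
}
```

Only one call in every BLOCK_SIZE touches the shared AtomicLong; all other calls work purely on thread-local state.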
There are a variety of other approaches you could use, and most of them trade off contention reduction against the "accuracy" of the counter assignment relative to real time. If you had access to the hardware, you could probably use a quickly incrementing clock like the cycle counter (e.g., rdtscp) and the core ID to get a unique value that is very closely tied to real time (assuming the OS is synchronizing the counters).
[1] The bit-field sizes should be chosen carefully based on the expected number of threads and per-thread increments in your application. In general, if you are constantly creating new threads and your application is long-lived, you may want to err on the side of more bits to the thread ID, since you can always detect a wrap of the local counter and get a new thread ID, so bits allocated to the thread ID can be efficiently shared with the local counters (but not the other way around).
[2] The optimal is to use the 'CPU ID', but that's not directly accessible in Java (and even at the assembly level there is no fast and portable way to get it, AFAIK), so the thread ID is used as a proxy.
[3] Where N is log2(number of counters).

There's a subtle difference between the two implementations.
An AtomicLong holds a single number which every thread will attempt to update. Because of this, as you have already found, only one thread can update this value at a time. The advantage, though, is that the value will always be up-to-date when a get is called, as there will be no adds in progress at that time.
A LongAdder, on the other hand, is made up of multiple values, and each value will be updated by a subset of the threads. This results in less contention when updating the value, however it is possible for sum to have an incomplete value if done while an add is in progress, similar to the scenario you described.
LongAdder is recommended for those cases where you will be doing a bunch of adds in parallel followed by a sum at the end. For your use case, I wrote the following, which confirmed that around 1 in 10 sums were repeated (which renders LongAdder unusable for your use case).
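import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// The class name is illustrative; the original snippet showed only the main method.
public class LongAdderDuplicates
{
    public static void main (String[] args) throws Exception
    {
        LongAdder adder = new LongAdder();
        ExecutorService executor = Executors.newFixedThreadPool(10);
        // Maps each observed sum to the number of times it was observed.
        Map<Long, Integer> count = new ConcurrentHashMap<>();
        for (int i = 0; i < 10; i++)
        {
            executor.execute(() -> {
                for (int j = 0; j < 1000000; j++)
                {
                    adder.add(1);
                    count.merge(adder.longValue(), 1, Integer::sum);
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.HOURS);
        // Print every sum that was seen more than once.
        count.entrySet().stream().filter(e -> e.getValue() > 1).forEach(System.out::println);
    }
}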

Two threads accessing the same ArrayList at the same time?

I have the following code in thread 1:
synchronized (queues.get(currentQueue)) { //line 1
queues.get(currentQueue).add(networkEvent); //line 2
}
and the following in thread 2:
synchronized (queues.get(currentQueue)) {
if (queues.get(currentQueue).size() > 10) {
currentQueue = 1;
}
}
Now to my question: The currentQueue variable currently has the value 0. When thread 2 changes the value of currentQueue to 1 while thread 1 waits at line 1 (because of the synchronized), does thread 1 then use the updated currentQueue value in line 2 after thread 2 has finished? (That is what I want.)

The answer to the question is that it depends. I assume there is another chunk of code that increments the currentQueue variable. That being the case, the lock is taken neither on the currentQueue variable nor on the queues collection, but on one of the 10 queues (or however many you have) in the queues collection.
Hence, if both threads happen to access the same queue (say queue 5), then the answer to your question is yes. However, the chance of that happening is one in ten (one in x, where x is the number of queues in the queues collection). If the threads access different queues, the answer is no.

The correct answer to your question is: The result is undefined.
Your monitor object is queues.get(currentQueue), but since currentQueue is variable, your monitor is variable, therefore the state it is currently in is more or less random. Effectively this code would break eventually.
A simple way to fix it would be a function like this:
protected synchronized QueueType getCurrentQueue() {
return queues.get(currentQueue);
}
However this is still a bad way of implementing the whole thing. You should either try to eliminate the synchronization completely through the use of a concurrent Queue (like ConcurrentLinkedQueue) or work with a lock/final monitor object.
final Object queueLock = new Object();
...
synchronized(queueLock) {
queues.get(currentQueue).add(networkEvent);
}
Note that you will have to use that locking every time you access queues or currentQueue as both define the dataset you are using.
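As a rough sketch of the final-monitor approach, the class below guards both the list of queues and the current index with one lock object that is never reassigned. The class name QueueSwitcher, its methods, and the threshold parameter are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper: one final lock guards both 'queues' and 'currentQueue'.
class QueueSwitcher<E> {
    private final Object lock = new Object();             // never reassigned
    private final List<List<E>> queues = new ArrayList<>();
    private final int threshold;
    private int currentQueue = 0;                          // guarded by lock

    QueueSwitcher(int queueCount, int threshold) {
        for (int i = 0; i < queueCount; i++) queues.add(new ArrayList<>());
        this.threshold = threshold;
    }

    // "Thread 1" side: always adds to whatever queue is current at that moment.
    void add(E event) {
        synchronized (lock) {
            queues.get(currentQueue).add(event);
        }
    }

    // "Thread 2" side: moves on to the next queue once the current one exceeds the threshold.
    void rotateIfFull() {
        synchronized (lock) {
            if (queues.get(currentQueue).size() > threshold
                    && currentQueue + 1 < queues.size()) {
                currentQueue++;
            }
        }
    }

    int currentIndex() { synchronized (lock) { return currentQueue; } }

    int sizeOf(int index) { synchronized (lock) { return queues.get(index).size(); } }
}
```

Because the index is read inside the same synchronized block that uses it, an add that was waiting while the other thread switched queues will see the new index once it acquires the lock.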

Assuming no other thread changes the value of currentQueue, yes: Thread 1 will end up using the queue pointed to by the updated value of currentQueue, since you're invoking queues.get(currentQueue) once again in the body of the synchronized block. This, however, doesn't mean that your synchronization is sound. You actually should synchronize on currentQueue, since it seems to be the shared key to access the current queue.
Also remember that when you use synchronized you're synchronizing on the object the variable refers to, not on the variable itself. So if you assign a new object to the variable, your synchronization doesn't make sense anymore.

How to read unique elements from array per thread?

I have an object based on array, which implements the following interface:
public interface PairSupplier<Q, E> {
public int size();
public Pair<Q, E> get(int index);
}
I would like to create a specific iterator over it:
public boolean hasNext(){
return true;
}
public Pair<Q, E> next(){
//some magic
}
In method next I would like to return some element from PairSupplier.
This element should be unique for thread, other threads should not have this element.
Since PairSupplier has a final size, this situation is not always possible, but I would like to approach it.
The order of elements doesn't matter, thread can take same element at a different time.
Example: 2 Threads, 5 elements - {1,2,3,4,5}
Thread 1 | Thread 2
1 2
3 4
5 1
3 2
4 5
My solution:
I create AtomicInteger index, which I increment on every next call.
PairSupplier pairs;
AtomicInteger index;
public boolean hasNext(){
return true;
}
public Pair<Q, E> next(){
int position = index.incrementAndGet() % pairs.size();
if (position < 0) {
position *= -1;
position = pairs.size() - position;
}
return pairs.get(position);
}
pairs and index are shared among all threads.
I found that this solution does not scale (because all threads contend on the increment). Maybe someone has a better idea?
This iterator will be used by 50-1000 threads.

Your question details are ambiguous - your example suggests that two threads can be handed the same Pair but you say otherwise in the description.
Since it is the more difficult case to achieve, I will offer an Iterable<Pair<Q,E>> that delivers each Pair to only one thread until the supplier cycles, at which point it repeats.
public interface Supplier<T> {
public int size();
public T get(int index);
}
public interface PairSupplier<Q, E> extends Supplier<Pair<Q, E>> {
}
public class IterableSupplier<T> implements Iterable<T> {
// The common supplier to use across all threads.
final Supplier<T> supplier;
// The atomic counter.
final AtomicInteger i = new AtomicInteger();
public IterableSupplier(Supplier<T> supplier) {
this.supplier = supplier;
}
@Override
public Iterator<T> iterator() {
/**
* You may create a NEW iterator for each thread while they all share supplier
* and will therefore distribute each Pair between different threads.
*
* You may also share the same iterator across multiple threads.
*
* No two threads will get the same pair twice unless the sequence cycles.
*/
return new ThreadSafeIterator();
}
private class ThreadSafeIterator implements Iterator<T> {
@Override
public boolean hasNext() {
/**
* Always true.
*/
return true;
}
private int pickNext() {
// Just grab one atomically.
int pick = i.incrementAndGet();
// Reset to zero if it has exceeded - but no spin, let "just someone" manage it.
int actual = pick % supplier.size();
if (pick != actual) {
// So long as someone has a success before we overflow int we're good.
i.compareAndSet(pick, actual);
}
return actual;
}
@Override
public T next() {
return supplier.get(pickNext());
}
@Override
public void remove() {
throw new UnsupportedOperationException("Remove not supported.");
}
}
}
NB: I have adjusted the code a little to accommodate both scenarios. You can take an Iterator per thread or share a single Iterator across threads.

You have a piece of information ("has anyone taken this Pair already?") that must be shared between all threads. So for the general case, you're stuck. However, if you have an idea about the size of your array and the number of threads, you could use buckets to make it less painful.
Let's suppose we know that there will be 1,000,000 array elements and 1,000 threads. Assign each thread a range (thread #1 gets elements 0-999, etc). Now instead of 1,000 threads contending for one AtomicInteger, you can have no contention at all!
That works if you can be sure that all your threads will run at about the same pace. If you need to handle the case where sometimes thread #1 is busy doing other things while thread #2 is idle, you can modify your bucket pattern slightly: each bucket has an AtomicInteger. Now threads will generally only contend with themselves, but if their bucket is empty, they can move on to the next bucket.
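The bucket pattern with one AtomicInteger per bucket could be sketched like this. The class name BucketedIndexDispenser and its take method are invented for the example, the caller is assumed to pass a home bucket in the range 0 to bucketCount - 1 (for instance derived from a per-thread number), and the sketch hands out each index once rather than cycling.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical dispenser of array indices, split into buckets with one cursor each.
class BucketedIndexDispenser {
    private final int size;                 // total number of elements
    private final int bucketCount;
    private final int bucketSize;           // elements per bucket (last bucket may be shorter)
    private final AtomicInteger[] cursors;  // next unused offset inside each bucket

    BucketedIndexDispenser(int size, int bucketCount) {
        this.size = size;
        this.bucketCount = bucketCount;
        this.bucketSize = (size + bucketCount - 1) / bucketCount; // ceiling division
        this.cursors = new AtomicInteger[bucketCount];
        for (int b = 0; b < bucketCount; b++) cursors[b] = new AtomicInteger();
    }

    // Returns an index no other caller has received, or -1 once every index is taken.
    int take(int homeBucket) {
        for (int k = 0; k < bucketCount; k++) {
            int b = (homeBucket + k) % bucketCount;   // own bucket first, then the neighbours
            int start = b * bucketSize;
            int length = Math.min(size, start + bucketSize) - start;
            if (cursors[b].get() >= length) continue; // bucket already empty: skip without writing
            int offset = cursors[b].getAndIncrement();
            if (offset < length) return start + offset;
        }
        return -1;
    }
}
```

As long as threads stay inside their own bucket they only touch their own AtomicInteger, so contention appears only when a thread has to help out in a neighbouring bucket.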

I'm having some trouble understanding what problem you are trying to solve.
Does each thread process the whole collection?
Is the concern that no two threads can work on the same Pair at the same time? But each thread needs to process each Pair in the collection?
Or do you want the collection processed once by using all of the threads?

There is one key thing which is obscure in your example: what exactly is the meaning of this?
The order of elements doesn't matter, thread can take same element at a different time.
"different time" means what? Within N milliseconds of each other? Does it mean that absolutely two threads will never be touching the same Pair at the same time? I will assume that.
If you want to decrease the probability that threads will block on each other contending for the same Pair, and there is a backing array of Pairs, try this:
Partition your array into numPairs / threadCount sub-arrays (you don't have to actually create sub-arrays, just start at different offsets - but it's easier to think about as sub-array)
Assign each thread to a different sub-array; when a thread exhausts its sub-array, increment the index of its sub array
Say we have 6 Pairs and 2 threads - your assignments look like Thread-1:[0,1,2] Thread-2:[3,4,5]. When Thread-1 starts it will be looking at a different set of Pairs than thread 2, so it is unlikely that they will contend for the same pair
If it is important that two threads really not touch a Pair at the same time, then wrap all of the code which touches a Pair object in synchronized(pair) (synchronize on the instance, not the type!) - there may occasionally be blocking, but you're never blocking all threads on a single thing, as with the AtomicInteger - threads can only block each other because they are really trying to touch the same object
Note this is not guaranteed never to block - for that, all threads would have to run at exactly the same speed, and processing every Pair object would have to take exactly the same amount of time, and the OS's thread scheduler would have to never steal time from one thread but not another. You cannot assume any of those things. What this gives you is a higher probability that you will get better concurrency, by dividing the areas to work in and making the smallest unit of state that is shared be the lock.
But this is the usual pattern for getting more concurrency on a data structure - partition the data between threads so that they rarely are touching the same lock at the same time.

The easiest approach I can see is to create a HashSet or Map and give every thread a unique hash. After that, just do a simple get by that hash code.

This is a standard Java semaphore usage problem. The following Javadoc gives an example almost identical to your problem: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Semaphore.html
If you need more help, let me know?

I prefer a lock and release process.
If a thread asks for a Pair object, that Pair is removed from the supplier. Before the thread asks for a new Pair, the 'old' Pair is added to the supplier again.
You can take from the front and put back at the end.

How atomicity is achieved in the classes defined in java.util.concurrent.atomic package?

I was going through the source code of java.util.concurrent.atomic.AtomicInteger to find out how atomicity is achieved by the atomic operations provided by the class. For instance AtomicInteger.getAndIncrement() method source is as follows
public final int getAndIncrement() {
for (;;) {
int current = get();
int next = current + 1;
if (compareAndSet(current, next))
return current;
}
}
I am not able to understand the purpose of writing the sequence of operations inside an infinite for loop. Does it serve any special purpose in the Java Memory Model (JMM)? Please help me find a descriptive understanding. Thanks in advance.

I am not able to understand the purpose of writing the sequence of operations inside an infinite for loop.
The purpose of this code is to ensure that the volatile field gets updated appropriately without the overhead of a synchronized lock. Unless there are a large number of threads all competing to update this same field, this will most likely spin a very few times to accomplish this.
The volatile keyword provides visibility and memory synchronization guarantees but does not in itself ensure atomic operations with multiple operations (test and set). If you are testing and then setting a volatile field there are race-conditions if multiple threads are trying to perform the same operation at the same time. In this case, if multiple threads are trying to increment the AtomicInteger at the same time, you might miss one of the increments. The concurrent code here uses the spin loop and the compareAndSet underlying methods to make sure that the volatile int is only updated to 4 (for example) if it still is equal to 3.
t1 gets the atomic-int and it is 0.
t2 gets the atomic-int and it is 0.
t1 adds 1 to it
t1 atomically tests to make sure it is 0, it is, and stores 1.
t2 adds 1 to it
t2 atomically tests to make sure it is 0, it is not, so it has to spin and try again.
t2 gets the atomic-int and it is 1.
t2 adds 1 to it
t2 atomically tests to make sure it is 1, it is, and stores 2.
Does it serve any special purpose in Java Memory Model (JMM).
No, it serves the purpose of the class and method definitions and uses the JMM and the language definitions around volatile to achieve its purpose. The JMM defines what the language does with the synchronized, volatile, and other keywords and how multiple threads interact with cached and central memory. This is mostly about native code interactions with operating system and hardware and is rarely, if ever, about Java code.
It is the compareAndSet(...) method which gets closer to the JMM by calling into the Unsafe class which is mostly native methods with some wrappers:
public final boolean compareAndSet(int expect, int update) {
return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
}

I am not able to understand the purpose of writing the sequence of
operations inside an infinite for loop.
To understand why it is in an infinite loop I find it helpful to understand what the compareAndSet does and how it may return false.
Atomically sets the value to the given updated value if the current
value == the expected value.
Parameters:
expect - the expected value
update - the new value
Returns:
true if successful. False return indicates that the actual value was not
equal to the expected value
So you read the Returns message and ask how is that possible?
Suppose two threads invoke incrementAndGet at close to the same time, and both enter and see current == 1. Both threads will create a thread-local next == 2 and try to set it via compareAndSet. Only one thread will win, as documented, and the thread that loses must try again.
This is how CAS works. You attempt to change the value; if you fail, you try again; if you succeed, you continue on.
Now simply declaring the field as volatile will not work because incrementing is not atomic. So something like this is not safe from the scenario I explained
volatile int count = 0;
public int incrementAndGet(){
return ++count; //may return the same number more than once.
}

Java's compareAndSet is based on CPU compare-and-swap (CAS) instructions; see http://en.wikipedia.org/wiki/Compare-and-swap. It compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value.
In case of incrementAndGet we read the current value and call compareAndSet(current, current + 1). If it returns false it means that another thread interfered and changed the current value, which means that our attempt failed and we need to repeat the whole cycle until it succeeds.
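The same read, compute, compareAndSet, retry cycle works for any update that has no ready-made atomic method. As an illustration (the class CasMax and the method getAndSetMax are made up for this example), here is a retry loop that atomically raises a value to a maximum:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper showing a hand-written CAS retry loop.
class CasMax {
    // Raises 'target' to at least 'candidate'; returns the value seen before the update.
    static int getAndSetMax(AtomicInteger target, int candidate) {
        for (;;) {
            int current = target.get();
            int next = Math.max(current, candidate);
            // Succeeds only if nobody changed the value since we read it.
            if (target.compareAndSet(current, next)) {
                return current;
            }
            // Otherwise another thread interfered: loop, re-read, and try again.
        }
    }
}
```

The loop is "infinite" only in form: each failed attempt means some other thread made progress, and the next iteration starts from the fresh value.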

Is this java code thread-safe?

I am planning to use this schema in my application, but I was not sure whether this is safe.
To give a little background, a bunch of servers will compute results of sub-tasks that belong to a single task and report them back to the central server. This piece of code is used to register the results, and also to check whether all the subtasks for the task have completed and, if so, report that fact only once.
The important point is that each task must be reported once and only once, as soon as it is completed (all subTaskResults are set).
Can anybody help? Thank you! (Also, if you have a better idea to solve this problem, please let me know!)
*Note that I simplified the code for brevity.
Solution I
class Task {
    //Populate with bunch of (Long, new AtomicReference()) pairs
    //Actual app uses read only HashMap
    Map<Id, AtomicReference<SubTaskResult>> subtasks = populatedMap();
    Semaphore permission = new Semaphore(1);
    public Task set(Id id, SubTaskResult result){
        //null check omitted
        subtasks.get(id).set(result);
        return check() ? this : null;
    }
    private boolean check(){
        for(AtomicReference<SubTaskResult> ref : subtasks.values()){
            if(ref.get()==null){
                return false;
            }
        }//for
        return permission.tryAcquire();
    }
}//class
Stephen C kindly suggested to use a counter. Actually, I have considered that once, but I reasoned that the JVM could reorder the operations and thus, a thread can observe a decremented counter (by another thread) before the result is set in AtomicReference (by that other thread).
*EDIT: I now see this is thread safe. I'll go with this solution. Thanks, Stephen!
Solution II
class Task {
    //Populate with bunch of (Long, new AtomicReference()) pairs
    //Actual app uses read only HashMap
    Map<Id, AtomicReference<SubTaskResult>> subtasks = populatedMap();
    AtomicInteger counter = new AtomicInteger(subtasks.size());
    public Task set(Id id, SubTaskResult result){
        //null check omitted
        subtasks.get(id).set(result);
        //In the actual app, if !compareAndSet(null, result) return null;
        return check() ? this : null;
    }
    private boolean check(){
        return counter.decrementAndGet() == 0;
    }
}//class

I assume that your use-case is that there are multiple threads calling set, but for any given value of id, the set method will be called once only. I'm also assuming that populatedMap creates the entries for all used id values, and that subtasks and permission are really private.
If so, I think that the code is thread-safe.
Each thread should see the initialized state of the subtasks Map, complete with all keys and all AtomicReference references. This state never changes, so subtasks.get(id) will always give the right reference. The set(result) call operates on an AtomicReference, so the subsequent get() method calls in check() will give the most up-to-date values ... in all threads. Any potential races with multiple threads calling check seem to sort themselves out.
However, this is a rather complicated solution. A simpler solution would be to use a concurrent counter; e.g. replace the Semaphore with an AtomicInteger and use decrementAndGet instead of repeatedly scanning the subtasks map in check.
In response to this comment in the updated solution:
"Actually, I have considered that once, but I reasoned that the JVM could reorder the operations and thus, a thread can observe a decremented counter (by another thread) before the result is set in AtomicReference (by that other thread)."
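AtomicInteger and AtomicReference give you more than atomicity. According to the java.util.concurrent.atomic package documentation, set has the memory effects of a volatile write, and read-and-update operations such as decrementAndGet have the memory effects of both a volatile read and a volatile write. Any thread that accesses one of them is guaranteed to see the "current" value at the time of the access.
In this particular case, each thread calls set on the relevant AtomicReference before it calls decrementAndGet on the AtomicInteger. Because both calls have volatile semantics, they cannot be reordered with respect to each other, and their effects become visible to other threads in that same order. A thread that observes the decremented counter is therefore guaranteed to also observe the result that was set before the decrement.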
In other words, it should be thread-safe ... AFAIK.
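As a rough check of the ordering argument above, here is a minimal sketch of the counter approach with a small harness around it. The names CounterTask and CounterTaskDemo are made up for illustration, String stands in for SubTaskResult, and the sketch assumes each id is submitted exactly once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the question's Task; String replaces SubTaskResult.
final class CounterTask {
    private final Map<Long, AtomicReference<String>> subtasks = new HashMap<>();
    private final AtomicInteger remaining;

    CounterTask(int subtaskCount) {
        for (long id = 0; id < subtaskCount; id++) {
            subtasks.put(id, new AtomicReference<>());
        }
        remaining = new AtomicInteger(subtaskCount);
    }

    // Returns true only for the caller whose decrement reaches zero.
    // Assumes every id is known and submitted exactly once.
    boolean set(long id, String result) {
        subtasks.get(id).set(result);             // volatile-style write
        return remaining.decrementAndGet() == 0;  // atomic read-modify-write
    }

    // True if every subtask slot currently holds a result.
    boolean allResultsVisible() {
        for (AtomicReference<String> ref : subtasks.values()) {
            if (ref.get() == null) {
                return false;
            }
        }
        return true;
    }
}

final class CounterTaskDemo {
    // Starts one thread per subtask. Returns the number of threads that were
    // told "complete", or -1 if such a thread saw a missing result.
    static int run(int subtaskCount) {
        CounterTask task = new CounterTask(subtaskCount);
        AtomicInteger completions = new AtomicInteger();
        AtomicInteger incompleteViews = new AtomicInteger();
        Thread[] workers = new Thread[subtaskCount];
        for (int i = 0; i < subtaskCount; i++) {
            final long id = i;
            workers[i] = new Thread(() -> {
                if (task.set(id, "result-" + id)) {
                    completions.incrementAndGet();
                    if (!task.allResultsVisible()) {
                        incompleteViews.incrementAndGet();
                    }
                }
            });
        }
        for (Thread t : workers) {
            t.start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return incompleteViews.get() == 0 ? completions.get() : -1;
    }
}
```

Calling CounterTaskDemo.run(8) should report exactly one completion. A passing run is only a smoke test, of course; the actual guarantee comes from the memory model, not from the harness.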

The atomicity guaranteed (per class documentation) explicitly for AtomicReference.compareAndSet extends to set and get methods (per package documentation), so in that regard your code appears to be thread-safe.
I am not sure, however, why you have Semaphore.tryAcquire as a side-effect there, but without complementary code to release the semaphore, that part of your code looks wrong.

The second solution does provide a thread-safe latch, but it's vulnerable to calls to set() that provide an ID that's not in the map -- which would trigger a NullPointerException -- or more than one call to set() with the same ID. The latter would mistakenly decrement the counter too many times and falsely report completion when there are presumably other subtask IDs for which no result has been submitted. My criticism isn't with regard to the thread safety, but rather to the invariant maintenance; the same flaw would be present even without the thread-related concern.
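One way to close both holes, sketched under my own assumptions: reject unknown IDs explicitly, and only decrement the counter when compareAndSet(null, result) succeeds (the question's code comment hints that the real app already does the latter). GuardedTask is a made-up name, and Object stands in for SubTaskResult.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical variant of Solution II that protects the counter invariant.
final class GuardedTask {
    private final Map<Long, AtomicReference<Object>> subtasks = new HashMap<>();
    private final AtomicInteger remaining;

    GuardedTask(long... ids) {
        for (long id : ids) {
            subtasks.put(id, new AtomicReference<>());
        }
        // Use the map size so duplicate ids in the argument list count once.
        remaining = new AtomicInteger(subtasks.size());
    }

    // Returns true only for the call that completes the whole task.
    boolean set(long id, Object result) {
        if (result == null) {
            throw new NullPointerException("result");
        }
        AtomicReference<Object> slot = subtasks.get(id);
        if (slot == null) {
            throw new IllegalArgumentException("unknown subtask id: " + id);
        }
        // A second submission for the same id loses the CAS and must not
        // touch the counter; the first result is kept.
        if (!slot.compareAndSet(null, result)) {
            return false;
        }
        return remaining.decrementAndGet() == 0;
    }
}
```

With this guard the counter is decremented at most once per ID, so it can only reach zero when every slot has been filled. Whether a duplicate submission should be ignored silently, as here, or treated as an error depends on the application.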
Another way to solve this problem is with AbstractQueuedSynchronizer, but it's somewhat gratuitous: you can implement a stripped-down counting semaphore, where each call to set() would call releaseShared(), decrementing the counter via a spin on compareAndSetState(), and tryAcquireShared() would only succeed when the count is zero. That's more or less what you implemented above with the AtomicInteger, but you'd be reusing a facility that offers more capabilities you can use for other portions of your design.
To flesh out the AbstractQueuedSynchronizer-based solution requires adding one more operation to justify the complexity: being able to wait on the results from all the subtasks to come back, such that the entire task is complete. That's Task#awaitCompletion() and Task#awaitCompletion(long, TimeUnit) in the code below.
Again, it's possibly overkill, but I'll share it for the purpose of discussion.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;
final class Task
{
private static final class Sync extends AbstractQueuedSynchronizer
{
public Sync(int count)
{
setState(count);
}
@Override
protected int tryAcquireShared(int ignored)
{
return 0 == getState() ? 1 : -1;
}
@Override
protected boolean tryReleaseShared(int ignored)
{
int current;
do
{
current = getState();
if (0 == current)
return true;
}
while (!compareAndSetState(current, current - 1));
return 1 == current;
}
}
public Task(int count)
{
if (count < 0)
throw new IllegalArgumentException();
sync_ = new Sync(count);
}
public boolean set(int id, Object result)
{
// Ensure that "id" refers to an incomplete task. Doing so requires
// additional synchronization over the structure mapping subtask
// identifiers to results.
// Store result somehow.
return sync_.releaseShared(1);
}
public void awaitCompletion()
throws InterruptedException
{
sync_.acquireSharedInterruptibly(0);
}
// Returns true if the task completed, false if the wait timed out.
public boolean awaitCompletion(long time, TimeUnit unit)
throws InterruptedException
{
return sync_.tryAcquireSharedNanos(0, unit.toNanos(time));
}
private final Sync sync_;
}

I have a weird feeling reading your example program, but it depends on the larger structure of your program what to do about that. A set function that also checks for completion is almost a code smell. :-) Just a few ideas.
If you have synchronous communication with your servers, you might use an ExecutorService with as many threads as there are servers, and let those threads do the communication. From this you get a bunch of Futures, and you can naturally proceed with your calculation - the get calls will block at the moment the result is needed but not yet there.
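A minimal sketch of that idea, with the caveat that askServer is a made-up placeholder for whatever blocking call actually talks to a server, and FutureGather is just an illustrative name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FutureGather {
    // Hypothetical placeholder for a blocking request to one server.
    static String askServer(int server) {
        return "result-from-" + server;
    }

    // Submits one request per server and collects the results in order.
    static List<String> gather(int serverCount) {
        ExecutorService pool = Executors.newFixedThreadPool(serverCount);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int s = 0; s < serverCount; s++) {
                final int server = s;
                pending.add(pool.submit(() -> askServer(server)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                // Blocks until this particular subtask has finished.
                results.add(future.get());
            }
            // Reached once, after every subtask has reported.
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Since only the calling thread collects the results, "report completion exactly once" falls out of the control flow and needs no counter or semaphore.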
If you have asynchronous communication with the servers, you might also use a CountDownLatch after submitting the task to the servers. The await call blocks the main thread until the completion of all subtasks, and other threads can receive the results and call countDown on each received result.
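Roughly, and again only as a sketch: the threads below stand in for whatever callback mechanism delivers the servers' replies, LatchGather is a made-up name, and the 10 second timeout is an arbitrary choice.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LatchGather {
    // Waits until every subtask has delivered its result, then returns them.
    static Map<Integer, String> gather(int subtaskCount) {
        Map<Integer, String> results = new ConcurrentHashMap<>();
        CountDownLatch done = new CountDownLatch(subtaskCount);
        for (int i = 0; i < subtaskCount; i++) {
            final int id = i;
            // Each thread simulates the asynchronous arrival of one reply.
            new Thread(() -> {
                results.put(id, "result-" + id); // store first ...
                done.countDown();                // ... then signal
            }).start();
        }
        try {
            // Arbitrary timeout so a lost reply cannot block forever.
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for subtasks");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }
}
```

Note that the latch counts calls, not distinct subtasks, so a reply delivered twice would still need to be filtered out before calling countDown.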
With all these methods you don't need special threadsafety measures other than that the concurrent storing of the results in your structure is threadsafe. And I bet there are even better patterns for this.
