java concurrency: many writers, one reader


I need to gather some statistics in my software, and I am trying to make it fast and correct, which is not easy (for me!).
First, my code so far, with two classes: a StatsService and a StatsHarvester.
public class StatsService
{
private Map<String, Long> stats = new HashMap<String, Long>(1000);
public void notify ( String key )
{
Long value = 1l;
synchronized (stats)
{
if (stats.containsKey(key))
{
value = stats.get(key) + 1;
}
stats.put(key, value);
}
}
public Map<String, Long> getStats ( )
{
Map<String, Long> copy;
synchronized (stats)
{
copy = new HashMap<String, Long>(stats);
stats.clear();
}
return copy;
}
}
This is my second class, a harvester that collects the stats from time to time and writes them to a database.
public class StatsHarvester implements Runnable
{
private StatsService statsService;
private Thread t;
public void init ( )
{
t = new Thread(this);
t.start();
}
public synchronized void run ( )
{
while (true)
{
try
{
wait(5 * 60 * 1000); // 5 minutes
collectAndSave();
}
catch (InterruptedException e)
{
e.printStackTrace();
}
}
}
private void collectAndSave ( )
{
Map<String, Long> stats = statsService.getStats();
// do something like:
// saveRecords(stats);
}
}
At runtime there will be about 30 concurrently running threads, each calling notify(key) about 100 times. Only one StatsHarvester calls statsService.getStats().
So I have many writers and only one reader. It would be nice to have accurate stats, but I don't care if some records are lost under high concurrency.
The reader should run every 5 minutes or whatever is reasonable.
Writing should be as fast as possible. Reading should be fast, but if it locks for about 300 ms every 5 minutes, that's fine.
I've read many docs (Java Concurrency in Practice, Effective Java and so on), but I have the strong feeling that I need your advice to get it right.
I hope I stated my problem clearly and briefly enough to get valuable help.
EDIT
Thanks to all for your detailed and helpful answers. As I expected, there is more than one way to do it.
I tested most of your proposals (those I understood) and uploaded a test project to Google Code for further reference (Maven project):
http://code.google.com/p/javastats/
I have tested different implementations of my StatsService:
HashMapStatsService (HMSS)
ConcurrentHashMapStatsService (CHMSS)
LinkedQueueStatsService (LQSS)
GoogleStatsService (GSS)
ExecutorConcurrentHashMapStatsService (ECHMSS)
ExecutorHashMapStatsService (EHMSS)
I tested them with x threads, each calling notify y times. The column headers are x,y and the results are in ms:
Impl    10,100  10,1000  10,5000  50,100  50,1000  50,5000  100,100  100,1000  100,5000  Sum
GSS          1        5       17       7       21      117        7        37       254  466
ECHMSS       1        6       21       5       32      132        8        54       249  508
HMSS         1        8       45       8       52      233       11       103       449  910
EHMSS        1        5       24       7       31      113        8        67       235  491
CHMSS        1        2        9       3       11       40        7        26        72  171
LQSS         0        3       11       3       16       56        6        27       144  266
At the moment I think I will use ConcurrentHashMap, as it offers good performance while being quite easy to understand.
Thanks for all your input!
Janning

As jack was alluding to, you can use the java.util.concurrent library, which includes ConcurrentHashMap and AtomicLong. You can put a new AtomicLong in if the key is absent; otherwise, you can increment the existing value. Since AtomicLong is thread-safe, you can increment the counter without worrying about concurrency issues.
public void notify(String key) {
AtomicLong value = stats.get(key);
if (value == null) {
value = stats.putIfAbsent(key, new AtomicLong(1));
}
if (value != null) {
value.incrementAndGet();
}
}
This should be both fast and thread-safe.
Edit: Refactored slightly so that there are at most two lookups.
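For completeness, here is a minimal sketch of a whole service built on this idea, including the harvesting side. The class name AtomicCounterStats and the method names record and drain are made up for illustration, and resetting each counter with getAndSet(0) is only one possible way to take a snapshot, not something the answer above prescribes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical name; a sketch of the ConcurrentHashMap + AtomicLong idea.
class AtomicCounterStats {
    private final ConcurrentMap<String, AtomicLong> stats = new ConcurrentHashMap<>();

    // Writers: lock-free increment; the counter is created on first use.
    void record(String key) {
        AtomicLong counter = stats.get(key);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong();
            counter = stats.putIfAbsent(key, fresh);
            if (counter == null) {
                counter = fresh; // we won the race, our counter is now in the map
            }
        }
        counter.incrementAndGet();
    }

    // Reader: copy and reset every counter. Counters are never removed,
    // so an increment that races with the harvest shows up in the next snapshot.
    Map<String, Long> drain() {
        Map<String, Long> copy = new HashMap<>();
        for (Map.Entry<String, AtomicLong> e : stats.entrySet()) {
            long n = e.getValue().getAndSet(0);
            if (n > 0) {
                copy.put(e.getKey(), n);
            }
        }
        return copy;
    }
}
```

The snapshot is not a single point in time across all keys, which should be acceptable given that the question tolerates slightly inaccurate stats.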

Why don't you use java.util.concurrent.ConcurrentHashMap<K, V>? It handles everything internally, avoiding useless locks on the map and saving you a lot of work: you won't have to care about synchronization on get and put.
From the documentation:
A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access.
You can specify its concurrency level:
The allowed concurrency among update operations is guided by the optional concurrencyLevel constructor argument (default 16), which is used as a hint for internal sizing. The table is internally partitioned to try to permit the indicated number of concurrent updates without contention. Because placement in hash tables is essentially random, the actual concurrency will vary. Ideally, you should choose a value to accommodate as many threads as will ever concurrently modify the table. Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention. But overestimates and underestimates within an order of magnitude do not usually have much noticeable impact. A value of one is appropriate when it is known that only one thread will modify and all others will only read. Also, resizing this or any other kind of hash table is a relatively slow operation, so, when possible, it is a good idea to provide estimates of expected table sizes in constructors.
As suggested in the comments, read the documentation of ConcurrentHashMap carefully, especially where it states which operations are atomic and which are not.
A get followed by a put is not atomic as a pair. For guaranteed atomicity, use the compound operations of the ConcurrentMap interface:
V putIfAbsent(K key, V value)
V replace(K key, V value)
boolean replace(K key,V oldValue, V newValue)
boolean remove(Object key, Object value)
can be used safely.
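As an illustration of how these atomic operations combine, here is a small sketch of a counter that stores plain Long values and retries with putIfAbsent and the three-argument replace. The class name CasCounter and its methods are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical name; shows a retry loop built from putIfAbsent and replace.
class CasCounter {
    private final ConcurrentMap<String, Long> counts = new ConcurrentHashMap<>();

    void increment(String key) {
        while (true) {
            Long current = counts.get(key);
            if (current == null) {
                // First writer for this key: succeeds only if nobody beat us to it.
                if (counts.putIfAbsent(key, 1L) == null) {
                    return;
                }
            } else if (counts.replace(key, current, current + 1)) {
                // The value was still 'current', so our increment was applied.
                return;
            }
            // Another thread changed the entry in between: read again and retry.
        }
    }

    long get(String key) {
        Long value = counts.get(key);
        return value == null ? 0L : value;
    }
}
```

Under heavy contention on a single key the loop may retry often, so an AtomicLong value is probably the cheaper choice there.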

I would suggest taking a look at Java's java.util.concurrent library. I think you can implement this solution a lot more cleanly. I don't think you need a map here at all. I would recommend implementing this with a ConcurrentLinkedQueue. Each 'producer' can freely write to this queue without worrying about the others. It can put an object on the queue with the data for its statistics.
The harvester can consume the queue, continually pulling data off and processing it. It can then store it however it needs.
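A rough sketch of that queue-based design, assuming the "data" is just the key being counted. The names QueueStats, record and drain are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical name; producers enqueue keys, the single harvester aggregates them.
class QueueStats {
    private final Queue<String> events = new ConcurrentLinkedQueue<>();

    // Called by the many writer threads; never blocks.
    void record(String key) {
        events.offer(key);
    }

    // Called by the single harvester thread, so a plain HashMap is enough here.
    Map<String, Long> drain() {
        Map<String, Long> totals = new HashMap<>();
        String key;
        while ((key = events.poll()) != null) {
            Long old = totals.get(key);
            totals.put(key, old == null ? 1L : old + 1);
        }
        return totals;
    }
}
```

The trade-off is memory: the queue holds one element per event until the harvester runs, so the harvesting interval bounds how large it can grow.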

Chris Dail's answer looks like a good approach.
Another alternative would be to use a concurrent Multiset. There is one in the Google Collections library. You could use this as follows:
private Multiset<String> stats = ConcurrentHashMultiset.create();
public void notify ( String key )
{
stats.add(key, 1);
}
Looking at the source, this is implemented using a ConcurrentHashMap and using putIfAbsent and the three-argument version of replace to detect concurrent modifications and retry.

A different approach to the problem is to exploit the (trivial) thread safety you get from thread confinement. Basically, create a single background thread that takes care of both reading and writing. This has pretty good characteristics in terms of scalability and simplicity.
The idea is that instead of all the threads trying to update the data directly, they produce an "update" task for the background thread to process. The same thread can also do the read task, assuming some lag in processing updates is tolerable.
This design is pretty nice because the threads will no longer have to compete for a lock to update data, and since the map is confined to a single thread you can simply use a plain HashMap to do get/put, etc. In terms of implementation, it would mean creating a single threaded executor, and submitting write tasks which may also perform the optional "collectAndSave" operation.
A sketch of code may look like the following:
public class StatsService {
private ExecutorService executor = Executors.newSingleThreadExecutor();
private final Map<String,Long> stats = new HashMap<String,Long>();
public void notify(final String key) {
Runnable r = new Runnable() {
public void run() {
Long value = stats.get(key);
if (value == null) {
value = 1L;
} else {
value++;
}
stats.put(key, value);
// do the optional collectAndSave periodically
if (timeToDoCollectAndSave()) {
collectAndSave();
}
}
};
executor.execute(r);
}
}
There is a BlockingQueue associated with an executor, and each thread that produces a task for the StatsService uses the BlockingQueue. The key point is this: the locking duration for this operation should be much shorter than the locking duration in the original code, so the contention should be much less. Overall it should result in a much better throughput and latency.
Another benefit is that since only one thread reads and writes to the map, plain HashMap and primitive long type can be used (no ConcurrentHashMap or atomic types involved). This also simplifies the code that actually processes it a great deal.
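If the harvester lives on a different thread, it can still get a snapshot by submitting a task to the same executor and waiting for the result. The following is only a sketch under that assumption; ConfinedStats, record, drain and shutdown are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical name; the map is only ever touched by the executor's single thread.
class ConfinedStats {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Map<String, Long> stats = new HashMap<>();

    void record(final String key) {
        executor.execute(() -> {
            Long old = stats.get(key);
            stats.put(key, old == null ? 1L : old + 1);
        });
    }

    // Runs copy-and-clear on the confined thread and hands the copy back.
    Map<String, Long> drain() {
        Future<Map<String, Long>> pending = executor.submit(() -> {
            Map<String, Long> copy = new HashMap<>(stats);
            stats.clear();
            return copy;
        });
        try {
            return pending.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

Because a single-threaded executor runs tasks in submission order, a drain sees every record that the same caller submitted before it.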
Hope it helps.

Have you looked into ScheduledThreadPoolExecutor? You could use that to schedule your writers, which could all write to a concurrent collection, such as the ConcurrentLinkedQueue mentioned by @Chris Dail. You can have a separately scheduled job read from the queue as necessary, and the Java SDK should handle pretty much all your concurrency concerns, with no manual locking needed.
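To make the scheduling part concrete, here is a minimal sketch of a harvester driven by a scheduled executor instead of a hand-written wait() loop. PeriodicHarvester and its demo method are hypothetical; in the question's setting the period would be 5 minutes rather than the 10 ms used by the demo.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical name; wraps a single-threaded scheduler that runs the harvest job.
class PeriodicHarvester {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Note: if the task throws, scheduleAtFixedRate suppresses all later runs,
    // so a real harvest task should catch and log its own exceptions.
    void start(Runnable harvest, long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(harvest, period, period, unit);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    // Demo: returns true if the task ran at least twice within five seconds.
    static boolean demo() {
        final CountDownLatch ran = new CountDownLatch(2);
        PeriodicHarvester harvester = new PeriodicHarvester();
        harvester.start(ran::countDown, 10, TimeUnit.MILLISECONDS);
        try {
            return ran.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            harvester.stop();
        }
    }
}
```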

If we ignore the harvesting part and focus on the writing, the main bottleneck of the program is that the stats are locked at a very coarse level of granularity. If two threads want to update different keys, they must wait.
If you know the set of keys in advance, and can preinitialize the map so that by the time an update thread arrives the key is guaranteed to exist, you would be able to do locking on the accumulator variable instead of the whole map, or use a thread-safe accumulator object.
Instead of implementing this yourself, there are map implementations that are designed specifically for concurrency and do this more fine-grained locking for you.
One caveat, though, is harvesting the stats, since you would need to get locks on all the accumulators at roughly the same time. If you use an existing concurrency-friendly map, there might be a construct for getting a snapshot.
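A sketch of the "keys known in advance" variant, using thread-safe accumulators rather than per-key locks. The map is filled once in the constructor and never structurally modified afterwards, which is why a plain HashMap behind a final field should be safe to read concurrently. FixedKeyStats and its method names are made up.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical name; the key set is fixed at construction time.
class FixedKeyStats {
    private final Map<String, AtomicLong> counters;

    FixedKeyStats(Collection<String> keys) {
        Map<String, AtomicLong> m = new HashMap<>();
        for (String key : keys) {
            m.put(key, new AtomicLong());
        }
        counters = Collections.unmodifiableMap(m);
    }

    // No map-level locking: only the per-key accumulator is contended.
    void record(String key) {
        AtomicLong counter = counters.get(key);
        if (counter == null) {
            throw new IllegalArgumentException("unknown key: " + key);
        }
        counter.incrementAndGet();
    }

    // Per-key snapshot and reset; not atomic across keys.
    Map<String, Long> drain() {
        Map<String, Long> copy = new HashMap<>();
        for (Map.Entry<String, AtomicLong> e : counters.entrySet()) {
            copy.put(e.getKey(), e.getValue().getAndSet(0));
        }
        return copy;
    }
}
```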

Another alternative is to implement both methods using ReentrantReadWriteLock. This implementation protects against race conditions in the getStats method if you need to clear the counters. It also removes the mutable AtomicLong from the getStats result and uses an immutable Long instead.
public class StatsService {
private final Map<String, AtomicLong> stats = new HashMap<String, AtomicLong>(1000);
private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
private final Lock r = rwl.readLock();
private final Lock w = rwl.writeLock();
public void notify(final String key) {
r.lock();
AtomicLong count = stats.get(key);
if (count == null) {
r.unlock();
w.lock();
count = stats.get(key);
if(count == null) {
count = new AtomicLong();
stats.put(key, count);
}
r.lock();
w.unlock();
}
count.incrementAndGet();
r.unlock();
}
public Map<String, Long> getStats() {
w.lock();
Map<String, Long> copy = new HashMap<String, Long>();
for(Entry<String,AtomicLong> entry : stats.entrySet() ){
copy.put(entry.getKey(), entry.getValue().longValue());
}
stats.clear();
w.unlock();
return copy;
}
}
I hope this helps, any comments are welcome!

Here is how to do it with minimal impact on the performance of the threads being measured. This is the fastest solution possible in Java, without resorting to special hardware registers for performance counting.
Have each thread output its stats independently of the others, that is with no synchronization, to some stats object. Make the field containing the count volatile, so it is memory fenced:
class Stats
{
public volatile long count;
}
class SomeRunnable implements Runnable
{
public void run()
{
doStuff();
stats.count++;
}
}
Have another thread, that holds a reference to all the Stats objects, periodically go around them all and add up the counts across all threads:
public long accumulateStats()
{
long count = previousCount;
for (Stats stat : allStats)
{
count += stat.count;
}
long resultDelta = count - previousCount;
previousCount = count;
return resultDelta;
}
This gatherer thread also needs a sleep() (or some other throttle) added to it. It can periodically output counts/sec to the console for example, to give you a "live" view of how your application is performing.
This avoids the synchronization overhead about as much as you can.
The other trick to consider is padding the Stats objects to 128 bytes (or 256 bytes on SandyBridge or later), so as to keep the different threads' counts on different cache lines; otherwise there will be cache contention on the CPU.
When only one thread reads and one writes, you do not need locks or atomics, a volatile is sufficient. There will still be some thread contention, when the stats reader thread interacts with the CPU cache line of the thread being measured. This cannot be avoided, but it is the way to do it with minimal impact on the running thread; read the stats maybe once a second or less.
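One way to wire this up without passing Stats objects around by hand is a ThreadLocal that registers each thread's cell on first use. This is only a sketch of the idea; PerThreadCounter, Cell and demo are hypothetical names, and no cache-line padding is attempted here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical name; one volatile cell per thread, summed by the reader.
class PerThreadCounter {
    static final class Cell {
        volatile long count;
    }

    private final List<Cell> cells = new CopyOnWriteArrayList<>();
    private final ThreadLocal<Cell> local = ThreadLocal.withInitial(() -> {
        Cell cell = new Cell();
        cells.add(cell); // registration happens once per thread
        return cell;
    });

    // Safe without atomics only because each cell has exactly one writer.
    void increment() {
        Cell cell = local.get();
        cell.count = cell.count + 1;
    }

    long sum() {
        long total = 0;
        for (Cell cell : cells) {
            total += cell.count;
        }
        return total;
    }

    // Demo: runs 'threads' workers and returns the total after they finish.
    static long demo(int threads, final int perThread) {
        final PerThreadCounter counter = new PerThreadCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.sum();
    }
}
```

Cells of finished threads stay in the list, so this fits a fixed pool of long-lived worker threads better than short-lived ones.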

Related

Thread-safety of unique keys in a HashMap

There have been many discussions on this topic, e.g. here:
What's the difference between ConcurrentHashMap and Collections.synchronizedMap(Map)?
But I haven't found an answer to my specific use-case.
In general, you cannot assume that a HashMap is thread-safe. If you write to the same key from different threads at the same time, all hell could break loose. But what if I know that all my threads will have unique keys?
Is this code thread-safe or do I need to add blocking mechanism (or use concurrent map)?
Map<Integer, String> myMap = new HashMap<>();
for (int i = 1; i < 6; i++) {
    final int key = i; // a lambda can only capture effectively final variables
    new Thread(() -> {
        myMap.put(key, Integer.toString(key));
    }).start();
}

The answer is simple: HashMap makes absolutely no thread-safety guarantees at all.
In fact it's explicitly documented that it's not thread-safe:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
So accessing one from multiple threads without any kind of synchronization is a recipe for disaster.
I have seen cases where each thread using a different key still caused issues (such as iterations happening at the same time resulting in infinite loops).
Just think of re-hashing: when the threshold is reached, the internal bucket-array needs to be resized. That's a somewhat lengthy operation (compared to a single put). During that time all manner of weird things can happen if another thread tries to put as well (and maybe even triggers a second re-hashing!).
Additionally, there's no reliable way for you to prove that your specific use case is safe, since all tests you could run could just "accidentally" work. In other words: you can never depend on this working, even if you think you covered it with unit tests.
And since not everyone is convinced, you can easily test it yourself with this code:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
class HashMapDemonstration {
public static void main(String[] args) throws InterruptedException {
int threadCount = 10;
int valuesPerThread = 1000;
Map<Integer, Integer> map = new HashMap<>();
List<Thread> threads = new ArrayList<>(threadCount);
for (int i = 0; i < threadCount; i++) {
Thread thread = new Thread(new MyUpdater(map, i*valuesPerThread, (i+1)*valuesPerThread - 1));
thread.start();
threads.add(thread);
}
for (Thread thread : threads) {
thread.join();
}
System.out.printf("%d threads with %d values per thread with a %s produced %d entries, should be %d%n",
threadCount, valuesPerThread, map.getClass().getName(), map.size(), threadCount * valuesPerThread);
}
}
class MyUpdater implements Runnable {
private final Map<Integer, Integer> map;
private final int startValue;
private final int endValue;
MyUpdater(Map<Integer, Integer> map, int startValue, int endValue) {
this.map = map;
this.startValue = startValue;
this.endValue = endValue;
System.out.printf("Creating updater for values %d to %d%n", startValue, endValue);
}
@Override
public void run() {
for (int i = startValue; i<= endValue; i++) {
map.put(i, i);
}
}
}
This is exactly the type of program OP mentioned: Each thread will only ever write to keys that no other thread ever touches. And still, the resulting Map will not contain all entries:
Creating updater for values 0 to 999
Creating updater for values 1000 to 1999
Creating updater for values 2000 to 2999
Creating updater for values 3000 to 3999
Creating updater for values 4000 to 4999
Creating updater for values 5000 to 5999
Creating updater for values 6000 to 6999
Creating updater for values 7000 to 7999
Creating updater for values 8000 to 8999
Creating updater for values 9000 to 9999
10 threads with 1000 values per thread with a java.util.HashMap produced 9968 entries, should be 10000
Note that the actual number of entries in the final Map will vary for each run. It even sometimes prints 10000 (because it's not thread-safe!).
Note that this failure mode (losing entries) is definitely not the only possible one: basically anything could happen.
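For comparison, the same kind of workload should lose nothing once the map itself is thread-safe. The sketch below uses a made-up helper, SafeMapDemo.fill, that accepts any Map so that a ConcurrentHashMap or a Collections.synchronizedMap wrapper can be dropped in.

```java
import java.util.Map;

// Hypothetical helper; each thread writes its own disjoint key range.
class SafeMapDemo {
    static int fill(final Map<Integer, Integer> map, int threadCount, final int valuesPerThread) {
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int start = t * valuesPerThread;
            threads[t] = new Thread(() -> {
                for (int i = start; i < start + valuesPerThread; i++) {
                    map.put(i, i);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return map.size();
    }
}
```

Calling SafeMapDemo.fill(new ConcurrentHashMap<Integer, Integer>(), 10, 1000), or passing Collections.synchronizedMap(new HashMap<Integer, Integer>()) instead, is expected to return the full 10000 entries every time.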

I would like to specifically respond to the phrase.
But what if I know that all my threads will have unique keys?
You are making an assumption about the implementation of the map. The implementation is subject to change. If the implementation is documented not to be thread-safe, you must take into account the Java Memory Model (JMM) that guarantees almost nothing about visibility of memory between threads.
This is making a lot of assumptions and few guarantees. You should not rely on these assumptions, even if it happens to work on your machine, in a specific use-case, at a specific time.
In short: if an implementation that is not thread-safe is used in multiple threads, you MUST surround it with constructs that ensure thread-safety. Always.
However, just for the fun of it, let's describe what can go wrong in your particular case, where each thread only uses a unique key.
When adding or removing a key, even a unique one, there are cases where a hash map needs to reorganise itself internally. The first is a hash collision [1], in which a linked list of key-value entries must be updated. The second is when the map decides to resize its internal entry table. That overhauls the internal structure, including the linked lists just mentioned.
Because of the JMM, there is little guarantee about what another thread sees of the reorganisation. That means that behaviour is undefined if another thread happens to be in the middle of a get(key) when the reorganisation happens. If another thread is concurrently doing a put(key,value), you could end up with two threads trying to resize the map at the same time. Frankly, I do not even want to think about the mayhem that can cause!
[1] Multiple keys can have the same hash code. Because the map does not have limitless storage, hash codes are usually also wrapped around the size of the internal table of entries, like (hashCode % sizeOfTable), which can lead to a situation where different hash codes use the same "entry".
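A tiny illustration of the footnote, using the simplified (hashCode % sizeOfTable) formula from the text rather than whatever HashMap really does internally. BucketDemo and bucket are hypothetical names, and the table size of 16 is just an example.

```java
// Hypothetical helper; simplified bucket selection as described in the footnote.
class BucketDemo {
    static int bucket(Object key, int tableSize) {
        // floorMod keeps the index non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), tableSize);
    }
}
```

With a table of 16 entries, the distinct keys 1 and 17 have distinct hash codes (an Integer's hash code is its value) but both map to bucket 1, so two threads with "unique" keys can still end up modifying the same bucket.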

How is LongAccumulator implemented, so that it is more efficient?

I understand that the new Java (8) has introduced new synchronization tools such as LongAccumulator (in the java.util.concurrent.atomic package).
The documentation says that LongAccumulator is more efficient when the variable is updated frequently from several threads.
How is it implemented to be more efficient?

That's a very good question, because it shows a very important characteristic of concurrent programming with shared memory. Before going into details, I have to make a step back. Take a look at the following class:
class Accumulator {
private final AtomicLong value = new AtomicLong(0);
public void accumulate(long value) {
this.value.addAndGet(value);
}
public long get() {
return this.value.get();
}
}
If you create one instance of this class and invoke the method accumulate(1) from one thread in a loop, then the execution will be really fast. However, if you invoke the method on the same instance from two threads, the execution will be about two orders of magnitude slower.
You have to take a look at the memory architecture to understand what happens. Most systems nowadays have non-uniform memory access. In particular, each core has its own L1 cache, which is typically structured into cache lines of 64 bytes. If a core executes an atomic increment operation on a memory location, it first has to get exclusive access to the corresponding cache line. That is expensive if it does not have exclusive access yet, because of the required coordination with all other cores.
There's a simple and counter-intuitive trick to solve this problem. Take a look at the following class:
class Accumulator {
private final AtomicLong[] values = {
new AtomicLong(0),
new AtomicLong(0),
new AtomicLong(0),
new AtomicLong(0),
};
public void accumulate(long value) {
int index = getMagicValue();
this.values[index % values.length].addAndGet(value);
}
public long get() {
long result = 0;
for (AtomicLong value : values) {
result += value.get();
}
return result;
}
}
At first glance, this class seems to be more expensive due to the additional operations. However, it might be several times faster than the first class, because it has a higher probability, that the executing core already has exclusive access to the required cache line.
To make this really fast, you have to consider a few more things:
The different atomic counters should be located on different cache lines. Otherwise you replace one problem with another, namely false sharing. In Java you can use a long[8 * 4] for that purpose, and only use the indexes 0, 8, 16 and 24.
The number of counters has to be chosen wisely. If there are too few counters, there are still too many cache switches. If there are too many counters, you waste space in the L1 caches.
The method getMagicValue should return a value with an affinity to the core id.
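The points above can be sketched with an AtomicLongArray whose used slots are 8 longs (64 bytes) apart. Note the assumptions: StripedCounter is a made-up class, the 64-byte cache line is taken from the text, and the thread id is only a crude stand-in for a value with real affinity to the core. This is not how the JDK classes are written.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical name; four counters spread over a long[8 * 4]-sized array.
class StripedCounter {
    private static final int STRIPES = 4;
    private static final int STRIDE = 8; // 8 longs = 64 bytes between used slots

    private final AtomicLongArray cells = new AtomicLongArray(STRIPES * STRIDE);

    void add(long delta) {
        // Assumption: thread id as the "magic value"; it has no true core affinity.
        int stripe = (int) (Thread.currentThread().getId() % STRIPES);
        cells.addAndGet(stripe * STRIDE, delta);
    }

    long sum() {
        long total = 0;
        for (int s = 0; s < STRIPES; s++) {
            total += cells.get(s * STRIDE); // only indexes 0, 8, 16 and 24 are used
        }
        return total;
    }
}
```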
To sum up, LongAccumulator is more efficient for some use cases because it uses redundant memory for frequent write operations, in order to reduce the number of times that cache lines have to be exchanged between cores. On the other hand, read operations are slightly more expensive, because they have to combine the redundant values into one result.
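For reference, using the real class is short. The sketch below counts events from several threads with a LongAccumulator built from Long::sum and an identity of 0; the wrapper class AccumulatorDemo and its count method are invented for the example.

```java
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical wrapper around java.util.concurrent.atomic.LongAccumulator.
class AccumulatorDemo {
    static long count(int threads, final int perThread) {
        final LongAccumulator total = new LongAccumulator(Long::sum, 0L);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    total.accumulate(1); // frequent, contended write
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return total.get(); // rare read that combines the internal cells
    }
}
```

For plain summation, LongAdder from the same package offers the same behaviour with a slightly simpler API.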

Judging by the source at
http://codenav.org/code.html?project=/jdk/1.8.0-ea&path=/Source%20Packages/java.util.concurrent.atomic/LongAccumulator.java
it looks like a spin lock.

How to optimize concurrent operations in Java?

I'm still quite shaky on multi-threading in Java. What I describe here is at the very heart of my application and I need to get this right. The solution needs to work fast and it needs to be practically safe. Will this work? Any suggestions/criticism/alternative solutions welcome.
Objects used within my application are somewhat expensive to generate but change rarely, so I am caching them in *.temp files. It is possible for one thread to try and retrieve a given object from cache, while another is trying to update it there. Cache operations of retrieve and store are encapsulated within a CacheService implementation.
Consider this scenario:
Thread 1: retrieve cache for objectId "page_1".
Thread 2: update cache for objectId "page_1".
Thread 3: retrieve cache for objectId "page_2".
Thread 4: retrieve cache for objectId "page_3".
Thread 5: retrieve cache for objectId "page_4".
Note: thread 1 appears to retrieve an obsolete object, because thread 2 has a newer copy of it. This is perfectly OK so I do not need any logic that will give thread 2 priority.
If I synchronize the retrieve/store methods on my service, then I'm unnecessarily slowing things down for threads 3, 4 and 5. Multiple retrieve operations will be in progress at any given time, but the update operation will be called rarely. This is why I want to avoid method synchronization.
I gather I need to synchronize on an object that is exclusively common to thread 1 and 2, which implies a lock object registry. Here, an obvious choice would be a Hashtable but again, operations on Hashtable are synchronized, so I'm trying a HashMap. The map stores a string object to be used as a lock object for synchronization and the key/value would be the id of the object being cached. So for object "page_1" the key would be "page_1" and the lock object would be a string with a value of "page_1".
If I've got the registry right, then additionally I want to protect it from being flooded with too many entries. Let's not get into details why. Let's just assume, that if the registry has grown past defined limit, it needs to be reinitialized with 0 elements. This is a bit of a risk with an unsynchronized HashMap but this flooding would be something that is outside of normal application operation. It should be a very rare occurrence and hopefully never takes place. But since it is possible, I want to protect myself from it.
@Service
public class CacheServiceImpl implements CacheService {
private static ConcurrentHashMap<String, String> objectLockRegistry=new ConcurrentHashMap<>();
public Object getObject(String objectId) {
String objectLock=getObjectLock(objectId);
if(objectLock!=null) {
synchronized(objectLock) {
// read object from objectInputStream
}
}
return null; // placeholder: return the object that was read
}
public boolean storeObject(String objectId, Object object) {
String objectLock=getObjectLock(objectId);
synchronized(objectLock) {
// write object to objectOutputStream
}
return true; // placeholder: report whether the write succeeded
}
private String getObjectLock(String objectId) {
int objectLockRegistryMaxSize=100_000;
// reinitialize registry if necessary
if(objectLockRegistry.size()>objectLockRegistryMaxSize) {
// hoping to never reach this point but it is not impossible to get here
synchronized(objectLockRegistry) {
if(objectLockRegistry.size()>objectLockRegistryMaxSize) {
objectLockRegistry.clear();
}
}
}
// add lock to registry if necessary
objectLockRegistry.putIfAbsent(objectId, new String(objectId));
String objectLock=objectLockRegistry.get(objectId);
return objectLock;
}
}

If you are reading from disk, lock contention is not going to be your performance issue.
You can have both threads grab the lock for the entire cache, do a read, if the value is missing, release the lock, read from disk, acquire the lock, and then if the value is still missing write it, otherwise return the value that is now there.
The only issue you will have with that is concurrent reads thrashing the disk... but the OS caches will be hot, so the disk shouldn't be thrashed too badly.
If that is an issue then switch your cache to holding a Future<V> in place of a <V>.
The get method will become something like:
public V get(K key) {
Future<V> future;
synchronized(this) {
future = backingCache.get(key);
if (future == null) {
future = executorService.submit(new LoadFromDisk(key));
backingCache.put(key, future);
}
}
return future.get();
}
Yes, that is a global lock... but you're reading from disk, and you shouldn't optimize until you have a proven performance bottleneck...
Oh. First optimization, replace the map with a ConcurrentHashMap and use putIfAbsent and you'll have no lock at all! (BUT only do that when you know this is an issue)
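A sketch of what that lock-free variant could look like, assuming the loader is passed in as a function. FutureCache is a made-up name; the FutureTask plus putIfAbsent combination is a well-known memoizer pattern, not code from this answer.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

// Hypothetical name; at most one load per key, with no global lock.
class FutureCache<K, V> {
    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. something that reads from disk

    FutureCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(final K key) {
        Future<V> future = cache.get(key);
        if (future == null) {
            FutureTask<V> task = new FutureTask<V>(() -> loader.apply(key));
            future = cache.putIfAbsent(key, task);
            if (future == null) {
                future = task;
                task.run(); // we won the race, so we do the load on this thread
            }
        }
        try {
            return future.get(); // losers of the race wait for the winner's result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            cache.remove(key, future); // do not cache a failed load
            throw new IllegalStateException(e.getCause());
        }
    }
}
```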

The complexity of your scheme has already been discussed. That leads to hard to find bugs. For example, not only do you lock on non-final variables, but you even change them in the middle of synchronized blocks that use them as a lock. Multi-threading is very hard to reason about, this kind of code makes it almost impossible:
synchronized(objectLockRegistry) {
if(objectLockRegistry.size() > objectLockRegistryMaxSize) {
objectLockRegistry = new HashMap<>(); //brrrrrr...
}
}
In particular, 2 simultaneous calls to get a lock on a specific string might actually return 2 different instances of the same string, each stored in a different instance of your hashmap (unless they are interned), and you won't be locking on the same monitor.
You should either use an existing library or keep it a lot simpler.

If your question includes the keywords "optimize", "concurrent", and your solution includes a complicated locking scheme ... you're doing it wrong. It is possible to succeed at this sort of venture, but the odds are stacked against you. Prepare to diagnose bizarre concurrency bugs, including but not limited to, deadlock, livelock, cache incoherency... I can spot multiple unsafe practices in your example code.
Pretty much the only way to create a safe and effective concurrent algorithm without being a concurrency god is to take one of the pre-baked concurrent classes and adapt them to your need. It's just too hard to do unless you have an exceptionally convincing reason.
You might take a look at ConcurrentMap. You might also like CacheBuilder.

Using Threads and synchronized directly is covered at the beginning of most tutorials about multithreading and concurrency. However, many real-world examples require more sophisticated locking and concurrency schemes, which are cumbersome and error-prone if you implement them yourself. To avoid reinventing the wheel over and over again, the Java concurrency library was created. There you can find many classes that will be of great help to you. Try googling for tutorials about Java concurrency and locks.
As an example for a lock which might help you, see http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReadWriteLock.html .
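As a rough illustration of how such a lock could be used for a read-mostly cache: many retrieves may hold the read lock at once, while a store takes the write lock exclusively. ReadMostlyCache and its methods are hypothetical, and the in-memory map stands in for the question's file-based storage.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical name; one read/write lock guards the whole cache.
class ReadMostlyCache {
    private final Map<String, Object> entries = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Readers do not block each other.
    Object retrieve(String objectId) {
        lock.readLock().lock();
        try {
            return entries.get(objectId);
        } finally {
            lock.readLock().unlock();
        }
    }

    // A writer waits for current readers and blocks new ones while it runs.
    void store(String objectId, Object value) {
        lock.writeLock().lock();
        try {
            entries.put(objectId, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Unlike the per-object locks in the question, a rare update here briefly blocks readers of all objects, which may be an acceptable price for the simpler code.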

Rather than roll your own cache I would take a look at Google's MapMaker. Something like this will give you a lock cache that automatically expires unused entries as they are garbage collected:
ConcurrentMap<String, String> objectLockRegistry = new MapMaker()
    .softValues()
    .makeComputingMap(new Function<String, String>() {
        public String apply(String s) {
            return new String(s);
        }
    });
With this, the whole getObjectLock implementation is simply return objectLockRegistry.get(objectId) - the map takes care of all the "create if not already present" stuff for you in a safe way.
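If pulling in the Google library is not an option, a plain-JDK approximation of the "create if not already present" part is possible on Java 8 with ConcurrentHashMap.computeIfAbsent. This sketch does not expire entries the way soft values do, and LockRegistry and lockFor are made-up names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical name; hands out exactly one lock object per id.
class LockRegistry {
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();

    Object lockFor(String objectId) {
        // Atomic on ConcurrentHashMap: concurrent callers get the same instance.
        return locks.computeIfAbsent(objectId, id -> new Object());
    }
}
```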

I would do it similarly to you: just create a map of lock objects (new Object()).
But unlike you, I would use a TreeMap<String, Object>
or a HashMap.
Call that the lockMap, with one entry per file to lock. The lockMap is publicly available to all participating threads.
Each read and write of a specific file gets the lock from the map and uses synchronized(lock) on that lock object.
If the lockMap is not fixed and its content can change, then reading and writing the map must be synchronized too (synchronized (this.lockMap) {....}).
But your getObjectLock() is not safe; synchronize all of it with your lock. (Double-checked locking is not thread-safe in Java!) A recommended book: Doug Lea, Concurrent Programming in Java.

Lock-free guard for synchronized acquire/release

I have a shared tempfile resource that is divided into chunks of 4K (or some such value). Each 4K in the file is represented by an index starting from zero. For this shared resource, I track the 4K chunk indices in use and always return the lowest indexed 4K chunk not in use, or -1 if all are in use.
This ResourceSet class for the indices has a public acquire and release method, both of which use synchronized lock whose duration is about like that of generating 4 random numbers (expensive, cpu-wise).
Therefore as you can see from the code that follows, I use an AtomicInteger "counting semaphore" to prevent a large number of threads from entering the critical section at the same time on acquire(), returning -1 (not available right now) if there are too many threads.
Currently, I am using a constant of 100 for the tight CAS loop to try to increment the atomic integer in acquire, and a constant of 10 for the maximum number of threads to then allow into the critical section, which is long enough to create contention. My question is, what should these constants be for a moderate to highly loaded servlet engine that has several threads trying to get access to these 4K chunks?
public class ResourceSet {
// ??? what should this be
// maximum number of attempts to try to increment with CAS on acquire
private static final int CAS_MAX_ATTEMPTS = 50;
// ??? what should this be
// maximum number of threads contending for lock before returning -1 on acquire
private static final int CONTENTION_MAX = 10;
private AtomicInteger latch = new AtomicInteger(0);
... member variables to track free resources
private boolean aquireLatchForAquire ()
{
for (int i = 0; i < CAS_MAX_ATTEMPTS; i++) {
int val = latch.get();
if (val == -1)
throw new AssertionError("bug in ResourceSet"); // this means more threads than can exist on any system, so its a bug!
if (!latch.compareAndSet(val, val+1))
continue;
if (val < 0 || val >= CONTENTION_MAX) {
latch.decrementAndGet();
// added to fix BUG that comment pointed out, thanks!
return false;
}
return true; // CAS succeeded and the contention limit was not exceeded
}
return false;
}
private void aquireLatchForRelease ()
{
do {
int val = latch.get();
if (val == -1)
throw new AssertionError("bug in ResourceSet"); // this means more threads than can exist on any system, so its a bug!
if (latch.compareAndSet(val, val+1))
return;
} while (true);
}
public ResourceSet (int totalResources)
{
... initialize
}
public int acquire (ResourceTracker owned)
{
if (!aquireLatchForAquire())
return -1;
try {
synchronized (this) {
... algorithm to compute minimum free resoource or return -1 if all in use
return resourceindex;
}
} finally {
latch.decrementAndGet();
}
}
public boolean release (ResourceIter iter)
{
aquireLatchForRelease();
try {
synchronized (this) {
... iterate and release all resources
}
} finally {
latch.decrementAndGet();
}
}
}

Writing a good, performant spinlock is actually pretty complicated and requires a good understanding of memory barriers. Merely picking a constant is not going to cut it and will definitely not be portable. Google's gperftools has an example that you can look at, but it is probably far more complicated than what you need.
If you really want to reduce contention on the lock, you might want to consider a more fine-grained and optimistic scheme. A simple one would be to divide your chunks into n groups and associate a lock with each group (also called lock striping). This will help reduce contention and increase throughput, but it won't help reduce latency. You could also associate an AtomicBoolean with each chunk and CAS to acquire it (retrying in case of failure). Do be careful when dealing with lock-free algorithms because they tend to be tricky to get right. If you do get it right, it could considerably reduce the latency of acquiring a chunk.
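A rough sketch of the per-chunk CAS idea, assuming the selection rule is simply "lowest free index" as stated in the question. The class name ChunkFlags is hypothetical, and an AtomicIntegerArray stands in for one AtomicBoolean per chunk. Note that under concurrency the "lowest" guarantee is only approximate: a chunk released behind the scan position can be missed by a scan already in progress.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical lock-free replacement for the synchronized free-chunk search.
final class ChunkFlags {
    private static final int FREE = 0;
    private static final int USED = 1;
    private final AtomicIntegerArray flags;

    ChunkFlags(int totalChunks) {
        flags = new AtomicIntegerArray(totalChunks); // all zero, so all chunks start free
    }

    // Scans from index 0 and claims the first chunk whose CAS succeeds; -1 if none is free.
    int acquire() {
        for (int i = 0; i < flags.length(); i++) {
            // Read first, and attempt the CAS only when the chunk looks free.
            if (flags.get(i) == FREE && flags.compareAndSet(i, FREE, USED)) {
                return i;
            }
        }
        return -1;
    }

    // Returns true if the chunk was in use and is now free again.
    boolean release(int index) {
        return flags.compareAndSet(index, USED, FREE);
    }
}
```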
Note that it's difficult to propose a more fine-grained approach without knowing what your chunk selection algorithm looks like. I also assume that you really do have a performance problem (it's been profiled and everything).
While I'm at it, your spinlock implementation is flawed. You should never spin directly on a CAS because you're spamming memory barriers. This will be incredibly slow with any serious amount of contention (related to the thundering-herd problem). At a minimum, check the variable for availability before attempting your CAS (a simple if on a barrier-free read will do). Even better would be to not have all your threads spinning on the same value, which should keep the associated cache line from ping-ponging between your cores.
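The "check before CAS" advice is usually called test-and-test-and-set. Below is a small sketch of it; the class name TtasSpinLock is hypothetical. In Java, AtomicBoolean.get() is a volatile read rather than a barrier-free one, but it generally does not need exclusive ownership of the cache line the way a CAS does, so waiting threads can spin on it more cheaply. Thread.onSpinWait() requires Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical test-and-test-and-set spin lock (not reentrant, not fair).
final class TtasSpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        while (true) {
            // Spin on a read until the lock looks free.
            while (locked.get()) {
                Thread.onSpinWait(); // hint to the CPU that this is a busy-wait loop
            }
            // Only now attempt the more expensive CAS.
            if (locked.compareAndSet(false, true)) {
                return;
            }
        }
    }

    // Single attempt: returns false immediately if the lock is taken.
    boolean tryLock() {
        return !locked.get() && locked.compareAndSet(false, true);
    }

    void unlock() {
        locked.set(false);
    }
}
```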
Note that I don't know what type of memory barriers are associated with atomic ops in Java so my above suggestions might not be optimal or correct.
Finally, The Art of Multiprocessor Programming is a fun book to read to get better acquainted with all the nonsense I've been spewing in this answer.

I'm not sure it's necessary to write your own lock class for this scenario. The JDK provides ReentrantLock, which also uses a CAS instruction during lock acquisition. Its performance should be pretty good compared with your hand-rolled lock.

You can use Semaphore's tryAcquire method if you want your threads to balk on no resource available.
I for one would simply replace your synchronized keyword with a ReentrantLock and use the tryLock() method on it. If you want to let your threads wait a bit, you can use tryLock(timeout, unit) on the same class. Which one to choose, and what value to use for the timeout, needs to be determined by a performance test.
Creating an explicit gate, as you seem to be doing, seems unnecessary to me. I'm not saying that it can never help, but IMO it's more likely to actually hurt performance, and it's an added complication for sure. So unless you have a performance issue here (based on a test you did) and you found that this kind of gating helps, I'd recommend going with the simplest implementation.
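The tryLock() suggestion could look roughly like the sketch below. The class name TryLockResourceSet is hypothetical, and because the question elides the real free-chunk algorithm, a BitSet stands in for it here as an assumption.

```java
import java.util.BitSet;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical variant of ResourceSet that balks instead of gating with a counter.
final class TryLockResourceSet {
    private final ReentrantLock lock = new ReentrantLock();
    private final BitSet inUse; // assumption: stands in for the real bookkeeping
    private final int total;

    TryLockResourceSet(int totalResources) {
        total = totalResources;
        inUse = new BitSet(totalResources);
    }

    // Returns the lowest free index, or -1 if none is free or the lock is busy right now.
    int acquire() {
        if (!lock.tryLock()) {
            return -1; // balk rather than queue behind other threads
        }
        try {
            int index = inUse.nextClearBit(0);
            if (index >= total) {
                return -1;
            }
            inUse.set(index);
            return index;
        } finally {
            lock.unlock();
        }
    }

    // Release always waits for the lock so that a chunk is never leaked.
    void release(int index) {
        lock.lock();
        try {
            inUse.clear(index);
        } finally {
            lock.unlock();
        }
    }
}
```

Swapping lock.tryLock() for lock.tryLock(timeout, unit) would let callers wait briefly before giving up; that overload also throws InterruptedException, which acquire() would then have to handle or declare.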

Is this java code thread-safe?

I am planning to use this schema in my application, but I was not sure whether this is safe.
To give a little background, a bunch of servers will compute results of sub-tasks that belong to a single task and report them back to the central server. This piece of code is used to register the results, and also to check whether all the subtasks for the task have completed and, if so, report that fact only once.
The important point is that each task must be reported once and only once, as soon as it is completed (all subTaskResults are set).
Can anybody help? Thank you! (Also, if you have a better idea to solve this problem, please let me know!)
*Note that I simplified the code for brevity.
Solution I
class Task {
//Populate with bunch of (Long, new AtomicReference()) pairs
//Actual app uses read only HashMap
Map<Id, AtomicReference<SubTaskResult>> subtasks = populatedMap();
Semaphore permission = new Semaphore(1);
public Task set(Id id, SubTaskResult result){
//null check omitted
subtasks.get(id).set(result);
return check() ? this : null;
}
private boolean check(){
for(AtomicReference<SubTaskResult> ref : subtasks.values()){
if(ref.get()==null){
return false;
}
}//for
return permission.tryAcquire();
}
}//class
Stephen C kindly suggested to use a counter. Actually, I have considered that once, but I reasoned that the JVM could reorder the operations and thus, a thread can observe a decremented counter (by another thread) before the result is set in AtomicReference (by that other thread).
*EDIT: I now see this is thread safe. I'll go with this solution. Thanks, Stephen!
Solution II
class Task {
//Populate with bunch of (Long, new AtomicReference()) pairs
//Actual app uses read only HashMap
Map<Id, AtomicReference<SubTaskResult>> subtasks = populatedMap();
AtomicInteger counter = new AtomicInteger(subtasks.size());
public Task set(Id id, SubTaskResult result){
//null check omitted
subtasks.get(id).set(result);
//In the actual app, if !compareAndSet(null, result) return null;
return check() ? this : null;
}
private boolean check(){
return counter.decrementAndGet() == 0;
}
}//class

I assume that your use-case is that there are multiple threads calling set, but for any given value of id, the set method will be called once only. I'm also assuming that populatedMap creates the entries for all used id values, and that subtasks and permission are really private.
If so, I think that the code is thread-safe.
Each thread should see the initialized state of the subtasks Map, complete with all keys and all AtomicReference references. This state never changes, so subtasks.get(id) will always give the right reference. The set(result) call operates on an AtomicReference, so the subsequent get() method calls in check() will give the most up-to-date values ... in all threads. Any potential races with multiple threads calling check seem to sort themselves out.
However, this is a rather complicated solution. A simpler solution would be to use a concurrent counter; e.g. replace the Semaphore with an AtomicInteger and use decrementAndGet instead of repeatedly scanning the subtasks map in check.
In response to this comment in the updated solution:
Actually, I have considered that once,
but I reasoned that the JVM could
reorder the operations and thus, a
thread can observe a decremented
counter (by another thread) before the
result is set in AtomicReference (by
that other thread).
The AtomicInteger and AtomicReference by definition are atomic. Any thread that tries to access one is guaranteed to see the "current" value at the time of the access.
In this particular case, each thread calls set on the relevant AtomicReference before it calls decrementAndGet on the AtomicInteger. This cannot be reordered. Actions performed by a thread are performed in order. And since these are atomic actions, the effects will be visible to other threads in order as well.
In other words, it should be thread-safe ... AFAIK.

The atomicity guaranteed (per class documentation) explicitly for AtomicReference.compareAndSet extends to set and get methods (per package documentation), so in that regard your code appears to be thread-safe.
I am not sure, however, why you have Semaphore.tryAcquire as a side-effect there, but without complementary code to release the semaphore, that part of your code looks wrong.

The second solution does provide a thread-safe latch, but it's vulnerable to calls to set() that provide an ID that's not in the map -- which would trigger a NullPointerException -- or more than one call to set() with the same ID. The latter would mistakenly decrement the counter too many times and falsely report completion when there are presumably other subtasks IDs for which no result has been submitted. My criticism isn't with regard to the thread safety, but rather to the invariant maintenance; the same flaw would be present even without the thread-related concern.
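One possible way to close both gaps in the counter-based version is sketched below, building on the compareAndSet(null, result) guard that the question mentions in a comment. The class name CountingTask and the use of Long ids and Object results are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical counter-based task that rejects unknown ids and ignores duplicate results.
final class CountingTask {
    private final Map<Long, AtomicReference<Object>> subtasks; // never modified after construction
    private final AtomicInteger remaining;

    CountingTask(Iterable<Long> ids) {
        Map<Long, AtomicReference<Object>> map = new HashMap<>();
        for (Long id : ids) {
            map.put(id, new AtomicReference<>());
        }
        subtasks = map;
        remaining = new AtomicInteger(map.size());
    }

    // Returns true for exactly one caller: the one that stores the last missing result.
    boolean set(long id, Object result) {
        if (result == null) {
            throw new NullPointerException("result");
        }
        AtomicReference<Object> slot = subtasks.get(id);
        if (slot == null) {
            throw new IllegalArgumentException("unknown subtask id: " + id);
        }
        // Only the first result per id wins; duplicates never touch the counter.
        if (!slot.compareAndSet(null, result)) {
            return false;
        }
        return remaining.decrementAndGet() == 0;
    }
}
```

Because a duplicate result fails the compareAndSet and returns before the decrement, the counter can only reach zero once every id has received its first result.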
Another way to solve this problem is with AbstractQueuedSynchronizer, but it's somewhat gratuitous: you can implement a stripped-down counting semaphore, where each call to set() would call releaseShared(), decrementing the counter via a spin on compareAndSetState(), and tryAcquireShared() would only succeed when the count is zero. That's more or less what you implemented above with the AtomicInteger, but you'd be reusing a facility that offers more capabilities you can use for other portions of your design.
To flesh out the AbstractQueuedSynchronizer-based solution requires adding one more operation to justify the complexity: being able to wait on the results from all the subtasks to come back, such that the entire task is complete. That's Task#awaitCompletion() and Task#awaitCompletion(long, TimeUnit) in the code below.
Again, it's possibly overkill, but I'll share it for the purpose of discussion.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;
final class Task
{
private static final class Sync extends AbstractQueuedSynchronizer
{
public Sync(int count)
{
setState(count);
}
@Override
protected int tryAcquireShared(int ignored)
{
return 0 == getState() ? 1 : -1;
}
@Override
protected boolean tryReleaseShared(int ignored)
{
int current;
do
{
current = getState();
if (0 == current)
return true;
}
while (!compareAndSetState(current, current - 1));
return 1 == current;
}
}
public Task(int count)
{
if (count < 0)
throw new IllegalArgumentException();
sync_ = new Sync(count);
}
public boolean set(int id, Object result)
{
// Ensure that "id" refers to an incomplete task. Doing so requires
// additional synchronization over the structure mapping subtask
// identifiers to results.
// Store result somehow.
return sync_.releaseShared(1);
}
public void awaitCompletion()
throws InterruptedException
{
sync_.acquireSharedInterruptibly(0);
}
public void awaitCompletion(long time, TimeUnit unit)
throws InterruptedException
{
sync_.tryAcquireSharedNanos(0, unit.toNanos(time));
}
private final Sync sync_;
}

I have a weird feeling reading your example program, but it depends on the larger structure of your program what to do about that. A set function that also checks for completion is almost a code smell. :-) Just a few ideas.
If you have synchronous communication with your servers, you might use an ExecutorService with as many threads as there are servers to do the communication. From this you get a bunch of Futures, and you can naturally proceed with your calculation: the get calls will block at the moment a result is needed but not yet there.
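A small sketch of the Future-based approach, assuming each subtask can be expressed as a Callable that performs the blocking call to one server. The class name FutureFanOut and the method runAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: one pooled thread per subtask, results in submission order.
final class FutureFanOut {
    static <T> List<T> runAll(List<Callable<T>> subtasks)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subtasks.size()));
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> subtask : subtasks) {
                futures.add(pool.submit(subtask));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get()); // blocks until this particular result is available
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

When runAll returns, every subtask has finished, so the "report completion exactly once" requirement is met by the single calling thread without any shared counter.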
If you have asynchronous communication with the servers, you might also use a CountDownLatch after submitting the task to the servers. The await call blocks the main thread until all subtasks have completed, and other threads can receive the results and call countDown for each received result.
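The CountDownLatch variant might be sketched as follows. The class name LatchedTask and its methods are hypothetical, and a ConcurrentHashMap is assumed as the thread-safe result store. Counting down only on the first result per id keeps a duplicate from finishing the task early.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical task that completes once every subtask has reported a result.
final class LatchedTask {
    private final ConcurrentMap<Integer, Object> results = new ConcurrentHashMap<>();
    private final CountDownLatch pending;

    LatchedTask(int subtaskCount) {
        pending = new CountDownLatch(subtaskCount);
    }

    // Called by whichever thread receives a subtask result.
    void onResult(int id, Object result) {
        // putIfAbsent returns null only for the first result stored under this id.
        if (results.putIfAbsent(id, result) == null) {
            pending.countDown();
        }
    }

    // Blocks until every subtask has reported, or returns false when the timeout elapses.
    boolean awaitCompletion(long time, TimeUnit unit) throws InterruptedException {
        return pending.await(time, unit);
    }

    Object resultOf(int id) {
        return results.get(id);
    }
}
```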
With all these methods you don't need special thread-safety measures, other than making sure that the concurrent storing of the results in your structure is thread-safe. And I bet there are even better patterns for this.
