Thread-safety of unique keys in a HashMap


There have been many discussions on this topic, e.g. here:
What's the difference between ConcurrentHashMap and Collections.synchronizedMap(Map)?
But I haven't found an answer to my specific use-case.
In general, you cannot assume that a HashMap is thread-safe. If you write to the same key from different threads at the same time, all hell could break loose. But what if I know that all my threads will have unique keys?
Is this code thread-safe or do I need to add a blocking mechanism (or use a concurrent map)?
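Map<Integer, String> myMap = new HashMap<>();
for (int i = 1; i < 6; i++) {
    // A lambda may only capture effectively final variables, so copy the loop counter.
    final int key = i;
    new Thread(() -> {
        myMap.put(key, Integer.toString(key));
    }).start();
}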

The answer is simple: HashMap makes absolutely no thread-safety guarantees at all.
In fact it's explicitly documented that it's not thread-safe:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
So accessing one from multiple threads without any kind of synchronization is a recipe for disaster.
I have seen cases where each thread using a different key still caused issues (such as iterations happening at the same time resulting in infinite loops).
Just think of re-hashing: when the threshold is reached, the internal bucket-array needs to be resized. That's a somewhat lengthy operation (compared to a single put). During that time all manner of weird things can happen if another thread tries to put as well (and maybe even triggers a second re-hashing!).
Additionally, there's no reliable way for you to prove that your specific use case is safe, since all tests you could run could just "accidentally" work. In other words: you can never depend on this working, even if you think you covered it with unit tests.
And since not everyone is convinced, you can easily test it yourself with this code:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
class HashMapDemonstration {
public static void main(String[] args) throws InterruptedException {
int threadCount = 10;
int valuesPerThread = 1000;
Map<Integer, Integer> map = new HashMap<>();
List<Thread> threads = new ArrayList<>(threadCount);
for (int i = 0; i < threadCount; i++) {
Thread thread = new Thread(new MyUpdater(map, i*valuesPerThread, (i+1)*valuesPerThread - 1));
thread.start();
threads.add(thread);
}
for (Thread thread : threads) {
thread.join();
}
System.out.printf("%d threads with %d values per thread with a %s produced %d entries, should be %d%n",
threadCount, valuesPerThread, map.getClass().getName(), map.size(), threadCount * valuesPerThread);
}
}
class MyUpdater implements Runnable {
private final Map<Integer, Integer> map;
private final int startValue;
private final int endValue;
MyUpdater(Map<Integer, Integer> map, int startValue, int endValue) {
this.map = map;
this.startValue = startValue;
this.endValue = endValue;
System.out.printf("Creating updater for values %d to %d%n", startValue, endValue);
}
@Override
public void run() {
for (int i = startValue; i<= endValue; i++) {
map.put(i, i);
}
}
}
This is exactly the type of program OP mentioned: Each thread will only ever write to keys that no other thread ever touches. And still, the resulting Map will not contain all entries:
Creating updater for values 0 to 999
Creating updater for values 1000 to 1999
Creating updater for values 2000 to 2999
Creating updater for values 3000 to 3999
Creating updater for values 4000 to 4999
Creating updater for values 5000 to 5999
Creating updater for values 6000 to 6999
Creating updater for values 7000 to 7999
Creating updater for values 8000 to 8999
Creating updater for values 9000 to 9999
10 threads with 1000 values per thread with a java.util.HashMap produced 9968 entries, should be 10000
Note that the actual number of entries in the final Map will vary for each run. It even sometimes prints 10000 (because it's not thread-safe!).
Note that this failure mode (losing entries) is definitely not the only possible one: basically anything could happen.
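For comparison, here is a minimal sketch of the same experiment with a ConcurrentHashMap instead of a HashMap. The class name SafeMapDemo and the helper fill are made up for this illustration. Each thread still writes only its own key range, but the map is now one that is documented as safe for concurrent writers, so no entries should get lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class: same disjoint-key workload as above, on a thread-safe map.
class SafeMapDemo {
    static Map<Integer, Integer> fill(int threadCount, int valuesPerThread) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>(threadCount);
        for (int t = 0; t < threadCount; t++) {
            final int start = t * valuesPerThread;
            Thread thread = new Thread(() -> {
                // Each thread only touches keys in [start, start + valuesPerThread).
                for (int i = start; i < start + valuesPerThread; i++) {
                    map.put(i, i);
                }
            });
            thread.start();
            threads.add(thread);
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println("entries: " + fill(10, 1000).size());
    }
}
```

Swapping the map type is the only change that matters here; the threads and the key ranges are the same as in the failing demonstration.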

I would like to specifically respond to the phrase.
But what if I know that all my threads will have unique keys?
You are making an assumption about the implementation of the map. The implementation is subject to change. If the implementation is documented not to be thread-safe, you must take into account the Java Memory Model (JMM) that guarantees almost nothing about visibility of memory between threads.
This is making a lot of assumptions and few guarantees. You should not rely on these assumptions, even if it happens to work on your machine, in a specific use-case, at a specific time.
In short: if an implementation that is not thread-safe is used in multiple threads, you MUST surround it with constructs that ensure thread-safety. Always.
However, just for the fun of it, let's describe what can go wrong in your particular case, where each thread only uses a unique key.
When adding or removing a key, even if unique, there are cases when a hash map needs to reorganise internally. The first one is in case of a hash-collision,1 in which a linked list of key-value entries must be updated. The second one is where the map decides to resize its internal entry table. That overhauls the internal structure including the mentioned linked lists.
Because of the JMM it is largely not guaranteed what another thread sees of the reorganisation. That means that behaviour is undefined if another thread happens to be in the middle of a get(key) when the reorganisation happens. If another thread is concurrently doing a put(key,value), you could end up with two threads trying to resize the map at the same time. Frankly, I do not even want to think what mayhem that can cause!
1 Multiple keys can have the same hash-code. Because the map does not have unlimited storage, hash-codes are often also wrapped around the size of the internal table of entries, like (hashCode % sizeOfTable), which can lead to a situation where different hash-codes utilize the same "entry".
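To make the footnote concrete, here is a small sketch of how two different hash codes can end up in the same slot. The class BucketIndexDemo and the method bucketIndex are hypothetical, and the modulo formula is the simplified one from the footnote; as far as I know, OpenJDK's HashMap uses a bit mask on a power-of-two table instead, but the effect is the same.

```java
// Hypothetical helper illustrating the footnote: different hash codes, same table slot.
class BucketIndexDemo {
    // Simplified slot computation, mirroring the (hashCode % sizeOfTable) idea.
    // floorMod keeps the result non-negative for negative hash codes.
    static int bucketIndex(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 16;
        // Integer.hashCode() is the int value itself, so 1 and 17 have different hash codes...
        System.out.println(bucketIndex(1, tableSize));
        // ...yet both map to the same slot of a 16-slot table.
        System.out.println(bucketIndex(17, tableSize));
    }
}
```

So two threads that use the unique keys 1 and 17 would still be modifying the same linked list of entries.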

Related

Why hashmap is not thread safe？ [duplicate]

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.*;
public class TestLock {
private static ExecutorService executor = Executors.newCachedThreadPool();
private static Map<Integer, Integer> map = new HashMap<>(1000000);
private static CountDownLatch doneSignal = new CountDownLatch(1000);
public static void main(String[] args) throws Exception {
for (int i = 0; i < 1000; i++) {
final int j = i;
executor.execute(new Runnable() {
@Override
public void run() {
map.put(j, j);
doneSignal.countDown();
}
});
}
doneSignal.await();
System.out.println("done,size:" + map.size());
}
}
Some people say that hashmap insertion is not safe when concurrency. Because the hashmap will perform capacity expansion operations, but I set the size here to 1000000, which will only expand at 750,000. I am doing 1000 inserts here, so I won't expand it. So there should be no problem. But the result is always less than 1000, what went wrong?

Why hashmap is not thread safe？
Because the javadocs say so. See below.
You stated:
Some people say that hashmap insertion is not safe when concurrency.
It is not just "some people"1. The javadocs state this clearly:
"Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)"
You asked:
I am doing 1000 inserts here, so I won't expand it. So there should be no problem. But the result is always less than 1000, what went wrong?
It is not just expansion of the hash array that you need to think about. It is not just insertion. Any operation that performs a structural modification on the HashMap needs to be synchronized ... or you may get unspecified behavior2.
And that is what you got.
1 - I strongly recommend that you do not rely on your intuition or on what "some people" say. Instead, take the time to read and understand the relevant specifications; i.e. the javadocs and the Java Language Specification.
2 - In this example, you can easily see why you get unspecified behavior by reading the HashMap source code. For instance, in the OpenJDK Java 11 source code, size() is not synchronized, and it returns the value of the private transient int size field. This is not thread-safe. When other threads add or remove map entries they update size, and the thread calling size() is liable to get a stale value.

"Because the hashmap will perform capacity expansion operations" is not the only reason why HashMap is not thread-safe.
You have to refer to the Java Memory Model to understand what guarantees it can offer.
One such guarantee is visibility. This means that changes made in one thread may not be visible to other threads unless specific conditions are met.

Well, the question title does not really describe what you are asking. Anyway, here you have set the capacity to 1000000, not the size.
Capacity: how many slots the hash map has initially; basically, empty slots.
Size: the number of elements stored in the map.
So even though you set the capacity to 1000000, you don't have that many elements at the end. The .size() method returns the number of elements stored in the map. It's not related to a concurrency issue. And yes, HashMap is not thread-safe, for several reasons.

If you look at the implementation of put in the HashMap class, synchronized is not used anywhere, although put performs many thread-unsafe operations, such as creating a TreeNode when the key's hash is not found, incrementing modCount, etc.
ConcurrentHashMap would be suitable for your use case.

If you need a thread-safe HashMap, you may use the Hashtable class instead.
"Unlike the new collection implementations, Hashtable is synchronized. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable", says the javadoc about Hashtable.
Likewise, if you need a thread-safe ArrayList one day, use Vector.
EDIT: Oh, I suggested the wrong approach. My apologies!
The comments suggest better solutions than mine:
Collections.synchronizedXxx() or ConcurrentHashMap, which worked for the asker of this question.
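As a rough sketch of the Collections.synchronizedMap route, here is the test from the question with the HashMap wrapped so that it is synchronized externally, as the javadoc demands. The class name TestLockSynchronized and the method insertConcurrently are invented for this example; everything else is standard JDK.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical variant of TestLock with the map synchronized externally.
class TestLockSynchronized {
    static int insertConcurrently(int tasks) {
        ExecutorService executor = Executors.newCachedThreadPool();
        // Every call on the wrapper holds the wrapper's lock, so puts cannot interleave.
        Map<Integer, Integer> map = Collections.synchronizedMap(new HashMap<>());
        CountDownLatch doneSignal = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            final int j = i;
            executor.execute(() -> {
                map.put(j, j);
                doneSignal.countDown();
            });
        }
        try {
            doneSignal.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            // Let the pool's threads terminate so the JVM can exit.
            executor.shutdown();
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("done,size:" + insertConcurrently(1000));
    }
}
```

The wrapper serializes all access through one lock, which is simple but coarse; ConcurrentHashMap is usually the better choice when many threads write at once.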

How is LongAccumulator implemented, so that it is more efficient?

I understand that Java 8 has introduced new synchronization tools such as LongAccumulator (in the atomic package).
The documentation says that LongAccumulator is more efficient when the variable is updated frequently from several threads.
I wonder how it is implemented to be more efficient?

That's a very good question, because it shows a very important characteristic of concurrent programming with shared memory. Before going into details, I have to make a step back. Take a look at the following class:
class Accumulator {
private final AtomicLong value = new AtomicLong(0);
public void accumulate(long value) {
this.value.addAndGet(value);
}
public long get() {
return this.value.get();
}
}
If you create one instance of this class and invoke the method accumulate(1) from one thread in a loop, then the execution will be really fast. However, if you invoke the method on the same instance from two threads, the execution will be about two orders of magnitude slower.
You have to take a look at the memory architecture to understand what happens. Most systems nowadays have a non-uniform memory access. In particular, each core has its own L1 cache, which is typically structured into cache lines with 64 octets. If a core executes an atomic increment operation on a memory location, it first has to get exclusive access to the corresponding cache line. That's expensive, if it has no exclusive access yet, due to the required coordination with all other cores.
There's a simple and counter-intuitive trick to solve this problem. Take a look at the following class:
class Accumulator {
private final AtomicLong[] values = {
new AtomicLong(0),
new AtomicLong(0),
new AtomicLong(0),
new AtomicLong(0),
};
public void accumulate(long value) {
int index = getMagicValue();
this.values[index % values.length].addAndGet(value);
}
public long get() {
long result = 0;
for (AtomicLong value : values) {
result += value.get();
}
return result;
}
}
At first glance, this class seems to be more expensive due to the additional operations. However, it might be several times faster than the first class, because there is a higher probability that the executing core already has exclusive access to the required cache line.
To make this really fast, you have to consider a few more things:
The different atomic counters should be located on different cache lines. Otherwise you replace one problem with another, namely false sharing. In Java you can use a long[8 * 4] for that purpose, and only use the indexes 0, 8, 16 and 24.
The number of counters has to be chosen wisely. If there are too few different counters, there are still too many cache switches. If there are too many counters, you waste space in the L1 caches.
The method getMagicValue should return a value with an affinity to the core id.
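The three points above can be sketched roughly as follows. This is only an illustration of the idea and not how the JDK implements it: StripedCounter is a made-up class, the 64-byte cache line is an assumption, and a per-thread slot number stands in for getMagicValue, since plain Java has no portable way to ask for the core id. The array header also means the stripes are not guaranteed to start exactly on a cache-line boundary.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical striped counter; a sketch of the idea, not the JDK implementation.
class StripedCounter {
    private static final int STRIPES = 4;
    // Assumes 64-byte cache lines: 8 longs per line, so only every 8th slot is used.
    private static final int STRIDE = 8;
    private static final AtomicInteger NEXT_SLOT = new AtomicInteger();
    // Stand-in for getMagicValue(): a fixed per-thread number instead of a real core id.
    private static final ThreadLocal<Integer> SLOT =
            ThreadLocal.withInitial(NEXT_SLOT::getAndIncrement);

    private final AtomicLongArray cells = new AtomicLongArray(STRIPES * STRIDE);

    void accumulate(long value) {
        int slot = SLOT.get();
        int stripe = Math.floorMod(slot, STRIPES);
        // Only indexes 0, 8, 16 and 24 are ever written.
        cells.addAndGet(stripe * STRIDE, value);
    }

    long get() {
        long result = 0;
        for (int stripe = 0; stripe < STRIPES; stripe++) {
            result += cells.get(stripe * STRIDE);
        }
        return result;
    }
}
```

With more threads than stripes, several threads share a stripe again, which is the trade-off described in the second point.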
To sum up, LongAccumulator is more efficient for some use cases, because it uses redundant memory for frequently used write operations in order to reduce the number of times that cache lines have to be exchanged between cores. On the other hand, read operations are slightly more expensive, because they have to create a consistent result.
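For completeness, a short usage sketch of LongAccumulator itself, used as a plain counter that is updated from several threads. The class LongAccumulatorDemo and the method sumFromThreads are names chosen for this example only.

```java
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical demo: many threads update one LongAccumulator, one thread reads it.
class LongAccumulatorDemo {
    static long sumFromThreads(int threadCount, int updatesPerThread) {
        // Long::sum with identity 0 turns the accumulator into a simple counter.
        LongAccumulator total = new LongAccumulator(Long::sum, 0L);
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < updatesPerThread; i++) {
                    total.accumulate(1);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        // All writers have finished, so get() combines the final per-cell values.
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(sumFromThreads(4, 100_000));
    }
}
```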

Judging by the source at
http://codenav.org/code.html?project=/jdk/1.8.0-ea&path=/Source%20Packages/java.util.concurrent.atomic/LongAccumulator.java
it looks like a spin lock.

is the below code not thread safe?

What's wrong with this below code?.
private Map<Integer, Integer> aMap = new ConcurrentHashMap<Integer, Integer>();
Record rec = records.get(id);
if (rec == null) {
rec = new Record(id);
records.put(id, rec);
}
return rec;
Is the above code not thread-safe? Why should I use putIfAbsent in this case?
"Locking is applied only for updates. In case of retrievals, it allows full concurrency." What does this statement mean?

It's not thread safe.
If there was another thread, then in the time between records.get and records.put the other thread might have put the record as well.
Read only operations (i.e. ones that do not modify a structure) can be done by multiple threads at the same time. For example, 1000 threads can safely read the value of an int. However, those 1000 threads cannot update the value of the int without some sort of locking operation.
I know that this may sound like a very unlikely event, but remember that a 1 in a million event happens 1000 times per second at 1GHz.
This is thread safe:
private Map<Integer, Record> aMap = new ConcurrentHashMap<Integer, Record>();
// presumably aMap is a member and the code below is in a function
aMap.putIfAbsent(id, new Record(id));
Record rec = aMap.get(id);
return rec;
Note that this might create a Record and never use it.

It could or could not be thread-safe, depending on how you want it to act.
By the end of the code, aMap will safely have a Record for id. However, it's possible that two threads will both create and put a Record in, such that there are two (or more, if more threads do it) Records in existence. That might be fine, and it might not be -- really depends on your application.
One of the dangers of lacking thread-safety (for instance, if you use a normal HashMap without synchronization) is that threads can read partially-created or partially-updated objects across threads; in other words, things can go really haywire. This will not happen in your code, because ConcurrentHashMap will ensure memory is kept up-to-date between threads, and in that sense it is thread-safe.
One thing you can do is to use putIfAbsent, which will atomically put a key-value pair into the map, but only if there's nothing at that key already:
if (rec == null) {
records.putIfAbsent(id, new Record(id));
rec = records.get(id);
}
In this approach, you might create a second Record object, but if so, it'll not get inserted and will immediately be available for garbage collection. By the end of the snippet:
records will contain a Record for the given id
only one Record will have ever been put into records for that id (whether put there by this thread or another)
rec will point to that record
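On Java 8 and later, computeIfAbsent is another option that avoids creating a Record that is never used. The sketch below is an assumption-laden illustration: RecordCache, getOrCreate and the created counter are made up, and Record is a minimal stand-in for the question's class. On a ConcurrentHashMap the mapping function is applied at most once per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder class; Record is a stand-in for the question's type.
class RecordCache {
    static final class Record {
        final int id;

        Record(int id) {
            this.id = id;
        }
    }

    // Counts constructor calls, only to make the behaviour observable.
    final AtomicInteger created = new AtomicInteger();
    private final ConcurrentHashMap<Integer, Record> records = new ConcurrentHashMap<>();

    Record getOrCreate(int id) {
        // The mapping function runs only if no Record is mapped to id yet.
        return records.computeIfAbsent(id, key -> {
            created.incrementAndGet();
            return new Record(key);
        });
    }
}
```

Every caller gets the same Record for a given id, and no throwaway instance is constructed when the key is already present.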

How to read unique elements from array per thread?

I have an object based on array, which implements the following interface:
public interface PairSupplier<Q, E> {
public int size();
public Pair<Q, E> get(int index);
}
I would like to create a specific iterator over it:
public boolean hasNext(){
return true;
}
public Pair<Q, E> next(){
//some magic
}
In the next method I would like to return some element from the PairSupplier.
This element should be unique to the thread; other threads should not have this element.
Since PairSupplier has a finite size, this is not always possible, but I would like to get close to it.
The order of elements doesn't matter, thread can take same element at a different time.
Example: 2 Threads, 5 elements - {1,2,3,4,5}
Thread 1 | Thread 2
1        | 2
3        | 4
5        | 1
3        | 2
4        | 5
My solution:
I created an AtomicInteger index, which I increment on every next call.
PairSupplier pairs;
AtomicInteger index;
public boolean hasNext(){
return true;
}
public Pair<Q, E> next(){
int position = index.incrementAndGet() % pairs.size();
if (position < 0) {
position *= -1;
position = pairs.size() - position;
}
return pairs.get(position);
}
pairs and index are shared among all threads.
I found this solution not scalable (because all threads contend on the increment); maybe someone has a better idea?
This iterator will be used by 50-1000 threads.

Your question details are ambiguous: your example suggests that two threads can be handed the same Pair, but you say otherwise in the description.
Since it is the more difficult of the two to achieve, I will offer an Iterable<Pair<Q,E>> that delivers Pairs one per thread until the supplier cycles; then it repeats.
public interface Supplier<T> {
public int size();
public T get(int index);
}
public interface PairSupplier<Q, E> extends Supplier<Pair<Q, E>> {
}
public class IterableSupplier<T> implements Iterable<T> {
// The common supplier to use across all threads.
final Supplier<T> supplier;
// The atomic counter.
final AtomicInteger i = new AtomicInteger();
public IterableSupplier(Supplier<T> supplier) {
this.supplier = supplier;
}
@Override
public Iterator<T> iterator() {
/**
* You may create a NEW iterator for each thread while they all share supplier
* and will therefore distribute each Pair between different threads.
*
* You may also share the same iterator across multiple threads.
*
* No two threads will get the same pair twice unless the sequence cycles.
*/
return new ThreadSafeIterator();
}
private class ThreadSafeIterator implements Iterator<T> {
@Override
public boolean hasNext() {
/**
* Always true.
*/
return true;
}
private int pickNext() {
// Just grab one atomically.
int pick = i.incrementAndGet();
// Reset to zero if it has exceeded - but no spin, let "just someone" manage it.
int actual = pick % supplier.size();
if (pick != actual) {
// So long as someone has a success before we overflow int we're good.
i.compareAndSet(pick, actual);
}
return actual;
}
@Override
public T next() {
return supplier.get(pickNext());
}
@Override
public void remove() {
throw new UnsupportedOperationException("Remove not supported.");
}
}
}
NB: I have adjusted the code a little to accommodate both scenarios. You can take an Iterator per thread or share a single Iterator across threads.
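As an alternative to resetting the counter, the wrap-around can also be handled with Math.floorMod, which stays non-negative even after the int counter overflows. This is just a sketch: WrappingIndex is a hypothetical class, and the start value in the constructor exists only so the overflow case can be tried out. At the overflow point the sequence may repeat or skip an index once, which should be acceptable since the order of elements does not matter.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical index source: an ever-increasing counter mapped into [0, size).
class WrappingIndex {
    private final AtomicInteger counter;
    private final int size;

    WrappingIndex(int size, int start) {
        this.size = size;
        this.counter = new AtomicInteger(start);
    }

    int next() {
        // getAndIncrement wraps to Integer.MIN_VALUE on overflow;
        // floorMod still maps every int into the range [0, size).
        return Math.floorMod(counter.getAndIncrement(), size);
    }
}
```

A call such as supplier.get(index.next()) would then replace the manual sign handling, at the cost of one atomic increment per call as before.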

You have a piece of information ("has anyone taken this Pair already?") that must be shared between all threads. So for the general case, you're stuck. However, if you have an idea about the size of your array and the number of threads, you could use buckets to make it less painful.
Let's suppose we know that there will be 1,000,000 array elements and 1,000 threads. Assign each thread a range (thread #1 gets elements 0-999, etc). Now instead of 1,000 threads contending for one AtomicInteger, you can have no contention at all!
That works if you can be sure that all your threads will run at about the same pace. If you need to handle the case where sometimes thread #1 is busy doing other things while thread #2 is idle, you can modify your bucket pattern slightly: each bucket has an AtomicInteger. Now threads will generally only contend with themselves, but if their bucket is empty, they can move on to the next bucket.

I'm having some trouble understanding what problem you are trying to solve.
Does each thread process the whole collection?
Is the concern that no two threads can work on the same Pair at the same time? But each thread needs to process each Pair in the collection?
Or do you want the collection processed once by using all of the threads?

There is one key thing which is obscure in your example - what exactly is the meaning of this?
The order of elements doesn't matter, thread can take same element at a different time.
"different time" means what? Within N milliseconds of each other? Does it mean that two threads will absolutely never be touching the same Pair at the same time? I will assume that.
If you want to decrease the probability that threads will block on each other contending for the same Pair, and there is a backing array of Pairs, try this:
Partition your array into numPairs / threadCount sub-arrays (you don't have to actually create sub-arrays, just start at different offsets - but it's easier to think about as sub-array)
Assign each thread to a different sub-array; when a thread exhausts its sub-array, increment the index of its sub array
Say we have 6 Pairs and 2 threads - your assignments look like Thread-1:[0,1,2] Thread-2:[3,4,5]. When Thread-1 starts it will be looking at a different set of Pairs than thread 2, so it is unlikely that they will contend for the same pair
If it is important that two threads really not touch a Pair at the same time, then wrap all of the code which touches a Pair object in synchronized(pair) (synchronize on the instance, not the type!) - there may occasionally be blocking, but you're never blocking all threads on a single thing, as with the AtomicInteger - threads can only block each other because they are really trying to touch the same object
Note this is not guaranteed never to block - for that, all threads would have to run at exactly the same speed, and processing every Pair object would have to take exactly the same amount of time, and the OS's thread scheduler would have to never steal time from one thread but not another. You cannot assume any of those things. What this gives you is a higher probability that you will get better concurrency, by dividing the areas to work in and making the smallest unit of state that is shared be the lock.
But this is the usual pattern for getting more concurrency on a data structure - partition the data between threads so that they rarely are touching the same lock at the same time.
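A minimal sketch of the "start at different offsets" variant, without creating real sub-arrays. OffsetCursor and startOffset are hypothetical names, and the sketch assumes each thread owns its own cursor, so the cursor itself needs no synchronization. Locking on the Pair, as described above, would still be needed if two threads must never touch the same Pair at the same time.

```java
// Hypothetical per-thread cursor: walks the whole array cyclically from its own offset.
class OffsetCursor {
    // First index of the slice that belongs to threadIndex (0-based).
    static int startOffset(int threadIndex, int threadCount, int size) {
        return (int) ((long) size * threadIndex / threadCount);
    }

    private final int size;
    private int position;

    OffsetCursor(int start, int size) {
        this.size = size;
        this.position = start;
    }

    // Not shared between threads, so a plain int field is enough here.
    int next() {
        int current = position;
        position = (position + 1) % size;
        return current;
    }
}
```

With 6 Pairs and 2 threads, the first cursor starts at index 0 and the second at index 3; after the second thread has walked 3, 4, 5 it moves on to index 0.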

The easiest approach I can see is to create a HashSet or Map and give every thread a unique hash. After that, just do a simple get by this hash code.

This is a standard Java semaphore usage problem. The following javadoc gives an example almost identical to your problem: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Semaphore.html
If you need more help, let me know.

I prefer a lock and release process.
If a thread asks for a Pair object, that Pair is removed from the supplier. Before the thread asks for a new Pair, the 'old' Pair is added to the supplier again.
You can take from the front and put back at the end.

java concurrency: many writers, one reader

I need to gather some statistics in my software and I am trying to make it fast and correct, which is not easy (for me!).
First, my code so far, with two classes: a StatsService and a StatsHarvester.
public class StatsService
{
private Map<String, Long> stats = new HashMap<String, Long>(1000);
public void notify ( String key )
{
Long value = 1l;
synchronized (stats)
{
if (stats.containsKey(key))
{
value = stats.get(key) + 1;
}
stats.put(key, value);
}
}
public Map<String, Long> getStats ( )
{
Map<String, Long> copy;
synchronized (stats)
{
copy = new HashMap<String, Long>(stats);
stats.clear();
}
return copy;
}
}
This is my second class, a harvester which collects the stats from time to time and writes them to a database.
public class StatsHarvester implements Runnable
{
private StatsService statsService;
private Thread t;
public void init ( )
{
t = new Thread(this);
t.start();
}
public synchronized void run ( )
{
while (true)
{
try
{
wait(5 * 60 * 1000); // 5 minutes
collectAndSave();
}
catch (InterruptedException e)
{
e.printStackTrace();
}
}
}
private void collectAndSave ( )
{
Map<String, Long> stats = statsService.getStats();
// do something like:
// saveRecords(stats);
}
}
At runtime there will be about 30 concurrently running threads, each calling notify(key) about 100 times. Only one StatsHarvester is calling statsService.getStats().
So I have many writers and only one reader. It would be nice to have accurate stats, but I don't care if some records are lost under high concurrency.
The reader should run every 5 minutes or whatever is reasonable.
Writing should be as fast as possible. Reading should be fast, but if it locks for about 300ms every 5 minutes, that's fine.
I've read many docs (Java Concurrency in Practice, Effective Java and so on), but I have the strong feeling that I need your advice to get it right.
I hope I stated my problem clearly and briefly enough to get valuable help.
EDIT
Thanks to all for your detailed and helpful answers. As I expected, there is more than one way to do it.
I tested most of your proposals (those I understood) and uploaded a test project to Google Code for further reference (Maven project):
http://code.google.com/p/javastats/
I have tested different implementations of my StatsService
HashMapStatsService (HMSS)
ConcurrentHashMapStatsService (CHMSS)
LinkedQueueStatsService (LQSS)
GoogleStatsService (GSS)
ExecutorConcurrentHashMapStatsService (ECHMSS)
ExecutorHashMapStatsService (EHMSS)
I tested them with x threads each calling notify y times (column headers are x,y); results are in ms:
Service  10,100  10,1000  10,5000  50,100  50,1000  50,5000  100,100  100,1000  100,5000  Sum
GSS           1        5       17       7       21      117        7        37       254  466
ECHMSS        1        6       21       5       32      132        8        54       249  508
HMSS          1        8       45       8       52      233       11       103       449  910
EHMSS         1        5       24       7       31      113        8        67       235  491
CHMSS         1        2        9       3       11       40        7        26        72  171
LQSS          0        3       11       3       16       56        6        27       144  266
At this moment I think I will use ConcurrentHashMap, as it offers good performance while being quite easy to understand.
Thanks for all your input!
Janning

As jack was alluding to, you can use the java.util.concurrent library, which includes ConcurrentHashMap and AtomicLong. You can put the AtomicLong in if absent; otherwise, you can increment the value. Since AtomicLong is thread-safe, you will be able to increment the variable without worrying about concurrency issues.
public void notify(String key) {
AtomicLong value = stats.get(key);
if (value == null) {
value = stats.putIfAbsent(key, new AtomicLong(1));
}
if (value != null) {
value.incrementAndGet();
}
}
This should be both fast and thread-safe.
Edit: Refactored slightly so there are at most two lookups.
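On Java 8 and later, the same counting can probably be written more compactly with ConcurrentHashMap.merge. The sketch below is a hedged variant and not part of the answer above: MergeStatsService is a made-up class name, and its getStats removes the keys one by one, so the result is not a single point-in-time snapshot, but no count is lost or reported twice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stats service built on ConcurrentHashMap.merge (Java 8+).
class MergeStatsService {
    private final ConcurrentHashMap<String, Long> stats = new ConcurrentHashMap<>();

    public void notify(String key) {
        // merge is atomic on ConcurrentHashMap: insert 1, or add 1 to the current count.
        stats.merge(key, 1L, Long::sum);
    }

    public Map<String, Long> getStats() {
        Map<String, Long> copy = new HashMap<>();
        for (String key : stats.keySet()) {
            // remove returns the final count for this key;
            // later notifications simply start a fresh entry.
            Long count = stats.remove(key);
            if (count != null) {
                copy.put(key, count);
            }
        }
        return copy;
    }
}
```

The harvester would call getStats() every few minutes, exactly as in the question.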

Why don't you use java.util.concurrent.ConcurrentHashMap<K, V>? It handles everything internally, avoiding useless locks on the map and saving you a lot of work: you won't have to care about synchronization on get and put.
From the documentation:
A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access.
You can specify its concurrency level:
The allowed concurrency among update operations is guided by the optional concurrencyLevel constructor argument (default 16), which is used as a hint for internal sizing. The table is internally partitioned to try to permit the indicated number of concurrent updates without contention. Because placement in hash tables is essentially random, the actual concurrency will vary. Ideally, you should choose a value to accommodate as many threads as will ever concurrently modify the table. Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention. But overestimates and underestimates within an order of magnitude do not usually have much noticeable impact. A value of one is appropriate when it is known that only one thread will modify and all others will only read. Also, resizing this or any other kind of hash table is a relatively slow operation, so, when possible, it is a good idea to provide estimates of expected table sizes in constructors.
As suggested in the comments, read the documentation of ConcurrentHashMap carefully, especially where it states which operations are atomic and which are not.
To have the guarantee of atomicity you should consider which operations are atomic; from the ConcurrentMap interface you will know that:
V putIfAbsent(K key, V value)
V replace(K key, V value)
boolean replace(K key,V oldValue, V newValue)
boolean remove(Object key, Object value)
can be used safely.
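These atomic operations can be combined into a retry loop for a counter whose values are immutable Long objects. This is a sketch under my own naming (CasCounter, increment, get); it only relies on putIfAbsent and the three-argument replace from the list above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical counter that uses only the atomic ConcurrentMap operations.
class CasCounter {
    private final ConcurrentMap<String, Long> counts = new ConcurrentHashMap<>();

    long increment(String key) {
        while (true) {
            Long current = counts.get(key);
            if (current == null) {
                // putIfAbsent returns null only if this thread inserted the first value.
                if (counts.putIfAbsent(key, 1L) == null) {
                    return 1L;
                }
            } else {
                Long next = current + 1;
                // replace succeeds only if the value still equals current; otherwise retry.
                if (counts.replace(key, current, next)) {
                    return next;
                }
            }
        }
    }

    long get(String key) {
        Long value = counts.get(key);
        return value == null ? 0L : value;
    }
}
```

A plain get followed by put would lose updates under contention; the loop retries whenever another thread got in between.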

I would suggest taking a look at Java's java.util.concurrent library. I think you can implement this solution much more cleanly. I don't think you need a map here at all. I would recommend implementing this using ConcurrentLinkedQueue. Each 'producer' can freely write to this queue without worrying about the others. It can put an object on the queue with the data for its statistics.
The harvester can consume the queue, continually pulling data off and processing it. It can then store it however it needs.
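A rough sketch of this queue-based idea, assuming the "object with the data" is simply the key string. QueueStatsService and drain are names I made up; only the single harvester thread is expected to call drain, so the map it builds can be a plain HashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical queue-based stats service: writers enqueue, the harvester aggregates.
class QueueStatsService {
    private final Queue<String> events = new ConcurrentLinkedQueue<>();

    // Called by many writer threads; offer on ConcurrentLinkedQueue does not block.
    public void notify(String key) {
        events.offer(key);
    }

    // Called by the single harvester thread.
    public Map<String, Long> drain() {
        Map<String, Long> counts = new HashMap<>();
        String key;
        // poll returns null once the queue is empty at that moment.
        while ((key = events.poll()) != null) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }
}
```

The price for the cheap writes is memory: every notification stays in the queue until the harvester has processed it.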

Chris Dail's answer looks like a good approach.
Another alternative would be to use a concurrent Multiset. There is one in the Google Collections library. You could use this as follows:
private Multiset<String> stats = ConcurrentHashMultiset.create();
public void notify ( String key )
{
stats.add(key, 1);
}
Looking at the source, this is implemented using a ConcurrentHashMap and using putIfAbsent and the three-argument version of replace to detect concurrent modifications and retry.

A different approach to the problem is to exploit the (trivial) thread safety via thread confinement. Basically, create a single background thread that takes care of both reading and writing. It has pretty good characteristics in terms of scalability and simplicity.
The idea is that instead of all the threads trying to update the data directly, they produce an "update" task for the background thread to process. The same thread can also do the read task, assuming some lag in processing updates is tolerable.
This design is pretty nice because the threads will no longer have to compete for a lock to update data, and since the map is confined to a single thread you can simply use a plain HashMap to do get/put, etc. In terms of implementation, it would mean creating a single threaded executor, and submitting write tasks which may also perform the optional "collectAndSave" operation.
A sketch of code may look like the following:
public class StatsService {
private ExecutorService executor = Executors.newSingleThreadExecutor();
private final Map<String,Long> stats = new HashMap<String,Long>();
public void notify(final String key) {
Runnable r = new Runnable() {
public void run() {
Long value = stats.get(key);
if (value == null) {
value = 1L;
} else {
value++;
}
stats.put(key, value);
// do the optional collectAndSave periodically
if (timeToDoCollectAndSave()) {
collectAndSave();
}
}
};
executor.execute(r);
}
}
There is a BlockingQueue associated with an executor, and each thread that produces a task for the StatsService uses the BlockingQueue. The key point is this: the locking duration for this operation should be much shorter than the locking duration in the original code, so the contention should be much less. Overall it should result in a much better throughput and latency.
Another benefit is that since only one thread reads and writes to the map, plain HashMap and primitive long type can be used (no ConcurrentHashMap or atomic types involved). This also simplifies the code that actually processes it a great deal.
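Reading can go through the same thread as well. Here is a rough sketch, assuming Java 8 lambdas and that the caller is willing to block until the executor has caught up. The class name ConfinedStats and the methods snapshot and shutdown are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical class: both updates and reads run on the executor's single thread.
class ConfinedStats {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    // Only ever touched from the executor thread, so a plain HashMap is enough.
    private final Map<String, Long> stats = new HashMap<>();

    public void notify(String key) {
        executor.execute(() -> stats.merge(key, 1L, Long::sum));
    }

    // Submits a read task and waits for it. Tasks run in submission order, so the
    // copy reflects every update this caller submitted before asking.
    public Map<String, Long> snapshot() {
        Callable<Map<String, Long>> copyTask = () -> new HashMap<>(stats);
        try {
            return executor.submit(copyTask).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for snapshot", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```

The caller gets its own copy of the map, so it never shares mutable state with the background thread.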
Hope it helps.

Have you looked into ScheduledThreadPoolExecutor? You could use that to schedule your writers, which could all write to a concurrent collection, such as the ConcurrentLinkedQueue mentioned by @Chris Dail. You can have a separately scheduled job read from the queue as necessary, and the Java SDK should handle pretty much all your concurrency concerns, with no manual locking needed.
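Something along these lines might work for the reading side. It is only a sketch: the class ScheduledHarvester and its methods are made up, and it assumes the totals are kept in memory rather than saved somewhere:

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical class: writers enqueue keys, a scheduled job drains the queue.
class ScheduledHarvester {
    private final Queue<String> events = new ConcurrentLinkedQueue<>();
    private final Map<String, Long> totals = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = new ScheduledThreadPoolExecutor(1);

    // Called from any writer thread.
    public void record(String key) {
        events.offer(key);
    }

    // Runs harvestOnce repeatedly, waiting periodMillis between runs.
    public void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(this::harvestOnce, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
    }

    // Drains everything currently queued into the running totals.
    void harvestOnce() {
        String key;
        while ((key = events.poll()) != null) {
            totals.merge(key, 1L, Long::sum);
        }
    }

    public long total(String key) {
        return totals.getOrDefault(key, 0L);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```

Call start(1000) once at startup to harvest roughly every second, and stop() on shutdown so the scheduler thread does not keep the JVM alive.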

If we ignore the harvesting part and focus on the writing, the main bottleneck of the program is that the stats are locked at a very coarse level of granularity. If two threads want to update different keys, they must wait.
If you know the set of keys in advance, and can preinitialize the map so that by the time an update thread arrives the key is guaranteed to exist, you would be able to do locking on the accumulator variable instead of the whole map, or use a thread-safe accumulator object.
Instead of implementing this yourself, there are map implementations that are designed specifically for concurrency and do this more fine-grained locking for you.
One caveat, though, is reading the stats, since you would need to get locks on all the accumulators at roughly the same time. If you use an existing concurrency-friendly map, there might be a construct for getting a snapshot.
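As a rough illustration of per-key accumulators, here is a sketch using ConcurrentHashMap with one AtomicLong per key, assuming Java 8 for computeIfAbsent. The class FineGrainedStats and the method snapshotAndReset are made up. Note that the snapshot is taken key by key, so it is not a single point-in-time view across all keys:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class: contention is per key instead of on the whole map.
class FineGrainedStats {
    private final ConcurrentHashMap<String, AtomicLong> stats = new ConcurrentHashMap<>();

    public void notify(String key) {
        // computeIfAbsent creates the accumulator atomically on first use.
        stats.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    // Reads and zeroes each counter atomically per key. Increments that arrive
    // during the pass are kept and show up in the next snapshot.
    public Map<String, Long> snapshotAndReset() {
        Map<String, Long> copy = new HashMap<>();
        stats.forEach((key, counter) -> copy.put(key, counter.getAndSet(0L)));
        return copy;
    }
}
```

Updates to different keys no longer wait on each other, and the per-key getAndSet means no increment is lost when the counters are cleared.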

Another alternative is to implement both methods using ReentrantReadWriteLock. This implementation protects against race conditions in the getStats method if you need to clear the counters. It also keeps the mutable AtomicLong out of the getStats result and uses an immutable Long instead.
public class StatsService {
private final Map<String, AtomicLong> stats = new HashMap<String, AtomicLong>(1000);
private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
private final Lock r = rwl.readLock();
private final Lock w = rwl.writeLock();
public void notify(final String key) {
r.lock();
AtomicLong count = stats.get(key);
if (count == null) {
r.unlock();
w.lock();
count = stats.get(key);
if(count == null) {
count = new AtomicLong();
stats.put(key, count);
}
r.lock();
w.unlock();
}
count.incrementAndGet();
r.unlock();
}
public Map<String, Long> getStats() {
w.lock();
Map<String, Long> copy = new HashMap<String, Long>();
for(Entry<String,AtomicLong> entry : stats.entrySet() ){
copy.put(entry.getKey(), entry.getValue().longValue());
}
stats.clear();
w.unlock();
return copy;
}
}
I hope this helps, any comments are welcome!

Here is how to do it with minimal impact on the performance of the threads being measured. This is the fastest solution possible in Java, without resorting to special hardware registers for performance counting.
Have each thread output its stats independently of the others, that is with no synchronization, to some stats object. Make the field containing the count volatile, so it is memory fenced:
class Stats
{
public volatile long count;
}
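class SomeRunnable implements Runnable
{
// Each runnable owns its Stats object; only this thread ever writes to it.
final Stats stats = new Stats();
public void run()
{
doStuff();
// Not atomic, but safe with a single writer; volatile makes it visible to the reader.
stats.count++;
}
}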
Have another thread, that holds a reference to all the Stats objects, periodically go around them all and add up the counts across all threads:
public long accumulateStats()
{
long count = 0; // the per-thread counts are cumulative, so sum them from zero
for (Stats stat : allStats)
{
count += stat.count;
}
long resultDelta = count - previousCount;
previousCount = count;
return resultDelta;
}
This gatherer thread also needs a sleep() (or some other throttle) added to it. It can periodically output counts/sec to the console for example, to give you a "live" view of how your application is performing.
This avoids the synchronization overhead about as much as you can.
The other trick to consider is padding the Stats objects to 128 bytes (or 256 bytes on Sandy Bridge or later), so as to keep the different threads' counts on different cache lines; otherwise there will be caching contention on the CPU.
When only one thread reads and one writes, you do not need locks or atomics; a volatile is sufficient. There will still be some thread contention when the stats reader thread interacts with the CPU cache line of the thread being measured. This cannot be avoided, but it is the way to do it with minimal impact on the running thread; read the stats maybe once a second or less.
