Non-locking interaction with an array from several threads (in Java)

I need an array of Strings that is accessed from two threads. It has to be very fast and thread-safe. I would prefer not to use locks. What approach can I take to make a lock-free, thread-safe array of Strings? I need a recipe in Java.

By definition, the only thread-safe writes available to memory shared by contended threads are actions that are provided by atomic instructions in the CPU. This isn't really relevant for Java (at least, almost all of the time), but it is worth noting that writes without locks in a concurrent environment are possible.
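In Java, those atomic instructions are exposed through java.util.concurrent.atomic. The following is a minimal sketch, assuming a fixed-size array is acceptable; the class name LockFreeStrings and the method claim are illustrative, not part of any library.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical helper: a fixed-size array of Strings with lock-free slot access.
final class LockFreeStrings {
    private final AtomicReferenceArray<String> slots;

    LockFreeStrings(int size) {
        slots = new AtomicReferenceArray<>(size);
    }

    // Volatile read: sees the latest value written to this slot by any thread.
    String get(int index) {
        return slots.get(index);
    }

    // Unconditional volatile write to one slot.
    void set(int index, String value) {
        slots.set(index, value);
    }

    // Writes only if the slot is still empty; returns false if another thread got there first.
    boolean claim(int index, String value) {
        return slots.compareAndSet(index, null, value);
    }
}
```

This only makes each individual slot update atomic. Any operation that spans several slots (or several arrays) still needs some form of coordination.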
So, this is to say, that if you want to write to the array, you are likely going to need to have locks. Locks are the solution to the general problem.
You can, however, happily share an array between many threads without issue as long as they are only reading from the array. So, if your array is immutable (or any other object for that matter), it will be thread-safe by virtue of there never being an opportunity for contention.
So, let's suppose you want to write to the array from two different threads, but you are worried about contention. Maybe each thread wants to record a lot of data. There are several different solutions to this problem; I'll try to explain a few. This isn't exhaustive, because concurrency is a hard problem to solve and, although there are some common approaches, the answer often depends on the specific situation.
The simplest approach
Just use a lock on the array when you write to it and see how it performs. Maybe you don't actually need to worry about performance problems right now.
Use a producer/consumer approach
Rather than having two threads write to the same array, have each of them "produce" values (maybe put them on different thread-safe queues) and have another thread responsible for "consuming" those values (remove them from the queue and put them in the array).
If order matters, this approach can be tricky to implement. But you are using concurrency, so ordering will be fairly non-deterministic anyways.
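A rough sketch of the producer/consumer idea, assuming a single shared LinkedBlockingQueue (which does lock internally; ConcurrentLinkedQueue would be a lock-free alternative) and a sentinel value to signal that a producer has finished. The class name, the DONE sentinel and the run method are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class ProducerConsumerDemo {
    // Unique sentinel instance; compared by identity so it cannot clash with real data.
    private static final String DONE = new String("DONE");

    static List<String> run(int producers, int perProducer) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> result = new ArrayList<>(); // only the consumer thread writes to it

        Thread consumer = new Thread(() -> {
            int finished = 0;
            try {
                while (finished < producers) {
                    String s = queue.take();
                    if (s == DONE) finished++; else result.add(s);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        List<Thread> workers = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            Thread t = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) queue.add("p" + id + "-" + i);
                queue.add(DONE); // tell the consumer this producer is finished
            });
            workers.add(t);
            t.start();
        }
        try {
            for (Thread t : workers) t.join();
            consumer.join(); // after this, reading result from the caller is safe
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }
}
```

Values from one producer arrive in the order that producer emitted them, but values from different producers interleave unpredictably.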
Do writes in batches
The idea here is that each thread would store the values it wants to put in the array in its own temporary batch. When the batch reaches a large enough size, the thread would lock the array and write the entire batch.
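The batching idea can be sketched roughly as follows, assuming the shared structure is an ArrayList guarded by its own monitor and each thread owns one writer. BatchedWriter and demo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// One instance per thread; only flush() touches the shared list, and only under its lock.
final class BatchedWriter {
    private final List<String> shared;                     // guarded by synchronized (shared)
    private final int batchSize;
    private final List<String> batch = new ArrayList<>();  // private to the owning thread

    BatchedWriter(List<String> shared, int batchSize) {
        this.shared = shared;
        this.batchSize = batchSize;
    }

    void add(String value) {
        batch.add(value);
        if (batch.size() >= batchSize) flush();
    }

    void flush() {
        if (batch.isEmpty()) return;
        synchronized (shared) {
            shared.addAll(batch); // one lock acquisition per batch, not per value
        }
        batch.clear();
    }

    // Starts the given number of threads, each writing perThread values; returns the final size.
    static int demo(int threads, int perThread, int batchSize) {
        List<String> shared = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                BatchedWriter writer = new BatchedWriter(shared, batchSize);
                for (int i = 0; i < perThread; i++) writer.add("v" + i);
                writer.flush(); // push the final partial batch
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        synchronized (shared) {
            return shared.size();
        }
    }
}
```

Readers see new values only after a flush, so the batch size trades lock traffic against freshness.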
Write to separate parts of the array
If you know the size of your data, you can avoid contention by simply not allowing threads to write to the same index ranges. You'd divide the array up by the number of threads. Each thread, when created, would be given a start index into the array.
This option might fit what you are looking for (lock-free, thread-safe).
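A minimal sketch of the partitioning approach, assuming the array size is known up front and the caller reads the array only after all writers have been joined. PartitionedFill and the "item-" values are illustrative.

```java
final class PartitionedFill {
    // Each thread writes only to its own index range, so no two threads touch the same slot.
    static String[] fill(int size, int threads) {
        String[] data = new String[size];
        Thread[] workers = new Thread[threads];
        int chunk = (size + threads - 1) / threads; // ceiling division
        for (int t = 0; t < threads; t++) {
            final int start = Math.min(size, t * chunk);
            final int end = Math.min(size, start + chunk);
            workers[t] = new Thread(() -> {
                for (int i = start; i < end; i++) data[i] = "item-" + i;
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() makes the worker's writes visible to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return data;
    }
}
```

No locks are taken while writing. Visibility to the reading thread comes from Thread.join(); a reader that runs concurrently with the writers would need some other form of safe publication.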

How about using the built in Collections.synchronizedList?

Related

Shard resources by thread?

I have a (limited) thread pool which executes CPU-bound tasks. I'd like to aggregate some numerical statistics from each of these threads in a single place. Basically: each thread will update some shared stats (e.g. how long its job took) at a very high frequency and, at some much slower interval, a 'stat reader' would query those stats.
My first thought was to use some shared atomics and update them from each thread. This works ok, but in my testing the overhead of the atomics can get pretty high with a lot of contention so I was trying to think of some other alternatives.
My second thought was a sort of 'sharding' scheme, where each thread has its own stats object that it can update without requiring any synchronization. The 'stat reader' could then aggregate the stats from each thread into an overall stat value.
My first question is: does the thread sharding scheme make sense? Does something like that exist that I'm reinventing?
My second question is: if the sharding scheme does make sense, I'm trying to think of the best way to map threads to their shard:
1) Use the thread's ID mod some shard value to get a shard index, but I don't think that's reliable as I think the thread id value is shared, so I could get a collision.
2) Adding a thread-local index to the thread, but I don't think that will play nicely with the ExecutorService.
3) I could subclass Thread, but then I'd have to cast it when I wanted to access this which I'd rather avoid, if possible.
4) When the thread is created, create a mapping of its name to its shard. This would work, but there would be a race when creating the threads: one could be looking up its shard while we're adding a new shard to the map, causing concurrency issues.
Wondering if I'm way off-base here and overthinking it (seems like it would be a common problem?) or if one of these schemes does make sense for the use case.

One way to solve this is to use the LongAdder class that avoids the contention that plain old atomics suffer from.
A more hand-written approach would be to create some class that holds the statistics you want to gather for each thread, and then have an array of these objects such that each thread's stats object is in array[thread.getId() % NUM_THREADS]. The reader thread can then traverse the array and gather the stats as it pleases.
The trick to getting this to work efficiently is to avoid false sharing. That is, threads on different cores perform updates on their respective objects but those objects happen to reside on the same cacheline, causing massive amounts of unnecessary cache coherence traffic.
In Java 8, there is the @Contended annotation that you might want to look into. The old way of padding your class with a bunch of long fields is no longer reliable, since unused fields may be optimized away.
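As a rough illustration of the LongAdder suggestion, here is a sketch assuming the only statistics needed are a job count and a total duration. JobStats, record and demo are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class JobStats {
    private final LongAdder jobs = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    // Called by worker threads at high frequency; LongAdder spreads updates over internal cells.
    void record(long nanos) {
        jobs.increment();
        totalNanos.add(nanos);
    }

    // Called by the slow 'stat reader'; sum() adds up the cells.
    long jobCount() { return jobs.sum(); }
    long totalNanos() { return totalNanos.sum(); }

    // Runs the given number of pool threads, each recording jobsPerThread jobs of 5 ns.
    static JobStats demo(int threads, int jobsPerThread) {
        JobStats stats = new JobStats();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < jobsPerThread; i++) stats.record(5);
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stats;
    }
}
```

Note that sum() is not an atomic snapshot: while updates are in flight, the two adders can be momentarily out of step with each other, which is usually acceptable for statistics.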

I would suggest a different approach: the actor model.
The actor model provides a relatively simple but powerful model for designing and implementing applications that can distribute and share work across all system resources—from threads and cores to clusters of servers and data centers. It provides an effective framework for building applications with high levels of concurrency and for increasing levels of resource efficiency. Importantly, the actor model also has well-defined ways for handling errors and failures gracefully, ensuring a level of resilience that isolates issues and prevents cascading failures and massive downtime.
I think you could turn to Akka for this.

How to handle synchronization fast concurrent read/writes in Java/Android

I am trying to set up an analysis tool in android. I have several Listener which are fetching some signal data (float values) every 10 ms. Each signal is supposed to be stored in its own List.
Based on all the signal values we get in each 10ms steps, I need to compute a state based on some restrictions and store it in a List as well.
It is not unlikely that the analysis runs for about 2 hours, so the Lists can get quite big (~600,000 entries per signal for 100 minutes). All the Lists are currently wrapped in a RootList, like List<List<Float>>.
So I have a lot of write operations, which run in only one thread, and a lot of read operations running from other threads (computing the states and doing some graphical representation). All this should be done in real time and not after the analysis is done.
Since I need to do write and read operations concurrently, I thought about which implementation I could use to maintain the real-time computation aspect.
ReadWriteLock
I am not too sure if ReadWriteLocks are sufficient for my case, since my computation needs to happen as soon as all values are available and if I add values each 10ms in every List I would guess that it is basically locked the whole time?!
CopyOnWriteArrayList
I don't think CopyOnWriteArrayList fits my needs either, because I have a lot of write operations and would be copying the arrays endlessly. But I might be wrong here; maybe someone with more expertise can comment on that.
Maybe I could use the Collections.synchronizedList() or some other predefined synchronized implementations. But since I am writing/reading the values in such a frequent time interval it might not be the best idea either.
Are there any other clever ways to achieve what I want, or is one of the options above actually viable and I am the only one seeing problems? Maybe there is even a recommended way of doing this? Any advice would be greatly appreciated.

Concurrent map with multiple values for one key and auto-remove on timeout

I'm a newcomer to concurrency. I have read about Guava Cache and Multimap, and I am looking for something that combines some features of both:
From Cache I want automatic removal after ACCESS_TIMEOUT and WRITE_TIMEOUT have expired.
From Multimap I want multiple values associated with one key.
All of that must be concurrent.
I have multiple writers and multiple readers. I want to add values with random keys and remove them.
Question: Is there a map implementation that fits my needs?
UPDATED: Striped<Lock> solution
The more I read about Striped<Lock>, the more attractive it seems to me. But it has raised even more questions:
If I use something like Striped<Lock> with Guava Cache, which already uses ConcurrentHashMap, could I face problems with deadlocks or degraded performance? Am I wrong?
If I use Striped<Lock> over Cache, it still doesn't resolve the question of multiple values per key.
Does Striped<Lock> eliminate the need for a concurrent map in my case? (I suppose the answer is YES), but on GitHub I saw the contrary.

You could start with a Cache<SomeKey, Collection<SomeValue>> (so you still get the expiration) and use synchronized collections (Collections.synchronized*()) as the values.
But what's really the question here is the type of concurrent access you need on the collections:
Is it enough that the operations are synchronized so the collections don't get corrupted, or do you need higher-level semantics like what ConcurrentMap.putIfAbsent() offers?
Do you need to do multiple operations on the collections of values in an atomic way? Like if you need to do
    if (c.contains(v)) {
        c.remove(v);
    } else {
        c.add(v);
    }
you usually want to put that into a synchronized(c) { } block.
If so, you'll probably want to wrap the collection inside a class exposing those higher-level semantics and managing the lock around multiple operations to get the atomicity you need, and use that class as the value: Cache<SomeKey, SomeValuesContainer>.
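Such a wrapper could look roughly like the sketch below. It assumes a HashSet as the backing collection and uses the object's own monitor as the lock; the toggle method name is made up, and SomeValuesContainer is just the placeholder name used above.

```java
import java.util.HashSet;
import java.util.Set;

// Value type for Cache<SomeKey, SomeValuesContainer>: every compound operation holds one lock.
final class SomeValuesContainer<V> {
    private final Set<V> values = new HashSet<>(); // guarded by "this"

    // Removes v if present, adds it otherwise, as one atomic step.
    // Returns true if v is present after the call.
    synchronized boolean toggle(V v) {
        if (values.remove(v)) {
            return false;
        }
        values.add(v);
        return true;
    }

    synchronized boolean contains(V v) {
        return values.contains(v);
    }

    synchronized int size() {
        return values.size();
    }
}
```

Because the set never escapes the class, callers cannot bypass the lock by accident.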
As mentioned in the comments, Striped<Lock> can be used to synchronize the access to multiple Caches/ConcurrentHashMaps without imposing a single lock and its performance impact in case of even moderate contention.
If you need multiple Caches/ConcurrentHashMaps, that is: why don't the Peers (or a wrapper around it) actually contain that information?
1. Deadlocks, performance
Guava's Cache is similar to ConcurrentHashMap, but it doesn't use it. However, both work in the same way by having segments which can be locked independently, thus reducing contention when accessing the map concurrently (especially when updating). Using a Striped<Lock> around the access to either one cannot cause a deadlock, which only happens if you're not locking multiple locks in a consistent order: that can't happen here, as you'll always lock your Lock obtained from Striped<Lock> before calling the Cache or ConcurrentHashMap, which then locks its segment (invisible to you).
As to performance, yes, locking has a cost but it really depends on the level of contention (and that can be tuned with the number of stripes in a Striped<Lock> or the concurrencyLevel in a Cache). However, you need proper concurrency support anyway since without it you can get invalid results (or corrupt your data), so you have to do something (using either locking or a lock-free algorithm).
2. Multiple values per key
My original answer still stands. It's difficult to get an exact idea of what you're exactly trying to do from your multiple questions (it's better if you can provide a complete, consistent context in one question), but I think you don't need more than concurrent modification of the multiple values per key so the synchronized collections should be enough (but you need at least that). You'll have to reason about your access patterns as you add them to make sure they still fit the model, though: make sure your replaceAll*() methods lock what they need, for example.
3. Is ConcurrentMap still needed with Striped<Lock>?
YES! Especially with Striped<Lock> vs a single Lock, because you'll still get concurrent updates for keys which don't use the same stripe (that's the whole point of Striped<Lock>) so you need data structures which support concurrent modification. If you use a simple HashMap, you have every chance of corrupting it under enough load (and cause infinite loops, for example).
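To make the point concrete, here is a hand-rolled stand-in for Guava's Striped<Lock> (a plain array of ReentrantLocks, not the Guava API itself) guarding a compound update across two ConcurrentHashMaps. StripedRegistry and its methods are hypothetical names for this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

final class StripedRegistry {
    private final Lock[] stripes;
    // Both maps must be concurrent: threads holding different stripes update them in parallel.
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();
    private final Map<String, String> lastValue = new ConcurrentHashMap<>();

    StripedRegistry(int stripeCount) {
        stripes = new Lock[stripeCount];
        for (int i = 0; i < stripeCount; i++) stripes[i] = new ReentrantLock();
    }

    // Equal keys always map to the same stripe; unrelated keys usually map to different ones.
    private Lock stripeFor(Object key) {
        return stripes[Math.floorMod(key.hashCode(), stripes.length)];
    }

    // Updates both maps as one unit with respect to other updates of the same key.
    void record(String key, String value) {
        Lock lock = stripeFor(key);
        lock.lock();
        try {
            counts.merge(key, 1, Integer::sum);
            lastValue.put(key, value);
        } finally {
            lock.unlock();
        }
    }

    int count(String key) { return counts.getOrDefault(key, 0); }
    String last(String key) { return lastValue.get(key); }
}
```

The stripe lock is always taken before the map's internal locking, so the lock order is consistent and this pattern by itself cannot deadlock.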

CopyOnWriteArrayList or Vector

All,
The edge the Vector class has over ArrayList is that it is synchronized and hence ensures thread safety. However, between CopyOnWriteArrayList and Vector, which should be preferred when considering both thread safety and performance?

It depends on the usage pattern: if you have many more reads than writes, use CopyOnWriteArrayList; otherwise use Vector.
Vector introduces a small synchronization delay for each operation, whereas CopyOnWriteArrayList has a longer delay for writes (due to copying) but no delay for reads.
Another consideration is the behaviour of iterators: Vector requires explicit synchronization while you are iterating over it (so write operations can't be executed at the same time), whereas CopyOnWriteArrayList doesn't.
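The iterator difference can be shown with a small sketch. IterationDemo and its two methods are illustrative names; the second method modifies the list inside the loop purely to demonstrate the snapshot behaviour.

```java
import java.util.Vector;
import java.util.concurrent.CopyOnWriteArrayList;

final class IterationDemo {
    // Vector: hold the Vector's own monitor for the whole traversal,
    // otherwise a concurrent write can cause ConcurrentModificationException.
    static int sumVector(Vector<Integer> v) {
        int sum = 0;
        synchronized (v) {
            for (int x : v) sum += x;
        }
        return sum;
    }

    // CopyOnWriteArrayList: the iterator walks a snapshot of the array,
    // so adding during iteration neither throws nor changes what this loop sees.
    static int sumWhileAppending(CopyOnWriteArrayList<Integer> list) {
        int sum = 0;
        for (int x : list) {
            list.add(x); // copies the backing array; the running iterator keeps the old one
            sum += x;
        }
        return sum;
    }
}
```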

Overall, it depends on the frequency and nature of read and write operations, and the size of the array.
You'll need to benchmark in your context to be sure, but here are some general principles:
If you are only going to read the array, then even ArrayList is thread safe (since the only non-thread-safe operations are those that modify the list). Hence you would want to use a non-synchronised data structure; either ArrayList or CopyOnWriteArrayList would probably work equally well.
If reads are much more common compared to writes, then you would tend to prefer CopyOnWriteArrayList, since the array copying overhead is only incurred on writes.
If the size of the array is small, then the cost of making array copies will also be small, hence this will favour CopyOnWriteArrayList over Vector.
You may also want to consider two other options:
Use an ArrayList but put the synchronisation elsewhere to ensure thread safety. This is actually the method I personally use most often - basically the idea is to use a separate, higher-level lock to protect all the relevant data structures at the same time. This is much more efficient than having synchronisation on every single operation as Vector does.
Consider an immutable persistent data structure - these are guaranteed to be thread safe due to immutability, do not require synchronisation and also benefit from low overhead (i.e. they share most data between different instances rather than making complete new copies). Languages like Clojure use these to get ArrayList-like performance while also guaranteeing full thread safety.
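The first of these two options (plain ArrayLists behind one higher-level lock) might look like the following sketch. SignalLog and its fields are made-up names; the point is that one lock covers both lists, so they can never be observed out of step.

```java
import java.util.ArrayList;
import java.util.List;

final class SignalLog {
    private final Object lock = new Object();
    private final List<String> names = new ArrayList<>();  // guarded by lock
    private final List<Double> values = new ArrayList<>(); // guarded by lock

    // One lock acquisition updates both lists together.
    void append(String name, double value) {
        synchronized (lock) {
            names.add(name);
            values.add(value);
        }
    }

    // Returns "name=value" for the most recent entry, or null if nothing was logged yet.
    String latest() {
        synchronized (lock) {
            if (names.isEmpty()) {
                return null;
            }
            int i = names.size() - 1;
            return names.get(i) + "=" + values.get(i);
        }
    }
}
```

With two separate Vectors, each add would be synchronized individually, and a reader could still see a name without its value.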

Performance ConcurrentHashmap vs HashMap

How is the performance of ConcurrentHashMap compared to HashMap, especially .get() operation (I'm especially interested for the case of only few items, in the range between maybe 0-5000)?
Is there any reason not to use ConcurrentHashMap instead of HashMap?
(I know that null values aren't allowed)
Update
Just to clarify: obviously performance under actual concurrent access will suffer, but how does the performance compare when there is no concurrent access?

I was really surprised to find this topic to be so old and yet no one has yet provided any tests regarding the case. Using ScalaMeter I have created tests of add, get and remove for both HashMap and ConcurrentHashMap in two scenarios:
using single thread
using as many threads as I have cores available. Note that because HashMap is not thread-safe, I simply created separate HashMap for each thread, but used one, shared ConcurrentHashMap.
Code is available on my repo.
The results are as follows:
X axis (size) presents number of elements written to the map(s)
Y axis (value) presents time in milliseconds
The summary
If you want to operate on your data as fast as possible, use all the threads available. That seems obvious, each thread has 1/nth of the full work to do.
If you access the map from a single thread, use HashMap; it is simply faster. For the add method it is as much as 3x more efficient. Only get is faster on ConcurrentHashMap, but not by much.
When operating on ConcurrentHashMap with many threads it is similarly effective to operating on separate HashMaps for each thread. So there is no need to partition your data in different structures.
To sum up, the performance for ConcurrentHashMap is worse when you use with single thread, but adding more threads to do the work will definitely speed-up the process.
Testing platform
AMD FX6100, 16GB Ram
Xubuntu 16.04, Oracle JDK 8 update 91, Scala 2.11.8

Thread safety is a complex question. If you want to make an object thread safe, do it consciously, and document that choice. People who use your class will thank you if it is thread safe when it simplifies their usage, but they will curse you if an object that once was thread safe becomes not so in a future version. Thread safety, while really nice, is not just for Christmas!
So now to your question:
ConcurrentHashMap (at least in Sun's current implementation) works by dividing the underlying map into a number of separate buckets. Getting an element does not require any locking per se, but it does use atomic/volatile operations, which implies a memory barrier (potentially very costly, and interfering with other possible optimisations).
Even if all the overhead of atomic operations can be eliminated by the JIT compiler in a single-threaded case, there is still the overhead of deciding which of the buckets to look in - admittedly this is a relatively quick calculation, but nevertheless, it is impossible to eliminate.
As for deciding which implementation to use, the choice is probably simple.
If this is a static field, you almost certainly want to use ConcurrentHashMap, unless testing shows this is a real performance killer. Your class has different thread safety expectations from the instances of that class.
If this is a local variable, then chances are a HashMap is sufficient - unless you know that references to the object can leak out to another thread. By coding to the Map interface, you allow yourself to change it easily later if you discover a problem.
If this is an instance field, and the class hasn't been designed to be thread safe, then document it as not thread safe, and use a HashMap.
If you know that this instance field is the only reason the class isn't thread safe, and are willing to live with the restrictions that promising thread safety implies, then use ConcurrentHashMap, unless testing shows significant performance implications. In that case, you might consider allowing a user of the class to choose a thread safe version of the object somehow, perhaps by using a different factory method.
In either case, document the class as being thread safe (or conditionally thread safe) so people who use your class know they can use objects across multiple threads, and people who edit your class know that they must maintain thread safety in future.
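The factory-method idea, combined with coding to the Map interface, could be sketched like this. Registry and both factory method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class Registry {
    private final Map<String, String> entries; // coded to the interface, not the implementation

    private Registry(Map<String, String> entries) {
        this.entries = entries;
    }

    /** Not thread safe: confine the instance to one thread or synchronize externally. */
    static Registry create() {
        return new Registry(new HashMap<>());
    }

    /** Thread safe for individual put/get calls; compound operations still need care. */
    static Registry createThreadSafe() {
        return new Registry(new ConcurrentHashMap<>());
    }

    void put(String key, String value) {
        entries.put(key, value);
    }

    String get(String key) {
        return entries.get(key);
    }
}
```

The documented guarantee lives on the factory method, so callers choose their thread-safety contract explicitly. One caveat of this design: the ConcurrentHashMap variant rejects null keys and values, while the HashMap variant accepts them.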

I would recommend you measure it, since (for one reason) there may be some dependence on the hashing distribution of the particular objects you're storing.

The standard hashmap provides no concurrency protection, whereas the concurrent hashmap does. Before it was available, you could wrap the hashmap to get thread-safe access, but this was coarse-grained locking and meant all concurrent access got serialised, which could really impact performance.
The concurrent hashmap uses lock striping and only locks the items that are affected by a particular lock. If you're running on a modern VM such as HotSpot, the VM will try to use lock biasing, coarsening and elision where possible, so you'll only pay the penalty for the locks when you actually need them.
In summary, if your map is going to be accessed by concurrent threads and you need to guarantee a consistent view of its state, use the concurrent hashmap.

In the case of a 1000 element hash table using 10 locks for whole table saves close to half the time when 10000 threads are inserting and 10000 threads are deleting from it.
The interesting run time difference is here
Always use a concurrent data structure, except when the downside of striping (mentioned below) becomes a frequent operation. In that case you will have to acquire all the locks; I have read that the best way to do this is by recursion.
Lock striping is useful when there is a way of breaking a high-contention lock into multiple locks without compromising data integrity. Whether this is possible takes some thought and is not always the case. The data structure is also a contributing factor in the decision. If we use a large array to implement a hash table, synchronizing it with a single lock for the entire table forces threads to access the data structure sequentially. If they are accessing the same location in the hash table, that is necessary, but what if they are accessing the two extremes of the table?
The down side of lock striping is it is difficult to get the state of the data structure that is affected by striping. In the example the size of the table, or trying to list/enumerate the whole table may be cumbersome since we need to acquire all of the striped locks.
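The trade-off can be seen in a small sketch of a striped set: per-key operations take one lock, while size() has to take all of them. StripedSet is a hypothetical class written for illustration; it acquires the locks in a simple loop in index order rather than recursively.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

final class StripedSet {
    private final List<ReentrantLock> locks = new ArrayList<>();
    private final List<Set<String>> buckets = new ArrayList<>(); // bucket i is guarded by lock i

    StripedSet(int stripes) {
        for (int i = 0; i < stripes; i++) {
            locks.add(new ReentrantLock());
            buckets.add(new HashSet<>());
        }
    }

    private int stripe(String key) {
        return Math.floorMod(key.hashCode(), locks.size());
    }

    // Cheap path: only the stripe that owns the key is locked.
    boolean add(String key) {
        int s = stripe(key);
        locks.get(s).lock();
        try {
            return buckets.get(s).add(key);
        } finally {
            locks.get(s).unlock();
        }
    }

    boolean remove(String key) {
        int s = stripe(key);
        locks.get(s).lock();
        try {
            return buckets.get(s).remove(key);
        } finally {
            locks.get(s).unlock();
        }
    }

    // Expensive path: a consistent whole-table view needs every lock,
    // always taken in index order so two callers cannot deadlock each other.
    int size() {
        for (ReentrantLock l : locks) l.lock();
        try {
            int n = 0;
            for (Set<String> b : buckets) n += b.size();
            return n;
        } finally {
            for (ReentrantLock l : locks) l.unlock();
        }
    }
}
```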

What answer are you expecting here?
It is obviously going to depend on the number of reads happening at the same time as writes and how long a normal map must be "locked" on a write operation in your app (and whether you would make use of the putIfAbsent method on ConcurrentMap). Any benchmark is going to be largely meaningless.

It's not clear what you mean. If you need thread safety, you have almost no choice: only ConcurrentHashMap. And it definitely has performance/memory penalties in the get() call: access to volatile variables, and a lock if you're unlucky.

Of course a Map without any lock system wins against one with thread-safe behavior which needs more work.
The point of the concurrent one is to be thread safe without using synchronized, and so to be faster than Hashtable.
The same graphs would be very interesting for ConcurrentHashMap vs Hashtable (which is synchronized).
