I'm searching various alternatives for a write intensive application when it comes to selecting a Java data structure. I know that ONE data structure could not provide a single universal solution solution to a write intensive application but I'm surprised by the lack of discussion out there on the topic.
There are many people talking about read-intensive-rate-writes or concurrent-read-only applications but I cannot find any conversation around the data structures used for a write intensive application.
Based on the following requirements
key/value pairs - Map
Unsorted - for the sake of simplicity
1000+ writes per minute / negligible reads
All data are stored in memory
I am thinking of the following approaches
Simple ConcurrentHashMap: Although based on this from official Oracle docs
[...] even though all operations are thread-safe, retrieval operations do not entail locking
It must be better suited for read intensive applications
Combination of a BlockingQueue and a set of ConcurrentHashMaps. In batches, the queue is drained of all its elements and then the updates are appropriately allocated in the underlying maps. In this approach though I would need an additional map to identify which maps are included to every map - acting like an orchestrator
Use a HashMap and synchronize on the API level. Meaning that every write related method is going to be synchronized
synchronized void aWriteMethod(Integer aKey,String aValue) {
thisWriteIntensiveMap.put(aKey,aValue);
}
It'd be great if this question did not just receive criticism on the aforementioned options but also suggestions about new and better solutions.
PS: Apart from the integrity of the data, order of operations and throttling issues what else needs to be taken into account into choosing the "best" approach for a write intensive.
I know that this might look a bit open ended but it'd be interesting to hear how people think on this problem.
Even if you would go for the worst possible type of map like e.g. a synchronized HashMap, I don't think you will notice any performance impact with your load. A few thousand writes per minute is nothing.
I would set up a JMH benchmark and try out various map implementations. The obvious candidate is a ConcurrentHashMap because it is designed to deal with concurrent access.
So I think this is a good example of premature optimization.
Related
I have a (limited) thread pool which executes CPU-bound tasks. I'd like to aggregate some numerical statistics from each of these threads in a single place. Basically: each thread will update some shared stats (e.g. how long its job took) at a very high frequency and, at some much slower interval, a 'stat reader' would query those stats.
My first thought was to use some shared atomics and update them from each thread. This works ok, but in my testing the overhead of the atomics can get pretty high with a lot of contention so I was trying to think of some other alternatives.
My second though was a sort of 'sharding' scheme, where each thread had its own stats object that it could update without requiring any synchronization. The 'stat reader' could then aggregate the stats from each thread into an overall stat value.
My first question is: does the thread sharding scheme make sense? Does something like that exist that I'm reinventing?
My second question is: if the sharding scheme does make sense, I'm trying to think of the best way to map threads to their shard:
1) Use the thread's ID mod some shard value to get a shard index, but I don't think that's reliable as I think the thread id value is shared, so I could get a collision.
2) Adding a thread-local index to the thread, but I don't think that will play nicely with the ExecutorService.
3) I could subclass Thread, but then I'd have to cast it when I wanted to access this which I'd rather avoid, if possible.
4) When the thread is created, create a mapping of its name to its shard. This would work, but there would be a race when creating the threads: one could be looking up its shard while we're adding a new shard to the map, causing concurrency issues.
Wondering if I'm way off-base here and overthinking it (seems like it would be a common problem?) or if one of these schemes does make sense for the use case.
One way to solve this is to use the LongAdder class that avoids the contention that plain old atomics suffer from.
A more hand-written approach would be to create some class that holds the statistics you want to gather for each thread, and then have an array of these objects such that each thread's stats object is in array[thread.getId() % NUM_THREADS]. The reader thread can then traverse the array and gather the stats as it pleases.
The trick to getting this to work efficiently is to avoid false sharing. That is, threads on different cores perform updates on their respective objects but those objects happen to reside on the same cacheline, causing massive amounts of unnecessary cache coherence traffic.
In Java 8, there is the #Contended annotation that you might want to look into. The old way of padding your class with a bunch of long fields doesn't work anymore since unused fields will be optimized away.
I would suggest you use different way: Actor.
The actor model provides a relatively simple but powerful model for designing and implementing applications that can distribute and share work across all system resources—from threads and cores to clusters of servers and data centers. It provides an effective framework for building applications with high levels of concurrency and for increasing levels of resource efficiency. Importantly, the actor model also has well-defined ways for handling errors and failures gracefully, ensuring a level of resilience that isolates issues and prevents cascading failures and massive downtime.
You can turn to Akka i think.
Use case: a single data structure (hashtable, array, etc) whose members are accessed frequently by multiple threads and modified infrequently by those same threads. How do I maintain performance while guaranteeing thread safety (ie, preventing dirty reads).
Java: Concurrent version of the data structure (concurrent hashmap, Vector, etc).
Python: No need if only threads accessing it, because of GIL. If it's multiple processes that will be reading and updating the data structure, then use threading.Lock. Force the each process's code to acquire the lock before and release the lock after accessing the data structure.
Does that sound reasonable? Will Java's concurrent data structure impose too much penalty to read speed? Is there higher level concurrency mechanism in python?
Instead of reasoning about performance, I highly recommend to measure it for your application. Don't risk thread problems for a performance improvement that you most probably won't ever notice.
So: write thread-safe code without any performance-tricks, use a decent profiler to find the percentage of time spent inside the data structure access, and then decide if that part is worth any improvement.
I bet there will be other bottlenecks, not the shared data structure.
If you like, come back to us with your code and the profiler results.
I am trying to set up an analysis tool in android. I have several Listener which are fetching some signal data (float values) every 10 ms. Each signal is supposed to be stored in its own List.
Based on all the signal values we get in each 10ms steps, I need to compute a state based on some restrictions and store it in a List as well.
It is not unlikely that the analysis runs for like 2h, therefore the List can get quite big (~600.000 entries for 100 minutes for each signal). All the Lists are currently wrapped in a RootList, like List<List<float>>.
So I got a lot of write operations which only runs in one thread and a lot of read operations which are running from other threads (computing the states and doing some graphical representation). All this should be done in real-time and not after the analysis is done.
Since I need to do write and read operations concurrently, I thought about which implementation I could use to maintain the real-time computation aspect.
ReadWriteLock
I am not too sure if ReadWriteLocks are sufficient for my case, since my computation needs to happen as soon as all values are available and if I add values each 10ms in every List I would guess that it is basically locked the whole time?!
CopyOnWriteArrayList
I don't think CopyOnWriteArrayList fits my need either because i got a lot of write operations and I will be copying the Arrays endlessly. But I might be wrong here, maybe someone with more expertise can comment on that.
Maybe I could use the Collections.synchronizedList() or some other predefined synchronized implementations. But since I am writing/reading the values in such a frequent time interval it might not be the best idea either.
Are there any other clever ways to achieve what I want or is something I wrote valuable and only I see the problems? Maybe there even is a recommended way of doing this? Any advice on this would be greatly appreciated.
This a general programming question. Let's say I have a thread doing a specific simulation, where speed is quite important. At every iteration I want to extract data from it and write it to a file.
Is it a better practice to hand over the data to a different thread and let the simulation thread focus on his job, or since speed is very important, make the simulation thread do the data recording too without any copying of data. (in my case it is 3-5 deques of integers with a size of 1000-10000)
Firstly it surely depends on how much data we are copying, but what else can it depend on? Can the cost of synchronization and copying be worth? Is it a good practice to create small runnables at each iteration to handle the recording task in case of 50 or more iterations per second?
If you truly want low latency on this stat capturing, and you want it during the simulation itself then two techniques come to mind. They can be used together very effectively. Please note that these two approaches are fairly far from the standard Java trodden path, so measure first and confirm that you need these techniques before abusing them; they can be difficult to implement correctly.
The fastest way to write the data to file during a simulation, without slowing down the simulation is to hand the work off to another thread. However care has to be taken on how the hand off occurs, as a memory barrier in the simulation thread will slow the simulation. Given the writer only cares that the values will come eventually I would consider using the memory barrier that sits behind AtomicLong.lazySet, it requests a thread safe write out to a memory address without blocking for the write to actually become visible to the other thread. Unfortunately direct access to this memory barrier is currently only availble via lazySet or via class sun.misc.Unsafe, which obviously is not part of the public Java API. However that should not be too large of a hurdle as it is on all current JVM implementations and Doug Lea is talking about moving parts of it into the mainstream.
To avoid the slow, blocking file IO that Java uses; make use of a memory mapped file. This lets the OS perform async IO for you on your behalf, and is very efficient. It also supports use of the same memory barrier mentioned above.
For examples of both techniques, I strongly recommend reading the source code to HFT Chronicle by Peter Lawrey. In fact, HFT Chronicle may be just the library for you to use here. It offers a highly efficient and simple to use disk backed queue that can sustain a million or so messages per second.
In my work on a stress-testing HTTP client I stored the stats into an array and, when the array was ready to send to the GUI, I would create a new array for the tester client and hand off the full array to the network layer. This means that you don't need to pay for any copying, just for the allocation of a fresh array (an ultra-fast operation on the JVM, involving hand-coded assembler macros to utilize the best SIMD instructions available for the task).
I would also suggest not throwing yourself head-on into the realms of optimal memory barrier usage; the difference between a plain volatile write and an AtomicReference.lazySet() can only be measurable if your thread does almost nothing else but excercise the memory barrier (at least millions of writes per second). Depending on your target I/O throughput, you may not even need NIO to meet the goal. Better try first with simple, easily maintainable code than dig elbows-deep into highly specialized APIs without a confirmed need for that.
I'm newcomer in concurrency. I read about Guava Cache and MultiMap. I look for something that can combine some possibilities of both:
From Cache I want auto-removal after ACCESS_TIMEOUT and WRITE_TIMEOUT has been expired.
From Multimap I want multiple values associated with one key.
All that must be concurrent.
I has multiple writers and multiple readers. I want to add values with rundom keys and remove them.
Question: Is there map implementation that fits my needs?
UPDATED: Striped<Lock> solution
More I read about Striped<Lock> - more attractive that seems to me. But it arose even more questions in my head:
If I use something like Striped<Lock> with Guava Cache which already uses ConcurrentHashMap I can face the problems with deadlocks or performance decline. Am I wrong?
If I use Striped<Lock> over Cache it still doesn't remove the question linked with multiple values per key.
Does Striped<Lock> eliminate the need of using concurrent map in my case? (I suppose the answer is YES) but in GitHub a saw the contrary.
You could start with a Cache<SomeKey, Collection<SomeValue>> (so you still get the expiration) and use synchronized collections (Collections.synchronized*()) as the values.
But what's really the question here is the type of concurrent access you need on the collections:
Is it enough that the operations are synchronized so the collections don't get corrupted, or do you need higher-level semantics like what ConcurrentMap.putIfAbsent() offers?
Do you need to do multiple operations on the collections of values in an atomic way? Like if you need to do
if (c.contains(v)) {
c.remove(v);
} else {
c.add(v);
}
you usually want to put that into a synchronized(c) { } block.
If so, you'll probably want to wrap the collection inside a class exposing those higher-level semantics and managing the lock around multiple operations to get the atomicity you need, and use that class as the value: Cache<SomeKey, SomeValuesContainer>.
As mentioned in the comments, Striped<Lock> can be used to synchronize the access to multiple Caches/ConcurrentHashMaps without imposing a single lock and its performance impact in case of even moderate contention.
If you need multiple Caches/ConcurrentHashMaps, that is: why don't the Peers (or a wrapper around it) actually contain that information?
1. Deadlocks, performance
Guava's Cache is similar to ConcurrentHashMap, but it doesn't use it. However, both work in the same way by having segments which can be locked independently, thus reducing contention when accessing the map concurrently (especially when updating). Using a Striped<Lock> around the access to either one cannot cause a deadlock, which only happens if you're not locking multiple locks in a consistent order: that can't happen here, as you'll always lock your Lock obtained from Striped<Lock> before calling the Cache or ConcurrentHashMap, which then locks its segment (invisible to you).
As to performance, yes, locking has a cost but it really depends on the level of contention (and that can be tuned with the number of stripes in a Striped<Lock> or the concurrencyLevel in a Cache). However, you need proper concurrency support anyway since without it you can get invalid results (or corrupt your data), so you have to do something (using either locking or a lock-free algorithm).
2. Multiple values per key
My original answer still stands. It's difficult to get an exact idea of what you're exactly trying to do from your multiple questions (it's better if you can provide a complete, consistent context in one question), but I think you don't need more than concurrent modification of the multiple values per key so the synchronized collections should be enough (but you need at least that). You'll have to reason about your access patterns as you add them to make sure they still fit the model, though: make sure your replaceAll*() methods lock what they need, for example.
3. Is ConcurrentMap still needed with Striped<Lock>?
YES! Especially with Striped<Lock> vs a single Lock, because you'll still get concurrent updates for keys which don't use the same stripe (that's the whole point of Striped<Lock>) so you need data structures which support concurrent modification. If you use a simple HashMap, you have every chance of corrupting it under enough load (and cause infinite loops, for example).