Parallel lock-free ascending id generation - java

I have a map which should associate Strings with an id. The ids must be unique Integers from 0 to N, with no gaps between them.
Each request comes with two Strings, of which one, both, or none may have already been indexed.
The map is built in parallel from the ForkJoin pool, and ideally I would like to avoid explicit synchronized blocks. I am looking for an optimal way to maximize throughput, with or without locking.
I don't see how to use AtomicInteger without creating gaps in the sequence for keys which were already present in the map.
public class Foo {
    private final Map<String, Integer> idGenerator = new ConcurrentHashMap<>();

    // invoked from multiple threads
    public void update(String key1, String key2) {
        idGenerator.dosomething(key1, ?); // should save key1 and a unique id
        idGenerator.dosomething(key2, ?); // should save key2 and its unique id
        Bar bar = new Bar(idGenerator.get(key1), idGenerator.get(key2));
        // ... do something with bar
    }
}
I think that the size() method combined with merge() might solve the problem, but I cannot quite convince myself of that. Could anyone suggest an approach for this problem?
EDIT
Regarding the duplicate flag: this cannot be solved with AtomicInteger.incrementAndGet() as suggested in the linked answer. If I did this blindly for every String, there would be gaps in the sequence. A compound operation is needed which checks whether the key exists and only then generates an id.
I was looking for a way to implement such a compound operation via the Map API.
The second provided answer goes against requirements I have specifically laid out in the question.

There is not a way to do it exactly the way you want it -- ConcurrentHashMap is not, in and of itself, lock-free. However, you can do it atomically, without any explicit lock management, by using java.util.Map.computeIfAbsent.
Here's a code sample in the style of what you provided that should get you going.
ConcurrentHashMap<String, Integer> keyMap = new ConcurrentHashMap<>();
AtomicInteger sequence = new AtomicInteger();

public void update(String key1, String key2) {
    // computeIfAbsent is atomic in ConcurrentHashMap: the mapping function
    // runs at most once per key, so already-indexed keys never consume an id
    // and no gaps appear in the sequence.
    Integer id1 = keyMap.computeIfAbsent(key1, s -> sequence.getAndIncrement());
    Integer id2 = keyMap.computeIfAbsent(key2, s -> sequence.getAndIncrement());
    Bar bar = new Bar(id1, id2);
    // ... do something with bar
}

I'm not sure you can do exactly what you want. You can batch some updates, though, or do the checking separately from the enumerating / adding.
A lot of this answer assumes that order isn't important: you need all the strings given a number, but reordering, even within a pair, is ok, right? Concurrency could already cause reordering of pairs, or cause members of a pair not to get contiguous numbers; reordering could also lead to the first of a pair getting a higher number.
latency is not that important. This application should chew through a large amount of data and eventually produce output. Most of the time there should be a search hit in the map
If most searches hit, then we mostly need read throughput on the map.
A single writer thread might be sufficient.
So instead of adding directly to the main map, concurrent readers can check their inputs and, if not present, add them to a queue to be enumerated and added to the main ConcurrentHashMap. The queue could be a simple lockless queue, or could be another ConcurrentHashMap to also filter duplicates out of not-yet-added candidates. But a lockless queue is probably good.
Then you don't need an atomic counter, and you avoid the problem of two threads incrementing the counter twice when they see the same string before either of them can add it to the map. (Which would otherwise be a big problem.)
If there's a way for a writer to lock the ConcurrentHashMap to make a batch of updates more efficient, that could be good. But if the hit rate is expected to be quite high, you really want other reader threads to keep filtering duplicates as much as possible while we're growing it instead of pausing that.
To reduce contention between the main front-end threads, you could have multiple queues, like maybe each thread has a single-producer / single-consumer queue, or a group of 4 threads running on a pair of physical cores shares one queue.
The enumerating thread reads from all of them.
In a queue where readers don't contend with writers, the enumerating thread has no contention. But multiple queues reduce contention between writers. (The threads writing these queues are the threads that access the main ConcurrentHashMap read-only, where most CPU time will be spent if hit-rates are high.)
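A minimal sketch of that split, with made-up names (Enumerator, submit, drain): reader threads filter against the main map and enqueue misses, and a single writer thread drains the queue and assigns ids, so a plain int counter suffices and no gaps can appear.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

class Enumerator {
    final ConcurrentHashMap<String, Integer> ids = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private int next = 0; // only touched by the single writer thread

    // called by many reader threads
    void submit(String key) {
        if (!ids.containsKey(key)) {
            pending.add(key); // duplicates may be enqueued; the writer filters them
        }
    }

    // called in a loop by exactly one writer thread
    void drain() {
        String key;
        while ((key = pending.poll()) != null) {
            if (!ids.containsKey(key)) {
                ids.put(key, next++); // sequential, gap-free ids
            }
        }
    }
}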
Some kind of read-copy-update (RCU) data structure might be good, if Java has that. It would let readers keep filtering out duplicates at full speed, while the enumerating thread constructs a new table with a batch of insertions done, with zero contention while it's building the new table.
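Java has no built-in RCU, but an AtomicReference holding an immutable snapshot map approximates it; a rough sketch, assuming the single enumerating thread is the only writer:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class SnapshotTable {
    private final AtomicReference<Map<String, Integer>> current =
            new AtomicReference<>(new HashMap<>());

    // readers are lock-free and always see a complete snapshot
    Integer lookup(String key) {
        return current.get().get(key);
    }

    // single writer: copy, apply a batch, publish the new table
    void insertBatch(List<String> newKeys) {
        Map<String, Integer> copy = new HashMap<>(current.get());
        for (String key : newKeys) {
            copy.putIfAbsent(key, copy.size()); // next id = current size, keeps 0..N dense
        }
        current.set(copy); // readers switch to the new snapshot atomically
    }
}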
With a 90% hit rate, one writer thread could maybe keep up with 10 or so reader threads that filter new keys against the main table.
You might want to set some queue-size limit to allow for back-pressure from the single writer thread. Or, if you have many more cores / threads than a single writer can keep up with, then maybe some kind of concurrent set that lets the multiple threads eliminate duplicates before numbering would be helpful.
Or really, if you can just wait until the end to number everything, that would be a lot simpler, I think.
I thought about maybe trying to number with room for error on race conditions, and then going back to fix things up, but that probably isn't better.

Related

Design AppServer Interview Discussion

I encountered the following question in a recent System Design Interview:
Design an AppServer that interfaces with a Cache and a DB.
I came up with this:
public class AppServer {
    public Database DB;
    public Cache cache;

    public Value get(Key k) {
        Value res = cache.get(k);
        if (res == null) {
            res = DB.get(k);
            cache.set(k, res);
        }
        return res;
    }

    public void set(Key k, Value v) {
        cache.set(k, v);
        DB.set(k, v);
    }
}
This code is fine and works correctly, but the follow-ups to the question are:
What if there are multiple threads?
What if there are multiple instances of the AppServer?
Suddenly AppServer performance degrades a ton; we find out this is because our cache is consistently missing. Cache size is fixed (already the largest it can be). How can we prevent this?
Response:
I answered that we can use locks or condition variables. In Java, we can add synchronized to each method to allow for mutual exclusion, but the interviewer mentioned that this isn't too efficient and wanted only the critical parts synchronized.
I thought that we only need to synchronize the two set lines in void set(Key k, Value v) and the one cache.set call in Value get(Key k); however, the interviewer pushed for also synchronizing res = DB.get(k);. I agreed with him in the end, but don't fully understand why. Don't threads have independent stacks and a shared heap? When a thread executes get, it stores res in a local variable on its own stack frame; even if another thread executes get concurrently, the former thread retains its own value. Then each thread sets its respective fetched value.
How can we handle multiple instances of the AppServer?
I came up with a distributed queue solution like Kafka: every time we perform a set / get command, we queue that command. He mentioned that set is OK, because the action sets a value in the cache / DB, but how would you return the correct value for a get? Can someone explain this?
Also, are there possible solutions with a versioning system and an event system?
Possible solutions:
L1, L2, L3 caches - layers and more caches
Regional / segmented caches - use a different cache for each user group.
Any other ideas?
Will upvote all insightful responses :)
1
Although JDBC is "supposed" to be thread safe, some drivers aren't, and I'm going to assume that Cache isn't thread safe either (although most caches should be), so in that case you would need to make the following changes to your code:
Make both fields final
Synchronize the ENTIRE get(...) method
Synchronize the ENTIRE set(...) method
Assuming there is no other way to access those fields, the correctness of your get(...) method depends on two things: first, that updates from the set(...) method are visible, and second, that on a cache miss only a single thread stores the fetched value. You need to synchronize because the idea is to have only one thread perform the expensive DB query when there is a cache miss. If you do not synchronize the entire get(...) method, or you split the synchronized statement, it is possible for another thread to also see a cache miss between the lookup and the insertion (see the sketch below).
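A minimal sketch of those changes, reusing the Database, Cache, Key, and Value types from the question:

public class AppServer {
    private final Database db;
    private final Cache cache;

    public AppServer(Database db, Cache cache) {
        this.db = db;
        this.cache = cache;
    }

    // Synchronizing the whole method means only one thread at a time can
    // observe a miss and hit the DB, so the expensive query runs only once.
    public synchronized Value get(Key k) {
        Value res = cache.get(k);
        if (res == null) {
            res = db.get(k);
            cache.set(k, res);
        }
        return res;
    }

    // Same monitor, so cache and DB are updated as one unit relative to get.
    public synchronized void set(Key k, Value v) {
        cache.set(k, v);
        db.set(k, v);
    }
}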
Honestly, the way I would answer this question is to toss the entire thing, look at how JCIP (Java Concurrency in Practice) builds its cache, and base my answer on that.
2
I think your queue solution is fine.
I believe your interviewer means that if another instance of AppServer did not have cached what was already set(...) by a different instance, it would look up and find the correct value in the DB. This solution would be incorrect with multiple threads, because two threads could set(...) conflicting values; the caches would then hold two different values and, depending on the thread safety of your DB, the DB might not even have the value at all.
Ideally, you'd never create more than a single instance of your AppServer.
3
I don't have enough experience to evaluate this question specifically, but perhaps an LRU cache would improve performance somewhat, or a hash ring buffer. It might be a stretch, but you could even throw out using ML to determine the best values to preload or retain at certain times of the day, for example.
If you are always missing values from your cache, there is no way to improve your code. Performance would be dependent on your database.
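For the LRU suggestion, a minimal (non-thread-safe) sketch using LinkedHashMap's access order; the capacity is an arbitrary example:

import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() refreshes recency
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }
}

Wrapped in Collections.synchronizedMap, the same class could serve multiple threads.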

How to aggregate KStream to list of fixed size?

Similar to, but slightly different from, this question: KStream batch process windows. I want to batch messages from a KStream before pushing them down to consumers.
However, this push-down should not be scheduled on a fixed time window, but on a fixed message-count threshold per key.
For starters, two questions come to mind:
1) Is a custom AbstractProcessor the way this should be handled? Something along the lines of:
@Override
public void punctuate(long streamTime) {
    KeyValueIterator<String, Message[]> it = messageStore.all();
    while (it.hasNext()) {
        KeyValue<String, Message[]> entry = it.next();
        if (entry.value.length > 10) {
            this.context.forward(entry.key, entry.value);
            messageStore.put(entry.key, new Message[0]); // reset the batch in the store
        }
    }
    it.close();
}
2) Since the StateStore will potentially explode (in case an entry value never reaches the threshold in order to be forwarded), what is the best way to 'garbage-collect' this? I could do a timebased schedule and remove keys that are too old... but that looks very DIY and error prone.
I guess this would work. Applying a time-based 'garbage collection' sounds reasonable, too. And yes, using the Processor API instead of the DSL has some flavor of DIY -- that's the purpose of the PAPI in the first place (empowering the user to do whatever is needed).
A few comments though:
You will need a more complex data structure: because punctuate() is called based on stream-time progress, it can happen that you get more than 10 records for one key between two calls. Thus, you would need something like KeyValueIterator<String, List<Message[]>> it = messageStore.all(); to be able to store multiple batches per key.
I would assume that you will need to fine-tune the schedule for punctuate, which will be tricky -- if your schedule is too tight, many batches might not be completed yet and you waste CPU; if your schedule is too loose, you will need a lot of memory and your downstream operators will get a lot of data at once. Sending bursts of data downstream could become a problem.
Scanning the whole store is expensive -- it seems to be a good idea to try to 'sort' your key-value pairs according to their batch size. This should enable you to touch only keys which have completed batches instead of all keys. Maybe you can keep an in-memory list of keys that have completed batches and only do lookups for those (on failure, you do a single pass over all keys from the store to recreate this in-memory list); a sketch of that idea follows.
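A rough sketch of that in-memory index, in the style of the old Processor API from the question; BATCH_SIZE and the append helper are made-up names, and the set would need to be rebuilt from the store on restore, as noted above:

private final Set<String> completedKeys = new HashSet<>();

@Override
public void process(String key, Message msg) {
    Message[] batch = append(messageStore.get(key), msg); // hypothetical array-append helper
    messageStore.put(key, batch);
    if (batch.length >= BATCH_SIZE) {
        completedKeys.add(key); // remember the key instead of scanning everything later
    }
}

@Override
public void punctuate(long streamTime) {
    for (String key : completedKeys) {         // touch only completed batches
        context().forward(key, messageStore.get(key));
        messageStore.put(key, new Message[0]); // reset the batch
    }
    completedKeys.clear();
}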

Effective thread-safe Java List impl when traversals match mutations

I have a number of threads that will be consuming messages from a broker and processing them. Each message is XML containing, amongst other elements, an alpha-numeric <itemId>WI354DE48</itemId> element that serves as a unique ID for the item to "process". Due to criteria I can't control or change, it is possible for items/messages to be duplicated on the broker queue that these threads are consuming from. So the same item (with an ID of WI354DE48) might only be sent to the queue once, or it might get sent 100 times. Regardless, I can only allow the item to be processed once, so I need a way to prevent Thread A from processing a duplicated item that Thread B has already processed.
I'm looking to use a simple thread-safe list that can be shared by all threads (workers) to act as a cache mechanism. Each thread will be given the same instance of a List<String>. When each worker thread consumes a message, it checks to see if the itemId (a String) exists on the list. If it doesn't then no other worker has processed the item. In this case, the itemID is added to the list (locking/caching it), and then the item is processed. If the itemId does already exist on the list, then another worker has already processed the item, so we can ignore it. Simple, yet effective.
It's obviously then paramount to have a thread-safe list implementation. Note that the only two methods we will ever be calling on this list will be:
List#contains(String) - traversing/searching the list
List#add(String) - mutating the list
...and it's important to note that we will be calling both methods with about the same frequency. Only rarely will contains() return true and save us from needing to add the ID.
I first thought that CopyOnWriteArrayList was my best bet, but after reading the Javadocs, it seems like each worker would just wind up with its own thread-local copy of the list, which isn't what I want. I then looked into Collections.synchronizedList(new ArrayList<String>()), and that seems to be a decent bet:
List<String> processingCache = Collections.synchronizedList(new ArrayList<String>());
List<Worker> workers = getWorkers(processingCache); // Inject the same list into all workers.
for (Worker worker : workers)
    executor.submit(worker);

// Inside each Worker's run method:
@Override
public void run() {
    String itemXML = consumeItemFromBroker();
    Item item = toItem(itemXML);
    if (processingCache.contains(item.getId()))
        return;
    else
        processingCache.add(item.getId());
    // ... continue processing.
}
Am I on track with Collections.synchronizedList(new ArrayList<String>()), or am I way off base? Is there a more efficient thread-safe List impl given my use case, and if so, why?
Collections.synchronizedList is very basic: it just wraps every method in a synchronized block.
This will work, but only under some specific assumptions, namely that you never carry out a compound action on the List, i.e.
if(!list.contains(x))
list.add(x);
is not thread safe, as the monitor is released between the two calls.
It can also be somewhat slow if you have many reads and few writes, as all threads acquire an exclusive lock.
You can look at the implementations in the java.util.concurrent package, there are several options.
I would recommend using a ConcurrentHashMap with dummy values.
The reason for the recommendation is that ConcurrentHashMap synchronizes on groups of keys, so if you have a good hashing algorithm (and String does) you can actually get a massive amount of concurrent throughput.
I would prefer this over a ConcurrentSkipListSet because a hash map doesn't maintain ordering, so you avoid that overhead.
Of course with threading it's never entirely obvious where the bottlenecks are so I would suggest trying both and seeing which gives you better performance.
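A minimal sketch of the ConcurrentHashMap-with-dummy-values approach; putIfAbsent makes the check-and-add a single atomic step, which also fixes the race noted above:

ConcurrentMap<String, Boolean> processedIds = new ConcurrentHashMap<>();

// inside each Worker's run method:
if (processedIds.putIfAbsent(item.getId(), Boolean.TRUE) != null) {
    return; // another worker already claimed this item
}
// ... continue processing; we were the first to see this id

On Java 8+, ConcurrentHashMap.newKeySet() gives the same behavior behind a Set interface.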

Why should we use HashMap in multi-threaded environments?

Today I was reading about how HashMap works in Java. I came across a blog and I am quoting directly from the article. I have gone through this article on Stack Overflow, but I still want to know the details.
So the answer is yes, there is a potential race condition while resizing a HashMap in Java: if two threads at the same time find that the HashMap needs resizing, they both try to resize it. During the resizing, the elements in a bucket, which are stored in a linked list, get reversed in order as they migrate to the new bucket, because Java's HashMap doesn't append new elements at the tail; it inserts them at the head to avoid tail traversal. If the race condition happens, you will end up with an infinite loop.
It states that, as HashMap is not thread-safe, a potential race condition can occur while it resizes. I have seen people in our office projects extensively using HashMaps even knowing they are not thread safe. If it is not thread safe, why should we use HashMap then? Is it just lack of knowledge among developers, as they might not be aware of structures like ConcurrentHashMap, or is there some other reason? Can anyone shed light on this puzzle?
I can confidently say ConcurrentHashMap is a pretty ignored class. Not many people know about it and not many people care to use it. The class offers a very robust and fast way to synchronize a Map collection. I have read a few comparisons of HashMap and ConcurrentHashMap on the web. Let me just say that they're totally wrong. There is no way you can compare the two: one offers synchronized access to a map while the other offers no synchronization whatsoever.
What most of us fail to notice is that while our applications, web applications especially, work fine during the development and testing phases, they usually fall over under heavy (or even moderately heavy) load. This is because we expect our HashMaps to behave a certain way, but under load they usually misbehave. Hashtables offer concurrent access to their entries, with a small caveat: the entire map is locked to perform any sort of operation.
While this overhead is ignorable in a web application under normal load, under heavy load it can lead to delayed response times and overtaxing of your server for no good reason. This is where ConcurrentHashMaps step in. They offer all the features of Hashtable with performance almost as good as a HashMap. ConcurrentHashMaps accomplish this by a very simple mechanism.
Instead of a map-wide lock, the collection maintains a list of 16 locks by default, each of which guards a single bucket of the map. This effectively means that 16 threads can modify the collection at a time (as long as they're all working on different buckets). In fact, there is no operation performed by this collection that locks the entire map.
There are several aspects to this: first of all, most of the collections are not thread safe. If you want a thread-safe collection, you can call Collections.synchronizedCollection or Collections.synchronizedMap.
But the main point is this: you want your threads to run in parallel with no synchronization at all -- if possible, of course. This is something you should strive for, but of course it cannot be achieved every time you deal with multithreading.
There is no point in making the default collection/map thread safe, because sharing a map should be an edge case, and synchronization means more work for the JVM.
In a multithreaded environment, you have to ensure that the map is not modified concurrently, or you can corrupt its internal state, because it is not synchronized in any way.
Just check the API; I used to think the same way.
I thought that the solution was to use the static Collections.synchronizedMap method. I was expecting it to return a better implementation, but if you look at the source code you will realize that all it does is wrap the map with a synchronized block on a mutex (which happens to be the map itself), not allowing reads to occur concurrently.
In the Jakarta Commons project, there is an implementation called FastHashMap. This implementation has a property called fast. If fast is true, then reads are non-synchronized and writes perform the following steps:
Clone the current structure
Perform the modification on the clone
Replace the existing structure with the modified clone
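A minimal, hypothetical sketch of that clone-and-swap write path (not the actual FastHashMap source); the ReadWriteLock variant follows below:

import java.util.HashMap;
import java.util.Map;

class CloneOnWriteMap<K, V> {
    private volatile Map<K, V> map = new HashMap<>();

    public V get(Object key) {
        return map.get(key); // unsynchronized read of the current snapshot
    }

    public synchronized V put(K key, V value) {
        Map<K, V> clone = new HashMap<>(map); // 1. clone the current structure
        V old = clone.put(key, value);        // 2. perform the modification on the clone
        map = clone;                          // 3. replace the existing structure with the clone
        return old;
    }
}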
public class FastSynchronizedMap<K, V> implements Map<K, V>, Serializable {
    private final Map<K, V> m;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // ...

    public V get(Object key) {
        lock.readLock().lock();
        V value = null;
        try {
            value = m.get(key);
        } finally {
            lock.readLock().unlock();
        }
        return value;
    }

    public V put(K key, V value) {
        lock.writeLock().lock();
        V v = null;
        try {
            v = m.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
        return v;
    }

    // ...
}
Note that we use a try/finally block: we want to guarantee that the lock is released no matter what problem is encountered in the block.
This implementation works well when you have almost no write operations and mostly read operations.
A HashMap can be used when only a single thread has access to it. However, when multiple threads start accessing the HashMap, there are two main problems:
1. Resizing of the HashMap is not guaranteed to work as expected.
2. A ConcurrentModificationException will be thrown. This can also be thrown when a single thread modifies the HashMap while iterating over it at the same time.
A workaround for using HashMap in a multi-threaded environment is to initialize it with the expected number of entries, thus avoiding the need for resizing.
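A small illustration of that presizing workaround; dividing the expected count by the default load factor of 0.75 keeps the table below its resize threshold:

int expectedEntries = 1_000_000; // example figure
Map<String, Integer> map = new HashMap<>((int) (expectedEntries / 0.75f) + 1);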

Data structure for non-blocking aggregation of Thread values?

Background:
I have a large thread pool in Java; each worker has some internal state.
I would like to gather some global information about the states -- for that I have an associative, commutative aggregation function (e.g. sum -- mine needs to be pluggable though).
The solution needs to have fixed memory consumption and, in the best case, be lock-free, not disturbing the pool at all. So no thread should need to acquire a lock (or enter a synchronized area) when writing to the data structure. The aggregated value is only read after the threads are done, so I don't need an accurate value all the time. Simply collecting all values and aggregating them after the pool is done might lead to memory problems.
The values are going to be more complex datatypes, so I cannot use AtomicInteger etc.
My general idea for the solution:
Have a lock-free collection where all threads put their updates. I don't even need the order of the events.
If it gets too big, run the aggregation function on it (compacting it) while the threads continue filling it.
My question:
Is there a data structure that allows for something like that, or do I need to implement it from scratch? I couldn't find anything that directly matches my problem. If I have to implement it from scratch, what would be a good non-blocking collection class to start from?
If the updates are infrequent (relatively speaking) and the aggregation function is fast, I would recommend aggregating every time:
State myState;
AtomicReference<State> combinedState;

State original, newCombined; // declared outside the loop so the while condition can see them
do {
    original = combinedState.get();
    newCombined = aggregate(original, myState); // the pluggable combiner
} while (!combinedState.compareAndSet(original, newCombined));
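On Java 8+, AtomicReference.accumulateAndGet performs exactly this retry loop internally, so (assuming the same aggregate function as above) the snippet collapses to:

combinedState.accumulateAndGet(myState, (combined, mine) -> aggregate(combined, mine));

Note that the combiner may be re-applied under contention, so it must be side-effect-free.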
I don't quite understand the question, but I would, at first sight, suggest an IdentityHashMap where the keys are (references to) your thread objects and the values are where your thread objects write their statistics.
An IdentityHashMap relies only on reference equality, so there would never be any conflict between two thread objects; you could pass a reference to that map to each thread (which would then call .get(this) on the map to get a reference to its collecting data structure, into which it collects the data it wants). Otherwise you could just pass a reference to the collecting data structure to the thread object directly.
Such a map is inherently thread safe for your use case, as long as you create the key/value pair for each thread before starting the thread, and because no thread object will ever modify the map itself. With some management smartness you can even remove entries from this map, even though the map is not thread-safe, once a thread is done with its work.
When all is done, you have a map whose values contains all the data collected.
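A sketch of that arrangement, with made-up Worker, StatsCollector, and aggregate names; the map is fully populated before any thread starts and only read again after they all finish:

Map<Worker, StatsCollector> perThread = new IdentityHashMap<>();
for (Worker w : workers) {
    StatsCollector c = new StatsCollector();
    w.setCollector(c);    // each worker writes only to its own collector
    perThread.put(w, c);  // the map itself is never touched while threads run
}
// ... start all workers and wait for them to finish ...
StatsCollector total = new StatsCollector();
for (StatsCollector c : perThread.values()) {
    total = aggregate(total, c); // single-threaded aggregation at the end
}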
Hope this helps...
