I need a concurrent list that is thread-safe, works well for iteration, and returns an exact size.
I want to store auction bids for an item. So I want to be able to:
retrieve the exact number of bids for an item
add a bid to an item
retrieve all the bids for a given item
remove a bid for an item
I am planning to have it in a
ConcurrentHashMap<Item, LinkedList<ItemBid>> -- LinkedList is not thread-safe but returns an exact size
ConcurrentHashMap<Item, ConcurrentLinkedQueue<ItemBid>> -- ConcurrentLinkedQueue is thread-safe but does not guarantee an exact size
Is there any other collection that better addresses the above four points and is thread-safe?
Well, arguably in a thread-safe collection or map you cannot guarantee the "consistency" of the size: there is no happens-before relationship between read and write operations that makes a read of the size reflect the exact state after the last write, which is what your use case requires (N.B.: improved based on comments - see below).
What you can do if performance is not an issue is to use the following idiom - either:
Collections.synchronizedMap(new HashMap<YourKeyType, YourValueType>());
Collections.synchronizedList(new ArrayList<YourType>());
You'll then also need to explicitly synchronize over those objects.
This will ensure the order of operations is consistent at the cost of blocking, and you should get the last "right" size at all times.
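For example, here is a minimal sketch of that idiom, assuming your own Item and ItemBid classes and a hypothetical BidBook wrapper: every compound operation explicitly synchronizes on the wrapped map, so the reported size always reflects the last completed write.
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class BidBook {
    private final Map<Item, List<ItemBid>> bids =
            Collections.synchronizedMap(new HashMap<Item, List<ItemBid>>());

    // Compound operation: must hold the map's monitor explicitly.
    public void addBid(Item item, ItemBid bid) {
        synchronized (bids) {
            List<ItemBid> list = bids.get(item);
            if (list == null) {
                list = new LinkedList<ItemBid>();
                bids.put(item, list);
            }
            list.add(bid);
        }
    }

    // Exact size: no writer can interleave while we hold the same lock.
    public int bidCount(Item item) {
        synchronized (bids) {
            List<ItemBid> list = bids.get(item);
            return (list == null) ? 0 : list.size();
        }
    }
}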
You can use LinkedBlockingQueue. It is blocking (as opposed to the CLQ), but its size is maintained by an internal counter rather than computed by scanning the queue as in the CLQ.
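For instance, a rough sketch of how that could look for the bids map (Item and ItemBid are the classes from the question; the wrapper class and method names are hypothetical):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

class BidStore {
    private final ConcurrentMap<Item, LinkedBlockingQueue<ItemBid>> bids =
            new ConcurrentHashMap<Item, LinkedBlockingQueue<ItemBid>>();

    public void addBid(Item item, ItemBid bid) {
        LinkedBlockingQueue<ItemBid> queue = bids.get(item);
        if (queue == null) {
            queue = new LinkedBlockingQueue<ItemBid>();
            LinkedBlockingQueue<ItemBid> existing = bids.putIfAbsent(item, queue);
            if (existing != null) {
                queue = existing;   // another thread created the queue first
            }
        }
        queue.add(bid);
    }

    public int bidCount(Item item) {
        LinkedBlockingQueue<ItemBid> queue = bids.get(item);
        return (queue == null) ? 0 : queue.size();   // size is backed by a counter
    }

    public List<ItemBid> bidsFor(Item item) {
        LinkedBlockingQueue<ItemBid> queue = bids.get(item);
        return (queue == null) ? new ArrayList<ItemBid>() : new ArrayList<ItemBid>(queue);
    }

    public boolean removeBid(Item item, ItemBid bid) {
        LinkedBlockingQueue<ItemBid> queue = bids.get(item);
        return queue != null && queue.remove(bid);
    }
}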
Related
I have a class which has:
Two fields holding time-ordered lists (list1, list2).
Three read-only methods which iterate over the above lists to generate summary statistics.
1 mutating method, which looks for a match of given 'new-item' in list1. If match is not found, it adds 'new-item' to list1. If match is found, it removes the match from list1 and adds both match and 'new-item' to list2.
Let's assume that multiple concurrent invocations of all methods are possible. I need to achieve thread-safety while maximising performance.
Approach 1 (extremely slow) - Declare the field types as ArrayList and use the synchronized keyword on all methods.
Approach 2 - Declare the field types as CopyOnWriteArrayList and synchronize the mutating method.
Questions
Does Approach 2 ensure thread-safety?
Are there better alternatives?
Do you need the random access offered by an ArrayList? Can you instead use a thread-safe ordered collection like ConcurrentSkipListSet (non-blocking) or PriorityBlockingQueue (blocking)? Both have log(n) insertions.
Mutation methods in both cases are thread-safe.
Edit: Just note, you would still run into atomicity concerns. If you need the adds to be done atomically, then you would need coarser locking.
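A rough sketch of that idea (Trade is a hypothetical stand-in for your element type; it must be Comparable, or the sets must be given a Comparator, to stay time-ordered, and findMatch/matches are placeholders for your matching logic):
import java.util.concurrent.ConcurrentSkipListSet;

class MatchBook {
    private final ConcurrentSkipListSet<Trade> list1 = new ConcurrentSkipListSet<Trade>();
    private final ConcurrentSkipListSet<Trade> list2 = new ConcurrentSkipListSet<Trade>();

    // Read-only methods may iterate without locking; the iterators are weakly consistent.
    public int summaryCount() {
        int n = 0;
        for (Trade t : list1) n++;
        for (Trade t : list2) n++;
        return n;
    }

    // The remove-then-add transfer is a compound operation, so it still needs
    // coarse locking if readers must never observe the intermediate state.
    public synchronized void process(Trade newItem) {
        Trade match = findMatch(newItem);
        if (match == null) {
            list1.add(newItem);
        } else {
            list1.remove(match);
            list2.add(match);
            list2.add(newItem);
        }
    }

    private Trade findMatch(Trade newItem) {
        for (Trade t : list1) {
            if (t.matches(newItem)) {   // hypothetical matching predicate
                return t;
            }
        }
        return null;
    }
}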
Approach number 2 does not guarantee thread-safety.
The two operations on the collections are not atomic: first you remove an item, then you add it to the other collection. Some thread might in the meantime execute a read-only method and find that the item is missing from list 1 but not yet added to list 2. It depends on your application whether this is acceptable.
On the other hand, it is also possible that a read-only method first iterates through list 1 and finds that it contains item x; in the meantime the updating method executes and transfers item x; the read-only method then continues, iterates through list 2, and finds item x a second time. Again, it depends on your application whether this is acceptable.
Other solutions are possible, but that would require more details about what are you trying to achieve exactly.
One obvious way would be to modify approach number 1, and instead of using synchronized on every method, use a readers-writer lock. You would read-lock in every read-only method and write-lock in the mutating one.
You could also use two separate readers-writer locks. One for the first collection and one for the other. If your read-only methods iterate through both of the lists, they would have to read-acquire both of the locks up front, before doing anything. On the other hand the mutating method would have to first write-acquire the first lock, and if it wishes to transfer an item, then it should write-acquire the second lock.
You'd need to do some testing to see if it works nicely for you. Still there are definitely even better ways to handle it, but you'd need to provide more details.
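Here is a minimal sketch of the single readers-writer lock variant (the Item type, the matching logic, and the statistics computation are placeholders):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class Stats {
    private final List<Item> list1 = new ArrayList<Item>();
    private final List<Item> list2 = new ArrayList<Item>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Read-only methods share the read lock, so they can run concurrently with each other.
    public int summarize() {
        lock.readLock().lock();
        try {
            return list1.size() + list2.size(); // iterate / compute statistics here
        } finally {
            lock.readLock().unlock();
        }
    }

    // The mutating method takes the write lock, excluding readers and other writers.
    public void process(Item newItem) {
        lock.writeLock().lock();
        try {
            int i = list1.indexOf(newItem);      // placeholder for your matching logic
            if (i < 0) {
                list1.add(newItem);
            } else {
                Item match = list1.remove(i);
                list2.add(match);
                list2.add(newItem);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}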
The time it takes to lock a method is less than a microsecond. If a fraction of a microsecond matters, you might consider something more complex, but otherwise something simple is usually better.
Just using a thread-safe collection is not enough when you perform multiple operations; e.g. removing from one list and adding to another is two operations, and any number of threads can get in between those operations.
Note: if you do lots of updates this can be slower.
I am using JDK 7, SQLite, and Guava in my project.
I have a TreeMap with less than 100 entries that is being updated by a single "worker" thread hundreds of times a second. I am now writing a component (another thread - the "DB thread") that will write the map to my database every 5 or 10 seconds.
I know that I need to make a deep copy of the map so the DB thread will use a snapshot, while the worker thread continues its job. I am looking at the Guava Maps class which has many methods that make copies, but I am not sure if any of them meet my needs to synchronize on the map whenever a copy is needed. Is there a method there that will meet my needs, or should I write a synchronized block to make my own deep copy?
It depends on what you want:
If you want a fully concurrent map (can't read while adding, and so on), you should use what JSlain said before me.
If all you want is the current snapshot of the map, and you do not mind the map being modified as long as the iterator you are using isn't affected, then use ConcurrentSkipListMap.
It will provide each iteration with a new, independent iterator, so even if the real map is changed you won't notice it. You will see the change in the next update (5 seconds in your case).
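A rough sketch of how that could look (the key/value types and persist() are placeholder assumptions standing in for your own code):
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

class PriceSnapshotWriter {
    private final ConcurrentMap<String, Double> prices =
            new ConcurrentSkipListMap<String, Double>();

    // Worker thread: called hundreds of times a second.
    void update(String symbol, double price) {
        prices.put(symbol, price);
    }

    // DB thread: called every 5-10 seconds. The iterator is weakly consistent,
    // so concurrent updates never cause a ConcurrentModificationException;
    // changes made after the iterator was created may or may not be seen.
    void flushToDatabase() {
        for (Map.Entry<String, Double> e : prices.entrySet()) {
            persist(e.getKey(), e.getValue());    // hypothetical DB write
        }
    }

    private void persist(String key, double value) {
        // write one row to SQLite here
    }
}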
From TreeMap javadoc:
Note that this implementation is not synchronized. If multiple threads access a map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with an existing key is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedSortedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
SortedMap m = Collections.synchronizedSortedMap(new TreeMap(...));
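Building on that, a minimal sketch of the wrap-then-copy approach the question asks about (types and method names are hypothetical; note the copy is shallow, so mutable values would still need their own copying to be a true deep copy):
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

class SnapshotExample {
    private final SortedMap<String, Double> map =
            Collections.synchronizedSortedMap(new TreeMap<String, Double>());

    // Worker thread: individual puts are already synchronized by the wrapper.
    void update(String key, double value) {
        map.put(key, value);
    }

    // DB thread: copying iterates the map, which is a compound operation,
    // so it must hold the same lock the wrapper uses - the wrapper object itself.
    SortedMap<String, Double> snapshot() {
        synchronized (map) {
            return new TreeMap<String, Double>(map);
        }
    }
}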
I have a number of threads that will be consuming messages from a broker and processing them. Each message is XML containing, amongst other elements, an alpha-numeric <itemId>WI354DE48</itemId> element that serves as a unique ID for the item to "process". Due to criteria I can't control or change, it is possible for items/messages to be duplicated on the broker queue that these threads are consuming from. So the same item (with an ID of WI354DE48) might be sent to the queue only once, or it might get sent 100 times. Regardless, I can only allow the item to be processed once; so I need a way to prevent Thread A from processing a duplicated item that Thread B already processed.
I'm looking to use a simple thread-safe list that can be shared by all threads (workers) to act as a cache mechanism. Each thread will be given the same instance of a List<String>. When each worker thread consumes a message, it checks to see if the itemId (a String) exists on the list. If it doesn't then no other worker has processed the item. In this case, the itemID is added to the list (locking/caching it), and then the item is processed. If the itemId does already exist on the list, then another worker has already processed the item, so we can ignore it. Simple, yet effective.
It's obviously then paramount to have a thread-safe list implementation. Note that the only two methods we will ever be calling on this list will be:
List#contains(String) - traversing/searching the list
List#add(String) - mutating the list
...and it's important to note that we will be calling both methods with about the same frequency. Only rarely will contains() return true and prevent us from needing to add the ID.
I first thought that CopyOnWriteArrayList was my best bet, but after reading the Javadocs, it seems like each worker would just wind up with its own thread-local copy of the list, which isn't what I want. I then looked into Collections.synchronizedList(new ArrayList<String>), and that seems to be a decent bet:
List<String> processingCache = Collections.synchronizedList(new ArrayList<String>());
List<Worker> workers = getWorkers(processingCache); // Inject the same list into all workers.
for(Worker worker : workers)
    executor.submit(worker);

// Inside each Worker's run method:
@Override
public void run() {
    String itemXML = consumeItemFromBroker();
    Item item = toItem(itemXML);
    if(processingCache.contains(item.getId()))
        return;
    else
        processingCache.add(item.getId());
    // ... continue processing.
}
Am I on track with Collections.synchronizedList(new ArrayList<String>), or am I way off base? Is there a more efficient thread-safe List impl given my use case, and if so, why?
Collections.synchronizedList is very basic; it simply wraps every method in a synchronized block on a shared monitor.
This will work, but only under some specific assumptions, namely that you never carry out compound accesses to the List, i.e.
if(!list.contains(x))
list.add(x);
is not thread-safe, as the monitor is released between the two calls.
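To make that pair of calls atomic you have to hold the list's monitor across both of them, for example (a sketch, using the same list and x as above):
synchronized (list) {
    if (!list.contains(x))
        list.add(x);
}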
It can also be somewhat slow if you have many reads and few writes as all threads acquire an exclusive lock.
You can look at the implementations in the java.util.concurrent package, there are several options.
I would recommend using a ConcurrentHashMap with dummy values.
The reason for the recommendation is that ConcurrentHashMap uses lock striping (separate locks per segment of keys), so if you have a good hashing algorithm (and String does) you can actually get a massive amount of concurrent throughput.
I would prefer this over a ConcurrentSkipListSet because a hash map doesn't maintain ordering, so you avoid that overhead.
Of course with threading it's never entirely obvious where the bottlenecks are so I would suggest trying both and seeing which gives you better performance.
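A sketch of the dummy-value idea: putIfAbsent makes the check and the insert a single atomic operation, so the contains/add race from above disappears (the surrounding worker code is the question's own):
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Shared by all workers instead of the List<String>.
ConcurrentMap<String, Boolean> processedIds = new ConcurrentHashMap<String, Boolean>();

// Inside each Worker's run method:
String id = item.getId();
if (processedIds.putIfAbsent(id, Boolean.TRUE) != null) {
    return; // another worker has already claimed this itemId
}
// ... continue processing.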
I was reading about ConcurrentHashMap.
I read that it provides an Iterator that requires no synchronization and even allows the Map to be modified during iteration and thus there will be no ConcurrentModificationException.
I was wondering if this is a good thing, as during iteration I might not see an element that was put into the ConcurrentHashMap earlier, because another thread might have changed it.
Is my thinking correct? If yes, is it good or bad?
I was wondering if this is a good thing, as during iteration I might not see an element that was put into the ConcurrentHashMap earlier, because another thread might have changed it.
I don't think this should be a concern - the same statement is true if you use synchronization and the thread doing the iteration happens to grab the lock and execute its loop prior to the thread that would insert the value.
If you need some sort of coordination between your threads to ensure that some action takes place after (and only after) another action, then you still need to manage this coordination, regardless of the type of Map used.
Usually, the ConcurrentHashMap weakly consistent iterator is sufficient. If instead you want a strongly consistent iterator, then you have a couple of options:
The ctrie is a hash array mapped trie that provides constant time snapshots. There is Java source code available for the data structure.
Clojure has a PersistentHashMap that you can use - this lets you iterate over a snapshot of the data.
Use a local database, e.g. HSQLDB, to store the data instead of using a ConcurrentHashMap. Use a composite primary key of key|timestamp, and when you "update" a value you instead store a new entry with the current timestamp. To get an iterator, retrieve a result set with a where timestamp < System.currentTimeMillis() clause, and iterate over the result set.
In either case you're iterating over a snapshot, so you've got a strongly consistent iterator; in the former case you run the risk of running out of memory, while the latter case is a more complex solution.
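For reference, a tiny self-contained demo of what the weakly consistent iterator mentioned above gives you: modifying the map mid-iteration never throws ConcurrentModificationException, and the new entry may or may not be visible to that same iteration.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);

        for (Map.Entry<String, Integer> e : map.entrySet()) {
            map.put("c", 3); // modification during iteration: no exception
            System.out.println(e.getKey() + "=" + e.getValue());
        }

        System.out.println(map); // "c" is in the map, whether or not the loop saw it
    }
}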
The whole point of concurrent-anything is that you acknowledge concurrent activity and don't trust that all access is serialized. With most collections, you cannot expect inter-element consistency without working for it.
If you don't care about seeing the latest data, but want a consistent (but possibly old) view of data, have a look at purely functional structures like Finger Trees.
My program has 100 threads.
Every single thread does this:
1) if the arrayList is empty, add an element with certain properties to it
2) if the arrayList is not empty, iterate through the elements in the arrayList; if a suitable element is found (matching certain properties), take it and remove it from the arrayList
The problem here is that while one thread is iterating through the arrayList, the other 99 threads are waiting for the lock on the arrayList.
What would you suggest if I want all 100 threads to work without locking, so they all have work to do?
Thanks
Have you looked at shared vs exclusive locking? You could use a shared lock on the list, and then have a 'deleted' property on the list elements. The predicate you use to check the list elements would need to make sure the element is not marked 'deleted' in addition to whatever other queries you have - also due to potential read-write conflicts, you would need to lock on each element as you traverse. Then periodically get an exclusive lock on the list to perform the deletes for real.
The read lock allows for a lot of concurrency on the list. The exclusive locks on each element of the list are not as nice, but you need to force the memory model to update your 'deleted' flag to each thread, so there's no way around that.
First, if you're not running on a machine that has 64 cores or more, your 100 threads are probably a performance hog in themselves.
Then an ArrayList for what you're describing is certainly not a good choice because removing an element does not run in amortized constant time but in linear time O(n). So that's a second performance hog. You probably want to use a LinkedList instead of your ArrayList (if you insist on using a List).
Now of course I doubt very much that you need to iterate over your complete list each time you need to find one element: wouldn't another data structure be more appropriate? Maybe the elements that you put in your list have some concept of "equality", and hence a Map with an O(1) lookup time could be used instead?
That's just for a start: as I showed you, there are at least two serious performance issues in what you described... Maybe you should clarify your question if you want more help.
If your notion of "suitable element (matching certain properties)" can be encoded using a Comparator then a PriorityBlockingQueue would allow each thread to poll the queue, taking the next element without having to search the list or enqueuing a new element if the queue is empty.
Addendum: Thilo raises an essential point: as your approach evolves, you may want to determine empirically how many threads are optimal.
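A rough sketch of that approach (Task, its priority() method, createTask, and process are hypothetical placeholders; the Comparator encodes "most suitable element first"):
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

PriorityBlockingQueue<Task> queue = new PriorityBlockingQueue<Task>(100,
        new Comparator<Task>() {
            @Override
            public int compare(Task a, Task b) {
                return Integer.compare(a.priority(), b.priority());
            }
        });

// Each worker thread:
Task task = queue.poll();       // takes the best-matching element without scanning a list
if (task == null) {
    queue.offer(createTask());  // nothing suitable: enqueue a new element instead
} else {
    process(task);
}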
The key is to only use the object lock on the ArrayList when you actually need to.
A good idea would be to subclass ArrayList and synchronize the individual read, write, and delete operations.
This gives you fine-grained locking while letting the threads run through the ArrayList and still protecting its semantics.
Have a single thread own the array and be responsible for adding to it and iterating over it to find work to do. Once a unit of work is found, put the work on a BlockingQueue. Have all your worker threads use take() to remove work from the queue.
This allows multiple units of work to be discovered per pass through the array and they can be handed off to waiting worker threads fairly efficiently.
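A rough sketch of that design (WorkItem, isSuitable, and process are hypothetical placeholders; in a real program the owner thread would block or wait for new elements rather than spin):
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final List<WorkItem> list = new ArrayList<WorkItem>();                 // touched only by the owner thread
final BlockingQueue<WorkItem> workQueue = new LinkedBlockingQueue<WorkItem>();

// Owner thread: scans the unshared list, so no locking is needed on it.
Runnable owner = new Runnable() {
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            for (Iterator<WorkItem> it = list.iterator(); it.hasNext(); ) {
                WorkItem item = it.next();
                if (item.isSuitable()) {
                    it.remove();
                    workQueue.add(item);   // hand the unit of work to the workers
                }
            }
            // ... add newly arrived elements to the list here
        }
    }
};

// Each of the 100 worker threads:
Runnable worker = new Runnable() {
    @Override
    public void run() {
        try {
            while (true) {
                process(workQueue.take()); // blocks until work is available
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
};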