modifying a ConcurrentHashMap and Synchronized ArrayList in same method - java

I have a collection of objects that is modified by one thread and read by another (more specifically the EDT). I needed a solution that gave me fast look-up and also fast indexing (by insertion order), so I'm using a ConcurrentHashMap with an accompanying ArrayList of the keys: if I want to index an entry, I can index the List for the key and then use the returned key to get the value from the hash map. So I have a wrapper class that makes sure that when an entry is added, the mapping is added to the hash map and the key is added to the list at the same time, and similarly for removal.
I'm posting an example of the code in question:
private List<K> keys = Collections.synchronizedList(new ArrayList<K>(INITIAL_CAPACITY));
private ConcurrentMap<K, T> entries = new ConcurrentHashMap<K, T>(INITIAL_CAPACITY, .75f);

public synchronized T getEntryAt(int index){
    return entries.get(keys.get(index));
}

public synchronized void addOrReplaceEntry(K key, T value){
    T result = entries.get(key);
    if(result == null){
        entries.putIfAbsent(key, value);
        keys.add(key);
    }
    else{
        entries.replace(key, value);
    }
}

public synchronized void removeEntry(K key, T value){
    keys.remove(key);
    entries.remove(key, value);
}

public synchronized int getSize(){
    return keys.size();
}
My question is: am I losing all the benefits of using the ConcurrentHashMap (over a synchronized HashMap) by operating on it in synchronized methods? I have to synchronize the methods to safely modify/read from the ArrayList of keys (CopyOnWriteArrayList is not an option because a lot of modification happens...). Also, if you know of a better way to do this, that would be appreciated...

Yes, using a concurrent collection and a synchronized collection in only synchronized blocks is a waste. You won't get the benefits of ConcurrentHashMap because only one thread will be accessing it at a time.
You could have a look at this implementation of a concurrent linked hashmap; I haven't used it, so I can't attest to its features.
One thing to consider would be switching from synchronized blocks to a ReadWriteLock to improve concurrent read-only performance.
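For example, a minimal sketch of the wrapper using a ReentrantReadWriteLock instead of synchronized methods (assuming the same keys/entries fields as the question; just the shape of the idea, not a drop-in replacement):

// needs: import java.util.concurrent.locks.ReadWriteLock;
//        import java.util.concurrent.locks.ReentrantReadWriteLock;
private final ReadWriteLock lock = new ReentrantReadWriteLock();

public T getEntryAt(int index) {
    lock.readLock().lock();              // many readers may hold this at once
    try {
        return entries.get(keys.get(index));
    } finally {
        lock.readLock().unlock();
    }
}

public void addOrReplaceEntry(K key, T value) {
    lock.writeLock().lock();             // writers get exclusive access
    try {
        if (entries.putIfAbsent(key, value) == null) {
            keys.add(key);               // new key: record insertion order
        } else {
            entries.replace(key, value);
        }
    } finally {
        lock.writeLock().unlock();
    }
}

Reads (getEntryAt, getSize) can then run concurrently with each other; only writes serialize.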
I'm not really sure of the utility of providing a remove-at-index method; perhaps you could give some more details about the problem you are trying to solve?

It seems that you only care about finding values by index. If so, dump the Map and just use a List. Why do you need the Map?

Mixing synchronized and concurrent collections the way you have done it is not recommended. Any reason why you are maintaining two copies of the stuff you are interested in? You can easily get a list of all the keys from the map anytime rather than maintaining a separate list.

Why not store the values in the list, and store the key -> index mapping in the map?
That way getEntry needs only one lookup (in the list, which should be faster than a map anyway), and for remove you do not have to traverse the whole list. Synchronization works the same as before.
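A rough sketch of that layout, assuming the same K/T type parameters as the question (hypothetical wrapper; removal is deliberately omitted since it would have to fix up the stored indexes):

private final List<T> values = new ArrayList<T>();
private final Map<K, Integer> indexByKey = new HashMap<K, Integer>();

public synchronized T getEntryAt(int index) {
    return values.get(index);            // single list lookup, no map access
}

public synchronized void addOrReplaceEntry(K key, T value) {
    Integer i = indexByKey.get(key);
    if (i == null) {
        indexByKey.put(key, values.size());
        values.add(value);               // appended in insertion order
    } else {
        values.set(i, value);            // replaced in place, order kept
    }
}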

You can get all access to the List keys onto the event queue using EventQueue.invokeLater. This will get rid of the synchronization. With all the synchronization you were not running much in parallel anyway. It also means the getSize method will give the same answer for the duration of an event.
If you stick with synchronization instead of using invokeLater, at least get the entries hash table out of the synch block. Either way, you get more parallel processing. Of course, entries can now become out-of-synch with keys. The only down side is sometimes a key will come up with a null entry. With such a dynamic table this is unlikely to matter much.
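A minimal sketch of the invokeLater idea, assuming the same fields as the question (keys can then be a plain ArrayList, since only the EDT ever touches it):

import java.awt.EventQueue;

public void addOrReplaceEntry(final K key, final T value) {
    final boolean isNew = (entries.put(key, value) == null); // concurrent map: safe off the EDT
    if (isNew) {
        EventQueue.invokeLater(new Runnable() {
            public void run() {
                keys.add(key);           // all reads/writes of keys stay on the EDT
            }
        });
    }
}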
Using the suggestion made by chrisichris to put the values in the list will solve this problem if it is one. In fact, this puts a nice wall between keys and entries; they are now used in completely separate ways. (If your only need for entries is to provide values to the JTable, you can get rid of it.) But entries (if still needed) should reference the entries, not contain an index; maintaining indexes there would be a hopeless task. And always remember that keys and entries are snapshots of "reality" (for lack of a better word) taken at different times.

Related

Concurrently checking for duplicates + adding item to list/set in Java

I have some code that is running a load test against a web service by spinning up multiple threads and hitting the service with a specified transaction at a given rate. The transaction retrieves a list of values from the service, then checks the list of values to see if they exist in a set, and adds them if they do not, or fails the transaction if they do (I'm aware the separate check is not necessary and the return value of the add could be inspected; that's just how the code is written now).
Looking at the code, however, it is not thread safe. The set being checked against/added to is a basic HashSet. The current code also increments a value in a regular HashMap for each transaction, so it looks like this code has been messed up from the beginning when it comes to thread safety.
I believe I solved the Map increment issue using ConcurrentHashMap based solution here: Atomically incrementing counters stored in ConcurrentHashMap, but I'm not sure the best way to handle the duplicate check/modification on the Set in a thread-safe way.
Originally I considered using CopyOnWriteArraySet, but because the expected case is to get no duplicates, writes would occur as frequently as reads, so it doesn't seem ideal. The solution I'm considering now is to use a Set 'view' on ConcurrentHashMap using newKeySet()/keySet(defaultVal) as described here: https://javarevisited.blogspot.com/2017/08/how-to-create-thread-safe-concurrent-hashset-in-java-8.html
If I use this solution, checking for duplicates by just adding the value and checking the boolean return type, will this achieve what I want in a thread-safe way? My main concern is that it is important that I DO detect any duplicates. What I don't want to happen is two threads try to add at the same time, and both adds return true since the value was not there when they attempted to add, and the duplicate values received from the service go undetected. For that purpose I thought maybe I should use a List and check for duplicates at the end by converting to a Set and checking the size? However, it's still preferable to at least attempt to detect a duplicate during the transaction and fail if one is detected. It's fine to get a false negative sometimes and still pass the transaction if we can detect it at the end, but I think checking and failing the transaction when we can is still valuable.
Any advice is appreciated- thanks!
I believe I solved the Map increment issue using ConcurrentHashMap based solution here: Atomically incrementing counters stored in ConcurrentHashMap, but I'm not sure the best way to handle the duplicate check/modification on the Set in a thread-safe way.
Yes you can certainly use a ConcurrentHashMap in your solution.
If I use this solution checking for duplicates by just adding the value and checking the bool return type, will this achieve what I want in a thread-safe way?
Yes. ConcurrentHashMap is a fully thread-safe class, so if two threads do a put(...) of the same key at the same instant, one of them will win and get back null as the existing value, and the other will replace the value and get back the previous value for the key, which you can test. It is designed specifically for high-performance multi-threaded applications. You can also do a putIfAbsent(...), in which case the 2nd thread (and any others) will get back the value already in the map. This also works if you are using a keySet wrapper to supply Set mechanics.
With all synchronized classes, you need to be careful about race conditions in your code when you make multiple calls to the class. For example, something like the following is a terrible pattern because there is a race condition because of the multiple calls to the concurrent-map:
// terrible pattern which creates a race condition
if (!concurrentMap.containsKey(key)) {
    concurrentMap.put(key, value);
}
This is the reason why the ConcurrentMap has a number of atomic operations that help with this:
V putIfAbsent(K key, V value); -- put key into map if it is not there already
boolean remove(K key, V value); -- remove the key from the map if it has value
boolean replace(K key, V oldValue, V newValue); -- replaces key with new-value only if it already has old-value
V replace(K key, V value); -- replace the value associated with the key only if key already exists in the map
All of these methods would require multiple, non-atomic calls to the synchronized map to implement from outside which would introduce race conditions.
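For instance, the broken check-then-put above collapses into a single atomic call (a sketch; same placeholder names as above):

V previous = concurrentMap.putIfAbsent(key, value);
if (previous != null) {
    // another thread inserted this key first; treat as a duplicate
}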
My main concern is that it is important that I DO detect any duplicates. What I don't want to happen is two threads try to add at the same time, and both adds return true...
As mentioned above, this won't happen. One of the 2 puts will return null and the other one should be counted as a duplicate.
For that purpose I thought maybe I should use a List and check for duplicates at the end by converting to a set and checking size?
The list would be unnecessary and very hard to get right.
I think a ConcurrentHashSet-like set is your best friend:
Set<Value> values = ConcurrentHashMap.newKeySet();
The set is backed by a ConcurrentHashMap, so your code benefits from both the thread-safety and the performance of ConcurrentHashMap.
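A small usage sketch of the add-and-test idiom with such a set (String stands in for your transaction value type):

Set<String> seen = ConcurrentHashMap.newKeySet();

boolean isNew = seen.add(valueFromService); // atomic: exactly one thread gets true
if (!isNew) {
    // duplicate detected: fail the transaction
}

Even if two threads call add with the same value at the same instant, only one of the calls returns true.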
Just a little advice:
if your Transaction object (or whatever you put into the Set) has a proper equals (and hashCode) implementation, you do not need to check for duplicates in the Set.
A Set only ever holds unique values.
If you still need to know whether an object is already in the set, use the contains method.
Then there are multiple ways to do what you need.
You can use a ConcurrentHashMap instead of a Set: just put your objects in as keys. You have a keySet there and you can use it. The value can be anything (e.g. the same object). You could use the values view instead as well.
You can use one of the BlockingQueue implementations (e.g. LinkedBlockingQueue) to collect transactions from different threads first, and then apply any logic you want after all threads are done (see the sketch after this list).
and there are many other ways...
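As referenced above, a minimal sketch of the queue-then-check approach (hypothetical class; assumes all worker threads are joined before sawDuplicates() is called):

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class DuplicateCheck {
    private final BlockingQueue<String> results = new LinkedBlockingQueue<>();

    void record(String valueFromService) {
        results.add(valueFromService);   // thread-safe append, no shared-set contention
    }

    boolean sawDuplicates() {            // call only after all workers are done
        Set<String> unique = new HashSet<>(results);
        return unique.size() != results.size();
    }
}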

thread safe data structure to preserve order of insertion [duplicate]

I need a data structure that is a LinkedHashMap and is thread safe.
How can I do that ?
You can wrap a LinkedHashMap in Collections.synchronizedMap to get a synchronized map that maintains insertion order. This is not as efficient as a ConcurrentHashMap (and doesn't implement the extra interface methods of ConcurrentMap), but it does get you (somewhat) thread-safe behavior.
Even the mighty Google Collections doesn't appear to have solved this particular problem yet. However, there is one project that does try to tackle the problem.
I say somewhat on the synchronization, because iteration is still not thread safe, in the sense that ConcurrentModificationExceptions can still happen.
There's a number of different approaches to this problem. You could use:
Collections.synchronizedMap(new LinkedHashMap());
as the other responses have suggested, but this has several gotchas you'll need to be aware of. Most notable is that you will often need to hold the collection's synchronized lock when iterating over it, which in turn prevents other threads from accessing the collection until you've finished iterating. (See Java theory and practice: Concurrent collections classes.) For example:
synchronized(map) {
    for (Object key : map.keySet()) {
        // Do work here
    }
}
Using
new ConcurrentHashMap();
is probably a better choice as you won't need to lock the collection to iterate over it.
Finally, you might want to consider a more functional programming approach. That is, you could treat the map as essentially immutable: instead of adding to an existing Map, you would create a new one that contains the contents of the old map plus the new addition. This sounds pretty bizarre at first, but it is actually the way Scala deals with concurrency and collections.
There is one implementation available under Google code. A quote from their site:
A high performance version of java.util.LinkedHashMap for use as a software cache.
Design
A concurrent linked list runs through a ConcurrentHashMap to provide eviction ordering.
Supports insertion and access ordered eviction policies (FIFO, LRU, and Second Chance).
You can use a ConcurrentSkipListMap, only available in Java SE 6 or later. It is order-preserving in that keys are sorted according to their natural ordering. You need to have a Comparator or make the keys Comparable objects. In order to mimic linked-hash-map behavior (iteration order is the order in which entries were added), I implemented my key objects to always compare greater than a given other object unless it is equal (whatever that means for your object).
A wrapped synchronized linked hash map did not suffice because as stated in
http://www.ibm.com/developerworks/java/library/j-jtp07233.html: "The synchronized collections wrappers, synchronizedMap and synchronizedList, are sometimes called conditionally thread-safe -- all individual operations are thread-safe, but sequences of operations where the control flow depends on the results of previous operations may be subject to data races. The first snippet in Listing 1 shows the common put-if-absent idiom -- if an entry does not already exist in the Map, add it. Unfortunately, as written, it is possible for another thread to insert a value with the same key between the time the containsKey() method returns and the time the put() method is called. If you want to ensure exactly-once insertion, you need to wrap the pair of statements with a synchronized block that synchronizes on the Map m."
So the only thing that helps is a ConcurrentSkipListMap, which is 3-5 times slower than a normal ConcurrentHashMap.
Collections.synchronizedMap(new LinkedHashMap())
Since ConcurrentHashMap offers a few important extra methods that are not in the Map interface, simply wrapping a LinkedHashMap with synchronizedMap won't give you the same functionality; in particular, it won't give you anything like the putIfAbsent(), replace(key, oldValue, newValue) and remove(key, oldValue) methods which make ConcurrentHashMap so useful.
Unless there's some apache library that has implemented what you want, you'll probably have to use a LinkedHashMap and provide suitable synchronized{} blocks of your own.
I just tried a synchronized, bounded LRU map based on an insertion-ordered LinkedHashMap (LinkedConcurrentHashMap below), with a read/write lock for synchronization.
So when you use an iterator, you have to acquire the write lock to avoid ConcurrentModificationException. This is better than Collections.synchronizedMap.
public class LinkedConcurrentHashMap<K, V> {

    private final LinkedHashMap<K, V> linkedHashMap;
    private final int cacheSize;
    private final ReadWriteLock readWriteLock;

    public LinkedConcurrentHashMap(LinkedHashMap<K, V> psCacheMap, int size) {
        this.linkedHashMap = psCacheMap;
        cacheSize = size;
        readWriteLock = new ReentrantReadWriteLock();
    }

    public void put(K key, V value) {
        Lock writeLock = readWriteLock.writeLock();
        writeLock.lock();
        try {
            // evict the oldest entry once the bound is reached
            if (linkedHashMap.size() >= cacheSize && cacheSize > 0) {
                K oldAgedKey = linkedHashMap.keySet().iterator().next();
                remove(oldAgedKey);
            }
            linkedHashMap.put(key, value);
        } finally {
            writeLock.unlock();
        }
    }

    public V get(K key) {
        Lock readLock = readWriteLock.readLock();
        readLock.lock();
        try {
            return linkedHashMap.get(key);
        } finally {
            readLock.unlock();
        }
    }

    public boolean containsKey(K key) {
        Lock readLock = readWriteLock.readLock();
        readLock.lock();
        try {
            return linkedHashMap.containsKey(key);
        } finally {
            readLock.unlock();
        }
    }

    public V remove(K key) {
        Lock writeLock = readWriteLock.writeLock();
        writeLock.lock();
        try {
            return linkedHashMap.remove(key);
        } finally {
            writeLock.unlock();
        }
    }

    public ReadWriteLock getLock() {
        return readWriteLock;
    }

    // callers must hold the write lock while iterating over this view
    public Set<Map.Entry<K, V>> entrySet() {
        return linkedHashMap.entrySet();
    }
}
The answer is pretty much no: there's nothing equivalent to a ConcurrentHashMap that preserves insertion order (like LinkedHashMap does). As other people pointed out, you can wrap your collection using Collections.synchronizedMap(-yourmap-), however this will not give you the same level of fine-grained locking. It will simply block the entire map on every operation.
Your best bet is to either use synchronized around any access to the map (where it matters, of course. You may not care about dirty reads, for example) or to write a wrapper around the map that determines when it should or should not lock.
How about this.
Take your favourite open-source concurrent HashMap implementation. Sadly it can't be Java's ConcurrentHashMap, as it's basically impossible to copy and modify that due to the huge amount of package-private stuff. (Why do the Java authors always do that?)
Add a ConcurrentLinkedDeque field.
Modify all of the put methods so that if an insertion is successful the Entry is added to the end of the deque. Modify all of the remove methods so that any removed entries are also removed from the deque. Where a put method replaces the existing value, we don't have to do anything to the deque.
Change all iterator/spliterator methods so that they delegate to the deque.
There's no guarantee that the deque and the map have exactly the same contents at all times, but concurrent hash maps don't make those sort of promises anyway.
Removal won't be super fast (have to scan the deque). But most maps are never (or very rarely) asked to remove entries anyway.
You could also achieve this by extending ConcurrentHashMap, or decorating it (decorator pattern).
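A rough sketch of the decorator variant (insertion-ordered iteration only; the class and method names are made up for illustration):

import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;

class InsertionOrderedConcurrentMap<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final ConcurrentLinkedDeque<K> order = new ConcurrentLinkedDeque<>();

    public V putIfAbsent(K key, V value) {
        V previous = map.putIfAbsent(key, value);
        if (previous == null) {
            order.addLast(key);          // record insertion order on first insert
        }
        return previous;
    }

    public V get(K key) {
        return map.get(key);             // lock-free read, straight to the CHM
    }

    public V remove(K key) {
        V removed = map.remove(key);
        if (removed != null) {
            order.remove(key);           // O(n) scan of the deque, as noted above
        }
        return removed;
    }

    public Iterator<K> insertionOrderKeys() {
        return order.iterator();         // weakly consistent, like CHM's own views
    }
}

As the answer says, the deque and the map can disagree transiently, so an iterator may briefly yield a key whose entry was just removed.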

Is it possible/required to speed up HashMap operations on same entry?

Suppose I wish to check HashMap entry and then replace it:
if( check( hashMap.get(key) ) ) {
    hashMap.put(key, newValue);
}
this will cause the search procedure inside the HashMap to run twice: once for the get and again for the put. This looks inefficient. Is it possible to modify the value of an already-found entry of the Map?
UPDATE
I know I can make a wrapper, and I know I have problems mutating the entry. But the question is WHY? Maybe HashMap remembers the last search to speed up a repeated one? Why are there no methods for such an operation?
EDIT: I've just discovered that you can modify the entry, via Map.Entry.setValue (and the HashMap implementation is mutable). It's a pain to get the entry for a particular key though, and I can't remember ever seeing anyone do this. You can get a set of the entries, but you can't get the entry for a single key, as far as I can tell.
There's one evil way of doing it - declare your own subclass of HashMap within the java.util package, and create a public method which just delegates to the package-private existing method:
package java.util;

// Please don't actually do this...
public class BadMap<K, V> extends HashMap<K, V> {
    public Map.Entry<K, V> getEntryPublic(K key) {
        return getEntry(key);
    }
}
That's pretty nasty though.
You wouldn't normally modify the entry - but of course you can change data within the value, if that's a mutable type.
I very much doubt that this is actually a performance bottleneck though, unless you're doing this a heck of a lot. You should profile your application to prove to yourself that this is a real problem before you start trying to fine-tune something which is probably not an issue.
If it does turn out to be an issue, you could change (say) a Map<Integer, String> into a Map<Integer, AtomicReference<String>> and use the AtomicReference<T> as a simple mutable wrapper type.
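A sketch of the mutable-wrapper idea, reusing the check/newValue names from the question's snippet:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

Map<Integer, AtomicReference<String>> map = new HashMap<>();
map.put(42, new AtomicReference<>("initial"));

// one hash lookup; after that the value is updated in place
AtomicReference<String> ref = map.get(42);
if (ref != null && check(ref.get())) {
    ref.set(newValue);
}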
Too much information for a comment on your question. Check the documentation for HashMap:
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
Constant time means that a get or a put always requires the same amount of time, O(1). The total time required is then linear in how many of those operations you need to perform, O(n).
You can change the entry if it is mutable. One example of where you might do this is
private final Map<String, List<String>> map = new LinkedHashMap<>();

public void put(String key, String value) {
    List<String> list = map.get(key);
    if (list == null)
        map.put(key, list = new ArrayList<>());
    list.add(value);
}
This allows you to update a value, but you can't find and replace a value in one operation.
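On Java 8+ the same pattern can be written as one map call; a sketch of the equivalent (not what this answer used):

public void put(String key, String value) {
    // creates the list on first use, then appends: a single logical operation
    map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
}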
Take a look at Trove ( http://trove4j.sourceforge.net/ ); their maps do have several methods that might be what you want:
adjustOrPut
putIfAbsent
I don't know how this is implemented internally, but I would guess that since Trove is made to be highly performant, there will be only one lookup.

Implementing a concurrent LinkedHashMap

I'm trying to create a concurrent LinkedHashMap for a multithreaded architecture.
If I use Collections#synchronizedMap(), I would have to use synchronized blocks for iteration. This implementation would lead to sequential addition of elements.
If I use ConcurrentSkipListMap, is there any way to implement a Comparator that stores entries sequentially, in insertion order, as in a linked list or queue?
I would like to use java's built in instead of third party packages.
EDIT:
In this concurrent LinkedHashMap, if the keys are names, I wish to keep the keys in the sequence of their arrival, i.e. a new value would be appended at the start or the end, but sequentially.
While iterating, the LinkedHashMap could have new entries added, or entries removed, but the iteration should follow the sequence in which the entries were added.
I understand that by using Collections#synchronizedMap(), a synchronized block for iteration would have to be implemented, but would the map be modifiable (entries added/removed) while it is being iterated?
If you use synchronizedMap, you don't have to synchronize externally, except for iteration. If you need to preserve the ordering of the map, you should use a SortedMap. You could use ConcurrentSkipListMap, which is thread-safe, or another SortedMap in combination with synchronizedSortedMap.
A LinkedHashMap has a doubly linked list running through a hashtable. A FIFO only mutates the links on a write (insertion or removal). This makes implementing a version fairly straightforward.
Write a LHM with only insertion order allowed.
Switch to a ConcurrentHashMap as the hashtable.
Protect #put() / #putIfAbsent() / #remove() with a lock.
Make the "next" field volatile.
On iteration, no lock is needed as you can safely follow the "next" field. Reads can be lock-free by just delegating to the CHM on a #get().
Use Collections#synchronizedMap().
As per my belief, if I use Collections.synchronizedMap(), I would have to use synchronized blocks for getter/setter.
This is not true. You only need to synchronize the iteration on any of the views (keySet, values, entrySet). Also see the above-linked API documentation.
Until now, my project used LRUMap from Apache Commons Collections, but it is based on SequencedHashMap. Commons Collections also proposes ListOrderedMap, but neither is thread-safe.
I have switched to MapMaker from Google Guava. You can look at CacheBuilder too.
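For reference, a minimal CacheBuilder sketch (Guava; bounded and thread-safe, though its eviction order is not strictly insertion order):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

Cache<String, String> cache = CacheBuilder.newBuilder()
        .maximumSize(1000)               // bounded, LRU-style eviction
        .build();

cache.put("key", "value");
String v = cache.getIfPresent("key");    // null if absent or already evicted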
Um, simple answer would be to use a monotonically increasing key provider that your Comparator operates on. Think AtomicInteger, and every time you insert, you create a new key to be used for comparisons. If you pool your real key, you can make an internal map of OrderedKey<MyRealKeyType>.
class OrderedKey<T> implements Comparable<OrderedKey<T>> {
    final T realKey;
    final int index;

    OrderedKey(AtomicInteger source, T key) {
        index = source.getAndIncrement();
        realKey = key;
    }

    public int compareTo(OrderedKey<T> other) {
        if (Objects.equals(realKey, other.realKey)) {
            return 0;
        }
        return Integer.compare(index, other.index);
    }
}
This would obviate the need for a custom comparator, and give you a nice O(1) method to compute size (unless you allow removes, in which case, count those as well, so you can just subtract "all successful removes" from "all successful adds", where successful means an entry was actually created or removed).
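A usage sketch (one shared counter drives the ordering; looking an entry up again requires pooling the OrderedKey, as mentioned above):

import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

AtomicInteger counter = new AtomicInteger();
ConcurrentSkipListMap<OrderedKey<String>, String> map = new ConcurrentSkipListMap<>();

map.put(new OrderedKey<>(counter, "first"), "value1");
map.put(new OrderedKey<>(counter, "second"), "value2");

// iteration follows insertion order, because the keys sort by index
for (OrderedKey<String> k : map.keySet()) {
    System.out.println(k.realKey);
}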

What Java Data Structure/Solution would best fit these requirements?

I need a java data structure/solution that meets these requirements. What best fits these?
1) Objects' insertion order must be kept
2) Objects must be unique (these are database objects that are uniquely identified by a UUID).
3) If a newer object with the same ID is added, the older version of the object should be over-written/removed
4) The Solution should be accessible by many threads.
5) When the first object added to the Structure is read/used, it should be removed from the data structure
There are a couple of possibilities here. The simplest might be to start with a LinkedHashSet. That will provide you with the uniqueness and predictable ordering that you require. Then, you could wrap the resulting set to make it thread-safe:
Set<T> s = Collections.synchronizedSet(new LinkedHashSet<T>(...));
Note: Since a Set doesn't really define a method for retrieving items from it, your code would have to manually invoke Set.remove(Object).
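For instance, a sketch of the take-the-oldest operation on such a synchronized set (you must hold the wrapper's own lock while iterating):

// remove and return the oldest element, or null if the set is empty
static <T> T takeFirst(Set<T> s) {       // s is the Collections.synchronizedSet wrapper
    synchronized (s) {
        java.util.Iterator<T> it = s.iterator();
        if (!it.hasNext()) {
            return null;
        }
        T first = it.next();
        it.remove();                     // delete-on-read
        return first;
    }
}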
Alternatively, you could wrap a LinkedHashMap, which does provide a hook for the delete-on-read semantics you require:
class DeleteOnReadMap<K, V> implements Map<K, V> {

    private Map<K, V> m = new LinkedHashMap<K, V>();

    // implement Map "read" methods with delete-on-read semantics
    public V get(Object key) {
        return m.remove(key);  // reading an entry also removes it
    }

    // (other read methods here)

    // implement remaining Map methods by forwarding to the inner Map
    public V put(K key, V value) {
        return m.put(key, value);
    }

    // (remaining Map methods here)
}
Finally, wrap an instance of your custom Map to make it thread-safe:
Map<K, V> m = Collections.synchronizedMap(new DeleteOnReadMap<K, V>(...));
My thought is something like the following:
Collections.synchronizedMap(new LinkedHashMap<K, V>());
I think that takes care of everything except requirement 5, but you can do that by using the remove() method instead of get().
This won't be quite as efficient as a ConcurrentMap would be: synchronization locks the entire map on every access, whereas ConcurrentMap implementations can use read-write locks and selective locking on only part of the map to allow multiple non-conflicting accesses to go on simultaneously. If you wanted, you could probably get better performance by writing your own subclass of some existing Map implementation.
1) Objects' insertion order must be kept
This is any "normal" data structure: array, ArrayList, tree. So avoid self-balancing or self-sorting data structures: heaps, hashtables, or move-to-front trees (splay trees, for example). Then again, you could use one of those structures, but then you have to keep track of its insertion order in each node.
2) Objects must be unique (these are database objects that are uniquely identified by a UUID)
Keep a unique identifier associated with each object. If this were a C program, the pointer to the node would be unique (I guess this applies in Java as well). If the node's pointer is not sufficient to maintain "uniqueness", then you need to add a field to each node which you guarantee to have a unique value.
3) If a newer object with the same ID is added, the older version of the object should be over-written/removed
Where do you want to place the node? Do you want to replace the existing node? Or do you want to delete the old node and then add the new one to the end? This is important because it is related to your requirement #1, where the order of insertion must be preserved.
4) The solution should be accessible by many threads
The only way I can think of to do this is to implement some sort of locking. Java lets you wrap structures and code within a synchronized block.
5) When the first object added to the structure is read/used, it should be removed from the data structure
Kinda like a "dequeue" operation.
Seems like an ArrayList is a pretty good option for this: simply because of #5. The only problem is that searches are linear. But if you have a relatively small amount of data, then it isn't really that much of a problem.
Otherwise, like others have said: a HashMap or even a Tree of some sort would work - but that will depend on the frequency of accesses. (For example, if the "most recent" element is most likely to be accessed, I'd use a linear structure. But if accesses will be of "random" elements, I'd go with a HashMap or Tree.)
The solutions talking about LinkedHashSet would be a good starting point.
However, you would have to override the equals and hashCode methods on the objects that you are going to put in the set in order to satisfy your requirement number 3.
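A sketch of what that override might look like for a UUID-identified database object (the DbObject class is hypothetical):

import java.util.UUID;

class DbObject {
    private final UUID id;

    DbObject(UUID id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DbObject)) return false;
        return id.equals(((DbObject) o).id);  // identity is the UUID alone
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

One caveat: Set.add is a no-op when an equal element is already present, so to actually replace the older version (requirement 3) you would remove the old element first, or key a Map by the UUID instead.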
Sounds like you have to create your own data structure, but it sounds like a pretty easy class assignment.
Basically you start with anything like an Array or Stack but then you have to extend it for the rest of the functionality.
You can look at the 'contains' method, as you will need that.
