ConcurrentSkipList? That is, not a ConcurrentSkipListSet - java

I need a very fast (insert, remove, contains) highly concurrent list that can be sorted using a comparator/comparable.
The existing ConcurrentSkipListSet would be ideal, if it were a list and not a set. I need to insert multiple items that are equal into the data structure.
I'm currently thinking of using a ConcurrentLinkedDeque if I can't find anything better, but that structure is considerably slower than a skip list under high contention.
Any suggestions?
EDIT: What I actually need, at bare minimum, is something that is sorted using compareTo, can accept concurrent inserts, and can remove/get items using object identity. All other concurrency requirements mentioned in the comments still apply.

The existing ConcurrentSkipListSet would be ideal, if it were a list and not a set.
So the SkipList data structure is, at its core, a linked list. If you care about ordering and the ability to traverse it easily and in order, a skip list will work very well for that. It is also a probabilistic alternative to a balanced tree, which is why it can also back a Set or a Map.
To quote from the Javadocs:
This class implements a concurrent variant of SkipLists providing expected average log(n) time cost for the containsKey, get, put and remove operations and their variants. Insertion, removal, update, and access operations safely execute concurrently by multiple threads. Iterators are weakly consistent, returning elements reflecting the state of the map at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations. Ascending key ordered views and their iterators are faster than descending ones.
If you explain more about what features you want from List, I can answer better whether ConcurrentSkipListSet will be able to work.
Edit:
Ah, I see. After some back and forth in the comments, it seems you need to be able to put two equivalent objects into the Set, which isn't possible. What we worked out is to never have compareTo(...) return 0. It's a bit of a hack, but by using an AtomicLong to generate a unique number for each object, you can compare those numbers whenever the real comparison field (in this case a numerical timeout value) is equal. This allows objects with the same field value to be inserted into the Set and kept in the proper order based on the field.

You can create the Set with a comparator that never returns 0.
private Set<Obj> entities = new ConcurrentSkipListSet<>((o1, o2) -> {
    if (o1.equals(o2)) {
        // Return -1 or 1 - decide where you want to place an object when it equals another one
        return -1;
    }
    // Implement the sorting order below
    if (o1.getTimestamp() < o2.getTimestamp()) {
        return -1;
    }
    if (o1.getTimestamp() > o2.getTimestamp()) {
        return 1;
    }
    return -1;
});
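For the AtomicLong variant described above, here is a minimal sketch (Obj and getTimestamp() are assumed from the example; the seq field is invented for the tie-break). Returning 0 only for the very same instance also lets remove/get work by object identity, as the question's edit requires:

import java.util.concurrent.atomic.AtomicLong;

class Obj implements Comparable<Obj> {
    private static final AtomicLong SEQ = new AtomicLong();

    private final long timestamp;
    private final long seq = SEQ.getAndIncrement(); // unique per instance

    Obj(long timestamp) { this.timestamp = timestamp; }

    long getTimestamp() { return timestamp; }

    @Override
    public int compareTo(Obj other) {
        if (this == other) {
            return 0; // only the identical instance compares equal
        }
        int byTimestamp = Long.compare(timestamp, other.timestamp);
        // Equal timestamps are tie-broken by the unique sequence number,
        // so "duplicates" can coexist in the skip list
        return byTimestamp != 0 ? byTimestamp : Long.compare(seq, other.seq);
    }
}

With compareTo defined this way, a plain new ConcurrentSkipListSet<Obj>() accepts equal-timestamp duplicates in arrival order, while remove(obj) still finds the exact instance it was given.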


Duplicate item's index in LinkedHashSet

I am adding some values to a LinkedHashSet, and based on the add() method's output (true/false) I am performing other operations.
If the Set already contains the element, add() returns false, and in this case I want to know the index of the duplicate element in the Set, as I need to use that index somewhere else. Being a 'linked' collection, there must be some way to get the index, but I couldn't find any such thing in the Set/LinkedHashSet API.
LinkedHashSet is not explicitly indexed per se. If you require an index, using a Set for such an application is usually a sign of wrong abstraction and/or lousy programming. LinkedHashSet only guarantees you a predictable iteration order, not proper indexing of elements. You should use a List in such cases, since that's the interface that gives you an indexing guarantee. You can, however, infer the index using a couple of methods, for example (not recommended, mind you):
a) use indexed iteration through the collection (e.g. with a for loop), seeking the duplicate and breaking when it's found; getting the index this way is O(n):
Object o; // this is the object you want to add to the collection
if (!linkedHashSet.add(o)) {
    int index = 0;
    for (Object obj : linkedHashSet) {
        if (obj.equals(o)) { // or obj == o, depending on your code's semantics
            break;           // 'index' now holds the duplicate's position
        }
        index++;
    }
}
b) use .toArray() and find the element in the array, e.g. by:
Object o; // this is the object you want to add to the collection
int index = -1;
if (!linkedHashSet.add(o)) {
    index = Arrays.asList(linkedHashSet.toArray()).indexOf(o);
}
again at O(n) complexity for acquiring the index.
Both would incur a heavy runtime penalty (the second solution is obviously worse with respect to efficiency, as it creates an array every time you seek the index; maintaining a parallel array mirroring the set would be better there). All in all, I see a broken abstraction in your example. You say
I need to use that index somewhere else
... if that's really the case, using Set is 99% of the time wrong by itself.
You can, on the other hand, use a Map (HashMap, for example) containing [index, Object] (or [Object, index], depending on the exact use case) pairs. It'd require a bit of refactoring, but it's IMO the preferred way to do this. It'd give you the same order of complexity for most operations as LinkedHashSet, but O(1) index lookup essentially for free (Java's HashSet uses a HashMap internally anyway, so you're not losing any memory by replacing HashSet with HashMap).
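A minimal sketch of that idea, assuming insertion-order indices and no removals (the class name is made up):

import java.util.HashMap;
import java.util.Map;

final class IndexedSet {
    private final Map<Object, Integer> indexByValue = new HashMap<>();

    // Mirrors LinkedHashSet.add(): true if o was new, false for a duplicate
    boolean add(Object o) {
        return indexByValue.putIfAbsent(o, indexByValue.size()) == null;
    }

    // O(1) lookup of the insertion index; -1 if the element is absent
    int indexOf(Object o) {
        return indexByValue.getOrDefault(o, -1);
    }
}

Removal is deliberately left out: deleting an element would invalidate every later index, which is exactly the List-like behavior a plain Map can't give you cheaply.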
An even better way would be to use a class that explicitly handles integer maps; see "HashMap and int as key" for more information. tl;dr: http://trove.starlight-systems.com/ has TIntObjectHashMap and TObjectIntHashMap, giving you possibly the best speed for such operations.

Retrieve Least Element, Elements are Dynamically-Ordered

I have a collection of elements from which I need to retrieve the least/minimum element.
Normally I would use a PriorityQueue, as they are designed specifically for this purpose and offer O(log(n)) time for dequeuing methods.
However, the elements in my collection have a dynamic order, i.e. their natural order changes unpredictably over time. I assume PriorityQueue and other such sorted collections sort an element on insertion and then leave it alone. If this is so, PriorityQueue wouldn't work for dynamically-ordered elements. Am I correct in my assumption? Or would PriorityQueue still be appropriate in this situation?
If I can't use PriorityQueue, Collections.min would be my next instinct. However, this iterates over the entire collection, which presumably gives O(n) time. Is this the next best solution?
What is the best collection/method to use to retrieve the least element from a collection, given that the natural order of the elements may change unpredictably over time?
Edit:
The order of several elements changes per retrieval operation
Edit 2:
The compare algorithm remains constant; however, the values of the fields it assesses vary unpredictably between retrievals.
I think if the change is truly "unpredictable" you may be stuck with Collections.min(). However, for some other collections like PriorityQueue you could try the following before asking for the min:
Add something that you KNOW is the min.
Remove it.
Then ask again for the "real" min and hope that your little kludge re-sorted things...
Alternatively, can you tell when the order has changed over time, e.g. via some OrderChangedEvent being fired? If so, recreate the sorted collection as needed.
A possible way to do this would be to extend PriorityQueue with a list as one of its fields. This list would store the java.lang.Object.hashCode() of each object. Whenever add, peek, poll, offer, etc. is called on the PriorityQueue, the queue would check the hash codes of each element and see if any elements have changed. If they have, it would re-order the elements that changed and then replace their hash codes in the list. I don't know how fast this would be, but I suspect it would be faster than O(n).
Without any further assumptions about the operations you are going to perform, you can't achieve better performance than with a PriorityQueue or another O(log(n))-insert collection (TreeSet, for example, but then you lose the O(1) peek).
As you correctly assumed, Collections.min(Collection, Comparator) is a linear operation.
But it depends on how often you need to change the ordering: for example, if you only need to change it once in a while and otherwise keep a "standard" ordering, min() is a viable option; but if you need to switch the ordering completely, you will probably be better off reordering the queue/set (that is, traversing it and adding all the elements to a new one), though at an O(n log(n)) cost. Using Collections.sort(List, Comparator) may be effective if you need a lot of reordering compared to inserts, but it requires you to use a List.
Of course, if you can make somewhat stronger assumptions about the types of sorting you will need (for example, if it can be restricted to a part of the data), you could write your own collection.
Edit:
So you have a (more or less) finite number of orderings (never mind that it's the same type of comparison over different fields; they are different Comparators, and that's what matters)? If that's the case, you can probably achieve the best performance by using m queues that reference the same objects, each using a different comparator (the simplest method, really). This way you have:
constant time access
O(m*log(n)) inserts (to insert into every queue)
O(m*n) removals (to remove from every queue)
no ordering costs (as ordering is handled by the inserts)
a slightly larger memory cost (probably negligible)
an additional O(n*log(n)) cost the first time a particular ordering is requested
Supposing m is orders of magnitude smaller than n, this is comparable to optimal (single-ordering PriorityQueue) performance. For convenience, you can wrap this into a custom collection that takes a Comparator parameter on retrieval operations, and uses it as a key into a HashMap of all the PriorityQueues, as sketched below.
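A minimal, non-thread-safe sketch of that wrapper (class and method names are made up):

import java.util.*;

class MultiOrderedQueue<T> {
    private final List<T> elements = new ArrayList<>();
    private final Map<Comparator<? super T>, PriorityQueue<T>> queues = new HashMap<>();

    void add(T element) {
        elements.add(element);
        for (PriorityQueue<T> q : queues.values()) {
            q.add(element);                      // O(m*log(n)) per insert
        }
    }

    boolean remove(T element) {
        for (PriorityQueue<T> q : queues.values()) {
            q.remove(element);                   // O(m*n) per removal
        }
        return elements.remove(element);
    }

    T peekMin(Comparator<? super T> order) {
        PriorityQueue<T> q = queues.get(order);
        if (q == null) {                         // first request for this ordering:
            q = new PriorityQueue<>(order);      // O(n*log(n)) to populate
            q.addAll(elements);
            queues.put(order, q);
        }
        return q.peek();                         // constant time access
    }
}

Note that the same Comparator instance must be passed on each retrieval for the HashMap lookup to hit.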
Edit #2:
In that case, there is no better solution than running min() on every retrieval (unless you can make assumptions about how the data changes); this also means that it's better to just use an ArrayList as the collection, since it has basically the lowest possible cost on every operation and you will not benefit from PriorityQueue's natural ordering anyway. You will end up with linear cost on retrieval (for min) and constant cost on insertion and deletion: this is optimal, since finding a minimum among unordered values is Ω(n), and any full sort costs Θ(n log(n)) anyway.
As a side note, ordered collections work on the assumption that values will not change after insertion; this is because there is no cost-effective way to monitor the changes nor to reorder them "in place".
Can't you use Java's TreeSet, which keeps the collection sorted at all times? You need to implement the Comparable interface on your objects to do so. Check out http://docs.oracle.com/javase/1.4.2/docs/api/java/util/TreeSet.html

java constantly sorted list with quick retrieval

I'm looking for a constantly sorted list in Java, which can also be used to retrieve an object very quickly. PriorityQueue works great for the "constantly sorted" requirement, and HashMap works great for fast retrieval by key, but I need both in the same collection. At one point I had written my own, but it does not implement the collections interfaces (so it can't be used as a drop-in replacement for a java.util.List etc.), and I'd rather stick to standard Java classes if possible.
Is there such a list out there? Right now I'm using two collections, a PriorityQueue and a HashMap, both containing the same objects. I use the priority queue to traverse the first part of the list in sorted order and the hash map for fast retrieval by key (I need to do both operations interchangeably), but I'm hoping for a more elegant solution...
Edit: I should add that I need the list sorted by a different comparator than what is used for retrieval by key; the list is sorted by a long value, while key retrieval uses a String.
Since you're already using HashMap, that implies that you have unique keys. Assuming that you want to order by those keys, TreeMap is your answer.
It sounds like what you're talking about is a collection with an automatically-maintained index.
Try looking at GlazedLists, which uses "list pipelines" to efficiently propagate changes -- its SortedList class should do the job.
edit: missed your retrieval-by-key requirement. That can be accomplished with GlazedLists.syncEventListToMap and GlazedLists.syncEventListToMultimap -- syncEventListToMap works if there are no duplicate keys, and syncEventListToMultimap works if there are duplicate keys. The nice part about this approach is that you can create multiple maps based on different indices.
If you want to use TreeMaps for indices -- which may give you better performance -- you need to keep your TreeMaps privately encapsulated within a custom class of your choosing, that exposes the interfaces/methods you want, and create accessors/mutators for that class to keep the indices in sync with the collection. Be sure to deal with concurrency issues (via synchronized methods or locks or whatever) if you access the collection from multiple threads.
edit: finally, if fast traversal of the items in sorted order is important, consider using ConcurrentSkipListMap instead of TreeMap -- not for its concurrency, but for its fast traversal. Skip lists are linked lists with multiple levels of linkage: one level traverses all items, the next traverses every K items on average (for a given constant K), the next every K² items on average, etc.
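For example, a quick demonstration of in-order traversal (the keys and values here are made up):

import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListTraversal {
    public static void main(String[] args) {
        ConcurrentSkipListMap<Long, String> map = new ConcurrentSkipListMap<>();
        map.put(3L, "c");
        map.put(1L, "a");
        map.put(2L, "b");

        // Entries always come back in ascending key order: 1, 2, 3
        for (Map.Entry<Long, String> e : map.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}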
TreeMap
http://download.oracle.com/javase/6/docs/api/java/util/TreeMap.html
Go with a TreeSet.
A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
I haven't tested this, so I might be wrong, so consider this just an attempt.
Use TreeMap, wrapping the key of the map in an object that has two attributes (the String you use as the key in the HashMap and the long you use to maintain the sort order in the PriorityQueue). For this object, override equals and hashCode using the String, and implement the Comparable interface using the long.
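A sketch of such a wrapper (the class name is invented). One caveat: TreeMap looks keys up via compareTo, so the comparison should tie-break on the String to stay consistent with equals:

final class CompositeKey implements Comparable<CompositeKey> {
    final String id;       // drives equals/hashCode (hash-style lookup)
    final long sortValue;  // drives the ordering

    CompositeKey(String id, long sortValue) {
        this.id = id;
        this.sortValue = sortValue;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CompositeKey && id.equals(((CompositeKey) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }

    @Override
    public int compareTo(CompositeKey other) {
        int byValue = Long.compare(sortValue, other.sortValue);
        // Tie-break on the String so compareTo == 0 exactly when equals is true
        return byValue != 0 ? byValue : id.compareTo(other.id);
    }
}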
Why don't you encapsulate your solution in a class that implements Collection or Map?
This way you could simply delegate the retrieval methods to the better-suited collection. Just make sure that calls to write methods (add/remove/put) are forwarded to both collections, and remember indirect accesses like iterator.remove(). Most of these methods are optional to implement, but you have to deactivate them (Collections.unmodifiableXXX will help here in most cases).
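A rough sketch of that delegation (class name invented; the full Collection/Map interface is omitted for brevity, and nothing here is synchronized):

import java.util.*;

class SortedKeyedStore<V> {
    private final PriorityQueue<V> sorted;                 // the "constantly sorted" view
    private final Map<String, V> byKey = new HashMap<>();  // fast retrieval by key

    SortedKeyedStore(Comparator<V> order) {
        this.sorted = new PriorityQueue<>(order);
    }

    public void put(String key, V value) {                 // writes go to both collections
        V old = byKey.put(key, value);
        if (old != null) {
            sorted.remove(old);                            // keep both views consistent on replace
        }
        sorted.add(value);
    }

    public V getByKey(String key) { return byKey.get(key); }
    public V peekSmallest()       { return sorted.peek(); }

    public V removeByKey(String key) {
        V v = byKey.remove(key);
        if (v != null) {
            sorted.remove(v);                              // O(n) in PriorityQueue
        }
        return v;
    }
}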

modifying a ConcurrentHashMap and Synchronized ArrayList in same method

I have a collection of objects that is modified by one thread and read by another (more specifically the EDT). I needed a solution that gives me fast look-up and also fast indexing (by order inserted), so I'm using a ConcurrentHashMap with an accompanying ArrayList of the keys: to index an entry, I index into the list for the key and then use the returned key to get the value from the hash map. So I have a wrapper class that makes sure that when an entry is added, the mapping is added to the hash map and the key is added to the list at the same time, and similarly for removal.
I'm posting an example of the code in question:
private List<K> keys = Collections.synchronizedList(new ArrayList<K>(INITIAL_CAPACITY));
private ConcurrentMap<K, T> entries = new ConcurrentHashMap<K, T>(INITIAL_CAPACITY, .75f);

public synchronized T getEntryAt(int index) {
    return entries.get(keys.get(index));
}

public synchronized void addOrReplaceEntry(K key, T value) {
    T result = entries.get(key);
    if (result == null) {
        entries.putIfAbsent(key, value);
        keys.add(key);
    } else {
        entries.replace(key, value); // replace with the new value, not the old one
    }
}

public synchronized T removeEntry(K key, T value) {
    keys.remove(key);
    entries.remove(key, value); // removes only if still mapped to value
    return value;
}

public synchronized int getSize() {
    return keys.size();
}
My question is: am I losing all the benefits of using the ConcurrentHashMap (over a synchronized HashMap) by operating on it in synchronized methods? I have to synchronize the methods to safely modify/read the ArrayList of keys (CopyOnWriteArrayList is not an option because a lot of modification happens...). Also, if you know of a better way to do this, that would be appreciated...
Yes, using a concurrent collection and a synchronized collection in only synchronized blocks is a waste. You won't get the benefits of ConcurrentHashMap because only one thread will be accessing it at a time.
You could have a look at this implementation of a concurrent linked hashmap; I haven't used it, so I can't attest to its features.
One thing to consider would be switching from synchronized blocks to a ReadWriteLock to improve concurrent read-only performance, as sketched below.
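A minimal sketch of the ReadWriteLock suggestion, reusing the keys/entries shape from the question (the class name is invented):

import java.util.*;
import java.util.concurrent.locks.*;

class IndexedMap<K, T> {
    private final List<K> keys = new ArrayList<>();
    private final Map<K, T> entries = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public T getEntryAt(int index) {
        lock.readLock().lock();        // many readers may proceed in parallel
        try {
            return entries.get(keys.get(index));
        } finally {
            lock.readLock().unlock();
        }
    }

    public void addEntry(K key, T value) {
        lock.writeLock().lock();       // writers are exclusive
        try {
            if (entries.putIfAbsent(key, value) == null) {
                keys.add(key);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}

With the lock guarding every access, the plain ArrayList/HashMap variants suffice; the concurrent and synchronized wrappers no longer buy anything.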
I'm not really sure of the utility of providing a remove-at-index method; perhaps you could give some more details about the problem you are trying to solve?
It seems that you only care about finding values by index. If so, dump the Map and just use a List. Why do you need the Map?
Mixing synchronized and concurrent collections the way you have done it is not recommended. Is there any reason why you are maintaining two copies of the stuff you are interested in? You can easily get a list of all the keys from the map at any time rather than maintaining a separate list.
Why not store the values in the list, and in the map store the key -> index mapping?
That way getEntryAt() needs only one lookup (in the list, which should be faster than a map anyway), and removal does not have to traverse the whole list. Synchronization works the same as before.
You can get all access to the List keys onto the event queue using EventQueue.invokeLater. This will get rid of the synchronization. With all the synching you were not running much in parallel anyway. Also it means the getSize method will give the same answer for the duration of an event.
If you stick with synchronization instead of using invokeLater, at least get the entries hash table out of the synch block. Either way, you get more parallel processing. Of course, entries can now become out-of-synch with keys. The only down side is sometimes a key will come up with a null entry. With such a dynamic table this is unlikely to matter much.
Using the suggestion made by chrisichris to put the values in the list will solve this problem if it is one. In fact, this puts a nice wall between keys and entries; they are now used in completely separate ways. (If your only need for entries is to provide values to the JTable, you can get rid of it.) But entries (if still needed) should reference the entries, not contain an index; maintaining indexes there would be a hopeless task. And always remember that keys and entries are snapshots of "reality" (for lack of a better word) taken at different times.

Implementing a concurrent LinkedHashMap

I'm trying to create a concurrent LinkedHashMap for a multithreaded architecture.
If I use Collections#synchronizedMap(), I would have to use synchronized blocks for iteration, and this implementation would make element additions sequential.
If I use ConcurrentSkipListMap, is there any way to implement a Comparator that keeps entries in insertion order, as a linked list or queue would?
I would like to use Java's built-in classes instead of third-party packages.
EDIT:
In this concurrent LinkedHashMap, if the keys are names, I wish to keep the keys in the sequence of their arrival, i.e. a new value would be appended at either the start or the end, but sequentially.
While iterating, the LinkedHashMap could have entries added or removed, but the iteration should follow the sequence in which the entries were added.
I understand that by using Collections#synchronizedMap(), a synchronized block for iteration would have to be used, but would the map remain modifiable (could entries be added/removed) while it is being iterated?
If you use synchronizedMap, you don't have to synchronize externally, except for iteration. If you need to preserve the ordering of the map, you should use a SortedMap. You could use ConcurrentSkipListMap, which is thread-safe, or another SortedMap in combination with synchronizedSortedMap.
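For illustration, both options side by side (the key/value types are placeholders):

import java.util.*;
import java.util.concurrent.*;

public class SortedMapOptions {
    public static void main(String[] args) {
        // Lock-free and sorted by key:
        ConcurrentNavigableMap<String, String> skip = new ConcurrentSkipListMap<>();

        // Or any SortedMap wrapped to be thread-safe:
        SortedMap<String, String> synced = Collections.synchronizedSortedMap(new TreeMap<>());

        synchronized (synced) {     // iteration over the wrapper still needs manual synchronization
            for (String k : synced.keySet()) {
                System.out.println(k);
            }
        }
    }
}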
A LinkedHashMap has a doubly linked list running through a hashtable. A FIFO only mutates the links on a write (insertion or removal). This makes implementing a version fairly straightforward:
1. Write an LHM with only insertion order allowed.
2. Switch to a ConcurrentHashMap as the hashtable.
3. Protect #put() / #putIfAbsent() / #remove() with a lock.
4. Make the "next" field volatile.
On iteration, no lock is needed, as you can safely follow the "next" field. Reads can be lock-free by just delegating to the CHM on a #get().
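A condensed sketch of that recipe (insertion order only; removal and the full Map interface are omitted for brevity, and all names are invented):

import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

class InsertionOrderedConcurrentMap<K, V> {
    private final class Node {
        final K key;
        final V value;
        volatile Node next;                  // volatile: iterators follow links lock-free
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final ConcurrentHashMap<K, Node> table = new ConcurrentHashMap<>();
    private final Node head = new Node(null, null);  // sentinel
    private Node tail = head;                        // guarded by writeLock
    private final Object writeLock = new Object();

    public V get(K key) {                            // lock-free read via the CHM
        Node n = table.get(key);
        return n == null ? null : n.value;
    }

    public V putIfAbsent(K key, V value) {           // writes are serialized by the lock
        synchronized (writeLock) {
            Node existing = table.get(key);
            if (existing != null) {
                return existing.value;
            }
            Node n = new Node(key, value);
            table.put(key, n);
            tail.next = n;                           // volatile write publishes the node
            tail = n;
            return null;
        }
    }

    public void forEachInOrder(BiConsumer<K, V> action) {  // no lock needed
        for (Node n = head.next; n != null; n = n.next) {
            action.accept(n.key, n.value);
        }
    }
}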
Use Collections#synchronizedMap().
As per my belief, if I use Collections.synchronizedMap(), I would have to use synchronized blocks for getter/setter.
This is not true. You only need to synchronize iteration over any of the views (keySet, values, entrySet). Also see the API documentation linked above.
Until now, my project used LRUMap from Apache Commons Collections, but it is based on SequencedHashMap. Commons Collections also offers ListOrderedMap, but neither is thread-safe.
I have switched to MapMaker from Google Guava. You can look at CacheBuilder too.
Um, the simple answer would be to use a monotonically increasing key provider that your Comparator operates on. Think AtomicInteger: every time you insert, you create a new key to be used for comparisons. If you pool your real key, you can make an internal map of OrderedKey<MyRealKeyType>.
class OrderedKey<T> implements Comparable<OrderedKey<T>> {
    T realKey;
    int index;

    OrderedKey(AtomicInteger source, T key) {
        index = source.getAndIncrement();
        realKey = key;
    }

    public int compareTo(OrderedKey<T> other) {
        if (Objects.equals(realKey, other.realKey)) {
            return 0;
        }
        return Integer.compare(index, other.index); // avoids overflow of index - other.index
    }
}
This would obviate the need for a custom comparator and give you a nice O(1) way to compute the size (unless you allow removes, in which case count those as well, so you can subtract "all successful removes" from "all successful adds", where successful means an entry was actually created or removed).
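Hypothetical usage with a ConcurrentSkipListMap, assuming the OrderedKey class above is on the classpath (the demo class and its values are made up):

import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

public class OrderedKeyDemo {
    public static void main(String[] args) {
        AtomicInteger source = new AtomicInteger();
        ConcurrentSkipListMap<OrderedKey<String>, String> map = new ConcurrentSkipListMap<>();

        map.put(new OrderedKey<>(source, "first"), "v1");
        map.put(new OrderedKey<>(source, "second"), "v2");

        // Iteration follows arrival order, since indices increase monotonically
        map.forEach((k, v) -> System.out.println(k.realKey + " -> " + v));
    }
}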
