Java constantly sorted list with quick retrieval

I'm looking for a constantly sorted list in Java which can also be used to retrieve an object very quickly. PriorityQueue works great for the "constantly sorted" requirement, and HashMap works great for fast retrieval by key, but I need both in the same list. At one point I had written my own, but it does not implement the collections interfaces (so it can't be used as a drop-in replacement for a java.util.List etc.), and I'd rather stick to standard Java classes if possible.
Is there such a list out there? Right now I'm using two collections, a priority queue and a hashmap, both containing the same objects. I use the priority queue to traverse the first part of the list in sorted order and the hashmap for fast retrieval by key (I need to do both operations interchangeably), but I'm hoping for a more elegant solution...
Edit: I should add that I need to have the list sorted by a different comparator than what is used for retrieval by key; the list is sorted by a long value, while key retrieval uses a String.

Since you're already using HashMap, that implies that you have unique keys. Assuming that you want to order by those keys, TreeMap is your answer.

It sounds like what you're talking about is a collection with an automatically-maintained index.
Try looking at GlazedLists, which uses "list pipelines" to efficiently propagate changes -- its SortedList class should do the job.
edit: missed your retrieval-by-key requirement. That can be accomplished with GlazedLists.syncEventListToMap and GlazedLists.syncEventListToMultimap -- syncEventListToMap works if there are no duplicate keys, and syncEventListToMultimap works if there are duplicate keys. The nice part about this approach is that you can create multiple maps based on different indices.
If you want to use TreeMaps for indices -- which may give you better performance -- you need to keep your TreeMaps privately encapsulated within a custom class of your choosing, that exposes the interfaces/methods you want, and create accessors/mutators for that class to keep the indices in sync with the collection. Be sure to deal with concurrency issues (via synchronized methods or locks or whatever) if you access the collection from multiple threads.
edit: finally, if fast traversal of the items in sorted order is important, consider using ConcurrentSkipListMap instead of TreeMap -- not for its concurrency, but for its fast traversal. Skip lists are linked lists with multiple levels of linkage: one level traverses all items, the next traverses every K items on average (for a given constant K), the next every K^2 items on average, and so on.

TreeMap
http://download.oracle.com/javase/6/docs/api/java/util/TreeMap.html

Go with a TreeSet.
A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).

I haven't tested this, so I might be wrong; consider it just an attempt.
Use a TreeMap and wrap its key in an object that has two attributes (the String you use as the key in the HashMap and the long you use to maintain the sort order in the PriorityQueue). For this object, override the equals and hashCode methods using the String, and implement the Comparable interface using the long.
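A sketch of that wrapper key (equally untested; note that TreeMap orders and compares keys with compareTo rather than equals, so this only behaves as intended while the long sort values are unique):
class SortKey implements Comparable<SortKey> {
    private final String key;  // used for equals/hashCode (hash-style lookup)
    private final long order;  // used for compareTo (sort order)

    SortKey(String key, long order) {
        this.key = key;
        this.order = order;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SortKey && key.equals(((SortKey) o).key);
    }

    @Override
    public int hashCode() {
        return key.hashCode();
    }

    @Override
    public int compareTo(SortKey other) {
        return (order < other.order) ? -1 : ((order == other.order) ? 0 : 1);
    }
}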

Why don't you encapsulate your solution in a class that implements Collection or Map?
This way you could simply delegate the retrieval methods to the faster or better-suited collection. Just make sure that calls to write methods (add/remove/put) are forwarded to both collections, and remember indirect accesses such as iterator.remove(). Most of these methods are optional to implement, but you have to deactivate the ones you don't support (Collections.unmodifiableXXX will help here in most cases).
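For illustration, a minimal sketch of this encapsulation, assuming (as in the question) items sorted by a long and retrieved by a String key; the Item interface and class names here are hypothetical, and sort values are assumed unique (otherwise map each long to a list of items):
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

interface Item {
    String getKey();      // lookup key
    long getSortValue();  // sort key (assumed unique here)
}

class IndexedSortedCollection<T extends Item> {
    private final TreeMap<Long, T> bySortValue = new TreeMap<Long, T>();
    private final Map<String, T> byKey = new HashMap<String, T>();

    // Every write is forwarded to both structures to keep them in sync.
    public void add(T item) {
        bySortValue.put(item.getSortValue(), item);
        byKey.put(item.getKey(), item);
    }

    public T removeByKey(String key) {
        T item = byKey.remove(key);
        if (item != null) {
            bySortValue.remove(item.getSortValue());
        }
        return item;
    }

    public T getByKey(String key) {       // fast retrieval, O(1) average
        return byKey.get(key);
    }

    public Iterable<T> inSortedOrder() {  // traversal in sorted order
        return bySortValue.values();
    }
}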

Why does LinkedHashSet not support index-based operations even though it maintains the order of its elements? [duplicate]

From Javadoc:
Hash table and linked list implementation of the Map interface, with predictable iteration order. This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries.
If that is so, then why doesn't it provide index-based access like a List in Java:
list.get(index);
UPDATE
I had implemented an LRU cache using LinkedHashMap. My algorithm required me to access the LRU object in the cache, which is why I wanted random access; but I think that would cost me bad performance, so I changed the logic and now access the LRU object only when the cache is full, using removeEldestEntry().
Thank you all...
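For reference, the removeEldestEntry() approach mentioned in the update can look roughly like this (maxEntries is an assumed capacity limit):
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        // accessOrder = true: get() moves an entry to the end of the
        // iteration order, so the eldest entry is the least recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Returning true evicts the LRU entry once the cache exceeds capacity.
        return size() > maxEntries;
    }
}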
a) Because the entries are linked, not randomly accessible. The performance would be miserable, O(N) if I'm not in error.
b) Because there is no interface to back up this functionality. So the choice would be to either introduce a dedicated interface just for this (badly performing) implementation, or require clients to program against implementation classes instead of interfaces.
Btw with Guava there's a simple solution for you:
Iterables.get(map.values(), offset);
And for caching look at Guava's MapMaker and its expiration features.
Since values() provides a backing collection of the values, you can solve it like this:
map.values().remove(map.values().toArray()[index]);
Perhaps not very efficient (especially memory-wise), but it should be O(N) just as you would expect it to be.
Btw, I think the question is legitimate for all List operations. (It shouldn't be slower than LinkedList anyway, right?)
I set out to write a LinkedHashMapList which extended LinkedHashMap and implemented the List interface. Surprisingly, it seems impossible to do because of the clash on remove: the existing remove method returns the previously mapped object, while List.remove should return a boolean.
That's just a reflection, and honestly, I also find it annoying that the LinkedHashMap can't be treated more like a LinkedList.
It provides an Iterator interface, and each node in the list is linked to the one before it and the one after it. Having a get(i) method would be no different from iterating over all the elements in the list, since there is no backing array (same as LinkedList).
If you require this ability, which isn't very performant, I suggest extending the map yourself.
If you want random access you can do
Map<K,V> map = new LinkedHashMap<K,V>();
Map.Entry<K,V>[] entries = (Map.Entry<K,V>[]) map.entrySet().toArray(new Map.Entry[map.size()]);
Map.Entry<K,V> entry_n = entries[n];
As you can see the performance is likely to be very poor unless you cache the entries array.
I would question the need for it however.
There is no real problem in making a Map with O(log(N)) access by index. If you use a red-black tree and store, for each node, the number of elements in the subtree rooted at that node, it is possible to write a get(int index) method that is O(log(N)).
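To make that concrete, here is a hedged sketch of the idea; for brevity it is an unbalanced BST rather than a red-black tree (a real implementation would also update the sizes during rebalancing rotations), but the index lookup works the same way:
class IndexedTree<E extends Comparable<E>> {
    private static class Node<E> {
        E value;
        Node<E> left, right;
        int size = 1; // number of elements in the subtree rooted here
        Node(E value) { this.value = value; }
    }

    private Node<E> root;

    private static <E> int size(Node<E> n) {
        return n == null ? 0 : n.size;
    }

    public void add(E value) {
        root = insert(root, value);
    }

    private Node<E> insert(Node<E> n, E value) {
        if (n == null) return new Node<E>(value);
        if (value.compareTo(n.value) < 0) n.left = insert(n.left, value);
        else n.right = insert(n.right, value);
        n.size = 1 + size(n.left) + size(n.right);
        return n;
    }

    // get-by-index in O(height): descend using the subtree sizes.
    public E get(int index) {
        Node<E> n = root;
        while (n != null) {
            int leftSize = size(n.left);
            if (index < leftSize) {
                n = n.left;
            } else if (index == leftSize) {
                return n.value;
            } else {
                index -= leftSize + 1;
                n = n.right;
            }
        }
        throw new IndexOutOfBoundsException();
    }
}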

HashMaps and Lists reconciliation in Java

My issue is that I need a HashMap which returns a reference to an internal LinkedList when hashMap.get(key) is called, not simply the value that corresponds to the key.
From what I've gathered, a LinkedHashMap enables a doubly-linked list to occupy each map entry for collision handling. However, I want to be able to get a reference to the overarching LinkedList that encapsulates all values mapped into it (each object that shares a LinkedList also shares a particular feature I'm very interested in, due to my overridden hashCode function).
Put differently, I aim to avoid the linked-list auto-traversal built into the LinkedHashMap class; I just want a reference to the list itself to work with.
I want this reference to be returned in addition to having the capacity to add new values to the end of the LinkedLists with linkedHashMap.put(key, value) invocation.
Any pointers would be appreciated.
LinkedHashMap just keeps its entries in a defined iteration order (a doubly-linked list runs through them); it has nothing to do with how collisions are handled.
For what you've described, I think you'll have to implement things yourself. You're basically making a Map<KeyType, List<EntryType>>, with a "put" function that appends to the associated list. It isn't too much code.
I probably wouldn't make it actually extend Map, though, because what you've described doesn't really match the interface for that.
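A hedged sketch of that suggestion, using illustrative names; put() appends to the key's list and get() hands back the internal LinkedList reference itself, as the question asks:
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

public class ListValuedMap<K, V> {
    private final Map<K, LinkedList<V>> map = new HashMap<K, LinkedList<V>>();

    // Appends the value to the end of the list associated with the key.
    public void put(K key, V value) {
        LinkedList<V> list = map.get(key);
        if (list == null) {
            list = new LinkedList<V>();
            map.put(key, list);
        }
        list.add(value);
    }

    // Returns the internal list reference (null if the key is absent),
    // not a copy; callers can operate on the list directly.
    public LinkedList<V> get(K key) {
        return map.get(key);
    }
}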

Retrieve Least Element, Elements are Dynamically-Ordered

I have a collection of elements from which I need to retrieve the least/minimum element.
Normally I would use a PriorityQueue, as they are designed specifically for this purpose and offer O(log(n)) time for dequeuing methods.
However, the elements in my collection have a dynamic order, i.e. their natural order changes unpredictably over time. I assume PriorityQueue and other such sorted collections sort an element when it is inserted and then leave it. If so, PriorityQueue wouldn't work for dynamically-ordered elements. Am I correct in my assumption? Or would PriorityQueue still be appropriate in this situation?
If I can't use PriorityQueue, Collections.min would be my next instinct. However this iterates over the entire collection, which presumably gives O(n) time. Is this the next best solution?
What is the best collection/method to use to retrieve the least element from a collection, given that the natural order of the elements may change unpredictably over time?
Edit:
The order of several elements changes per retrieval operation
Edit 2:
The compare algorithm remains constant, however the values of the fields which it assesses vary unpredictably between retrievals.
I think if the change is truly "unpredictable" you may be stuck with Collections.min(). However, for a collection like PriorityQueue you could try the following before asking for the min:
Add something that you KNOW is the min.
Remove it.
Then ask again for the "real" min and hope that your little kludge re-sorted things...
Alternatively, can you tell when the order has changed over time, e.g. via some OrderChangedEvent being fired? If so, recreate the sorted collection as needed.
A possible way to do this would be to extend PriorityQueue with a list as one of its fields. This list would store the java.lang.Object.hashCode() of each object. Whenever add, peek, poll, offer, etc. is called on the PriorityQueue, the queue would check the hash codes of each element and see if any element has changed. If so, it would re-order the elements that have changed and then replace the hash codes of the changed elements in the list. I don't know how fast this would be, but I suspect it would be faster than O(n).
Without any further assumptions on the operations you are going to do, you can't achieve better performance than with a PriorityQueue or another O(log(n))-insert collection (TreeSet, for example, though you lose the O(1) peek).
As you correctly assumed Collections.min(Collection, Comparator) is a linear operation.
But it depends on how often you need to change the ordering: for example, if you only need to change it once in a while and otherwise keep a "standard" ordering, min() is a viable option; but if you need to switch the ordering completely, you will probably be better off reordering the queue/set (that is, traversing it and adding all the elements to a new one), though at an O(n log(n)) cost. Using Collections.sort(List, Comparator) may be effective if you need a lot of reordering compared to inserts, but it requires you to use a List.
Of course if you can make somewhat strong assumptions on the types of sorting you will need (for example, if it can be restricted to a part of the data) you could write your own collection.
Edit:
So you have a (more or less) finite number of orderings (never mind that it's the same type of comparison over different fields, it's different Comparators and that's what matters)? If that's the case, you can probably achieve best performance by using m queues that reference the same objects, each using a different comparator (the simplest method, really). This way you have:
constant time access
O(m*log(n)) inserts (to insert into every queue)
O(m*n) removals (to remove from every queue)
no ordering costs (as it's handled by the inserts)
slightly larger memory cost (probably negligible)
an additional O(n*log(n)) cost the first time a particular ordering is requested
Supposing m is orders of magnitude smaller than n, this is comparable to optimal (single-ordering PriorityQueue) performance. For convenience, you can wrap this into a custom collection that takes a Comparator parameter on retrieval operations and uses it as a key into a HashMap of all the PriorityQueues.
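A rough sketch of that wrapper, under the assumption that the set of orderings is known up front (all names here are illustrative):
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class MultiOrderQueue<E> {
    // One PriorityQueue per Comparator; all queues hold the same objects.
    private final Map<Comparator<E>, PriorityQueue<E>> queues =
            new HashMap<Comparator<E>, PriorityQueue<E>>();

    public MultiOrderQueue(Iterable<Comparator<E>> orderings) {
        for (Comparator<E> c : orderings) {
            queues.put(c, new PriorityQueue<E>(11, c));
        }
    }

    public void add(E element) {     // O(m*log(n)): insert into every queue
        for (PriorityQueue<E> q : queues.values()) {
            q.add(element);
        }
    }

    public void remove(E element) {  // O(m*n): remove from every queue
        for (PriorityQueue<E> q : queues.values()) {
            q.remove(element);
        }
    }

    // Least element under the given ordering, O(1).
    public E peekMin(Comparator<E> ordering) {
        PriorityQueue<E> q = queues.get(ordering);
        return (q == null) ? null : q.peek();
    }
}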
Edit #2:
In that case, there is no better solution than running min() on every retrieval (unless you can make assumptions about how the data changes); this also means it's better to just use an ArrayList as the collection, since it has basically the lowest possible cost on every operation and you will not benefit from PriorityQueue's ordering anyway. You will end up with linear cost on retrieval (for min) and constant cost on insertion and deletion: this is optimal, since finding the minimum of unordered data is Ω(n), and comparison-based sorting is Θ(n log n) anyway.
As a side note, ordered collections work on the assumption that values will not change after insertion; this is because there is no cost-effective way to monitor the changes nor to reorder them "in place".
Can't you use a Java TreeSet, which keeps the collection sorted at all times? You need to implement the Comparable interface on your objects to do so. Check out http://docs.oracle.com/javase/1.4.2/docs/api/java/util/TreeSet.html

Why is HashSet internally implemented as a HashMap? [duplicate]

Possible Duplicate:
Why does HashSet implementation in Sun Java use HashMap as its backing?
I know what a HashSet and a HashMap are - pretty well versed with them.
There is one thing that really puzzles me.
Example:
Set<String> testing = new HashSet<String>();
Now if you debug it using Eclipse right after the above statement, under the debugger's Variables tab, you will notice that the set 'testing' is internally implemented as a HashMap.
Why does it need a HashMap, since there are no key/value pairs involved in a Set collection?
It's an implementation detail. The HashMap is actually used as the backing store for the HashSet. From the docs:
This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
(emphasis mine)
The answer is right in the API docs
"This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important."
So you don't even need the debugger to know this.
In answer to your question: it is an implementation detail. It doesn't need to use a HashMap, but it is probably just good code re-use. If you think about it, in this case the only difference is that a Set has different semantics from a Map. Namely, maps have a get(key) method, and Sets do not. Sets do not allow duplicates, Maps allow duplicate values, but they must be under different keys.
It is probably really easy to use a HashMap as the backing of a HashSet, because all you would have to do is use the value you are putting in the Set as a map key (its hashCode, defined on all objects, determines whether it is a dupe) with a dummy object as the mapped value, i.e., it is probably just doing something like
backingHashMap.put(toInsert, PRESENT);
to insert items into the Set.
In most cases the Set is implemented as a wrapper for the keySet() of a Map. This avoids duplicate implementations. If you look at the source you will see how it does this.
You might find the method Collections.newSetFromMap() which can be used to wrap ConcurrentHashMap for example.
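For example (Java 6 and later), a short snippet:
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// A thread-safe Set view backed by a ConcurrentHashMap; the map's keys
// act as the set's elements, with Boolean.TRUE as a dummy value.
Set<String> concurrentSet =
        Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());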
The very first sentence of the class's Javadoc states that it is backed by a HashMap:
This class implements the Set interface, backed by a hash table (actually a HashMap instance).
If you look at the source code of HashSet you'll see that what it stores as the map's key is the element you are adding, and the value is a mere marker Object (named PRESENT).
Why is it backed by a HashMap? Because this is the simplest way to store a set of items in a (conceptual) hashtable and there is no need for HashSet to re-invent an implementation of a hashtable data structure.
It's just a matter of convenience that the standard Java class library implements HashSet using a HashMap -- they only need to implement one data structure and then HashSet stores its data in a HashMap with the actual set objects as the key and a dummy value (typically Boolean.TRUE) as the value.
HashMap already has all the functionality that HashSet requires, so there would be no sense in duplicating the same algorithms: it allows you to easily and quickly determine whether an object is already in the set or not.

When do you know when to use a TreeSet or LinkedList?

What are the advantages of each structure?
In my program I will be performing these steps and I was wondering which data structure above I should be using:
Taking in an unsorted array and adding its elements to a sorted structure.
Traversing the sorted data and removing the right element.
Adding data (never removing) and returning that structure as an array.
When do you know when to use a TreeSet or LinkedList? What are the advantages of each structure?
In general, you decide on a collection type based on the structural and performance properties that you need it to have. For instance, a TreeSet is a Set, and therefore does not allow duplicates and does not preserve insertion order of elements. By contrast a LinkedList is a List and therefore does allow duplicates and does preserve insertion order. On the performance side, TreeSet gives you O(logN) insertion and deletion, whereas LinkedList gives O(1) insertion at the beginning or end, and O(N) insertion at a selected position or deletion.
The details are all spelled out in the respective class and interface javadocs, but a useful summary may be found in the Java Collections Cheatsheet.
In practice though, the choice of collection type is intimately connected to algorithm design. The two need to be done in parallel. (It is no good deciding that your algorithm requires a collection with properties X, Y and Z, and then discovering that no such collection type exists.)
In your use-case, it looks like TreeSet would be a better fit. There is no efficient way (i.e. better than O(N^2)) to sort a large LinkedList that doesn't involve turning it into some other data structure to do the sorting. There is no efficient way (i.e. better than O(N)) to insert an element into the correct position in a previously sorted LinkedList. The third part (copying to an array) works equally well with a LinkedList or TreeSet; it is an O(N) operation in both cases.
[I'm assuming that the collections are large enough that the big O complexity predicts the actual performance accurately ... ]
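As a quick illustration of those three steps with a TreeSet (a minimal sketch; the values and names are made up):
import java.util.Iterator;
import java.util.TreeSet;

public class SortedSteps {
    public static void main(String[] args) {
        int[] unsorted = {5, 1, 4, 2, 3};

        // 1. Add the unsorted array to a sorted structure: O(N log N) total.
        TreeSet<Integer> sorted = new TreeSet<Integer>();
        for (int value : unsorted) {
            sorted.add(value);
        }

        // 2. Traverse in sorted order and remove the "right" element.
        for (Iterator<Integer> it = sorted.iterator(); it.hasNext(); ) {
            if (it.next() == 4) {  // whatever "right" means for your data
                it.remove();
                break;
            }
        }

        // 3. Return the contents as an array: O(N).
        Integer[] result = sorted.toArray(new Integer[sorted.size()]);
        System.out.println(java.util.Arrays.toString(result)); // [1, 2, 3, 5]
    }
}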
The genuine power and advantage of TreeSet lies in the interface it implements: NavigableSet.
Why is it so powerful, and in which cases?
The NavigableSet interface adds, for example, these three nice methods:
headSet(E toElement, boolean inclusive)
tailSet(E fromElement, boolean inclusive)
subSet(E fromElement, boolean fromInclusive, E toElement, boolean toInclusive)
These methods make it possible to organize very efficient (very fast) search algorithms.
Example: we need to find all the names from "Milla" up to (but not including) "Wladimir":
TreeSet<String> authors = new TreeSet<String>();
authors.add("Andreas Gryphius");
authors.add("Fjodor Michailowitsch Dostojewski");
authors.add("Alexander Puschkin");
authors.add("Ruslana Lyzhichko");
authors.add("Wladimir Klitschko");
authors.add("Andrij Schewtschenko");
authors.add("Wayne Gretzky");
authors.add("Johann Jakob Christoffel");
authors.add("Milla Jovovich");
authors.add("Taras Schewtschenko");
System.out.println(authors.subSet("Milla", "Wladimir"));
output:
[Milla Jovovich, Ruslana Lyzhichko, Taras Schewtschenko, Wayne Gretzky]
TreeSet doesn't go over all the elements; it finds the first and last elements of the range and returns a view containing all the elements between them.
TreeSet:
TreeSet uses a red-black tree underneath, so the set can be thought of as a dynamic search tree. When you need a structure that is frequently read and written and must also keep its order, TreeSet is a good choice.
If you want to keep it sorted and it's append-mostly, TreeSet with a Comparator is your best bet. A LinkedList would have to be traversed from the beginning to decide where to place an item: LinkedList is O(n) for positional operations, while TreeSet is O(log(n)) for the basic ones.
The most important point when choosing a data structure is its inherent limitations. For example, if you use a TreeSet to store objects and at run-time your algorithm changes attributes of these objects that affect equals comparisons while the object is an element of the set, get ready for some strange bugs.
The Javadoc for the Set interface states:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
Interface Set Java Doc
