Is there any way to update the key of the least entry (firstEntry()) of a TreeMap in one pass?
For example, pollFirstEntry() takes O(log n) time to retrieve and remove an entry. Afterwards, I create a new entry with the desired key and put it back into the TreeMap, which also takes O(log n) time. Consequently, I spend O(2 log n) time where, logically, it could be just O(1 + log n) = O(log n) time.
I would be very happy to avoid removing the entry, and instead update its key in place while it is held via the firstEntry() method.
If that is not possible with TreeMap, could anybody suggest an alternative PriorityQueue-like data structure where it is possible to update the key of the least entry?
O(2 log N) is rightly considered O(log N). But no, it cannot be done: changing the key moves the entry to another position in the tree. Just about the only data structure where this does not hold is a flat list of key-value pairs, and that is a terrible O(N), or O(N/2) if you like, which is still O(N).
If the key can be used as an index into a huge array, then O(1) would hold, though still as two operations.
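For reference, a minimal sketch of the two-step approach being discussed: poll the least entry out and put its value back under the new key. The class and helper names here are illustrative, not from any library.

```java
import java.util.Map;
import java.util.TreeMap;

public class RekeyFirstEntry {

    // Removes the least entry and re-inserts its value under newKey.
    // Two tree operations, each O(log n).
    static <V> void rekeyFirst(TreeMap<Integer, V> map, int newKey) {
        Map.Entry<Integer, V> first = map.pollFirstEntry(); // O(log n)
        if (first != null) {
            map.put(newKey, first.getValue());              // O(log n)
        }
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(1, "a");
        map.put(5, "b");
        rekeyFirst(map, 3);                 // entry (1, "a") becomes (3, "a")
        System.out.println(map.firstKey()); // prints 3
    }
}
```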
Related
I have a TreeMap and want to get the set of the K smallest keys (entries) larger than a given value.
I know we can use higherKey(givenValue) to get the first such key, but how should I iterate from there?
One possible way is to take a tailMap starting from the smallest key larger than the given value, but that seems like overkill if K is small compared with the map size.
Is there a better way with O(log n + K) time complexity?
The error here is thinking that tailMap makes a new map. It doesn't. It gives you a lightweight object that basically just contains a field pointing back to the original map, and the pivot point, that's all it has. Any changes you make to a tailMap map will therefore also affect the underlying map, and any changes made to the underlying map will also affect your tailMap.
int count = 0;
for (KeyType key : treeMap.tailMap(pivot).keySet()) {
    if (++count > K) break; // stop after the K smallest qualifying keys
    // ... process key ...
}
The above IS O(log n) complexity for the tailMap operation, and given that you loop, that adds K to the lot, for a grand total of O(log n + K) time complexity and O(1) space complexity, where N is the size of the map and K is the number of keys that end up on the selected side of the pivot point (so, worst case, K = N and the whole thing is O(N)).
If you want an actual copied-over map, you'd do something like:
TreeMap<KeyType, ValueType> tm = new TreeMap<>(original.tailMap(pivot));
Note that this also copies over the used comparator, in case you specified a custom one. That one really is O(log n + K) in time complexity (and O(K) in space complexity). Of course, if you then loop through this copy, it's still O(log n + K) (because 2K just boils down to K when talking about big-O notation).
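To make the view semantics concrete, here is a small sketch showing that tailMap is a live window onto the original map: a put on either side is visible from the other.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class TailMapView {
    public static void main(String[] args) {
        TreeMap<Integer, String> original = new TreeMap<>();
        original.put(1, "a");
        original.put(5, "b");

        // A view over keys >= 3; no entries are copied.
        SortedMap<Integer, String> tail = original.tailMap(3);

        original.put(7, "c"); // visible through the view
        tail.put(9, "d");     // visible in the original

        System.out.println(tail.keySet());     // [5, 7, 9]
        System.out.println(original.keySet()); // [1, 5, 7, 9]
    }
}
```

Note that a put through the view with a key outside its range (here, below 3) throws IllegalArgumentException.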
I have a collection; I don't know yet which data structure to use for it.
I have two functions, add and remove.
Both of the functions need to have similar complexities because they both are as frequently used.
Either add will be as simple as O(1) and removeMax will be O(log n), or both will be O(1), or one of them will be O(log n) and the other O(n).
removeMax should remove the maximum value and return it, and I should be able to use it multiple times, so the next time you call it, it removes the next new maximum value.
Is there a way to do both in O(1), or at least O(log n) for remove?
If it's a sorted structure (such as TreeSet), both add and remove require O(log N).
If it's not sorted, add can be implemented in O(1), but removeMax would take O(N), since you must check all the elements to find the maximum in an unsorted data structure.
If you need a data structure that does both add() and removeMax() in O(log n), note that a plain sorted array is not enough: binary search finds the target position in O(log n) (for removeMax, the max value; for add, the insertion point), but inserting or deleting in an array then requires shifting elements, which is O(n). A balanced search tree (e.g. TreeSet) or a binary heap gives you O(log n) for both operations.
Max heaps are probably what you are looking for; the amortized complexity of their remove operation is O(log n). A Fibonacci heap (see this great animation to see how it works) may also suit you, as it has O(1) amortized insert and decrease-key, though extract-max is still O(log n) amortized. Sadly, its implementation is not part of the standard Java libraries, but there are tons of implementations to be found (for instance, see the answer in the comment from #Lino).
Guava's implementation of min-max heap
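If the standard library is enough, java.util.PriorityQueue can serve as a max heap via a reversed comparator. A minimal sketch (not a Fibonacci heap, so insert is O(log n) rather than O(1)):

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class MaxHeapSketch {
    public static void main(String[] args) {
        // PriorityQueue is a min-heap by default; reverseOrder() flips it.
        // offer() is O(log n), peek() is O(1), poll() is O(log n).
        PriorityQueue<Integer> maxHeap =
                new PriorityQueue<>(Collections.reverseOrder());
        maxHeap.offer(3);
        maxHeap.offer(41);
        maxHeap.offer(15);

        System.out.println(maxHeap.poll()); // 41
        System.out.println(maxHeap.poll()); // 15
    }
}
```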
Basically, I'm looking for the best data structure in Java in which I can store pairs and retrieve the top N elements by value. I'd like to do this in O(n) time, where n is the number of entries in the data structure.
Example input would be:
<"john", 32>
<"dave", 3>
<"brian", 15>
<"jenna", 23>
<"rachael", 41>
and if N=3, I should be able to return rachael, john, jenna if I wanted descending order.
If I use some kind of HashMap, insertion is fast, but retrieving entries in order gets expensive.
If I use some data structure that keeps things ordered, then insertion becomes expensive while retrieval is cheaper. I was not able to find a data structure that can do both very well and very fast.
Any input is appreciated. Thanks.
[updated]
Let me ask the question another way, if that makes it clearer.
I know I can insert into a HashMap in constant O(1) time.
Now, how can I retrieve the elements sorted by value in O(n) time, where n = number of entries in the data structure? Hope that makes sense.
If you want to sort, you have to give up constant O(1) time.
That is because, unlike inserting an unsorted key/value pair, sorting will minimally require you to compare the new entry to something, and odds are to a number of somethings. Once you have an algorithm that requires more time for more entries (due to more comparisons), you have overshot "constant" time.
If you can do better, then by all means, do so! There is a Dijkstra prize awaiting for you, if not a Fields Medal to boot.
Don't despair: you can still do the key part with a HashMap and the sorting part with a tree-like implementation, which will give you O(log n). TreeMap is probably what you desire.
--- Update to match your update ---
No, you cannot iterate over a HashMap in sorted order in O(n) time. To do so would assume that you had a list, but that list would have to already be sorted. With a raw HashMap, you would have to search the entire map for each next "lower" value. Searching only part of the map would not do, because the one element you didn't check could be the correct value.
Now, there are some data structures that make a lot of trade-offs and might get you closer. If you want to roll your own, a custom Fibonacci heap can give you amortized performance close to what you wish, but it cannot guarantee worst-case performance. In any case, some operations (like extract-min) will still require O(log n) time.
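One common compromise along these lines: keep inserts O(1) in a HashMap and pay only at query time, scanning once with a min-heap bounded to N entries, which makes each top-N query O(n log N). A sketch, with topN as an illustrative helper name:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopNByValue {

    // Returns the keys of the n largest values, largest first.
    static List<String> topN(Map<String, Integer> map, int n) {
        // Min-heap of at most n candidate entries, ordered by value.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Comparator.comparingInt(Map.Entry::getValue));
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest candidate
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(0, heap.poll().getKey()); // largest ends up first
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("john", 32);
        map.put("dave", 3);
        map.put("brian", 15);
        map.put("jenna", 23);
        map.put("rachael", 41);
        System.out.println(topN(map, 3)); // [rachael, john, jenna]
    }
}
```

For N close to n this degrades toward O(n log n), i.e. a full sort, which is consistent with the answer above: you cannot get a fully sorted view in O(n).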
I need a Map<Integer, String> with a major need for fast retrieval of values by key. However, I also need to retrieve a List of all entries (key-value pairs) whose keys are in a range (n1 to n2). No sorting is required in the list, however.
The map would hold at least 10,000 such entries.
I initially thought of using TreeMap, but that doesn't help with fast retrievals (O(log n) for get() operations). Is it possible to get a list of entries from a HashMap whose keys are in the range n1 to n2?
What would be my best bet to go with ?
The two implementations of NavigableMap (which allow you to retrieve sub-maps or subsets based on key ranges) are TreeMap and ConcurrentSkipListMap, both of which offer O(log n) access time.
Assuming you require O(1) access time as per a regular HashMap, I suggest to implement your own (inefficient) "key range" methods. In other words, sacrifice the performance of the key-range operation for the improved access time you achieve with a regular HashMap. There isn't really another way around this: NavigableMap methods are inherently dependent on the data being stored in a sorted fashion which means you will never be able to achieve O(1) access time.
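A minimal sketch of that sacrifice: a plain HashMap keeps get() at O(1), and the key-range query becomes a linear scan over all entries. entriesInRange is an illustrative helper, not a library method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashMapRange {

    // O(n) scan; the result list is in no particular order,
    // which matches the "no sorting required" constraint.
    static List<Map.Entry<Integer, String>> entriesInRange(
            Map<Integer, String> map, int n1, int n2) {
        List<Map.Entry<Integer, String>> result = new ArrayList<>();
        for (Map.Entry<Integer, String> e : map.entrySet()) {
            if (e.getKey() >= n1 && e.getKey() <= n2) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(2, "a");
        map.put(10, "b");
        map.put(25, "c");
        System.out.println(entriesInRange(map, 5, 20).size()); // 1
    }
}
```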
How densely are the keys distributed? For 10,000 elements equally distributed over 20,000 possibilities, say 0 to 19999, I could imagine that simply probing each candidate key in the range with get(), say from 4 to 14, could be fine; you would miss at about a 50% rate.
I also wonder why TreeMap "doesn't help with faster retrievals"; O(log n) for get() operations is fast.
If you have a tree with smaller values to the left and bigger ones to the right, you can return big parts of subtrees at once. Does it need to be both a Map and a List?
I need a Java data structure that has:
fast (O(1)) insertion
fast removal
fast (O(1)) max() function
What's the best data structure to use?
HashMap would almost work, but using java.util.Collections.max() is at least O(n) in the size of the map. TreeMap's insertion and removal are too slow.
Any thoughts?
O(1) insertion and O(1) max() are mutually exclusive once you also demand fast removal.
A collection with O(1) insertion is unsorted, so it won't have O(1) max. A collection that keeps enough order for O(1) max has to pay for that order when inserting, up to O(n) for a fully sorted array. You'll have to bite the bullet and choose between the two. In both cases, however, the removal should be equally fast.
If you can live with slow removal, you could have a variable saving the current highest element, compare on insert with that variable, max and insert should be O(1) then. Removal will be O(n) then though, as you have to find a new highest element in the cases where the removed element was the highest.
If you can have O(log n) insertion and removal, you can get the max value in O(1) with a PriorityQueue (peek()) or in O(log n) with a TreeSet (last()). O(log n) is pretty good for most applications.
If you accept that O(log n) is still "fast" even though it isn't "fast (O(1))", then some kinds of heap-based priority queue will do it. See the comparison table for different heaps you might use.
Note that Java's library PriorityQueue isn't very exciting, it only guarantees O(n) remove(Object).
For heap-based queues "remove" can be implemented as "decreaseKey" followed by "removeMin", provided that you reserve a "negative infinity" value for the purpose. And since it's the max you want, invert all mentions of "min" to "max" and "decrease" to "increase" when reading the article...
You cannot have O(1) removal + insertion + max.
Proof:
Assume you could, and call this data structure D.
Given an array A:
1. insert all elements of A into D.
2. create an empty linked list L.
3. while D is not empty:
3.1. x <- D.max(); D.delete(x); -- all O(1) by assumption
3.2. L.insert_first(x) -- O(1)
4. return L
We have just created a sorting algorithm that runs in O(n), but that is proven to be impossible: comparison-based sorting is known to be Omega(n log n). Contradiction! Thus, D cannot exist.
I'm very skeptical that TreeMap's O(log n) insertion and deletion are too slow: log(n) time is practically constant for most real applications. Even with 1,000,000,000 elements in your tree, if it's well balanced you will only perform log2(1,000,000,000) ≈ 30 comparisons per insertion or removal, which is comparable to the cost of computing a good hash function.
Such a data structure would be awesome and, as far as I know, doesn't exist. Others pointed this.
But you can go beyond, if you don't care making all of this a bit more complex.
If you can "waste" some memory and some programming efforts, you can use, at the same time, different data structures, combining the pro's of each one.
For example, I needed a sorted data structure but wanted O(1) lookups ("is the element X in the collection?"), not O(log n). I combined a TreeMap with a HashMap (which is not really O(1), but almost, when it's not too full and the hash function is good) and I got really good results.
For your specific case, I would go for a dynamic combination of a HashMap and a custom helper data structure. I have in mind something very complex (hash map + variable-length priority queue), but I'll go for a simple example. Just keep all the stuff in the HashMap, and use a special field (currentMax) that only contains the max element in the map. When you insert() into your combined data structure, if the element you're going to insert is greater than the current max, you do currentMax <- elementGoingToInsert (and you insert it in the HashMap).
When you remove an element from your combined data structure, you check whether it is equal to currentMax; if it is, you remove it from the map (that's normal) and you have to find the new max (in O(n)). So you do currentMax <- findMaxInCollection().
If the max doesn't change very frequently, that's damn good, believe me.
However, don't take anything for granted. You have to struggle a bit to find the best combination between different data structures. Do your tests, learn how frequently max changes. Data structures aren't easy, and you can make a difference if you really work combining them instead of finding a magic one, that doesn't exist. :)
Cheers
Here's a degenerate answer. I noted that you hadn't specified what you consider "fast" for deletion; if O(n) is fast then the following will work. Make a class that wraps a HashSet; maintain a reference to the maximum element upon insertion. This gives the two constant time operations. For deletion, if the element you deleted is the maximum, you have to iterate through the set to find the maximum of the remaining elements.
This may sound like it's a silly answer, but in some practical situations (a generalization of) this idea could actually be useful. For example, you can still maintain the five highest values in constant time upon insertion, and whenever you delete an element that happens to occur in that set you remove it from your list-of-five, turning it into a list-of-four etcetera; when you add an element that falls in that range, you can extend it back to five. If you typically add elements much more frequently than you delete them, then it may be very rare that you need to provide a maximum when your list-of-maxima is empty, and you can restore the list of five highest elements in linear time in that case.
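A minimal sketch of the wrapper described in the two answers above (the class name is mine): O(1) insertion and max(), with the O(n) rescan confined to removals that happen to delete the current maximum.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class MaxTrackingSet {
    private final Set<Integer> set = new HashSet<>();
    private Integer max = null; // cached reference to the maximum element

    public void add(int x) {                 // O(1)
        set.add(x);
        if (max == null || x > max) {
            max = x;
        }
    }

    public Integer max() {                   // O(1), null if empty
        return max;
    }

    public void remove(int x) {              // O(1), or O(n) if x was the max
        set.remove(x);
        if (max != null && max == x) {
            max = set.isEmpty() ? null : Collections.max(set);
        }
    }

    public static void main(String[] args) {
        MaxTrackingSet s = new MaxTrackingSet();
        s.add(5);
        s.add(9);
        s.add(2);
        s.remove(9);                 // deleted the max: triggers the rescan
        System.out.println(s.max()); // 5
    }
}
```

Extending this to cache the five highest elements, as suggested above, follows the same pattern with a small sorted list in place of the single max field.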
As already explained: for the general case, no. However, if your range of values is limited, you can use a counting sort-like structure to get O(1) insertion, and on top of that a linked list (or pointer over the occupied values) for moving the max pointer, thus achieving O(1) max and cheap removal.
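A minimal sketch of that limited-range idea, using a plain counts array and a max pointer instead of a linked list (all names are illustrative). Insert and max are O(1); removeMax is O(1) except for advancing the pointer past emptied buckets, which is O(range) in the worst case.

```java
public class BoundedMaxStructure {
    private final int[] counts;  // counts[v] = multiplicity of value v
    private int maxPointer = -1; // index of the current maximum, -1 if empty

    public BoundedMaxStructure(int range) { // values must lie in [0, range)
        counts = new int[range];
    }

    public void insert(int x) {          // O(1)
        counts[x]++;
        if (x > maxPointer) {
            maxPointer = x;
        }
    }

    public int max() {                   // O(1)
        return maxPointer;
    }

    public int removeMax() {             // O(range) worst case
        int result = maxPointer;
        counts[result]--;
        while (maxPointer >= 0 && counts[maxPointer] == 0) {
            maxPointer--;                // advance past emptied buckets
        }
        return result;
    }

    public static void main(String[] args) {
        BoundedMaxStructure s = new BoundedMaxStructure(100);
        s.insert(7);
        s.insert(42);
        s.insert(42);
        System.out.println(s.removeMax()); // 42
        System.out.println(s.removeMax()); // 42
        System.out.println(s.removeMax()); // 7
    }
}
```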