I have a PriorityQueue that has an element with a priority. Now I want to add the same element again with a different priority and keep only the one with higher priority. I thought of checking the new element against the already present one and then deciding whether to keep the old one or replace, but I can't find a way to compare my new element against an arbitrary element from the PriorityQueue.
A PriorityQueue is not meant to give access to arbitrary elements; it is designed to allow fast access to the head alone. If you need this operation frequently, a java.util.TreeSet will probably be a better data structure.
However, you can reach any element by iterating over the PriorityQueue (using an Iterator) and breaking when you find your match. You cannot do better than O(n) for retrieving an arbitrary element from a PriorityQueue, because it was not designed for that.
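The iterate-and-compare approach can be sketched as follows. This is only a minimal illustration: the `Task` type, its `name` and `priority` fields, and the helper names are hypothetical, and it assumes two entries count as "the same element" when their names match.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.PriorityQueue;

class PriorityUpsert {
    // Hypothetical element type: identity is the name, ordering is the priority.
    static final class Task {
        final String name;
        final int priority;

        Task(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    // Builds a queue whose head is the task with the highest priority.
    static PriorityQueue<Task> newQueue() {
        return new PriorityQueue<>(
                Comparator.comparingInt((Task t) -> t.priority).reversed());
    }

    // Adds the candidate, keeping only the higher-priority entry per name.
    // The scan is O(n), which is the best a PriorityQueue offers for this.
    static void addKeepingHigher(PriorityQueue<Task> queue, Task candidate) {
        Iterator<Task> it = queue.iterator();
        while (it.hasNext()) {
            Task existing = it.next();
            if (existing.name.equals(candidate.name)) {
                if (existing.priority >= candidate.priority) {
                    return; // the entry already queued wins
                }
                it.remove(); // drop the lower-priority entry
                break;
            }
        }
        queue.add(candidate);
    }
}
```

Removing through the iterator avoids a second O(n) search that `queue.remove(existing)` would perform.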
I am currently studying about Algorithms & Data Structures and while I was reading over the Book of Algorithms 4th edition, I discovered the Bag data-structure together with the Stack and Queue.
After reading the explanation of it, it is still unclear to me why I would prefer using a Bag (which has no remove() method) over other data structures such as Stack, Queue, LinkedList or a Set.
As far as I can understand from the book, the implementation of a Bag is the same as for a Stack, just renaming push() to add() and removing the pop() method.
So the idea of a Bag is basically having the ability to collect items and then iterate through the collected items, check if a bag is empty and find the number of items in it.
But under which circumstances would I be better off using a Bag over one of the collections mentioned above? And why doesn't a Bag have a remove() method? Is there a specific reason for it?
Thanks in advance.
Stack is an ADT for a collection of elements with a specific removal order, LIFO (last-in-first-out); it allows duplicates.
Queue is an ADT for a collection of elements with a specific removal order, FIFO (first-in-first-out); it allows duplicates.
LinkedList is an implementation of a list.
Set is an ADT for a collection of elements which disallows duplicates.
Bag is an ADT for a collection of elements which allows duplicates.
In general, anything that holds elements is a Collection.
Any collection which allows duplicates is a Bag, otherwise it is a Set.
Any bag which accesses elements via an index is a List.
A bag which appends new elements after the last one and has a method to remove an element from the head (first index) is a Queue.
A bag which appends new elements after the last one and has a method to remove an element from the tail (last index) is a Stack.
Example: In Java, LinkedList is a collection, bag, list and queue, and you can also work with it as a stack, since it supports stack operations (push, peek, pop, or equivalently addLast, peekLast, removeLast at the other end), so you can call it a stack too. It does not extend java.util.Stack, which is a class rather than an interface. In LinkedList the "stack methods" come from Deque, where push, pop and peek all operate on the head of the list (the first element), the same element that Queue's peek retrieves.
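As a small illustration of using LinkedList as a stack through the Deque interface: in java.util.Deque, push, pop and peek all work on the head of the list. The class and method names below are made up for the example.

```java
import java.util.Deque;
import java.util.LinkedList;

class DequeAsStack {
    // Pushes the items in order; the last one pushed ends up at the head.
    static Deque<String> newStack(String... items) {
        Deque<String> stack = new LinkedList<>();
        for (String item : items) {
            stack.push(item); // same as addFirst
        }
        return stack;
    }

    // Pops everything and concatenates it, showing the LIFO removal order.
    static String popAll(Deque<String> stack) {
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) {
            out.append(stack.pop()); // same as removeFirst
        }
        return out.toString();
    }
}
```

Calling `popAll(newStack("a", "b", "c"))` yields the items in reverse insertion order.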
Whether a Bag contains remove(Object) or not may depend on the implementation, e.g. you can implement your own Bag type which supports this operation. You can also implement a get(int) operation to access the object at a specified index. The time complexity of get(int) would depend on your implementation: one can implement a Bag via a linked list, so the complexity would be O(n) (about n/2 steps on average), or via a resizable array (array list) with direct access to the element via its index, so the complexity would be O(1).
But the main idea of the Bag is that it allows duplicates and iteration through the collection. Whether it supports other useful operations depends on the implementer's design decisions.
Which collection type to use depends on your needs: if duplicates are not desired, you would use a Set instead of a Bag. Moreover, if you care about removal order you would pick a Stack or Queue, which are basically Bags with a specific removal order. You can think of Bag as a super-type of Stack and Queue, which extend its API with specific operations.
Most of the time, you just need to collect objects and process them in some way (iteration + element processing). So you will use the simplest Bag implementation, which is a singly linked list.
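A minimal sketch of such a Bag, assuming a singly linked list underneath. The class name `LinkedBag` and its layout are illustrative and not taken from any particular library or book.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class LinkedBag<T> implements Iterable<T> {
    // One list cell; new cells are always linked in at the front.
    private static final class Node<T> {
        final T item;
        final Node<T> next;

        Node(T item, Node<T> next) {
            this.item = item;
            this.next = next;
        }
    }

    private Node<T> first;
    private int size;

    // O(1): no ordering guarantee is promised, so adding at the head is fine.
    void add(T item) {
        first = new Node<>(item, first);
        size++;
    }

    boolean isEmpty() {
        return first == null;
    }

    int size() {
        return size;
    }

    // Iteration is the only way to get elements out; there is no remove().
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Node<T> current = first;

            @Override
            public boolean hasNext() {
                return current != null;
            }

            @Override
            public T next() {
                if (current == null) {
                    throw new NoSuchElementException();
                }
                T item = current.item;
                current = current.next;
                return item;
            }
        };
    }
}
```

Because the class implements Iterable, clients can process the contents with a plain for-each loop without relying on any particular order.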
Bag is an unordered collection of values that may have duplicates. When comparing a stack to a bag, the first difference is that for stacks, order matters.
A Bag only supports the add and iterate operations. You cannot remove items from a bag, whereas it is possible to remove elements from a stack. After checking whether the container is empty, clients can iterate through its elements; since the actual order is unspecified by definition, clients must not rely on it.
Bags are useful when you need to collect objects and process them as a whole set rather than individually. For example, you could collect samples and then, later, compute statistics on them, such as average or standard deviation; the order is irrelevant in that case.
In terms of priority queues, a bag is a priority queue for which element removal (top(), which returns and extracts the element with the highest priority) is disabled. The priority queue API has top, peek, insert, remove and update methods. With a bag it is only possible to peek one element at a time, and the priority of each element is given by a random number from a uniform distribution. Priorities also change at every iteration.
I have an algorithm where I pass through nodes in a graph in a certain way, occasionally passing through the same node several times, and I need to form a list of the nodes passed, such that a node appears once for the last time I passed it.
For instance, if I passed through nodes A -> B -> C -> A -> C, the list I need in the end is [B, A, C].
What I wanted to do was to use a LinkedList, such that every node in the graph will contain a reference to its node in the LinkedList. Then, every time I pass through a node, I will remove its corresponding node from the LinkedList and insert it again into the end of the LinkedList, and the complexity of the operation will only be O(1).
However, when I began implementing this, I ran into a problem: apparently, the Java class LinkedList does not allow me to see its actual list nodes. Using the regular remove functions of LinkedList to remove the list node containing a given graph node will be O(n) instead of O(1), negating the whole point of using a LinkedList to begin with.
Naturally, I can implement LinkedList myself, but I would rather avoid that - it seems to me that if I have to implement LinkedList in java, I'm doing something wrong.
So, is there a way to solve this problem without implementing LinkedList on my own? Is there something that I'm missing?
It seems you are expecting a built-in approach; I don't think there is any Collection which provides such functionality. You will have to implement it on your own as #MartijinCourteaux suggested. Or:
Use a sorted set collection like TreeSet<E>, which supports add, remove and contains at a cost of O(log n).
Use LinkedHashSet<E>. Like HashSet<E>, it has O(1) expected performance for add, contains and remove, though it is likely to be slightly slower than HashSet because of the added expense of maintaining the linked list. It avoids the increased cost associated with TreeSet. However, insertion order is not affected if an element is re-inserted into the set, so remove the earlier insertion of an element before re-inserting it.
LinkedHashMap keeps the order in which entries were inserted and allows you to remove an entry by its key and then put it back at the end. I think that is all you need.
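A minimal sketch of the remove-then-re-insert idea with LinkedHashSet, applied to the A -> B -> C -> A -> C walk from the question. The class and method names are hypothetical, and graph nodes are represented by whatever type the caller uses (strings in the test walk).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class LastVisitOrder {
    // Returns each visited node once, ordered by the time of its last visit.
    static <T> List<T> lastVisitOrder(Iterable<T> visits) {
        LinkedHashSet<T> order = new LinkedHashSet<>();
        for (T node : visits) {
            order.remove(node); // expected O(1); does nothing on a first visit
            order.add(node);    // re-inserting places the node at the end
        }
        return new ArrayList<>(order);
    }
}
```

For the walk A, B, C, A, C this produces [B, A, C], with expected O(1) work per step, assuming the node type has sensible `equals` and `hashCode`.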
Unless your linked list is large, just using a regular ArrayList will give fast performance even with the shuffling. You should also consider using a HashSet if order is not important, a LinkedHashSet if the order of insertion matters, or a TreeSet if you want it sorted. They don't allow duplicate values but have good big-O performance for insert, delete and contains.
I was wondering which Java collection types are traversed fastest. Collections I am most interested in are...
array
LinkedList
Queue
PriorityLinkedList
HashMap
Among these, traversal is fastest through an array, because elements are accessed directly by index (note that an array is not actually a Collection class). Why not the others? Let me explain one by one.
1. LinkedList: LinkedList keeps insertion order. If you traverse it by index with get(i), every access walks the list from one end to reach the element, so traversal becomes slow; an Iterator avoids this by stepping from node to node.
2. Queue: LinkedList and PriorityQueue are two concrete classes of Queue. The elements of a priority queue are ordered according to their natural ordering, or by a Comparator provided at queue construction time, depending on which constructor is used. Its iterator is not guaranteed to traverse the elements in any particular order. If you need ordered traversal, consider using Arrays.sort(pq.toArray()). So it is of little use for ordered traversal unless you sort explicitly.
3. HashMap: If you use a Map instead of a Collection, the traversal order is not guaranteed, because placement depends on the hash code of the key. So traversal order is again of little use. You can look up an element directly by providing its key.
4. PriorityLinkedList: This class does not exist in the Java APIs.
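To illustrate the point about PriorityQueue, here is a small sketch of the Arrays.sort(pq.toArray()) suggestion for getting an ordered traversal. The helper name is made up, and the example assumes Integer elements with their natural ordering.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class SortedTraversal {
    // Copies the queue contents and sorts the copy; the queue is left untouched.
    // Iterating the queue directly would only guarantee that the head comes first.
    static Integer[] sortedSnapshot(PriorityQueue<Integer> pq) {
        Integer[] snapshot = pq.toArray(new Integer[0]);
        Arrays.sort(snapshot);
        return snapshot;
    }
}
```

The snapshot costs O(n log n) to sort, but unlike repeated poll() calls it does not empty the queue.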
I have a collection of elements from which I need to retrieve the least/minimum element.
Normally I would use a PriorityQueue as they are designed specifically for this purpose, and offer O(log(n)) time for dequeuing methods.
However, the elements in my array have a dynamic order, i.e. their natural order changes unpredictably over time. I assume PriorityQueue and other such sorted collections sort an element when it is inserted, and then leave it. If this is so, PriorityQueue wouldn't work for dynamically-ordered elements. Am I correct in my assumption? Or would PriorityQueue still be appropriate in this situation?
If I can't use PriorityQueue, Collections.min would be my next instinct. However this iterates over the entire collection, which presumably gives O(n) time. Is this the next best solution?
What is the best collection/method to use to retrieve the least element from a collection, given that the natural order of the elements may change unpredictably over time?
Edit:
The order of several elements changes per retrieval operation
Edit 2:
The compare algorithm remains constant, however the values of the fields which it assesses vary unpredictably between retrievals.
I think if the change is truly "unpredictable" you may be stuck with Collections.min(). However, for some other collections like PriorityQueue, you could try the following before asking for the min:
Add something that you KNOW is the min.
Remove that.
Then ask again for the "real" min and hope that your little kludge resorted things...
Alternatively, do you know if the order has changed over time? e.g. some OrderChangedEvent can be fired? If so, recreate the sorted whatever as needed.
A possible way to do this would be to extend PriorityQueue with a list as one of its fields. This list will store the java.lang.Object.hashCode() of each object. Whenever add, peek, poll, offer, etc. is called on the PriorityQueue, the queue will check the hash code of each element to see if any element has changed. If any have, it will re-order the elements that changed. Then it will replace the hash codes of the changed elements in the list. I don't know how fast this will be, but I suspect it will be faster than O(n).
Without any further assumption on the operations you are going to do, you can't achieve better performance than with a PriorityQueue or another O(log(n))-insert collection (TreeSet , for example, but you lose the O(1)-peek).
As you correctly assumed Collections.min(Collection, Comparator) is a linear operation.
But it depends on how often you need to change the ordering: for example, if you only need to change it once in a while and still keep a "standard" ordering, min() is a viable option, but if you need to switch ordering completely then you will probably be better off reordering the queue/set (that is, traversing and adding all the elements to a new one), though at an O(n log(n)) cost. Using Collections.sort(List, Comparator) may be effective if you need a lot of reordering compared to inserts, but requires you to use a List.
Of course if you can make somewhat strong assumptions on the types of sorting you will need (for example, if it can be restricted to a part of the data) you could write your own collection.
Edit:
So you have a (more or less) finite number of orderings (never mind that it's the same type of comparison over different fields, it's different Comparators and that's what matters)? If that's the case, you can probably achieve best performance by using m queues that reference the same objects, each using a different comparator (the simplest method, really). This way you have:
constant time access
O(m*log(n)) inserts (to insert in every queue)
O(m*n) removals (to remove from every queue)
no ordering costs (as it's handled by the inserts)
slightly larger memory cost (probably negligible)
additional O(n*log(n)) cost the first time a particular ordering is requested
Supposing a value of m orders of magnitude smaller than n, this is comparable to optimal (single-ordering PriorityQueue) performance. For convenience, you can wrap this into a custom collection that takes a Comparator parameter on retrieval operations, and use it as a key for a HashMap of all the PriorityQueues.
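One possible shape for such a wrapper is sketched below. It is only an illustration under stated assumptions: the class name `MultiOrderQueue` is made up, callers must pass the same Comparator instance each time (the HashMap key relies on the comparator's identity), and the ordering fields of the elements must not change while they are stored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class MultiOrderQueue<T> {
    private final List<T> elements = new ArrayList<>();
    // One PriorityQueue per ordering, all referencing the same objects.
    private final Map<Comparator<T>, PriorityQueue<T>> queues = new HashMap<>();

    // O(m log n): the item goes into every queue created so far.
    void add(T item) {
        elements.add(item);
        for (PriorityQueue<T> queue : queues.values()) {
            queue.add(item);
        }
    }

    // O(m n): PriorityQueue.remove(Object) is a linear scan in each queue.
    boolean remove(T item) {
        boolean removed = elements.remove(item);
        if (removed) {
            for (PriorityQueue<T> queue : queues.values()) {
                queue.remove(item);
            }
        }
        return removed;
    }

    // Constant time once the queue for this ordering exists; the first
    // request for an ordering builds its queue from all current elements.
    T peek(Comparator<T> order) {
        PriorityQueue<T> queue = queues.computeIfAbsent(order, c -> {
            PriorityQueue<T> fresh = new PriorityQueue<>(c);
            fresh.addAll(elements);
            return fresh;
        });
        return queue.peek();
    }
}
```

Building queues lazily keeps the cost of orderings that are never requested at zero, at the price of the one-off build on first use.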
Edit #2:
In that case, there is no better solution than running min() on every retrieval (unless you can make assumptions on the changes of the data); this also means that it's better to just use an ArrayList as the collection, since it has basically the lowest possible cost on every operation and you will not benefit from PriorityQueue's natural ordering anyway. You will end up with linear cost on retrieval (for min) and constant on insertion and deletion: this is optimal, since finding the minimum of unsorted data is Ω(n) and re-sorting would cost Θ(n log n) anyway.
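A minimal sketch of this approach, assuming a hypothetical `Item` type whose mutable `cost` field is what the comparison looks at:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class MutableMin {
    // Hypothetical element whose ordering field may change at any time.
    static final class Item {
        final String name;
        int cost;

        Item(String name, int cost) {
            this.name = name;
            this.cost = cost;
        }
    }

    // O(n) scan on every retrieval, so it always sees the current values.
    // Collections.min throws NoSuchElementException on an empty list.
    static Item currentMin(List<Item> items) {
        return Collections.min(items, Comparator.comparingInt((Item i) -> i.cost));
    }
}
```

Because nothing is cached between calls, changes to `cost` never leave the collection in an inconsistent state.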
As a side note, ordered collections work on the assumption that values will not change after insertion; this is because there is no cost-effective way to monitor the changes nor to reorder them "in place".
Can't you use a Java TreeSet, which keeps the collection sorted at all times? You need to implement the Comparable interface on your objects to do so. Check out http://docs.oracle.com/javase/1.4.2/docs/api/java/util/TreeSet.html
I know you can find the first and last elements in a TreeSet. What if I wanted to know what the second or third element was without iterating? Or, preferably, given an element, figure out its rank in the TreeSet.
Thanks
EDIT: I think you can do it using tailset, ie. compare the size of the original set with the size of the tailset. How efficient is tailset?
TreeSet does not provide an efficient rank method. I suspect (you can confirm by looking at its source) that TreeSet does not even maintain any extra bits of information (i.e. counts of elements on the left and right subtrees of each node) that one would need to perform such queries in O(log(n)) time. So there does not appear to be any fast method of finding the rank of an element of TreeSet.
If you really really need it, you can either implement your own SortedSet with a balanced binary search tree which allows such queries or modify the TreeSet implementation to create a new implementation which is augmented to allow such queries. Refer to the chapter on augmenting data structures in CLRS for more details about how this can actually be done.
According to the source of the Sun JDK6 implementation, tailSet(E).size() iterates over the tail set and counts the elements, so this call is O(tail set size).
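For completeness, a rank lookup built on a view's size can be sketched like this. It uses headSet rather than tailSet (the count of strictly smaller elements is the zero-based rank directly), and by the same reasoning it should be expected to cost time proportional to the rank rather than O(log n). The class and method names are made up.

```java
import java.util.TreeSet;

class TreeSetRank {
    // Zero-based rank: the number of elements strictly smaller than e.
    // The view itself is cheap to create; counting its elements is not.
    static <E> int rank(TreeSet<E> set, E e) {
        return set.headSet(e).size();
    }
}
```

The same call also works for a value that is not in the set, in which case it returns the position where that value would be inserted.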
There is no other way than Iterator.
Edited:
Try this:
treeSet.higher(treeSet.first());
This should give the second element of the TreeSet. I'm not sure if this is more optimized than just using an Iterator.
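Wrapped into a small helper, with guards for sets that have fewer than two elements (first() throws on an empty set, and higher() returns null when there is no larger element). The helper name is hypothetical.

```java
import java.util.TreeSet;

class TreeSetSecond {
    // Returns the second-smallest element, or null if there is none.
    static <E> E second(TreeSet<E> set) {
        if (set.isEmpty()) {
            return null;
        }
        return set.higher(set.first());
    }
}
```

Both first() and higher() are tree lookups, so this stays cheap, but chaining higher() calls to reach the k-th element still takes k steps, much like an Iterator.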