Using multiple Queues instead of one PriorityQueue - Java

Currently I'm using a PriorityBlockingQueue for my producer-consumer system, but since I have only 3 different priorities I thought about using 3 different BlockingQueues instead.
This way no work has to be done when inserting elements.
Would this approach be more efficient or not, and why?

Sure, this way is more efficient, as it has O(1) insertion time versus the O(log N) insertion time of a priority queue.
It follows the counting sort idea, in which you count the occurrences of each element and output them accordingly. Counting sort likewise exploits the fact that all input elements fall into a narrow range of values.
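For illustration, a minimal sketch of the three-queue idea (class and method names are my own, not from the question, and the priority-to-queue mapping is an assumption); the consumer polls the highest-priority queue first:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch: one unbounded queue per priority level (0 = highest).
class ThreeLevelQueue<T> {
    private final BlockingQueue<T> high = new LinkedBlockingQueue<>();
    private final BlockingQueue<T> medium = new LinkedBlockingQueue<>();
    private final BlockingQueue<T> low = new LinkedBlockingQueue<>();

    // O(1) insertion: no heap sift-up, just an enqueue on the right queue.
    void put(T item, int priority) throws InterruptedException {
        if (priority == 0) high.put(item);
        else if (priority == 1) medium.put(item);
        else low.put(item);
    }

    // Consumer side: drain the highest-priority non-empty queue first.
    T poll() {
        T item = high.poll();
        if (item == null) item = medium.poll();
        if (item == null) item = low.poll();
        return item; // null if all three queues are empty
    }
}
```

One caveat: a blocking take() across three queues needs extra coordination (for example a semaphore counting total elements), which the sketch above deliberately omits.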

Related

In Java, when sorting, how do I keep the original order of equal values?

So I have this problem right here. I want to code a non-preemptive priority scheduling algorithm, and my approach is to sort, since the algorithm says you want the highest priority first. Say I have priority values inside an array, for example: job1 = 2; job2 = 5; job3 = 2; job4 = 4.
The algorithm is that when two or more jobs with equal priority are present, the processor is allocated to the one "who arrived first". From the example above, the expected (descending) order is: job2 - job4 - job1 - job3.
Since job1 and job3 have the same priority, I want job1 to come before job3.
Now my problem is this: what's the solution for the sort to put job1 first and not job3? Or does the system already handle this automatically? I have never checked whether job3 ends up first or last.
A priority queue data structure already exists in Java; you can use that. The thread-safe version is PriorityBlockingQueue.
You can define a custom comparator to keep the queue sorted by priority while maintaining insertion order when priorities are equal.
Lots of examples are here - Java: How do I use a PriorityQueue?
Other comparator strategies are listed here.
Refer to this one too.
Hope it helps!
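As a hedged sketch of that comparator idea (the Job record and field names are illustrative, not from the question; a Java 16+ record is used for brevity): stamp each job with a monotonically increasing sequence number and break priority ties on it.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative Job: higher priority value runs first; seq breaks ties FIFO.
record Job(String name, int priority, long seq) {}

class Scheduler {
    private static final AtomicLong SEQ = new AtomicLong();
    private final PriorityBlockingQueue<Job> queue = new PriorityBlockingQueue<>(
            11,
            Comparator.comparingInt(Job::priority).reversed()   // highest priority first
                      .thenComparingLong(Job::seq));            // earlier arrival first on ties

    void submit(String name, int priority) {
        queue.put(new Job(name, priority, SEQ.getAndIncrement()));
    }

    Job next() throws InterruptedException {
        return queue.take();    // highest-priority, earliest-arrived job
    }
}
```

Draining the queue with next() would then yield job2, job4, job1, job3 for the question's example values.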
You are talking about stable sorting, which keeps the order of equal-valued elements. Merge sort is stable, while quicksort is not. Collections.sort uses a merge sort variant and should do the job in O(n log n).
However, if time complexity is an issue, then since the number of priorities is limited, radix sort should sort, generally (though not guaranteed), in O(sn), where there are n integer keys of size s.
Stable sorting is the answer to your question, assuming that jobs are appended to the back of the array when they arrive. So if job1 = 2 is inputted, then some other jobs, then job3 = 2, the array will look something like this: [job1, jobX, jobY, ..., job3]. Stable sorting, by definition, means that if two elements in the array have the same value, then their original relative order is preserved. As myin528 states, merge sort is stable, as are radix sort, which could be faster depending on the values in your array, and insertion sort, if your array is small.
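Since List.sort (like Collections.sort) is documented to be stable, sorting by priority alone already preserves arrival order among equal priorities; a minimal sketch using the question's example values (class name and the string-array representation are illustrative):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class StableSortDemo {
    public static void main(String[] args) {
        // Arrival order: job1(2), job2(5), job3(2), job4(4).
        List<String[]> jobs = new ArrayList<>(List.of(
                new String[] {"job1", "2"}, new String[] {"job2", "5"},
                new String[] {"job3", "2"}, new String[] {"job4", "4"}));

        // List.sort is stable, so job1 stays ahead of job3 after sorting by priority.
        jobs.sort(Comparator.comparingInt((String[] j) -> Integer.parseInt(j[1])).reversed());

        jobs.forEach(j -> System.out.print(j[0] + " ")); // job2 job4 job1 job3
    }
}
```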

Data structure in Java that supports quick search and remove in array with duplicates

More specifically, suppose I have an array with duplicates:
{3,2,3,4,2,2,1,4}
I want a data structure that supports searching for and removing the first occurrence of some value faster than O(n); say the value is 4, then it becomes:
{3,2,3,2,2,1,4}
I also need to iterate the list from the head in the same order. Other operations like get(index) or insert are not needed.
You can use O(n) time to record the original data (say it's an int[]) in your data structure; I just need the later search and remove operations to be faster than O(n).
"Search and remove" is considered ONE operation, as shown above.
If I had to build it myself, I would use a LinkedList to store the data, and a HashMap mapping every key to the list of all nodes holding that value, together with their previous and next nodes.
Is this the right approach? Are there any better choices already available in Java?
The data structure you describe, essentially a hybrid linked list and map, is, I think, the most efficient way of handling your stated problem. You'll have to keep track of the nodes yourself, since Java's LinkedList doesn't provide access to the actual nodes. AbstractSequentialList may be helpful here.
The index structure you'll need is a map from an element value to the appearances of that element in the list. I recommend a hash table from hashCode % modulus to a linked list of (value, list of main-list nodes).
Note that this approach is still O(n) in the worst case, when you have universal hash collisions; this applies whether you use open or closed hashing. In the average case it should be something closer to O(ln(n)), but I'm not prepared to prove that.
Consider also whether the overhead of keeping track of all of this is really worth the gains. Unless you've actually profiled running code and determined that a LinkedList is causing problems because remove is O(n), stick with that until you do.
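A hedged sketch of that hybrid (all names are mine, for illustration): a hand-rolled doubly linked list plus a HashMap from each value to the FIFO queue of its nodes, giving expected O(1) search-and-remove of the first occurrence.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: linked list of values in arrival order, plus an index
// from each value to the queue of its nodes (earliest occurrence first).
class FirstOccurrenceList {
    private static final class Node {
        final int value;
        Node prev, next;
        Node(int value) { this.value = value; }
    }

    private final Node head = new Node(0), tail = new Node(0); // sentinels
    private final Map<Integer, ArrayDeque<Node>> index = new HashMap<>();

    FirstOccurrenceList() { head.next = tail; tail.prev = head; }

    // O(1) expected: append to the list and to the value's node queue.
    void add(int value) {
        Node n = new Node(value);
        n.prev = tail.prev; n.next = tail;
        tail.prev.next = n; tail.prev = n;
        index.computeIfAbsent(value, k -> new ArrayDeque<>()).addLast(n);
    }

    // O(1) expected: unlink the earliest node holding this value.
    boolean removeFirstOccurrence(int value) {
        ArrayDeque<Node> nodes = index.get(value);
        if (nodes == null || nodes.isEmpty()) return false;
        Node n = nodes.pollFirst();
        n.prev.next = n.next; n.next.prev = n.prev;
        if (nodes.isEmpty()) index.remove(value);
        return true;
    }
    // Iteration in arrival order: walk head.next ... tail.
}
```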
Since your requirement is that the first occurrence of the element should be removed and the remaining occurrences retained, there would be no way to do it faster than O(n), as you would definitely have to move through to the end of the list to find out whether there is another occurrence. There is no standard API from Oracle in the Java packages that does this.

Find the K max values from a list of N elements

I have these requirements:
1. I have random values in a List/Array and need to find the 3 max values.
2. I have a pool of values that gets updated, maybe every 5 seconds; every time after the update, I need to find the 3 max values in the pool.
I thought of using Math.max three times on the list, but I don't think that is a very optimized approach.
Won't any sorting mechanism be costly, since I only care about the top 3 max values? Why sort everything?
Please suggest the best way to do this in Java.
Sort the list and take the 3 max values. If you don't want the expense of the sort, iterate and maintain the n largest values as you go.
Or maintain the pool as a sorted collection.
Update: FYI, Guava has an Ordering class with a greatestOf method to get the n max elements in a collection. You might want to check out the implementation:
Ordering.greatestOf
Traverse the list once, keeping an ordered array of the three largest elements seen so far. This is trivial to update whenever you see a new element, and it instantly gives you the answer you're looking for.
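A minimal sketch of that single pass (method name illustrative):

```java
// Single pass, O(n): keep the three largest seen so far, top[0] being the largest.
static int[] top3(int[] values) {
    int[] top = {Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE};
    for (int v : values) {
        if (v > top[0])      { top[2] = top[1]; top[1] = top[0]; top[0] = v; }
        else if (v > top[1]) { top[2] = top[1]; top[1] = v; }
        else if (v > top[2]) { top[2] = v; }
    }
    return top;
}
```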
A priority queue should be the data structure you need in this case.
First, it would be wise never to say again, "I don't think that is a very optimized approach." You will not know which part of your code is slowing you down until you put a profiler on it.
Second, the easiest way to do what you're trying to do -- and what will be most clear to someone later if they are trying to see what your code does -- is to use Collections.sort() and pick off the last three elements. Then anyone who sees the code will know, "oh, this code takes the three largest elements." There is so much value in clear code that it will likely outweigh any optimization that you might have done. It will also keep you from writing bugs, like giving a natural meaning to what happens when someone puts the same number into the list twice, or giving a useful error message when there are only two elements in the list.
Third, if you really get data so large that O(n log n) operations are too slow, you should rethink the data structure that holds the data in the first place. java.util.NavigableSet, for example, offers a descendingIterator() method which you can probe for its first three elements; those would be the three maximum numbers. If you really want, a heap data structure can be used, and you can pull off the top 3 elements with something like one comparison each, at the cost of making insertion an O(log n) operation.
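For the heap variant, a hedged sketch that generalizes from 3 to any k (class and method names are illustrative): a min-heap of size k, where the smallest of the current top k sits at the root and is evicted whenever a larger value arrives, giving O(n log k) overall.

```java
import java.util.List;
import java.util.PriorityQueue;

class TopKDemo {
    // Min-heap of size k: the smallest of the current top k is at the root.
    static PriorityQueue<Integer> topK(Iterable<Integer> pool, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(k);
        for (int v : pool) {
            if (heap.size() < k) heap.offer(v);   // fill up to k elements
            else if (v > heap.peek()) {           // v beats the weakest of the top k
                heap.poll();
                heap.offer(v);
            }
        }
        return heap;                              // holds the k largest values seen
    }

    public static void main(String[] args) {
        System.out.println(topK(List.of(4, 9, 1, 7, 3, 8), 3)); // contains 7, 8, 9
    }
}
```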

Which is the appropriate data structure?

I need a Java data structure that has:
fast (O(1)) insertion
fast removal
fast (O(1)) max() function
What's the best data structure to use?
HashMap would almost work, but using java.util.Collections.max() is at least O(n) in the size of the map. TreeMap's insertion and removal are too slow.
Any thoughts?
O(1) insertion and O(1) max() are mutually exclusive, once you add the fast-removal requirement.
An O(1) insertion collection won't have O(1) max, as the collection is unsorted. An O(1) max collection has to be kept sorted, so insertion is O(n). You'll have to bite the bullet and choose between the two. In both cases, however, removal should be equally fast.
If you can live with slow removal, you could keep a variable holding the current highest element and compare against it on insert; max and insert are then O(1). Removal becomes O(n), though, as you have to find a new highest element whenever the removed element was the highest.
If you can have O(log n) insertion and removal, you can have a cheap max with a TreeSet (last(), O(log n)) or a PriorityQueue (peek(), O(1)). O(log n) is pretty good for most applications.
If you accept that O(log n) is still "fast" even though it isn't "fast (O(1))", then some kinds of heap-based priority queue will do it. See the comparison table for different heaps you might use.
Note that Java's library PriorityQueue isn't very exciting; it only guarantees O(n) remove(Object).
For heap-based queues "remove" can be implemented as "decreaseKey" followed by "removeMin", provided that you reserve a "negative infinity" value for the purpose. And since it's the max you want, invert all mentions of "min" to "max" and "decrease" to "increase" when reading the article...
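A small illustration of that min-to-max inversion, using the JDK's PriorityQueue with a reversed comparator (class name illustrative):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class MaxHeapDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        maxHeap.offer(3); maxHeap.offer(7); maxHeap.offer(5); // O(log n) each
        System.out.println(maxHeap.peek());  // 7: the max, in O(1)
        System.out.println(maxHeap.poll());  // 7: removed in O(log n)
        maxHeap.remove(5);                   // arbitrary removal is O(n) here, as noted above
    }
}
```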
You cannot have O(1) removal + insertion + max.
Proof:
Assume you could, and call this data structure D.
Given an array A:
1. Insert all elements of A into D.
2. Create an empty linked list L.
3. While D is not empty:
3.1. x <- D.max(); D.delete(x) (all O(1), by assumption)
3.2. L.insert_first(x) (O(1))
4. Return L.
We have just constructed a sorting algorithm that runs in O(n), which is proven impossible: comparison-based sorting is known to be Ω(n log n). Contradiction! Thus, D cannot exist.
I'm very skeptical that TreeMap's O(log n) insertion and deletion are too slow; log(n) time is practically constant for most real applications. Even with 1,000,000,000 elements in your tree, if it's well balanced you will only perform log2(1,000,000,000) ≈ 30 comparisons per insertion or removal, which is comparable to the cost of computing a decent hash function.
Such a data structure would be awesome and, as far as I know, doesn't exist. Others have pointed this out.
But you can go beyond, if you don't mind making all of this a bit more complex.
If you can "waste" some memory and some programming effort, you can use several data structures at the same time, combining the pros of each one.
For example, I needed a sorted data structure but wanted O(1) lookups ("is the element X in the collection?"), not O(log n). I combined a TreeMap with a HashMap (which is not really O(1), but close to it when it's not too full and the hash function is good) and I got really good results.
For your specific case, I would go for a dynamic combination of a HashMap and a custom helper data structure. I have in mind something very complex (hash map + variable-length priority queue), but I'll go for a simple example. Just keep all the stuff in the HashMap, and then use a special field (currentMax) that only contains the max element in the map. When you insert() into your combined data structure, if the element you're going to insert is greater than the current max, you do currentMax <- elementGoingToInsert (and insert it in the HashMap).
When you remove an element from your combined data structure, you check whether it equals currentMax; if it does, you remove it from the map (that's normal) and you have to find the new max (in O(n)). So you do currentMax <- findMaxInCollection().
If the max doesn't change very frequently, that's damn good, believe me.
However, don't take anything for granted. You have to struggle a bit to find the best combination between different data structures. Do your tests, learn how frequently max changes. Data structures aren't easy, and you can make a difference if you really work combining them instead of finding a magic one, that doesn't exist. :)
Cheers
Here's a degenerate answer. I noted that you hadn't specified what you consider "fast" for deletion; if O(n) is fast, then the following will work. Make a class that wraps a HashSet; maintain a reference to the maximum element upon insertion. This gives the two constant-time operations. For deletion, if the element you deleted is the maximum, you have to iterate through the set to find the maximum of the remaining elements.
This may sound like a silly answer, but in some practical situations (a generalization of) this idea could actually be useful. For example, you can still maintain the five highest values in constant time upon insertion, and whenever you delete an element that happens to occur in that set, you remove it from your list-of-five, turning it into a list-of-four, and so on; when you add an element that falls in that range, you can extend it back to five. If you typically add elements much more frequently than you delete them, it may be very rare that you need to provide a maximum when your list-of-maxima is empty, and you can restore the list of five highest elements in linear time in that case.
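A hedged sketch of the wrapper both of the last two answers describe (class and method names are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: a set that tracks its maximum on the side.
class MaxTrackingSet {
    private final Set<Integer> items = new HashSet<>();
    private Integer max = null;              // current maximum, null when empty

    void insert(int v) {                     // O(1) expected
        if (items.add(v) && (max == null || v > max)) max = v;
    }

    Integer max() { return max; }            // O(1)

    void remove(int v) {                     // O(1) expected; O(n) when v was the max
        if (!items.remove(v)) return;
        if (max != null && v == max) {
            max = null;                      // recompute by scanning the survivors
            for (int x : items) if (max == null || x > max) max = x;
        }
    }
}
```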
As already explained: for the general case, no. However, if your range of values is limited, you can use a counting-sort-like structure to get O(1) insertion, and on top of that a linked list of occupied buckets for moving the max pointer, thus achieving O(1) max and removal.
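A simplified sketch of that bounded-range idea (the class name and the fixed range are assumptions; it slides the max pointer downward on removal instead of maintaining the linked list of occupied buckets, so removing the max is O(range) in the worst case rather than the strict O(1) described above):

```java
// Hedged sketch assuming all values fall in the small fixed range [0, RANGE).
class BoundedBag {
    private static final int RANGE = 1024;   // assumed value range; adjust to fit
    private final int[] counts = new int[RANGE];
    private int max = -1;                    // index of the current maximum, -1 if empty

    void insert(int v) {                     // O(1)
        counts[v]++;
        if (v > max) max = v;
    }

    int max() {                              // O(1)
        if (max < 0) throw new IllegalStateException("empty");
        return max;
    }

    void removeMax() {                       // O(RANGE) worst case; cheap when buckets are dense
        if (max < 0) throw new IllegalStateException("empty");
        if (--counts[max] == 0) {
            while (max >= 0 && counts[max] == 0) max--; // slide to the next nonempty bucket
        }
    }
}
```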

When should I use a TreeMap over a PriorityQueue and vice versa?

Seems they both let you retrieve the minimum, which is what I need for Prim's algorithm, and force me to remove and reinsert a key to update its value. Is there any advantage of using one over the other, not just for this example, but generally speaking?
Generally speaking, it is less work to track only the minimum element, using a heap.
A tree is more organized, and it requires more computation to maintain that organization. But if you need to access any key, and not just the minimum, a heap will not suffice, and the extra overhead of the tree is justified.
There are 2 differences I would like to point out (and this may be more relevant to Difference between PriorityQueue and TreeSet in Java?, as that question is deemed a dup of this one).
(1) PriorityQueue can have duplicates, whereas TreeSet cannot. So in TreeSet, if your comparator deems 2 elements equal, TreeSet will keep only one of them and throw away the other.
(2) A TreeSet iterator traverses the collection in sorted order, whereas a PriorityQueue iterator does NOT. For a PriorityQueue, if you want the items in sorted order, you have to drain the queue by calling remove() repeatedly.
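Both differences are easy to observe (the exact heap layout printed is implementation-dependent, so treat the first output as merely "not sorted in general"):

```java
import java.util.PriorityQueue;
import java.util.TreeSet;

class OrderAndDupsDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        TreeSet<Integer> ts = new TreeSet<>();
        for (int v : new int[] {5, 1, 4, 1, 3}) { pq.offer(v); ts.add(v); }

        System.out.println(pq); // e.g. [1, 1, 4, 5, 3]: heap order, duplicates kept
        System.out.println(ts); // [1, 3, 4, 5]: sorted, duplicate 1 discarded

        while (!pq.isEmpty()) System.out.print(pq.poll() + " "); // 1 1 3 4 5: sorted only by draining
    }
}
```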
I am assuming that the discussion is limited to Java's implementation of these data structures.
Totally agree with Erickson that a priority queue only gives you the minimum/maximum element.
In addition, because the priority queue is less powerful at maintaining the total order of the data, it has the advantage in some special cases. If you want to track the biggest M elements in an array of N, the time complexity is O(N log M) and the space complexity O(M). If you do it with a map, the time complexity is O(N log N) and the space O(N). This is quite fundamental, since in some cases we must use a priority queue, for example when M is just a constant like 10.
A rule of thumb:
TreeMap maintains all elements in order. (So, intuitively, it takes time to construct.)
PriorityQueue only guarantees the min or max. It's less expensive but less powerful.
It all depends on what you want to achieve. Here are the main points to consider before choosing one over the other.
PriorityQueue allows duplicates (i.e., elements with the same priority), while TreeMap doesn't.
Insertion into a PriorityQueue is O(log n), degrading to O(n) when it has to grow its backing array, while TreeMap insertion is O(log n) (it is based on a red-black tree).
PriorityQueue is backed by an array, while TreeMap nodes are linked to each other, so PriorityQueue's contains() takes O(n) time while TreeMap's takes O(log n).
One of the differences is that remove(Object) and contains(Object) are linear, O(n), in a normal heap-based PriorityQueue (like Oracle's), but O(log n) for a TreeSet/TreeMap.
So if you have a large number of elements and do a lot of remove(Object) or contains(Object) calls, then a TreeSet/TreeMap may be faster.
I may be late to this answer, but still.
They have their own use cases, in which one of them is a clear winner.
For example:
1. https://leetcode.com/problems/my-calendar-i - TreeMap is the one you are looking for.
2. https://leetcode.com/problems/top-k-frequent-words - you don't need the overhead of keys and values.
So my answer would be: look at the use case and see whether it can be done without keys and values; if yes, go for PriorityQueue, else move to TreeMap.
It depends on how you implement your priority queue. According to Cormen's book (2nd ed.), the fastest result is with a Fibonacci heap.
I find TreeMap to be useful when there is a need to do something like:
find the minimal/least key that is greater than or equal to some value, using ceilingKey()
find the maximal/greatest key that is less than or equal to some value, using floorKey()
If the above is not required, and it's mostly about having a quick way to retrieve the min/max, PriorityQueue might be preferred.
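A small illustration of those navigation calls (class name and sample data are illustrative):

```java
import java.util.TreeMap;

class NavigationDemo {
    public static void main(String[] args) {
        TreeMap<Integer, String> m = new TreeMap<>();
        m.put(10, "a"); m.put(20, "b"); m.put(30, "c");

        System.out.println(m.ceilingKey(15)); // 20: least key >= 15
        System.out.println(m.floorKey(25));   // 20: greatest key <= 25
        System.out.println(m.firstKey());     // 10: the minimum key
    }
}
```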
Their difference in time complexity is stated clearly in Erickson's answer.
On space complexity: although a heap and a TreeMap both take O(n) space, building them in actual programs takes different amounts of space and effort.
Say you have an array of numbers: you can build a heap in place in O(n) time with constant extra space. If you build a TreeMap from the given array, you need O(n log n) time and O(n) extra space.
One more thing to take into consideration: PriorityQueue offers an API (peek()) that returns the max/min value without removing it, in O(1) time, while for a TreeMap this still costs you O(log n).
This can be a clear advantage in read-only scenarios where you are only interested in the top-end value.
