In what order does a minheap sort? - java

I've been reading various definitions on minHeap and maxHeap. I stumbled upon statements which say:
minHeap is used to sort in descending order.
maxHeap is used to sort in ascending order.
Statements taken from the "Note" in https://www.geeksforgeeks.org/heap-sort-for-decreasing-order-using-min-heap/ .
But when I implement minHeap using PriorityQueue<Integer> in Java with default comparator, and poll() it, I get the minimum element. Why is that?
Thanks to anybody who's trying to help :).

The explanation in the blog is correct.
If you take a close look at the heapSort() function, it has smartly made use of a min-heap. The smallest element of the array gets swapped with the last element, and the size of the heap is reduced by 1 before heapify() is called on it again.
arr[0] -> represents the smallest element.
In every iteration, for i from n-1 down to 0, arr[0] is swapped with arr[i] and the heap is heapified again with a size one smaller than in the previous iteration.

Min-heaps and max-heaps don't sort. You can use a min-heap or max-heap to sort, but in standard use, heaps aren't sorted. Instead, they're arranged in what's called heap order. This arrangement makes it efficient to add items, and to remove the smallest (or largest), while keeping the data in the proper order.
For example, here's an illustration of a min-heap:
       1
     3   2
    4 7 6 5
That follows the rule that no child is smaller than its parent. The resulting array representation of that heap is [1,3,2,4,7,6,5]. Note that there are other valid heaps with those same numbers. For example, both of the below are valid as well:
       1                  1
     2   5              2   3
    4 3 6 7            4 5 6 7
The corresponding array representations are [1,2,5,4,3,6,7] and [1,2,3,4,5,6,7].
Max-heap is similar, except the rule is that no child can be larger than its parent.
The Wikipedia article on binary heap explains this all very well.
Now, when you're talking about using heap sort, you build a heap and then repeatedly swap the root element with the last element in the array, reduce the count and then re-heapify. So you build the sorted array from back to front. If you use a min-heap, then the root (smallest value) will be at the end of the array.
So if you want to sort in descending order with heap sort, you use a min-heap.
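Here is a minimal sketch of that idea in Java (my own illustration, not the code from the question or the linked article): an in-place heap sort built on a min-heap, which leaves the array in descending order.
// In-place heap sort that yields DESCENDING order by using a min-heap.
static void heapSortDescending(int[] arr) {
    int n = arr.length;
    // Build a min-heap: after this loop the smallest element is at arr[0].
    for (int i = n / 2 - 1; i >= 0; i--) {
        siftDown(arr, n, i);
    }
    // Repeatedly move the current minimum to the end of the shrinking heap.
    for (int i = n - 1; i > 0; i--) {
        int tmp = arr[0];
        arr[0] = arr[i];
        arr[i] = tmp;
        siftDown(arr, i, 0); // heap size is now i
    }
}

// Restore the min-heap property for the subtree rooted at i (heap size = size).
static void siftDown(int[] arr, int size, int i) {
    while (true) {
        int smallest = i;
        int left = 2 * i + 1;
        int right = 2 * i + 2;
        if (left < size && arr[left] < arr[smallest]) smallest = left;
        if (right < size && arr[right] < arr[smallest]) smallest = right;
        if (smallest == i) return;
        int tmp = arr[i];
        arr[i] = arr[smallest];
        arr[smallest] = tmp;
        i = smallest;
    }
}
For example, heapSortDescending(new int[]{4, 7, 1, 6, 3}) leaves the array as [7, 6, 4, 3, 1].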

The basic idea is to sort in place. Every time the algorithm polls from the heap, the heap's size shrinks by one, so one place at the end of the array is no longer part of the heap. The smallest remaining element is placed at that index, which is why the back of the array fills up with the smallest values first and the final order is descending.
// One by one extract an element from the heap
for (int i = n - 1; i >= 0; i--) {
    // Move the current root (the minimum) to the end
    int tmp = arr[0];
    arr[0] = arr[i];
    arr[i] = tmp;
    // Re-heapify the reduced heap (size i, rooted at index 0)
    heapify(arr, i, 0);
}
So the explanation for why your algorithm using a PriorityQueue behaves differently is that you use a separate array for output. You could either switch to a max-heap and stick to your approach, or fill the output array in reverse. Both should produce the same behavior, as shown below.
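For example, with Java's PriorityQueue (a min-heap by default), polling naturally yields ascending order; to get descending output you can fill the output array from the back, or build the queue with Collections.reverseOrder(). A small sketch of both options (assuming java.util.PriorityQueue and java.util.Collections):
int[] input = {4, 7, 1, 6, 3};

// Option 1: min-heap, fill the output array from the back.
PriorityQueue<Integer> minHeap = new PriorityQueue<>();
for (int x : input) minHeap.offer(x);
int[] descending = new int[input.length];
for (int i = input.length - 1; i >= 0; i--) {
    descending[i] = minHeap.poll(); // the smallest values go to the end
}
// descending is now [7, 6, 4, 3, 1]

// Option 2: max-heap via a reversed comparator, fill from the front.
PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
for (int x : input) maxHeap.offer(x);
for (int i = 0; i < input.length; i++) {
    descending[i] = maxHeap.poll(); // largest first
}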

Related

Data structure to select number with probability proportional to its value in less than O(n)

I have a set of numbers, [1, 3, 4, 5, 7]. I want to select a number from the set with a probability proportional to its value:
Number   Probability   %
1        1/20           5
3        3/20          15
4        4/20          20
5        5/20          25
7        7/20          35
However, I want to be able to both update and query this set in less than O(n). Are there any data structures that would help me achieve that?
Preferably in Java, if it exists already
You can get O(log n) amortized querying and updating (including removing/inserting elements) using a Binary Indexed Tree, also known as a Fenwick tree. The idea is to use dynamic resizing, which is the same trick used in variable-size arrays and hash tables to get amortized constant time appends. This also implies that you should be able to get O(log n) worst-case bounds using the method from dynamic arrays of rebuilding a second array on the side, but this makes the code significantly longer.
First, we know that given a list of the partial sums of arr, and a random integer in [0, sum(arr)], we can do this in O(log n) time with a binary search. Specifically, if our random integer is r, we want the index of the rightmost partial sum less than or equal to r.
Now, we'll use the technique from this post on Fenwick trees to maintain and query the partial sums. That post is slightly different from yours: they have a fixed set of n keys whose weights can be updated, without new insertions or deletions.
A Fenwick tree is an array that allows you to answer queries about partial sums of a 'base' array in O(log n) time per query, and can be built in O(n) time. In particular, you can
Find the index of the rightmost partial sum of arr less than or equal to r,
Set arr[i] to arr[i]+c for any integer c,
both in O(log n) time.
Start by appending n zeros to arr (it is now half full), and build its Fenwick tree. We can treat 'removing' an element as setting its weight to 0. Inserting an element is done by taking the zero after the rightmost nonzero element in arr as the new element's spot. The removed elements and new elements may eventually cause our array to fill up: if we reach 75% capacity, rebuild our array and Fenwick tree, doubling the array size (pad with zeros on the right) and deleting all the zero-weight elements. If we reach 25% capacity, shrink the array to half size, rebuilding the Fenwick tree as well.
You'll need to maintain arr constantly to be able to rebuild, so all updates must be done on arr and the Fenwick tree. You'll also need a hashmap from array indices to your keys for random selection.
The good part is that you don't need to modify the Fenwick tree internals at all: given a Fenwick tree implementation in Java that supports initialization, array updates and the binary search, you can treat it as a black box. This stops being true if you want worst-case time guarantees: then, you'll need to copy the internal state of the Fenwick tree, piece by piece, which has some complications.
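A minimal Fenwick-tree sketch in Java, treating it as the black box described above (class and method names are my own, and it omits the dynamic-resizing and hashmap bookkeeping):
// Fenwick (Binary Indexed) tree over non-negative integer weights.
// Supports point updates and the "rightmost prefix sum <= r" search in O(log n).
class FenwickSampler {
    private final long[] tree; // 1-indexed internally
    private final int n;

    FenwickSampler(int n) {
        this.n = n;
        this.tree = new long[n + 1];
    }

    // weight[i] += delta (i is 0-based)
    void add(int i, long delta) {
        for (int x = i + 1; x <= n; x += x & -x) {
            tree[x] += delta;
        }
    }

    // sum of weight[0..i], inclusive (i is 0-based)
    long prefixSum(int i) {
        long s = 0;
        for (int x = i + 1; x > 0; x -= x & -x) {
            s += tree[x];
        }
        return s;
    }

    long total() {
        return prefixSum(n - 1);
    }

    // Returns the 0-based index i with prefixSum(i-1) <= r < prefixSum(i);
    // if r is drawn uniformly from [0, total()), index i is chosen with
    // probability weight[i] / total().
    int find(long r) {
        int pos = 0;
        for (int step = Integer.highestOneBit(n); step > 0; step >>= 1) {
            if (pos + step <= n && tree[pos + step] <= r) {
                pos += step;
                r -= tree[pos];
            }
        }
        return pos;
    }
}
Weighted selection over {1, 3, 4, 5, 7} then looks like this: store each weight with add(i, w), draw r = ThreadLocalRandom.current().nextLong(total()), and find(r) returns an index with probability proportional to its weight; a removal is add(i, -currentWeight), and an insertion writes into the next free zero slot.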

For ArrayList resizing, why does multiplying by a factor (for the size of the new array) make it faster? [duplicate]

"Therefore, inserting N elements takes O(N) work total. Each insertion is O(1) on average, each though some insertions take O(N) time in the worst case." This quote is found in Cracking the Coding Interview. I kind of understand this statement even though a little thing about it is irking me. The amortized insertion is O(1) on good days. This simply means that when the resizable array doesn't need to resized, then to insert something is simply O(1). That is clear. But, on a bad day, when we run out of space, we would need O(N) to insert that extra element. However, I don't agree with the statement above when it says some insertion take O(N) in the worst case. Shouldn't it say, ONE insertion take O(N) in the worst case.
To make this more clear, here's an example of what I'm saying.
Say we have a resizable array, but that array is size 4. Now say we insert 5 elements: we would have O(1), O(1), O(1), O(1), but once we get to the last element, we would have to copy all of these into a new array, and that process would give us a cost of O(N).
Can someone please clarify this for me? I don't understand why the book says some cases would take O(N) when we only need to copy all the elements into a new array one time, when we run out of space in our old array.
Think of the cost of N insertions into a resizable array as (I will use tilde notation here):
cost of N insertions = cost of new element insertions + cost of resizes
Cost of new element insertions
This is simply the cost of inserting a new element in the array multiplied by how many times you insert a new element, i.e., N:
cost of new element insertions = 1 * N
Cost of resizes
Imagine you have a 64-cell array. That means the array has been resized as follows:
array size = 1 -> 2 -> 4 -> 8 -> 16 -> 32 -> 64
#resize done = 6
The 64-cell array has been resized 6 times, i.e., log2(64) times.
In general, we now know that for N insertions we will perform about log2(N) resize operations.
But what do we do during each resize? We copy the elements already present in the array into the new, resized array: at resize i, how many elements do we copy? 2^(i-1). With the previous example:
resize number 1 = 1 -> 2: copy 1 elements
resize number 2 = 2 -> 4: copy 2 elements
resize number 3 = 4 -> 8: copy 4 elements
......
resize number 6 = 32 -> 64: copy 32 elements
So:
Cost of resizes = sum(from i=1 to log2(N)) 2^(i-1) = N - 1 elements copied; counting one read and one write per copied element, this is 2(N-1)
Conclusion
cost of N insertions = cost of new element insertions + cost of resizes = N + 2(N-1) ~ 3N
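A small sketch to make that concrete (a toy class of my own, not from the book): it counts element reads and writes while appending with doubling, and for large N the counter comes out just under 3N.
// Toy growable int array that doubles its capacity when full and counts
// element accesses: one write per append, plus a read and a write per copied element.
class CountingList {
    private int[] data = new int[1];
    private int size = 0;
    long accesses = 0;

    void add(int x) {
        if (size == data.length) { // full: resize to twice the capacity
            int[] bigger = new int[2 * data.length];
            for (int i = 0; i < size; i++) {
                bigger[i] = data[i]; // one read + one write
                accesses += 2;
            }
            data = bigger;
        }
        data[size++] = x; // one write for the new element
        accesses += 1;
    }
}

// CountingList list = new CountingList();
// for (int i = 0; i < (1 << 20); i++) list.add(i);
// list.accesses is now 3 * 2^20 - 2, i.e. ~3N for N = 2^20 appends.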
I think it's easier to understand that statement this way.
At first, the array size is just 1, and we insert one element. Now the array is full! You have to resize it to twice the previous size.
Next, the array size is 2. Let this process continue. You can easily see that the sizes at which you have to resize the array are 1, 2, 4, 8, 16, 32, ..., 2^r.
Let me ask you two questions.
How many times do you have to resize the array?
What is the total cost over N (N >= 0) steps?
The first answer is floor(lg N) times; you can figure that out easily, I think. Once you have the first answer, calculating the total cost of these N steps, which is the second answer, is pretty easy.
1 + 2 + 4 + 8 + 16 + ... + 2^(floor(lgN)) = 2^(floor(lgN)+1) - 1 => O(n)
To get the average cost of each step, divide the total cost by N => O(1)
I think the worst case the reference mentions is when the array needs to be resized. The cost of this readjustment is proportional to the number of elements in the array, O(N).
Let's split up all the insertions into "heavy" insertions that take time proportional to the number of elements and "light" insertions that only take a constant amount of time to complete. Then if you start with an empty list and keep appending and appending, you're going to have mostly light insertions, but every now and then you'll have a heavy insertion.
Let's say, for simplicity, that you double the size of the array every time you run out of space and that you start off with an array of size 4. Then the first resize will have to move four elements, the second will move eight, then sixteen, then thirty-two, then sixty-four, then 128, then 256, etc.
Notice that it's not just one single append that takes a long time - roughly speaking, if you have n total insertions, then roughly log n of them will be heavy (the time between them keeps growing) and the other roughly n - log n of them will be light.
std::vector will keep all its elements next to each other in memory for fast iteration - that's its thing, that's what it does, that's why everyone loves std::vector. Typically it will reserve a bit more space than it currently needs, so when you add a new element to the end of the vector it's quick for the vector to put your new element into that spare space.
However, when vector doesn't have space to expand, it can't just leave its existing elements where they are and start a new list somewhere else - all the elements MUST be next to each other in memory! So it must find a free chunk of memory that's big enough for all the elements plus your new one, then copy all the existing elements over there, then add your new element to the end.
If it takes 1 unit of time to add 1 element, it takes N units of time to move N elements, broadly speaking. If you add a new element, that's one operation. If you add a new element and 1024 existing elements need to be relocated, that's 1025 operations. So how long the reallocation takes is proportional to the size of the vector, hence O(N).

Find All Elements in an Array which appears more than N/K times, N is Array Size and k is a Number

Everywhere, the best algorithm given is this:
1) Create a temporary array of size (k-1) to store elements and their counts (the output elements are going to be among these k-1 elements). Following is the structure of the temporary array elements.
2) Traverse through the input array and update temp[] (add/remove an element or increase/decrease its count) for every traversed element. The array temp[] stores the potential (k-1) candidates at every step. This step takes O(nk) time.
3) Iterate through the final (k-1) potential candidates (stored in temp[]). For every element, check if it actually has a count greater than n/k. This step takes O(nk) time.
Resources I referred to are:
1) http://www.geeksforgeeks.org/given-an-array-of-of-size-n-finds-all-the-elements-that-appear-more-than-nk-times/
2) http://algorithms.tutorialhorizon.com/find-all-elements-in-an-array-which-appears-more-than-nk-times-n-is-array-size-and-k-is-a-number/
To me, a better algorithm looks like the one below, which is O(n):
1) Start iterating through the array.
2) For each element, check if the element exists in a hash map. If yes, increase its counter; otherwise add it with a counter of 1.
3) Also, after adding each element, check whether its count is greater than n/k. If yes, print it.
Here the memory complexity is O(n), which is worse than O(k-1), but the time complexity is much better: O(n) compared to O(nk) in the previous algorithm.
So isn't the second algorithm better, considering time complexity and simplicity/readability? Or does the answer depend on where we want to compromise, i.e., on time or memory?
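A short sketch of the hash-map approach described in the question (method and variable names are mine):
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Returns every element that appears more than n/k times.
// O(n) expected time, O(n) extra space.
static Set<Integer> moreThanNOverK(int[] arr, int k) {
    int n = arr.length;
    Map<Integer, Integer> counts = new HashMap<>();
    Set<Integer> result = new HashSet<>();
    for (int x : arr) {
        int c = counts.merge(x, 1, Integer::sum); // add with count 1, or increment
        if (c > n / k) {
            result.add(x); // a Set avoids reporting the same value twice
        }
    }
    return result;
}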

Finding median in unsorted array

I need to take input such as 4 1 6 5 0. The 4 determines how big the array is, and the rest are array elements. The catch is I can't sort it first. I'm at a loss of how to begin.
There is a chapter in MIT's Introduction to Algorithms course (http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-four) dedicated to order statistics. You can find the median in O(N) expected time, O(N^2) worst case.
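A minimal quickselect sketch in Java along those lines (random pivot, expected O(N), worst case O(N^2)); it is my own illustration, not code from the course:
import java.util.Random;

// Returns the k-th smallest element of a (k is 0-based), partially reordering a.
static int quickselect(int[] a, int k) {
    Random rnd = new Random();
    int lo = 0, hi = a.length - 1;
    while (lo < hi) {
        // Partition around a randomly chosen pivot (Lomuto scheme).
        int pivotIndex = lo + rnd.nextInt(hi - lo + 1);
        int pivot = a[pivotIndex];
        swap(a, pivotIndex, hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) {
                swap(a, i, store++);
            }
        }
        swap(a, store, hi); // pivot is now in its final sorted position
        if (store == k) return a[store];
        if (store < k) lo = store + 1; else hi = store - 1;
    }
    return a[lo];
}

static void swap(int[] a, int i, int j) {
    int t = a[i]; a[i] = a[j]; a[j] = t;
}

// For the input 4 1 6 5 0, the array is {1, 6, 5, 0};
// quickselect(new int[]{1, 6, 5, 0}, (4 - 1) / 2) returns 1, the lower median
// (an even-length array has two middle elements).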
I think you could use a sorted list: it takes care of the sorting for you as you insert. Insert the elements, then take the element at index n/2; that's your median.

Array with large number of elements sort only largest n elements [duplicate]

I have an array containing a large number of elements, more than 2,000,000. I need to obtain the highest (or lowest) ranking 300 elements. So as soon as I have the highest (or lowest) 300 elements of the array, I want to return them. Currently Arrays.sort is used on the whole array, then the highest (or lowest) ranking elements are returned.
e.g.: {1,2,3,4,5,6,7,8,9} I want to obtain the highest 3 elements; I need: {9,8,7}
Any suggestions on this one?
EDIT
Best resource found so far, containing a research/comparison of different solutions:
http://www.michaelpollmeier.com/selecting-top-k-items-from-a-list-efficiently-in-java-groovy/
Source code for the article:
https://github.com/mpollmeier/Selection-Algorithms
You can use a partial heapsort. Construct a min-heap of the first 300 elements.
Then, as you traverse the rest of the array, check whether the current element is greater than the root element of the heap. If it is, delete the root element and add the new element.
After you finish with the entire array, your minHeap will have the largest 300 elements.
Extract the root element one by one. The elements will pop out in ascending order.
Note: The heap always contains k (= 300) elements irrespective of the value of N, so heap operations are O(log k) in this case.
Hence the complexity of this algorithm is O(N log k), where N is the size of the array.
Space complexity: O(k)
EDIT:
If you want the lowest 300 elements, a similar algorithm can be followed using a max-heap instead of a min-heap.
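A short sketch of that approach using Java's PriorityQueue as the min-heap (with k = 300 in the question's case; names here are my own):
import java.util.PriorityQueue;

// Returns the k largest values, popped from the heap in ascending order.
// O(N log k) time, O(k) extra space.
static int[] topK(int[] arr, int k) {
    PriorityQueue<Integer> minHeap = new PriorityQueue<>(k);
    for (int x : arr) {
        if (minHeap.size() < k) {
            minHeap.offer(x);
        } else if (x > minHeap.peek()) {
            minHeap.poll();  // drop the smallest of the current top k
            minHeap.offer(x);
        }
    }
    int[] result = new int[minHeap.size()];
    for (int i = 0; i < result.length; i++) {
        result[i] = minHeap.poll(); // ascending order
    }
    return result;
}

// topK(new int[]{1, 2, 3, 4, 5, 6, 7, 8, 9}, 3) returns [7, 8, 9].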
Would this work for you? This example sorts the top 4 elements in the array:
double[] arr = new double[]{1.0,4.0,2.0,8.0,3.0,6.0,7.0,5.0};
int nth = 4; //e.g. - sort the top 4 numbers
Arrays.sort(arr,arr.length-nth-1,arr.length);
System.out.println(Arrays.toString(arr));
Output:
[1.0, 4.0, 2.0, 3.0, 5.0, 6.0, 7.0, 8.0]
Use Arrays.sort(T[], int, int, java.util.Comparator); this will sort the given array (T[]) only in the specified range, from the first int argument (inclusive) to the second int argument (exclusive). The comparator is optional.
