Not understanding median of medians algorithm to find k-th element - java

Below is my code for trying to understand the median of medians algorithm (using blocks of size 5). I understand how to get the medians of the input, but I'm not sure how to code the block so that it keeps recursing on the input until only the median is left. After getting that median, I'm also not sure how to use it as a pivot to partition the input and throw away the useless part. getMediansArray returns an array of size ceil(input.length/5), and getMedian just returns the median of an array (only used on arrays of length <= 5).
public static int[] findKthElement(int[] input, int k) {
int numOfMedians = (int) Math.ceil(input.length/5.0);
int[] medians = new int[numOfMedians];
medians = getMediansArray(input, medians);
// (1) This only gets the first iteration of medians of the
// input. How do I recurse on this until I just have one median?
// (2) how should I partition about the pivot once I get it?
}
public static int[] getMediansArray(int[] input, int[] medians) {
int numOfMedians = (int) Math.ceil(input.length/5.0);
int[] five = new int[5];
for (int i = 0; i < numOfMedians; i++) {
if (i != numOfMedians - 1) {
for (int j = 0; j < 5; j++) {
five[j] = input[(i*5)+j];
}
medians[i] = getMedian(five);
} else {
// the last block holds whatever is left over (between 1 and 5 elements)
int numOfRemainders = input.length - (i*5);
int[] remainder = new int[numOfRemainders];
for (int j = 0; j < numOfRemainders; j++) {
remainder[j] = input[(i*5)+j];
}
medians[i] = getMedian(remainder);
}
}
return medians;
}
public static int getMedian(int[] input) {
Arrays.sort(input);
if (input.length % 2 == 0) {
return (input[input.length/2] + input[input.length/2 - 1]) / 2;
}
return input[input.length/2];
}

Median of medians is basically just an improved version of the quick-select algorithm (http://en.wikipedia.org/wiki/Quickselect). While quick-select has O(n) average time complexity, it can degrade to O(n^2) on unfavourable input.
What you do after finding the median of medians is nothing but one iteration of quick-select. The median of medians has the nice property that it is always larger than roughly 30% of the elements and smaller than roughly 30% of the elements, so each step discards a constant fraction of the input. This guarantees that quick-select using the median of medians as its pivot runs in O(n) worst-case time. Refer to: http://en.wikipedia.org/wiki/Median_of_medians
I suggest you start by implementing quick-select. Once you have done that, you can use the code you already have to select the pivot in each step of quick-select.
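
As a starting point for that suggestion, here is a minimal quick-select sketch. The class and method names (QuickSelectSketch, quickSelect, choosePivotIndex, partition) are made up for illustration, k is assumed to be 0-based, and the pivot rule is a placeholder (the middle element). The idea is that a median-of-medians pivot would later replace choosePivotIndex without touching the rest.

```java
class QuickSelectSketch {
    /** Returns the k-th smallest element (k is 0-based, 0 <= k < a.length); reorders a. */
    static int quickSelect(int[] a, int k) {
        int left = 0, right = a.length - 1;
        while (left < right) {
            int p = partition(a, left, right, choosePivotIndex(a, left, right));
            if (k == p) return a[p];   // the pivot landed exactly on rank k
            if (k < p) right = p - 1;  // answer lies left of the pivot
            else left = p + 1;         // answer lies right of the pivot
        }
        return a[left];
    }

    /** Placeholder pivot rule (middle element); a median-of-medians rule would go here. */
    static int choosePivotIndex(int[] a, int left, int right) {
        return left + (right - left) / 2;
    }

    /** Lomuto partition: puts a[pivotIndex] into its sorted position and returns that position. */
    static int partition(int[] a, int left, int right, int pivotIndex) {
        int pivot = a[pivotIndex];
        swap(a, pivotIndex, right);
        int store = left;
        for (int i = left; i < right; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, right);
        return store;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {9, 1, 8, 2, 7, 3, 6, 4, 5};
        System.out.println(quickSelect(data.clone(), 4)); // prints 5
    }
}
```

With the placeholder pivot this is still O(n^2) in the worst case; only the pivot choice changes when median of medians is plugged in.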

If I remember correctly (refreshing my memory) Median of Medians selects an approximate median. I'm not understanding how it can be used to select the k-th element.

Related

quickSort algorithm using median of medians as a pivot for partitioning

There is a lot of info on StackOverflow, but I couldn't figure it out the way I need it.
I'm trying to implement a quickSelect algorithm with a median of medians, but I can't quite make it work.
The algorithm is supposed to find the i-th smallest element in an input array of n>1 elements using these steps:
1. If n==1, return the element in the array.
2. Divide the n elements of the array into groups of "groupSize" elements, plus one more group of at most "groupSize"-1 elements. The original algorithm uses groups of 5, but I want it to be modular.
3. Find the median of each of the ceiling(n/groupSize) groups by using insertion sort and picking the median of the sorted sub-array.
4. Call Select recursively to find the median "x" of the ceiling(n/groupSize) medians found in step 3. If the number of medians is even, "x" is the lower median.
5. This is the part I found trickiest: partition the input array around the median of medians "x" (possibly with Hoare partition). Set k to the number of elements in the lower part of the partition (counting "x" itself), so that "x" is the k-th smallest element and the upper part of the partition holds n-k elements.
6. If i=k, return "x". If i<k, call Select recursively to find the i-th smallest element in the lower sub-array; if i>k, call Select to find the (i-k)-th smallest element in the upper sub-array.
I don't exactly know how to execute step 5, and I feel this is the part that makes everything work.
This is my Select:
public static int select(int[] array, int left, int right, int groupSize, int ithSmallest) {
if(right-left+1==1) return array[left];
// full steps are the count of full groups in size of groupSize(left of decimal point)
int fullSteps = array.length/groupSize;
int semiSteps = (int)((double)(array.length)%groupSize);
int i;
//array of medians is the size of number of groups needed
int medianArraySize = (int) Math.ceil (((double)(array.length))/groupSize);
int [] medianArray = new int [medianArraySize];
print(medianArray);
// sort the entire array by cutting it to chunks of defined size
for(i=0;i<medianArraySize;i++){
insertionSort(array, i*groupSize,i*groupSize+groupSize);
//place the median of sorted sub-array into median array in place i
medianArray[i]=findMidean(array, i*groupSize, i*groupSize+groupSize);
//implement insertion sort on full groups to find median - ceiling value
}
int medianOfMedians = select(medianArray,0,medianArraySize-1,groupSize,medianArray[(medianArraySize-1)/2]);
//find median of medians without using recursion
// int medianOfMedians = medianArray[(medianArraySize-1)/2];
// System.out.println("median of medians is :" +medianOfMedians);
int pivot = hoarePartition(array,left,right,medianOfMedians);
int k = left-pivot +1;
if (pivot==k-1) return array[pivot];
else
if (pivot<k-1) return select(array,pivot +1,right,groupSize,ithSmallest);
else return select(array,left,pivot-1,groupSize,ithSmallest-k);
}
Helper function - find median:
/**
 * Returns the median value for the designated range in the given array.
 */
private static int findMidean(int [] array, double start,int end){
int stopper = Math.min(array.length,end);
int index =(int)(Math.ceil(start+stopper-1)/2.0);
return array[index];
}
partition algorithm
public static int hoarePartition(int[] arr, int low, int high,int pivot)
{
int i = low, j = high;
while (true) {
// Find leftmost element greater
// than or equal to pivot
while (i<j && arr[i] < pivot){
i++;
}
// Find rightmost element smaller
// than or equal to pivot
while (j>i && arr[j] > pivot){
j--;
}
// If two pointers met.
if (i >= j)
return j;
int temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
// swap(arr[i], arr[j]);
}
}
and insertion sort
public static void insertionSort(int [] arr,int placeHolder,int end)
{
int stopper = Math.min(arr.length,end);
for (int i = placeHolder; i < stopper; ++i) {
int key = arr[i];
int j = i - 1;
/* Move elements of arr[0..i-1], that are
greater than key, to one position ahead
of their current position */
while (j >= 0 && arr[j] > key) {
arr[j + 1] = arr[j];
j = j - 1;
}
arr[j + 1] = key;
}
}
If the Hoare partition scheme is used, the pivot and all elements equal to the pivot can end up anywhere, so the returned index is only a split point and not the pivot's final position. This means the i == k case can't be used; instead, quickselect has to keep recursing until it reaches the base case of a single element.
If the Lomuto partition scheme is used, the pivot is put into its final place and its index is returned, so the i == k case can be used.
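
To make the Lomuto variant concrete, here is a hedged sketch of the whole selection routine, not the asker's code. All names (MomSelectSketch, kthSmallest, select, medianOfMedians, lomuto) are invented for illustration, k is assumed to be a 0-based rank, and the group medians are moved to the front of the range so the recursive call can work in place. Because Lomuto leaves the pivot at its final index, the "pivot is the answer" case can be tested directly.

```java
class MomSelectSketch {
    /** Returns the element of 0-based rank k in a, leaving a itself untouched. */
    static int kthSmallest(int[] a, int k, int groupSize) {
        if (groupSize < 2) throw new IllegalArgumentException("groupSize must be at least 2");
        return select(a.clone(), 0, a.length - 1, k, groupSize);
    }

    /** Places the element of rank k at index k within a[left..right] and returns it. */
    static int select(int[] a, int left, int right, int k, int groupSize) {
        while (left < right) {
            int pivotIndex = medianOfMedians(a, left, right, groupSize);
            int p = lomuto(a, left, right, pivotIndex);
            if (k == p) return a[p];   // Lomuto put the pivot in its final place
            if (k < p) right = p - 1;
            else left = p + 1;
        }
        return a[left];
    }

    /** Returns the index of a pivot chosen as the (lower) median of the group medians. */
    static int medianOfMedians(int[] a, int left, int right, int groupSize) {
        if (right - left < groupSize) {            // one group only: take its median
            insertionSort(a, left, right);
            return left + (right - left) / 2;
        }
        int store = left;                          // next slot for a group median
        for (int start = left; start <= right; start += groupSize) {
            int end = Math.min(start + groupSize - 1, right);
            insertionSort(a, start, end);
            swap(a, start + (end - start) / 2, store++);
        }
        int mid = left + (store - 1 - left) / 2;
        select(a, left, store - 1, mid, groupSize); // median of the medians ends up at mid
        return mid;
    }

    /** Lomuto partition around a[pivotIndex]; returns the pivot's final index. */
    static int lomuto(int[] a, int left, int right, int pivotIndex) {
        int pivot = a[pivotIndex];
        swap(a, pivotIndex, right);
        int store = left;
        for (int i = left; i < right; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, right);
        return store;
    }

    /** Sorts a[lo..hi] (both inclusive) without touching elements outside that range. */
    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i], j = i - 1;
            while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

For example, MomSelectSketch.kthSmallest(new int[]{7, 2, 9, 4, 1}, 2, 5) returns 4. Any groupSize of at least 2 gives a correct result here; the linear worst-case bound is the usual argument for groups of 5.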

Why would a quadratic time algorithm execute faster than a linear time algorithm

Here are two different solutions for finding "Number of subarrays having product less than K", one with runtime O(n) and the other O(n^2). However, the one with O(n^2) finished executing about 4x faster than the one with linear runtime complexity (1s vs 4s). Could someone explain why this is the case, please?
Solution 1 with O(n) runtime:
static long countProductLessThanK(int[] numbers, int k)
{
if (k <= 1) { return 0; }
int prod = 1;
int count = 0;
for (int right = 0, left = 0; right < numbers.length; right++) {
prod *= numbers[right];
while (prod >= k)
prod /= numbers[left++];
count += (right-left)+1;
}
return count;
}
Solution 2 with O(n^2) runtime:
static long countMyWay(int[] numbers, int k) {
long count = 0;
for (int i = 0; i < numbers.length; i++) {
int productSoFar = 1;
for (int j = i; j < numbers.length; j++) {
productSoFar *= numbers[j];
if (productSoFar >= k)
break;
count++;
}
}
return count;
}
Sample main program:
public static void main(String[] args) {
int size = 300000000;
int[] numbers = new int[size];
int bound = 1000;
int k = bound/2;
for (int i = 0; i < size; i++)
numbers[i] = (new Random().nextInt(bound)+2);
long start = System.currentTimeMillis();
System.out.println(countProductLessThanK(numbers, k));
System.out.println("O(n) took " + ((System.currentTimeMillis() - start)/1000) + "s");
start = System.currentTimeMillis();
System.out.println(countMyWay(numbers, k));
System.out.println("O(n^2) took " + ((System.currentTimeMillis() - start)/1000) + "s");
}
Edit1:
The array size I chose in my sample test program has 300,000,000 elements.
Edit2:
array size: 300000000:
O(n) took 4152ms
O(n^2) took 1486ms
array size: 100000000:
O(n) took 1505ms
O(n^2) took 480ms
array size: 10000:
O(n) took 2ms
O(n^2) took 0ms
The numbers you are choosing are uniformly distributed in the range [2, 1001], and you're counting subarrays whose products are less than 500. The probability of finding a large subarray is essentially 0; the longest possible subarray whose product is less than 500 has length 8, and there are only nine possible subarrays of that length (all 2s, and the eight arrays of seven 2s and a 3); the probability of hitting one of those is vanishingly small. Half of the array values are already over 500; the probability of finding even a length-two subarray at a given starting point is less than one-quarter.
So your theoretically O(n²) algorithm is effectively linear with this test. And your O(n) algorithm requires a division at each point, which is really slow; slower than n multiplications for small values of n.
In the first one, you're dividing (slow), multiplying and doing several additions.
In the second one, the heaviest operation is multiplication, and as the first answer says, the algorithm is effectively linear for your test cases.
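
One way to check the "effectively linear" claim is to count how often the inner loop body of the nested-loop solution actually runs. The sketch below is an illustration only: the class and method names (InnerLoopCount, innerIterations) are invented, and the random data mirrors the question's setup with a smaller array. Since every value is at least 2 and 2^9 = 512 >= 500, the inner loop can run at most 9 times per starting index when k = 500.

```java
class InnerLoopCount {
    /** Counts how many times the inner-loop body of the nested-loop solution executes. */
    static long innerIterations(int[] numbers, int k) {
        long iterations = 0;
        for (int i = 0; i < numbers.length; i++) {
            int productSoFar = 1;
            for (int j = i; j < numbers.length; j++) {
                iterations++;
                productSoFar *= numbers[j];
                if (productSoFar >= k) break;
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random(42);
        int n = 1_000_000;
        int[] numbers = new int[n];
        for (int i = 0; i < n; i++) numbers[i] = rnd.nextInt(1000) + 2;
        // Average number of inner iterations per starting index
        System.out.println((double) innerIterations(numbers, 500) / n);
    }
}
```

If the printed ratio stays a small constant as n grows, the nested loops are doing linear work on this kind of input.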

Java: Parallelise loop and merge results for calculating entropy

I have an algorithm that does the following:
Given an array of length n, its goal is to merge certain elements based on some condition (in this case entropy). It calculates the entropy e_all of the entire array and the entropy e_merged of the array in which elements i and i+1 are merged. It does that for each pair of adjacent elements. The pair for which the difference e_all - e_merged is greatest is merged. If there is a merge, the algorithm is applied again to the new array of length n-1.
As you can see, this takes n^2 - 1 iterations in the worst case, and if n is big it might take minutes or even hours to complete.
Therefore I was wondering how I can parallelise this algorithm. Basically it should be able to calculate the entropies on several cores, and when all the elements have been evaluated the results should be merged and a conclusion drawn.
How can I do such a thing? Which pieces of code or ideas must I implement for it to work this way? Or is there a better way?
public double[] applyAlgorithm(double[] array) {
boolean merging = false;
for (int i = 0; i < array.length - 1; i++) {
double[] entropy = getEntropy(array); // returns list of entropy for all adjacent intervals
int idx = 0;
double max = Double.NEGATIVE_INFINITY;
for (int j = 0; j < entropy.length; j++) {
if (entropy[j] > max) {
max = entropy[j];
idx = j;
}
}
if (max > 0) {
array = mergeAdjacentIntervals(array, idx); //merge intervals that have the max entropy, if the entropy is > 0
merging = true;
break;
}
}
if (merging) {
array = applyAlgorithm(array);
}
return array;
}
private double[] getEntropy(double[] array) {
double[] entropy = new double[array.length - 1];
double[] tempArray = new double[array.length - 1];
double baseEntropy = calculateEntropy(array);
for (int i = 0; i < entropy.length; i++) {
tempArray = mergeAdjacentIntervals(array, i);
entropy[i] = baseEntropy - calculateEntropy(tempArray);
}
return entropy;
}
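
One possible direction, sketched under assumptions: the per-pair entropy evaluations in getEntropy are independent of each other, so they could be spread over cores with a parallel IntStream. The bodies of calculateEntropy and mergeAdjacentIntervals below are hypothetical stand-ins (the question does not show them), and the class name ParallelEntropySketch is invented; only the shape of getEntropy is the point.

```java
import java.util.stream.IntStream;

class ParallelEntropySketch {
    /** Hypothetical stand-in: Shannon entropy of the values treated as unnormalised weights. */
    static double calculateEntropy(double[] a) {
        double total = 0;
        for (double v : a) total += v;
        double h = 0;
        for (double v : a) {
            if (v > 0) {
                double p = v / total;
                h -= p * Math.log(p);
            }
        }
        return h;
    }

    /** Hypothetical stand-in: merges elements idx and idx+1 by adding them. */
    static double[] mergeAdjacentIntervals(double[] a, int idx) {
        double[] out = new double[a.length - 1];
        System.arraycopy(a, 0, out, 0, idx);
        out[idx] = a[idx] + a[idx + 1];
        System.arraycopy(a, idx + 2, out, idx + 1, a.length - idx - 2);
        return out;
    }

    /** Entropy difference for every adjacent pair, evaluated in parallel. */
    static double[] getEntropy(double[] array) {
        double baseEntropy = calculateEntropy(array);
        return IntStream.range(0, array.length - 1)
                .parallel()                       // pairs are independent of each other
                .mapToDouble(i -> baseEntropy
                        - calculateEntropy(mergeAdjacentIntervals(array, i)))
                .toArray();                       // results stay in index order
    }
}
```

This only parallelises one round; the outer merge loop stays sequential because each round depends on the previous merge. Whether it pays off depends on how expensive the real calculateEntropy is relative to the overhead of the parallel stream.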

Understanding Kadane's Algorithm for 2-D Array

I'm trying to write a program which solves the maximum subarray problem. I can understand the intuition behind Kadane's Algorithm on a 1-D array as well as the O(N^4) implementation on a 2-D array. However, I am having some trouble understanding the O(N^3) implementation on a 2-D array.
1) Why do we add up the elements with those from the previous rows within the same column?
for (int i = 1; i <= N; i++) {
for (int j = 1; j <= M; j++)
array[i][j] += array[i-1][j];
}
2) I have no understanding of the second part of the algorithm
Tried looking for an explanation on the web but to no avail. Hope to get some help here!
Thanks in advance!
You know how to compute maximum sum sub-array on a 1D array using Kadane's algorithm. Now we want to extend this algorithm for the 2D array. For an O(N^3) algorithm, we have an intuition. If we somehow create N^2 sub problems and then try to run our O(N) Kadane's algorithm, we can solve the maximum sub array problem.
So basically how we create the N^2 sub problems is by iterating over all the top and bottom rows of the matrix. Then we try to find the optimal columns between which the sub array exists by applying kadane's 1D algorithm. We thus sum the numbers between these two rows column wise and then apply kadane's 1D algorithm on this newly formed 1D array.
But we have a problem here. Computing the column sums from scratch for each of the O(n^2) pairs of top and bottom rows takes O(n^2) work per pair, which is O(n^4) in total. This bottleneck can be overcome by replacing each element of the matrix with the sum of itself and all the numbers above it in its column. Now the column sums between any two rows can be found in O(n) time by subtracting the appropriate rows of the modified matrix.
The Java code (it overwrites the input matrix with its column prefix sums):
static int kadane2D(int[][] array) {
    int N = array.length, M = array[0].length;
    // Replace each element with the sum of itself and
    // all the numbers above it in its column
    for (int i = 1; i < N; i++) {
        for (int j = 0; j < M; j++) {
            array[i][j] += array[i-1][j];
        }
    }
    int ans = Integer.MIN_VALUE; // maximum sub-matrix sum found so far
    for (int top = 0; top < N; top++) {
        for (int bottom = top; bottom < N; bottom++) {
            // loop over all the O(N^2) pairs of rows
            int[] sums = new int[M];
            // store the column sums of rows top..bottom in the sums array
            for (int i = 0; i < M; i++) {
                if (top > 0) {
                    sums[i] = array[bottom][i] - array[top-1][i];
                } else {
                    sums[i] = array[bottom][i];
                }
            }
            // O(M) time to run 1D Kadane's on this sums array
            ans = Math.max(ans, kadane1d(sums));
        }
    }
    return ans;
}
// The usual 1D Kadane's algorithm (works for all-negative input too)
static int kadane1d(int[] a) {
    int best = a[0], current = a[0];
    for (int i = 1; i < a.length; i++) {
        current = Math.max(current + a[i], a[i]);
        best = Math.max(best, current);
    }
    return best;
}
For people who understand Kadane's 1D algorithm, the code below should be easy to follow. We reduce the 2D matrix to 1D arrays by using column-wise prefix sums over the rows, and for each pair of top and bottom rows we apply Kadane's 1D algorithm to the resulting array of column sums.
Just posting the working Python code:
class Kadane2D:
    def maxSumRetangle(self, grid):
        def kadane1D(arr):
            curmax, maxsofar = 0, float('-inf')
            for a in arr:
                curmax = max(a, curmax + a)
                maxsofar = max(curmax, maxsofar)
            return maxsofar

        m, n, ans = len(grid), len(grid[0]), float('-inf')
        colCum = [[0] * n]
        for row in grid:
            colCum.append([pre + now for pre, now in zip(colCum[-1], row)])
        for top in range(1, m + 1):
            for bottom in range(top, m + 1):
                sums = [b - t for b, t in zip(colCum[bottom], colCum[top - 1])]
                ans = max(ans, kadane1D(sums))
        return ans


grid = [[1, 2, -3], [3, 4, -6]]
assert Kadane2D().maxSumRetangle(grid) == 10
grid = [[1, 2, -1, -4, -20],
        [-8, -3, 4, 2, 1],
        [3, 8, 10, 1, 3],
        [-4, -1, 1, 7, -6]]
assert Kadane2D().maxSumRetangle(grid) == 29
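I know it's an old question, but Google doesn't turn up the right answers, or they're overcomplicated.
A simpler working example in O(N^2) follows. Note that it runs 1D Kadane on each row separately, so it only finds the best sub-array within a single row, not a rectangle spanning several rows: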
/**
* Kadane 1d
* @return max sum
*/
public static int maxSum(int[] a) {
int result = a[0]; //get first value for correct comparison
int sum = a[0];
for (int i = 1; i < a.length; i++) {
sum = Math.max(sum + a[i], a[i]); //first step getting max sum, temporary value
result = Math.max(result, sum);
}
return result;
}
/**
* Kadane 2d
* @param array
* @return max sum
*/
public static int maxSum2D(int array[][]){
int result = Integer.MIN_VALUE; //result max sum
for (int i = 0; i < array.length; i++) {
int sum = maxSum(array[i]);
result = Math.max(result, sum);
}
return result;
}
Full examples:
Easy: https://pastebin.com/Qu1x0TL8
Supplemented: https://pastebin.com/Tjv602Ad
With indexes: https://pastebin.com/QsgPBfY6

Selection: Median of medians

As homework I was assigned to write an algorithm that finds the k-th smallest number in an unordered set of numbers. The median of medians algorithm was presented as an approach.
Unfortunately, my attempt has failed. If anyone spots a mistake, please correct me.
private int find(int[] A, int size, int k) {
if (size <= 10) {
sort(A, 0, size);
return A[k];
} else {
int[] M = new int[size/5];
for (int i = 0; i < size / 5; i++) {
sort(A, i*5, (i+1) * 5);
M[i] = A[i*5 + 2];
}
int m = find(M, M.length, M.length / 2);
int[] aMinus = new int[size];
int aMinusIndex = 0;
int[] aEqual = new int[size];
int aEqualIndex = 0;
int[] aPlus = new int[size];
int aPlusIndex = 0;
for (int j = 0; j < size; j++) {
if (A[j] < m) {
aMinus[aMinusIndex++] = A[j];
} else if (A[j] == m) {
aEqual[aEqualIndex++] = A[j];
} else {
aPlus[aPlusIndex++] = A[j];
}
}
if (aMinusIndex <= k) {
return find(aMinus, aMinusIndex, k);
} else if (aMinusIndex + aEqualIndex <= k) {
return m;
} else {
return find(aPlus, aPlusIndex, k - aMinusIndex - aEqualIndex);
}
}
}
private void sort(int[] t, int begin, int end) { //simple insertion sort
for (int i = begin; i < end; i++) {
int j = i;
int element = t[i];
while ((j > begin) && (t[j - 1] > element)) {
t[j] = t[j - 1];
j--;
}
t[j] = element;
}
}
The test I'm running is to put in the numbers {200, 199, 198, ..., 1} and get the 1st number of the ordered array. I'm getting:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -13
Which is thrown at return A[k] line, because of recursive call:
return find(aPlus, aPlusIndex, k - aMinusIndex - aEqualIndex);
Your branching logic for the recursion step is backwards. You're trying to find the kth smallest number, and you've found that there are aMinusIndex numbers smaller than m, aEqualIndex equal to m, and aPlusIndex larger than m.
You should be searching in aMinus if aMinusIndex >= k, not if aMinusIndex <= k -- and so on.
(See this easily by looking at the extreme case: say there are zero numbers smaller than m. Then clearly you should not be searching for anything in an empty array, but because 0 <= k, you will be.)
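
A hedged sketch of the corrected branching, assuming k is a 0-based rank (which the base case return A[k] suggests), so the comparisons become strict: search the smaller elements when k < aMinusIndex, return m when k falls among the equal elements, and search the larger elements otherwise. The class name ThreeWaySelectSketch and the shortened variable names are invented, and unlike the original the groups are sorted in copies rather than in place.

```java
import java.util.Arrays;

class ThreeWaySelectSketch {
    /** Returns the element of 0-based rank k among the first size entries of a. */
    static int find(int[] a, int size, int k) {
        if (size <= 10) {
            int[] copy = Arrays.copyOf(a, size);
            Arrays.sort(copy);
            return copy[k];
        }
        // Medians of the full groups of five; a trailing partial group is
        // skipped for the pivot choice only, it is still partitioned below.
        int[] medians = new int[size / 5];
        for (int i = 0; i < medians.length; i++) {
            int[] group = Arrays.copyOfRange(a, i * 5, i * 5 + 5);
            Arrays.sort(group);
            medians[i] = group[2];
        }
        int m = find(medians, medians.length, medians.length / 2);

        int[] less = new int[size];
        int[] greater = new int[size];
        int nLess = 0, nEqual = 0, nGreater = 0;
        for (int j = 0; j < size; j++) {
            if (a[j] < m) less[nLess++] = a[j];
            else if (a[j] == m) nEqual++;
            else greater[nGreater++] = a[j];
        }
        if (k < nLess) return find(less, nLess, k);   // answer is among the smaller elements
        if (k < nLess + nEqual) return m;             // answer is the pivot value itself
        return find(greater, nGreater, k - nLess - nEqual);
    }
}
```

With the input {200, 199, ..., 1}, find(A, 200, 0) returns 1 under this sketch, and the index passed down can no longer become negative because the third branch is only taken when k >= nLess + nEqual.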
I don't know exactly what your problem is, but you definitely should not be doing this:
sort(A, i*5, (i+1) * 5);
Also, you shouldn't do so much copying, you don't gain any performance when you do that. The algorithm is supposed to be done in place.
Check this wikipedia: Selection algorithm
I understand that this is homework, so your options might be constrained, but I don't see how the Median of Medians is all that useful here. Just sort the entire array using a standard algorithm, and pick the kth element. Median of medians helps find a very good pivot for the sort. For data of 200 length, you aren't going to save much time.
So far as I know, you can't accurately obtain a median, or a percentile, or the kth element, without ultimately sorting the entire input array. Using subsets yields an estimate. If this is wrong, I'd really like to know, as I recently worked on code to find percentiles in arrays of millions of numbers!
p.s. it could be that I don't completely understand your code...
