Good day SO community,
I am a CS student currently running an experiment combining MergeSort and InsertionSort. It is understood that below a certain threshold, S, InsertionSort has a quicker execution time than MergeSort. Hence, by combining both sorting algorithms, the total runtime should be optimized.
However, after running the experiment many times, using a sample size of 1000 and varying sizes of S, the results of the experiment do not give a definitive answer each time. Here is a picture of the better results obtained (note that half of the time the result is not as definitive):
Now, trying the same algorithm code with a sample size of 3500:
Finally, trying the same algorithm code with a sample size of 500,000 (note that the y-axis is in milliseconds):
Logically, the hybrid MergeSort should be faster when S <= 10, as InsertionSort does not incur recursive overhead. However, the results of my mini experiment say otherwise.
Currently, these are the Time Complexities taught to me:
MergeSort: O(n log n)
InsertionSort:
Best Case: θ(n)
Worst Case: θ(n^2)
Finally, I found an online source (https://cs.stackexchange.com/questions/68179/combining-merge-sort-and-insertion-sort) that states:
Hybrid MergeInsertionSort:
Best Case: θ(n + n log (n/x))
Worst Case: θ(nx + n log (n/x))
I would like to ask if there are results in the CS community that show definitive proof that a hybrid MergeSort algorithm will work better than a normal MergeSort algorithm below a certain threshold, S, and if so, why?
Thank you so much, SO community. It might be a trivial question, but it will really clarify many questions I currently have regarding time complexities :)
Note: I am using Java for the coding of the algorithm, and runtime could be affected by the way Java stores data in memory.
Code in Java:
public static int mergeSort2(int n, int m, int s, int[] arr) {
    int mid = (n + m) / 2, right = 0, left = 0;
    if (m - n <= s)
        return insertSort(arr, n, m);
    else {
        right = mergeSort2(n, mid, s, arr);
        left = mergeSort2(mid + 1, m, s, arr);
        return right + left + merge(n, m, s, arr);
    }
}

public static int insertSort(int[] arr, int n, int m) {
    int temp, comp = 0;
    for (int i = n + 1; i <= m; i++) {
        for (int j = i; j > n; j--) {
            comp++;
            comparison2++;
            if (arr[j] < arr[j - 1]) {
                temp = arr[j];
                arr[j] = arr[j - 1];
                arr[j - 1] = temp;
            }
            else
                break;
        }
    }
    return comp;
}

public static void shiftArr(int start, int m, int[] arr) {
    for (int i = m; i > start; i--)
        arr[i] = arr[i - 1];
}

public static int merge(int n, int m, int s, int[] arr) {
    int comp = 0;
    if (m - n <= s)
        return 0;
    int mid = (n + m) / 2;
    int temp, i = n, j = mid + 1;
    while (i <= mid && j <= m) {
        comp++;
        comparison2++;
        if (arr[i] >= arr[j]) {
            if (i == mid++ && j == m && (arr[i] == arr[j]))
                break;
            temp = arr[j];
            shiftArr(i, j++, arr);
            arr[i] = temp;
            if (arr[i + 1] == arr[i]) {
                i++;
            }
        }
        i++;
    }
    return comp;
}
The example code isn't a conventional merge sort. The merge function shifts elements within the array instead of merging runs between the original array and a temporary working array and back.
I tested top down and bottom up merge sorts and both take about 42 ms == 0.042 seconds to sort 500,000 32 bit integers, versus the apparent results in the graph which are 1000 times slower at about 42 seconds instead of 42 ms. I also tested with 10,000,000 integers and it takes a bit over 1 second to sort.
In the past, using C++, I compared a bottom up merge sort with a hybrid bottom up merge / insertion sort, and for 16 million (2^24 == 16,777,216) 32 bit integers, the hybrid sort was about 8% faster with S == 16. S == 64 was slightly slower than S == 16. Visual Studio std::stable_sort is a variation of bottom up merge sort (the temp array is 1/2 the size of the original array) and insertion sort, and uses S == 32.
For small arrays, insertion sort is quicker than merge sort, due to a combination of cache locality and the fewer instructions needed to sort a small array with insertion sort. For pseudo-random data and S == 16 to 64, insertion sort was about twice as fast as merge sort.
The relative gain diminishes as the array size increases. Considering the effect on bottom up merge sort, with S == 16, only 4 merge passes are optimized. In my test case with 2^24 == 16,777,216 elements, that's 4/24 = 1/6 ~= 16.7% of the number of passes, resulting in about an 8% improvement (so the insertion sort is about twice as fast as merge sort for those 4 passes). The total times were about 1.52 seconds for the merge only sort, and about 1.40 seconds for the hybrid sort, a 0.12 second gain on a process that only takes 1.52 seconds. For a top down merge sort, with S == 16, the 4 deepest levels of recursion would be optimized.
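For reference, here is a minimal sketch of a conventional hybrid top-down merge sort (aux-array merge, insertion sort below a threshold). The class name, helper names, and the S value are my own illustration, not the exact code used for the timings above:

public class HybridMergeSort {
    static final int S = 32;    // threshold below which to use insertion sort (an assumption; tune per machine)

    public static void sort(int[] a) {
        int[] aux = new int[a.length];
        sort(a, aux, 0, a.length - 1);
    }

    private static void sort(int[] a, int[] aux, int lo, int hi) {
        if (hi - lo + 1 <= S) {          // small run: insertion sort, no further recursion
            insertionSort(a, lo, hi);
            return;
        }
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);
        sort(a, aux, mid + 1, hi);
        merge(a, aux, lo, mid, hi);
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int t = a[i], j = i - 1;
            while (j >= lo && a[j] > t) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = t;
        }
    }

    // Conventional merge: copy the run to aux, then merge back into a.
    private static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
        System.arraycopy(a, lo, aux, lo, hi - lo + 1);
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid)              a[k] = aux[j++];
            else if (j > hi)          a[k] = aux[i++];
            else if (aux[j] < aux[i]) a[k] = aux[j++];
            else                      a[k] = aux[i++];
        }
    }
}

With this structure only the base case of the recursion changes, so timing runs isolate the effect of S.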
Update: example Java code for a hybrid in-place merge sort / insertion sort with O(n log(n)) time complexity. (Note: auxiliary storage is still consumed on the stack due to recursion.) The in-place part is accomplished during merge steps by swapping the data in the area merged into with the data in the area merged from. This is not a stable sort (the order of equal elements is not preserved, due to the swapping during merge steps). Sorting 500,000 integers takes about 1/8th of a second, so I increased this to 16 million (2^24 == 16777216) integers, which takes a bit over 4 seconds. Without the insertion sort, the sort takes about 4.524 seconds, and with the insertion sort and S == 64, the sort takes about 4.150 seconds, about an 8.8% gain. With essentially the same code in C, the improvement was less: from 2.88 seconds to 2.75 seconds, about a 4.5% gain.
package msortih;

import java.util.Random;

public class msortih {
    static final int S = 64;    // use insertion sort if size <= S

    static void swap(int[] a, int i, int j) {
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
    }

    // a[w:] = merged a[i:m]+a[j:n]
    // a[i:] = reordered a[w:]
    static void wmerge(int[] a, int i, int m, int j, int n, int w) {
        while (i < m && j < n)
            swap(a, w++, a[i] < a[j] ? i++ : j++);
        while (i < m)
            swap(a, w++, i++);
        while (j < n)
            swap(a, w++, j++);
    }

    // a[w:]   = sorted a[b:e]
    // a[b:e] = reordered a[w:]
    static void wsort(int[] a, int b, int e, int w) {
        int m;
        if (e - b > 1) {
            m = b + (e - b) / 2;
            imsort(a, b, m);
            imsort(a, m, e);
            wmerge(a, b, m, m, e, w);
        }
        else
            while (b < e)
                swap(a, b++, w++);
    }

    // in-place merge sort a[b:e]
    static void imsort(int[] a, int b, int e) {
        int m, n, w, x;
        int t;
        // if <= S elements, use insertion sort
        if (e - b <= S) {
            for (n = b + 1; n < e; n++) {
                t = a[n];
                m = n - 1;
                while (m >= b && a[m] > t) {
                    a[m + 1] = a[m];
                    m--;
                }
                a[m + 1] = t;
            }
            return;
        }
        if (e - b > 1) {
            // split a[b:e]
            m = b + (e - b) / 2;
            w = b + e - m;
            // wsort -> a[w:e] = sorted a[b:m]
            //          a[b:m] = reordered a[w:e]
            wsort(a, b, m, w);
            while (w - b > 2) {
                // split a[b:w], w = new mid point
                n = w;
                w = b + (n - b + 1) / 2;
                x = b + n - w;
                // wsort -> a[b:x] = sorted a[w:n]
                //          a[w:n] = reordered a[b:x]
                wsort(a, w, n, b);
                // wmerge -> a[w:e] = merged a[b:x]+a[n:e]
                //           a[b:x] = reordered a[w:n]
                wmerge(a, b, x, n, e, w);
            }
            // insert a[b:w] into a[b:e] using left shift
            for (n = w; n > b; --n) {
                t = a[n - 1];
                for (m = n; m < e && a[m] < t; ++m)
                    a[m - 1] = a[m];
                a[m - 1] = t;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = new int[16 * 1024 * 1024];
        Random r = new Random(0);
        for (int i = 0; i < a.length; i++)
            a[i] = r.nextInt();
        long bgn, end;
        bgn = System.currentTimeMillis();
        imsort(a, 0, a.length);
        end = System.currentTimeMillis();
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) {
                System.out.println("failed");
                break;
            }
        }
        System.out.println("milliseconds " + (end - bgn));
    }
}
Related
I'm solving Codility questions as practice and couldn't answer one of the questions. I found the answer on the Internet but I don't get how this algorithm works. Could someone walk me through it step-by-step?
Here is the question:
/*
You are given integers K, M and a non-empty zero-indexed array A consisting of N integers.
Every element of the array is not greater than M.
You should divide this array into K blocks of consecutive elements.
The size of the block is any integer between 0 and N. Every element of the array should belong to some block.
The sum of the block from X to Y equals A[X] + A[X + 1] + ... + A[Y]. The sum of empty block equals 0.
The large sum is the maximal sum of any block.
For example, you are given integers K = 3, M = 5 and array A such that:
A[0] = 2
A[1] = 1
A[2] = 5
A[3] = 1
A[4] = 2
A[5] = 2
A[6] = 2
The array can be divided, for example, into the following blocks:
[2, 1, 5, 1, 2, 2, 2], [], [] with a large sum of 15;
[2], [1, 5, 1, 2], [2, 2] with a large sum of 9;
[2, 1, 5], [], [1, 2, 2, 2] with a large sum of 8;
[2, 1], [5, 1], [2, 2, 2] with a large sum of 6.
The goal is to minimize the large sum. In the above example, 6 is the minimal large sum.
Write a function:
class Solution { public int solution(int K, int M, int[] A); }
that, given integers K, M and a non-empty zero-indexed array A consisting of N integers, returns the minimal large sum.
For example, given K = 3, M = 5 and array A such that:
A[0] = 2
A[1] = 1
A[2] = 5
A[3] = 1
A[4] = 2
A[5] = 2
A[6] = 2
the function should return 6, as explained above. Assume that:
N and K are integers within the range [1..100,000];
M is an integer within the range [0..10,000];
each element of array A is an integer within the range [0..M].
Complexity:
expected worst-case time complexity is O(N*log(N+M));
expected worst-case space complexity is O(1), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.
*/
And here is the solution I found with my comments about parts which I don't understand:
public static int solution(int K, int M, int[] A) {
    int lower = max(A); // why lower is max?
    int upper = sum(A); // why upper is sum?
    while (true) {
        int mid = (lower + upper) / 2;
        int blocks = calculateBlockCount(A, mid); // don't I have a specified number of blocks? What do blocks do? Don't get that.
        if (blocks < K) {
            upper = mid - 1;
        } else if (blocks > K) {
            lower = mid + 1;
        } else {
            return upper;
        }
    }
}

private static int calculateBlockCount(int[] array, int maxSum) {
    int count = 0;
    int sum = array[0];
    for (int i = 1; i < array.length; i++) {
        if (sum + array[i] > maxSum) {
            count++;
            sum = array[i];
        } else {
            sum += array[i];
        }
    }
    return count;
}

// returns sum of all elements in an array
private static int sum(int[] input) {
    int sum = 0;
    for (int n : input) {
        sum += n;
    }
    return sum;
}

// returns max value in an array
private static int max(int[] input) {
    int max = -1;
    for (int n : input) {
        if (n > max) {
            max = n;
        }
    }
    return max;
}
So what the code does is use a form of binary search (how binary search works is explained quite nicely here: https://www.topcoder.com/community/data-science/data-science-tutorials/binary-search/, which also uses an example quite similar to your problem), where you search for the minimum sum every block needs to contain. In the example case, you need to divide the array into 3 parts.
When doing a binary search you need to define 2 boundaries between which you are certain your answer can be found. Here, the lower boundary (lower) is the maximum value in the array; for the example this is 5 (the case where you divide your array into 7 blocks). The upper boundary (upper) is 15, the sum of all the elements in the array (the case where you divide the array into 1 block).
Now comes the search part: In solution() you start with your bounds and mid point (10 for the example).
In calculateBlockCount you count (count ++ does that) how many blocks you can make if your sum is a maximum of 10 (your middle point/ or maxSum in calculateBlockCount).
For the example mid of 10 (in the while loop), this is 2 blocks; the code returns this (blocks) to solution(). Then it checks whether it is less or more than K, which is the number of blocks you want. If it's less than K, your midpoint is too high: you're putting too many array elements in your blocks. If it's more than K, your midpoint is too low: you're putting too few array elements in each block.
After this check, it halves the solution space (upper = mid - 1).
This happens every loop; halving the solution space makes it converge quite quickly.
You keep going through the while loop, adjusting mid, until it gives the number of blocks specified in your input K.
So to go though it step by step:
Mid = 10: calculateBlockCount returns 2 blocks.
solution(): 2 blocks < K, so upper -> mid - 1 = 9, mid -> 7 (lower is 5).
Mid = 7: calculateBlockCount returns 2 blocks.
solution(): 2 blocks < K, so upper -> mid - 1 = 6, mid -> 5 (lower is 5; integer division makes it 5).
Mid = 5: calculateBlockCount returns 4 blocks.
solution(): 4 blocks > K, so lower -> mid + 1 = 6, mid -> 6 (lower is 6, upper is 6).
Mid = 6: calculateBlockCount returns 3 blocks.
So the function returns mid = 6.
Hope this helps,
Gl learning to code :)
Edit: when using binary search, a prerequisite is that the solution space is monotonic. That holds here: as the allowed block sum (mid) increases, the number of blocks required never increases.
Your solution seems to have some problems. I rewrote it as below:
class Solution {
    public int solution(int K, int M, int[] A) {
        // write your code in Java SE 8
        int high = sum(A);
        int low = max(A);
        int mid = 0;
        int smallestSum = 0;
        while (high >= low) {
            mid = (high + low) / 2;
            int numberOfBlock = blockCount(mid, A);
            if (numberOfBlock > K) {
                low = mid + 1;
            } else if (numberOfBlock <= K) {
                smallestSum = mid;
                high = mid - 1;
            }
        }
        return smallestSum;
    }

    public int sum(int[] A) {
        int total = 0;
        for (int i = 0; i < A.length; i++) {
            total += A[i];
        }
        return total;
    }

    public int max(int[] A) {
        int max = 0;
        for (int i = 0; i < A.length; i++) {
            if (max < A[i]) max = A[i];
        }
        return max;
    }

    public int blockCount(int max, int[] A) {
        int current = 0;
        int count = 1;
        for (int i = 0; i < A.length; i++) {
            if (current + A[i] > max) {
                current = A[i];
                count++;
            } else {
                current += A[i];
            }
        }
        return count;
    }
}
This is what helped me, in case anyone else finds it helpful.
Think of it as a function: given k (the block count) we get some largeSum.
What is the inverse of this function? It's that given largeSum we get a k. This inverse function is implemented below.
In solution() we keep plugging guesses for largeSum into the inverse function until it returns the k given in the exercise.
To speed up the guessing process, we use binary search.
public class Problem {
    int SLICE_MAX = 100 * 1000 + 1;

    public int solution(int blockCount, int maxElement, int[] array) {
        // maxGuess is determined by looking at what the max possible largeSum could be
        // this happens if all elements are m and the blockCount is 1
        // Math.max is necessary, because blockCount can exceed array.length,
        // but this shouldn't lower maxGuess
        int maxGuess = (Math.max(array.length / blockCount, array.length)) * maxElement;
        int minGuess = 0;
        return helper(blockCount, array, minGuess, maxGuess);
    }

    private int helper(int targetBlockCount, int[] array, int minGuess, int maxGuess) {
        int guess = minGuess + (maxGuess - minGuess) / 2;
        int resultBlockCount = inverseFunction(array, guess);
        // if resultBlockCount == targetBlockCount this is not necessarily the solution
        // as there might be a lower largeSum, which also satisfies resultBlockCount == targetBlockCount
        if (resultBlockCount <= targetBlockCount) {
            if (minGuess == guess) return guess;
            // even if resultBlockCount == targetBlockCount
            // we keep searching for a potential lower largeSum that also satisfies resultBlockCount == targetBlockCount
            // note that the search range below includes 'guess', as this might in fact be the lowest possible solution
            // but we need to check in case there's a lower one
            return helper(targetBlockCount, array, minGuess, guess);
        } else {
            return helper(targetBlockCount, array, guess + 1, maxGuess);
        }
    }

    // think of it as a function: given k (blockCount) we get some largeSum
    // the inverse of the above function is that given largeSum we get a k
    // in solution() we will keep guessing largeSum using binary search until
    // we hit k given in the exercise
    int inverseFunction(int[] array, int largeSumGuess) {
        int runningSum = 0;
        int blockCount = 1;
        for (int i = 0; i < array.length; i++) {
            int current = array[i];
            if (current > largeSumGuess) return SLICE_MAX;
            if (runningSum + current <= largeSumGuess) {
                runningSum += current;
            } else {
                runningSum = current;
                blockCount++;
            }
        }
        return blockCount;
    }
}
From anhtuannd's code, I refactored it using Java 8 streams. It is slightly slower. Thanks, anhtuannd.
IntSummaryStatistics summary = Arrays.stream(A).summaryStatistics();
long high = summary.getSum();
long low = summary.getMax();
long result = 0;
while (high >= low) {
    long mid = (high + low) / 2;
    AtomicLong blocks = new AtomicLong(1);
    Arrays.stream(A).reduce(0, (acc, val) -> {
        if (acc + val > mid) {
            blocks.incrementAndGet();
            return val;
        } else {
            return acc + val;
        }
    });
    if (blocks.get() > K) {
        low = mid + 1;
    } else if (blocks.get() <= K) {
        result = mid;
        high = mid - 1;
    }
}
return (int) result;
I wrote a 100% solution in Python here. The result is here.
Remember: you are searching the set of possible answers, not the array A.
In the example given, they are searching over possible answers: 5 (the largest single element) is the smallest possible max value for a block, and 15 (the sum of [2, 1, 5, 1, 2, 2, 2]) is the largest.
Mid = (5 + 15) // 2. Slicing out blocks of 10 at a time won't create more than 3 blocks in total.
Make 10-1 the upper and try again (5+9)//2 is 7. Slicing out blocks of 7 at a time won't create more than 3 blocks in total.
Make 7-1 the upper and try again (5+6)//2 is 5. Slicing out blocks of 5 at a time will create more than 3 blocks in total.
Make 5+1 the lower and try again (6+6)//2 is 6. Slicing out blocks of 6 at a time won't create more than 3 blocks in total.
Therefore 6 is the lowest limit to impose on the sum of a block that will permit breaking into 3 blocks.
I am working on evaluating the Merge Sort algorithm and counting the critical operations. While technically this is homework, it is from a prior assignment and the code is a scaled down version to only show the area in question. I am trying to better understand my issue and debug my critical operations count to be accurately portrayed.
The project was to implement Merge Sort and evaluate it. The area of concern is counting critical operations and determining the deviation of the count, by randomly filling an array at 10 different sizes, with each size run 50 times (with different random data). My findings were that for each size the number of critical operations always came out the same (e.g. an array of size 10 always came to 68 critical operations regardless of the data), leaving a critical-operations deviation of 0.
The professor stated this was inaccurate and there was something wrong with my program as it should produce different counts for differing data for each array length. I am trying to figure out what in my program is inaccurate causing this issue. I have checked that each pass is producing different array data and that this data is being passed to the sorting algorithm and properly sorted.
Below is the code I wrote which, again, regardless of the data, produces the same critical operations count. Can anyone pinpoint my issue? Regardless of what I do to the count, it always produces the same value.
import java.util.Arrays;

public class MergeSortSingle {
    public static int count = 0;

    private MergeSortSingle() { }

    private static void merge(Comparable[] a, Comparable[] aux, int lo, int mid, int hi) {
        // copy to aux[]
        for (int k = lo; k <= hi; k++) {
            count++;
            aux[k] = a[k];
        }
        // merge back to a[]
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid) {
                count++;
                a[k] = aux[j++];
            }
            else if (j > hi) {
                count++;
                a[k] = aux[i++];
            }
            else if (less(aux[j], aux[i])) {
                count++;
                a[k] = aux[j++];
            }
            else {
                count++;
                a[k] = aux[i++];
            }
        }
    }

    private static void sort(Comparable[] a, Comparable[] aux, int lo, int hi) {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);
        sort(a, aux, mid + 1, hi);
        merge(a, aux, lo, mid, hi);
    }

    public static void sort(Comparable[] a) {
        Comparable[] aux = new Comparable[a.length];
        sort(a, aux, 0, a.length - 1);
    }

    private static boolean less(Comparable v, Comparable w) {
        return v.compareTo(w) < 0;
    }

    private static void show(Comparable[] a) {
        System.out.println("\nAfter Sorting completed:");
        System.out.print(Arrays.toString(a));
        System.out.println();
    }

    public static void main(String[] args) {
        int length = 10;
        Comparable[] a = new Comparable[length];
        for (int w = 0; w < length; w++) {
            a[w] = (int) (Math.random() * 10000 + 1);
        }
        System.out.println("Before Sorting:");
        System.out.print(Arrays.toString(a));
        System.out.println();
        MergeSortSingle.sort(a);
        show(a);
        System.out.println("\nCounter = " + count);
    }
}
Sample Output 1:
Before Sorting:
[9661, 4831, 4865, 3383, 1451, 3029, 5258, 4788, 9463, 8971]
After Sorting completed:
[1451, 3029, 3383, 4788, 4831, 4865, 5258, 8971, 9463, 9661]
Counter = 68
Sample Output 2:
Before Sorting:
[9446, 230, 9089, 7865, 5829, 2589, 4068, 5608, 6138, 372]
After Sorting completed:
[230, 372, 2589, 4068, 5608, 5829, 6138, 7865, 9089, 9446]
Counter = 68
The merge sort code was utilized from:
http://algs4.cs.princeton.edu/22mergesort/Merge.java.html
You are only counting while merging the sub-arrays. You use the counter when copying the array to aux, which is always the same number of operations, and then you use it again in the merge loop, where all four paths increment the counter, so again it increments a fixed number of times. The total therefore depends only on the array size, never on the data.
You have to count comparisons as well, for example in sort: the if (hi <= lo) check is one operation, and when it fails, the work that follows is another. More importantly, only increment the counter where two elements are actually compared, since that is the part that varies with the input.
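As a sketch of the kind of change that makes the count data-dependent (using the merge() from the question, counting only actual element comparisons, not copies):

// merge back to a[] -- count only element comparisons, which vary with the data
int i = lo, j = mid + 1;
for (int k = lo; k <= hi; k++) {
    if (i > mid)        a[k] = aux[j++];   // run exhausted: no element comparison
    else if (j > hi)    a[k] = aux[i++];   // run exhausted: no element comparison
    else {
        count++;                           // exactly one call to less() happens here
        if (less(aux[j], aux[i])) a[k] = aux[j++];
        else                      a[k] = aux[i++];
    }
}

With this counting, inputs where one run exhausts early (e.g. already-sorted data) produce fewer counted operations, so the deviation across random inputs is no longer zero.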
I am trying to implement a solution to find the k-th largest element in a given integer list with duplicates, with O(N*log(N)) average time complexity in Big-O notation, where N is the number of elements in the list.
As per my understanding, merge sort has an average time complexity of O(N*log(N)); however, in my code below I am also using an extra for loop along with the mergesort algorithm to delete duplicates, which I believe violates my O(N*log(N)) requirement. How can I accomplish the task within O(N*log(N)) average time complexity?
public class FindLargest {
    public static void nthLargeNumber(int[] arr, String nthElement) {
        mergeSort_srt(arr, 0, arr.length - 1);
        // remove duplicate elements logic
        int b = 0;
        for (int i = 1; i < arr.length; i++) {
            if (arr[b] != arr[i]) {
                b++;
                arr[b] = arr[i];
            }
        }
        int bbb = Integer.parseInt(nthElement) - 1;
        // printing second highest number among given list
        System.out.println("Second highest number is::" + arr[b - bbb]);
    }

    public static void mergeSort_srt(int array[], int lo, int n) {
        int low = lo;
        int high = n;
        if (low >= high) {
            return;
        }
        int middle = (low + high) / 2;
        mergeSort_srt(array, low, middle);
        mergeSort_srt(array, middle + 1, high);
        int end_low = middle;
        int start_high = middle + 1;
        while ((lo <= end_low) && (start_high <= high)) {
            if (array[low] < array[start_high]) {
                low++;
            } else {
                int Temp = array[start_high];
                for (int k = start_high - 1; k >= low; k--) {
                    array[k + 1] = array[k];
                }
                array[low] = Temp;
                low++;
                end_low++;
                start_high++;
            }
        }
    }

    public static void main(String... str) {
        String nthElement = "2";
        int[] intArray = { 1, 9, 5, 7, 2, 5 };
        FindLargest.nthLargeNumber(intArray, nthElement);
    }
}
Your only problem here is that you don't understand how to do the time analysis. If you have one routine which takes O(n) and one which takes O(n*log(n)), running both takes a total of O(n*log(n)). Thus your code runs in O(n*log(n)) like you want.
To do things formally, we would note that the definition of O() is as follows:
f(x) ∈ O(g(x)) if and only if there exist values c > 0 and y such that f(x) < c*g(x) whenever x > y.
Your merge sort is in O(n*log(n)), which tells us that its running time is bounded above by c1*n*log(n) when n > y1, for some c1, y1. Your duplicate elimination is in O(n), which tells us that its running time is bounded above by c2*n when n > y2, for some c2, y2. Using this, we know that the total running time of the two is bounded above by c1*n*log(n) + c2*n when n > max(y1, y2). We know that c1*n*log(n) + c2*n < c1*n*log(n) + c2*n*log(n) because log(n) > 1, and this, of course, simplifies to (c1+c2)*n*log(n). Thus, the running time of the two together is bounded above by (c1+c2)*n*log(n) when n > max(y1, y2), and so, using c1+c2 as our c and max(y1, y2) as our y, we know that the running time of the two together is in O(n*log(n)).
Informally, you can just know that faster growing functions always dominate, so if one piece of code is O(n) and the second is O(n^2), the combination is O(n^2). If one is O(log(n)) and the second is O(n), the combination is O(n). If one is O(n^20) and the second is O(n^19.99), the combination is O(n^20). If one is O(n^2000) and the second is O(2^n), the combination is O(2^n).
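To make that concrete with a quick, illustrative calculation: at n = 1024, the sort does on the order of n*log2(n) = 1024 * 10 = 10240 steps versus about n = 1024 for the duplicate-removal pass, so the linear pass adds roughly 10% of the work; at n = 1,048,576 (log2(n) = 20) it is about 5%. The lower-order term is absorbed, which is exactly what the formal argument above shows.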
The problem here is your merge routine, where you use an extra inner loop to shift elements. That makes your merge O(n^2), which changes your merge sort's time to O(n^2).
Here is pseudocode for a typical O(n) merge routine:
void merge(int low, int high, int arr[]) {
    int buff[high - low + 1];
    int i = low;
    int mid = (low + high) / 2;
    int j = mid + 1;
    int k = 0;
    while (i <= mid && j <= high) {
        if (arr[i] < arr[j]) {
            buff[k++] = arr[i];
            i++;
        }
        else {
            buff[k++] = arr[j];
            j++;
        }
    }
    while (i <= mid) {
        buff[k++] = arr[i];
        i++;
    }
    while (j <= high) {
        buff[k++] = arr[j];
        j++;
    }
    for (int x = 0; x < k; x++) {
        arr[low + x] = buff[x];
    }
}
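Since your code is in Java, here is the same routine as a Java sketch (allocating the buffer per call for simplicity; in practice you would allocate one auxiliary array up front and reuse it):

static void merge(int low, int high, int[] arr) {
    int mid = (low + high) / 2;
    int[] buff = new int[high - low + 1];
    int i = low, j = mid + 1, k = 0;
    // take the smaller head element from either run
    while (i <= mid && j <= high)
        buff[k++] = (arr[i] < arr[j]) ? arr[i++] : arr[j++];
    // copy whichever run is left over
    while (i <= mid)
        buff[k++] = arr[i++];
    while (j <= high)
        buff[k++] = arr[j++];
    // copy the merged result back
    for (int x = 0; x < k; x++)
        arr[low + x] = buff[x];
}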
The method needs to return the k elements a[i] for which the values ABS(a[i] - val) are the k largest. My code only works for integers greater than val; it fails for integers less than val. Can I do this without importing anything other than java.util.Arrays? Could somebody enlighten me? Any help will be much appreciated!
public static int[] farthestK(int[] a, int val, int k) { // This line should not change
    int[] b = new int[a.length];
    for (int i = 0; i < b.length; i++) {
        b[i] = Math.abs(a[i] - val);
    }
    Arrays.sort(b);
    int[] c = new int[k];
    int w = 0;
    for (int i = b.length - 1; i > b.length - k - 1; i--) {
        c[w] = b[i] + val;
        w++;
    }
    return c;
}
test case:
@Test
public void farthestKTest() {
    int[] a = {-2, 4, -6, 7, 8, 13, 15};
    int[] expected = {15, -6, 13, -2};
    int[] actual = Selector.farthestK(a, 4, 4);
    Assert.assertArrayEquals(expected, actual);
}
There was 1 failure:
1) farthestKTest(SelectorTest)
arrays first differed at element [1]; expected:<-6> but was:<14>
FAILURES!!!
Tests run: 1, Failures: 1
The top k problem can be solved in many ways. In your case you add a new parameter, but it really doesn't matter.
The first and easiest: just sort the array. Time complexity: O(n log n).
public static int[] farthestK(Integer[] a, final int val, int k) {
    Arrays.sort(a, new java.util.Comparator<Integer>() {
        @Override
        public int compare(Integer o1, Integer o2) {
            return -Math.abs(o1 - val) + Math.abs(o2 - val);
        }
    });
    int[] c = new int[k];
    for (int i = 0; i < k; i++) {
        c[i] = a[i];
    }
    return c;
}
The second way: use a heap to keep the max k values. Time complexity: O(n log k).
/**
 * Use a min heap to save the max k values. Time complexity: O(n log k)
 */
public static int[] farthestKWithHeap(Integer[] a, final int val, int k) {
    PriorityQueue<Integer> minHeap = new PriorityQueue<Integer>(4,
            new java.util.Comparator<Integer>() {
                @Override
                public int compare(Integer o1, Integer o2) {
                    return Math.abs(o1 - val) - Math.abs(o2 - val);
                }
            });
    for (int i : a) {
        minHeap.add(i);
        if (minHeap.size() > k) {
            minHeap.poll();
        }
    }
    int[] c = new int[k];
    for (int i = 0; i < k; i++) {
        c[i] = minHeap.poll();
    }
    return c;
}
The third way: divide and conquer, just like quicksort. Partition the array into two parts, and recurse into the part containing the kth element. Time complexity: O(n + k log k).
The code is a little long, so I just provide a link here:
Selection problem.
Sorting the array will cost you O(n log n) time. You can do it in O(n) time using k-selection.
1. Compute an array B, where B[i] = abs(A[i] - val). Then your problem is equivalent to finding the k values farthest from zero in B. Since each B[i] >= 0, this is equivalent to finding the k largest elements in B.
2. Run k-selection on B looking for the (n - k)th element. See Quickselect on Wikipedia for an O(n) expected time algorithm.
3. After k-selection is complete, B[n - k] through B[n - 1] contain the largest elements in B. With proper bookkeeping, you can link back to the elements in A that correspond to them (see pseudocode below).
Time complexity: O(n) time for #1, O(n) time for #2, and O(k) time for #3 => a total time complexity of O(n). (Quickselect runs in O(n) expected time, and there exist complicated worst-case linear time selection algorithms).
Space complexity: O(n).
Pseudocode:
farthest_from(k, val, A):
    let n = A.length

    # Compute B. Elements are objects to
    # keep track of the original element in A.
    let B = array[0 .. n - 1]
    for i between 0 and n - 1:
        B[i] = {
            value: abs(A[i] - val)
            index: i
        }

    # k_selection should know to compare
    # elements in B by their "value";
    # e.g., each B[i] could be java.lang.Comparable.
    k_selection(n - k - 1, B)

    # Use the top elements in B to link back to A.
    # Return the result.
    let C = array[0 .. k - 1]
    for i between 0 and k - 1:
        C[i] = A[B[n - k + i].index]
    return C
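As an illustration of the pseudocode above, here is a Java sketch using plain Quickselect with a random pivot (expected O(n)). It selects on an index array so the results can be linked back to A; all class and method names are mine, and the k results come back in unspecified order:

import java.util.Random;

public class FarthestKSelect {
    private static final Random RNG = new Random();

    public static int[] farthestK(int[] a, int val, int k) {
        int n = a.length;
        int[] idx = new int[n];                  // indices into a, so we can link back to the originals
        for (int i = 0; i < n; i++) idx[i] = i;
        if (k < n) quickselect(idx, a, val, 0, n - 1, n - k);
        int[] c = new int[k];
        for (int i = 0; i < k; i++) c[i] = a[idx[n - k + i]];
        return c;
    }

    private static int key(int[] a, int val, int i) {
        return Math.abs(a[i] - val);             // distance from val; we want the k largest
    }

    // Rearranges idx so idx[target] holds the element of rank target (0-based),
    // with smaller keys to its left and larger-or-equal keys to its right.
    private static void quickselect(int[] idx, int[] a, int val, int lo, int hi, int target) {
        while (lo < hi) {
            int p = partition(idx, a, val, lo, hi);
            if (p == target) return;
            if (p < target) lo = p + 1;
            else hi = p - 1;
        }
    }

    // Lomuto partition with a random pivot.
    private static int partition(int[] idx, int[] a, int val, int lo, int hi) {
        swap(idx, lo + RNG.nextInt(hi - lo + 1), hi);
        int pivot = key(a, val, idx[hi]);
        int store = lo;
        for (int i = lo; i < hi; i++)
            if (key(a, val, idx[i]) < pivot) swap(idx, i, store++);
        swap(idx, store, hi);
        return store;
    }

    private static void swap(int[] x, int i, int j) {
        int t = x[i]; x[i] = x[j]; x[j] = t;
    }
}

Note that the unit test above also assumes a particular output order, which a selection-based approach does not guarantee without a final sort of the k results.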
You can modify this algorithm a little and use it to print the k elements according to your requirement. (That is the only change you will need to make to this algorithm.)
Explore this link.
http://jakharmania.blogspot.in/2013/08/selection-of-kth-largest-element-java.html
This algo uses a selection-sort-based approach to pick out the k-th largest element.
O(n) algorithm, from Wikipedia entry on partial sorting:
Find the k-th smallest element using the linear time median-of-medians selection algorithm. Then make a linear pass to select the elements smaller than the k-th smallest element.
The collection in this case is created by taking the original array, subtracting the given value, taking the absolute value, (and then negating it so that largest becomes smallest).
I am trying to code quicksort in two ways: one in-place, and the other using separate arrays. I'm kind of stuck on some of the logic; take a look at what I have. Thanks for the help in advance!
public List<Integer> sort(List<Integer> arr) {
    if (arr.length > 0)
        List<Integer> ret = new ArrayList<Integer>();
    ret = quickSort(arr);
    return ret;
}

public List<Integer> quickSort(List<Integer> arr) {
    if (arr.length < 2)
        return;
    int pivot = arr[0];
    List<Integer> left = new ArrayList<Integer>();
    List<Integer> right = new ArrayList<Integer>();
    for (int i = 0; i < arr.length; i++) {
        if (arr[i] <= pivot)
            left.add(arr[i]);
        else
            right.add(arr[i]);
    }
    quickSort(left);
    quickSort(right);
}
Now I'm stuck. I don't know what to do after recursively going through both sets; I'm mostly stuck on how to connect them together and return a sorted list.
You need to combine the left and right sequences together. You need to do it at the end of your algorithm (before the closing }). In pseudocode:
int leftpos = 0, rightpos = 0;
List newlist = new ArrayList();
for (int pos = 0; pos < arr.length; pos++)
    if (left[leftpos] < right[rightpos]) newlist.add(left[leftpos++]);
    else newlist.add(right[rightpos++]);
return newlist;
This is just a pseudo-code. You need to add code to check lengths of each array (left and right) in the for cycle.
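Filled out in Java with those bounds checks, as a sketch against the List<Integer> types from the question (and assuming left and right have been replaced by their sorted versions, e.g. left = quickSort(left)):

// Merge the two sorted partitions, guarding against running off either list.
List<Integer> newlist = new ArrayList<Integer>();
int leftpos = 0, rightpos = 0;
while (leftpos < left.size() && rightpos < right.size()) {
    if (left.get(leftpos) <= right.get(rightpos))
        newlist.add(left.get(leftpos++));
    else
        newlist.add(right.get(rightpos++));
}
while (leftpos < left.size())
    newlist.add(left.get(leftpos++));
while (rightpos < right.size())
    newlist.add(right.get(rightpos++));
return newlist;

In fact, because everything in left is <= pivot and everything in right is greater, simply appending right after left would also produce a sorted result here.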
Also I must note that this is far from quicksort. So many new array allocations make the algorithm extremely slow and that's unwelcome when sorting.
Also, the right side of line 3 is redundant. You don't need to allocate anything there, as it is overwritten in the next line. I would simply replace your lines 3-5 with this:
return quickSort(arr);
Let me have a crack at this for you.
First off, you always want to do in-place sorting unless you're working with linked lists (and even then it usually pays to convert to an array, sort in place, then convert back to a linked list -- it puts way less pressure on the garbage collector). .NET List<>s are actually expanding arrays.
Next, quicksort is really all about the pivot operation. Here's one way to do it:
// Quicksort the sub-array xs[lo..hi].
void QSort(int[] xs, int lo, int hi) {
    if (hi <= lo) return;  // Don't sort empty or singleton sub-arrays.
    var p = [choose some pivot value from xs[lo..hi]];
    var a = lo;  // Invariant: xs[lo..a - 1] <= p.
    var z = hi;  // Invariant: p < xs[z + 1..hi].
    while (a <= z) {
        if (xs[a] <= p) a++; else Swap(xs, a, z--);
    }
    QSort(xs, lo, a - 1);  // Sort the items <= p.
    QSort(xs, z + 1, hi);  // Sort the items > p.
}

void Swap(int[] xs, int i, int j) {
    var tmp = xs[i];
    xs[i] = xs[j];
    xs[j] = tmp;
}
Simple implementation in Groovy:
def qs(list) {
    if (list.size() < 2) return list
    def pivot = list[0]
    def items = list.groupBy { it <=> pivot }.withDefault { [] }
    qs(items[-1]) + items[0] + qs(items[1])
}