I've been trying to add median-of-three pivot selection to this quicksort implementation. I know this may not be the best implementation of quicksort, but unfortunately I am forced to work with it.
public static <E extends Comparable<E>> void quickSort(E[] list) {
    quickSort(list, 0, list.length - 1);
}

private static <E extends Comparable<E>> void quickSort(E[] list, int first, int last) {
    if (last > first) {
        int pivotIndex = partition(list, first, last);
        quickSort(list, first, pivotIndex - 1);
        quickSort(list, pivotIndex + 1, last);
    }
}

private static <E extends Comparable<E>> int partition(E[] list, int first, int last) {
    E pivot = list[first];
    int low = first + 1;
    int high = last;
    while (high > low) {
        while (low <= high && list[low].compareTo(pivot) <= 0) {
            low++;
        }
        while (low <= high && list[high].compareTo(pivot) > 0) {
            high--;
        }
        if (high > low) {
            E temp = list[high];
            list[high] = list[low];
            list[low] = temp;
        }
    }
    while (high > first && list[high].compareTo(pivot) >= 0) {
        high--;
    }
    if (pivot.compareTo(list[high]) > 0) {
        list[first] = list[high];
        list[high] = pivot;
        return high;
    } else {
        return first;
    }
}
The first thing I did was alter it to work with generic arrays. Now I need to set the pivot to the median of the first three values in the list array.
I understand how to get the median of the first three values. What I don't understand is how it affects the way this quicksort implementation works.
After setting the pivot to the median value, how does that affect the forward and backward searches? In the code shown, low is set to the "left" element incremented by 1. Would I increment the pivot index by 1 in my particular case? Can somebody explain the logic behind the particular median-of-three I am trying to implement?
Usually, with a Lomuto-type scheme like the one in the example code, you compare the [low], [middle], and [high] elements and swap as needed so that the median of the three ends up at array[low]. This prevents worst-case behavior when sorting an already sorted or reverse-sorted array. Using the median of the first three values would not prevent the worst case for ordered or reverse-ordered data.
With a Hoare-type partition scheme, this is done by setting the pivot index to the middle of the array and swapping with [low] and/or [high] as needed, so that the median of the three (low, middle, high) elements ends up at array[pivot].
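For the code in the question, where the pivot is read from list[first], a minimal sketch of that median-of-three step might look like this (the helper names medianOfThree and swap are mine, not from the original code). It orders the first, middle, and last elements so the median lands at list[first]; partition and the low = first + 1 initialization then work unchanged, because list[first] again holds the pivot:

// Sketch: place the median of list[first], list[mid], list[last] at
// list[first] so the existing partition can keep using it as the pivot.
// Call this just before partition in the recursive quickSort.
private static <E extends Comparable<E>> void medianOfThree(E[] list, int first, int last) {
    int mid = first + (last - first) / 2;
    // Order the three sampled elements: list[first] <= list[mid] <= list[last]
    if (list[mid].compareTo(list[first]) < 0) swap(list, first, mid);
    if (list[last].compareTo(list[first]) < 0) swap(list, first, last);
    if (list[last].compareTo(list[mid]) < 0) swap(list, mid, last);
    // The median is now at list[mid]; move it into the pivot slot.
    swap(list, first, mid);
}

private static <E> void swap(E[] list, int i, int j) {
    E temp = list[i];
    list[i] = list[j];
    list[j] = temp;
}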
I am trying to develop a faster way than what I currently have to add an element to a sorted array list. This is my current strategy:
public void insertSorted(E value) {
    add(value);
    for (int i = size() - 1; i > 0 && value.compareTo(get(i - 1)) < 0; i--) {
        this.swap(i);
    }
}
and my add method...
public void add(E element) {
    ensureCapacity();
    array[size++] = element;
}
I read that using binary search I could find the insertion position even more efficiently.
I tried implementing that, but somehow it always outputs 0.
private int binarySearch(E value) {
    int low = 0;
    int high = this.size() - 1;
    while (low <= high) {
        int mid = (low + high) / 2; // note: low + (high - low) / 2 avoids overflow on huge lists
        E midVal = this.get(mid);
        int cmp = midVal.compareTo(value);
        if (cmp < 0)
            low = mid + 1;
        else if (cmp > 0)
            high = mid - 1;
        else
            return mid;
    }
    return low; // not found: low is the insertion point
}
public void insertSorted(E value) {
    int searchResult = binarySearch(value);
    add(value, searchResult);
    System.out.println("Value: " + value + ". Position = " + searchResult);
}
Could someone help me out? If necessary, I can show the full code.
Rather than developing your own binary search, use the built-in Arrays.binarySearch implementation. However, this won't give you much improvement over your original version in terms of time.
To see why, consider the steps you take to place the value in the sorted sequence:
1. Find the insertion position.
2. Move the items to the right of the insertion position over by one.
3. Place the element into the insertion position.
The first step can be done in O(log2 N). The second step takes O(N). The last step takes O(1). Overall, the insertion's time complexity is O(log2 N + N + 1), which is the same as O(N). The algorithm is dominated by the second step, so you might as well use linear search as you move the items to the right by one.
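For reference, a minimal sketch using the built-in method, assuming the array and size backing fields from the add method shown above; Arrays.binarySearch returns -(insertion point) - 1 when the key is absent, so the result has to be decoded:

import java.util.Arrays;

public void insertSorted(E value) {
    // Search only the filled part of the backing array.
    int pos = Arrays.binarySearch(array, 0, size, value);
    if (pos < 0) {
        pos = -pos - 1; // decode "not found": result is -(insertion point) - 1
    }
    ensureCapacity();
    // Shift the tail right by one slot; this O(N) move dominates the cost.
    System.arraycopy(array, pos, array, pos + 1, size - pos);
    array[pos] = value;
    size++;
}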
I am working out a solution to the following question:
Describe an algorithm to find the smallest one million numbers in one billion numbers. Assume that the computer memory can hold all one billion numbers.
The book gives a selection-rank solution, but I am having a hard time understanding a few parts of it:
public static int partition(int[] array, int left, int right, int pivot) {
    while (true) {
        while (left <= right && array[left] <= pivot) {
            left++;
        }
        while (left <= right && array[right] > pivot) {
            right--;
        }
        if (left > right) {
            return left - 1;
        }
        swap(array, left, right);
    }
}
public static int rank(int[] array, int left, int right, int rank) {
    int pivot = array[randomIntInRange(left, right)];
    int leftEnd = partition(array, left, right, pivot); // returns end of left partition
    int leftSize = leftEnd - left + 1;
    if (leftSize == rank + 1) {
        return max(array, left, leftEnd);
    } else if (rank < leftSize) {
        return rank(array, left, leftEnd, rank);
    } else {
        return rank(array, leftEnd + 1, right, rank - leftSize);
    }
}
I understand most of it, but I do not understand the following two lines above:
if (leftSize == rank + 1) {
    return max(array, left, leftEnd);
1. Why are we returning the max of the three variables?
2. Shouldn't we just be returning array[left:leftEnd] or something of that nature?
Congratulations on trying to learn something by carefully studying a book. It's a key skill that seems to be getting rarer.
It makes general sense if the return value of rank is defined as "a value such that there exist exactly a million numbers less than or equal to it." The definition of max would be something like:
int t = array[left];
for (int i = left + 1; i <= leftEnd; i++)
    t = Math.max(t, array[i]);
return t;
Returning the max value goes beyond the problem statement and is kind of weird. It would be better and simpler to just partition the elements so that the smallest million end up at the front: array[0] through array[999999]. Then find the max only if that's actually needed.
Note that because rank is tail recursive, there's a simple iterative version of the same code that I think would be clearer.
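A minimal sketch of that iterative form, using the same helpers (partition, randomIntInRange, max) with the semantics given above:

// Iterative version of rank: each recursive call only narrows [left, right]
// or reduces rank, so a loop can replace the recursion directly.
public static int rank(int[] array, int left, int right, int rank) {
    while (true) {
        int pivot = array[randomIntInRange(left, right)];
        int leftEnd = partition(array, left, right, pivot);
        int leftSize = leftEnd - left + 1;
        if (leftSize == rank + 1) {
            return max(array, left, leftEnd);
        } else if (rank < leftSize) {
            right = leftEnd;        // answer lies in the left partition
        } else {
            left = leftEnd + 1;     // answer lies in the right partition
            rank -= leftSize;
        }
    }
}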
I'm also not convinced this code is correct. leftSize == rank in the check makes more sense than leftSize == rank + 1. But without more definitions and calling code, it's hard to say for sure.
The same rank function is used in Cracking the Coding Interview, 6th Edition, p. 569 (aside from the line if (leftSize == rank + 1) {, which is modified to use rank - 1 there).
The max function is provided, listed below:
/* Get largest element in array between left and right indices */
int max(int[] array, int left, int right) {
    int max = Integer.MIN_VALUE;
    for (int i = left; i <= right; i++) {
        max = Math.max(array[i], max);
    }
    return max;
}
As for an explanation: rank(array, left, right, rank) returns the element that would be at position rank of the sorted array.
leftEnd is the position of the pivot. If there are rank - 1 elements before the pivot element, then there are rank elements up to and including said pivot.
I believe the pivot will always be the maximum element of that partition, so the call to max(array, left, leftEnd) can be replaced with return pivot;
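If that observation holds (the partition shown keeps every element <= pivot, including the pivot itself, in the left part, so the pivot is that part's maximum), the O(n) scan could be skipped; a sketch of the simplified branch:

if (leftSize == rank + 1) {
    // All left-partition elements are <= pivot and the pivot is among them,
    // so the pivot is their maximum; no scan needed.
    return pivot;
}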
I'm having a problem with the partitioning while trying to implement the quicksort algorithm. My implementation works well with arrays up to a size of 10,000, but above that I get a StackOverflowError. Note that this only happens when the input array is in ascending or descending order; random-ordered arrays can be up to 10,000,000,000 elements before they cause the same issue.
I'm pretty sure something is wrong in the part where I partition the array, but I can't really see what. I've tried debugging, but had no success finding the issue. I understand that the error is caused by too many recursive calls, but as far as I know the stack shouldn't overflow if the partitioning is well implemented.
Thanks in advance :)
My code:
public void sort(int[] v, int first, int last) {
    if (first >= last) return; // fewer than two elements left in subvector

    // Partition the elements so that every number of
    // v[first..mid] <= p and every number of v[mid+1..last] > p.
    int[] subvector = partition(v, first, last);
    sort(v, first, subvector[0] - 1);
    sort(v, subvector[1], last);
}
And the partitioning method:
private int[] partition(int[] v, int first, int last) {
    int low = first;
    int mid = first;
    int high = last;
    int pivot = getPivot(v, last);
    while (mid <= high) {
        // Invariant:
        // - v[first..low-1] < pivot
        // - v[low..mid-1]   = pivot
        // - v[mid..high]    are unknown
        // - v[high+1..last] > pivot
        //
        //      < pivot    = pivot    unknown     > pivot
        //     -----------------------------------------------
        //  v: |         |          |a          |             |
        //     -----------------------------------------------
        //      ^         ^          ^          ^             ^
        //     first     low        mid        high          last
        //
        int a = v[mid];
        if (a < pivot) {
            v[mid] = v[low];
            v[low] = a;
            low++;
            mid++;
        } else if (a == pivot) {
            mid++;
        } else { // a > pivot
            v[mid] = v[high];
            v[high] = a;
            high--;
        }
    }
    return new int[]{low, high};
}
Quicksort is known to be O(n^2) in the worst case, which occurs when you give it sorted input and choose the worst pivot (the highest or lowest element). That also causes very deep recursion, as you have seen. You don't include your pivot-selection mechanism, so I can't be sure what you're doing, but you appear to select the last element. Some googling will turn up extensive discussions of pivot selection for quicksort.
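For illustration only (the question doesn't show getPivot's body, and the signature below differs from the getPivot(v, last) call in the question), a median-of-three pivot choice avoids picking an extreme value on sorted or reverse-sorted input, and recursing into the smaller partition while looping over the larger one caps the stack depth at O(log n):

// Illustrative pivot choice: median of the first, middle, and last values.
// On sorted or reverse-sorted input this yields a near-median pivot
// instead of an extreme one, avoiding the deep-recursion worst case.
private int getPivot(int[] v, int first, int last) {
    int mid = first + (last - first) / 2;
    int a = v[first], b = v[mid], c = v[last];
    if ((a <= b && b <= c) || (c <= b && b <= a)) return b;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return a;
    return c;
}

// Sketch of a depth-bounded sort: recurse into the smaller partition,
// loop over the larger one, so the stack never grows past O(log n).
public void sort(int[] v, int first, int last) {
    while (first < last) {
        int[] bounds = partition(v, first, last);
        int leftSize = bounds[0] - first;      // v[first..bounds[0]-1]
        int rightSize = last - bounds[1] + 1;  // v[bounds[1]..last]
        if (leftSize < rightSize) {
            sort(v, first, bounds[0] - 1);     // smaller part recursively
            first = bounds[1];                 // larger part iteratively
        } else {
            sort(v, bounds[1], last);
            last = bounds[0] - 1;
        }
    }
}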
I run quicksort 10 times and take the average time.
I do the same for the quicksort/insertion sort combination, and it seems to be slower than plain quicksort.
Here's the part of the code where I call InsertionSort
public static <T extends Comparable<? super T>> void OptQSort2(T[] data, int min, int max) {
    int indexofpartition;
    if (max - min > 0) {
        if ((max - min) <= 10) {
            // Use InsertionSort now
            InsertionSort.sort(data);
            return;
        } else {
            indexofpartition = findPartition(data, min, max);
            OptQSort2(data, min, indexofpartition - 1);
            OptQSort2(data, indexofpartition + 1, max);
        }
    }
}
And the regular quicksort is just the same as the snippet above, but without the if condition that calls InsertionSort.
findPartition is as follows:
public static <T extends Comparable<? super T>> int findPartition(T[] data, int min, int max) {
    int left, right;
    T temp, partitionelement;
    int middle = (min + max) / 2;
    partitionelement = data[middle];
    left = min;
    right = max;
    while (left < right) {
        while (data[left].compareTo(partitionelement) <= 0 && left < right)
            left++;
        while (data[right].compareTo(partitionelement) > 0)
            right--;
        if (left < right) {
            temp = data[left];
            data[left] = data[right];
            data[right] = temp;
        }
    }
The mean times for plain quicksort and OptQSort2 (which uses insertion sort) are:
Sorted using QuickSort in: 3858841
Sorted using OptQSort2 in: 34359610
Any ideas why? Does the size of the sequence matter? I am using a 1000-element Integer[] array for this.
In OptQSort2, for small partitions, you have the following function call:
InsertionSort.sort(data);
Is this supposed to insertion-sort the small partition? It looks like you are insertion-sorting the entire array. Shouldn't you pass the min and max indexes to InsertionSort?
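For illustration, a range-based overload might look like the following; the (data, min, max) signature is an assumption, since the question doesn't show InsertionSort's code:

// Hypothetical range overload: insertion-sorts data[min..max] only,
// leaving the rest of the array untouched.
public static <T extends Comparable<? super T>> void sort(T[] data, int min, int max) {
    for (int i = min + 1; i <= max; i++) {
        T key = data[i];
        int j = i - 1;
        // Shift larger elements right to open a slot for key.
        while (j >= min && data[j].compareTo(key) > 0) {
            data[j + 1] = data[j];
            j--;
        }
        data[j + 1] = key;
    }
}

OptQSort2 would then call InsertionSort.sort(data, min, max) in the small-partition branch.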
Another option is to simply do no work on small partitions during OptQSort2. Then perform a single InsertionSort pass over the entire array after OptQSort2 has done its work.
You will need a much larger integer array for the test to be relevant. At this size, testing the extra if condition is probably what slows down your algorithm in the QS+IS case.
Test with a large number of elements, and switch to insertion sort when the data is small enough to fit in the L1 cache, i.e., 32-64 KB.
The first suspect is obviously your insertion sort method. Does it really sort, for example?
You will also need to run the test many more than 10 times to warm up the JVM, and to test the algorithms in both orders so that one doesn't benefit from the warm-up performed by the other. I would suggest 100 or 1000 runs, and they must all be on the same dataset too.
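A minimal sketch of such a test loop (the data generator makeRandomData and the run counts are illustrative, not from the question):

// Illustrative timing loop: warm up first, reuse the same dataset,
// and sort a fresh copy each run so runs don't interfere.
Integer[] master = makeRandomData(1000);    // hypothetical data generator

for (int i = 0; i < 1000; i++) {            // warm-up runs, results discarded
    OptQSort2(master.clone(), 0, master.length - 1);
}

long start = System.nanoTime();
for (int i = 0; i < 1000; i++) {            // timed runs
    Integer[] copy = master.clone();
    OptQSort2(copy, 0, copy.length - 1);
}
long elapsed = System.nanoTime() - start;
System.out.println("OptQSort2: " + elapsed / 1000 + " ns per run");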
You should not call InsertionSort each time you have a subarray of at most 10 elements. Do nothing instead:
public static <T extends Comparable<? super T>> void OptQSort2(T[] data, int min, int max) {
    int indexofpartition;
    if ((max - min) > 10) {
        indexofpartition = findPartition(data, min, max);
        OptQSort2(data, min, indexofpartition - 1);
        OptQSort2(data, indexofpartition + 1, max);
    }
}
When you are finished, call InsertionSort once for the whole array.
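A sketch of that combined driver; since quicksort has already left every element within about 10 slots of its final position, the final insertion-sort pass runs in roughly O(n) rather than its usual O(n^2):

// Sketch: leave small partitions untouched during quicksort, then one
// full-array insertion-sort pass fixes everything in near-linear time.
public static <T extends Comparable<? super T>> void sort(T[] data) {
    OptQSort2(data, 0, data.length - 1);   // the version shown above
    InsertionSort.sort(data);              // single whole-array pass
}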
The implementation below is stable because it uses <= instead of < at the line marked XXX. This also makes it more efficient. Is there any reason to use < rather than <= at this line?
/**
 * Class for in-place MergeSort.
 **/
class MergeSortAlgorithm extends SortAlgorithm {
    void sort(int a[], int lo0, int hi0) throws Exception {
        int lo = lo0;
        int hi = hi0;
        pause(lo, hi);
        if (lo >= hi) {
            return;
        }
        int mid = (lo + hi) / 2;

        /*
         * Partition the list into two lists and sort them recursively
         */
        sort(a, lo, mid);
        sort(a, mid + 1, hi);

        /*
         * Merge the two sorted lists
         */
        int end_lo = mid;
        int start_hi = mid + 1;
        while ((lo <= end_lo) && (start_hi <= hi)) {
            pause(lo);
            if (stopRequested) {
                return;
            }
            if (a[lo] <= a[start_hi]) { // LINE XXX
                lo++;
            } else {
                /*
                 * a[lo] > a[start_hi]
                 * The next element comes from the second list;
                 * move the a[start_hi] element into the next
                 * position and shuffle all the other elements up.
                 */
                int T = a[start_hi];
                for (int k = start_hi - 1; k >= lo; k--) {
                    a[k + 1] = a[k];
                    pause(lo);
                }
                a[lo] = T;
                lo++;
                end_lo++;
                start_hi++;
            }
        }
    }

    void sort(int a[]) throws Exception {
        sort(a, 0, a.length - 1);
    }
}
Because the <= in your code ensures that same-valued elements (in the left and right halves of the array being sorted) won't be exchanged.
It also avoids useless exchanges.
if (a[lo] <= a[start_hi]) {
    /* The left value is smaller than or equal to the right one; leave them as is. */
    /* In particular, if the values are the same, they won't be exchanged. */
    lo++;
} else {
    /*
     * The value in the right half is smaller than the one in the left half,
     * so the right one is inserted just before the left one, i.e., they're exchanged.
     */
    ...
}
Now assume the same value (e.g., 5) appears in both halves and the operator above is <.
As the comments above show, the right 5 would then be inserted before the left 5; in other words, same-valued elements would be exchanged.
This means the sort would not be stable.
It is also inefficient to exchange same-valued elements.
I suspect the cause of the inefficiency is the algorithm itself: the merging stage is implemented with insertion-sort-style shifting (which, as you know, is O(n^2)).
You may have to re-implement it if you sort huge arrays.
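One common re-implementation, sketched below on the assumption that the class is restructured around a separate merge step: merging through a temporary buffer keeps the <=-based stability but makes each merge O(n), at the cost of O(n) extra memory (so it is no longer in-place):

// Stable O(n) merge using a scratch buffer; replaces the shifting loop.
// Taking from the left half on ties (<=) is what preserves stability.
void merge(int[] a, int lo, int mid, int hi) {
    int[] buf = new int[hi - lo + 1];
    int i = lo, j = mid + 1, k = 0;
    while (i <= mid && j <= hi) {
        buf[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    }
    while (i <= mid) buf[k++] = a[i++]; // drain whichever half remains
    while (j <= hi) buf[k++] = a[j++];
    System.arraycopy(buf, 0, a, lo, buf.length);
}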
Fastest known in place stable sort:
http://thomas.baudel.name/Visualisation/VisuTri/inplacestablesort.html