Recursive calls overflow the stack - Java

I'm having a problem with the partitioning step while trying to implement the Quicksort algorithm. My implementation works well with arrays up to a size of 10,000, but above that I get a StackOverflowError. Note that this only happens when the input array is in ascending or descending order; randomly ordered arrays can be up to 10,000,000,000 before they cause the same issue.
I'm pretty sure that there is something wrong in the part where I'm partitioning the array, but I can't really see what. I've tried debugging but had no success finding the issue. I understand that the error is caused by too many recursive calls, but as far as I know the stack shouldn't overflow if the partitioning is well implemented.
Thanks in advance :)
My code:
public void sort(int[] v, int first, int last) {
    if (first >= last) return; // fewer than two elements left in the subvector
    // Partition the elements so that every number of
    // v[first..mid] <= p and every number of v[mid+1..last] > p.
    int[] subvector = partition(v, first, last);
    sort(v, first, subvector[0] - 1);
    sort(v, subvector[1], last);
}
And the partitioning method:
private int[] partition(int[] v, int first, int last) {
    int low = first;
    int mid = first;
    int high = last;
    int pivot = getPivot(v, last);
    while (mid <= high) {
        // Invariant:
        // - v[first..low-1]  < pivot
        // - v[low..mid-1]    = pivot
        // - v[mid..high]     are unknown
        // - v[high+1..last]  > pivot
        //
        //       < pivot    = pivot     unknown      > pivot
        //    -----------------------------------------------
        // v: |          |          |a           |           |
        //    -----------------------------------------------
        //     ^          ^          ^          ^           ^
        //     first      low        mid        high        last
        //
        int a = v[mid];
        if (a < pivot) {
            v[mid] = v[low];
            v[low] = a;
            low++;
            mid++;
        } else if (a == pivot) {
            mid++;
        } else { // a > pivot
            v[mid] = v[high];
            v[high] = a;
            high--;
        }
    }
    return new int[]{low, high};
}

Quicksort is known to be O(n^2) in the worst case, which is what you get when you give it sorted input and choose the worst possible pivot (the highest or lowest element). That also has the effect of causing very deep recursion, as you have seen. You don't include your pivot selection mechanism, so I can't be sure what you're doing, but you appear to select the last element. Some googling will turn up extensive discussions of pivot selection for quicksort.
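For illustration only (your getPivot(v, last) isn't shown, so the three-argument signature below is an assumption, not your code), a median-of-three pivot avoids the deep recursion you get on sorted or reverse-sorted input with a last-element pivot:
private int getPivot(int[] v, int first, int last) {
    int mid = first + (last - first) / 2;
    int a = v[first], b = v[mid], c = v[last];
    // Return the median of the first, middle and last elements.
    if ((a <= b && b <= c) || (c <= b && b <= a)) return b;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return a;
    return c;
}
Independently of the pivot, recursing only into the smaller of the two partitions and looping over the larger one bounds the stack depth at O(log n) even in the worst case.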

Related

BinarySearch in a SortedArrayList

I am trying to develop a faster way than what I currently have to add an element to a sorted array list. Currently this is my strategy
public void insertSorted(E value) {
    add(value);
    for (int i = size() - 1; i > 0 && value.compareTo(get(i - 1)) < 0; i--) {
        this.swap(i);
    }
}
and my add method...
public void add(E element) {
    ensureCapacity();
    array[size++] = element;
}
So I read that by using a binary search I could find the insertion position more efficiently.
I tried implementing that, but somehow it always outputs 0.
private int binarySearch(E value) {
    int low = 0;
    int high = this.size() - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        E midVal = this.get(mid);
        int cmp = midVal.compareTo(value);
        if (cmp < 0)
            low = mid + 1;
        else if (cmp > 0)
            high = mid - 1;
        else
            return mid;
    }
    return low;
}

public void insertSorted(E value) {
    int searchResult = binarySearch(value);
    add(value, searchResult);
    System.out.println("Value: " + value + ". Position = " + searchResult);
}
Could someone help me out? If necessary I will show the full code.
Rather than developing your own binary search, use the built-in Arrays.binarySearch implementation. However, this won't give you much improvement over your original version in terms of time.
To see why, consider the steps you take to place the value in the sorted sequence:
1. Find the insertion position
2. Move the items to the right of the insertion position over by one
3. Place the element into the insertion position
The first step can be done in O(log N). The second step takes O(N). The last step takes O(1). Overall, the insertion's time complexity is O(log N + N + 1), which is the same as O(N). The algorithm is dominated by the second step, so you might as well use linear search as you move the items to the right by one.
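If you do want the binary search, note that Arrays.binarySearch already encodes the insertion point: for a missing key it returns (-(insertion point) - 1). A minimal sketch, assuming a sorted Object[] backing array and an int size like the ones in your add method (those names are taken from the question; the helper itself is mine):
import java.util.Arrays;

static <E extends Comparable<E>> int insertionIndex(Object[] array, int size, E value) {
    // Search only the filled portion [0, size) of the backing array.
    int pos = Arrays.binarySearch(array, 0, size, value);
    // A non-negative result is an exact match; a negative one encodes the insertion point.
    return pos >= 0 ? pos : -(pos + 1);
}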

Implementing median-of-three into generic quicksort

I've been trying to implement median-of-three pivot selection in this quicksort implementation. I know this may not be the best implementation of quicksort, but unfortunately I am forced to work with it.
public static <E extends Comparable<E>> void quickSort(E[] list) {
    quickSort(list, 0, list.length - 1);
}

private static <E extends Comparable<E>> void quickSort(E[] list, int first, int last) {
    if (last > first) {
        int pivotIndex = partition(list, first, last);
        quickSort(list, first, pivotIndex - 1);
        quickSort(list, pivotIndex + 1, last);
    }
}

private static <E extends Comparable<E>> int partition(E[] list, int first, int last) {
    E pivot = list[first];
    int low = first + 1;
    int high = last;
    while (high > low) {
        while (low <= high && (list[low].compareTo(pivot) <= 0)) {
            low++;
        }
        while (low <= high && (list[high].compareTo(pivot) > 0)) {
            high--;
        }
        if (high > low) {
            E temp = list[high];
            list[high] = list[low];
            list[low] = temp;
        }
    }
    while (high > first && (list[high].compareTo(pivot) >= 0)) {
        high--;
    }
    if (pivot.compareTo(list[high]) > 0) {
        list[first] = list[high];
        list[high] = pivot;
        return high;
    } else {
        return first;
    }
}
The first thing I did was alter it to work with generic arrays. Now I need to set the pivot to the median of the first three values in the list array.
I understand how to get the median of the first three values. What I don't understand is how it affects how this quick sort implementation works.
After setting the pivot to the median value, how does that affect the forward and backward searches? In the code shown, low is set to the "left" element incremented by 1. Would I increment the pivot value by 1 in my particular case? Can somebody explain the logic behind the particular median-of-three I am trying to implement?
Usually with a Lomuto-type scheme, as seen in the example code, you compare the [low], [middle] and [high] values and swap as needed so that the median value ends up at array[low]. This prevents worst-case behaviour when sorting an already sorted or reverse-sorted array. Using the median of the first three values would not prevent the worst case for ordered or reverse-ordered data.
With a Hoare-type partition scheme, this is done by setting the pivot index to the middle of the array, and swapping with [low] and/or [high] as needed so that the median of the three (low, middle, high) elements ends up at array[pivot].
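As a sketch of the first suggestion (my own helper, not part of the original code): order the first, middle and last elements so that the median of the three ends up at list[first], which is the slot this partition already reads its pivot from. It would be called at the top of partition, before E pivot = list[first];.
private static <E extends Comparable<E>> void medianOfThreeToFirst(E[] list, int first, int last) {
    int mid = first + (last - first) / 2;
    if (list[mid].compareTo(list[first]) < 0) swap(list, mid, first);
    if (list[last].compareTo(list[first]) < 0) swap(list, last, first);
    if (list[last].compareTo(list[mid]) < 0) swap(list, last, mid);
    // list[mid] now holds the median of the three; move it into the pivot slot.
    swap(list, first, mid);
}

private static <E> void swap(E[] list, int i, int j) {
    E tmp = list[i];
    list[i] = list[j];
    list[j] = tmp;
}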

Understanding why Java selection rank returns max() as final result

I am working out a solution to the following question:
Describe an algorithm to find the smallest one million numbers in one billion numbers. Assume that the computer memory can hold all one billion numbers.
The book gives a selection rank solution, but I am having a hard time understanding a few parts of it:
public static int partition(int[] array, int left, int right, int pivot) {
    while (true) {
        while (left <= right && array[left] <= pivot) {
            left++;
        }
        while (left <= right && array[right] > pivot) {
            right--;
        }
        if (left > right) {
            return left - 1;
        }
        swap(array, left, right);
    }
}

public static int rank(int[] array, int left, int right, int rank) {
    int pivot = array[randomIntInRange(left, right)];
    int leftEnd = partition(array, left, right, pivot); // returns end of left partition
    int leftSize = leftEnd - left + 1;
    if (leftSize == rank + 1) {
        return max(array, left, leftEnd);
    } else if (rank < leftSize) {
        return rank(array, left, leftEnd, rank);
    } else {
        return rank(array, leftEnd + 1, right, rank - leftSize);
    }
}
I understand most of it, but I do not understand the following two lines:
if (leftSize == rank + 1) {
    return max(array, left, leftEnd);
1. Why are we returning the max of the three variables?
2. Shouldn't we just be returning array[left:leftEnd] or something of that nature?
Congratulations on trying to learn something by carefully studying a book. It's a key skill that seems to be getting rarer.
It makes general sense if the definition of the return value of rank is "a value such that exactly a million numbers are less than or equal to it." The definition of max would be something like:
int t = array[left];
for (int i = left + 1; i <= leftEnd; i++)
    t = Math.max(t, array[i]);
return t;
Returning the max value is beyond the problem statement and kind of weird. It would be better and simpler just to partition the elements so that the million numbers you want are at the front: array[0] through array[999999]. Then find the max of those only if that's actually needed.
Note that because rank is tail recursive, there's a simple iterative version of the same code that I think would be clearer.
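For reference, a sketch of that iterative rewrite, reusing the same partition, randomIntInRange and max helpers (same logic as the book's rank, just a loop instead of the tail calls):
public static int rankIterative(int[] array, int left, int right, int rank) {
    while (true) {
        int pivot = array[randomIntInRange(left, right)];
        int leftEnd = partition(array, left, right, pivot);
        int leftSize = leftEnd - left + 1;
        if (leftSize == rank + 1) {
            return max(array, left, leftEnd);
        } else if (rank < leftSize) {
            right = leftEnd;   // keep searching in the left partition
        } else {
            rank -= leftSize;  // skip over the left partition
            left = leftEnd + 1;
        }
    }
}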
I'm also not convinced this code is correct. leftSize == rank in the check makes more sense than leftSize == rank + 1. But without more definitions and calling code, it's hard to say for sure.
The same rank function is used in Cracking the Coding Interview, 6th Edition, p. 569, except that the line if (leftSize == rank + 1) { is modified to use rank - 1.
The max function is provided, listed below:
/* Get largest element in array between left and right indices */
int max(int[] array, int left, int right) {
    int max = Integer.MIN_VALUE;
    for (int i = left; i <= right; i++) {
        max = Math.max(array[i], max);
    }
    return max;
}
As for an explanation: rank(array, rank) returns the element that would be at the rank-th position of the sorted array.
leftEnd is the position of the pivot. If there are rank - 1 elements before the pivot element, then there are rank elements including said pivot.
I believe that the pivot will always be the max element of the left partition, so the call to max(array, left, leftEnd) can be replaced with return pivot;

How to find the median of a large number of integers (they don't fit in memory)

I know the answer is using median of medians but can someone explain how to do it?
There are linear-time algorithms to do this; this page might be helpful: http://en.wikipedia.org/wiki/Selection_algorithm. If you are still confused, just ask.
Basically, the selection algorithm works like quicksort, but it only recurses into one side of the pivot each time. The goal is to keep partitioning until the pivot ends up at the index of the element you are trying to find. Here is Java code I found for quickselect:
public static int selectKth(int[] arr, int k) {
    if (arr == null || arr.length <= k)
        throw new Error();
    int from = 0, to = arr.length - 1;
    // if from == to we reached the kth element
    while (from < to) {
        int r = from, w = to;
        int mid = arr[(r + w) / 2];
        // stop when the reader and writer meet
        while (r < w) {
            if (arr[r] >= mid) { // put the large values at the end
                int tmp = arr[w];
                arr[w] = arr[r];
                arr[r] = tmp;
                w--;
            } else { // the value is smaller than the pivot, skip
                r++;
            }
        }
        // if we stepped up (r++) we need to step one down
        if (arr[r] > mid)
            r--;
        // the r pointer is on the end of the first k elements
        if (k <= r) {
            to = r;
        } else {
            from = r + 1;
        }
    }
    return arr[k];
}
Here is the Median of Medians algorithm; check this out.
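Since the link alone doesn't show the idea, here is a rough sketch of median-of-medians selection (my own illustration, not the linked code): sort groups of five, collect each group's median, recursively take the median of those medians, and partition around it. select(a, 0, a.length - 1, a.length / 2) would give the median of an in-memory array.
import java.util.Arrays;

public class MedianOfMedians {

    // Returns the k-th smallest element of a (k is a 0-based index into the whole array).
    public static int select(int[] a, int lo, int hi, int k) {
        while (lo < hi) {
            int pivot = medianOfMedians(a, lo, hi);
            int p = partition(a, lo, hi, pivot);
            if (k == p) return a[p];
            if (k < p) hi = p - 1; else lo = p + 1;
        }
        return a[lo];
    }

    // Sort each group of five, gather the group medians at the front of the
    // range, then recursively select the median of those medians.
    private static int medianOfMedians(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        if (n <= 5) {
            Arrays.sort(a, lo, hi + 1);
            return a[lo + n / 2];
        }
        int medians = 0;
        for (int i = lo; i <= hi; i += 5) {
            int end = Math.min(i + 4, hi);
            Arrays.sort(a, i, end + 1);
            swap(a, lo + medians, i + (end - i) / 2);
            medians++;
        }
        return select(a, lo, lo + medians - 1, lo + medians / 2);
    }

    // Lomuto partition around a pivot value known to be in a[lo..hi];
    // returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi, int pivotValue) {
        for (int i = lo; i <= hi; i++) {
            if (a[i] == pivotValue) { swap(a, i, hi); break; }
        }
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivotValue) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}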
See the first two answers to this question. If the first one (frequency counts) can work for your data / available storage, you can get the exact answer that way. The second (remedian) is a robust, general method.
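To illustrate the frequency-count idea: if the values happen to fall in a small known range (here assumed to be [0, 65535]) and arrive as raw 4-byte ints in a file, one streaming pass builds exact counts and the median drops out of a prefix-sum scan. The file path, format and value range are assumptions for the sketch, not part of the question.
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class StreamingMedian {
    public static int medianFromFile(String path) throws IOException {
        long[] counts = new long[65536]; // assumed value range [0, 65535]
        long total = 0;
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            while (true) {
                int value;
                try {
                    value = in.readInt();
                } catch (EOFException eof) {
                    break;
                }
                counts[value]++;
                total++;
            }
        }
        // The median is the first value whose cumulative count reaches the midpoint.
        long target = (total + 1) / 2; // lower median for an even count
        long seen = 0;
        for (int v = 0; v < counts.length; v++) {
            seen += counts[v];
            if (seen >= target) return v;
        }
        throw new IllegalStateException("empty input");
    }
}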

Why is in-place merge sort not stable?

The implementation below is stable because it uses <= instead of < at the line marked XXX. This also makes it more efficient. Is there any reason to use < rather than <= at this line?
/**
 * Class for in-place merge sort.
 */
class MergeSortAlgorithm extends SortAlgorithm {
    void sort(int a[], int lo0, int hi0) throws Exception {
        int lo = lo0;
        int hi = hi0;
        pause(lo, hi);
        if (lo >= hi) {
            return;
        }
        int mid = (lo + hi) / 2;
        /*
         * Partition the list into two lists and sort them recursively
         */
        sort(a, lo, mid);
        sort(a, mid + 1, hi);
        /*
         * Merge the two sorted lists
         */
        int end_lo = mid;
        int start_hi = mid + 1;
        while ((lo <= end_lo) && (start_hi <= hi)) {
            pause(lo);
            if (stopRequested) {
                return;
            }
            if (a[lo] <= a[start_hi]) { // LINE XXX
                lo++;
            } else {
                /*
                 * a[lo] > a[start_hi]:
                 * The next element comes from the second list;
                 * move the a[start_hi] element into the next
                 * position and shuffle all the other elements up.
                 */
                int T = a[start_hi];
                for (int k = start_hi - 1; k >= lo; k--) {
                    a[k + 1] = a[k];
                    pause(lo);
                }
                a[lo] = T;
                lo++;
                end_lo++;
                start_hi++;
            }
        }
    }

    void sort(int a[]) throws Exception {
        sort(a, 0, a.length - 1);
    }
}
Because the <= in your code ensures that same-valued elements (one in the left half and one in the right half of the array being sorted) won't be exchanged. It also avoids useless exchanges.
if (a[lo] <= a[start_hi]) {
    /* The left value is smaller than or equal to the right one; leave them as is. */
    /* In particular, if the values are the same, they won't be exchanged. */
    lo++;
} else {
    /*
     * The value in the left half is greater than the one in the right half,
     * so the right one is inserted just before the left one, i.e., they're exchanged.
     */
    ...
}
Assume the same value (e.g., 5) appears in both halves and the operator above is <.
As the comments above show, the right 5 would be inserted before the left 5; in other words, same-valued elements would be exchanged.
This means the sort would not be stable.
It is also inefficient to exchange same-valued elements.
I guess the cause of the inefficiency comes from the algorithm itself: the merge stage is implemented by shifting elements, essentially an insertion sort, which is O(n^2).
You may have to re-implement it when you sort huge arrays.
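For example, a minimal sketch of a buffered merge that could replace the shifting loop: it stays stable (the <= keeps ties in left-half-first order) and runs in O(n) per merge, at the cost of O(n) extra memory, so it is no longer in place.
static void merge(int[] a, int lo, int mid, int hi) {
    int[] tmp = new int[hi - lo + 1];
    int i = lo, j = mid + 1, k = 0;
    while (i <= mid && j <= hi) {
        // <= takes the left element on ties, preserving their original order.
        if (a[i] <= a[j]) tmp[k++] = a[i++];
        else tmp[k++] = a[j++];
    }
    while (i <= mid) tmp[k++] = a[i++];
    while (j <= hi) tmp[k++] = a[j++];
    System.arraycopy(tmp, 0, a, lo, tmp.length);
}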
Fastest known in place stable sort:
http://thomas.baudel.name/Visualisation/VisuTri/inplacestablesort.html
