Early min on quicksort in Java

I have a problem I can't -quite- wrap my head around. I'm racing quicksort vs heapsort on one million integers. Each of the sorting algorithms divides the list into 32 threaded sub-sections and then sorts those, finally merging each sorted subsection back into a coherent whole (using more threads).
Rather than sorting the whole sub-array using quicksort and then pushing the whole thing out to the mergers, I'm trying to find a way to find the min value(s) using quicksort and push that before the whole section has been sorted. Since I don't actually have enough cores to concurrently run all my threads I expect the performance impact to be negligible, but I'd like to test that theory.
Time on my machine with 4 cores is about 14 seconds plus or minus two for both heapsort and quicksort. Speed seems to depend on which algorithm I run first, even though I'm shuffling after the sort.
Here's my quicksort code right now:
//code borrowed from http://stackoverflow.com/questions/19124752/non-recursive-quicksort
public synchronized void qSort(PipedOutputStream pout, int start, int sz) {
Deque<int[]> stack = new ArrayDeque<int[]>();
int first = start;
int last = start + sz - 1;
if(first >= arr.length || last >= arr.length){
System.out.printf("\nSorting error: parameters out of bounds! Start %d, end %d \n", start, last);
return;
}
stack.push(new int[] {first, last});
while(!stack.isEmpty()) {
qsortStep(arr, stack);
}
try { out = new ObjectOutputStream ( pout ); }
catch (IOException e ) { e.printStackTrace(); }
while ( sz >= 1 ) {
try {
// System.out.printf("sort is writing %d to pipe.\n", v );
out.writeObject( arr[start] );
out.flush( );
start++; sz--;
} catch(IOException e ) { e.printStackTrace(); }
}
}
private synchronized void qsortStep(T[] list, Deque<int[]> stack) {
if(stack.isEmpty())
return;
int temp[] = stack.pop();
int first = temp[0];
int last = temp[1];
int boundLo = first;
int boundHi = last;
//Pivot can be optimized to median of quintiles to mitigate O(n^2) on sorted arrays.
int pivot = last;
/*int sz = last - first;
int pivots[] = {first, first+sz/5, first+2*sz/5, first+4*sz/5, last};
for(int i = 0; i < 5; i++)
for(int j = 4; j > i; j--)
if(arr[pivots[i]].compareTo(arr[pivots[j]]) > 0)
swap(pivots, i, j);*/
pivot = last;
while(first < last) {
//possible opportunity here for early min
if(list[first].compareTo(list[pivot]) >= 0) {
last--;
if(first != last)
swap(list, first, last);
swap(list, last, pivot);
pivot--;
}
else first++;
}
if(boundLo < (pivot - 1))
stack.add(new int[] {boundLo, pivot - 1});
if(boundHi > (pivot + 1))
stack.add(new int[] {pivot + 1, boundHi});
}
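One possible way to get the minimum out early, sketched under assumptions rather than taken from the code above: if the explicit stack is used strictly LIFO and the left sub-range is always pushed last, the left-most range is finished first, so every value can be handed on as soon as its final position is known. Note that `stack.add` in the code above appends to the tail of the `ArrayDeque` while `pop` takes from the head, so ranges are not processed left-first there. The class name `EarlyMinQuicksort`, the method `sortAndEmit` and the `IntConsumer` sink (a stand-in for writing to the `ObjectOutputStream`) are hypothetical names for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.IntConsumer;

final class EarlyMinQuicksort {

    // Sorts arr[start .. start+sz-1] and passes each value to sink as soon as
    // its final position is known, smallest first.
    static void sortAndEmit(int[] arr, int start, int sz, IntConsumer sink) {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[] {start, start + sz - 1});
        while (!stack.isEmpty()) {
            int[] range = stack.pop();
            int lo = range[0], hi = range[1];
            if (lo > hi) {
                continue;                 // empty range, nothing to emit
            }
            if (lo == hi) {
                sink.accept(arr[lo]);     // everything to its left was already emitted
                continue;
            }
            int p = partition(arr, lo, hi);
            // LIFO order: push the right part, then the pivot, then the left part,
            // so the left part is always finished before anything to its right.
            stack.push(new int[] {p + 1, hi});
            stack.push(new int[] {p, p});
            stack.push(new int[] {lo, p - 1});
        }
    }

    // Lomuto partition with the middle element as pivot; returns the pivot's final index.
    private static int partition(int[] arr, int lo, int hi) {
        swap(arr, lo + (hi - lo) / 2, hi);
        int pivot = arr[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (arr[i] < pivot) {
                swap(arr, i, store++);
            }
        }
        swap(arr, store, hi);
        return store;
    }

    private static void swap(int[] arr, int a, int b) {
        int t = arr[a];
        arr[a] = arr[b];
        arr[b] = t;
    }
}
```

With this ordering the first value reaches the sink after roughly one chain of left-side partitions instead of after the whole sub-array is sorted. Whether that helps the mergers in practice is exactly the kind of thing that would need measuring.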

Related

Benchmarking quicksort and mergesort yields that mergesort is faster

I've tried benchmarking both, and for some reason, on an array of 1M elements, mergesort sorted it in 0.3s while quicksort took 1.3s.
I've heard that quicksort is generally faster because of its memory management, but how would one explain these results?
I am running on a MacBook Pro, if that makes any difference. The input is a set of randomly generated integers from 0 to 127.
The code is in Java:
MergeSort:
static void mergesort(int arr[]) {
int n = arr.length;
if (n < 2)
return;
int mid = n / 2;
int left[] = new int[mid];
int right[] = new int[n - mid];
for (int i = 0; i < mid; i++)
left[i] = arr[i];
for (int i = mid; i < n; i++)
right[i - mid] = arr[i];
mergesort(left);
mergesort(right);
merge(arr, left, right);
}
public static void merge(int arr[], int left[], int right[]) {
int nL = left.length;
int nR = right.length;
int i = 0, j = 0, k = 0;
while (i < nL && j < nR) {
if (left[i] <= right[j]) {
arr[k] = left[i];
i++;
} else {
arr[k] = right[j];
j++;
}
k++;
}
while (i < nL) {
arr[k] = left[i];
i++;
k++;
}
while (j < nR) {
arr[k] = right[j];
j++;
k++;
}
}
Quicksort:
public static void quickSort(int[] arr, int start, int end) {
int partition = partition(arr, start, end);
if (partition - 1 > start) {
quickSort(arr, start, partition - 1);
}
if (partition + 1 < end) {
quickSort(arr, partition + 1, end);
}
}
public static int partition(int[] arr, int start, int end) {
int pivot = arr[end];
for (int i = start; i < end; i++) {
if (arr[i] < pivot) {
int temp = arr[start];
arr[start] = arr[i];
arr[i] = temp;
start++;
}
}
int temp = arr[start];
arr[start] = pivot;
arr[end] = temp;
return start;
}
Your implementations are a bit simplistic:
mergesort allocates 2 new arrays at each recursive call, which is expensive, yet some JVMs are surprisingly efficient at optimising such coding patterns.
quickSort uses a poor choice of pivot, the last element of the subarray, which gives quadratic time for sorted subarrays, including those with identical elements.
The data set, an array of pseudo-random numbers in the small range 0..127, hurts the quickSort implementation far more than the extra allocations hurt the mergesort version. Increasing the dataset size should make this even more obvious and might even cause a stack overflow because of too many recursive calls. Data sets with common patterns such as identical values, increasing or decreasing sequences, and combinations of such sequences would cause catastrophic performance of the quickSort implementation.
Here is a slightly modified version with a less pathological choice of pivot (the element at 3/4 of the array) and a loop that detects duplicates of the pivot value to improve efficiency on datasets with repeated values. It performs much better (100x) on my standard sorting benchmark with arrays of just 40k elements, but is still much slower (8x) than radixsort:
public static void quickSort(int[] arr, int start, int end) {
int p1 = partition(arr, start, end);
int p2 = p1;
/* skip elements identical to the pivot */
while (++p2 <= end && arr[p2] == arr[p1])
continue;
if (p1 - 1 > start) {
quickSort(arr, start, p1 - 1);
}
if (p2 < end) {
quickSort(arr, p2, end);
}
}
public static int partition(int[] arr, int start, int end) {
/* choose pivot at 3/4 of the array */
int i = end - ((end - start + 1) >> 2);
int pivot = arr[i];
arr[i] = arr[end];
arr[end] = pivot;
for (i = start; i < end; i++) {
if (arr[i] < pivot) {
int temp = arr[start];
arr[start] = arr[i];
arr[i] = temp;
start++;
}
}
int temp = arr[start];
arr[start] = pivot;
arr[end] = temp;
return start;
}
For the OP's dataset, assuming decent randomness of the distribution, scanning for duplicates is responsible for the performance improvement. Choosing a different pivot, be it first, last, middle, 3/4 or 2/3 or even median of 3 has almost no impact, as expected.
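For comparison, here is a minimal sketch of a different technique that targets the same weakness: three-way ("Dutch national flag") partitioning, which groups all elements equal to the pivot in one pass instead of skipping them after the fact. This is not the code benchmarked above; the class name `ThreeWayQuicksort` is hypothetical, and the choice of the middle element as pivot is an assumption made for illustration.

```java
final class ThreeWayQuicksort {

    static void sort(int[] arr) {
        sort(arr, 0, arr.length - 1);
    }

    private static void sort(int[] arr, int lo, int hi) {
        while (lo < hi) {
            int pivot = arr[lo + (hi - lo) / 2];
            int lt = lo;   // arr[lo .. lt-1]  < pivot
            int i = lo;    // arr[lt .. i-1]  == pivot
            int gt = hi;   // arr[gt+1 .. hi]  > pivot
            while (i <= gt) {
                if (arr[i] < pivot) {
                    swap(arr, lt++, i++);
                } else if (arr[i] > pivot) {
                    swap(arr, i, gt--);
                } else {
                    i++;
                }
            }
            // Recurse into the smaller side and loop on the larger one,
            // which keeps the recursion depth logarithmic.
            if (lt - lo < hi - gt) {
                sort(arr, lo, lt - 1);
                lo = gt + 1;
            } else {
                sort(arr, gt + 1, hi);
                hi = lt - 1;
            }
        }
    }

    private static void swap(int[] arr, int a, int b) {
        int t = arr[a];
        arr[a] = arr[b];
        arr[b] = t;
    }
}
```

On data with only 128 distinct values, each partition removes every copy of the pivot from further work, so the recursion can never go deeper than the number of distinct values.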
Further testing on other, non-random distributions shows catastrophic performance for the quickSort implementation when the last element is used as the pivot. On my benchmark, choosing the element at 3/4 or 2/3 of the array as the pivot improves performance considerably (300x improvement for 50k samples, 40% faster than standard merge sort and comparable in time to radix_sort).
Mergesort has the distinct advantage of being stable and predictable for all distributions, but it requires extra memory between 50% and 100% of the size of the dataset.
Carefully implemented Quicksort is somewhat faster in many cases and performs in place, requiring only log(N) stack space for recursion. Yet it is not stable and tailor made distributions will exhibit catastrophic performance, possibly crashing.
Radixsort is only appropriate for specific kinds of data such as integers and fixed length strings. It also requires extra memory.
Countingsort would be the most efficient for the OP's dataset as it only needs an array of 128 integers to count the number of occurrences of the different values, known to be in the range 0..127. It will execute in linear time for any distribution.
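A minimal sketch of that counting sort, assuming every value lies in 0..maxValue (127 for the OP's data). The class and method names are hypothetical, and there is no range checking, so an out-of-range value would throw an `ArrayIndexOutOfBoundsException`.

```java
final class CountingSortSketch {

    // Sorts arr in place, assuming 0 <= arr[i] <= maxValue for every element.
    static void countingSort(int[] arr, int maxValue) {
        int[] counts = new int[maxValue + 1];
        for (int v : arr) {
            counts[v]++;                 // count the occurrences of each value
        }
        int k = 0;
        for (int v = 0; v <= maxValue; v++) {
            for (int c = counts[v]; c > 0; c--) {
                arr[k++] = v;            // write each value back as often as it occurred
            }
        }
    }
}
```

Both passes are linear in the array length plus the size of the value range, regardless of how the values are distributed.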

Big difference in time between two implementations of quicksort

I have two implementations of quicksort. The first one uses the median of (first, middle, last) as the pivot, and the second uses the middle element as the pivot.
The first implementation:
public class quickMedian {
public void sort(int array[])
// pre: array is full, all elements are non-null integers
// post: the array is sorted in ascending order
{
quickSort(array, 0, array.length - 1); // quicksort all the elements in the array
}
public void quickSort(int array[], int start, int end)
{
int i = start; // index of left-to-right scan
int k = end; // index of right-to-left scan
if (end - start >= 1) // check that there are at least two elements to sort
{
if (array[start+(end-start)/2]>array[end]){
swap(array,start+(end-start)/2, end);
}
if (array[start]>array[end]){
swap(array,start, end);
}
if (array[start+(end-start)/2]>array[start]){
swap(array, start+(end-start)/2, start);
}
int pivot = array[start]; // set the pivot as the first element in the partition
while (k > i) // while the scan indices from left and right have not met,
{
while (array[i] <= pivot && i <= end && k > i) // from the left, look for the first
i++; // element greater than the pivot
while (array[k] > pivot && k >= start && k >= i) // from the right, look for the first
k--; // element not greater than the pivot
if (k > i) // if the left seekindex is still smaller than
swap(array, i, k); // the right index, swap the corresponding elements
}
swap(array, start, k); // after the indices have crossed, swap the last element in the left partition with the pivot
quickSort(array, start, k - 1); // quicksort the left partition
quickSort(array, k + 1, end); // quicksort the right partition
}
else // if there is only one element in the partition, do not do any sorting
{
return; // the array is sorted, so exit
}
}
public void swap(int array[], int index1, int index2)
// pre: array is full and index1, index2 < array.length
// post: the values at indices 1 and 2 have been swapped
{
int temp = array[index1]; // store the first value in a temp
array[index1] = array[index2]; // copy the value of the second into the first
array[index2] = temp; // copy the value of the temp into the second
}
}
The second implementation:
public class quickSort {
private int array[];
private int length;
public void sort(int[] inputArr) {
if (inputArr == null || inputArr.length == 0) {
return;
}
this.array = inputArr;
length = inputArr.length;
quickSorter(0, length - 1);
}
private void quickSorter(int lowerIndex, int higherIndex) {
int i = lowerIndex;
int j = higherIndex;
// calculate pivot number, I am taking pivot as middle index number
int pivot = array[lowerIndex+(higherIndex-lowerIndex)/2];
// Divide into two arrays
while (i <= j) {
/**
* In each iteration, we will identify a number from left side which
* is greater than the pivot value, and also we will identify a number
* from right side which is less than the pivot value. Once the search
* is done, then we exchange both numbers.
*/
while (array[i] < pivot) {
i++;
}
while (array[j] > pivot) {
j--;
}
if (i <= j) {
exchangeNumbers(i, j);
//move index to next position on both sides
i++;
j--;
}
}
// call quickSort() method recursively
if (lowerIndex < j)
quickSorter(lowerIndex, j);
if (i < higherIndex)
quickSorter(i, higherIndex);
}
private void exchangeNumbers(int i, int j) {
int temp = array[i];
array[i] = array[j];
array[j] = temp;
}
}
Obtaining the median requires extra steps on each recursion, which may increase the time a little (if the array is totally random).
I am testing these two classes on a randomly generated array of size N=10,000,000. I have run the test many times: the first implementation takes around 30 seconds and the second takes around 4 seconds,
so this is obviously not caused by the extra overhead of finding the median of three numbers.
There must be something wrong with the first implementation. What is it?
Here is the testing code:
public static void main(String[] args) {
File number = new File ("f.txt");
final int size = 10000000;
try{
// quickSort s = new quickSort();
quickMedian s = new quickMedian();
writeTofile(number, size);
int [] arr1 =readFromFile(number, size);
long a=System.currentTimeMillis();
s.sort(arr1);
long b=System.currentTimeMillis();
System.out.println("quickSort: "+(double)(b-a)/1000);
}catch (Exception ex){ex.printStackTrace();}
}

Using quicksort on a string array

I'm a programming student and rather than post the whole assignment I'll just ask for help solving what I've tried for hours now to understand. I'm tasked with sorting an array of strings using the quicksort method. Everything else I've been tasked with as part of this problem is fine but when I tested the sorting method by printing out the String Array, it's completely jumbled up without any seeming rhyme or reason. Please help me pinpoint the error in my code, or the several glaring errors I've overlooked. The array of strings provided is this list of 65 names: http://pastebin.com/jRrgeV1E and the method's code is below:
private static void quickSort(String[] a, int start, int end)
{
// index for the "left-to-right scan"
int i = start;
// index for the "right-to-left scan"
int j = end;
// only examine arrays of 2 or more elements.
if (j - i >= 1)
{
// The pivot point of the sort method is arbitrarily set to the first element in the array.
String pivot = a[i];
// only scan between the two indexes, until they meet.
while (j > i)
{
// from the left, if the current element is lexicographically less than the (original)
// first element in the String array, move on. Stop advancing the counter when we reach
// the right or an element that is lexicographically greater than the pivot String.
while (a[i].compareTo(pivot) < 0 && i <= end && j > i){
i++;
}
// from the right, if the current element is lexicographically greater than the (original)
// first element in the String array, move on. Stop advancing the counter when we reach
// the left or an element that is lexicographically less than the pivot String.
while (a[j].compareTo(pivot) > 0 && j >= start && j >= i){
j--;
}
// check the two elements in the center, the last comparison before the scans cross.
if (j > i)
swap(a, i, j);
}
// At this point, the two scans have crossed each other in the center of the array and stop.
// The left partition and right partition contain the right groups of numbers but are not
// sorted themselves. The following recursive code sorts the left and right partitions.
// Swap the pivot point with the last element of the left partition.
swap(a, start, j);
// sort left partition
quickSort(a, start, j - 1);
// sort right partition
quickSort(a, j + 1, end);
}
}
/**
* This method facilitates the quickSort method's need to swap two elements, Towers of Hanoi style.
*/
private static void swap(String[] a, int i, int j)
{
String temp = a[i];
a[i] = a[j];
a[j] = temp;
}
OK, I was mistaken in thinking it would work, and I found your tiny mistake.
Take a look at Wikipedia's pseudocode.
You will notice that the conditions in your while loops are causing the error.
It works if you change (a[i].compareTo(pivot) < 0 && i <= end && j > i) and (a[j].compareTo(pivot) > 0 && j >= start && j >= i) to
(a[i].compareTo(pivot) <= 0 && i < end && j > i) and (a[j].compareTo(pivot) >= 0 && j > start && j >= i).
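Put together, the method with those two conditions swapped in might look like the sketch below. The wrapper class name `FirstPivotStringSort` is hypothetical; everything else follows the question's code, with only the two inner while conditions changed.

```java
final class FirstPivotStringSort {

    static void quickSort(String[] a, int start, int end) {
        int i = start;   // left-to-right scan
        int j = end;     // right-to-left scan
        if (j - i >= 1) {
            String pivot = a[i];
            while (j > i) {
                // skip elements that are <= pivot, never running past end
                while (a[i].compareTo(pivot) <= 0 && i < end && j > i) {
                    i++;
                }
                // skip elements that are >= pivot, never running past start
                while (a[j].compareTo(pivot) >= 0 && j > start && j >= i) {
                    j--;
                }
                if (j > i) {
                    swap(a, i, j);
                }
            }
            // j now marks the last element of the left partition
            swap(a, start, j);
            quickSort(a, start, j - 1);
            quickSort(a, j + 1, end);
        }
    }

    private static void swap(String[] a, int i, int j) {
        String temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }
}
```

Because the pivot is still the first element, an already sorted input remains the slow case for this version.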
I thought this would help those looking for a string sorting algorithm based on quicksort.
public class StringQuickSort {
String names[];
int length;
public static void main(String[] args) {
StringQuickSort sorter = new StringQuickSort();
String words[] = {"zz", "aa", "cc", "hh", "bb", "ee", "ll"}; // the strings to be sorted go in this array
sorter.sort(words);
for (String i : words) {
System.out.print(i);
System.out.print(" ");
}
}
void sort(String array[]) {
if (array == null || array.length == 0) {
return;
}
this.names = array;
this.length = array.length;
quickSort(0, length - 1);
}
void quickSort(int lowerIndex, int higherIndex) {
int i = lowerIndex;
int j = higherIndex;
String pivot = this.names[lowerIndex + (higherIndex - lowerIndex) / 2];
while (i <= j) {
while (this.names[i].compareToIgnoreCase(pivot) < 0) {
i++;
}
while (this.names[j].compareToIgnoreCase(pivot) > 0) {
j--;
}
if (i <= j) {
exchangeNames(i, j);
i++;
j--;
}
}
//call quickSort recursively
if (lowerIndex < j) {
quickSort(lowerIndex, j);
}
if (i < higherIndex) {
quickSort(i, higherIndex);
}
}
void exchangeNames(int i, int j) {
String temp = this.names[i];
this.names[i] = this.names[j];
this.names[j] = temp;
}
}

Given a rotated sorted array, how can I find the largest value in that array?

I have given this a lot of thought and was unable to find an optimal solution. I am preparing for technical interviews, but I haven't found much material related to this question. My first step was to implement a naive O(n) algorithm that searches through the entire array to find the maximum integer. I know I can do much better than this, so I thought maybe there was a way to use binary search or take advantage of the fact that at least one half of the array is fully sorted. Maybe you could find the middle value and compare it to the start and end of the array.
Example:
[5, 7, 11, 1, 3] would return 11.
[7, 9, 15, 1, 3] would return 15.
In a sorted array, even a rotated one, you can use binary search, which runs in O(log2(n)).
/**
* Time complexity: O(log2(n))
* Space complexity: O(1)
*
* @param nums
* @return
*/
public int findMax(int[] nums) {
// binary search
int left = 0;
int right = nums.length - 1;
while (left < right) {
int mid = left + (right - left) / 2;
if (nums[left] < nums[mid]) {
left = mid;
} else if (nums[left] > nums[mid]) {
right = mid - 1;
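} else {
// nums[left] == nums[mid]: either mid == left (two elements remain) or
// there are duplicate elements in the array.
// If the next element is smaller, nums[left] is the maximum;
// otherwise shift the left bound linearly.
if (nums[left] > nums[left + 1]) {
return nums[left];
}
left = left + 1;
}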
}
return nums[left];
}
You have to binary search in a clever way to achieve an O(lg n) bound. Observe that the element to the right of the max element is the min (or there is none if the array is not rotated at all). So do a regular binary search, but first check whether the element at index mid is the max. If it is not, compare the first and last elements of the left subarray: if first < last, the left subarray is sorted and you go right; otherwise you go left.
Let's assume that array is called a and it has n elements.
#include <limits.h>

int findMax(const int a[], int n)
{ /* check if not rotated at all */
if(n == 1 || a[0] < a[n-1])
return a[n-1];
/* array is certainly rotated */
int ans = INT_MIN; /* sentinel: max not found yet */
int l = 0, r = n-1;
while(r - l > 5)
{ int m = l + (r - l) / 2;
if(a[m] > a[m+1]) { ans = a[m]; break; }
else
{ if(a[l] < a[m]) l = m+1; /* a[l..m] is sorted, so the max lies to the right */
else r = m-1; /* the drop lies inside a[l..m], so the max lies to the left */
}
}
/* check the few remaining elements in a loop */
if(ans == INT_MIN)
{ for(int i = l; i <= r; i++)
{ if(a[i] > ans) ans = a[i];
}
}
return ans;
}
I've not tested this code. The reason I break when the number of elements is 5 or less is to be sure that the number of elements in either subarray is at least 2 (so you can be sure that first and last are not the same element). You've got to try this yourself and fix it if there is anything to fix. Hope this helps.
Use modified binary search to eliminate half the sorted subarray (if there are two sorted subarrays remove the "lower" subarray) in each step while keeping track of a potentially updated maximum.
#include <iostream>
#include <cstdlib>
#include <vector>
int main(int argc, char** argv)
{
std::vector<int> nums;
for(int i = 1; i < argc; i++)
nums.push_back(atoi(argv[i]));
int start = 0;
int end = argc - 2;
int max = nums[start];
while(start <= end) {
int mid = (start + end) >> 1;
int cand;
if(nums[start] <= nums[mid]) {
start = mid + 1;
} else {
end = mid - 1;
}
cand = nums[mid];
if(cand > max)
max = cand;
}
std::cout << max << std::endl;
return 0;
}
Question: find the largest element in a rotated sorted array. The array doesn't have any duplicates.
Solution: use binary search.
The idea: in a rotated sorted array, the largest element is the last element of the left (higher) sorted run, and the smallest element is the first element of the right (lower) sorted run.
The code is:
public class Test19 {
public static void main(String[] args) {
int[] a = { 5, 6, 1, 2, 3, 4 };
System.out.println(findLargestElement(a));
}
private static int findLargestElement(int[] a) {
int start = 0;
int last = a.length - 1;
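while (start + 1 < last) {
int mid = (last - start) / 2 + start;
if (a[mid] > a[start]) {
// a[start..mid] is ascending, so the largest is at mid or to its right
start = mid;
} else {
// the drop lies between start and mid, so the largest is left of mid
last = mid - 1;
}
} // while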
if (a[start] > a[last]) {
return a[start];
} else
return a[last];
}
}
The solution I've come up with is both compact and efficient.
It is basically a spin-off of the Binary Search Algorithm.
int maxFinder(int[] array, int start, int end)
{
//Compute the middle element
int mid = (start + end) / 2;
//return the first element if it's a single element array
//OR
//the boundary pair has been discovered.
if(array.length == 1 || array[mid] > array[mid + 1])
{return mid;}
//Basic Binary Search implementation
if(array[mid] < array[start])
{return maxFinder(array, start, mid - 1);}
else if(array[mid] > array[end])
{return maxFinder(array, mid + 1, end);}
//Return the last element if the array hasn't been rotated at all.
else
{return end;}
}
To solve this problem with binary search in O(log2 n) time, I would do the following:
#include<stdio.h>
#define ARRSIZE 200
int greatestElement(int* , int ) ;
int main() {
int arr[ARRSIZE] ;
int n ;
printf("Enter the number of elements you want to enter in the array!") ;
scanf("%d" , &n) ;
printf("Enter the array elements\n") ;
for(int i = 0 ; i < n ; i++) {
scanf("%d", &arr[i]) ;
}
printf("%d is the maximum element of the given array\n",greatestElement(arr,n)) ;
}
int greatestElement(int* arr, int n) {
int mid = 0 ;
int start = 0 , end = n-1 ;
while(start < end) {
mid = (start+end)/2 ;
if(mid < n-1 && arr[mid] >= arr[mid+1]) {
return arr[mid] ;
}
if(arr[start] > arr[mid]) {
end = mid - 1 ;
}
else {
start = mid + 1;
}
}
return arr[start] ;
}
This question is so easy with another version of binary search:
int solve(vector<int>& a) {
int n = a.size();
int k=0;
for(int b=n/2; b>=1; b/=2)
{
while(k+b<n && a[k+b] >= a[0])
k += b;
}
return a[k];
}

Java - My sort is not working

I have created the following class to sort an array of strings.
public class StringSort {
private String[] hotelNames;
private int arrayLength;
public void sortHotel(String[] hotelArray) {
if (hotelArray.length <= 1) {
return;
}
this.hotelNames = hotelArray;
arrayLength = hotelArray.length;
quicksort(0, arrayLength - 1);
}
private void quicksort(int low, int high) {
int i = low, j = high;
String first = hotelNames[low];
String last = hotelNames[high];
String pivot = hotelNames[low + (high - low) / 2];
while( (first.compareTo(last)) < 0 ) { // first is less than last
while( (hotelNames[i].compareTo(pivot)) < 0 ) { // ith element is < pivot
i++;
}
while( (hotelNames[j].compareTo(pivot)) > 0) { // jth element is > pivot
j--;
}
if ( ( hotelNames[i].compareTo( hotelNames[j] )) <= 0 ) {
swap(i, j);
i++;
j--;
}
//recursive calls
if (low < j) {
quicksort(low, j);
}
if (i < high) {
quicksort(i, high);
}
}
}
private void swap(int i, int j) {
String temp = hotelNames[i];
hotelNames[i] = hotelNames[j];
hotelNames[j] = temp;
}
}
However in my main class (a class to test StringSort), when I do:
StringSort str = new StringSort();
String[] hotel1 = {"zzzz", "wwww", "dddd", "bbbbb", "bbbba", "aaaf", "aaag", "zzz"};
str.sortHotel(hotel1);
And then I have another method that prints out the array. However when it prints out, it outputs the hotel1 array as it is, unchanged. There is no 'sorting' happening, I'm not sure where I've gone wrong.
There are several problems in your implementation of quicksort:
First/last comparison. This condition makes your quicksort do nothing whenever the first element is not less than the last element, regardless of the order of everything else. In your test array, "zzzz" is greater than "zzz", so the loop body never runs.
while( (first.compareTo(last)) < 0 ) { // first is less than last
Check before swap. This line is unnecessary:
if ( ( hotelNames[i].compareTo( hotelNames[j] )) <= 0 ) {
What you really want is to check whether i is still less than or equal to j: if it is, swap; if not, bail out of the loop. After the partitioning loop has finished, make the recursive calls, as long as there are at least two elements in each subarray.
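As a rough sketch of those changes applied to the class from the question (the class name `StringSortSketch` is hypothetical, and the null check in `sortHotel` is an added assumption): the outer loop runs while the indices have not crossed, the swap is guarded by the index comparison instead of a string comparison, and the recursive calls move out of the loop.

```java
final class StringSortSketch {
    private String[] hotelNames;

    void sortHotel(String[] hotelArray) {
        if (hotelArray == null || hotelArray.length <= 1) {
            return;
        }
        this.hotelNames = hotelArray;
        quicksort(0, hotelArray.length - 1);
    }

    private void quicksort(int low, int high) {
        int i = low, j = high;
        String pivot = hotelNames[low + (high - low) / 2];
        while (i <= j) {                                   // until the scans cross
            while (hotelNames[i].compareTo(pivot) < 0) {   // ith element is < pivot
                i++;
            }
            while (hotelNames[j].compareTo(pivot) > 0) {   // jth element is > pivot
                j--;
            }
            if (i <= j) {                                  // compare indices, not strings
                swap(i, j);
                i++;
                j--;
            }
        }
        // recursive calls, after partitioning is complete
        if (low < j) {
            quicksort(low, j);
        }
        if (i < high) {
            quicksort(i, high);
        }
    }

    private void swap(int i, int j) {
        String temp = hotelNames[i];
        hotelNames[i] = hotelNames[j];
        hotelNames[j] = temp;
    }
}
```

Called on the hotel1 array from the question, this should leave it in ascending order, with "zzz" placed before "zzzz" because a string sorts before any longer string that starts with it.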
