Could a modified quicksort be O(n) best case? - java

It's generally agreed that the best case for quicksort is O(n log n), given that the array is partitioned into roughly equal halves each time. It's also said that the worst case is of order n^2, assuming that the array is already sorted.
Can't we modify quicksort by setting a boolean called swap? For example, if there is no initial swap in position for the first pass, then we can assume that the array is already sorted, therefore do not partition the data any further.
I know that the modified bubble sort uses this by checking for swaps, allowing the best case to be O(n) rather than O(n^2). Can this method be applied to quicksort? Why or why not?

There is one mistake with your approach...
For example, say we have an array like this:
1 2 4 3 5 6 7 8
Our pivot element is 5. After the first pass there would be no swap (because 4 and 3 are both smaller than the pivot), but the array is NOT sorted. So you still have to keep dividing it, and that leads to n log n.
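To make this concrete, here is a minimal Java sketch (my own illustration, not code from the question) of a Hoare-style partition pass that counts swaps, run on the example array above with 5 as the pivot:

    public class NoSwapDemo {
        // Hoare-style partition of the whole array around the given pivot value;
        // returns the number of swaps performed.
        static int partition(int[] a, int pivot) {
            int i = 0, j = a.length - 1, swaps = 0;
            while (i <= j) {
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i <= j) {
                    if (i < j) {
                        int t = a[i]; a[i] = a[j]; a[j] = t;
                        swaps++;
                    }
                    i++;
                    j--;
                }
            }
            return swaps;
        }

        public static void main(String[] args) {
            int[] a = {1, 2, 4, 3, 5, 6, 7, 8};
            System.out.println(partition(a, 5) + " swaps");    // prints "0 swaps"
            System.out.println(java.util.Arrays.toString(a));  // 4 still precedes 3: not sorted
        }
    }

The pass reports zero swaps, yet 4 and 3 are still out of order, so an early-exit flag based on "no swaps" would wrongly conclude that the array is sorted.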

No, this won't work for quicksort. In bubble sort if you do a pass through the array without making any swaps you know that the entire array is sorted. This is because each element is compared to its neighbor in bubble sort, so you can infer that the entire array is sorted after any pass where no swaps are done.
That isn't the case in quicksort. In quicksort each element is compared to a single pivot element. If you go through an entire pass without moving anything in quicksort it only tells you that the elements are sorted with respect to the pivot (values less than the pivot are to its left, values greater than the pivot are to its right), not to each other.

There is also the problem of getting O(n) behaviour for almost sorted arrays in addition to the fully sorted input.
You can try harder to make your approach work, but I don't think you can get it below the O(n log n) bound. There is a proof that comparison-based sorts cannot be more efficient than O(n log n) in the worst case.

Related

What is the cut-off value in sorting algorithms?

I was asked to implement quicksort with an insertion-sort cut-off, but I did not understand the meaning of the cut-off value. I need someone to elaborate this concept with real-world examples.
Some algorithms are asymptotically better than others. In your case, the asymptotic run time of Quicksort is O(N log N) while the asymptotic run time of insertion sort is O(N^2).
This means that for large values of N, Quicksort will run faster than insertion sort.
However, for small values of N, insertion sort may run faster. Hence, you can optimize the actual run-time of Quicksort by combining it with insertion sort for small array sizes.
Quicksort is a recursive algorithm, that breaks the original array into smaller arrays and runs recursively on each sub-array. Cut off with insertion sort means that once you reach array sizes smaller than some constant cut-off size, you sort these small arrays using insertion sort instead of continuing the recursion.
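A minimal Java sketch of what such a hybrid might look like (the CUTOFF constant and the helper names are my own choices, not something given in the question):

    public class HybridSort {
        private static final int CUTOFF = 10;    // typical cut-off values are roughly 5-20

        public static void sort(int[] a, int lo, int hi) {
            if (hi - lo + 1 <= CUTOFF) {         // small subarray: switch to insertion sort
                insertionSort(a, lo, hi);
                return;
            }
            int p = partition(a, lo, hi);        // any standard quicksort partition
            sort(a, lo, p - 1);
            sort(a, p + 1, hi);
        }

        private static void insertionSort(int[] a, int lo, int hi) {
            for (int i = lo + 1; i <= hi; i++) {
                int key = a[i], j = i - 1;
                while (j >= lo && a[j] > key) {
                    a[j + 1] = a[j];
                    j--;
                }
                a[j + 1] = key;
            }
        }

        // Lomuto partition with the last element as the pivot (kept simple for the sketch).
        private static int partition(int[] a, int lo, int hi) {
            int pivot = a[hi], i = lo - 1;
            for (int j = lo; j < hi; j++) {
                if (a[j] < pivot) {
                    i++;
                    int t = a[i]; a[i] = a[j]; a[j] = t;
                }
            }
            int t = a[i + 1]; a[i + 1] = a[hi]; a[hi] = t;
            return i + 1;
        }
    }

Calling HybridSort.sort(a, 0, a.length - 1) then behaves like plain quicksort on large subarrays and hands the small ones to insertion sort instead of recursing all the way down to size 1.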
It is hard to be sure without seeing the exact statement of requirements.
However, I think that this means that you should use insertion sort (O(N^2)) below a certain size (the "cut off") and quicksort (O(N log N)) above that size.
They could mean that you do this test on the size of the input array.
They could also mean that you implement a hybrid sort, and use insertion sort on a subarray when a quicksort partition size is less than the threshold.
Either interpretation is plausible.
I need someone to elaborate this concept with real-world examples.
I doubt that you need examples to understand this. If you understand the insertion sort and quicksort algorithms, then this is conceptually straightforward. (And if you don't understand them, there are lots of places to read about them, starting with your data structures & algorithms textbook.)

Trying to understand complexity of quick sort

I understand that the worst case happens when the pivot is the smallest or the largest element. Then one of the partitions is empty and we repeat the recursion for N-1 elements.
But how is that calculated to be O(N^2)?
I have read a couple of articles and am still not able to understand it fully.
Similarly, the best case is when the pivot is the median of the array and the left and right parts are of the same size. But then how is the value O(N log N) calculated?
I understand that the worst case happens when the pivot is the smallest or the largest element. Then one of the partitions is empty and we repeat the recursion for N-1 elements.
So, imagine that you repeatedly pick the worst pivot; i.e. in the N-1 case one partition is empty and you recurse with N-2 elements, then N-3, and so on until you get to 1.
The sum of N-1 + N-2 + ... + 1 is (N * (N - 1)) / 2. (Students typically learn this in high-school maths these days ...)
O(N(N-1)/2) is the same as O(N^2). You can deduce this from first principles from the mathematical definition of Big-O notation.
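Writing out the arithmetic of that step explicitly (my own restatement of the two sentences above):

    \sum_{k=1}^{N-1} k = (N-1) + (N-2) + \dots + 1 = \frac{N(N-1)}{2} = \frac{N^2 - N}{2} \in O(N^2)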
Similarly, the best case is when the pivot is the median of the array and the left and right parts are of the same size. But then how is the value O(N log N) calculated?
That is a bit more complicated.
Think of the problem as a tree:
At the top level, you split the problem into two equal-sized sub problems, and move N objects into their correct partitions.
At the 2nd level, you split the two sub-problems into four sub-sub-problems, and in each of the 2 problems you move N/2 objects into their correct partitions, for a total of N objects moved.
At the bottom level you have N/2 sub-problems of size 2 which you (notionally) split into N problems of size 1, again copying N objects.
Clearly, at each level you move N objects. The height of the tree for a problem of size N is log2N. So ... there are N * log2N object moves; i.e. O(N * log2N).
But log2N is logeN / loge2. (High-school maths, again.)
So O(N log2N) is O(N logN).
A little correction to your statement:
I understand worst case happens when the pivot is the smallest or the largest element.
Actually, the worst case happens when each successive pivot is the smallest or the largest element of the remaining partitioned array.
To better understand the worst case, think about an already sorted array which you may be trying to sort.
You select the first element as the first pivot. After comparing it with the rest of the array, you would find that the other n-1 elements are still on the other (right) side and the first element remains in the same position, which defeats the purpose of partitioning. You would keep repeating these steps down to the last element, with the same effect, which accounts for (n-1) + (n-2) + (n-3) + ... + 1 comparisons, and that sums up to n*(n-1)/2 comparisons. So,
O(n*(n-1)/2) = O(n^2) for the worst case.
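You can check that count empirically with a small Java sketch (my own code, using the first element as the pivot exactly as described above):

    public class WorstCaseDemo {
        static long comparisons = 0;

        // Lomuto-style partition that uses the FIRST element as the pivot.
        static int partition(int[] a, int lo, int hi) {
            int pivot = a[lo];
            int i = lo;                              // a[lo+1..i] holds elements < pivot
            for (int j = lo + 1; j <= hi; j++) {
                comparisons++;
                if (a[j] < pivot) {
                    i++;
                    int t = a[i]; a[i] = a[j]; a[j] = t;
                }
            }
            int t = a[lo]; a[lo] = a[i]; a[i] = t;   // put the pivot into its final position
            return i;
        }

        static void quicksort(int[] a, int lo, int hi) {
            if (lo < hi) {
                int p = partition(a, lo, hi);
                quicksort(a, lo, p - 1);
                quicksort(a, p + 1, hi);
            }
        }

        public static void main(String[] args) {
            int n = 1000;
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = i;    // already sorted input
            quicksort(a, 0, n - 1);
            System.out.println(comparisons);         // prints 499500 = 1000 * 999 / 2
        }
    }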
To overcome this problem, it is always recommended to pick each successive pivot randomly.
I would try to add an explanation for the average case as well.
The best case can be derived from the Master theorem. See https://en.wikipedia.org/wiki/Master_theorem for instance, or Cormen, Leiserson, Rivest, and Stein: Introduction to Algorithms for a proof of the theorem.
One way to think of it is the following.
If quicksort makes a poor choice for a pivot, then the pivot induces an unbalanced partition, with most of the elements being on one side of the pivot (either below or above). In an extreme case, you could have, as you suggest, that all elements are below or above the pivot. In that case we can model the time complexity of quicksort with the recurrence T(n)=T(1)+T(n-1)+O(n). However, T(1)=O(1), and we can write out the recurrence T(n)=O(n)+O(n-1)+...+O(1) = O(n^2). (One has to take care to understand what it means to sum Big-Oh terms.)
On the other hand, if quicksort repeatedly makes good pivot choices, then those pivots induce balanced partitions. In the best case, about half the elements will be below the pivot, and about half the elements above the pivot. This means we recurse on two subarrays of roughly equal sizes. The recurrence then becomes T(n)=T(n/2)+T(n/2)+O(n) = 2T(n/2)+O(n). (Here, too, one must take care about rounding the n/2 if one wants to be formal.) This solves to T(n)=O(n log n).
The intuition for the last paragraph is that we compute the relative position of the pivot in the sorted order without actually sorting the whole array. We can then compute the relative position of the pivot in the below subarray without having to worry about the elements from the above subarray; similarly for the above subarray.
First, it helps to have the standard quicksort pseudo-code in front of you.
In my opinion, you need to understand two cases: the worst case and the best case.
The worst case:
The most unbalanced partition occurs when the pivot divides the list into two sublists of sizes 0 and n−1. The recurrence is T(n) = O(n) + T(0) + T(n-1) = O(n) + T(n-1). Unrolling this recurrence gives
T(n) = O(n²).
The best case:
In the most balanced case, each time we perform a partition we divide the list into two nearly equal pieces. As in the worst case, we write down the recurrence relation: T(n) = O(n) + 2T(n/2). By the master theorem, this solves to T(n) = O(n log n).
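As a worked application of the master theorem to that best-case recurrence (my own write-up of the standard argument):

    T(n) = 2\,T(n/2) + \Theta(n), \qquad a = 2,\ b = 2,\ f(n) = \Theta(n),\ n^{\log_b a} = n^{\log_2 2} = n

Since f(n) = \Theta(n^{\log_b a}), case 2 of the theorem applies, giving T(n) = \Theta(n^{\log_b a} \log n) = \Theta(n \log n).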

Finding median in unsorted array

I need to take input such as 4 1 6 5 0. The 4 determines how big the array is, and the rest are array elements. The catch is I can't sort it first. I'm at a loss of how to begin.
There is a chapter in MIT's Introduction to Algorithms course (http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-four) dedicated to order statistics. You can find the median in O(N) expected time, O(N^2) worst case.
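A minimal Java sketch of that expected-O(N) selection (quickselect with a random pivot; the class and method names are my own):

    import java.util.Random;

    public class Median {
        private static final Random RNG = new Random();

        // Returns the k-th smallest element (0-based) of a; note that it rearranges a.
        static int select(int[] a, int k) {
            int lo = 0, hi = a.length - 1;
            while (true) {
                if (lo == hi) return a[lo];
                int p = partition(a, lo, hi, lo + RNG.nextInt(hi - lo + 1));
                if (k == p)      return a[p];
                else if (k < p)  hi = p - 1;
                else             lo = p + 1;
            }
        }

        // Lomuto partition around the value at pivotIndex; returns the pivot's final index.
        static int partition(int[] a, int lo, int hi, int pivotIndex) {
            swap(a, pivotIndex, hi);
            int pivot = a[hi], i = lo;
            for (int j = lo; j < hi; j++) {
                if (a[j] < pivot) swap(a, i++, j);
            }
            swap(a, i, hi);
            return i;
        }

        static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

        public static void main(String[] args) {
            int[] a = {1, 6, 5, 0};                       // the four elements from the question
            System.out.println(select(a, a.length / 2));  // prints 5, the upper median
        }
    }

For an even-length array this returns the upper median; if you want the conventional average of the two middle values, run select for k = n/2 - 1 and k = n/2.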
I think that you could use a sorted list: use any algorithm to sort the list, then take the element at position n/2; that is your median.

What is the most inefficient sorting routine?

For an array of integers, what is the least efficient way of sorting the array? The function should make progress in each step (e.g. no infinite loops). What is the runtime of that algorithm?
There can be no least efficient algorithm for anything. This can easily be proved by contradiction so long as you accept that, starting from any algorithm, another equivalent but less efficient algorithm can be constructed.
Bogosort has an average runtime of O(n*n!), ouch.
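For reference, here is a minimal bogosort sketch in Java (my own code): shuffle until the array happens to be sorted. Each attempt costs O(n) and the expected number of attempts is about n!, which matches the O(n*n!) average above.

    import java.util.Random;

    public class Bogosort {
        static void bogosort(int[] a) {
            Random rng = new Random();
            while (!isSorted(a)) {
                // Fisher-Yates shuffle
                for (int i = a.length - 1; i > 0; i--) {
                    int j = rng.nextInt(i + 1);
                    int t = a[i]; a[i] = a[j]; a[j] = t;
                }
            }
        }

        static boolean isSorted(int[] a) {
            for (int i = 1; i < a.length; i++) {
                if (a[i - 1] > a[i]) return false;
            }
            return true;
        }
    }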
Stupid sort is surely the worst algorithm. It's not exactly an infinite loop, but this approach has a worst case of O(inf) and an average of O(n × n!).
You can do it in O(n!*n) by generating all unique permutations and afterwards checking whether each one is sorted.
The least efficient algorithm I can think of with a finite upper bound on runtime is permutation sort. The idea is to generate every permutation of the input and check whether it is sorted.
The upper bound is O(n!), the lower bound is O(n) when the array is already sorted.
Iterate over all finite sequences of integers using diagonalization.
For each sequence, test whether it's sorted.
If so, test whether its elements match the elements in your array.
This will have an upper bound (first guess: O(n^(n*m))?).
Strange question, normally we go for the fastest.
Find the highest and move it to a list. Repeat the previous step till the original list has only one element left. You are guaranteed O(N^2).
Bogosort is one of the worst sorting algorithms and uses shuffling. But from another point of view, it has a (low) probability of sorting the array in one step :)
"Worstsort" has a complexity of where factorial of n iterated m times. The paper by Miguel A. Lerma can be found here.
Bogobogosort. It's like bogosort in that it shuffles, but it creates auxiliary arrays: the first one is a copy of the array, and each of the others is smaller by 1 element than the previous one. They can be removed as well.
Its average complexity is O(N superfactorial * n). Its best case is O(N^2). Just like bogosort, it has a worst case of O(inf).

What is the difference between quicksort and tuned quicksort?

What is the fundamental difference between quicksort and tuned quicksort? What is the improvement given to quicksort? How does Java decide to use this instead of merge sort?
As Bill the Lizard said, a tuned quicksort still has the same complexity as the basic quicksort - O(N log N) average complexity - but a tuned quicksort uses various means to try to avoid the O(N^2) worst case complexity, as well as some optimizations to reduce the constant that goes in front of the N log N for average running time.
Worst Case Time Complexity
Worst case time complexity occurs for quicksort when one side of the partition at each step always has zero elements. Near worst case time complexity occurs when the ratio of the elements in one partition to the other partition is very far from 1:1 (10000:1 for instance). Common causes of this worst case complexity include, but are not limited to:
A quicksort algorithm that always chooses the element with the same relative index of a subarray as the pivot. For instance, with an array that is already sorted, a quicksort algorithm that always chooses the leftmost or rightmost element of the subarray as the pivot will be O(N^2). A quicksort algorithm that always chooses the middle element gives O(N^2) for the organ pipe array ([1,2,3,4,5,4,3,2,1] is an example of this).
A quicksort algorithm that doesn't handle repeated/duplicate elements in the array can be O(N^2). The obvious example is sorting an array that contains all the same elements. Explicitly, if the quicksort sorts the array into partitions like [ < p | >= p ], then the left partition will always have zero elements.
How are these remedied? The first is generally remedied by choosing the pivot randomly. Using a median of a few elements as the pivot can also help, but the probability of the sort being O(N^2) is higher than using a random pivot. Of course, the median of a few randomly chosen elements might be a wise choice too. The median of three randomly chosen elements as the pivot is a common choice here.
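A minimal sketch of median-of-three pivot selection, meant to be dropped into a quicksort class (this deterministic first/middle/last variant is my own choice; the randomized version mentioned above would simply sample the three indices at random):

    // Reorders a[lo], a[mid] and a[hi] so that a[mid] holds the median of the three,
    // then returns mid as the pivot index.
    static int medianOfThree(int[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < a[lo]) swap(a, lo, mid);
        if (a[hi]  < a[lo]) swap(a, lo, hi);
        if (a[hi]  < a[mid]) swap(a, mid, hi);
        return mid;
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }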
The second case, repeated elements, is usually solved with something like Bentley-McIlroy partitioning or the solution to the Dutch National Flag problem. The Bentley-McIlroy partitioning is more commonly used, however, because it is usually faster. I've come up with a method that is faster than it, but that's not the point of this post.
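For the duplicate-heavy case, here is a minimal sketch of a Dutch National Flag style 3-way partition (simpler than Bentley-McIlroy, but it illustrates the same idea of grouping keys equal to the pivot; it reuses the swap helper from the previous sketch):

    // Partitions a[lo..hi] into a[lo..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..hi] > pivot
    // and returns {lt, gt}.
    static int[] partition3(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int lt = lo, i = lo + 1, gt = hi;
        while (i <= gt) {
            if (a[i] < pivot)      swap(a, lt++, i++);
            else if (a[i] > pivot) swap(a, i, gt--);
            else                   i++;
        }
        return new int[] { lt, gt };
    }

The surrounding quicksort then recurses only on a[lo..lt-1] and a[gt+1..hi], so an array where every element is equal finishes after a single pass instead of degrading to O(N^2).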
Optimizations
Here are some common optimizations outside of the methods listed above to help with worst case scenarios:
Using the converging pointers quicksort as opposed to the basic quicksort. Let me know if you want more elaboration on this.
Insertion sort subarrays when they get below a certain size. Insertion sort is asymptotically O(N^2), but for small enough N, it beats quicksort.
Using an iterative quicksort with an explicit stack as opposed to a recursive quicksort.
Unrolling parts of loops to reduce the number of comparisons.
Copying the pivot to a register and using that space in the array to reduce the time cost of swapping elements.
Other Notes
Java uses mergesort when sorting objects because it is a stable sort (the order of elements that have the same key is preserved). Quicksort can be stable or unstable, but the stable version is slower than the unstable version.
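A tiny illustration of what that stability guarantee buys you (the example data is my own):

    import java.util.Arrays;
    import java.util.Comparator;

    public class StableDemo {
        public static void main(String[] args) {
            String[] words = { "pear", "fig", "plum", "kiwi" };
            // Sort by length only. Arrays.sort(Object[]) is a stable merge sort, so the
            // equal-length words keep their original relative order: pear, plum, kiwi.
            Arrays.sort(words, Comparator.comparingInt(String::length));
            System.out.println(Arrays.toString(words));   // [fig, pear, plum, kiwi]
        }
    }

Because the object sort is stable, the three four-letter words keep their original relative order after sorting by length; an unstable sort would be free to reorder them.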
"Tuned" quicksort just means that some improvements are applied to the basic algorithm. Usually the improvements are to try and avoid worst case time complexity. Some examples of improvements might be to choose the pivot (or multiple pivots) so that there's never only 1 key in a partition, or only make the recursive call when a partition is above a certain minimum size.
It looks like Java only uses merge sort when sorting Objects (the Arrays doc tells you which sorting algorithm is used for which sort method signature), so I don't think it ever really "decides" on its own, but the decision was made in advance. (Also, implementers are free to use another sort, as long as it's stable.)
In Java, Arrays.sort(Object[]) uses merge sort, but all the other overloaded sort functions use insertion sort if the length is less than 7, and a tuned quicksort if the length of the array is greater than 7.
