What is a cut-off value in sorting algorithms? - Java

I was asked to implement quicksort, cut off with insertion sort, but I did not understand the meaning of the cut-off value. I need someone to elaborate on this concept with real-world examples.

Some algorithms are asymptotically better than others. In your case, the asymptotic run time of Quicksort is O(N log N) while the asymptotic run time of insertion sort is O(N^2).
This means that for large values of N, Quicksort will run faster than insertion sort.
However, for small values of N, insertion sort may run faster. Hence, you can optimize the actual run-time of Quicksort by combining it with insertion sort for small array sizes.
Quicksort is a recursive algorithm that breaks the original array into smaller subarrays and runs recursively on each of them. "Cut off with insertion sort" means that once you reach subarrays smaller than some constant cut-off size, you sort those small subarrays using insertion sort instead of continuing the recursion.
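For example, here is a minimal sketch of such a hybrid sort in Java (the Lomuto pivot choice and the cut-off value of 10 are illustrative, not tuned):

    public class HybridQuickSort {
        private static final int CUTOFF = 10;  // illustrative; typical values are around 5-20

        public static void sort(int[] a) {
            sort(a, 0, a.length - 1);
        }

        private static void sort(int[] a, int lo, int hi) {
            if (hi - lo + 1 <= CUTOFF) {       // small subarray: stop recursing
                insertionSort(a, lo, hi);
                return;
            }
            int p = partition(a, lo, hi);
            sort(a, lo, p - 1);
            sort(a, p + 1, hi);
        }

        // Lomuto partition with the last element as the pivot
        private static int partition(int[] a, int lo, int hi) {
            int pivot = a[hi], i = lo;
            for (int j = lo; j < hi; j++) {
                if (a[j] < pivot) swap(a, i++, j);
            }
            swap(a, i, hi);
            return i;
        }

        private static void insertionSort(int[] a, int lo, int hi) {
            for (int i = lo + 1; i <= hi; i++) {
                int key = a[i], j = i - 1;
                while (j >= lo && a[j] > key) {
                    a[j + 1] = a[j];
                    j--;
                }
                a[j + 1] = key;
            }
        }

        private static void swap(int[] a, int i, int j) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }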

It is hard to be sure without seeing the exact statement of requirements.
However, I think that this means that you should use insertion sort (O(N^2)) below a certain size (the "cut off") and quicksort (O(NlogN)) above that size.
They could mean that you do this test on the size of the input array.
They could also mean that you implement a hybrid sort, and use insertion sort on a subarray when a quicksort partition size is less than the threshold.
Either interpretation is plausible.
I need someone to elaborate on this concept with real-world examples.
I doubt that you need examples to understand this. If you understand the insertion sort and quicksort algorithms, then this is conceptually straightforward. (And if you don't understand them, there are lots of places to read about them, starting with your data structures & algorithms textbook.)

Related

Selection Sort vs Quicksort for Small Data

I know that quicksort is the fastest sorting algorithm at the moment. If I have a small dataset of 7 or 10 terms, will selection sort work better than quicksort or the other way around?
quicksort is the fastest sorting algorithm at the moment
This is not correct. For example, counting sort can be faster: Quicksort is O(n log n), while counting sort is O(n + k). (Counting sort is not comparison-based, though, and only applies to keys in a limited integer range.)
If I have a small dataset of 7 or 10 terms, will selection sort work better than quicksort or the other way around?
Depending on the usage, yes. Selection sort's time complexity is worse, but it is simpler: you don't have to shuffle the collection at the start, and so on. For small data you can choose just about the simplest sorting algorithm and not worry about performance.
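For reference, a minimal selection sort sketch, just to show how little code it takes:

    public class SmallSort {
        static void selectionSort(int[] a) {
            for (int i = 0; i < a.length - 1; i++) {
                int min = i;                          // index of smallest remaining element
                for (int j = i + 1; j < a.length; j++) {
                    if (a[j] < a[min]) min = j;
                }
                int t = a[i]; a[i] = a[min]; a[min] = t;
            }
        }
    }

For 7-10 elements, this kind of simplicity usually matters more than asymptotic complexity.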

QuickSort vs MergeSort on arrays of primitives in Java

I know that Java's Arrays.sort method uses MergeSort for sorting arrays of objects (or collections of objects) since it is stable, and Java uses QuickSort for arrays of primitives because we don't need stability since two equal ints are indistinguishable, i.e. their identity doesn't matter.
My question is, in the case of primitives, why doesn't Java use MergeSort's guaranteed O(n log n) time and instead goes for the average O(n log n) time of QuickSort? In the last paragraph of one of the related answers here, it is explained that:
For reference types, where the referred objects usually take up far more memory than the array of references, this generally does not matter. But for primitive types, cloning the array outright doubles the memory usage.
What does this mean? Cloning a reference is still at least as costly as cloning a primitive. Are there any other reasons for using QuickSort (average O(n log n)) instead of MergeSort (guaranteed O(n log n) time) on arrays of primitives?
Not all O(n log n) algorithms have the same constant factors. Quicksort, in the 99.9% of cases where it takes n log n time, runs in a much faster n log n than mergesort. I don't know the exact multiplier -- and it'll vary system to system -- but, say, quicksort could run twice as fast as merge sort on average and still have theoretical worst case n^2 performance.
Additionally, Quicksort doesn't require cloning the array in the first place, while merge sort inevitably does. For reference types you don't have a choice if you want a stable sort, so you have to accept the copy; for primitives you don't need to accept that cost.
Cloning a reference is still at-least as costly as cloning a primitive.
Most (or all?) implementations of Java implement an array of objects as an array of pointers (references) to objects. So cloning an array of pointers (references) would consume less space than cloning the objects themselves if the objects are larger in size than a pointer (reference).
I don't know why the term "cloning" was used. Merge sort allocates a second temp array, but that array is not a "clone" of the original. Instead, a proper merge sort alternates the direction of the merge, from the original array to the temp array or vice versa, depending on the iteration (bottom-up) or on the level of recursion (top-down).
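Here is a sketch of that idea (an illustration, not the JDK's actual code): a bottom-up merge sort that merges back and forth between the original array and a single temp array, so at most one final copy is needed:

    public class AlternatingMergeSort {
        public static void sort(int[] a) {
            int n = a.length;
            int[] tmp = new int[n];
            int[] src = a, dst = tmp;
            for (int width = 1; width < n; width *= 2) {
                for (int lo = 0; lo < n; lo += 2 * width) {
                    merge(src, dst, lo, Math.min(lo + width, n), Math.min(lo + 2 * width, n));
                }
                int[] t = src; src = dst; dst = t;   // alternate merge direction each pass
            }
            if (src != a) System.arraycopy(src, 0, a, 0, n);  // at most one final copy
        }

        // Merge the sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi).
        private static void merge(int[] src, int[] dst, int lo, int mid, int hi) {
            int i = lo, j = mid;
            for (int k = lo; k < hi; k++) {
                if (i < mid && (j >= hi || src[i] <= src[j])) dst[k] = src[i++];
                else dst[k] = src[j++];
            }
        }
    }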
dual pivot quick sort
Based on what I can find doing web searches, Java's dual-pivot quicksort keeps track of recursion depth and switches to heap sort if the depth becomes excessive, to maintain O(n log(n)) time complexity, though at a higher constant factor.
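The general technique looks roughly like this (an introsort-style sketch of the idea, not the JDK's implementation; the depth limit of 2*log2(n) is a conventional choice):

    public class DepthGuardedQuickSort {
        public static void sort(int[] a) {
            // Allow roughly 2*log2(n) levels of recursion before giving up on quicksort.
            int depthLimit = 2 * (32 - Integer.numberOfLeadingZeros(Math.max(a.length, 1)));
            sort(a, 0, a.length - 1, depthLimit);
        }

        private static void sort(int[] a, int lo, int hi, int depthLimit) {
            if (lo >= hi) return;
            if (depthLimit == 0) {            // pathological input: fall back to heap sort
                heapSort(a, lo, hi);
                return;
            }
            int p = partition(a, lo, hi);
            sort(a, lo, p - 1, depthLimit - 1);
            sort(a, p + 1, hi, depthLimit - 1);
        }

        private static int partition(int[] a, int lo, int hi) {  // Lomuto partition
            int pivot = a[hi], i = lo;
            for (int j = lo; j < hi; j++) {
                if (a[j] < pivot) swap(a, i++, j);
            }
            swap(a, i, hi);
            return i;
        }

        private static void heapSort(int[] a, int lo, int hi) {
            int n = hi - lo + 1;
            for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, lo, i, n);
            for (int end = n - 1; end > 0; end--) {
                swap(a, lo, lo + end);        // move current max behind the shrinking heap
                siftDown(a, lo, 0, end);
            }
        }

        private static void siftDown(int[] a, int lo, int root, int n) {
            while (2 * root + 1 < n) {
                int child = 2 * root + 1;
                if (child + 1 < n && a[lo + child + 1] > a[lo + child]) child++;
                if (a[lo + root] >= a[lo + child]) return;
                swap(a, lo + root, lo + child);
                root = child;
            }
        }

        private static void swap(int[] a, int i, int j) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }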
quick sort versus merge sort
In addition to stability, merge sort can be faster for sorting an array of pointers (references) to objects. Merge sort does more moves (of the pointers) but fewer compares (of the objects accessed by dereferencing pointers), than quick sort.
On a system with 16 registers (most of them used as pointers), such as X86 in 64 bit mode, a 4-way merge sort is about as fast as regular quick sort, but I don't recall seeing a 4-way merge sort in a common library, at least not for a PC.
Arrays#sort(primitive array) doesn't use traditional Quick Sort; it uses Dual-Pivot Quicksort, which is faster than quicksort, which in turn is faster than merge sort, in part because it doesn't have to be stable.
From the javadoc:
Implementation note: The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations.
QuickSort is approximately 40% faster than MergeSort on random data because of fewer data movements
QuickSort requires O(1) extra space while MergeSort requires O(n)
P.S. Neither classic QuickSort nor MergeSort is used in the Java standard library.

Why Collections.sort uses merge sort instead of quicksort?

We know that quick sort is the fastest sorting algorithm.
The JDK 6 Collections.sort uses the merge sort algorithm instead of quick sort, but Arrays.sort uses the quick sort algorithm.
What is the reason Collections.sort uses merge sort instead of quick sort?
Highly likely from Josh Bloch:
I did write these methods, so I suppose I'm qualified to answer. It is true that there is no single best sorting algorithm. QuickSort has two major deficiencies when compared to mergesort:
It's not stable (as parsifal noted).
It doesn't guarantee n log n performance; it can degrade to quadratic performance on pathological inputs.
Stability is a non-issue for primitive types, as there is no notion of identity as distinct from (value) equality. And the possibility of quadratic behavior was deemed not to be a problem in practice for Bentley and McIlroy's implementation (or subsequently for Dual-Pivot Quicksort), which is why these QuickSort variants were used for the primitive sorts.
Stability is a big deal when sorting arbitrary objects. For example, suppose you have objects representing email messages, and you sort them first by date, then by sender. You expect them to be sorted by date within each sender, but that will only be true if the sort is stable. That's why we elected to provide a stable sort (merge sort) to sort object references. (Technically speaking, multiple sequential stable sorts result in a lexicographic ordering on the keys in the reverse order of the sorts: the final sort determines the most significant subkey.)
It's a nice side benefit that merge sort guarantees n log n (time) performance no matter what the input. Of course there is a downside: quicksort is an "in place" sort; it requires only log n external space (to maintain the call stack). Merge sort, on the other hand, requires O(n) external space. The TimSort variant (introduced in Java SE 6) requires substantially less space (O(k)) if the input array is nearly sorted.
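To illustrate the email example from the quote, here is a small sketch (the Email record and its field names are made up for the demo; records require Java 16+):

    import java.util.*;

    public class StableSortDemo {
        record Email(String sender, int date) {}

        public static void main(String[] args) {
            List<Email> inbox = new ArrayList<>(List.of(
                new Email("bob", 3), new Email("alice", 1),
                new Email("bob", 2), new Email("alice", 4)));

            inbox.sort(Comparator.comparingInt(Email::date));  // first sort: by date
            inbox.sort(Comparator.comparing(Email::sender));   // final sort: by sender (most significant key)

            // Because List.sort is stable, date order survives within each sender:
            // [Email[sender=alice, date=1], Email[sender=alice, date=4],
            //  Email[sender=bob, date=2], Email[sender=bob, date=3]]
            System.out.println(inbox);
        }
    }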
Also, the following is relevant:
The algorithm used by java.util.Arrays.sort and (indirectly) by java.util.Collections.sort to sort object references is a "modified mergesort (in which the merge is omitted if the highest element in the low sublist is less than the lowest element in the high sublist)." It is a reasonably fast stable sort that guarantees O(n log n) performance and requires O(n) extra space. In its day (it was written in 1997 by Joshua Bloch), it was a fine choice, but today we can do much better.
Since 2003, Python's list sort has used an algorithm known as timsort (after Tim Peters, who wrote it). It is a stable, adaptive, iterative mergesort that requires far fewer than n log(n) comparisons when running on partially sorted arrays, while offering performance comparable to a traditional mergesort when run on random arrays. Like all proper mergesorts, timsort is stable and runs in O(n log n) time (worst case). In the worst case, timsort requires temporary storage space for n/2 object references; in the best case, it requires only a small constant amount of space. Contrast this with the current implementation, which always requires extra space for n object references, and beats n log n only on nearly sorted lists.
Timsort is described in detail here: http://svn.python.org/projects/python/trunk/Objects/listsort.txt.
Tim Peters's original implementation is written in C. Joshua Bloch ported it from C to Java and tested, benchmarked, and tuned the resulting code extensively. The resulting code is a drop-in replacement for java.util.Arrays.sort. On highly ordered data, this code can run up to 25 times as fast as the current implementation (on the HotSpot server VM). On random data, the speeds of the old and new implementations are comparable. For very short lists, the new implementation is substantially faster than the old, even on random data (because it avoids unnecessary data copying).
Also, see Is Java 7 using Tim Sort for the Method Arrays.Sort?.
There isn't a single "best" choice. As with many other things, it's about tradeoffs.

Could a modified quicksort be O(n) best case?

It's generally agreed that the best case for quicksort is O(n log n), given that the array is partitioned by roughly half each time. It's also said that the worst case is O(n^2), for example when the array is already sorted and a naive pivot is chosen.
Can't we modify quicksort by setting a boolean called swap? For example, if there is no initial swap in position for the first pass, then we can assume that the array is already sorted, therefore do not partition the data any further.
I know that the modified bubble sort uses this by checking for swaps, allowing the best case to be O(n) rather than O(n^2). Can this method be applied to quicksort? Why or why not?
There is one mistake with your approach...
For example, suppose we have an array like this:
[1, 2, 4, 3, 5, 6, 7, 8]
Our pivot element is 5. After the first pass there would be no swaps (because 4 and 3 are both smaller than the pivot), but the array is NOT sorted. So you still have to start dividing it, and that leads to n log n.
No, this won't work for quicksort. In bubble sort if you do a pass through the array without making any swaps you know that the entire array is sorted. This is because each element is compared to its neighbor in bubble sort, so you can infer that the entire array is sorted after any pass where no swaps are done.
That isn't the case in quicksort. In quicksort each element is compared to a single pivot element. If you go through an entire pass without moving anything in quicksort it only tells you that the elements are sorted with respect to the pivot (values less than the pivot are to its left, values greater than the pivot are to its right), not to each other.
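You can see this concretely with a small sketch (assuming a Hoare-style converging-pointers scan, and fixing 5 as the pivot as in the example above):

    public class NoSwapNotSorted {
        public static void main(String[] args) {
            int[] a = {1, 2, 4, 3, 5, 6, 7, 8};
            int pivot = 5, swaps = 0;
            int i = -1, j = a.length;
            while (true) {
                do { i++; } while (a[i] < pivot);   // scan right for an element >= pivot
                do { j--; } while (a[j] > pivot);   // scan left for an element <= pivot
                if (i >= j) break;
                int t = a[i]; a[i] = a[j]; a[j] = t;
                swaps++;
            }
            System.out.println("swaps = " + swaps); // prints: swaps = 0
            // ...yet a[2] = 4 > a[3] = 3, so the array is not sorted.
        }
    }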
There is also the problem that you would want the O(n) behaviour for almost-sorted arrays as well, not just for fully sorted input.
You can try harder to make your approach work, but I don't think you can make it breach the O(n log n) boundary: there is a proof that comparison-based sorts cannot do better than O(n log n) in the worst case.
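For reference, the standard decision-tree argument behind that proof: a comparison sort must be able to distinguish all n! possible input orderings, and a binary decision tree with n! leaves has height at least

    log2(n!) >= log2((n/2)^(n/2)) = (n/2) * log2(n/2) = Omega(n log n)

since the largest n/2 factors of n! are each at least n/2.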

What is the difference between quicksort and tuned quicksort?

What is the fundamental difference between quicksort and tuned quicksort? What is the improvement given to quicksort? How does Java decide to use this instead of merge sort?
As Bill the Lizard said, a tuned quicksort still has the same complexity as the basic quicksort - O(N log N) average complexity - but it uses various means to try to avoid the O(N^2) worst case, as well as some optimizations to reduce the constant factor in front of the N log N average running time.
Worst Case Time Complexity
Worst case time complexity occurs for quicksort when one side of the partition at each step always has zero elements. Near worst case time complexity occurs when the ratio of the elements in one partition to the other partition is very far from 1:1 (10000:1 for instance). Common causes of this worst case complexity include, but are not limited to:
A quicksort algorithm that always chooses the element with the same relative index of a subarray as the pivot. For instance, with an array that is already sorted, a quicksort algorithm that always chooses the leftmost or rightmost element of the subarray as the pivot will be O(N^2). A quicksort algorithm that always chooses the middle element gives O(N^2) for the organ pipe array ([1,2,3,4,5,4,3,2,1] is an example of this).
A quicksort algorithm that doesn't handle repeated/duplicate elements in the array can be O(N^2). The obvious example is sorting an array that contains all the same elements. Explicitly, if the quicksort sorts the array into partitions like [ < p | >= p ], then the left partition will always have zero elements.
How are these remedied? The first is generally remedied by choosing the pivot randomly. Using a median of a few elements as the pivot can also help, but the probability of the sort being O(N^2) is higher than using a random pivot. Of course, the median of a few randomly chosen elements might be a wise choice too. The median of three randomly chosen elements as the pivot is a common choice here.
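A sketch of that common choice (illustrative, not any particular library's code):

    import java.util.concurrent.ThreadLocalRandom;

    public class PivotChoice {
        // Median of three randomly chosen elements of a[lo..hi], used as the pivot value.
        static int medianOfThreeRandom(int[] a, int lo, int hi) {
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            int x = a[rnd.nextInt(lo, hi + 1)];
            int y = a[rnd.nextInt(lo, hi + 1)];
            int z = a[rnd.nextInt(lo, hi + 1)];
            // Median of x, y, z without sorting:
            return Math.max(Math.min(x, y), Math.min(Math.max(x, y), z));
        }
    }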
The second case, repeated elements, is usually solved with something like Bentley-McIlroy partitioning or the solution to the Dutch National Flag problem. Bentley-McIlroy partitioning is more commonly used, however, because it is usually faster. I've come up with a method that is faster than it, but that's not the point of this post.
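For the duplicate-heavy case, here is a sketch of Dijkstra-style three-way partitioning (the Dutch National Flag approach; Bentley-McIlroy partitioning is a refinement of the same idea):

    public class ThreeWayPartition {
        // Rearranges a[lo..hi] so that a[lo..lt-1] < pivot, a[lt..gt] == pivot,
        // and a[gt+1..hi] > pivot, then returns {lt, gt}. Equal keys never recurse again.
        static int[] partition3(int[] a, int lo, int hi) {
            int pivot = a[lo];
            int lt = lo, i = lo + 1, gt = hi;
            while (i <= gt) {
                if (a[i] < pivot)      swap(a, lt++, i++);
                else if (a[i] > pivot) swap(a, i, gt--);
                else                   i++;
            }
            return new int[] {lt, gt};
        }

        private static void swap(int[] a, int i, int j) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }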
Optimizations
Here are some common optimizations outside of the methods listed above to help with worst case scenarios:
Using the converging pointers quicksort as opposed to the basic quicksort. Let me know if you want more elaboration on this.
Insertion sort subarrays when they get below a certain size. Insertion sort is asymptotically O(N^2), but for small enough N, it beats quicksort.
Using an iterative quicksort with an explicit stack as opposed to a recursive quicksort (see the sketch after this list).
Unrolling parts of loops to reduce the number of comparisons.
Copying the pivot to a register and using that space in the array to reduce the time cost of swapping elements.
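Here is a sketch of the explicit-stack variant mentioned above (a minimal version; a tuned one would push the larger half and loop on the smaller half to bound the stack at O(log n)):

    import java.util.ArrayDeque;

    public class IterativeQuickSort {
        public static void sort(int[] a) {
            ArrayDeque<int[]> stack = new ArrayDeque<>();
            stack.push(new int[] {0, a.length - 1});
            while (!stack.isEmpty()) {
                int[] range = stack.pop();
                int lo = range[0], hi = range[1];
                if (lo >= hi) continue;
                int p = partition(a, lo, hi);
                stack.push(new int[] {lo, p - 1});  // replaces the recursive calls
                stack.push(new int[] {p + 1, hi});
            }
        }

        private static int partition(int[] a, int lo, int hi) {  // Lomuto partition
            int pivot = a[hi], i = lo;
            for (int j = lo; j < hi; j++) {
                if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
            }
            int t = a[i]; a[i] = a[hi]; a[hi] = t;
            return i;
        }
    }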
Other Notes
Java uses mergesort when sorting objects because it is a stable sort (the order of elements that have the same key is preserved). Quicksort can be stable or unstable, but the stable version is slower than the unstable version.
"Tuned" quicksort just means that some improvements are applied to the basic algorithm. Usually the improvements are to try and avoid worst case time complexity. Some examples of improvements might be to choose the pivot (or multiple pivots) so that there's never only 1 key in a partition, or only make the recursive call when a partition is above a certain minimum size.
It looks like Java only uses merge sort when sorting Objects (the Arrays doc tells you which sorting algorithm is used for which sort method signature), so I don't think it ever really "decides" on its own, but the decision was made in advance. (Also, implementers are free to use another sort, as long as it's stable.)
In Java, Arrays.sort(Object[]) uses merge sort, but all the other overloaded sort methods use insertion sort if the array length is less than 7, and tuned quicksort if the length is 7 or greater.
