I understand that the worst case happens when the pivot is the smallest or the largest element. Then one of the partitions is empty and we repeat the recursion for N-1 elements.
But how is that calculated to be O(N^2)?
I have read a couple of articles and still cannot fully understand it.
Similarly, the best case is when the pivot is the median of the array and the left and right parts are of the same size. But then, how is the value O(NlogN) calculated?
I understand that the worst case happens when the pivot is the smallest or the largest element. Then one of the partitions is empty and we repeat the recursion for N-1 elements.
So, imagine that you repeatedly pick the worst pivot; i.e. in the N-1 case one partition is empty and you recurse with N-2 elements, then N-3, and so on until you get to 1.
The sum of N-1 + N-2 + ... + 1 is (N * (N - 1)) / 2. (Students typically learn this in high-school maths these days ...)
O(N(N-1)/2) is the same as O(N^2). You can deduce this from first principles from the mathematical definition of Big-O notation.
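Spelled out: N-1 + N-2 + ... + 1 = N(N-1)/2 = N^2/2 - N/2, and since N^2/2 - N/2 <= N^2 for all N >= 1, the total number of steps is bounded by a constant times N^2, which is exactly what O(N^2) means. The lower-order -N/2 term and the constant factor 1/2 are what Big-O lets you ignore.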
Similarly, the best case is when the pivot is the median of the array and the left and right parts are of the same size. But then, how is the value O(NlogN) calculated?
That is a bit more complicated.
Think of the problem as a tree:
At the top level, you split the problem into two equal-sized sub problems, and move N objects into their correct partitions.
At the 2nd level, you split the two sub-problems into four sub-sub-problems, and in each of the 2 sub-problems you move N/2 objects into their correct partitions, for a total of N objects moved.
At the bottom level you have N/2 sub-problems of size 2 which you (notionally) split into N problems of size 1, again copying N objects.
Clearly, at each level you move N objects. The height of the tree for a problem of size N is log2N. So ... there are N * log2N object moves; i.e. O(N * log2N).
But log2N is logeN / loge2; that is, logeN multiplied by a constant. (High-school maths, again.)
So O(N * log2N) is the same as O(NlogN).
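As a concrete check, take N = 8: the tree has log2 8 = 3 levels, and each level moves all 8 objects, so there are 8 * 3 = 24 = N * log2N object moves in total.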
A small correction to your statement:
I understand worst case happens when the pivot is the smallest or the largest element.
Actually, the worst case happens when each successive pivot is the smallest or the largest element of the remaining partition.
To better understand the worst case: Think about an already sorted array, which you may be trying to sort.
You select the first element as the first pivot. After comparing the rest of the array, you find that the n-1 other elements are still on the other side (the right side) and the first element remains in the same position, which totally defeats the purpose of partitioning. You keep repeating these steps up to the last element, with the same effect each time, which accounts for (n-1) + (n-2) + (n-3) + ... + 1 comparisons, and that sums up to (n*(n-1))/2 comparisons. So,
O(n*(n-1)/2) = O(n^2) for the worst case.
To overcome this problem, it is recommended to pick each successive pivot randomly.
I will try to add an explanation for the average case as well.
The best case can be derived from the Master theorem. See https://en.wikipedia.org/wiki/Master_theorem, or Cormen, Leiserson, Rivest, and Stein: Introduction to Algorithms, for a proof of the theorem.
One way to think of it is the following.
If quicksort makes a poor choice for a pivot, then the pivot induces an unbalanced partition, with most of the elements being on one side of the pivot (either below or above). In an extreme case, you could have, as you suggest, that all elements are below or above the pivot. In that case we can model the time complexity of quicksort with the recurrence T(n)=T(1)+T(n-1)+O(n). However, T(1)=O(1), and we can write out the recurrence T(n)=O(n)+O(n-1)+...+O(1) = O(n^2). (One has to take care to understand what it means to sum Big-Oh terms.)
On the other hand, if quicksort repeatedly makes good pivot choices, then those pivots induce balanced partitions. In the best case, about half the elements will be below the pivot, and about half the elements above the pivot. This means we recurse on two subarrays of roughly equal sizes. The recurrence then becomes T(n)=T(n/2)+T(n/2)+O(n) = 2T(n/2)+O(n). (Here, too, one must take care about rounding the n/2 if one wants to be formal.) This solves to T(n)=O(n log n).
The intuition for the last paragraph is that we compute the relative position of the pivot in the sorted order without actually sorting the whole array. We can then compute the relative position of the pivot in the below subarray without having to worry about the elements from the above subarray; similarly for the above subarray.
First, recall roughly what the quicksort code looks like.
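A minimal Java sketch, assuming a Lomuto-style partition that picks the last element as the pivot (the method and variable names are illustrative, not from the original post):

    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;              // 0 or 1 elements: already sorted
        int p = partition(a, lo, hi);      // place the pivot at its final index p
        quickSort(a, lo, p - 1);           // sort the part below the pivot
        quickSort(a, p + 1, hi);           // sort the part above the pivot
    }

    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];                 // last element as pivot
        int i = lo;                        // boundary of the "less than pivot" region
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;  // move the pivot into place
        return i;
    }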
In my opinion, you need to understand two cases: the worst case and the best case.
The worst case:
The most unbalanced partition occurs when the pivot divides the list into two sublists of sizes 0 and n−1. The recurrence is T(n)=O(n)+T(0)+T(n-1)=O(n)+T(n-1). Expanding this recurrence gives
T(n)=O(n²).
The best case:
In the most balanced case, each time we perform a partition we divide the list into two nearly equal pieces. As in the worst case, we can write a recurrence: T(n)=O(n)+2T(n/2). By the master theorem (or by expanding the recursion tree), this solves to T(n)=O(n logn).
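To see why, expand the recursion level by level: the top level does about c*n work, the second level does 2 * c*(n/2) = c*n, the third does 4 * c*(n/4) = c*n, and so on for roughly log2 n levels, giving about c*n*log2 n work in total, i.e. O(n log n).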
I am trying to figure out how to solve the subset sum problem with an extra constraint: the subset of the array needs to be contiguous (the indices need to be consecutive). I am trying to solve it using recursion in Java.
I know the solution for the unconstrained problem: each element can either be in the subset (and thus I perform a recursive call with sum = sum - arr[index]) or not be in it (and thus I perform a recursive call with sum = sum).
I am thinking about maybe adding another parameter for knowing whether or not the previous index is part of the subset, but I don't know what to do next.
You are on the right track.
Think of it this way:
For every entry you have to decide: do you want to start a new sum at this point, or skip it and consider the next entry?
a + b + c + d contains the sum of b + c + d. Do you want to recompute the sums?
Maybe a bottom-up approach would be better
The O(n) solution that you asked for:
This solution requires three integer variables: the start and end indices, and the total sum of the span.
Starting from element 0 (or from the end of the list if you prefer), increase the end index until the total sum is greater than or equal to the desired value. If it is equal, you've found a subset sum. If it is greater, move the start index up by one and subtract the value at the previous start index. Then, if the resulting total is still greater than the desired value, move the end index back until the sum is less than the desired value; in the other case (where the sum is less), move the end index forward until the sum is greater than the desired value. If no match is found, repeat.
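A sketch of how that idea can be coded as a standard sliding window in Java, assuming all values are non-negative (the method name and signature are mine):

    // Returns true if some contiguous span of arr sums exactly to target.
    // Assumes all values are non-negative, which the windowing argument relies on.
    static boolean hasContiguousSubsetSum(int[] arr, int target) {
        int start = 0, sum = 0;
        for (int end = 0; end < arr.length; end++) {
            sum += arr[end];                        // extend the span to the right
            while (sum > target && start <= end) {  // too large: shrink from the left
                sum -= arr[start++];
            }
            if (sum == target && start <= end) {    // non-empty span hits the target
                return true;
            }
        }
        return false;
    }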
So, caveats:
Is this "fairly obvious"? Maybe, maybe not. I was making assumptions about order of magnitude similarity when I said both "fairly obvious" and o(n) in my comments
Is this actually o(n)? It depends a lot on how similar (in terms of order of magnitude (digits in the number)) the numbers in the list are. The closer all the numbers are to each other, the fewer steps you'll need to make on the end index to test if a subset exists. On the other hand, if you have a couple of very big numbers (like in the thousands) surrounded by hundreds of pretty small numbers (1's and 2's and 3's) the solution I've presented will get closers to O(n^2)
This solution only works because of your restriction that the subset must be contiguous.
Let's say we have a very large array and we need to find the only different number in it; all the other numbers in the array are the same. Can we find it in O(log n) using divide and conquer, just like mergeSort? Please provide an implementation.
This cannot be done in better time complexity than O(n) unless that array is special. With the constraints you have given, even if you apply an algorithm like divide and conquer you have to visit every array element at least once.
As dividing the array will be O(log n) and comparing 2 elements when the array is reduced to size 2 will be O(1)
This is wrongly put. Dividing the array is not O(log n). The reason why something like a binary search works in O(log n) is that the array is sorted, so at every step you can discard one half of the array even without looking at its elements, thereby halving the size of the original problem.
Intuitively, you can think of it as follows: even if you keep dividing the array into halves, the tree that is formed has n/2 leaf nodes (assuming you compare 2 elements at each leaf). You will have to make n/2 comparisons, which leads to an asymptotic complexity of O(n).
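To make the counting concrete, here is a hedged divide-and-conquer sketch in Java. The helper and its parameters are illustrative; common is assumed to be the value shared by all but one element (for example, determined by inspecting the first three entries). Each leaf still performs a comparison, so the total work is T(n) = 2T(n/2) + O(1) = O(n), not O(log n):

    // Returns the index of the element in a[lo..hi] that differs from 'common',
    // or -1 if every element in that range equals 'common'.
    static int findDifferent(int[] a, int lo, int hi, int common) {
        if (lo > hi) return -1;                          // empty range
        if (lo == hi) return a[lo] != common ? lo : -1;  // leaf: one comparison
        int mid = (lo + hi) / 2;
        int left = findDifferent(a, lo, mid, common);    // search the left half
        if (left != -1) return left;
        return findDifferent(a, mid + 1, hi, common);    // search the right half
    }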
I have N numbers in an ArrayList. To get the indexOf an element, the ArrayList will have to iterate at most N times, so the complexity is O(N). Is that correct?
Yes, the complexity is O(N). From the Java API documentation for ArrayList:
The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
Yes it's O(n) as it needs to iterate through every item in the list in the worst case.
The only way to do better is to have some sort of structure in the list. The most typical example is searching a sorted list using binary search in O(log n) time.
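For example, a small sketch using the standard Collections.binarySearch (which requires the list to be sorted beforehand):

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class BinarySearchDemo {
        public static void main(String[] args) {
            List<Integer> numbers = Arrays.asList(3, 9, 14, 27, 52);     // already sorted
            int viaBinarySearch = Collections.binarySearch(numbers, 27); // O(log n)
            int viaIndexOf = numbers.indexOf(27);                        // O(n) linear scan
            System.out.println(viaBinarySearch + " " + viaIndexOf);      // prints "3 3"
        }
    }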
Yes, that is correct. The order is based on the worst case.
100%, it needs to iterate through the list to find the correct index.
It is true. The best case is 1 step, so O(1); the average case is N/2 steps, so O(N); and the worst case is N steps, so O(N).
In the worst case you find the element at the very last position, which takes N steps, that is, O(N). In the best case the item you are searching for is the very first one, so the complexity is O(1). The average case is the average number of steps. If we do not have further context, this is how one can make the calculation:
avg = (1 + 2 + ... + n) / n = (n * (n + 1) / 2) / n = (n + 1) / 2
As n grows, adding a constant and dividing by a constant do not change the growth rate: (n + 1) / 2 is still proportional to n, so the average case is O(n).
However, if you have a large but finite data set to work with, then you might want to calculate the exact average value as above.
Also, you might have context that could help you make your calculations more accurate.
Example:
Let's consider the case where your array is ordered by usage frequency in descending order. If each call of indexOf counts as a usage, then the most probable item is the first one, then the second, and so on. If you have the exact usage frequency for each item, then you will be able to calculate the expected number of steps.
An ArrayList is backed by an array and adds more features, so the order of complexity for operations on an ArrayList is the same as for an array.
It's generally agreed that the best case for quicksort is O(nlogn), which happens when the array is partitioned into roughly equal halves each time. It's also said that the worst case is on the order of n^2, for example when the array is already sorted.
Can't we modify quicksort by setting a boolean called swap? For example, if there is no swap in position during the first pass, then we can assume that the array is already sorted and therefore not partition the data any further.
I know that the modified bubble sort uses this by checking for swaps, allowing the best case to be O(n) rather than O(n^2). Can this method be applied to quicksort? Why or why not?
There is one mistake with your approach...
For example, suppose we have an array like this:
1 2 4 3 5 6 7 8
Our pivot element is 5. After the first pass there would be no swap (because 4 and 3 are both smaller than the pivot), but the array is NOT sorted. So you still have to keep dividing it, and that leads to n log n.
No, this won't work for quicksort. In bubble sort if you do a pass through the array without making any swaps you know that the entire array is sorted. This is because each element is compared to its neighbor in bubble sort, so you can infer that the entire array is sorted after any pass where no swaps are done.
That isn't the case in quicksort. In quicksort each element is compared to a single pivot element. If you go through an entire pass without moving anything in quicksort it only tells you that the elements are sorted with respect to the pivot (values less than the pivot are to its left, values greater than the pivot are to its right), not to each other.
There is also the problem of getting O(n) behaviour with almost sorted arrays, in addition to the fully sorted input.
You can try harder to make your approach work, but I don't think you can make it breach the O(n log n) boundary. There is a proof that comparison-based sorts cannot be more efficient than O(n log n) in the worst case.
This is a part of the code for the quicksort algorithm, but I really do not know why it uses rand() % n. Please help me, thanks.
Swap(V,0,rand() %n) // move pivot elem to V[0]
It is used to randomize the quicksort in order to achieve an average time complexity of nlgn.
To quote from Wikipedia:
What makes random pivots a good choice? Suppose we sort the list and then divide it into four parts. The two parts in the middle will contain the best pivots; each of them is larger than at least 25% of the elements and smaller than at least 25% of the elements. If we could consistently choose an element from these two middle parts, we would only have to split the list at most 2 log2 n times before reaching lists of size 1, yielding an O(n log n) algorithm.
Quicksort has an average time complexity of O(nlog(n)), but its worst-case complexity is n^2 (for example, when the array is already sorted and the first element is always chosen as the pivot). So, to make the expected cost O(nlog(n)), the pivot is chosen randomly: rand() % n generates a random index between 0 and n-1.
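In Java, the corresponding step might look like this (a sketch; the surrounding partition code is assumed to exist, as in the snippet above):

    import java.util.Random;

    public class RandomPivot {
        private static final Random RNG = new Random();

        // Move a uniformly random element of v[lo..hi] to position lo, so the
        // partition step can use v[lo] as the pivot (mirrors Swap(V, 0, rand() % n)).
        static void movePivotToFront(int[] v, int lo, int hi) {
            int pivotIndex = lo + RNG.nextInt(hi - lo + 1); // random index in [lo, hi]
            int tmp = v[lo];
            v[lo] = v[pivotIndex];
            v[pivotIndex] = tmp;
        }
    }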