about Select algorithm - java

I have read about the selection algorithm and I have a question; maybe it looks silly, but why do we consider the array as groups of 5 elements? Could we use groups of 7 or 3 elements instead? Also, is there any link that would help me understand this better? Thanks.
Also, here is my proof that when we consider the array in groups of 3 elements it is still of order n. Why? Is this correct?
T(n) <= T(n/3) + T(n/3) + Theta(n)
Claim: T(n) <= cn.
Proof: assume T(k) <= ck for all k < n. Then
T(n) <= (cn/3) + (cn/3) + Theta(n)
T(n) <= (2cn/3) + Theta(n)
T(n) <= cn - (cn/3 - Theta(n)), and when c is large enough that cn/3 >= Theta(n), this is at most cn, so the algorithm with this recurrence would have an order of n, too!

A little bit of googling and I found this. There is a very small section on why 5, but it doesn't really answer your question specifically, other than to say that 5 is the smallest odd group size that works (it must be odd so that a group has a single median). There is a mathematical proof that 3 does not work (though I don't fully understand it myself). I think it is basically saying it can be any odd number, 5 or greater, but the smaller the better, I guess because it is quicker to find the median of a smaller group.

I think you made a mistake in T(n). It should be T(n) = T(n/3) + T(2n/3) + O(n).
The T(n/3) is for finding the pivot (the median of the medians). Only half of the n/3 groups have a median smaller than the pivot, and each of those groups has 2 elements smaller than the pivot (its median and the element below it), giving 2 * (1/2 * n/3) = n/3 elements that are guaranteed to be smaller than the pivot. So only 33% are guaranteed to be smaller than the pivot (and likewise only 33% guaranteed to be greater). In the worst case you are therefore left with 66% of the elements for the next iteration, hence T(2n/3).
I can't read your proof well, but with the corrected recurrence it becomes impossible to prove the linear bound. Right?
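To make the grouping concrete, here is a minimal Java sketch of median-of-medians select with the usual groups of 5 (my own illustration, not code from the question; it copies subarrays for clarity rather than speed, and k is 0-based):

import java.util.Arrays;

public class MedianOfMedians {
    // Returns the k-th smallest element of a (k = 0 asks for the minimum).
    static int select(int[] a, int k) {
        if (a.length <= 5) {                       // small base case: just sort
            int[] copy = a.clone();
            Arrays.sort(copy);
            return copy[k];
        }
        // Medians of the groups of 5.
        int[] medians = new int[(a.length + 4) / 5];
        for (int g = 0; g < medians.length; g++) {
            int from = g * 5, to = Math.min(from + 5, a.length);
            int[] group = Arrays.copyOfRange(a, from, to);
            Arrays.sort(group);
            medians[g] = group[group.length / 2];
        }
        // The pivot is the median of the medians: the T(n/5) recursive call.
        int pivot = select(medians, medians.length / 2);

        // Partition by the pivot value; at least ~3n/10 elements land on each side.
        int less = 0, equal = 0;
        for (int x : a) { if (x < pivot) less++; else if (x == pivot) equal++; }
        if (k < less) {
            int[] left = new int[less];
            int i = 0;
            for (int x : a) if (x < pivot) left[i++] = x;
            return select(left, k);                // recurse on at most ~7n/10 elements
        } else if (k < less + equal) {
            return pivot;
        } else {
            int[] right = new int[a.length - less - equal];
            int i = 0;
            for (int x : a) if (x > pivot) right[i++] = x;
            return select(right, k - less - equal);
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 1, 8, 2, 7, 3, 6, 4, 5, 0};
        System.out.println(select(a, 4));          // prints 4, the 5th smallest
    }
}

With groups of 5 the two recursive calls cost T(n/5) + T(7n/10), and since 1/5 + 7/10 < 1 the work shrinks geometrically, giving O(n). With groups of 3 you get T(n/3) + T(2n/3), whose fractions sum to exactly 1, and the linear bound is lost.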


Subset sum problem with continuous subset using recursion

I am trying to figure out how to solve the subset sum problem with an extra constraint: the subset of the array needs to be continuous (the indices need to be consecutive). I am trying to solve it using recursion in Java.
I know the solution for the unconstrained problem: each element either is in the subset (so I make a recursive call with sum = sum - arr[index]) or is not (so I make a recursive call with sum unchanged).
I am thinking about adding another parameter to track whether or not the previous index is part of the subset, but I don't know what to do next.
You are on the right track.
Think of it this way:
For every entry you have to decide: do you want to start a new sum at this point, or skip it and reconsider at the next entry?
Note that a + b + c + d contains the sum of b + c + d. Do you want to recompute those sums?
Maybe a bottom-up approach would be better.
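A hypothetical recursive sketch of that idea in Java (the names are mine, not the OP's): at each index we either start a contiguous run there, or skip it and try starting later.

class ContiguousSubsetSum {
    // Is there a contiguous run, starting at index i or later, that sums to target?
    static boolean anyRun(int[] arr, int i, int target) {
        if (i == arr.length) return false;
        return runFrom(arr, i, target)             // start a run at index i...
            || anyRun(arr, i + 1, target);         // ...or skip i and start later
    }

    // Extend a run: take arr[i], stop as soon as the remaining amount hits 0.
    static boolean runFrom(int[] arr, int i, int remaining) {
        if (i == arr.length) return false;
        remaining -= arr[i];
        if (remaining == 0) return true;           // the run ending at i sums to the target
        return runFrom(arr, i + 1, remaining);     // keep the run contiguous
    }

    public static void main(String[] args) {
        System.out.println(anyRun(new int[]{1, 3, 2, 5, 1}, 0, 10));  // true: 3+2+5
        System.out.println(anyRun(new int[]{1, 3, 2, 5, 1}, 0, 100)); // false
    }
}

This is O(n^2): each start index is tried once, and the sum b + c + d is recomputed inside a + b + c + d, which is exactly the waste a bottom-up or window approach avoids.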
The O(n) solution that you asked for:
This solution requires three variables: the start index, the end index, and the running sum of the span.
Starting from element 0 (or from the end of the list if you prefer), increase the end index until the running sum is greater than or equal to the desired value. If it is equal, you've found a subset sum. If it is greater, move the start index up one and subtract the value at the previous start index. Then, if the resulting total is still greater than the desired value, move the end index back until the sum is less than the desired value; in the other case (where the sum is now less), move the end index forward until the sum is greater than or equal to the desired value. If no match is found, repeat.
So, caveats:
Is this "fairly obvious"? Maybe, maybe not. I was making assumptions about order of magnitude similarity when I said both "fairly obvious" and o(n) in my comments
Is this actually o(n)? It depends a lot on how similar (in terms of order of magnitude (digits in the number)) the numbers in the list are. The closer all the numbers are to each other, the fewer steps you'll need to make on the end index to test if a subset exists. On the other hand, if you have a couple of very big numbers (like in the thousands) surrounded by hundreds of pretty small numbers (1's and 2's and 3's) the solution I've presented will get closers to O(n^2)
This solution only works based on your restriction that the subset values are continuous
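A minimal Java sketch of that window idea, assuming all values are non-negative (with negative values the running sum is not monotone and the window argument breaks down):

class WindowSubsetSum {
    // True if some contiguous span of arr sums exactly to target (arr non-negative).
    static boolean hasSpanSum(int[] arr, int target) {
        int start = 0, sum = 0;
        for (int end = 0; end < arr.length; end++) {
            sum += arr[end];                        // grow the window to the right
            while (sum > target && start < end) {
                sum -= arr[start++];                // shrink from the left while too big
            }
            if (sum == target) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasSpanSum(new int[]{1, 3, 2, 5, 1}, 10));  // true: 3+2+5
        System.out.println(hasSpanSum(new int[]{1, 3, 2, 5, 1}, 100)); // false
    }
}

In this compact variant each index enters and leaves the window at most once, so the scan itself is O(n) for non-negative inputs.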

Trying to understand complexity of quick sort

I understand the worst case happens when the pivot is the smallest or the largest element. Then one of the partitions is empty and we repeat the recursion for N-1 elements.
But how is that calculated to be O(N^2)?
I have read a couple of articles and still can't understand it fully.
Similarly, the best case is when the pivot is the median of the array, so that the left and right parts are of the same size. But then how is the value O(N logN) calculated?
I understand the worst case happens when the pivot is the smallest or the largest element. Then one of the partitions is empty and we repeat the recursion for N-1 elements.
So, imagine that you repeatedly pick the worst pivot; i.e. in the N-1 case one partition is empty and you recurse with N-2 elements, then N-3, and so on until you get to 1.
The sum of N-1 + N-2 + ... + 1 is (N * (N - 1)) / 2. (Students typically learn this in high-school maths these days ...)
O(N(N-1)/2) is the same as O(N^2). You can deduce this from first principles from the mathematical definition of Big-O notation.
Similarly, the best case is when the pivot is the median of the array, so that the left and right parts are of the same size. But then how is the value O(N logN) calculated?
That is a bit more complicated.
Think of the problem as a tree:
At the top level, you split the problem into two equal-sized sub problems, and move N objects into their correct partitions.
At the 2nd level, you split the two sub-problems into four sub-sub-problems, and in two of them you move N/2 objects each into their correct partitions, for a total of N objects moved.
At the bottom level you have N/2 sub-problems of size 2 which you (notionally) split into N problems of size 1, again copying N objects.
Clearly, at each level you move N objects. The height of the tree for a problem of size N is log2N. So ... there are N * log2N object moves; i.e. O(N * log2N).
But log2N is logeN / loge2; i.e. the two differ only by a constant factor. (High-school maths, again.)
So O(N * log2N) is O(N * logN).
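For completeness, the same tree argument written as a recurrence expansion (a sketch, with c standing for the constant cost per element at each level):

T(N) = 2T(N/2) + cN
     = 4T(N/4) + 2cN
     = 8T(N/8) + 3cN
     = ...
     = 2^k * T(N/2^k) + k*cN

The expansion stops when N/2^k = 1, i.e. k = log2N, giving T(N) = N*T(1) + cN*log2N = O(N logN).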
Little correction to your statement:
I understand worst case happens when the pivot is the smallest or the largest element.
Actually, the worst case happens when each successive pivot is the smallest or the largest element of the remaining partitioned array.
To better understand the worst case, think about an already sorted array which you are trying to sort.
You select the first element as the first pivot. After comparing the rest of the array, you find that the n-1 other elements are still on the other end (the right side) and the first element remains in the same position, which totally defeats the purpose of partitioning. You keep repeating these steps down to the last element, with the same effect each time, which accounts for (n-1) + (n-2) + (n-3) + ... + 1 comparisons, and that sums up to n(n-1)/2 comparisons. So,
O(n*(n-1)/2) = O(n^2) for worst case.
To overcome this problem, it is always recommended to pick each successive pivot randomly.
I would try to add an explanation for the average case as well.
The best case can be derived from the master theorem. See https://en.wikipedia.org/wiki/Master_theorem for instance, or Cormen, Leiserson, Rivest, and Stein: Introduction to Algorithms for a proof of the theorem.
One way to think of it is the following.
If quicksort makes a poor choice for a pivot, then the pivot induces an unbalanced partition, with most of the elements being on one side of the pivot (either below or above). In an extreme case, you could have, as you suggest, that all elements are below or above the pivot. In that case we can model the time complexity of quicksort with the recurrence T(n)=T(1)+T(n-1)+O(n). However, T(1)=O(1), and we can write out the recurrence T(n)=O(n)+O(n-1)+...+O(1) = O(n^2). (One has to take care to understand what it means to sum Big-Oh terms.)
On the other hand, if quicksort repeatedly makes good pivot choices, then those pivots induce balanced partitions. In the best case, about half the elements will be below the pivot, and about half the elements above the pivot. This means we recurse on two subarrays of roughly equal sizes. The recurrence then becomes T(n)=T(n/2)+T(n/2)+O(n) = 2T(n/2)+O(n). (Here, too, one must take care about rounding the n/2 if one wants to be formal.) This solves to T(n)=O(n log n).
The intuition for the last paragraph is that we compute the relative position of the pivot in the sorted order without actually sorting the whole array. We can then compute the relative position of the pivot in the below subarray without having to worry about the elements from the above subarray; similarly for the above subarray.
Firstly, here is a sketch of the algorithm:
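(A minimal Java sketch, assuming the common Lomuto partition scheme with the last element as the pivot:)

static void quicksort(int[] a, int lo, int hi) {
    if (lo >= hi) return;                  // 0 or 1 elements: already sorted
    int p = partition(a, lo, hi);          // the pivot ends up at its final index p
    quicksort(a, lo, p - 1);               // sort the elements below the pivot
    quicksort(a, p + 1, hi);               // sort the elements above the pivot
}

static int partition(int[] a, int lo, int hi) {
    int pivot = a[hi];                     // last element as the pivot
    int i = lo;                            // boundary of the "less than pivot" region
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot) { swap(a, i, j); i++; }
    swap(a, i, hi);                        // move the pivot into its final position
    return i;
}

static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }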
In my opinion, you need to understand two cases: the worst case and the best case.
The worst case:
The most unbalanced partition occurs when the pivot divides the list into two sublists of sizes 0 and n-1. The recurrence is then T(n) = O(n) + T(0) + T(n-1) = O(n) + T(n-1), which solves to
T(n) = O(n²).
The best case:
In the most balanced case, each time we perform a partition we divide the list into two nearly equal pieces. This time the recurrence relation is T(n) = O(n) + 2T(n/2), which solves to T(n) = O(n logn).

8 puzzle: Solvability and shortest solution

I have built an 8 puzzle solver using Breadth First Search. I would now like to modify the code to use heuristics. I would be grateful if someone could answer the following two questions:
Solvability
How do we decide whether an 8 puzzle is solvable? (given a starting state and a goal state)
This is what Wikipedia says:
The invariant is the parity of the permutation of all 16 squares plus the parity of the taxicab distance (number of rows plus number of columns) of the empty square from the lower right corner.
Unfortunately, I couldn't understand what that means; it was a bit complicated. Can someone explain it in simpler language?
Shortest Solution
Given a heuristic, is it guaranteed to give the shortest solution using the A* algorithm? To be more specific, will the first node in the open list always have a depth (i.e. the number of moves made so far) that is the minimum of the depths of all the nodes present in the open list?
Should the heuristic satisfy some condition for the above statement to be true?
Edit : How is it that an admissible heuristic will always provide the optimal solution? And how do we test whether a heuristic is admissible?
I would be using the heuristics listed here
Manhattan Distance
Linear Conflict
Pattern Database
Misplaced Tiles
Nilsson's Sequence Score
N-MaxSwap X-Y
Tiles out of row and column
For clarification, from Eyal Schneider:
I'll refer only to the solvability issue. Some background in permutations is needed.
A permutation is a reordering of an ordered set. For example, 2134 is a reordering of the list 1234, where 1 and 2 swap places. A permutation has a parity property, which refers to the parity of its number of inversions. For example, in the following permutation exactly 3 inversions exist (the position pairs (2,3), (2,4) and (3,4)):
1234
1432
That means that the permutation has an odd parity. The following permutation has an even parity (the position pairs (1,2) and (3,4)):
1234
2143
Naturally, the identity permutation (which keeps the items order) has an even parity.
Any state in the 15 puzzle (or 8 puzzle) can be regarded as a permutation of the final state, if we look at it as a concatenation of the rows, starting from the first row. Note that every legal move changes the parity of the permutation (a move swaps two elements, and the change in the number of inversions involving the items in between is even, so the total inversion count changes by an odd number). Therefore, if you know that the empty square has to travel an even number of steps to reach its final position, then the permutation must also be even; otherwise you would end up with an odd permutation of the final state, which is necessarily different from it. The same goes for an odd number of steps of the empty square.
According to the Wikipedia link you provided, this criterion is both necessary and sufficient for a given puzzle to be solvable.
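Putting that into code: a hypothetical Java check for the 8-puzzle (3x3, 0 for the blank, standard goal [1..8, 0]) that applies exactly this invariant, i.e. the parity of the permutation of all 9 squares plus the parity of the blank's taxicab distance from the lower right corner:

public class PuzzleSolvability {
    // board: the 9 cells in row-major order, with 0 for the blank.
    static boolean isSolvable(int[] board) {
        int n = 3;
        // Map each square to its position in the goal state [1..8, 0].
        int[] goalIndex = new int[board.length];
        for (int i = 0; i < board.length; i++)
            goalIndex[i] = (board[i] == 0) ? board.length - 1 : board[i] - 1;

        // Parity of the permutation = parity of the inversion count.
        int inversions = 0;
        for (int i = 0; i < goalIndex.length; i++)
            for (int j = i + 1; j < goalIndex.length; j++)
                if (goalIndex[i] > goalIndex[j]) inversions++;

        // Taxicab distance of the blank from the lower right corner.
        int blank = 0;
        while (board[blank] != 0) blank++;
        int taxicab = (n - 1 - blank / n) + (n - 1 - blank % n);

        // The goal itself has 0 inversions and distance 0 (both even),
        // so a state is solvable iff its combined parity is also even.
        return (inversions + taxicab) % 2 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isSolvable(new int[]{1, 2, 3, 4, 5, 6, 7, 0, 8})); // true: one move from the goal
        System.out.println(isSolvable(new int[]{2, 1, 3, 4, 5, 6, 7, 8, 0})); // false: two tiles swapped
    }
}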
The A* algorithm is guaranteed to find the shortest solution (or one of them, if several are equally short), provided your heuristic always underestimates the real cost (in your case, the real number of moves needed to reach the solution).
But on the fly I cannot come up with a good heuristic for your problem; that needs some thinking.
The real art of using A* is to find a heuristic that always underestimates the real cost, but by as little as possible, to speed up the search.
First ideas for such a heuristic:
A quite bad but valid heuristic that popped into my mind is the Manhattan distance of the empty field to its final destination.
The sum of the Manhattan distances of each field to its final destination, divided by the maximal number of fields that can change position within one move. (I think this is quite a good heuristic; see the sketch below.)
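A sketch of that second idea in Java (in the 8-puzzle only one tile moves per step, so the divisor is 1 and this reduces to the classic Manhattan-distance heuristic; the goal layout is a parameter, and the helper names are mine):

// h(board) = sum over tiles of |row - goalRow| + |col - goalCol|; the blank is skipped.
// Admissible: each move changes exactly one tile's distance by exactly 1,
// so h never overestimates the number of remaining moves.
static int manhattan(int[] board, int[] goal) {
    int n = 3, h = 0;
    for (int pos = 0; pos < board.length; pos++) {
        int tile = board[pos];
        if (tile == 0) continue;                   // don't count the blank
        int goalPos = indexOf(goal, tile);
        h += Math.abs(pos / n - goalPos / n)       // row distance
           + Math.abs(pos % n - goalPos % n);      // column distance
    }
    return h;
}

static int indexOf(int[] a, int value) {
    for (int i = 0; i < a.length; i++)
        if (a[i] == value) return i;
    return -1;
}

Admissibility is also what the Edit in the question asks about: with a heuristic that never overestimates, A* cannot return a longer-than-optimal solution, because a cheaper path to the goal would still be waiting in the open list with a smaller f-value.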
For anyone coming along, I will attempt to explain how the OP got the value pairs, as well as how he determines the highlighted ones (i.e. inversions), as it took me several hours to figure out. First, the pairs.
First take the goal state and imagine it as a 1D array (A for example):
[1,2,3,8,0,4,7,5]. Each value in that array gets its own column in the table (going all the way down); that value is the first element of each pair.
Then move over one value to the right in the array (i + 1) and go all the way down again for the second element of each pair. For example (state A): the first column's second values run down [2,3,8,0,4,7,5]; the second column's run down [3,8,0,4,7,5]; etc.
Okay, now for the inversions. For each pair, find the INDEX location of both values in the start state. If the left value's INDEX > the right value's INDEX, then it is an inversion (highlighted). The first four pairs of state A are: (1,2), (1,3), (1,8), (1,0):
1 is at index 3
2 is at index 0
3 > 0, so inversion.
1 is at index 3
3 is at index 2
3 > 2, so inversion.
1 is at index 3
8 is at index 1
3 > 1, so inversion.
1 is at index 3
0 is at index 7
3 < 7, so no inversion.
Do this for each pair and tally up the total inversions.
If both are even or both are odd (the Manhattan distance of the blank spot, and the total inversions),
then it's solvable. Hope this helps!

What is the most inefficient sorting routine?

For an array of integers, what is the least efficient way of sorting the array? The function should make progress in each step (i.e. no infinite loop). What is the runtime of that algorithm?
There can be no least efficient algorithm for anything. This can easily be proved by contradiction so long as you accept that, starting from any algorithm, another equivalent but less efficient algorithm can be constructed.
Bogosort has an average runtime of O(n*n!). Ouch.
Stupid sort is surely among the worst algorithms. It's not exactly an infinite loop, but this approach has a worst case of O(inf) and an average of O(n * n!).
You can do it in O(n!*n) by generating all unique permutations and then checking whether each one is sorted.
The least efficient algorithm I can think of with a finite upper bound on its runtime is permutation sort. The idea is to generate every permutation of the input and check whether it is sorted.
The upper bound is O(n * n!) (there are n! permutations and each check costs O(n)); the lower bound is O(n), when the array is already sorted.
Iterate over all finite sequences of integers using diagonalization.
For each sequence, test whether it is sorted.
If so, test whether its elements match the elements in your array.
This will have some upper bound (first guess: O(n^(n*m))?).
Strange question; normally we go for the fastest.
Find the highest element and move it to another list. Repeat the previous step until the original list has only one element left. You are guaranteed O(N^2).
Bogosort is the worst sorting algorithm that uses shuffling. But looked at another way, it does have a (small) probability of sorting the array in one step :)
"Worstsort" has a complexity of where factorial of n iterated m times. The paper by Miguel A. Lerma can be found here.
Bogobogosort. It's like bogosort in that it shuffles, but it creates auxiliary arrays: the first is the same array, and each further array is one element smaller than the previous one (they can be removed as well).
Its average complexity is O(N superfactorial * n). Its best case is O(N^2). Just like bogosort, its worst case is O(inf).
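For reference, a minimal Java sketch of bogosort, the shuffle-until-sorted approach several answers mention (expected O(n*n!) runtime, unbounded worst case):

import java.util.Random;

class Bogosort {
    static final Random RNG = new Random();

    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++)
            if (a[i - 1] > a[i]) return false;
        return true;
    }

    static void shuffle(int[] a) {                 // Fisher-Yates shuffle
        for (int i = a.length - 1; i > 0; i--) {
            int j = RNG.nextInt(i + 1);
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }

    static void bogosort(int[] a) {
        while (!isSorted(a)) shuffle(a);           // keep shuffling until sorted
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 2};
        bogosort(a);
        System.out.println(java.util.Arrays.toString(a)); // [1, 2, 3]
    }
}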

about the usage of modulus operator

This is a part of the code for the Quick Sort algorithm, but I really don't know why it uses rand() % n. Please help me, thanks:
Swap(V, 0, rand() % n)  // move pivot elem to V[0]
It is used to randomize Quick Sort so that it achieves an average time complexity of O(n lg n).
To quote from Wikipedia:
What makes random pivots a good choice?
Suppose we sort the list and then divide it into four parts. The two parts in the middle will contain the best pivots; each of them is larger than at least 25% of the elements and smaller than at least 25% of the elements. If we could consistently choose an element from these two middle parts, we would only have to split the list at most 2 log2(n) times before reaching lists of size 1, yielding an O(n log n) algorithm.
Quick sort has an average time complexity of O(n log(n)), but its worst-case complexity is O(n^2) (for example, when the array is already sorted and the first element is always chosen as the pivot). To keep the expected time at O(n log(n)), the pivot is chosen randomly; rand() % n generates a random index between 0 and n-1.
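A sketch of how that line fits into the whole sort, translated to Java (java.util.Random stands in for rand(); the partition scheme is my assumption, since only the Swap line is shown in the question):

import java.util.Random;

class RandomizedQuicksort {
    static final Random RNG = new Random();

    static void quicksort(int[] v, int lo, int hi) {
        if (lo >= hi) return;
        int n = hi - lo + 1;
        swap(v, lo, lo + RNG.nextInt(n));   // the rand() % n step: a random element becomes the pivot at v[lo]
        int p = partition(v, lo, hi);
        quicksort(v, lo, p - 1);
        quicksort(v, p + 1, hi);
    }

    static int partition(int[] v, int lo, int hi) {
        int pivot = v[lo], i = lo;          // the pivot sits at v[lo], as in the question
        for (int j = lo + 1; j <= hi; j++)
            if (v[j] < pivot) swap(v, ++i, j);
        swap(v, lo, i);                     // move the pivot between the two halves
        return i;
    }

    static void swap(int[] v, int i, int j) { int t = v[i]; v[i] = v[j]; v[j] = t; }
}

Because the pivot index is uniform over the current range, no fixed input (such as an already sorted array) can reliably force the O(n^2) behaviour; it now occurs only with vanishing probability.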
