I have a program that searches a 2D array using binary search. In this case I am using the matrix below and searching for the integers 4, 12, 110, 5, 111. The program finds all of them except 110 and 111. Why is this?
{1,3,7,8,8,9,12},
{2,4,8,9,10,30,38},
{4,5,10,20,29,50,60},
{8,10,11,30,50,60,61},
{11,12,40,80,90,100,111},
{13,15,50,100,110,112,120},
{22,27,61,112,119,138,153},
public static boolean searchMatrix(int[][] matrix, int p, int n) {
    int low = 0, high = n - 1;
    while (low < high) {
        int mid = (low + high) / 2;
        if (p == matrix[mid][0]) return true;
        else if (p < matrix[mid][0]) high = mid - 1;
        else if (p < matrix[mid + 1][0]) { low = mid; break; }
        else low = mid + 1;
    }
    int row = low;
    low = 0; high = matrix[row].length - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        if (p == matrix[row][mid]) return true;
        else if (p < matrix[row][mid]) high = mid - 1;
        else low = mid + 1;
    }
    return false;
}
I would rather say it's more or less a surprise that your algorithm finds 4, 5, and 12 in the first place. The reason is that 4 occurs in the first position of a row, and 5 and 12 satisfy the condition that they are less than the first element of the next row; only due to that fact are they discovered in the second half of your algorithm. The algorithm is a bit hard to read and I did not evaluate the +/-1 magic, but it seems the algorithm expects 110 and 111 to occur in the last row (since both are greater than 22), where they are not.
If I understand you correctly, your approach is flawed in that it is actually impossible, by looking at a single number, to tell which row it will occur in, which is what your first loop tries to achieve. So any two-phase algorithm that first picks a row and then searches within that row must fail.
With the few constraints you have on your matrix (each row and each column is sorted), it does not seem like binary search will work at all. Even if your bounds low and high were 2D points, it would not help much. Consider any element of the matrix that is greater than your search value: all you can say is that your search value is not below and to the right of that element (what you were hoping to conclude is that it is above and to the left, but that is not necessarily true; it can be above and to the right, or below and to the left), so you are cutting off too small a part of the search space.
Your issue is that you are making the false assumption that you can first lock down the row of your search value, and then easily do binary search on that row. That's not the case at all.
For 110 and 111, the first element of each row is always less than your search value, and your algorithm comes to the false conclusion that this means your row must be the one with index 6 after the first loop. This is simply not true.
The reason it works for the small numbers is that your algorithm is just lucky enough to lock down the right row in the first loop...
One correct algorithm to quickly search a 2D matrix in which every row and every column is sorted in ascending order is as follows:
1) Start with the top-right element e.
2) Loop: compare e with the search value x:
i) if they are equal, return its position;
ii) if e < x, move down (if that goes out of bounds of the matrix, break and return false);
iii) if e > x, move left (if that goes out of bounds of the matrix, break and return false).
3) Repeat i), ii) and iii) until you find the element or return false.
I found this algorithm here: http://www.geeksforgeeks.org/search-in-row-wise-and-column-wise-sorted-matrix/
It's O(n) for an n x n matrix.
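For reference, here is a minimal Java sketch of that staircase search (the method name and signature are mine, not from the linked article):

// Rows and columns are both sorted ascending. Start at the top-right
// corner: moving down discards a too-small row entry, moving left
// discards a too-large column entry. O(rows + cols) steps in total.
public static boolean staircaseSearch(int[][] matrix, int x) {
    int row = 0, col = matrix[0].length - 1;
    while (row < matrix.length && col >= 0) {
        int e = matrix[row][col];
        if (e == x) return true;  // found it
        else if (e < x) row++;    // everything to the left in this row is smaller still
        else col--;               // everything below in this column is bigger still
    }
    return false;                 // walked off the matrix: x is not present
}

Tracing it by hand on the matrix in the question, it reaches 111 at the end of row 4 and 110 in row 5, so both of the problematic values are found.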
I'm working on a problem right now where we are provided with a 1D array of values, and must find the path from the first index to the last index with the smallest sum. The only restrictions are that if you go forward, the distance you go must be 1 more than the last "jump", while if you go backwards, you must go the same distance as the last "jump". For instance, given the array int[] values = new int[]{4, 10, 30, 1, 6}, you need to find the path from position 0 (4) to position 4 (6) that sums to the smallest amount. The starting index is not counted; thus, if I go from values[0] to values[1] (which is the only possible starting move), my running total at that point would be 10. From there, I can either "jump" back the same distance (to values[0]), or "jump" one distance longer than my last jump, which would be 1+1=2, so from values[1] to values[3].
I'm really new to dynamic programming and attempted a solution that went something like this:
public static int smallestCalc(int[] values, int prevJump, int pos, int runTot) {
    while (pos != values.length) {
        int forwards = 600;
        int backwards = 600;
        try {
            backwards = values[pos - prevJump];
        } catch (Exception ignore) {}
        try {
            forwards = values[pos + prevJump + 1];
        } catch (Exception ignore) {}
        int min = Math.min(forwards, backwards);
        if (min == backwards) {
            pos -= prevJump;
        } else {
            pos += prevJump + 1;
            prevJump++;
        }
        runTot += min;
        smallestCalc(values, prevJump, pos, runTot);
    }
    return runTot;
}
However, I recognize that I'm not actually making use of a dynamic programming table here, but I'm not exactly sure what I would need to store in one to "remember" across calculations, or how I could even utilize it in calculations. From what I see, it appears I basically have to make a recursive function that evaluates all possible jump distances from an index, store them, and then traverse the DP table to find the smallest amount? Should I be starting from the last index or the first index of the array to limit the possible moves? I watched this video here, and understood the premise, but it seems much more applicable to his 2D array than anything I have. Any amount of guidance here would be greatly appreciated.
In my opinion, this is a 2D DP problem.
dp(i, j) represents the minimum sum required to reach the last index from index i, when the minimum allowed jump has size j.
Let's say you are at index i. Then you can either go back to index i-j+1 or forward to index i+j.
So,
int recur(int values[], int i, int j){
    // base case; here n is the size of the values array
    if(i == n-1)
        return 0;
    if(dp[i][j] != -1){
        /* here -1 marks a never-calculated state of dp.
           If the values[] array can also contain negative values, you need to
           change it to something appropriate. */
        return dp[i][j];
    }
    int a = INT_MAX;
    int b = a;
    if(i > 0 && (i-j+1) >= 0)
        a = values[i-j+1] + recur(values, i-j+1, j);
    if(i+j < n)
        b = values[i+j] + recur(values, i+j, j+1);
    return dp[i][j] = min(a, b);
}
Time and space complexity O(n * n).
Edit:
Initial function call is recur(values, 0, 1).
I know that the tag of this question is Java, but I do competitive programming in C++ only. Here I have full working code in C++ if you want it.
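Since the question is tagged Java, here is a sketch of the same recurrence in Java (class, method and field names are mine; dp is sized n x (n+2) on the assumption that the jump size j can never exceed n):

import java.util.Arrays;

public class MinJumpSum {
    static int[] values;
    static int n;
    static long[][] dp; // dp[i][j] == -1 marks an uncomputed state

    // minimum sum to reach the last index from index i,
    // when the next forward jump must have size j
    static long recur(int i, int j) {
        if (i == n - 1) return 0;                // base case: already at the end
        if (dp[i][j] != -1) return dp[i][j];     // reuse a memoized state
        long back = Integer.MAX_VALUE, fwd = Integer.MAX_VALUE;
        // jump back by the previous jump's distance (j - 1); j stays the same
        if (i > 0 && i - j + 1 >= 0)
            back = values[i - j + 1] + recur(i - j + 1, j);
        // jump forward by j; the next forward jump must then have size j + 1
        if (i + j < n)
            fwd = values[i + j] + recur(i + j, j + 1);
        return dp[i][j] = Math.min(back, fwd);
    }

    static long solve(int[] vals) {
        values = vals;
        n = vals.length;
        dp = new long[n][n + 2];
        for (long[] row : dp) Arrays.fill(row, -1);
        return recur(0, 1);                      // the first move is a forward jump of size 1
    }

    public static void main(String[] args) {
        // 0 -> 1 (10), 1 -> 3 (1), 3 -> 1 (10), 1 -> 4 (6): prints 27,
        // if I have applied the question's jump rules correctly
        System.out.println(solve(new int[]{4, 10, 30, 1, 6}));
    }
}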
The description of the problem and its solution(s) can be found here:
https://www.geeksforgeeks.org/count-the-number-of-ways-to-divide-n-in-k-groups-incrementally/
Basically the problem is given N people, how many ways can you divide them into K groups, such that each group is greater than or equal in number of people to the one that came before it?
The solution is to recurse through every possibility, and its complexity can be cut down from O(N^K) to O(N^2 * K) through dynamic programming.
I understand the complexity of the old recursive solution, but have trouble understanding why the dynamic programming solution has O(N^2 * K) complexity. How does one come to this conclusion about the dynamic programming solution's time complexity? Any help would be appreciated!
First of all, big O notation gives us an idea about the relation between two functions, t(n)/i(n), when n -> infinity. To be more specific, it's an upper bound for that relation, which means f(n) >= t(n)/i(n). Here t(n) stands for the speed of growth of the time spent on execution, and i(n) describes the speed of growth of the input. In function space (where we work with functions rather than with numbers, and treat functions almost like numbers: we can divide or compare them, for example) the relation between two elements is also a function. Hence, t(n)/i(n) is a function.
Secondly, there are two methods of determining bounds for that relation.
The scientific, observational approach involves the following steps. Let's see how much time it takes to execute an algorithm with 10 pieces of input. Then let's increase the input to 100 pieces, then to 1000, and so on. The speed of growth of the input i(n) is exponential (10^1, 10^2, 10^3, ...). Suppose we get an exponential speed of growth of time as well (10^1 sec, 10^2 sec, 10^3 sec, ... respectively).
That means t(n)/i(n) = exp(n)/exp(n) = 1 as n -> infinity (for the sake of scientific purity, we can divide and compare functions only when n -> infinity; that doesn't affect the practicality of the method, though). We can say, at least (remember it's an upper bound), that the execution time of our algorithm doesn't grow faster than its input does. We might instead have got, say, a squared exponential speed of growth of time. In that case t(n)/i(n) = exp^2(n)/exp(n) = a^(2n)/a^n = exp(n), a > 1, n -> infinity, which means our time complexity is O(exp(n)); big O notation only reminds us that it's not a tight bound. It's also worth pointing out that it doesn't matter which speed of growth of input we choose: if we had increased our input linearly instead, then t(n)/i(n) = exp(n)*n/n = exp(n) would express the same thing. What matters here is the quotient.
The second approach is theoretical and mostly used in the analysis of relatively obvious cases. Say, we have a piece of code from the example:
// DP table
static int [][][] dp = new int[500][500][500];

// Function to count the number
// of ways to divide the number N
// into groups such that each group
// has at least as many elements as the previous
static int calculate(int pos, int prev, int left, int k)
{
    // Base case
    if (pos == k)
    {
        if (left == 0)
            return 1;
        else
            return 0;
    }

    // if N was divided completely
    // into fewer than k groups
    if (left == 0)
        return 0;

    // If the subproblem has been
    // solved, use the value
    if (dp[pos][prev][left] != -1)
        return dp[pos][prev][left];

    int answer = 0;

    // put all possible values
    // greater than or equal to prev
    for (int i = prev; i <= left; i++)
    {
        answer += calculate(pos + 1, i, left - i, k);
    }
    return dp[pos][prev][left] = answer;
}

// Function to count the number of
// ways to divide the number N into groups
static int countWaystoDivide(int n, int k)
{
    // Initialize DP table with -1
    for (int i = 0; i < 500; i++)
    {
        for (int j = 0; j < 500; j++)
        {
            for (int l = 0; l < 500; l++)
                dp[i][j][l] = -1;
        }
    }
    return calculate(0, 1, n, k);
}
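As a quick sanity check of the code above, the example from the linked GeeksforGeeks article (so treat the expected value as coming from there) can be run like this; note that the 500*500*500 table costs roughly 500 MB of heap, so either run with a large -Xmx or shrink the table to the actual input bounds:

public static void main(String[] args)
{
    // 8 people into 4 non-decreasing groups:
    // {1,1,1,5}, {1,1,2,4}, {1,1,3,3}, {1,2,2,3}, {2,2,2,2}
    System.out.println(countWaystoDivide(8, 4)); // prints 5
}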
The first thing to notice here is the 3-D array dp. It gives us a hint of the time complexity of a DP algorithm, because usually we traverse it once. Then we are concerned with the size of the array. It's allocated with size 500*500*500, which doesn't tell us much, because 500 is a number, not a function, and strictly speaking it doesn't depend on the input variables; it's done for the sake of simplicity. Effectively, dp has size k*n*n, under the assumption that k <= 500 and n <= 500.
Let's prove it. The method static int calculate(int pos, int prev, int left, int k) has three actual variables, pos, prev and left, while k remains constant. The range of pos is 0 to k: it starts from 0 in return calculate(0, 1, n, k); and the base case is if (pos == k). The range of prev is 1 to left: it starts from 1 and iterates up to left in for (int i = prev; i <= left; i++). Finally, the range of left is n down to 0: it starts from n in return calculate(0, 1, n, k); and is reduced by i, which ranges from prev to left, in each recursive call. To recap, the number of possible combinations of pos, prev and left is simply their product, k*n*n.
The second thing is to prove that each range of pos, prev and left is traversed only once. From the code, it can be determined by analysing this block:
for (int i = prev; i <= left; i++)
{
answer += calculate(pos + 1, i,
left - i, k);
}
All three variables get changed only here. pos grows from 0, by adding 1 on each step. For each particular value of pos, prev changes by adding 1, from prev up to left; for each particular combination of values of pos and prev, left changes by subtracting i, which ranges from prev to left.
The idea behind this approach is that once we iterate over an input variable by some rule, we get a corresponding time complexity. We could iterate over a variable by halving the remaining range on each step, for example; in that case we would get logarithmic complexity. Or we could step on every element of the input; then we would get linear complexity.
In other words, we assume, from common sense, a minimum time complexity of t(n)/i(n) = 1 for every algorithm, meaning that t(n) and i(n) grow equally fast; that corresponds to doing nothing with the input. Once we do something with the input, t(n) becomes f(n) times bigger than i(n). By the logic shown in the previous lines, we need to estimate f(n).
I'm trying to understand the solution for Codility's NailingPlanks.
Link for the Problem:
https://app.codility.com/programmers/lessons/14-binary_search_algorithm/nailing_planks/
You are given two non-empty arrays A and B consisting of N integers.
These arrays represent N planks. More precisely, A[K] is the start and
B[K] the end of the K−th plank.
Next, you are given a non-empty array C consisting of M integers. This
array represents M nails. More precisely, C[I] is the position where
you can hammer in the I−th nail.
We say that a plank (A[K], B[K]) is nailed if there exists a nail C[I]
such that A[K] ≤ C[I] ≤ B[K].
The goal is to find the minimum number of nails that must be used
until all the planks are nailed. In other words, you should find a
value J such that all planks will be nailed after using only the first
J nails. More precisely, for every plank (A[K], B[K]) such that 0 ≤ K
< N, there should exist a nail C[I] such that I < J and A[K] ≤ C[I] ≤
B[K].
Link for the solution:
https://github.com/ZRonchy/Codility/blob/master/Lesson12/NailingPlanks.java
import java.util.Arrays;

class Solution {
    public int solution(int[] A, int[] B, int[] C) {
        // the main algorithm is to get the minimal index of nails that
        // is needed to nail every plank, using binary search
        int N = A.length;
        int M = C.length;
        // two-dimensional array to save the original indexes of array C
        int[][] sortedNail = new int[M][2];
        for (int i = 0; i < M; i++) {
            sortedNail[i][0] = C[i];
            sortedNail[i][1] = i;
        }
        // use a lambda expression to sort the two-dimensional array
        Arrays.sort(sortedNail, (int x[], int y[]) -> x[0] - y[0]);
        int result = 0;
        // find the earliest position that can nail each plank;
        // the max value over all planks is the result
        for (int i = 0; i < N; i++) {
            result = getMinIndex(A[i], B[i], sortedNail, result);
            if (result == -1)
                return -1;
        }
        return result + 1;
    }

    // for each plank, we can use binary search to get the minimal index of the
    // sorted array of nails, and we should check the candidate nails to get the
    // minimal index of the original array of nails.
    public int getMinIndex(int startPlank, int endPlank, int[][] nail, int preIndex) {
        int min = 0;
        int max = nail.length - 1;
        int minIndex = -1;
        while (min <= max) {
            int mid = (min + max) / 2;
            if (nail[mid][0] < startPlank)
                min = mid + 1;
            else if (nail[mid][0] > endPlank)
                max = mid - 1;
            else {
                max = mid - 1;
                minIndex = mid;
            }
        }
        if (minIndex == -1)
            return -1;
        int minIndexOrigin = nail[minIndex][1];
        // find the smallest original position of a nail that can nail the plank
        for (int i = minIndex; i < nail.length; i++) {
            if (nail[i][0] > endPlank)
                break;
            minIndexOrigin = Math.min(minIndexOrigin, nail[i][1]);
            // we need the maximal index of nails to nail every plank, so a
            // smaller index can be omitted ****
            if (minIndexOrigin <= preIndex) // ****
                return preIndex; // ****
        }
        return minIndexOrigin;
    }
}
I don't understand lines 99-102 of the solution, marked with **** above.
My question is:
If minIndexOrigin <= preIndex, then it will use preIndex,
but what if the nail at preIndex doesn't nail the current plank?
Is there a small mistake in the solution?
https://app.codility.com/demo/results/trainingR7UKQB-9AQ/
It's a 100% solution. The planks are zipped into (begin, end) pairs and sorted, which supports a binary search. For each nail, that nail is used to remove as many planks as possible before the search fails. When the array of planks is empty, the index of the current nail can be returned, representing the count of nails used.
O((N + M) * log(M))
All code is here, https://github.com/niall-oc/things/blob/master/codility/nailing_planks.py
def find_plank(nail, planks):
    """
    planks is an array of pairs (begin, end) for each plank.
    planks is sorted by start position of planks.
    """
    if not planks:
        return -1  # empty planks
    BEGIN_IDX = 0
    END_IDX = 1
    lower = 0
    upper = len(planks) - 1
    while lower <= upper:
        mid = (lower + upper) // 2
        if planks[mid][BEGIN_IDX] > nail:
            upper = mid - 1
        elif planks[mid][END_IDX] < nail:
            lower = mid + 1
        else:  # nail hits plank[mid]
            return mid  # return this plank id so we can pop the plank
    return -1

def solution(A, B, C):
    """
    Strategy is to sort the planks first, then scan the nails and do the following.
    For each nail, perform a binary search for a plank.
    If a plank is found, pop it, then search again until the nail hits no more planks.
    The plank list should diminish until it reaches zero, meaning we have found
    the minimum number of nails needed.
    If any planks remain, return -1.
    100% https://app.codility.com/demo/results/trainingR7UKQB-9AQ/
    """
    if max(B) < min(C) or max(C) < min(A):
        return -1  # no nail can hit any plank
    planks = sorted(zip(A, B))
    for i in range(len(C)):
        nail = C[i]
        p_idx = find_plank(nail, planks)
        # print(f'idx{i} nail at pos {nail}, matched {p_idx}, in {planks}')
        while p_idx > -1:
            del planks[p_idx]
            p_idx = find_plank(nail, planks)
            # print(f'idx{i} nail at pos {nail}, matched {p_idx}, in {planks}')
        if not planks:
            # print('NO PLANKS', i+1)
            return i + 1  # the index of the nail that removed the last plank
    return -1  # else we couldn't remove all planks
The case handled in those lines is when you find an index that nails the current plank but is less than (or equal to) the lowest index we already need in order to nail another, previously analysed, plank. In that case, we don't need to look further for the current plank, since we know that:
we can nail the plank
we can use an index that is not greater than an index we really need to use for another plank.
Since we are only interested in the greatest index among the least indexes we need for the different planks (i.e. the index for the "worst" plank), we know that the index we just found now is not important any more: if we already know that we will be using all the nails up to at least preIndex, we know that one nail among that set will nail the current plank. We can just exit the loop and return a "dummy" index that will not influence the result.
Note what the effect is in the calling loop:
result = getMinIndex(A[i], B[i], sortedNail, result);
After this assignment, result will be equal to what it was before the call: this plank (A[i], B[i]) can be nailed with an earlier nail, but we don't really care which nail that is, as we need to know the worst case, which is reflected up to now by result, and all nails up to that index are already in the set of nails that will be hammered.
You can compare it with alpha-beta pruning.
I would like to provide my algorithm and implementation for the desired O(log(M) * (M + N)) runtime complexity.
1) Binary search over the number of nails from C that are used.
2) For each iteration, build the prefix sums of the nails available up to the current binary search midpoint. This requires two steps:
a) Count the occurrences of each nail position in C.
b) Build the proper prefix sums of the available nails over the positions.
3) If some plank's range contains no nail, then that plank cannot be nailed, so there is no solution for this prefix of nails.
4) If there is no solution for this prefix, we need to look at a longer one.
5) If there is a solution for this prefix, we look at a shorter one.
6) The binary search goes on until we eventually find the smallest sufficient prefix.
The runtime complexity of the binary search is log(M), since we are bisecting a range of size M. The runtime complexity of the inner check comes from three loops:
a) O(mid), where mid < M, so O(M) in the worst case.
b) O(2M), which is O(M), as we can drop the constant.
c) O(N), going through the elements of A and B.
Therefore, the runtime complexity of the inner check is O(M + N), and the overall runtime complexity of the algorithm is O(log(M) * (M + N)). The overall space complexity is O(2 * M) for the prefix sums, so essentially O(M).
bool check(vector<int> &A, vector<int> &B, vector<int> &C, int mid)
{
    const int M = C.size();
    vector<int> prefix_sums(2 * M + 1, 0);
    for (int i = 0; i < mid; ++i) ++prefix_sums[C[i]];
    for (size_t i = 1; i < prefix_sums.size(); ++i) prefix_sums[i] += prefix_sums[i - 1];
    for (size_t i = 0; i < A.size(); ++i) if (prefix_sums[B[i]] == prefix_sums[A[i] - 1]) return false;
    return true;
}

int solution(vector<int> &A, vector<int> &B, vector<int> &C)
{
    int start = 1;
    int end = C.size();
    int min_nails = -1;
    while (start <= end) {
        int mid = (start + end) / 2;
        if (check(A, B, C, mid)) { end = mid - 1; min_nails = mid; }
        else start = mid + 1;
    }
    return min_nails;
}
This is a summary of the entire algorithm; I think anyone who understands it won't have any questions left.
What are we doing?
1- Sort the nails without losing their original indexes.
2- For each plank, find the minimum nail value that can nail the plank, using binary search.
3- Among the nails between that minimum nail value and the end position of the plank, find each nail's original index, and take the minimum of those original indexes.
4- Return the maximum over all planks of these minimum original nailing indexes.
Why are we doing it?
1- We don't want to search the entire array to find the min index. The original order of the indexes is what is asked about, so we need to store it.
2- We need the minimum nail value to limit the number of original indexes that have to be checked. Binary search finds that minimum value in logarithmic time.
3- We need the original indexes of the candidate nails. The first candidate is the nail with the minimum value, and the last candidate is the last nail within the end position of the plank; that's why we check the original indexes only in this interval.
4- We find the minimum original nail index for each plank, but the answer is the maximum of these minimums, since the question asks for the index of the last nail we use when we finish nailing every plank.
I have large arrays of integers (with sizes between 10'000 and 1'400'000). I want to get the first integer bigger than a value. The value is never inside the array.
I've looked for various solutions, but I have only found:
methods that evaluate every value and are not designed for sorted lists or arrays (with O(n) time complexity);
methods that are recursive and/or not designed for very large lists or arrays (with O(n) or more time complexity; in other languages though, so I'm not sure).
I've designed my own method. Here it is :
int findClosestBiggerInt(int value, int[] sortedArray) {
    if (sortedArray[0] > value ||
        value > sortedArray[sortedArray.length - 1]) // for my application's convenience only. It could also return the last.
        return sortedArray[0];
    int exp = (int) (Math.log(sortedArray.length) / Math.log(2)),
        index = (int) Math.pow(2, exp);
    boolean dir; // true = ascend, false = descend.
    while (exp >= 0) {
        dir = sortedArray[Math.min(index, sortedArray.length - 1)] < value;
        exp--;
        index = (int) (index + (dir ? 1 : -1) * Math.pow(2, exp));
    }
    int answer = sortedArray[index];
    return answer > value ? answer : sortedArray[index + 1];
}
It has a O(log n) time complexity. With an array of length 1'400'000, it will loop 21 times inside the while block. Still, I'm not sure that it cannot be improved.
Is there a more effective way to do it, without the help of external packages ? Any time saved is great because this calculation occurs quite frequently.
Well, here is one approach that uses a map instead of an array.

int categorizer = 10_000;
// Assume this is your array of ints (r is a java.util.Random instance here).
int[] arrayOfInts = r.ints(4_000, 10_000, 1_400_000).toArray();
You can group them in a map like so.
Map<Integer, List<Integer>> ranges =
Arrays.stream(arrayOfInts).sorted().boxed().collect(
Collectors.groupingBy(n -> n / categorizer));
Now, when you want to find the next element higher, you can get the list that would contain the number.
Say you want the next number larger than 982,828
int target = 982_828;
List<Integer> list = ranges.get(target / categorizer); // gets the list at key = 98
Now just process the list with your favorite method. One note: in some circumstances, depending on the gaps, the number you're after could actually be in a later list than the one at target/categorizer, so you would need to account for this, perhaps by adjusting how the numbers are categorized or by searching subsequent lists, as in the sketch below. But this can greatly reduce the size of the lists you're working with.
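A minimal sketch of that fall-through lookup (the helper name and the maxKey parameter, the largest bucket key present, are my own additions):

import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Smallest element greater than target: scan the target's own bucket,
// then fall through to later buckets if it holds no larger value.
static int nextBigger(Map<Integer, List<Integer>> ranges, int target,
                      int categorizer, int maxKey) {
    for (int key = target / categorizer; key <= maxKey; key++) {
        List<Integer> bucket = ranges.get(key);
        if (bucket == null) continue;   // gap: no values fell into this range
        for (int v : bucket)            // buckets were built from a sorted stream
            if (v > target) return v;   // first hit is the smallest match
    }
    throw new NoSuchElementException("no element greater than " + target);
}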
As Gene's answer indicates, you can do this with binary search. The built-in java.util.Arrays class provides a binarySearch method to do that for you:
int findClosestBiggerInt(final int value, final int[] sortedArray) {
    final int index = Arrays.binarySearch(sortedArray, value + 1);
    if (index >= 0) {
        return sortedArray[index];
    } else {
        return sortedArray[-(index + 1)];
    }
}
You'll find that to be much faster than the method you wrote; it's still O(log n) time, but the constant factors will be much lower, because it doesn't perform expensive operations like Math.log and Math.pow.
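For example (a quick check of the contract; note that an out-of-range value would still need the special-casing from your own version, since the insertion point can land past the last element):

int[] a = {2, 5, 9, 14};
findClosestBiggerInt(7, a); // binarySearch(a, 8) misses, insertion point 2, returns 9
findClosestBiggerInt(5, a); // 5 itself is not "bigger", so this also returns 9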
Binary search is easily modified to do what you want.
Standard binary search for exact match with the target maintains a [lo,hi] bracket of integers where the target value - if it exists - is always inside. Each step makes the bracket smaller. If the bracket ever gets to size zero (hi < lo), the element is not in the array.
For this new problem, the invariant is exactly the same except for the definition of the target value. We must take care never to shrink the bracket in a way that might eliminate the next bigger element.
Here's the "standard" binary search:
int search(int tgt, int [] a) {
    int lo = 0, hi = a.length - 1;
    // loop while the bracket is non-empty
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        // if a[mid] is below the target, ignore it and everything smaller
        if (a[mid] < tgt) lo = mid + 1;
        // if a[mid] is above the target, ignore it and everything bigger
        else if (a[mid] > tgt) hi = mid - 1;
        // else we've hit the target
        else return mid;
    }
    // The bracket is empty. Return "nothing."
    return -1;
}
In our new case, the part that obviously needs a change is:
// if a[mid] is above the target, ignore it and everything bigger
else if (a[mid] > tgt) hi = mid - 1;
That's because a[mid] might be the answer. We can't eliminate it from the bracket. The obvious thing to try is keep a[mid] around:
// if a[mid] is above the target, ignore everything bigger
else if (a[mid] > tgt) hi = mid;
But now we've introduced a new problem in one case. If lo == hi, i.e. the bracket has shrunk to 1 element, this if doesn't make progress! It sets hi = mid = lo + (hi - lo) / 2 = lo. The size of the bracket remains 1. The loop never terminates.
Therefore, the other adjustment we need is to the loop condition: stop when the bracket reaches size 1 or less:
// loop while the bracket has more than 1 element.
while (lo < hi) {
For a bracket of size 2 or more, lo + (hi - lo) / 2 is always smaller than hi. Setting hi = mid makes progress.
The last modification we need is checking the bracket after the loop terminates. There are now three cases rather than one in the original algorithm: the bracket is empty; it contains one element, which is the answer; or it contains one element, which is not the answer. It's easy to sort these out just before returning. In all, we have:
int search(int tgt, int [] a) {
    int lo = 0, hi = a.length - 1;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < tgt) lo = mid + 1;
        else if (a[mid] > tgt) hi = mid;
        else return mid;
    }
    return lo > hi || a[lo] < tgt ? -1 : lo;
}
As you point out, for a 1.4 million element array, this loop will execute no more than 21 times. My C compiler produces 28 instructions for the whole thing; the loop is 14. 21 iterations ought to be a handful of microseconds. It requires only small constant space and generates zero work for the Java garbage collector. Hard to see how you'll do better.
I was asked the following question in my interview yesterday:
Consider a Java or C++ array, say X, which is sorted and in which no two elements are the same. How best can you find an index i such that the element at that index is also i, that is, X[i] = i?
As clarification she also gave me an example:
Array X : -3 -1 0 3 5 7
index : 0 1 2 3 4 5
Answer is 3 as X[3] = 3.
The best I could think of was a linear search. After the interview I thought a lot about this problem but could not find any better solution. My argument is: the element with the required property can be anywhere in the array, so it could also be at the very end, and then we need to check every element.
I just wanted to confirm from the community here that I'm right. Please tell me I'm right :)
This can be done in O(logN) time and O(1) space by using a slightly modified binary search.
Consider a new array Y such that Y[i] = X[i] - i
Array X : -3 -1 0 3 5 7
index : 0 1 2 3 4 5
Array Y : -3 -2 -2 0 1 2
Since the elements in X are in increasing order, the elements in the
new array Y will be in non-decreasing order. So a binary
search for 0 in Y will give the answer.
But creating Y will take O(N) space and O(N) time. So instead of
creating the new array you just modify the binary search such that a
reference to Y[i] is replaced by X[i] - i.
Algorithm:
function (array X)
    low = 0
    high = (num of elements in X) - 1
    while (low <= high)
        mid = (low + high) / 2
        // change X[mid] to X[mid] - mid
        if (X[mid] - mid == 0)
            return mid
        // change here too
        else if (X[mid] - mid < 0)
            low = mid + 1
        else
            high = mid - 1
    end while
    return -1 // no such index exists... return an invalid index.
end function
Java implementation
C++ implementation
There are some faster solutions, averaging O(log n) or in some cases O(log log n) instead of O(n). Have a google for "binary search" and "interpolation search", you're likely to find very good explanations.
If the array is unsorted, then yes, the element is anywhere and you can't get under O(n), but that's not the case with sorted arrays.
--
Some explanation on interpolation search as requested:
While binary search only compares elements in terms of "greater / not greater", interpolation search also tries to make use of the numerical values. The point is: you have a sorted range of values from 0 to, say, 20000, and you look for 300. Binary search would start at the half of the range, at 10000. Interpolation search guesses that 300 is probably closer to 0 than to 20000, so it checks element 6000 first instead of 10000. Then again: if it's too high, recurse into the lower subrange; if it's too low, recurse into the upper subrange.
For a big array with a more or less uniform distribution of values, interpolation search should behave much faster than binary search; code it and see for yourself (a sketch follows below). It also often works best if you first take one interpolation search step, then one binary search step, and so on.
Note that it's what a human does intuitively when looking something up in a dictionary.
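For concreteness, here is a rough Java sketch of interpolation search (illustrative only: a tuned version would guard more edge cases and could alternate with binary steps as suggested above):

// Probe where tgt would sit between a[lo] and a[hi] by linear estimate,
// instead of always probing the midpoint as binary search does.
static int interpolationSearch(int[] a, int tgt) {
    int lo = 0, hi = a.length - 1;
    while (lo <= hi && tgt >= a[lo] && tgt <= a[hi]) {
        if (a[hi] == a[lo])                 // flat range: avoid dividing by zero
            return a[lo] == tgt ? lo : -1;
        // linear estimate of tgt's position within [lo, hi]
        int pos = lo + (int) ((long) (tgt - a[lo]) * (hi - lo) / (a[hi] - a[lo]));
        if (a[pos] == tgt) return pos;
        else if (a[pos] < tgt) lo = pos + 1;
        else hi = pos - 1;
    }
    return -1;                              // not found
}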
It's not required to think in terms of any array Y, as suggested in the answer by @codaddict.
Use binary search and check the middle element of the given array. If it is lower than its index, then we do not need to check any lower index: the array is sorted, so if we move to the left, subtracting m from the index, we subtract at least m from the value, and all earlier elements will also be too small. E.g. if arr[5] = 4, then arr[4] <= (4 - 1) and arr[3] <= (4 - 2), and so on. Similar logic applies if the middle element is greater than its index.
Here is a simple Java implementation:
int function(int[] arr) {
    int low = 0;
    int high = arr.length - 1;
    while (low <= high) {
        int mid = high - (high - low) / 2;
        if (arr[mid] == mid) {
            return mid;
        } else if (arr[mid] < mid) {
            low = mid + 1;
        } else {
            high = mid - 1;
        }
    }
    return -1; // There is no such index
}
Note that the above solution would work only if all elements are distinct.
I think this would be faster:
Start in the middle of the list.
If X[i] > i, then go to the middle of the remaining left side.
If X[i] < i, then go to the middle of the remaining right side.
Keep doing that, and it will halve the number of possible elements on each loop.
You can perform a binary search:
search the middle; if the value is lower than the index, then no lower index will contain the same value.
Then you search the upper half, and continue until you find the element or reach a one-element span.
This is a solution I came up with, and it works even if there are duplicates (I mistakenly overlooked the caveat that there are no duplicates).
// invariant: startIndex <= i <= endIndex
int modifiedBsearch(int startIndex, int endIndex)
{
    int sameValueIndex = -1;
    int middleIndex = (startIndex + endIndex) / 2;
    int middleValue = array[middleIndex];
    int endValue = array[endIndex];
    int startValue = array[startIndex];
    if (middleIndex == middleValue)
        return middleValue;
    else {
        if (middleValue <= endIndex)
            sameValueIndex = modifiedBsearch(middleIndex + 1, endIndex);
        if (sameValueIndex == -1 && startValue <= middleIndex)
            sameValueIndex = modifiedBsearch(startIndex, middleIndex - 1);
    }
    return sameValueIndex;
}
I would guess this takes O(log n) time, but this isn't clear at first glance???
If you're unlucky, it'll take O(n) time: the recursion tree has height log n and can be full, with n nodes in the last level, n/2 in the next-to-last, etc., which sums to about 2n calls of constant work each.
So, on average it will be between O(log n) and O(n).
Off the top of my head, binary splitting might be faster:
look at the middle value; if it is higher than what you need, search again in the lower half.
After one comparison, you have already split your data set in half.
After reading the question, it seems there is one scenario that can be used to speed up the lookup: when comparing the position to the value, if the value is greater than the position, then the value can be used as the next position to evaluate. This helps jump through the array faster, and it can be done because the array is sorted: the values we skip are conceptually shifted to the left in the array and are in the wrong location.
Example:
int ABC[] = { -5, -2, 4, 7, 11, 22, 55 };
If my current position is 2 and it has the value 4, they are not equal, and conceptually the value 4 is shifted to the left. I can use 4 as my next position, because if the value 4 is out of position, then everything less than 4 is out of position as well.
Some example code just for the sake of discussion:
#include <stdio.h>

int main(void)
{
    int X[] = { -3, -1, 0, 3, 5, 7 };
    int length = sizeof(X) / sizeof(X[0]);
    for (int i = 0; i < length;) {
        if (X[i] > i && X[i] < length)
            i = X[i]; // Jump forward!
        else if (X[i] == i) {
            printf("found it %i", i);
            break;
        } else
            ++i;
    }
    return 0;
}
A modified version of binary search would suffice, I guess.
Suppose the sequence is
Array : -1 1 4 5 6
Index : 0 1 2 3 4
Result : 1
or
Array : -2 0 1 2 4 6 10
Index : 0 1 2 3 4 5 6
Result: 4
From both examples we see that the required result will never lie on the right side if mid < a[mid]... pseudocode would look something like this:
mid <- (first + last) / 2
if a[mid] == mid then
    return mid
else if a[mid] < mid then
    recursive call (a, mid+1, last)
else
    recursive call (a, first, mid-1)
Java:
public static boolean check(int[] array, int i)
{
    if (i < 0 || i >= array.length)
        return false;
    return (array[i] == i);
}
C++:
bool check(int array[], int array_size, int i)
{
    if (i < 0 || i >= array_size)
        return false;
    return (array[i] == i);
}