"twoSum" solution and Big O Time Complexity explanation

I was messing around with possible "twoSum" solutions because I was pretty bored. Anyway, I'm pretty new to Big O time complexity, and I was trying to figure out the time complexity of my program.
I don't assume my solution is at all polished or better; I just want some help working out its time complexity. I am very new to Big O, so don't dog on my skills... you can roast my code if you wish.
public static void main(String[] args) {
int[] array = {5, 3, 4, 1, 5, 2, 1, 6, 3, 2};
System.out.println(Arrays.toString(twoSum(array, 6)));
}
public static int[] twoSum(int[] nums, int target) {
int oldValue = 0;
int newValue;
int currentValue;
for (int i = 0; i < nums.length; i++) {
currentValue = i;
if (currentValue != 0) { // so oldValue never causes an exception
oldValue = i - 1;
}
newValue = i + 1;
if ((nums[currentValue] + nums[oldValue]) == target) {
return new int[]{currentValue, oldValue};
}
if ((nums[currentValue] + nums[newValue]) == target) {
return new int[]{currentValue, newValue};
}
if ((nums[newValue] + nums[oldValue]) == target) {
return new int[]{newValue, oldValue};
}
}
return new int[]{0, 0}; // we print out 0 0 if we can't find a solution
}
This really won't have an ANSWER, as I don't fully know how time complexity works (as of right now). All I am asking for is someone to help me with an explanation.

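From what I understand, you are trying to find 2 consecutive numbers in an array whose sum is equal to the target.
Firstly, you don't need oldValue: when currentValue = 2, oldValue is 1, which checks the same pair of numbers as currentValue = 1 with newValue = 2. Your loop also reads nums[i + 1] on its last iteration, which throws an ArrayIndexOutOfBoundsException whenever no pair is found, so the loop should stop at nums.length - 1. (Your third check, nums[newValue] + nums[oldValue], compares elements two positions apart rather than consecutive ones, so I've dropped it.) The better version of your code should be: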
public static void main(String[] args) {
int[] array = {5, 3, 4, 1, 5, 2, 1, 6, 3, 2};
System.out.println(Arrays.toString(twoSum(array, 6)));
}
public static int[] twoSum(int[] nums, int target) {
int newValue;
int currentValue;
for (int i = 0; i < nums.length - 1; i++) {
currentValue = i;
newValue = i + 1;
if ((nums[currentValue] + nums[newValue]) == target) {
return new int[]{currentValue, newValue};
}
}
return new int[]{0, 0}; // we print out 0 0 if we can't find a solution
}
Secondly, for time complexity, one trick I usually use is to look for loops in the code.
For my suggested code, in the worst case, it has to go through N - 1 = O(N) iterations and takes O(1) operation to execute each iteration. Therefore the time complexity is O(N) * O(1) = O(N).
For your original code, it has to go through N iterations in the worst case, and takes approximately double the number of operations to execute each iteration compared to my code. However, the number of iterations is still O(N) and the number of operations per iteration is still O(1), thus the time complexity of your code is also O(N) * O(1) = O(N)
Note
Having a loop doesn't always mean the number of iterations for that loop is O(N). For example:
for (int i = 1; i < 100; i++) {
// do something
}
The time complexity for this loop is O(1), since the number of iterations is always 99 and is independent of any input value
for (int i = 1; i < n; i = i * 2) {
// do something
}
The time complexity for this loop is O(log2(n)) = O(log(n) / log(2)) = O(log(n)) (since log(2) is a constant)
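To make the two notes above concrete, here is a minimal sketch that simply counts how often each loop body runs. The class and method names (LoopCounts, constantLoop, doublingLoop) are made up for illustration and are not part of the original question or answer.

```java
class LoopCounts {
    // Iterations of: for (int i = 1; i < 100; i++)
    // The count does not depend on any input, hence O(1).
    static int constantLoop() {
        int iterations = 0;
        for (int i = 1; i < 100; i++) {
            iterations++;
        }
        return iterations;
    }

    // Iterations of: for (i = 1; i < n; i = i * 2)
    // i doubles every pass, so the count grows like log2(n).
    static int doublingLoop(long n) {
        int iterations = 0;
        for (long i = 1; i < n; i *= 2) {
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("constant loop: " + constantLoop());
        for (long n : new long[] {10, 1_000, 1_000_000}) {
            System.out.println("n = " + n + " -> " + doublingLoop(n) + " iterations");
        }
    }
}
```

Going from n = 1,000 to n = 1,000,000 only takes the doubling loop from 10 to 20 iterations, which is the logarithmic growth described above.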

Related

Recursion stackoverflow Java

I need to count, for each arr[i], how many numbers to its right are less than arr[i]. My problem is that the stack overflows for large inputs and I can't solve it in any way. Please tell me how I can refactor my code to avoid the StackOverflowError.
public class Smaller {
public static int[] smaller(int[] unsorted) {
int[] result = new int[unsorted.length];
for (int i = 0; i < unsorted.length; i++) {
result[i] = countSmaller(unsorted[i], 0, i + 1, unsorted);
}
return result;
}
private static int countSmaller(int currentNumber, int count, int index, int[] arr) {
if (index >= arr.length) {
return count;
}
return arr[index] < currentNumber
? countSmaller(currentNumber, count + 1, index + 1, arr)
: countSmaller(currentNumber, count, index + 1, arr);
}
}

I agree with comments questioning whether recursion is your best solution here, but if it's a requirement you can avoid stack overflow by chopping subproblems in half rather than whittling them down one-by-one. The logic is that the count in the remaining data will be the sum of the count in the first half plus the count in the second half of the remaining data. This reduces the stack growth from O(n) to O(log n).
I originally did a Python implementation due to not having a Java compiler installed (plus my Java skills being rusty), but found an online java compiler at tutorialspoint.com. Here's an implementation of the divide and conquer logic described in the previous paragraph:
public class Smaller {
public static int[] smaller(int[] unsorted) {
int[] result = new int[unsorted.length];
for (int i = 0; i < unsorted.length; i++) {
result[i] = countSmaller(unsorted[i], i+1, unsorted.length-1, unsorted);
}
return result;
}
private static int countSmaller(int threshold, int start, int end, int[] unsorted) {
if (start < end) {
int mid = start + (end - start) / 2;
int count = countSmaller(threshold, start, mid, unsorted);
count += countSmaller(threshold, mid+1, end, unsorted);
return count;
} else if ((start == end) && (unsorted[start] < threshold)) {
return 1;
}
return 0;
}
}
With O(log n) stack growth, this should be able to handle ridonculously big arrays. Since the algorithm as a whole is O(n^2), run time will limit you long before recursive stack limitations will.
Original Python Implementation
Sorry for showing this in Python, but I don't have a Java compiler and I didn't want to risk non-functioning code. The following does the trick and should be easy for you to translate:
def smaller(unsorted):
    result = []
    for i in range(len(unsorted)):
        result.append(countSmaller(unsorted[i], i+1, len(unsorted)-1, unsorted))
    return result

def countSmaller(threshold, start, end, unsorted):
    if start < end:
        mid = start + (end - start) // 2  # double slash is integer division
        count = countSmaller(threshold, start, mid, unsorted)
        count += countSmaller(threshold, mid+1, end, unsorted)
        return count
    elif start == end and unsorted[start] < threshold:
        return 1
    return 0

data = [10, 9, 8, 11, 7, 6]
print(smaller(data))  # [4, 3, 2, 2, 1, 0]
print(smaller([]))    # []
print(smaller([42]))  # [0]
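If recursion is not actually required, the stack question disappears entirely with a plain nested loop. The following is only a sketch of that alternative, with a made-up class name (SmallerIterative); it keeps the same O(n^2) running time as the recursive versions above but uses constant stack space.

```java
import java.util.Arrays;

class SmallerIterative {
    // For each index i, count the elements to the right of i
    // that are strictly smaller than arr[i]. No recursion involved.
    static int[] smaller(int[] arr) {
        int[] result = new int[arr.length];
        for (int i = 0; i < arr.length; i++) {
            int count = 0;
            for (int j = i + 1; j < arr.length; j++) {
                if (arr[j] < arr[i]) {
                    count++;
                }
            }
            result[i] = count;
        }
        return result;
    }

    public static void main(String[] args) {
        // Same sample data as the Python version above.
        System.out.println(Arrays.toString(smaller(new int[] {10, 9, 8, 11, 7, 6})));
        // prints [4, 3, 2, 2, 1, 0]
    }
}
```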

Coin Change Recursion All Solutions to Distinct Solutions

I am new to recursion and backtracking. I know I need to be completely comfortable with these concepts before I move on to dynamic programming. I have written a program below that helps me find all the possible combinations for a given amount n and an unlimited number of coins. However, I wish to have my program give me distinct solutions. I am having a hard time figuring out how to do this.
I have found a resource here: Coin Change that uses a top down approach recursively and then modifies it to give distinct combinations using the following formula: count (s, n, total) = count (s, n, total-s[n]) + count(s, n-1, total)
This says that I recurse using the value and then recurse excluding the value and decreasing the coins by 1.
I can't seem to grasp how this works. I can also say for sure that it would have been quite hard to even think of such a technique on the spot at an interview, per se. It seems like someone at some point must have spent a considerable amount of time on such a problem to devise such a technique.
Anyhow, any help on how I can convert my program to print distinct solutions, and on how it works, will be really appreciated.
public class Recursive {
static int[] combo = new int[100];
public static void main(String argv[]) {
int n = 8;
int[] amounts = {1, 5, 10};
ways(n, amounts, combo, 0, 0, 0);
}
public static void ways(int n, int[] amounts, int[] combo, int count, int sum, int index) {
if(sum == n) {
printArray(combo, index);
}
if(sum > n) {
return;
}
for(int i=0;i<amounts.length;i++) {
sum = sum + amounts[i];
combo[index] = amounts[i];
ways(n, amounts, combo, 0, sum, index + 1);
sum = sum - amounts[i];
}
}
public static void printArray(int[] combo, int index) {
for(int i=0;i < index; i++) {
System.out.print(combo[i] + " ");
}
System.out.println();
}
}
The actual number of non-distinct valid combinations for amounts {1, 2, 5} and N = 10 is 128, using a pure recursive exhaustive technique (code below).
My question is: can an exhaustive search be improved with memoization/dynamic programming? If so, how can I modify the algorithm below to incorporate such techniques?

A simple modification avoids the repeats.
Use a sorted amounts array.
The starting value of the loop should exclude the previous (smaller) values from amounts.
I used the count argument for this (it seems unused otherwise):
for(int i=count;i<amounts.length;i++) {
sum = sum + amounts[i];
combo[index] = amounts[i];
ways(n, amounts, combo, i, sum, index + 1);
sum = sum - amounts[i];
}
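Put together as a complete program, the idea of the loop above looks roughly like this. It is a sketch rather than the asker's exact program: the class name DistinctCoinCombos, the helper names, and the use of lists instead of the shared combo array are my own choices, and it assumes all coin values are positive and distinct.

```java
import java.util.ArrayList;
import java.util.List;

class DistinctCoinCombos {
    // Collects every combination of coins (order ignored) that sums to target.
    // 'start' is the index of the first coin still allowed: once the loop has
    // moved past a coin it is never picked again, so each combination is
    // generated exactly once (in non-decreasing index order).
    static void collect(int target, int[] coins, int start,
                        List<Integer> current, List<List<Integer>> out) {
        if (target == 0) {
            out.add(new ArrayList<>(current));
            return;
        }
        for (int i = start; i < coins.length; i++) {
            if (coins[i] > target) {
                continue; // this coin would overshoot the target
            }
            current.add(coins[i]);
            collect(target - coins[i], coins, i, current, out); // i, not i + 1: coins may repeat
            current.remove(current.size() - 1);                 // backtrack
        }
    }

    static List<List<Integer>> combos(int target, int[] coins) {
        List<List<Integer>> out = new ArrayList<>();
        collect(target, coins, 0, new ArrayList<>(), out);
        return out;
    }

    public static void main(String[] args) {
        for (List<Integer> combo : combos(8, new int[] {1, 5, 10})) {
            System.out.println(combo);
        }
    }
}
```

For n = 8 and amounts {1, 5, 10} this yields two combinations, eight 1s and 1 1 1 5, instead of listing every ordering of 1 1 1 5 separately.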

static HashMap<Integer, Integer> memo = new HashMap<Integer, Integer>();
public static void main(String argv[]) {
int n = 1000;
System.out.println(getSteps(n, 0,0 ));
}
public static int getSteps(int n, int sum, int count) {
if(n == sum) {
return 1;
}
if(sum > n) {
return 0;
}
if(memo.containsKey(sum)) {
return memo.get(sum);
}
for(int i=1; i<=3;i++) {
sum = sum + i;
count += getSteps(n, sum, 0);
sum = sum - i;
memo.put(sum, count);
}
return count;
}
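For counting distinct combinations rather than printing them, the recurrence quoted in the question, count(s, n, total) = count(s, n, total - s[n]) + count(s, n - 1, total), can be memoized on the pair (n, total). The sketch below is one way to do that; the class name CoinChangeCount and the long[][] memo table are assumptions of mine, and the coin values are assumed to be positive.

```java
import java.util.Arrays;

class CoinChangeCount {
    // count(s, n, total): number of distinct combinations (order ignored) of
    // coins s[0..n] that sum to total.
    //   count(s, n, total - s[n]) -> use coin s[n] at least once more
    //   count(s, n - 1, total)    -> never use coin s[n] again
    static long count(int[] s, int n, int total, long[][] memo) {
        if (total == 0) return 1;         // exactly one way: pick nothing more
        if (total < 0 || n < 0) return 0; // overshot, or ran out of coin types
        if (memo[n][total] != -1) return memo[n][total];
        long result = count(s, n, total - s[n], memo) + count(s, n - 1, total, memo);
        memo[n][total] = result;
        return result;
    }

    static long count(int[] s, int total) {
        long[][] memo = new long[s.length][total + 1];
        for (long[] row : memo) {
            Arrays.fill(row, -1); // -1 marks "not computed yet"
        }
        return count(s, s.length - 1, total, memo);
    }

    public static void main(String[] args) {
        System.out.println(count(new int[] {1, 2, 5}, 10)); // 10 distinct combinations
        System.out.println(count(new int[] {1, 5, 10}, 8)); // 2 distinct combinations
    }
}
```

Each (n, total) pair is computed once, so the work is bounded by the number of coin types times the total, rather than by the number of combinations.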

Need explanation for algorithm searching minimal large sum

I'm solving Codility questions as practice and couldn't answer one of the questions. I found the answer on the Internet but I don't get how this algorithm works. Could someone walk me through it step-by-step?
Here is the question:
/*
You are given integers K, M and a non-empty zero-indexed array A consisting of N integers.
Every element of the array is not greater than M.
You should divide this array into K blocks of consecutive elements.
The size of the block is any integer between 0 and N. Every element of the array should belong to some block.
The sum of the block from X to Y equals A[X] + A[X + 1] + ... + A[Y]. The sum of empty block equals 0.
The large sum is the maximal sum of any block.
For example, you are given integers K = 3, M = 5 and array A such that:
A[0] = 2
A[1] = 1
A[2] = 5
A[3] = 1
A[4] = 2
A[5] = 2
A[6] = 2
The array can be divided, for example, into the following blocks:
[2, 1, 5, 1, 2, 2, 2], [], [] with a large sum of 15;
[2], [1, 5, 1, 2], [2, 2] with a large sum of 9;
[2, 1, 5], [], [1, 2, 2, 2] with a large sum of 8;
[2, 1], [5, 1], [2, 2, 2] with a large sum of 6.
The goal is to minimize the large sum. In the above example, 6 is the minimal large sum.
Write a function:
class Solution { public int solution(int K, int M, int[] A); }
that, given integers K, M and a non-empty zero-indexed array A consisting of N integers, returns the minimal large sum.
For example, given K = 3, M = 5 and array A such that:
A[0] = 2
A[1] = 1
A[2] = 5
A[3] = 1
A[4] = 2
A[5] = 2
A[6] = 2
the function should return 6, as explained above. Assume that:
N and K are integers within the range [1..100,000];
M is an integer within the range [0..10,000];
each element of array A is an integer within the range [0..M].
Complexity:
expected worst-case time complexity is O(N*log(N+M));
expected worst-case space complexity is O(1), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.
*/
And here is the solution I found with my comments about parts which I don't understand:
public static int solution(int K, int M, int[] A) {
int lower = max(A); // why lower is max?
int upper = sum(A); // why upper is sum?
while (true) {
int mid = (lower + upper) / 2;
int blocks = calculateBlockCount(A, mid); // don't I have specified number of blocks? What blocks do? Don't get that.
if (blocks < K) {
upper = mid - 1;
} else if (blocks > K) {
lower = mid + 1;
} else {
return upper;
}
}
}
private static int calculateBlockCount(int[] array, int maxSum) {
int count = 0;
int sum = array[0];
for (int i = 1; i < array.length; i++) {
if (sum + array[i] > maxSum) {
count++;
sum = array[i];
} else {
sum += array[i];
}
}
return count;
}
// returns sum of all elements in an array
private static int sum(int[] input) {
int sum = 0;
for (int n : input) {
sum += n;
}
return sum;
}
// returns max value in an array
private static int max(int[] input) {
int max = -1;
for (int n : input) {
if (n > max) {
max = n;
}
}
return max;
}

What the code does is use a form of binary search (how binary search works is explained quite nicely here: https://www.topcoder.com/community/data-science/data-science-tutorials/binary-search/. It also uses an example quite similar to your problem), where you search for the minimum sum that every block must be allowed to contain. In the example case, you need to divide the array into 3 parts.
When doing a binary search you need to define 2 boundaries between which you are certain your answer can be found. Here, the lower boundary (lower) is the maximum value in the array. For the example, this is 5 (this is if you divide your array into 7 blocks). The upper boundary (upper) is 15, which is the sum of all the elements in the array (this is if you divide the array into 1 block).
Now comes the search part: In solution() you start with your bounds and mid point (10 for the example).
In calculateBlockCount you count (count ++ does that) how many blocks you can make if your sum is a maximum of 10 (your middle point/ or maxSum in calculateBlockCount).
For the example, a limit of 10 (in the while loop) gives 2 blocks, and the code returns this (blocks) to solution(). It then checks whether that is less or more than K, which is the number of blocks you want. If it's less than K, your mid point is too high because you're putting too many array elements in each block. If it's more than K, your mid point is too low and you're putting too few array elements in each block.
After this check, it halves the solution space (upper = mid - 1 or lower = mid + 1).
This happens every loop: it halves the solution space, which makes it converge quite quickly.
You keep going through the while loop, adjusting mid, until it gives the number of blocks that was in your input K.
So to go through it step by step, using the number of blocks actually needed for each limit (note that the posted calculateBlockCount never counts the final block, so it reports one fewer):
Mid = 10, 2 blocks are needed
solution(): 2 blocks < K so upper -> mid-1 = 9, mid -> 7 (lower is 5)
Mid = 7, 3 blocks are needed
solution(): 3 blocks = K, but a smaller limit might also work, so upper -> mid-1 = 6, mid -> 5 (lower is 5, integer division makes (5+6)/2 = 5)
Mid = 5, 4 blocks are needed
solution(): 4 blocks > K so lower -> mid+1 = 6, mid -> 6 (lower is 6, upper is 6)
Mid = 6, 3 blocks are needed
So the function returns mid = 6.
Hope this helps,
Gl learning to code :)
Edit. When using binary search, a prerequisite is that the function being searched is monotonic. This is true in this case: as the allowed block sum increases, the number of blocks needed never increases (equivalently, as K increases the minimal large sum never increases).

Seems like your solution has some problems. I rewrote it as below:
class Solution {
public int solution(int K, int M, int[] A) {
// write your code in Java SE 8
int high = sum(A);
int low = max(A);
int mid = 0;
int smallestSum = 0;
while (high >= low) {
mid = (high + low) / 2;
int numberOfBlock = blockCount(mid, A);
if (numberOfBlock > K) {
low = mid + 1;
} else if (numberOfBlock <= K) {
smallestSum = mid;
high = mid - 1;
}
}
return smallestSum;
}
public int sum(int[] A) {
int total = 0;
for (int i = 0; i < A.length; i++) {
total += A[i];
}
return total;
}
public int max(int[] A) {
int max = 0;
for (int i = 0; i < A.length; i++) {
if (max < A[i]) max = A[i];
}
return max;
}
public int blockCount(int max, int[] A) {
int current = 0;
int count = 1;
for (int i = 0; i< A.length; i++) {
if (current + A[i] > max) {
current = A[i];
count++;
} else {
current += A[i];
}
}
return count;
}
}

This helped me, in case anyone else finds it helpful.
Think of it as a function: given k (the block count) we get some largeSum.
What is the inverse of this function? It's that given largeSum we get a k. This inverse function is implemented below.
In solution() we keep plugging guesses for largeSum into the inverse function until it returns the k given in the exercise.
To speed up the guessing process, we use binary search.
public class Problem {
int SLICE_MAX = 100 * 1000 + 1;
public int solution(int blockCount, int maxElement, int[] array) {
// maxGuess is determined by looking at what the max possible largeSum could be
// this happens if all elements are m and the blockCount is 1
// Math.max is necessary, because blockCount can exceed array.length,
// but this shouldn't lower maxGuess
int maxGuess = (Math.max(array.length / blockCount, array.length)) * maxElement;
int minGuess = 0;
return helper(blockCount, array, minGuess, maxGuess);
}
private int helper(int targetBlockCount, int[] array, int minGuess, int maxGuess) {
int guess = minGuess + (maxGuess - minGuess) / 2;
int resultBlockCount = inverseFunction(array, guess);
// if resultBlockCount == targetBlockCount this is not necessarily the solution
// as there might be a lower largeSum, which also satisfies resultBlockCount == targetBlockCount
if (resultBlockCount <= targetBlockCount) {
if (minGuess == guess) return guess;
// even if resultBlockCount == targetBlockCount
// we keep searching for potential lower largeSum that also satisfies resultBlockCount == targetBlockCount
// note that the search range below includes 'guess', as this might in fact be the lowest possible solution
// but we need to check in case there's a lower one
return helper(targetBlockCount, array, minGuess, guess);
} else {
return helper(targetBlockCount, array, guess + 1, maxGuess);
}
}
// think of it as a function: given k (blockCount) we get some largeSum
// the inverse of the above function is that given largeSum we get a k
// in solution() we will keep guessing largeSum using binary search until
// we hit k given in the exercise
int inverseFunction(int[] array, int largeSumGuess) {
int runningSum = 0;
int blockCount = 1;
for (int i = 0; i < array.length; i++) {
int current = array[i];
if (current > largeSumGuess) return SLICE_MAX;
if (runningSum + current <= largeSumGuess) {
runningSum += current;
} else {
runningSum = current;
blockCount++;
}
}
return blockCount;
}
}

From anhtuannd's code, I refactored using Java 8. It is slightly slower. Thanks anhtuannd.
IntSummaryStatistics summary = Arrays.stream(A).summaryStatistics();
long high = summary.getSum();
long low = summary.getMax();
long result = 0;
while (high >= low) {
long mid = (high + low) / 2;
AtomicLong blocks = new AtomicLong(1);
Arrays.stream(A).reduce(0, (acc, val) -> {
if (acc + val > mid) {
blocks.incrementAndGet();
return val;
} else {
return acc + val;
}
});
if (blocks.get() > K) {
low = mid + 1;
} else if (blocks.get() <= K) {
result = mid;
high = mid - 1;
}
}
return (int) result;

I wrote a 100% solution in python here. The result is here.
Remember: you are searching the set of possible answers, not the array A.
In the example given, the search is over possible answers. Consider [5]: 5 is the smallest possible max value for a block. And consider [2, 1, 5, 1, 2, 2, 2]: 15 is the largest possible max value for a block.
Mid = (5 + 15) // 2. Slicing out blocks of 10 at a time won't create more than 3 blocks in total.
Make 10-1 the upper and try again (5+9)//2 is 7. Slicing out blocks of 7 at a time won't create more than 3 blocks in total.
Make 7-1 the upper and try again (5+6)//2 is 5. Slicing out blocks of 5 at a time will create more than 3 blocks in total.
Make 5+1 the lower and try again (6+6)//2 is 6. Slicing out blocks of 6 at a time won't create more than 3 blocks in total.
Therefore 6 is the lowest limit to impose on the sum of a block that will permit breaking into 3 blocks.
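The "slicing out blocks" step in the walk-through above can be checked with a few lines of Java. This is a small sketch with made-up names (BlockCountDemo, blocksNeeded); it assumes the limit is at least as large as every element, which holds whenever the search starts at max(A).

```java
class BlockCountDemo {
    // Greedy count of the blocks needed when no block sum may exceed 'limit':
    // keep adding elements to the current block and start a new block as soon
    // as the next element would push the sum over the limit.
    static int blocksNeeded(int[] a, int limit) {
        int blocks = 1; // the first block is always open
        int current = 0;
        for (int value : a) {
            if (current + value > limit) {
                blocks++;
                current = value;
            } else {
                current += value;
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        int[] a = {2, 1, 5, 1, 2, 2, 2};
        for (int limit : new int[] {10, 7, 5, 6}) {
            System.out.println("limit " + limit + " -> " + blocksNeeded(a, limit) + " blocks");
        }
    }
}
```

For the limits tried in the walk-through this gives 2, 3, 4 and 3 blocks, so with K = 3 only the limit 5 is rejected, and 6 is the smallest limit that is accepted.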

Algorithm to find the narrowest intervals, m of which will cover a set of numbers

Let's say you have a list of n numbers. You are allowed to choose m integers (let's call each such integer a). For each integer a, delete every number that is within the inclusive range [a - x, a + x], where x is a number. What is the minimum value of x that clears the whole list?
For example, if your list of numbers was
1 3 8 10 18 20 25
and m = 2, the answer would be x = 5.
You could pick the two integers 5 and 20. This would clear the list because it deletes every number in between [5-5, 5+5] and [20-5, 20+5].
How would I solve this? I think the solution may be related to dynamic programming. I do not want a brute force method solution.
Code would be really helpful, preferably in Java or C++ or C.

Hints
Suppose you had the list
1 3 8 10 18 20 25
and wanted to find how many groups would be needed to cover the set if x was equal to 2.
You could solve this in a greedy way by choosing the first integer to be 1+x (1 is the smallest number in the list). This would cover all elements up to 1+x+x=5. Then simply repeat this process until all numbers are covered.
So in this case, the next uncovered number is 8, so we would choose 8+x=10 and cover all numbers up to 10+x=12 in the second group.
Similarly, the third group would cover [18,22] and the fourth group would cover [25,29].
This value of x needed 4 groups. This is too many, so we need to increase x and try again.
You can use bisection to identify the smallest value of x that does cover all the numbers in m groups.
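The hints above translate fairly directly into code. The following is a sketch under a few assumptions of mine: x is restricted to non-negative integers, m is at least 1, the values are small enough that value + 2 * x does not overflow a long, and the names (NarrowestIntervals, groupsNeeded, minimalX) are invented for illustration.

```java
import java.util.Arrays;

class NarrowestIntervals {
    // Greedy step from the hints: with half-width x, centre the next interval
    // at (smallest uncovered number + x), which covers everything up to
    // (smallest uncovered number + 2x). Returns how many intervals are needed.
    static int groupsNeeded(long[] sorted, long x) {
        int groups = 0;
        int i = 0;
        while (i < sorted.length) {
            long reach = sorted[i] + 2 * x;
            groups++;
            while (i < sorted.length && sorted[i] <= reach) {
                i++;
            }
        }
        return groups;
    }

    // Bisection on x: a larger x never needs more groups, so the smallest x
    // with groupsNeeded(x) <= m can be found by binary search.
    static long minimalX(long[] numbers, int m) {
        if (numbers.length == 0) {
            return 0;
        }
        long[] sorted = numbers.clone();
        Arrays.sort(sorted);
        long lo = 0;
        // With this half-width a single interval already covers the whole list.
        long hi = (sorted[sorted.length - 1] - sorted[0] + 1) / 2;
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            if (groupsNeeded(sorted, mid) <= m) {
                hi = mid;     // mid works, try something smaller
            } else {
                lo = mid + 1; // mid needs too many groups
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        long[] numbers = {1, 3, 8, 10, 18, 20, 25};
        System.out.println(groupsNeeded(numbers, 2)); // 4 groups, as in the hints
        System.out.println(minimalX(numbers, 2));     // 5, as in the question
    }
}
```

Each greedy pass is linear in the list length, and the bisection needs about log2(max - min) passes after the initial sort.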

A recursive solution:
First, you need an estimate. If you split the range into m groups, then estimated(x) must be about (greatest - lowest element) / (2*m), and estimated(x) could itself be a solution. If there is a better solution, it has a lower x than estimated(x) in every group, and you can check that with the first group and then repeat recursively. The problem shrinks until only one group is left, the last one, at which point you know whether your new solution is better or not. If it is better, you can use it to discard other, worse solutions.
private static int estimate(int[] n, int m, int begin, int end) {
return (((n[end - 1] - n[begin]) / m) + 1 )/2;
}
private static int calculate(int[] n, int m, int begin, int end, int estimatedX){
if (m == 1){
return estimate(n, 1, begin, end);
} else {
int bestX = estimatedX;
for (int i = begin + 1; i <= end + 1 - m; i++) {
// It split the problem:
int firstGroupX = estimate(n, 1, begin, i);
if (firstGroupX < bestX){
bestX = Math.min(bestX, Math.max(firstGroupX, calculate(n, m-1, i, end, bestX)));
} else {
i = end;
}
}
return bestX;
}
}
public static void main(String[] args) {
int[] n = {1, 3, 8, 10, 18, 20, 25};
int m = 2;
Arrays.sort(n);
System.out.println(calculate(n, m, 0, n.length, estimate(n, m, 0, n.length)));
}
EDIT:
Long numbers version. Main idea: it searches for "islands" separated by large distances and splits the problem into those islands. Like divide and conquer, it distributes 'm' among the islands.
private static long estimate(long[] n, long m, int begin, int end) {
return (((n[end - 1] - n[begin]) / m) + 1) / 2;
}
private static long calculate(long[] n, long m, int begin, int end, long estimatedX) {
if (m == 1) {
return estimate(n, 1, begin, end);
} else {
long bestX = estimatedX;
for (int i = begin + 1; i <= end + 1 - m; i++) {
long firstGroupX = estimate(n, 1, begin, i);
if (firstGroupX < bestX) {
bestX = Math.min(bestX, Math.max(firstGroupX, calculate(n, m - 1, i, end, bestX)));
} else {
i = end;
}
}
return bestX;
}
}
private static long solver(long[] n, long m, int begin, int end) {
long estimate = estimate(n, m, begin, end);
PriorityQueue<long[]> islands = new PriorityQueue<>((p0, p1) -> Long.compare(p1[0], p0[0]));
int islandBegin = begin;
for (int i = islandBegin; i < end -1; i++) {
if (n[i + 1] - n[i] > estimate) {
long estimatedIsland = estimate(n, 1, islandBegin, i+1);
islands.add(new long[]{estimatedIsland, islandBegin, i, 1});
islandBegin = i+1;
}
}
long estimatedIsland = estimate(n, 1, islandBegin, end);
islands.add(new long[]{estimatedIsland, islandBegin, end, 1});
long result;
if (islands.isEmpty() || m < islands.size()) {
result = calculate(n, m, begin, end, estimate);
} else {
long mFree = m - islands.size();
while (mFree > 0) {
long[] island = islands.poll();
island[3]++;
island[0] = solver(n, island[3], (int) island[1], (int) island[2]);
islands.add(island);
mFree--;
}
result = islands.poll()[0];
}
return result;
}
public static void main(String[] args) {
long[] n = new long[63];
for (int i = 1; i < n.length; i++) {
n[i] = 2*n[i-1]+1;
}
long m = 32;
Arrays.sort(n);
System.out.println(solver(n, m, 0, n.length));
}

An effective algorithm could be (assuming the list is sorted):
We can think of the list as groups of 'm' integers.
Now for each group calculate 'last_element - first_element + 1', and store the maximum of these values in a variable, say 'ans'.
Now the value of 'x' is 'ans/2'.
I hope it's pretty clear how this algorithm works.

I think it's similar to a clustering problem. For example, you may use the k-means clustering algorithm: partition the initial list into m classes, and for x take the maximum size of the obtained classes divided by two.

1) You should look into BEST CASE, AVERAGE CASE and WORST CASE complexities with regards to TIME and SPACE complexities of algorithms.
2) I think David Pérez Cabrera has the right idea. Let's assume average case (as in the following pseudo code)
3) Let the list of integers be denoted by l
keepGoing = true
min_x = ceiling(l[size-1]-l[0])/(2m)
while(keepGoing)
{
l2 = l.copy
min_x = min_x-1
mcounter = 1
while(mcounter <= m)
{
firstElement = l2[0]
// This while condition will likely result in an ArrayOutOfBoundsException
// It's easy to fix this.
while(l2[0] <= firstElement+2*min_x)
{ remove(l2[0]) }
mcounter = mcounter+1
}
if(l2.size>0)
keepGoing = false
}
return min_x+1
4) Consider
l = {1, 2, 3, 4, 5, 6, 7}, m=2 (gives x=2)
l = {1, 10, 100, 1000, 10000, 100000, 1000000}, m=2
l = {1, 10, 100, 1000, 10000, 100000, 1000000}, m=3

algorithm removing duplicate elements in array without auxiliary storage

I am working on this famous interview question on removing duplicate elements in an array without using auxiliary storage and while preserving the order.
I have read a bunch of posts: "Algorithm: efficient way to remove duplicate integers from an array" and "Removing Duplicates from an Array using C".
They are either implemented in C (without explanation), or the Java code provided just fails when there are consecutive duplicates such as [1,1,1,3,3].
I am not quite confident with C; my background is Java. So I implemented the code myself.
It works like this:
Use two loops: the outer loop traverses the array and the inner loop checks for duplicates and, if one is present, replaces it with null.
Then I go over the array with duplicates replaced by null and remove the null elements, replacing each with the next non-null element.
The total run time I see now is O(n^2) + O(n) ~ O(n^2). Reading the above posts, I understood this is the best we can do if no sorting and no auxiliary storage are allowed.
My code is here. I am looking for ways to optimize it further (if possible) or for better/simpler logic:
public class RemoveDup {
public static void main (String[] args){
Integer[] arr2={3,45,1,2,3,3,3,3,2,1,45,2,10};
Integer[] res= removeDup(arr2);
System.out.println(Arrays.toString(res));
}
private static Integer[] removeDup(Integer[] data) {
int size = data.length;
int count = 1;
for (int i = 0; i < size; i++) {
Integer temp = data[i];
for (int j = i + 1; j < size && temp != null; j++) {
if (temp.equals(data[j])) { // compare values; == on Integer objects compares references
data[j] = null;
}
}
}
for (int i = 1; i < size; i++) {
Integer current = data[i];
if (data[i] != null) {
data[count++] = current;
}
}
return Arrays.copyOf(data, count);
}
}
EDIT 1; Reformatted code from #keshlam throws ArrayIndexOutofBound Exception:
private static int removeDupes(int[] array) {
System.out.println("method called");
if(array.length < 2)
return array.length;
int outsize=1; // first is always kept
for (int consider = 1; consider < array.length; ++consider) {
for(int compare=0;compare<outsize;++compare) {
if(array[consider]!=array[compare])
array[outsize++]=array[consider]; // already present; advance to next compare
else break;
// if we get here, we know it's new so append it to output
//array[outsize++]=array[consider]; // could test first, not worth it.
}
}
System.out.println(Arrays.toString(array));
// length is last written position plus 1
return outsize;
}

OK, here's my answer, which should be O(N*N) worst case. (With smaller constant, since even worstcase I'm testing N against -- on average -- 1/2 N, but this is computer science rather than software engineering and a mere 2X speedup isn't significant. Thanks to #Alexandru for pointing that out.)
1) Split cursor (input and output advanced separately),
2) Each new value only has to be compared to what's already been kept, and compare can stop if a match is found. (The hint keyword was "incremental")
3) First element need not be tested.
4) I'm taking advantage of labelled continue where I could have instead set a flag before breaking and then tested the flag. Comes out to the same thing; this is a bit more elegant.
4.5) I could have tested whether outsize==consider and not copied if that was true. But testing for it would take about as many cycles as doing the possibly-unnecessary copy, and the majority case is that they will not be the same, so it's easier to just let a possibly redundant copy take place.
5) I'm not recopying the data in the key function; I've factored out the copy-for-printing operation to a separate function to make clear that removeDupes does run entirely in the target array plus a few automatic variables on the stack. And I'm not spending time zeroing out the leftover elements at the end of the array; that may be wasted work (as in this case). Though I don't think it would actually change the formal complexity.
import java.util.Arrays;
public class RemoveDupes {
private static int removeDupes(final int[] array) {
if(array.length < 2)
return array.length;
int outsize=1; // first is always kept
outerloop: for (int consider = 1; consider < array.length; ++consider) {
for(int compare=0;compare<outsize;++compare)
if(array[consider]==array[compare])
continue outerloop; // already present; advance to next compare
// if we get here, we know it's new so append it to output
array[outsize++]=array[consider]; // could test first, not worth it.
}
return outsize; // length is last written position plus 1
}
private static void printRemoveDupes(int[] array) {
int newlength=removeDupes(array);
System.out.println(Arrays.toString(Arrays.copyOfRange(array, 0, newlength)));
}
public static void main(final String[] args) {
printRemoveDupes(new int[] { 3, 45, 1, 2, 3, 3, 3, 3, 2, 1, 45, 2, 10 });
printRemoveDupes(new int[] { 2, 2, 3, 3 });
printRemoveDupes(new int[] { 1, 1, 1, 1, 1, 1, 1, 1 });
}
}
LATE ADDITION: Since folks expressed confusion about point 4 in my explanation, here's the loop rewritten without labelled continue:
for (int consider = 1; consider < array.length; ++consider) {
    boolean matchFound = false;
    for (int compare = 0; compare < outsize; ++compare) {
        if (array[consider] == array[compare]) {
            matchFound = true;
            break;
        }
    }
    if (!matchFound) // only add it to the output if not found
        array[outsize++] = array[consider];
}
Hope that helps. Labelled continue is a rarely-used feature of Java, so it isn't too surprising that some folks haven't seen it before. It's useful, but it does make code harder to read; I probably wouldn't use it in anything much more complicated than this simple algorithm.

Here's one version which doesn't use additional memory (except for the array it returns) and doesn't sort either.
I believe this is slightly worse than O(n*log n).
Edit: I'm wrong. This is slightly better than O(n^3).
import java.util.Arrays;
public class Dupes {
private static int[] removeDupes(final int[] array) {
int end = array.length - 1;
for (int i = 0; i <= end; i++) {
for (int j = i + 1; j <= end; j++) {
if (array[i] == array[j]) {
for (int k = j; k < end; k++) {
array[k] = array[k + 1];
}
end--;
j--;
}
}
}
return Arrays.copyOf(array, end + 1);
}
public static void main(final String[] args) {
System.out.println(Arrays.toString(removeDupes(new int[] { 3, 45, 1, 2, 3, 3, 3, 3, 2, 1, 45, 2, 10 })));
System.out.println(Arrays.toString(removeDupes(new int[] { 2, 2, 3, 3 })));
System.out.println(Arrays.toString(removeDupes(new int[] { 1, 1, 1, 1, 1, 1, 1, 1 })));
}
}
and here's a modified version which doesn't shift all of the elements from after the dupe. Instead it simply switches the dupe with the last, non-matching element. This obviously can't guarantee order.
private static int[] removeDupes(final int[] array) {
int end = array.length - 1;
for (int i = 0; i <= end; i++) {
for (int j = i + 1; j <= end; j++) {
if (array[i] == array[j]) {
while (end >= j && array[j] == array[end]) {
end--;
}
if (end > j) {
array[j] = array[end];
end--;
}
}
}
}
return Arrays.copyOf(array, end + 1);
}

Here is a version with a worst case of O(n^2); the return value points to the first non-unique element, so everything before it is unique.
In Java, indices can be used instead of C++ iterators.
std::vector<int>::iterator unique(std::vector<int>& aVector){
auto end = aVector.end();
auto start = aVector.begin();
while(start != end){
auto num = *start; // the element to check against
auto temp = ++start; // start get incremented here
while (temp != end){
if (*temp == num){
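--end; // step back first: end is one past the last remaining candidate
std::iter_swap(temp, end); // swap the pointed-to values, not the iterators themselves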
}
else
temp++; // the temp is in else so that if the swap occurs the algo should still check the swapped element.
}
}
return end;
}
Java equivalent code: (the return will be an int which is the index of the first not unique element)
int unique(int[] anArray) {
    int end = anArray.length; // exclusive: one past the last remaining candidate
    int start = 0;
    while (start != end) {
        int num = anArray[start]; // the element to check against
        int temp = ++start;       // start gets incremented here
        while (temp != end) {
            if (anArray[temp] == num) {
                end--; // shrink first so that end indexes a real element
                // swap the values at index temp and index end
                int swapped = anArray[temp];
                anArray[temp] = anArray[end];
                anArray[end] = swapped;
            } else {
                temp++; // only advance in the else branch, so the swapped-in element is checked too
            }
        }
    }
    return end;
}
The slight difference between this algo and yours is in your point 2: instead of replacing the current element with null, you swap it with the last possibly unique element, which on the first swap is the last element of the array, on the second swap the second last, and so on.
You might as well consider looking at the std::unique implementation in C++, which is linear in one less than the distance between first and last (it "compares each pair of elements, and possibly performs assignments on some of them"), but as #keshlam noted, it only removes consecutive duplicates, so it is used on sorted arrays. The return value is the same as in my algo. Here is the code directly from the standard library:
template<class _FwdIt, class _Pr> inline
_FwdIt _Unique(_FwdIt _First, _FwdIt _Last, _Pr _Pred)
{ // remove each satisfying _Pred with previous
if (_First != _Last)
for (_FwdIt _Firstb; (_Firstb = _First), ++_First != _Last; )
if (_Pred(*_Firstb, *_First))
{ // copy down
for (; ++_First != _Last; )
if (!_Pred(*_Firstb, *_First))
*++_Firstb = _Move(*_First);
return (++_Firstb);
}
return (_Last);
}

To bring in a bit of perspective, here is one solution in Haskell. It uses lists instead of arrays
and returns the reversed order, which can be fixed by applying reverse at the end.
import Data.List (foldl')
removeDup :: (Eq a) => [a] -> [a]
removeDup = foldl' (\acc x-> if x `elem` acc then acc else x:acc) []
