The question is:
Given an array of 2n integers, your task is to group these integers into n pairs of integers, say (a1, b1), (a2, b2), ..., (an, bn), so that the sum of min(ai, bi) over all i from 1 to n is as large as possible.
The solution provided is:
public class Solution {
    public int arrayPairSum(int[] nums) {
        // Counting sort over the value range [-10000, 10000]
        int[] arr = new int[20001];
        int lim = 10000;
        for (int num : nums)
            arr[num + lim]++;
        // d carries an unpaired element over to the next value
        int d = 0, sum = 0;
        for (int i = -10000; i <= 10000; i++) {
            sum += (arr[i + lim] + 1 - d) / 2 * i;
            d = (2 + arr[i + lim] - d) % 2;
        }
        return sum;
    }
}
I think it is unfair to say that the time complexity is O(n). Although in O(n + K) the term K = 20001 is a constant that could seemingly be omitted, n is also less than K. If so, why can't I say the time complexity is O(1)?
The asymptotic complexity is measured as a function of n, for ALL n. We are concerned with what happens when n gets large. Really, really large.
Maybe in practice n will always be tiny. Fine.
But when you give a complexity measure for an algorithm, you are by definition saying what happens as n grows. And grows and grows. And when it does, it will dwarf K.
So O(n) it is.
Clarification:
It is true that the problem specification says:
n is a positive integer, which is in the range of [1, 10000].
All the integers in the array will be in the range of [-10000, 10000].
But remember, that is just for this problem! The solution given hard codes the value of K. The algorithm used here should indeed be written as O(n + K), as you noticed. This K is not a constant factor and probably should not be dropped.
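To make the role of K visible, here is a sketch of the same counting idea with the value range passed in explicitly rather than hard-coded (arrayPairSumGeneral, minValue and maxValue are illustrative names, not from the original solution):

    // K = maxValue - minValue + 1 plays the role of the constant 20001 above
    public int arrayPairSumGeneral(int[] nums, int minValue, int maxValue) {
        int K = maxValue - minValue + 1;        // size of the value range
        int[] count = new int[K];
        for (int num : nums)                    // O(n) counting pass
            count[num - minValue]++;

        int sum = 0;
        boolean carried = false;                // an unpaired element is waiting; it already counted as its pair's min
        for (int v = 0; v < K; v++) {           // O(K) pass over the value range
            int c = count[v];
            if (c == 0) continue;
            if (carried) {                      // this element completes the carried pair and is the larger one
                c--;
                carried = false;
            }
            sum += ((c + 1) / 2) * (v + minValue);   // ceil(c/2) elements of this value end up as pair minima
            carried = (c % 2 == 1);
        }
        return sum;                             // total work: O(n + K)
    }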
However, with asymptotic complexity (Big-O, Big-Theta, etc.), even with an arbitrary but finite K, you can still find constants k and N such that for all n > N, k*n > the number of operations needed in this algorithm, which is the Big-O definition. This is why you will see a lot of people say O(n).
Hope that helps.
Related
I have an array of numbers 3, 1, 3, 5 and a number k with the value 3. I want to find the pairs whose absolute difference is the smallest, and return the k smallest such differences.
All possible pairs are :
|3 - 1| = 2
|3 - 3| = 0
|3 - 5| = 2
|1 - 3| = 2
|1 - 5| = 4
|3 - 5| = 2
For k = 3, the smallest values [0, 2, 2] are the required answer.
My approach:
public static List<Integer> process(List<Integer> list, int k) {
    int n = list.size();
    List<Integer> all = new ArrayList<>();
    Collections.sort(list);
    // collect the differences of all pairs
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            all.add(list.get(j) - list.get(i));
        }
    }
    // sort the differences and keep the k smallest
    Collections.sort(all);
    List<Integer> answer = new ArrayList<>();
    for (int i = 0; i < k; i++) {
        answer.add(all.get(i));
    }
    return answer;
}
Here I am generating all possible pairs and then taking the smallest values, so the time complexity of this program is high. How can I reduce the time complexity for this task?
Here is a Python solution. It is somewhat similar to what was already suggested, except that putting values back into the heap allows non-adjacent differences to also be discovered.
Translation to Java left as an exercise for the reader.
import heapq

def closest_differences(elements, limit=None):
    e = sorted(elements)
    best = []
    for i in range(len(elements) - 1):
        best.append((e[i+1] - e[i], i, i+1))
    heapq.heapify(best)
    yielded = 0
    while 0 < len(best):
        value, i, j = heapq.heappop(best)
        yield value
        yielded += 1
        if limit is not None and limit <= yielded:
            break
        if j + 1 < len(e):
            heapq.heappush(best, (e[j+1] - e[i], i, j+1))

for d in closest_differences([3, 1, 3, 5], 3):
    print(d)
Create a max heap/priority queue of size k.
Use a nested loop to iterate over all possible differences.
For each difference, push it onto the heap if the heap holds fewer than k values or heap.max > difference (popping the current max first); a rough sketch follows after this list.
O(n^2) for the iteration, O(log k) per heap operation, so O(n^2 log k) for the heap work.
Total time complexity: O(n^2 log k)
Total space complexity: O(k) (for heap)
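A rough Java sketch of the heap-based approach above (kSmallestDifferences is an illustrative name; this is one possible reading of the steps, not a definitive implementation):

    // keep the k smallest pairwise differences seen so far in a max-heap of size k
    static List<Integer> kSmallestDifferences(int[] nums, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (int i = 0; i < nums.length; i++) {
            for (int j = i + 1; j < nums.length; j++) {
                int diff = Math.abs(nums[i] - nums[j]);
                if (heap.size() < k) {
                    heap.offer(diff);              // heap not full yet
                } else if (diff < heap.peek()) {
                    heap.poll();                   // drop the current maximum
                    heap.offer(diff);              // keep the smaller difference
                }
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        Collections.sort(result);                  // smallest first
        return result;
    }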
In your approach, you've sorted the list of all ~n^2 differences, so that has a complexity of O(n^2 log(n^2)) = O(n^2 log n).
Here's an outline of another approach, which avoids the use of a priority queue.
Reformulating the question, we'd like to find the smallest k differences in an array A, and write it to an output array R. Let count_pairs(arr, d) be a placeholder function that tells us how many pairs in an array arr have a difference less than/equal to d. Let N be the length of A.
Sort A. Let B be the sorted array.
Compute difference d > 0 where count_pairs(B, d) < k and count_pairs(B, d + 1) >= k. Note count_pairs(B, 0) >= k indicates you could just let R be an array of k zeroes. d can be found via binary search from 0 to the difference of the largest and smallest elements in A.
After finding d, pick out the pairs whose difference is less than/equal to d. Write their differences to R.
Fill up the remaining slots in R with d + 1; there should be exactly k - count_pairs(B, d) remaining slots.
Observations
count_pairs() can be implemented in O(N * log(N)) time via binary search if the input array is sorted. Don't think this can be improved by much.
If you implement 3. using binary search (similar to what's done for count_pairs()) the time complexity should be O(N * log(N) + count_pairs(B, d)).
If R needs to be in sorted order, you'll have to sort R before returning it, since 3. won't guarantee this - only that all values are smaller than/equal to d.
So, the overall time complexity of each step is as follows:
Sorting - O(N * log(N))
Searching for d - O(log(max(A) - min(A)) * N * log(N))
Fill in the smallest values - O(N * log(N) + count_pairs(B, d))
Fill in the remaining values of d + 1 - O(k - count_pairs(B, d))
I suspect that 2. will likely be the bottleneck and would be curious to see how well this approach compares to the priority queue one on actual data.
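For reference, here is one possible sketch of count_pairs on the sorted array B, using binary search as described in the observations above (countPairs and upperBound are illustrative names, not from the original outline):

    // counts pairs (i, j), i < j, with B[j] - B[i] <= d, in O(N log N)
    static long countPairs(int[] B, int d) {
        long count = 0;
        for (int i = 0; i < B.length; i++) {
            int hi = upperBound(B, B[i] + d);   // first index with B[hi] > B[i] + d
            count += hi - i - 1;                // partners for i are i+1 .. hi-1
        }
        return count;
    }

    // first index whose element is strictly greater than value
    static int upperBound(int[] B, int value) {
        int lo = 0, hi = B.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (B[mid] <= value) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }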
The description of the problem and its solution(s) can be found here:
https://www.geeksforgeeks.org/count-the-number-of-ways-to-divide-n-in-k-groups-incrementally/
Basically the problem is given N people, how many ways can you divide them into K groups, such that each group is greater than or equal in number of people to the one that came before it?
The solution is to recurse through every possibility, and its complexity can be cut down from O(N^K) to O(N^2 * K) through dynamic programming.
I understand the complexity of the old recursive solution, but have trouble understanding why the dynamic programming solution has O(N^2 * K) complexity. How does one come to this conclusion about the dynamic programming solution's time complexity? Any help would be appreciated!
First of all, big O notation gives us an idea of the relation between two functions, t(n)/i(n), as n -> infinity. More specifically, it is an upper bound for that relation, meaning f(n) >= t(n)/i(n). Here t(n) stands for the speed of growth of the time spent on execution, and i(n) describes the speed of growth of the input. In function space (where we work with functions rather than numbers, yet treat functions almost like numbers: we can divide or compare them, for example) the relation between two elements is also a function. Hence, t(n)/i(n) is a function.
Secondly, there are two methods of determining bounds for that relation.
The scientific, observational approach involves the following steps. Let's see how much time it takes to execute an algorithm with 10 pieces of input. Then let's increase the input to 100 pieces, then to 1000, and so on. The speed of growth of the input i(n) is exponential (10^1, 10^2, 10^3, ...). Suppose we get an exponential speed of growth of time as well (10^1 sec, 10^2 sec, 10^3 sec, ... respectively).
That means t(n)/i(n) = exp(n)/exp(n) = 1 as n -> infinity (for the sake of scientific purity, we can divide and compare functions only when n -> infinity; that doesn't affect the practicality of the method, though). We can say, at least (remember it's an upper bound), that the execution time of our algorithm doesn't grow faster than its input does. We might instead have got, say, a squared exponential speed of growth of time. In that case t(n)/i(n) = a^(2n)/a^n = a^n = exp(n), a > 1, n -> infinity, which means our time complexity is O(exp(n)); big O notation only reminds us that it's not a tight bound. It's also worth pointing out that it doesn't matter which speed of growth of input we choose. We might have wanted to increase our input linearly; then t(n)/i(n) = exp(n)*n/n = exp(n) would express the same thing as t(n)/i(n) = a^(2n)/a^n = exp(n), a > 1. What matters here is the quotient.
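As a small illustration of that observational approach (the workload and names below are mine, purely for demonstration), one can time a known quadratic routine at a few growing input sizes and look at how the measurements scale:

    public class GrowthProbe {
        // stand-in O(n^2) workload, not related to the question's code
        static long quadraticWork(int n) {
            long ops = 0;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    ops++;                       // one unit of work
            return ops;
        }

        public static void main(String[] args) {
            for (int n = 1_000; n <= 16_000; n *= 4) {
                long start = System.nanoTime();
                quadraticWork(n);
                long elapsed = System.nanoTime() - start;
                // quadrupling n should roughly multiply the time by 16 for an O(n^2) routine
                System.out.println("n = " + n + ", time = " + elapsed + " ns");
            }
        }
    }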
The second approach is theoretical and mostly used in the analysis of relatively obvious cases. Say, we have a piece of code from the example:
// DP table
static int[][][] dp = new int[500][500][500];

// Counts the ways to split `left` into the remaining (k - pos) groups,
// where each group must have at least `prev` elements
static int calculate(int pos, int prev, int left, int k)
{
    // Base case: all k groups have been formed
    if (pos == k)
    {
        if (left == 0)
            return 1;
        else
            return 0;
    }

    // N was divided completely into fewer than k groups
    if (left == 0)
        return 0;

    // If the subproblem has been solved, reuse the value
    if (dp[pos][prev][left] != -1)
        return dp[pos][prev][left];

    int answer = 0;

    // Try all possible group sizes greater than or equal to prev
    for (int i = prev; i <= left; i++)
    {
        answer += calculate(pos + 1, i, left - i, k);
    }

    return dp[pos][prev][left] = answer;
}

// Function to count the number of ways to divide the number N into k groups
static int countWaystoDivide(int n, int k)
{
    // Initialize the DP table with -1
    for (int i = 0; i < 500; i++)
    {
        for (int j = 0; j < 500; j++)
        {
            for (int l = 0; l < 500; l++)
                dp[i][j][l] = -1;
        }
    }

    return calculate(0, 1, n, k);
}
The first thing to notice here is the 3-d array dp. It gives us a hint of the time complexity of a DP algorithm because usually we traverse it once. Then we are concerned with the size of the array. It is initialized with size 500*500*500, which doesn't tell us much, because 500 is a number, not a function, and strictly speaking it doesn't depend on the input variables; it's done that way for the sake of simplicity. Effectively, dp has a size of k*n*n under the assumption that k <= 500 and n <= 500.
Let's prove it. The method static int calculate(int pos, int prev, int left, int k) has three variables that actually change, pos, prev and left, while k remains constant. The range of pos is 0 to k: it starts from 0 in return calculate(0, 1, n, k); and the base case is if (pos == k). The range of prev is 1 to left: it starts from 1 and is increased up to left in for (int i = prev; i <= left; i++). Finally, the range of left is n down to 0: it starts from n in return calculate(0, 1, n, k); and is reduced toward 0 through that same loop. To recap, the number of possible combinations of pos, prev and left is simply their product, k*n*n.
The second thing is to prove that each range of pos, prev and left is traversed only once. From the code, it can be determined by analysing this block:
for (int i = prev; i <= left; i++)
{
    answer += calculate(pos + 1, i, left - i, k);
}
All three variables get changed only here. pos grows from 0 by adding 1 on each step. For each particular value of pos, prev gets changed by adding 1, from prev up to left. For each particular combination of pos and prev, left gets changed by subtracting i, which ranges from prev to left.
The idea behind this approach is once we iterate over an input variable by some rule, we get corresponding time complexity. We could iterate over a variable stepping on elements by decreasing the range by twice on each step, for example. In that case, we would get logarithmical complexity. Or we could step on every element of the input, then we would get linear complexity.
In other words, common sense lets us assume a minimum time complexity of t(n)/i(n) = 1 for every algorithm, meaning that t(n) and i(n) grow equally fast; that corresponds to doing nothing with the input. Once we do something with the input, t(n) becomes f(n) times bigger than i(n), and by the logic shown in the previous lines, we need to estimate f(n).
I'm just going over some basic sorting algorithms. I implemented the below insertion sort.
public static int[] insertionSort(int[] arr) {
    int I = 0;                                  // counts inner-loop iterations
    for (int i = 0; i < arr.length; i++) {
        for (int j = 0; j < i; j++) {
            if (arr[i] < arr[j]) {
                int temp = arr[i];
                arr[i] = arr[j];
                arr[j] = temp;
            }
            I++;
        }
    }
    System.out.println(I);
    return arr;
}
I prints out 4950 for an array of size 100 filled with 100 randomly generated integers.
I know the algorithm is considered O(n^2), but what would be the more arithmetically correct runtime? If it were actually O(n^2), I'm assuming it would print out 10,000 and not 4950.
Big-Oh notation gives us how much work an algorithm must do as the input size grows bigger. A single input test doesn't give enough information to verify the theoretical Big-Oh. You should run the algorithm on arrays of different sizes from 100 to a million and graph the output with the size of the array as the x-variable and the number of steps that your code outputs as the y-variable. When you do this, you will see that the graph is a parabola.
You can use algebra to get a function in the form y = a*x^2 + b*x + c that fits as close as possible to this data. But with Big-Oh notation, we don't care about the smaller terms because they become insignificant compared to the x^2 part. For example, when x = 10^3, then x^2 = 10^6, which is much larger than b*x + c. If x = 10^6 then x^2 = 10^12, which again is so much larger than b*x + c that we can ignore these smaller terms.
You can make the following observations: On the ith iteration of the outer loop, the inner loop runs i times, for i from 0 to n-1 where n is the length of the array.
In total over the entire algorithm the inner loop runs T(n) times where
T(n) = 0 + 1 + 2 + ... + (n-1)
This is an arithmetic series and it's easy to prove the sum is equal to a second degree polynomial in n:
T(n) = n*(n-1)/2 = .5*n^2 - .5*n
For n = 100, the formula predicts the inner loop will run T(100) = 100*99/2 = 4950 times which matches what you calculated.
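If you want to check this empirically, a minimal sketch along the lines suggested in the first answer (class and variable names are mine) could count the inner-loop steps for a few growing sizes and compare them against n*(n-1)/2:

    public class StepCount {
        public static void main(String[] args) {
            java.util.Random rnd = new java.util.Random();
            for (int n = 100; n <= 3_200; n *= 2) {
                int[] arr = rnd.ints(n).toArray();
                long steps = 0;
                for (int i = 0; i < arr.length; i++) {
                    for (int j = 0; j < i; j++) {
                        if (arr[i] < arr[j]) {        // same comparison/swap as the code above
                            int temp = arr[i]; arr[i] = arr[j]; arr[j] = temp;
                        }
                        steps++;
                    }
                }
                System.out.println("n = " + n + ": " + steps
                        + " steps, n*(n-1)/2 = " + (long) n * (n - 1) / 2);
            }
        }
    }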
I have the below code to return the index of 2 numbers that add up to the given target. What is the time complexity of the code? Explanations are appreciated. Thanks.
int[] result = new int[2];
int i = 0, k = i + 1;
while (i < nums.length)
{
    if (nums[i] + nums[k] == target && i != k)
    {
        result[0] = i;
        result[1] = k;
        break;
    }
    else if (k < nums.length - 1)
        k++;
    else
    {
        i++;
        k = i;
    }
}
return result;
Premise
It is hard to analyze this without any additional information on how nums and target correspond to each other.
Since you do not provide any additional information here, I have to assume that all inputs are possible, in which case the worst case is that no pair built from nums sums to target at all.
A simple example explaining what I am referring to would be target = 2 with nums = [4, 5, 10, 3, 9]. You can not build target by adding up pairs of nums.
Iterations
So you would end up never hitting your break statement, going through the full execution of the algorithm.
That is, k runs over its full range up to nums.length - 1, then i is incremented once and k again runs over its full range, starting from i. And so on until i reaches the end as well.
In total, you will thus have the following amount of iterations (where n denotes nums.length):
n, n - 1, n - 2, n - 3, ..., 2, 1
Summed up, those are exactly
(n^2 + n) / 2
iterations.
Complexity
Since all you do inside the iterations is in constant time O(1), the Big-O complexity of your algorithm is given by
(n^2 + n) / 2 <= n^2 + n <= n^2 + n^2 <= 2n^2
Which by definition is in O(n^2).
Alternative code
Your code is very hard to read and a rather unusual way to express what you are doing here (namely forming all pairs, leaving out duplicates). I would suggest rewriting your code like that:
for (int i = 0; i < nums.length; i++) {
    for (int j = i; j < nums.length; j++) {
        int first = nums[i];
        int second = nums[j];
        if (first + second == target) {
            return new int[] { i, j };
        }
    }
}
return null;
Also, do yourself a favor and do not return result filled with 0 in case you did not find any hit. Either return null as shown or use Optional, for example:
return Optional.of(...);
...
return Optional.empty();
Time Complexity
The worst-case time complexity of the given code would be O(N^2) , where N is nums.length.
This is because you are checking each distinct pair in the array until you find two numbers that add up to the target. In the worst case, you will end up checking all the pairs. The number of pairs in an array of length N is on the order of N^2; your algorithm checks roughly N*(N+1)/2 pairs, which comes out to (N^2 + N)/2. The upper bound for this is O(N^2), since constant factors and lower-order terms are neglected.
Flow of Code
In the code sample, here's how the flow occurs -
i will start from 0 and k will be i+1, which is 1. So suppose that you won't find any pair which adds up to the target.
In that case, each time (from i = 0 to i = nums.length-1), only the else if (k < nums.length-1) branch will run.
Once k reaches nums.length - 1, i will be incremented and k will restart from i.
This will continue till i becomes nums.length - 1. In this iteration the last pair will be checked and then only the loop will end. So worst-case time complexity will come out to be O(N^2) .
Time Complexity Analysis -
So you are checking N pairs in the first loop, N-1 pairs in the next one, N-2 in next and so on... So, total number of checked pairs will be -
N + ( N-1 ) + ( N-2 ) + ( N-3 ) + ... + 2 + 1
= N * ( N + 1 ) / 2
= ( N^2 + N ) / 2
And the above would be considered to have an upper bound of O(N^2) which is your Big-O Worst-Case time complexity.
The Average Case Time Complexity would also be considered as O(N^2).
The Best Case Time Complexity would come out to be O(1) , where only the first pair would be needed to be checked.
Hope this helps!
Given an unsorted set of n integers, return all subsets of size k (i.e. each set has k unique elements) that sum to 0.
So I gave the interviewer the following solution (which I studied on GeekViewpoint): no extra space used, everything is done in place, etc. But of course the cost is a high time complexity of O(n^k), where k = tuple in the solution.
public void zeroSumTripplets(int[] A, int tuple, int sum) {
    int[] index = new int[tuple];
    for (int i = 0; i < tuple; i++)
        index[i] = i;

    int total = combinationSize(A.length, tuple);
    for (int i = 0; i < total; i++) {
        if (0 != i)
            nextCombination(index, A.length, tuple);
        printMatch(A, Arrays.copyOf(index, tuple), sum);
    } // for
} // zeroSumTripplets(int[], int, int)

private void printMatch(int[] A, int[] ndx, int sum) {
    int calc = 0;
    for (int i = 0; i < ndx.length; i++)
        calc += A[ndx[i]];
    if (calc == sum) {
        Integer[] t = new Integer[ndx.length];
        for (int i = 0; i < ndx.length; i++)
            t[i] = A[ndx[i]];
        System.out.println(Arrays.toString(t));
    } // if
} // printMatch(int[], int[], int)
But then she imposed the following requirements:
must use a hashmap in the answer so as to reduce time complexity
must absolutely -- ABSOLUTELY -- provide the time complexity for the general case
hint: when k = 6, it is O(n^3)
She was more interested in time-complexity more than anything else.
Does anyone know a solution that would satisfy the new constraints?
EDIT:
Supposedly, in the correct solution, the map is to store the elements of the input and the map is then to be used as a look up table just as in the case for k=2.
When the size of the subset is 2 (i.e. k=2), the answer is trivial: loop through and load all the elements into a map. Then loop through the inputs again this time searching the map for sum - input[i] where i is the index from 0 to n-1, which would then be the answers. Supposedly this trivial case can be extended to where k is anything.
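For reference, a minimal sketch of that k = 2 lookup described above (twoSumIndices and the variable names are illustrative, not from the interview):

    // one pass loads values into a map, a second pass looks up the complement
    static int[] twoSumIndices(int[] input, int sum) {
        Map<Integer, Integer> seen = new HashMap<>();   // value -> index
        for (int i = 0; i < input.length; i++)
            seen.put(input[i], i);
        for (int i = 0; i < input.length; i++) {
            Integer j = seen.get(sum - input[i]);
            if (j != null && j != i)                    // avoid pairing an element with itself
                return new int[] { i, j };
        }
        return null;                                    // no pair found
    }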
Since no-one else has made an attempt, I might as well throw in at least a partial solution. As I pointed out in an earlier comment, this problem is a variant of the subset sum problem and I have relied heavily on documented approaches to that problem in developing this solution.
We're trying to write a function subsetsWithSum(A, k, s) that computes all the k-length subsets of A that sum to s. This problem lends itself to a recursive solution in two ways:
The solution of subsetsWithSum(x1 ... xn, k, s) can be found by computing subsetsWithSum(x2 ... xn, k, s) and adding all the valid subsets (if any) that include x1; and
All the valid subsets that include element xi can be found by computing subsetsWithSum(A - xi, k-1, s-xi) and adding xi to each subset (if any) that results.
The base case for the recursion occurs when k is 1, in which case the solution to subsetsWithSum(A, 1, s) is the set of all single element subsets where that element is equal to s.
So a first stab at a solution would be
/**
 * Return all k-length subsets of A starting at offset o that sum to s.
 * @param A - an unordered list of integers.
 * @param k - the length of the subsets to find.
 * @param s - the sum of the subsets to find.
 * @param o - the offset in A at which to search.
 * @return A list of k-length subsets of A that sum to s.
 */
public static List<List<Integer>> subsetsWithSum(
    List<Integer> A,
    int k,
    int s,
    int o)
{
    List<List<Integer>> results = new LinkedList<List<Integer>>();

    if (k == 1)
    {
        if (A.get(o) == s)
            results.add(Arrays.asList(o));
    }
    else
    {
        for (List<Integer> sub : subsetsWithSum(A, k-1, s-A.get(o), o+1))
        {
            List<Integer> newSub = new LinkedList<Integer>(sub);
            newSub.add(0, o);
            results.add(0, newSub);
        }
    }

    if (o < A.size() - k)
        results.addAll(subsetsWithSum(A, k, s, o+1));

    return results;
}
Now, notice that this solution will often call subsetsWithSum(...) with the same set of arguments that it has been called with before. Hence, subsetsWithSum is just begging to be memoized.
To memoize the function, I've put the arguments k, s and o into a three element list which will be the key to a map from these arguments to a result computed earlier (if there is one):
public static List<List<Integer>> subsetsWithSum(
    List<Integer> A,
    List<Integer> args,
    Map<List<Integer>, List<List<Integer>>> cache)
{
    if (cache.containsKey(args))
        return cache.get(args);

    int k = args.get(0), s = args.get(1), o = args.get(2);
    List<List<Integer>> results = new LinkedList<List<Integer>>();

    if (k == 1)
    {
        if (A.get(o) == s)
            results.add(Arrays.asList(o));
    }
    else
    {
        List<Integer> newArgs = Arrays.asList(k-1, s-A.get(o), o+1);
        for (List<Integer> sub : subsetsWithSum(A, newArgs, cache))
        {
            List<Integer> newSub = new LinkedList<Integer>(sub);
            newSub.add(0, o);
            results.add(0, newSub);
        }
    }

    if (o < A.size() - k)
        results.addAll(subsetsWithSum(A, Arrays.asList(k, s, o+1), cache));

    cache.put(args, results);
    return results;
}
To use the subsetsWithSum function to compute all the k-length subsets that sum to zero, one can use the following function:
public static List<List<Integer>> subsetsWithZeroSum(List<Integer> A, int k)
{
    Map<List<Integer>, List<List<Integer>>> cache =
        new HashMap<List<Integer>, List<List<Integer>>>();
    return subsetsWithSum(A, Arrays.asList(k, 0, 0), cache);
}
Regrettably my complexity calculating skills are a bit (read: very) rusty, so hopefully someone else can help us compute the time complexity of this solution, but it should be an improvement on the brute-force approach.
Edit: Just for clarity, note that the first solution above should be equivalent in time complexity to a brute-force approach. Memoizing the function should help in many cases, but in the worst case the cache will never contain a useful result and the time complexity will then be the same as the first solution. Note also that the subset-sum problem is NP-complete meaning that any solution has an exponential time complexity. End Edit.
Just for completeness, I tested this with:
public static void main(String[] args) {
    List<Integer> data = Arrays.asList(9, 1, -3, -7, 5, -11);
    for (List<Integer> sub : subsetsWithZeroSum(data, 4))
    {
        for (int i : sub)
        {
            System.out.print(data.get(i));
            System.out.print(" ");
        }
        System.out.println();
    }
}
and it printed:
9 -3 5 -11
9 1 -3 -7
I think your answer was very close to what they were looking for, but you can improve the complexity by noticing that any subset of size k can be thought of as two subsets of size k/2. So instead of finding all subsets of size k (which takes O(n^k) assuming k is small), use your code to find all subsets of size k/2, and put each subset in a hashtable, with its sum as the key.
Then iterate through each subset of size k/2 with a positive sum (call the sum S) and check the hashtable for a subset whose sum is -S. If there is one then the combination of the two subsets of size k/2 is a subset of size k whose sum is zero.
So in the case of k=6 that they gave, you would find all subsets of size 3 and compute their sums (this will take O(n^3) time). Then checking the hashtable will take O(1) time for each subset, so the total time is O(n^3). In general this approach will take O(n^(k/2)) assuming k is small, and you can generalize it for odd values of k by taking subsets of size floor(k/2) and floor(k/2)+1.
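Here is a rough sketch of that idea for the k = 4 case (class and method names are mine, not from the interview; it assumes subsets are identified by distinct indices): pair sums go into a hash map keyed by sum, and each pair is matched with a complementary pair that lies entirely after it, so every zero-sum quadruple of indices is reported exactly once.

    import java.util.*;

    public class ZeroSumQuadruples {
        public static List<int[]> zeroSumQuadruples(int[] a) {
            // sum of a pair -> all index pairs (i < j) with that sum
            Map<Integer, List<int[]>> pairsBySum = new HashMap<>();
            for (int i = 0; i < a.length; i++)
                for (int j = i + 1; j < a.length; j++)
                    pairsBySum.computeIfAbsent(a[i] + a[j], s -> new ArrayList<>())
                              .add(new int[] { i, j });

            List<int[]> results = new ArrayList<>();
            for (Map.Entry<Integer, List<int[]>> entry : pairsBySum.entrySet()) {
                List<int[]> complements = pairsBySum.get(-entry.getKey());
                if (complements == null) continue;
                for (int[] p : entry.getValue())
                    for (int[] q : complements)
                        // require p to lie entirely before q so the four indices are
                        // disjoint and each quadruple is reported exactly once
                        if (p[1] < q[0])
                            results.add(new int[] { p[0], p[1], q[0], q[1] });
            }
            return results;
        }

        public static void main(String[] args) {
            int[] data = { 9, 1, -3, -7, 5, -11 };
            for (int[] quad : zeroSumQuadruples(data))
                System.out.println(Arrays.toString(quad));   // prints index quadruples
        }
    }

Building the map takes about O(n^2) time and space; the matching step is output-sensitive on top of that, which lines up with the O(n^(k/2)) figure above for k = 4.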
@kasavbere -
Recently a friend had one of those harrowing all-day interviews for a C++ programming job with Google. His experience was similar to yours.
It inspired him to write this article - I think you might enjoy it:
The Pragmatic Defense