Find pairs with least difference

Find pairs with least difference - java

I have an array of numbers 3, 1, 3, 5 and a number k with the value 3. I want to find the k pairs whose absolute difference is the smallest.
All possible pairs are :
|3 - 1| = 2
|3 - 3| = 0
|3 - 5| = 2
|1 - 3| = 2
|1 - 5| = 4
|3 - 5| = 2
The smallest k = 3 values are [0, 2, 2], which is the required answer.
My approach:
public static List<Integer> process(List<Integer> list, int k) {
    int n = list.size();
    List<Integer> all = new ArrayList<>();
    Collections.sort(list);
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            all.add(list.get(j) - list.get(i));
        }
    }
    Collections.sort(all);
    List<Integer> answer = new ArrayList<>();
    for (int i = 0; i < k; i++) {
        answer.add(all.get(i));
    }
    return answer;
}
Here I generate all possible pairs and then take the smallest values, so the time complexity of this program is high. How can I reduce the time complexity for this task?

Here is a Python solution. It is somewhat similar to what was already suggested, except that putting values back into the heap allows non-adjacent differences to also be discovered.
Translation to Java left as an exercise for the reader.
import heapq
def closest_differences(elements, limit=None):
    e = sorted(elements)
    best = []
    for i in range(len(elements)-1):
        best.append((e[i+1] - e[i], i, i+1))
    heapq.heapify(best)
    yielded = 0
    while 0 < len(best):
        value, i, j = heapq.heappop(best)
        yield value
        yielded += 1
        if limit is not None and limit <= yielded:
            break
        if j+1 < len(e):
            heapq.heappush(best, (e[j+1] - e[i], i, j+1))
for d in closest_differences([3, 1, 3, 5], 3):
    print(d)
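For readers who want the Java version, here is one possible translation of the Python above. It is only a sketch: the class and method names (ClosestDifferences, smallest) are made up for illustration, and it assumes the differences fit in an int.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class ClosestDifferences {
    // Returns up to k smallest pairwise differences in ascending order.
    static List<Integer> smallest(int[] elements, int k) {
        int[] e = elements.clone();
        Arrays.sort(e);
        // Each heap entry is {difference, i, j} for the pair (e[i], e[j]) with i < j.
        PriorityQueue<int[]> best =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i + 1 < e.length; i++) {
            best.add(new int[] {e[i + 1] - e[i], i, i + 1});
        }
        List<Integer> result = new ArrayList<>();
        while (!best.isEmpty() && result.size() < k) {
            int[] top = best.poll();
            result.add(top[0]);
            int i = top[1], j = top[2];
            // Put the next larger difference for the same i back into the heap,
            // which is how non-adjacent pairs get discovered.
            if (j + 1 < e.length) {
                best.add(new int[] {e[j + 1] - e[i], i, j + 1});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(smallest(new int[] {3, 1, 3, 5}, 3)); // prints [0, 2, 2]
    }
}
```

The heap never holds more than n - 1 entries, so producing k differences costs roughly O(n log n + k log n) under these assumptions.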

Create a max heap/priority queue of size k.
Use a nested loop to iterate over all possible differences.
For each difference, push it onto the heap while the heap holds fewer than k values; after that, replace the maximum with the difference whenever heap.max > difference.
O(n^2) for iteration, O(n^2 logk) for heap.
Total time complexity: O(n^2 logk)
Total space complexity: O(k) (for heap)
In your approach, you've sorted the list of all n^2 differences, so that has a complexity of O(n^2 logn).
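A rough Java sketch of the bounded max-heap idea described above. The names BoundedHeapDifferences and smallest are hypothetical, and the final sort is only there so the result comes back in ascending order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class BoundedHeapDifferences {
    static List<Integer> smallest(int[] nums, int k) {
        List<Integer> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Max heap: the largest of the k best differences seen so far sits on top.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (int i = 0; i < nums.length; i++) {
            for (int j = i + 1; j < nums.length; j++) {
                int diff = Math.abs(nums[i] - nums[j]);
                if (heap.size() < k) {
                    heap.add(diff);
                } else if (heap.peek() > diff) {
                    heap.poll();      // drop the current maximum
                    heap.add(diff);   // keep the smaller difference instead
                }
            }
        }
        result.addAll(heap);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(smallest(new int[] {3, 1, 3, 5}, 3)); // prints [0, 2, 2]
    }
}
```

Each of the O(n^2) differences costs at most O(log k) heap work, and the heap itself never grows beyond k entries.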

Here's an outline of another approach, which avoids the use of a priority queue.
Reformulating the question, we'd like to find the smallest k differences in an array A, and write it to an output array R. Let count_pairs(arr, d) be a placeholder function that tells us how many pairs in an array arr have a difference less than/equal to d. Let N be the length of A.
1. Sort A. Let B be the sorted array.
2. Compute the difference d >= 0 where count_pairs(B, d) < k and count_pairs(B, d + 1) >= k. Note that count_pairs(B, 0) >= k indicates you could just let R be an array of k zeroes. d can be found via binary search from 0 to the difference of the largest and smallest elements in A.
3. After finding d, pick out the pairs whose difference is less than/equal to d. Write their differences to R.
4. Fill up the remaining slots in R with d + 1; there should be exactly k - count_pairs(B, d) remaining slots.
Observations
count_pairs() can be implemented in O(N * log(N)) time via binary search if the input array is sorted. Don't think this can be improved by much.
If you implement 3. using binary search (similar to what's done for count_pairs()) the time complexity should be O(N * log(N) + count_pairs(B, d)).
If R needs to be in sorted order, you'll have to sort R before returning it, since 3. won't guarantee this - only that all values are smaller than/equal to d.
So, the overall time complexity of each step is as follows:
1. Sorting - O(N * log(N))
2. Searching for d - O(log(max(A) - min(A)) * N * log(N))
3. Fill in the smallest values - O(N * log(N) + count_pairs(B, d))
4. Fill in the remaining values of d + 1 - O(k - count_pairs(B, d))
I suspect that 2. will likely be the bottleneck and would be curious to see how well this approach compares to the priority queue one on actual data.
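Here is a sketch of the outline in Java, in case it helps with such a comparison. The names ThresholdDifferences, countPairs and smallest are invented for this example. It searches directly for the threshold t = d + 1 (or t = 0 when zeroes already suffice), and it assumes the array has at least two elements, k does not exceed the number of pairs, and the differences fit in an int.

```java
import java.util.Arrays;

class ThresholdDifferences {
    // Number of pairs (i < j) in the sorted array b with b[j] - b[i] <= d.
    static long countPairs(int[] b, int d) {
        long count = 0;
        for (int i = 0; i < b.length; i++) {
            // Binary search for the first index after i whose difference exceeds d.
            int lo = i + 1, hi = b.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (b[mid] - b[i] <= d) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            count += lo - (i + 1);
        }
        return count;
    }

    static int[] smallest(int[] a, int k) {
        int[] b = a.clone();
        Arrays.sort(b);
        // Smallest threshold t with countPairs(b, t) >= k.
        int lo = 0, hi = b[b.length - 1] - b[0];
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (countPairs(b, mid) >= k) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        int t = lo;
        int[] r = new int[k];
        int filled = 0;
        // Collect every difference strictly below t; there are fewer than k of them.
        for (int i = 0; i < b.length; i++) {
            for (int j = i + 1; j < b.length && b[j] - b[i] < t; j++) {
                r[filled++] = b[j] - b[i];
            }
        }
        // Pad the remaining slots with t, then sort for a tidy result.
        Arrays.fill(r, filled, k, t);
        Arrays.sort(r);
        return r;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(smallest(new int[] {3, 1, 3, 5}, 3))); // prints [0, 2, 2]
    }
}
```

Because b is sorted, the collection loop can stop at the first j whose difference reaches t, so that step takes time proportional to N plus the number of pairs collected.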

Related

What is the specific runtime complexity of insertion sort?

I'm just going over some basic sorting algorithms. I implemented the below insertion sort.
public static int[] insertionSort(int[] arr){
    int I = 0;
    for(int i = 0; i < arr.length; i++){
        for(int j = 0; j < i; j++){
            if(arr[i] < arr[j]){
                int temp = arr[i];
                arr[i] = arr[j];
                arr[j] = temp;
            }
            I++;
        }
    }
    System.out.println(I);
    return arr;
}
I prints out 4950 for an array of 100 randomly generated integers.
I know the algorithm is considered O(n^2), but what would be the more arithmetically correct runtime? If it were actually O(N^2), I'm assuming it would print out 10,000 and not 4950.

Big-Oh notation gives us how much work an algorithm must do as the input size grows bigger. A single input test doesn't give enough information to verify the theoretical Big-Oh. You should run the algorithm on arrays of different sizes from 100 to a million and graph the output with the size of the array as the x-variable and the number of steps that your code outputs as the y-variable. When you do this, you will see that the graph is a parabola.
You can use algebra to get a function in the form y = a*x^2 + b*x + c that fits as close as possible to this data. But with Big-Oh notation, we don't care about the smaller terms because they grow insignificant compared to the x^2 part. For example, when x = 10^3, then x^2 = 10^6 which is much larger than b*x + c. If x = 10^6 then x^2 = 10^12 which again is so much larger than b*x + c that we can ignore these smaller terms.

You can make the following observations: On the ith iteration of the outer loop, the inner loop runs i times, for i from 0 to n-1 where n is the length of the array.
In total over the entire algorithm the inner loop runs T(n) times where
T(n) = 0 + 1 + 2 + ... + (n-1)
This is an arithmetic series and it's easy to prove the sum is equal to a second degree polynomial on n:
T(n) = n*(n-1)/2 = .5*n^2 - .5*n
For n = 100, the formula predicts the inner loop will run T(100) = 100*99/2 = 4950 times which matches what you calculated.
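If you want to check the formula yourself, a small sketch like this counts the inner-loop runs for a few sizes and prints them next to n*(n-1)/2. The class and method names are made up; only the loop bounds are taken from the question.

```java
class InsertionSortSteps {
    // Counts how many times the inner loop body runs for an array of length n.
    static long innerLoopRuns(int n) {
        long steps = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < i; j++) {
                steps++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            long predicted = (long) n * (n - 1) / 2;
            System.out.println(n + ": counted " + innerLoopRuns(n) + ", predicted " + predicted);
        }
    }
}
```

The count does not depend on the array contents, because the question's loops always run to completion.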

What will be the time complexity of the below code? [duplicate]

I have the below code to return the index of 2 numbers that add up to the given target. What is the time complexity of the code? Explanations are appreciated. Thanks.
int[] result = new int[2];
int i=0, k=i+1;
while (i < nums.length)
{
    if(nums[i]+nums[k]==target && i!=k)
    {
        result[0]=i;
        result[1]=k;
        break;
    }
    else if (k < nums.length-1)
        k++;
    else
    {
        i++;
        k=i;
    }
}
return result;

Premise
It is hard to analyze this without any additional input how nums and target correspond to each other.
Since you do not provide any additional information here, I have to assume that all inputs are possible. In which case the worst case is that none of the pairs buildable by nums can form target at all.
A simple example explaining what I am referring to would be target = 2 with nums = [4, 5, 10, 3, 9]. You can not build target by adding up pairs of nums.
Iterations
So you would end up never hitting your break statement, going through the full execution of the algorithm.
That is, k runs over its full range up to nums.length - 1, then i is incremented once and k runs over its range again, starting from i. And so on until i reaches the end as well.
In total, you will thus have roughly the following number of iterations (where n denotes nums.length):
n, n - 1, n - 2, n - 3, ..., 2, 1
Summed up, those are exactly
(n^2 + n) / 2
iterations.
Complexity
Since all you do inside the iterations is in constant time O(1), the Big-O complexity of your algorithm is given by
(n^2 + n) / 2 <= n^2 + n <= n^2 + n^2 <= 2n^2
Which by definition is in O(n^2).
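To check the count empirically, here is a minimal sketch that wraps the loop from the question in a hypothetical helper, countIterations, and counts how often the loop body runs. It assumes nums has at least two elements, since the original loop reads nums[1] on its first pass.

```java
class TwoSumIterations {
    // Runs the loop from the question and returns how many times its body executes.
    static long countIterations(int[] nums, int target) {
        long iterations = 0;
        int i = 0, k = i + 1;
        while (i < nums.length) {
            iterations++;
            if (nums[i] + nums[k] == target && i != k) {
                break;
            } else if (k < nums.length - 1) {
                k++;
            } else {
                i++;
                k = i;
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        int[] nums = {4, 5, 10, 3, 9}; // no pair sums to 2, so this is a worst case
        long n = nums.length;
        System.out.println("iterations: " + countIterations(nums, 2));
        System.out.println("(n^2 + n) / 2: " + (n * n + n) / 2);
    }
}
```

For inputs without a matching pair, the counted value should stay within the (n^2 + n) / 2 bound and grow quadratically as n increases.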
Alternative code
Your code is very hard to read and a rather unusual way to express what you are doing here (namely forming all pairs, leaving out duplicates). I would suggest rewriting your code like this:
for (int i = 0; i < nums.length; i++) {
    // Start at i + 1 so that an element is never paired with itself
    for (int j = i + 1; j < nums.length; j++) {
        int first = nums[i];
        int second = nums[j];
        if (first + second == target) {
            return new int[] {i, j};
        }
    }
}
return null;
Also, do yourself a favor and do not return result filled with 0 in case you did not find any hit. Either return null as shown or use Optional, for example:
return Optional.of(...);
...
return Optional.empty();

Time Complexity
The worst-case time complexity of the given code would be O(N^2) , where N is nums.length.
This is because you are checking each distinct pair in the array until you find two numbers that add up to the target. In the worst case, you will end up checking all the pairs. The number of ordered pairs in an array of length N is N^2; your algorithm checks on the order of N^2 / 2 of them, as counted below. The upper bound for this is still O(N^2), since constant factors and lower-order terms are neglected.
Flow of Code
In the code sample, here's how the flow occurs -
i will start from 0 and k will be i+1 which is 1. So, suppose that you won't find the pair which add up to the target.
In that case, each time ( from i = 0 to i = nums.length-1), only the else if (k < nums.length-1) statement will run.
Once k reaches nums.length - 1, i will be incremented and k will start again from i.
This will continue till i becomes nums.length - 1. In this iteration the last pair will be checked and then only the loop will end. So worst-case time complexity will come out to be O(N^2) .
Time Complexity Analysis -
So you are checking N pairs in the first loop, N-1 pairs in the next one, N-2 in next and so on... So, total number of checked pairs will be -
N + ( N-1 ) + ( N-2 ) + ( N-3 ) + ... + 2 + 1
= N * ( N + 1 ) / 2
= ( N^2 + N ) / 2
And the above would be considered to have an upper bound of O(N^2) which is your Big-O Worst-Case time complexity.
The Average Case Time Complexity would also be considered as O(N^2).
The Best Case Time Complexity would come out to be O(1) , where only the first pair would be needed to be checked.
Hope this helps !

Time complexity of leetcode 561

The question is:
Given an array of 2n integers, your task is to group these integers into n pairs of integer, say (a1, b1), (a2, b2), ..., (an, bn) which makes sum of min(ai, bi) for all i from 1 to n as large as possible.
The solution provided is:
public class Solution {
    public int arrayPairSum(int[] nums) {
        int[] arr = new int[20001];
        int lim = 10000;
        for (int num: nums)
            arr[num + lim]++;
        int d = 0, sum = 0;
        for (int i = -10000; i <= 10000; i++) {
            sum += (arr[i + lim] + 1 - d) / 2 * i;
            d = (2 + arr[i + lim] - d) % 2;
        }
        return sum;
    }
}
I think it is unfair to say that the time complexity is O(n). It is really O(n + K), where K = 20001 is a constant that seems like it could be omitted, but n is also less than K. If so, why can't I say the time complexity is O(1)?

The asymptotic complexity is measured as a function of n, for ALL n. We are concerned with what happens when n gets large. Really, really large.
Maybe in practice n will always be tiny. Fine.
But when you give a complexity measure for an algorithm, you are by definition saying what happens as n grows. And grows and grows. And when it does, it will dwarf K.
So O(n) it is.
Clarification:
It is true that the problem specification says:
n is a positive integer, which is in the range of [1, 10000].
All the integers in the array will be in the range of [-10000, 10000].
But remember, that is just for this problem! The solution given hard codes the value of K. The algorithm used here should indeed be written as O(n + K), as you noticed. This K is not a constant factor and probably should not be dropped.
However with asymptotic complexity (Big-O, Big-Theta, etc.) even with an arbitrary but finite K, you can still find constants k and N such that for all n>N, kn > the number of operations needed in this algorithm, which is the Big-O definition. This is why you will see a lot of people say O(n).
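To make the dependence on K visible, here is a sketch of the same counting solution with the value bound passed in as a parameter. Treating lim as a parameter is an assumption made for illustration (the posted solution hard-codes it), and the class name is made up.

```java
class ArrayPairSumCounting {
    // Same counting idea as the posted solution; values are assumed to lie in [-lim, lim].
    static int arrayPairSum(int[] nums, int lim) {
        int[] count = new int[2 * lim + 1];   // K = 2 * lim + 1 slots
        for (int num : nums) {                // O(n)
            count[num + lim]++;
        }
        int d = 0, sum = 0;
        for (int i = -lim; i <= lim; i++) {   // O(K)
            sum += (count[i + lim] + 1 - d) / 2 * i;
            d = (2 + count[i + lim] - d) % 2;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(arrayPairSum(new int[] {1, 4, 3, 2}, 10000)); // prints 4
    }
}
```

Written this way, the first loop scales with n and the second with K, which is where the O(n + K) comes from.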
Hope that helps.

Probability of generating a set of M elements from an array of size N [duplicate]

I'm trying to understand solution for the following task:
Randomly generate a set of M elements from an array of size N. Each element must have equal probability of being chosen.
I found the following solution (I've already read this question, and this one, but I still have questions that are too long for comments):
int rand(int min, int max) {
    return min + (int)(Math.random() * (max - min + 1));
}
int[] generateSet(int[] arr, int m, int n) {
    if (n + 1 == m) { //base case
        int[] set = new int[m];
        for (int k = 0; k < m; k++) {
            set[k] = arr[k];
        }
        return set;
    }
    int[] set = generateSet(arr, m, n - 1);
    int r = rand(0, n);
    if (r < m) set[r] = arr[n];
    return set;
}
// rand() function returns inclusive value
// i.e. rand(0, 5) will return from 0 to 5
This code was found in book "Cracking the coding interview" (Section Hard, Task 3).
Author explains it as follows:
Suppose we have an algorithm that can pull a random set of m elements from an array of size n - 1. How can we use this algorithm to pull a random set of m elements from an array of size n? We can first pull a random set of size m from the first n - 1 elements. Then, we just need to decide if array[n] should be inserted into our subset (which would require pulling out a random element from it). An easy way to do this is to pick a random number k from 0 through n. If k < m, then insert array[n] into subset[k]. This will both "fairly" (i.e., with proportional probability) insert array[n] into the subset and "fairly" remove a random element from the subset.
This is even cleaner to write iteratively. In this approach, we initialize an array subset to be the first m elements in original. Then, we iterate through the array, starting at element m, inserting array[i] into the subset at (random) position k whenever k < m.
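The iterative version described in that last paragraph might look roughly like the sketch below. This is not the book's code; the names RandomSubset and pickSubset are invented, and it assumes m is at most original.length.

```java
import java.util.Arrays;
import java.util.Random;

class RandomSubset {
    static int[] pickSubset(int[] original, int m, Random random) {
        // Start with the first m elements as the subset.
        int[] subset = new int[m];
        System.arraycopy(original, 0, subset, 0, m);
        // Walk through the rest of the array, starting at element m.
        for (int i = m; i < original.length; i++) {
            int k = random.nextInt(i + 1); // uniform in 0..i, inclusive
            if (k < m) {
                subset[k] = original[i];   // replaces the element at position k
            }
        }
        return subset;
    }

    public static void main(String[] args) {
        int[] picked = pickSubset(new int[] {10, 20, 30, 40, 50}, 3, new Random());
        System.out.println(Arrays.toString(picked));
    }
}
```

Here random.nextInt(i + 1) plays the role of rand(0, i) from the recursive code, with i as the zero-based index of the element being considered.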
I fully understand the base case. It says that if we have an array of size N and M == N, we should return the first M elements from the array, because each of them will be selected with equal probability.
Then comes the hard part (the recursive case), which I do not understand at all.
The code generates a set of size M from an array of size N - 1.
Now the code should decide whether or not to add the "new" element arr[N] to the set.
With probability M / N the code decides to add the "new" element. The random step works as follows:
It generates a random number r between 0 and N.
If (r < m), it means that r was generated with probability M / N.
Also, if (r < m), it means that with probability 1 / M we will change one of the M elements in the set.
Update:
I don't understand the following:
Imagine that we have a box with N - 1 elements. We take M elements from it. Therefore, the probability of getting any particular set of elements will be:
Pa(get any set with M elements using N-1 elements) = 1 / ((N-1)! / (M!(N-1-M)!)) = M!(N-1-M)! / (N-1)!
It is clear that if we put all elements back into the box, and then take M elements again, we will always create a set with equal probability.
Okay, let's imagine that we take M elements. Therefore, the box now contains N-1-M elements.
So this is where the recursive case starts:
Now we take one new element from, let's say, our pocket. Now we should decide whether to modify the set or not.
Starting from this point I completely do not understand what to do next. My guess:
When we had N-1 elements, the probability of generating any set with M elements was:
Pa(get any set with M elements using N-1 elements) = M!(N-1-M)! / (N-1)!
But we add one more new element. And now we should generate any set of M elements with probability that must be equal to Pa.
But now the new probability will be:
Pb = 1 / (N! / (M!(N-M)!)) = M!(N-M)! / N!
So we need to find a way to convert somehow Pb to Pa i.e.
M!(N-M)! / N! to M!(N-1-M)! / (N-1)!
And with some magic (I still do not understand how it works) the recursive case does that:
Call R = rand(0, X) (I don't know what is X). If R equals to some Y (I don't know what Y value is), it means that we should use our new element.
If R equals Y, then call rand(0, M) to generate index that will be updated with new element
Question:
1. How to calculate X and Y value?

There are choose(n, m) = n! / (m! (n-m)!) ways to choose m elements from a set containing n elements. You want to choose any one of these arrangements with equal probability.
You have two choices as to whether to take a given element or not:
Picking "this" element, and picking the remaining m-1 elements from n-1 elements;
or not picking "this" element, and picking the m elements from n-1 elements.
You have to make the choice in a way which will yield any arrangement with equal frequency:
P(pick) = (# arrangements which pick "this" element) / (# arrangements)
= (# arrangements which pick "this" element) / (# arrangements which pick "this" element + # arrangements which do not pick "this" element)
= A / (A + B)
introducing A and B for notational convenience.
A = choose(n-1, m-1)
= (n-1)! / ((m-1)! (n-m)!)
B = choose(n-1, m)
= (n-1)! / (m! (n-m-1)!)
Multiplying the numerator and denominator of A by m and those of B by (n-m), so that both have a common factor of (n-1)! / (m! (n-m)!):
A = m * (n-1)! / (m! (n-m)!)
B = (n-m) * (n-1)! / (m! (n-m)!)
Then:
P = m / (m + n - m)
= m / n
As required.
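If you would rather check the identity numerically than trust the algebra, a small sketch like this compares A / (A + B) with m / n using exact integer arithmetic. The class and helper names are made up, and it is only meant for small n where long does not overflow.

```java
class PickProbabilityCheck {
    // Binomial coefficient choose(n, k); returns 0 when k is out of range.
    static long choose(int n, int k) {
        if (k < 0 || k > n) {
            return 0;
        }
        long result = 1;
        for (int i = 1; i <= k; i++) {
            // The running value is itself a binomial coefficient, so the division is exact.
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // True when A / (A + B) equals m / n, compared by cross-multiplication.
    static boolean matches(int n, int m) {
        long a = choose(n - 1, m - 1);
        long b = choose(n - 1, m);
        return a * n == (long) m * (a + b);
    }

    public static void main(String[] args) {
        System.out.println(matches(10, 3)); // prints true
    }
}
```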

Optimize the time complexity of program which computes the number of different pairs of numbers like described below

As I am pretty new to Java, I'm struggling with optimization of the time complexity of my programs. I have written a simple code which takes an array, and counts how many pairs of numbers there are for which the element with the lower index in the array is greater than the element with the greater index.
For example, if you have the array: [9,8,12,14,10,54,41], there will be 4 such pairs: (9,8),(12,10),(14,10) and (54,41).
I tried to optimize the code by not just comparing every element with every other one. I aimed for a time complexity of n log n. I have not yet figured out a way to write this code in a more efficient manner. I hope my question is clear.
The code (I have omitted the heapsort code, as it's not related to my question):
import java.util.Scanner;
class Main4 {
    static int n;
    static int[] A;
    // "A" is the input vector.
    // The number of elements of A can be accessed using A.length
    static int solve(int[] A) {
        int counter = 0;
        int[] B = new int[n];
        B = A.clone();
        heapSort(B);
        for (int i = 0; i < A.length; i++) {
            for (int j = 0; j < A.length; j++) {
                while (B[j] == Integer.MIN_VALUE && j + 1 < n) {
                    j = j + 1;
                }
                if (A[i] != B[j]) {
                    counter++;
                } else {
                    B[j] = Integer.MIN_VALUE;
                    break;
                }
            }
        }
        return counter;
    }
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        int ntestcases = scanner.nextInt();
        for (int testno = 0; testno < ntestcases; testno++) {
            n = scanner.nextInt();
            A = new int[n];
            for (int i = 0; i < n; i++)
                A[i] = scanner.nextInt();
            System.out.println(solve(A));
        }
        scanner.close();
    }
}

Divide and conquer 1 (merge-sort like)
Split the whole list W into two parts L and R of nearly equal lengths. The count for W is the sum of
counts for L and R
the number of pairs (l, r) with l > r where l and r belong to L and R respectively.
The first bullet is recursion. The second bullet does not depend on the ordering of the lists L and R. So you can sort them and determine the result using a single pass through both lists (count all smaller r in sorted R for the first element of sorted L, the count for the second can now be computed incrementally, etc).
The time complexity is given by
T(n) = T(n/2) + T(n/2) + O(n log n)
which solves to O(n log^2 n). Anyway, it's much smaller than O(n*n).
You could improve it to O(n log n) by using merge sort: You need sorted L and this can be obtained by merging sorted LL and sorted LR (which are the two parts of L in the recursive step).
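A sketch of the merge-sort variant in Java, under the assumption that counting pairs during the merge is what is meant here. The names InversionCount, countInversions and sortAndCount are invented for the example.

```java
class InversionCount {
    // Counts pairs (i < j) with a[i] > a[j]; the input array is left unchanged.
    static long countInversions(int[] a) {
        int[] work = a.clone();
        return sortAndCount(work, new int[work.length], 0, work.length);
    }

    // Sorts work[lo, hi) and returns the number of such pairs inside that range.
    private static long sortAndCount(int[] work, int[] tmp, int lo, int hi) {
        if (hi - lo < 2) {
            return 0;
        }
        int mid = (lo + hi) >>> 1;
        long count = sortAndCount(work, tmp, lo, mid) + sortAndCount(work, tmp, mid, hi);
        int i = lo, j = mid, out = lo;
        while (i < mid && j < hi) {
            if (work[i] <= work[j]) {
                tmp[out++] = work[i++];
            } else {
                // work[j] is smaller than every element still waiting in the left half.
                count += mid - i;
                tmp[out++] = work[j++];
            }
        }
        while (i < mid) {
            tmp[out++] = work[i++];
        }
        while (j < hi) {
            tmp[out++] = work[j++];
        }
        System.arraycopy(tmp, lo, work, lo, hi - lo);
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countInversions(new int[] {9, 8, 12, 14, 10, 54, 41})); // prints 4
    }
}
```

The merge step is linear, so the recurrence becomes T(n) = 2T(n/2) + O(n), which is O(n log n).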
Divide and conquer 2 (quick-sort like)
Select an element m such that the number of bigger and smaller elements is about the same (the median would be perfect, but a randomly chosen element is usable, too).
Do a single pass through the array and count how many elements smaller than m are there. Do a second pass and count the pairs (x, y) with x placed to the left of y and x >= m and m > y.
Split the list into two parts: elements e >= m and the remaining ones. Rinse and repeat.

You are looking for all possible pairs.
You can check from left to right to find all the matches. That's O(n^2) solution. As suggested by Arkadiy in the comments, this solution is okay for the worst case of the input.
I came up with the idea that you might want to store elements in sorted order AND keep the original unsorted array.
You keep the original array and build a binary search tree (it needs to stay balanced for the bounds below to hold). You can find the element with original index i in time O(lgn) and remove it in O(lgn), which is great. You can also determine the number of values smaller than the ith element with tiny additional cost.
To be able to count the elements smaller than, each node has to store the number of its children + 1. When you remove, you simply decrement the number of children in each node on your way down. When you insert, you increment the number of children in each node on your way down. When you search for a node you store the value root node has in variable and
do nothing when you go to the right child,
subtract the number child has from your variable when you go to the left child
Once you stop (you found the node), you subtract the value right child has (0 if there is no right child) and decrement the value.
You iterate over the original array from left to right. At each step you find the element in your tree and calculate how many smaller elements are in the tree. You also know that all other elements in the tree have a greater index than the current element, so this count tells you how many elements you can pair it up with. You remove this element from the tree after you calculate the number of pairs. You do that n times. Each lookup and removal is O(lgn), so all n of them take O(nlgn). Including building the tree, the total time is O(nlgn + nlgn) = O(nlgn).
Chapter 12 of Introduction to algorithms (3rd edition) explains in depth how to implement BST. You may also find many resources on the Internet that explain it with pictures.
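Writing a balanced BST with subtree counts takes a fair amount of code, so the sketch below swaps in a Fenwick tree (binary indexed tree) over the sorted values instead. That is a different data structure from the one described above, but it follows the same left-to-right procedure: remove the current element, then count the smaller elements that remain. All names are made up for illustration.

```java
import java.util.Arrays;

class FenwickInversions {
    static long countInversions(int[] a) {
        int[] sorted = a.clone();
        Arrays.sort(sorted);
        // tree is 1-based; position lowerBound(value) + 1 holds how many copies of value remain.
        long[] tree = new long[a.length + 1];
        for (int value : a) {
            add(tree, lowerBound(sorted, value) + 1, 1);
        }
        long pairs = 0;
        for (int value : a) {
            int smallerOverall = lowerBound(sorted, value);
            add(tree, smallerOverall + 1, -1);       // remove the current element
            pairs += prefix(tree, smallerOverall);   // smaller elements with a greater index
        }
        return pairs;
    }

    private static void add(long[] tree, int pos, long delta) {
        for (; pos < tree.length; pos += pos & -pos) {
            tree[pos] += delta;
        }
    }

    // Sum of the counts stored at positions 1..pos.
    private static long prefix(long[] tree, int pos) {
        long sum = 0;
        for (; pos > 0; pos -= pos & -pos) {
            sum += tree[pos];
        }
        return sum;
    }

    // Number of entries in sorted that are strictly smaller than value.
    private static int lowerBound(int[] sorted, int value) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < value) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        System.out.println(countInversions(new int[] {9, 8, 12, 14, 10, 54, 41})); // prints 4
    }
}
```

Every update and query is O(lgn), so the whole pass stays at O(nlgn), the same bound as the tree-based description.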
