I'm just going over some basic sorting algorithms. I implemented the insertion sort below.
public static int[] insertionSort(int[] arr){
    int I = 0;
    for(int i = 0; i < arr.length; i++){
        for(int j = 0; j < i; j++){
            if(arr[i] < arr[j]){
                int temp = arr[i];
                arr[i] = arr[j];
                arr[j] = temp;
            }
            I++;
        }
    }
    System.out.println(I);
    return arr;
}
I prints out 4950 for an array of size 100 filled with randomly generated integers.
I know the algorithm is considered O(n^2), but what would be the more arithmetically correct runtime? If it were actually O(n^2), I'm assuming it would print 10,000 and not 4950.
Big-Oh notation tells us how much work an algorithm must do as the input size grows. A single input test doesn't give enough information to verify the theoretical Big-Oh. You should run the algorithm on arrays of different sizes, from 100 up to a million, and graph the output, with the size of the array as the x-variable and the number of steps your code prints as the y-variable. When you do this, you will see that the graph is a parabola.
You can use algebra to get a function of the form y = a*x^2 + b*x + c that fits this data as closely as possible. But with Big-Oh notation we don't care about the smaller terms, because they become insignificant compared to the x^2 term. For example, when x = 10^3, x^2 = 10^6, which is much larger than b*x + c. If x = 10^6, then x^2 = 10^12, which again is so much larger than b*x + c that we can ignore these smaller terms.
You can make the following observations: on the ith iteration of the outer loop, the inner loop runs i times, for i from 0 to n-1, where n is the length of the array.
In total, over the entire algorithm, the inner loop runs T(n) times, where
T(n) = 0 + 1 + 2 + ... + (n-1)
This is an arithmetic series, and it's easy to prove the sum is equal to a second-degree polynomial in n:
T(n) = n*(n-1)/2 = .5*n^2 - .5*n
For n = 100, the formula predicts the inner loop will run T(100) = 100*99/2 = 4950 times which matches what you calculated.
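If you want to check that closed form empirically, here is a small sketch (my own illustration, not part of your code) that mirrors your two nested loops and compares the counted steps with n*(n-1)/2 for a few sizes:
import java.util.Random;

public class StepCountCheck {
    // Mirrors the two nested loops of the posted sort and counts inner iterations.
    static long countInnerIterations(int[] arr) {
        long steps = 0;
        for (int i = 0; i < arr.length; i++) {
            for (int j = 0; j < i; j++) {
                steps++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        Random rand = new Random();
        for (int n = 100; n <= 10_000; n *= 10) {
            int[] arr = rand.ints(n).toArray();     // content doesn't affect the count
            long steps = countInnerIterations(arr);
            long predicted = (long) n * (n - 1) / 2; // T(n) = n(n-1)/2
            System.out.println("n=" + n + " steps=" + steps + " predicted=" + predicted);
        }
    }
}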
Related
The question is:
Given an array of 2n integers, your task is to group these integers into n pairs of integers, say (a1, b1), (a2, b2), ..., (an, bn), so that the sum of min(ai, bi) over all i from 1 to n is as large as possible.
The solution provided is:
public class Solution {
    public int arrayPairSum(int[] nums) {
        // Counting sort: values are in [-10000, 10000], so bucket them with an offset.
        int[] arr = new int[20001];
        int lim = 10000;
        for (int num : nums)
            arr[num + lim]++;
        // d = 1 if an element from a smaller value is still waiting for its pair
        // (that element was already counted as the pair's min), else d = 0.
        int d = 0, sum = 0;
        for (int i = -10000; i <= 10000; i++) {
            sum += (arr[i + lim] + 1 - d) / 2 * i;
            d = (2 + arr[i + lim] - d) % 2;
        }
        return sum;
    }
}
I think it is unfair to say that the time complexity is O(n). Although in O(n + K), K = 20001 is a constant and it seems it could be omitted, n is also less than K here. If so, why can't I say the time complexity is O(1)?
The asymptotic complexity is measured as a function of n, for ALL n. We are concerned with what happens when n gets large. Really, really large.
Maybe in practice n will always be tiny. Fine.
But when you give a complexity measure for an algorithm, you are by definition saying what happens as n grows. And grows and grows. And when it does, it will dwarf K.
So O(n) it is.
Clarification:
It is true that the problem specification says:
n is a positive integer, which is in the range of [1, 10000].
All the integers in the array will be in the range of [-10000, 10000].
But remember, that is just for this problem! The solution given hard codes the value of K. The algorithm used here should indeed be written as O(n + K), as you noticed. This K is not a constant factor and probably should not be dropped.
However, with asymptotic complexity (Big-O, Big-Theta, etc.), even with an arbitrary but finite K, you can still find constants k and N such that for all n > N, k*n is greater than the number of operations this algorithm needs, which is the Big-O definition. This is why you will see a lot of people say O(n).
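To make that argument concrete with a hedged example: suppose the algorithm performs about n + K operations for some fixed K (as the O(n + K) analysis suggests). Then picking k = 2 and N = K already satisfies the definition:
\[
\text{for all } n > N = K:\quad n + K < n + n = 2n = k \cdot n,
\]
so n + K is in O(n), even though the K term dominates while n is still small.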
Hope that helps.
Here are two different solutions for finding the "number of subarrays having product less than K", one with O(n) runtime and the other O(n^2). However, the O(n^2) one finished executing about 4x faster than the one with linear runtime complexity (1 s vs. 4 s). Could someone explain why this is the case, please?
Solution 1 with O(n) runtime:
static long countProductsLessThanK(int[] numbers, int k)
{
    if (k <= 1) { return 0; }
    int prod = 1;
    long count = 0;
    for (int right = 0, left = 0; right < numbers.length; right++) {
        prod *= numbers[right];
        while (prod >= k)
            prod /= numbers[left++];
        count += (right - left) + 1;
    }
    return count;
}
Solution 2 with O(n^2) runtime:
static long countProductsLessThanK(int[] numbers, int k) {
    long count = 0;
    for (int i = 0; i < numbers.length; i++) {
        int productSoFar = 1;
        for (int j = i; j < numbers.length; j++) {
            productSoFar *= numbers[j];
            if (productSoFar >= k)
                break;
            count++;
        }
    }
    return count;
}
Sample main program:
public static void main(String[] args) {
    int size = 300000000;
    int[] numbers = new int[size];
    int bound = 1000;
    int k = bound / 2;
    for (int i = 0; i < size; i++)
        numbers[i] = (new Random().nextInt(bound) + 2);
    long start = System.currentTimeMillis();
    System.out.println(countProductsLessThanK(numbers, k));   // Solution 1
    System.out.println("O(n) took " + ((System.currentTimeMillis() - start) / 1000) + "s");
    start = System.currentTimeMillis();
    System.out.println(countMyWay(numbers, k));               // Solution 2
    System.out.println("O(n^2) took " + ((System.currentTimeMillis() - start) / 1000) + "s");
}
Edit1:
The array size I chose in my sample test program has 300,000,000 elements.
Edit2:
array size: 300000000:
O(n) took 4152ms
O(n^2) took 1486ms
array size: 100000000:
O(n) took 1505ms
O(n^2) took 480ms
array size: 10000:
O(n) took 2ms
O(n^2) took 0ms
The numbers you are choosing are uniformly distributed in the range [2, 1001], and you're counting subarrays whose product is less than 500. The probability of finding a long subarray is essentially 0; the longest possible subarray whose product is less than 500 has length 8, and there are only nine possible subarrays of that length (all 2s, and the eight arrays of seven 2s and a 3); the probability of hitting one of those is vanishingly small. Half of the array values are already over 500; the probability of finding even a length-two subarray at a given starting point is less than one quarter.
So your theoretically O(n²) algorithm is effectively linear with this test. And your O(n) algorithm requires a division at each point, which is really slow; slower than n multiplications for small values of n.
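One way to see this empirically (a sketch of my own, not part of the original code) is to instrument the quadratic version and count how many inner iterations it actually performs per starting index; with this data distribution the average stays close to one:
import java.util.Random;

public class InnerLoopCount {
    public static void main(String[] args) {
        int size = 1_000_000, bound = 1000, k = bound / 2;
        int[] numbers = new int[size];
        Random rand = new Random();
        for (int i = 0; i < size; i++)
            numbers[i] = rand.nextInt(bound) + 2;   // same distribution as the question

        long innerIterations = 0;
        for (int i = 0; i < size; i++) {
            int productSoFar = 1;
            for (int j = i; j < size; j++) {
                innerIterations++;
                productSoFar *= numbers[j];
                if (productSoFar >= k)
                    break;
            }
        }
        // With values in [2, 1001] and k = 500, this average is close to 1,
        // so the "O(n^2)" version does only about n units of work on this input.
        System.out.println("average inner iterations per start: "
                + (double) innerIterations / size);
    }
}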
In the first one, you're dividing (slow), multiplying, and doing several additions.
In the second one, the heaviest operation is multiplication, and, as the first answer says, the algorithm is effectively linear for your test cases.
I am tasked with solving the following problem (Pascal's Triangle), which looks like this:
[
[1],
[1,1],
[1,2,1],
[1,3,3,1],
[1,4,6,4,1]
]
I've successfully implemented the code (see below), but I'm having a tough time figuring out what the time complexity would be for this solution. The number of operations per list is 1 + 2 + 3 + 4 + ... + n. Does the number of operations reduce to n^2? How does the math work and translate into Big-O notation?
I'm thinking this is similar to the Gauss formula n(n+1)/2, so O(n^2), but I could be wrong. Any help is much appreciated.
public class Solution {
    public List<List<Integer>> generate(int numRows) {
        if (numRows < 1) return new ArrayList<List<Integer>>();
        List<List<Integer>> pyramidVal = new ArrayList<List<Integer>>();
        for (int i = 0; i < numRows; i++) {
            List<Integer> tempList = new ArrayList<Integer>();
            tempList.add(1);
            for (int j = 1; j < i; j++) {
                tempList.add(pyramidVal.get(i - 1).get(j) + pyramidVal.get(i - 1).get(j - 1));
            }
            if (i > 0) tempList.add(1);
            pyramidVal.add(tempList);
        }
        return pyramidVal;
    }
}
Complexity is O(n^2).
Each calculation of an element in your code is done in constant time. ArrayList accesses are constant-time operations, and insertions are amortized constant time. Source:
The size, isEmpty, get, set, iterator, and listIterator operations run
in constant time. The add operation runs in amortized constant time
Your triangle has 1 + 2 + ... + n elements. This is an arithmetic progression that sums to n*(n+1)/2, which is in O(n^2).
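If you want to sanity-check that count, here is a small sketch (my own illustration, assuming the Solution class above is available on the classpath) that compares the total number of stored elements against n*(n+1)/2:
import java.util.List;

public class TriangleSizeCheck {
    public static void main(String[] args) {
        for (int n = 1; n <= 1000; n *= 10) {
            int totalElements = 0;
            for (List<Integer> row : new Solution().generate(n))  // Solution from the question above
                totalElements += row.size();
            // 1 + 2 + ... + n elements in total, i.e. n*(n+1)/2, each produced in O(1).
            System.out.println("numRows=" + n + ", elements=" + totalElements
                    + ", n(n+1)/2=" + n * (n + 1) / 2);
        }
    }
}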
I'm studying for a test and found this question:
I can't really determine the complexity. I figured it's either O(n^2) or O(n^3), and I'm leaning towards O(n^3).
Can someone tell me what it is and why?
My guess of O(n^2) comes from the fact that in the j loop, j starts at i, which gives a triangle shape, and then the k loop goes from i + 1 to j, which I think is the other half of the triangle.
public static int what(int[] arr)
{
    int m = arr[0];
    for (int i = 0; i < arr.length; i++)
    {
        for (int j = i; j < arr.length; j++)
        {
            int s = arr[i];
            for (int k = i + 1; k <= j; k++)
                s += arr[k];
            if (s > m)
                m = s;
        }
    }
    return m;
}
Also, can you tell me what it does?
I figured it returns the sum of the positive integers or the biggest integer in the array.
But for arrays like {99, -3, 0, 1} it returns 99, which I think is because it's buggy. If not, then I have no idea what it does:
{99, 1} => returns 100
{-1, -2, -3} => return -1
{-1, 5, -2} => returns 5
{99, -3, 0, 1} => returns 99 ???
You can proceed methodically, using Sigma Notation, to obtain the order of growth complexity:
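A sketch of what that derivation amounts to (reconstructed here: it is just the nested sums corresponding to the three loops, using the closed form quoted later in this thread):
\[
\sum_{i=0}^{n-1}\sum_{j=i}^{n-1}\sum_{k=i+1}^{j} 1
= \sum_{i=0}^{n-1}\sum_{j=i}^{n-1} (j-i)
= \sum_{i=0}^{n-1} \frac{(n-1-i)(n-i)}{2}
= \frac{n^3 - n}{6} = \Theta(n^3).
\]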
You have three for statements. For large n, it is quite obvious that this is O(n^3): i and j each contribute O(n), and k runs a little shorter, but is still O(n).
The algorithm returns the biggest sum of consecutive terms. That's why it returns 99 for the last one: even though you have 0 and 1, you also have -3, which would drop the sum to at most 97.
PS: Triangle shape means 1 + 2 + ... + n = n(n+1) / 2 = O(n^2)
Code:
for (int i=0; i<arr.length; i++) // Loop A
{
for (int j=i; j<arr.length;j++) // Loop B
{
for (int k=i+1; k<=j; k++) // Loop C
{
// ..
}
}
}
Asymptotic analysis in Big-O:
Loop A: Time = 1 + 1 + 1 + ... + 1 (n times) = n
Loops B+C (for a fixed i, with m = n - 1 - i): Time = 1 + 2 + 3 + ... + m = m(m+1)/2
Total time = SUM { m(m+1)/2 : m from 0 to n-1 }
Total time < n * (n(n+1)/2) = (1/2) n^3 + (1/2) n^2
Time ~ O(n^3)
Triangle shape or not, the complexity is still O(n^3), just with a lower constant than three fully nested loops.
You can model the running time of the function as
sum(sum(sum(Theta(1), k=i+1..j),j=i..n),i=1..n)
As
sum(sum(sum(1, k=i+1..j),j=i..n),i=1..n) = 1/6 n^3 - 1/6 n,
the running time is Theta(n^3).
If you do not feel well-versed enough in the underlying theory to directly apply @MohamedEnnahdiElIdri's analysis, why not simply start by testing the code?
Note first that the loop boundaries only depend on the array's length, not its content, so regarding the time complexity, it does not matter what the algorithm does. You might as well analyse the time complexity of
public static long countwhat(int length) {
    long count = 0;
    for (int i = 0; i < length; i++) {
        for (int j = i; j < length; j++) {
            for (int k = i + 1; k <= j; k++) {
                count++;
            }
        }
    }
    return count;
}
Looking at this, is it easier to derive a hypothesis? If not, simply test whether the return value is proportional to length squared or length cubed...
public static void main(String[] args) {
    for (int l = 1; l <= 10000; l *= 2) {
        long count = countwhat(l);
        System.out.println("i=" + l + ", #iterations:" + count +
                ", #it/n²:" + (double) count / l / l +
                ", #it/n³:" + (double) count / l / l / l);
    }
}
... and notice how one value does not approach any constant with rising l while the other one does (not incidentally, the very same constant associated with the highest power of n in the methodical analysis).
This requires O(n^3) time because the three loops increment three distinct variables: when an inner loop finishes, it does not affect the outer loop, which still runs as many times as it would have before the inner loop was entered.
And this is the maximum contiguous subarray sum problem. It is self-explanatory when you see the examples:
{99, 1} => returns 100
{-1, -2, -3} => return -1
{-1, 5, -2} => returns 5
{99, -3, 0, 1} => returns 99
There is an excellent algorithm known as Kadane's algorithm (Google it) which solves this in O(n) time.
Here it goes:
Initialize:
    max_so_far = 0
    max_ending_here = 0
Loop for each element of the array:
    (a) max_ending_here = max_ending_here + a[i]
    (b) if (max_ending_here < 0)
            max_ending_here = 0
    (c) if (max_so_far < max_ending_here)
            max_so_far = max_ending_here
return max_so_far
References: 1, 2, 3.
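A minimal Java sketch of that idea (my own illustration, not from the original answer): the standard pseudocode above returns 0 for an all-negative array, so to match the examples in this question (e.g. {-1, -2, -3} => -1) this common variant seeds the running values with the first element instead:
public class Kadane {
    // Maximum contiguous subarray sum in O(n), handling all-negative arrays.
    static int maxSubarraySum(int[] a) {
        int maxSoFar = a[0];
        int maxEndingHere = a[0];
        for (int i = 1; i < a.length; i++) {
            // Either extend the previous subarray or start fresh at a[i].
            maxEndingHere = Math.max(a[i], maxEndingHere + a[i]);
            maxSoFar = Math.max(maxSoFar, maxEndingHere);
        }
        return maxSoFar;
    }

    public static void main(String[] args) {
        System.out.println(maxSubarraySum(new int[]{99, 1}));        // 100
        System.out.println(maxSubarraySum(new int[]{-1, -2, -3}));   // -1
        System.out.println(maxSubarraySum(new int[]{-1, 5, -2}));    // 5
        System.out.println(maxSubarraySum(new int[]{99, -3, 0, 1})); // 99
    }
}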
O(n^3).
You are considering every pair of positions between arr[0] and arr[arr.length - 1] with i <= j, running over "i" and "j", which gives n*(n+1)/2 calculations.
And the average length of the innermost loop over "k" is about arr.length / 2, so the total number of calculations is roughly n*(n+1)/2 * n/2 = n*n*(n+1)/4, that is O(n^3).
The complete reasoning is as follows:
Let n be the length of the array.
1) There are three nested loops.
2) The innermost loop performs exactly j-i iterations (k running from i+1 to j inclusive). There is no premature exit from this loop.
3) The middle loop performs exactly n-i iterations (j running from i to n-1 inclusive), each involving j-i innermost iterations; in total, (i-i) + (i+1-i) + (i+2-i) + ... + (n-1-i) = 0 + 1 + 2 + ... + (n-1-i). There is no premature exit from this loop.
4) The outermost loop performs exactly n iterations (i running from 0 to n-1 inclusive), each involving 0 + 1 + 2 + ... + (n-1-i) innermost iterations. In total, (0+1+2+...+(n-1)) + (0+1+2+...+(n-2)) + (0+1+2+...+(n-3)) + ... + (0). There is no premature exit from this loop.
Now how do we handle this mess? You need to know a little about Faulhaber's formula (http://en.wikipedia.org/wiki/Faulhaber%27s_formula). In a nutshell, it says that the sum of the integers up to n is O(n^2), the sum of the sums of integers up to n is O(n^3), and so on.
If you recall from calculus, the antiderivative of X is X^2/2, and the antiderivative of X^2 is X^3/3. Each time, the degree increases by one. This is not a coincidence.
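Concretely, the two standard identities needed here (stated as a sketch of the step above) are:
\[
\sum_{k=1}^{n} k = \frac{n(n+1)}{2} = O(n^2),
\qquad
\sum_{m=1}^{n} \frac{m(m+1)}{2} = \frac{n(n+1)(n+2)}{6} = O(n^3).
\]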
Your code runs in O(n^3).
I have written a class (greedy strategy) in which I first used a sort method, which is O(n log n):
Collections.sort(array, new SortingObjectsWithProbabilityField());
and then I used the insert method of a binary search tree, which takes O(h), where h is the tree height.
For different n, the running times were:
n, running time (ns)
17,515428
33,783340
65,540572
129,1285080
257,2052216
513,4299709
which I think is not correct, because as n increases the running time should also increase (here it drops from n = 33 to n = 65).
This is the method that measures the running time:
// randList and timeForGreedy are assumed to be declared elsewhere in the class.
int exponent = -1;
for (int n = 2; n < 1000; n += Math.pow(2, exponent)) {
    for (int j = 1; j <= 3; j++) {
        Random rand = new Random();
        for (int i = 0; i < n; i++) {
            Element e = new Element(rand.nextInt(100) + 1, rand.nextInt(100) + 1, 0);
            for (int k = 0; k < i; k++) {
                if (e.getDigit() == randList.get(k).getDigit()) {
                    e.setDigit(e.getDigit() + 1);
                }
            }
            randList.add(e);
        }
        double sum = 0.0;
        for (int i = 0; i < randList.size(); i++) {
            sum += randList.get(i).getProbability();
        }
        for (Element i : randList) {
            i.setProbability(i.getProbability() / sum);
        }
        // Get time.
        long t2 = System.nanoTime();
        GreedyVersion greedy = new GreedyVersion((ArrayList<Element>) randList);
        long t3 = System.nanoTime();
        timeForGreedy = timeForGreedy + t3 - t2;
    }
    System.out.println(n + "," + "," + timeForGreedy / 3);
    exponent++;
}
thanks
Your data appears to roughly fit an order of n log n; I fitted a curve of the form a*n*log(n) to it (the gnuplot script used is included below for reference). The fitted curve is almost linear, since for large values of n, log n is pretty small. For example, for your largest value, n = 513, log n is about 9.003.
There are ways to achieve more accurate timings, which would likely make the curve fit the data points better, such as taking a larger sample of random inputs (I'd advise at least 10, or 100 if possible) and running multiple iterations per dataset (5 is an acceptable number) to smooth out the inaccuracies of the timer. You can use a single start/stop timer to time all iterations for the same n, and then divide by the number of runs, to get more accurate data points. Just be sure to generate all the data sets first, store them all, and only then run and time them.
Good choice to sample n at powers of 2. You just might want to subtract 1 to make them exactly powers of 2, not that it makes any real impact.
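A rough, hypothetical sketch of that timing pattern (runAlgorithm is just a placeholder for the greedy construction being measured, and the input generation here is simplified):
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class TimingHarness {
    public static void main(String[] args) {
        final int RUNS = 100;   // iterations per input size, averaged
        Random rand = new Random();
        for (int n = 16; n <= 4096; n *= 2) {
            // Generate and store all inputs first, then time them.
            List<int[]> inputs = new ArrayList<>();
            for (int r = 0; r < RUNS; r++)
                inputs.add(rand.ints(n, 1, 101).toArray());

            long start = System.nanoTime();
            for (int[] input : inputs)
                runAlgorithm(input);   // placeholder for the algorithm under test
            long avgNanos = (System.nanoTime() - start) / RUNS;
            System.out.println(n + "," + avgNanos);
        }
    }

    // Placeholder standing in for the real GreedyVersion call; replace as needed.
    static void runAlgorithm(int[] input) {
        java.util.Arrays.sort(input.clone());
    }
}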
For reference, here's the gnuplot script used to generate the plot:
set terminal png
set output 'graph.png'
set xrange [0:5000000]
set yrange [0:600]
f1(x) = a1*x*log(x)/log(2)
a1 = 1000
plot 'time.dat' title 'Actual runtimes', \
     a1*x*log(x)/log(2) title 'Fitted curve: O(nlogn)'
fit f1(x) 'time.dat' via a1
It's not that easy to relate asymptotic complexity to running times. When the sample is so small, there are lots of things that will affect your timing.
To get more accurate timings, you should run your algorithm K times per instance (e.g. K times with 17, K times with 33, and so forth) and take the average time as the sample point (e.g. K = 100).
That said, it looks about right. You can plot n*log(n) against your timings and you'll see that, despite the different scales, they grow similarly. Still too few sample points to be sure, though...