Which one is faster? (from the CTCI book) - Java

I came across these two code snippets in the CTCI book.
Code snippet 1:
int min = Integer.MAX_VALUE;
int max = Integer.MIN_VALUE;
for (int x : array) {
    if (x < min) min = x;
    if (x > max) max = x;
}
Code snippet 2:
int min = Integer.MAX_VALUE;
int max = Integer.MIN_VALUE;
for (int x : array) {
    if (x < min) min = x;
}
for (int x : array) {
    if (x > max) max = x;
}
The book didn't give a clear-cut answer on which one is faster and more efficient from an assembly-level and compiler-optimization perspective. I believe both of these have O(n) running time. The first one has a single loop at the expense of two conditional operations per iteration, while the second loops twice with only one conditional operation per iteration.
To be technically precise, the second one's running time would be O(2N) and the first one's O(N), but since we omit constants, both would be described as O(N). So, for a huge N, would the constant really matter? Also, which one would result in more optimized assembly code from a compiler's perspective?
EDIT: The constants do not matter for a large N, but comparing the two code snippets, where one has a constant of 2 and the other does not, would it affect the running time if we ran both in parallel on the same size of N and the same machine specs?

To be technically precise, the second one's running time would be O(2N) and the first one's O(N), but since we omit constants, both would be described as O(N).
I don't think that is right. As far as the number of comparisons is concerned (and that is the essence here), in the first case you are doing 2 comparisons per iteration, whereas in the second case there are two loops with 1 comparison per iteration each. So these two are equivalent, since O(2N) = O(N) + O(N).
Of course, several correct comments reveal the practical aspects of running such a code in silico. The real reason we find the Big-Oh complexity of an algorithm is to get an idea of how the computation behaves without any regards to the computational (machine) power and in the presence of arbitrarily large N, the input size (that's why we say asymptotically).

Indeed, both code snippets have O(N) complexity.
You estimated the constants for the two code snippets as being 1 and 2, probably based on the number of the for instructions. However, this ignores the fact that the for in the first code snippet actually contains more instructions than the one in the second snippet. In general, finding tight Big-Oh constants is difficult, and it is an indirect approach for comparing the practical performance of two algorithms.
As mentioned in the comments, what will matter here is memory access -- with two for loops the array may have to be loaded from memory (or into cache) twice.
Assuming that we ignore these caching aspects -- a more direct approach than comparing complexity constants would be to directly look into the number of instructions made by each approach. The second snippet duplicates the instructions used for advancing through the array (e.g. incrementing the iterator), which makes it less efficient.
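To make the comparison concrete, here is a minimal, self-contained sketch of the two variants side by side (the class and method names are mine, not from the book). Both return the same result, so any difference is purely in constant factors and memory-access patterns:

```java
import java.util.Random;

public class MinMaxDemo {
    // One pass over the array, two comparisons per element.
    static int[] minMaxOnePass(int[] array) {
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int x : array) {
            if (x < min) min = x;
            if (x > max) max = x;
        }
        return new int[] { min, max };
    }

    // Two passes over the array, one comparison per element per pass.
    static int[] minMaxTwoPasses(int[] array) {
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int x : array) if (x < min) min = x;
        for (int x : array) if (x > max) max = x;
        return new int[] { min, max };
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(1_000_000).toArray();
        int[] a = minMaxOnePass(data);
        int[] b = minMaxTwoPasses(data);
        System.out.println(a[0] == b[0] && a[1] == b[1]); // prints "true"
    }
}
```

Timing these two reliably requires a proper benchmark harness (warm-up, repetition), since the JIT can optimize either loop shape differently on different hardware.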

Related

Big-Oh notation for a single while loop covering two halves of an array with two iterator vars

Trying to brush up on my Big-O understanding for a test (A very basic Big-O understanding required obviously) I have coming up and was doing some practice problems in my book.
They gave me the following snippet
public static void swap(int[] a)
{
    int i = 0;
    int j = a.length - 1;
    while (i < j)
    {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
        i++;
        j--;
    }
}
Pretty easy to understand I think. It has two iterators each covering half the array with a fixed amount of work (which I think clocks them both at O(n/2))
Therefore O(n/2) + O(n/2) = O(2n/2) = O(n)
Now, please forgive me, as this is my current understanding and that was my attempt at a solution. I have found many examples of big-O online, but none quite like this one, where the two iterators both advance and modify the array at the same time.
The fact that it has one loop is making me think it's O(n) anyway.
Would anyone mind clearing this up for me?
Thanks
The fact that it has one loop is making me think it's O(n) anyway.
This is correct. Not because it is making one loop, but because it is one loop that depends on the size of the array by a constant factor: the big-O notation ignores any constant factor. O(n) means that the only influence on the algorithm is based on the size of the array. That it actually takes half that time, does not matter for big-O.
In other words: if your algorithm takes time n + X, Xn, or Xn + Y (for constants X and Y), all of these come down to O(n).
It is different if the number of iterations varies by something other than a constant factor -- for instance, as a logarithmic or exponential function of n. For example, if a size of 100 means 2 iterations, 1000 means 3, and 10000 means 4, then it would be O(log(n)).
It would also be different if the loop were independent of the size. I.e., if you always looped 100 times regardless of the input size, your algorithm would be O(1) (i.e., it would operate in constant time).
I was also wondering if the equation I came up with to get there was somewhere in the ballpark of being correct.
Yes. In fact, if your equation ends up being some form of n * C + Y, where C is some constant and Y is some other value, the result is O(n), regardless of whether C is greater than 1 or smaller than 1.
You are right about the loop. The loop determines the big O, but it runs only over half the array.
So it is roughly 2 + 6*(n/2) operations.
If we make n very large, the other numbers are really small, so they won't matter.
So it's O(n).
Let's say you run 2 separate loops: 2 + 6*(n/2) + 6*(n/2). That will still be O(n).
But if we run a nested loop, 2 + 6*(n*n), then it will be O(n^2).
Always remove the constants and do the math. You've got the idea.
Since j - i decreases by 2 on each iteration, N/2 iterations are performed (where N = a.length).
Hence the running time is indeed O(N/2). And O(N/2) is strictly equivalent to O(N).
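To make the N/2 count visible, here is the swap method from the question with an iteration counter added (the counter field is my addition for illustration):

```java
public class SwapDemo {
    static int iterations; // counts loop iterations, for illustration only

    static void swap(int[] a) {
        iterations = 0;
        int i = 0;
        int j = a.length - 1;
        while (i < j) {
            int temp = a[i];
            a[i] = a[j];
            a[j] = temp;
            i++;
            j--;
            iterations++;
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6};
        swap(a);
        // For n = 6 the loop runs exactly n/2 = 3 times,
        // and the array ends up reversed.
        System.out.println(iterations); // prints "3"
    }
}
```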

What is the time complexity of Arrays.parallelSetAll()?

I just read from : Everything about java 8
that, java 8 adds Arrays.parallelSetAll()
int[] array = new int[8];
AtomicInteger i = new AtomicInteger();
Arrays.parallelSetAll(array, operand -> i.incrementAndGet());
[Edited] Is it O(1), i.e. constant time complexity, on the same machine for the same number of elements in the array? What sort of performance improvement is indicated by the method name?
To start off, it can never be O(1); clarification follows.
I use n = array.length, which in your case is 8; however, that does not matter, as it could also be a very big number.
Now observe that normally you would do:
for (int k = 0; k < n; k++) {
    array[k] = i.incrementAndGet();
}
This is with Java 8 much easier:
Arrays.setAll(array, v -> i.incrementAndGet());
Observe that they both take O(n) time.
Now take into account that you execute the code in parallel, but there are no guarantees as to how it executes; you do not know how much parallelization happens under the hood, if any at all for such a small n.
Therefore it still takes O(n) time, because you cannot prove that it will parallelize over n threads.
Edit, as an extra: I have observed that you seem to think that parallelizing an action means that any O(k) will converge to O(1), where k = n or k = n^2, etc.
This is not the case in practice, because you never actually have k processor cores available.
An intuitive argument is your own computer: if you are lucky, it may have 8 cores, so the best you could get under perfect parallelization conditions is O(n / 8).
I can already hear the people from the future laughing at that we only had 8 CPU cores...
It is O(N). Calling Arrays.parallelSetAll(...) involves assignments to set a total of array.length array elements. Even if those assignments are spread across P processors, the total number of assignments is linearly proportional to the length of the array. Take N as the length of the array, and the math is obvious.
The thing to realize is that P ... the number of available processors ... is going to be a constant for any given execution of a program on a single computer. (Or if it is not a constant, there will be a constant upper bound.) And a computation whose sole purpose is to assign values to an array only makes sense when executed on a single computer.
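As an aside, the generator passed to parallelSetAll is handed the element index, so the shared AtomicInteger in the question isn't needed at all; an index-based generator also makes the result deterministic regardless of how the work is split across threads. A minimal sketch (Java 8+ assumed):

```java
import java.util.Arrays;

public class SetAllDemo {
    public static void main(String[] args) {
        int[] array = new int[8];
        // The IntUnaryOperator receives the element index, so each slot
        // gets a fixed value independent of thread scheduling.
        Arrays.parallelSetAll(array, idx -> idx + 1);
        System.out.println(Arrays.toString(array)); // prints "[1, 2, 3, 4, 5, 6, 7, 8]"
    }
}
```

With the AtomicInteger version, every element still receives a distinct value from 1 to n, but which element gets which value depends on the execution order of the parallel tasks.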

The best, worst, and average-case runtime of a function to check for duplicates?

I'm having some trouble finding the big O for the if statement in the code below:
public static boolean areUnique(int[] ar)
{
    for (int i = 0; i < ar.length - 1; i++)          // O(n)
    {
        for (int j = i + 1; j < ar.length - 1; j++)  // O(n)
        {
            if (ar[i] == ar[j])  // O(???)
                return false;    // O(1)
        }
    }
    return true;  // O(1)
}
I'm trying to do a time complexity analysis for the best, worst, and average case
Thank you everyone for answering so quickly! I'm not sure if my best worst and average cases are correct... There should be a case difference should there not because of the if statement? But when I do my analysis I have them all ending up as O(n2)
Best: O(n) * O(n) * [O(1) + O(1)] = O(n2)
Worst: O(n) * O(n) * [O(1) + O(1) + O(1)] = n2
Average: O(n) * O(n) * [O(1) + O(1) + O(1)] = O(n2)
Am I doing this right? My textbook is not very helpful
For starters, this line
if (ar[i] == ar[j])
always takes time Θ(1) to execute. It does only a constant amount of work (a comparison plus a branch), so the work done here won't asymptotically contribute to the overall runtime.
Given this, we can analyze the worst-case behavior by considering what happens if this statement is always false. That means that the loop runs as long as possible. As you noticed, since each loop runs O(n) times, the total work done is Θ(n^2) in the worst case.
In the best case, however, the runtime is much lower. Imagine any array where the first two elements are the same. In that case, the function will terminate almost instantly when the conditional is encountered for the first time. In this case, the runtime is Θ(1), because a constant number of statements will be executed.
The average case, however, is not well-defined here. Average case is typically defined relative to some distribution - the average over what? - and it's not clear what that is here. If you assume that the array consists of truly random int values and that ints can take on any integer value (not a reasonable assumption, but it's fine for now), then the probability that a randomly-chosen array has a duplicate is 0 and we're back in the worst case (runtime Θ(n^2)). However, if the values are more constrained, the runtime changes. Let's suppose that there are n numbers in the array and the integers range from 0 to k - 1, inclusive. Given a random array, the runtime depends on
Whether there's any duplicates or not, and
If there is a duplicate, where the first duplicated value appears in the array.
I am fairly confident that this math is going to be very hard to work out and if I have the time later today I'll come back and try to get an exact value (or at least something asymptotically appropriate). I seriously doubt this is what was expected since this seems to be an introductory big-O assignment, but it's an interesting question and I'd like to look into it more.
Hope this helps!
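The best/worst-case gap is easy to observe by instrumenting the function with a comparison counter. A small sketch (the counter field is my addition; note I use `j < ar.length` for the inner bound so the last element is actually checked -- the question's `ar.length - 1` skips it):

```java
public class AreUniqueDemo {
    static long comparisons; // counts executions of the if, for illustration

    static boolean areUnique(int[] ar) {
        comparisons = 0;
        for (int i = 0; i < ar.length - 1; i++) {
            for (int j = i + 1; j < ar.length; j++) {
                comparisons++;
                if (ar[i] == ar[j])
                    return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int n = 100;
        int[] distinct = new int[n];
        for (int k = 0; k < n; k++) distinct[k] = k;

        areUnique(distinct);
        // Worst case: all pairs checked, n*(n-1)/2 comparisons.
        System.out.println(comparisons); // prints "4950"

        int[] earlyDup = distinct.clone();
        earlyDup[1] = earlyDup[0]; // duplicate in the very first pair
        areUnique(earlyDup);
        // Best case: the first comparison already finds the duplicate.
        System.out.println(comparisons); // prints "1"
    }
}
```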
The if itself is O(1).
Big O does not account for the individual micro-operations inside the ALU or CPU; even if if (ar[i] == ar[j]) really took, say, 6 machine-level steps, O(6) still translates to O(1).
You can regard it as O(1).
No matter what you consider as 'one' step, the number of instructions for carrying out a[i] == a[j] does not depend on n in this case.

Measuring Performance of a java algorithm

How can I measure the performance of my Java algorithm effectively? Is there an accurate way of doing it?
I read other questions of the same kind but wasn't satisfied.
Any help would be appreciated.
long reference = System.nanoTime();
your_funct();
long finish = System.nanoTime();
System.out.println(((double) (finish - reference)) / 1000000000.0); // in seconds
This has a meaningful resolution of about 0.003 seconds on my machine. That is, you measure in nanoseconds, but the smallest observable step is around 3,000,000 nanoseconds here.
You ask for performance which indicates some sort of timing. But then what would you compare against?
A general way of measuring an algorithm is using Big O, which takes a simplified mathematical approach.
To explain this at a very basic level, a simple linear search of a list of integers has a linear (n) worst-case big O. E.g.:
for (int i = 0; i < array.length; ++i)
    if (array[i] == to_find)
        return i;
In the worst case this takes n iterations (the input size is usually referred to as n in big O) - so we call it an n, or linear-complexity, algorithm.
Something like a bubblesort algorithm is a loop within a loop so we have n * n complexity = n^2 or quadratic complexity.
Comparing like with like, if we just consider sorting, quicksort is more efficient than quadratic complexity (it is n log n complexity) so you can consider quicksort being 'better' than bubblesort.
So when evaluating your algorithm think about it in terms of n. Is there a loop? how many? The fewer the better. No loops even better - constant big o.
You can use some profilers. Many IDEs (such as Netbeans) have one.
You can make it practical or theoretical. If practical, put a timer before the algorithm starts and stop it when it ends. If theoretical, use Big O notation (it isn't that hard) and you'll get an estimate of its time or space complexity.
The best way to do it is still java.lang.System.currentTimeMillis() as it will work regardless the IDE you are using.
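Putting the timing advice above together, here is a minimal nanoTime harness with a warm-up phase. The workload `sumOfSquares` is just a stand-in for "your algorithm", and the whole class is a sketch, not a substitute for a real benchmark framework:

```java
public class TimingDemo {
    // Stand-in workload: sum of squares 0^2 + 1^2 + ... + (n-1)^2.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += (long) i * i;
        return sum;
    }

    // Times a single run of the task, in milliseconds.
    static double timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Warm up so the JIT compiles the hot path before we measure.
        for (int i = 0; i < 10_000; i++) sumOfSquares(1_000);

        double ms = timeMillis(() -> sumOfSquares(10_000_000));
        System.out.println("time (ms): " + ms); // actual value varies by machine
    }
}
```

For serious comparisons, repeat the measurement many times and look at the distribution, not a single number.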

How can I prove that one algorithm is faster than another in Java

Is there anything in Java that would allow me to take a code snippet and see exactly how many "ticks" it takes to execute? I want to prove that an algorithm I wrote is faster than another.
"Ticks"? No. I'd recommend that you run them several times each and compare the average results:
public class AlgorithmDriver {
    public static void main(String[] args) {
        int numTries = 1000000;
        long begTime = System.currentTimeMillis();
        for (int i = 0; i < numTries; ++i) {
            Algorithm.someMethodCall();
        }
        long endTime = System.currentTimeMillis();
        System.out.printf("Total time for %10d tries: %d ms\n", numTries, (endTime - begTime));
    }
}
You probably are asking two different questions:
How can you measure the run time of a java implementation (Benchmarking)
How can you prove the asymptotic run time of an algorithm
For the first of these, I wouldn't use the solutions posted here. They are mostly not quite right. First, it's probably better to use System.nanoTime than System.currentTimeMillis. Second, you need to use a try/finally block. Third, collect statistics from many runs of your code outside of the measured region, so that you have a more complete picture.
Run code that looks vaguely like this many times:
long totalTime = 0;
long startTime = System.nanoTime();
try {
    // method to test
} finally {
    totalTime = System.nanoTime() - startTime;
}
Getting benchmarking correct is hard. For example, you must let your code "warm up" for a few minutes before testing it. Benchmark early and often, but don't put too much faith in your benchmarks. Small micro-benchmarks in particular almost always lie in one way or another.
The second way to interpret your question is about asymptotic run times. The truth is this has almost nothing to do with Java, it is general computer science. Here the question we want to ask is: what curves describe the behavior of the run time of our algorithm in terms of the input size.
The first thing is to understand Big-Oh notation. I'll do my best, but SO doesn't support math notation. O(f(n)) denotes a set of algorithms such that, in the limit as n goes to infinity, f(n) is within a constant factor of an upper bound on the algorithm's run time. Formally, T(n) is in O(f(n)) iff there exist some constant n0 and some constant c such that for all n > n0, c*f(n) >= T(n). Big Omega is the same thing, except for lower bounds, and big Theta f(n) just means it is both big Oh f(n) and big Omega f(n). This is not too hard.
Well, it gets a little more complicated because we can talk about different kinds of run time, i.e. average case, best case, and worst case. For example, quicksort is normally O(n^2) in the worst case, but O(n log n) for random lists.
So I skipped over what T(n) means. Basically it is the number of "ticks." Some machine instructions (like reading from memory) take much longer than others (like adding). But, so long as they are only a constant factor apart from each other, we can treat them all as the same for the purposes of big Oh, since it will just change the value of c.
Proving asymptotic bounds isn't that hard. For simple structured programming problems you just count instructions:
public int square(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += n;
    }
    return sum;
}
In this example we have one instruction each for: initializing sum, initializing i, and returning the value. The loop happens n times, and on each iteration we do a comparison, an addition, and an increment. So we have T(square(n)) = 3 + 3n; using an n0 of 2 and a c of 4, we can easily prove this is in O(n). You can always safely simplify big Oh expressions by removing excess constant terms and by dividing out constant multiples.
When you are faced with a recursive function, you have to solve a recurrence relation. If you have a function like T(n) = 2*T(n/2) + O(1), you want to find a closed-form solution. You sometimes have to do this by hand, or with a computer algebra system. For this example, using forward substitution, we can see the pattern (in an abuse of notation): T(1) = O(1), T(2) = O(3), T(4) = O(7), T(8) = O(15). This looks a lot like O(2n - 1); to prove this is the right value:
T(n) = 2*T(n/2) + 1
T(n) = 2*(2(n/2) - 1) + 1
T(n) = 2*(n-1) + 1
T(n) = 2n - 2 + 1
T(n) = 2n - 1
As we saw earlier, you can simplify O(2n - 1) to O(n).
More often, though, you can use the master theorem, which is a mathematical tool that saves you time on this kind of problem. If you check Wikipedia you can find the master theorem; if you plug the example above into it, you get the same answer.
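The forward-substitution pattern above can also be sanity-checked numerically. A small sketch that evaluates the recurrence T(n) = 2*T(n/2) + 1 directly (with T(1) = 1, for n a power of two) and compares it against the closed form 2n - 1:

```java
public class RecurrenceDemo {
    // Direct evaluation of T(n) = 2*T(n/2) + 1 with base case T(1) = 1.
    static long t(long n) {
        if (n == 1) return 1;
        return 2 * t(n / 2) + 1;
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 1024; n *= 2) {
            // Closed form derived above: T(n) = 2n - 1.
            System.out.println(t(n) == 2 * n - 1); // prints "true" each time
        }
    }
}
```

This is not a proof, of course; it only checks finitely many values, but it is a quick way to catch an algebra mistake before attempting induction or the master theorem.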
For more, check out an algorithms text book like Levitin's "The Design & Analysis of Algorithms"
You could use System.currentTimeMillis() to get start and end times.
long start = System.currentTimeMillis();
// your code
long end = System.currentTimeMillis();
System.out.println("time: " + (end - start));
You can measure wall time with System.currentTimeMillis() or System.nanoTime() (which have different characteristics). This is relatively easy as you just have to print out the differences at the end.
If you need to count specific operations (which is common in algorithms), the easiest approach is to simply increment a counter when the operations are performed, and then print it when you are done. A long is well suited for this. For multiple operations, use multiple counters.
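A sketch of that counter approach, using bubble sort as a stand-in algorithm (the counter fields are my addition; count whatever operation is meaningful for your algorithm):

```java
public class CounterDemo {
    static long swaps;    // one counter per operation of interest
    static long compares;

    static void bubbleSort(int[] a) {
        swaps = 0;
        compares = 0;
        for (int i = 0; i < a.length - 1; i++) {
            for (int j = 0; j < a.length - 1 - i; j++) {
                compares++;
                if (a[j] > a[j + 1]) {
                    int t = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = t;
                    swaps++;
                }
            }
        }
    }

    public static void main(String[] args) {
        bubbleSort(new int[] {5, 4, 3, 2, 1});
        // A fully reversed array of length 5: n*(n-1)/2 = 10 of each.
        System.out.println(compares); // prints "10"
        System.out.println(swaps);    // prints "10"
    }
}
```

Counting operations this way compares the algorithms themselves, independent of JIT and hardware effects, which is exactly what an asymptotic argument is about.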
I had to do these algorithm-efficiency proofs mostly in my Data Structures course this year.
First, I measured the time as mentioned above.
Then I increased the method's input size by a factor of 10 each time (10, 100, 1000, ...).
Lastly, I put the time measurements in an Excel file and drew graphs of these time values.
This way, you can check whether one algorithm is faster than another, at least roughly.
I would:
Come up with a few data sets for the current algorithm: a set where it performs well, a set where it performs ok, and a data set where it performs poorly. You want to show that your new algorithm outperforms the current one for each scenario.
Run and measure the performance of each algorithm multiple times for increasing input sizes of each of the three data sets, then take average, standard deviation etc. Standard deviation will show a crude measure of the consistency of the algorithm performance.
Finally look at the numbers and decide in your case what is important: which algorithm's performance is more suitable for the type of input you will have most of the time, and how does it degrade when the inputs are not optimal.
Timing the algorithm is not necessarily everything - would memory footprint be important as well? One algorithm might be better computationally but it might create more objects while it runs.. etc. Just trying to point out there is more to consider than purely timing!
I wouldn't use the current time in ms as some of the others have suggested. The methods provided by ThreadMXBean are more accurate (I dare not say 100% accurate).
They actually measure the CPU time taken by the thread, rather than elapsed system time, which may be skewed by context switches performed by the underlying OS.
Java Performance Testing
I am not too familiar with the Java framework, but I would do it the following way:
Define a set of test cases (mostly example data) that can be used for both algorithms
Implement a timing method to measure the amount of time that a specific function takes
Create a for loop and execute method A repeatedly (e.g. 1000 times with the whole test data). Measure the timing of the loop, not the sum of the single function calls, since timing functions can bias your result when called a lot.
Do the same for method B
Compare your result and choose a winner
If both algorithms have the same definition of a macro-level "tick" (e.g. walking one node in a tree) and your goal is to prove that your algorithm accomplishes its goal in a lower number of those macro-level ticks than the other, then by far the best way is to just instrument each implementation to count those ticks. That approach is ideal because it doesn't reward low-level implementation tricks that can make code execute faster but are not algorithm-related.
If you don't have that luxury, but you are trying to calculate which approach solves the problem using the least amount of CPU resources, contrary to the approaches listed here involving System.currentTimeMillis etc, I would use an external approach: the linux time command would be ideal. You have each program run on the same set of (large) inputs, preferably that take on the order of minutes or hours to process, and just run time java algo1 vs time java algo2.
If your purpose is to compare the performance of two pieces of code, the best way to do it is using JMH. You can import it via Maven, and it is now official as of OpenJDK 12.
https://openjdk.java.net/projects/code-tools/jmh/
