I saw this problem on some blog. The following two loops are given; the question is which one is faster.
for(int i = 100000; i > 0; i--) {}
for(int i = 1; i < 100001; i++) {}
Why is the first one faster than the second one?
On some processors, and in code generated by some compilers, the first one could be faster because the value is compared to zero. As @DeadMG notes, this applies, for example, to x86 processors before 1985.
But it is:
a premature optimization: don't change the first form to the second for this reason alone! Donald Knuth once said, and I totally agree with him:
We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil.
a non-portable optimization: it is faster only on some specific processors in some specific cases! And there is a good chance that on other architectures it will be not just the same but actually slower.
You have been warned.
Related
I came across these two code snippets in the CTCI book.
Code snippet 1:
int min = Integer.MAX_VALUE;
int max = Integer.MIN_VALUE;
for (int x : array) {
    if (x < min) min = x;
    if (x > max) max = x;
}
Code snippet 2:
int min = Integer.MAX_VALUE;
int max = Integer.MIN_VALUE;
for (int x : array) {
    if (x < min) min = x;
}
for (int x : array) {
    if (x > max) max = x;
}
The book didn't give a clear-cut answer on which one is faster and more efficient from an assembly-level and compiler-optimization perspective. I believe both of these have O(n) running time. The first one has a single loop at the expense of two conditional operations per iteration, while the second one loops twice with only one conditional operation per loop.
To be technically precise the second run time would be O(2N) and the first one O(N) but since we omit the constants, both would be described as O(N). So let's say, for a huge size of N, would the constant really matter? Also, which one would result in more optimized assembly code from a compiler's perspective?
EDIT: The constants do not matter for a large size of N, but comparing the two code snippets, where one has a constant of 2 and the other does not, would it affect the running time if we ran both of them in parallel on the same size of N and the same machine specs?
To be technically precise the second run time would be O(2N) and the first one O(N) but since we omit the constants, both would be described as O(N).
I don't think that is right. As far as the number of comparisons is concerned (and that is the essence here), in the first case, you are doing 2 comparisons per iteration, whereas in the second case there are two loops, but there is 1 comparison per iteration in each. So, these two are equivalent, since O(2N) = O(N) + O(N).
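For completeness, the reason the constant disappears is just the definition (a standard textbook statement, not anything specific to this code):

f(n) = O(g(n))  means: there exist c > 0 and n0 such that f(n) <= c * g(n) for all n >= n0.
With f(n) = 2n and g(n) = n, the choice c = 2, n0 = 1 works, so 2n = O(n); the factor of 2 is simply absorbed into c.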
Of course, several correct comments reveal the practical aspects of running such a code in silico. The real reason we find the Big-Oh complexity of an algorithm is to get an idea of how the computation behaves without any regards to the computational (machine) power and in the presence of arbitrarily large N, the input size (that's why we say asymptotically).
Indeed, both code snippets have O(N) complexity.
You estimated the constants for the two code snippets as being 1 and 2, probably based on the number of the for instructions. However, this ignores the fact that the for in the first code snippet actually contains more instructions than the one in the second snippet. In general, finding tight Big-Oh constants is difficult, and it is an indirect approach for comparing the practical performance of two algorithms.
As mentioned in the comments, what will matter here is memory access: when using two for loops, the array may have to be read from memory twice.
Assuming that we ignore these caching aspects, a more direct approach than comparing complexity constants would be to look directly at the number of instructions executed by each approach. The second snippet duplicates the instructions used for advancing through the array (e.g. incrementing the iterator), which makes it less efficient.
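If you want to see how the constant factors actually play out, a rough timing harness is easy to write. This is only a minimal sketch: the array size, the warm-up loop, and the sink variable that keeps results alive are my own choices, not from the book.

import java.util.Random;

public class MinMaxTiming {
    // Snippet 1: single loop, two comparisons per element.
    static long onePass(int[] array) {
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int x : array) {
            if (x < min) min = x;
            if (x > max) max = x;
        }
        return (long) min + max;
    }

    // Snippet 2: two loops, one comparison per element each.
    static long twoPasses(int[] array) {
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int x : array) { if (x < min) min = x; }
        for (int x : array) { if (x > max) max = x; }
        return (long) min + max;
    }

    public static void main(String[] args) {
        int[] array = new Random(42).ints(10_000_000).toArray();
        long sink = 0;
        // Warm up both variants so the JIT has compiled them before timing.
        for (int w = 0; w < 5; w++) sink += onePass(array) + twoPasses(array);

        long t0 = System.nanoTime();
        sink += onePass(array);
        long t1 = System.nanoTime();
        sink += twoPasses(array);
        long t2 = System.nanoTime();

        System.out.println("one pass  : " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("two passes: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println(sink); // keep the results alive so nothing is optimized away
    }
}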
Shifting bits left and right is apparently faster than multiplication and division operations on most, maybe even all, CPUs if you happen to be using a power of 2. However, it can reduce the clarity of code for some readers and some algorithms. Is bit-shifting really necessary for performance, or can I expect the compiler or VM to notice the case and optimize it (in particular, when the power-of-2 is a literal)? I am mainly interested in the Java and .NET behavior but welcome insights into other language implementations as well.
Almost any environment worth its salt will optimize this away for you. And if it doesn't, you've got bigger fish to fry. Seriously, do not waste one more second thinking about this. You will know when you have performance problems. And after you run a profiler, you will know what is causing it, and it should be fairly clear how to fix it.
You will never hear anyone say "my application was too slow, then I started randomly replacing x * 2 with x << 1 and everything was fixed!" Performance problems are generally solved by finding a way to do an order of magnitude less work, not by finding a way to do the same work 1% faster.
Most compilers today will do more than convert multiply or divide by a power-of-two to shift operations. When optimizing, many compilers can optimize a multiply or divide with a compile time constant even if it's not a power of 2. Often a multiply or divide can be decomposed to a series of shifts and adds, and if that series of operations will be faster than the multiply or divide, the compiler will use it.
For division by a constant, the compiler can often convert the operation to a multiply by a 'magic number' followed by a shift. This can be a major clock-cycle saver since multiplication is often much faster than a division operation.
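As an illustration of the magic-number idea, here is the classic divide-by-10 case adapted from Hacker's Delight. The constant and shift below are specific to this one divisor, and the sketch restricts itself to non-negative ints so the 64-bit multiply cannot overflow; it is not a claim about what any particular compiler emits.

public class MagicDivide {
    // Division by 10 without a divide instruction, for non-negative ints:
    // multiply by ceil(2^35 / 10) = 0xCCCCCCCD, then shift right by 35.
    static int divideBy10(int n) {
        return (int) ((n * 0xCCCCCCCDL) >>> 35);
    }

    public static void main(String[] args) {
        // Spot-check against ordinary division over the non-negative range.
        for (long x = 0; x <= Integer.MAX_VALUE; x += 99_991) {
            int n = (int) x;
            if (divideBy10(n) != n / 10)
                throw new AssertionError("mismatch at " + n);
        }
        System.out.println("ok");
    }
}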
Henry Warren's book, Hacker's Delight, has a wealth of information on this topic, which is also covered quite well on the companion website:
http://www.hackersdelight.org/
See also a discussion (with a link or two) in:
Reading assembly code
Anyway, all this boils down to allowing the compiler to take care of the tedious details of micro-optimizations. It's been years since doing your own shifts outsmarted the compiler.
Humans are wrong in these cases:
99% of the time when they try to second-guess modern (and all future) compilers.
99.9% of the time when they try to second-guess modern (and all future) JITs at the same time.
99.999% of the time when they try to second-guess modern (and all future) CPU optimizations.
Program in a way that accurately describes what you want to accomplish, not how to do it. Future versions of the JIT, VM, compiler, and CPU can all be independently improved and optimized. If you specify something so tiny and specific, you lose the benefit of all future optimizations.
You can almost certainly depend on a multiplication by a literal power of two being optimised to a shift operation. This is one of the first optimisations that students of compiler construction will learn. :)
However, I don't think there's any guarantee for this. Your source code should reflect your intent, rather than trying to tell the optimiser what to do. If you're making a quantity larger, use multiplication. If you're moving a bit field from one place to another (think RGB colour manipulation), use a shift operation. Either way, your source code will reflect what you are actually doing.
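For the bit-field case, the shift really is the honest expression of intent. A typical example, assuming a pixel packed as 0xAARRGGBB in an int (as in java.awt.Color or similar APIs):

int argb = 0xFF336699;                 // packed pixel: alpha, red, green, blue
int r = (argb >> 16) & 0xFF;           // extract the red channel
int g = (argb >> 8) & 0xFF;            // extract the green channel
int b = argb & 0xFF;                   // extract the blue channel
int repacked = (0xFF << 24) | (r << 16) | (g << 8) | b;   // put it back together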
Note that shifting down and division will (in Java, certainly) give different results for negative, odd numbers.
int a = -7;
System.out.println("Shift: "+(a >> 1));
System.out.println("Div: "+(a / 2));
Prints:
Shift: -4
Div: -3
Since Java doesn't have any unsigned numbers it's not really possible for a Java compiler to optimise this.
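For what it's worth, the discrepancy above is just a rounding-direction issue: division truncates toward zero while an arithmetic shift rounds toward negative infinity, so a shift plus a sign correction reproduces x / 2 exactly. This is only a sketch of the identity, not a claim about what any particular compiler or JIT emits.

int x = -7;
int viaShift = (x + (x >>> 31)) >> 1;  // add 1 only when x is negative, then shift
System.out.println(viaShift == x / 2); // true for every int value of x, including -7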
On computers I tested, integer divisions are 4 to 10 times slower than other operations.
While compilers may replace divisions by powers of 2 so that you see no difference, divisions by non-powers of 2 are significantly slower.
For example, I have a (graphics) program with many many many divisions by 255.
Actually my computation is:
r = (((top.R - bottom.R) * alpha + (bottom.R * 255)) * 0x8081) >> 23;
I can assure you that it is a lot faster than my previous computation:
r = ((top.R - bottom.R) * alpha + (bottom.R * 255)) / 255;
so no, compilers cannot do all the tricks of optimization.
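If you want to convince yourself that the magic-number form above really matches the plain division over the inputs it can see, a brute-force check is cheap. This small sketch assumes the colour channels and alpha are all in 0..255, so the numerator stays within 0..65025.

public class Check255 {
    public static void main(String[] args) {
        // Verify (n * 0x8081) >> 23 == n / 255 over the whole possible numerator range.
        for (int n = 0; n <= 255 * 255; n++) {
            if ((n * 0x8081) >> 23 != n / 255)
                throw new AssertionError("mismatch at " + n);
        }
        System.out.println("(n * 0x8081) >> 23 == n / 255 for the whole range");
    }
}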
I would ask "what are you doing that it would matter?". First design your code for readability and maintainability. The likelihood that doing bit shifting versus standard multiplication will make a performance difference is EXTREMELY small.
It is hardware dependent. If we are talking micro-controller or i386, then shifting might be faster but, as several answers state, your compiler will usually do the optimization for you.
On modern (Pentium Pro and beyond) hardware, pipelining makes this totally irrelevant, and straying from the beaten path usually means you lose a lot more optimizations than you can gain.
Micro optimizations are not only a waste of your time, they are also extremely difficult to get right.
If the compiler (compile-time constant) or JIT (runtime constant) knows that the divisor or multiplicand is a power of two and integer arithmetic is being performed, it will convert it to a shift for you.
According to the results of this microbenchmark, shifting is twice as fast as dividing (Oracle Java 1.7.0_72).
Most compilers will turn multiplication and division into bit shifts when appropriate. It is one of the easiest optimizations to do. So, you should do what is more easily readable and appropriate for the given task.
I am stunned as I just wrote this code and realized that shifting by one is actually slower than multiplying by 2!
(EDIT: changed the code to stop overflowing after Michael Myers' suggestion, but the results are the same! What is wrong here?)
import java.util.Date;

public class Test {
    public static void main(String[] args) {
        Date before = new Date();
        for (int j = 1; j < 50000000; j++) {
            int a = 1;
            for (int i = 0; i < 10; i++) {
                a *= 2;
            }
        }
        Date after = new Date();
        System.out.println("Multiplying " + (after.getTime() - before.getTime()) + " milliseconds");
        before = new Date();
        for (int j = 1; j < 50000000; j++) {
            int a = 1;
            for (int i = 0; i < 10; i++) {
                a = a << 1;
            }
        }
        after = new Date();
        System.out.println("Shifting " + (after.getTime() - before.getTime()) + " milliseconds");
    }
}
The results are:
Multiplying 639 milliseconds
Shifting 718 milliseconds
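One likely explanation for such counter-intuitive numbers is the benchmark itself: nothing uses the value of a, so the JIT is free to remove some or all of the inner loop, and there is no warm-up, so part of the measured time is interpretation and compilation. A slightly more defensible variant (still crude, and my own modification rather than the original poster's code) keeps the results alive and warms both paths up first:

public class Test2 {
    static long multiply() {
        long sink = 0;
        for (int j = 1; j < 50_000_000; j++) {
            int a = 1;
            for (int i = 0; i < 10; i++) a *= 2;
            sink += a; // consume the result so the loop cannot be removed
        }
        return sink;
    }

    static long shift() {
        long sink = 0;
        for (int j = 1; j < 50_000_000; j++) {
            int a = 1;
            for (int i = 0; i < 10; i++) a = a << 1;
            sink += a;
        }
        return sink;
    }

    public static void main(String[] args) {
        long sink = multiply() + shift();   // warm-up so the JIT compiles both methods

        long t0 = System.nanoTime();
        sink += multiply();
        long t1 = System.nanoTime();
        sink += shift();
        long t2 = System.nanoTime();

        System.out.println("Multiplying " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("Shifting " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println(sink);
    }
}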
How can I measure the performance of my Java algorithm effectively? Is there any accurate way of doing it?
I read other questions of the same kind but was not satisfied.
Any help would be appreciated.
long reference=System.nanoTime();
your_funct();
long finishm=System.nanoTime();
System.out.println( ( (double)(finishm-reference) )/1000000000.0); //in seconds
This has a meaningful resolution of about 0.003 seconds on my machine. I mean, you measure in nanoseconds, but the smallest step is around 3,000,000 nanoseconds on my machine.
You ask for performance which indicates some sort of timing. But then what would you compare against?
A general way of measuring an algorithm is using Big O, which takes a simplified mathematical approach.
To explain this at a very basic level, a simple linear search of a list of integers has a linear (n) worst-case Big O. E.g.:
for (int i = 0; i < sizeofarray; ++i)
    if (array[i] == to_find)
        return i;
In the worst case this would take n iterations (the count is often referred to as n in Big O), so we call it an n, or linear-complexity, algorithm.
Something like a bubble sort algorithm is a loop within a loop, so we have n * n complexity = n^2, or quadratic complexity.
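For comparison, here is what the loop-within-a-loop shape looks like for a bubble sort (a standard textbook version, not taken from any particular source above):

static void bubbleSort(int[] a) {
    for (int pass = 0; pass < a.length - 1; pass++) {       // outer loop: up to n - 1 passes
        for (int i = 0; i < a.length - 1 - pass; i++) {      // inner loop: shrinking forward scan
            if (a[i] > a[i + 1]) {                            // swap out-of-order neighbours
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }
}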
Comparing like with like, if we just consider sorting, quicksort is more efficient than quadratic complexity (it is n log n complexity) so you can consider quicksort being 'better' than bubblesort.
So when evaluating your algorithm, think about it in terms of n. Is there a loop? How many? The fewer the better. No loops is even better: constant Big O.
You can use some profilers. Many IDEs (such as Netbeans) have one.
You can make it practical or theoretical. If practical, then put a timer before the algorithm starts and stop it when it ends. If theoretical, then use Big O notation (it isn't that hard) and you'll get an estimate of its time or space complexity.
The best way to do it is still java.lang.System.currentTimeMillis() as it will work regardless the IDE you are using.
For the given code (I am using just one from my previous questions), the running time using O notation is O(n^2). If I want to express the running time using Theta notation would it be the same? Meaning Theta(n^2)?
for (int i = 0; i < N; i++) {
    for (int j = 1; j < N; j++) {
        System.out.println("Yayyyy");
        if (i <= j) {
            System.out.println("Yayyy not");
        }
    }
}
In essence:
Big O notation is for UPPER bounds on running time. This means that most algorithms have several Big O bounds (your algorithm is, for example, also O(n^23), because it is far more efficient than a Theta(n^23) algorithm).
Theta notation is for tight bounds. Not all algorithms have a clearly defined tight bound, because that would require the running time to grow proportionally with the bounding function. In your example, there is no way the algorithm can finish without having printed "Yayyy not" roughly (n^2 - n)/2 times, and it will never print it more than on the order of this number of times, so the work always grows proportionally with n^2 and thus has a Theta(n^2) bound!
To make this short and palatable, BigO(n^2) means that your algorithm will not take longer than ~n^2 time. BigTheta(n^2) means that your algorithm will not take longer than ~n^2 time, and it will not take less than ~n^2 time.
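Spelled out, the two definitions in play here are (standard textbook statements, nothing specific to this code):

f(n) = O(g(n))      means: there exist c > 0 and n0 such that f(n) <= c * g(n) for all n >= n0 (an upper bound only).
f(n) = Theta(g(n))  means: there exist c1, c2 > 0 and n0 such that c1 * g(n) <= f(n) <= c2 * g(n) for all n >= n0 (bounded on both sides).

Here the number of "Yayyy not" prints grows like (n^2 - n)/2, so choosing c1 = 1/4, c2 = 1, and n0 = 2 satisfies both inequalities, giving Theta(n^2).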
Is it recommended to count in small loops (where possible) down from length - 1 to zero
instead of counting up to length - 1?
1.) Counting down
for (int i = a.length - 1; i >= 0; i--) {
    if (a[i] == key) return i;
}
2.) Counting up
for (int i = 0; i < a.length; i++) {
    if (a[i] == key) return i;
}
The first one is slightly faster than the second one (because comparing to zero is faster), but it is a little more error-prone in my opinion. Besides, the first one might not benefit from future improvements of the JVM. Any ideas on that?
If you store the result of a.length in a variable, the count-down version won't be any "faster", if it even is to begin with. In any event, it is rarely worth worrying about the performance of such a trivial operation. Focus on the readability of the method.
For me, counting up is more readable.
In my opinion, it's far better to favour convention and readability (in this case, the count-up approach) over preemptive optimisation. According to Josh Bloch, it's better not to optimise your code until you are sure that optimisation is required.
Counting downwards tends to be slower, despite the possibility of dropping one machine code instruction. In the modern day, performance isn't that simple. Compilers have optimisations geared towards forward loops, so your reverse loop may miss out on them. Cache hardware is designed for normal forward scanning. So don't worry about this sort of micro-optimisation (and if you ever find yourself in a situation where you really need to, measure).
I would recommend that you make sure you have a benchmark showing that this is a performance issue before making changes like this. I'd go for the most readable version any day (in my opinion it's the one counting upwards).
If you are into micro optimizations and don't trust the compiler to do the right thing, maybe you should consider caching a.length in a variable in the second loop to avoid an indirection as well.
I'd say that if there is a reason to count one way vs. the other (say, the order of the items in the list), then don't twist yourself in a knot trying to go with convention; if there isn't, make it easier for the next person to work on it and just go with convention (from experience: count up).
Comparing to 0 vs. comparing to int shouldn't really be a concern...
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil", Donald Knuth.
In this case I would argue that any possible performance gain would be outweighed by the loss of readability alone. Programmer hours are much more expensive than cpu hours.
P.S.: To further improve performance you should consider testing for inequality to zero. But watch out for empty arrays ;)
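If you do want the compare-against-zero form, the usual safe idiom is a post-decrement in the loop condition, which naturally handles empty arrays (just a sketch of the idiom, not a recommendation over the plain forward loop):

for (int i = a.length; i-- != 0; ) {   // i runs a.length-1 .. 0; the body never runs for an empty array
    if (a[i] == key) return i;
}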