Is there any difference in putting the most probable condition in the if, the else-if, or the else branch?
For example:
int[] a = {2, 4, 6, 9, 10, 0, 30, 0, 31, 66};
int firstCase = 0, secondCase = 0, thirdCase = 0;
for (int i = 0; i < 10; i++) {
    int m = a[i] % 5;
    if (m < 3) {
        firstCase++;
    } else if (m == 3) {
        secondCase++;
    } else {
        thirdCase++;
    }
}
What is the difference in execution time with the input
int[] a = {3, 6, 8, 7, 0, 0, 0, 0, 0, 0};
Actually, the answer with Java is that "it depends".
You see, when you run Java code, the JVM starts out using the interpreter while gathering statistics. One of the statistics that may be recorded is which path of a branch instruction is most often taken. These statistics can then be used by the JIT compiler to influence code reordering, where this does not alter the compiled code's semantics.
So if you were to execute your code, with two different datasets (i.e. "mostly zero" and "mostly non-zero"), it is possible that the JIT compiler would compile the code differently.
Whether it can actually make this optimization depends on whether it can figure out that the reordering is valid. For example, can it deduce that the conditions being tested are mutually exclusive?
So how does this affect the complexity? Well ... let's do the sums for your simplified example, assuming that the JIT compiler doesn't do anything "smart". And assume that we are not just dealing with arrays of length 10 (which would render the discussion of complexity moot).
Consider this:
For each zero, the loop does one test and one increment - say 2 operations.
For each non-zero element, the loop does two tests and one increment - say 3 operations.
So that is roughly 2*N operations for N elements when all are zero, versus 3*N operations when all are non-zero. But both are O(N) ... so the Big O complexity is not affected.
(OK I left some stuff out ... but you get the picture. One of the cases is going to be faster, but the complexity is not affected.)
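For what it's worth, here is a rough timing sketch (my own illustration, not from the question; the class name, array sizes, and warm-up count are arbitrary) that runs the same branch over an all-zero and an all-non-zero array. Both runs grow linearly with N; only the constant factor differs:

public class BranchTiming {
    static int run(int[] a) {
        int first = 0, second = 0, third = 0;
        for (int v : a) {
            int m = v % 5;
            if (m < 3) {
                first++;
            } else if (m == 3) {
                second++;
            } else {
                third++;
            }
        }
        return first + second + third; // use the counts so the JIT can't discard the loop
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        int[] allZero = new int[n];                    // m == 0: always satisfies the first test
        int[] allNonZero = new int[n];
        for (int i = 0; i < n; i++) allNonZero[i] = 4; // 4 % 5 == 4: always falls through to the else

        for (int warm = 0; warm < 5; warm++) {         // give the JIT time to compile and profile
            run(allZero);
            run(allNonZero);
        }

        long t0 = System.nanoTime();
        int r1 = run(allZero);
        long t1 = System.nanoTime();
        int r2 = run(allNonZero);
        long t2 = System.nanoTime();
        System.out.println("all zero:     " + (t1 - t0) / 1_000_000 + " ms (count " + r1 + ")");
        System.out.println("all non-zero: " + (t2 - t1) / 1_000_000 + " ms (count " + r2 + ")");
    }
}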
There's a bit more to this than you're being told.
1. 'if' versus 'else': If a condition and its converse are not equally likely, you should handle the more likely condition in the 'else' block, not the 'if' block. The 'if' block requires a conditional branch which isn't taken plus a final branch around the 'else' block; the 'else' block requires a conditional branch which is taken and no final branch at all.
2. 'if' versus 'else if' versus 'else': Obviously you should handle the most common case in the 'if' block, to avoid the second test. The same considerations as at (1) determine that the more common case, as between the final 'else if' and the final 'else', should be handled in the final 'else' block.
Having said all that, unless the tests are non-trivial, or the contents of all these blocks are utterly trivial, it is rather unlikely that any of it will make a discernible difference.
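As a concrete illustration of the ordering advice (a sketch of my own; the handler names and the stated probabilities are hypothetical):

public class BranchOrder {
    // Hypothetical handlers, named for illustration only.
    static void handleZero() {}
    static void handlePositive() {}
    static void handleNegative() {}

    static void dispatch(int v) {
        // Suppose profiling shows v == 0 about 90% of the time,
        // and v > 0 more often than v < 0.
        if (v == 0) {            // most common case: pays one test
            handleZero();
        } else if (v > 0) {      // next most common case: pays two tests
            handlePositive();
        } else {                 // rarest case falls through to the else
            handleNegative();
        }
    }
}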
There is no difference if you only have an if-else, since the condition is always evaluated and it does not matter whether it is almost always true or false. However, if you have an if in the else part (an else if), it is much better to put the most likely condition in the first if. That way, most of the time you won't need to evaluate the condition inside the else, which improves performance.
If the most common case is handled in the if, the execution time will be lowest, because only the first condition has to be evaluated.
If it is handled in the else if, the execution time falls between the other two scenarios.
If it is handled in the else, the execution time will be highest, because the first two conditions have to be checked before reaching it.
Sure it is.
if ... else if ... checks are evaluated in the order in which they were coded. So if you place the most likely condition at the end of the chain, the code will run slightly slower.
But it all depends on how the conditions are built (how complex they are).
The most likely condition should go in the if, the next most likely in the else if, and so on.
It's good to write the most common condition at the very first level, so that it is evaluated first and resolved in less time.
If you put the most frequent condition in the middle (else if) or at the end (else), it takes longer to reach that statement, because every preceding condition has to be checked first.
I have a variable that gets read and updated thousands of times a second. It needs to be reset regularly, but "half" the time the value is already the reset value. Is it a good idea to check the value first (to see whether it needs resetting) before resetting it (a write operation), or should I just reset it regardless? The main goal is to optimize the code for performance.
To illustrate:
Random r = new Random();
int val = Integer.MAX_VALUE;
for (int i = 0; i < 100000000; i++) {
    if (i % 2 == 0)
        val = Integer.MAX_VALUE;
    else
        val = r.nextInt();
    if (val != Integer.MAX_VALUE) // skip this check?
        val = Integer.MAX_VALUE;
}
I tried to use the above program to test the two scenarios (by commenting and uncommenting the second if), but any difference is masked by the natural variance of the run time.
Thanks.
Don't check it.
It's more execution steps = more cycles = more time.
As an aside, you are breaking one of the basic software golden rules: "Don't optimise early". Unless you have hard evidence that this piece of code is a performance problem, you shouldn't be looking at it. (Note that this doesn't mean you code without performance in mind; you still follow normal best practice, but you don't add any special code whose only purpose is "performance related".)
The check has no actual performance impact. We'd be talking about a single clock cycle or something, which is usually not relevant in a Java program (as hard-core number crunching usually isn't done in Java).
Instead, base the decision on readability. Think of the maintainer who's going to change this piece of code five years on.
In the case of your example, using my rationale, I would skip the check.
Most likely the JIT will optimise the code away because it doesn't do anything.
Rather than worrying about performance, it is usually better to worry about what is
simpler to understand
cleaner to implement
In both cases, you might remove the code as it doesn't do anything useful and it could make the code faster as well.
Even if it did make the code a little slower it would be very small compared to the cost of calling r.nextInt() which is not cheap.
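If you do want a trustworthy measurement despite the dead-code elimination and run-to-run variance mentioned above, a harness such as JMH helps. Here is a minimal sketch, assuming JMH is available on the classpath (the class and method names are my own):

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class ResetBenchmark {
    private final Random r = new Random();
    private int val = Integer.MAX_VALUE;

    @Benchmark
    public int alwaysReset() {
        val = (r.nextInt(2) == 0) ? Integer.MAX_VALUE : r.nextInt();
        val = Integer.MAX_VALUE;          // unconditional write
        return val;                       // returning the value defeats dead-code elimination
    }

    @Benchmark
    public int checkThenReset() {
        val = (r.nextInt(2) == 0) ? Integer.MAX_VALUE : r.nextInt();
        if (val != Integer.MAX_VALUE) {   // read first, write only when needed
            val = Integer.MAX_VALUE;
        }
        return val;
    }
}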
Suppose I have an IF condition :
if (A || B)   // A is the operand on the left
{
    // do something
}
Now suppose that A is more likely to be true than B. Why do I care which one is on the left?
If I put both of them in the IF brackets, then I know (as the programmer of the code) that both parts are needed.
The thing is, my professor wrote in his lecture notes that I should put the variable "more likely to be true" on the left.
Can someone please explain the benefit? Okay, I put it on the left ... what am I gaining? Run time?
It's not just about choosing the condition most likely to be true for the left side. You can also have a safeguard on the left, meaning only one order is possible. Consider
if (s == null || s.length() == 0) // if the String is null or empty.
You can't swap the order here as the first condition protects the second from throwing an NPE.
Similarly you can have
if (s != null && s.length() > 0) // if the String is not empty
The reason for choosing the condition most likely to be true for ||, or most likely to be false for &&, is a micro-optimisation: it avoids the cost of evaluating the second expression. Whether this translates into a measurable performance difference is debatable.
I put it on the left ... what am I gaining ? run time ?
Because the || operator in C++ uses short-circuit evaluation.
i.e. B is evaluated only if A evaluates to false.
However, note that in C++ short-circuit evaluation is guaranteed only for "built in" data types, not for custom data types (which may overload the operator).
As per the javadoc:
The && and || operators perform Conditional-AND and Conditional-OR operations on two boolean expressions. These operators exhibit "short-circuiting" behavior, which means that the second operand is evaluated only if needed
So if the first operand of || evaluates to true, the second operand is short-circuited (never evaluated) at runtime.
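To see the short-circuiting directly, here is a small self-contained demo (the predicate names and printouts are mine, purely for illustration):

public class ShortCircuitDemo {
    // Hypothetical predicates with visible side effects, for illustration.
    static boolean left()  { System.out.println("left evaluated");  return true; }
    static boolean right() { System.out.println("right evaluated"); return true; }

    public static void main(String[] args) {
        if (left() || right()) {
            System.out.println("taken");
        }
        // Prints "left evaluated" then "taken". right() is never called,
        // because the first operand of || already determined the result.
    }
}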
If the expression on the left is true, there is no need to evaluate the expression on the right, so it is skipped at run time. This technique is called short-circuiting. So by placing the expression more likely to be true on the left, we can expect our program to perform better than if it were the other way around.
You should place the condition that is more likely to be true first, because that causes the if statement to short-circuit: it will not evaluate the rest of the condition, since the answer is already known to be true. This makes the code more efficient.
This is especially useful when your if statement is evaluating expensive things:
if(doExpensiveCheck1() || doExpensiveCheck2()) { }
In this case, because the checks are expensive, it is to your benefit to place the most likely one first.
In many cases there is no practical difference apart from a tiny performance improvement. Where this becomes useful is when your checks are very expensive function calls (unlikely) or you need to check things in order. Say, for example, you want to check a property on something, and you need to check that the something is not nil first; you might do something like:
if (a != nil && a.attribute == valid)
{}
Yes, exactly: you're gaining run time. It won't seem like much for one operation, but keep in mind that operations can be repeated millions of times.
The logic is: why perform two evaluations when one is enough?
At runtime, if (a || b) tests a first; if a is true, it does not waste time testing b, so execution is one step ahead. So if a is more likely to be true than b, the test is likely to skip one evaluation. The number of evaluations saved is tiny for a single statement, but it is huge when the statement is nested in a loop of some sort (for, while, recursion, or database-related queries). For example, say we have 1 million records to test in a database at 1 minute per record (30 seconds for condition A and 30 seconds for condition B), with A having an 80% chance of being true and B a 20% chance. If A is tested first, the total time is (0.8 x 1 million) x 0.5 min + (0.2 x 1 million) x 1 min = 600,000 minutes; if B is tested first, it is (0.2 x 1 million) x 0.5 min + (0.8 x 1 million) x 1 min = 900,000 minutes. Note, however, that the difference becomes less significant as the probability of A being true approaches that of B.
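The arithmetic generalizes. Writing $p$ for the probability that A is true, $q$ for the probability that B is true, and $t_A$, $t_B$ for the evaluation times (symbols introduced here just to formalize the example above), the expected cost per record is

\[
E[T_{\text{A first}}] = t_A + (1 - p)\,t_B, \qquad E[T_{\text{B first}}] = t_B + (1 - q)\,t_A .
\]

With $p = 0.8$, $q = 0.2$, and $t_A = t_B = 0.5$ min, that gives $0.6$ min versus $0.9$ min per record, i.e. the 600,000 versus 900,000 minutes above for $10^6$ records.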
public class Main {
    public static void main(String[] args) {
        System.out.println("Hello World");
        Integer a = null;
        Integer b = 3;
        Integer c = 5;
        if (a != null && a == 2) {
            System.out.println("both");
        } else {
            System.out.println("false");
        }
    }
}

Output:
Hello World
false
I wrote some code that looks similar to the following:
String SKIP_FIRST = "foo";
String SKIP_SECOND = "foo/bar";
int skipFooBarIndex(String[] list){
int index;
if (list.length >= (index = 1) && list[0].equals(SKIP_FIRST) ||
list.length >= (index = 2) &&
(list[0] + "/" + list[1]).equals(SKIP_SECOND)){
return index;
}
return 0;
}
String[] myArray = "foo/bar/apples/peaches/cherries".split("/");
print(skipFooBarIndex(myArray);
This changes state inside of the if statement by assigning index. However, my coworkers disliked this very much.
Is this a harmful practice? Is there any reason to do it?
Yes. This clearly reduces readability. What's wrong with the following code?
int skipFooBarIndex(String[] list) {
    if (list.length >= 1 && list[0].equals(SKIP_FIRST))
        return 1;
    if (list.length >= 2 && (list[0] + "/" + list[1]).equals(SKIP_SECOND))
        return 2;
    return 0;
}
It's much easier to understand. In general, having side effects in expressions is discouraged as you'll be relying on the order of evaluation of subexpressions.
Assuming you count it as "clever" code, it's good to always remember Brian Kernighan's quote:
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
...However, my coworkers disliked this very much...
Yes, it is. Just because you can code it like that doesn't mean you should.
Remember that that piece of code will eventually have to be maintained by someone (and that someone may be yourself in 8 months).
Changing state inside the if makes it harder to read and understand (mostly because it is uncommon).
Quoting Martin Fowler:
Any fool can write code that a computer can understand. Good programmers write code that humans can understand
There's an excellent reason not to do it: it makes your code really hard to understand and reason about.
The problem is that the code would generate multiple WTFs in a code review session. Anything that makes people go "wait, what?" has got to go.
It's sadly easy enough to create bugs even in easy-to-read code. No reason to make it even easier.
Yes, side effects are hard to follow when reviewing code.
Regarding reasons to do it: no, there is no real reason. I haven't yet stumbled upon an if statement that couldn't be rewritten without side effects at no loss.
The only thing wrong with it is that it's unfamiliar and confusing to people who didn't write it, at least for a minute while they figure it out. I would probably write it like this to make it more readable:
if (list.length >= 1 && list[0].equals(SKIP_FIRST)) {
    return 1;
}
if (list.length >= 2 && (list[0] + "/" + list[1]).equals(SKIP_SECOND)) {
    return 2;
}
return 0;
Borrowed from cppreference.com:
One important aspect of C++ that is related to operator precedence is the order of evaluation and the order of side effects in expressions. In some circumstances, the order in which things happen is not defined. For example, consider the following code:
float x = 1;
x = x / ++x;
The value of x is not guaranteed to be consistent across different compilers, because it is not clear whether the computer should evaluate the left or the right side of the division first. Depending on which side is evaluated first, x could take a different value.
Furthermore, while ++x evaluates to x+1, the side effect of actually storing that new value in x could happen at different times, resulting in different values for x.
The bottom line is that expressions like the one above are horribly ambiguous and should be avoided at all costs. When in doubt, break a single ambiguous expression into multiple expressions to ensure that the order of evaluation is correct.
Is this a harmful practice?
Absolutely yes. The code is hard to understand. It takes two or three reads for anyone but the author. Any code that is hard to understand and that can be rewritten in a simpler way that is easier to understand SHOULD be rewritten that way.
Your colleagues are absolutely right.
Is there any reason to do it?
The only possible reason for doing something like that is that you have extensively profiled the application and found this part of code to be a significant bottleneck. Then you have implemented the abomination above, rerun the profiler, and found that it REALLY improves the performance.
Well, I spent some time reading the above without realising what was going on. So I would definitely suggest that it's not ideal. I wouldn't really ever expect the if() statement itself to change state.
I wouldn't recommend an if condition having side-effects without a very good reason. For me, this particular example took several looks to figure out what was going on. There may be a case where it isn't so bad, although I certainly can't think of one.
Ideally, each piece of code should do one thing. Making it do more than one thing is potentially confusing, and confusing is exactly what you don't want in your code.
The code in the condition of an if statement is supposed to generate a boolean value. Tasking it with assigning a value is making it do two things, which is generally bad.
Moreover, people expect conditions to be just conditions, and they often glance over them when they're getting an impression of what the code is doing. They don't carefully parse everything until they decide they need to.
Stick that in code I'm reviewing and I'll flag it as a defect.
You can also use a ternary to avoid multiple returns:
int skipFooBarIndex(String[] list) {
    return (list.length > 0 && list[0].equals(SKIP_FIRST)) ? 1 :
           ((list.length > 1 && (list[0] + "/" + list[1]).equals(SKIP_SECOND)) ? 2 : 0);
}
Though this example is less readable.
Speaking as someone who does a lot of maintenance programming: if I came across this I would curse you, weep and then change it.
Code like this is a nightmare - it screams one of two things:
I'm new here and I need help doing the right thing.
I think I am very clever because I have saved lines of code, or I have fooled the compiler and made it quicker. It's not clever, it's not optimal, and it's not funny.
;)
In C it's fairly common to change state inside if statements. Generally speaking, I find there are a few unwritten rules about where this is acceptable, for example:
You are reading into a variable and checking the result:
int a;
...
if ((a = getchar()) == 'q') { ... }
Incrementing a value and checking the result:
int *a = (int *)0xdeadbeef;
...
if (5 == *(a++)) { ... }
And when it is not acceptable:
You are assigning a constant to a variable:
int a;
...
if (a = 5) { ... } // this is almost always unintentional
Mixing and matching pre- and post-increment, and short-circuiting:
int a = 0, b;
...
if (b || a++) { ... } // BAD!
There are situations where assignment inside if expressions is both sensible and clear.
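Java has a close analogue of the read-and-check idiom: assignment inside the loop condition when reading from a stream. A minimal, self-contained sketch (the input string is just for illustration):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLoop {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("a\nb\nc"));
        String line;
        // The assignment inside the condition is the widely recognized idiom here:
        // read a line and test for end-of-stream in a single step.
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
    }
}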
I am writing Java code that branches based on whether a string starts with certain characters, while looping through a dataset that is expected to be large.
I was wondering whether startsWith is faster than indexOf. I experimented with 2000 records but found no difference.
startsWith only needs to check for the presence at the very start of the string - it's doing less work, so it should be faster.
My guess is that your 2000 records finished in a few milliseconds (if that). Whenever you want to benchmark one approach against another, try to do it for enough time that differences in timing will be significant. I find that 10-30 seconds is long enough to show significant improvements, but short enough to make it bearable to run the tests multiple times. (If this were a serious investigation I'd probably try for longer times. Most of my benchmarking is for fun.)
Also make sure you've got varied data - indexOf and startsWith should have roughly the same running time in the case where indexOf returns 0. So if all your records match the pattern, you're not really testing correctly. (I don't know whether that was the case in your tests of course - it's just something to watch out for.)
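To make that concrete, here is a rough sketch of such a test (my own illustration, not a rigorous benchmark; the names and sizes are arbitrary), using varied data so that roughly half the strings match the prefix:

import java.util.Random;

public class PrefixBench {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        String[] data = new String[1_000_000];
        for (int i = 0; i < data.length; i++) {
            // Roughly half the strings start with the prefix, half do not.
            data[i] = (rnd.nextBoolean() ? "foo" : "bar") + "/" + rnd.nextInt();
        }

        int hits = 0;
        long t0 = System.nanoTime();
        for (String s : data) if (s.startsWith("foo")) hits++;
        long t1 = System.nanoTime();
        for (String s : data) if (s.indexOf("foo") == 0) hits++;
        long t2 = System.nanoTime();

        System.out.println("startsWith: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("indexOf:    " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println(hits); // use the result so the loops aren't optimized away
    }
}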
In general, the golden rule of micro-optimization applies here:
"Measure, don't guess".
As with all optimizations of this type, the difference between the two calls almost certainly won't matter unless you are checking millions of strings that are each tens of thousands of characters long.
Run a profiler over your code, and only optimize this call when you can measure that it's slowing you down. Till then, go with the more readable options (startsWith, in this case). Once you know that this block is slowing you down, try both and use whichever is faster. Rinse. Repeat ;-)
Academically, my guess is that startsWith will likely be implemented using indexOf. Check the source code if you're interested. (It turns out that startsWith does not call indexOf.)
Even without looking into the sources, it should be clear that startsWith() is faster, at least for large strings and short patterns.
The running time of a.startsWith(b) is bounded by the length of b: after at most the first |b| characters are checked, the search is finished.
The running time of a.indexOf(b) is larger (depending on the actual algorithm). Every algorithm's running time depends at least on the length of a: roughly speaking, you have to look at each character of a once to check whether the pattern starts at that position.
However, as always, it depends on the actual use case if you really see a difference in practice. Measuring the difference in real life is never bad.
Probably, because if it doesn't match, startsWith can stop looking immediately, whereas indexOf needs to keep looking for occurrences later in the string.
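That intuition is easy to demonstrate with a pathological input (a sketch of my own; the sizes are arbitrary): a long string where the pattern appears only at the very end.

public class WorstCase {
    public static void main(String[] args) {
        // Build a long string of 'a's with a single 'b' at the end:
        // startsWith gives up after one comparison, indexOf scans everything.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000_000; i++) sb.append('a');
        sb.append('b');
        String s = sb.toString();

        long t0 = System.nanoTime();
        boolean starts = s.startsWith("b"); // false after a single comparison
        long t1 = System.nanoTime();
        int idx = s.indexOf("b");           // scans ~10 million characters
        long t2 = System.nanoTime();

        System.out.println("startsWith: " + (t1 - t0) + " ns -> " + starts);
        System.out.println("indexOf:    " + (t2 - t1) / 1_000_000 + " ms -> " + idx);
    }
}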
startsWith is clearer than indexOf == 0.
Have you identified the test as a performance bottleneck for which you need to sacrifice readability?
public class Test {
    public static void main(String[] args) {
        long value1 = System.currentTimeMillis();
        for (long i = 0; i < 100000000; i++) {
            "abcd".indexOf("a");
        }
        long value2 = System.currentTimeMillis();
        System.out.println(value2 - value1);

        value1 = System.currentTimeMillis();
        for (long i = 0; i < 100000000; i++) {
            "abcd".startsWith("a");
        }
        value2 = System.currentTimeMillis();
        System.out.println(value2 - value1);
    }
}
Tested it with this piece of code, and performance for startsWith seems to be better, for the obvious reason that it doesn't have to traverse the whole string. In the best case the two should perform about the same, while in the worst case startsWith will always perform better than indexOf.
You mentioned the dataset is expected to be large. So I would bet that most of the performance cost goes into accessing the dataset and handling it in memory, which means that using one or the other will not change performance significantly. But if it is important to you, you could write your own startsWith method, which could be significantly faster than the standard library methods, or at least you would know exactly what it does.
Unfortunately, startsWith does not work as you'd suppose! It uses indexOf behind the scenes (lazy developers :D), so indexOf is 10x faster than the startsWith implementation.