SonarQube thinks the following code violates rule fb-contrib:SEO_SUBOPTIMAL_EXPRESSION_ORDER (note, the code example is simplified and not logical):
class Foo {
boolean baz;
boolean foo() {
return bar() && baz==Baz.VALUE; //Violation of fb-contrib:SEO_SUBOPTIMAL_EXPRESSION_ORDER
}
boolean bar() {
return baz == Baz.VALUE_2;
}
}
enum Baz {
VALUE, VALUE2
}
Performance - Method orders expressions in a conditional in a sub optimal way
(fb-contrib:SEO_SUBOPTIMAL_EXPRESSION_ORDER)
This method builds a conditional expression, for example, in an if or while statement where the expressions contain both simple local variable comparisons, as well as comparisons on method calls. The expression orders these so that the method calls come before the simple local variable comparisons. This causes method calls to be executed in conditions when they do not need to be, and thus potentially causes a lot of code to be executed for nothing. By ordering the expressions so that the simple conditions containing local variable conditions are first, you eliminate this waste. This assumes that the method calls do not have side effects. If the method do have side effects, it is probably a better idea to pull these calls out of the condition and execute them first, assigning a value to a local variable. In this way you give a hint that the call may have side effects.
I think it's reasonable for the rule implementation to look inside the actual expression and if the content is a value check then not trigger a violation.
Is this a bug or am I missing something?
You almost gave the answer in your question already:
This causes method calls to be executed in conditions when they do not need to be, and thus potentially causes a lot of code to be executed for nothing. By ordering the expressions so that the simple conditions containing local variable conditions are first, you eliminate this waste.
FB-Contrib wants you to turn around the expression:
boolean foo() {
return baz==Baz.VALUE && bar();
}
This way, bar() needs to be executed only if baz==Baz.value. Of course this is something which the compiler or the JVM might optimize, so it would come down to a micro-benchmark to figure out if this precaution is actually necessary.
But it makes sense on a syntactical level, because calling a method is generally more expensive than a value check. So you don't need to look inside the method to tell that. Any guesses on the inlining behavior of the compiler / JVM are probably wrong anyway.
Related
I've always assumed every time I call a method in Java the method is executed again. I've assumed the return value is not stored automatically unless I store it in a variable.
Then I ran across this code in Princeton's algs4.BST class where they call three methods twice each:
private boolean check() {
if (!isBST()) StdOut.println("Not in symmetric order");
if (!isSizeConsistent()) StdOut.println("Subtree counts not consistent");
if (!isRankConsistent()) StdOut.println("Ranks not consistent");
return isBST() && isSizeConsistent() && isRankConsistent();
}
Are they simply not concerned with performance? Or is the compiler smart enough keep the first return value of each method to use in the return statement?
Sorry if this is a duplicate seems like this answer should exist but I can't find it here or in Java docs. I found these (and others) but they don't answer my question:
Is this the cleanest way to repeat method call in Java?
How to call a method without repeat from other methods in java
The Java Language Specification explicitly and unconditionally states that evaluation of a method invocation expression involves executing the designated method's code. JLS 8 puts it this way:
At run time, method invocation requires five steps. First, a target
reference may be computed. Second, the argument expressions are
evaluated. Third, the accessibility of the method to be invoked is
checked. Fourth, the actual code for the method to be executed is
located. Fifth, a new activation frame is created, synchronization is
performed if necessary, and control is transferred to the method code.
(JLS 8, 15.12.4; emphasis added)
Thus, invoking a method a second time incurs its cost a second time, regardless of whether the same value can be expected to be computed. It is conceivable that JIT compilation could optimize that, but there are more considerations there than whether the same result is going to be computed, and you're anyway unlikely to see any JIT action triggered by just two invocations of a given method.
Bottom line: yes, the author of the code was simply not concerned with performance. They may have considered the implementation presented to be clearer than one that avoided redundant method invocations, or they may have had some other personal reason for the choice. This kind of disregard for practicalities is not uncommon in code serving academic purposes, such as that presented.
The check function in that code is only called in the context:
assert check();
Asserts are not normally enabled in production code, and if asserts are not enabled, the assert statement does absolutely nothing. So the check function will only be run in debugging runs. That fact does not give license to be arbitrarily inefficient, but it is common to make no attempt to optimize such code. The point of code run as an assertion is to be obviously and indisputably correct while it verifies an invariant, precondition or postcondition, and optimizations -- even trivial ones like saving a result in a local variable -- do not contribute to that goal.
##14.10. The assert Statement
An assertion is an assert statement containing a boolean expression. An assertion is either enabled or disabled. If an assertion is enabled, execution of the assertion causes evaluation of the boolean expression and an error is reported if the expression evaluates to false. If the assertion is disabled, execution of the assertion has no effect whatsoever.
I encountered a situation where a non-void method is missing a return statement and the code still compiles.
I know that the statements after the while loop are unreachable (dead code) and would never be executed. But why doesn't the compiler even warn about returning something? Or why would a language allow us to have a non-void method having an infinite loop and not returning anything?
public int doNotReturnAnything() {
while(true) {
//do something
}
//no return statement
}
If I add a break statement (even a conditional one) in the while loop, the compiler complains of the infamous errors: Method does not return a value in Eclipse and Not all code paths return a value in Visual Studio.
public int doNotReturnAnything() {
while(true) {
if(mustReturn) break;
//do something
}
//no return statement
}
This is true of both Java and C#.
Why would a language allow us to have a non-void method having an infinite loop and not returning anything?
The rule for non-void methods is every code path that returns must return a value, and that rule is satisfied in your program: zero out of zero code paths that return do return a value. The rule is not "every non-void method must have a code path that returns".
This enables you to write stub-methods like:
IEnumerator IEnumerable.GetEnumerator()
{
throw new NotImplementedException();
}
That's a non-void method. It has to be a non-void method in order to satisfy the interface. But it seems silly to make this implementation illegal because it does not return anything.
That your method has an unreachable end point because of a goto (remember, a while(true) is just a more pleasant way to write goto) instead of a throw (which is another form of goto) is not relevant.
Why doesn't the compiler even warn about returning something?
Because the compiler has no good evidence that the code is wrong. Someone wrote while(true) and it seems likely that the person who did that knew what they were doing.
Where can I read more about reachability analysis in C#?
See my articles on the subject, here:
ATBG: de facto and de jure reachability
And you might also consider reading the C# specification.
The Java compiler is smart enough to find the unreachable code ( the code after while loop)
and since its unreachable, there is no point in adding a return statement there (after while ends)
same goes with conditional if
public int get() {
if(someBoolean) {
return 10;
}
else {
return 5;
}
// there is no need of say, return 11 here;
}
since the boolean condition someBoolean can only evaluate to either true or false, there is no need to provide a return explicitly after if-else, because that code is unreachable, and Java does not complain about it.
The compiler knows that the while loop will never stop executing, hence the method will never finish, hence a return statement is not necessary.
Given your loop is executing on a constant - the compiler knows that it's an infinite loop - meaning the method could never return, anyway.
If you use a variable - the compiler will enforce the rule:
This won't compile:
// Define other methods and classes here
public int doNotReturnAnything() {
var x = true;
while(x == true) {
//do something
}
//no return statement - won't compile
}
The Java specification defines a concept called Unreachable statements. You are not allowed to have an unreachable statement in your code (it's a compile time error). You are not even allowed to have a return statement after the while(true); statement in Java. A while(true); statement makes the following statements unreachable by definition, therefore you don't need a return statement.
Note that while Halting problem is undecidable in generic case, the definition of Unreachable Statement is more strict than just halting. It's deciding very specific cases where a program definitely does not halt. The compiler is theoretically not able to detect all infinite loops and unreachable statements but it has to detect specific cases defined in the specification (for example, the while(true) case)
The compiler is smart enough to find out that your while loop is infinite.
So the compiler cannot think for you. It cannot guess why you wrote that code. Same stands for the return values of methods. Java won't complain if you don't do anything with method's return values.
So, to answer your question:
The compiler analyzes your code and after finding out that no execution path leads to falling off the end of the function it finishes with OK.
There may be legitimate reasons for an infinite loop. For example a lot of apps use an infinite main loop. Another example is a web server which may indefinitely wait for requests.
In type theory, there is something called the bottom type which is a subclass of every other type (!) and is used to indicate non-termination among other things. (Exceptions can count as a type of non-termination--you don't terminate via the normal path.)
So from a theoretical perspective, these statements that are non-terminating can be considered to return something of Bottom type, which is a subtype of int, so you do (kind of) get your return value after all from a type perspective. And it's perfectly okay that it doesn't make any sense that one type can be a subclass of everything else including int because you never actually return one.
In any case, via explicit type theory or not, compilers (compiler writers) recognize that asking for a return value after a non-terminating statement is silly: there is no possible case in which you could need that value. (It can be nice to have your compiler warn you when it knows something won't terminate but it looks like you want it to return something. But that's better left for style-checkers a la lint, since maybe you need the type signature that way for some other reason (e.g. subclassing) but you really want non-termination.)
There is no situation in which the function can reach its end without returning an appropriate value. Therefore, there is nothing for the compiler to complain about.
Visual studio has the smart engine to detect if you have typed a return type then it should have a return statement with in the function/method.
As in PHP Your return type is true if you have not returned anything. compiler get 1 if nothing has returned.
As of this
public int doNotReturnAnything() {
while(true) {
//do something
}
//no return statement
}
Compiler know that while statement itself has a infinte nature so not to consider it. and php compiler will automatically get true if you write a condition in expression of while.
But not in the case of VS it will return you a error in the stack .
Your while loop will run forever and hence won't come outside while; it will continue to execute. Hence, the outside part of while{} is unreachable and there is not point in writing return or not. The compiler is intelligent enough to figure out what part is reachable and what part isn't.
Example:
public int xyz(){
boolean x=true;
while(x==true){
// do something
}
// no return statement
}
The above code won't compile, because there can be a case that the value of variable x is modified inside the body of while loop. So this makes the outside part of while loop reachable! And hence compiler will throw an error 'no return statement found'.
The compiler is not intelligent enough (or rather lazy ;) ) to figure out that whether the value of x will be modified or not. Hope this clears everything.
"Why doesn't the compiler even warn about returning something? Or why would a language allow us to have a non-void method having an infinite loop and not returning anything?".
This code is valid in all other languages too (probably except Haskell!). Because the first assumption is we are "intentionally" writing some code.
And there are situations that this code can be totally valid like if you are going to use it as a thread; or if it was returning a Task<int>, you could do some error checking based on the returned int value - which should not be returned.
I may be wrong but some debuggers allow modification of variables. Here while x is not modified by code and it will be optimized out by JIT one might modify x to false and method should return something (if such thing is allowed by C# debugger).
The specifics of the Java case for this (which are probably very similar to the C# case) are to do with how the Java compiler determines if a method is able to return.
Specifically, the rules are that a method with a return type must not be able to complete normally and must instead always complete abruptly (abruptly here indicating via a return statement or an exception) per JLS 8.4.7.
If a method is declared to have a return type, then a compile-time
error occurs if the body of the method can complete normally.
In other words, a method with a return type must return only by using
a return statement that provides a value return; it is not allowed to
"drop off the end of its body".
The compiler looks to see whether normal termination is possible based on the rules defined in JLS 14.21 Unreachable Statements as it also defines the rules for normal completion.
Notably, the rules for unreachable statements make a special case just for loops that have a defined true constant expression:
A while statement can complete normally iff at least one of the
following is true:
The while statement is reachable and the condition expression is not a
constant expression (§15.28) with value true.
There is a reachable break statement that exits the while statement.
So if the while statement can complete normally, then a return statement below it is necessary since the code is deemed reachable, and any while loop without a reachable break statement or constant true expression is considered able to complete normally.
These rules mean that your while statement with a constant true expression and without a break is never considered to complete normally, and so any code below it is never considered to be reachable. The end of the method is below the loop, and since everything below the loop is unreachable, so is the end of the method, and thus the method cannot possibly complete normally (which is what the complier looks for).
if statements, on the other hand, do not have the special exemption regarding constant expressions that are afforded to loops.
Compare:
// I have a compiler error!
public boolean testReturn()
{
final boolean condition = true;
if (condition) return true;
}
With:
// I compile just fine!
public boolean testReturn()
{
final boolean condition = true;
while (condition)
{
return true;
}
}
The reason for the distinction is quite interesting, and is due to the desire to allow for conditional compilation flags that do not cause compiler errors (from the JLS):
One might expect the if statement to be handled in the following
manner:
An if-then statement can complete normally iff at least one of the
following is true:
The if-then statement is reachable and the condition expression is not
a constant expression whose value is true.
The then-statement can complete normally.
The then-statement is reachable iff the if-then statement is reachable
and the condition expression is not a constant expression whose value
is false.
An if-then-else statement can complete normally iff the then-statement
can complete normally or the else-statement can complete normally.
The then-statement is reachable iff the if-then-else statement is
reachable and the condition expression is not a constant expression
whose value is false.
The else-statement is reachable iff the if-then-else statement is
reachable and the condition expression is not a constant expression
whose value is true.
This approach would be consistent with the treatment of other control
structures. However, in order to allow the if statement to be used
conveniently for "conditional compilation" purposes, the actual rules
differ.
As an example, the following statement results in a compile-time
error:
while (false) { x=3; } because the statement x=3; is not reachable;
but the superficially similar case:
if (false) { x=3; } does not result in a compile-time error. An
optimizing compiler may realize that the statement x=3; will never be
executed and may choose to omit the code for that statement from the
generated class file, but the statement x=3; is not regarded as
"unreachable" in the technical sense specified here.
The rationale for this differing treatment is to allow programmers to
define "flag variables" such as:
static final boolean DEBUG = false; and then write code such as:
if (DEBUG) { x=3; } The idea is that it should be possible to change
the value of DEBUG from false to true or from true to false and then
compile the code correctly with no other changes to the program text.
Why does the conditional break statement result in a compiler error?
As quoted in the loop reachability rules, a while loop can also complete normally if it contains a reachable break statement. Since the rules for the reachability of an if statement's then clause do not take the condition of the if into consideration at all, such a conditional if statement's then clause is always considered reachable.
If the break is reachable, then the code after the loop is once again also considered reachable. Since there is no reachable code that results in abrupt termination after the loop, the method is then considered able to complete normally, and so the compiler flags it as an error.
Given the following code sample:
public class WeirdStuff {
public static int doSomething() {
while(true);
}
public static void main(String[] args) {
doSomething();
}
}
This is a valid Java program, although the method doSomething() should return an int but never does. If you run it, it will end in an infinite loop. If you put the argument of the while loop in a separate variable (e.g. boolean bool = true) the compiler will tell you to return an int in this method.
So my question is: is this somewhere in the Java specification and are there situation where this behavior might be useful?
I'll just quote the Java Language Specification, as it's rather clear on this:
This section is devoted to a precise explanation of the word "reachable." The idea is that there must be some possible execution path from the beginning of the constructor, method, instance initializer or static initializer that contains the statement to the statement itself. The analysis takes into account the structure of statements. Except for the special treatment of while, do, and for statements whose condition expression has the constant value true, the values of expressions are not taken into account in the flow analysis.
...
A while statement can complete normally iff at least one of the following is true:
The while statement is reachable and the condition expression is not a constant expression with value true.
There is a reachable break statement that exits the while statement.
...
Every other statement S in a nonempty block that is not a switch block is reachable iff the statement preceding S can complete normally.
And then apply the above definitions to this:
If a method is declared to have a return type, then every return statement (§14.17) in its body must have an Expression. A compile-time error occurs if the body of the method can complete normally (§14.1).
In other words, a method with a return type must return only by using a return statement that provides a value return; it is not allowed to "drop off the end of its body."
Note that it is possible for a method to have a declared return type and yet contain no return statements. Here is one example:
class DizzyDean {
int pitch() { throw new RuntimeException("90 mph?!"); }
}
Java specification defines a concept called Unreachable statements. You are not allowed to have an unreachable statement in your code (it's a compile time error). A while(true); statement makes the following statements unreachable by definition. You are not even allowed to have a return statement after the while(true); statement in Java. Note that while Halting problem is undecidable in generic case, the definition of Unreachable Statement is more strict than just halting. It's deciding very specific cases where a program definitely does not halt. The compiler is theoretically not able to detect all infinite loops and unreachable statements but it has to detect specific cases defined in the spec.
If you are asking if infinite loops can be useful, the answer is yes. There are plenty of situations where you want something running forever, though the loop will usually be terminated at some point.
As to your question: "Can java recognized when a loop will be infinite?" The answer is that it is impossible for a computer to have an algorithm to determine if a program will run forever or not. Read about: Halting Problem
Reading a bit more, your question is also asking why the doSomething() function does not complain that it is not returning an int.
Interestingly the following source does NOT compile.
public class test {
public static int doSomething() {
//while(true);
boolean test=true;
while(test){
}
}
public static void main(String[] args) {
doSomething();
}
}
This indicates to me that, as the wiki page on the halting problem suggests, it is impossible for there to be an algorithm to determine if every problem will terminate, but this does not mean someone hasn't added the simple case:
while(true);
to the java spec. My example above is a little more complicated, so Java can't have it remembered as an infinite loop. Truely, this is a weird edge case, but it's there just to make things compile. Maybe someone will try other combinations.
EDIT: not an issue with unreachable code.
import java.util.*;
public class test {
public static int doSomething() {
//while(true);
while(true){
System.out.println("Hello");
}
}
public static void main(String[] args) {
doSomething();
}
}
The above works, so the while(true); isn't being ignored by the compiler as unreachable, otherwise it would throw a compile time error!
Yes, you can see these 'infinite' loops in some threads, for example server threads that listen on a certain port for incoming messages.
So my question is: is this somewhere in the Java specification
The program is legal Java according to the specification. The JLS (and Java compiler) recognize that the method cannot return, and therefore no return statement is required. Indeed, if you added a return statement after the loop, the Java compiler would give you a compilation error because the return statement would be unreachable code.
and are there situation where this behavior might be useful?
I don't think so, except possibly in obscure unit tests.
I occasionally write methods that will never return (normally), but putting the current thread into an uninterruptible infinite busy-loop rarely makes any sense.
After rereading the question....
Java understands while(true); can never actually complete, it does not trace the following code completely.
boolean moo = true;
while (moo);
Is this useful? Doubtful.
You might be implementing a general interface such that, even though the method may exit with a meaningful return value, your particular implementation is a useful infinite loop (for example, a network server) which never has a situation where it should exit, i.e. trigger whatever action returning a value means.
Also, regarding code like boolean x = true; while (x);, this will compile given a final modifier on x. I don't know offhand but I would imagine this is Java's choice of reasonable straightforward constant expression analysis (which needs to be defined straightforwardly since, due to this rejection of programs dependent on it, it is part of the language definition).
Some notes about unreachable statements:
In java2 specs the description of 'unreachable statement' could be found. Especially interesting the following sentence:
Except for the special treatment of while, do, and for statements whose condition expression has the constant value true, the values of expressions are not taken into account in the flow analysis.
So, it is not obviously possible to exit from while (true); infinite loop. However, there were two more options: change cached values or hack directly into class file or JVM operating memory space.
I did some reading up on JLS 15.7.4 and 15.12.4.2, but it doesn't guarantee that there won't be any compiler/runtime optimization that would change the order in which method arguments are evaluated.
Assume the following code:
public static void main (String[] args) {
MyObject obj = new MyObject();
methodRelyingOnEvalOrder(obj, obj.myMethod());
}
public static Object methodRelyingOnEvalOrder(MyObject obj, Object input) {
if (obj.myBoolean())
return null;
else
return input;
}
Is it guaranteed that the compiler or runtime will not do a false optimization such as the following? This optimization may appear correct, but it is wrong when the order of evaluation matters.
In the case where calling obj.myMethod alters the value that will be returned by obj.myBoolean, it is crucial that obj.myMethod be called first as methodRelyingOnEvalOrder requires this alteration to happen first.
//*******************************
//Unwanted optimization possible:
//*******************************
public static void main (String[] args) {
MyObject obj = new MyObject();
methodRelyingOnEvalOrder(obj);
}
public static Object methodRelyingOnEvalOrder(MyObject obj) {
if (obj.myBoolean())
return null;
else
return obj.myMethod();
}
//*******************************
If possible, please show some sources or Java documentation that supports your answer.
Note: Please do not ask to rewrite the code. This is a specific case where I am questioning the evaluation order guarantee and the compiler/runtime optimization guarantee. The execution of obj.myMethod must happen in the main method.
The bit of the JLS you referred to (15.7.4) does guarantee that:
Each argument expression appears to be fully evaluated before any part of any argument expression to its right.
and also in 15.12.4.2:
Evaluation then continues, using the argument values, as described below.
The "appears to" part allows for some optimization, but it must not be visible. The fact that all the arguments are evaluated before "evaluation then continues" shows that the arguments are really fully evaluted before the method executes. (Or at least, that's the visible result.)
So for example, if you had code of:
int x = 10;
foo(x + 5, x + 20);
it would be possible to optimize that to evaluate both x + 5 and x + 20 in parallel: there's no way you could detect this occurring.
But in the case you've given, you would be able to detect the call to obj.myMethod occurring after the call to obj.myBoolean(), so that wouldn't be a valid optimization at all.
In short: you're fine to assume that everything will execute in the obvious way here.
In addition to the argument evaluation order, the overview of the steps preformed when invoking a method, and their order, explained in section 15.12.4. Run-Time Evaluation of Method Invocation, makes it clear that all argument evaluations are performed before executing the method's code. Quote:
At run time, method invocation requires five steps. First, a target
reference may be computed. Second, the argument expressions are
evaluated. Third, the accessibility of the method to be invoked is
checked. Fourth, the actual code for the method to be executed is
located. Fifth, a new activation frame is created, synchronization is
performed if necessary, and control is transferred to the method code.
The case you presented where an argument is only evaluated after control has been transferred to the method code is out of the question.
I have a getter that returns a String and I am comparing it to some other String. I check the returned value for null so my ifstatement looks like this (and I really do exit early if it is true)
if (someObject.getFoo() != null && someObject.getFoo().equals(someOtherString)) {
return;
}
Performancewise, would it be better to store the returned String rather than calling the getter twice like this? Does it even matter?
String foo = someObject.getFoo();
if (foo != null && foo.equals(someOtherString)) {
return;
}
To answer questions from the comments, this check is not performed very often and the getter is fairly simple. I am mostly curious how allocating a new local variable compares to executing the getter an additional time.
It depends entirely on what the getter does. If it's a simple getter (retrieving a data member), then the JVM will be able to inline it on-the-fly if it determines that code is a hot spot for performance. This is actually why Oracle/Sun's JVM is called "HotSpot". :-) It will apply aggressive JIT optimization where it sees that it needs it (when it can). If the getter does something complex, though, naturally it could be slower to use it and have it repeat that work.
If the code isn't a hot spot, of course, you don't care whether there's a difference in performance.
Someone once told me that the inlined getter can sometimes be faster than the value cached to a local variable, but I've never proven that to myself and don't know the theory behind why it would be the case.
Use the second block. The first block will most likely get optimized to the second anyway, and the second is more readable. But the main reason is that, if someObject is ever accessed by other threads, and if the optimization somehow gets disabled, the first block will throw no end of NullPointerException exceptions.
Also: even without multi-threading, if someObject is by any chance made volatile, the optimization will disappear. (Bad for performance, and, of course, really bad with multiple threads.) And lastly, the second block will make using a debugger easier (not that that would ever be necessary.)
You can omit the first null check since equals does that for you:
The result is true if and only if the argument is not null and is a String object that represents the same sequence of characters as this object.
So the best solution is simply:
if(someOtherString.equals(someObject.getFoo())
They both look same,even Performance wise.Use the 1st block if you are sure you won't be using the returned value further,if not,use 2nd block.
I prefer the second code block because it assigns foo and then foo cannot change to null/notnull.
Null are often required and Java should solve this by using the 'Elvis' operator:
if (someObject.getFoo()?.equals(someOtherString)) {
return;
}