Why does Java have an "unreachable statement" compiler error?

I often find when debugging a program it is convenient (although arguably bad practice) to insert a return statement inside a block of code. I might try something like this in Java:
class Test {
    public static void main(String args[]) {
        System.out.println("hello world");
        return;
        System.out.println("i think this line might cause a problem");
    }
}
Of course, this would yield the compiler error:
Test.java:7: unreachable statement
I could understand why a warning might be justified, as having unused code is bad practice. But I don't understand why this needs to generate an error.
Is this just Java trying to be a Nanny, or is there a good reason to make this a compiler error?

Because unreachable code is meaningless to the compiler. Whilst making code meaningful to people is both paramount and harder than making it meaningful to a compiler, the compiler is the essential consumer of code. The designers of Java take the viewpoint that code that is not meaningful to the compiler is an error. Their stance is that if you have some unreachable code, you have made a mistake that needs to be fixed.
There is a similar question here: Unreachable code: error or warning?, in which the author says "Personally I strongly feel it should be an error: if the programmer writes a piece of code, it should always be with the intention of actually running it in some scenario." Obviously the language designers of Java agree.
Whether unreachable code should prevent compilation is a question on which there will never be consensus. But this is why the Java designers did it.
A number of people in comments point out that there are many classes of unreachable code Java doesn't prevent compiling. If I understand the consequences of Gödel correctly, no compiler can possibly catch all classes of unreachable code.
Unit tests cannot catch every single bug. We don't use this as an argument against their value. Likewise a compiler can't catch all problematic code, but it is still valuable for it to prevent compilation of bad code when it can.
The Java language designers consider unreachable code an error. So preventing it compiling when possible is reasonable.
(Before you downvote: the question is not whether or not Java should have an unreachable statement compiler error. The question is why Java has an unreachable statement compiler error. Don't downvote me just because you think Java made the wrong design decision.)

There is no definitive reason why unreachable statements must not be allowed; other languages allow them without problems. For your specific need, this is the usual trick:
if (true) return;
It looks nonsensical, but anyone who reads the code will guess that it was done deliberately rather than being a careless mistake that left the rest of the statements unreachable.
Java has a little bit of support for "conditional compilation":
http://java.sun.com/docs/books/jls/third_edition/html/statements.html#14.21
if (false) { x=3; }
does not result in a compile-time error. An optimizing compiler may realize that the statement x=3; will never be executed and may choose to omit the code for that statement from the generated class file, but the statement x=3; is not regarded as "unreachable" in the technical sense specified here.
The rationale for this differing treatment is to allow programmers to define "flag variables" such as:
static final boolean DEBUG = false;
and then write code such as:
if (DEBUG) { x=3; }
The idea is that it should be possible to change the value of DEBUG from false to true or from true to false and then compile the code correctly with no other changes to the program text.
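To make the quoted rationale concrete, here is a minimal, self-contained sketch of that DEBUG-flag pattern (the class and variable names are my own):
class FlagDemo {
    // Flip this constant and recompile; no other source changes are needed.
    static final boolean DEBUG = false;
    public static void main(String[] args) {
        int x = 0;
        if (DEBUG) {
            // Compiles even though DEBUG is false: JLS 14.21 deliberately
            // does not treat this branch as "unreachable".
            x = 3;
            System.out.println("debug value: " + x);
        }
        System.out.println("x = " + x);
    }
}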

It is Nanny.
I feel .NET got this one right - it raises a warning for unreachable code, but not an error. It is good to be warned about it, but I see no reason to prevent compilation (especially during debugging sessions, where it is nice to throw in a return to bypass some code).

I only just noticed this question, and wanted to add my $.02 to this.
In the case of Java, this is not actually an option. The "unreachable code" error doesn't come from the fact that the JVM developers wanted to protect developers from anything, or to be extra vigilant, but from the requirements of the JVM specification.
Both the Java compiler and the JVM use what are called "stack maps" - definite information about all of the items on the stack, as allocated for the current method. The type of each and every slot of the stack must be known, so that a JVM instruction doesn't mistreat an item of one type as another type. This is mostly important for preventing a numeric value from ever being used as a pointer. It's possible, using Java assembly, to try to push/store a number but then pop/load an object reference. However, the JVM will reject this code during class validation - that is, when stack maps are being created and tested for consistency.
To verify the stack maps, the VM has to walk through all the code paths that exist in a method and make sure that, no matter which code path is actually executed, the stack data for every instruction agrees with what any previous code has pushed/stored on the stack. So, in the simple case of:
Object a;
if (something) { a = new Object(); } else { a = new String(); }
System.out.println(a);
at line 3, the JVM will check that both branches of the 'if' have only stored into a (which is just local var #0) something that is compatible with Object (since that's how the code from line 3 onward will treat local var #0).
When the compiler gets to unreachable code, it doesn't know what state the stack might be in at that point, so it can't verify its state. It can't really compile the code any further at that point either, as it can't keep track of local variables; so instead of leaving this ambiguity in the class file, it produces a fatal error.
Of course a simple condition like if (1<2) will fool it, but it's not really fooling - it's giving the compiler a potential branch that can lead to the code, and at least both the compiler and the VM can determine how the stack items can be used from there on.
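If you want to see those stack maps yourself, you can compile a small class with a branch and disassemble it; the verbose javap output includes a StackMapTable attribute describing the verifier's view of locals and stack at each branch target (the class name below is mine, and the exact entries vary by compiler and Java version):
// StackMapDemo.java
// Compile:     javac StackMapDemo.java
// Disassemble: javap -v StackMapDemo
class StackMapDemo {
    static Object pick(boolean something) {
        Object a;
        if (something) {
            a = new Object();
        } else {
            a = "some string";
        }
        return a; // both paths leave an Object-compatible reference in 'a', so the frames merge cleanly
    }
}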
P.S. I don't know what .NET does in this case, but I believe it will fail compilation as well. This is normally not a problem for machine-code compilers (C, C++, Obj-C, etc.).

One of the goals of compilers is to rule out classes of errors. Some unreachable code is there by accident, it's nice that javac rules out that class of error at compile time.
For every rule that catches erroneous code, someone will want the compiler to accept it because they know what they're doing. That's the penalty of compiler checking, and getting the balance right is one of the trickier points of language design. Even with the strictest checking there's still an infinite number of programs that can be written, so things can't be that bad.

While I think this compiler error is a good thing, there is a way you can work around it.
Use a condition you know will be true:
public void myMethod() {
    someCodeHere();
    if (1 < 2) return; // compiler isn't smart enough to complain about this
    moreCodeHere();
}
The compiler is not smart enough to complain about that.

It is certainly a good thing that the compiler complains; the more stringent the compiler is, the better, as long as it still lets you do what you need.
Usually the small price to pay is to comment the code out; the gain is that when your code compiles, it works. A general example is Haskell, which people scream about until they realize that their testing/debugging is reduced to a short main test. Personally, in Java I do almost no debugging, even while being (in fact on purpose) not very attentive.

If the reason for allowing if (aBooleanVariable) return; someMoreCode; is to allow flags, then the fact that if (true) return; someMoreCode; does not generate a compile-time error seems like an inconsistency in the policy of flagging unreachable code, since the compiler 'knows' that true is not a flag (not a variable).
Two other ways that might be interesting, but that don't suit switching off part of a method's code as well as if (true) return does:
Now, instead of saying if (true) return; you might want to say assert false and add -ea (or -ea:packagename..., or -ea:ClassName) to the JVM arguments. The good point is that this allows some granularity and only requires an extra parameter to the JVM invocation, so there is no need to set a DEBUG flag in the code; the behaviour is switched by an argument at runtime, which is useful when the target is not the developer machine and recompiling & transferring bytecode takes time.
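A minimal sketch of that assertion trick (the class and method names are mine); with -ea the method stops at the assert, and without it the assert is a no-op and the rest of the method runs:
class AssertSkipDemo {
    static void doWork() {
        System.out.println("part 1");
        // java -ea AssertSkipDemo  -> AssertionError is thrown here, "part 2" never prints
        // java AssertSkipDemo      -> assertions are disabled, "part 2" prints normally
        assert false : "skipping the rest of doWork() while debugging";
        System.out.println("part 2");
    }
    public static void main(String[] args) {
        doWork();
    }
}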
There is also the System.exit(0) way, but this might be overkill - if you put it in a JSP, it will terminate the server.
Apart from that, Java is by design a 'nanny' language; I would rather use something native like C/C++ for more control.

Related

Does a Java compiler optimize ad hoc instances of `Set.of`, `List.of`, and `Map.of` in loops as constants?

Given the following code snippet in which an ad hoc instance of a Set is created, will today's Java compilers see that they do not need to create the instance for each loop pass and optimize set into a kind of final constant, such that the instance is the same for the entire loop? The question applies similarly to while and do-while loops.
for (/* a loop that iterates quite often */)
{
    var set = Set.of("foo", "blah", "baz");
    // Do something with the set.
}
I would also be interested in whether such an optimization is already done at compile time (meaning that the bytecode is already optimized) or whether there are runtime optimizations (made by the just-in-time compiler) that essentially achieve the same.
If a compiler does not see this, the only way to be optimal seems to be to instantiate the set outside the loop:
var set = Set.of("foo", "blah", "baz");
for (/* a loop that iterates quite often */)
{
    // Do something with the set.
}
Side note, for those who find the first snippet rather bad practice and would write it like the second snippet anyway: this snippet is actually a simplified variant of my real use case, reduced for the sake of the question. The real, slightly more complex use case that brought me to this question is the following, where a string needs to be checked against a very small set of strings:
for (/* a loop that iterates quite often */)
{
    // ...
    var testString = // some String to check ...
    if (Set.of("foo", "blah", "baz").contains(testString))
    {
        // Do something.
    }
    // ...
}
Assuming that the if condition is surrounded by additional code (i.e., the loop body is rather large), I think one wants to declare the set inline rather than further away outside the loop.
The answer will inevitably depend on the Java tools that you use.
The javac bytecode compiler is (deliberately) naive, and I would not expect it to do any optimization of this. The JVM's JIT compiler may optimize this, but that will depend on:
your Java version,
the JIT compiler tier used, and
what /* do something with the set */ actually entails.
If this really matters to you, I would advise a couple of approaches.
Rewrite the code yourself to hoist the Set declaration out of the loop. IMO, there is nothing fundamentally wrong with this approach, though it is potentially a premature optimization. (I'd probably do this myself in the examples you gave. It just "feels right" to me.)
Use the JVM options for dumping out the native code produced by the JIT compiler and see what it actually does. But beware that the results are liable to vary: see above.
But even if you know with a good degree of certainty what the compilers will do, it is not clear that you should be worrying about this in general.
Note that looking at the bytecode with javap -c will tell you if the Java compiler is doing any optimization. (Or, more likely, confirm that it isn't.)
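If you do hoist it yourself (the first approach above), the usual idiom is a static final constant rather than a local variable just outside the loop; Set.of returns an immutable set, so sharing one instance across iterations (and threads) is safe. The names below are illustrative:
import java.util.Set;
class Lookup {
    // Built once at class initialization and reused for every call and iteration.
    private static final Set<String> KNOWN = Set.of("foo", "blah", "baz");
    static int countKnown(Iterable<String> strings) {
        int hits = 0;
        for (String s : strings) {
            if (KNOWN.contains(s)) {
                hits++;
            }
        }
        return hits;
    }
}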

When we use a return statement inside an if block, why does the compiler not give any error? [duplicate]

This question already has answers here:
Why does Java have an "unreachable statement" compiler error?
(8 answers)
Closed 6 years ago.
The following code gives an unreachable statement compiler error
public static void main(String[] args) {
    return;
    System.out.println("unreachable");
}
Sometimes for testing purposes I want to prevent a method from being called, so a quick way to do it (instead of commenting it out everywhere it's used) is to return immediately from the method so that the method does nothing. What I then always do to get around the compiler error is this:
public static void main(String[] args) {
    if (true) {
        return;
    }
    System.out.println("unreachable");
}
I'm just curious: why is it a compiler error? Would it break the Java bytecode somehow, is it to protect the programmer, or is it something else?
Also (and this to me is more interesting), if compiling Java to bytecode does any kind of optimization (or even if it doesn't), then why won't it detect the blatant unreachable code in the second example? What would the compiler pseudocode be for checking whether a statement is unreachable?
Unreachable code is meaningless, so the compile-time error is helpful. The reason why it isn't detected in the second example is, as you suspect, to support testing / debugging. It's explained in the specification:
if (false) { x=3; }
does not result in a compile-time error. An optimizing compiler may realize that the statement x=3; will never be executed and may choose to omit the code for that statement from the generated class file, but the statement x=3; is not regarded as "unreachable" in the technical sense specified here.
The rationale for this differing treatment is to allow programmers to define "flag variables" such as:
static final boolean DEBUG = false;
and then write code such as:
if (DEBUG) { x=3; }
The idea is that it should be possible to change the value of DEBUG from false to true or from true to false and then compile the code correctly with no other changes to the program text.
Reference: http://docs.oracle.com/javase/specs/jls/se8/html/jls-14.html#jls-14.21
It's because the compiler writer assumed that the human at the controls is dumb and probably didn't mean to add code that would never be executed - so by raising an error, it attempts to prevent you from inadvertently creating a code path that cannot be executed, instead forcing you to make a decision about it (even though, as you have proven, you can still work around it).
This error is mainly there to prevent programmer errors (a swap of two or more lines). In the second snippet, you make it clear that you don't care about the System.out.println().
Will it break the Java bytecode somehow, is it to protect the programmer or is it something else?
This is not required as far as Java/JVM is concerned. The sole purpose of this compilation error is to avoid silly programmer mistakes. Consider the following JavaScript code:
function f() {
    return
    {
        answer: 42
    }
}
This function returns undefined because the JavaScript engine inserts a semicolon at the end of the return line and ignores what it thinks is dead code. The Java compiler is more clever: when it discovers you are doing something clearly and obviously wrong, it won't let you do it. There is no way on earth you intended to have dead code. This somehow fits the Java premise of being a safe language.
http://docs.oracle.com/javase/specs/jls/se8/html/jls-14.html#jls-14.21
says:
14.21. Unreachable Statements
It is a compile-time error if a statement cannot be executed because it is unreachable.
Example 1:
In this case you return before any other statement, so the compiler knows it will never execute the code that follows.
public static void main(String[] args) {
    return;
    System.out.println("unreachable");
}
In the second example I have put the statement above the return, and it works now :)
Example 2:
public static void main(String[] args) {
    System.out.println("unreachable"); // Put the statement before return
    return;
}
The reason behind this is that once you return, the code after it can never execute, because you have already returned from the method; that is why it is reported as unreachable code.
It's because it's a waste of resources for it to even be there. Also, the compiler designers don't want to assume what they can strip out, but would rather force you to remove either the code that makes it unreachable or the unreachable code itself. They don't know what is supposed to be there. There's a difference between the optimizations where they tweak your code to be a bit more efficient when it's compiled down to machine code and blatantly just removing code "you didn't need."

How to tell Java that a variable cannot possibly be null?

I have a program that basically looks like this:
boolean[] stuffNThings;
int state = 1;
for (String string : list) {
    switch (state) {
        case 1:
            if (/*condition*/) {
                // foo
                break;
            } else {
                stuffNThings = new boolean[/*size*/];
                state = 2;
            }
            // intentional fallthrough
        case 2:
            // bar
            stuffNThings[0] = true;
    }
}
As you, a human, can see, case 2 only ever happens when there was previously a state 1 and it switched to state 2 after initialising the array. But Eclipse and the Java compiler don't see this, because it looks like pretty complex logic to them. So Eclipse complains:
The local variable stuffNThings may not have been initialized.
And if I change "boolean[] stuffNThings;" to "boolean[] stuffNThings=null;", it switches to this error message:
Potential null pointer access: The variable stuffNThings may be null at this location.
I also can't initialise it at the top, because the size of the array is only determined after the final loop in state 1.
Java thinks that the array could be null there, but I know that it can't. Is there some way to tell Java this? Or am I definitely forced to put a useless null check around it? Adding that makes the code harder to understand, because it looks like there may be a case where the value doesn't actually get set to true.
Java thinks that the array could be null there, but I know that it can't.
Strictly speaking, Java thinks that the variable could be uninitialized. If it is not definitely initialized, the value should not be observable.
(Whether the variable is silently initialized to null or left in an indeterminate state is an implementation detail. The point is, the language says you shouldn't be allowed to see the value.)
But anyway, the solution is to initialize it to null. It is redundant, but there is no way to tell Java to "just trust me, it will be initialized".
In the variations where you are getting "Potential null pointer access" messages:
It is a warning, not an error.
You can ignore or suppress a warning. (If your correctness analysis is wrong then you may get NPE's as a result. But that's your choice.)
You can turn off some or all warnings with compiler switches.
You can suppress a specific warning with a @SuppressWarnings annotation (a short sketch appears at the end of this answer):
For Eclipse, use @SuppressWarnings("null").
For Android, use @SuppressWarnings("ConstantConditions").
Unfortunately, the warning tags are not fully standardized. However, a compiler should silently ignore a @SuppressWarnings annotation for a warning tag that it doesn't recognize.
You may be able to restructure the code.
In your example, the code is using switch drop through. People seldom do that because it leads to code that is hard to understand. So, I'm not surprised that you can find edge-case examples involving drop-through where a compiler gets the NPE warnings a bit wrong.
Either way, you can easily avoid the need to do drop-through by restructuring your code. Copy the code in the case 2: case to the end of the case 1: case. Fixed. Move on.
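A sketch of that restructuring applied to the question's loop. The /*condition*/ and /*size*/ placeholders are replaced with arbitrary stand-ins (s.isEmpty() and s.length()) purely so the example compiles; whether this silences a particular checker still depends on its flow analysis:
import java.util.List;
class NoFallthrough {
    static void process(List<String> list) {
        boolean[] stuffNThings = null;
        int state = 1;
        for (String s : list) {
            switch (state) {
                case 1:
                    if (s.isEmpty()) { // stand-in for /*condition*/
                        // foo
                        break;
                    }
                    stuffNThings = new boolean[s.length()]; // stand-in for /*size*/
                    state = 2;
                    // the former case 2 body, duplicated instead of falling through:
                    stuffNThings[0] = true;
                    break;
                case 2:
                    // bar
                    stuffNThings[0] = true;
                    break;
            }
        }
    }
}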
Note the "possibly uninitialized" error is not the Java compiler being "stupid". There is a whole chapter of the JLS on the rules for definite assignment, etcetera. A Java compiler is not permitted to be smart about it, because that would mean that the same Java code would be legal or not legal, depending on the compiler implementation. That would be bad for code portability.
What we actually have here is a language design compromise. The language stops you from using variables that are (really) not initialized. But to do this, the "dumb" compiler must sometimes stop you using variables that you (the smart programmer) know will be initialized ... because the rules say that it should.
(The alternatives are worse: either no compile-time checks for uninitialized variables leading to hard crashes in unpredictable places, or checks that are different for different compilers.)
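To illustrate the suppression option mentioned above, here is a sketch that scopes the annotation to the one method whose flow the checker cannot follow ("null" is the tag Eclipse expects; other tools may want a different tag, and the names here are illustrative):
import java.util.List;
class SuppressionDemo {
    @SuppressWarnings("null") // Eclipse's tag for its null-analysis warnings
    static void process(List<String> list) {
        boolean[] stuffNThings = null;
        boolean initialized = false;
        for (String s : list) {
            if (!initialized) {
                stuffNThings = new boolean[8];
                initialized = true;
            }
            // Non-null on every path that reaches this line; the checker's
            // flow analysis may not be able to prove it.
            stuffNThings[0] = true;
        }
    }
}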
A distinct non-answer: when code is "so" complicated that an IDE / java compiler doesn't "see it", then that is a good indication that your code is too complicated anyway. At least for me, it wasn't obvious what you said. I had to read up and down repeatedly to convince myself that the statement given in the question is correct.
You have an if in a switch in a for. Clean code, and "single layer of abstraction" would tell you: not a good starting point.
Look at your code. What you have there is a state machine in disguise. Ask yourself whether it would be worth to refactor this on larger scale, for example by turning it into an explicit state machine of some sort.
Another less intrusive idea: use a List instead of an array. Then you can simply create an empty list, and add elements to that as needed.
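A sketch of the List idea (again with a stand-in condition): the list exists from the start, so there is nothing for the checker to consider uninitialized, and it grows only when needed:
import java.util.ArrayList;
import java.util.List;
class GrowableInsteadOfArray {
    static void process(List<String> input) {
        List<Boolean> stuffNThings = new ArrayList<>(); // always initialized, just empty
        for (String s : input) {
            if (stuffNThings.isEmpty() && !s.isEmpty()) { // stand-in for the original condition
                stuffNThings.add(Boolean.FALSE);          // allocate the first slot
            }
            if (!stuffNThings.isEmpty()) {
                stuffNThings.set(0, Boolean.TRUE);        // analogous to stuffNThings[0] = true
            }
        }
    }
}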
After just trying to execute the code regardless of Eclipse complaining, I noticed that it does indeed run without problems. So apparently it was just a warning being set to "error" level, despite not being critical.
There was a "configure problem severity" button, so I set the severity of "Potential null pointer access" to "warning" (and adjusted some other levels accordingly). Now Eclipse just marks it as warning and executes the code without complaining.
More understandable would be:
boolean[] stuffNThings;
boolean initialized = false;
for (String string : list) {
    if (!initialized) {
        if (!/*condition*/) {
            stuffNThings = new boolean[/*size*/];
            initialized = true;
        }
    }
    if (initialized) {
        // bar
        stuffNThings[0] = true;
    }
}
Two loops, one for the initialisation and one for playing with the stuff, might or might not be clearer.
It is easier on flow analysis (compared to a switch with fall-through).
Furthermore, instead of a boolean[], a BitSet might be used (as it is not fixed in size like an array):
BitSet stuffNThings = new BitSet(/*max size*/);
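A short usage sketch of the BitSet alternative:
import java.util.BitSet;
class BitSetDemo {
    public static void main(String[] args) {
        BitSet stuffNThings = new BitSet(); // grows on demand; no size needed up front
        stuffNThings.set(0);                // analogous to stuffNThings[0] = true
        System.out.println(stuffNThings.get(0)); // true
        System.out.println(stuffNThings.get(5)); // false: unset bits simply read as false
    }
}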

Memory/Performance differences of declaring variable for return result of method call versus inline method call

Are there any performance or memory differences between the two snippets below? I tried to profile them using visualvm (is that even the right tool for the job?) but didn't notice a difference, probably due to the code not really doing anything.
Does the compiler optimize both snippets down to the same bytecode? Is one preferable over the other for style reasons?
boolean valid = loadConfig();
if (valid) {
    // OK
} else {
    // Problem
}
versus
if (loadConfig()) {
    // OK
} else {
    // Problem
}
The real answer here: it doesn't even matter so much what javap tells you about how the corresponding bytecode looks!
If that piece of code is executed like "once"; then the difference between those two options would be in the range of nanoseconds (if at all).
If that piece of code is executed like "zillions of times" (often enough to "matter"); then the JIT will kick in. And the JIT will optimize that bytecode into machine code; very much dependent on a lot of information gathered by the JIT at runtime.
Long story short: you are spending time on a detail so subtle that it doesn't matter in practical reality.
What matters in practical reality: the quality of your source code. In that sense: pick that option that "reads" the best; given your context.
Given the comment: I think in the end, this is (almost) a pure style question. Using the first way it might be easier to trace information (assuming the variable isn't boolean, but more complex). In that sense: there is no "inherently" better version. Of course: option 2 comes with one line less; uses one variable less; and typically: when one option is as readable as another; and one of the two is shorter ... then I would prefer the shorter version.
If you are going to use the variable only once, then the compiler/optimizer will eliminate the explicit declaration anyway.
Another thing is code quality. There is a very similar rule in SonarQube that describes this case too:
Local Variables should not be declared and then immediately returned or thrown
Declaring a variable only to immediately return or throw it is a bad practice.
Some developers argue that the practice improves code readability, because it enables them to explicitly name what is being returned. However, this variable is an internal implementation detail that is not exposed to the callers of the method. The method name should be sufficient for callers to know exactly what will be returned.
https://jira.sonarsource.com/browse/RSPEC-1488
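A small before/after illustration of what that rule flags (loadConfig() here is a stand-in for the real method):
class ConfigCheck {
    boolean loadConfig() { return true; } // stand-in
    // Flagged by the rule: a local declared only to be returned immediately.
    boolean isValidVerbose() {
        boolean valid = loadConfig();
        return valid;
    }
    // Preferred form: return the expression directly.
    boolean isValid() {
        return loadConfig();
    }
}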

Why does the Java compiler only report one kind of error at a time?

I have this snippet:
class T {
    int y;
    public static void main(String... s) {
        int x;
        System.out.println(x);
        System.out.println(y);
    }
}
Here there are two errors, but on compilation why is only one error shown?
The error shown is:
non-static variable y cannot be referenced from a static context
System.out.println(y);
^
But what about the error
variable x might not have been initialized
System.out.println(x);
^
The Java compiler compiles your code in several passes. In each pass, certain kinds of errors are detected. In your example, javac doesn't look to see whether x may be initialised or not, until the rest of the code actually passes the previous compiler pass.
@Greg Hewgill has nailed it.
In particular, the checks for variables being initialized, exceptions being declared, unreachable code and a few other things occur in a later pass. This pass doesn't run if there were errors in earlier passes.
And there's a good reason for that. The earlier passes construct a decorated parse tree that represent the program as the compiler understands it. If there were errors earlier on, that tree will not be an accurate representation of the program as the developer understands it. (It can't be!). If the compiler then were to go on to run the later pass to produce more error messages, the chances are that a lot of those error messages would be misleading artifacts of the incorrect parse tree. This would only confuse the developer.
Anyway, that's the way that most compilers for most programming languages work. Fixing some compilation errors can cause other (previously unreported) errors to surface.
The Dragon Book ("Compilers: Principles, Techniques, and Tools" by Aho, Sethi, and Ullman) describe several methods that compiler writers can employ to try to improve error detection and reports when given input files that don't conform to the language specification.
Some of the techniques they give:
Panic mode recovery: skip all input tokens in the input stream until you have found a "synchronizing token" -- think of ;, }, or end statement and block terminators or separators, or do, while, for, function, etc. keywords that can show intended starts of new code blocks, etc. Hopefully the input from that point forward will make enough sense to parse and return useful error messages.
Phrase level recovery: guess at what a phrase might have meant: insert semicolons, change commas to semicolons, insert = assignment operators, etc., and hope the result makes enough sense to parse and return useful error messages.
Error productions: in addition to the "legitimate" productions that recognize allowed grammar operators, include "error productions" that recognize mistakes in the language that would be against the grammar, but that the compiler authors recognize as very likely mistakes. The error messages for these can be very good, but can drastically grow the size of the parser to optimize for mistakes in input (which should be the exception, not the common case).
Global correction: try to make minimal modifications to the input string to try to bring it back to a program that can be parsed. Since the corrections can be made in many different ways (inserting, changing, and deleting any number of characters), try to generate them all and discover the "least cost" change that makes the program parse well enough to continue parsing.
These options are presented in the chapter on parsing grammars (page 161 in my edition) -- obviously, some errors are only discovered after the input file has been parsed and is beginning to be converted into basic blocks for optimization and code generation. Any errors that occur in the earlier syntactic levels will prevent the typical compiler from starting the code optimization and generation phases, and any errors that might be detected in these phases must wait for fixed input before those phases can be run.
I strongly recommend finding a copy of The Dragon Book, it'll give you some sympathy and hopefully new-found respect for our friendly compiler authors.
The whole point of an error is that the compiler doesn't know how to interpret your code correctly. That's why you should always ignore errors after the first one, because the compiler's confusion may result in extra/missing errors.
