I've been playing with recursive constructors in Java. The following class is accepted by the compiler, but it crashes with a StackOverflowError at runtime using Java 1.7.0_25 and Eclipse Juno (Version: Juno Service Release 2, Build id: 20130225-0426).
class MyList<X> {
    public X hd;
    public MyList<X> tl;

    public MyList() {
        this.hd = null;
        // Unconditionally constructs another node: the recursion never terminates.
        this.tl = new MyList<X>();
    }
}
The error message makes sense, but I'm wondering if the compiler should catch it. A counterexample might be a list of integers with a constructor that takes an int as an argument and sets this.tl to null if the argument is less than zero. This seems reasonable to allow in the same way that recursive methods are allowed, but on the other hand I think constructors ought to terminate. Should a constructor be allowed to call itself?
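To make that counterexample concrete, here is a minimal sketch (the class and parameter names are mine, not from any real API):

class IntList {
    public int hd;
    public IntList tl;

    public IntList(int n) {
        this.hd = n;
        // The argument strictly decreases on each recursive call, so the
        // recursion bottoms out once n drops below zero.
        this.tl = (n < 0) ? null : new IntList(n - 1);
    }
}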
So I'm asking a higher authority before submitting a Java bug report.
EDIT: I'm advocating for a simple check, like prohibiting a constructor from calling itself or whatever the Java developers did to address https://bugs.openjdk.java.net/browse/JDK-1229458. A wilder solution would be to check that the arguments to recursive constructor calls are decreasing with respect to some well-founded relation, but the point of the question is not "should Java determine whether all constructors terminate?" but rather "should Java use a stronger heuristic when compiling constructors?".
You could even have several constructors with different parameters calling each other via this(...) (see the sketch below). In general, computer science tells us that termination of code cannot always be guaranteed. Some intelligence, as in this simple case, would be nice to have, but one cannot require a compiler error here, a bit like unreachable code. In my eyes, however, there is no difference between a constructor and a normal method.
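For instance, a minimal sketch (the class name is mine) of constructors chaining through this(...):

class Point {
    final int x, y;

    Point() {
        this(0); // delegates to Point(int)
    }

    Point(int x) {
        this(x, 0); // delegates to Point(int, int)
    }

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}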
I don't see any reason why a constructor should need to terminate any more than any other kind of function. And, as with any other kind of function, the compiler cannot infer in the general case whether such a function ever terminates (the halting problem).
Now, whether there's generally much need for a recursive constructor is debatable, but it certainly is not a bug unless the Java specification explicitly stated that recursive constructor calls must result in an error.
And finally, it's important to differentiate between recursive calls to constructor(s) of the same object (a common pattern, for instance, to overcome the lack of default parameters) and calling the constructor of the same class to create another object, as done in your example.
Although this specific situation seems quite obvious, determining whether or not code terminates is, in general, an impossible question to answer.
If you try to configure compiler warnings for infinite recursion, you run into the Halting Problem:
"Given a description of an arbitrary computer program, decide whether
the program finishes running or continues to run forever."
Alan Turing proved in 1936 that a general algorithm to solve the
halting problem for all possible program-input pairs cannot exist.
Related
I'm reading a blog post and trying to understand what's going on.
This is the blog post.
It has this code:
if (validation().hasErrors())
    throw new IllegalArgumentException(validation().errorMessage());
In the validation() method we have some object initialization and calculations, so let's say it's an expensive call. Is it going to be executed twice? Or will the compiler optimize it into something like this?
var validation = validation();
if (validation.hasErrors())
    throw new IllegalArgumentException(validation.errorMessage());
Thanks!
The validation method will be called twice, and it will do the same work each time. First, the method is relatively big, so it won't get inlined. Without inlining, the compiler doesn't know what the method does, so it safely assumes it has side effects and cannot optimize away the second call.
Even if the method were inlined and the compiler could examine it, it would see that there are in fact side effects: calling LocalDate.now() returns a different result each time. For this reason, the code you linked to is defective, although it's unlikely to cause a problem in practice.
It's safer to capture the validation result in a local variable, not for performance reasons but for stability. Imagine the odd case in which the first validation call fails but the second passes: you'd then throw an exception with no message in it.
The Java-to-bytecode compiler has a limited set of optimization techniques; for example, constant folding would turn 9*9 in a condition into 81.
The real optimization happens in the JIT (Just-In-Time) compiler. This compiler is the result of over a decade and a half of extensive research, and there is no simple answer to what it is capable of in every scenario.
With that said, as a good practice, I always store the result of repeated identical method calls in a variable before entering any loop structure where that result is needed. Example:
int[] grades = new int[500];
int countOfGrades = grades.length; // hoisted: computed once, not on every iteration
for (int i = 0; i < countOfGrades; i++) {
    // Some code here
}
For your code (which only runs twice), you shouldn't worry much about such optimization. But if you're looking for the ultimate, guaranteed optimization at the cost of a fraction of space (which is cheap), then you're better off using a variable to store any identical method result that is needed more than once:
var validation = validation();
if (validation.hasErrors())
    throw new IllegalArgumentException(validation.errorMessage());
However, I must simply question ... "these days," does it even actually matter anymore? Simply write the source code "in the most obvious manner available," as the original programmer certainly did.
"Microseconds" really don't matter anymore, but "clarity still does." To me, the first version of the code is frankly more understandable than the second, and "that's what matters to me most." Please don't try to "out-smart" the compiler if it results in source code that is in any way harder to understand.
Are there any performance or memory differences between the two snippets below? I tried to profile them using visualvm (is that even the right tool for the job?) but didn't notice a difference, probably due to the code not really doing anything.
Does the compiler optimize both snippets down to the same bytecode? Is one preferable over the other for style reasons?
boolean valid = loadConfig();
if (valid) {
    // OK
} else {
    // Problem
}
versus
if (loadConfig()) {
    // OK
} else {
    // Problem
}
The real answer here: it doesn't much matter what javap tells you about how the corresponding bytecode looks!
If that piece of code is executed just once, the difference between the two options is in the range of nanoseconds (if that).
If that piece of code is executed zillions of times (often enough to "matter"), then the JIT will kick in and optimize that bytecode into machine code, very much dependent on a lot of information gathered by the JIT at runtime.
Long story short: you are spending time on a detail so subtle that it doesn't matter in practical reality.
What matters in practical reality: the quality of your source code. In that sense, pick the option that "reads" best, given your context.
Given the comment: I think in the end this is (almost) a pure style question. With the first way it might be easier to inspect the value while debugging (assuming the variable isn't a boolean but something more complex). In that sense there is no "inherently" better version. Of course, option 2 is one line shorter and uses one variable less; and typically, when one option is as readable as the other and one of the two is shorter, I would prefer the shorter version.
If you are going to use the variable only once, the compiler/optimizer will resolve away the explicit declaration anyway.
Another thing is code quality. There is a very similar rule in SonarQube that describes this case too:
Local Variables should not be declared and then immediately returned or thrown
Declaring a variable only to immediately return or throw it is a bad practice.
Some developers argue that the practice improves code readability, because it enables them to explicitly name what is being returned. However, this variable is an internal implementation detail that is not exposed to the callers of the method. The method name should be sufficient for callers to know exactly what will be returned.
https://jira.sonarsource.com/browse/RSPEC-1488
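A minimal sketch (the names are mine) of what that rule flags, next to the compliant form:

class Example {
    // Noncompliant per RSPEC-1488: the variable is declared and then
    // immediately returned.
    static int squareNoncompliant(int x) {
        int result = x * x;
        return result;
    }

    // Compliant: return the expression directly.
    static int squareCompliant(int x) {
        return x * x;
    }
}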
I'm new to compiler design and have a few years of experience with Java.
Using this and the paper, it looks like class hierarchy analysis and rapid type analysis will produce the information needed to do devirtualization. But where do we patch that information back: into the source code, or into the bytecode? And how do we check the results?
I'm trying to understand how things really happen, but I'm stuck here.
For example, here is a program taken from the paper mentioned above:
public class MyProgram {
    public static void main(String[] args) {
        EUCitizen citizen = getCitizen();
        citizen.hasRightToVote(); // Call site 1
        Estonian estonian = getEstonian();
        estonian.hasRightToVote(); // Call site 2
    }

    private static EUCitizen getCitizen() {
        return new Estonian();
    }

    private static Estonian getEstonian() {
        return new Estonian();
    }
}
Using the class hierarchy method we can conclude that, since none of the subclasses override hasRightToVote(), the dynamic method invocation can be replaced with a static procedure call to Estonian#hasRightToVote(). But where do we put that information, and how? How do we tell (feed) the JVM the information we have gathered during the analysis?
We can't change the source code and put it there, can we? Could anyone provide an example so I can start trying new ways to do analysis and still be able to patch that information back in?
Thanks.
Class Hierarchy Analysis is an optimization done by the virtual machine itself at runtime; you do not have to tell the VM anything. It simply does the analysis by itself, based on the information available in the class files.
What generally happens is that analysis results are typically stored as some kind of association with a program representation, or are used immediately to effect the optimization so "nothing" needs to be stored.
You are right: there is generally no "good" way to annotate the source code with an analysis result (you could use Java annotations as a vehicle), and the compiler has already read the source code and isn't going to read it again.
In general, the program is parsed and a variety of compiler-like structures are built (ASTs, symbol tables, control flow graphs, data flow arcs, ...) by the compiler pretty much before any serious analysis/optimization begins. A low-level model of the program (data flow over the operators) is normally what gets analyzed, and the optimization analyzer will either decorate this structure with its opinions or directly modify it to achieve the effect of the optimization.
With Java, there are two opportunities to do this: in JavaC and in the JITter. My understanding (probably wrong, and probably varying across JavaC implementations) is that not much optimization occurs in JavaC at all; it just generates naive JVM bytecode, and all the real work is done in the JITter. The JITter doesn't have source code, but it can do all the same kinds of analysis (control flow, data flow, ...) on the bytecode that one can do on classic compiler structures, and thus achieve the same effect.
I had some of the same doubts, and Rohan Padhey cleared them up.
In Java, I don't think there is a way to specify monomorphism of virtual method calls in bytecode. The devirtualization analysis usually happens in the JIT compiler, which compiles bytecode to native code, and it does so using dynamic analysis.
Why Patching Is a Problem:
In Java bytecode, the only method call instructions are invokestatic, invokedynamic, invokevirtual, invokeinterface, and invokespecial (the last is used for constructors, among other things). The only type of call that does not involve a virtual method table lookup is invokestatic, since static methods cannot be overridden and used polymorphically on objects.
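To make the mapping concrete, here is a small sketch (the class name is mine) annotated with the instruction each call site normally compiles to under javac:

class Dispatch {
    void virtualCall(Object o) {
        o.toString(); // invokevirtual: dispatched through the vtable
    }

    void interfaceCall(Runnable r) {
        r.run(); // invokeinterface
    }

    static int staticCall() {
        return Math.max(1, 2); // invokestatic: target fixed at compile time
    }

    Dispatch construct() {
        return new Dispatch(); // new + invokespecial for the constructor
    }
}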
Hence, while there is no way to specify the target method at compile time, you can replace virtual calls with static calls. How? Consider an object x with a method foo, and a call site:
x.foo(arg1, arg2, ...)
If you know for sure that "x" is of the class "A", then you can transform this to:
A.static_foo(x, arg1, arg2, ...)
where "static_foo" is a newly created static method in class A whose body contains exactly everything that the body of "foo()" in "A" would have done, except that references to "this" inside the body should now be replaced by the first parameter, whatever you may call it.
That is exactly what the Whole-Jimple-Optimization-Pack (WJOP) in Soot does. As regards static analysis using Soot, there is an optimization pack that does devirtualization using this work-around: https://github.com/Sable/soot/wiki/Whole-program-Devirtualization-Optimizations. But that's just a hack.
Why the JIT Does This Better:
The JIT does this better because a static analysis has to be sound: when doing this transformation you need to be sure that 100% of the time the target of the virtual call will be one class. With JIT compilation you can find more opportunities for optimization, because even if the target is a single class only 90% of the time, you can just-in-time compile the code to use the most frequently taken route and fall back to the bytecode in the 10% of cases where the prediction was wrong, since you can check the mistake dynamically. While the fallback is expensive, correct predictions 90% of the time lead to an overall benefit. With a static transformation you have to decide up front whether or not to optimize, and it had better be sound.
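Written out at the source level, that guarded (speculative) devirtualization looks roughly like the sketch below; the real JIT does this in machine code and deoptimizes on the mispredicted path. The wrapper method is my own illustration, reusing EUCitizen/Estonian from the example program above:

static void vote(EUCitizen citizen) {
    if (citizen.getClass() == Estonian.class) {
        // Fast path: the receiver is exactly the predicted class, so the
        // JIT can bind and inline Estonian.hasRightToVote() directly.
        ((Estonian) citizen).hasRightToVote();
    } else {
        // Slow path (the mispredicted 10%): ordinary virtual dispatch,
        // or in the real JIT, deoptimization back to the interpreter.
        citizen.hasRightToVote();
    }
}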
I often find when debugging a program that it is convenient (although arguably bad practice) to insert a return statement inside a block of code. I might try something like this in Java:
class Test {
    public static void main(String args[]) {
        System.out.println("hello world");
        return;
        System.out.println("i think this line might cause a problem");
    }
}
Of course, this yields the compiler error:
Test.java:7: unreachable statement
I could understand why a warning might be justified as having unused code is bad practice. But I don't understand why this needs to generate an error.
Is this just Java trying to be a Nanny, or is there a good reason to make this a compiler error?
Because unreachable code is meaningless to the compiler. Whilst making code meaningful to people is both paramount and harder than making it meaningful to a compiler, the compiler is the essential consumer of code. The designers of Java take the viewpoint that code that is not meaningful to the compiler is an error. Their stance is that if you have some unreachable code, you have made a mistake that needs to be fixed.
There is a similar question here: Unreachable code: error or warning?, in which the author says "Personally I strongly feel it should be an error: if the programmer writes a piece of code, it should always be with the intention of actually running it in some scenario." Obviously the language designers of Java agree.
Whether unreachable code should prevent compilation is a question on which there will never be consensus. But this is why the Java designers did it.
A number of people in comments point out that there are many classes of unreachable code Java doesn't prevent compiling. If I understand the consequences of Gödel correctly, no compiler can possibly catch all classes of unreachable code.
Unit tests cannot catch every single bug. We don't use this as an argument against their value. Likewise a compiler can't catch all problematic code, but it is still valuable for it to prevent compilation of bad code when it can.
The Java language designers consider unreachable code an error. So preventing it compiling when possible is reasonable.
(Before you downvote: the question is not whether or not Java should have an unreachable statement compiler error. The question is why Java has an unreachable statement compiler error. Don't downvote me just because you think Java made the wrong design decision.)
There is no definitive reason why unreachable statements must be disallowed; other languages allow them without problems. For your specific need, this is the usual trick:
if (true) return;
It looks nonsensical, but anyone who reads the code will guess that it was done deliberately rather than being a careless mistake that left the rest of the statements unreachable.
Java has a little bit of support for "conditional compilation":
http://java.sun.com/docs/books/jls/third_edition/html/statements.html#14.21
if (false) { x=3; }

does not result in a compile-time error. An optimizing compiler may realize that the statement x=3; will never be executed and may choose to omit the code for that statement from the generated class file, but the statement x=3; is not regarded as "unreachable" in the technical sense specified here.

The rationale for this differing treatment is to allow programmers to define "flag variables" such as:

static final boolean DEBUG = false;

and then write code such as:

if (DEBUG) { x=3; }

The idea is that it should be possible to change the value of DEBUG from false to true or from true to false and then compile the code correctly with no other changes to the program text.
It is Nanny.
I feel .Net got this one right - it raises a warning for unreachable code, but not an error. It is good to be warned about it, but I see no reason to prevent compilation (especially during debugging sessions where it is nice to throw a return in to bypass some code).
I only just noticed this question, and wanted to add my $.02 to this.
In the case of Java, this is not actually an option. The "unreachable code" error doesn't come from JVM developers wanting to protect developers from anything, or to be extra vigilant; it comes from the requirements of the JVM specification.
Both the Java compiler and the JVM use what are called "stack maps": definite information about all of the items on the stack, as allocated for the current method. The type of each and every slot of the stack must be known, so that a JVM instruction doesn't mistreat an item of one type as another type. This is mostly important for preventing a numeric value from ever being used as a pointer. It's possible, using Java assembly, to try to push/store a number but then pop/load an object reference. However, the JVM will reject this code during class validation, which is when the stack maps are created and tested for consistency.
To verify the stack maps, the VM has to walk all the code paths that exist in a method and make sure that no matter which code path is executed, the stack data for every instruction agrees with what any previous code has pushed/stored on the stack. So, in the simple case of:
Object a;
if (something) { a = new Object(); } else { a = new String(); }
System.out.println(a);
at line 3, the JVM will check that both branches of the if have stored into a (which is just local var #0) only something that is compatible with Object (since that's how the code from line 3 onward will treat local var #0).
When the compiler gets to unreachable code, it doesn't quite know what state the stack might be in at that point, so it can't verify its state. It also can't quite compile the code at that point, as it can't keep track of local variables either, so instead of leaving this ambiguity in the class file, it produces a fatal error.
Of course a simple condition like if (1<2) will fool it, but it's not really fooling: it gives the compiler a potential branch that can lead to the code, so both the compiler and the VM can determine how the stack items can be used from there on.
P.S. I don't know what .NET does in this case, but I believe it will fail compilation as well. This would normally not be a problem for machine-code compilers (C, C++, Obj-C, etc.).
One of the goals of compilers is to rule out classes of errors. Some unreachable code is there by accident, it's nice that javac rules out that class of error at compile time.
For every rule that catches erroneous code, someone will want the compiler to accept it because they know what they're doing. That's the penalty of compiler checking, and getting the balance right is one of the trickier points of language design. Even with the strictest checking there's still an infinite number of programs that can be written, so things can't be that bad.
While I think this compiler error is a good thing, there is a way you can work around it.
Use a condition you know will be true:
public void myMethod() {
    someCodeHere();
    if (1 < 2) return; // compiler isn't smart enough to complain about this
    moreCodeHere();
}
The compiler is not smart enough to complain about that.
It is certainly a good thing that the compiler complains; the more stringent the compiler is, the better, as long as it still lets you do what you need.
Usually the small price to pay is to comment the code out; the gain is that when your code compiles, it works. A classic example is Haskell, which people scream about until they realize that their testing/debugging shrinks to a short test of main. Personally, in Java, I do almost no debugging while being (in fact on purpose) not very attentive.
If the reason for allowing if (aBooleanVariable) return; someMoreCode; is to allow flags, then the fact that if (true) return; someMoreCode; does not generate a compile-time error seems like an inconsistency in the policy of generating the code-not-reachable error, since the compiler "knows" that true is not a flag (not a variable).
Two other ways that might be interesting, though they don't suit switching off part of a method's code as well as if (true) return; does:
Now, instead of saying if (true) return; you might say assert false and add -ea (or -ea:package..., or -ea:className) to the JVM arguments. The good point is that this allows for some granularity, and instead of setting a DEBUG flag in the code you add an argument at runtime, which is useful when the target is not the developer machine and recompiling and transferring bytecode takes time.
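A sketch of that assert variant (the class and method names are mine); with -ea the method stops at the assert by throwing an AssertionError, and without -ea it runs to completion:

class DebugSkip {
    static void myMethod() {
        someCodeHere();
        // With "java -ea DebugSkip" this throws an AssertionError here,
        // skipping the rest of the method; without -ea it is a no-op.
        assert false : "debug early-exit";
        moreCodeHere();
    }

    static void someCodeHere() {}

    static void moreCodeHere() {}
}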
There is also the System.exit(0) way, but this might be overkill; if you put it in a JSP, it will terminate the server.
Apart from that, Java is by design a "nanny" language; I would rather use something native like C/C++ for more control.
Currently I am profiling a piece of code. During the profiling, I discovered that this one method call,
Class<T>.isAssignableFrom(Class<?> cls)
takes up quite a large share of the total time.
Because this is a reflection method, it takes a lot of time compared to normal keywords or method calls. I am wondering if there are good alternatives to this method call?
"[I]t examines the Class type
passed in through a method argument to
see if the type matches certain
qualifications."
To me, that implies that the method argument should be required to implement a particular interface or inherit from a particular class. Keep in mind, the interface could be a marker like RandomAccess. I realize changing your API may not be an option.
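A sketch of that API change (all names are mine): instead of accepting a Class and checking isAssignableFrom at runtime, constrain the parameter type so the compiler enforces the qualification:

class Api {
    // Marker interface standing in for "certain qualifications".
    interface Qualified {}

    // Before: a reflective qualification check at runtime.
    static void processChecked(Object arg) {
        if (!Qualified.class.isAssignableFrom(arg.getClass())) {
            throw new IllegalArgumentException("argument is not Qualified");
        }
        // ... actual work ...
    }

    // After: the type system performs the check at compile time.
    static void process(Qualified arg) {
        // ... actual work ...
    }
}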
If you have an object whose class you are checking, you can replace this with:
obj instanceof ClassName
but I wouldn't say it's faster. Actually, I doubt this causes any problems for program execution. Don't over-optimize.
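For reference, a minimal sketch (the names are mine) contrasting the two forms:

class TypeChecks {
    // The reflective check being profiled: true if obj's class is the
    // given type or a subtype of it.
    static boolean viaReflection(Class<?> type, Object obj) {
        return type.isAssignableFrom(obj.getClass());
    }

    // The language-level equivalent when the target type is known at
    // compile time (Number is just an illustrative choice).
    static boolean viaInstanceof(Object obj) {
        return obj instanceof Number;
    }
}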
I don't know if this affects you, but I think it is worth noting that in the early days of Java 5, isAssignableFrom had significant performance problems that were later corrected. I couldn't find out whether the fix was backported to Java 5, but it certainly went into Java 6.
Additionally, the Sun JVM Performance Wiki points out that Class.isInstance and Class.isAssignableFrom are as performant as instanceof.
So if you are on Java 6 or later, there doesn't seem to be an alternative for Class.isAssignableFrom that will be faster than what is already there.