I'm currently writing a toy compiler that targets Java bytecode.
I would like to know if there is some kind of catalog, or maybe a summary, of simple peephole optimizations that can be applied to the emitted bytecode before writing the .class file. I am aware of some libraries that have this functionality, but I'd like to implement it myself.
Are you aware of ProGuard? http://proguard.sourceforge.net/
This is a great bytecode optimizer which implements a lot of optimizations. See the FAQ for a list: http://proguard.sourceforge.net/FAQ.html
Evaluate constant expressions.
Remove unnecessary field accesses and method calls.
Remove unnecessary branches.
Remove unnecessary comparisons and instanceof tests.
Remove unused code blocks.
Merge identical code blocks.
Reduce variable allocation.
Remove write-only fields and unused method parameters.
Inline constant fields, method parameters, and return values.
Inline methods that are short or only called once.
Simplify tail recursion calls.
Merge classes and interfaces.
Make methods private, static, and final when possible.
Make classes static and final when possible.
Replace interfaces that have single implementations.
Perform over 200 peephole optimizations, like replacing ...*2 by ...<<1.
Optionally remove logging code.
You can look further into its source code to see how they are implemented.
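To give a feel for what such peephole rules look like in a toy compiler, here is a minimal sketch. It assumes instructions are kept as symbolic strings such as "iconst_2" (a hypothetical representation, not ProGuard's), rewrites ...*2 into ...<<1, drops a few algebraic no-ops, and repeats until no rule fires. A real pass must also refuse to match a pattern whose second instruction is a jump target, which this label-free representation cannot express.

```java
import java.util.ArrayList;
import java.util.List;

public class Peephole {

    /** Applies simple two-instruction rewrite rules until a fixpoint is reached. */
    public static List<String> optimize(List<String> code) {
        List<String> out = new ArrayList<>(code);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Scan adjacent pairs; restart the scan after the first rewrite.
            for (int i = 0; i + 1 < out.size() && !changed; i++) {
                String a = out.get(i);
                String b = out.get(i + 1);
                if (a.equals("iconst_2") && b.equals("imul")) {
                    // Strength reduction: x * 2 -> x << 1 (same result for every int, overflow included).
                    out.set(i, "iconst_1");
                    out.set(i + 1, "ishl");
                    changed = true;
                } else if ((a.equals("iconst_0") && (b.equals("iadd") || b.equals("isub")))
                        || (a.equals("iconst_1") && (b.equals("imul") || b.equals("idiv")))) {
                    // Algebraic identities: x + 0, x - 0, x * 1, x / 1 -> x.
                    out.subList(i, i + 2).clear();
                    changed = true;
                } else if (a.equals("dup") && b.equals("pop")) {
                    // Pushing a copy and immediately discarding it is a no-op.
                    out.subList(i, i + 2).clear();
                    changed = true;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> code = List.of("iload_1", "iconst_2", "imul", "iconst_0", "iadd", "ireturn");
        System.out.println(optimize(code)); // [iload_1, iconst_1, ishl, ireturn]
    }
}
```

Restarting the scan after every rewrite is wasteful but keeps the loop simple and lets one rewrite enable another.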
Heads up: I'm writing some of this from memory, so I may have some of the concepts wrong.
Java lets you write anonymous classes, which play the role of anonymous functions. This is useful when you have a listener interface for some kind of event. For example:
button.setOnClickListener(new View.OnClickListener() {
    @Override
    public void onClick(View v) {
        // handle the action here
    }
});
The anonymous listener is compiled into its own class, named after the enclosing class with a numeric suffix (e.g. MainActivity$1.class if the enclosing class is MainActivity). This is an underlying design decision of the Java language: everything is an object, even anonymous functions.
This becomes an issue when you want to write a more functional code base. The large number of anonymous classes inflates the class count, which can be a problem on constrained platforms such as Android.
In Kotlin, functions are much more first-class from a source code point of view. My question is: does Kotlin compile these functions down to bytecode more efficiently than Java does with anonymous classes, or will I run into the same large class count as in Java?
The short answer is yes: Kotlin inline functions are quite cheap.
When an inline function call is compiled, the lambdas passed to the call get inlined into the function body, which is in turn inlined at the call site. This allows the compiler not to generate any additional classes or methods for the lambda bodies.
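Java has no inline keyword, so the following is only a hand-written illustration of the transformation described above, with hypothetical names; it is not the exact bytecode the Kotlin compiler emits. The first version passes a capturing lambda (a function object) to a higher-order method; the second is what remains after copying the method body to the call site and substituting the lambda body for the call.

```java
import java.util.function.IntPredicate;

public class InliningIllustration {

    // A higher-order function: callers must hand over a function object.
    static int countMatching(int[] values, IntPredicate test) {
        int count = 0;
        for (int v : values) {
            if (test.test(v)) { // interface (virtual) call per element
                count++;
            }
        }
        return count;
    }

    // Without inlining: a capturing lambda object holding `threshold` is typically created per call.
    static int countAboveViaLambda(int[] values, int threshold) {
        return countMatching(values, v -> v > threshold);
    }

    // After inlining by hand: countMatching's body is copied here and the lambda body
    // replaces test.test(v), so there is no function object and no interface call.
    static int countAboveInlined(int[] values, int threshold) {
        int count = 0;
        for (int v : values) {
            if (v > threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] values = {1, 5, 7, 2};
        System.out.println(countAboveViaLambda(values, 3) + " " + countAboveInlined(values, 3)); // 2 2
    }
}
```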
One of the slides about the compilation of Kotlin constructs, by @yole.
Unfortunately, I found the recording only in Russian. The other slides are also of some interest; you can find more about non-inlined lambdas there.
In general, Kotlin code that uses inline functions with lambdas runs faster than equivalent Java code with lambdas or Streams. All the code binding is done at compile time, so there is no runtime overhead from virtual method calls and no increase in method count, which matters on Android.
The downside of excessive inlining is code size growth: the bytecode of an inline function body is duplicated at every call site. Inlining also complicates debugging, because the line numbers and the call stack will differ from what is in the source file, though IDE support can help here.
I would recommend experimenting with inline functions yourself: you can easily inspect the resulting bytecode and, of course, benchmark the particular use cases where performance matters.
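Kotlin has an inline keyword. If you use it, the function is inlined, and you can also treat the lambda body as if it were just a nested scope, so you can return from the enclosing function inside it!
Example (adapted from the docs):
// a minimal inline higher-order function so the snippet compiles
inline fun inlineFunction(body: () -> Unit) = body()

fun foo() {
    inlineFunction {
        return // OK: the lambda is inlined, so this returns from foo()
    }
}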
Check out the docs for more:
https://kotlinlang.org/docs/reference/inline-functions.html
Edit:
To clarify your exact question about performance, this is the first paragraph from the docs:
Using higher-order functions imposes certain runtime penalties: each function is an object, and it captures a closure, i.e. those variables that are accessed in the body of the function. Memory allocations (both for function objects and classes) and virtual calls introduce runtime overhead.
But it appears that in many cases this kind of overhead can be eliminated by inlining the lambda expressions.
So, as far as I can tell, yes: it inlines the function and removes the overhead that would otherwise be imposed.
However, this applies only to functions you declare as inline.
TL;DR: Given bytecode, how can I find out what classes and what methods get used in a given method?
In my code, I'd like to programmatically find all classes and methods with overly generous access modifiers. This should be based on an analysis of inheritance, static usage, and also hints I provide (e.g., a home-brew annotation like @KeepPublic). As a special case, unused classes and methods will be found.
I just did something similar, though much simpler: adding the final keyword to all classes where it makes sense (i.e., where it's allowed and the class won't get proxied by e.g. Hibernate). I did it in the form of a test, which knows which classes to ignore (e.g., entities) and complains about all needlessly non-final classes.
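The final-class test mentioned above could look roughly like this. It is a sketch under assumptions: it uses reflection on an explicitly supplied list of classes rather than bytecode, treats a class as needlessly non-final if nothing in that list extends it, and takes the ignore list (e.g. proxied entities) as a parameter. All names are hypothetical.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NonFinalClassCheck {

    /**
     * Returns the concrete classes in {@code all} that are not final, are not extended by any
     * other class in {@code all}, and are not listed in {@code ignored} (e.g. proxied entities).
     */
    public static List<Class<?>> needlesslyNonFinal(Collection<Class<?>> all, Set<Class<?>> ignored) {
        // Collect every class that some other class in the set extends.
        Set<Class<?>> extended = new HashSet<>();
        for (Class<?> c : all) {
            if (c.getSuperclass() != null) {
                extended.add(c.getSuperclass());
            }
        }
        List<Class<?>> result = new ArrayList<>();
        for (Class<?> c : all) {
            int mods = c.getModifiers();
            boolean skip = c.isInterface() || Modifier.isFinal(mods) || Modifier.isAbstract(mods)
                    || extended.contains(c) || ignored.contains(c);
            if (!skip) {
                result.add(c);
            }
        }
        return result;
    }

    static class Base {}

    static class Leaf extends Base {}

    public static void main(String[] args) {
        // Base is extended by Leaf, so only Leaf is reported.
        System.out.println(needlesslyNonFinal(List.<Class<?>>of(Base.class, Leaf.class), Set.of()));
    }
}
```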
For each of my classes, I want to find all the methods and classes it uses. For classes, there's this answer using ASM's Remapper. For methods, I've found an answer proposing instrumentation, which isn't what I want right now. I'm also not looking for a tool like UCDetector, which works on the Eclipse AST. How can I inspect method bodies based on bytecode? I'd like to do it myself so I can programmatically eliminate unwanted warnings (which are plentiful with UCDetector when using Lombok).
Looking at usage on a per-method basis, i.e. by analyzing all instructions, has some pitfalls. Besides method invocations, there might be method references, which are encoded using an invokedynamic instruction that has a handle to the target method in its bootstrap method (BSM) arguments. If the bytecode hasn’t been generated from ordinary Java code (or stems from a future version), you have to be prepared to encounter ldc instructions pointing to a handle, which would yield a MethodHandle at runtime.
Since you already mentioned “analysis of inheritance”, I just want to point out the corner cases, e.g. for
// foo/A.java
package foo;
class A {
    public void method() {}
}
class B extends A implements bar.If {
}
// bar/If.java
package bar;
public interface If {
    void method();
}
it’s easy to overlook that A.method() has to stay public.
If you stay conservative, i.e. whenever you can’t find out whether B instances will ever end up as targets of If.method() invocations elsewhere in your application you assume that it is possible, you won’t find much to optimize. I think that you need at least inlining of bridge methods and of the synthetic inner/outer class accessors to identify unused members across inheritance relationships.
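To see why bridge methods matter for such an analysis, consider this small sketch (class names are hypothetical). Implementing Comparator<String> makes javac emit a synthetic bridge compare(Object, Object) that delegates to compare(String, String), and it is this bridge that invocations of Comparator.compare actually target. Reflection's Method.isBridge() lets you spot it; in bytecode it carries the ACC_BRIDGE flag.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BridgeDemo {

    // Implementing a generic interface with a concrete type argument forces a bridge method.
    static final class ByLength implements Comparator<String> {
        @Override
        public int compare(String a, String b) {
            return Integer.compare(a.length(), b.length());
        }
    }

    /** Lists the compiler-generated bridge methods declared by a class. */
    static List<Method> bridgeMethods(Class<?> type) {
        List<Method> bridges = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isBridge()) {
                bridges.add(m);
            }
        }
        return bridges;
    }

    public static void main(String[] args) {
        // Prints the single bridge, compare(java.lang.Object, java.lang.Object).
        for (Method m : bridgeMethods(ByLength.class)) {
            System.out.println(m);
        }
    }
}
```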
When it comes to class references, there are even more possibilities that make a per-instruction analysis error-prone. They may occur not only as the owner of member access instructions, but also in new, checkcast, instanceof and array-specific instructions, annotations, exception handlers and, even worse, within signatures, which may occur at member references, annotations, local variable debugging hints, etc. The ldc instruction may refer to classes, producing a Class instance, which is actually used in ordinary Java code, e.g. for class literals; but as said, there’s also the theoretical possibility of producing MethodHandles, which may refer to an owner class but also carry a signature with parameter types and a return type, or of producing a MethodType representing a signature.
You are better off analyzing the constant pool; however, ASM doesn’t offer that. To be precise, a ClassReader has methods to access the pool, but they are not intended to be used by client code (as their documentation states). Even there, you have to be aware of pitfalls. Basically, the contents of a CONSTANT_Utf8_info bear a class or signature reference if a CONSTANT_Class_info, or respectively the descriptor index of a CONSTANT_NameAndType_info or a CONSTANT_MethodType_info, points to it. However, declared members of a class have direct references to CONSTANT_Utf8_info pool entries to describe their signatures, see Methods and Fields. Likewise, annotations don’t follow the pattern and have direct references to CONSTANT_Utf8_info entries of the pool, assigning a type or signature semantic to them, see enum_const_value and class_info_index…
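Walking the constant pool needs no library at all. Below is a minimal sketch (class and method names are hypothetical) that reads a class file with DataInputStream, following the constant pool layout of the JVM specification, and collects the names behind CONSTANT_Class_info entries. As explained above, this is not the whole story: signatures that members and annotations reference directly through CONSTANT_Utf8_info entries are not covered. The arrow-style switch assumes Java 14 or later.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ConstantPoolScanner {

    /** Returns the internal names (e.g. "java/lang/String") of all CONSTANT_Class_info entries. */
    public static Set<String> referencedClasses(InputStream classFile) throws IOException {
        DataInputStream data = new DataInputStream(classFile);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort(); // minor_version
        data.readUnsignedShort(); // major_version
        int count = data.readUnsignedShort(); // constant_pool_count; valid indices are 1..count-1
        String[] utf8 = new String[count];
        List<Integer> classNameIndices = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = data.readUnsignedByte();
            switch (tag) {
                case 1 -> utf8[i] = data.readUTF(); // Utf8: u2 length + modified UTF-8, as readUTF expects
                case 7 -> classNameIndices.add(data.readUnsignedShort()); // Class: name_index
                case 8, 16, 19, 20 -> data.readUnsignedShort(); // String, MethodType, Module, Package
                case 15 -> { // MethodHandle: reference_kind (u1) + reference_index (u2)
                    data.readUnsignedByte();
                    data.readUnsignedShort();
                }
                // Integer, Float, Fieldref, Methodref, InterfaceMethodref, NameAndType, Dynamic, InvokeDynamic
                case 3, 4, 9, 10, 11, 12, 17, 18 -> data.readInt(); // all 4 bytes of payload
                case 5, 6 -> { // Long and Double: 8 bytes, and they occupy two pool slots
                    data.readLong();
                    i++;
                }
                default -> throw new IOException("unknown constant pool tag " + tag + " at index " + i);
            }
        }
        Set<String> names = new TreeSet<>();
        for (int index : classNameIndices) {
            names.add(utf8[index]);
        }
        return names;
    }

    /** Convenience wrapper: scans the .class resource of an already loaded class. */
    public static Set<String> referencedClassesOf(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            return referencedClasses(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Scan this very class as an example; any class file stream works.
        referencedClassesOf(ConstantPoolScanner.class).forEach(System.out::println);
    }
}
```

Array types show up in such a list as descriptors like "[Ljava/lang/String;", so a real tool would normalize those before comparing names.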
How does method/field visibility impact on method inlining in Java?
The case I have in mind is something like a public getter for a private field:
private Thing blah;
public Thing getBlah() {
return blah;
}
There are several issues that arise here.
For one, does the Java compiler itself do any inlining? This question seems to be answered variously, with some saying yes and some saying no. I don't know whether that's because it used not to do any inlining but now does, or whether it's just that some people are right and some people are wrong...
Now, if it does do some inlining, am I right in thinking that it can't possibly inline getBlah() calls? They would be an obvious place for inlining to be useful, because the method is very simple, and the overhead of invoking a method is as big as the code for the method itself. But if it got inlined by the compiler, you'd end up with bytecode that accessed a private field directly; and surely the JVM would then complain? (This would apply even if this method were static final.)
Secondly, what about the JIT compiler? As far as I can see, this problem doesn't apply when it comes to inlining at that level. Once it's producing native code, the JVM has already done its checks, and confirmed that I can invoke the method (because it's public); so it can then generate native code that inlines the call, without any visibility issues... yes?
The javac compiler (and any valid Java compiler) will not and cannot inline getters. Think about it: you could extend the class and override the getter. And yes, if a compiler overzealously inlined that access, the result would not pass the verifier (well, at least it should not pass the verifier, but verifiers don't check everything: in Java 1.3 you could even make main() private and it would still work... likewise, there used to be an -O option in javac that sometimes screwed up your code).
The JIT is a whole other beast; it knows (at least nowadays) at any time whether there is an override for a method or not. Even if a class that overrides the getter is loaded later, the JIT can deoptimize already compiled methods to reflect alterations to the inheritance tree (that's one of the optimizations AOT compilers lack the information for).
Thus it can safely inline whatever it wants. It also doesn't need to artificially uphold access modifiers, because there is no such thing in the compiled machine code, and again it knows what a valid code transformation is (and since getters are so common, they are also low-hanging fruit for the JIT to optimize).
Edit: To make it absolutely clear, the paragraphs above address potentially virtual methods, specifically those that are not private, static or final. Javac could perform inlining in some cases, because it could prove that no override can exist for those. It would be a pointless undertaking, given that the JIT also does it, and does a far better job.
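The override argument can be seen in a few lines (class names are hypothetical). For read(), javac emits an invokevirtual of Holder.getBlah(); replacing that call with a direct read of Holder.blah would silently return the wrong value once a subclass is involved, and would also access a private field from outside its class.

```java
class Holder {
    private String blah = "base";

    public String getBlah() {
        return blah;
    }
}

class SpecialHolder extends Holder {
    @Override
    public String getBlah() {
        return "override";
    }
}

public class GetterDispatchDemo {

    // Compiled to invokevirtual Holder.getBlah(); the receiver's runtime class decides what runs.
    static String read(Holder h) {
        return h.getBlah();
    }

    public static void main(String[] args) {
        System.out.println(read(new Holder()));        // base
        System.out.println(read(new SpecialHolder())); // override
    }
}
```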
javac does not inline methods, not even ones as simple as getBlah() and setBlah().
As for the HotSpot JVM, the JIT compiler does inline all such methods unless it reaches the maximum inlining depth (-XX:MaxInlineLevel).
The JIT treats public and private methods equally in terms of inlining. Access modifiers do not generally affect inlining, except in some very specific cases.
Whether or not any particular Java compiler (Oracle's, for instance) performs any inlining is an implementation detail that you would be better off ignoring. A future generation of your chosen compiler, or an alternative compiler, might operate differently than the one you happen to be looking at now.
To the extent that a Java compiler did perform inlining, however, it could inline only private methods and constructors. Anything else is a candidate for access from (unknowable) other classes.
A JIT compiler, on the other hand, can do pretty much anything it wants, including inlining public methods. It is responsible for any adjustments that may be needed when new classes are loaded. Beans-style accessor methods are a particularly likely target for JIT optimization, since they are such a common pattern.
Particularly in J2ME, which approach consumes more resources: manipulating public static variables directly, or manipulating them through set() and get() methods?
That's impossible to tell, since it depends on the actual runtime environment. A JIT, AOT or HotSpot compiler may very well optimize away the potential method overhead.
Introducing accessor methods significantly increases the size of class files. However:
statics are evil
prefer a bit of OO, and encapsulate with behavioural methods rather than writing structs with pointless boilerplate
you can probably find an obfuscator that will compact the object code for you
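A minimal sketch of the "behavioural methods" advice, using a hypothetical Score class: instead of exposing a static field or a getPoints()/setPoints() pair that callers combine as setPoints(getPoints() + n), the class offers one operation that states the intent and guards its own invariant.

```java
public class Score {
    private int points; // instance state, not a public static field

    /** Behavioural method: callers say what happens instead of doing setPoints(getPoints() + n). */
    public void award(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must not be negative: " + n);
        }
        points += n;
    }

    /** Read-only view of the state; there is deliberately no setter. */
    public int points() {
        return points;
    }

    public static void main(String[] args) {
        Score score = new Score();
        score.award(3);
        score.award(4);
        System.out.println(score.points()); // 7
    }
}
```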
Using get() and set() methods may be a bit more costly than accessing attributes directly (although the compiler or the JIT may optimize the method calls by inlining them), but the difference is negligible anyway. Also, in general you should not declare all your attributes as static, only the constant values.
On the other hand, using get() and set() methods is the preferred option for enforcing the encapsulation of data; it's good object-oriented programming practice. Not having those methods forces you to expose the attributes to the outside, diminishing the ability of the class to hide implementation details and making future changes in the implementation harder.
Public static field access costs fewer resources than setter/getter methods. If you are on a modern HotSpot JVM, there will be minimal difference.
set() and get() cost more than direct access to normal fields. You probably did not mean static.
Is there a concept of inline functions in Java, or is it replaced by something else? If there is, how is it used? I've heard that public, static and final methods are the inline functions. Can we create our own inline function?
In Java, optimizations are usually done at the JVM level. At runtime, the JVM performs some "complicated" analysis to determine which methods to inline. It can be aggressive in inlining, and the HotSpot JVM can actually inline non-final methods.
Java compilers almost never inline method calls (the JVM does all of that at runtime). They do inline compile-time constants (e.g. static final primitive values). But not methods.
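One observable consequence of javac inlining compile-time constants: reading a static final constant variable of another class does not even initialize that class, because javac emits the value itself instead of a getstatic instruction. Reading a static final field whose initializer is not a constant expression does trigger initialization. A sketch with hypothetical class names:

```java
public class ConstantInliningDemo {
    static final StringBuilder LOG = new StringBuilder();

    static class Config {
        static {
            LOG.append("Config initialized");
        }

        static final int LIMIT = 42;                   // constant variable: javac copies 42 into callers
        static final long STARTED = System.nanoTime(); // not a constant: read via getstatic at runtime
    }

    public static void main(String[] args) {
        int limit = Config.LIMIT;                          // no getstatic, so Config is NOT initialized here
        System.out.println(limit + ", log=[" + LOG + "]"); // 42, log=[]
        long started = Config.STARTED;                     // getstatic runs Config's static initializer
        System.out.println("log=[" + LOG + "]");           // log=[Config initialized]
    }
}
```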
For more resources:
Article: The Java HotSpot Performance Engine: Method Inlining Example
Wiki: Inlining in OpenJDK, not fully populated but contains links to useful discussions.
No, there are no inline functions in Java. Yes, you can use a public static method anywhere in the code when it is placed in a public class. The Java compiler may perform inline expansion of a static or final method, but that is not guaranteed.
Typically, such code optimizations are done by the compiler in combination with the JVM/JIT/HotSpot for code segments that are used very often. Other optimization concepts, such as register declarations for parameters, do not exist in Java either.
Optimizations cannot be forced by declarations in Java; they are done by the compiler and the JIT. In many other languages these declarations are often only compiler hints (you can declare more register parameters than the processor has registers; the rest are ignored).
Declaring Java methods static, final or private also gives hints to the compiler. You should use them, but there are no guarantees. Java performance is dynamic, not static. The first call into a system is always slow because of class loading. Later calls are faster, and depending on memory and runtime, the most common calls are optimized within the running system, so a server may become faster over time!
Java does not provide a way to manually suggest that a method should be inlined. As @notnoop says in the comments, inlining is typically done by the JVM at execution time.
What you said above is correct. Sometimes final methods are inlined, but there is no way to explicitly create an inline function in Java.
Well, there are methods that could be called "inline" methods in Java, but it depends on the JVM. In HotSpot, the limits refer to the size of the method's bytecode (not its machine code): if it is up to 35 bytes (the default -XX:MaxInlineSize), the method will be inlined right away; if it is up to 325 bytes (the default -XX:FreqInlineSize) and the method is called frequently, it may be inlined, depending on the JVM.
Real life example:
public class Control {
    public static final long EXPIRED_ON = 1386082988202L;

    public static final boolean isExpired() {
        return System.currentTimeMillis() > EXPIRED_ON;
    }
}
Then, in other classes, I can exit if the code has expired. If I reference the EXPIRED_ON variable from another class, the constant is inlined into the bytecode, making it very hard to track down all the places in the code that check the expiry date. However, if the other classes invoke the isExpired() method, the actual method is called, meaning a hacker could replace the isExpired method with one that always returns false.
I agree it would be very nice to force a compiler to inline the static final method to all classes which reference it. In that case, you need not even include the Control class, as it would not be needed at runtime.
From my research, this cannot be done. Perhaps some obfuscator tools can do this, or you could modify your build process to edit sources before compiling.
As for proving whether the method from the Control class is inlined into another class during compilation, try running the other class without the Control class on the classpath.
So it seems there aren't, but you can use this workaround with Guava or an equivalent Function class implementation, since that class is extremely simple, e.g.:
assert false : new com.google.common.base.Function<Void, String>() {
    @Override public String apply(Void input) {
        // your complex code goes here
        return "weird message";
    }
}.apply(null);
Yes, this is dead code, just to illustrate how to create a complex code block (within {}) that does something so specific that we shouldn't bother creating a method for it, a.k.a. inline!
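If you'd rather not depend on Guava, the same trick works since Java 8 with the JDK's java.util.function.Supplier (a swapped-in type, not the one used above; the class and method names are hypothetical). Note that this is "inline" only at the source level: javac still compiles the lambda body into a separate synthetic method.

```java
import java.util.function.Supplier;

public class InlineBlockDemo {

    static String message() {
        // A lambda cast to Supplier and invoked immediately acts as an ad-hoc code block.
        return ((Supplier<String>) () -> {
            // your complex code goes here
            StringBuilder sb = new StringBuilder("weird");
            sb.append(" message");
            return sb.toString();
        }).get();
    }

    public static void main(String[] args) {
        System.out.println(message()); // weird message
    }
}
```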
Java 9 has an experimental ahead-of-time (AOT) compiler, jaotc, that performs several optimizations at compile time rather than at runtime, which can be seen as inlining.