I am trying to understand the internals of the JVM and the HotSpot optimizer.
I am tackling the problem of initializing object tree structures with a very large number of nodes as fast as possible.
Right now, for every tree structure given, we generate Java source code to initialize the tree as follows. In the end, we have thousands of these classes.
public class TypeATreeNodeInitializer {
public TypeATreeNode initialize(){
return getTypeATree();
}
private TypeATreeNode getTypeATree() {
TypeATreeNode node = StaticTypeAFactory.create();
TypeBTreeNode child1 = getTypeBTreeNode1();
node.getChildren().add(child1);
TypeBTreeNode child2 = getTypeBTreeNode2();
node.getChildren().add(child2);
//... may be many more children
return node;
}
private TypeBTreeNode getTypeBTreeNode1() {
TypeBTreeNode node = StaticTypeBFactory.create();
TypeBTreeNode child1 = getTypeCTreeNode1();
node.getChildren().add(child1);
// store the value in a local variable first
String value1 = "Some value";
// assign value to node
node.setSomeValue(value1);
boolean value2 = false;
node.setSomeBooleanValue(value2);
return node;
}
private TypeBTreeNode getTypeCTreeNode1() {
// ...
return null;
}
private TypeBTreeNode getTypeBTreeNode2() {
// ...
return null;
}
//... many more child node getter / initializer
}
As you can see, the values to be assigned to the tree nodes are stored inside local variables first. Looking at the generated byte code, this results in:
A load of the variable from the constant pool to the stack // e.g. String “Some Value”
A store of the variable inside the local variables
A load from the method target onto the stack // e.g. TypeBTreeNode
A load of the variable from the local variables // “Some Value”
The invocation of the setter
Yet this could be written shorter by not storing into a local variable and directly passing the parameters. So, it becomes just:
pushing the method target onto the stack // e.g TypeBTreeNode
then loading the constant onto the stack // “Some Value”
then invoking the setter
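The two bytecode shapes described above correspond to source like the following. This is only a minimal sketch: `Node`, `setViaLocal` and `setDirectly` are hypothetical stand-ins for the generated classes, and the instruction sequences in the comments are what `javac` typically emits for such methods (running `javap -c` on your own classes is the way to confirm).

```java
final class LocalVsDirectDemo {
    /** Hypothetical stand-in for a generated tree node type. */
    static final class Node {
        private String someValue;

        void setSomeValue(String someValue) { this.someValue = someValue; }
        String getSomeValue() { return someValue; }
    }

    // Typically: ldc "Some value", astore_1, aload_0, aload_1, invokevirtual, return
    static void setViaLocal(Node node) {
        String value1 = "Some value";
        node.setSomeValue(value1);
    }

    // Typically: aload_0, ldc "Some value", invokevirtual, return
    static void setDirectly(Node node) {
        node.setSomeValue("Some value");
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        setViaLocal(a);
        setDirectly(b);
        // Both variants leave the node in the same state.
        System.out.println(a.getSomeValue().equals(b.getSomeValue()));
    }
}
```

The observable behavior is identical; only the store and reload of the local differ.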
I know that in other languages (e.g. C++) compilers are capable of such optimizations.
In Java, the HotSpot optimizer is responsible for such magic at runtime.
However, as far as I understand the docs, HotSpot only kicks in after the 500th call of a method (client VM).
Questions:
Do I understand correctly: if I initialize every tree only once, but do that for a large number (let’s say 10,000) of generated TreeInitializers, the first byte code sequence is executed for every TreeInitializer, as they are different classes with different methods and every method is called just once?
I suspect a significant speed-up from rewriting the generator to use no locals, as I would be saving about a third of the byte code instructions and possibly expensive loads of the variables. I know that this is hard to tell without measuring, but altering the generator's code is non-trivial, so would you think it is worth a try?
Removing temporary/stack variables like this is almost always premature optimization. Your processor can handle hundreds of millions of these instructions per second; meanwhile, if you're initializing tens of thousands of anything, your program is probably going to be blocking at some point waiting on memory allocation.
My advice is always going to be to hold off on optimizations until you've profiled your code. In the meantime, write code to be as easy to read as possible, so that when you do need to come back and modify something, it's easy to find the places that need to be updated.
Before optimizing, the JVM interprets your bytecode and profiles its behavior. Based on these observations, it compiles your code to machine code. For this reason, it is difficult to give general advice. You should, however, treat your byte code only as a general abstraction, not as a guide to performance.
A few rules of thumb:
Avoid large methods as those methods are often not inlined into other methods, even if the JVM considers this to be a good idea. This is to avoid memory overhead as inlining large methods would create a lot of duplicate code.
Avoid polymorphism and unstable branches if you can. If your VM finds out that a method call only ever hits a specific class, this is good news. The JVM will most likely remove the virtual properties of this call. Similarly, stable branches can help you with branch prediction.
Avoid object allocations for long-lived objects. If you create a lot of objects, it is better to let them die young than to keep them around for long.
The first rule of Optimize Club is "don't optimize." That said...
There is already no point in assigning a value to a local (stack) variable only to reference it once. If I was reviewing this code, I would have the author remove the assignment and just pass results of get...() to add().
This is not a "premature optimization" but a code simplification (code quality) issue. The fact that it eliminates some byte codes is usually not a consideration either, as the JIT compiler will optimize the code at run time. In this case, because these initializers sound like they will only be run once, the threshold for this optimization will likely never be met, so there will be value in eliminating the unnecessary stack assign and load.
Simple question asked mostly out of curiosity about what Java compilers are smart enough to do. I know not all compilers are built equally, but I'm wondering if others feel it's reasonable to expect an optimization on most compilers I'm likely to run against, not if it works on a specific version or on all versions.
So let's say that I have some tree structure and I want to collect all the descendants of a node. There are two easy ways to do this recursively.
The more natural method, for me, to do this would be something like this:
public Set<Node> getDescendants(){
Set<Node> descendants=new HashSet<Node>();
descendants.addAll(getChildren());
for(Node child: getChildren()){
descendants.addAll(child.getDescendants());
}
return descendants;
}
However, assuming no compiler optimizations and a decent-sized tree, this could get rather expensive. On each recursive call I create and fully populate a set, only to return that set up the stack so the calling method can add the contents of my returned set to its version of the descendants set, discarding the version that was just built and populated in the recursive call.
So now I'm creating many sets just to have them be discarded as soon as I return their contents. Not only do I pay a minor initialization cost for building the sets, but I also pay the more substantial cost of moving all the contents of one set into the larger set. In large trees most of my time is spent moving Nodes around in memory from set A to B. I think this even makes my algorithm O(n^2) instead of O(n) due to the time spent copying Nodes, though it may work out to be O(n log(n)) if I sat down to do the math.
I could instead have a simple getDescendants method that calls a helper method that looks like this:
public Set<Node> getDescendants(){
Set<Node> descendants=new HashSet<Node>();
getDescendantsHelper(descendants);
return descendants;
}
public Set<Node> getDescendantsHelper(Set<Node> descendants){
descendants.addAll(getChildren());
for(Node child: getChildren()){
child.getDescendantsHelper(descendants);
}
return descendants;
}
This ensures that I only ever create one set and I don't have to waste time copying from one set to another. However, it requires writing two methods instead of one and generally feels a little more cumbersome.
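The two options can be put side by side in a self-contained form. This is a sketch under assumptions: the `Node` class below is hypothetical (the question does not show it), and `getDescendantsCopying` and `collectDescendants` are illustrative names for option one and the helper of option two.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DescendantsDemo {
    /** Hypothetical minimal tree node; only the child list matters here. */
    static final class Node {
        private final List<Node> children = new ArrayList<>();

        Node addChild(Node child) { children.add(child); return this; }
        List<Node> getChildren() { return children; }

        // Option one: every call allocates a set, which the caller then copies.
        Set<Node> getDescendantsCopying() {
            Set<Node> descendants = new HashSet<>(getChildren());
            for (Node child : getChildren()) {
                descendants.addAll(child.getDescendantsCopying());
            }
            return descendants;
        }

        // Option two: one set is created here and filled in place by the helper.
        Set<Node> getDescendants() {
            Set<Node> descendants = new HashSet<>();
            collectDescendants(descendants);
            return descendants;
        }

        private void collectDescendants(Set<Node> descendants) {
            for (Node child : getChildren()) {
                descendants.add(child);
                child.collectDescendants(descendants);
            }
        }
    }

    public static void main(String[] args) {
        Node inner = new Node().addChild(new Node()).addChild(new Node());
        Node root = new Node().addChild(inner).addChild(new Node());
        System.out.println(root.getDescendants().size()); // 4
        System.out.println(root.getDescendants().equals(root.getDescendantsCopying())); // true
    }
}
```

Making the helper private and `void` keeps the public surface at a single method, which takes away some of the cumbersome feel of the two-method version.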
The question is, do I need to do option two if I'm worried about optimizing this sort of method? or can I reasonably expect the java compiler, or JIT, to recognize that I am only creating temporary sets for convenience of returning to the calling method and avoid the wasteful copying between sets?
edit: cleaned up a bad copy-paste job which led to my sample method adding everything twice. You know something is bad when your 'optimized' code is slower than your regular code.
The question is, do I need to do option two if I'm worried about optimizing this sort of method?
Definitely yes. If performance is a concern (and most of the time it is not!), then you need it.
The compiler optimizes a lot but on a very different scale. Basically, it works with one method only and it optimizes the most commonly used path therein. Due to heavy inlining it can sort of optimize across method calls, but nothing like the above.
It can also optimize away needless allocations, but only in very simple cases. Maybe something like
int sum(int... a) {
int result = 0;
for (int x : a) result += x;
return result;
}
Calling sum(1, 2, 3) means allocating int[3] for the varargs arguments and this can be eliminated (whether the compiler really does it is a different question). It can even find out that the result is a constant (which I doubt it really does). If the result doesn't get used, it can perform dead code elimination (this happens rather often).
Your example involves allocating a whole HashSet and all its entries, and is several orders of magnitude more complicated. The compiler has no idea how a HashSet works and it can't find out, e.g., that after m.addAll(m1) the set m contains all members of m1. No way.
This is an algorithmic optimization rather than a low-level one. That's what humans are still needed for.
For things the compiler could do (but currently fails to), see e.g. these questions of mine concerning associativity and bounds checks.
In the following piece of code we make a call listType.getDescription() twice:
for (ListType listType: this.listTypeManager.getSelectableListTypes())
{
if (listType.getDescription() != null)
{
children.add(new SelectItem( listType.getId() , listType.getDescription()));
}
}
I would tend to refactor the code to use a single variable:
for (ListType listType: this.listTypeManager.getSelectableListTypes())
{
String description = listType.getDescription();
if (description != null)
{
children.add(new SelectItem(listType.getId() ,description));
}
}
My understanding is the JVM is somehow optimized for the original code and especially nesting calls like children.add(new SelectItem(listType.getId(), listType.getDescription()));.
Comparing the two options, which one is the preferred method and why? That is in terms of memory footprint, performance, readability/ease, and others that don't come to my mind right now.
When does the latter code snippet become more advantageous over the former, that is, is there any (approximate) number of listType.getDescription() calls when using a temp local variable becomes more desirable, as listType.getDescription() always requires some stack operations to store the this object?
I'd nearly always prefer the local variable solution.
Memory footprint
A single local variable costs 4 or 8 bytes. It's a reference and there's no recursion, so let's ignore it.
Performance
If this is a simple getter, the JIT compiler can inline it and eliminate the repeated read itself, so there's no difference. If it's an expensive call which can't be optimized, memoizing manually makes it faster.
Readability
Follow the DRY principle. In your case it hardly matters, as the local variable name is about as long, character-wise, as the method call, but for anything more complicated it improves readability, as you don't have to find the 10 differences between the two expressions. If you know they're the same, make it clear by using the local variable.
Correctness
Imagine your SelectItem does not accept nulls and your program is multithreaded. The value of listType.getDescription() can change between the null check and the second call, and you're toast.
Debugging
Having a local variable containing an interesting value is an advantage.
The only thing to win by omitting the local variable is saving one line. So I'd do it only in cases when it really doesn't matter:
very short expression
no possible concurrent modification
simple private final getter
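The correctness point above can be made concrete without real threads. In this sketch the hypothetical `ListType` clears its description after the first read, which stands in for another thread changing the value between the two getter calls; the class and both method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SnapshotDemo {
    /** Hypothetical ListType whose value changes right after it is first read. */
    static final class ListType {
        private String description;

        ListType(String description) { this.description = description; }

        String getDescription() {
            String current = description;
            description = null; // simulates a concurrent modification
            return current;
        }
    }

    // Two calls: the value that passed the null check is not the value that is used.
    static List<String> collectWithTwoCalls(List<ListType> types) {
        List<String> result = new ArrayList<>();
        for (ListType type : types) {
            if (type.getDescription() != null) {
                result.add(type.getDescription());
            }
        }
        return result;
    }

    // One call stored in a local: the checked value is the value that is used.
    static List<String> collectWithLocal(List<ListType> types) {
        List<String> result = new ArrayList<>();
        for (ListType type : types) {
            String description = type.getDescription();
            if (description != null) {
                result.add(description);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collectWithTwoCalls(Collections.singletonList(new ListType("a")))); // [null]
        System.out.println(collectWithLocal(Collections.singletonList(new ListType("a"))));    // [a]
    }
}
```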
I think option two is definitely better because it improves the readability and maintainability of your code, which is the most important thing here. This kind of micro-optimization won't really help you with anything unless you are writing an application where every millisecond is important.
I'm not sure either is preferred. What I would prefer is clearly readable code over performant code, especially when that performance gain is negligible. In this case I suspect there's next to no noticeable difference (especially given the JVM's optimisations and code-rewriting capabilities)
In the context of imperative languages, the value returned by a function call cannot be memoized (See http://en.m.wikipedia.org/wiki/Memoization) because there is no guarantee that the function has no side effect. Accordingly, your strategy does indeed avoid a function call at the expense of allocating a temporary variable to store a reference to the value returned by the function call.
In addition to being slightly more efficient (which does not really matter unless the function is called many times in a loop), I would opt for your style due to better code readability.
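A getter with an observable side effect shows why the two forms are not interchangeable in general. This is an illustrative sketch only: the call counter in the hypothetical `ListType` below plays the role of any side effect, and `describeTwice` and `describeOnce` are made-up names.

```java
final class SideEffectDemo {
    /** Hypothetical getter with a visible side effect: it counts its own calls. */
    static final class ListType {
        private final String description;
        private int calls;

        ListType(String description) { this.description = description; }

        String getDescription() {
            calls++;
            return description;
        }

        int getCalls() { return calls; }
    }

    // Invokes the getter twice when the description is non-null.
    static String describeTwice(ListType type) {
        return type.getDescription() != null ? type.getDescription() : "";
    }

    // Invokes the getter exactly once and reuses the result.
    static String describeOnce(ListType type) {
        String description = type.getDescription();
        return description != null ? description : "";
    }

    public static void main(String[] args) {
        ListType a = new ListType("x");
        ListType b = new ListType("x");
        describeTwice(a);
        describeOnce(b);
        System.out.println(a.getCalls() + " vs " + b.getCalls()); // 2 vs 1
    }
}
```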
I agree on everything. About the readability I'd like to add something:
I see lots of programmers doing things like:
if (item.getFirst().getSecond().getThird().getForth() == 1 ||
item.getFirst().getSecond().getThird().getForth() == 2 ||
item.getFirst().getSecond().getThird().getForth() == 3)
Or even worse:
item.getFirst().getSecond().getThird().setForth(item2.getFirst().getSecond().getThird().getForth())
If you are calling the same chain of 10 getters several times, please use an intermediate variable. It's just much easier to read and debug.
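With intermediate variables, the two snippets above could look roughly like this. The nested `Item`, `First`, `Second` and `Third` types are hypothetical and exist only so the sketch compiles; the getter names (including `getForth`) are kept as written above.

```java
final class GetterChainDemo {
    // Hypothetical types mirroring the getFirst().getSecond().getThird() chain.
    static final class Third {
        private int forth;
        int getForth() { return forth; }
        void setForth(int forth) { this.forth = forth; }
    }

    static final class Second {
        private final Third third = new Third();
        Third getThird() { return third; }
    }

    static final class First {
        private final Second second = new Second();
        Second getSecond() { return second; }
    }

    static final class Item {
        private final First first = new First();
        First getFirst() { return first; }
    }

    // The chain is walked once; the comparisons read the local.
    static boolean isOneTwoOrThree(Item item) {
        int forth = item.getFirst().getSecond().getThird().getForth();
        return forth == 1 || forth == 2 || forth == 3;
    }

    // Named intermediates make it obvious which side is the source and which the target.
    static void copyForth(Item source, Item target) {
        Third sourceThird = source.getFirst().getSecond().getThird();
        Third targetThird = target.getFirst().getSecond().getThird();
        targetThird.setForth(sourceThird.getForth());
    }

    public static void main(String[] args) {
        Item item = new Item();
        Item item2 = new Item();
        item2.getFirst().getSecond().getThird().setForth(2);
        copyForth(item2, item);
        System.out.println(isOneTwoOrThree(item)); // true
    }
}
```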
I would agree with the local variable approach for readability only if the local variable's name is self-documenting. Calling it "description" wouldn't be enough (which description?). Calling it "selectableListTypeDescription" would make it clear. I would throw in that the incremented variable in the for loop should be named "selectableListType" (especially if the "listTypeManager" has accessors for other ListTypes).
The other reason would be if there's no guarantee that this is single-threaded or that your list is immutable.
Using JProfiler, I've identified a hot spot in my Java code that I cannot make sense of. JProfiler explains that this method takes 150μs (674μs without warmup) on average, not including the time it takes to call descendant methods. 150μs may not seem much, but in this application it adds up (and is experienced by my users) and also it seems a lot, compared to other methods that seem more complex to me than this one. Hence it matters to me.
private boolean assertReadAuthorizationForFields(Object entity, Object[] state,
String[] propertyNames) {
boolean changed = false;
final List<Field> fields = FieldUtil.getAppropriatePropertyFields(entity, propertyNames);
// average of 14 fields to iterate over
for (final Field field : fields) {
// manager.getAuthorization returns an enum type
// manager is a field referencing another component
if (manager.getAuthorization(READ, field).isDenied()) {
FieldUtil.resetField(field.getName(), state, propertyNames);
changed = true;
}
}
return changed;
}
I have minimized this method in different directions for myself, but that never taught me anything useful. I cannot stress enough that the JProfiler-reported duration (150μs) covers only the code in this method and does not include the time it takes to execute getAuthorization, isDenied, resetField and such. That is also why I start off by posting just this snippet, without much context, since the issue seems to be with this code and not with the methods it calls.
Maybe you can argue why – if you feel I'm seeing ghosts :) Anyhow, thanks for your time!
Candidate behaviour that could slow you down:
Major effect: Obviously iteration. If you have lots of fields... You say 14 on average, which is fairly significant
Major effect: HotSpot inlining would mean called methods are included in your times, and this could be noticeable because your method calls use reflection. getAppropriatePropertyFields introspects on class field definition metadata; resetField dynamically invokes setter methods (possibly using Method.invoke()?). If you are desperate for performance, you could use a cache via a HashMap (mapping ElementClass -> FieldMetadataAndMethodHandle). This could contain field metadata and MethodHandles for the setter methods (instead of using Method.invoke, which is slow). Then you would only reflect during application startup and would use the JVM's fast method handle (invokedynamic) machinery afterwards.
Minor effect - but multiplied by number of iterations: if you have very large arrays for state and property names, and they use primitive fields, then they would involve some degree of copying during method invocations (method parameters pass-by-'value' actually means pass-by-reference/pass-by-copy-of-primitives)
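The caching idea from the list above might be sketched as follows. Everything here is an assumption made for illustration: the question does not show `FieldUtil`, so `Person`, `setterFor` and `set` are hypothetical, and whether the real `resetField` invokes setters at all is a guess. Only `java.lang.invoke` and `ConcurrentHashMap` are real APIs.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SetterHandleCacheDemo {
    /** Hypothetical entity; real code would use its own entity classes. */
    static final class Person {
        private String name;
        public void setName(String name) { this.name = name; }
        public String getName() { return name; }
    }

    // Each setter is resolved reflectively once, then served from this cache.
    private static final Map<String, MethodHandle> SETTERS = new ConcurrentHashMap<>();

    static MethodHandle setterFor(Class<?> type, String setterName, Class<?> paramType) {
        String key = type.getName() + "#" + setterName + "(" + paramType.getName() + ")";
        return SETTERS.computeIfAbsent(key, k -> {
            try {
                return MethodHandles.lookup()
                        .findVirtual(type, setterName, MethodType.methodType(void.class, paramType));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("No accessible setter " + k, e);
            }
        });
    }

    // MethodHandle.invoke adapts the Object arguments to the handle's exact types.
    static void set(Object target, String setterName, Class<?> paramType, Object value) throws Throwable {
        setterFor(target.getClass(), setterName, paramType).invoke(target, value);
    }

    public static void main(String[] args) throws Throwable {
        Person person = new Person();
        set(person, "setName", String.class, "Ada");
        System.out.println(person.getName()); // Ada
    }
}
```

Whether this beats a cached `Method` in practice depends on the JVM and the call pattern, so it is worth measuring before committing to it.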
I suggest you time the method yourself as the profiler doesn't always give accurate timing.
Create a micro-benchmark with just this code and time it for at least 2 seconds. To work out how much difference the method calls make, comment them out and hard-code the values they return.
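A hand-rolled timing loop along those lines could look like the sketch below. `TimingDemo`, `averageNanos` and the placeholder workload are hypothetical; in the real test the lambda would call the method under investigation. A dedicated harness such as JMH handles warm-up and dead-code elimination more carefully than this does.

```java
import java.util.function.IntSupplier;

final class TimingDemo {
    // Results are accumulated here so the JIT cannot discard the measured work.
    static volatile long sink;

    /** Runs the task for roughly the given duration; returns average nanoseconds per call. */
    static double averageNanos(IntSupplier task, long durationMillis) {
        long durationNanos = durationMillis * 1_000_000L;
        long start = System.nanoTime();
        long calls = 0;
        long elapsed;
        do {
            sink += task.getAsInt();
            calls++;
            elapsed = System.nanoTime() - start;
        } while (elapsed < durationNanos);
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute the code you actually want to time.
        IntSupplier task = () -> Integer.toString(12345).hashCode();
        averageNanos(task, 2_000);                // warm-up run, result ignored
        double nanos = averageNanos(task, 2_000); // measured run
        System.out.printf("%.1f ns per call%n", nanos);
    }
}
```

The loop calls `System.nanoTime()` on every iteration, which adds a small fixed cost per call; that is negligible for a method in the 150μs range but would dominate for very short workloads.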
I think the issue is that FieldUtil is using Reflection and doesn't cache the fields it's using.
I have a question about instruction optimization. If an object is to be used in two statements, is it faster to create a new object reference or should I instead call the object directly in both statements?
For the purposes of my question, the object is part of a Vector of objects (this example is from a streamlined version of Java without ArrayLists). Here is an example:
AutoEvent ptr = ((AutoEvent)e_autoSequence.elementAt(currentEventIndex));
if(ptr.exitConditionMet()) {currentEventIndex++; return;}
ptr.registerSingleEvent();
AutoEvent is the class in question, and e_autoSequence is the Vector of AutoEvent objects. The AutoEvent contains two methods in question: exitConditionMet() and registerSingleEvent().
This code could, therefore, alternately be written as:
if(((AutoEvent)e_autoSequence.elementAt(currentEventIndex)).exitConditionMet())
{currentEventIndex++; return;}
((AutoEvent)e_autoSequence.elementAt(currentEventIndex)).registerSingleEvent();
Is this faster than the above?
I understand the casting process is slow, so this question is actually twofold: additionally, in the event that I am not casting the object, which would be more highly optimized?
Bear in mind this is solely for two uses of the object in question.
The first solution is better all round:
Only one call to the vector elementAt method. This is actually the most expensive operation here, so only doing it once is a decent performance win. Also doing it twice potentially opens you up to some race conditions.
Only one cast operation. Casts are very cheap on modern JVMs, but still have a slight cost.
It's more readable IMHO. You are getting an object then doing two things with it. If you get it twice, then the reader has to mentally figure out that you are getting the same object. Better to get it once, and assign it to a variable with a good name.
A single assignment of a local variable (like ptr in the first solution) is extremely cheap and often free - the Java JIT compiler is smart enough to produce highly optimised code here.
P.S. Vector is pretty outdated. Consider converting to an ArrayList<AutoEvent>. By using the generic ArrayList you won't need to explicitly cast, and it is much faster than a Vector (because it isn't synchronised and therefore has less locking overhead)
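On a platform where generics and `ArrayList` are available (the question mentions a streamlined Java without them), the first solution might be written as in this sketch. `AutoEvent` here is a hypothetical stand-in with just the two methods named in the question, and `step`, `add` and `isRegistered` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

final class AutoSequenceDemo {
    /** Hypothetical stand-in for the AutoEvent class from the question. */
    static final class AutoEvent {
        private final boolean done;
        private boolean registered;

        AutoEvent(boolean done) { this.done = done; }

        boolean exitConditionMet() { return done; }
        void registerSingleEvent() { registered = true; }
        boolean isRegistered() { return registered; }
    }

    private final List<AutoEvent> autoSequence = new ArrayList<>();
    private int currentEventIndex;

    void add(AutoEvent event) { autoSequence.add(event); }
    int getCurrentEventIndex() { return currentEventIndex; }

    void step() {
        // One lookup, no cast: the generic list already returns an AutoEvent.
        AutoEvent current = autoSequence.get(currentEventIndex);
        if (current.exitConditionMet()) {
            currentEventIndex++;
            return;
        }
        current.registerSingleEvent();
    }

    public static void main(String[] args) {
        AutoSequenceDemo demo = new AutoSequenceDemo();
        AutoEvent finished = new AutoEvent(true);
        AutoEvent pending = new AutoEvent(false);
        demo.add(finished);
        demo.add(pending);
        demo.step(); // exit condition met: advances to index 1
        demo.step(); // registers the pending event
        System.out.println(demo.getCurrentEventIndex() + " " + pending.isRegistered()); // 1 true
    }
}
```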
First solution will be faster.
The reason is that assignments work faster than method invocations.
In the second case you will have method elementAt() invoked twice, which will make it slower and JVM will probably not be able to optimize this code because it doesn't know what exactly is happening in the elementAt().
Also remember that Vector's methods are synchronized, which makes every method invocation even slower due to lock acquisition.
I don't know what you mean by "create a new object reference" here. The code ((AutoEvent)e_autoSequence.elementAt(currentEventIndex)) will probably be translated into bytecode that obtains the sequence element, casts it to AutoEvent and stores the resulting reference on the stack. The local variable ptr, like other local variables, is stored on the stack too, so assigning a reference to it is just copying 4 bytes from one stack slot to another, nearby stack slot. This is a very, very fast operation. Modern JVMs do not do reference counting, so assigning references is probably as cheap as assigning int values.
Let's get some terminology straight first. Your code does not "create a new object reference". It is fetching an existing object reference (either once or twice) from a Vector.
To answer your question, it is (probably) a little bit faster to fetch once and put the reference into a temporary variable. But the difference is small, and unlikely to be significant unless you do it lots of times in a loop.
(The elementAt method on a Vector or ArrayList is O(1) and cheap. If the list was a linked list, which has an O(N) implementation for elementAt, then that call could be expensive, and the difference between making 1 or 2 calls could be significant ...)
Generally speaking, you should think about the complexity of your algorithms, but beyond that you shouldn't spend time optimizing ... until you have solid profiling evidence to tell you where to optimize.
I can't say whether ArrayList would be more appropriate. This could be a case where you need the thread-safety offered by Vector.
As I understand it, in the case of an array, Java checks the index against the size of the array.
So instead of using array[i] multiple times in a loop, it is better to declare a variable which stores the value of array[i], and use that variable multiple times.
My question is, if I have a class like this:
public class MyClass {
public MyClass(int value){
this.value = value;
}
int value;
}
If I create an instance of this class somewhere else: (MyClass myobject = new MyClass(7)), and I have to use the objects value multiple times, is it okay to use myobject.value often or would it be better to declare a variable which stores that value and use that multiple times, or would it be the same?
In your case, it wouldn't make any difference, since referencing myobject.value is as fast and effective as referencing a new int variable.
Also, the JVM is usually able to optimize these kinds of things, and you shouldn't spend time worrying about it unless you have a highly performance critical piece of code. Just concentrate on writing clear, readable code.
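For the single-threaded case in the question, the two forms can be sketched like this. `MyClass` mirrors the class from the question; `sumReadingField` and `sumReadingLocal` are hypothetical names. The sketch only shows that the results agree, it makes no claim about timing.

```java
final class FieldReadDemo {
    /** Mirrors the class from the question. */
    static final class MyClass {
        int value;

        MyClass(int value) { this.value = value; }
    }

    // Reads the field on every iteration.
    static int sumReadingField(MyClass myobject, int times) {
        int sum = 0;
        for (int i = 0; i < times; i++) {
            sum += myobject.value;
        }
        return sum;
    }

    // Copies the field into a local once and reuses the copy.
    static int sumReadingLocal(MyClass myobject, int times) {
        int value = myobject.value;
        int sum = 0;
        for (int i = 0; i < times; i++) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        MyClass myobject = new MyClass(7);
        System.out.println(sumReadingField(myobject, 3)); // 21
        System.out.println(sumReadingLocal(myobject, 3)); // 21
    }
}
```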
The short answer is yes (in fact, in the array case, it not only has to check the index limit but also has to calculate the actual memory position of the reference you are looking for -as in i=7, get the base position of the array and add 7 words-).
The long answer is that, unless you are really using that value a lot (and I mean a lot) and you are really constrained due to speed, it is not worth the added complexity of the code. Add to that that the local variable means that your JVM uses more memory, may hit a cache fault, and so on.
In general, you should worry more about the efficiency of your algorithm (the O(n)) and less about these tiny things.
The Java compiler is no bozo. It will do that optimization for you. There is usually zero speed difference between all the options you give.
I say 'usually' because accessing the original object and accessing your local copy aren't always equivalent. If your array is globally visible, and another thread is accessing it, the two forms will yield different results, and the compiler cannot optimize one into the other. It is possible that something confuses the compiler into thinking there may be a problem, even though there isn't. Then it won't apply a legal optimization.
However, if you aren't doing funny stuff, the compiler will see what you're doing and optimize variable access for you. Really, that's what a compiler does. That's what it's for.
You need to optimize at least one level above that. This one isn't for you.