I would like to know whether I should manually inline small methods that are called 100k to 1 million times in a performance-sensitive algorithm.
At first, I thought that by not inlining I would incur some overhead, since the JVM has to determine whether or not to inline the method (and may even fail to do so).
However, the other day I replaced this manually inlined code with calls to static methods and saw a performance boost. How is that possible? Does this suggest that there is actually no overhead, and that letting the JVM inline "at will" actually boosts performance? Or does this depend heavily on the platform/architecture?
(The example in which a performance boost occurred was replacing array swapping (int t = a[i]; a[i] = a[j]; a[j] = t;) with a static method call swap(int[] a, int i, int j). Another example in which there was no performance difference was when I inlined a 10-liner method which was called 1000000 times.)
I have seen something similar. "Manual inlining" isn't necessarily faster; the resulting program can be too complex for the optimizer to analyze.
In your example, let's make some wild guesses. When you use the swap() method, the JVM may be able to analyze the method body and conclude that, since i and j don't change, only 2 range checks are needed instead of 4, even though there are 4 array accesses. Also, the local variable t isn't necessary: the JVM can use 2 registers to do the job without reading or writing t on the stack.
Later, the body of swap() is inlined into the caller method. That happens after the previous optimization, so the savings are still in place. It's even possible that the caller's method body has already proved that i and j are always within range, so the 2 remaining range checks are dropped as well.
Now, in the manually inlined version, the optimizer has to analyze the whole program at once. There are too many variables and too many actions, so it may fail to prove that it's safe to drop range checks or to eliminate the local variable t. In the worst case this version may cost 6 more memory accesses to do the swap, which is a huge overhead. Even if there is only 1 extra memory read, it is still very noticeable.
Of course, we have no basis to believe that it's always better to do manual "outlining", i.e. extract small methods, wishfully thinking that it will help the optimizer.
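For reference, the two variants being compared look roughly like the sketch below. The class name SwapVariants and the array-reversal loop around the swap are hypothetical, chosen only to give the swap a caller; whether either form is faster on a given JVM has to be measured.

```java
import java.util.Arrays;

final class SwapVariants {
    // Small static helper: a natural candidate for JIT inlining.
    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Reverses the array by calling the helper.
    static void reverseWithHelper(int[] a) {
        for (int i = 0, j = a.length - 1; i < j; i++, j--) {
            swap(a, i, j);
        }
    }

    // Same algorithm with the swap written out by hand ("manual inlining").
    static void reverseManuallyInlined(int[] a) {
        for (int i = 0, j = a.length - 1; i < j; i++, j--) {
            int t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3, 4, 5};
        int[] y = x.clone();
        reverseWithHelper(x);
        reverseManuallyInlined(y);
        System.out.println(Arrays.equals(x, y)); // prints "true"
    }
}
```

Both methods produce the same result; the only difference is how much code the optimizer sees in one method body.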
--
What I've learned is: forget manual micro-optimizations. It's not that I don't care about micro performance improvements, or that I always trust the JVM's optimizations. It's that I have absolutely no idea what to do that does more good than harm. So I gave up.
The JVM can inline small methods very efficiently. The only benefit of inlining yourself is if you can remove code, i.e. simplify what it does by inlining it.
The JVM looks for certain structures and has some "hand coded" optimisations when it recognises those structures. By using a swap method, the JVM may recognise the structure and optimise it differently with a specific optimisation.
You might be interested to try the OpenJDK 7 debug version which has an option to print out the native code it generates.
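A lighter-weight way to see what the JIT decides, without a debug build, is to ask HotSpot to log its inlining decisions. The sketch below is only a probe program; the class name InlineProbe and the loop sizes are made up, and the exact log format varies between JVM versions.

```java
// Compile and run (HotSpot diagnostic flags):
//   javac InlineProbe.java
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InlineProbe
// Look for lines mentioning InlineProbe::swap in the output.
final class InlineProbe {
    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Calls swap often enough for the JIT to consider it hot.
    // Each round rotates the array left by one position.
    static int churn(int[] a, int rounds) {
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i + 1 < a.length; i++) {
                swap(a, i, i + 1);
            }
        }
        return a[0];
    }

    public static void main(String[] args) {
        int[] data = new int[64];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // Print the result so the work cannot be discarded as dead code.
        System.out.println(churn(data, 200_000));
    }
}
```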
Sorry for my late reply, but I just found this topic and it got my attention.
When developing in Java, try to write "simple and stupid" code. Reasons:
The optimization is done at runtime (since the compilation itself happens at runtime). The compiler will figure out which optimizations to apply anyway, since it compiles not the source code you write but the internal representation it uses (several AST -> VM code -> VM code ... -> native binary code transformations are made at runtime by the JVM compiler and the JVM interpreter).
When optimizing, the compiler uses some common programming patterns in deciding what to optimize, so help it help you! Write a private static (maybe also final) method and it will figure out immediately that it can:
inline the method
compile it to native code
If the method is manually inlined, it's just part of another method, which the compiler first has to understand before deciding whether it's time to transform it into binary code or whether it must wait a bit to understand the program flow. Also, depending on what the method does, several re-JIT'ings are possible during runtime => the JVM produces optimal binary code only after a "warm up"... and maybe your program ended before the JVM warmed up (because I expect that in the end the performance should be fairly similar).
Conclusion: it makes sense to optimize code in C/C++ (since the translation into binary is made statically), but the same optimizations usually don't make a difference in Java, where the compiler JITs byte code, not your source code. And btw, from what I've seen javac doesn't even bother to make optimizations :)
However, the other day, I replaced this manually inlined code with invocation of static methods and seen a performance boost. How is that possible?
Probably the JVM profiler sees the bottleneck more easily if it is in one place (a static method) than if it is implemented several times separately.
The Hotspot JIT compiler is capable of inlining a lot of things, especially in -server mode, although I don't know how you got an actual performance boost. (My guess would be that inlining is done by method invocation count and the method swapping the two values isn't called too often.)
By the way, if performance really matters, you could try this for swapping two int values. (I'm not saying it will be faster, but it may be worth a punt.)
a[i] = a[i] ^ a[j];
a[j] = a[i] ^ a[j];
a[i] = a[i] ^ a[j];
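One caveat with the XOR trick on array elements: when i and j are the same index, the first statement XORs the element with itself and zeroes it, so the value is lost. A minimal sketch of the problem and a guarded version (the class and method names are hypothetical):

```java
final class XorSwap {
    // XOR swap exactly as written above; destroys the value when i == j.
    static void xorSwap(int[] a, int i, int j) {
        a[i] = a[i] ^ a[j];
        a[j] = a[i] ^ a[j];
        a[i] = a[i] ^ a[j];
    }

    // Guarded variant: swapping an element with itself is a no-op.
    static void safeXorSwap(int[] a, int i, int j) {
        if (i == j) {
            return;
        }
        xorSwap(a, i, j);
    }

    public static void main(String[] args) {
        int[] a = {7, 9};
        xorSwap(a, 0, 1);
        System.out.println(a[0] + " " + a[1]); // prints "9 7"

        xorSwap(a, 0, 0);
        System.out.println(a[0]);              // prints "0": the 9 is gone
    }
}
```

The plain three-statement swap with a temporary has no such special case, which is one more reason to prefer it unless measurements say otherwise.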
Related
Is there any difference between doing
if (numberOfEntries >= array.length) {do stuff}; // Check if array is full directly
over doing something like
private boolean isArrayFull(){
return numberOfEntries >= array.length;
}
if (isArrayFull()) {do stuff}; // Call a check function
Over large arrays, many iterations and any other environment of execution, is there any difference to these methods other than readability and code duplication, if I need to check if the array is full anywhere else?
Forget about performance. That is negligible.
But if you are doing it many times, a utility method isArrayFull() makes sense, because if you add more conditions to the check later, changing the function changes the behaviour everywhere it is used.
As said above, first make your design good and then determine performance issues, using some tools. Java has JIT optimisations for inlining, so there is no difference.
The JIT aggressively inlines methods, removing the overhead of method calls
from https://techblug.wordpress.com/2013/08/19/java-jit-compiler-inlining/
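As a sketch of the helper-method style, here is a small fixed-capacity container built around the isArrayFull() check from the question. The class name FixedBag and its add()/size() methods are hypothetical and only there to give the check a context.

```java
final class FixedBag {
    private final Object[] array;
    private int numberOfEntries;

    FixedBag(int capacity) {
        array = new Object[capacity];
    }

    // The one place that defines what "full" means.
    private boolean isArrayFull() {
        return numberOfEntries >= array.length;
    }

    // Adds the item unless the bag is full; reports whether it was added.
    boolean add(Object item) {
        if (isArrayFull()) {
            return false;
        }
        array[numberOfEntries++] = item;
        return true;
    }

    int size() {
        return numberOfEntries;
    }

    public static void main(String[] args) {
        FixedBag bag = new FixedBag(2);
        System.out.println(bag.add("a")); // prints "true"
        System.out.println(bag.add("b")); // prints "true"
        System.out.println(bag.add("c")); // prints "false"
    }
}
```

A private method this small is an easy inlining candidate, so the helper is unlikely to cost anything measurable once the code is hot.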
Note: The explanation below is not specific to any language. It is generic.
The difference shows up when you analyze the options at machine level. A function call is actually some JMP operations and a lot of PUSH/POP operations on the CPU. An IF is usually a single compare (CMP) operation, which is much cheaper than what happens during a function call.
If your IFs usually evaluate the same way, I wouldn't worry about it, as the CPU handles IFs very well by predicting the result, as long as the IFs are "predictable" (usually true, usually false, or following some pattern of true/false).
I would go with the IFs in cases where even negligible improvement in performance is a big deal.
In cases like web applications reducing the code redundancy to make the code manageable and readable is way more important than the optimization to save a few instructions at machine level.
I have the following piece of code:
Player player = (Player)Main.getInstance().getPlayer();
player.setSpeedModifier(keyMap[GLFW_KEY_LEFT_SHIFT] ? 1.8f : 1);
if (keyMap[GLFW_KEY_W]) {
player.moveForward();
}
if (keyMap[GLFW_KEY_S]) {
player.moveBackward();
}
player.rotateTowards(getMousePositionInWorld());
I was wondering if the use of a local variable (for the player) to make the code more readable has any impact on performance, or whether it would be optimised during compilation to replace the uses of the variable, seeing as it is just a straight copy of another variable. Whilst it is possible to keep the long version in place, I prefer the readability of the shorter version. I understand that the performance impact, if there were any, would be minuscule, but I was just interested whether there would be any whatsoever.
Thanks, -Slendy.
For any modern compiler, this will most likely be optimized away and it will not have any performance implications. The few additional bytes used for storage are well worth the added readability.
Consider these 2 pieces of code:
final Player player = (Player)Main.getInstance().getPlayer();
player.callmethod1();
player.callmethod2();
and:
((Player)Main.getInstance().getPlayer()).callmethod1();
((Player)Main.getInstance().getPlayer()).callmethod2();
There are reasons why the first variant is preferable:
The first one is more readable, if only because of line length.
The Java compiler cannot assume that Main.getInstance().getPlayer() returns the same object each time, which is why the second variant will actually call getPlayer twice; that could be a performance penalty.
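The second point can be made visible by counting lookups. In the sketch below, Player and getPlayer() are hypothetical stand-ins for the question's Player class and Main.getInstance().getPlayer(); the counter exists only for the demonstration.

```java
final class PlayerLookup {
    // Hypothetical stand-in for the Player class from the question.
    static final class Player {
        int moves = 0;
        void callmethod1() { moves++; }
        void callmethod2() { moves++; }
    }

    static int lookups = 0; // counts getPlayer() calls
    private static final Player PLAYER = new Player();

    // Hypothetical stand-in for Main.getInstance().getPlayer().
    static Object getPlayer() {
        lookups++;
        return PLAYER;
    }

    // Variant 1: look the player up once and keep it in a local.
    static void withLocal() {
        final Player player = (Player) getPlayer();
        player.callmethod1();
        player.callmethod2();
    }

    // Variant 2: repeat the lookup (and the cast) for every call.
    static void withoutLocal() {
        ((Player) getPlayer()).callmethod1();
        ((Player) getPlayer()).callmethod2();
    }

    public static void main(String[] args) {
        withLocal();
        System.out.println(lookups); // prints 1
        withoutLocal();
        System.out.println(lookups); // prints 3
    }
}
```

Because getPlayer() here has a visible side effect, the repeated calls cannot simply be merged; with a trivial getter the JIT may well inline both calls, but the source still asks for two lookups.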
Apart from the probably unneeded (Player) cast, I even find your version to be superior to having long worms of calls.
IMHO, if you need one particular object more than once or twice, it is worth saving in a local variable.
The local variable will need some bytes on the stack, but on the other hand, several calls are omitted, so your version clearly wins.
Your biggest performance hit will likely be the function lookup of the objects:
(Player)Main.getInstance().getPlayer();
Otherwise, you want to minimize these function calls if possible. In this case, a local var could save CPU, though if you have a global var, it might be a hair faster to use it.
It really depends on how many times this is done in a loop though. Quite likely you will see no difference either way in normal usage. :)
Consider the following Java code fragment:
String buffer = "...";
for (int i = 0; i < buffer.length(); i++)
{
System.out.println(buffer.charAt(i));
}
Since String is immutable and buffer is not reassigned within the loop, will the Java compiler be smart enough to optimize away the buffer.length() call in the for loop's condition? For example, would it emit byte code equivalent to the following, where buffer.length() is assigned to a variable, and that variable is used in the loop condition? I have read that some languages like C# do this type of optimization.
String buffer = "...";
int length = buffer.length();
for (int i = 0; i < length; i++)
{
System.out.println(buffer.charAt(i));
}
In Java (and in .NET), strings are length-counted (the number of UTF-16 code units, not code points), so finding the length is a simple operation.
The compiler (javac) may or may not perform hoisting, but the JVM JIT Compiler will almost certainly inline the call to .length(), making buffer.length() nothing more than a memory access.
The Java compiler (javac) performs no such optimization. The JIT compiler will likely inline the length() method, which at the very least would avoid the overhead of a method call.
Depending on which JDK you're running, the length() method itself likely returns a final length field, which is a cheap memory access, or the length of the string's internal char[] array. In the latter case, the array's length is constant, and the array reference is presumably final, so the JIT may be sophisticated enough to record the length once in a temporary as you suggest. However, that sort of thing is an implementation detail. Unless you control every machine that your code will run on, you shouldn't make too many assumptions about which JVM it will run on, or which optimizations it will perform.
As to how you should write your code, calling length() directly in the loop condition is a common code pattern, and benefits from readability. I'd keep things simple and let the JIT optimizer do its job, unless you're in a critical code path that has demonstrated performance issues, and you have likewise demonstrated that such a micro-optimization is worthwhile.
You can do several things to examine the two variations of your implementation.
(difficulty: easy) Write a test and measure the speed under similar conditions for each version of the code. Make sure your loop is significant enough to notice a difference; it is possible that there is none.
(difficulty: medium) Examine the bytecode with javap and see how the compiler has translated both versions. This might differ depending on the javac implementation, or it might not (when the behavior is specified in the spec and leaves no room for interpretation by the implementor).
(difficulty: hard) Examine the JIT output of both versions with JITWatch, you will need to have a very good understanding of bytecode and assembler.
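For the "easy" option, a rough timing harness might look like the sketch below. The class name, string size, and iteration counts are arbitrary assumptions, and hand-rolled timing like this is easily distorted by the JIT; a dedicated harness such as JMH gives more trustworthy numbers.

```java
import java.util.Arrays;

final class LengthLoopTiming {
    // Variant 1: call length() in the loop condition.
    static long sumWithCall(String buffer) {
        long sum = 0;
        for (int i = 0; i < buffer.length(); i++) {
            sum += buffer.charAt(i);
        }
        return sum;
    }

    // Variant 2: read length() once into a local.
    static long sumWithCachedLength(String buffer) {
        long sum = 0;
        int length = buffer.length();
        for (int i = 0; i < length; i++) {
            sum += buffer.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        char[] chars = new char[100_000];
        Arrays.fill(chars, 'x');
        String buffer = new String(chars);

        long sink = 0; // printed at the end so the loops are not dead code

        // Warm-up: give the JIT a chance to compile both methods first.
        for (int r = 0; r < 2_000; r++) {
            sink += sumWithCall(buffer) + sumWithCachedLength(buffer);
        }

        long t0 = System.nanoTime();
        for (int r = 0; r < 2_000; r++) {
            sink += sumWithCall(buffer);
        }
        long t1 = System.nanoTime();
        for (int r = 0; r < 2_000; r++) {
            sink += sumWithCachedLength(buffer);
        }
        long t2 = System.nanoTime();

        System.out.println("length() in condition: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("cached length:         " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("checksum: " + sink);
    }
}
```

To compare the bytecode instead (the "medium" option), compile the class and run javap -c LengthLoopTiming.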
Call A:
double Value = Object.Object.Object.Object.DoubleValue;
Call B:
double Value = Object.DoubleValue;
If this were in a for loop and called many times over, would there be a performance loss from accessing an object within an object, or is it not worth worrying about?
Readability is for programmers; optimizations are for compilers (and the JIT, to be honest).
Do whatever is the standard in your team and is more readable.
If, after you do it, you suspect a performance issue, use a profiler to check whether that is indeed the case, and make adjustments accordingly.
is it not worth noting about?
It could cost you tens of nanoseconds (is that important to you?). The JIT is fairly good at optimising/caching reference lookups, so placing them in a local variable is unlikely to be much faster, i.e. even if it matters, there is unlikely to be something simple you can do about it.
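For concreteness, the two access styles from the question could be written in Java as follows. The Outer/Middle/Inner classes are hypothetical stand-ins for the Object.Object.Object chain; the sketch only shows that both styles compute the same thing, not that either is faster.

```java
final class NestedAccess {
    // Hypothetical object chain standing in for Object.Object.Object.DoubleValue.
    static final class Inner { double doubleValue = 1.5; }
    static final class Middle { final Inner inner = new Inner(); }
    static final class Outer { final Middle middle = new Middle(); }

    // Call A: walk the whole reference chain on every iteration.
    static double sumViaChain(Outer o, int n) {
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += o.middle.inner.doubleValue;
        }
        return sum;
    }

    // Call B: resolve the chain once and keep the last object in a local.
    static double sumViaLocal(Outer o, int n) {
        final Inner inner = o.middle.inner;
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += inner.doubleValue;
        }
        return sum;
    }

    public static void main(String[] args) {
        Outer o = new Outer();
        System.out.println(sumViaChain(o, 4) == sumViaLocal(o, 4)); // prints "true"
    }
}
```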
Assuming a 10 element or less Object[] in Java, what would be the fastest way of copying the array?
for(int i = 0;i < a.length;i++)
for(int i = 0,l = a.length;i < l;i++) // i.e. is caching array len in local var faster?
System.arraycopy(a, 0, a2, 0, a.length);
The chances are that the difference between the three alternatives is relatively small.
The chances are that this is irrelevant to your application's overall performance.
The relative difference is likely to depend on the hardware platform and the implementation of Java that you use.
The relative difference will also vary depending on the declared and actual types of the arrays.
You are best off forgetting about this and just coding the way that seems most natural to you. If you find that your completed application is running too slowly, profile it and tune based on the profiling results. At that point it might be worthwhile to try out the three alternatives to see which is faster for your application's specific use-case. (Another approach might be to see if it is sensible to avoid the array copy in the first place.)
Caching the length isn't useful. You're accessing a field directly. And even if it were a method, the JIT would inline and optimize it.
If something had to be optimized, System.arraycopy would contain the optimization.
But the real answer is that it doesn't matter at all. You're not going to obtain a significant gain in performance by choosing the most appropriate way of copying an array of 10 elements or less. If you have a performance problem, then search where it comes from by measuring, and then optimize what must be optimized. For the rest, use what is the most readable and maintainable. What you're doing is premature optimization. And it's the root of all evil (says D. Knuth).
System.arraycopy() is the fastest way to copy an array, as it is designed and optimized exactly for this job. There were rumors that for small arrays a hand-coded loop may be faster, but that is no longer true. System.arraycopy is a JIT intrinsic, and the JIT chooses the best implementation for each case.
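Written out in full, the three alternatives from the question look like this. The class and method names are hypothetical; the sketch only checks that all three produce the same copy and makes no claim about their relative speed.

```java
import java.util.Arrays;

final class SmallArrayCopy {
    // Plain loop, reading a.length in the condition.
    static Object[] copyWithLoop(Object[] a) {
        Object[] a2 = new Object[a.length];
        for (int i = 0; i < a.length; i++) {
            a2[i] = a[i];
        }
        return a2;
    }

    // Loop with the array length cached in a local.
    static Object[] copyWithCachedLength(Object[] a) {
        Object[] a2 = new Object[a.length];
        for (int i = 0, l = a.length; i < l; i++) {
            a2[i] = a[i];
        }
        return a2;
    }

    // Library call; note the lower-case "c" in arraycopy.
    static Object[] copyWithArraycopy(Object[] a) {
        Object[] a2 = new Object[a.length];
        System.arraycopy(a, 0, a2, 0, a.length);
        return a2;
    }

    public static void main(String[] args) {
        Object[] a = {"a", 1, 2.0};
        System.out.println(Arrays.equals(copyWithLoop(a), copyWithArraycopy(a)));         // prints "true"
        System.out.println(Arrays.equals(copyWithCachedLength(a), copyWithArraycopy(a))); // prints "true"
    }
}
```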
Do get yourself a book on JVM internals (for example, "Oracle JRockit, The Definitive Guide") and realize that what the JVM executes, after warming up, loop unrolling, method inlining, register re-allocation, loop invariant extraction and so on will not even closely resemble what you write in Java source code.
Sorry :-) Otherwise, you will enjoy reading http://www.javaspecialists.eu.