Costs of operations in Java

Is there a way to tell how expensive an operation is for the processor, in milliseconds or FLOPs?
I would be interested in "instanceof" and casts (I have heard they are very "expensive").
Are there any studies about that?

It will depend on which JVM you're using, and the cost of many operations can vary even within the same JVM, depending on the exact situation and how much optimization the JIT has performed.
For example, a virtual method call can still be inlined by the Hotspot JIT - so long as it hasn't been overridden by anything else. In some cases with the server JIT it can still be inlined with a quick type test, for up to a couple of types.
Basically, JITs are complex enough that there's unlikely to be a meaningful general-purpose answer to the question. You should benchmark your own specific situation in as real-world a way as possible. You should usually write code with the primary goals of simplicity and readability - but measure the performance regularly.
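For example, measuring the cost of a cast could start from a minimal harness like the following sketch, using the open-source JMH benchmark harness (the class name, field and measured operation here are invented for illustration):
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// A minimal JMH sketch: JMH takes care of warmup, JIT compilation and
// dead-code-elimination pitfalls that plague hand-rolled timing loops.
@State(Scope.Thread)
public class CastBenchmark {

    private Object value = "some string";

    @Benchmark
    public int checkedCast() {
        // Returning the result keeps the JIT from optimizing the cast away.
        return ((String) value).length();
    }
}
Run through JMH's runner, this reports per-operation timings in nanoseconds, which is far more informative than reasoning about cycle counts.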

The times when counting instructions or cycles could give you a good idea of the performance of some code are long gone, thanks to the many, many optimizations happening at all levels of software execution.
This is especially true for VM-based languages, where the JVM can simply skip some steps because it knows they are not necessary.
For example, I read some time ago in an article (I'll try to find and link it eventually) that these two methods are pretty much equivalent in cost (on the HotSpot JVM, that is):
public void frobnicate1(Object o) {
    if (!(o instanceof SomeClass)) {
        throw new IllegalArgumentException("Oh Noes!");
    }
    frobnicateSomeClass((SomeClass) o);
}

public void frobnicate2(Object o) {
    frobnicateSomeClass((SomeClass) o);
}
Obviously the first method does more work, but the JVM knows that the type of o has already been checked in the if, so it can skip the type check on the later cast and make it a no-op.
This and many other optimizations make counting "flops" or cycles pretty much useless.
Generally speaking an instanceof check is relatively cheap. On the HotSpot JVM it boils down to a numeric check of the type id in the object header.
This classic article describes why you should "Write Dumb Code".
There's also an article from 2002 that describes how instanceof is optimized in the HotSpot JVM.

Once the JVM has warmed up, most operations can be counted in nanoseconds (millionths of a millisecond). When talking about something being expensive, you usually have to say it is expensive relative to an alternative; it's next to impossible to describe something as expensive in all cases.
Usually, the most important expense is your time (and that of the other developers on your team). Using instanceof can be expensive in development and code-support time because it often indicates a poor design; using proper OOP techniques is usually a better idea, as sketched below. The ten nanoseconds or so that an instanceof takes are usually trivial by comparison.
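As a sketch of what "proper OOP techniques" means here (the Shape/Circle/Square names are invented for illustration), a chain of instanceof checks can usually be replaced by a virtual method call:
// Instead of:
//   if (shape instanceof Circle) { ... } else if (shape instanceof Square) { ... }
// let each type carry its own behaviour:
interface Shape {
    double area();
}

class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}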

The cost of specific operations performed inside the CPU is almost never relevant for performance. If performance is bad, it's almost always because of IO (network, disk) or inefficient code. Writing efficient code is much more about finding a way to reduce the overall amount of operations than about avoiding "costly" operations (except those that are orders of magnitude more costly, like IO).
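A minimal sketch of that principle (the file name is invented): buffering funnels many small writes through memory so that the genuinely expensive operations, the system calls, happen rarely, while the cheap per-character work is left untouched:
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class BufferedIoSketch {
    public static void main(String[] args) throws IOException {
        // Each unbuffered write could hit the OS; the buffer batches them.
        try (BufferedWriter out = new BufferedWriter(new FileWriter("data.txt"))) {
            for (int i = 0; i < 100_000; i++) {
                out.write("line " + i);   // goes to the in-memory buffer
                out.newLine();            // the OS is hit only when the buffer fills
            }
        }
    }
}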


Does code length impact performance?

I use the following example to illustrate the point, but note that it could be any other task.
for (int i = 0; i < 1000; i++) {
    String a = "world";
    Log.d("hello", a);
}
Versus
String a="";
a="world";
Log.d("hello",a);
String a="";
a="world";
Log.d("hello",a);
String a="";
a="world";
Log.d("hello",a);
String a="";
a="world";
Log.d("hello",a);
String a="";
a="world";
Log.d("hello",a);
...
//1000 times
Let's ignore readability and code quality, just performance of the compiled program.
So which one is better?
The one and only answer: this doesn't matter at all.
What I mean is: aspects such as readability and a good design are much more worth spending your time on.
When, and only when, you have managed to design a superb application, and you are down to the point where you have a performance problem and would have to look into this specific question - well, then there would be merit in looking into it. But I am 99.9% sure: you are not at that point. And in that sense: you are wasting your time with such thoughts!
In other words: especially within the "JVM stack" there are tons and tons of things that can have subtle or not-so-subtle effects on performance. For example, there are many, many options that influence the inner workings of garbage collection and of the just-in-time compiler. Those will affect the "performance" of your running application in much more significant ways than manually unrolling loop code.
And for the record: many benchmarking experiments have shown that the JIT does its best job when you give it "normal" code. Meaning: the Oracle JIT is designed to give you "best results" on "normal" input. As soon as you start fiddling around with your Java code in order to somehow create "optimized" code, chances are that your changes make it harder for the JIT to do a good job.
Thus: please forget about such micro-optimizations. If you spend the same amount of time learning about "good OO design" and maybe "general Java performance topics", you will gain much more from that!
So which one is better?
The first one, since it actually compiles (there's a typo in the second). Non-compiling code is absolutely the least performant code.
But also the first one because, despite your saying "ignore readability and code quality", readability is almost always more important than performance. You should certainly write code to be readable first, before making it performant.
And the first one, because the JIT will unroll the loop for you if it determines that doing so gives better performance on the specific JVM it is running on, for the specific data the code is run on (I'm assuming you don't simply like filling your logs with the same message over and over).
Don't try to second guess the optimizations the JIT will apply. Applying the optimizations yourself might make performance worse, because it makes it harder for the JIT to optimize the code. Write clear, maintainable code, and let the JVM work its magic.
That depends on the programming language. The compiler is responsible for compressing your code in such a way that it can avoid repeated calls at the low level. Object-oriented languages like JavaScript sit at a relatively high level, so compilation may take some time to run.
So, to conclude briefly, and based largely on assumptions from your question: the only impacts are the overall file size and the time taken to compile.
What you are doing is called "Loop Unrolling".
Check: https://en.wikipedia.org/wiki/Loop_unrolling
This is one of the optimizations which compilers will perform when run with an appropriate optimization level.
In my view your second option should run faster than the first one, because in the first option the code has to increment the loop counter, check it against the loop condition, and take the appropriate branch decision. Branching itself can be expensive when there is a branch misprediction, because on a misprediction the CPU has to flush the pipeline and restart from the correct instruction. Check: https://en.wikipedia.org/wiki/Branch_misprediction
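For reference, this is what classic four-way unrolling looks like when done by hand (a sketch; the array and method are invented, and it assumes the length is a multiple of four):
// Sums an array with the loop body unrolled by a factor of 4, trading
// three increments/compares/branches per iteration for straight-line code.
static int sum4x(int[] data) {
    int sum = 0;
    for (int i = 0; i < data.length; i += 4) {
        sum += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    }
    return sum;
}
Modern JIT compilers typically apply this transformation themselves when it is profitable, which is another reason not to write it by hand.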

Is it faster to compare and set or just set?

Which is faster:
if (this.foo != 1234)
    this.foo = 1234;
or
this.foo = 1234;
Is the penalty of a write high enough that one should check the value before writing, or is it faster to just write?
Wouldn't having a branch cause possible mispredictions and screw up the CPU pipeline? And what if the field is volatile, with writes having a higher cost than reads?
Yes, it is easy to say that in isolation these operations themselves are 'free', or to benchmark them, but that is not an answer.
There is a nice example illustrating this dilemma in the very recent talk by Sergey Kuksenko about hardware counters (slides 45-49), where the right answer for the same code depends on the data size! The idea is that the "compare and set" approach causes more branch misses and loads, but fewer stores and L1 store misses. The difference is subtle, and I can't even rationalize why one factor outweighs the other on small data sizes but becomes less significant on large data sizes.
So, measure, don't guess.
Both those operations are free: they really take almost no time!
Now if this code is in a loop, you should definitely favor the second option as it will minimize branch mispredictions.
Otherwise, what matters here is what makes the code more readable; and again, in my opinion, the second option is clearer.
Also, as mentioned in the comments, assigning an int is an atomic operation, which makes it thread-safe - another advantage of the second option.
They are not free. They cost time and space, and branching in a tight loop can actually be very costly because of branch prediction (kids these days and their modern CPUs). See Mysticial's answer.
But mostly it's pointless. Either just set it to what it should be, or throw when it's not what you expect.
Any code you make me read had better have a good reason to exist.
What I think you are trying to do is express what you expect its value to be and assert that it should be that. Without context I can't tell you whether you should throw when your expectations are violated or simply assign to assert what it should be, as sketched below. But making your expectations clear and enforcing them one way or another is certainly worth a few CPU cycles. I'd rather you were a little slower than quickly giving me garbage in and garbage out.
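A sketch of the two options (the field and value come from the question; the exception message is invented):
// Option 1: the value is an invariant - fail loudly if it is violated.
if (this.foo != 1234) {
    throw new IllegalStateException("foo was " + this.foo + ", expected 1234");
}

// Option 2: the value is simply what it must be from here on - assign it.
this.foo = 1234;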
I believe this is actually a general question rather than a Java-related one, because of the low level of these operations (CPU level, not JVM level).
First of all, let's see what the choice is. On one hand we have a read from memory + a comparison + (optionally) a write to memory; on the other hand, just a write to memory.
Memory access is much more expensive than register operations (operations on data already loaded into the CPU). Therefore the choice is read + (sometimes) write vs. write.
What is more expensive, a read or a write? Short answer: a write. Long answer: a write, but the difference is probably small and depends on the system's caching strategy. It is not easy to explain in a few words; you can learn more about caching in the beautiful book "Operating Systems" by William Stallings.
I think in practice you can ignore the distinction between read and write operations and just write without a test, because by that point your object, with all its fields, will be in the cache anyway.
Another thing to consider is branch prediction - others have already mentioned that this, too, is a reason to just write the value without a test.
It depends on what you're really interested in.
If this is a plain old vanilla program, not only does the fetch/compare/branch of the first scheme take extra time, but it's extra code and complexity, and even if the first scheme did actually save a minuscule amount of time (instead of costing time) it wouldn't be worth doing.
However, there are scenarios where it would be different. In an intensely multi-threaded environment with multiple processors, modifying shared storage can be expensive, since changes to that storage need to be propagated to other processors. In such an environment it could be well worth it to spend a few extra instructions to avoid "dirtying" cache, as sketched below.
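A minimal sketch of that check-before-write pattern (the class and flag are invented for illustration):
import java.util.concurrent.atomic.AtomicBoolean;

// Many threads may call signal(). Reading first keeps the cache line in
// shared state in the common case; only a genuine change performs the
// write that invalidates other processors' copies of the line.
class Flag {
    private final AtomicBoolean done = new AtomicBoolean(false);

    void signal() {
        if (!done.get()) {
            done.set(true);
        }
    }
}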

Java call type performance

I put together a microbenchmark that seemed to show that the following types of calls took roughly the same amount of time across many iterations after warmup.
static.method(arg);
static.finalAnonInnerClassInstance.apply(arg);
static.modifiedNonFinalAnonInnerClassInstance.apply(arg);
Has anyone found evidence that these different types of calls in the aggregate will have different performance characteristics? My findings are they don't, but I found that a little surprising (especially knowing the bytecode is quite different for at least the static call) so I want to find if others have any evidence either way.
If they indeed had exactly the same performance, then that would mean there was no penalty to having that level of indirection in the modified non-final case.
I know the standard optimization advice would be "write your code and profile it", but I'm writing a framework code-generation kind of thing, so there is no specific code to profile, and the choice between static and non-final is fairly important for both flexibility and possibly performance. I am using framework code in the microbenchmark, which is why I can't include it here.
My test was run on Windows JDK 1.7.0_06.
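For reference, a self-contained sketch of the three call shapes being compared (all names invented; the question's actual framework code is not shown):
interface Fn {
    String apply(String s);
}

class CallShapes {
    static String staticMethod(String arg) { return arg; }

    static final Fn FINAL_INSTANCE = new Fn() {
        @Override public String apply(String s) { return s; }
    };

    static Fn mutableInstance = new Fn() {   // non-final, may be reassigned
        @Override public String apply(String s) { return s; }
    };

    static void demo(String arg) {
        staticMethod(arg);            // plain static call
        FINAL_INSTANCE.apply(arg);    // through a static final field
        mutableInstance.apply(arg);   // through a mutable static field
    }
}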
If you benchmark it in a tight loop, the JVM will cache the instance, so there's no apparent difference.
If the code is executed in a real application:
if it's expected to be executed back-to-back very quickly, for example String.length() used in for (int i = 0; i < str.length(); i++) { short_code; }, the JVM will optimize it; no worries.
if it's executed frequently enough that the instance is most likely in the CPU's L1 cache, the extra load of the instance is very fast; no worries.
otherwise, there is a non-trivial overhead; but then it's executed so infrequently that the overhead is almost impossible to detect among the overall cost of the application; no worries.

Confusion on loops in java

Which one is faster in Java?
a) for (int i = 100000; i > 0; i--) {}
b) for (int i = 1; i < 100001; i++) {}
I have been looking for an explanation of the answer, which is option a). Anyone? Any help is appreciated.
There are situations when a reverse loop might be slightly faster in Java. Here's a benchmark showing an example. Typically, the difference is explained by implementation details in the increment/decrement instructions or in the loop-termination comparison instructions, both in the context of the underlying processor architecture. In more complex examples, reversing the loop can help eliminate dependencies and thus enable other optimizations, or can improve memory locality and caching, even garbage collection behavior.
One can not assume that either kind of loop will always be faster, for all cases - a benchmark would be needed to determine which one performs better on a given platform, for a concrete case. And I'm not even considering what the JIT compiler has to do with this.
Anyway, this is the sort of micro-optimization that can make code more difficult to read without providing a noticeable performance boost, so it's better to avoid it unless strictly necessary. Remember: "premature optimization is the root of all evil".
Just talking out of my hat, but I know assembly languages have specific comparisons against zero that take fewer cycles than comparisons between two register values.
Generally, Oracle HotSpot has an emphasis on optimisation in real code, which means that forward loop optimisations are more likely to be implemented than backward loops. From a machine code point of view, the decrementing loop may save an instruction, but it is unlikely to have a significant impact on performance, particularly when there is much memory access going on. I understand modern CPUs are more or less as happy going backwards as forwards (historically there was a time when they were better optimised for forward access). They'll even optimise certain stride access patterns.
(Also HotSpot (at least the Server/C2 flavour) is capable of removing empty loops.)
You said that the answer is a), so here is my guess at an explanation: the Java virtual machine can "translate" the comparison with zero into a faster form.

Java performance issue

I've got a question related to Java performance and method execution.
In my app there are a lot of places where I have to validate some parameters, so I've written a Validator class and put all the validation methods into it. Here is an example:
public class NumberValidator {

    public static short shortValidator(String s) throws ValidationException {
        try {
            short sh = Short.parseShort(s);
            if (sh < 1) {
                throw new ValidationException();
            }
            return sh;
        } catch (Exception e) {
            throw new ValidationException("The parameter is wrong!");
        }
    }
...
But I've been wondering: is this OK? It's OO and modularized, but - considering performance - is it a good idea?
What if I had an awful lot of invocations at the same time? The snippet above is short and fast, but there are some methods that take more time.
What happens when there are a lot of calls to a static method, or to an instance method of the same class, and the method is not synchronized? Do all the callers have to fall in line while the JVM executes them sequentially?
Is it a good idea to have several classes that are identical to the one above and randomly call their identical methods? I think it is not, because of "Don't repeat yourself" and "Duplication is evil", etc. But what about performance?
Thanks in advance.
On the reentrancy of your method: since it is static and uses only local variables, it holds no state and is perfectly safe.
On performance: look at your use cases. Since you're validating Strings, I can only assume you are validating user input. In any case, the number of simultaneous users of your system is unlikely to make this a performance bottleneck.
Just two comments:
1) Factoring out validation into methods may in fact improve performance a little. As far as I know, the JIT compiler is designed to detect frequently invoked methods, so the validation methods are good candidates for JIT optimization.
2) Try to avoid 'catch (Exception e)'. This is not recommended, since you are capturing all kinds of RuntimeException as well. If there is a bug in one of the non-trivial validations, you may throw a ValidationException that hides that bug, as shown in the sketch below.
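Concretely, point 2 amounts to narrowing the catch block to the one failure you expect (a sketch; only the exception type changes from the original):
public static short shortValidator(String s) throws ValidationException {
    try {
        short sh = Short.parseShort(s);
        if (sh < 1) {
            throw new ValidationException();
        }
        return sh;
    } catch (NumberFormatException e) {
        // Only the expected parse failure is translated into a
        // ValidationException; genuine bugs propagate unchanged.
        throw new ValidationException("The parameter is wrong!");
    }
}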
I'm not sure what your concern is.
Since you mention the method not being synchronized, I suppose you have concurrent invocations from multiple threads.
And since the method is not synchronized, any invocation can be executed concurrently without problems.
You certainly won't get any performance improvement by copy-pasting this method into the calling classes. If anything you would reduce performance, because the code size will increase and waste space in the processor cache, although for such a short method I think that is a negligible effect.
Are you experiencing performance problems?
Make the code easy to maintain first, and if it's not living up to expectations, it'll be easy to read and easy to refactor.
If you make the code optimized for speed on every choice, you'll often wind up with something unreadable, and have to start from scratch to fix simple bugs.
That said, your method is static and will only need to be initialized once. That is the fast version. :-)
I'm suspicious of this code, not so much on performance grounds as that I don't think it is successfully abstracting out anything that deserves abstraction.
If it is for checking user input, it replaces reasonable error messages like 'maximum number of widgets allowed is 9999' with a bare 'ValidationException'. And if you add things like extra arguments (or try/catch clauses) to get the messages right in context, then the required call-site code is almost certainly more complex, and harder to write and maintain, than the straightforward way of doing things.
If it is for internal sanity checking, you may well start to lose meaningful performance, and certainly greatly increase complexity and bugginess, if you are passing arguments around as strings all over the place and continually parsing and validating them.
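If the validator is kept anyway, the first concern can be addressed by passing context in for the message; a sketch (the extra parameter and message texts are invented):
public static short positiveShortValidator(String s, String fieldName)
        throws ValidationException {
    try {
        short sh = Short.parseShort(s);
        if (sh < 1) {
            throw new ValidationException(fieldName + " must be positive, got: " + s);
        }
        return sh;
    } catch (NumberFormatException e) {
        throw new ValidationException(fieldName + " must be a number, got: " + s);
    }
}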
