I use the following example for illustration, but note that it could be any other task.
for (int i = 0; i < 1000; i++) {
    String a = "world";
    Log.d("hello", a);
}
Versus
String a="";
a="world";
Log.d("hello",a);
String a="";
a="world";
Log.d("hello",a);
String a="";
a="world";
Log.d("hello",a);
String a="";
a="world";
Log.d("hello",a);
String a="";
a="world";
Log.d("hello",a);
...
//1000 times
Let's ignore readability and code quality and consider only the performance of the compiled program.
So which one is better?
The one and only answer: this doesn't matter at all.
What I mean is: aspects such as readability and a good design are much more worth spending your time on.
When, and only when, you have managed to design a superb application, and you are down to the point where you have a performance problem that forces you to look into this specific question, then there would be merit in doing so. But I am 99.9% sure you are not at this point. And in that sense: you are wasting your time with such thoughts!
In other words: especially within the "JVM stack" there are tons and tons of things that can have subtle or not-so-subtle effects on performance. For example, there are many, many options that influence the inner workings of garbage collection and what the just-in-time compiler does. Those will affect the "performance" of your running application in much more significant ways than manually unrolling loop code.
And for the record: many benchmarking experiments have shown that the JIT does its best job when you give it "normal" code. Meaning: the Oracle JIT is designed to give you the best results on "normal" input. As soon as you start to fiddle around with your Java code in order to somehow create "optimized" code, chances are that your changes make it harder for the JIT to do a good job.
Thus: please forget about such micro-optimizations. If you spend the same amount of time learning about good OO design and maybe general Java performance topics, you will gain much more from that!
So which one is better?
The first one, since it actually compiles (the second redeclares a over and over in the same scope, which is a compile error). Non-compiling code is absolutely the least performant code.
But also the first one because, despite your saying "ignore readability and code quality", readability is almost always more important than performance. Write code to be readable first, and only then worry about making it performant.
And the first one because the JIT will unroll the loop for you if it determines that doing so performs better on the specific JVM on which it is running, for the specific data the code is run on (I'm assuming you don't simply like filling your logs with the same message over and over).
Don't try to second-guess the optimizations the JIT will apply. Applying the optimizations yourself might make performance worse, because it makes it harder for the JIT to optimize the code. Write clear, maintainable code, and let the JVM work its magic.
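If you ever do reach the point where this loop matters, measure instead of guessing. Here is a minimal sketch of how that could look with the JMH benchmark harness (org.openjdk.jmh; the class and method names are mine, and a JMH Blackhole stands in for Log.d so the JIT cannot simply delete the work):
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class LoopVsUnrolledBenchmark {

    @Benchmark
    public void looped(Blackhole bh) {
        for (int i = 0; i < 1000; i++) {
            String a = "world";
            bh.consume(a);          // consuming the value prevents dead-code elimination
        }
    }

    @Benchmark
    public void unrolledByFour(Blackhole bh) {
        for (int i = 0; i < 1000; i += 4) {
            bh.consume("world");
            bh.consume("world");
            bh.consume("world");
            bh.consume("world");
        }
    }
}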
This depends on the programming language. The compiler is responsible for transforming your code in such a way that repeated calls can be avoided at the low level. Object-oriented languages like JavaScript sit at a relatively high level, and therefore compilation may take some time to run.
So, to conclude briefly and based largely on assumptions about your question: the only impacts are the overall file size and the time taken to compile.
What you are doing is called "Loop Unrolling".
Check: https://en.wikipedia.org/wiki/Loop_unrolling
This is one of the optimizations that compilers will perform when the code is compiled at an appropriate optimization level.
In my view your second option should run faster than the first one, because in the first option the code has to increment the loop counter, check it against the loop condition, and take the appropriate branch. Branching itself can be expensive if there is a branch misprediction, because on a misprediction the CPU has to flush its pipeline and restart from the correct instruction. Check: https://en.wikipedia.org/wiki/Branch_misprediction
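For illustration, this is roughly what an unroll-by-four transformation looks like on a simple summing loop (the data array here is a hypothetical example, its length assumed to be a multiple of four; a real compiler would also emit a remainder loop):
int[] data = new int[4096];            // hypothetical input, length a multiple of four
int sum = 0;
for (int i = 0; i < data.length; i += 4) {
    // four iterations' worth of work per loop test: fewer increments and branches
    sum += data[i];
    sum += data[i + 1];
    sum += data[i + 2];
    sum += data[i + 3];
}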
Related
I would like to know if there's any difference in performance between these two ways of getting the value in Java:
Option 1:
for(int i=0; i<1000; i++) {
System.out.println(object.getName());
}
Option 2:
String name = object.getName();
for(int i=0; i<1000; i++) {
System.out.println(name);
}
Maybe with just one attribute (name) option 2 is better, but what if I had 50 different attributes? I would be wasting memory storing all those variables.
Please think big: a huge system with tons of users accessing the web app.
The first option runs object.getName() 1000 times; the second runs it just once.
So, yes, obviously there should be a certain performance impact. There is also a slight semantic difference: if that name isn't immutable, other threads might change it while the loop is running. Option 1 might then pick up that change at some random point in time, whereas option 2 will not.
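To make that semantic difference concrete, here is a hypothetical sketch; it assumes object also has a setName(String) method and is shared between threads:
// Another thread renames the object somewhere in the middle of the loops.
new Thread(() -> object.setName("otherName")).start();

// Option 1 calls the getter on every iteration, so at some unpredictable point
// it may start printing "otherName".
for (int i = 0; i < 1000; i++) {
    System.out.println(object.getName());
}

// Option 2 captured the name before its loop, so it keeps printing the old value.
String name = object.getName();
for (int i = 0; i < 1000; i++) {
    System.out.println(name);
}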
Regarding the performance aspects: in Java, it is really hard to determine the effects of such subtle code changes. When that loop runs 100K times, the Just-in-time compiler would come in and translate everything into highly optimized machine code, using techniques such as method inlining, loop unrolling, constant folding, whatnot. It might even detect that object.getName() has no side effect, and thus turn your code into something that you put into your option 2 snippet. All of that happens at runtime, depending on the profiling information that the JVM collected for the JIT while running your code.
So, the typical answer regarding "Java performance": avoid stupid mistakes (repeatedly invoking a side-effect-free method inside a loop would be such a mistake), but don't expect that someone could tell you "yeah, option 1 will run 500 ms faster". The "real" performance boosts in Java are created by the JIT (and of course by clever designs for your implementation). Thus it is extremely hard to predict what effect this or that source code artefact will have at runtime.
And finally: please note that System.out.println() is pretty expensive. So when your getName() really just fetches a property from memory, printing that value to the console might be many times more expensive than fetching the value!
When I was learning C, I was taught that if I wanted to loop over something strlen(string) times, I should save that value in an "auxiliary" variable, say count, and use that in the for condition instead of calling strlen, to avoid recomputing it that many times.
Now, when I started learning Java, I noticed this is quite not the norm. I've seen lots of code and programmers doing just what I was told not to do in C.
What's the reason for this? Are they trading efficiency for readability? Or does the compiler manage to fix that?
Edit: This is NOT a duplicate of the linked question. I'm not simply asking about string length; mine is a more general question.
In the old days, every function call was expensive, compilers were dumb, usable profilers were yet to come, and computers were slow. That is how C macros and other terrible things were born. Java is not that old.
Efficiency is important, but the impact of most program parts on efficiency is very small. Reading code, however, still takes programmers' time, and that is much more costly than CPU time. So we'd better optimize for readability most of the time and care about speed only in the most important places.
A local variable can make the code simpler when it avoids repetition of a complicated expression - this happens sometimes. It can make the code faster when it avoids an expensive computation that the compiler can't eliminate - this happens rather rarely. When neither condition is met, it's just a wasted line, so why bother?
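A small sketch of both cases, staying close to the strlen() habit from the question (countWords is a hypothetical stand-in for a genuinely costly call, while String.length() is a cheap O(1) accessor):
class HoistingSketch {

    // Cheap, loop-invariant condition: caching s.length() in a local is just a
    // wasted line; the JIT can keep the call out of the loop by itself.
    static void printChars(String s) {
        for (int i = 0; i < s.length(); i++) {
            System.out.println(s.charAt(i));
        }
    }

    // Expensive, loop-invariant condition: hoist it yourself, exactly like the
    // C strlen() habit.
    static void printNumbers(String text) {
        int words = countWords(text);      // computed once, not on every loop test
        for (int i = 0; i < words; i++) {
            System.out.println(i);
        }
    }

    // Stand-in for a genuinely expensive computation.
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}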
I put together a microbenchmark that seemed to show that the following types of calls took roughly the same amount of time across many iterations after warmup.
static.method(arg);
static.finalAnonInnerClassInstance.apply(arg);
static.modifiedNonFinalAnonInnerClassInstance.apply(arg);
Has anyone found evidence that these different types of calls, in the aggregate, will have different performance characteristics? My finding is that they don't, but I found that a little surprising (especially knowing the bytecode is quite different, at least for the static call), so I want to find out whether others have any evidence either way.
If they indeed had the same exact performance, then that would mean there was no penalty to having that level of indirection in the modified non final case.
I know the standard optimization advice would be: "write your code and profile", but I'm writing a framework code-generation kind of thing, so there is no specific code to profile, and the choice between static and non-final is fairly important for both flexibility and possibly performance. I am using framework code in the microbenchmark, which is why I can't include it here.
My test was run on Windows JDK 1.7.0_06.
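Since I can't post the framework code, here is a compilable stand-in showing the three call shapes I mean (all names are placeholders, not my actual code):
public class CallShapes {

    interface Op {
        String apply(String arg);
    }

    // 1) a plain static method call
    static String staticMethod(String arg) {
        return arg.trim();
    }

    // 2) a call through a static final field holding an anonymous inner class instance
    static final Op FINAL_INSTANCE = new Op() {
        public String apply(String arg) { return arg.trim(); }
    };

    // 3) a call through a static field that is not final (and may be reassigned)
    static Op mutableInstance = new Op() {
        public String apply(String arg) { return arg.trim(); }
    };

    public static void main(String[] args) {
        String arg = "  hello  ";
        System.out.println(staticMethod(arg));           // static.method(arg)
        System.out.println(FINAL_INSTANCE.apply(arg));   // static.finalAnonInnerClassInstance.apply(arg)
        System.out.println(mutableInstance.apply(arg));  // static.modifiedNonFinalAnonInnerClassInstance.apply(arg)
    }
}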
If you benchmark it in a tight loop, the JVM will cache the instance, so there's no apparent difference.
If the code is executed in a real application:
if it's expected to be executed back-to-back very quickly, for example String.length() used in for(int i=0; i<str.length(); i++){ short_code; }, the JVM will optimize it; no worries.
if it's executed frequently enough that the instance is most likely in the CPU's L1 cache, the extra load of the instance is very fast; no worries.
otherwise, there is a non-trivial overhead; but then it's executed so infrequently that the overhead is almost impossible to detect among the overall cost of the application. No worries.
Which one is faster in Java?
a) for(int i = 100000; i > 0; i--) {}
b) for(int i = 1; i < 100001; i++) {}
I have been looking for an explanation of the answer, which is option (a). Anyone? Any help is appreciated.
There are situations when a reverse loop might be slightly faster in Java. Here's a benchmark showing an example. Typically, the difference is explained by implementation details in the increment/decrement instructions or in the loop-termination comparison instructions, both in the context of the underlying processor architecture. In more complex examples, reversing the loop can help eliminate dependencies and thus enable other optimizations, or can improve memory locality and caching, even garbage collection behavior.
One cannot assume that either kind of loop will always be faster in all cases - a benchmark would be needed to determine which one performs better on a given platform, for a concrete case. And I'm not even considering what the JIT compiler has to do with this.
Anyway, this is the sort of micro-optimization that can make code more difficult to read without providing a noticeable performance boost, so it's better to avoid it unless strictly necessary. Remember: "premature optimization is the root of all evil".
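If you want numbers for your own machine, a JMH sketch along these lines is one way to get them (the names are mine; note that the empty loop bodies from the question would likely be optimized away entirely, so this version sums an array instead):
import java.util.Random;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class LoopDirectionBenchmark {

    int[] data;

    @Setup
    public void fill() {
        data = new Random(42).ints(100_000).toArray();
    }

    @Benchmark
    public long forward() {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;                 // returning the result keeps the loop from being eliminated
    }

    @Benchmark
    public long backward() {
        long sum = 0;
        for (int i = data.length - 1; i >= 0; i--) {
            sum += data[i];
        }
        return sum;
    }
}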
Just talking through my hat here, but I know assembly languages have specific compare-with-zero instructions that take fewer cycles than comparisons between two register values.
Generally, Oracle HotSpot has an emphasis on optimisation in real code, which means that forward loop optimisations are more likely to be implemented than backward loops. From a machine code point of view, the decrementing loop may save an instruction, but it is unlikely to have a significant impact on performance, particularly when there is much memory access going on. I understand modern CPUs are more or less as happy going backwards as forwards (historically there was a time when they were better optimised for forward access). They'll even optimise certain stride access patterns.
(Also HotSpot (at least the Server/C2 flavour) is capable of removing empty loops.)
You said the answer is (a), so here is my guess at an explanation: the Java virtual machine can "translate" the comparison with zero into a faster form.
Is there a way to tell how expensive an operation is for the processor, in milliseconds or FLOPs?
I would be interested in instanceof and casts (I've heard they are very "expensive").
Are there some studies about that?
It will depend on which JVM you're using, and the cost of many operations can vary even within the same JVM, depending on the exact situation and how much optimization the JIT has performed.
For example, a virtual method call can still be inlined by the Hotspot JIT - so long as it hasn't been overridden by anything else. In some cases with the server JIT it can still be inlined with a quick type test, for up to a couple of types.
Basically, JITs are complex enough that there's unlikely to be a meaningful general-purpose answer to the question. You should benchmark your own specific situation in as real-world a way as possible. You should usually write code with the primary goals of simplicity and readability - but measure the performance regularly.
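As a sketch of the inlining scenario described above (these types are hypothetical): Shape.area() is a virtual call, but if only one implementation ever reaches the call site, HotSpot can inline it, possibly behind a cheap type test.
interface Shape {
    double area();
}

final class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

class TotalArea {
    // s.area() is a virtual call, but if only Circle instances ever show up here,
    // the call site stays monomorphic and HotSpot can inline Circle.area().
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }
}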
The days when counting instructions or cycles could give you a good idea of the performance of some code are long gone, thanks to the many, many optimizations happening at all levels of software execution.
This is especially true for VM-based languages, where the JVM can simply skip some steps because it knows they're not necessary.
For example, I've read some time ago in an article (I'll try to find and link it eventually) that these two methods are pretty much equivalent in cost (on the HotSpot JVM, that is):
public void frobnicate1(Object o) {
    // explicit check first, then cast
    if (!(o instanceof SomeClass)) {
        throw new IllegalArgumentException("Oh Noes!");
    }
    frobnicateSomeClass((SomeClass) o);
}

public void frobnicate2(Object o) {
    // cast directly; a ClassCastException is thrown if o has the wrong type
    frobnicateSomeClass((SomeClass) o);
}
Obviously the first method does more work, but the JVM knows that the type of o has already been checked in the if, so it can skip the type check on the later cast and make it a no-op.
This and many other optimizations make counting "flops" or cycles pretty much useless.
Generally speaking an instanceof check is relatively cheap. On the HotSpot JVM it boils down to a numeric check of the type id in the object header.
This classic article describes why you should "Write Dumb Code".
There's also an article from 2002 that describes how instanceof is optimized in the HotSpot JVM.
Once the JVM has warmed up, most operations can be counted in nanoseconds (millionths of a millisecond). When talking about something being expensive, you usually have to say it is expensive relative to an alternative; it is next to impossible to describe something as expensive in all cases.
Usually, the most important expense is your time (and that of the other developers on your team). Using instanceof can be expensive in development and maintenance time because it often indicates a poor design; using proper OOP techniques is usually a better idea. The ten nanoseconds an instanceof might take are usually trivial by comparison.
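A sketch of that design point (Animal, Dog and Cat are hypothetical types): the instanceof chain works, but the polymorphic version pushes the decision into the types themselves, so new kinds need no edits to existing code.
// The instanceof-chain style: every new kind of animal forces another branch here.
class SoundWithInstanceof {
    static String soundOf(Object animal) {
        if (animal instanceof Dog) return "woof";
        if (animal instanceof Cat) return "meow";
        return "?";
    }
}

// The OO alternative: each type answers for itself, and no instanceof is needed.
interface Animal {
    String sound();
}

class Dog implements Animal {
    public String sound() { return "woof"; }
}

class Cat implements Animal {
    public String sound() { return "meow"; }
}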
The cost of specific operations performed inside the CPU is almost never relevant for performance. If performance is bad, it's almost always because of IO (network, disk) or inefficient code. Writing efficient code is much more about finding a way to reduce the overall amount of work than about avoiding "costly" operations (except those that are orders of magnitude more costly, like IO).