I would like to know if there is any difference in performance between these two ways of getting an attribute value in Java:
Option 1:
for (int i = 0; i < 1000; i++) {
    System.out.println(object.getName());
}
Option 2:
String name = object.getName();
for (int i = 0; i < 1000; i++) {
    System.out.println(name);
}
Maybe with just one attribute (name), option 2 is better, but what if I had 50 different attributes? Wouldn't I be wasting memory storing all those variables?
Please think big: a huge system with tons of users accessing the web app.
The first option runs object.getName() 1000 times; the other runs it just once.
So, yes, obviously, there should be a certain performance impact. There is also a slight semantic difference: if that name isn't immutable, other threads might change it while the loop is running. Then option 1 might pick up that change at some random point in time, whereas option 2 will not, since it captured the value before the loop started.
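To make that semantic difference concrete, here is a minimal sketch; MyObject is a hypothetical class whose name field is volatile, so the concurrent write is guaranteed to become visible:

class MyObject {
    private volatile String name; // volatile: writes become visible to the running loop
    MyObject(String name) { this.name = name; }
    String getName() { return name; }
    void setName(String name) { this.name = name; }
}

MyObject object = new MyObject("old");
new Thread(() -> object.setName("new")).start();

// Option 1 re-reads the field every iteration: it may print "old" a few
// times and then switch to "new" at some random point.
for (int i = 0; i < 1000; i++) {
    System.out.println(object.getName());
}

// Option 2 captured the value once: it prints that same value 1000 times.
String name = object.getName();
for (int i = 0; i < 1000; i++) {
    System.out.println(name);
}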
Regarding the performance aspects: in Java, it is really hard to determine the effects of such subtle code changes. When that loop runs 100K times, the Just-in-Time compiler kicks in and translates everything into highly optimized machine code, using techniques such as method inlining, loop unrolling, constant folding, and so on. It might even detect that object.getName() has no side effects, and thus turn your code into something like your option 2 snippet. All of that happens at runtime, depending on the profiling information that the JVM collected for the JIT while running your code.
So, the typical answer regarding "Java performance": avoid stupid mistakes (repeatedly invoking a method that has no side effects inside a loop would be such a mistake), but don't expect that someone could tell you "yeah, option 1 will run 500 ms faster". The "real" performance boosts in Java come from the JIT (and of course from clever designs in your implementation). Thus it is extremely hard to predict what effect this or that source code artefact will have at runtime.
And finally: please note that System.out.println() is pretty expensive. So when your getName() really just fetches a property from memory, printing that value to the console might be many times more expensive than fetching the value!
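By the way, if you really want an answer for your concrete case, the usual approach is a proper microbenchmark. A minimal sketch using the JMH library (Person here is a hypothetical stand-in for your object; the Blackhole keeps the JIT from eliminating the side-effect-free calls entirely):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class GetterLoopBenchmark {

    final Person object = new Person("some name");

    @Benchmark
    public void option1(Blackhole bh) {
        for (int i = 0; i < 1000; i++) {
            bh.consume(object.getName()); // getter invoked every iteration
        }
    }

    @Benchmark
    public void option2(Blackhole bh) {
        String name = object.getName();   // getter invoked once
        for (int i = 0; i < 1000; i++) {
            bh.consume(name);
        }
    }

    static class Person {
        private final String name;
        Person(String name) { this.name = name; }
        String getName() { return name; }
    }
}

Run it with JMH's runner or its Maven plugin; on a typical JVM, expect the difference to be dominated by whatever real work replaces the consume() call (here: nothing), not by the getter itself.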
Related
I use the following example to illustrate the point, but note that it could be any other task.
for (int i = 0; i < 1000; i++) {
    String a = "world";
    Log.d("hello", a);
}
Versus
String a="";
a="world";
Log.d("hello",a);
String a="";
a="world";
Log.d("hello",a);
String a="";
a="world";
Log.d("hello",a);
String a="";
a="world";
Log.d("hello",a);
String a="";
a="world";
Log.d("hello",a);
...
//1000 times
Let's ignore readability and code quality and look only at the performance of the compiled program.
So which one is better?
The one and only answer: this doesn't matter at all.
What I mean is: aspects such as readability and a good design are much more worth spending your time on.
When, and only when, you have managed to design a superb application, and you are down to the point where you have a performance problem and have to look into this specific question; well, then there would be merit in looking into it. But I am 99.9% sure: you are not at that point. And in that sense: you are wasting your time with such thoughts!
In other words: especially within the "JVM stack" there are tons and tons of things that can have subtle or not-so-subtle effects on performance. Like: there are many, many options that influence the inner workings of garbage collection and what the Just-in-Time compiler is doing. Those will affect the "performance" of your running application in much more significant ways than manually unrolling loop code.
And for the record: many benchmarking experiments have shown that the JIT does its best work when you give it "normal" code. Meaning: the Oracle JIT is designed to give you "best results" on "normal" input. But as soon as you start to fiddle around with your Java code in order to somehow create "optimized" code, chances are that your changes make it harder for the JIT to do a good job.
Thus: please forget about such micro-optimizations. If you spend the same amount of time learning about good OO design and maybe generic Java performance topics, you will gain much more from that!
So which one is better?
The first one, since it actually compiles (there's a typo in the second). Non-compiling code is absolutely the least performant code.
But also the first one, because, despite you saying "ignore readability and code quality", readability is almost always more important than performance. Readability is certainly what you should write code for first, before making it performant.
And the first one, because the JIT will unroll the loop for you if it determines that doing so gives better performance on the specific JVM on which it is running, for the specific data that the code is run on (I'm assuming you don't simply like filling your logs with the same message over and over).
Don't try to second guess the optimizations the JIT will apply. Applying the optimizations yourself might make performance worse, because it makes it harder for the JIT to optimize the code. Write clear, maintainable code, and let the JVM work its magic.
This depends on the programming language. The compiler is responsible for compressing your code in such a way that it can avoid repeated calls at a low level. Object-oriented languages like JavaScript sit at a relatively high level, and therefore compilation may take some time to run.
So, to conclude briefly, based largely on assumptions from your question: the only impacts are the overall file size and the time taken to compile.
What you are doing is called "Loop Unrolling".
Check: https://en.wikipedia.org/wiki/Loop_unrolling
This is one of the optimizations that compilers will perform when compiling with an appropriate optimization level.
In my view your second option should run faster than the first one, because in the first option the code has to increment the loop counter, check it against the loop condition, and make a branch decision. Branching itself can be expensive if there is a branch misprediction: on a misprediction, the CPU has to flush the whole pipeline and restart from the correct instruction. Check: https://en.wikipedia.org/wiki/Branch_misprediction
Here is the piece of code I'm asking about:
for (int i = 0; i < this.options.size(); i++) {
    RadioButton butt = this.options.get(i);
    // do something with butt
}
would I gain a huge performance improvement if I changed it to:
RadioButton butt;
for (int i = 0; i < this.options.size(); i++) {
    butt = this.options.get(i);
    // do something with butt
}
EDIT: how about if this code is to be executed 30-50 times a second with options being around size 20?
For all realistic, measurable cases, there is absolutely no difference between the two performance-wise. In fact, I'm pretty sure (admittedly I don't know for sure) they result in the exact same number of assignments and reference creations. It would be stupid for the JVM to create N reference holders. It would simply reuse the one created during the first iteration, giving it the new reference on each assignment. Which means only one reference holder is used in both cases (assuming this is true).
You're not creating objects here, you're just creating references, and whether you're creating one reference or more doesn't really matter.
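If you want to convince yourself, one low-tech check is to compile both variants and compare the disassembly with javap -c. A sketch (RadioButton here is a hypothetical stand-in, since the real class doesn't matter for the bytecode shape); javac typically emits the same loop body for both, reusing the same local variable slot, with only the LocalVariableTable scope entry differing:

import java.util.List;

public class ScopeCheck {

    static void declaredInside(List<RadioButton> options) {
        for (int i = 0; i < options.size(); i++) {
            RadioButton butt = options.get(i);
            butt.toString(); // stand-in for "do something with butt"
        }
    }

    static void declaredOutside(List<RadioButton> options) {
        RadioButton butt;
        for (int i = 0; i < options.size(); i++) {
            butt = options.get(i);
            butt.toString();
        }
    }

    static class RadioButton { } // hypothetical stand-in for the UI class
}

Compile with javac ScopeCheck.java, then compare javap -c ScopeCheck output for the two methods.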
Looking at the title, I knew this was going to be yet-another-misguided-performance-question.
A couple of things:
No, those are virtually identical except for the scope of the variable.
In general, if you're worried about micro-optimizations like that, you're spending your time on entirely the wrong thing. In this case it's moot since there is no difference, but even if you were talking about e.g. one assignment:
The difference is nanoseconds and completely negligible compared to other things you are doing.
The compiler is much smarter than you about optimizing.
The JVM interpreter and hotspot compiler are far smarter than you as well.
If you haven't set clear performance requirements, and you haven't determined that your code does not meet those requirements, and you haven't profiled your code and determined where the bottleneck is, you have no business asking optimization questions like this.
As for the GC comment you made in another answer: the GC happens in the background, is intelligent, and makes decisions that you have absolutely zero control over (aside from JVM command line tuning; don't get excited, based on the fact that you asked this question, you probably aren't equipped to make good decisions about tuning parameters). Moving the reference from one place to another gives you no significant measure of control over how the GC handles it. Each time through the loop the previous referent is no longer reachable; the GC will clean it up at an undefined point in the future.
I think the code and performance are almost the same; it only looks different. You are not creating new instances, only copying references to objects from your collection.
But I like, and usually use, the second approach.
The difference is not huge, as the assignment of the reference is the biggest cost here. Also, the compiler will make your code more efficient, so in the end the performance cost is the same.
In both cases the RadioButton object is created inside the loop, because RadioButton butt is only a reference and not an instance of the object. Presumably it is this.options.get(i) which creates your object.
So my answer is: no.
The only thing that changes is that in the first version you're declaring the reference butt this.options.size() times.
Consider a simple following code in Java:
void func(String str)
{
    if (str.length() > 0)
    {
        // do something
    }
}
Does executing str.length() > 0 mean that every time this function is called, 4 bytes of memory will be allocated to store the integer value 0?
The memory needed to run this function (including the 0) is part of the compiled program (.class / .jar / .apk), and has nothing to do with how many times the function is run. Even if the function is inlined, only the code size grows, based on how many different locations the function is called from; there is NO memory allocation at run time while the code runs.
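As a hedged illustration, compile the method and look at it with javap -c; the comparison against zero typically becomes a single ifle branch instruction, with the 0 encoded in the opcode itself, not allocated anywhere:

public class Demo {
    static void func(String str) {
        // javap -c Demo typically shows for the condition below:
        //   aload_0                        (push the str reference)
        //   invokevirtual String.length()  (push the int length)
        //   ifle <end of if>               (branch if <= 0; the zero is
        //                                   implicit in the opcode)
        if (str.length() > 0) {
            System.out.println("non-empty");
        }
    }
}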
Meanwhile, two comments:
There are far bigger issues with hardcoding.
I doubt length > 0 counts as hardcoding in any but the strictest sense.
If you write clean, clear and simple code, the JIT will optimise the code best in 95+% of cases. If you attempt to outsmart it, it is far more likely you will make the code worse, not better.
There have been some notable exceptions to this rule, but these tend to last only a few years. For example, Locks in Java 5.0 were much faster than synchronized; however, in Java 7 synchronized can be much faster.
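For reference, a minimal sketch of the two equivalent constructs that comparison is about (java.util.concurrent.locks.ReentrantLock being the Java 5.0 Lock in question):

import java.util.concurrent.locks.ReentrantLock;

class Counter {
    private final ReentrantLock lock = new ReentrantLock();
    private long n;

    // Java 5.0 style: explicit Lock, released in finally
    void incrementWithLock() {
        lock.lock();
        try {
            n++;
        } finally {
            lock.unlock();
        }
    }

    // Plain synchronized; on modern JVMs this is often at least as fast
    synchronized void incrementSynchronized() {
        n++;
    }
}

Which one wins has flipped between JVM versions, which is exactly the point: don't bake such assumptions into your code.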
When considering performance you should look at the behaviour of the whole system, not individual lines of code or even individual libraries. Failing to do this can mean you spend time worrying about something which makes no difference while a much more important thing is being ignored.
I have seen whole teams work on optimising a piece of a system for years when they could have made the whole thing faster just by changing a configuration setting. This is because they limited their view to the code they were writing, and didn't consider how they used the systems they connected to. Imagine wasting years of work when they could have seen more of a speed-up with something trivial; make sure this doesn't happen to you. ;)
No memory allocations are done when this code executes.
Nothing serious is going on in the above code from a memory-allocation point of view.
if(str.length() > 0) { }
Here it is a genuine requirement for a comparison, so it won't be considered hard-coding a value.
If you are very strict about memory utilization, then always pick the exact data type required.
This code is method-local, hence the memory will be reclaimed automatically after execution, as it is inside a method.
Yes, but it is destroyed immediately after exiting the function. int is a primitive type, and primitive types are the fastest in Java, so it won't cost much.
I put together a microbenchmark that seemed to show that the following types of calls took roughly the same amount of time across many iterations after warmup.
static.method(arg);
static.finalAnonInnerClassInstance.apply(arg);
static.modifiedNonFinalAnonInnerClassInstance.apply(arg);
Has anyone found evidence that these different types of calls, in the aggregate, will have different performance characteristics? My finding is that they don't, but I found that a little surprising (especially knowing the bytecode is quite different, at least for the static call), so I want to find out whether others have any evidence either way.
If they indeed had the same exact performance, then that would mean there was no penalty to having that level of indirection in the modified non final case.
I know the standard optimization advice would be "write your code and profile it", but I'm writing a framework code-generation kind of thing, so there is no specific code to profile, and the choice between static and non-final is fairly important for both flexibility and, possibly, performance. I am using framework code in the microbenchmark, which is why I can't include it here.
My test was run on Windows JDK 1.7.0_06.
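For reference, here is a compilable sketch of the three call shapes (the names are mine, mirroring the pseudocode above; Fn is a hypothetical single-method interface, since the original framework code can't be included):

public class CallShapes {

    interface Fn { String apply(String arg); }

    static String method(String arg) { return arg; }

    static final Fn finalAnonInnerClassInstance = new Fn() {
        public String apply(String arg) { return arg; }
    };

    static Fn modifiedNonFinalAnonInnerClassInstance = new Fn() {
        public String apply(String arg) { return arg; }
    };

    static void callSites(String arg) {
        method(arg);                                       // 1: plain static call
        finalAnonInnerClassInstance.apply(arg);            // 2: via a static final field
        modifiedNonFinalAnonInnerClassInstance.apply(arg); // 3: via a non-final static field
    }
}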
If you benchmark it in a tight loop, the JVM will cache the instance, so there's no apparent difference.
If the code is executed in a real application:
if it's expected to be executed back-to-back very quickly, for example String.length() used in for(int i=0; i<str.length(); i++){ short_code; }, the JVM will optimize it; no worries.
if it's executed frequently enough that the instance is most likely in the CPU's L1 cache, the extra load of the instance is very fast; no worries.
otherwise, there is a non-trivial overhead; but then it's executed so infrequently that the overhead is almost impossible to detect among the overall cost of the application; no worries.
First of all, I would like to know what the fundamental difference is between loop optimization and loop transformation.
A simple loop in C follows:
for (i = 0; i < N; i++)
{
    a[i] = b[i]*c[i];
}
but we can unroll it to:
for (i = 0; i < N/2; i++)
{
    a[i*2] = b[i*2]*c[i*2];
    a[i*2 + 1] = b[i*2 + 1]*c[i*2 + 1];
}
But we can unroll it further. What is the limit up to which we can unroll it, and how do we find that?
There are many more techniques like loop tiling, loop distribution, etc. How do we determine when to use the appropriate one?
I will assume that the OP has already profiled his/her code and has discovered that this piece of code is actually important, and actually answer the question :-) :
The compiler will try to make the loop unrolling decision based on what it knows about your code and the processor architecture.
In terms of making things faster:
As someone pointed out, unrolling does reduce the number of loop termination condition compares and jumps.
Depending on the architecture, the hardware may also support an efficient way to index nearby memory locations (e.g., mov eax, [ebx + 4]) without adding additional instructions (this may expand to more micro-ops, though; I'm not sure).
Most modern processors use out of order execution, to find instruction level parallelism. This is hard to do, when the next N instructions are after multiple conditional jumps (i.e., the hardware would need to be able to discard variable levels of speculation).
There is more opportunity to reorder memory operations earlier so that the data fetch latency is hidden.
Code vectorization (e.g., converting to SSE/AVX), may also occur which allows parallel execution of the code in some cases. This is also a form of unrolling.
In terms of deciding when to stop unrolling:
Unrolling increases code size. The compiler knows that there are penalties for exceeding instruction code cache size (all modern processors), trace cache (P4), loop buffer cache (Core2/Nehalem/SandyBridge), micro-op cache (SandyBridge), etc. Ideally it uses static cost-benefit heuristics (a function of the specific code and architecture) to determine which level of unrolling will result in the best overall net performance. Depending on the compiler, the heuristics may vary (often I find that it would be nice to tweak this oneself).
Generally, if the loop contains a large amount of code it is less likely to be unrolled because the loop cost is already amortized, there is plenty of ILP available, and the code bloat cost of unrolling is excessive. For smaller pieces of code, the loop is likely to be unrolled, since the cost is likely to be low. The actual number of unrolls will depend on the specifics of the architecture, compiler heuristics and code, and will be what the compiler decides is optimal (it may not be :-) ).
In terms of when YOU should be doing these optimizations:
When you don't think the compiler did the correct thing. The compiler may not be sophisticated (or sufficiently up to date) enough to use the knowledge of the architecture you are working on optimally.
Possibly, the heuristics just failed (they are just heuristics after all). In general, if you know the piece of code is very important, try unrolling it, and if it improves performance, keep it; otherwise throw it out. Also, only do this when you have roughly the whole system in place, since what may be beneficial when your code's working set is 20k may not be beneficial when your code's working set is 31k.
This may seem rather off topic to your question but I cannot but stress the importance of this.
The key is to write a correct code and get your code working as per the requirement without being bothered about micro optimization.
If later you find your program to be lacking in performance, then you profile your application to find the problem areas and then try to optimize them.
Remember, as one of the wise guys said: it is only 10% of your code which runs 90% of the total run time of your application. The trick is to identify that code through profiling and then try to optimize it.
Well, considering that your first attempt at optimizing is already wrong in 50% of all cases (try any odd N), I really wouldn't try anything more complex.
Also, instead of multiplying your indices, just add 2 to i each iteration and loop up to N again; that avoids the unnecessary shifting (a minor effect as long as we stay with powers of 2, but still).
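A minimal sketch of that corrected version (assuming arrays a, b and c of length N), stepping i by 2 with an epilogue for odd N:

int i;
for (i = 0; i + 1 < N; i += 2) {
    a[i]     = b[i]     * c[i];
    a[i + 1] = b[i + 1] * c[i + 1];
}
if (i < N) {
    // epilogue: N was odd, one element remains
    a[i] = b[i] * c[i];
}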
To summarize: you created incorrect code that is slower than what a compiler could produce. Well, that's the perfect example of why you shouldn't do this stuff, I assume.