Java for loop performance


Which is better in a for loop?
This:
for(int i = 0; i<someMethod(); i++)
{//some code
}
or:
int a = someMethod();
for(int i = 0; i<a; i++)
{//some code
}
Let's just say that someMethod() returns something large.
The first version executes someMethod() on every iteration, which slows it down. The second is faster, but suppose the application contains a lot of similar loops, so declaring a variable each time will consume more memory.
So which is better, or am I just overthinking this?

The second is better, assuming someMethod() does not have side effects.
It caches the value calculated by someMethod(), so you won't have to recalculate it on every iteration (which matters if it is a relatively expensive operation).
If it does have side effects, the two code snippets are not equivalent, and you should use whichever one is correct.
Regarding the size of the variable a: it is not an issue. The value returned by someMethod() has to be stored in some intermediate temporary before the comparison anyway (and even if that weren't the case, the size of one integer is negligible).
P.S.
In some cases, the compiler / JIT optimizer might optimize the first code into the second, assuming of course there are no side effects.
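To make the difference concrete, here is a minimal sketch that counts how often the bound is evaluated in each form. The class name LoopBoundCalls and the counting body of someMethod() are assumptions made up for illustration; the JIT may well remove the difference for a method this trivial.

```java
// Hypothetical demo class; someMethod() is a stand-in for the method in the question.
final class LoopBoundCalls {
    static int calls = 0;

    // Stand-in for someMethod(): records each invocation and returns a fixed bound.
    static int someMethod() {
        calls++;
        return 5;
    }

    // First form: the condition calls someMethod() before every iteration.
    static int callsWhenInCondition() {
        calls = 0;
        for (int i = 0; i < someMethod(); i++) {
            // some code
        }
        return calls;
    }

    // Second form: the bound is computed once and cached in a local variable.
    static int callsWhenCached() {
        calls = 0;
        int a = someMethod();
        for (int i = 0; i < a; i++) {
            // some code
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(callsWhenInCondition()); // 6: five passing checks plus the final failing one
        System.out.println(callsWhenCached());      // 1
    }
}
```

If someMethod() had side effects beyond the counter, the two forms would also behave differently, which is the caveat above.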

If in doubt, test. Use a profiler. Measure.

Assuming the iteration order isn't relevant, and also assuming you really want to nano-optimize your code, you could do this:
for (int i=someMethod(); i-->0;) {
//some code
}
But an additional local variable (your a) isn't such a burden. In practice, this isn't much different from your second version.
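For anyone unsure what that countdown loop actually visits, here is a small sketch that records the indices. The class name CountdownLoop and the fixed return value of someMethod() are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; someMethod() is a stand-in returning a fixed bound.
final class CountdownLoop {
    static int someMethod() {
        return 4;
    }

    // Runs the countdown form and records each index the body sees.
    static List<Integer> visitedIndices() {
        List<Integer> visited = new ArrayList<>();
        // i-- > 0 tests the old value of i, then decrements it before the body runs.
        for (int i = someMethod(); i-- > 0; ) {
            visited.add(i);
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(visitedIndices()); // [3, 2, 1, 0]
    }
}
```

someMethod() is called exactly once here, but the indices run from the bound minus one down to zero, so this only fits when the order does not matter.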

If you don't need this variable after the loop, there is a simple way to hide it inside:
for (int count = someMethod (), i = 0; i < count; i++)
{
// some code
}

It really depends on how long someMethod() takes to produce its result. Memory usage would be the same either way, because someMethod() has to produce and store its result before the comparison in both versions. The second way saves your CPU from computing the same result on every iteration, and it should not take more memory. So the second one is better.

I would not consider the memory consumption of the variable a to be a problem: it is an int, which takes 32 bits in Java regardless of whether the machine is 64-bit. So I would prefer the second alternative, as its execution efficiency is better.

The most important part of loop optimization is allowing the JVM to unroll the loop. To do so in the first variant, it has to be able to inline the call to someMethod(). Inlining has a budget, and that budget can be exhausted at some point. If someMethod() is long enough, the JVM may decide not to inline it.
The second variant is more helpful to the JIT compiler and likely to work better.
My way of writing the loop is:
for (int i=0, max=someMethod(); i<max; i++){...}
max doesn't pollute the surrounding code, you avoid side effects from multiple calls of someMethod(), and it's compact (a single line).

If you need to optimize this, then this is the clean / obvious way to do it:
int a = someMethod();
for (int i = 0; i < a; i++) {
//some code
}
The alternative version suggested by @dystroy
for (int i=someMethod(); i-->0;) {
//some code
}
... has three problems.
He is iterating in the opposite direction.
That iteration is non-idiomatic, and hence less readable. Especially if you ignore the Java style guide and don't put whitespace where you are supposed to.
There is no proof that the code will actually be faster than the more idiomatic version ... especially once the JIT compiler has optimized them both. (And even if the less readable version is faster, the difference is likely to be negligible.)
On the other hand, if someMethod() is expensive (as you postulate) then "hoisting" the call so that it is only done once is likely to be worthwhile.

I was confused about the same thing, so I ran a sanity test with a list of 10,000,000 integers. The difference was more than two seconds, with the latter version being faster:
int a = someMethod();
for(int i = 0; i<a; i++)
{//some code
}
My results on Java 8 (MacBook Pro, 2.2 GHz Intel Core i7), with timestamps in milliseconds, were:
Using list object:
Start: 1565772380899
End: 1565772381632 (733 ms elapsed)
Calling list in 'for' expression:
Start: 1565772381633
End: 1565772384888 (3255 ms elapsed)

Related

Performance loss of continued call to array.length or list.size()

I have seen people say to cache the values of size for a list or length for an array when iterating, to save the time of checking the length/size over and over again.
So
for (int i = 0; i < someArr.length; i++) // do stuff
for (int i = 0; i < someList.size(); i++) // do stuff
Would be turned into
for (int i = 0, length = someArr.length; i < length; i++) // do stuff
for (int i = 0, size = someList.size(); i < size; i++) // do stuff
But since Array#length isn't a method, just a field, shouldn't it make no difference? And with an ArrayList, size() is just a getter, so shouldn't that also be the same either way?

It is possible the JIT compiler will do some of those optimizations for itself. Hence, doing the optimizations by hand may be a complete waste of time.
It is also possible (indeed likely) that the performance benefit you are going to get from hand optimizing those loops is too small to be worth the effort. Think of it this way:
Most of the statements in a typical program are only executed rarely
Most loops will execute in a few microseconds or less.
Hand optimizing a program takes in the order of minutes or hours of developer time.
If you spend minutes to get an execution speedup that is measured in microseconds, you are probably wasting your time. Even thinking about it for too long is wasting time.
The corollary is that:
You should benchmark your code to decide whether you need to optimize it.
You should profile your code to figure out which parts of it are worth spending optimization effort on.
You should set (realistic) performance goals, and stop optimization when you reach those goals.
Having said all of that:
theArr.length is very fast, probably just a couple of machine instructions
theList.size() will probably also be very fast, though it depends on what List class you are using.
For an ArrayList the size() call is probably a method call + a field fetch versus a field fetch for length.
For an ArrayList the size() call is likely to be inlined by the JIT compiler ... assuming that the JIT compiler can figure that out.
The JIT compiler should be able to hoist the length fetch out of the loop. It can probably deduce that it doesn't change in the loop.
The JIT compiler might be able to hoist the size() call, but it will be harder for it to deduce that the size doesn't change.
What this means is that if you do hand optimize those two examples, you will most likely get negligible performance benefit.

In general the loss is negligible. Even LinkedList.size() uses a stored count and does not iterate over all nodes.
For large sizes you can assume that JIT compilation to machine code will kick in and perform the optimization for you.
If the size changes inside the loop (delete/insert), the cached size variable must be updated too, which makes the code even less robust.
The best would be to use a for-each
for (Bar bar: bars) { ... }
You might also use the somewhat more costly Stream:
barList.forEach(bar -> ...);
Stream.of(barArray).forEach(bar -> ...);
Streams can be executed in parallel.
barList.parallelStream().forEach(bar -> ...);
And last but not least, you can use standard library methods for simple loops:
Arrays.setAll(barArray, i -> ...);
We are talking about micro-optimisations here. I would go for elegance.
Most often the problem lies in the algorithm and data structures used. List is notorious, as everything can be a List. However, Set or Map often provide much more power and expressiveness.
If a complex piece of software is slow, profile the application. Check the usual suspects: Java collections versus database queries, file parsing.
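The point about the size changing inside the loop can be sketched as follows. The class name StaleSizeDemo and the sample data are made up for illustration; the sketch assumes an ArrayList of Integer and removal by index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not from the original answer.
final class StaleSizeDemo {
    // Removes even numbers while looping against a size cached before the loop.
    // The cached bound goes stale after the first removal.
    static boolean cachedSizeFails() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        try {
            for (int i = 0, size = list.size(); i < size; i++) {
                if (list.get(i) % 2 == 0) {
                    list.remove(i); // remove(int index), since i is an int
                    i--;            // re-check the element that shifted into slot i
                }
            }
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true; // the loop ran past the end of the shrunken list
        }
    }

    // Same loop, but the condition re-reads list.size() on every iteration.
    static List<Integer> liveSizeWorks() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i) % 2 == 0) {
                list.remove(i);
                i--;
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(cachedSizeFails()); // true
        System.out.println(liveSizeWorks());   // [1, 3]
    }
}
```

So caching the size is only safe when the loop body does not add or remove elements.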

Overuse of Method-chaining in Java

I see a lot of this kind of code written by Java developers and Java instructors:
for ( int x = 0 ; x < myArray.length ; x++ )
accum += (mean() - myArray[x]) * (mean() - myArray[x] );
I am very critical of this because mean() is being invoked twice for every element in the array, when it only has to be invoked once:
double theMean = mean();
for ( int x = 0 ; x < myArray.length ; x++ )
accum += (theMean - myArray[x]) * (theMean - myArray[x]);
Is there something about optimization in Java that makes the first example acceptable? Should I stop riding developers about this?
*** More information. An array of samples is stored as an instance variable. mean() has to traverse the array and calculate the mean every time it is invoked.

You are right. Your way (second code sample) is more efficient. I don't think Java can optimize the first code sample to call mean() just once and re-use its return value, since mean() might have side effects, so the compiler can't decide to call it once if your code calls it twice.

Leave your developers alone, it's fine -- it's readable and it works, without introducing unnecessary names and variables.
Optimization should only ever be done under the guidance of a performance monitoring tool which can show you where you're actually slow. And, typically, performance is enhanced more effectively by considering the large scale architecture of an application, not line by line bytecode optimization, which is expensive and usually unhelpful.

Your version will likely run faster, though an optimizing compiler may be able to detect if the mean() method returns the same value every time (e.g. if the value is hard-coded or stored in a field) and eliminate the method call.
If you are recommending this change for efficiency reasons, you may be falling foul of premature optimization. You don't really know where the bottlenecks are in your system until you measure in the appropriate environment under appropriate loads. Even then, improved hardware is often a more cost-effective solution than developer time.
If you are recommending it because it will eliminate duplication then I think you might be on stronger ground. If the mean() method took arguments too, it would be especially reasonable to pull that out of the loop and call the method once and only once.

Yes, some compilers will optimize this to just what you say.
Yes, you should stop riding developers about this.
I think your preferred way is better, but not mostly because of the optimization. It is more clear that the value is the same in both places if it does not involve a method call, particularly in cases where the method call is more complex than the one you have here.
For that matter, I think it's better to write
double theMean = mean();
for (int x = 0; x < myArray.length; x++) {
    double curValue = myArray[x];
    double toSquare = theMean - curValue;
    accum += toSquare * toSquare;
}
because it makes it easier to see that you are squaring whatever is being accumulated, and just what it is that's being squared.

Normally the compiler will not optimize the method call since it cannot know whether the return value would be the same (this is especially true when mean processes an array as it has no way of checking whether the result can be cached). So yes the mean() method would be invoked twice.
In this case, if you know for sure that the array is kept the same regardless of the values of x and accum in the loop (more generally, regardless of any change in the program values), then the second code is more optimal.
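Given that mean() traverses the array on every call, the cost difference can be sketched like this. The class VarianceSketch, its meanCalls counter, and the sample values are assumptions for illustration, not the original poster's code.

```java
// Hypothetical demo class; mean() recomputes the mean on every call, as described in the question.
final class VarianceSketch {
    private final double[] samples;
    int meanCalls = 0; // counts invocations of mean(), for illustration only

    VarianceSketch(double... samples) {
        this.samples = samples;
    }

    // Traverses the whole array each time it is invoked.
    double mean() {
        meanCalls++;
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    // First style: mean() is invoked twice per element, so the work grows quadratically.
    double sumOfSquaresInline() {
        double accum = 0;
        for (int x = 0; x < samples.length; x++) {
            accum += (mean() - samples[x]) * (mean() - samples[x]);
        }
        return accum;
    }

    // Second style: mean() is invoked once and its result is reused.
    double sumOfSquaresHoisted() {
        double theMean = mean();
        double accum = 0;
        for (int x = 0; x < samples.length; x++) {
            double diff = theMean - samples[x];
            accum += diff * diff;
        }
        return accum;
    }

    public static void main(String[] args) {
        VarianceSketch v = new VarianceSketch(1.0, 2.0, 3.0, 4.0);
        System.out.println(v.sumOfSquaresInline() + " after " + v.meanCalls + " calls");  // 5.0 after 8 calls
        v.meanCalls = 0;
        System.out.println(v.sumOfSquaresHoisted() + " after " + v.meanCalls + " calls"); // 5.0 after 1 calls
    }
}
```

Both styles give the same result as long as the array is not modified during the loop; only the number of traversals differs.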

How to write Java for loops to avoid repeatedly computing the upper bound

I generally write
for (int i = 0, n = someMethod(); i < n; i++)
in preference to
for (int i = 0; i < someMethod(); i++)
to avoid someMethod() being computed repeatedly. However I'm never really sure when I need to do this. How clever is Java at recognising methods that will give the same result every time and only need to be executed once at the beginning of a loop?

I believe the onus is on you, as the programmer, to identify cases where the upper bound of your for-loop needs to be calculated on the fly and address them accordingly.
Assuming that the value of n does not depend upon some operation that is performed within the loop, personally, I would prefer it written as
int n = someMethod();
for (int i = 0; i < n; i++) { ... }
because it preserves the most common style of for-loop while unambiguously defining the upper bound.

The JIT, as far as I can tell, will only detect this if it's a fairly simple inlineable method. That being said it's very easy for a programmer to detect these cases and is a hard problem for a JIT compiler to detect. Preferably you should use final int to cache results from large methods, as the JIT can very easily detect that value can't change and can even remove array access checks to speed up loops.
Something like
int[] arr = new int[ 10 ];
for( int i = 0; i < arr.length; i++ ) {
//...
}
or
List< String > list = Arrays.asList( new String[] { ... } );
for( int i = 0; i < list.size(); i++ ) {
//...
}
can probably be very easily optimized by the JIT. Other loops that call large or complicated methods can't easily be proven to always return the same value, but methods like size() probably could be inlined or even removed completely.
Finally, for-each loops. On arrays they are desugared into the first loop I posted and can easily be optimized into the quickest loop. On non-arrays I prefer to avoid for-each when I need a fast loop, because it desugars into an Iterator loop rather than the second loop I posted. The exception is LinkedList, where an Iterator is faster than calling get(), because each get() is an O(n) traversal.
This is all speculation about what the JIT could do to optimize a loop. It's important to know that the JIT will only apply an optimization if it can prove that the resulting behaviour won't change. Keeping things simple makes the JIT's job much easier, for example by using the final keyword. Using final on values or on methods lets the JIT easily prove that they won't change, so it can inline aggressively. Inlining is the JIT's most important optimization. Make that job easy and the JIT will help you out in a big way.
Here is a link discussing loop optimizations, where the JIT can't always optimize a loop if it can't prove its optimization won't change anything.
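The LinkedList remark can be illustrated with a short sketch. The class name ListTraversal and its helper methods are hypothetical; both loops compute the same sum, but on a LinkedList every get(i) has to walk the list, while the for-each form advances an Iterator one node at a time.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class, not from the original answer.
final class ListTraversal {
    // Indexed loop: on a LinkedList, each get(i) walks from one end of the list.
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0, n = list.size(); i < n; i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Enhanced for loop: uses the list's Iterator, one step per element.
    static long sumByForEach(List<Integer> list) {
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    // Builds a LinkedList holding 1..count.
    static List<Integer> sampleList(int count) {
        List<Integer> list = new LinkedList<>();
        for (int i = 1; i <= count; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> list = sampleList(1000);
        System.out.println(sumByIndex(list));   // 500500
        System.out.println(sumByForEach(list)); // 500500
    }
}
```

The results are identical; whether the speed difference matters for a given list size is something to measure rather than assume.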

Best Practice in `For` loop in java [duplicate]

In Java I have a block of code:
List e = {element1, element2, ...., elementn};
for(int i = 0; i < e.size(); i++){//Do something in here
};
and another block:
List e = {element1, element2, ...., elementn};
int listSize = e.size();
for(int i = 0; i < listSize; i++){//Do something in here
};
I think the second block is better, because in the first block, after each i++, e.size() has to be evaluated again to check the loop condition. Is that right or wrong?
Comparing the two blocks above, what is the best practice for writing a for loop, and why? Please explain clearly, and try the loops yourself.

Personally I'd use the enhanced for statement instead:
for (Object element : e) {
// Use element
}
Unless you need the index, of course.
If I had to use one of the two forms, I'd use the first as it's tidier (it doesn't introduce another local variable which is only used in that loop), until I had concrete evidence that it was causing a problem. (In most list implementations, e.size() is a simple variable access which can be inlined by the JIT anyway.)

Usually, the most brief and readable code is the best choice, all things being equal. In the case of Java, the enhanced for loop (which works with any class that implements Iterable) is the way to go.
for (Object object : someCollection) { // do something }
In terms solely of the two you posted, I think the first is the better option. It's more readable, and you have to remember that, under the hood, JIT will attempt to optimize a great deal of the code you write anyway.
EDIT: Have you heard the phrase "premature optimisation is the root of all evil"? Your second block is an example of premature optimisation.

If you check the size() implementation in the LinkedList class, you will find that the size is incremented or decremented whenever an element is added to or removed from the list.
Calling size() just returns the value of this field and does not involve any calculation.
So calling size() directly should be better, as you save the storage for another integer.

I would always use (if you need an index variable):
List e = {element1, element2, ...., elementn};
for(int i = 0, size = e.size(); i < size; i++){
// Do something in here
};
Since e.size() could be an expensive operation.
Your second option is not good, since it introduces a new variable outside of the for loop. I recommend keeping variable visibility as limited as possible.
Otherwise a
for (MyClass myObj : list) {
// Do something here
}
is even cleaner, but might introduce a small performance hit (the index approach doesn't need to instantiate an Iterator).

Yes, the second form is marginally more efficient, as you don't repeatedly perform the size() method invocation. That said, compilers are good at doing this sort of optimisation themselves.
However, it's unlikely that this would be the performance bottleneck of your application. Avoid premature optimisation. Make your code clean and readable foremost.

HotSpot will hoist e.size() out of the loop in most cases, so it will calculate the size of the List only once.
Personally, I prefer the following notation:
for (Object elem: e) {
//Do something
}

I think this should be even better.
It may avoid initializing the int variable every time.
List e = {element1, element2, ...., elementn};
int listSize = e.size();
int i=0;
for(i = 0; i < listSize; i++){//Do something in here
};

The second one is the better approach, because in the first block you call the e.size() method on every iteration of the loop, which is an extra burden on the JVM.

I'm not sure, but I think Java's optimizer will replace the call with a fixed value, so in the end it will be the same.

To avoid all this indexing, iterators, and checking when writing the code, use the following simple, most readable form, which also gives the best performance.
Why this has maximum performance (details are coming up):
for (Object object : aCollection) {
// Do something here
}
If the index is needed, then you have to choose between the two forms above:
The second is better, as you said, because it only calculates the size once.

I think nowadays we tend to write short and understandable code, so the first option is better.

The second is better, because in the body of the first loop you might call e.remove, and then the size of e will change, so it is better to save the size in a variable before the loop.

How bad is declaring arrays inside a for loop in Java?

I come from a C background, so I admit that I'm still struggling with letting go of memory management when writing in Java. Here's one issue that's come up a few times that I would love to get some elaboration on. Here are two ways to write the same routine, the only difference being when double[] array is declared:
Code Sample 1:
double[] array;
for (int i=0; i<n; ++i) {
array = calculateSomethingAndReturnAnArray(i);
if (someFunctionOnArrays(array)) {
// DO ONE THING
} else {
// DO SOME OTHER THING
}
}
Code Sample 2:
for (int i=0; i<n; ++i) {
double[] array = calculateSomethingAndReturnAnArray(i);
if (someFunctionOnArrays(array)) {
// DO ONE THING
} else {
// DO SOME OTHER THING
}
}
Here, private double[] calculateSomethingAndReturnAnArray(int i) always returns an array of the same length. I have a strong aversion to Code Sample 2 because it creates a new array for each iteration when it could just overwrite the existing array. However, I think this might be one of those times when I should just sit back and let Java handle the situation for me.
What are the reasons to prefer one of the ways over the other or are they truly identical in Java?

There's nothing special about arrays here, because you're not allocating the array at the declaration. You're just creating a new variable, which is equivalent to:
Object foo;
for(...){
foo = func(...);
}
When you create the variable outside the loop, the variable (which holds the location of the thing it refers to) is only ever allocated once. When you create the variable inside the loop, the variable may be reallocated on each iteration, but my guess is that the compiler or the JIT will fix that in an optimization step.
I'd consider this a micro-optimization. If you're running into problems with this segment of your code, you should make decisions based on measurements rather than on the specs alone. If you're not running into issues with it, you should do the semantically correct thing and declare the variable in the scope that makes sense.
See also this similar question about best practices.

A declaration of a local variable without an initializing expression will do NO work whatsoever. The work happens when the variable is initialized.
Thus, the following are identical with respect to semantics and performance:
double[] array;
for (int i=0; i<n; ++i) {
array = calculateSomethingAndReturnAnArray(i);
// ...
}
and
for (int i=0; i<n; ++i) {
double[] array = calculateSomethingAndReturnAnArray(i);
// ...
}
(You can't even quibble that the first case allows the array to be used after the loop ends. For that to be legal, array has to have a definite value after the loop, and it doesn't unless you add an initializer to the declaration; e.g. double[] array = null;)
To elaborate on @Mark Elliot's point about micro-optimization:
This is really an attempt to optimize rather than a real optimization, because (as I noted) it should have no effect.
Even if the Java compiler actually emitted some non-trivial executable code for double[] array;, the chances are that the time to execute would be insignificant compared with the total execution time of the loop body, and of the application as a whole. Hence, this is most likely to be a pointless optimization.
Even if this is a worthwhile optimization, you have to consider that you have optimized for a specific target platform; i.e. a particular combination of hardware and JVM version. Micro-optimizations like this may not be optimal on other platforms, and could in theory be anti-optimizations.
In summary, you are most likely wasting your time if you focus on things like this when writing Java code. If performance is a concern for your application, focus on the MACRO level performance; e.g. things like algorithmic complexity, good database / query design, patterns of network interactions, and so on.

Both create a new array for each iteration. They have the same semantics.
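A quick way to convince yourself is to count the distinct array objects each variant sees. In this sketch the class name ArrayScopeDemo and the body of calculateSomethingAndReturnAnArray are assumptions: the stand-in allocates a fresh array on each call, as the method in the question presumably does.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical demo class; the method body below is a stand-in, not the asker's code.
final class ArrayScopeDemo {
    // Assumed behaviour: allocates and returns a new array of fixed length on every call.
    static double[] calculateSomethingAndReturnAnArray(int i) {
        return new double[] { i, i * 2.0 };
    }

    // Code Sample 1: variable declared outside the loop.
    static int distinctArraysDeclaredOutside(int n) {
        // Identity-based set, so arrays count as equal only if they are the same object.
        Set<double[]> seen = Collections.newSetFromMap(new IdentityHashMap<double[], Boolean>());
        double[] array;
        for (int i = 0; i < n; ++i) {
            array = calculateSomethingAndReturnAnArray(i);
            seen.add(array);
        }
        return seen.size();
    }

    // Code Sample 2: variable declared inside the loop.
    static int distinctArraysDeclaredInside(int n) {
        Set<double[]> seen = Collections.newSetFromMap(new IdentityHashMap<double[], Boolean>());
        for (int i = 0; i < n; ++i) {
            double[] array = calculateSomethingAndReturnAnArray(i);
            seen.add(array);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctArraysDeclaredOutside(10)); // 10
        System.out.println(distinctArraysDeclaredInside(10));  // 10
    }
}
```

The allocation happens inside the called method in both variants; where the reference variable is declared does not change how many arrays are created.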
