Are these two constructs equivalent?
char[] arr = new char[5];
for (char x : arr) {
// code goes here
}
Compared to:
char[] arr = new char[5];
for (int i = 0; i < arr.length; i++) {
char x = arr[i];
// code goes here
}
That is, if I put exactly the same code in the body of both loops (and they compile), will they behave exactly the same?
Full disclaimer: this was inspired by another question (Java: are these 2 codes the same). My answer there turned out not to be the answer, but I feel that the exact semantics of the Java for-each loop have some nuances that need pointing out.
While often the two constructs are interchangeable, THEY ARE NOT 100% EQUIVALENT!!!
A proof can be constructed by defining // code goes here that would cause the two constructs to behave differently. One such loop body is:
arr = null;
Therefore, we are now comparing:
char[] arr = new char[5];
for (char x : arr) {
arr = null;
}
with:
char[] arr = new char[5];
for (int i = 0; i < arr.length; i++) {
char x = arr[i];
arr = null;
}
Both snippets compile, but if you run them, you will find that the first loop terminates normally, while the second loop throws a NullPointerException.
This means that they are not 100% equivalent! There are scenarios where the two constructs will behave differently!
Such scenarios are likely to be rare, but this fact should not be forgotten when debugging, because otherwise you might miss some really subtle bugs.
As an addendum, note that sometimes the for-each construct is not even an option, e.g. if you need the index. The crucial lesson here is that even if it's an option, you need to make sure that it's actually an equivalent substitute, because that's not always guaranteed.
Similarly, if you start with a for-each loop and later realize that you need to switch to the indexed for loop, make sure that you're preserving the semantics, because that's not guaranteed either.
In particular, _be wary of any modification to the reference of the array/collection being iterated_ (modification to the content may/may not trigger ConcurrentModificationException, but that's a different issue).
Guaranteeing semantics preservation is also a lot more difficult when you use collections that use custom iterators, but as this example shows, the two constructs are different even when simple arrays are involved.
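The difference comes from the fact that, for arrays, the enhanced for statement evaluates the array expression once and then iterates over that saved reference, while the indexed loop re-reads arr on every check. A minimal sketch that makes both behaviours observable (the class and method names are made up for illustration):

```java
class ForEachVsIndexed {

    // Enhanced for: the array reference is captured before the first
    // iteration, so reassigning arr in the body does not affect the loop.
    static int forEachIterations() {
        char[] arr = new char[5];
        int iterations = 0;
        for (char x : arr) {
            arr = null;
            iterations++;
        }
        return iterations;
    }

    // Indexed for: arr.length is re-evaluated against the current value of
    // arr, so the second condition check dereferences null.
    static boolean indexedLoopThrows() {
        char[] arr = new char[5];
        try {
            for (int i = 0; i < arr.length; i++) {
                char x = arr[i];
                arr = null;
            }
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("for-each iterations: " + forEachIterations());
        System.out.println("indexed loop threw NPE: " + indexedLoopThrows());
    }
}
```

The for-each version visits all five elements even though arr is null after the first pass; the indexed version fails on its second condition check.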
They are pretty much equivalent, but there are a few cases where they are not. It is best to make the array variable final.
final char[] arr = new char[5]; // Now arr cannot be set null as shown
// in above answer.
Even then you can do i-- in the second loop. If you don't do any of these unlikely things, they are mostly equivalent.
The first one is more readable and therefore preferable. If you're going to need the index anyway, then plump for the second.
Please note that the most important difference is implied by the answer that "polygenelubricants" gave, but not explicitly stated: for-each iterates through the array but you can't modify any of the elements using the instance that the loop gives you (in this case, the variable "char x"). The classic loop allows you to use the index and alter an element of the array.
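To illustrate the point about element modification, here is a small sketch (class and method names are hypothetical). The for-each variable receives a copy of each element, so assigning to it leaves the array untouched, whereas an indexed assignment writes into the array:

```java
import java.util.Arrays;

class LoopVariableIsACopy {

    // Assigning to the for-each variable only changes the local copy.
    static char[] assignViaForEach() {
        char[] arr = {'a', 'b', 'c'};
        for (char x : arr) {
            x = 'z';
        }
        return arr;
    }

    // Assigning through the index writes into the array itself.
    static char[] assignViaIndex() {
        char[] arr = {'a', 'b', 'c'};
        for (int i = 0; i < arr.length; i++) {
            arr[i] = 'z';
        }
        return arr;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(assignViaForEach()));
        System.out.println(Arrays.toString(assignViaIndex()));
    }
}
```

Note that this concerns replacing elements. When the elements are mutable objects, a for-each loop can still call methods that change the object each element refers to.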
I generally write
for (int i = 0, n = someMethod(); i < n; i++)
in preference to
for (int i = 0; i < someMethod(); i++)
to avoid someMethod() being computed repeatedly. However I'm never really sure when I need to do this. How clever is Java at recognising methods that will give the same result every time and only need to be executed once at the beginning of a loop?
I believe the onus is on you, as the programmer, to identify cases where the upper bound of your for-loop needs to be calculated on the fly and address them accordingly.
Assuming that the value of n does not depend upon some operation that is performed within the loop, personally, I would prefer it written as
int n = someMethod();
for (int i = 0; i < n; i++) { /* loop body */ }
because it preserves the most common style of for-loop while unambiguously defining the upper bound.
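At the language level the difference is easy to observe when the method has a visible side effect. The sketch below uses a hypothetical someMethod() that counts its own invocations; it says nothing about what the JIT does with side-effect-free methods, only about what the two loop forms mean:

```java
class LoopBoundCalls {

    static int calls = 0;

    // Hypothetical stand-in for someMethod(): counts how often it is invoked.
    static int someMethod() {
        calls++;
        return 4;
    }

    // Bound written in the loop condition: evaluated before every iteration
    // and once more for the final, failing check.
    static int callsWhenInCondition() {
        calls = 0;
        for (int i = 0; i < someMethod(); i++) {
            // loop body
        }
        return calls;
    }

    // Bound cached in a local: evaluated exactly once.
    static int callsWhenCached() {
        calls = 0;
        int n = someMethod();
        for (int i = 0; i < n; i++) {
            // loop body
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("in condition: " + callsWhenInCondition());
        System.out.println("cached:       " + callsWhenCached());
    }
}
```

With a bound of 4, the first form invokes the method five times and the second invokes it once.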
The JIT, as far as I can tell, will only detect this if it's a fairly simple inlineable method. That being said it's very easy for a programmer to detect these cases and is a hard problem for a JIT compiler to detect. Preferably you should use final int to cache results from large methods, as the JIT can very easily detect that value can't change and can even remove array access checks to speed up loops.
Something like
int[] arr = new int[ 10 ];
for( int i = 0; i < arr.length; i++ ) {
//...
}
or
List< String > list = Arrays.asList( new String[] { ... } );
for( int i = 0; i < list.size(); i++ ) {
//...
}
can probably be very easily optimized by the JIT. Other loops that call large or complicated methods can't easily be proven to always return the same value, but methods like size() probably could be inlined or even removed completely.
Finally, for-each loops on arrays: they decay to the first loop I posted and can also easily be optimized to produce the quickest loop. For-each loops on non-arrays, however, I prefer to avoid when it comes to quick loops, as they decay into Iterator loops and not the second loop I posted. LinkedList is the exception: there an Iterator is faster than using get(), because each get() call is an O( n ) traversal.
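The two traversal styles for a List can be sketched as follows (class and method names are made up for illustration). Both produce the same result; the difference for a LinkedList is only in cost, since every get(i) has to walk the list while the iterator just steps to the next node:

```java
import java.util.LinkedList;
import java.util.List;

class ListTraversal {

    // Indexed access: on a LinkedList each get(i) traverses the list.
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0, n = list.size(); i < n; i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // for-each on a List is an Iterator loop under the hood.
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 1000; i++) {
            list.add(i);
        }
        System.out.println(sumByIndex(list));
        System.out.println(sumByIterator(list));
    }
}
```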
This is all speculation on what the JIT could do to optimize a loop. It's important to know that the JIT will only apply an optimization it can prove will not change the resulting effects. Keeping things simple, for example by using the final keyword, makes the JIT's job much easier. Using final on values or on methods allows the JIT to easily prove that they won't change, so it can inline like crazy. Inlining is the JIT's most important optimization. Make that job easy and the JIT will help you out in a big way.
Here is a link discussing loop optimizations, where the JIT can't always optimize a loop if it can't prove its optimization won't change anything.
In Java I have a block of code:
List e = {element1, element2, ...., elementn};
for(int i = 0; i < e.size(); i++){//Do something in here
};
and another block:
List e = {element1, element2, ...., elementn};
int listSize = e.size();
for(int i = 0; i < listSize; i++){//Do something in here
};
I think that the second block is better, because in the first block, after every i++, we have to calculate e.size() one more time to evaluate the condition of the for loop. Is that right or wrong?
And comparing the two blocks above, what is the best practice for writing a for loop, and why?
Personally I'd use the enhanced for statement instead:
for (Object element : e) {
// Use element
}
Unless you need the index, of course.
If I had to use one of the two forms, I'd use the first as it's tidier (it doesn't introduce another local variable which is only used in that loop), until I had concrete evidence that it was causing a problem. (In most list implementations, e.size() is a simple variable access which can be inlined by the JIT anyway.)
Usually, the most brief and readable code is the best choice, all things being equal. In the case of Java, the enhanced for loop (which works with any class that implements Iterable) is the way to go.
for (Object object : someCollection) { // do something }
In terms solely of the two you posted, I think the first is the better option. It's more readable, and you have to remember that, under the hood, JIT will attempt to optimize a great deal of the code you write anyway.
EDIT: Have you heard the phrase "premature optimisation is the root of all evil"? Your second block is an example of premature optimisation.
If you check the size() implementation in the LinkedList class, you will find that the size is incremented or decremented when an element is added to or removed from the list.
Calling size() just returns the value of this field and does not involve any calculation.
So calling the size() method directly should be better, as you avoid storing another integer.
I would always use (if you need an index variable):
List e = {element1, element2, ...., elementn};
for(int i = 0, size = e.size(); i < size; i++){
// Do something in here
};
Since e.size() could be an expensive operation.
Your 2nd option is not good, since it introduces a new variable outside of the for loop. I recommend keeping variable visibility as limited as possible.
Otherwise a
for (MyClass myObj : list) {
// Do something here
}
is even cleaner, but might introduce a small performance hit (the index approach doesn't need to instantiate an Iterator).
Yes, the second form is marginally more efficient, as you don't repeatedly perform the size() method invocation. That said, compilers are good at doing this sort of optimisation themselves.
However, it's unlikely that this would be the performance bottleneck of your application. Avoid premature optimisation. Make your code clean and readable foremost.
HotSpot will hoist e.size() out of the loop in most cases, so it will calculate the size of the List only once.
As for me I prefer the following notation:
for (Object elem: e) {
//Do something
}
I think this should be better still.
Maybe initializing the int variable every time can be avoided this way:
List e = {element1, element2, ...., elementn};
int listSize = e.size();
int i=0;
for(i = 0; i < listSize; i++){//Do something in here
};
The second one is the better approach, because in the first block e.size() is a method call that is executed on every pass through the loop, which is an extra burden on the JVM.
I'm not so sure, but I think the Java optimizer will replace the call with a fixed value, so in the end it will be the same.
To avoid all the indexing, iterators and bounds checks in your own code, use the following form, which is the simplest and most readable and also performs well:
for (Object object : aCollection) {
// Do something here
}
If the index is needed, then to choose between the two forms in the question: the second is better, as you said, because it calculates the size only once.
I think now we have a tendency to write short and understandable code, so the first option is better.
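The second is better, because in the body of the first loop you might execute a statement such as e.remove, and then the size of e will change, so it is better to save the size in a variable before the loop. Note, though, that a saved size goes stale once elements are removed, so the body must then not index past the new end of the list.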
I have a loop like:
String tmp;
for(int x = 0; x < 1000000; x++) {
// use temp
temp = ""; // reset
}
This string is holding at most 10 characters.
What would be the most efficient way of creating a variable for this use case?
Should I use a fixed-size array? Or a StringBuffer instead?
I don't want to create 1 million variables when I don't have to, and it matters for this method (performance).
Edit
I simplified my scenario. I actually need this variable to be at class-level scope, as there are some events that take place, i.e. it can't be declared within the loop.
Why not simply declare temp inside the loop like so:
for(int x = 0; x < 1000000; x++) {
String temp;
// use temp
}
You even get a very (very, very) slight performance increase because you don't have to waste time resetting the value of temp to "".
With regard to your update, it still depends on what you do with temp, but a StringBuffer would probably be the easiest to use. Especially if you need to concatenate a String together, it would be quite fast.
What exactly are you looking to do with tmp (or temp)?
Honestly, I'd just try declaring your variables within the loop if they aren't needed afterwards, and profile it. Many of the obscurities that have been used in the past to help with performance issues within loops are no longer needed in recent versions of Java, due to optimizations and other improvements in the compiler and the Hotspot JVM.
What's the problem with using a fixed array? I think an array will do. Here is a similar question I found: Making a very large Java array
Well, StringBuffer or StringBuilder will do too, but StringBuilder is faster than StringBuffer.
And if it based on the performance level, i think you might want to check the types of loops that give better performance.
Try this:
public class Robal {
    public void looping() {
        for (int x = 0; x < 1000000; x++) {
            String temp = x + "";
            System.out.println(temp);
            temp = ""; // reset
        }
    }
}
The answer really depends on what you do with temp in the loop.
String instances are immutable by definition. If your processing includes string manipulation, you should not use String since you'll end up creating a lot of unnecessary very short-lived immutable instances. In this case use StringBuilder (or StringBuffer if thread-safety is required) instead.
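For the class-level case described in the question's edit, one way to apply this is to keep a single StringBuilder in a field and reset it with setLength(0) on each pass. This is only a sketch: the class name, the run method and the "id" + x workload are invented for illustration, and whether it is actually faster than plain String use should be checked with a profiler.

```java
class ReusedBuffer {

    // Class-level buffer, reused across iterations; capacity 10 matches the
    // "at most 10 characters" constraint from the question.
    private final StringBuilder temp = new StringBuilder(10);

    // Hypothetical workload: builds a short label per iteration and returns
    // the last one so the result can be inspected.
    String run(int iterations) {
        String last = "";
        for (int x = 0; x < iterations; x++) {
            temp.setLength(0);              // reset without allocating a new buffer
            temp.append("id").append(x);    // manipulate the buffer in place
            last = temp.toString();         // a String is still created when one is needed
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(new ReusedBuffer().run(1000000));
    }
}
```

StringBuilder is not thread-safe, so if the events mentioned in the question arrive on other threads, StringBuffer or external synchronization would be needed instead.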
If you merely create a new String (or obtain it from an external source) in every iteration and use it without any string manipulation operations that create new String objects, then you're OK using String. Note that creating a new String instance every iteration is usually quite fast and unless your profiler specifically points to this being a problem, you should not attempt to optimize this prematurely.
Note, also, that unless you specifically rely in each iteration on temp's initial value being a reference to an empty string, there is no need to do temp = "".
I come from a C background, so I admit that I'm still struggling with letting go of memory management when writing in Java. Here's one issue that's come up a few times that I would love to get some elaboration on. Here are two ways to write the same routine, the only difference being when double[] array is declared:
Code Sample 1:
double[] array;
for (int i=0; i<n; ++i) {
array = calculateSomethingAndReturnAnArray(i);
if (someFunctionOnArrays(array)) {
// DO ONE THING
} else {
// DO SOME OTHER THING
}
}
Code Sample 2:
for (int i=0; i<n; ++i) {
double[] array = calculateSomethingAndReturnAnArray(i);
if (someFunctionOnArrays(array)) {
// DO ONE THING
} else {
// DO SOME OTHER THING
}
}
Here, private double[] calculateSomethingAndReturnAnArray(int i) always returns an array of the same length. I have a strong aversion to Code Sample 2 because it creates a new array for each iteration when it could just overwrite the existing array. However, I think this might be one of those times when I should just sit back and let Java handle the situation for me.
What are the reasons to prefer one of the ways over the other or are they truly identical in Java?
There's nothing special about arrays here, because you're not allocating the array; you're just creating a new variable. It's equivalent to:
Object foo;
for(...){
foo = func(...);
}
In the case where you create the variable outside the loop, the variable (which holds the location of the thing it refers to) is only ever allocated once. In the case where you create it inside the loop, the variable may be reallocated in each iteration, but my guess is that the compiler or the JIT will fix that in an optimization step.
I'd consider this a micro-optimization. If you're running into problems with this segment of your code, you should be making decisions based on measurements rather than on the specs alone. If you're not, you should do the semantically correct thing and declare the variable in the scope that makes sense.
See also this similar question about best practices.
A declaration of a local variable without an initializing expression will do NO work whatsoever. The work happens when the variable is initialized.
Thus, the following are identical with respect to semantics and performance:
double[] array;
for (int i=0; i<n; ++i) {
array = calculateSomethingAndReturnAnArray(i);
// ...
}
and
for (int i=0; i<n; ++i) {
double[] array = calculateSomethingAndReturnAnArray(i);
// ...
}
(You can't even quibble that the first case allows the array to be used after the loop ends. For that to be legal, array has to have a definite value after the loop, and it doesn't unless you add an initializer to the declaration; e.g. double[] array = null;)
To elaborate on @Mark Elliot's point about micro-optimization:
This is really an attempt to optimize rather than a real optimization, because (as I noted) it should have no effect.
Even if the Java compiler actually emitted some non-trivial executable code for double[] array;, the chances are that the time to execute would be insignificant compared with the total execution time of the loop body, and of the application as a whole. Hence, this is most likely to be a pointless optimization.
Even if this is a worthwhile optimization, you have to consider that you have optimized for a specific target platform; i.e. a particular combination of hardware and JVM version. Micro-optimizations like this may not be optimal on other platforms, and could in theory be anti-optimizations.
In summary, you are most likely wasting your time if you focus on things like this when writing Java code. If performance is a concern for your application, focus on the MACRO level performance; e.g. things like algorithmic complexity, good database / query design, patterns of network interactions, and so on.
Both create a new array for each iteration. They have the same semantics.
String s = "";
for(i=0;i<....){
s = some Assignment;
}
or
for(i=0;i<..){
String s = some Assignment;
}
I don't need to use 's' outside the loop ever again.
The first option is perhaps better since a new String is not initialized each time. The second however would result in the scope of the variable being limited to the loop itself.
EDIT: In response to Milhous's answer. It'd be pointless to assign the String to a constant within a loop wouldn't it? No, here 'some Assignment' means a changing value got from the list being iterated through.
Also, the question isn't because I'm worried about memory management. Just want to know which is better.
Limited Scope is Best
Use your second option:
for ( ... ) {
String s = ...;
}
Scope Doesn't Affect Performance
If you disassemble the code compiled from each (with the JDK's javap tool), you will see that the loop compiles to the exact same JVM instructions in both cases. Note also that Brian R. Bondy's "Option #3" is identical to Option #1. Nothing extra is added to or removed from the stack when using the tighter scope, and the same data are used on the stack in both cases.
Avoid Premature Initialization
The only difference between the two cases is that, in the first example, the variable s is unnecessarily initialized. This is a separate issue from the location of the variable declaration. This adds two wasted instructions (to load a string constant and store it in a stack frame slot). A good static analysis tool will warn you that you are never reading the value you assign to s, and a good JIT compiler will probably elide it at runtime.
You could fix this simply by using an empty declaration (i.e., String s;), but this is considered bad practice and has another side-effect discussed below.
Often a bogus value like null is assigned to a variable simply to hush a compiler error that a variable is read without being initialized. This error can be taken as a hint that the variable scope is too large, and that it is being declared before it is needed to receive a valid value. Empty declarations force you to consider every code path; don't ignore this valuable warning by assigning a bogus value.
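As an illustration of how an empty declaration makes the compiler check every code path, consider this sketch (the class and method are hypothetical):

```java
class DefiniteAssignment {

    static String describe(int n) {
        String s;               // empty declaration: no bogus initial value
        if (n < 0) {
            s = "negative";
        } else if (n == 0) {
            s = "zero";
        } else {
            s = "positive";
        }
        // This compiles only because every path above assigns s. Removing the
        // final else branch would make the return a compile-time error,
        // whereas "String s = null;" would silently allow null to be returned.
        return s;
    }

    public static void main(String[] args) {
        System.out.println(describe(-3));
        System.out.println(describe(0));
        System.out.println(describe(7));
    }
}
```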
Conserve Stack Slots
As mentioned, while the JVM instructions are the same in both cases, there is a subtle side-effect that makes it best, at a JVM level, to use the most limited scope possible. This is visible in the "local variable table" for the method. Consider what happens if you have multiple loops, with the variables declared in unnecessarily large scope:
void x(String[] strings, Integer[] integers) {
String s;
for (int i = 0; i < strings.length; ++i) {
s = strings[i];
...
}
Integer n;
for (int i = 0; i < integers.length; ++i) {
n = integers[i];
...
}
}
The variables s and n could be declared inside their respective loops, but since they are not, the compiler uses two "slots" in the stack frame. If they were declared inside the loop, the compiler can reuse the same slot, making the stack frame smaller.
What Really Matters
However, most of these issues are immaterial. A good JIT compiler will see that it is not possible to read the initial value you are wastefully assigning, and optimize the assignment away. Saving a slot here or there isn't going to make or break your application.
The important thing is to make your code readable and easy to maintain, and in that respect, using a limited scope is clearly better. The smaller scope a variable has, the easier it is to comprehend how it is used and what impact any changes to the code will have.
In theory, it's a waste of resources to declare the string inside the loop.
In practice, however, both of the snippets you presented will compile down to the same code (declaration outside the loop).
So, if your compiler does any amount of optimization, there's no difference.
In general I would choose the second one, because the scope of the 's' variable is limited to the loop. Benefits:
This is better for the programmer because you don't have to worry about 's' being used again somewhere later in the function
This is better for the compiler because the scope of the variable is smaller, and so it can potentially do more analysis and optimisation
This is better for future readers because they won't wonder why the 's' variable is declared outside the loop if it's never used later
If you want to speed up for loops, I prefer declaring a max variable next to the counter so that no repeated lookups for the condition are needed:
instead of
for (int i = 0; i < array.length; i++) {
Object next = array[i];
}
I prefer
for (int i = 0, max = array.length; i < max; i++) {
Object next = array[i];
}
Any other things that should be considered have already been mentioned (see erickson's post), so these are just my two cents.
Greetz, GHad
To add on a bit to @Esteban Araya's answer, they will both require the creation of a new string each time through the loop (as the return value of the some Assignment expression). Those strings need to be garbage collected either way.
I know this is an old question, but I thought I'd add a bit that is slightly related.
I've noticed while browsing the Java source code that some methods, like String.contentEquals (duplicated below), make redundant local variables that are merely copies of class variables. I believe that there was a comment somewhere that implied that accessing local variables is faster than accessing class variables.
In this case "v1" and "v2" are seemingly unnecessary and could be eliminated to simplify the code, but were added to improve performance.
public boolean contentEquals(StringBuffer sb) {
synchronized(sb) {
if (count != sb.length())
return false;
char v1[] = value;
char v2[] = sb.getValue();
int i = offset;
int j = 0;
int n = count;
while (n-- != 0) {
if (v1[i++] != v2[j++])
return false;
}
}
return true;
}
It seems to me that we need more specification of the problem.
The
s = some Assignment;
is not specified as to what kind of assignment this is. If the assignment is
s = "" + i + "";
then a new string needs to be allocated.
but if it is
s = some Constant;
s will merely point to the constant's memory location, and thus the first version would be more memory efficient.
It seems a little silly to worry about too much optimization of a for loop for an interpreted language, IMHO.
When I'm using multiple threads (50+), I have found this to be a very effective way of handling ghost-thread issues caused by not being able to close a process correctly. If I'm wrong, please let me know why:
// needs java.io.BufferedInputStream and java.io.IOException
Process one;
BufferedInputStream two;
try {
    one = Runtime.getRuntime().exec(command);
    two = new BufferedInputStream(one.getInputStream());
} catch (IOException e) {
    e.printStackTrace();
} finally {
    // null to ensure they are erased
    one = null;
    two = null;
    // nudge the gc
    System.gc();
}