It seems like, at least in all the languages I'm used to, a while loop can do all the things that a for loop can, and more. Since I'm most acquainted with Java, I'll use that for an example:
int foo = 6;
while (foo > 0)
{
this.bar();
foo--;
}
seems functionally identical to
for (int foo = 6; foo > 0; foo--)
this.bar();
From this, it looks to me like the for loop is wholly redundant in function to the while one. What am I missing here? Is one faster or more streamlined than the other once compiled? Does one automatically discard the foo counter once it's no longer needed? Are they exactly the same in some compilers?
I'd be really surprised if they were completely identical, because, you know, DRY.
I've seen similar questions asked before, but none of them sought a really detailed answer.
Internally, both loops compile to the same machine code. The existence of two ways to repeat execution of code stems from two different use cases for each of them: for loops are usually used to iterate through a finite collection of identical objects, processing them in the same way. while loops, on the other hand, are slightly more versatile, and are usually used to repeat a piece of code until a condition is fulfilled. For example, in this piece of pseudocode:
while(document.nextLine())
document.doStuff();
The while loop iterates through the lines of a document, processing each line. It is not known in advance how many lines there are. This is less natural in a counting for loop, where you usually know in advance when you are going to stop.
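As a minimal sketch of the pseudocode above, here is the same "unknown number of lines" traversal in Java, using java.util.Scanner as a stand-in for the hypothetical document object. The class and method names are made up for illustration. The second method shows that a for loop can express the same condition-driven traversal, although many readers find the while form more natural here.

```java
import java.util.Scanner;

final class LineCounting {
    // while form: keep going until the source runs out of lines.
    static int countWithWhile(String text) {
        Scanner scanner = new Scanner(text);
        int count = 0;
        while (scanner.hasNextLine()) {
            scanner.nextLine(); // consume the current line
            count++;
        }
        return count;
    }

    // for form: same condition, with the "advance" step moved into the header.
    static int countWithFor(String text) {
        int count = 0;
        for (Scanner scanner = new Scanner(text); scanner.hasNextLine(); scanner.nextLine()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "first\nsecond\nthird";
        System.out.println(countWithWhile(text)); // 3
        System.out.println(countWithFor(text));   // 3
    }
}
```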
A for loop is a while loop. The only difference is that the for loop includes initialization and state-change clauses, whereas a while loop requires you to write those separately.
The distinction is really more for ease of use and readability than it is purely functional. Say you want to iterate through a list:
for(int i = 0; i < list.size(); i++){
//code
}
is a lot nicer looking than:
int i = 0;
while(i<list.size()){
//code
i++;
}
The two are (almost) functionally identical. Why almost? The scope of the variable i is different in the two cases. With the for loop, i goes out of scope when the loop ends, so if you wanted to iterate through that list again you could declare a fresh i in a second for loop. With the while loop, i would still be in scope afterwards (and you might not want that, as its purpose was only to assist in that iteration).
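The scoping difference can be sketched as follows. The class and method names are hypothetical; the point is only that each for loop gets its own i, while the while version has to reset a shared one by hand.

```java
final class LoopScope {
    // Each for loop declares its own i, which disappears when that loop ends.
    static int sumTwiceWithFor(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        for (int i = 0; i < values.length; i++) { // a fresh i, no clash with the first
            total += values[i];
        }
        return total;
    }

    // With while, i lives in the enclosing scope and outlives the loop.
    static int sumTwiceWithWhile(int[] values) {
        int total = 0;
        int i = 0;
        while (i < values.length) {
            total += values[i];
            i++;
        }
        i = 0; // forgetting this reset would silently skip the second pass
        while (i < values.length) {
            total += values[i];
            i++;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        System.out.println(sumTwiceWithFor(values));   // 12
        System.out.println(sumTwiceWithWhile(values)); // 12
    }
}
```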
The two are functionally equivalent. However, there are do-while loops, which always execute their body at least once (something a for loop does not guarantee), and there are enhanced for loops:
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/for.html
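To illustrate the "at least once" behaviour, here is a small sketch (hypothetical names) that counts how many times each loop body runs. The two forms differ only when the condition is false on entry.

```java
final class DoWhileDemo {
    // while checks the condition first, so the body may run zero times.
    static int runsWithWhile(int start) {
        int runs = 0;
        int n = start;
        while (n > 0) {
            runs++;
            n--;
        }
        return runs;
    }

    // do-while checks the condition after the body, so the body runs at least once.
    static int runsWithDoWhile(int start) {
        int runs = 0;
        int n = start;
        do {
            runs++;
            n--;
        } while (n > 0);
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(runsWithWhile(0));   // 0
        System.out.println(runsWithDoWhile(0)); // 1
        System.out.println(runsWithWhile(3));   // 3
        System.out.println(runsWithDoWhile(3)); // 3
    }
}
```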
When writing a for loop, we can write code like:
ArrayList<Object> myList = ...
for(int i=0; i < myList.size(); i++){
...
}
This way we are invoking .size() every time. Is it better to get the size in a variable and use that, i.e.
ArrayList<Object> myList = ...
int listSize = myList.size();
for(int i=0; i < listSize ; i++){
...
}
And there is another way for iteration, i.e.
for ( Object o : myList) { ... }
Which iteration method should be used as an efficient coding practice?
Thanks
Yes, size is a constant-time operation.
Since you're using the concrete type ArrayList, the call will almost certainly be inlined by the JIT compiler.
The inlining will also probably open the door for hoisting, so the actual machine code will be exactly as if you manually extracted size into a local variable.
It will almost never actually matter whether it's inlined/hoisted or not.
If your loop runs for at least 100k iterations, does almost nothing in the body, and is the inner loop executed many times over, then it starts making sense to wonder about the performance impact of the size call.
Check the implementation. Yes it does run in constant time, there is a field that holds the size.
The for-each operator should be used whenever possible (that is: whenever you are not modifying the list in between), as it allows the list to choose the most efficient processing mode.
If you must use a for-loop, you can solve the constant checking in a very easy way, by running backwards:
for (int i = list.size()-1; i>= 0; --i)
Edit: Following Marko Topolnik's comment, I wrote a small program to test the efficiency of Iterators, and it turned out that the Iterator is actually faster than the index implementation. See here for the code.
This is only true if the JVM has fully optimized that code, as otherwise the Iterator is about 2% slower than the index implementation, but even then this isn't any relevant time for a normal program execution.
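For reference, the three iteration styles from the question can be put side by side. This is only a sketch with made-up method names, and it says nothing about timing. It merely shows that the styles are interchangeable in result, so the choice is mostly about readability unless a benchmark says otherwise.

```java
import java.util.Arrays;
import java.util.List;

final class ListIteration {
    // Calls size() in the loop condition on every iteration.
    static int sumCallingSizeEachTime(List<Integer> list) {
        int sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Reads size() once, keeping the cached value scoped to the loop.
    static int sumWithCachedSize(List<Integer> list) {
        int sum = 0;
        for (int i = 0, size = list.size(); i < size; i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Enhanced for: the list supplies its own iterator.
    static int sumWithForEach(List<Integer> list) {
        int sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumCallingSizeEachTime(list)); // 10
        System.out.println(sumWithCachedSize(list));      // 10
        System.out.println(sumWithForEach(list));         // 10
    }
}
```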
Given the need to loop up to an arbitrary int value, is it better programming practice to convert the value into an array and for-each the array, or just use a traditional for loop?
FYI, I am calculating the number of 5 and 6 results ("hits") in multiple throws of 6-sided dice. My arbitrary int value is the dicePool which represents the number of multiple throws.
As I understand it, there are two options:
Convert the dicePool into an array and for-each the array:
public int calcHits(int dicePool) {
int[] dp = new int[dicePool];
for (Integer a : dp) {
// call throwDice method
}
}
Use a traditional for loop:
public int calcHits(int dicePool) {
for (int i = 0; i < dicePool; i++) {
// call throwDice method
}
}
My view is that option 1 is clumsy code and involves unnecessary creation of an array, even though the for-each loop is more efficient than the traditional for loop in Option 2.
At this point, speed isn't important (insert premature-optimization comment ;). What matters is how quickly you can understand what the code does, which is to call a method dicePool times.
The first method allocates an array of size dicePool and iterates through its values, which happens to run the loop body dicePool times (I'll pretend you meant int instead of Integer to avoid the unrelated autoboxing issue). This is potentially inefficient for the computer running the code, but more importantly it's inefficient for the human reading the code as it's conceptually distant from what you wanted to accomplish. Specifically, you force the reader to think about the new array you've just made, AND the value of the variable a, which will be 0 for every iteration of the loop, even though neither of those are related to your end goal.
Any Java programmer looking at the second method will realize that you're executing the loop body dicePool times with i 'counting up' to dicePool. While the latter part isn't especially important, the beginning is exactly what you meant to do. Using this common Java idiom minimizes the unrelated things a reader needs to think about, so it's the best choice.
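A minimal sketch of the second option, filled in far enough to run. The throwDice body and the Random parameter are assumptions, since the question does not show them; a hit is taken to be a roll of 5 or 6, as the question describes.

```java
import java.util.Random;

final class DiceRoller {
    // Hypothetical stand-in for the question's throwDice method: one roll of a 6-sided die.
    static int throwDice(Random random) {
        return random.nextInt(6) + 1; // value in 1..6
    }

    // Runs the loop body exactly dicePool times and counts rolls of 5 or 6.
    static int calcHits(int dicePool, Random random) {
        int hits = 0;
        for (int i = 0; i < dicePool; i++) {
            if (throwDice(random) >= 5) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(calcHits(10, new Random()));
    }
}
```

No array is allocated, and the only loop-related name a reader has to track is i.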
When in doubt, go with simplicity. :D
Why would you need to allocate an array to loop over a variable that can be safely incremented and used without any need of allocation?
It sounds unnecessarily inefficient. You might need to allocate an array if you need to swap the order of the ints, but that is not the case here. I would go for option 2 for sure.
The foreach is useful when you want to iterate over a collection, but creating a collection just to iterate over it when you don't need it makes no sense.
(2) is the obvious choice because there's no point in creating the array, based on your description. If there is, of course things change.
What makes you think that the for-each loop is more efficient?
Iterating over a set is very likely less efficient than a simple loop and counter.
It might help if you gave more context about the problem, specifically whether there's more to this question than choosing one syntax over the other. I am having trouble thinking of a problem to which #1 would be a better solution.
I wouldn't write the first one. It's not necessary to use the latest syntax in every setting.
Your instinct is a good one: if it feels and looks clumsy, it probably is.
Go with #2 and sleep at night.
Why was this loop introduced in Java? Is it a Java creation? What is its purpose (does it improve memory/CPU utilisation efficiency)?
Why was this loop introduced in java?
It's just to ease looping over generic collections and arrays. Instead of
for (int i = 0; i < strings.length; i++) {
String string = strings[i];
// ...
}
you can just do
for (String string : strings) {
// ...
}
which makes the code more readable and better maintainable.
Is it a java creation?
No, it existed in other languages long before Java. Java was relatively late in implementing it.
What is its purpose?
See the first answer.
To learn more about it, check out the Sun guide on the subject.
Update: this does not mean that it makes the other kinds of loops superfluous. The for loop using an index is still useful if you'd like to maintain a loop counter for purposes other than getting the item by index. The for loop using an iterator is still useful if you'd like to remove or change elements of the collection itself inside the loop.
It masks the use of iterators, which are heavy and clumsy to use. There are many, many instances where you just want to iterate over a collection without worrying about its index. The Java foreach construct makes this possible.
Please see Foreach:
For each (or foreach) is a computer language idiom for traversing items in a collection. Foreach is usually used in place of a standard for statement. Unlike other for loop constructs, however, foreach loops usually maintain no explicit counter: they essentially say "do this to everything in this set", rather than "do this x times". This avoids potential off-by-one errors and makes code simpler to read. In object-oriented languages an iterator, even if implicit, is often used as the means of traversal.
Several languages, including Python, have only a foreach loop, requiring explicit counting to achieve "standard" for behavior.
And specifically the section on Java:
A foreach-construct was introduced in JDK 5.0. Official sources use several names for the construct. It is referred to as the "Enhanced for Loop" the "For-Each Loop" and the "foreach statement".
It's really just Java's imitation of a functional construct that's been around much longer, called map. The reason for implementing it is that it is common to write a loop that simply performs an action on every element of a container without regard to its index. Java's for(element : container) { doSomethingWith(element); } syntax is just a cleaner way to do it than the alternatives, which are either to make a for loop with an index
for(int i=0; i<container.size(); ++i) { doSomethingWith(container.get(i)); }
which is longer and creates a needless index variable, or to do a loop with an iterator
Iterator it = container.iterator();
while(it.hasNext()) { doSomethingWith(it.next()); }
which is also longer. This loop is essentially what the for( : ) {} loop gets compiled as, although there may be some slight differences (I haven't actually seen the bytecode).
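To make the comparison concrete, here is a sketch (hypothetical names) with the enhanced for next to the explicit iterator loop it corresponds to. For an Iterable, the Java Language Specification defines the enhanced for in terms of exactly this kind of iterator()/hasNext()/next() loop.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ForEachDesugar {
    // Enhanced for: no visible iterator or index.
    static String joinWithForEach(List<String> items) {
        StringBuilder sb = new StringBuilder();
        for (String item : items) {
            sb.append(item);
        }
        return sb.toString();
    }

    // Roughly what the enhanced for stands for when the target is an Iterable.
    static String joinWithIterator(List<String> items) {
        StringBuilder sb = new StringBuilder();
        for (Iterator<String> it = items.iterator(); it.hasNext(); ) {
            String item = it.next();
            sb.append(item);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c");
        System.out.println(joinWithForEach(items));  // abc
        System.out.println(joinWithIterator(items)); // abc
    }
}
```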
It is plain "syntactic sugar".
I don't think there is any efficiency improvement.
The Java community wanted the language to be a bit more modern, to compete with C# and Ruby.
I have been refactoring throwaway code which I wrote some years ago in a FORTRAN-like style. Most of the code is now much more organized and readable. However the heart of the algorithm (which is performance-critical) uses 1- and 2-dimensional Java arrays and is typified by:
for (int j = 1; j < len[1]+1; j++) {
int jj = (cont == BY_TYPE) ? seq[1][j-1] : j-1;
for (int i = 1; i < len[0]+1; i++) {
matrix[i][j] = matrix[i-1][j] + gap;
double m = matrix[i][j-1] + gap;
if (m > matrix[i][j]) {
matrix[i][j] = m;
pointers[i][j] = UP;
}
//...
}
}
For clarity, maintainability and interfacing with the rest of the code I would like to refactor it. However on reading Java Generics Syntax for arrays and Java Generics and numbers I have the following concerns:
Performance. The code is planned to use about 10^8 - 10^9 secs/yr and this is just about manageable. My reading suggests that changing double to Double can sometimes cost a factor of 3 in performance. I'd like to hear others' experience of this. I would also expect that moving from foo[] to List would be a hit as well. I have no first-hand knowledge and again experience would be useful.
Array-bound checking. Is this treated differently in double[] and List and does it matter? I expect some problems to violate bounds as the algorithm is fairly simple and has only been applied to a few data sets.
If I don't refactor then the code has an ugly and possibly fragile intermixture of the two approaches. I am already trying to write things such as:
List<double[]> and
List<Double>[]
and understand that the erasure does not make this pretty and at best gives rise to compiler warnings. It seems difficult to do this without very convoluted constructs.
Obsolescence. One poster suggested that Java arrays should be obsoleted. I assume this isn't going to happen RSN but I would like to move away from outdated approaches.
SUMMARY The consensus so far:
Collections have a significant performance hit over primitive arrays, especially for constructs such as matrices. This is incurred in auto(un)boxing numerics and in accessing list items
For tight numerical (scientific) algorithms the array notation [][] is actually easier to read, but the variables should be named as helpfully as possible
Generics and arrays do not mix well. It may be useful to wrap the arrays in classes to transport them in/out of the tight algorithm.
There is little objective reason to make the change
QUESTION @SeanOwen has suggested that it would be useful to take constant values out of the loops. Assuming I haven't goofed, this would look like:
int len1 = len[1];
int len0 = len[0];
int[] seq1 = seq[1];
int[] pointersi;
double[] matrixi;
for (int i = 1; i < len0+1; i++) {
matrixi = matrix[i];
pointersi = pointers[i];
}
for (int j = 1; j < len1+1; j++) {
int jj = (cont == BY_TYPE) ? seq1[j-1] : j-1;
for (int i = 1; i < len0+1; i++) {
matrixi[j] = matrixi[j] + gap;
double m = matrixi[j-1] + gap;
if (m > matrixi[j]) {
matrixi[j] = m;
pointersi[j] = UP;
}
//...
}
}
I thought compilers were meant to be smart at doing this sort of thing. Do we need to still do this?
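For comparison, here is a hedged sketch of what hoisting the row references could look like. It is not the original algorithm: it covers only the two visible update steps, assumes the elided part of the loop body permits looping over i in the outer loop, and uses made-up names (fillDirect, fillHoisted, and a placeholder value for UP). Both versions are meant to produce identical results, so the hoisted form can be checked against the direct one and then benchmarked.

```java
import java.util.Arrays;

final class HoistedRows {
    static final int UP = 1; // placeholder value; the real constant is not shown in the question

    // Direct version: indexes matrix[i][j] and pointers[i][j] everywhere.
    static double[][] fillDirect(int rows, int cols, double gap, int[][] pointers) {
        double[][] matrix = new double[rows + 1][cols + 1];
        for (int i = 1; i <= rows; i++) {
            for (int j = 1; j <= cols; j++) {
                matrix[i][j] = matrix[i - 1][j] + gap;
                double m = matrix[i][j - 1] + gap;
                if (m > matrix[i][j]) {
                    matrix[i][j] = m;
                    pointers[i][j] = UP;
                }
            }
        }
        return matrix;
    }

    // Hoisted version: row references and bounds are fetched once per outer iteration.
    static double[][] fillHoisted(int rows, int cols, double gap, int[][] pointers) {
        double[][] matrix = new double[rows + 1][cols + 1];
        for (int i = 1; i <= rows; i++) {
            double[] row = matrix[i];
            double[] previousRow = matrix[i - 1];
            int[] pointerRow = pointers[i];
            for (int j = 1; j <= cols; j++) {
                double fromPreviousRow = previousRow[j] + gap;
                double fromPreviousColumn = row[j - 1] + gap;
                if (fromPreviousColumn > fromPreviousRow) {
                    row[j] = fromPreviousColumn;
                    pointerRow[j] = UP;
                } else {
                    row[j] = fromPreviousRow;
                }
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        int[][] directPointers = new int[4][5];
        int[][] hoistedPointers = new int[4][5];
        double[][] direct = fillDirect(3, 4, -1.0, directPointers);
        double[][] hoisted = fillHoisted(3, 4, -1.0, hoistedPointers);
        System.out.println(Arrays.deepEquals(direct, hoisted));                 // true
        System.out.println(Arrays.deepEquals(directPointers, hoistedPointers)); // true
    }
}
```

Whether the hoisted form is actually faster depends on what the JIT already does for the direct form, so measuring both is the only reliable answer.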
I read an excellent book by Kent Beck on coding best-practices ( http://www.amazon.com/Implementation-Patterns/dp/B000XPRRVM ). There are also interesting performance figures.
Specifically, there are comparisons between arrays and various collections, and arrays are really much faster (maybe 3x compared to ArrayList).
Also, if you use Double instead of double, you need to stick to it, and use no double, as auto(un)boxing will kill your performance.
Considering your performance need, I would stick to array of primitive type.
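As a rough illustration of where the boxing cost comes from (a sketch with made-up names, not a benchmark): the two methods below compute the same sum, but the second one has to unbox a Double object on every iteration, and building the list boxes every value in the first place.

```java
import java.util.Arrays;
import java.util.List;

final class BoxingCost {
    // Primitive array: values are stored inline, no objects involved.
    static double sumPrimitive(double[] values) {
        double total = 0.0;
        for (double value : values) {
            total += value;
        }
        return total;
    }

    // Boxed list: each element is a Double object that is unboxed on use.
    static double sumBoxed(List<Double> values) {
        double total = 0.0;
        for (Double value : values) {
            total += value; // implicit value.doubleValue()
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumPrimitive(new double[] {1.5, 2.5, 3.0})); // 7.0
        System.out.println(sumBoxed(Arrays.asList(1.5, 2.5, 3.0)));     // 7.0
    }
}
```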
Even more, I would calculate only once the upper bound for the condition in loops.
This is typically done the line before the loop.
However, if you don't like that the upper bound variable, used only in the loop, is accessible outside the loop, you can take advantage of the initialization phase of the for loop like this:
for (int i=0, max=list.size(); i<max; i++) {
// do something
}
I don't believe arrays will become obsolete in Java. For performance-critical loops, I can't see any language designer taking away the fastest option (especially if the difference is 3x).
I understand your concern for maintainability, and for coherence with the rest of the application. But I believe that a critical loop is entitled to some special practices.
I would try to make the code the clearest possible without changing it:
by carefully questioning each variable name, ideally with a 10-minute brainstorming session with my colleagues
by writing code comments (I'm against their use in general, as code that is not clear should be made clear, not commented; but a critical loop justifies it)
by using private methods as needed (as Andreas_D pointed out in his answer). If made private final, chances are very good (as they would be short) that they will get inlined at run time, so there would be no performance impact.
I fully agree with KLE's answer. Because the code is performance-critical, I'd keep the array-based data structures as well. And I believe that just introducing collections, wrappers for primitive types and generics will not improve maintainability and clarity.
In addition, if this algorithm is the heart of the application and has been in use for several years now, chances are fairly low that it will need maintenance such as bug fixing or improvements.
For clarity, maintainability and interfacing with the rest of the code I would like to refactor it.
Instead of changing data structures I'd concentrate on renaming and maybe moving some parts of the code to private methods. From looking at the code, I have no idea what's happening, and the problem, as I see it, is the more or less short and technical variable and field names.
Just an example: one 2-dimensional array is just named 'matrix'. But it's obviously clear, that this is a matrix, so naming it 'matrix' is pretty redundant. It would be more helpful to rename it so that it becomes clear, what this matrix is really used for, what kind of data is inside.
Another candidate is your second line. With two refactorings, I'd rename 'jj' to something more meaningful and move the expression to a private method with a 'speaking' name.
The general guideline is to prefer generified collections over arrays in Java, but it's only a guideline. My first thought would be to NOT change this working code. If you really want to make this change, then benchmark both approaches.
As you say, performance is critical, in which case the code that meets the needed performance is better than code that doesn't.
You might also run into auto-boxing issues when boxing/unboxing the doubles - a potentially more subtle problem.
The Java language guys have been very strict about keeping the JVM compatible across different versions so I don't see arrays going anywhere - and I wouldn't call them obsolete, just more primitive than the other options.
Well I think that arrays are the best way to store process data in algorithms. Since Java doesn't support operator overloading (one of the reasons why I think arrays won't be obsolete that soon) switching to collections would make the code quite hard to read:
double[][] matrix = new double[10][10];
double t = matrix[0][0];
List<List<Double>> matrix = new ArrayList<List<Double>>(10);
for (int row = 0; row < 10; row++) {
    // each row needs its own list; Collections.fill on an empty list would add nothing
    matrix.add(new ArrayList<Double>(Collections.nCopies(10, 0.0)));
}
double t = matrix.get(0).get(0); // auto-unboxing => performance cost
As far as I know Java caches some wrapper objects (for example, Integer.valueOf caches the values from -128 to 127) so that you can access them faster, but Double values are not cached, so that won't help with this much data.
I thought compilers were meant to be smart at doing this sort of thing. Do we need to still do this?
You are probably right that the JIT takes care of it, but if this section is so performance critical, trying and benchmarking wouldn't hurt.
When you know the exact dimensions of the list you should stick with arrays. Arrays are not inherently bad, and they're not going anywhere. If you are performing a lot of (non-sequential) read and write operations you should use arrays and not lists, because the access methods of lists introduce a large overhead.
In addition to sticking with arrays, I think you can tighten up this code in some meaningful ways. For instance:
Indeed, don't compute the loop bounds every time, save them off
You repeatedly reference matrix[i]. Just save off a reference to this subarray rather than dereferencing the 2D array every time
That trick gets even more useful if you can loop over i in the outer loop instead of inner loop
It's getting extreme, but saving the value of j-1 in a local might even prove to be worth it rather than recomputing
Finally if you are really really concerned about performance, run the ProGuard optimizer over the resulting byte code to have it perform some compiler optimizations like unrolling or peephole optimizations
I often see code like:
Iterator i = list.iterator();
while(i.hasNext()) {
...
}
but I write that (when Java 1.5 isn't available or for each can't be used) as:
for(Iterator i = list.iterator(); i.hasNext(); ) {
...
}
because
It is shorter
It keeps i in a smaller scope
It reduces the chance of confusion. (Is i used outside the while? Where is i declared?)
I think code should be as simple to understand as possible so that I only have to make complex code to do complex things. What do you think? Which is better?
From: http://jamesjava.blogspot.com/2006/04/iterating.html
I prefer the for loop because it also sets the scope of the iterator to just the for loop.
There are appropriate uses for the while, the for, and the foreach constructs:
while - Use this if you are iterating and the deciding factor for looping or not is based merely on a condition. In this loop construct, keeping an index is only a secondary concern; everything should be based on the condition
for - Use this if you are looping and your primary concern is the index of the array/collection/list. It is more useful to use a for if you are most likely to go through all the elements anyway, and in a particular order (e.g., going backwards through a sorted list).
foreach - Use this if you merely need to go through your collection regardless of order.
Obviously there are exceptions to the above, but that's the general rule I use when deciding to use which. That being said I tend to use foreach more often.
Why not use the for-each construct? (I haven't used Java in a while, but this exists in C# and I'm pretty sure Java 1.5 has this too):
List<String> names = new ArrayList<String>();
names.add("a");
names.add("b");
names.add("c");
for (String name : names)
System.out.println(name.charAt(0));
I think scope is the biggest issue here, as you have pointed out.
In the "while" example, the iterator is declared outside the loop, so it will continue to exist after the loop is done. This may cause issues if this same iterator is used again at some later point. E.g., you may forget to re-initialize it before using it in another loop.
In the "for" example, the iterator is declared inside the loop, so its scope is limited to the loop. If you try to use it after the loop, you will get a compiler error.
If you're only going to use the iterator once and throw it away, the second form is preferred; otherwise you must use the first form.
IMHO, the for loop is less readable in this scenario, if you look at the code from the perspective of the English language. I am working on code whose author abuses the for loop, and it isn't pretty. Compare the following:
for (; (currUserObjectIndex < _domainObjectReferences.Length) && (_domainObjectReferences[currUserObjectIndex].VisualIndex == index); ++currUserObjectIndex)
++currNumUserObjects;
vs
while (currUserObjectIndex < _domainObjectReferences.Length && _domainObjectReferences[currUserObjectIndex].VisualIndex == index)
{
++currNumUserObjects;
++currUserObjectIndex;
}
I would agree that the "for" loop is clearer and more appropriate when iterating.
The "while" loop is appropriate for polling, or where the number of loops to meet exit condition will change based on activity inside the loop.
Not that it probably matters in this case, but compilers, VMs and CPUs normally have special optimization techniques they use under the hood that make for loops perform better (and, in the near future, run in parallel); in general they don't do that with while loops (because it's harder to determine how the loop is actually going to run). But in most cases code clarity should trump optimization.
With a for loop you can work with a single variable, because the loop header is where the variable is (re)initialized for the loop that is currently running. A while loop has no such place.
For example:
int i; for(i=0; i<n1; i++) do something..
for(i=0; i<n2; i+=2) do something.
After the first loop i equals n1, but the second loop sets i back to 0 in its header.
However
int i=0;
while(i < limit) { do something ..; i++; }
Here i equals limit at the end, so you can't use the same i in another while loop without resetting it first.
Either is fine. I use for () myself, and I don't know if there are compile issues. I suspect they both get optimized down to pretty much the same thing.
I agree that the for loop should be used whenever possible but sometimes there's more complex logic that controls the iterator in the body of the loop. In that case you have to go with while.
I use the for loop for clarity, and the while loop when faced with some non-deterministic condition.
Both are fine, but remember that sometimes access to the Iterator directly is useful (such as if you are removing elements that match a certain condition - you will get a ConcurrentModificationException if you do collection.remove(o) inside a for(T o : collection) loop).
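A small sketch of that removal case, with hypothetical names. removeEvens uses the iterator's own remove(), which is the supported way to delete during traversal. forEachRemovalFails attempts the same thing inside an enhanced for and reports whether a ConcurrentModificationException was thrown. Note that the exception is raised on a best-effort basis: ArrayList does not throw, for instance, when the removed element happens to be the second-to-last one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

final class SafeRemoval {
    // Removes even numbers through the iterator, which keeps the traversal consistent.
    static List<Integer> removeEvens(List<Integer> source) {
        List<Integer> result = new ArrayList<>(source);
        for (Iterator<Integer> it = result.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return result;
    }

    // Tries to remove through the collection while a for-each is traversing it.
    static boolean forEachRemovalFails(List<Integer> source) {
        List<Integer> copy = new ArrayList<>(source);
        try {
            for (Integer value : copy) {
                if (value % 2 == 0) {
                    copy.remove(value); // remove(Object), behind the hidden iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println(removeEvens(numbers));         // [1, 3]
        System.out.println(forEachRemovalFails(numbers)); // true
    }
}
```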
I prefer to write the for(blah : blah) [foreach] syntax almost all of the time because it seems more naturally readable to me. The concept of iterators in general doesn't really have a parallel outside of programming.
Academia tends to prefer the while-loop as it makes for less complicated reasoning about programs. I tend to prefer the for- or foreach-loop structures as they make for easier-to-read code.
Although both are really fine, I tend to use the first example because it is easier to read.
There are fewer operations happening on each line with the while() loop, which makes it easier for someone new to the code to understand what's going on.
That type of construct also allows me to group initializations in a common location (at the top of the method) which also simplifies commenting for me, and conceptualization for someone reading it for the first time.