I've always asked myself which one to use: a for loop or a for-each loop?
In my opinion they're both the "same". I know that for iterating through a list etc. a for-each is better, but what if we have the following case:
for (String zipCode : zipCodes) {
if (zipCode.equals(zip)) {
return true;
}
}
or
for (int i = 0; i < zipCodes.length; i++) {
if (zipCodes[i].equals(zip)) {
return true;
}
}
What would be better? Or is in this case really no difference?
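First things first: for an Iterable, for-each is nothing but syntactic sugar for an Iterator (see the JLS section on the enhanced for statement, §14.14.2); for an array, it is compiled to a simple indexed loop. So I will address this question as a simple FOR loop vs. Iterator.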
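To make the desugaring concrete, here is a minimal sketch of the zip-code lookup from the question, next to hand-written versions of roughly what the compiler generates per JLS §14.14.2. The class and method names are made up for illustration. Note that the Iterator form only applies to Iterables; for arrays, the enhanced for becomes an indexed loop.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class; each pair of methods is behaviorally identical.
final class ZipLookup {

    // Enhanced for over an Iterable (here a List).
    static boolean containsForEach(List<String> zipCodes, String zip) {
        for (String zipCode : zipCodes) {
            if (zipCode.equals(zip)) {
                return true;
            }
        }
        return false;
    }

    // Roughly what the compiler generates for the loop above.
    static boolean containsIterator(List<String> zipCodes, String zip) {
        for (Iterator<String> it = zipCodes.iterator(); it.hasNext(); ) {
            String zipCode = it.next();
            if (zipCode.equals(zip)) {
                return true;
            }
        }
        return false;
    }

    // Enhanced for over an array: no Iterator is involved.
    static boolean containsForEach(String[] zipCodes, String zip) {
        for (String zipCode : zipCodes) {
            if (zipCode.equals(zip)) {
                return true;
            }
        }
        return false;
    }

    // Roughly what the compiler generates for the array loop above.
    static boolean containsIndexed(String[] zipCodes, String zip) {
        String[] a = zipCodes; // the array expression is evaluated once into a temporary
        for (int i = 0; i < a.length; i++) {
            String zipCode = a[i];
            if (zipCode.equals(zip)) {
                return true;
            }
        }
        return false;
    }
}
```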
Now, when you use an Iterator to traverse a collection, you will at a bare minimum be using two methods, next() and hasNext(). Below are their implementations in ArrayList's private iterator class (ArrayList.Itr):
public boolean hasNext() {
return cursor != size;
}
@SuppressWarnings("unchecked")
public E next() {
checkForComodification();
int i = cursor;
if (i >= size)
throw new NoSuchElementException();
Object[] elementData = ArrayList.this.elementData;
if (i >= elementData.length)
throw new ConcurrentModificationException();
cursor = i + 1;
return (E) elementData[lastRet = i]; // hagrawal: this is what simple FOR loop does
}
Now, basic computing tells us that there will be a performance difference between the processor executing just myArray[i] and executing the complete implementation of the next() method. So there has to be a difference in performance.
Some folks will likely push back strongly on this, citing performance benchmarks and excerpts from Effective Java, but the only other way I can try to explain it is that this is even written in Oracle's official documentation: the RandomAccess interface docs say that, as a rule of thumb, a List implementation should implement RandomAccess if, for typical instances of the class, a loop calling get(i) runs faster than a loop using an Iterator.
It is very clearly stated that there will be differences. So, if you can convince me that the official docs are wrong and will be changed, I will be ready to accept the argument that there is no performance difference between a simple FOR loop and an Iterator or for-each.
So IMHO, the correct way to put this whole argument is:
If the collection implements the RandomAccess interface, a simple FOR loop will perform (at least theoretically) better than an Iterator or for-each (this is also what the RandomAccess docs say).
If the collection doesn't implement the RandomAccess interface, an Iterator or for-each will perform (for sure) better than a simple FOR loop.
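The rule of thumb above can be applied in code by checking for the RandomAccess marker interface, which is similar to what several algorithms in java.util.Collections do internally. A minimal sketch, with a hypothetical sum helper:

```java
import java.util.Iterator;
import java.util.List;
import java.util.RandomAccess;

final class RandomAccessLoop {

    // Sums a list, choosing the traversal the RandomAccess docs recommend:
    // indexed get(i) for random-access lists (e.g. ArrayList), an Iterator
    // otherwise (e.g. LinkedList, where get(i) is O(n)).
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i);
            }
        } else {
            for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
                total += it.next();
            }
        }
        return total;
    }
}
```

Both branches return the same result; only the traversal strategy differs.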
However, for all practical purposes, for-each is the best choice in general.
If indexed access is not O(1), as with get(i) on a LinkedList, then the performance of your second case will be much worse. (That said, in Java the [] operator only applies to arrays, where it is always O(1).) Put another way, the short-form for loop cannot be slower.
Plus, the short-form for loop is clearer, which really ought to be the primary consideration unless speed matters.
It is less about optimisation nowadays, as any difference will be unnoticeable unless you need to process a very large amount of data. Also, if you used a Collection, the performance would depend on the chosen implementation.
What you should really think about is the quality of the code. The rule is that you should use as few elements as possible to present the logic as clearly as possible. The second solution introduces a new element, the index i, which is not actually needed and only makes the code a little more complicated. Only use the indexed for loop if you actually need to know the index in each iteration.
So, from a code-quality perspective, you should use the first solution :-)
Note that there is no performance penalty for using the for-each loop, even for arrays. In fact, it may offer a slight performance advantage over an ordinary for loop in some circumstances, as it computes the limit of the array index only once.
Item 46 in Effective Java by Joshua Bloch
Which of the following is better practice in Java 8?
Java 8:
joins.forEach(join -> mIrc.join(mSession, join));
Java 7:
for (String join : joins) {
mIrc.join(mSession, join);
}
I have lots of for loops that could be "simplified" with lambdas, but is there really any advantage of using them? Would it improve their performance and readability?
EDIT
I'll also extend this question to longer methods. I know that you can't return or break the parent function from a lambda and this should also be taken into consideration when comparing them, but is there anything else to be considered?
The better practice is to use for-each. Besides violating the Keep It Simple, Stupid principle, the new-fangled forEach() has at least the following deficiencies:
Can't assign to local variables of the enclosing scope (captured locals must be final or effectively final). So, code like the following can't be turned into a forEach lambda:
Object prev = null;
for(Object curr : list)
{
if( prev != null )
foo(prev, curr);
prev = curr;
}
Can't handle checked exceptions. Lambdas aren't actually forbidden from throwing checked exceptions, but common functional interfaces like Consumer don't declare any. Therefore, any code that throws checked exceptions must wrap them in try-catch or Guava's Throwables.propagate(). But even if you do that, it's not always clear what happens to the thrown exception. It could get swallowed somewhere in the guts of forEach().
Limited flow-control. A return in a lambda equals a continue in a for-each, but there is no equivalent to a break. It's also difficult to do things like return values, short circuit, or set flags (which would have alleviated things a bit, if it wasn't a violation of the no non-final variables rule). "This is not just an optimization, but critical when you consider that some sequences (like reading the lines in a file) may have side-effects, or you may have an infinite sequence."
Might execute in parallel, which is a horrible, horrible thing for all but the 0.1% of your code that needs to be optimized. Any parallel code has to be thought through (even if it doesn't use locks, volatiles, and other particularly nasty aspects of traditional multi-threaded execution). Any bug will be tough to find.
Might hurt performance, because the JIT can't optimize forEach()+lambda to the same extent as plain loops, especially now that lambdas are new. By "optimization" I do not mean the overhead of calling lambdas (which is small), but the sophisticated analysis and transformation that the modern JIT compiler performs on running code.
If you do need parallelism, it is probably much faster and not much more difficult to use an ExecutorService. Streams are both automagical (read: don't know much about your problem) and use a specialized (read: inefficient for the general case) parallelization strategy (fork-join recursive decomposition).
Makes debugging more confusing, because of the nested call hierarchy and, god forbid, parallel execution. The debugger may have issues displaying variables from the surrounding code, and things like step-through may not work as expected.
Streams in general are more difficult to code, read, and debug. Actually, this is true of complex "fluent" APIs in general. The combination of complex single statements, heavy use of generics, and lack of intermediate variables conspire to produce confusing error messages and frustrate debugging. Instead of "this method doesn't have an overload for type X" you get an error message closer to "somewhere you messed up the types, but we don't know where or how." Similarly, you can't step through and examine things in a debugger as easily as when the code is broken into multiple statements, and intermediate values are saved to variables. Finally, reading the code and understanding the types and behavior at each stage of execution may be non-trivial.
Sticks out like a sore thumb. The Java language already has the for-each statement. Why replace it with a function call? Why encourage hiding side-effects somewhere in expressions? Why encourage unwieldy one-liners? Mixing regular for-each and new forEach willy-nilly is bad style. Code should speak in idioms (patterns that are quick to comprehend due to their repetition), and the fewer idioms are used the clearer the code is and less time is spent deciding which idiom to use (a big time-drain for perfectionists like myself!).
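The flow-control and non-final-variable points can be seen side by side in a small sketch (the names are hypothetical). Both methods count how many elements they examine while looking for a target: the for-each loop can return early, while a return inside the lambda only ends that one call, so Iterable.forEach still visits every element. The one-element array is a common workaround for not being able to assign a captured local.

```java
import java.util.List;

final class FlowControl {

    // for-each: `return` leaves the whole method, so the scan stops at the
    // first match. Returns how many elements were examined.
    static int examinedWithForEachLoop(List<String> items, String target) {
        int examined = 0;
        for (String item : items) {
            examined++;
            if (item.equals(target)) {
                return examined; // a real early exit
            }
        }
        return examined;
    }

    // Iterable.forEach: `return` only ends the current lambda invocation,
    // so it behaves like `continue` and every element is still visited.
    static int examinedWithForEachMethod(List<String> items, String target) {
        int[] examined = {0}; // captured locals must be effectively final
        items.forEach(item -> {
            examined[0]++;
            if (item.equals(target)) {
                return; // ends only this call, like `continue`
            }
            // per-element work for non-matching items would go here
        });
        return examined[0];
    }
}
```

With the list a, b, c, d, e and target b, the first method examines 2 elements and the second examines all 5.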
As you can see, I'm not a big fan of the forEach() except in cases when it makes sense.
Particularly offensive to me is the fact that Stream does not implement Iterable (despite having an iterator() method) and cannot be used in a for-each, only with a forEach(). I recommend casting Streams into Iterables with (Iterable<T>)stream::iterator. A better alternative is to use StreamEx, which fixes a number of Stream API problems, including implementing Iterable.
That said, forEach() is useful for the following:
Atomically iterating over a synchronized list. Prior to this, a list generated with Collections.synchronizedList() was atomic with respect to things like get or set, but was not thread-safe when iterating unless you manually synchronized on the list.
Parallel execution (using an appropriate parallel stream). This saves you a few lines of code vs using an ExecutorService, if your problem matches the performance assumptions built into Streams and Spliterators.
Specific containers which, like the synchronized list, benefit from being in control of iteration (although this is largely theoretical unless people can bring up more examples)
Calling a single function more cleanly by using forEach() and a method reference argument (i.e., list.forEach(obj::someMethod)). However, keep in mind the points on checked exceptions, more difficult debugging, and reducing the number of idioms you use when writing code.
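For the synchronized-list case, here is a minimal sketch of the two styles, assuming a list from Collections.synchronizedList(); the class and method names are hypothetical. The Collections.synchronizedList() docs require callers to synchronize on the list manually while iterating; in the JDK 8 implementation, the wrapper's forEach() takes the same lock internally for the whole traversal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SyncListIteration {

    // Pre-Java 8 style: the caller must hold the list's lock for the
    // whole loop, otherwise concurrent modification is possible.
    static long sumWithManualLock(List<Integer> syncList) {
        long total = 0;
        synchronized (syncList) {
            for (Integer value : syncList) {
                total += value;
            }
        }
        return total;
    }

    // Java 8 style: the synchronized wrapper's forEach holds the lock
    // internally for the duration of the traversal.
    static long sumWithForEach(List<Integer> syncList) {
        long[] total = {0};
        syncList.forEach(value -> total[0] += value);
        return total[0];
    }

    static List<Integer> newSyncList(int n) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        for (int i = 1; i <= n; i++) {
            list.add(i);
        }
        return list;
    }
}
```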
Articles I used for reference:
Everything about Java 8
Iteration Inside and Out (as pointed out by another poster)
EDIT: Looks like some of the original proposals for lambdas (such as http://www.javac.info/closures-v06a.html) solved some of the issues I mentioned (while adding their own complications, of course).
The advantage comes into account when the operations can be executed in parallel. (See http://java.dzone.com/articles/devoxx-2012-java-8-lambda-and - the section about internal and external iteration)
The main advantage from my point of view is that the implementation of what is to be done within the loop can be defined without having to decide whether it will be executed in parallel or sequentially.
If you want your loop to be executed in parallel you could simply write
joins.parallelStream().forEach(join -> mIrc.join(mSession, join));
With a traditional for loop, you would have to write some extra code for thread handling etc.
Note: for my answer I assumed that joins is a java.util.Collection, which provides parallelStream(). If joins implements only the java.lang.Iterable interface, this is no longer true.
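A sketch of that idea, assuming a Collection (parallelStream() is defined on Collection, not on Iterable). The loop body is written once as a Consumer and handed either to forEach() or to parallelStream().forEach(). The tally method is hypothetical, standing in for the mIrc.join(...) call; since the parallel variant may run the body on several threads, the side effect goes through ConcurrentHashMap.merge, which is performed atomically.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Consumer;

final class SeqOrParallel {

    // The loop body is defined once; the caller only chooses how it is driven.
    static ConcurrentMap<String, Integer> tally(List<String> names, boolean parallel) {
        ConcurrentMap<String, Integer> seen = new ConcurrentHashMap<>();
        // Thread-safe side effect, so the body is safe under either traversal.
        Consumer<String> body = name -> seen.merge(name, 1, Integer::sum);
        if (parallel) {
            names.parallelStream().forEach(body); // may or may not actually run in parallel
        } else {
            names.forEach(body);
        }
        return seen;
    }
}
```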
When reading this question, one can get the impression that Iterable#forEach in combination with lambda expressions is a shortcut/replacement for writing a traditional for-each loop. This is simply not true. This code from the OP:
joins.forEach(join -> mIrc.join(mSession, join));
is not intended as a shortcut for writing
for (String join : joins) {
mIrc.join(mSession, join);
}
and should certainly not be used in this way. Instead it is intended as a shortcut (although it is not exactly the same) for writing
joins.forEach(new Consumer<T>() {
@Override
public void accept(T join) {
mIrc.join(mSession, join);
}
});
And it is intended as a replacement for the following Java 7 code:
final Consumer<T> c = new Consumer<T>() {
@Override
public void accept(T join) {
mIrc.join(mSession, join);
}
};
for (T t : joins) {
c.accept(t);
}
Replacing the body of a loop with a functional interface, as in the examples above, makes your code more explicit: you are saying that (1) the body of the loop does not affect the surrounding code and control flow, and (2) the body of the loop may be replaced with a different implementation of the function without affecting the surrounding code. Not being able to access non-final variables of the outer scope is not a deficit of functions/lambdas; it is a feature that distinguishes the semantics of Iterable#forEach from the semantics of a traditional for-each loop. Once one gets used to the syntax of Iterable#forEach, it makes the code more readable, because you immediately get this additional information about the code.
Traditional for-each loops will certainly stay good practice (to avoid the overused term "best practice") in Java. But this doesn't mean that Iterable#forEach should be considered bad practice or bad style. It is always good practice to use the right tool for the job, and this includes mixing traditional for-each loops with Iterable#forEach where it makes sense.
Since the downsides of Iterable#forEach have already been discussed in this thread, here are some reasons why you might want to use Iterable#forEach:
To make your code more explicit: As described above, Iterable#forEach can make your code more explicit and readable in some situations.
To make your code more extensible and maintainable: Using a function as the body of a loop allows you to replace this function with different implementations (see Strategy Pattern). You could e.g. easily replace the lambda expression with a method call that may be overridden by subclasses:
joins.forEach(getJoinStrategy());
Then you could provide default strategies using an enum that implements the functional interface. This not only makes your code more extensible, it also increases maintainability, because it decouples the loop implementation from the loop declaration.
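A minimal sketch of the enum-strategy idea. Channel, JoinStrategy, Joiner and PublicOnlyJoiner are hypothetical names standing in for the question's joins, and each strategy simply marks channels as joined instead of calling mIrc.join(...).

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical element type standing in for the question's join targets.
final class Channel {
    final String name;
    boolean joined;

    Channel(String name) {
        this.name = name;
    }
}

// Default loop bodies, provided as an enum that implements Consumer.
enum JoinStrategy implements Consumer<Channel> {
    JOIN_ALL {
        @Override
        public void accept(Channel c) {
            c.joined = true;
        }
    },
    JOIN_PUBLIC_ONLY {
        @Override
        public void accept(Channel c) {
            c.joined = c.name.startsWith("#");
        }
    }
}

class Joiner {
    // Overridable hook: a subclass can return a different strategy.
    protected Consumer<Channel> getJoinStrategy() {
        return JoinStrategy.JOIN_ALL;
    }

    // The loop declaration stays the same whatever body is plugged in.
    final void joinAll(List<Channel> joins) {
        joins.forEach(getJoinStrategy());
    }
}

class PublicOnlyJoiner extends Joiner {
    @Override
    protected Consumer<Channel> getJoinStrategy() {
        return JoinStrategy.JOIN_PUBLIC_ONLY;
    }
}
```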
To make your code more debuggable: Separating the loop implementation from the declaration can also make debugging easier, because you could have a specialized debug implementation that prints out debug messages, without the need to clutter your main code with if(DEBUG)System.out.println(). The debug implementation could e.g. be a delegate that decorates the actual function implementation.
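One way such a debug delegate could look, as a sketch: traced(...) is a hypothetical helper that wraps any Consumer, records each element, and then calls the real loop body. It writes to a List<String> rather than System.out only so that the behavior is easy to check; a real version would probably log instead.

```java
import java.util.List;
import java.util.function.Consumer;

final class DebugConsumers {

    // Decorator: reports each element to `trace`, then delegates to the
    // real loop body, which itself stays free of debug statements.
    static <T> Consumer<T> traced(Consumer<T> delegate, List<String> trace) {
        return element -> {
            trace.add("accept(" + element + ")");
            delegate.accept(element);
        };
    }
}
```

The call site then only chooses the implementation, e.g. list.forEach(debug ? DebugConsumers.traced(body, trace) : body), where debug, body and trace are whatever the surrounding code provides.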
To optimize performance-critical code: Contrary to some of the assertions in this thread, Iterable#forEach already provides better performance than a traditional for-each loop, at least when using ArrayList and running HotSpot in "-client" mode. While this performance boost is small and negligible for most use cases, there are situations where this extra performance can make a difference. E.g. library maintainers will certainly want to evaluate whether some of their existing loop implementations should be replaced with Iterable#forEach.
To back this statement up with facts, I have done some micro-benchmarks with Caliper. Here is the test code (latest Caliper from git is needed):
import com.google.caliper.BeforeExperiment;
import com.google.caliper.Benchmark;
import com.google.caliper.Param;
import com.google.caliper.api.VmOptions;
import java.util.ArrayList;
import java.util.function.Consumer;
@VmOptions("-server")
public class Java8IterationBenchmarks {
public static class TestObject {
public int result;
}
public @Param({"100", "10000"}) int elementCount;
ArrayList<TestObject> list;
TestObject[] array;
@BeforeExperiment
public void setup(){
list = new ArrayList<>(elementCount);
for (int i = 0; i < elementCount; i++) {
list.add(new TestObject());
}
array = list.toArray(new TestObject[list.size()]);
}
@Benchmark
public void timeTraditionalForEach(int reps){
for (int i = 0; i < reps; i++) {
for (TestObject t : list) {
t.result++;
}
}
}
@Benchmark
public void timeForEachAnonymousClass(int reps){
for (int i = 0; i < reps; i++) {
list.forEach(new Consumer<TestObject>() {
@Override
public void accept(TestObject t) {
t.result++;
}
});
}
}
@Benchmark
public void timeForEachLambda(int reps){
for (int i = 0; i < reps; i++) {
list.forEach(t -> t.result++);
}
}
@Benchmark
public void timeForEachOverArray(int reps){
for (int i = 0; i < reps; i++) {
for (TestObject t : array) {
t.result++;
}
}
}
}
And here are the results:
[Chart: Results for -client]
[Chart: Results for -server]
When running with "-client", Iterable#forEach outperforms the traditional for loop over an ArrayList, but is still slower than directly iterating over an array. When running with "-server", the performance of all approaches is about the same.
To provide optional support for parallel execution: It has already been said here that the possibility to execute the functional interface of Iterable#forEach in parallel using streams is certainly an important aspect. Since Collection#parallelStream() does not guarantee that the loop is actually executed in parallel, one must consider this an optional feature. By iterating over your list with list.parallelStream().forEach(...);, you explicitly say: This loop supports parallel execution, but it does not depend on it. Again, this is a feature and not a deficit!
By moving the decision for parallel execution away from your actual loop implementation, you allow optional optimization of your code, without affecting the code itself, which is a good thing. Also, if the default parallel stream implementation does not fit your needs, no one is preventing you from providing your own implementation. You could e.g. provide an optimized collection depending on the underlying operating system, on the size of the collection, on the number of cores, and on some preference settings:
import java.util.Collection;
import java.util.stream.Stream;
public abstract class MyOptimizedCollection<E> implements Collection<E>{
private enum OperatingSystem{
LINUX, WINDOWS, ANDROID
}
private OperatingSystem operatingSystem = OperatingSystem.WINDOWS;
private int numberOfCores = Runtime.getRuntime().availableProcessors();
private Collection<E> delegate;
@Override
public Stream<E> parallelStream() {
// "true".equals(...) avoids a NullPointerException when the property is not set
if (!"true".equals(System.getProperty("parallelSupport"))) {
return this.delegate.stream();
}
switch (operatingSystem) {
case WINDOWS:
if (numberOfCores > 3 && delegate.size() > 10000) {
return this.delegate.parallelStream();
}else{
return this.delegate.stream();
}
case LINUX:
// SomeVerySpecialStreamImplementation is a placeholder for your own stream implementation
return SomeVerySpecialStreamImplementation.stream(this.delegate.spliterator());
case ANDROID:
default:
return this.delegate.stream();
}
}
}
The nice thing here is, that your loop implementation doesn't need to know or care about these details.
forEach() can be implemented to be faster than the for-each loop, because the iterable knows the best way to iterate its elements, as opposed to the standard Iterator way. So the difference is looping internally vs. looping externally.
For example, ArrayList.forEach(action) may simply be implemented as
for (int i = 0; i < size; i++) {
action.accept(elements[i]);
}
as opposed to the for-each loop, which requires a lot of scaffolding:
Iterator<Object> iter = list.iterator();
while (iter.hasNext()) {
Object next = iter.next();
// do something with next
}
However, we also need to account for two overhead costs of using forEach(): creating the lambda object and invoking the lambda method. They are probably not significant.
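To illustrate internal vs. external iteration, here is a sketch of a hypothetical array-backed container (ArrayBag is not a JDK class). Its iterator() serves the for-each loop through hasNext()/next(), while its overridden forEach() runs a plain indexed loop and never creates an Iterator. Whether this is measurably faster in practice depends on the JIT.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.function.Consumer;

final class ArrayBag<E> implements Iterable<E> {
    private final Object[] elements;
    private final int size;

    @SafeVarargs
    ArrayBag(E... items) {
        this.elements = Arrays.copyOf(items, items.length, Object[].class);
        this.size = items.length;
    }

    // External iteration: the caller drives the loop via hasNext()/next().
    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private int cursor;

            @Override
            public boolean hasNext() {
                return cursor < size;
            }

            @SuppressWarnings("unchecked")
            @Override
            public E next() {
                if (cursor >= size) {
                    throw new NoSuchElementException();
                }
                return (E) elements[cursor++];
            }
        };
    }

    // Internal iteration: the container runs its own indexed loop.
    @SuppressWarnings("unchecked")
    @Override
    public void forEach(Consumer<? super E> action) {
        Objects.requireNonNull(action);
        for (int i = 0; i < size; i++) {
            action.accept((E) elements[i]);
        }
    }
}
```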
see also http://journal.stuffwithstuff.com/2013/01/13/iteration-inside-and-out/ for comparing internal/external iterations for different use cases.
TL;DR: List.stream().forEach() was the fastest.
I felt I should add my results from benchmarking iteration.
I took a very simple approach (no benchmarking frameworks) and benchmarked 5 different methods:
classic for
classic foreach
List.forEach()
List.stream().forEach()
List.parallelStream().forEach()
The testing procedure and parameters:
private List<Integer> list;
private final int size = 1_000_000;
public MyClass(){
list = new ArrayList<>();
Random rand = new Random();
for (int i = 0; i < size; ++i) {
list.add(rand.nextInt(size * 50));
}
}
private void doIt(Integer i) {
i *= 2; // intended so the call won't get JITed out, though a dead store like this may still be optimized away
}
The list in this class is iterated over, and doIt(Integer i) is applied to all its members, each time via a different method.
In the Main class I run the tested method three times to warm up the JVM. I then run it 1000 times, summing the time each call takes (using System.nanoTime()). After that's done, I divide the sum by 1000, and that's the result: the average time.
example:
myClass.fored();
myClass.fored();
myClass.fored();
for (int i = 0; i < reps; ++i) {
begin = System.nanoTime();
myClass.fored();
end = System.nanoTime();
nanoSum += end - begin;
}
System.out.println(nanoSum / reps);
I ran this on an i5 4-core CPU, with Java version 1.8.0_05.
classic for
for(int i = 0, l = list.size(); i < l; ++i) {
doIt(list.get(i));
}
execution time: 4.21 ms
classic foreach
for(Integer i : list) {
doIt(i);
}
execution time: 5.95 ms
List.forEach()
list.forEach((i) -> doIt(i));
execution time: 3.11 ms
List.stream().forEach()
list.stream().forEach((i) -> doIt(i));
execution time: 2.79 ms
List.parallelStream().forEach()
list.parallelStream().forEach((i) -> doIt(i));
execution time: 3.6 ms
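One caveat about the harness above: doIt only assigns to its parameter, so the JIT may still eliminate the call as dead code, which could distort the numbers. Below is a minimal sketch of a variation (still framework-free; all names are hypothetical) in which each variant returns a checksum that is accumulated in a static field, so the work has an observable result. Timings from a hand-rolled harness like this remain rough; JMH is the usual tool for serious Java microbenchmarks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.ToLongFunction;

final class IterationHarness {

    // Results are accumulated here so the loops have an observable effect.
    static long sink;

    static long classicFor(List<Integer> list) {
        long sum = 0;
        for (int i = 0, n = list.size(); i < n; i++) {
            sum += list.get(i) * 2L;
        }
        return sum;
    }

    static long classicForEach(List<Integer> list) {
        long sum = 0;
        for (Integer value : list) {
            sum += value * 2L;
        }
        return sum;
    }

    static long listForEach(List<Integer> list) {
        long[] sum = {0}; // captured locals must be effectively final
        list.forEach(value -> sum[0] += value * 2L);
        return sum[0];
    }

    // Average nanoseconds per call, after `warmup` untimed calls.
    static long averageNanos(ToLongFunction<List<Integer>> variant, List<Integer> list,
                             int warmup, int reps) {
        for (int i = 0; i < warmup; i++) {
            sink += variant.applyAsLong(list);
        }
        long total = 0;
        for (int i = 0; i < reps; i++) {
            long begin = System.nanoTime();
            sink += variant.applyAsLong(list);
            total += System.nanoTime() - begin;
        }
        return total / reps;
    }

    static List<Integer> randomList(int size, long seed) {
        Random rand = new Random(seed);
        List<Integer> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(rand.nextInt(size * 50));
        }
        return list;
    }
}
```

For example, IterationHarness.averageNanos(IterationHarness::classicFor, IterationHarness.randomList(1_000_000, 42L), 3, 1000) follows the same warm-up and repetition counts as the procedure described above.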
I feel that I need to extend my comment a bit...
About paradigm/style
That's probably the most notable aspect. FP became popular due to what you can gain by avoiding side-effects. I won't delve deep into the pros/cons you can get from this, since that is not related to the question.
However, I will say that iteration using Iterable.forEach is inspired by FP and is rather a result of bringing more FP to Java (ironically, I'd say that there is not much use for forEach in pure FP, since it does nothing except introduce side-effects).
In the end I would say that it is rather a matter of the taste/style/paradigm you are currently writing in.
About parallelism.
From a performance point of view, no notable benefits are promised from using Iterable.forEach over foreach(...).
According to the official docs on Iterable.forEach:
Performs the given action on the contents of the Iterable, in the order elements occur when iterating, until all elements have been processed or the action throws an exception.
... i.e. the docs make it pretty clear that there will be no implicit parallelism. Adding it would violate the Liskov Substitution Principle (LSP).
Now, there are "parallel collections" that are promised in Java 8, but to work with those you need to be more explicit and take some extra care to use them (see mschenk74's answer for example).
BTW: in that case Stream.forEach will be used, and it doesn't guarantee that the actual work will be done in parallel (it depends on the underlying collection).
UPDATE: this might not be that obvious and is a little stretched at first glance, but there is another facet: style and readability.
First of all, plain old for loops are plain and old. Everybody already knows them.
Second, and more important: you probably want to use Iterable.forEach only with one-liner lambdas. If the "body" gets heavier, they tend to be less readable.
You have two options from here: use inner classes (yuck) or use a plain old for loop.
People often get annoyed when they see the same thing (iteration over collections) done in various ways/styles in the same codebase, and this seems to be such a case.
Again, this might or might not be an issue. It depends on the people working on the code.
One of the most unpleasant limitations of the functional forEach is its lack of checked exception support.
One possible workaround is to replace the terminal forEach with a plain old for-each loop:
Stream<String> stream = Stream.of("", "1", "2", "3").filter(s -> !s.isEmpty());
Iterable<String> iterable = stream::iterator;
for (String s : iterable) {
fileWriter.append(s);
}
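A self-contained version of that workaround, as a sketch: fileWriter from the snippet above is replaced by a Writer parameter, and writeNonEmpty is a hypothetical name. Because the terminal step is an ordinary for-each, the IOException from Writer.append propagates through the method's throws clause without any wrapping.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.stream.Stream;

final class CheckedInLoop {

    // Writes the non-empty strings from `source` to `out`; any IOException
    // from append() simply propagates to the caller.
    static void writeNonEmpty(Stream<String> source, Writer out) throws IOException {
        Stream<String> stream = source.filter(s -> !s.isEmpty());
        // Single-use Iterable: the stream's iterator() may only be obtained once.
        Iterable<String> iterable = stream::iterator;
        for (String s : iterable) {
            out.append(s);
        }
    }
}
```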
Here is a list of the most popular questions with other workarounds for checked exception handling within lambdas and streams:
Java 8 Lambda function that throws exception?
Java 8: Lambda-Streams, Filter by Method with Exception
How can I throw CHECKED exceptions from inside Java 8 streams?
Java 8: Mandatory checked exceptions handling in lambda expressions. Why mandatory, not optional?
The advantage of the Java 1.8 forEach method over the 1.7 enhanced for loop is that while writing code you can focus on the business logic only.
The forEach method takes a java.util.function.Consumer object as an argument, so it helps to keep your business logic in a separate place where you can reuse it at any time.
Have a look at the snippet below.
Here I have created a new class that overrides the accept method of the Consumer interface,
where you can add functionality beyond iteration:
import java.util.ArrayList;
import java.util.function.Consumer;
class MyConsumer implements Consumer<Integer>{
@Override
public void accept(Integer o) {
System.out.println("Here you can also add your business logic that will work with Iteration and you can reuse it."+o);
}
}
public class ForEachConsumer {
public static void main(String[] args) {
// Creating simple ArrayList.
ArrayList<Integer> aList = new ArrayList<>();
for(int i=1;i<=10;i++) aList.add(i);
// Calling forEach with a custom Consumer.
MyConsumer consumer = new MyConsumer();
aList.forEach(consumer);
// Using a lambda expression for the Consumer (a functional interface).
Consumer<Integer> lambda = (Integer o) ->{
System.out.println("Using Lambda Expression to iterate and do something else(BI).. "+o);
};
aList.forEach(lambda);
// Using an anonymous inner class.
aList.forEach(new Consumer<Integer>(){
@Override
public void accept(Integer o) {
System.out.println("Calling with Anonymous Inner Class "+o);
}
});
}
}
Which of the following is better practice in Java 8?
Java 8:
joins.forEach(join -> mIrc.join(mSession, join));
Java 7:
for (String join : joins) {
mIrc.join(mSession, join);
}
I have lots of for loops that could be "simplified" with lambdas, but is there really any advantage of using them? Would it improve their performance and readability?
EDIT
I'll also extend this question to longer methods. I know that you can't return or break the parent function from a lambda and this should also be taken into consideration when comparing them, but is there anything else to be considered?
The better practice is to use for-each. Besides violating the Keep It Simple, Stupid principle, the new-fangled forEach() has at least the following deficiencies:
Can't use non-final variables. So, code like the following can't be turned into a forEach lambda:
Object prev = null;
for(Object curr : list)
{
if( prev != null )
foo(prev, curr);
prev = curr;
}
Can't handle checked exceptions. Lambdas aren't actually forbidden from throwing checked exceptions, but common functional interfaces like Consumer don't declare any. Therefore, any code that throws checked exceptions must wrap them in try-catch or Throwables.propagate(). But even if you do that, it's not always clear what happens to the thrown exception. It could get swallowed somewhere in the guts of forEach()
Limited flow-control. A return in a lambda equals a continue in a for-each, but there is no equivalent to a break. It's also difficult to do things like return values, short circuit, or set flags (which would have alleviated things a bit, if it wasn't a violation of the no non-final variables rule). "This is not just an optimization, but critical when you consider that some sequences (like reading the lines in a file) may have side-effects, or you may have an infinite sequence."
Might execute in parallel, which is a horrible, horrible thing for all but the 0.1% of your code that needs to be optimized. Any parallel code has to be thought through (even if it doesn't use locks, volatiles, and other particularly nasty aspects of traditional multi-threaded execution). Any bug will be tough to find.
Might hurt performance, because the JIT can't optimize forEach()+lambda to the same extent as plain loops, especially now that lambdas are new. By "optimization" I do not mean the overhead of calling lambdas (which is small), but to the sophisticated analysis and transformation that the modern JIT compiler performs on running code.
If you do need parallelism, it is probably much faster and not much more difficult to use an ExecutorService. Streams are both automagical (read: don't know much about your problem) and use a specialized (read: inefficient for the general case) parallelization strategy (fork-join recursive decomposition).
Makes debugging more confusing, because of the nested call hierarchy and, god forbid, parallel execution. The debugger may have issues displaying variables from the surrounding code, and things like step-through may not work as expected.
Streams in general are more difficult to code, read, and debug. Actually, this is true of complex "fluent" APIs in general. The combination of complex single statements, heavy use of generics, and lack of intermediate variables conspire to produce confusing error messages and frustrate debugging. Instead of "this method doesn't have an overload for type X" you get an error message closer to "somewhere you messed up the types, but we don't know where or how." Similarly, you can't step through and examine things in a debugger as easily as when the code is broken into multiple statements, and intermediate values are saved to variables. Finally, reading the code and understanding the types and behavior at each stage of execution may be non-trivial.
Sticks out like a sore thumb. The Java language already has the for-each statement. Why replace it with a function call? Why encourage hiding side-effects somewhere in expressions? Why encourage unwieldy one-liners? Mixing regular for-each and new forEach willy-nilly is bad style. Code should speak in idioms (patterns that are quick to comprehend due to their repetition), and the fewer idioms are used the clearer the code is and less time is spent deciding which idiom to use (a big time-drain for perfectionists like myself!).
As you can see, I'm not a big fan of the forEach() except in cases when it makes sense.
Particularly offensive to me is the fact that Stream does not implement Iterable (despite actually having method iterator) and cannot be used in a for-each, only with a forEach(). I recommend casting Streams into Iterables with (Iterable<T>)stream::iterator. A better alternative is to use StreamEx which fixes a number of Stream API problems, including implementing Iterable.
That said, forEach() is useful for the following:
Atomically iterating over a synchronized list. Prior to this, a list generated with Collections.synchronizedList() was atomic with respect to things like get or set, but was not thread-safe when iterating.
Parallel execution (using an appropriate parallel stream). This saves you a few lines of code vs using an ExecutorService, if your problem matches the performance assumptions built into Streams and Spliterators.
Specific containers which, like the synchronized list, benefit from being in control of iteration (although this is largely theoretical unless people can bring up more examples)
Calling a single function more cleanly by using forEach() and a method reference argument (ie, list.forEach (obj::someMethod)). However, keep in mind the points on checked exceptions, more difficult debugging, and reducing the number of idioms you use when writing code.
Articles I used for reference:
Everything about Java 8
Iteration Inside and Out (as pointed out by another poster)
EDIT: Looks like some of the original proposals for lambdas (such as http://www.javac.info/closures-v06a.html Google Cache) solved some of the issues I mentioned (while adding their own complications, of course).
The advantage comes into account when the operations can be executed in parallel. (See http://java.dzone.com/articles/devoxx-2012-java-8-lambda-and - the section about internal and external iteration)
The main advantage from my point of view is that the implementation of what is to be done within the loop can be defined without having to decide if it will be executed in parallel or sequential
If you want your loop to be executed in parallel you could simply write
joins.parallelStream().forEach(join -> mIrc.join(mSession, join));
You will have to write some extra code for thread handling etc.
Note: For my answer I assumed joins implementing the java.util.Stream interface. If joins implements only the java.util.Iterable interface this is no longer true.
When reading this question one can get the impression, that Iterable#forEach in combination with lambda expressions is a shortcut/replacement for writing a traditional for-each loop. This is simply not true. This code from the OP:
joins.forEach(join -> mIrc.join(mSession, join));
is not intended as a shortcut for writing
for (String join : joins) {
mIrc.join(mSession, join);
}
and should certainly not be used in this way. Instead it is intended as a shortcut (although it is not exactly the same) for writing
joins.forEach(new Consumer<T>() {
@Override
public void accept(T join) {
mIrc.join(mSession, join);
}
});
And it is a replacement for the following Java 7 code:
final Consumer<T> c = new Consumer<T>() {
@Override
public void accept(T join) {
mIrc.join(mSession, join);
}
};
for (T t : joins) {
c.accept(t);
}
Replacing the body of a loop with a functional interface, as in the examples above, makes your code more explicit: you are saying that (1) the body of the loop does not affect the surrounding code and control flow, and (2) the body of the loop may be replaced with a different implementation of the function without affecting the surrounding code. Not being able to access non-final variables of the outer scope is not a deficit of functions/lambdas; it is a feature that distinguishes the semantics of Iterable#forEach from the semantics of a traditional for-each loop. Once one gets used to the syntax of Iterable#forEach, it makes the code more readable, because you immediately get this additional information about the code.
Traditional for-each loops will certainly stay good practice (to avoid the overused term "best practice") in Java. But this doesn't mean that Iterable#forEach should be considered bad practice or bad style. It is always good practice to use the right tool for the job, and this includes mixing traditional for-each loops with Iterable#forEach where it makes sense.
Since the downsides of Iterable#forEach have already been discussed in this thread, here are some reasons why you might want to use Iterable#forEach:
To make your code more explicit: As described above, Iterable#forEach can make your code more explicit and readable in some situations.
To make your code more extensible and maintainable: Using a function as the body of a loop allows you to replace this function with different implementations (see Strategy Pattern). You could, e.g., easily replace the lambda expression with a method call that may be overridden by subclasses:
joins.forEach(getJoinStrategy());
Then you could provide default strategies using an enum that implements the functional interface. This not only makes your code more extensible, it also increases maintainability because it decouples the loop implementation from the loop declaration.
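A minimal sketch of the strategy idea, assuming the loop body only needs the join name. JoinStrategy, Joiner, UpperCaseJoiner and the shared log list are hypothetical names made up for illustration; the log merely stands in for the real mIrc.join(...) side effect so the result is observable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical default strategies: each enum constant is a ready-made loop body.
enum JoinStrategy implements Consumer<String> {
    AS_IS {
        @Override
        public void accept(String join) {
            JoinStrategy.log.add(join);
        }
    },
    UPPER_CASE {
        @Override
        public void accept(String join) {
            JoinStrategy.log.add(join.toUpperCase());
        }
    };

    // Stand-in for mIrc.join(...): records what each strategy did.
    static final List<String> log = new ArrayList<>();
}

class Joiner {
    // Overridable hook returning the loop body; subclasses can swap in another strategy.
    protected Consumer<String> getJoinStrategy() {
        return JoinStrategy.AS_IS;
    }

    void joinAll(List<String> joins) {
        joins.forEach(getJoinStrategy()); // the loop declaration never changes
    }
}

class UpperCaseJoiner extends Joiner {
    @Override
    protected Consumer<String> getJoinStrategy() {
        return JoinStrategy.UPPER_CASE;
    }
}
```

The subclass changes behavior without touching joinAll, which is the decoupling of loop declaration and loop implementation described above.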
To make your code more debuggable: Separating the loop implementation from the declaration can also make debugging easier, because you could have a specialized debug implementation that prints out debug messages, without the need to clutter your main code with if(DEBUG)System.out.println(). The debug implementation could, e.g., be a delegate that decorates the actual function implementation.
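A sketch of such a debug delegate, under the assumption that debug messages are collected in a list (System.out would work just as well). DebugConsumer and DebugDemo are hypothetical names; the decorator logs each element and then hands it to the wrapped Consumer unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical debug decorator: logs each element, then delegates to the real body.
class DebugConsumer<T> implements Consumer<T> {
    private final Consumer<T> delegate;
    private final List<String> debugLog; // destination for debug messages

    DebugConsumer(Consumer<T> delegate, List<String> debugLog) {
        this.delegate = delegate;
        this.debugLog = debugLog;
    }

    @Override
    public void accept(T element) {
        debugLog.add("processing " + element); // debug output lives here, not in the loop
        delegate.accept(element);
    }
}

class DebugDemo {
    static List<String> run(List<String> joins, boolean debug, List<String> debugLog) {
        List<String> joined = new ArrayList<>();
        Consumer<String> body = joined::add; // the "real" loop body
        // Only the function passed to forEach changes; the loop itself stays the same.
        Consumer<String> action = debug ? new DebugConsumer<String>(body, debugLog) : body;
        joins.forEach(action);
        return joined;
    }
}
```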
To optimize performance-critical code: Contrary to some of the assertions in this thread, Iterable#forEach already provides better performance than a traditional for-each loop, at least when using ArrayList and running Hotspot in "-client" mode. While this performance boost is small and negligible for most use cases, there are situations where this extra performance can make a difference. E.g., library maintainers will certainly want to evaluate whether some of their existing loop implementations should be replaced with Iterable#forEach.
To back this statement up with facts, I have done some micro-benchmarks with Caliper. Here is the test code (latest Caliper from git is needed):
import com.google.caliper.BeforeExperiment;
import com.google.caliper.Benchmark;
import com.google.caliper.Param;
import com.google.caliper.api.VmOptions;
import java.util.ArrayList;
import java.util.function.Consumer;
@VmOptions("-server")
public class Java8IterationBenchmarks {
public static class TestObject {
public int result;
}
@Param({"100", "10000"}) public int elementCount;
ArrayList<TestObject> list;
TestObject[] array;
@BeforeExperiment
public void setup(){
list = new ArrayList<>(elementCount);
for (int i = 0; i < elementCount; i++) {
list.add(new TestObject());
}
array = list.toArray(new TestObject[list.size()]);
}
@Benchmark
public void timeTraditionalForEach(int reps){
for (int i = 0; i < reps; i++) {
for (TestObject t : list) {
t.result++;
}
}
}
@Benchmark
public void timeForEachAnonymousClass(int reps){
for (int i = 0; i < reps; i++) {
list.forEach(new Consumer<TestObject>() {
@Override
public void accept(TestObject t) {
t.result++;
}
});
}
}
@Benchmark
public void timeForEachLambda(int reps){
for (int i = 0; i < reps; i++) {
list.forEach(t -> t.result++);
}
}
@Benchmark
public void timeForEachOverArray(int reps){
for (int i = 0; i < reps; i++) {
for (TestObject t : array) {
t.result++;
}
}
}
}
And here are the results (originally shown as two charts, "Results for -client" and "Results for -server"; the charts are not reproduced here).
When running with "-client", Iterable#forEach outperforms the traditional for-each loop over an ArrayList, but is still slower than directly iterating over an array. When running with "-server", the performance of all approaches is about the same.
To provide optional support for parallel execution: It has already been said here that the possibility to execute the functional interface of Iterable#forEach in parallel using streams is certainly an important aspect. Since Collection#parallelStream() does not guarantee that the loop is actually executed in parallel, one must consider this an optional feature. By iterating over your list with list.parallelStream().forEach(...);, you explicitly say: this loop supports parallel execution, but it does not depend on it. Again, this is a feature and not a deficit!
By moving the decision for parallel execution away from your actual loop implementation, you allow optional optimization of your code, without affecting the code itself, which is a good thing. Also, if the default parallel stream implementation does not fit your needs, no one is preventing you from providing your own implementation. You could e.g. provide an optimized collection depending on the underlying operating system, on the size of the collection, on the number of cores, and on some preference settings:
import java.util.Collection;
import java.util.stream.Stream;
public abstract class MyOptimizedCollection<E> implements Collection<E>{
private enum OperatingSystem{
LINUX, WINDOWS, ANDROID
}
private OperatingSystem operatingSystem = OperatingSystem.WINDOWS;
private int numberOfCores = Runtime.getRuntime().availableProcessors();
private Collection<E> delegate; // the wrapped collection (initialization omitted here)
@Override
public Stream<E> parallelStream() {
// "true".equals(...) avoids a NullPointerException when the property is not set
if (!"true".equals(System.getProperty("parallelSupport"))) {
return this.delegate.stream();
}
switch (operatingSystem) {
case WINDOWS:
if (numberOfCores > 3 && delegate.size() > 10000) {
return this.delegate.parallelStream();
} else {
return this.delegate.stream();
}
case LINUX:
// SomeVerySpecialStreamImplementation is a placeholder for your own stream factory
return SomeVerySpecialStreamImplementation.stream(this.delegate.spliterator());
case ANDROID:
default:
return this.delegate.stream();
}
}
}
The nice thing here is that your loop implementation doesn't need to know or care about these details.
forEach() can be implemented to be faster than the for-each loop, because the iterable knows the best way to iterate over its elements, as opposed to the standard iterator approach. So the difference is looping internally versus looping externally.
For example, ArrayList.forEach(action) may be simply implemented as
for (int i = 0; i < size; i++)
action.accept(elements[i]);
as opposed to the for-each loop, which requires a lot of scaffolding:
Iterator iter = list.iterator();
while (iter.hasNext()) {
Object next = iter.next();
// do something with `next`
}
However, we also need to account for two overhead costs of using forEach(): one is creating the lambda object, the other is invoking the lambda method. They are probably not significant.
See also http://journal.stuffwithstuff.com/2013/01/13/iteration-inside-and-out/ for a comparison of internal/external iteration for different use cases.
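To illustrate internal iteration, here is a minimal sketch of a hypothetical array-backed container (IntBag is not a JDK class). Its forEach override walks the backing array directly, while the enhanced for loop goes through the Iterator and its hasNext()/next() calls. The sketch only shows the structural difference; whether the direct loop is actually faster depends on the JIT. For comparison, Java 8's own ArrayList.forEach also loops over its backing array, with an added modCount check.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

// Hypothetical fixed-size container, used only to contrast the two styles.
class IntBag implements Iterable<Integer> {
    private final Integer[] elements;

    IntBag(Integer... elements) {
        this.elements = elements.clone();
    }

    // External iteration: the caller drives the loop through hasNext()/next().
    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int cursor = 0;

            @Override
            public boolean hasNext() {
                return cursor < elements.length;
            }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return elements[cursor++];
            }
        };
    }

    // Internal iteration: the container drives the loop and indexes its array directly.
    @Override
    public void forEach(Consumer<? super Integer> action) {
        for (int i = 0; i < elements.length; i++) {
            action.accept(elements[i]);
        }
    }
}
```

Usage: bag.forEach(x -> ...) takes the direct path, while for (int x : bag) { ... } uses the iterator; both visit the same elements in the same order.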
TL;DR: List.stream().forEach() was the fastest.
I felt I should add my results from benchmarking iteration.
I took a very simple approach (no benchmarking frameworks) and benchmarked 5 different methods:
classic for
classic foreach
List.forEach()
List.stream().forEach()
List.parallelStream().forEach()
The testing procedure and parameters:
private List<Integer> list;
private final int size = 1_000_000;
public MyClass(){
list = new ArrayList<>();
Random rand = new Random();
for (int i = 0; i < size; ++i) {
list.add(rand.nextInt(size * 50));
}
}
private void doIt(Integer i) {
i *= 2; // meant to keep the JIT from removing the call (the result is unused, though, so it may still be optimized away)
}
The list in this class is iterated over and doIt(Integer i) is applied to all its members, each time via a different method.
In the Main class I run the tested method three times to warm up the JVM. I then run the test method 1000 times, summing the time each iteration method takes (using System.nanoTime()). After that, I divide the sum by 1000, and that's the result: the average time.
Example:
int reps = 1000;
long nanoSum = 0;
long begin, end;
// warm-up runs
myClass.fored();
myClass.fored();
myClass.fored();
for (int i = 0; i < reps; ++i) {
begin = System.nanoTime();
myClass.fored();
end = System.nanoTime();
nanoSum += end - begin;
}
System.out.println(nanoSum / reps);
I ran this on an i5 4-core CPU, with Java version 1.8.0_05.
classic for
for(int i = 0, l = list.size(); i < l; ++i) {
doIt(list.get(i));
}
execution time: 4.21 ms
classic foreach
for(Integer i : list) {
doIt(i);
}
execution time: 5.95 ms
List.forEach()
list.forEach((i) -> doIt(i));
execution time: 3.11 ms
List.stream().forEach()
list.stream().forEach((i) -> doIt(i));
execution time: 2.79 ms
List.parallelStream().forEach()
list.parallelStream().forEach((i) -> doIt(i));
execution time: 3.6 ms
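Timings like these only mean something if all five variants do the same work. As a sanity check (not a benchmark, and not part of the measurements above), the sketch below replaces doIt with a hypothetical accumulation of i * 2, so each variant produces a result that can be compared. A LongAdder is used because the parallel variant may call the lambda from several threads.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

// Sanity check: all five iteration styles should visit every element exactly once.
class IterationStyles {
    static long[] sumAllWays(List<Integer> list) {
        long[] results = new long[5];

        long sum = 0;
        for (int i = 0, l = list.size(); i < l; ++i) sum += list.get(i) * 2; // classic for
        results[0] = sum;

        sum = 0;
        for (Integer i : list) sum += i * 2;                                    // classic foreach
        results[1] = sum;

        LongAdder adder = new LongAdder();
        list.forEach(i -> adder.add(i * 2));                                    // List.forEach()
        results[2] = adder.sum();

        adder.reset();
        list.stream().forEach(i -> adder.add(i * 2));                           // stream().forEach()
        results[3] = adder.sum();

        adder.reset();
        list.parallelStream().forEach(i -> adder.add(i * 2));                   // parallelStream().forEach()
        results[4] = adder.sum();

        return results;
    }
}
```

Keeping the result observable (here, by returning it) also makes it harder for the JIT to discard the loop body as dead code.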
I feel that I need to extend my comment a bit...
About paradigm/style
That's probably the most notable aspect. FP became popular due to what you can get by avoiding side effects. I won't delve deep into what pros/cons you can get from this, since this is not related to the question.
However, I will say that iteration using Iterable.forEach is inspired by FP and is rather a result of bringing more FP to Java (ironically, I'd say that there is not much use for forEach in pure FP, since it does nothing except introduce side effects).
In the end, I would say that it is rather a matter of the taste/style/paradigm you are currently writing in.
About parallelism.
From a performance point of view, there are no promised notable benefits from using Iterable.forEach over foreach(...).
According to official docs on Iterable.forEach :
Performs the given action on the contents of the Iterable, in the
order elements occur when iterating, until all elements have been
processed or the action throws an exception.
... i.e., the docs make it pretty clear that there will be no implicit parallelism. Adding one would be an LSP violation.
Now, there are "parallel collections" that are promised in Java 8, but to work with those you need to be more explicit and take some extra care to use them (see mschenk74's answer, for example).
BTW: in this case Stream.forEach will be used, and it doesn't guarantee that the actual work will be done in parallel (it depends on the underlying collection).
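A small sketch of the ordering difference, assuming an ArrayList source. Iterable.forEach runs sequentially in iteration order. parallelStream().forEach gives no ordering guarantee, so its target must be thread-safe. forEachOrdered processes elements in encounter order even on a parallel stream. OrderingDemo and its method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class OrderingDemo {
    static List<Integer> numbers() {
        return IntStream.rangeClosed(1, 20).boxed().collect(Collectors.toList());
    }

    // Iterable.forEach: sequential, in iteration order.
    static List<Integer> viaIterableForEach(List<Integer> source) {
        List<Integer> out = new ArrayList<>();
        source.forEach(out::add);
        return out;
    }

    // parallelStream().forEach: may run concurrently and in any order,
    // so a synchronized list is used and the resulting order is not guaranteed.
    static List<Integer> viaParallelForEach(List<Integer> source) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEach(out::add);
        return out;
    }

    // parallelStream().forEachOrdered: actions happen one after another in encounter order.
    static List<Integer> viaParallelForEachOrdered(List<Integer> source) {
        List<Integer> out = new ArrayList<>();
        source.parallelStream().forEachOrdered(out::add);
        return out;
    }
}
```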
UPDATE: it might not be that obvious and may seem a little stretched at a glance, but there is another facet: style and readability.
First of all, plain old for loops are plain and old. Everybody already knows them.
Second, and more important, you probably want to use Iterable.forEach only with one-liner lambdas. If the "body" gets heavier, they tend to be not that readable.
You have 2 options from here: use inner classes (yuck) or use a plain old for loop.
People often get annoyed when they see the same things (iterations over collections) being done in various ways/styles in the same codebase, and this seems to be the case.
Again, this might or might not be an issue. Depends on people working on code.
One of the most unpleasant limitations of the functional forEach is its lack of support for checked exceptions.
One possible workaround is to replace the terminal forEach with a plain old foreach loop:
Stream<String> stream = Stream.of("", "1", "2", "3").filter(s -> !s.isEmpty());
Iterable<String> iterable = stream::iterator;
for (String s : iterable) {
fileWriter.append(s);
}
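A self-contained version of that workaround, for illustration. The snippet's fileWriter is swapped for a StringWriter accessed through the Writer type, whose append declares IOException. CheckedExceptionLoop and its methods are hypothetical names. Because a stream is single-use, the adapted Iterable can only be iterated once.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.stream.Stream;

class CheckedExceptionLoop {
    // Writer.append throws IOException, so calling it inside stream.forEach(...)
    // would require wrapping the checked exception in the lambda.
    static void writeAll(Stream<String> stream, Writer writer) throws IOException {
        // Adapt the stream to an Iterable so the enhanced for loop can consume it.
        Iterable<String> iterable = stream::iterator;
        for (String s : iterable) {
            writer.append(s); // the IOException simply propagates to the caller
        }
    }

    static String demo() throws IOException {
        StringWriter out = new StringWriter();
        writeAll(Stream.of("", "1", "2", "3").filter(s -> !s.isEmpty()), out);
        return out.toString(); // "123"
    }
}
```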
Here is list of most popular questions with other workarounds on checked exception handling within lambdas and streams:
Java 8 Lambda function that throws exception?
Java 8: Lambda-Streams, Filter by Method with Exception
How can I throw CHECKED exceptions from inside Java 8 streams?
Java 8: Mandatory checked exceptions handling in lambda expressions. Why mandatory, not optional?
The advantage of the Java 1.8 forEach method over the 1.7 enhanced for loop is that, while writing code, you can focus on business logic only.
The forEach method takes a java.util.function.Consumer object as an argument, so it helps in keeping our business logic in a separate location where you can reuse it at any time.
Have a look at the snippet below.
Here I have created a new class that overrides the accept method of the Consumer interface,
where you can add functionality beyond plain iteration:
import java.util.ArrayList;
import java.util.function.Consumer;
class MyConsumer implements Consumer<Integer>{
@Override
public void accept(Integer o) {
System.out.println("Here you can also add your business logic that will work with Iteration and you can reuse it."+o);
}
}
public class ForEachConsumer {
public static void main(String[] args) {
// Creating simple ArrayList.
ArrayList<Integer> aList = new ArrayList<>();
for(int i=1;i<=10;i++) aList.add(i);
// Calling forEach with a custom Consumer implementation.
MyConsumer consumer = new MyConsumer();
aList.forEach(consumer);
// Using a lambda expression for the Consumer (a functional interface).
Consumer<Integer> lambda = (Integer o) ->{
System.out.println("Using Lambda Expression to iterate and do something else(BI).. "+o);
};
aList.forEach(lambda);
// Using an anonymous inner class.
aList.forEach(new Consumer<Integer>(){
@Override
public void accept(Integer o) {
System.out.println("Calling with Anonymous Inner Class "+o);
}
});
}
}
When you're designing the API for a code library, you want it to be easy to use well, and hard to use badly. Ideally you want it to be idiot proof.
You might also want to make it compatible with older systems that can't handle generics, like .NET 1.1 and Java 1.4. But you don't want it to be a pain to use from newer code.
I'm wondering about the best way to make things easily iterable in a type-safe way... Remembering that you can't use generics, Java's Iterable<T> is out, as is .NET's IEnumerable<T>.
You want people to be able to use the enhanced for loop in Java (for (Item i : items)), and the foreach / For Each loop in .NET, and you don't want them to have to do any casting. Basically, you want your API to be friendly to current code as well as backwards compatible.
The best type-safe option that I can think of is arrays. They're fully backwards compatible and they're easy to iterate in a typesafe way. But arrays aren't ideal because you can't make them immutable. So, when you have an immutable object containing an array that you want people to be able to iterate over, to maintain immutability you have to provide a defensive copy each and every time they access it.
In Java, doing (MyObject[]) myInternalArray.clone(); is super-fast. I'm sure that the equivalent in .NET is super-fast too. If you have something like:
class Schedule {
private Appointment[] internalArray;
public Appointment[] appointments() {
return (Appointment[]) internalArray.clone();
}
}
people can do:
for (Appointment a : schedule.appointments()) {
a.doSomething();
}
and it will be simple, clear, type-safe, and fast.
But they could do something like:
for (int i = 0; i < schedule.appointments().length; i++) {
Appointment a = schedule.appointments()[i];
}
And then it would be horribly inefficient because the entire array of appointments would get cloned twice for every iteration (once for the length test, and once to get the object at the index). Not such a problem if the array is small, but pretty horrible if the array has thousands of items in it. Yuk.
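To put a number on it, here is a hypothetical instrumented variant of the Schedule class that counts defensive copies. For n appointments, the indexed loop makes 2n + 1 copies (n + 1 from the length checks, n from the element accesses). Each copy duplicates all n elements, so the work grows roughly with n². CountingSchedule, CloneCount and the Appointment stub are made up for this sketch.

```java
// Minimal stub matching the question's usage.
class Appointment {
    void doSomething() { }
}

// Hypothetical instrumented Schedule: counts how many copies appointments() makes.
class CountingSchedule {
    private final Appointment[] internalArray;
    int copies = 0;

    CountingSchedule(int n) {
        internalArray = new Appointment[n];
        for (int i = 0; i < n; i++) internalArray[i] = new Appointment();
    }

    Appointment[] appointments() {
        copies++;
        return internalArray.clone(); // defensive copy (no cast needed since Java 5)
    }
}

class CloneCount {
    // Enhanced for loop: appointments() is evaluated once.
    static int goodLoop(CountingSchedule s) {
        for (Appointment a : s.appointments()) a.doSomething();
        return s.copies;
    }

    // Indexed loop calling appointments() in both the condition and the body.
    static int badLoop(CountingSchedule s) {
        for (int i = 0; i < s.appointments().length; i++) {
            Appointment a = s.appointments()[i];
            a.doSomething();
        }
        return s.copies;
    }
}
```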
Would anyone actually do that? I'm not sure... I guess that's largely my question here.
You could call the method toAppointmentArray() instead of appointments(), and that would probably make it less likely that anyone would use it the wrong way. But it would also make it harder for people to find when they just want to iterate over the appointments.
You would, of course, document appointments() clearly, to say that it returns a defensive copy. But a lot of people won't read that particular bit of documentation.
Although I'd welcome suggestions, it seems to me that there's no perfect way to make it simple, clear, type-safe, and idiot proof. Have I failed if a minority of people are unwittingly cloning arrays thousands of times, or is that an acceptable price to pay for simple, type-safe iteration for the majority?
NB I happen to be designing this library for both Java and .NET, which is why I've tried to make this question applicable to both. And I tagged it language-agnostic because it's an issue that could arise for other languages too. The code samples are in Java, but C# would be similar (albeit with the option of making the Appointments accessor a property).
UPDATE: I did a few quick performance tests to see how much difference this made in Java. I tested:
cloning the array once, and iterating over it using the enhanced for loop
iterating over an ArrayList using the enhanced for loop
iterating over an unmodifiable ArrayList (from Collections.unmodifiableList) using the enhanced for loop
iterating over the array the bad way (cloning it repeatedly in the length check and when getting each indexed item).
Relative timings (doing multiple repeats and taking the median; lower is faster), in the same order as the tests above (clone once / ArrayList / unmodifiable ArrayList / bad way):
10 objects: 1,000 / 1,300 / 1,300 / 5,000
100 objects: 1,300 / 4,900 / 6,300 / 85,500
1000 objects: 6,400 / 51,700 / 56,200 / 7,000,300
10000 objects: 68,000 / 445,000 / 651,000 / 655,180,000
Rough figures for sure, but enough to convince me of two things:
Cloning, then iterating is definitely not a performance issue. In fact, it's consistently faster than using a List. (This is why Java's enum.values() method returns a defensive copy of an array instead of an immutable list.)
If you repeatedly call the method, repeatedly cloning the array unnecessarily, performance becomes more and more of an issue the larger the arrays in question. It's pretty horrible. No surprises there.
clone() is fast, but not what I would describe as super fast.
If you don't trust people to write loops efficiently, I would not let them write a loop (which also avoids the need for a clone())
interface AppointmentHandler {
public void onAppointment(Appointment appointment);
}
class Schedule {
public void forEachAppointment(AppointmentHandler ah) {
for(Appointment a: internalArray)
ah.onAppointment(a);
}
}
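A self-contained version of the callback approach, for illustration. The Appointment stub with a name field and the varargs Schedule constructor are assumptions added so the code runs; forEachAppointment follows the answer. Pre-Java 8 callers would pass an anonymous class, as CallbackDemo does. Because AppointmentHandler has a single abstract method, a lambda also works from Java 8 on.

```java
class Appointment {
    final String name;
    Appointment(String name) { this.name = name; }
}

interface AppointmentHandler {
    void onAppointment(Appointment appointment);
}

class Schedule {
    private final Appointment[] internalArray;

    Schedule(Appointment... appointments) {
        internalArray = appointments.clone(); // keep a private copy; callers never see it
    }

    // The schedule runs the loop itself, so no defensive copy is ever handed out.
    public void forEachAppointment(AppointmentHandler ah) {
        for (Appointment a : internalArray) {
            ah.onAppointment(a);
        }
    }
}

class CallbackDemo {
    static String names(Schedule schedule) {
        final StringBuilder sb = new StringBuilder();
        // Anonymous class, as pre-Java 8 callers would write it.
        schedule.forEachAppointment(new AppointmentHandler() {
            public void onAppointment(Appointment appointment) {
                sb.append(appointment.name).append(';');
            }
        });
        return sb.toString();
    }
}
```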
Since you can't really have it both ways, I would suggest that you create a pre-generics and a generics version of your API. Ideally, the underlying implementation can be mostly the same, but the fact is, if you want it to be easy to use for anyone using Java 1.5 or later, they will expect the usage of generics and Iterable and all the newer language features.
I think arrays should not be used at all. They do not make for an easy-to-use API in either case.
NOTE: I have never used C#, but I would expect the same holds true.
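Here is a minimal sketch of what the generics version of the API might look like, assuming Java 5 or later. GenericSchedule and the Appointment stub are hypothetical names. The array is copied once and wrapped with Collections.unmodifiableList, so callers get cast-free enhanced for loops, no per-access copy is made, and the iterator's remove() is rejected.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class Appointment {
    final String name;
    Appointment(String name) { this.name = name; }
}

// Generics-based variant: callers iterate with for (Appointment a : schedule).
class GenericSchedule implements Iterable<Appointment> {
    private final List<Appointment> appointments;

    GenericSchedule(Appointment... appointments) {
        // Copy once, then wrap read-only; the unmodifiable iterator rejects remove().
        this.appointments = Collections.unmodifiableList(Arrays.asList(appointments.clone()));
    }

    @Override
    public Iterator<Appointment> iterator() {
        return appointments.iterator();
    }
}
```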
As far as failing a minority of the users, those that would call the same method to get the same object on each iteration of the loop would be asking for inefficiency regardless of API design. I think as long as that's well documented, it's not too much to ask that the users obey some semblance of common sense.
I often see code like:
Iterator i = list.iterator();
while(i.hasNext()) {
...
}
but I write that (when Java 1.5 isn't available or for each can't be used) as:
for(Iterator i = list.iterator(); i.hasNext(); ) {
...
}
because
It is shorter
It keeps i in a smaller scope
It reduces the chance of confusion. (Is i used outside the while? Where is i declared?)
I think code should be as simple to understand as possible so that I only have to make complex code to do complex things. What do you think? Which is better?
From: http://jamesjava.blogspot.com/2006/04/iterating.html
I prefer the for loop because it also sets the scope of the iterator to just the for loop.
There are appropriate uses for the while, the for, and the foreach constructs:
while - Use this if you are iterating and the deciding factor for looping or not is based merely on a condition. In this loop construct, keeping an index is only a secondary concern; everything should be based on the condition.
for - Use this if you are looping and your primary concern is the index of the array/collection/list. It is more useful to use a for if you are most likely to go through all the elements anyway, and in a particular order (e.g., going backwards through a sorted list).
foreach - Use this if you merely need to go through your collection regardless of order.
Obviously there are exceptions to the above, but that's the general rule I use when deciding to use which. That being said I tend to use foreach more often.
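One illustrative method per construct, following the rules of thumb above. The method names and the halving example are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class LoopChoices {
    // while: the loop is driven by a condition, not an index (halve until below a limit).
    static int halvingsUntilBelow(int value, int limit) {
        int steps = 0;
        while (value >= limit) {
            value /= 2;
            steps++;
        }
        return steps;
    }

    // for: the index matters, e.g. walking a sorted list backwards.
    static List<String> reversed(List<String> sorted) {
        List<String> out = new ArrayList<>();
        for (int i = sorted.size() - 1; i >= 0; i--) {
            out.add(sorted.get(i));
        }
        return out;
    }

    // foreach: every element is visited and no index is needed.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }
}
```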
Why not use the for-each construct? (I haven't used Java in a while, but this exists in C# and I'm pretty sure Java 1.5 has this too):
List<String> names = new ArrayList<String>();
names.add("a");
names.add("b");
names.add("c");
for (String name : names)
System.out.println(name.charAt(0));
I think scope is the biggest issue here, as you have pointed out.
In the "while" example, the iterator is declared outside the loop, so it will continue to exist after the loop is done. This may cause issues if this same iterator is used again at some later point. E.g., you may forget to initialize it before using it in another loop.
In the "for" example, the iterator is declared inside the loop, so its scope is limited to the loop. If you try to use it after the loop, you will get a compiler error.
If you're only going to use the iterator once and throw it away, the second form is preferred; otherwise you must use the first form.
IMHO, the for loop is less readable in this scenario, if you look at this code from the perspective of the English language. I am working on code where the author abuses the for loop, and it ain't pretty. Compare the following:
for (; (currUserObjectIndex < _domainObjectReferences.Length) && (_domainObjectReferences[currUserObjectIndex].VisualIndex == index); ++currUserObjectIndex)
++currNumUserObjects;
vs
while (currUserObjectIndex < _domainObjectReferences.Length && _domainObjectReferences[currUserObjectIndex].VisualIndex == index)
{
++currNumUserObjects;
++currUserObjectIndex;
}
I would agree that the "for" loop is clearer and more appropriate when iterating.
The "while" loop is appropriate for polling, or where the number of loops to meet exit condition will change based on activity inside the loop.
Not that it probably matters in this case, but compilers, VMs and CPUs normally have special optimization techniques they use under the hood that make for loops perform better (and, in the near future, run in parallel); in general they don't do that with while loops (because it's harder to determine how they're actually going to run). But in most cases code clarity should trump optimization.
With a for loop you can reuse a single variable, because each for loop sets the variable up for that loop only. However, this is not possible with a while loop.
For example:
int i;
for (i = 0; i < n1; i++) do something..
for (i = 0; i < n2; i += 2) do something..
So after the first loop, i == n1. But the second loop sets i back to 0 in its initializer.
However:
int i = 0;
while (i < limit) { do something ..; i++; }
Hence i equals limit at the end, so you can't reuse the same i in another while loop unless you reset it yourself.
Either is fine. I use for () myself, and I don't know if there are compile issues. I suspect they both get optimized down to pretty much the same thing.
I agree that the for loop should be used whenever possible but sometimes there's more complex logic that controls the iterator in the body of the loop. In that case you have to go with while.
I use the for loop for clarity, and the while loop when faced with some nondeterministic condition.
Both are fine, but remember that sometimes access to the Iterator directly is useful (such as if you are removing elements that match a certain condition - you will get a ConcurrentModificationException if you do collection.remove(o) inside a for(T o : collection) loop).
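A small sketch of both cases, assuming an ArrayList. The explicit Iterator, written in the for (Iterator ...; ...; ) form from the question, removes elements safely with Iterator.remove(). Calling list.remove(o) inside an enhanced for loop makes the next call to next() throw ConcurrentModificationException. RemoveWhileIterating is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class RemoveWhileIterating {
    // Explicit Iterator: remove() is the supported way to delete during iteration.
    static List<Integer> removeEvensWithIterator(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return list;
    }

    // Enhanced for loop + list.remove(o): ArrayList's fail-fast iterator
    // typically throws ConcurrentModificationException on the following next().
    static boolean enhancedForThrows(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        try {
            for (Integer o : list) {
                if (o % 2 == 0) {
                    list.remove(o); // remove(Object), since o is an Integer
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

Since Java 8, list.removeIf(x -> x % 2 == 0) does the same job as the iterator version in a single call.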
I prefer to write the for (blah : blah) [foreach] syntax almost all of the time because it seems more naturally readable to me. The concept of iterators in general doesn't really have parallels outside of programming.
Academia tends to prefer the while-loop as it makes for less complicated reasoning about programs. I tend to prefer the for- or foreach-loop structures as they make for easier-to-read code.
Although both are really fine, I tend to use the first example because it is easier to read.
There are fewer operations happening on each line with the while() loop, making the code easier for someone new to the code to understand what's going on.
That type of construct also allows me to group initializations in a common location (at the top of the method) which also simplifies commenting for me, and conceptualization for someone reading it for the first time.