I'm reading this fantastic article about Lambda Expressions and the following is unclear to me:
Does a Lambda Expression save the value of the free variables, or a reference/pointer to each of them? (I guess the answer is the latter, because otherwise mutating free variables would be valid.)
Don't count on the compiler to catch all concurrent access errors. The
prohibition against mutation holds only for local variables.
I'm not sure that experimenting on my own would cover all the cases, so I'm looking for well-defined rules about:
Which free variables can be mutated inside the Lambda Expression (static fields/properties/local variables/parameters), and which can be mutated outside while being used inside a Lambda Expression?
Can I mutate every free variable after the end of the block of a Lambda Expression, after I have used it (read it or called one of its methods) inside the Lambda Expression?
Don't count on the compiler to catch all concurrent access errors. The
prohibition against mutation holds only for local variables.
If matches is an instance or static variable of an enclosing class, then
no error is reported, even though the result is just as undefined.
Is the result of the mutation undefined even when I use a synchronization algorithm?
Update 1:
free variables - that is, the variables that are not parameters and not defined inside the code.
In simple words, can I conclude that free variables are all the variables that are neither parameters of the Lambda Expression nor defined inside that same Lambda Expression?
This looks like complicated "words" on a simpler topic. The rules are pretty much the same as for anonymous classes.
For example the compiler catches this:
int x = 3;
Runnable r = () -> {
x = 6; // Local variable x defined in an enclosing scope must be final or effectively final
};
But at the same time it is perfectly legal to do this (from the compiler's point of view):
final int x[] = { 0 };
Runnable r = () -> {
x[0] = 6;
};
The example that you provided and uses matches:
List<Path> matches = new ArrayList<>();
List<Path> files = List.of();
for (Path p : files) {
new Thread(() -> {
if (1 == 1) {
matches.add(p);
}
}).start();
}
has the same problem. The compiler does not complain about you modifying matches (because you are not changing the reference matches, so it is effectively final), but at the same time this can have undefined results. This operation has side effects and is discouraged in general.
The undefined results come from the fact that matches is not a thread-safe collection.
And your last point: "Is the result of the mutation undefined even when I use a synchronization algorithm?" Of course not. With proper synchronization, updating a variable outside a lambda (or a stream) will work, but it is discouraged, mainly because there are usually other ways to achieve the same thing.
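As a rough illustration of the "proper synchronization" case, here is a minimal sketch (the class and method names are made up for this example, and strings stand in for the Path objects of the question). It assumes that wrapping the list with Collections.synchronizedList and joining the threads before reading is all the synchronization the use case needs:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedCapture {
    // Starts one thread per input; every thread appends to the same captured list.
    static List<String> collectConcurrently(List<String> inputs) {
        // The reference 'matches' is effectively final, so the lambda may capture it.
        // The synchronized wrapper makes each individual add() call thread-safe.
        List<String> matches = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (String s : inputs) {
            Thread t = new Thread(() -> matches.add(s));
            threads.add(t);
            t.start();
        }
        try {
            // join() guarantees that all additions are finished and visible here.
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return matches;
    }
}
```

The order of the elements in the returned list still depends on thread scheduling; only the set of elements is deterministic.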
EDIT
OK, so free variables are those that are not defined within the lambda code itself or are not the parameters of the lambda itself.
In this case the answer to 1) would be: lambda expressions are de-sugared to methods, and the rules for free variables are the same as for anonymous classes. This has been discussed numerous times, like here. This actually answers the second question as well, since the rules are the same. Obviously, anything that is final or effectively final can be captured. For primitives, this means they can't be mutated; for objects, you can't reassign the reference (but you can change the underlying data, as shown in my example). For 3): yes.
Your term “free variables” is misleading at best. If you’re not talking about local variables (which must be effectively final to be captured), you are talking about heap variables.
Heap variables might be instance fields, static fields or array elements. For unqualified access to instance variables from the surrounding context, the lambda expression may (and will) access them via the captured this reference. For other instance fields, as well as array elements, you need an explicit access via a variable anyway, so it’s clear, how the heap variable will be accessed. Only static fields are accessed directly.
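To make the three kinds of heap variables concrete, here is a small sketch (all names are invented for illustration). It only shows what compiles and what gets modified; it makes no claim about thread safety:

```java
class HeapVariableCapture {
    static int staticCounter = 0;   // static field: accessed directly
    int instanceCounter = 0;        // instance field: accessed via the captured 'this'

    Runnable incrementAll(int[] cell) {
        // 'cell' is an effectively final local variable; only the reference is captured.
        return () -> {
            staticCounter++;        // no capture needed for a static field
            instanceCounter++;      // shorthand for this.instanceCounter++
            cell[0]++;              // the array element itself lives on the heap
        };
    }
}
```

None of these three modifications is rejected by the compiler, because none of them assigns to a captured local variable.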
The rules are simple: unless they are declared final, you can modify all of them, inside or outside the lambda expression. Keep in mind that lambda expressions can call arbitrary methods, containing arbitrary code anyway. Whether this will cause problems depends on how you use the lambda expressions. You can even create problems with functions not directly modifying a variable, without any concurrency, e.g.
ArrayList<String> list=new ArrayList<>(Arrays.asList("foo", "bar"));
list.removeIf(s -> list.remove("bar"));
may throw a java.util.ConcurrentModificationException due to the list modification in an ongoing iteration.
Likewise, modifying a variable or resource in a concurrent context might break it, even if you made sure that the modification of the variable itself has been done in a thread-safe manner. It’s all about the contracts of the API you are using.
Most notably, when using parallel Streams, you have to be aware that functions are not only evaluated by different threads, they are also evaluating arbitrary elements of the Stream, regardless of their encounter order. For the final result of the Stream processing, the implementation will assemble partial results in a way that reestablishes the encounter order, if necessary, but the intermediate operations evaluate the elements in an arbitrary order, hence your functions must not only be thread safe, but also not rely on a particular processing order. In some cases, they may even process elements not contributing to the final result.
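A minimal sketch of a function that satisfies these requirements (the class name is hypothetical): the mapping function is stateless, so it does not matter which thread evaluates which element, and collect reassembles the partial results in encounter order:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelOrder {
    // Squares 0..n-1 in parallel; the lambda touches no shared state.
    static List<Integer> squares(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * i)               // may run on any thread, in any order
                .boxed()
                .collect(Collectors.toList()); // result is in encounter order
    }
}
```

Had the lambda appended to a shared ArrayList instead, both the content and the order of that list would be unreliable.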
Since your bullet 3 refers to “after the end of a block”, I want to emphasize that it is irrelevant at which place inside your lambda expression the modification (or perceivable side effect) happens.
Generally, you are better off with functions not having such side effects. But this doesn’t imply that they are forbidden in general.
Related
int[] arr = new int[]{0};
l.stream().forEach(x -> {if (x > 10 && x < 15) { arr[0] += 1;}});
l is List<Integer>. Here I use one element arr array to store value that is changed inside the stream. An alternative solution is to use an instance of AtomicInteger class. But I don't understand what is the difference between these two approaches in terms of memory usage, time complexity, safety...
Please note: I am not trying to use AtomicInteger (or array) in this particular piece of code. This code is used only as an example. Thanks!
Knowing which is the best way is important and @rzwitserloot's explanation covers that in great detail. In your specific example, you could avoid the issue by doing it like this.
List<Integer> list = List.of(1,2,11,12,15,11,11,9,10,2,3);
int count = list.stream().filter(x->x > 10 && x < 15).reduce(0, (a,b)->a+1);
// or
int count = list.stream().filter(x->x > 10 && x < 15).mapToInt(x->1).sum();
Both return the value 4
In the first example, reduce sets an initial value of 0 and then adds 1 to it (b is syntactically required but not used). To sum the actual elements rather than 1, replace 1 with b in the reduce method.
In the second example, the values are replaced with 1 in the stream and then summed. Since the method sum() doesn't exist for streams of objects, the 1 needs to be mapped to an int to create an IntStream. To sum the actual elements here, use mapToInt(x->x).
As suggested in the comments, you can also do it like this.
long count = list.stream().filter(x->x > 10 && x < 15).count();
count() returns a long, so it would have to be cast to an int if that is what you want.
You should always use AtomicInteger:
The performance impact is negligible. Technically, new int[1] is 'faster', but they are the same size, or, the array is actually larger in heap (but unlikely; depends on your OS architecture, usually they'd end up being the same size), and the array does not spend any cycles on guaranteeing proper concurrency protections, but there are really only two options: [A] the concurrency protections are required (because it's a lambda that runs in another thread), and thus the int array is a non-starter; it would result in hard to find bugs, quite horrible, or [B] they aren't required, and the hotspot engine is likely going to figure that out and eliminate this cost entirely. Even if it doesn't, the overhead of concurrency protection when there is no contention is low in any case.
It is more readable. Only slightly so, but new int[1] is weirder than new AtomicInteger(), I'd say. AtomicInteger at least suggests: I want a mutable int that I'm going to mess with from other contexts.
It is more convenient. System.out.println-ing an atomicinteger prints the value. sysouting an array prints garbage.
The convenience methods in AtomicInteger might be relevant. Maybe compareAndSet is useful.
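As a sketch of case [A] above (the class and method names are invented for the example), here the lambda really does run on several threads, so the int[] trick would be unsafe while AtomicInteger gives the right count:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterDemo {
    // Counts the values strictly between 10 and 15 using a parallel stream.
    static int countInRange(List<Integer> values) {
        AtomicInteger count = new AtomicInteger();
        values.parallelStream().forEach(x -> {
            if (x > 10 && x < 15) {
                count.incrementAndGet(); // atomic read-modify-write
            }
        });
        // forEach is a terminal operation, so all increments are done by now.
        return count.get();
    }
}
```

With arr[0] += 1 in place of incrementAndGet(), concurrent increments could be lost, because += on an array element is not atomic.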
But why?
Lambdas are not transparent in the following 3 things:
Checked exceptions (you cannot throw a checked exception inside a lambda even if the context around your lambda catches it).
Mutable local vars (you cannot touch, let alone change, any local variable declared outside of the lambda, unless it is (effectively) final).
Control flow. You can't use break, continue, or return from inside a lambda and have it act like it wasn't: You can't break or continue a loop located outside of your lambda and you can't return from the method outside of your lambda (you can only return from the lambda itself).
These are all very bad things when the lambda runs 'in context', but they are all very good things when the lambda doesn't run in context.
Here is an example:
new TreeSet<Integer>((a, b) -> a - b);
Here I have created a TreeSet (which is a set that keeps its elements sorted automatically). To make one, you pass in code that determines for any 2 elements which one is 'the higher one', and TreeSet takes care of everything else. That TreeSet can survive your method (just store it in a field or pass it to a method that ends up storing it in a field) and could even escape your thread (have another thread read that field). That means when that code (a - b in this code) is invoked, we could be 5 days from the creation of that TreeSet, in another thread, with the code that 'surrounds' your new TreeSet statement having loooong gone.
In this scenario, all those transparencies make no sense at all:
What does it mean to break back to a loop that has long since completed and the system doesn't even know what it is about anymore?
That catch block uses context that is long gone, such as local vars or the parameters. It can't survive, so if your a - b were to throw something that is checked, the fact that you've wrapped your new TreeSet<> in a try/catch block is meaningless.
What does it mean to access a variable that no longer exists? For that matter, if it still does exist but the lambda runs in a separate thread, do we now start making local vars volatile and declare them on heap instead of stack just in case?
Of course, if your lambda runs within context, as in, you pass the lambda to some method and that method 'uses it or loses it': Runs your lambda a certain amount of times and then forgets all about it, then those lacking transparencies are really annoying.
It's annoying that you can't do this:
public List<String> toLines(List<Path> files) throws IOException {
    var allLines = files.stream()
        .filter(x -> x.toString().endsWith(".txt"))
        .flatMap(x -> Files.readAllLines(x).stream()) // does not compile: unhandled IOException
        .toList();
    return allLines;
}
The only reason the above code fails is that Files.readAllLines() throws IOException. We declared that we throw this onwards, but that won't work. You have to kludge up this code, make it bad, by trying to somehow transit that exception out of the lambda or otherwise work around it (the right answer is NOT to use the stream API at all here; write it with a normal for loop!).
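For comparison, the plain for-loop version might look like the following sketch (the wrapper class name is made up). Because no lambda is involved, the IOException simply propagates through the throws clause:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ReadLines {
    // Collects the lines of every .txt file, in the order the files are given.
    static List<String> toLines(List<Path> files) throws IOException {
        List<String> allLines = new ArrayList<>();
        for (Path file : files) {
            if (file.toString().endsWith(".txt")) {
                // A checked exception thrown here reaches the caller as declared.
                allLines.addAll(Files.readAllLines(file));
            }
        }
        return allLines;
    }
}
```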
Whilst trying to dance around checked exceptions in lambdas is generally just not worth it, you CAN work around the problem of wanting to share a variable with outer context:
int sum = 0;
listOfInts.forEach(x -> sum += x);
The above doesn't work - sum is from the outer scope and thus must be effectively final, and it isn't. There's no particular reason it can't work, but java won't let you. The right answer here is to use int sum = listOfInts.stream().mapToInt(Integer::intValue).sum(); instead, but you can't always find a terminal op that just does what you want. Sometimes you need to kludge around it.
That's where new int[1] and AtomicInteger come in. These are references - and the reference is final, so you CAN use them in the lambda. But the reference points at an object and you can change it at will, hence, you can use this 'trick' to 'share' a variable:
AtomicInteger sum = new AtomicInteger();
listOfInts.forEach(x -> sum.addAndGet(x));
That DOES work.
Do lambda expressions have any use other than saving lines of code?
Are there any special features provided by lambdas which solved problems which weren't easy to solve? The typical usage I've seen is that instead of writing this:
Comparator<Developer> byName = new Comparator<Developer>() {
    @Override
    public int compare(Developer o1, Developer o2) {
        return o1.getName().compareTo(o2.getName());
    }
};
We can use a lambda expression to shorten the code:
Comparator<Developer> byName =
(Developer o1, Developer o2) -> o1.getName().compareTo(o2.getName());
Lambda expressions do not change the set of problems you can solve with Java in general, but definitely make solving certain problems easier, just for the same reason we’re not programming in assembly language anymore. Removing redundant tasks from the programmer’s work makes life easier and allows you to do things you wouldn’t even touch otherwise, just because of the amount of code you would have to produce (manually).
But lambda expressions are not just saving lines of code. Lambda expressions allow you to define functions, something for which you could use anonymous inner classes as a workaround before, that’s why you can replace anonymous inner classes in these cases, but not in general.
Most notably, lambda expressions are defined independently to the functional interface they will be converted to, so there are no inherited members they could access, further, they can not access the instance of the type implementing the functional interface. Within a lambda expression, this and super have the same meaning as in the surrounding context, see also this answer. Also, you can not create new local variables shadowing local variables of the surrounding context. For the intended task of defining a function, this removes a lot of error sources, but it also implies that for other use cases, there might be anonymous inner classes which can not be converted to a lambda expression, even if implementing a functional interface.
Further, the construct new Type() { … } guarantees to produce a new distinct instance (as new always does). Anonymous inner class instances always keep a reference to their outer instance if created in a non-static context¹. In contrast, lambda expressions only capture a reference to this when needed, i.e. if they access this or a non-static member. And they produce instances of an intentionally unspecified identity, which allows the implementation to decide at runtime whether to reuse existing instances (see also “Does a lambda expression create an object on the heap every time it's executed?”).
These differences apply to your example. Your anonymous inner class construct will always produce a new instance, also it may capture a reference to the outer instance, whereas your (Developer o1, Developer o2) -> o1.getName().compareTo(o2.getName()) is a non-capturing lambda expression that will evaluate to a singleton in typical implementations. Further, it doesn’t produce a .class file on your hard drive.
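The identity difference can be observed with a small experiment like the one below (a sketch; the class name is invented). Only the anonymous class result is guaranteed; what the lambda comparison prints is deliberately left unspecified by the language:

```java
import java.util.Comparator;

class IdentityDemo {
    static Comparator<String> anonymous() {
        // 'new' always yields a fresh, distinct object.
        return new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareTo(b);
            }
        };
    }

    static Comparator<String> lambda() {
        // Non-capturing lambda: the runtime is free to reuse one instance.
        return (a, b) -> a.compareTo(b);
    }

    public static void main(String[] args) {
        System.out.println(anonymous() == anonymous()); // always false
        System.out.println(lambda() == lambda());       // unspecified, implementation-dependent
    }
}
```

Code should therefore never rely on the identity of an object produced by a lambda expression, e.g. for synchronization or as a key compared with ==.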
Given the differences regarding both, semantic and performance, lambda expressions may change the way programmers will solve certain problems in the future, of course, also due to the new APIs embracing ideas of functional programming utilizing the new language features. See also Java 8 lambda expression and first-class values.
¹ From JDK 1.1 to JDK 17. Starting with JDK 18, inner classes may not retain a reference to the outer instance if it is not used. For compatibility reasons, this requires the inner class not be serializable. This only applies if you (re)compile the inner class under JDK 18 or newer with target JDK 18 or newer. See also JDK-8271717
Programming languages are not for machines to execute.
They are for programmers to think in.
Languages are a conversation with a compiler to turn our thoughts into something a machine can execute. One of the chief complaints about Java from people who come to it from other languages (or leave it for other languages) used to be that it forces a certain mental model on the programmer (i.e. everything is a class).
I'm not going to weigh in on whether that's good or bad: everything is trade-offs. But Java 8 lambdas allow programmers to think in terms of functions, which is something you previously could not do in Java.
It's the same thing as a procedural programmer learning to think in terms of classes when they come to Java: you see them gradually move from classes that are glorified structs and have 'helper' classes with a bunch of static methods and move on to something that more closely resembles a rational OO design (mea culpa).
If you just think of them as a shorter way to express anonymous inner classes then you are probably not going to find them very impressive in the same way that the procedural programmer above probably didn't think classes were any great improvement.
Saving lines of code can be viewed as a new feature, if it enables you to write a substantial chunk of logic in a shorter and clearer manner, which takes less time for others to read and understand.
Without lambda expressions (and/or method references) Stream pipelines would have been much less readable.
Think, for example, what the following Stream pipeline would have looked like if you replaced each lambda expression with an anonymous class instance.
List<String> names =
people.stream()
.filter(p -> p.getAge() > 21)
.map(p -> p.getName())
.sorted((n1,n2) -> n1.compareToIgnoreCase(n2))
.collect(Collectors.toList());
It would be:
List<String> names =
    people.stream()
          .filter(new Predicate<Person>() {
              @Override
              public boolean test(Person p) {
                  return p.getAge() > 21;
              }
          })
          .map(new Function<Person,String>() {
              @Override
              public String apply(Person p) {
                  return p.getName();
              }
          })
          .sorted(new Comparator<String>() {
              @Override
              public int compare(String n1, String n2) {
                  return n1.compareToIgnoreCase(n2);
              }
          })
          .collect(Collectors.toList());
This is much harder to write than the version with lambda expressions, and it's much more error prone. It's also harder to understand.
And this is a relatively short pipeline.
To make this readable without lambda expressions and method references, you would have had to define variables that hold the various functional interface instances being used here, which would have split the logic of the pipeline, making it harder to understand.
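For completeness, here is a sketch of the same pipeline using method references where they fit. The Person class is a stand-in written for this example, since the original does not show it:

```java
import java.util.List;
import java.util.stream.Collectors;

class MethodRefPipeline {
    // Hypothetical minimal Person type, assumed only for this illustration.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    static List<String> adultNames(List<Person> people) {
        return people.stream()
                .filter(p -> p.getAge() > 21)        // needs a lambda: it has a condition
                .map(Person::getName)                // same as p -> p.getName()
                .sorted(String::compareToIgnoreCase) // same as (n1, n2) -> n1.compareToIgnoreCase(n2)
                .collect(Collectors.toList());
    }
}
```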
Internal iteration
When iterating over Java collections, most developers tend to get an element and then process it. That is, take the item out and then use it, or reinsert it, etc. Without lambdas, you have to implement an anonymous inner class and do something like:
numbers.forEach(new Consumer<Integer>() {
public void accept(Integer value) {
System.out.println(value);
}
});
Now with Java 8 you can do better and less verbose with:
numbers.forEach((Integer value) -> System.out.println(value));
or better
numbers.forEach(System.out::println);
Behaviors as arguments
Consider the following case:
public int sumAllEven(List<Integer> numbers) {
int total = 0;
for (int number : numbers) {
if (number % 2 == 0) {
total += number;
}
}
return total;
}
With Java 8 Predicate interface you can do better like so:
public int sumAll(List<Integer> numbers, Predicate<Integer> p) {
int total = 0;
for (int number : numbers) {
if (p.test(number)) {
total += number;
}
}
return total;
}
Calling it like:
sumAll(numbers, n -> n % 2 == 0);
Source: DZone - Why We Need Lambda Expressions in Java
There are many benefits of using lambdas instead of inner classes, as follows:
They make the code more compact and expressive without introducing more language syntax semantics. You already gave an example in your question.
By using lambdas you can program with functional-style operations on streams of elements, such as map-reduce transformations on collections. See the java.util.function & java.util.stream package documentation.
No physical class file is generated for lambdas by the compiler, which makes your delivered application smaller. See also: How Memory assigns to lambda?
The runtime can optimize lambda creation if the lambda doesn't access variables outside its scope, which means the lambda instance is created only once by the JVM. For more details, see @Holger's answer to the question Is method reference caching a good idea in Java 8?
Lambdas can implement multiple marker interfaces besides the functional interface, whereas an anonymous inner class cannot implement more than one interface, for example:
// v--- create the lambda locally.
Consumer<Integer> action = (Consumer<Integer> & Serializable) it -> {/*TODO*/};
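A quick way to check the effect of such an intersection cast is sketched below (the class and method names are made up for the example):

```java
import java.io.Serializable;
import java.util.function.Consumer;

class MarkerDemo {
    // The cast adds Serializable to the target type of the lambda.
    static Consumer<Integer> serializableAction() {
        return (Consumer<Integer> & Serializable) it -> { /* nothing to do */ };
    }

    public static void main(String[] args) {
        Consumer<Integer> action = serializableAction();
        System.out.println(action instanceof Serializable); // prints true
    }
}
```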
Lambdas are, for most practical purposes, just syntactic sugar for anonymous classes.
Before lambdas, anonymous classes can be used to achieve the same thing. Every lambda expression can be converted to an anonymous class.
If you are using IntelliJ IDEA, it can do the conversion for you:
Put the cursor in the lambda
Press alt/option + enter
To answer your question: as a matter of fact, lambdas don’t let you do anything that you couldn’t do prior to Java 8; rather, they enable you to write more concise code. The benefit of this is that your code will be clearer and more flexible.
One thing I don't see mentioned yet is that a lambda lets you define functionality where it's used.
So if you have some simple selection function you don't need to put it in a separate place with a bunch of boilerplate, you just write a lambda that's concise and locally relevant.
Yes, there are many advantages.
There is no need to define a whole class; we can pass the implementation of the function itself as a reference.
Internally, creating a class produces a .class file, whereas with a lambda the compiler avoids creating a class, because you are passing a function implementation instead of a class.
Code reusability is higher than before.
And, as you said, the code is shorter than the normal implementation.
Function composition and higher order functions.
Lambda functions can be used as building blocks towards building "higher order functions" or performing "function composition". Lambda functions can be seen as reusable building blocks in this sense.
Example of Higher Order Function via lambda:
Function<IntUnaryOperator, IntUnaryOperator> twice = f -> f.andThen(f);
IntUnaryOperator plusThree = i -> i + 3;
var g = twice.apply(plusThree);
System.out.println(g.applyAsInt(7)); // prints 13
Example Function Composition
Predicate<String> startsWithA = (text) -> text.startsWith("A");
Predicate<String> endsWithX = (text) -> text.endsWith("x");
Predicate<String> startsWithAAndEndsWithX =
(text) -> startsWithA.test(text) && endsWithX.test(text);
String input = "A hardworking person must relax";
boolean result = startsWithAAndEndsWithX.test(input);
System.out.println(result);
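The same composition can also be expressed with the default method Predicate.and, which is itself a higher-order function. A minimal sketch (the class and constant names are chosen for this example):

```java
import java.util.function.Predicate;

class PredicateComposition {
    static final Predicate<String> STARTS_WITH_A = text -> text.startsWith("A");
    static final Predicate<String> ENDS_WITH_X = text -> text.endsWith("x");

    // and() builds a new predicate that short-circuits like &&.
    static final Predicate<String> STARTS_WITH_A_AND_ENDS_WITH_X =
            STARTS_WITH_A.and(ENDS_WITH_X);
}
```

Predicate also offers or() and negate(), so larger conditions can be assembled from small reusable pieces.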
One benefit not yet mentioned is my favorite: lambdas make deferred execution really easy to write.
Log4j2 uses this for example, where instead of passing a value to conditionally log (a value that may have been expensive to calculate), you can now pass a lambda to calculate that expensive value. The difference being that before, that value was being calculated every time whether it got used or not, whereas now with lambdas if your log level decides not to log that statement, then the lambda never gets called, and that expensive calculation never takes place -- a performance boost!
Could that be done without lambdas? Yes, by surrounding each log statement with if() checks, or using verbose anonymous class syntax, but at the cost of horrible code noise.
Similar examples abound. Lambdas are like having your cake and eating it too: all the efficiency of gnarly multi-line optimized code squeezed down into the visual elegance of one-liners.
Edit: As requested by commenter, an example:
Old way, where expensiveCalculation() always gets called regardless of whether this log statement will actually use it:
logger.trace("expensive value was {}", expensiveCalculation());
New lambda efficient way, where expensiveCalculation() call won't happen unless trace log level is enabled:
logger.trace("expensive value was {}", () -> expensiveCalculation());
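To show the mechanism without depending on any logging library, here is a toy logger written for this example (TinyLogger is hypothetical and is not the Log4j2 API). It accepts a Supplier and only calls it when tracing is enabled:

```java
import java.util.function.Supplier;

class TinyLogger {
    private final boolean traceEnabled;
    private final StringBuilder out = new StringBuilder();

    TinyLogger(boolean traceEnabled) {
        this.traceEnabled = traceEnabled;
    }

    // The supplier is evaluated only if the message will actually be logged.
    void trace(String prefix, Supplier<String> value) {
        if (traceEnabled) {
            out.append(prefix).append(value.get()).append('\n');
        }
    }

    String output() {
        return out.toString();
    }
}
```

A call such as logger.trace("value was ", () -> expensiveCalculation()) then costs little more than creating the lambda when tracing is off.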
I want to increment the value of index by 1 with each iteration. This is easily achieved in a for-loop. The variable image is an array of ImageView.
Here is my for-loop.
for (Map.Entry<String, Item> entry : map.entrySet()) {
image[index].setImage(entry.getValue().getImage());
index++;
}
In order to practise Stream, I have tried to rewrite it to the Stream:
map.entrySet().stream()
    .forEach(e -> image[index++].setImage(e.getValue().getImage()));
Causing me the error:
error: local variables referenced from a lambda expression must be final or effectively final
How can I rewrite this as a Stream that increments the variable index as it goes?
You shouldn't. These two look similar, but they are conceptually different. The loop is just a loop, but a forEach instructs the library to perform the action on each element, without specifying either the order of actions (for parallel streams) or the threads which will execute them. If you use forEachOrdered, then there are still no guarantees about threads, but at least you have the guarantee of a happens-before relationship between actions on subsequent elements.
Note especially that the docs say:
For any given element, the action may be performed at whatever time
and in whatever thread the library chooses. If the action accesses
shared state, it is responsible for providing the required
synchronization.
As @Marko noted in the comments below, though, it only applies to parallel streams, even if the wording is a bit confusing. Nevertheless, using a loop means that you don't even have to worry about all this complicated stuff!
So the bottom line is: use loops if that logic is a part of the function it's in, and use forEach if you just want to tell Java to “do this and that” to elements of the stream.
That was about forEach vs loops. Now on the topic of why the variable needs to be final in the first place, and why you can do that to class fields and array elements. It's because, like it says, Java has the limitation that anonymous classes and lambdas can't access a local variable unless it never changes. That means not only that they can't change it themselves, but also that you can't change it outside them. But that only applies to local variables, which is why it works for everything else like class fields or array elements.
The reason for this limitation, I think, is lifetime issues. A local variable exists only while the block containing it is executing. Everything else exists while there are references to it, thanks to garbage collection. And that everything else includes lambdas and anonymous classes too, so if they could modify local variables which have different lifetime, that could lead to problems similar to dangling references in C++. So Java took the easy way out: it simply copies the local variable at the time the lambda / anonymous class is created. But that would lead to confusion if you could change that variable (because the copy wouldn't change, and since the copy is invisible it would be very confusing). So Java just forbids any changes to such variables, and that's that.
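The copying behaviour can be made visible with a sketch like this one (names invented for the example). Each loop iteration declares a fresh local variable, and each lambda keeps its own copy of the value, which stays usable after the loop and the method have finished:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

class CaptureByValue {
    static List<IntSupplier> suppliers(int n) {
        List<IntSupplier> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // 'i' itself changes, so it cannot be captured;
            // 'copy' is assigned once per iteration and is effectively final.
            int copy = i;
            result.add(() -> copy);
        }
        return result; // the locals are gone, the captured values are not
    }
}
```

Calling getAsInt() on the three suppliers returned by suppliers(3) yields 0, 1 and 2.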
There are many questions on the final variables and anonymous classes discussed already, like this one.
Some kind of "zip" operation would be helpful here, though standard Stream API lacks it. Some third-party libraries extending Stream API provide it, including my free StreamEx library:
IntStreamEx.ints() // get stream of numbers 0, 1, 2, ...
.boxed() // box them
.zipWith(StreamEx.ofValues(map)) // zip with map values
.forKeyValue((index, item) -> image[index].setImage(item.getImage()));
See zipWith documentation for more details. Note that your map should have meaningful order (like LinkedHashMap), otherwise this would be pretty useless...
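If a third-party library is not an option, a common alternative (not taken from the answers above, so treat it as one possible sketch) is to stream over the indices instead of the elements. Strings stand in for the ImageView and Item types of the question, and the class name is made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

class IndexedCopy {
    // Copies the map values into an array, position by position.
    static String[] copyValues(Map<String, String> map) {
        List<String> values = new ArrayList<>(map.values());
        String[] target = new String[values.size()];
        // The lambda parameter i is the index, so no mutable counter is needed.
        IntStream.range(0, values.size())
                 .forEach(i -> target[i] = values.get(i));
        return target;
    }
}
```

As with the zip approach, the result is only meaningful if the map has a defined iteration order, for example a LinkedHashMap.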
When I iterate over a collection using the new syntactic sugar of Java 8, such as
myStream.forEach(item -> {
// do something useful
});
Isn't this equivalent to the 'old syntax' snippet below?
myStream.forEach(new Consumer<Item>() {
    @Override
    public void accept(Item item) {
        // do something useful
    }
});
Does this mean a new anonymous Consumer object is created on the heap every time I iterate over a collection? How much heap space does this take? What performance implications does it have? Does it mean I should rather use the old style for loops when iterating over large multi-level data structures?
It is equivalent but not identical. Simply said, if a lambda expression does not capture values, it will be a singleton that is re-used on every invocation.
The behavior is not exactly specified. The JVM is given considerable freedom in how to implement it. Currently, Oracle’s JVM creates (at least) one instance per lambda expression (i.e. doesn’t share instances between different identical expressions) but creates singletons for all expressions which don’t capture values.
You may read this answer for more details. There, I not only gave a more detailed description but also testing code to observe the current behavior.
This is covered by The Java® Language Specification, chapter “15.27.4. Run-time Evaluation of Lambda Expressions”
Summarized:
These rules are meant to offer flexibility to implementations of the Java programming language, in that:
A new object need not be allocated on every evaluation.
Objects produced by different lambda expressions need not belong to different classes (if the bodies are identical, for example).
Every object produced by evaluation need not belong to the same class (captured local variables might be inlined, for example).
If an "existing instance" is available, it need not have been created at a previous lambda evaluation (it might have been allocated during the enclosing class's initialization, for example).
When an instance representing the lambda is created sensitively depends on the exact contents of your lambda's body. Namely, the key factor is what the lambda captures from the lexical environment. If it doesn't capture any state which is variable from creation to creation, then an instance will not be created each time the for-each loop is entered. Instead a synthetic method will be generated at compile time and the lambda use site will just receive a singleton object that delegates to that method.
Further note that this aspect is implementation-dependent and you can expect future refinements and advancements on HotSpot towards greater efficiency. There are general plans to e.g. make a lightweight object without a full corresponding class, which has just enough information to forward to a single method.
Here is a good, accessible in-depth article on the topic:
http://www.infoq.com/articles/Java-8-Lambdas-A-Peek-Under-the-Hood
You are passing a new instance to the forEach method. Every time you do that you create a new object but not one for every loop iteration. Iteration is done inside forEach method using the same 'callback' object instance until it is done with the loop.
So the memory used by the loop does not depend on the size of the collection.
Isn't this equivalent to the 'old syntax' snippet?
Yes. It has slight differences at a very low level but I don't think you should care about them. Lambda expressions use the invokedynamic feature instead of anonymous classes.