I found a quiz about the Java 8 Stream API's peek method, as below:
Arrays.asList("Fred", "Jim", "Sheila")
.stream()
.peek(System.out::println)
.allMatch(s -> s.startsWith("F"));
The output is
Fred
Jim
I am confused about how this stream works. My expected result is:
Fred
Jim
Sheila
The peek() method is an intermediate operation and it processes each element in the stream. Can anyone explain this to me?
It's a stream optimization known as short-circuiting. Essentially, what happens is that allMatch prevents the execution of unnecessary intermediate operations on the stream, because there is no point in performing them when the final result is known.
It's as though this happened:
take"Fred"
peek("Fred")
evaluate("Fred".startsWith("F"))
decide whether the result of allMatch() is known for sure: Not yet
take"Jim"
peek("Jim")
evaluate("Jim".startsWith("F"))
decide whether the result of allMatch() is known for sure: Yes
When "Jim".startsWith("F") is evaluated, the result of allMatch(s -> s.startsWith("F")) is known for certain. It doesn't matter what values come in the pipeline after "Jim", we know that all values start with "F" is false
This is not specific to the peek/allMatch combination; there are multiple intermediate and terminal short-circuiting operations. The java.util.stream package documentation states:
Further, some operations are deemed short-circuiting operations. An intermediate operation is short-circuiting if, when presented with infinite input, it may produce a finite stream as a result. A terminal operation is short-circuiting if, when presented with infinite input, it may terminate in finite time. Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for the processing of an infinite stream to terminate normally in finite time.
Extend this to finite streams, and short-circuiting operations obviate the execution of unnecessary pipeline steps, as in the case of your example.
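To make the infinite-input wording concrete, here is a small sketch of my own (not part of the original question): anyMatch is a short-circuiting terminal operation, so it stops pulling elements from an infinite source as soon as the answer is certain.
IntStream.iterate(0, i -> i + 1)                   // infinite source: 0, 1, 2, ...
        .peek(i -> System.out.println("pulled: " + i))
        .anyMatch(i -> i > 2);                     // true; only 0, 1, 2, 3 are ever pulled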
Arrays.asList("Fred", "Jim", "Sheila")
.stream()
.peek(System.out::println)
.allMatch(s -> s.startsWith("F"));
First time through, Fred is printed. It matches, so evaluation continues.
Second time through, Jim is printed. It doesn't match, so allMatch
terminates, because "all match" can no longer be true.
So the last item was not consumed from the stream.
The docs for the peek method say (emphasis mine):
Returns a stream consisting of the elements of this stream, additionally performing the provided action on each element as elements are consumed from the resulting stream.
So in this case, peek doesn't see "Sheila" because that value is not consumed from the stream. As soon as "Jim" was consumed, the result of .allMatch(s -> s.startsWith("F")) is already known to be false, so there is no need to consume any more elements from the stream.
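For contrast, a small sketch of my own: a terminal operation that always consumes every element, such as forEach, makes peek see all three names.
Arrays.asList("Fred", "Jim", "Sheila")
        .stream()
        .peek(System.out::println)   // prints Fred, Jim and Sheila
        .forEach(s -> { });          // forEach consumes every element, nothing is skipped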
As per the Javadoc of allMatch():
Returns whether all elements of this stream match the provided predicate.
May not evaluate the predicate on all elements if not necessary for
determining the result. If the stream is empty then {@code true} is
returned and the predicate is not evaluated.
@apiNote
This method evaluates the universal quantification of the
predicate over the elements of the stream (∀x P(x)). If the
stream is empty, the quantification is said to be vacuously
satisfied and is always {@code true} (regardless of P(x)).
@param predicate the predicate to apply to elements of this stream
@return {@code true} if either all elements of the stream match the
provided predicate or the stream is empty, otherwise {@code false}
In your case:
1-
P(x): s -> s.startsWith("F")
x: "Fred"
result: P(x) = true, so ∀x P(x) may still hold
2-
P(x): s -> s.startsWith("F")
x: "Jim"
result: P(x) = false, so ∀x P(x) = false
No further evaluation takes place, because ∀x P(x) is already known to be false.
boolean result = Arrays.asList("Fred", "Finda", "Fish")
.stream()
.peek(System.out::println)
.allMatch(s -> s.startsWith("F"));
System.out.println("Result "+result);
Output is:
Fred
Finda
Fish
Result true
Here the stream is processed completely, because P(x) = true for each element and therefore ∀x P(x) = true.
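To illustrate the "vacuously satisfied" part of the javadoc quoted above, here is a small sketch of my own: with an empty stream, the predicate is never evaluated and allMatch still returns true.
boolean empty = Stream.<String>empty()
        .peek(System.out::println)            // never called: there are no elements
        .allMatch(s -> s.startsWith("F"));    // true, vacuously
System.out.println("Result " + empty);        // Result true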
Related
I have a question about the sequencing of intermediate stages: is each stage's operation applied to all the items of the input stream, or are all the stages/operations applied to each stream item in turn?
I'm aware the question might not be easy to understand, so I'll give an example. Consider the following stream processing:
List<String> strings = Arrays.asList("Are Java streams intermediate stages sequential?".split(" "));
strings.stream()
.filter(word -> word.length() > 4)
.peek(word -> System.out.println("f: " + word))
.map(word -> word.length())
.peek(length -> System.out.println("m: " + length))
.forEach(length -> System.out.println("-> " + length + "\n"));
My expectation for this code is that it will output:
f: streams
f: intermediate
f: stages
f: sequential?
m: 7
m: 12
m: 6
m: 11
-> 7
-> 12
-> 6
-> 11
Instead, the output is:
f: streams
m: 7
-> 7
f: intermediate
m: 12
-> 12
f: stages
m: 6
-> 6
f: sequential?
m: 11
-> 11
Are the items merely displayed interleaved like this because of the console output, or are they actually processed through all the stages one at a time?
I can further detail the question, if it's not clear enough.
This behaviour enables optimisation of the code. If each intermediate operation were to process all elements of a stream before proceeding to the next intermediate operation then there would be no chance of optimisation.
So to answer your question, each element moves along the stream pipeline vertically one at a time (except for some stateful operations discussed later), therefore enabling optimisation where possible.
Explanation
Given the example you've provided, each element will move along the stream pipeline vertically one by one as there is no stateful operation included.
As another example, say you were looking for the first String whose length is greater than 4; processing all the elements before producing the result would be unnecessary and time-consuming.
Consider this simple illustration:
List<String> stringsList = Arrays.asList("1","12","123","1234","12345","123456","1234567");
int result = stringsList.stream()
.filter(s -> s.length() > 4)
.mapToInt(Integer::valueOf)
.findFirst().orElse(0);
The filter intermediate operation above does not find all the elements whose length is greater than 4 and return a new stream of them. Rather, as soon as the first element whose length is greater than 4 is found, it goes through to .mapToInt, then findFirst says "I've found the first element", and execution stops there. Therefore the result will be 12345.
Behaviour of stateful and stateless intermediate operations
Note that when a stateful intermediate operation such as sorted is included in a stream pipeline, that specific operation will traverse the entire stream. If you think about it, this makes complete sense: in order to sort elements you need to see all of them to determine which come first in the sort order.
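A small sketch of my own (sequential stream assumed) that makes this buffering visible with peek: every element passes the first peek before any element reaches the second one, even though findFirst only needs a single element.
Stream.of(3, 1, 2)
        .peek(n -> System.out.println("before sorted: " + n))
        .sorted()                                             // stateful: buffers all elements
        .peek(n -> System.out.println("after sorted: " + n))
        .findFirst();
// typically prints: before sorted: 3, before sorted: 1, before sorted: 2, after sorted: 1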
The distinct intermediate operation is also a stateful operation; however, as @Holger has mentioned, unlike sorted it does not require traversing the entire stream, as each distinct element can be passed down the pipeline immediately and may fulfil a short-circuiting condition.
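For instance, this sketch of mine terminates even though the source is infinite, because distinct forwards the first never-seen element immediately and findFirst then short-circuits (a sequential stream is assumed here).
IntStream.generate(() -> 42)   // infinite source of the same value
        .distinct()            // stateful, but forwards new elements as it sees them
        .findFirst();          // OptionalInt[42], returns immediately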
Stateless intermediate operations such as filter, map, etc. do not have to traverse the entire stream and can freely process one element at a time vertically, as mentioned above.
Last but not least, it is also important to note that when the terminal operation is a short-circuiting operation, it can finish before traversing all the elements of the underlying stream.
Further reading: Java 8 stream tutorial
The answer to your question is loop fusion. What we see is that the four intermediate operations filter() – peek() – map() – peek(), together with the terminal operation forEach() (which does the println), have been logically joined together to constitute a single pass. They are executed in order for each individual element. This joining together of operations into a single pass is an optimization technique known as loop fusion.
More reading: Source
Intermediate operations are always lazily executed. That is to say,
they are not run until a terminal operation is reached.
A few of the most popular intermediate operations used in a stream
filter – the filter operation returns a stream of elements that
satisfy the predicate passed in as a parameter to the operation. The
elements themselves before and after the filter will have the same
type; however, the number of elements will likely change.
map – the map operation returns a stream of elements after they have
been processed by the function passed in as a parameter. The
elements before and after the mapping may have a different type, but
there will be the same total number of elements.
distinct – the distinct operation is a special case of the filter
operation. Distinct returns a stream of elements such that each
element is unique in the stream, based on the equals method of the
elements
Source: java-8-streams-cheat-sheet
Apart from optimisation, the order of processing you'd describe wouldn't work for streams of indeterminate length, like this:
DoubleStream.generate(Math::random).filter(d -> d > 0.9).findFirst();
Admittedly this example doesn't make much sense in practice, but the point is that, rather than being backed by a fixed-size collection, DoubleStream.generate() creates a potentially infinite stream. The only way to process this is element by element.
Reading up a bit on Java 8, I came across this blog post explaining streams and their reduction, and when it is possible to short-circuit the reduction. At the bottom it states:
Note in the case of findFirst or findAny we only need the first value which matches the predicate (although findAny is not guaranteed to return the first). However if the stream has no ordering then we’d expect findFirst to behave like findAny. The operations allMatch, noneMatch and anyMatch may not short-circuit the stream at all since it may take evaluating all the values to determine whether the operator is true or false. Thus an infinite stream using these may not terminate.
I get that findFirst or findAny may short-circuit the reduction, because as soon as you find an element, you don't need to process any further.
But why would this not be possible for allMatch, noneMatch and anyMatch? For allMatch, if you find one element which doesn't match the predicate, you can stop processing. Same for noneMatch. And anyMatch especially doesn't make sense to me, as it is pretty much equal to findAny (except for what is returned)?
The claim that these three may not short-circuit, because it may take evaluating all the values, could equally be made for findFirst/findAny.
Is there some fundamental difference I'm missing? Am I not really understanding what is going on?
There's a subtle difference, because the anyMatch family uses a predicate, while the findAny family does not. Technically findAny() looks like anyMatch(x -> true) and anyMatch(pred) looks like filter(pred).findAny(). So here we have another issue. Consider a simple infinite stream:
Stream<Integer> s = Stream.generate(() -> 1);
So it's true that applying findAny() to such a stream will always short-circuit and finish, while whether anyMatch(pred) finishes depends on the predicate. However, let's filter our infinite stream:
Stream<Integer> s = Stream.generate(() -> 1).filter(x -> x < 0);
Is the resulting stream infinite as well? That's a tricky question. It actually contains no elements, but to determine this (for example, using .iterator().hasNext()) we have to check an infinite number of underlying stream elements, so this operation will never finish. I would call such a stream infinite as well. However, with such a stream both anyMatch and findAny will never finish:
Stream.generate(() -> 1).filter(x -> x < 0).anyMatch(x -> true);
Stream.generate(() -> 1).filter(x -> x < 0).findAny();
So findAny() is not guaranteed to finish either; it depends on the preceding intermediate stream operations.
To conclude, I would rate that blog post as very misleading. In my opinion, infinite-stream behavior is better explained in the official JavaDoc.
Answer Updated
I'd say the blog post is wrong when it says that for "findFirst or findAny we only need the first value which matches the predicate".
In the javadoc for allMatch(Predicate), anyMatch(Predicate), noneMatch(Predicate), findAny(), and findFirst():
This is a short-circuiting terminal operation.
However, note that findFirst and findAny don't take a Predicate, so they can both return immediately upon seeing the first/any value. The other three are conditional and may loop forever if the condition never fires.
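A small sketch of my own to make that concrete: the same short-circuiting terminal operation either finishes immediately or never, depending purely on the predicate and the data.
Stream.generate(() -> 1).anyMatch(x -> x > 0);   // true, returns immediately: the first element matches
Stream.generate(() -> 1).anyMatch(x -> x < 0);   // never returns: no element can ever match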
According to Oracle's Stream Documentation:
https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#StreamOps
A terminal operation is short-circuiting if, when presented with infinite input, it may terminate in finite time. Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for the processing of an infinite stream to terminate normally in finite time.
All five functions have the line
This is a short-circuiting terminal operation.
in their descriptions.
When the blog post says these operations "may not short-circuit", it is merely pointing out that, depending on the values, the entire stream may end up being processed.
findFirst and findAny, on the other hand, are guaranteed to short-circuit, since they never need to process the rest of the stream once they are satisfied.
anyMatch, noneMatch and allMatch return boolean values, so they may have to check all elements to prove their result.
findFirst and findAny just care about finding the first they can and returning that.
Edit:
For a given dataset, the Match methods are guaranteed to always return the same value; the Find methods are not, because the encounter order may vary and affect which value is returned.
The short-circuiting described is talking about the Find methods lacking this consistency for a given dataset.
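A sketch of my own that illustrates this consistency point on a parallel stream: findFirst is tied to the encounter order, while findAny may return a different element on each run.
IntStream.range(0, 1_000_000).parallel()
        .filter(i -> i % 2 == 0)
        .findFirst();   // always OptionalInt[0]
IntStream.range(0, 1_000_000).parallel()
        .filter(i -> i % 2 == 0)
        .findAny();     // some even number; may differ from run to run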
LongStream.range(0, Long.MAX_VALUE).allMatch(x -> x >= 0)
LongStream.range(0, Long.MAX_VALUE).allMatch(x -> x > 0)
The first one takes effectively forever to return, because every value satisfies the predicate and all of them must be checked; the second one returns immediately, because the first element, 0, already fails x > 0.
Why does this code in Java 8:
IntStream.range(0, 10)
.peek(System.out::print)
.limit(3)
.count();
outputs:
012
I'd expect it to output 0123456789, because peek precedes limit.
It seems even more peculiar to me because this:
IntStream.range(0, 10)
.peek(System.out::print)
.map(x -> x * 2)
.count();
outputs 0123456789 as expected (not 024681012141618).
P.S.: .count() here is used just to consume the stream; it can be replaced with anything else.
The most important thing to know about streams is that they do not contain elements themselves (like collections do) but work like a pipe whose values are lazily evaluated. That means the statements that build up a stream, including mapping, filtering, or whatever, are not evaluated until the terminal operation runs.
In your first example, the stream counts from 0 to 9, for each value doing the following:
print out the value
check whether 3 values have passed (if yes, terminate)
So you really get the output 012.
In your second example, the stream again counts from 0 to 9, for each value doing the following:
print out the value
map x to x*2, thus forwarding the doubled value to the next step
As you can see, the output happens before the mapping, and thus you get the result 0123456789. Try switching the peek and the map calls; then you will see the mapped values printed instead, as in the sketch below.
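For completeness, here is a sketch of mine with the two calls switched (on Java 8; as noted at the bottom of this page, Java 9's count() may skip the pipeline entirely):
IntStream.range(0, 10)
        .map(x -> x * 2)            // now the mapping happens first
        .peek(System.out::print)    // so peek sees the doubled values
        .count();
// prints 024681012141618 on Java 8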
From the docs:
limit() is a short-circuiting stateful intermediate operation.
map() is an intermediate operation
Again from the docs, what that essentially means is that limit(n) will return a stream with at most n values from the stream it received.
An intermediate operation is short-circuiting if, when presented with infinite input, it may produce a finite stream as a result.
Streams are defined to do lazy processing. So in order to complete your count() operation it doesn’t need to look at the other items. Otherwise, it would be broken, as limit(…) is defined to be a proper way of processing infinite streams in a finite time (by not processing more than limit items).
In principle, it would be possible to complete your request without ever looking at the int values at all, as the operation chain limit(3).count() doesn’t need any processing of the previous operations (other than verifying whether the stream has at least 3 items).
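That is also why a sketch like this one of mine terminates at all, even though the source never ends:
long n = IntStream.generate(() -> 1)   // infinite source
        .limit(3)                      // short-circuiting: stop pulling after 3 elements
        .count();                      // 3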
Streams use lazy evaluation: the intermediate operations, e.g. peek(), are not executed until the terminal operation runs.
For instance, the following code will just print 1. In fact, as soon as the first element of the stream, 1, reaches the terminal operation, findAny(), the stream execution ends.
Arrays.asList(1,2,3)
.stream()
.peek(System.out::print)
.filter((n)->n<3)
.findAny();
Conversely, in the following example, 123 will be printed. In fact, the terminal operation, noneMatch(), needs to evaluate all the elements of the stream in order to make sure there is no match for its predicate n > 4:
Arrays.asList(1, 2, 3)
.stream()
.peek(System.out::print)
.noneMatch(n -> n > 4);
For future readers struggling to understand why the count method may not execute the peek step before it, I thought I'd add this additional note:
As of Java 9, the Java documentation for the count method states that:
An implementation may choose to not execute the stream pipeline
(either sequentially or in parallel) if it is capable of computing the
count directly from the stream source.
This means terminating the stream with count is no longer enough to ensure the execution of all previous steps, such as peek.
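If you rely on peek's side effect and want it to run regardless, one possible workaround (my own suggestion, not taken from the documentation) is to end the pipeline with a terminal operation whose result cannot be computed from the source size alone, for example forEach:
IntStream.range(0, 10)
        .peek(System.out::print)
        .limit(3)
        .forEach(x -> { });   // forEach must process the elements, so peek runs and 012 is printed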