Reading up a bit on Java 8, I got to this blog post explaining a bit about streams and reduction of them, and when it would be possible to short-circuit the reduction. At the bottom it states:
Note in the case of findFirst or findAny we only need the first value which matches the predicate (although findAny is not guaranteed to return the first). However if the stream has no ordering then we’d expect findFirst to behave like findAny. The operations allMatch, noneMatch and anyMatch may not short-circuit the stream at all since it may take evaluating all the values to determine whether the operator is true or false. Thus an infinite stream using these may not terminate.
I get that findFirst or findAny may short-circuit the reduction, because as soon as you find an element, you don't need to process any further.
But why would this not be possible for allMatch, noneMatch and anyMatch? For allMatch, if you find one which doesn't match the predicate, you can stop processing. Same for noneMatch. And anyMatch especially doesn't make sense to me, as it is pretty much equivalent to findAny (except for what is returned)?
Saying that these three may not short-circuit, because it may take evaluating all the values, could also be said for findFirst/Any.
Is there some fundamental difference I'm missing? Am I not really understanding what is going on?
There's a subtle difference, because the anyMatch family uses a predicate, while the findAny family does not. Technically findAny() looks like anyMatch(x -> true) and anyMatch(pred) looks like filter(pred).findAny(). So here we have another issue. Consider a simple infinite stream:
Stream<Integer> s = Stream.generate(() -> 1);
So it's true that applying findAny() to such a stream will always short-circuit and finish, while applying anyMatch(pred) depends on the predicate. However, let's filter our infinite stream:
Stream<Integer> s = Stream.generate(() -> 1).filter(x -> x < 0);
Is the resulting stream infinite as well? That's a tricky question. It actually contains no elements, but to determine this (for example, using .iterator().hasNext()) we have to check an infinite number of underlying stream elements, so this operation will never finish. I would call such a stream infinite as well. However, with such a stream both anyMatch and findAny will never finish:
Stream.generate(() -> 1).filter(x -> x < 0).anyMatch(x -> true);
Stream.generate(() -> 1).filter(x -> x < 0).findAny();
So findAny() is not guaranteed to finish either, it depends on the previous intermediate stream operations.
To conclude, I would rate that blog post as very misleading. In my opinion, infinite stream behavior is better explained in the official JavaDoc.
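To make the point above concrete, here is a minimal sketch (class and method names are made up for illustration). It counts how many elements a sequential pipeline actually pulls. The limit(1_000) in the last method is not part of the original examples; it is added only so that the sketch terminates, since without it the pipeline would spin forever as described above.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class InfiniteShortCircuitSketch {

    // anyMatch on an infinite stream stops at the first matching element.
    static int pulledByAnyMatch() {
        AtomicInteger pulled = new AtomicInteger();
        Stream.iterate(1, x -> x + 1)                  // 1, 2, 3, ... (infinite)
                .peek(x -> pulled.incrementAndGet())   // count elements that flow through
                .anyMatch(x -> x > 5);                 // first match is 6
        return pulled.get();
    }

    // anyMatch(pred) and filter(pred).findAny() answer the same question.
    static boolean sameAnswerViaFilterFindAny() {
        boolean viaAnyMatch = Stream.iterate(1, x -> x + 1).anyMatch(x -> x > 5);
        Optional<Integer> viaFindAny =
                Stream.iterate(1, x -> x + 1).filter(x -> x > 5).findAny();
        return viaAnyMatch == viaFindAny.isPresent();
    }

    // A filter that rejects everything forces findAny to drain the source.
    // limit(1_000) is an assumption added so this sketch finishes.
    static int pulledWhenNothingMatches() {
        AtomicInteger pulled = new AtomicInteger();
        Stream.generate(() -> 1)
                .limit(1_000)
                .peek(x -> pulled.incrementAndGet())
                .filter(x -> x < 0)                    // never true
                .findAny();                            // ends up empty
        return pulled.get();
    }

    public static void main(String[] args) {
        System.out.println(pulledByAnyMatch());
        System.out.println(sameAnswerViaFilterFindAny());
        System.out.println(pulledWhenNothingMatches());
    }
}
```

So whether a short-circuiting terminal operation actually stops early depends on an element reaching it, not on which of the five operations it is.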
Answer Updated
I'd say the blog post is wrong when it says "findFirst or findAny we only need the first value which matches the predicate".
In the javadoc for allMatch(Predicate), anyMatch(Predicate), noneMatch(Predicate), findAny(), and findFirst():
This is a short-circuiting terminal operation.
However, note that findFirst and findAny don't take a Predicate. So they can both return immediately upon seeing the first/any value. The other three are conditional and may loop forever if the condition never fires.
According to Oracle's Stream Documentation:
https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#StreamOps
A terminal operation is short-circuiting if, when presented with infinite input, it may terminate in finite time. Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for the processing of an infinite stream to terminate normally in finite time.
All five functions have the line:
This is a short-circuiting terminal operation.
In the description of the function.
When the blog post says "may not short-circuit", it is merely pointing out that the operation is not guaranteed to stop early: depending on the values, the entire stream may be processed.
findFirst and findAny on the other hand, are guaranteed to short circuit since they never need to process the rest of the stream once they are satisfied.
anyMatch, noneMatch and allMatch return boolean values, so they may have to check all to prove the logic.
findFirst and findAny just care about finding the first they can and returning that.
Edit:
For a given dataset the Match methods are guaranteed to always return the same value, however the Find methods are not because the order may vary and affect which value is returned.
The short circuiting described is talking about the Find methods lacking consistency for a given dataset.
LongStream.range(0, Long.MAX_VALUE).allMatch(x -> x >= 0)
LongStream.range(0, Long.MAX_VALUE).allMatch(x -> x > 0)
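The first one runs practically forever, because every value in the range satisfies the predicate and all of them must be checked; the second one returns immediately, because the very first value (0) already fails it.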
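A rough way to see the difference without waiting forever is to count predicate evaluations over a smaller range. This is only a sketch: the class name, the helper and the bound of 1,000,000 are illustrative choices, not part of the original example.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongPredicate;
import java.util.stream.LongStream;

class AllMatchCostSketch {

    // Returns how many times allMatch evaluated the predicate over [0, upperBound).
    static long evaluations(long upperBound, LongPredicate predicate) {
        AtomicLong calls = new AtomicLong();
        LongStream.range(0, upperBound).allMatch(x -> {
            calls.incrementAndGet();
            return predicate.test(x);
        });
        return calls.get();
    }

    public static void main(String[] args) {
        // Every element matches, so nothing can be skipped.
        System.out.println(evaluations(1_000_000, x -> x >= 0));
        // The first element (0) fails, so the result is known at once.
        System.out.println(evaluations(1_000_000, x -> x > 0));
    }
}
```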
Related
I have run the following code in Eclipse:
Stream.generate(() -> "Elsa")
.filter(n -> n.length() ==4)
.sorted()
.limit(2)
.forEach(System.out::println);
The output is:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
What I was expecting since the limit is two:
Elsa
Elsa
Can someone please explain why this is an infinite stream?
The first thing is that Stream::generate creates an infinite stream. That's why the stream is initially infinite.
You limit the stream to two elements by using Stream::limit, which would make it finite.
However, the problem is that you call sorted(), which tries to consume the whole stream. You need to limit the stream before you sort:
Stream.generate(() -> "Elsa")
.filter(n -> n.length() == 4)
.limit(2)
.sorted()
.forEach(System.out::println);
The documentation says that Stream::sorted() "is a stateful intermediate operation". The Streams documentation about a stateful intermediate operation explains it very well:
Stateful operations may need to process the entire input before producing a result. For example, one cannot produce any results from sorting a stream until one has seen all elements of the stream.
Emphasis mine.
There it is. Also note that for all Stream operations, their operation type is mentioned in the Javadocs.
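As a small sketch of the "limit before sorted" ordering (the class and method names are made up, and the descending numeric source is an added illustration so that the sort visibly does something):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LimitBeforeSortedSketch {

    // The pipeline from the question, with limit moved in front of sorted.
    static List<String> firstTwoSorted() {
        return Stream.generate(() -> "Elsa")     // infinite source
                .filter(n -> n.length() == 4)
                .limit(2)                         // bound the stream first
                .sorted()                         // now only two elements to sort
                .collect(Collectors.toList());
    }

    // Same idea with an infinite descending source: 5, 4, 3, ...
    static List<Integer> firstThreeSorted() {
        return Stream.iterate(5, x -> x - 1)
                .limit(3)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstTwoSorted());
        System.out.println(firstThreeSorted());
    }
}
```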
Can someone please explain why this is an infinite stream?
Because the javadoc says that is precisely what Stream.generate() creates:
Returns an infinite sequential unordered stream where each element is generated by the provided Supplier
Then when you combine that with sorted(), you tell it to start a sort on an infinite sequence which will obviously cause the JVM to run out of memory.
How do elements of a stream go through the stream itself? Does it take one element and pass it through all functions (map, then sort, then collect) and then take the second element and repeat the cycle, or does it take all elements, map them, then sort them and finally collect them?
new ArrayList<Integer>().stream()
.map(x -> x.byteValue())
.sorted()
.collect(Collectors.toList());
It depends entirely on the stream. It is usually evaluated lazily, which means it takes it one element at a time, but under certain conditions it needs to get all the elements before it continues to the next step. For example, consider the following code:
IntStream.generate(() -> (int) (Math.random() * 100))
.limit(20)
.filter(i -> i % 2 == 0)
.sorted()
.forEach(System.out::println);
This stream generates random numbers from 0 to 99, limited to 20 elements, and then filters them by checking whether or not they are even; the even ones continue. Up to this point, processing is done one element at a time. The change comes when you request a sorting of the stream. The sorted() method sorts the stream by the natural ordering of the elements, or by a provided comparator. To sort something you need access to all elements, because you don't know the last element's value until you get it; it could be the first element after sorting. So this method waits for the entire stream, sorts it and returns the sorted stream. After that, the code just prints the sorted stream one element at a time.
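One way to observe this is to record events on both sides of the stage in question. The sketch below uses a fixed three-element source instead of random numbers so the trace is predictable; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

class SortedBarrierSketch {

    // With sorted() in the middle, all "before" events happen first.
    static List<String> traceWithSorted() {
        List<String> events = new ArrayList<>();
        IntStream.of(3, 1, 2)
                .peek(i -> events.add("before " + i))
                .sorted()                              // stateful: buffers everything
                .peek(i -> events.add("after " + i))
                .forEach(i -> { });
        return events;
    }

    // With a stateless stage in the middle, elements travel one at a time.
    static List<String> traceWithMap() {
        List<String> events = new ArrayList<>();
        IntStream.of(3, 1, 2)
                .peek(i -> events.add("before " + i))
                .map(i -> i)                           // stateless: passes elements straight on
                .peek(i -> events.add("after " + i))
                .forEach(i -> { });
        return events;
    }

    public static void main(String[] args) {
        System.out.println(traceWithSorted());
        System.out.println(traceWithMap());
    }
}
```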
That depends on the actual Stream implementation. This mostly applies to parallel streams, because spliterators tend to split the data into chunks and you don't know which element will be processed when.
In general, a stream goes through each element in order (but doesn't have to). The simplest way to check this behaviour is to put in some breakpoints and see when they actually hit.
Also, certain operations may wait until all prior operations are executed (namely collect()).
I advise to check the javadoc and read it carefully, because it gives away enough hints to get an expectation.
Something like this, yes.
If you have a stream of integers, let's say 1,2,3,4,5, and you do some operations on it, let's say stream().map(x -> x*3).filter(x -> x%2==0).findFirst(),
it will first take the first value (1), multiply it by 3, and then check whether the result is even.
Because it's not, it will take the second one (2), multiply it by 3 (=6), check whether it is even (it is), and pass it to findFirst.
This will be the first one, so now it stops and returns.
Which means the other integers from the stream won't be evaluated (multiplied and checked for evenness), as that is not necessary.
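The walk-through above can be checked with a small sketch that records which source elements are ever touched (names are illustrative; the peek is added only for observation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

class LazyFindFirstSketch {

    // The pipeline from the explanation above.
    static Optional<Integer> firstEvenTriple() {
        return Stream.of(1, 2, 3, 4, 5)
                .map(x -> x * 3)
                .filter(x -> x % 2 == 0)
                .findFirst();
    }

    // Same pipeline, recording every element that enters map().
    static List<Integer> visitedBeforeFindFirst() {
        List<Integer> visited = new ArrayList<>();
        Stream.of(1, 2, 3, 4, 5)
                .peek(visited::add)
                .map(x -> x * 3)
                .filter(x -> x % 2 == 0)
                .findFirst();
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(firstEvenTriple());
        System.out.println(visitedBeforeFindFirst());
    }
}
```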
I found a quiz about Java 8 Stream API of peek method as below
Arrays.asList("Fred", "Jim", "Sheila")
.stream()
.peek(System.out::println)
.allMatch(s -> s.startsWith("F"));
The output is
Fred
Jim
I am confused how this stream works? My expected result should be
Fred
Jim
Sheila
The peek() method is an intermediate operation and it processes each element in the Stream. Can anyone explain this to me?
It's a stream optimization known as short-circuiting. Essentially, what happens is that allMatch prevents the execution of unnecessary intermediate operations on the stream, because there is no point in performing them when the final result is known.
It's as though this happened:
take "Fred"
peek("Fred")
evaluate("Fred".startsWith("F"))
decide whether the result of allMatch() is known for sure: Not yet
take "Jim"
peek("Jim")
evaluate("Jim".startsWith("F"))
decide whether the result of allMatch() is known for sure: Yes
When "Jim".startsWith("F") is evaluated, the result of allMatch(s -> s.startsWith("F")) is known for certain. It doesn't matter what values come in the pipeline after "Jim": we already know that not all values start with "F".
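The steps listed above behave roughly like a hand-written loop with an early return. The sketch below is only an analogy for a sequential stream, not how the JDK implements it; peekThenAllMatch and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

class AllMatchLoopSketch {

    // Loop analogy of source.stream().peek(action).allMatch(predicate).
    static <T> boolean peekThenAllMatch(Iterable<T> source,
                                        Consumer<? super T> action,
                                        Predicate<? super T> predicate) {
        for (T element : source) {             // take
            action.accept(element);            // peek
            if (!predicate.test(element)) {    // evaluate
                return false;                  // result is known: stop taking elements
            }
        }
        return true;                           // all matched, or the source was empty
    }

    // Runs the quiz example and returns the elements that peek saw.
    static List<String> seenInQuizExample() {
        List<String> seen = new ArrayList<>();
        peekThenAllMatch(Arrays.asList("Fred", "Jim", "Sheila"),
                seen::add,
                s -> s.startsWith("F"));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(seenInQuizExample());
    }
}
```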
This is not specific to the peek/allMatch combination, there are multiple intermediate and terminal short-circuiting operations. java.util.stream package's docs state:
Further, some operations are deemed short-circuiting operations. An intermediate operation is short-circuiting if, when presented with infinite input, it may produce a finite stream as a result. A terminal operation is short-circuiting if, when presented with infinite input, it may terminate in finite time. Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for the processing of an infinite stream to terminate normally in finite time.
Extend this to finite streams, and short-circuiting operations obviate the execution of unnecessary pipeline steps, as in the case of your example.
Arrays.asList("Fred", "Jim", "Sheila")
.stream()
.peek(System.out::println)
.allMatch(s -> s.startsWith("F"));
First time through, Fred is printed. It matches, so allMatch continues.
Second time through, Jim is printed. It doesn't match, so allMatch
terminates because "All didn't match".
So the last item was not consumed from the stream.
The docs for the peek method say (emphasis mine):
Returns a stream consisting of the elements of this stream, additionally performing the provided action on each element as elements are consumed from the resulting stream.
So in this case, peek doesn't see "Sheila" because that value is not consumed from the stream. As soon as "Jim" was consumed, the result of .allMatch(s -> s.startsWith("F")) is already known to be false, so there is no need to consume any more elements from the stream.
As per the Javadoc of allMatch():
Returns whether all elements of this stream match the provided predicate.
May not evaluate the predicate on all elements if not necessary for
determining the result. If the stream is empty then {@code true} is
returned and the predicate is not evaluated.
@apiNote
This method evaluates the universal quantification of the
predicate over the elements of the stream (for all x P(x)). If the
stream is empty, the quantification is said to be vacuously
satisfied and is always {@code true} (regardless of P(x)).
@param predicate a non-interfering, stateless predicate to apply to elements of this stream
@return {@code true} if either all elements of the stream match the
provided predicate or the stream is empty, otherwise {@code false}
In your case:
1-
p(x) : s -> s.startsWith("F")
X : "Fred"
result : P(X) = true
2-
p(x) : s -> s.startsWith("F")
X : "Jim"
result : P(X) = false
No further evaluation will take place, because P(X) = false for this element, so "for all x P(x)" is already false.
boolean result = Arrays.asList("Fred", "Finda", "Fish")
.stream()
.peek(System.out::println)
.allMatch(s -> s.startsWith("F"));
System.out.println("Result "+result);
Output is :
Fred
Finda
Fish
Result true
Here the stream was processed completely because P(x) = true for every element.
I am reading about Java streams' short-circuiting operations and found in some articles that skip() is a short-circuiting operation.
In another article they didn't mention skip() as a short-circuiting operation.
Now I am confused; is skip() a short-circuiting operation or not?
From the java doc under the "Stream operations and pipelines" section :
An intermediate operation is short-circuiting if, when presented with infinite input, it may produce a finite stream as a result. A terminal operation is short-circuiting if, when presented with infinite input, it may terminate in finite time.
Emphasis mine.
If you were to call skip on an infinite input, it would not produce a finite stream, hence it is not a short-circuiting operation.
The only short-circuiting intermediate operation in JDK8 is limit as it allows computations on infinite streams to complete in finite time.
Example:
If you were to execute this program with the use of skip:
String[] skip = Stream.generate(() -> "test") // returns an infinite stream
.skip(20)
.toArray(String[]::new);
It will not produce a finite stream, hence you would eventually end up with something along the lines of "java.lang.OutOfMemoryError: Java heap space".
Whereas if you were to execute this program with the use of limit, the computation finishes in finite time:
String[] limit = Stream.generate(() -> "test") // returns an infinite stream
.limit(20)
.toArray(String[]::new);
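The two can of course be combined: skip drops the prefix, and a following limit is what makes the result finite. A minimal sketch (the numeric source and the names are illustrative, chosen so the effect of skip is visible):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SkipVersusLimitSketch {

    // skip(20) alone would leave the stream infinite; limit(5) bounds it.
    static List<Integer> fiveAfterSkippingTwenty() {
        return Stream.iterate(0, i -> i + 1)   // 0, 1, 2, ... (infinite)
                .skip(20)                       // still infinite: 20, 21, 22, ...
                .limit(5)                       // finite: five elements
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fiveAfterSkippingTwenty());
    }
}
```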
Just want to add my two cents here: the general idea of short-circuiting a stream is infinitely complicated (at least to me, and at least in the sense that I usually have to scratch my head twice). I will get to skip at the end of the answer, by the way.
Let's take this for example:
Stream.generate(() -> Integer.MAX_VALUE);
This is an infinite stream, we can all agree on this. Let's short-circuit it via an operation that is documented to be as such (unlike skip):
Stream.generate(() -> Integer.MAX_VALUE).anyMatch(x -> true);
This works nicely, how about adding a filter:
Stream.generate(() -> Integer.MAX_VALUE)
.filter(x -> x < 100) // well sort of useless...
.anyMatch(x -> true);
What will happen here? Well, this never finishes, even if there is a short-circuiting operation like anyMatch - but it's never reached to actually short-circuit anything.
On the other hand, filter is not a short-circuiting operation, but you can make it as such (just as an example):
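someList.stream()
        .filter(x -> {
            // throwing aborts the traversal ("new" was missing in the original snippet)
            if (x > 3) throw new AssertionError("Just because");
            return true; // a Predicate must return a boolean for elements that pass
        })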
Yes, it's ugly, but it's short-circuiting... That's how we (emphasis on we, since lots of people disagree) implement a short-circuiting reduce: throw an Exception that has no stack trace.
In java-9 there was an addition of another intermediate operation that is short-circuiting: takeWhile that acts sort of like limit but for a certain condition.
And to be fair, the bulk of the answer about skip was already given by Aomine, but the simplest answer is that it is not documented as such. In general (there are cases when the documentation is corrected), that is the number one indication you should look at. See limit and takeWhile, for example, whose documentation clearly says:
This is a short-circuiting stateful intermediate operation
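For completeness, a minimal takeWhile sketch. It assumes Java 9 or later; the class name and the doubling source are illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TakeWhileSketch {

    // takeWhile ends the stream at the first element that fails the condition.
    static List<Integer> powersOfTwoBelow100() {
        return Stream.iterate(1, x -> x * 2)   // 1, 2, 4, 8, ... (infinite)
                .takeWhile(x -> x < 100)        // stops when 128 is seen
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(powersOfTwoBelow100());
    }
}
```

Replacing takeWhile with filter here would not terminate, because filter keeps asking the source for more elements.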
Why does this code in Java 8:
IntStream.range(0, 10)
.peek(System.out::print)
.limit(3)
.count();
outputs:
012
I'd expect it to output 0123456789, because peek precedes limit.
It seems to me even more peculiar because of the fact that this:
IntStream.range(0, 10)
.peek(System.out::print)
.map(x -> x * 2)
.count();
outputs 0123456789 as expected (not 024681012141618).
P.S.: .count() here is used just to consume stream, it can be replaced with anything else
The most important thing to know about streams is that they do not contain elements themselves (like collections) but work like a pipe whose values are lazily evaluated. That means that the statements that build up a stream - including mapping, filtering, or whatever - are not evaluated until the terminal operation runs.
In your first example, the stream tries to count from 0 to 9, one at a time, doing the following for each value:
print out the value
check whether 3 values are passed (if yes, terminate)
So you really get the output 012.
In your second example, the stream again counts from 0 to 9, one at a time, doing the following for each value:
print out the value
map x to x*2, thus forwarding double the value to the next step
As you can see, the output comes before the mapping, and thus you get the result 0123456789. If you switch the peek and the map calls, the doubled values are printed instead: 024681012141618.
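A small sketch of both orderings, collecting the output into a string instead of printing it (names are illustrative). Note that it uses forEach rather than count() as the terminal operation: on Java 9 and later, count() may skip the pipeline entirely when the size is known from the source, so peek might never run.

```java
import java.util.stream.IntStream;

class PeekOrderSketch {

    // peek sees the original values because it sits before map.
    static String peekBeforeMap() {
        StringBuilder out = new StringBuilder();
        IntStream.range(0, 10)
                .peek(x -> out.append(x))
                .map(x -> x * 2)
                .forEach(x -> { });
        return out.toString();
    }

    // peek sees the doubled values because it sits after map.
    static String peekAfterMap() {
        StringBuilder out = new StringBuilder();
        IntStream.range(0, 10)
                .map(x -> x * 2)
                .peek(x -> out.append(x))
                .forEach(x -> { });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(peekBeforeMap());
        System.out.println(peekAfterMap());
    }
}
```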
From the docs:
limit() is a short-circuiting stateful intermediate operation.
map() is an intermediate operation
Again from the docs, what that essentially means is that limit(x) returns a stream with at most x values from the stream it received:
An intermediate operation is short-circuiting if, when presented with infinite input, it may produce a finite stream as a result.
Streams are defined to do lazy processing. So in order to complete your count() operation it doesn’t need to look at the other items. Otherwise, it would be broken, as limit(…) is defined to be a proper way of processing infinite streams in a finite time (by not processing more than limit items).
In principle, it would be possible to complete your request without ever looking at the int values at all, as the operation chain limit(3).count() doesn’t need any processing of the previous operations (other than verifying whether the stream has at least 3 items).
Streams use lazy evaluation: the intermediate operations, e.g. peek(), are not executed until the terminal operation runs.
For instance, the following code will just print 1. As soon as the first element of the stream, 1, reaches the terminal operation, findAny(), the stream execution ends.
Arrays.asList(1,2,3)
.stream()
.peek(System.out::print)
.filter((n)->n<3)
.findAny();
Conversely, the following example prints 123. The terminal operation, noneMatch(), needs to evaluate all the elements of the stream in order to make sure there is no match with its Predicate: n > 4
Arrays.asList(1, 2, 3)
.stream()
.peek(System.out::print)
.noneMatch(n -> n > 4);
For future readers struggling to understand how the count method doesn't execute the peek method before it, I thought I'd add this additional note:
As of Java 9, the Java documentation for the count method states that:
An implementation may choose to not execute the stream pipeline
(either sequentially or in parallel) if it is capable of computing the
count directly from the stream source.
This means terminating the stream with count is no longer enough to ensure the execution of all previous steps, such as peek.
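If the side effect matters, one option is to end the pipeline with an operation that has to traverse the elements, such as forEach. The sketch below (illustrative names) contrasts the two; how many times peek runs in the count() variant depends on the JDK version, so no output is claimed for that line.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

class CountElisionSketch {

    // forEach must visit every element, so peek runs for each of them.
    static int peeksWithForEach() {
        AtomicInteger peeks = new AtomicInteger();
        Arrays.asList(1, 2, 3).stream()
                .peek(x -> peeks.incrementAndGet())
                .forEach(x -> { });
        return peeks.get();
    }

    // count() always returns the right number, but may not run peek at all.
    static long countWithPeek(AtomicInteger peeks) {
        return Arrays.asList(1, 2, 3).stream()
                .peek(x -> peeks.incrementAndGet())
                .count();
    }

    public static void main(String[] args) {
        System.out.println(peeksWithForEach());
        AtomicInteger peeks = new AtomicInteger();
        System.out.println(countWithPeek(peeks));
        // JDK-dependent: the pipeline may or may not have been executed.
        System.out.println("peek calls under count(): " + peeks.get());
    }
}
```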