According to the OCP book, one must avoid stateful operations, otherwise known as stateful lambda expressions. The definition provided in the book is: 'a stateful lambda expression is one whose result depends on any state that might change during the execution of a pipeline.'
They provide an example where a parallel stream is used to add a fixed collection of numbers to a synchronized ArrayList using the .map() function.
The order in the ArrayList is completely random, which should make one see that a stateful lambda expression produces unpredictable results at runtime. That's why it's strongly recommended to avoid stateful operations when using parallel streams, so as to remove any potential data side effects.
They don't show a stateless lambda expression that solves the same problem (adding numbers to a synchronized ArrayList), and I still don't get what the problem is with using a map function to populate an empty synchronized ArrayList with data. What exactly is the state that might change during the execution of a pipeline? Are they referring to the ArrayList itself? For example, when another thread decides to add other data to the ArrayList while the parallel stream is still in the process of adding the numbers, thus altering the eventual result?
Maybe someone can provide me with a better example that shows what a stateful lambda expression is and why it should be avoided. That would be very much appreciated.
Thank you
The first problem is this:
List<Integer> list = new ArrayList<>();
List<Integer> result = Stream.of(1, 2, 3, 4, 5, 6)
        .parallel()
        .map(x -> {
            list.add(x);
            return x;
        })
        .collect(Collectors.toList());
System.out.println(list);
You have no idea what the result will be here, since you are adding elements to ArrayList, which is not a thread-safe collection.
But even if you do:
List<Integer> list = Collections.synchronizedList(new ArrayList<>());
and perform the same operation, the list still has no predictable order. Multiple threads add to this synchronized collection. By using the synchronized collection you guarantee that all elements are added (as opposed to the plain ArrayList), but the order in which they are added is unknown.
Notice that list has no ordering guarantees whatsoever (this is the processing order), while result is guaranteed to be [1, 2, 3, 4, 5, 6] for this particular example.
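To make the distinction concrete, here is a minimal sketch (not from the original answer) contrasting the unspecified processing order of forEach with the encounter order preserved by forEachOrdered on the same parallel stream:

Stream.of(1, 2, 3, 4, 5, 6)
        .parallel()
        .forEach(x -> System.out.print(x + " "));        // processing order, e.g. 4 5 6 2 1 3
System.out.println();
Stream.of(1, 2, 3, 4, 5, 6)
        .parallel()
        .forEachOrdered(x -> System.out.print(x + " ")); // encounter order: 1 2 3 4 5 6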
Depending on the problem, you can usually get rid of the stateful operations; for your example, returning the synchronized List would be:
Stream.of(1, 2, 3, 4, 5, 6)
        .filter(x -> x > 2) // for example a filter is present
        .collect(Collectors.collectingAndThen(Collectors.toList(),
                Collections::synchronizedList));
To try to give an example, let's consider the following Consumer (note: the usefulness of such a function is not the point here):
public static class StatefulConsumer implements IntConsumer {

    private static final Integer ARBITRARY_THRESHOLD = 10;
    private boolean flag = false;
    private final List<Integer> list = new ArrayList<>();

    @Override
    public void accept(int value) {
        if (flag) { // exit condition
            return;
        }
        if (value >= ARBITRARY_THRESHOLD) {
            flag = true;
        }
        list.add(value);
    }
}
It's a consumer that will add items to a List (let's not consider how to get back the list nor the thread safety) and has a flag (to represent the statefulness).
The logic behind this would be that once the threshold has been reached, the consumer should stop adding items.
What your book was trying to say was that because there is no guaranteed order in which the function will have to consume the elements of the Stream, the output is non-deterministic.
Thus, they advise you to only use stateless functions, meaning they will always produce the same result with the same input.
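For comparison, here is a stateless sketch of a similar idea (assumes Java 9+ for takeWhile; note that the semantics differ slightly, since takeWhile drops the element that reaches the threshold instead of including it):

// Sequential and stateless: no mutable flag, the pipeline itself stops
// once a value reaches the threshold.
List<Integer> collected = IntStream.of(3, 7, 12, 4, 20)
        .takeWhile(v -> v < 10)          // stops at the first value >= 10
        .boxed()
        .collect(Collectors.toList());   // [3, 7]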
Here is an example where a stateful operation returns a different result each time:
public static void main(String[] args) {
    Set<Integer> seen = new HashSet<>();
    IntStream stream = IntStream.of(1, 2, 3, 1, 2, 3);

    // Stateful lambda expression
    IntUnaryOperator mapUniqueLambda = (int i) -> {
        if (!seen.contains(i)) {
            seen.add(i);
            return i;
        } else {
            return 0;
        }
    };

    int sum = stream.parallel()
            .map(mapUniqueLambda)
            .peek(i -> System.out.println("Stream member: " + i))
            .sum();
    System.out.println("Sum: " + sum);
}
In my case when I ran the code I got the following output:
Stream member: 1
Stream member: 0
Stream member: 2
Stream member: 3
Stream member: 1
Stream member: 2
Sum: 9
Why did I get 9 as the sum if I'm inserting into a HashSet?
The answer: different threads took different parts of the IntStream, and since the check-then-add on the non-thread-safe HashSet is not atomic, both occurrences of a value can slip through.
For example, the two occurrences of 1 and of 2 ended up on different threads and were each counted twice.
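A stateless alternative (just a sketch, not from the original example) is to let the stream library handle the de-duplication internally with distinct(), which is safe in parallel and always produces the same sum:

int sum = IntStream.of(1, 2, 3, 1, 2, 3)
        .parallel()
        .distinct()   // keeps one occurrence of each value, handled safely by the library
        .sum();       // always 6
System.out.println("Sum: " + sum);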
A stateful lambda expression is one whose result depends on any state that might change during the execution of a pipeline. On the
other hand, a stateless lambda expression is one whose result does
not depend on any state that might change during the execution of a
pipeline.
Source: OCP: Oracle Certified Professional Java SE 8 Programmer II Study Guide: Exam 1Z0-809 by Jeanne Boyarsky and Scott Selikoff
List<Integer> data = Collections.synchronizedList(new ArrayList<>());
Arrays.asList(1, 2, 3, 4, 5, 6, 7).parallelStream()
        .map(i -> {
            data.add(i);
            return i;
        }) // AVOID STATEFUL LAMBDA EXPRESSIONS!
        .forEachOrdered(i -> System.out.print(i + " "));
System.out.println();
for (int e : data) {
    System.out.print(e + " ");
}
Possible Output:
1 2 3 4 5 6 7
1 7 5 2 3 4 6
It strongly recommended that you avoid stateful operations when using
parallel streams, so as to remove any potential data side effects. In
fact, they should generally be avoided in serial streams wherever
possible, since they prevent your streams from taking advantage of
parallelization.
A stateful lambda expression is one whose result depends on any state that might change during the execution of a stream pipeline.
Let's understand this with an example here:
List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
List<Integer> result = new ArrayList<Integer>();

list.parallelStream().map(s -> {
    synchronized (result) {
        if (result.size() < 10) {
            result.add(s);
        }
    }
    return s;
}).forEach(e -> {});

System.out.println(result);
When you run this code five times, the output could be different every time. The reason is that the lambda expression inside map updates the result list, and whether a given element gets added depends on the current size of that list as seen by a particular substream, which changes from run to run of this parallel stream.
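If the goal is simply to take the first 10 elements deterministically, a stateless sketch would be to let limit() do the work instead of mutating an external list from inside map():

List<Integer> firstTen = list.stream()      // works the same with parallelStream()
        .limit(10)                          // short-circuits after 10 elements
        .collect(Collectors.toList());      // [1, 2, ..., 10] on every run
System.out.println(firstTen);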
For a better understanding of parallel streams:
Parallel computing involves dividing a problem into subproblems, solving those problems simultaneously (in parallel, with each subproblem running in a separate thread), and then combining the results of the solutions to the subproblems. When a stream executes in parallel, the Java runtime partitions the streams into multiple substreams. Aggregate operations iterate over and process these substreams in parallel and then combine the results.
Hope this helps!!!
Related
So I have 2000 records of company names. I would like to take the first 50 names, concatenate them into a single string, and append that string to a new List, and so on for each batch of 50.
Does anyone have any idea how we can achieve this using Java 8? Can this be done using the parallel stream API?
Currently I'm iterating over the 2k records and appending the data to a StringBuilder. After every 50th record I add the StringBuilder content to a list and create a new StringBuilder. Finally I get a list with all the data.
Example: a1, a2, ... till a2000
Final output: List of String with
1st entry -> concatenation of a1 to a50,
2nd entry -> concatenation of a51 to a100
Code:
List<String> bulkEmails;                          // fetched from DB
int count = 0;
List<String> splitEmails = new ArrayList<>();     // final output
StringBuilder builder = new StringBuilder();      // temp builder

for (String mail : bulkEmails) {
    builder.append(mail).append(",");
    count++;
    // after every 50th mail, add the concatenated chunk to the output and reset
    if (count == 50) {
        splitEmails.add(builder.toString());
        builder = new StringBuilder();
        count = 0;
    }
}
Suggestions are appreciated.
As I've said in the comments, 2K records isn't really a massive amount of data.
And there's an issue with executing this task in parallel.
Let's imagine this task of splitting the data into groups of 50 elements is somehow running in parallel, and the first worker thread is assigned a chunk of data containing 523 elements, the second 518 elements, and so on, with none of these chunk sizes being a multiple of 50. As a consequence, each thread would produce a partial result whose last group has a size different from 50.
Depending on how you want to deal with such cases, there are different approaches on how to implement this functionality:
Join partial results as is. This implies that the final result could contain an arbitrary number of groups with sizes in the range [1, 49]. That is the simplest option to implement and the cheapest to execute. But note that there could even be a case where every resulting group is smaller than 50, since you're not in control of the spliterator implementation (i.e. you can't dictate how large a chunk of data a particular thread will be assigned to work with). Regardless of how strict or lenient your requirements are, that doesn't sound very nice.
The second option requires reordering the elements. While joining the partial results produced by each thread in parallel, we can merge the last two groups produced by every thread to ensure that at most one group in the final result differs in size from 50.
If you're not OK with either joining partial results as is or reordering the elements, that implies this task isn't suitable for parallel execution, because whenever the first thread produces a partial result containing a group smaller than 50, all the groups created by the second thread need to be rearranged. That results in worse performance in parallel, because the same job is done twice.
The second thing we need to consider is that the operation of creating groups requires maintaining state. Therefore, the right place for this transformation is inside a collector, where the stream is being consumed and the collector's mutable container gets updated.
Let's start with implementing a Collector which ignores the issue described above and joins partial results as is.
For that we can use static method Collector.of().
public static <T> Collector<T, ?, Deque<List<T>>> getGroupCollector(int groupSize) {
    return Collector.of(
        ArrayDeque::new,
        (Deque<List<T>> deque, T next) -> {
            if (deque.isEmpty() || deque.getLast().size() == groupSize) deque.add(new ArrayList<>());
            deque.getLast().add(next);
        },
        (left, right) -> {
            left.addAll(right);
            return left;
        }
    );
}
Now let's implement a Collector which merges the two last groups produced in different threads (option 2 in the list above):
public static <T> Collector<T, ?, Deque<List<T>>> getGroupCollector(int groupSize) {
    return Collector.of(
        ArrayDeque::new,
        (Deque<List<T>> deque, T next) -> {
            if (deque.isEmpty() || deque.getLast().size() == groupSize) deque.add(new ArrayList<>());
            deque.getLast().add(next);
        },
        (left, right) -> {
            if (left.peekLast().size() < groupSize) {
                List<T> leftLast = left.pollLast();
                List<T> rightLast = right.peekLast();
                int llSize = leftLast.size();
                int rlSize = rightLast.size();
                if (rlSize + llSize <= groupSize) {
                    rightLast.addAll(leftLast);
                } else {
                    rightLast.addAll(leftLast.subList(0, groupSize - rlSize));
                    right.add(new ArrayList<>(leftLast.subList(groupSize - rlSize, llSize)));
                }
            }
            left.addAll(right);
            return left;
        }
    );
}
If you wish to implement a Collector whose combiner function rearranges the partial results as needed (the very last option in the list above), I'm leaving it to the OP/reader as an exercise.
Now let's use the collector defined above (the latter one). Let's consider a stream of single-letter strings representing the characters of the English alphabet, and let the group size be 5.
public static void main(String[] args) {
List<String> groups = IntStream.rangeClosed('A', 'Z')
.mapToObj(ch -> String.valueOf((char) ch))
.collect(getGroupCollector(5))
.stream()
.map(group -> String.join(",", group))
.collect(Collectors.toList());
groups.forEach(System.out::println);
}
Output:
A,B,C,D,E
F,G,H,I,J
K,L,M,N,O
P,Q,R,S,T
U,V,W,X,Y
Z
That's the sequential result. If you switch the stream from sequential to parallel, the contents of the groups would probably change, but the sizes would not be affected.
Also note that since there are effectively two independent streams chained together, you need to apply parallel() twice to make the whole thing work in parallel.
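For instance, a usage sketch (assuming the getGroupCollector defined above) with both pipelines explicitly made parallel:

List<String> groups = IntStream.rangeClosed('A', 'Z')
        .parallel()                                   // first pipeline runs in parallel
        .mapToObj(ch -> String.valueOf((char) ch))
        .collect(getGroupCollector(5))
        .parallelStream()                             // second, independent pipeline
        .map(group -> String.join(",", group))
        .collect(Collectors.toList());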
I have to calculate the average of an infinite sequence using the Stream API.
Input:
Stream<Double> s = a,b,c,d ...
int interval = 3
Expected Result:
Stream<Double> result = avg(a,b,c), avg(d,e,f), ....
The result can also be an Iterator, or any other type, as long as it maintains the structure of an infinite list.
Of course what I have written is pseudocode and doesn't run.
There is a @Beta API termed mapWithIndex within Guava that could help here, with a certain assumption:
static Stream<Double> stepAverage(Stream<Double> stream, int step) {
return Streams.mapWithIndex(stream, (from, index) -> Map.entry(index, from))
.collect(Collectors.groupingBy(e -> (e.getKey() / step), TreeMap::new,
Collectors.averagingDouble(Map.Entry::getValue)))
.values().stream();
}
The assumption that it brings in is detailed clearly in the documentation (emphasis mine):
The resulting stream is efficiently splittable if and only if stream
was efficiently splittable and its underlying spliterator reported
Spliterator.SUBSIZED. This is generally the case if the underlying
stream comes from a data structure supporting efficient indexed random
access, typically an array or list.
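A usage sketch with a finite source for illustration (note that the groupingBy collector consumes the entire stream, so this particular implementation cannot stay lazy over a truly infinite source):

Stream<Double> s = Stream.of(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0);
stepAverage(s, 3).forEach(System.out::println); // 2.0, 5.0, 7.0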
This should work fine using vanilla Java.
I'm using Stream#mapMulti and a Set external to the stream to aggregate the doubles.
As you can see, I also used DoubleSummaryStatistics to compute the average.
I could have used traditional looping, summing, and then dividing, but I found this way more explicit.
Update:
I changed the Collection used from Set to List as a Set could cause unexpected behaviour
int step = 3;
List<Double> list = new ArrayList<>();

Stream<Double> averagesStream =
    infiniteStream.mapMulti((Double aDouble, Consumer<Double> doubleConsumer) -> {
        list.add(aDouble);
        if (list.size() == step) {
            DoubleSummaryStatistics doubleSummaryStatistics = new DoubleSummaryStatistics();
            list.forEach(doubleSummaryStatistics::accept);
            list.clear();
            doubleConsumer.accept(doubleSummaryStatistics.getAverage());
        }
    });
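A self-contained usage sketch of the above (assumes Java 16+ for mapMulti; the source stream here is just an illustration): because mapMulti is lazy, limiting the averages stream pulls only as many source elements as needed.

int step = 3;
List<Double> buffer = new ArrayList<>();
Stream<Double> infiniteStream = Stream.iterate(1.0, d -> d + 1);   // 1.0, 2.0, 3.0, ...

Stream<Double> averagesStream = infiniteStream.mapMulti((Double d, Consumer<Double> out) -> {
    buffer.add(d);
    if (buffer.size() == step) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        buffer.forEach(stats::accept);
        buffer.clear();
        out.accept(stats.getAverage());                             // emit one average per interval
    }
});

averagesStream.limit(3).forEach(System.out::println);              // 2.0, 5.0, 8.0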
I have the following sample code:
System.out.println(
"Result: " +
Stream.of(1, 2, 3)
.filter(i -> {
System.out.println(i);
return true;
})
.findFirst()
.get()
);
System.out.println("-----------");
System.out.println(
"Result: " +
Stream.of(1, 2, 3)
.flatMap(i -> Stream.of(i - 1, i, i + 1))
.flatMap(i -> Stream.of(i - 1, i, i + 1))
.filter(i -> {
System.out.println(i);
return true;
})
.findFirst()
.get()
);
The output is as follows:
1
Result: 1
-----------
-1
0
1
0
1
2
1
2
3
Result: -1
From here I see that in first case stream really behaves lazily - we use findFirst() so once we have first element our filtering lambda is not invoked.
However, in second case which uses flatMaps we see that despite first element which fulfils the filter condition is found (it's just any first element as lambda always returns true) further contents of the stream are still being fed through filtering function.
I am trying to understand why it behaves like this rather than giving up after first element is calculated as in the first case.
Any helpful information would be appreciated.
TL;DR, this has been addressed in JDK-8075939 and fixed in Java 10 (and backported to Java 8 in JDK-8225328).
When looking into the implementation (ReferencePipeline.java) we see the method [link]
@Override
final void forEachWithCancel(Spliterator<P_OUT> spliterator, Sink<P_OUT> sink) {
    do { } while (!sink.cancellationRequested() && spliterator.tryAdvance(sink));
}
which will be invoked for the findFirst operation. The special thing to note is sink.cancellationRequested(), which allows the loop to end on the first match. Compare this to [link]
@Override
public final <R> Stream<R> flatMap(Function<? super P_OUT, ? extends Stream<? extends R>> mapper) {
    Objects.requireNonNull(mapper);
    // We can do better than this, by polling cancellationRequested when stream is infinite
    return new StatelessOp<P_OUT, R>(this, StreamShape.REFERENCE,
                                     StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT | StreamOpFlag.NOT_SIZED) {
        @Override
        Sink<P_OUT> opWrapSink(int flags, Sink<R> sink) {
            return new Sink.ChainedReference<P_OUT, R>(sink) {
                @Override
                public void begin(long size) {
                    downstream.begin(-1);
                }

                @Override
                public void accept(P_OUT u) {
                    try (Stream<? extends R> result = mapper.apply(u)) {
                        // We can do better that this too; optimize for depth=0 case and just grab spliterator and forEach it
                        if (result != null)
                            result.sequential().forEach(downstream);
                    }
                }
            };
        }
    };
}
The method for advancing one item ends up calling forEach on the sub-stream without any possibility of earlier termination, and the comment at the beginning of the flatMap method even hints at this missing feature.
Since this is more than just an optimization thing as it implies that the code simply breaks when the sub-stream is infinite, I hope that the developers soon prove that they “can do better than this”…
To illustrate the implications, while Stream.iterate(0, i->i+1).findFirst() works as expected, Stream.of("").flatMap(x->Stream.iterate(0, i->i+1)).findFirst() will end up in an infinite loop.
Regarding the specification, most of it can be found in the
chapter “Stream operations and pipelines” of the package specification:
…
Intermediate operations return a new stream. They are always lazy;
…
… Laziness also allows avoiding examining all the data when it is not necessary; for operations such as "find the first string longer than 1000 characters", it is only necessary to examine just enough strings to find one that has the desired characteristics without examining all of the strings available from the source. (This behavior becomes even more important when the input stream is infinite and not merely large.)
…
Further, some operations are deemed short-circuiting operations. An intermediate operation is short-circuiting if, when presented with infinite input, it may produce a finite stream as a result. A terminal operation is short-circuiting if, when presented with infinite input, it may terminate in finite time. Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for the processing of an infinite stream to terminate normally in finite time.
It’s clear that a short-circuiting operation doesn’t guaranty a finite time termination, e.g. when a filter doesn’t match any item the processing can’t complete, but an implementation which doesn’t support any termination in finite time by simply ignoring the short-circuiting nature of an operation is far off the specification.
The elements of the input stream are consumed lazily one by one. The first element, 1, is transformed by the two flatMaps into the stream -1, 0, 1, 0, 1, 2, 1, 2, 3, so that entire stream corresponds to just the first input element. The nested streams are eagerly materialized by the pipeline, then flattened, then fed to the filter stage. This explains your output.
The above does not stem from a fundamental limitation, but it would probably make things much more complicated to get full-blown laziness for nested streams. I suspect it would be an even greater challenge to make it performant.
For comparison, Clojure's lazy seqs get another layer of wrapping for each such level of nesting. Due to this design, the operations may even fail with StackOverflowError when nesting is exercised to the extreme.
With regard to breakage with infinite sub-streams, the behavior of flatMap becomes still more surprising when one throws in an intermediate (as opposed to terminal) short-circuiting operation.
While the following works as expected, printing out the infinite sequence of integers
Stream.of("x").flatMap(_x -> Stream.iterate(1, i -> i + 1)).forEach(System.out::println);
the following code prints out only the "1", but still does not terminate:
Stream.of("x").flatMap(_x -> Stream.iterate(1, i -> i + 1)).limit(1).forEach(System.out::println);
I cannot imagine a reading of the spec in which that were not a bug.
In my free StreamEx library I introduced short-circuiting collectors. When collecting a sequential stream with a short-circuiting collector (like MoreCollectors.first()), exactly one element is consumed from the source. Internally it's implemented in a quite dirty way: a custom exception is used to break the control flow. Using my library, your sample could be rewritten in this way:
System.out.println(
"Result: " +
StreamEx.of(1, 2, 3)
.flatMap(i -> Stream.of(i - 1, i, i + 1))
.flatMap(i -> Stream.of(i - 1, i, i + 1))
.filter(i -> {
System.out.println(i);
return true;
})
.collect(MoreCollectors.first())
.get()
);
The result is the following:
-1
Result: -1
While JDK-8075939 has been fixed in Java 11 and backported to 10 and 8u222, there's still an edge case of flatMap() not being truly lazy when using Stream.iterator(): JDK-8267359, still present in Java 17.
This
Iterator<Integer> it =
Stream.of("a", "b")
.flatMap(s -> Stream
.of(1, 2, 3, 4)
.filter(i -> { System.out.println(i); return true; }))
.iterator();
it.hasNext(); // This consumes the entire flatmapped stream
it.next();
Prints
1
2
3
4
While this:
Iterator<Integer> it =
Stream.of("a", "b")
.flatMap(s -> Stream
.iterate(1, i -> i)
.filter(i -> { System.out.println(i); return true; }))
.iterator();
it.hasNext();
it.next();
Never terminates
Unfortunately .flatMap() is not lazy. However, a custom flatMap workaround is available here: Why .flatMap() is so inefficient (non lazy) in java 8 and java 9
Today I also stumbled upon this bug. The behavior is not so straightforward: a simple case like the one below works fine, but similar production code doesn't.
stream(spliterator).map(o -> o).flatMap(Stream::of).flatMap(Stream::of).findAny()
For those who cannot wait another couple of years for migration to JDK 10, there is an alternative, truly lazy stream. It doesn't support parallel processing. It was intended for JavaScript translation, but it worked out for me because the interface is the same.
StreamHelper is collection-based, but it is easy to adapt a Spliterator.
https://github.com/yaitskov/j4ts/blob/stream/src/main/java/javaemul/internal/stream/StreamHelper.java
I agree with other people that this is a bug, tracked as JDK-8075939. Since it was still not fixed more than a year later, I would like to recommend abacus-common:
N.println("Result: " + Stream.of(1, 2, 3).peek(N::println).first().get());
N.println("-----------");
N.println("Result: " + Stream.of(1, 2, 3)
.flatMap(i -> Stream.of(i - 1, i, i + 1))
.flatMap(i -> Stream.of(i - 1, i, i + 1))
.peek(N::println).first().get());
// output:
// 1
// Result: 1
// -----------
// -1
// Result: -1
Disclosure: I'm the developer of abacus-common.
I am trying to calculate each value as the multiplication of the previous two values, using Java 8 streams. I want to call a function that will return an array/list/collection. I am creating a List and adding 1 and 2 to it.
Let's say the list name is result.
public static void main (String[] args) {
List<Integer> result = new ArrayList<Integer>();
result.add(1);
result.add(2);
int n = 5; //n can be anything, choosing 5 for this example
res(n, result);
//print result which should be [1, 2, 2, 4, 8]
}
public static List<Integer> res(int n, List<Integer> result ) {
result.stream()
.limit(n)
.reduce(identity, (base,index) -> base);
//return result;
}
Now the issue is trying to pass result into the stream to keep updating the list with the new values. According to the Java tutorials, this is possible, albeit inefficient.
"If your reduce operation involves adding elements to a collection, then every time your accumulator function processes an element, it creates a new collection that includes the element, which is inefficient."
Do I need to use the optional third parameter, BinaryOperator combiner, to combine the list + result??
<U> U reduce(U identity,
BiFunction<U,? super T,U> accumulator,
BinaryOperator<U> combiner)
In short: I want to pass a list with two values and have the function find the multiplication of the first two values (1, 2) and add it to the list, then find the multiplication of the last two values (2, 2) and add it to the list, and so on until the stream hits the limit.
It looks like you're trying to implement a recurrence relation. The reduce method applies some function to a bunch of pre-existing values in the stream. You can't use reduce and take an intermediate result from the reducer function and "feed it back" into the stream, which is what you need to do in order to implement a recurrence relation.
The way to implement a recurrence relation using streams is to use one of the streams factory methods Stream.generate or Stream.iterate. The iterate factory seems to suggest the most obvious approach. The state that needs to be kept for each application of the recurrence function requires two ints in your example, so unfortunately we have to create an object to hold these for us:
static class IntPair {
final int a, b;
IntPair(int a_, int b_) {
a = a_; b = b_;
}
}
Using this state object you can create a stream that implements the recurrence that you want:
Stream.iterate(new IntPair(1, 2), p -> new IntPair(p.b, p.a * p.b))
Once you have such a stream, it's a simple matter to collect the values into a list:
List<Integer> output =
Stream.iterate(new IntPair(1, 2), p -> new IntPair(p.b, p.a * p.b))
.limit(5)
.map(pair -> pair.a)
.collect(Collectors.toList());
System.out.println(output);
[1, 2, 2, 4, 8]
As an aside, you can use the same technique to generate the Fibonacci sequence. All you do is provide a different starting value and iteration function:
Stream.iterate(new IntPair(0, 1), p -> new IntPair(p.b, p.a + p.b))
You could also implement a similar recurrence relation using Stream.generate. This will also require a helper class. The helper class implements Supplier of the result value but it also needs to maintain state. It thus needs to be mutable, which is kind of gross in my book. The iteration function also needs to be baked into the generator object. This makes it less flexible than the IntPair object, which can be used for creating arbitrary recurrences.
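For illustration, a minimal sketch of that Stream.generate variant (the class and field names here are made up): the supplier is mutable and the recurrence is baked into it, as described.

// Sequential use only: the mutable supplier is not thread-safe.
class ProductSupplier implements Supplier<Integer> {
    private int a = 1, b = 2;            // seed values of the recurrence

    @Override
    public Integer get() {
        int current = a;                 // value to emit
        int next = a * b;                // recurrence: product of the previous two values
        a = b;
        b = next;
        return current;
    }
}

List<Integer> output = Stream.generate(new ProductSupplier())
        .limit(5)
        .collect(Collectors.toList());
System.out.println(output);              // [1, 2, 2, 4, 8]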
Just for completeness, here is a solution which does not need an additional class.
List<Integer> output = Stream.iterate(
(ToIntFunction<IntBinaryOperator>)f -> f.applyAsInt(1, 2),
prev -> f -> prev.applyAsInt((a, b) -> f.applyAsInt(b, a*b) )
)
.limit(9).map(pair -> pair.applyAsInt((a, b)->a))
.collect(Collectors.toList());
This is a functional approach which doesn’t need an intermediate value storage. However, since Java is not a functional programming language and doesn’t have optimizations for such a recursive function definition, this is not recommended for larger streams.
Since for this example a larger stream would overflow numerically anyway and the calculation is cheap, this approach works. But for other use cases you will surely prefer a storage object when solving such a problem with plain Java (as in Stuart Marks’ answer)