The question is about java.util.stream.Stream.reduce(U identity,BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner) method.
One of the requirements is that the combiner function must be compatible with the accumulator function; for all u and t, the following must hold:
combiner.apply(u, accumulator.apply(identity, t)) == accumulator.apply(u, t) (*)
If the combiner and accumulator are the same function (and the identity really is an identity element for it), the above equality holds automatically.
A BinaryOperator actually extends BiFunction, so I can use one where a BiFunction is required. If U and T are identical, the following is always legal:
BinaryOperator<T> op = (x, y) -> something;
stream.reduce(id, op, op);
Of course, one cannot always use the combiner as the accumulator since, in general, they serve different purposes and have different Java types.
My question
Is there an example of stream reduction with distinct combiner and accumulator?
Also, I'm not interested in trivial examples, but natural examples that I can encounter in practice while doing reduction on parallel streams.
For trivial examples, there are many tutorials, like this one
Why am I asking this question
Basically, this reduction method exists for the sake of parallel streams. It seems to me that condition (*) is so strong that, in practice, it renders this form of reduction nearly useless, since reduction operations rarely fulfill it.
If the combiner and accumulator are the same? You are confusing things here.
accumulator transforms from X to Y for example (using the identity), while combiner merges two Y into one. Also notice that one is a BiFunction and the other one is a BinaryOperator (which is actually a BiFunction<T, T, T>).
Is there an example of stream reduction with distinct combiner and accumulator?
These look pretty different to me:
Stream.of("1", "2")
.reduce(0, (x, y) -> x + y.length(), Integer::sum);
I think you might be confused with things like:
Stream.of("1", "2")
.reduce("", String::concat, String::concat);
How is it possible to do?
BiFunction<String, String, String> bi = String::concat;
Well there is a hint here.
EDIT
Addressing the part where "different" means different operations: the accumulator might sum, while the combiner might multiply. This is exactly what the rule
combiner.apply(u, accumulator.apply(identity, t)) == accumulator.apply(u, t)
is about: to protect against two functions that are each associative but perform different operations. Let's take an example of two lists (equal, but in different order). This, by the way, would be a lot more fun with Set.of from Java 9, which adds internal randomization, so theoretically, for the exact same input, you could get a different result on the same VM from run to run. But to keep it simple:
List.of("a", "bb", "ccc", "dddd");
List.of("dddd", "a", "bb", "ccc");
And we want to perform:
....stream()
.parallel()
.reduce(0,
(x, y) -> x + y.length(),
(x, y) -> x * y);
Under the current implementation, this will yield the same result for both lists; but that is an implementation artifact.
There is nothing stopping an internal implementation from saying: "I will split the list into the smallest chunks possible, but not smaller than two elements each". In such a case, this could have been translated to these splits:
["a", "bb"] ["ccc", "dddd"]
["dddd", "a" ] ["bb" , "ccc" ]
Now, "accumulate" those splits:
0 + "a".length = 1 ; 1 + "bb".length = 3 // thus chunk result is 3
0 + "ccc".length = 3 ; 3 + "dddd".length = 7 // thus chunk result is 7
Now we "combine" these chunks: 3 * 7 = 21.
I am pretty sure you can already see that the second list in such a scenario would result in 25; thus, using different operations for the accumulator and the combiner can produce wrong results.
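To make the incompatibility concrete, here is a minimal sketch (my own, with arbitrarily chosen values) that checks rule (*) directly for this sum/multiply pair:

import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

public class CombinerLawCheck {
    public static void main(String[] args) {
        BiFunction<Integer, String, Integer> accumulator = (x, y) -> x + y.length();
        BinaryOperator<Integer> combiner = (x, y) -> x * y;
        Integer identity = 0;
        Integer u = 3;        // an arbitrary partial result
        String t = "dddd";    // an arbitrary stream element

        // rule (*): combiner.apply(u, accumulator.apply(identity, t)) must equal accumulator.apply(u, t)
        int left = combiner.apply(u, accumulator.apply(identity, t));  // 3 * 4 = 12
        int right = accumulator.apply(u, t);                           // 3 + 4 = 7
        System.out.println(left + " vs " + right);                     // prints "12 vs 7": the rule is violated
    }
}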
So, here are a few examples. Some of these may count as "trivial", particularly where there's already a function to do it for you.
An example where T and U are the same
These are quite difficult to come up with, and are a bit contrived, as they generally involve assuming that the elements of the stream and the object being accumulated have different meanings, even though they have the same type.
Counting
If we have a stream of integers, we could count them using reduce:
stream.reduce(0, (count, item) -> count+1, (a, b) -> a+b);
Obviously, we could just use stream.count() here, but I'm willing to bet count uses the 3 argument version of reduce internally.
An example where T and U are different
This gives us quite a lot of freedom, and obviously, the accumulator and combiner are never going to be the same here, as they have different types.
One of the most common ways we may want to aggregate is gathering into a collection. We could use reduce for that, but since collection types in Java are typically mutable, using collect will generally be more efficient. This rule applies generally: if the result type is mutable, use collect rather than reduce.
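As a hedged illustration of that rule (my own sketch, not part of the original answer), compare an immutable reduce with the equivalent mutable collect when gathering into a list:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// With reduce, every step copies the list so the functions stay free of side effects.
List<String> viaReduce = Stream.of("a", "b", "c").reduce(
        new ArrayList<String>(),
        (list, item) -> { ArrayList<String> copy = new ArrayList<>(list); copy.add(item); return copy; },
        (left, right) -> { ArrayList<String> merged = new ArrayList<>(left); merged.addAll(right); return merged; });

// With collect, a mutable container is filled (one per thread) and the containers are merged afterwards.
List<String> viaCollect = Stream.of("a", "b", "c")
        .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);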
Determining the range of a stream of numbers
class Range {
static Range NONE = new Range(Double.NaN, Double.NaN);
final double min, max;
static Range of(double min, double max) {
if(Double.isNaN(min) || Double.isNaN(max) || min>max) {
throw new IllegalArgumentException();
}
return new Range(min, max);
}
private Range(double min, double max) {
this.min = min;
this.max = max;
}
boolean contains(double value) {
return this!=Range.NONE && min<=value && max>=value;
}
boolean spans(Range other) {
return this==other
|| other==Range.NONE
|| (contains(other.min) && contains(other.max));
}
}
Range result = streamOfDoubles.reduce(
Range.NONE,
(range, value) -> {
if(range==Range.NONE)
return Range.of(value, value);
else if(range.contains(value))
return range;
else
return Range.of(Math.min(value, range.min), Math.max(value, range.max));
},
(a, b) -> {
if(b.spans(a))
return b;
else if(a.spans(b))
return a;
else
return Range.of(Math.min(a.min, b.min), Math.max(a.max, b.max));
}
);
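For context, a small usage sketch (my own assumption; streamOfDoubles can be any Stream<Double>, sequential or parallel):

// The stream consumed above could, for example, have been created like this:
Stream<Double> streamOfDoubles = Arrays.asList(3.5, -1.0, 7.25, 2.0).parallelStream();
// After the reduction, result.min == -1.0 and result.max == 7.25.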
Related
I have the following sample code:
System.out.println(
"Result: " +
Stream.of(1, 2, 3)
.filter(i -> {
System.out.println(i);
return true;
})
.findFirst()
.get()
);
System.out.println("-----------");
System.out.println(
"Result: " +
Stream.of(1, 2, 3)
.flatMap(i -> Stream.of(i - 1, i, i + 1))
.flatMap(i -> Stream.of(i - 1, i, i + 1))
.filter(i -> {
System.out.println(i);
return true;
})
.findFirst()
.get()
);
The output is as follows:
1
Result: 1
-----------
-1
0
1
0
1
2
1
2
3
Result: -1
From this I see that in the first case the stream really behaves lazily: we use findFirst(), so once we have the first element our filtering lambda is not invoked again.
However, in the second case, which uses flatMaps, we see that even though the first element fulfilling the filter condition is found (it's just any first element, as the lambda always returns true), further contents of the stream are still fed through the filtering function.
I am trying to understand why it behaves like this rather than giving up after the first element is produced, as in the first case.
Any helpful information would be appreciated.
TL;DR, this has been addressed in JDK-8075939 and fixed in Java 10 (and backported to Java 8 in JDK-8225328).
When looking into the implementation (ReferencePipeline.java) we see the method [link]
@Override
final void forEachWithCancel(Spliterator<P_OUT> spliterator, Sink<P_OUT> sink) {
do { } while (!sink.cancellationRequested() && spliterator.tryAdvance(sink));
}
which will be invoked for the findFirst operation. The special thing to note is sink.cancellationRequested(), which allows the loop to end on the first match. Compare it to [link]
@Override
public final <R> Stream<R> flatMap(Function<? super P_OUT, ? extends Stream<? extends R>> mapper) {
Objects.requireNonNull(mapper);
// We can do better than this, by polling cancellationRequested when stream is infinite
return new StatelessOp<P_OUT, R>(this, StreamShape.REFERENCE,
StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT | StreamOpFlag.NOT_SIZED) {
@Override
Sink<P_OUT> opWrapSink(int flags, Sink<R> sink) {
return new Sink.ChainedReference<P_OUT, R>(sink) {
@Override
public void begin(long size) {
downstream.begin(-1);
}
@Override
public void accept(P_OUT u) {
try (Stream<? extends R> result = mapper.apply(u)) {
// We can do better that this too; optimize for depth=0 case and just grab spliterator and forEach it
if (result != null)
result.sequential().forEach(downstream);
}
}
};
}
};
}
The method for advancing one item ends up calling forEach on the sub-stream without any possibility of earlier termination, and the comment at the beginning of the flatMap method even mentions this missing feature.
Since this is more than just an optimization issue (it implies that the code simply breaks when the sub-stream is infinite), I hope that the developers soon prove that they "can do better than this"…
To illustrate the implications, while Stream.iterate(0, i->i+1).findFirst() works as expected, Stream.of("").flatMap(x->Stream.iterate(0, i->i+1)).findFirst() will end up in an infinite loop.
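To reproduce this, here is the pair of calls from the previous paragraph as a runnable sketch (run the second call only on a JDK still affected by JDK-8075939 if you want to observe the hang):

import java.util.stream.Stream;

public class FlatMapLaziness {
    public static void main(String[] args) {
        // Terminates immediately: findFirst() pulls a single element from the infinite stream.
        System.out.println(Stream.iterate(0, i -> i + 1).findFirst().get());

        // On affected versions this never terminates: flatMap pushes the entire infinite
        // sub-stream through the pipeline before findFirst() gets a chance to cancel.
        System.out.println(Stream.of("").flatMap(x -> Stream.iterate(0, i -> i + 1)).findFirst().get());
    }
}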
Regarding the specification, most of it can be found in the
chapter “Stream operations and pipelines” of the package specification:
…
Intermediate operations return a new stream. They are always lazy;
…
… Laziness also allows avoiding examining all the data when it is not necessary; for operations such as "find the first string longer than 1000 characters", it is only necessary to examine just enough strings to find one that has the desired characteristics without examining all of the strings available from the source. (This behavior becomes even more important when the input stream is infinite and not merely large.)
…
Further, some operations are deemed short-circuiting operations. An intermediate operation is short-circuiting if, when presented with infinite input, it may produce a finite stream as a result. A terminal operation is short-circuiting if, when presented with infinite input, it may terminate in finite time. Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for the processing of an infinite stream to terminate normally in finite time.
It’s clear that a short-circuiting operation doesn’t guarantee termination in finite time, e.g. when a filter doesn’t match any item the processing can’t complete, but an implementation that makes termination in finite time impossible, by simply ignoring the short-circuiting nature of an operation, is far off the specification.
The elements of the input stream are consumed lazily one by one. The first element, 1, is transformed by the two flatMaps into the stream -1, 0, 1, 0, 1, 2, 1, 2, 3, so that the entire stream corresponds to just the first input element. The nested streams are eagerly materialized by the pipeline, then flattened, then fed to the filter stage. This explains your output.
The above does not stem from a fundamental limitation, but it would probably make things much more complicated to get full-blown laziness for nested streams. I suspect it would be an even greater challenge to make it performant.
For comparison, Clojure's lazy seqs get another layer of wrapping for each such level of nesting. Due to this design, the operations may even fail with StackOverflowError when nesting is exercised to the extreme.
With regard to breakage with infinite sub-streams, the behavior of flatMap becomes still more surprising when one throws in an intermediate (as opposed to terminal) short-circuiting operation.
While the following works as expected, printing out the infinite sequence of integers
Stream.of("x").flatMap(_x -> Stream.iterate(1, i -> i + 1)).forEach(System.out::println);
the following code prints out only the "1", but still does not terminate:
Stream.of("x").flatMap(_x -> Stream.iterate(1, i -> i + 1)).limit(1).forEach(System.out::println);
I cannot imagine a reading of the spec in which that were not a bug.
In my free StreamEx library I introduced short-circuiting collectors. When collecting a sequential stream with a short-circuiting collector (like MoreCollectors.first()), exactly one element is consumed from the source. Internally it's implemented in a quite dirty way: a custom exception is used to break the control flow. Using my library, your sample could be rewritten this way:
System.out.println(
"Result: " +
StreamEx.of(1, 2, 3)
.flatMap(i -> Stream.of(i - 1, i, i + 1))
.flatMap(i -> Stream.of(i - 1, i, i + 1))
.filter(i -> {
System.out.println(i);
return true;
})
.collect(MoreCollectors.first())
.get()
);
The result is the following:
-1
Result: -1
While JDK-8075939 has been fixed in Java 11 and backported to 10 and 8u222, there's still an edge case of flatMap() not being truly lazy when using Stream.iterator(): JDK-8267359, still present in Java 17.
This
Iterator<Integer> it =
Stream.of("a", "b")
.flatMap(s -> Stream
.of(1, 2, 3, 4)
.filter(i -> { System.out.println(i); return true; }))
.iterator();
it.hasNext(); // This consumes the entire flatmapped stream
it.next();
Prints
1
2
3
4
While this:
Iterator<Integer> it =
Stream.of("a", "b")
.flatMap(s -> Stream
.iterate(1, i -> i)
.filter(i -> { System.out.println(i); return true; }))
.iterator();
it.hasNext();
it.next();
Never terminates
Unfortunately .flatMap() is not lazy. However, a custom flatMap workaround is available here: Why .flatMap() is so inefficient (non lazy) in java 8 and java 9
Today I also stumbled upon this bug. The behavior is not so straightforward, because a simple case like the one below works fine, but similar production code doesn't work.
stream(spliterator).map(o -> o).flatMap(Stream::of).flatMap(Stream::of).findAny()
For those who cannot wait another couple of years for the migration to JDK 10, there is an alternative, truly lazy stream. It doesn't support parallel execution. It was dedicated to JavaScript translation, but it worked out for me, because the interface is the same.
StreamHelper is collection-based, but it is easy to adapt a Spliterator.
https://github.com/yaitskov/j4ts/blob/stream/src/main/java/javaemul/internal/stream/StreamHelper.java
I agree with others that this is a bug, reported as JDK-8075939. And since it still wasn't fixed more than one year later, I would like to recommend abacus-common:
N.println("Result: " + Stream.of(1, 2, 3).peek(N::println).first().get());
N.println("-----------");
N.println("Result: " + Stream.of(1, 2, 3)
.flatMap(i -> Stream.of(i - 1, i, i + 1))
.flatMap(i -> Stream.of(i - 1, i, i + 1))
.peek(N::println).first().get());
// output:
// 1
// Result: 1
// -----------
// -1
// Result: -1
Disclosure: I'm the developer of abacus-common.
With the stream API I can easily check whether all elements satisfy a given condition, using the allMatch(e -> predicate(e)) method. I can also check whether every element satisfies at least one of several conditions: allMatch(e -> predicateA(e) || predicateB(e) || predicateC(e)). But is it possible to check whether all elements satisfy one single predicate out of those (any one of them, but the same one for all elements)? In the previous case it is possible that some elements satisfy A and some do not, but those satisfy B or C (and vice versa).
I could perform allMatch multiple times, but then the stream would be terminated and I would need to repeat the preliminary operations.
I could also devise a tricky reduce operation, but then it would not be able to stop early when the result is obviously false (like the allMatch method does).
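For illustration, such a reduce-based check might look like the sketch below (predA and predB stand in for the actual predicates); note that it always consumes the whole stream:

// One "all matched so far" flag per predicate; the reduction cannot stop early.
boolean[] flags = stream.reduce(
        new boolean[] { true, true },
        (acc, e) -> new boolean[] { acc[0] && predA.test(e), acc[1] && predB.test(e) },
        (a, b) -> new boolean[] { a[0] && b[0], a[1] && b[1] });
boolean allMatchOneOf = flags[0] || flags[1];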
Here is a possible approach that goes back to using a simple Iterator over the elements of the Stream (so it doesn't have parallel support, but works for any kind and any number of predicates).
It creates an initial BitSet of the given predicates' length with all bits set to true, and each time we retrieve the next element, we clear (set to false) the indexes of the predicates that didn't match. Thus, at each index, the bit set records whether that predicate has matched all elements of the stream so far. It is short-circuiting because it loops only while there are elements left and the bit set is not empty (meaning there are still predicates that have matched all elements considered so far).
@SafeVarargs
private static <T> boolean allMatchOneOf(Stream<T> stream, Predicate<T>... predicates) {
int length = predicates.length;
BitSet bitSet = new BitSet(length);
bitSet.set(0, length);
Iterator<T> it = stream.iterator();
while (it.hasNext() && !bitSet.isEmpty()) {
T t = it.next();
IntStream.range(0, length).filter(i -> !predicates[i].test(t)).forEach(bitSet::clear);
}
return !bitSet.isEmpty();
}
Sample usage:
// false because not all elements are either even or divisible by 3
System.out.println(allMatchOneOf(Stream.of(2, 3, 12), i -> i % 2 == 0, i -> i % 3 == 0));
// true because all elements are divisible by 3
System.out.println(allMatchOneOf(Stream.of(3, 12, 18), i -> i % 2 == 0, i -> i % 3 == 0));
If we want to keep parallel support, we can get help from the StreamEx library, which has filtering, first and pairing collectors. We reuse the anyMatching collector written in this answer.
import static one.util.streamex.MoreCollectors.*;
@SafeVarargs
static <T> Collector<T, ?, Boolean> allMatchingOneOf(Predicate<T> first, Predicate<T>... predicates) {
Collector<T, ?, Boolean> collector = allMatching(first);
for (Predicate<T> predicate : predicates) {
collector = pairing(collector, allMatching(predicate), Boolean::logicalOr);
}
return collector;
}
static <T> Collector<T, ?, Boolean> allMatching(Predicate<T> pred) {
return collectingAndThen(anyMatching(pred.negate()), b -> !b);
}
static <T> Collector<T, ?, Boolean> anyMatching(Predicate<T> pred) {
return collectingAndThen(filtering(pred, first()), Optional::isPresent);
}
The new allMatchingOneOf collector combines the results of the individual allMatching collectors by performing a logical OR on them. As such, it will tell whether all elements of the stream matched one of the given predicates.
Sample usage:
// false because not all elements are either even or divisible by 3
System.out.println(Stream.of(2, 3, 12).collect(allMatchingOneOf(i -> i % 2 == 0, i -> i % 3 == 0)));
// true because all elements are divisible by 3
System.out.println(Stream.of(3, 12, 18).collect(allMatchingOneOf(i -> i % 2 == 0, i -> i % 3 == 0)));
You can take the iterative solution of Tunaki’s answer to create a functional one which is not short-circuiting, but works in parallel:
@SafeVarargs
private static <T> boolean allMatchOneOf(Stream<T> stream, Predicate<T>... predicates) {
int length = predicates.length;
return stream.collect( () -> new BitSet(length),
(bitSet,t) ->
IntStream.range(0, length).filter(i -> !predicates[i].test(t)).forEach(bitSet::set),
BitSet::or).nextClearBit(0)<length;
}
To simplify the code, this flips the meaning of the bits; a set bit marks a predicate that has already failed for some element. So there is a predicate fulfilled by all elements if there is an unset bit within the range. If the predicates are rather expensive, you can use that information to only test predicates still fulfilled by all previous elements:
@SafeVarargs
private static <T> boolean allMatchOneOf(Stream<T> stream, Predicate<T>... predicates) {
int length = predicates.length;
return stream.collect( () -> new BitSet(length),
(bitSet,t) -> {
for(int bit=bitSet.nextClearBit(0); bit<length; bit=bitSet.nextClearBit(bit+1))
if(!predicates[bit].test(t)) bitSet.set(bit);
},
BitSet::or).nextClearBit(0)<length;
}
It’s still not short-circuiting, but it turns subsequent tests of already-failed predicates into no-ops. It may still be unsatisfying if producing the stream elements is expensive.
Note that you can use a similar improvement for the iterative solution:
@SafeVarargs
private static <T> boolean allMatchOneOf(Stream<T> stream, Predicate<T>... predicates) {
int length = predicates.length;
BitSet bitSet = new BitSet(length);
bitSet.set(0, length);
for(Iterator<T> it = stream.iterator(); it.hasNext() && !bitSet.isEmpty(); ) {
T t = it.next();
for(int bit=bitSet.nextSetBit(0); bit>=0; bit=bitSet.nextSetBit(bit+1))
if(!predicates[bit].test(t)) bitSet.clear(bit);
}
return !bitSet.isEmpty();
}
The iterative solution already was short-circuiting in that it stops when there is no potentially matching predicate left, but still checked all predicates when there was at least one potentially matching predicate. With this improvement, it only checks predicates which have not failed yet and still exits when there is no candidate left.
I could perform this operation twice, but then the stream would be terminated and I would need to repeat the preliminary ones.
If you intend to process the elements only after checking your conditions then the stream will have to be buffered anyway since the condition can only be checked once all elements have been traversed.
So your options are generating the stream twice or putting it into a collection.
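A minimal sketch of the second option (my own, assuming a Stream<Integer> named stream and the even/divisible-by-three predicates used elsewhere on this page): buffer once, then traverse the buffered data as often as needed.

List<Integer> buffered = stream.collect(Collectors.toList());
boolean allEven = buffered.stream().allMatch(i -> i % 2 == 0);
boolean allDivByThree = buffered.stream().allMatch(i -> i % 3 == 0);
boolean allMatchOneOf = allEven || allDivByThree;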
So this is one that's really left me puzzled. Let's say I have a Player object, with a Point p containing an x and y value:
class Player {
void movePlayer(Point p) {
...
}
}
If I have a bunch of static points (certainly more than players) that I need to randomly, yet uniquely, map to each player's movePlayer function, how would I do so? This process does not need to be done quickly, but often and randomly each time. To add a layer of complication, my points are generated by both varying x and y values. As of now I am doing the following (which crashed my JVM):
public List<Stream<Point>> generatePointStream() {
Random random = new Random();
List<Stream<Point>> points = new ArrayList<Stream<Point>>();
points.add(random.ints(2384, 2413).distinct().mapToObj(x -> new Point(x, 3072)));
points.add(random.ints(3072, 3084).distinct().mapToObj(y -> new Point(2413, y)));
....
points.add(random.ints(2386, 2415).distinct().mapToObj(x -> new Point(x, 3135)));
Collections.shuffle(points);
return points;
}
Note that before I used only one stream with the Stream.concat method, but that threw errors and looked pretty ugly, leading me to my current predicament. And to assign them to all Player objects in the List<Player> players:
players.stream().forEach(p->p.movePlayer(generatePointStream().stream().flatMap(t->t).
findAny().orElse(new Point(2376, 9487))));
Now this almost worked when I used the somewhat ridiculous abstraction Stream<Stream<Point>>, except it only used points from the first Stream<Point>.
Am I completely missing the point of streams here? I just liked the idea of not creating explicit Point objects I wouldn't use anyway.
Well, you can define a method returning a Stream of Points like
public Stream<Point> allValues() {
return Stream.of(
IntStream.range(2384, 2413).mapToObj(x -> new Point(x, 3072)),
IntStream.range(3072, 3084).mapToObj(y -> new Point(2413, y)),
//...
IntStream.range(2386, 2415).mapToObj(x -> new Point(x, 3135))
).flatMap(Function.identity());
}
which contains all valid points, though not materialized, due to the lazy nature of the Stream. Then, create a method to pick random elements like:
public List<Point> getRandomPoints(int num) {
long count=allValues().count();
assert count > num;
return new Random().longs(0, count)
.distinct()
.limit(num)
.mapToObj(i -> allValues().skip(i).findFirst().get())
.collect(Collectors.toList());
}
In a perfect world, this would already have all the laziness you wish, including creating only the desired number of Point instances.
However, there are several implementation details which might make this even worse than just collecting into a list.
One is special to the flatMap operation, see “Why filter() after flatMap() is “not completely” lazy in Java streams?”. Not only are substreams processed eagerly, but Stream properties that could allow internal optimizations are also not evaluated. In this regard, a concat based Stream is more efficient.
public Stream<Point> allValues() {
return Stream.concat(
Stream.concat(
IntStream.range(2384, 2413).mapToObj(x -> new Point(x, 3072)),
IntStream.range(3072, 3084).mapToObj(y -> new Point(2413, y))
),
//...
IntStream.range(2386, 2415).mapToObj(x -> new Point(x, 3135))
);
}
There is a warning regarding creating too deeply concatenated streams, but if you are in control of the creation, as here, you can take care to create a balanced tree, like
Stream.concat(
    Stream.concat(
        Stream.concat(a, b),
        Stream.concat(c, d)
    ),
    Stream.concat(
        Stream.concat(e, f),
        Stream.concat(g, h)
    )
)
However, even though such a Stream allows calculating the size without processing elements, this won’t happen before Java 9. In Java 8, count() will always iterate over all elements, which implies having instantiated just as many Point instances after the count() operation as if we had collected all elements into a List.
Even worse, skip is not propagated to the Stream’s source, so when saying stream.map(…).skip(n).findFirst(), the mapping function is evaluated up to n+1 times instead of only once. Of course, this renders the entire idea of the getRandomPoints method using this as a lazy construct useless. Due to the encapsulation and the nested streams we have here, we can’t even move the skip operation before the map.
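A small side sketch (my own, using java.util.concurrent.atomic.AtomicInteger) that makes this visible by counting how often the mapping function runs:

AtomicInteger calls = new AtomicInteger();
Stream.of(1, 2, 3, 4, 5)
    .map(i -> { calls.incrementAndGet(); return i * 10; })  // map sits before skip, so it runs for skipped elements too
    .skip(3)
    .findFirst();
System.out.println(calls.get()); // typically prints 4: three skipped elements plus the one returned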
Note that temporary instances might still be handled more efficiently than collecting into a list, where all instances exist at the same time, but it’s hard to predict due to the much larger number we have here. So if instance creation really is a concern, we can solve this specific case thanks to the fact that the two int values making up a point can be encoded in a primitive long value:
public LongStream allValuesAsLong() {
return LongStream.concat(LongStream.concat(
LongStream.range(2384, 2413).map(x -> x <<32 | 3072),
LongStream.range(3072, 3084).map(y -> 2413L <<32 | y)
),
//...
LongStream.range(2386, 2415).map(x -> x <<32 | 3135)
);
}
public List<Point> getRandomPoints(int num) {
long count=allValuesAsLong().count();
assert count > num;
return new Random().longs(0, count)
.distinct()
.limit(num)
.mapToObj(i -> allValuesAsLong().skip(i)
.mapToObj(l -> new Point((int)(l>>>32), (int)(l&(1L<<32)-1)))
.findFirst().get())
.collect(Collectors.toList());
}
This will indeed only create num instances of Point.
You should do something like:
final int PLAYERS_COUNT = 6;
List<Point> points = generatePointStream()
.stream()
.limit(PLAYERS_COUNT)
.map(s -> s.findAny().get())
.collect(Collectors.toList());
This outputs
2403, 3135
2413, 3076
2393, 3072
2431, 3118
2386, 3134
2368, 3113
I am trying to calculate each value as the product of the previous two values, using Java 8's Stream. I want to call a function that returns an array/list/collection. I am creating a List and adding 1 and 2 to it.
Let's say the list name is result.
public static void main (String[] args) {
List<Integer> result = new ArrayList<Integer>();
result.add(1);
result.add(2);
int n = 5; //n can be anything, choosing 5 for this example
res(n, result);
//print result which should be [1, 2, 2, 4, 8]
}
public static List<Integer> res(int n, List<Integer> result ) {
result.stream()
.limit(n)
.reduce(identity, (base,index) -> base);
//return result;
}
Now the issue is trying to pass result into the stream so that the list keeps being updated with the new values. According to the Java tutorials, this is possible, albeit inefficient.
"If your reduce operation involves adding elements to a collection, then every time your accumulator function processes an element, it creates a new collection that includes the element, which is inefficient."
Do I need to use the optional third parameter, the BinaryOperator combiner, to combine the list and the result?
<U> U reduce(U identity,
BiFunction<U,? super T,U> accumulator,
BinaryOperator<U> combiner)
In short: I want to pass a list with two values and have the function compute the product of the first two values (1, 2) and add it to the list, then compute the product of the last two values (2, 2) and add it to the list, and so on, until the stream hits the limit.
It looks like you're trying to implement a recurrence relation. The reduce method applies some function to a bunch of pre-existing values in the stream. You can't use reduce and take an intermediate result from the reducer function and "feed it back" into the stream, which is what you need to do in order to implement a recurrence relation.
The way to implement a recurrence relation using streams is to use one of the streams factory methods Stream.generate or Stream.iterate. The iterate factory seems to suggest the most obvious approach. The state that needs to be kept for each application of the recurrence function requires two ints in your example, so unfortunately we have to create an object to hold these for us:
static class IntPair {
final int a, b;
IntPair(int a_, int b_) {
a = a_; b = b_;
}
}
Using this state object you can create a stream that implements the recurrence that you want:
Stream.iterate(new IntPair(1, 2), p -> new IntPair(p.b, p.a * p.b))
Once you have such a stream, it's a simple matter to collect the values into a list:
List<Integer> output =
Stream.iterate(new IntPair(1, 2), p -> new IntPair(p.b, p.a * p.b))
.limit(5)
.map(pair -> pair.a)
.collect(Collectors.toList());
System.out.println(output);
[1, 2, 2, 4, 8]
As an aside, you can use the same technique to generate the Fibonacci sequence. All you do is provide a different starting value and iteration function:
Stream.iterate(new IntPair(0, 1), p -> new IntPair(p.b, p.a + p.b))
You could also implement a similar recurrence relation using Stream.generate. This will also require a helper class. The helper class implements Supplier of the result value but it also needs to maintain state. It thus needs to be mutable, which is kind of gross in my book. The iteration function also needs to be baked into the generator object. This makes it less flexible than the IntPair object, which can be used for creating arbitrary recurrences.
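For completeness, here is a hedged sketch of what such a Stream.generate based generator could look like (my own code; the class name is made up):

// A mutable Supplier that holds the recurrence state; the iteration function is baked in.
class ProductSupplier implements Supplier<Integer> {
    private int a = 1, b = 2;   // seed values from the question

    @Override
    public Integer get() {
        int result = a;
        int next = a * b;       // the recurrence itself
        a = b;
        b = next;
        return result;
    }
}

List<Integer> output = Stream.generate(new ProductSupplier())
    .limit(5)
    .collect(Collectors.toList());   // [1, 2, 2, 4, 8]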
Just for completeness, here is a solution which does not need an additional class.
List<Integer> output = Stream.iterate(
(ToIntFunction<IntBinaryOperator>)f -> f.applyAsInt(1, 2),
prev -> f -> prev.applyAsInt((a, b) -> f.applyAsInt(b, a*b) )
)
.limit(9).map(pair -> pair.applyAsInt((a, b)->a))
.collect(Collectors.toList());
This is a functional approach which doesn’t need an intermediate value storage. However, since Java is not a functional programming language and doesn’t have optimizations for such a recursive function definition, this is not recommended for larger streams.
Since for this example a larger stream would overflow numerically anyway and the calculation is cheap, this approach works. But for other use cases you will surely prefer a storage object when solving such a problem with plain Java (as in Stuart Marks’ answer)
I have a data set represented by a Java 8 stream:
Stream<T> stream = ...;
I can see how to filter it to get a random subset - for example
Random r = new Random();
PrimitiveIterator.OfInt coin = r.ints(0, 2).iterator();
Stream<T> heads = stream.filter((x) -> (coin.nextInt() == 0));
I can also see how I could reduce this stream to get, for example, two lists representing two random halves of the data set, and then turn those back into streams.
But, is there a direct way to generate two streams from the initial one? Something like
(heads, tails) = stream.[some kind of split based on filter]
Thanks for any insight.
A collector can be used for this.
For two categories, use the Collectors.partitioningBy() factory.
This will create a Map<Boolean, List<T>> and put items in one or the other list based on a Predicate.
Note: Since the stream needs to be consumed whole, this can't work on infinite streams. And because the stream is consumed anyway, this method simply puts them in Lists instead of making a new stream-with-memory. You can always stream those lists if you require streams as output.
Also, no need for the iterator, not even in the heads-only example you provided.
Binary splitting looks like this:
Random r = new Random();
Map<Boolean, List<String>> groups = stream
.collect(Collectors.partitioningBy(x -> r.nextBoolean()));
System.out.println(groups.get(false).size());
System.out.println(groups.get(true).size());
For more categories, use the Collectors.groupingBy() factory.
Map<Object, List<String>> groups = stream
.collect(Collectors.groupingBy(x -> r.nextInt(3)));
System.out.println(groups.get(0).size());
System.out.println(groups.get(1).size());
System.out.println(groups.get(2).size());
In case the stream is not a Stream, but one of the primitive streams like IntStream, this .collect(Collector) method is not available. You'll have to do it the manual way without a collector factory. Its implementation looks like this:
[Example 2.0 since 2020-04-16]
IntStream intStream = IntStream.iterate(0, i -> i + 1).limit(100000).parallel();
IntPredicate predicate = ignored -> r.nextBoolean();
Map<Boolean, List<Integer>> groups = intStream.collect(
() -> Map.of(false, new ArrayList<>(100000),
true , new ArrayList<>(100000)),
(map, value) -> map.get(predicate.test(value)).add(value),
(map1, map2) -> {
map1.get(false).addAll(map2.get(false));
map1.get(true ).addAll(map2.get(true ));
});
In this example I initialize the ArrayLists with the full size of the initial collection (if this is known at all). This prevents resize events even in the worst-case scenario, but can potentially gobble up 2NT space (N = initial number of elements, T = number of threads). To trade off space for speed, you can leave it out or use your best educated guess, like the expected highest number of elements in one partition (typically just over N/2 for a balanced split).
I hope I don't offend anyone by using a Java 9 method. For the Java 8 version, look at the edit history.
I stumbled across this question myself, and I feel that a forked stream has some use cases that could prove valid. I wrote the code below as a consumer so that it does not do anything by itself, but you could apply it to functions and anything else you might come across.
class PredicateSplitterConsumer<T> implements Consumer<T>
{
private Predicate<T> predicate;
private Consumer<T> positiveConsumer;
private Consumer<T> negativeConsumer;
public PredicateSplitterConsumer(Predicate<T> predicate, Consumer<T> positive, Consumer<T> negative)
{
this.predicate = predicate;
this.positiveConsumer = positive;
this.negativeConsumer = negative;
}
@Override
public void accept(T t)
{
if (predicate.test(t))
{
positiveConsumer.accept(t);
}
else
{
negativeConsumer.accept(t);
}
}
}
Now your code implementation could be something like this:
personsArray.forEach(
new PredicateSplitterConsumer<>(
person -> person.getDateOfBirth().isPresent(),
person -> System.out.println(person.getName()),
person -> System.out.println(person.getName() + " does not have Date of birth")));
Unfortunately, what you ask for is directly frowned upon in the JavaDoc of Stream:
A stream should be operated on (invoking an intermediate or terminal
stream operation) only once. This rules out, for example, "forked"
streams, where the same source feeds two or more pipelines, or
multiple traversals of the same stream.
You can work around this using peek or other methods should you truly desire that type of behaviour. In this case, instead of trying to back two streams with a forking filter from the same original Stream source, you would duplicate your stream and filter each of the duplicates appropriately.
However, you may wish to reconsider if a Stream is the appropriate structure for your use case.
You can get two Streams out of one
since Java 12 with teeing
counting heads and tails in 100 coin flips
Random r = new Random();
PrimitiveIterator.OfInt coin = r.ints(0, 2).iterator();
List<Long> list = Stream.iterate(0, i -> coin.nextInt())
    .limit(100).collect(teeing(   // teeing, filtering, counting are static imports from java.util.stream.Collectors
        filtering(i -> i == 1, counting()),
        filtering(i -> i == 0, counting()),
        (heads, tails) -> List.of(heads, tails)));
System.err.println("heads:" + list.get(0) + " tails:" + list.get(1));
gets eg.: heads:51 tails:49
Not exactly. You can't get two Streams out of one; this doesn't make sense -- how would you iterate over one without needing to generate the other at the same time? A stream can only be operated over once.
However, if you want to dump them into a list or something, you could do
stream.forEach((x) -> ((x == 0) ? heads : tails).add(x));
This is against the general mechanism of Stream. Say you could split Stream S0 into Sa and Sb as you wanted. Performing any terminal operation, say count(), on Sa will necessarily "consume" all elements in S0. Therefore Sb would lose its data source.
Previously, Stream had a tee() method, I think, which duplicated a stream into two. It has been removed now.
Stream does have a peek() method, though; you might be able to use it to achieve your requirements.
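One hedged reading of that suggestion (my own sketch; source stands for any Collection<Integer>): use peek to copy one partition into a side list while the main pipeline keeps the other.

List<Integer> heads = new ArrayList<>();
List<Integer> tails = source.stream()
    .peek(x -> { if (x == 0) heads.add(x); })   // side effect: collect the "heads"
    .filter(x -> x != 0)                        // the pipeline itself keeps only the "tails"
    .collect(Collectors.toList());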
Not exactly, but you may be able to accomplish what you need by invoking Collectors.groupingBy(). You create a new Collection, and can then instantiate streams on that new collection.
This was the least bad answer I could come up with.
import org.apache.commons.lang3.tuple.ImmutablePair;
import org.apache.commons.lang3.tuple.Pair;
public class Test {
public static <T, L, R> Pair<L, R> splitStream(Stream<T> inputStream, Predicate<T> predicate,
Function<Stream<T>, L> trueStreamProcessor, Function<Stream<T>, R> falseStreamProcessor) {
Map<Boolean, List<T>> partitioned = inputStream.collect(Collectors.partitioningBy(predicate));
L trueResult = trueStreamProcessor.apply(partitioned.get(Boolean.TRUE).stream());
R falseResult = falseStreamProcessor.apply(partitioned.get(Boolean.FALSE).stream());
return new ImmutablePair<L, R>(trueResult, falseResult);
}
public static void main(String[] args) {
Stream<Integer> stream = Stream.iterate(0, n -> n + 1).limit(10);
Pair<List<Integer>, String> results = splitStream(stream,
n -> n > 5,
s -> s.filter(n -> n % 2 == 0).collect(Collectors.toList()),
s -> s.map(n -> n.toString()).collect(Collectors.joining("|")));
System.out.println(results);
}
}
This takes a stream of integers and splits them at 5. For those greater than 5 it filters only even numbers and puts them in a list. For the rest it joins them with |.
outputs:
([6, 8],0|1|2|3|4|5)
It's not ideal, as it collects everything into intermediary collections, breaking the stream (and it has too many arguments!).
I stumbled across this question while looking for a way to filter certain elements out of a stream and log them as errors. So I did not really need to split the stream so much as attach a premature terminating action to a predicate with unobtrusive syntax. This is what I came up with:
public class MyProcess {
/* Return a Predicate that performs a bail-out action on non-matching items. */
private static <T> Predicate<T> withAltAction(Predicate<T> pred, Consumer<T> altAction) {
return x -> {
if (pred.test(x)) {
return true;
}
altAction.accept(x);
return false;
};
}
/* Example usage in non-trivial pipeline */
public void processItems(Stream<Item> stream) {
stream.filter(Objects::nonNull)
.peek(this::logItem)
.map(Item::getSubItems)
.filter(withAltAction(SubItem::isValid,
i -> logError(i, "Invalid")))
.peek(this::logSubItem)
.filter(withAltAction(i -> i.size() > 10,
i -> logError(i, "Too large")))
.map(SubItem::toDisplayItem)
.forEach(this::display);
}
}
Shorter version that uses Lombok
import java.util.function.Consumer;
import java.util.function.Predicate;
import lombok.AccessLevel;
import lombok.RequiredArgsConstructor;
import lombok.experimental.FieldDefaults;
/**
* Forks a Stream using a Predicate into postive and negative outcomes.
*/
@RequiredArgsConstructor
@FieldDefaults(makeFinal = true, level = AccessLevel.PROTECTED)
public class StreamForkerUtil<T> implements Consumer<T> {
Predicate<T> predicate;
Consumer<T> positiveConsumer;
Consumer<T> negativeConsumer;
@Override
public void accept(T t) {
(predicate.test(t) ? positiveConsumer : negativeConsumer).accept(t);
}
}
How about:
Supplier<Stream<Integer>> randomIntsStreamSupplier =
() -> (new Random()).ints(0, 2).boxed();
Stream<Integer> tails =
randomIntsStreamSupplier.get().filter(x->x.equals(0));
Stream<Integer> heads =
randomIntsStreamSupplier.get().filter(x->x.equals(1));