How to pop the first element in a java stream? [duplicate] - java

Java 8 introduced a Stream class that resembles Scala's Stream, a powerful lazy construct using which it is possible to do something like this very concisely:
def from(n: Int): Stream[Int] = n #:: from(n+1)
def sieve(s: Stream[Int]): Stream[Int] = {
s.head #:: sieve(s.tail filter (_ % s.head != 0))
}
val primes = sieve(from(2))
primes takeWhile(_ < 1000) print // prints all primes less than 1000
I wondered if it is possible to do this in Java 8, so I wrote something like this:
IntStream from(int n) {
return IntStream.iterate(n, m -> m + 1);
}
IntStream sieve(IntStream s) {
int head = s.findFirst().getAsInt();
return IntStream.concat(IntStream.of(head), sieve(s.skip(1).filter(n -> n % head != 0)));
}
IntStream primes = sieve(from(2));
Fairly simple, but it produces java.lang.IllegalStateException: stream has already been operated upon or closed because both findFirst() and skip() are terminal operations on Stream which can be done only once.
I don't really have to use up the stream twice since all I need is the first number in the stream and the rest as another stream, i.e. equivalent of Scala's Stream.head and Stream.tail. Is there a method in Java 8 Stream that I can use to achieve this?
Thanks.

Even if you hadn’t the problem that you can’t split an IntStream, you code didn’t work because you are invoking your sieve method recursively instead of lazily. So you had an infinity recursion before you could query your resulting stream for the first value.
Splitting an IntStream s into a head and a tail IntStream (which has not yet consumed) is possible:
PrimitiveIterator.OfInt it = s.iterator();
int head = it.nextInt();
IntStream tail = IntStream.generate(it::next).filter(i -> i % head != 0);
At this place you need a construct of invoking sieve on the tail lazily. Stream does not provide that; concat expects existing stream instances as arguments and you can’t construct a stream invoking sieve lazily with a lambda expression as lazy creation works with mutable state only which lambda expressions do not support. If you don’t have a library implementation hiding the mutable state you have to use a mutable object. But once you accept the requirement of mutable state, the solution can be even easier than your first approach:
IntStream primes = from(2).filter(i -> p.test(i)).peek(i -> p = p.and(v -> v % i != 0));
IntPredicate p = x -> true;
IntStream from(int n)
{
return IntStream.iterate(n, m -> m + 1);
}
This will recursively create a filter but in the end it doesn’t matter whether you create a tree of IntPredicates or a tree of IntStreams (like with your IntStream.concat approach if it did work). If you don’t like the mutable instance field for the filter you can hide it in an inner class (but not in a lambda expression…).

My StreamEx library has now headTail() operation which solves the problem:
public static StreamEx<Integer> sieve(StreamEx<Integer> input) {
return input.headTail((head, tail) ->
sieve(tail.filter(n -> n % head != 0)).prepend(head));
}
The headTail method takes a BiFunction which will be executed at most once during the stream terminal operation execution. So this implementation is lazy: it does not compute anything until traversal starts and computes only as much prime numbers as requested. The BiFunction receives a first stream element head and the stream of the rest elements tail and can modify the tail in any way it wants. You may use it with predefined input:
sieve(IntStreamEx.range(2, 1000).boxed()).forEach(System.out::println);
But infinite stream work as well
sieve(StreamEx.iterate(2, x -> x+1)).takeWhile(x -> x < 1000)
.forEach(System.out::println);
// Not the primes till 1000, but 1000 first primes
sieve(StreamEx.iterate(2, x -> x+1)).limit(1000).forEach(System.out::println);
There's also alternative solution using headTail and predicate concatenation:
public static StreamEx<Integer> sieve(StreamEx<Integer> input, IntPredicate isPrime) {
return input.headTail((head, tail) -> isPrime.test(head)
? sieve(tail, isPrime.and(n -> n % head != 0)).prepend(head)
: sieve(tail, isPrime));
}
sieve(StreamEx.iterate(2, x -> x+1), i -> true).limit(1000).forEach(System.out::println);
It interesting to compare recursive solutions: how many primes they capable to generate.
#John McClean solution (StreamUtils)
John McClean solutions are not lazy: you cannot feed them with infinite stream. So I just found by trial-and-error the maximal allowed upper bound (17793) (after that StackOverflowError occurs):
public void sieveTest(){
sieve(IntStream.range(2, 17793).boxed()).forEach(System.out::println);
}
#John McClean solution (Streamable)
public void sieveTest2(){
sieve(Streamable.range(2, 39990)).forEach(System.out::println);
}
Increasing upper limit above 39990 results in StackOverflowError.
#frhack solution (LazySeq)
LazySeq<Integer> ints = integers(2);
LazySeq primes = sieve(ints); // sieve method from #frhack answer
primes.forEach(p -> System.out.println(p));
Result: stuck after prime number = 53327 with enormous heap allocation and garbage collection taking more than 90%. It took several minutes to advance from 53323 to 53327, so waiting more seems impractical.
#vidi solution
Prime.stream().forEach(System.out::println);
Result: StackOverflowError after prime number = 134417.
My solution (StreamEx)
sieve(StreamEx.iterate(2, x -> x+1)).forEach(System.out::println);
Result: StackOverflowError after prime number = 236167.
#frhack solution (rxjava)
Observable<Integer> primes = Observable.from(()->primesStream.iterator());
primes.forEach((x) -> System.out.println(x.toString()));
Result: StackOverflowError after prime number = 367663.
#Holger solution
IntStream primes=from(2).filter(i->p.test(i)).peek(i->p=p.and(v->v%i!=0));
primes.forEach(System.out::println);
Result: StackOverflowError after prime number = 368089.
My solution (StreamEx with predicate concatenation)
sieve(StreamEx.iterate(2, x -> x+1), i -> true).forEach(System.out::println);
Result: StackOverflowError after prime number = 368287.
So three solutions involving predicate concatenation win, because each new condition adds only 2 more stack frames. I think, the difference between them is marginal and should not be considered to define a winner. However I like my first StreamEx solution more as it more similar to Scala code.

The solution below does not do state mutations, except for the head/tail deconstruction of the stream.
The lazyness is obtained using IntStream.iterate. The class Prime is used to keep the generator state
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;
import java.util.stream.Stream;
public class Prime {
private final IntStream candidates;
private final int current;
private Prime(int current, IntStream candidates)
{
this.current = current;
this.candidates = candidates;
}
private Prime next()
{
PrimitiveIterator.OfInt it = candidates.filter(n -> n % current != 0).iterator();
int head = it.next();
IntStream tail = IntStream.generate(it::next);
return new Prime(head, tail);
}
public static Stream<Integer> stream() {
IntStream possiblePrimes = IntStream.iterate(3, i -> i + 1);
return Stream.iterate(new Prime(2, possiblePrimes), Prime::next)
.map(p -> p.current);
}
}
The usage would be this:
Stream<Integer> first10Primes = Prime.stream().limit(10)

You can essentially implement it like this:
static <T> Tuple2<Optional<T>, Seq<T>> splitAtHead(Stream<T> stream) {
Iterator<T> it = stream.iterator();
return tuple(it.hasNext() ? Optional.of(it.next()) : Optional.empty(), seq(it));
}
In the above example, Tuple2 and Seq are types borrowed from jOOλ, a library that we developed for jOOQ integration tests. If you don't want any additional dependencies, you might as well implement them yourself:
class Tuple2<T1, T2> {
final T1 v1;
final T2 v2;
Tuple2(T1 v1, T2 v2) {
this.v1 = v1;
this.v2 = v2;
}
static <T1, T2> Tuple2<T1, T2> tuple(T1 v1, T2 v2) {
return new Tuple<>(v1, v2);
}
}
static <T> Tuple2<Optional<T>, Stream<T>> splitAtHead(Stream<T> stream) {
Iterator<T> it = stream.iterator();
return tuple(
it.hasNext() ? Optional.of(it.next()) : Optional.empty,
StreamSupport.stream(Spliterators.spliteratorUnknownSize(
it, Spliterator.ORDERED
), false)
);
}

If you don't mind using 3rd party libraries cyclops-streams, I library I wrote has a number of potential solutions.
The StreamUtils class has large number of static methods for working directly with java.util.stream.Streams including headAndTail.
HeadAndTail<Integer> headAndTail = StreamUtils.headAndTail(Stream.of(1,2,3,4));
int head = headAndTail.head(); //1
Stream<Integer> tail = headAndTail.tail(); //Stream[2,3,4]
The Streamable class represents a replayable Stream and works by building a lazy, caching intermediate data-structure. Because it is caching and repayable - head and tail can be implemented directly and separately.
Streamable<Integer> replayable= Streamable.fromStream(Stream.of(1,2,3,4));
int head = repayable.head(); //1
Stream<Integer> tail = replayable.tail(); //Stream[2,3,4]
cyclops-streams also provides a sequential Stream extension that in turn extends jOOλ and has both Tuple based (from jOOλ) and domain object (HeadAndTail) solutions for head and tail extraction.
SequenceM.of(1,2,3,4)
.splitAtHead(); //Tuple[1,SequenceM[2,3,4]
SequenceM.of(1,2,3,4)
.headAndTail();
Update per Tagir's request -> A Java version of the Scala sieve using SequenceM
public void sieveTest(){
sieve(SequenceM.range(2, 1_000)).forEach(System.out::println);
}
SequenceM<Integer> sieve(SequenceM<Integer> s){
return s.headAndTailOptional().map(ht ->SequenceM.of(ht.head())
.appendStream(sieve(ht.tail().filter(n -> n % ht.head() != 0))))
.orElse(SequenceM.of());
}
And another version via Streamable
public void sieveTest2(){
sieve(Streamable.range(2, 1_000)).forEach(System.out::println);
}
Streamable<Integer> sieve(Streamable<Integer> s){
return s.size()==0? Streamable.of() : Streamable.of(s.head())
.appendStreamable(sieve(s.tail()
.filter(n -> n % s.head() != 0)));
}
Note - neither Streamable of SequenceM have an Empty implementation - hence the size check for Streamable and the use of headAndTailOptional.
Finally a version using plain java.util.stream.Stream
import static com.aol.cyclops.streams.StreamUtils.headAndTailOptional;
public void sieveTest(){
sieve(IntStream.range(2, 1_000).boxed()).forEach(System.out::println);
}
Stream<Integer> sieve(Stream<Integer> s){
return headAndTailOptional(s).map(ht ->Stream.concat(Stream.of(ht.head())
,sieve(ht.tail().filter(n -> n % ht.head() != 0))))
.orElse(Stream.of());
}
Another update - a lazy iterative based on #Holger's version using objects rather than primitives (note a primitive version is also possible)
final Mutable<Predicate<Integer>> predicate = Mutable.of(x->true);
SequenceM.iterate(2, n->n+1)
.filter(i->predicate.get().test(i))
.peek(i->predicate.mutate(p-> p.and(v -> v%i!=0)))
.limit(100000)
.forEach(System.out::println);

There are many interesting suggestions provided here, but if someone needs a solution without dependencies to third party libraries I came up with this:
import java.util.AbstractMap;
import java.util.Optional;
import java.util.Spliterators;
import java.util.stream.StreamSupport;
/**
* Splits a stream in the head element and a tail stream.
* Parallel streams are not supported.
*
* #param stream Stream to split.
* #param <T> Type of the input stream.
* #return A map entry where {#link Map.Entry#getKey()} contains an
* optional with the first element (head) of the original stream
* and {#link Map.Entry#getValue()} the tail of the original stream.
* #throws IllegalArgumentException for parallel streams.
*/
public static <T> Map.Entry<Optional<T>, Stream<T>> headAndTail(final Stream<T> stream) {
if (stream.isParallel()) {
throw new IllegalArgumentException("parallel streams are not supported");
}
final Iterator<T> iterator = stream.iterator();
return new AbstractMap.SimpleImmutableEntry<>(
iterator.hasNext() ? Optional.of(iterator.next()) : Optional.empty(),
StreamSupport.stream(Spliterators.spliteratorUnknownSize(iterator, 0), false)
);
}

To get head and tail you need a Lazy Stream implementation. Java 8 stream or RxJava are not suitable.
You can use for example LazySeq as follows.
Lazy sequence is always traversed from the beginning using very cheap
first/rest decomposition (head() and tail())
LazySeq implements java.util.List interface, thus can be used in
variety of places. Moreover it also implements Java 8 enhancements to
collections, namely streams and collectors
package com.company;
import com.nurkiewicz.lazyseq.LazySeq;
public class Main {
public static void main(String[] args) {
LazySeq<Integer> ints = integers(2);
LazySeq primes = sieve(ints);
primes.take(10).forEach(p -> System.out.println(p));
}
private static LazySeq<Integer> sieve(LazySeq<Integer> s) {
return LazySeq.cons(s.head(), () -> sieve(s.filter(x -> x % s.head() != 0)));
}
private static LazySeq<Integer> integers(int from) {
return LazySeq.cons(from, () -> integers(from + 1));
}
}

Here is another recipe using the way suggested by Holger.
It use RxJava just to add the possibility to use the take(int) method and many others.
package com.company;
import rx.Observable;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;
public class Main {
public static void main(String[] args) {
final IntPredicate[] p={(x)->true};
IntStream primesStream=IntStream.iterate(2,n->n+1).filter(i -> p[0].test(i)).peek(i->p[0]=p[0].and(v->v%i!=0) );
Observable primes = Observable.from(()->primesStream.iterator());
primes.take(10).forEach((x) -> System.out.println(x.toString()));
}
}

This should work with parallel streams as well:
public static <T> Map.Entry<Optional<T>, Stream<T>> headAndTail(final Stream<T> stream) {
final AtomicReference<Optional<T>> head = new AtomicReference<>(Optional.empty());
final var spliterator = stream.spliterator();
spliterator.tryAdvance(x -> head.set(Optional.of(x)));
return Map.entry(head.get(), StreamSupport.stream(spliterator, stream.isParallel()));
}

If you want to get head of a stream, just:
IntStream.range(1, 5).first();
If you want to get tail of a stream, just:
IntStream.range(1, 5).skip(1);
If you want to get both head and tail of a stream, just:
IntStream s = IntStream.range(1, 5);
int head = s.head();
IntStream tail = s.tail();
If you want to find the prime, just:
LongStream.range(2, n)
.filter(i -> LongStream.range(2, (long) Math.sqrt(i) + 1).noneMatch(j -> i % j == 0))
.forEach(N::println);
If you want to know more, go to get abacus-common
Declaration: I'm the developer of abacus-common.

Related

Java stream, how to apply a function to the previous result

I was looking for a right answer, but found nothing that fill my purpose.
I have a simple for loop like this:
String test = "hi";
for(Something something : somethingList) {
if(something.getSomething() != null) {
test = cleaner.clean(test, something.getSomething());
} else if(something.getOther() != null) {
test = StaticClass.clean(test, something.getOther());
}
}
and I never understood if the same result can be achieved using java stream. With reduce maybe? I need to pass the response of the previuos loop (saved in the "test" variable) to the next loop (see clean method, where I pass test). How can I do that?
If you want to do something for each element in a list (with a for each loop) I would suggest using the forEach or forEachOrdered functions. These should represent your for(Object o : objects). You can easily define your own Consumer class, which handles all the stuff for you:
class CustomConsumer implements Consumer<Integer> {
private Integer previous;
public CustomConsumer(Integer initialValue) {
previous = initialValue;
}
#Override
public void accept(Integer current) {
// do stuff with your current / previous object :)
System.out.println("previous: " + previous);
previous = current;
}
}
List<Integer> values = getValues();
values.stream()
.forEachOrdered(new CustomConsumer(-1));
This example uses Integer as a provided class, if you want to use your own just replace Integer. You can even use generics:
class CustomConsumer<T> implements Consumer<T> {
private T previous;
public CustomConsumer(T initialValue) {
previous = initialValue;
}
#Override
public void accept(T current) {
// do stuff with your current / previous object :)
System.out.println("previous: " + previous);
previous = current;
}
}
List<Integer> values = new ArrayList<>();
for(int i = 0; i < 6; i++)
values.add(i);
values.stream()
.forEachOrdered(new CustomConsumer<>("hello"));
Output:
previous: hello
previous: 0
previous: 1
previous: 2
previous: 3
previous: 4
If you want to learn more about streams the oracle docs provide some good stuff.
To expand on my comment, with streams you basically could use reduction, e.g. by using the reduce() method.
Example:
//some list simulating your somethingList
List<Integer> list = List.of(2,4,6,1,3,5);
String result = list.stream()
//make sure the stream is sequential to keep processing order
.sequential()
//start reduction with an initial value
.reduce("initial",
//in the accumulator you get the previous reduction result and the current element
(test, element) -> {
//simulates your conditions, just adding the new element for demonstration purposes
// test could also be replaced
if( element % 2 == 0 ) {
test += ", even:" + element;
} else {
test += ", odd: " + element;
}
//return the new reduction result
return test;
},
//combiner is not used in sequential streams so just one of the elements
(l, r) -> l);
This would result in:
initial, even:2, even:4, even:6, odd: 1, odd: 3, odd: 5
Note, however, that streams are not a silver bullet and sometimes a simple loop like your initial code is just fine or even better. This seems to be such a case.

Java Streams - Buffering huge streams

I'm trying to collapse several streams backed by huge amounts of data into one, then buffer them. I'm able to collapse these streams into one stream of items with no problem. When I attempt to buffer/chunk the streams, though, it attempts to fully buffer the first stream, which instantly fills up my memory.
It took me a while to narrow down the issue to a minimum test case, but there's some code below.
I can refactor things such that I don't run into this issue, but without understanding why exactly this blows up, I feel like using streams is just a ticking time bomb.
I took inspiration from Buffer Operator on Java 8 Streams for the buffering.
import java.util.*;
import java.util.stream.LongStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
public class BreakStreams
{
//#see https://stackoverflow.com/questions/47842871/buffer-operator-on-java-8-streams
/**
* Batch a stream into chunks
*/
public static <T> Stream<List<T>> buffer(Stream<T> stream, final long count)
{
final Iterator<T> streamIterator = stream.iterator();
return StreamSupport.stream(Spliterators.spliteratorUnknownSize(new Iterator<List<T>>()
{
#Override public boolean hasNext()
{
return streamIterator.hasNext();
}
#Override public List<T> next()
{
List<T> intermediate = new ArrayList<>();
for (long v = 0; v < count && hasNext(); v++)
{
intermediate.add(streamIterator.next());
}
return intermediate;
}
}, 0), false);
}
public static void main(String[] args)
{
//create streams from huge datasets
Stream<Long> streams = Stream.of(LongStream.range(0, Integer.MAX_VALUE).boxed(),
LongStream.range(0, Integer.MAX_VALUE).boxed())
//collapse into one stream
.flatMap(x -> x);
//iterating over the stream one item at a time is OK..
// streams.forEach(x -> {
//buffering the stream is NOT ok, you will go OOM
buffer(streams, 25).forEach(x -> {
try
{
Thread.sleep(2500);
}
catch (InterruptedException ignore)
{
}
System.out.println(x);
});
}
}
This seems to be connected to the older issue “Why filter() after flatMap() is "not completely" lazy in Java streams?”. While that issue has been fixed for the Stream’s builtin operations, it seems to still exist when we try to iterate over a flatmapped stream externally.
We can simplify the code to reproduce the problem to
Stream.of(LongStream.range(0, Integer.MAX_VALUE))
.flatMapToLong(x -> x)
.iterator().hasNext();
Note that using Spliterator is affected as well
Stream.of(LongStream.range(0, Integer.MAX_VALUE))
.flatMapToLong(x -> x)
.spliterator()
.tryAdvance((long l) -> System.out.println("first item: "+l));
Both try to buffer elements until ultimately bailing out with an OutOfMemoryError.
Since spliterator().forEachRemaining(…) seems not to be affected, you could implement a solution which works for your use case of forEach, but it would be fragile, as it would still exhibit the problem for short-circuiting stream operations.
public static <T> Stream<List<T>> buffer(Stream<T> stream, final int count) {
boolean parallel = stream.isParallel();
Spliterator<T> source = stream.spliterator();
return StreamSupport.stream(
new Spliterators.AbstractSpliterator<List<T>>(
(source.estimateSize()+count-1)/count, source.characteristics()
&(Spliterator.SIZED|Spliterator.DISTINCT|Spliterator.ORDERED)
| Spliterator.NONNULL) {
List<T> list;
Consumer<T> c = t -> list.add(t);
#Override
public boolean tryAdvance(Consumer<? super List<T>> action) {
if(list == null) list = new ArrayList<>(count);
if(!source.tryAdvance(c)) return false;
do {} while(list.size() < count && source.tryAdvance(c));
action.accept(list);
list = null;
return true;
}
#Override
public void forEachRemaining(Consumer<? super List<T>> action) {
source.forEachRemaining(t -> {
if(list == null) list = new ArrayList<>(count);
list.add(t);
if(list.size() == count) {
action.accept(list);
list = null;
}
});
if(list != null) {
action.accept(list);
list = null;
}
}
}, parallel);
}
But note that Spliterator based solutions are preferable in general, as they support carrying additional information enabling optimizations and have lower iteration costs in a lot of use cases. So this is the way to go once this issue has been fixed in the JDK code.
As a workaround, you can use Stream.concat(…) to combine streams, but it has an explicit warning about not to combine too many streams at once in its documentation:
Use caution when constructing streams from repeated concatenation. Accessing an element of a deeply concatenated stream can result in deep call chains, or even StackOverflowException [sic].
The throwable’s name has been corrected to StackOverflowError in Java 9’s documentation

Java Streams — How to perform an intermediate function every nth item

I am looking for an operation on a Stream that enables me to perform a non-terminal (and/or terminal) operation every nth item. Although I use a stream of primes for example, the stream could just as easily be web-requests, user actions, or some other cold data or live feed being produced.
From this:
Duration start = Duration.ofNanos(System.nanoTime());
IntStream.iterate(2, n -> n + 1)
.filter(Findprimes::isPrime)
.limit(1_000_1000 * 10)
.forEach(System.out::println);
System.out.println("Duration: " + Duration.ofNanos(System.nanoTime()).minus(start));
To a stream function like this:
IntStream.iterate(2, n -> n + 1)
.filter(Findprimes::isPrime)
.limit(1_000_1000 * 10)
.peekEvery(10, System.out::println)
.forEach( it -> {});
Create a helper method to wrap the peek() consumer:
public static IntConsumer every(int count, IntConsumer consumer) {
if (count <= 0)
throw new IllegalArgumentException("Count must be >1: Got " + count);
return new IntConsumer() {
private int i;
#Override
public void accept(int value) {
if (++this.i == count) {
consumer.accept(value);
this.i = 0;
}
}
};
}
You can now use it almost exactly like you wanted:
IntStream.rangeClosed(1, 20)
.peek(every(5, System.out::println))
.count();
Output
5
10
15
20
The helper method can be put in a utility class and statically imported, similar to how the Collectors class is nothing but static helper methods.
As noted by #user140547 in a comment, this code is not thread-safe, so it cannot be used with parallel streams. Besides, the output order would be messed up, so it doesn't really make sense to use it with parallel streams anyway.
It is not a good idea to rely on peek() and count() as it is possible that the operation is not invoked at all if count() can be calculated without going over the whole stream. Even if it works now, it does not mean that it is also going to work in future. See the javadoc of Stream.count() in Java 9.
Better use forEach().
For the problem itself: In special cases like a simple iteration, you could just filter your objects like.
Stream.iterate(2, n->n+1)
.limit(20)
.filter(n->(n-2)%5==0 && n!=2)
.forEach(System.out::println);
This of course won't work for other cases, where you might use a stateful IntConsumer. If iterate() is used, it is probably not that useful to use parallel streams anyway.
If you want a generic solution, you could also try to use a "normal" Stream, which may not be as efficient as an IntStream, but should still suffice in many cases:
class Tuple{ // ctor, getter/setter omitted
int index;
int value;
}
Then you could do:
Stream.iterate( new Tuple(1,2),t-> new Tuple(t.index+1,t.value*2))
.limit(30)
.filter(t->t.index %5 == 0)
.forEach(System.out::println);
If you have to use peek(), you can also do
.peek(t->{if (t.index %5 == 0) System.out.println(t);})
Or if you add methods
static Tuple initialTuple(int value){
return new Tuple(1,value);
}
static UnaryOperator<Tuple> createNextTuple(IntUnaryOperator f){
return current -> new Tuple(current.index+1,f.applyAsInt(current.value));
}
static Consumer<Tuple> every(int n,IntConsumer consumer){
return tuple -> {if (tuple.index % n == 0) consumer.accept(tuple.value);};
}
you can also do (with static imports):
Stream.iterate( initialTuple(2), createNextTuple(x->x*2))
.limit(30)
.peek(every(5,System.out::println))
.forEach(System.out::println);
Try this.
int[] counter = {0};
long result = IntStream.iterate(2, n -> n + 1)
.filter(Findprimes::isPrime)
.limit(100)
.peek(x -> { if (counter[0]++ % 10 == 0) System.out.print(x + " ");} )
.count();
result:
2 31 73 127 179 233 283 353 419 467

How to compare two Streams in Java 8

What would be a good way to compare two Stream instances in Java 8 and find out whether they have the same elements, specifically for purposes of unit testing?
What I've got now is:
#Test
void testSomething() {
Stream<Integer> expected;
Stream<Integer> thingUnderTest;
// (...)
Assert.assertArrayEquals(expected.toArray(), thingUnderTest.toArray());
}
or alternatively:
Assert.assertEquals(
expected.collect(Collectors.toList()),
thingUnderTest.collect(Collectors.toList()));
But that means I'm constructing two collections and discarding them. It's not a performance issue, given the size of my test streams, but I'm wondering whether there's a canonical way to compare two streams.
static void assertStreamEquals(Stream<?> s1, Stream<?> s2) {
Iterator<?> iter1 = s1.iterator(), iter2 = s2.iterator();
while(iter1.hasNext() && iter2.hasNext())
assertEquals(iter1.next(), iter2.next());
assert !iter1.hasNext() && !iter2.hasNext();
}
Collecting the stream under test (as you show) is a straightforward and effective way of performing the test. You may create the list of expected results in the easiest way available, which might not be collecting a stream.
Alternatively, with most libraries for creating mock collaborators, one could mock a Consumer that "expects" a series of accept() calls with particular elements. Consume the Stream with it, and then "verify" that its configured expectations were met.
Using the elementsEqual method in the Guava library:
Iterators.elementsEqual(s1.iterator(), s2.iterator())
You can assert the stream's content without creating a Stream<> expected.
AssertJ has fluent and readable solutions for this.
import static org.assertj.core.api.Assertions.assertThat;
import java.util.stream.Stream;
import org.junit.jupiter.api.Test;
class MyTests {
#Test
void test() {
Stream<Integer> actual = Stream.of(0, 8, 15); // your thingUnderTest
assertThat(actual).containsExactly(0, 8, 15);
}
}
public static boolean equalStreams(Stream<?> ... streams) {
List<Iterator<?>> is = Arrays.stream(streams)
.map(Stream::iterator)
.collect(Collectors.toList());
while (is.stream().allMatch(Iterator::hasNext))
if (is.stream().map(Iterator::next).distinct().limit(2).count() > 1)
return false;
return is.stream().noneMatch(Iterator::hasNext);
}
If order of elements doesn't matter, then comparison can be done for any number of streams using stream groupingBy() and then checking count for each element.
Example is for IntStream but you can get the idea behind this method:
IntStream stream1 = IntStream.range(1, 9);
IntStream stream2 = IntStream.range(1, 10);
boolean equal = IntStream.concat(stream1, stream2)
.boxed()
.collect(Collectors.groupingBy(Function.identity()))
.entrySet().stream().noneMatch(e -> e.getValue().size() != 2);
Another way of doing the same thing:
static void assertStreamEquals(Stream<?> s1, Stream<?> s2) {
Iterator<?> it2 = s2.iterator();
assert s1.allMatch(o -> it2.hasNext() && Objects.equals(o, it2.next())) && !it2.hasNext();
}
How to Compare Two Streams in java 8 and above: with the example of Comparing IntStream
package com.techsqually.java.language.generics.basics;
import java.util.Iterator;
import java.util.stream.IntStream;
import java.util.stream.Stream;
public class TwoStreamComparision {
public static void main(String[] args) {
String a = "Arpan";
String b = "Arpen";
IntStream s1 = a.chars();
IntStream s2 = b.chars();
Iterator<Integer> s1It = s1.iterator();
Iterator<Integer> s2It = s2.iterator();
//Code to check how many characters are not equal in both the string at their Respective Position
int count = 0;
while (s2It.hasNext()){
if (!s1It.next().equals(s2It.next())){
count++;
}
}
System.out.println(count);
}
}

How to dynamically do filtering in Java 8?

I know in Java 8, I can do filtering like this :
List<User> olderUsers = users.stream().filter(u -> u.age > 30).collect(Collectors.toList());
But what if I have a collection and half a dozen filtering criteria, and I want to test the combination of the criteria ?
For example I have a collection of objects and the following criteria :
<1> Size
<2> Weight
<3> Length
<4> Top 50% by a certain order
<5> Top 20% by a another certain ratio
<6> True or false by yet another criteria
And I want to test the combination of the above criteria, something like :
<1> -> <2> -> <3> -> <4> -> <5>
<1> -> <2> -> <3> -> <5> -> <4>
<1> -> <2> -> <5> -> <4> -> <3>
...
<1> -> <5> -> <3> -> <4> -> <2>
<3> -> <2> -> <1> -> <4> -> <5>
...
<5> -> <4> -> <3> -> <3> -> <1>
If each testing order may give me different results, how to write a loop to automatically filter through all the combinations ?
What I can think of is to use another method that generates the testing order like the following :
int[][] getTestOrder(int criteriaCount)
{
...
}
So if the criteriaCount is 2, it will return : {{1,2},{2,1}}
If the criteriaCount is 3, it will return : {{1,2,3},{1,3,2},{2,1,3},{2,3,1},{3,1,2},{3,2,1}}
...
But then how to most efficiently implement it with the filtering mechanism in concise expressions that comes with Java 8 ?
Interesting problem. There are several things going on here. No doubt this could be solved in less than half a page of Haskell or Lisp, but this is Java, so here we go....
One issue is that we have a variable number of filters, whereas most of the examples that have been shown illustrate fixed pipelines.
Another issue is that some of the OP's "filters" are context sensitive, such as "top 50% by a certain order". This can't be done with a simple filter(predicate) construct on a stream.
The key is to realize that, while lambdas allow functions to be passed as arguments (to good effect) it also means that they can be stored in data structures and computations can be performed on them. The most common computation is to take multiple functions and compose them.
Assume that the values being operated on are instances of Widget, which is a POJO that has some obvious getters:
class Widget {
String name() { ... }
int length() { ... }
double weight() { ... }
// constructors, fields, toString(), etc.
}
Let's start off with the first issue and figure out how to operate with a variable number of simple predicates. We can create a list of predicates like this:
List<Predicate<Widget>> allPredicates = Arrays.asList(
w -> w.length() >= 10,
w -> w.weight() > 40.0,
w -> w.name().compareTo("c") > 0);
Given this list, we can permute them (probably not useful, since they're order independent) or select any subset we want. Let's say we just want to apply all of them. How do we apply a variable number of predicates to a stream? There is a Predicate.and() method that will take two predicates and combine them using a logical and, returning a single predicate. So we could take the first predicate and write a loop that combines it with the successive predicates to build up a single predicate that's a composite and of them all:
Predicate<Widget> compositePredicate = allPredicates.get(0);
for (int i = 1; i < allPredicates.size(); i++) {
compositePredicate = compositePredicate.and(allPredicates.get(i));
}
This works, but it fails if the list is empty, and since we're doing functional programming now, mutating a variable in a loop is declassé. But lo! This is a reduction! We can reduce all the predicates over the and operator get a single composite predicate, like this:
Predicate<Widget> compositePredicate =
allPredicates.stream()
.reduce(w -> true, Predicate::and);
(Credit: I learned this technique from #venkat_s. If you ever get a chance, go see him speak at a conference. He's good.)
Note the use of w -> true as the identity value of the reduction. (This could also be used as the initial value of compositePredicate for the loop, which would fix the zero-length list case.)
Now that we have our composite predicate, we can write out a short pipeline that simply applies the composite predicate to the widgets:
widgetList.stream()
.filter(compositePredicate)
.forEach(System.out::println);
Context Sensitive Filters
Now let's consider what I referred to as a "context sensitive" filter, which is represented by the example like "top 50% in a certain order", say the top 50% of widgets by weight. "Context sensitive" isn't the best term for this but it's what I've got at the moment, and it is somewhat descriptive in that it's relative to the number of elements in the stream up to this point.
How would we implement something like this using streams? Unless somebody comes up with something really clever, I think we have to collect the elements somewhere first (say, in a list) before we can emit the first element to the output. It's kind of like sorted() in a pipeline which can't tell which is the first element to output until it has read every single input element and has sorted them.
The straightforward approach to finding the top 50% of widgets by weight, using streams, would look something like this:
List<Widget> temp =
list.stream()
.sorted(comparing(Widget::weight).reversed())
.collect(toList());
temp.stream()
.limit((long)(temp.size() * 0.5))
.forEach(System.out::println);
This isn't complicated, but it's a bit cumbersome as we have to collect the elements into a list and assign it to a variable, in order to use the list's size in the 50% computation.
This is limiting, though, in that it's a "static" representation of this kind of filtering. How would we chain this into a stream with a variable number of elements (other filters or criteria) like we did with the predicates?
A important observation is that this code does its actual work in between the consumption of a stream and the emitting of a stream. It happens to have a collector in the middle, but if you chain a stream to its front and chain stuff off its back end, nobody is the wiser. In fact, the standard stream pipeline operations like map and filter each take a stream as input and emit a stream as output. So we can write a function kind of like this ourselves:
Stream<Widget> top50PercentByWeight(Stream<Widget> stream) {
List<Widget> temp =
stream.sorted(comparing(Widget::weight).reversed())
.collect(toList());
return temp.stream()
.limit((long)(temp.size() * 0.5));
}
A similar example might be to find the shortest three widgets:
Stream<Widget> shortestThree(Stream<Widget> stream) {
return stream.sorted(comparing(Widget::length))
.limit(3);
}
Now we can write something that combines these stateful filters with ordinary stream operations:
shortestThree(
top50PercentByWeight(
widgetList.stream()
.filter(w -> w.length() >= 10)))
.forEach(System.out::println);
This works, but is kind of lousy because it reads "inside-out" and backwards. The stream source is widgetList which is streamed and filtered through an ordinary predicate. Now, going backwards, the top 50% filter is applied, then the shortest-three filter is applied, and finally the stream operation forEach is applied at the end. This works but is quite confusing to read. And it's still static. What we really want is to have a way to put these new filters inside a data structure that we can manipulate, for example, to run all the permutations, as in the original question.
A key insight at this point is that these new kinds of filters are really just functions, and we have functional interface types in Java which let us represent functions as objects, to manipulate them, store them in data structures, compose them, etc. The functional interface type that takes an argument of some type and returns a value of the same type is UnaryOperator. The argument and return type in this case is Stream<Widget>. If we were to take method references such as this::shortestThree or this::top50PercentByWeight, the types of the resulting objects would be
UnaryOperator<Stream<Widget>>
If we were to put these into a list, the type of that list would be
List<UnaryOperator<Stream<Widget>>>
Ugh! Three levels of nested generics is too much for me. (But Aleksey Shipilev did once show me some code that used four levels of nested generics.) The solution for too much generics is to define our own type. Let's call one of our new things a Criterion. It turns out that there's little value to be gained by making our new functional interface type be related to UnaryOperator, so our definition can simply be:
#FunctionalInterface
public interface Criterion {
Stream<Widget> apply(Stream<Widget> s);
}
Now we can create a list of criteria like this:
List<Criterion> criteria = Arrays.asList(
this::shortestThree,
this::lengthGreaterThan20
);
(We'll figure out how to use this list below.) This is a step forward, since we can now manipulate the list dynamically, but it's still somewhat limiting. First, it can't be combined with ordinary predicates. Second, there's a lot of hard-coded values here, such as the shortest three: how about two or four? How about a different criterion than length? What we really want is a function that creates these Criterion objects for us. This is easy with lambdas.
This creates a criterion that selects the top N widgets, given a comparator:
Criterion topN(Comparator<Widget> cmp, long n) {
return stream -> stream.sorted(cmp).limit(n);
}
This creates a criterion that selects the top p percent of widgets, given a comparator:
Criterion topPercent(Comparator<Widget> cmp, double pct) {
return stream -> {
List<Widget> temp =
stream.sorted(cmp).collect(toList());
return temp.stream()
.limit((long)(temp.size() * pct));
};
}
And this creates a criterion from an ordinary predicate:
Criterion fromPredicate(Predicate<Widget> pred) {
return stream -> stream.filter(pred);
}
Now we have a very flexible way of creating criteria and putting them into a list, where they can be subsetted or permuted or whatever:
List<Criterion> criteria = Arrays.asList(
fromPredicate(w -> w.length() > 10), // longer than 10
topN(comparing(Widget::length), 4L), // longest 4
topPercent(comparing(Widget::weight).reversed(), 0.50) // heaviest 50%
);
Once we have a list of Criterion objects, we need to figure out a way to apply all of them. Once again, we can use our friend reduce to combine all of them into a single Criterion object:
Criterion allCriteria =
criteria.stream()
.reduce(c -> c, (c1, c2) -> (s -> c2.apply(c1.apply(s))));
The identity function c -> c is clear, but the second arg is a bit tricky. Given a stream s we first apply Criterion c1, then Criterion c2, and this is wrapped in a lambda that takes two Criterion objects c1 and c2 and returns a lambda that applies the composition of c1 and c2 to a stream and returns the resulting stream.
Now that we've composed all the criteria, we can apply it to a stream of widgets like so:
allCriteria.apply(widgetList.stream())
.forEach(System.out::println);
This is still a bit inside-out, but it's fairly well controlled. Most importantly, it addresses the original question, which is how to combine criteria dynamically. Once the Criterion objects are in a data structure, they can be selected, subsetted, permuted, or whatever as necessary, and they can all be combined in a single criterion and applied to a stream using the above techniques.
The functional programming gurus are probably saying "He just reinvented ... !" which is probably true. I'm sure this has probably been invented somewhere already, but it's new to Java, because prior to lambda, it just wasn't feasible to write Java code that uses these techniques.
Update 2014-04-07
I've cleaned up and posted the complete sample code in a gist.
We could add a counter with a map so we know how many elements we have after the filters. I created a helper class that has a method that counts and returns the same object passed:
class DoNothingButCount<T> {
AtomicInteger i;
public DoNothingButCount() {
i = new AtomicInteger(0);
}
public T pass(T p) {
i.incrementAndGet();
return p;
}
}
public void runDemo() {
List<Person>persons = create(100);
DoNothingButCount<Person> counter = new DoNothingButCount<>();
persons.stream().filter(u -> u.size > 12).filter(u -> u.weitght > 12).
map((p) -> counter.pass(p)).
sorted((p1, p2) -> p1.age - p2.age).
collect(Collectors.toList()).stream().
limit((int) (counter.i.intValue() * 0.5)).
sorted((p1, p2) -> p2.length - p1.length).
limit((int) (counter.i.intValue() * 0.5 * 0.2)).forEach((p) -> System.out.println(p));
}
I had to convert the stream to list and back to stream in the middle because the limit would use the initial count otherwise. Is all a but "hackish" but is all I could think.
I could do it a bit differently using a Function for my mapped class:
class DoNothingButCount<T > implements Function<T, T> {
AtomicInteger i;
public DoNothingButCount() {
i = new AtomicInteger(0);
}
public T apply(T p) {
i.incrementAndGet();
return p;
}
}
The only thing will change in the stream is:
map((p) -> counter.pass(p)).
will become:
map(counter).
My complete test class including the two examples:
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;
public class Demo2 {
Random r = new Random();
class Person {
public int size, weitght,length, age;
public Person(int s, int w, int l, int a){
this.size = s;
this.weitght = w;
this.length = l;
this.age = a;
}
public String toString() {
return "P: "+this.size+", "+this.weitght+", "+this.length+", "+this.age+".";
}
}
public List<Person>create(int size) {
List<Person>persons = new ArrayList<>();
while(persons.size()<size) {
persons.add(new Person(r.nextInt(10)+10, r.nextInt(10)+10, r.nextInt(10)+10,r.nextInt(20)+14));
}
return persons;
}
class DoNothingButCount<T> {
AtomicInteger i;
public DoNothingButCount() {
i = new AtomicInteger(0);
}
public T pass(T p) {
i.incrementAndGet();
return p;
}
}
class PDoNothingButCount<T > implements Function<T, T> {
AtomicInteger i;
public PDoNothingButCount() {
i = new AtomicInteger(0);
}
public T apply(T p) {
i.incrementAndGet();
return p;
}
}
public void runDemo() {
List<Person>persons = create(100);
PDoNothingButCount<Person> counter = new PDoNothingButCount<>();
persons.stream().filter(u -> u.size > 12).filter(u -> u.weitght > 12).
map(counter).
sorted((p1, p2) -> p1.age - p2.age).
collect(Collectors.toList()).stream().
limit((int) (counter.i.intValue() * 0.5)).
sorted((p1, p2) -> p2.length - p1.length).
limit((int) (counter.i.intValue() * 0.5 * 0.2)).forEach((p) -> System.out.println(p));
}
public void runDemo2() {
List<Person>persons = create(100);
DoNothingButCount<Person> counter = new DoNothingButCount<>();
persons.stream().filter(u -> u.size > 12).filter(u -> u.weitght > 12).
map((p) -> counter.pass(p)).
sorted((p1, p2) -> p1.age - p2.age).
collect(Collectors.toList()).stream().
limit((int) (counter.i.intValue() * 0.5)).
sorted((p1, p2) -> p2.length - p1.length).
limit((int) (counter.i.intValue() * 0.5 * 0.2)).forEach((p) -> System.out.println(p));
}
public static void main(String str[]) {
Demo2 demo = new Demo2();
System.out.println("Demo 2:");
demo.runDemo2();
System.out.println("Demo 1:");
demo.runDemo();
}
}

Categories

Resources