Finite generated Stream in Java - how to create one?

In Java, one can easily generate an infinite stream with Stream.generate(supplier). However, I would need to generate a stream that will eventually finish.
Imagine, for example, I want a stream of all files in a directory. The number of files can be huge, so I cannot gather all the data upfront and create a stream from it (via collection.stream()). I need to generate the sequence piece by piece. But the stream will obviously finish at some point, and terminal operations such as collect() or findAny() need to work on it, so Stream.generate(supplier) is not suitable here.
Is there any reasonable easy way to do this in Java, without implementing the entire Stream interface on my own?
I can think of a simple hack: using an infinite Stream.generate(supplier) and returning null or throwing an exception once all the actual values have been taken. But that would break the standard stream operators; I could use it only with my own operators that are aware of this behaviour.
CLARIFICATION
People in the comments are suggesting the takeWhile() operator. This is not what I meant. To phrase the question better: I am not asking how to filter (or limit) an existing stream; I am asking how to create (generate) the stream dynamically, without loading all the elements upfront, where the stream has a finite size that is unknown in advance.
SOLUTION
The code I was looking for is
Iterator it = myCustomIteratorThatGeneratesTheSequence();
StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, Spliterator.DISTINCT), false);
I found it by looking at how the list(path) method is implemented in java.nio.file.Files.
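As a rough, self-contained sketch of the same idea: the iterator below is a made-up generator (halving, streamOf and the class name are illustrative, not from the original post) that produces values on demand and stops by itself. It uses Spliterator.ORDERED rather than DISTINCT, on the assumption that DISTINCT should only be reported when the source really guarantees unique elements.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class IteratorStreamSketch {

    /** Hypothetical generator: yields start, start/2, start/4, ... and stops once the value reaches 0. */
    public static Iterator<Integer> halving(int start) {
        return new Iterator<Integer>() {
            private int current = start;

            @Override
            public boolean hasNext() {
                return current > 0;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int value = current;
                current /= 2;
                return value;
            }
        };
    }

    /** Wraps any iterator in a sequential, lazily evaluated stream of unknown size. */
    public static <T> Stream<T> streamOf(Iterator<T> it) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        List<Integer> values = streamOf(halving(20)).collect(Collectors.toList());
        System.out.println(values); // [20, 10, 5, 2, 1]
    }
}
```

Elements are only requested from the iterator while a terminal operation is running, so nothing has to be collected upfront.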

Is there any reasonable easy way to do this in Java, without implementing the entire Stream interface on my own?
A simple .limit() guarantees that it will terminate. But that's not always powerful enough.
After the Stream factory methods, the simplest approach for creating custom stream sources without reimplementing the stream processing pipeline is subclassing java.util.Spliterators.AbstractSpliterator<T> and passing it to java.util.stream.StreamSupport.stream(Spliterator<T>, boolean), or to the overload java.util.stream.StreamSupport.stream(Supplier<? extends Spliterator<T>>, int, boolean) if the spliterator should be created lazily.
If you intend to use parallel streams, note that AbstractSpliterator only yields suboptimal splitting. If you have more control over your source, fully implementing the Spliterator interface can be better.
For example, the following snippet would create a Stream providing an infinite sequence 1,2,3...
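A minimal sketch of such a source, not necessarily the snippet this answer originally showed: an AbstractSpliterator subclass whose tryAdvance always emits the next number. The class and method names are illustrative.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class CountingSpliteratorSketch {

    /** Infinite stream 1, 2, 3, ... backed by a custom spliterator. */
    public static Stream<Integer> naturals() {
        Spliterator<Integer> source =
                new AbstractSpliterator<Integer>(Long.MAX_VALUE, Spliterator.ORDERED) {
                    private int next = 1;

                    @Override
                    public boolean tryAdvance(Consumer<? super Integer> action) {
                        action.accept(next++);
                        // A finite source would return false (without calling the
                        // action) once it has run out of elements.
                        return true;
                    }
                };
        return StreamSupport.stream(source, false);
    }

    public static void main(String[] args) {
        List<Integer> firstFive = naturals().limit(5).collect(Collectors.toList());
        System.out.println(firstFive); // [1, 2, 3, 4, 5]
    }
}
```

Making the stream finite only requires tryAdvance to return false at the right moment; the rest of the pipeline is unchanged.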
In that particular example, you could use IntStream.range() instead.
But the stream will obviously finish at some point, and terminal operators like (collect() or findAny()) need to work on it.
Short-circuiting operations like findAny() can actually finish on an infinite stream, as long as there is any element that matches.
Java 9 introduces a three-argument overload, Stream.iterate(seed, hasNext, next), to generate finite streams for some simple cases.
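For instance, a small sketch (Java 9 or later; the class and method names are made up for illustration) that stops as soon as the hasNext predicate fails:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IterateSketch {

    /** Powers of two up to and including limit, using the three-argument iterate. */
    public static List<Integer> powersOfTwoUpTo(int limit) {
        return Stream.iterate(1, n -> n <= limit, n -> n * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(powersOfTwoUpTo(100)); // [1, 2, 4, 8, 16, 32, 64]
    }
}
```

This works well when each element can be computed from the previous one; for sources such as a directory listing, an iterator or spliterator is usually a better fit.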

Kotlin code to create Stream of JsonNode from InputStream
private fun InputStream.toJsonNodeStream(): Stream<JsonNode> {
    return StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(this.toJsonNodeIterator(), Spliterator.ORDERED),
        false
    )
}
private fun InputStream.toJsonNodeIterator(): Iterator<JsonNode> {
    val jsonParser = objectMapper.factory.createParser(this)
    return object : Iterator<JsonNode> {
        override fun hasNext(): Boolean {
            var token = jsonParser.nextToken()
            while (token != null) {
                if (token == JsonToken.START_OBJECT) {
                    return true
                }
                token = jsonParser.nextToken()
            }
            return false
        }
        override fun next(): JsonNode {
            return jsonParser.readValueAsTree()
        }
    }
}

Here is a custom, finite stream:
package org.tom.stream;
import java.util.*;
import java.util.function.*;
import java.util.stream.*;
public class GoldenStreams {
    private static final String IDENTITY = "";
    public static void main(String[] args) {
        Stream<String> stream = StreamSupport.stream(new Spliterator<String>() {
            private static final int LIMIT = 25;
            private int integer = 0;
            @Override
            public int characteristics() {
                return Spliterator.DISTINCT;
            }
            @Override
            public long estimateSize() {
                return LIMIT - integer;
            }
            @Override
            public boolean tryAdvance(Consumer<? super String> action) {
                // Signal the end of the stream without consuming anything.
                if (integer >= LIMIT) {
                    return false;
                }
                action.accept(IDENTITY + integer++);
                return true;
            }
            @Override
            public Spliterator<String> trySplit() {
                System.out.println("trySplit");
                return null; // this source cannot be split
            }
        }, false);
        List<String> peeks = new ArrayList<String>();
        List<String> reds = new ArrayList<String>();
        stream.peek(data -> {
            peeks.add(data);
        }).filter(data -> {
            return Integer.parseInt(data) % 2 > 0;
        }).peek(data -> {
            System.out.println("peekDeux:" + data);
        }).reduce(IDENTITY, (accumulation, input) -> {
            reds.add(input);
            String concat = accumulation + (accumulation.isEmpty() ? IDENTITY : ":") + input;
            System.out.println("reduce:" + concat);
            return concat;
        });
        System.out.println("Peeks:" + peeks.toString());
        System.out.println("Reduction:" + reds.toString());
    }
}

While the author has discarded the takeWhile option, I find it adequate for certain use cases and worth an explanation.
The method takeWhile can be used on any stream and will terminate the stream when the predicate provided to the method returns false. The object which results in a false is not appended to the stream; only the objects which resulted in true are passed downstream.
So one method for generating a finite stream could be to use the Stream.generate method and return a value which signals the end of the stream by being evaluated to false by the predicate provided to takeWhile.
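As a smaller illustration of that idea, here is a sketch (Java 9 or later) in which BufferedReader.readLine() returning null serves as the end marker. The class and method names are made up, and BufferedReader.lines() already covers this particular case; it is used here only because the sentinel is familiar. Note that Stream.generate produces an unordered stream, so this pattern should be assumed to work for sequential streams only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SentinelStreamSketch {

    /** Streams lines from a reader; a null from readLine() is the end-of-stream sentinel. */
    public static Stream<String> lines(BufferedReader reader) {
        return Stream.generate(() -> {
            try {
                return reader.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }).takeWhile(Objects::nonNull);
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc"));
        List<String> result = lines(reader).collect(Collectors.toList());
        System.out.println(result); // [a, b, c]
    }
}
```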
Here's an example, generating all the permutations of an array:
public static Stream<int[]> permutations(int[] original) {
    int dim = original.length;
    var permutation = original.clone();
    int[] controller = new int[dim];
    var low = new AtomicInteger(0);
    var up = new AtomicInteger(1);
    var permutationsStream = Stream.generate(() -> {
        while (up.get() < dim) {
            if (controller[up.get()] < up.get()) {
                low.set(up.get() % 2 * controller[up.get()]);
                var tmp = permutation[low.get()];
                permutation[low.get()] = permutation[up.get()];
                permutation[up.get()] = tmp;
                controller[up.get()]++;
                up.set(1);
                return permutation.clone();
            } else {
                controller[up.get()] = 0;
                up.incrementAndGet();
            }
        }
        return null;
    }).takeWhile(Objects::nonNull);
    return Stream.concat(
        Stream.ofNullable(original.clone()),
        permutationsStream
    );
}
In this example, I used the null value to signal the end of the stream.
The caller of the method won't receive the null value!
OP could use a similar strategy, and combine it with a visitor pattern.
If it's a flat directory, OP would be better off using Stream.iterate with the seed being the index of the file to yield and Stream.limit on the number of files (which can be known without browsing the directory).

Related

Converting simple foreach loop to stream

Suppose I have a simple class with a method eval(). Is it possible to convert this method to stream.reduce or something similar instead of using a for loop? Operation is an interface with many possible implementations of the method execute, which compute different arithmetic operations.
public class Expression {
    private final List<Operation> operations;
    public Expression(List<Operation> operations) {
        this.operations = operations;
    }
    int eval() {
        int result = 0;
        for (Operation operation : operations) {
            result = operation.execute(result);
        }
        return result;
    }
}

Try this.
int eval() {
    int[] r = {0};
    operations.stream()
        .forEach(op -> r[0] = op.execute(r[0]));
    return r[0];
}

forEach
Why not try forEach(), the simplest and most common operation? It loops over the stream elements, calling the supplied function on each element.
public void eval() {
    operations.stream().forEach(e -> e.execute());
}
This will effectively call execute() on each element in operations.
Also, note that in your current code, result holds only the result of the last operation's execute call, not all of them.
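Since the question asks specifically about reduce, here is one possible sketch. It assumes Operation has the shape int execute(int), which the question implies but does not show, and it reduces over function composition rather than over the int values themselves: composition is associative, so it satisfies the requirements of reduce. The class name is illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntUnaryOperator;

public class ExpressionSketch {

    /** Assumed shape of the question's interface: takes the running result, returns the new one. */
    public interface Operation {
        int execute(int input);
    }

    /** Composes all operations into one function, then applies it to the initial value 0. */
    public static int eval(List<Operation> operations) {
        return operations.stream()
                .map(op -> (IntUnaryOperator) op::execute)
                .reduce(IntUnaryOperator.identity(), IntUnaryOperator::andThen)
                .applyAsInt(0);
    }

    public static void main(String[] args) {
        List<Operation> operations = Arrays.asList(x -> x + 5, x -> x * 3, x -> x - 1);
        System.out.println(eval(operations)); // 14
    }
}
```

Whether this is clearer than the original for loop is debatable; the loop states the intent more directly.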

Getting intermediate results from stream to be used later in stream

I was trying to write some functional programming code (using lambdas and streams from Java 8) to test if a string has unique characters in it (if it does, return true, if it does not, return false). A common way to do this using vanilla Java is with a data structure like a set, i.e.:
public static boolean oldSchoolMethod(String str) {
    Set<String> set = new HashSet<>();
    for (int i = 0; i < str.length(); i++) {
        if (!set.add(str.charAt(i) + "")) return false;
    }
    return true;
}
The set's add method returns true if the character/object can be added to the set (because it did not exist there previously). It returns false if it cannot (the value already exists in the set, so it is a duplicate). This makes it easy to break out of the loop as soon as a duplicate is detected, without needing to iterate through all N characters of the string.
I know that in Java 8 streams you cannot break out of a stream. Is there any way to capture the return value of an intermediate stream operation, such as the true or false returned by adding to the set, and send that value to the next stage of the pipeline (another intermediate operation or the terminal operation)? For example:
Arrays.stream(myInputString.split(""))
    .forEach(i -> {
        set.add(i); // need to capture whether this returns "true" or "false" and use that value later in
                    // the pipeline, or is this bad/not possible?
    });
Another way I thought of solving this problem is to use distinct() and collect the results into a new string. If it has the same length as the original string, then you know the characters are unique; if the lengths differ, some characters were filtered out for not being distinct, so the string is not unique. The only issue I see here is that you have to iterate through all N chars of the string, whereas the "old school" method can, in the best case, finish in almost constant time O(1), since it breaks out of the loop and returns as soon as it finds one duplicated character:
public static boolean java8StreamMethod(String str) {
    String result = Arrays.stream(str.split(""))
        .distinct()
        .collect(Collectors.joining());
    return result.length() == str.length();
}

Your solutions are all performing unnecessary string operations.
E.g. instead of using a Set<String>, you can use a Set<Character>:
public static boolean betterOldSchoolMethod(String str) {
    Set<Character> set = new HashSet<>();
    for (int i = 0; i < str.length(); i++) {
        if (!set.add(str.charAt(i))) return false;
    }
    return true;
}
But even the boxing from char to Character is avoidable.
public static boolean evenBetterOldSchoolMethod(String str) {
    BitSet set = new BitSet();
    for (int i = 0; i < str.length(); i++) {
        if (set.get(str.charAt(i))) return false;
        set.set(str.charAt(i));
    }
    return true;
}
Likewise, for the Stream variant, you can use str.chars() instead of Arrays.stream(str.split("")). Further, you can use count() instead of collecting all elements to a string via collect(Collectors.joining()), just to call length() on it.
Fixing both issues yields the solution:
public static boolean newMethod(String str) {
    return str.chars().distinct().count() == str.length();
}
This is simple, but lacks short-circuiting. Further, the performance characteristics of distinct() are implementation-dependent. In OpenJDK, it uses an ordinary HashSet under the hood, rather than a BitSet or the like.
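If short-circuiting matters, one possible sketch combines the BitSet from above with allMatch, which stops at the first element that fails the predicate. The predicate is stateful, which the Stream documentation discourages, so this should be treated as safe only for a sequential stream such as the one returned by str.chars(). The class and method names are illustrative.

```java
import java.util.BitSet;

public class UniqueCharsSketch {

    /** Returns true if no char value occurs twice; stops at the first duplicate. */
    public static boolean hasUniqueChars(String str) {
        BitSet seen = new BitSet();
        // Stateful predicate: only acceptable because the stream is sequential.
        return str.chars().allMatch(c -> {
            boolean isNew = !seen.get(c);
            seen.set(c);
            return isNew;
        });
    }

    public static void main(String[] args) {
        System.out.println(hasUniqueChars("hellowrd")); // false
        System.out.println(hasUniqueChars("world"));    // true
    }
}
```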

This code might work for you:
public class Test {
    public static void main(String[] args) {
        String myInputString = "hellowrd";
        HashSet<String> set = new HashSet<>();
        Optional<String> duplicateChar = Arrays.stream(myInputString.split(""))
            .filter(num -> !set.add(num))
            .findFirst();
        if (duplicateChar.isPresent()) {
            System.out.println("Not unique");
        } else {
            System.out.println("Unique");
        }
    }
}
Here, findFirst() finds the first duplicate element, so we don't need to continue iterating over the rest of the characters.

What about just mapping to a boolean?
Arrays.stream(myInputString.split(""))
.map(set::add)
.<...>
That would solve your concrete issue, I guess, but it's not a very nice solution because the closures in stream chains should not have side-effects (that is exactly the point of functional programming...).
Sometimes the classic for-loop is still the better choice for certain problems ;-)

Split Java stream into two lazy streams without terminal operation

I understand that in general Java streams do not split. However, we have an involved and lengthy pipeline, at the end of which we have two different types of processing that share the first part of the pipeline.
Due to the size of the data, storing the intermediate stream product is not a viable solution. Neither is running the pipeline twice.
Basically, what we are looking for is a solution that is an operation on a stream that yields two (or more) streams that are lazily filled and able to be consumed in parallel. By that, I mean that if stream A is split into streams B and C, when streams B and C consume 10 elements, stream A consumes and provides those 10 elements, but if stream B then tries to consume more elements, it blocks until stream C also consumes them.
Is there any pre-made solution for this problem or any library we can look at? If not, where would we start to look if we want to implement this ourselves? Or is there a compelling reason not to implement this at all?

I don't know about functionality that would fulfill your blocking requirement, but you might be interested in jOOλ's Seq.duplicate() method:
Stream<T> streamA = Stream.of(/* your data here */);
Tuple2<Seq<T>, Seq<T>> streamTuple = Seq.seq(streamA).duplicate();
Stream<T> streamB = streamTuple.v1();
Stream<T> streamC = streamTuple.v2();
The Streams can be consumed absolutely independently (including consumption in parallel) thanks to the SeqBuffer class that's used internally by this method.
Note that:
SeqBuffer will cache even the elements that are no longer needed because they have already been consumed by both streamB and streamC (so if you cannot afford to keep them in memory, it's not a solution for you);
as I mentioned at the beginning, streamB and streamC will not block one another.
Disclaimer: I am the author of the SeqBuffer class.

You can implement a custom Spliterator in order to achieve such behavior. We will split your streams into the common "source" and the different "consumers". The custom spliterator then forwards the elements from the source to each consumer. For this purpose, we will use a BlockingQueue (see this question).
Note that the difficult part here is not the spliterator/stream, but the syncing of the consumers around the queue, as the comments on your question already indicate. Still, however you implement the syncing, Spliterator helps to use streams with it.
@SafeVarargs
public static <T> long streamForked(Stream<T> source, Consumer<Stream<T>>... consumers)
{
    return StreamSupport.stream(new ForkingSpliterator<>(source, consumers), false).count();
}
private static class ForkingSpliterator<T>
    extends AbstractSpliterator<T>
{
    private Spliterator<T> sourceSpliterator;
    private BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private AtomicInteger nextToTake = new AtomicInteger(0);
    private AtomicInteger processed = new AtomicInteger(0);
    // written by the source thread and read by the consumer threads, hence volatile
    private volatile boolean sourceDone;
    private int consumerCount;
    @SafeVarargs
    private ForkingSpliterator(Stream<T> source, Consumer<Stream<T>>... consumers)
    {
        super(Long.MAX_VALUE, 0);
        sourceSpliterator = source.spliterator();
        consumerCount = consumers.length;
        for (int i = 0; i < consumers.length; i++)
        {
            int index = i;
            Consumer<Stream<T>> consumer = consumers[i];
            new Thread(new Runnable()
            {
                @Override
                public void run()
                {
                    consumer.accept(StreamSupport.stream(new ForkedConsumer(index), false));
                }
            }).start();
        }
    }
    @Override
    public boolean tryAdvance(Consumer<? super T> action)
    {
        sourceDone = !sourceSpliterator.tryAdvance(queue::offer);
        return !sourceDone;
    }
    private class ForkedConsumer
        extends AbstractSpliterator<T>
    {
        private int index;
        private ForkedConsumer(int index)
        {
            super(Long.MAX_VALUE, 0);
            this.index = index;
        }
        @Override
        public boolean tryAdvance(Consumer<? super T> action)
        {
            // take next element when it's our turn
            while (!nextToTake.compareAndSet(index, index + 1))
            {
            }
            T element;
            while ((element = queue.peek()) == null)
            {
                if (sourceDone)
                {
                    // element is null, and there won't be any more, so "terminate" this sub stream
                    return false;
                }
            }
            // push to consumer pipeline
            action.accept(element);
            if (consumerCount == processed.incrementAndGet())
            {
                // start next round
                queue.poll();
                processed.set(0);
                nextToTake.set(0);
            }
            return true;
        }
    }
}
With the approach used, the consumers work on each element in parallel, but wait for each other before starting on the next element.
Known issue
If one of the consumers is "shorter" than the others (e.g. because it calls limit()) it will also stop the other consumers and leave the threads hanging.
Example
public static void sleep(long millis)
{
try { Thread.sleep((long) (Math.random() * 30 + millis)); } catch (InterruptedException e) { }
}
streamForked(Stream.of("1", "2", "3", "4", "5"),
source -> source.map(word -> { sleep(50); return "fast " + word; }).forEach(System.out::println),
source -> source.map(word -> { sleep(300); return "slow " + word; }).forEach(System.out::println),
source -> source.map(word -> { sleep(50); return "2fast " + word; }).forEach(System.out::println));
fast 1
2fast 1
slow 1
fast 2
2fast 2
slow 2
2fast 3
fast 3
slow 3
fast 4
2fast 4
slow 4
2fast 5
fast 5
slow 5

How can I simplify these nested for loops with collectors in Java 8?

I'm very new to the idea of collectors and parallel streams in Java, and am wondering if there's a way to simplify this code:
boolean foundAnyMatch = false;
for (MyObject myObject : hashSetOfObjects) {
    for (int i = 0; i < arrayOfStrings.length; i++) {
        if (myObject.customMethodReturnsBool(arrayOfStrings[i])) {
            foundAnyMatch = true;
            break;
        }
    }
}
As you would expect, hashSetOfObjects is of type Set<MyObject> where the class MyObject contains a method with signature boolean customMethodReturnsBool(String entry). Also, arrayOfStrings is simply of type String[].

As I understand it, your code always iterates over all of hashSetOfObjects, because the break only exits the inner loop, while your intention is to find out whether any object in hashSetOfObjects matches any of the strings.
The same logic can be expressed using streams as:
boolean foundAnyMatch = hashSetOfObjects.stream()
    .anyMatch(x -> Arrays.stream(arrayOfStrings)
        .anyMatch(y -> x.customMethodReturnsBool(y)));

How do I lazily concatenate streams?

I'm trying to implement a stream that uses another instance of itself in its implementation. The stream has a few constant elements prepended (with IntStream.concat) to it, so this should work as long as the concatenated stream creates the non-constant part lazily. I think using the StreamSupport.intStream overload taking a Supplier with IntStream.concat (which "creates a lazily concatenated stream") should be lazy enough to only create the second spliterator when elements are demanded from it, but even creating the stream (not evaluating it) overflows the stack. How can I lazily concatenate streams?
I'm attempting to port the streaming prime number sieve from this answer into Java. This sieve uses another instance of itself (ps = postponed_sieve() in the Python code). If I break the initial four constant elements (yield 2; yield 3; yield 5; yield 7;) into their own stream, it's easy to implement the generator as a spliterator:
/**
 * based on https://stackoverflow.com/a/10733621/3614835
 */
static class PrimeSpliterator extends Spliterators.AbstractIntSpliterator {
    private static final int CHARACTERISTICS = Spliterator.DISTINCT | Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED | Spliterator.SORTED;
    private final Map<Integer, Supplier<IntStream>> sieve = new HashMap<>();
    private final PrimitiveIterator.OfInt postponedSieve = primes().iterator();
    private int p, q, c = 9;
    private Supplier<IntStream> s;
    PrimeSpliterator() {
        super(105097564 /* according to Wolfram Alpha */ - 4 /* in prefix */,
                CHARACTERISTICS);
        //p = next(ps) and next(ps) (that's Pythonic?)
        postponedSieve.nextInt();
        this.p = postponedSieve.nextInt();
        this.q = p*p;
    }
    @Override
    public boolean tryAdvance(IntConsumer action) {
        for (; c > 0 /* overflow */; c += 2) {
            Supplier<IntStream> maybeS = sieve.remove(c);
            if (maybeS != null)
                s = maybeS;
            else if (c < q) {
                action.accept(c);
                return true; //continue
            } else {
                s = () -> IntStream.iterate(q+2*p, x -> x + 2*p);
                p = postponedSieve.nextInt();
                q = p*p;
            }
            int m = s.get().filter(x -> !sieve.containsKey(x)).findFirst().getAsInt();
            sieve.put(m, s);
        }
        return false;
    }
}
My first attempt at the primes() method returns an IntStream concatenating a constant stream with a new PrimeSpliterator:
public static IntStream primes() {
    return IntStream.concat(IntStream.of(2, 3, 5, 7),
            StreamSupport.intStream(new PrimeSpliterator(), false));
}
Calling primes() results in a StackOverflowError because primes() always instantiates a PrimeSpliterator, but PrimeSpliterator's field initializer always calls primes(). However, there's an overload of StreamSupport.intStream that takes a Supplier, which should allow lazily creating the PrimeSpliterator:
public static IntStream primes() {
    return IntStream.concat(IntStream.of(2, 3, 5, 7),
            StreamSupport.intStream(PrimeSpliterator::new, PrimeSpliterator.CHARACTERISTICS, false));
}
However, I instead get a StackOverflowError with a different backtrace (trimmed, as it repeats). Note that the recursion is entirely in the call to primes() -- the terminal operation iterator() is never invoked on a returned stream.
Exception in thread "main" java.lang.StackOverflowError
at java.util.stream.StreamSpliterators$DelegatingSpliterator$OfInt.<init>(StreamSpliterators.java:582)
at java.util.stream.IntPipeline.lazySpliterator(IntPipeline.java:155)
at java.util.stream.IntPipeline$Head.lazySpliterator(IntPipeline.java:514)
at java.util.stream.AbstractPipeline.spliterator(AbstractPipeline.java:352)
at java.util.stream.IntPipeline.spliterator(IntPipeline.java:181)
at java.util.stream.IntStream.concat(IntStream.java:851)
at com.jeffreybosboom.projecteuler.util.Primes.primes(Primes.java:22)
at com.jeffreybosboom.projecteuler.util.Primes$PrimeSpliterator.<init>(Primes.java:32)
at com.jeffreybosboom.projecteuler.util.Primes$$Lambda$1/834600351.get(Unknown Source)
at java.util.stream.StreamSpliterators$DelegatingSpliterator.get(StreamSpliterators.java:513)
at java.util.stream.StreamSpliterators$DelegatingSpliterator.estimateSize(StreamSpliterators.java:536)
at java.util.stream.Streams$ConcatSpliterator.<init>(Streams.java:713)
at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:789)
at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:785)
at java.util.stream.Streams$ConcatSpliterator$OfInt.<init>(Streams.java:819)
at java.util.stream.IntStream.concat(IntStream.java:851)
at com.jeffreybosboom.projecteuler.util.Primes.primes(Primes.java:22)
at com.jeffreybosboom.projecteuler.util.Primes$PrimeSpliterator.<init>(Primes.java:32)
at com.jeffreybosboom.projecteuler.util.Primes$$Lambda$1/834600351.get(Unknown Source)
at java.util.stream.StreamSpliterators$DelegatingSpliterator.get(StreamSpliterators.java:513)
at java.util.stream.StreamSpliterators$DelegatingSpliterator.estimateSize(StreamSpliterators.java:536)
at java.util.stream.Streams$ConcatSpliterator.<init>(Streams.java:713)
at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:789)
at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:785)
at java.util.stream.Streams$ConcatSpliterator$OfInt.<init>(Streams.java:819)
at java.util.stream.IntStream.concat(IntStream.java:851)
at com.jeffreybosboom.projecteuler.util.Primes.primes(Primes.java:22)
How can I concatenate streams lazily enough to allow a stream to use another copy of itself in its implementation?

You apparently assume that the Streams API extends its guarantees of laziness even to the instantiation of spliterators; this is not correct. It expects to be able to instantiate the stream's spliterator at any time before the actual consumption begins, for example just to find out the stream's characteristics and reported size. Consumption only begins by invoking trySplit, tryAdvance, or forEachRemaining.
Having that in mind, you are initializing the postponed sieve earlier than you need it. You don't get to use any of its results until the else if part in tryAdvance. So move the code to the last possible moment which gives correctness:
@Override
public boolean tryAdvance(IntConsumer action) {
    for (; c > 0 /* overflow */; c += 2) {
        Supplier<IntStream> maybeS = sieve.remove(c);
        if (maybeS != null)
            s = maybeS;
        else {
            if (postponedSieve == null) {
                postponedSieve = primes().iterator();
                postponedSieve.nextInt();
                this.p = postponedSieve.nextInt();
                this.q = p*p;
            }
            if (c < q) {
                action.accept(c);
                return true; //continue
I think that, with this change, even your first attempt at primes() should work.
If you want to stay with your current approach, you could involve the following idiom:
Stream.<Supplier<IntStream>>of(
    () -> IntStream.of(2, 3, 5, 7),
    () -> StreamSupport.intStream(new PrimeSpliterator(), false))
    .flatMapToInt(Supplier::get);
You may find that this gives you as much laziness as you need.

I like to use a Supplier for that:
return Stream.<Supplier<Stream<WhatEver>>>of(
    () -> generateStreamOfWhatEverAndChangeSomeState(input, state),
    () -> generateStreamOfMoreWhatEversDependendingOnMutatedState(state)
).flatMap(Supplier::get);
Since the stream is lazily evaluated, generateStreamOfWhatEverAndChangeSomeState() will finish before generateStreamOfMoreWhatEversDependendingOnMutatedState() starts, so the state will already be updated.
I should note that this is probably not what the designers of Stream had in mind. Ideally, a Stream should not change state; it should only read each item and produce a new item.
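To make the evaluation order visible, here is a small sketch of the Supplier idiom with a log of which supplier ran and when. The names lazyConcat, log and the class name are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyConcatSketch {

    /** Records which suppliers have run, so the evaluation order can be observed. */
    public static final List<String> log = new ArrayList<>();

    /** Concatenates the streams lazily: no supplier runs until a terminal operation starts. */
    @SafeVarargs
    public static <T> Stream<T> lazyConcat(Supplier<Stream<T>>... parts) {
        return Stream.of(parts).flatMap(Supplier::get);
    }

    public static void main(String[] args) {
        Stream<Integer> combined = LazyConcatSketch.<Integer>lazyConcat(
                () -> { log.add("first"); return Stream.of(1, 2); },
                () -> { log.add("second"); return Stream.of(3, 4); });
        System.out.println(log); // [] (nothing has been created yet)
        System.out.println(combined.collect(Collectors.toList())); // [1, 2, 3, 4]
        System.out.println(log); // [first, second]
    }
}
```

Building the pipeline does not invoke either supplier; each one runs only when the traversal reaches it.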
