Split Java stream into two lazy streams without terminal operation

I understand that in general Java streams do not split. However, we have an involved and lengthy pipeline, at the end of which we have two different types of processing that share the first part of the pipeline.
Due to the size of the data, storing the intermediate stream product is not a viable solution. Neither is running the pipeline twice.
Basically, what we are looking for is an operation on a stream that yields two (or more) streams that are filled lazily and can be consumed in parallel. By that I mean: if stream A is split into streams B and C, then when streams B and C each consume 10 elements, stream A consumes and provides those 10 elements; but if stream B then tries to consume more elements, it blocks until stream C also consumes them.
Is there any pre-made solution for this problem or any library we can look at? If not, where would we start if we want to implement this ourselves? Or is there a compelling reason not to implement it at all?

I don't know about functionality that would fulfill your blocking requirement, but you might be interested in jOOλ's Seq.duplicate() method:
Stream<T> streamA = Stream.of(/* your data here */);
Tuple2<Seq<T>, Seq<T>> streamTuple = Seq.seq(streamA).duplicate();
Stream<T> streamB = streamTuple.v1();
Stream<T> streamC = streamTuple.v2();
The Streams can be consumed absolutely independently (including consumption in parallel) thanks to the SeqBuffer class that's used internally by this method.
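For illustration, here is a minimal sketch of consuming the two duplicated streams independently, even from separate threads (the element type and the printing consumers are just examples; the jOOλ types are the ones shown above):
import org.jooq.lambda.Seq;
import org.jooq.lambda.tuple.Tuple2;
import java.util.stream.Stream;

public class DuplicateDemo {
    public static void main(String[] args) {
        Tuple2<Seq<String>, Seq<String>> copies = Seq.seq(Stream.of("a", "b", "c")).duplicate();
        // Each copy can be consumed on its own, in any order or in parallel.
        new Thread(() -> copies.v1().forEach(s -> System.out.println("first:  " + s))).start();
        new Thread(() -> copies.v2().forEach(s -> System.out.println("second: " + s))).start();
    }
}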
Note that:
SeqBuffer will cache even the elements that are no longer needed because they have already been consumed by both streamB and streamC (so if you cannot afford to keep them in memory, it's not a solution for you);
as I mentioned at the beginning, streamB and streamC will not block one another.
Disclaimer: I am the author of the SeqBuffer class.

You can implement a custom Spliterator in order to achieve such behavior. We split your stream into the common "source" and the different "consumers". The custom spliterator then forwards the elements from the source to each consumer. For this purpose, we use a BlockingQueue (see this question).
Note that the difficult part here is not the spliterator/stream, but the synchronization of the consumers around the queue, as the comments on your question already indicate. Still, however you implement the synchronization, the spliterator makes it possible to expose the result as streams.
@SafeVarargs
public static <T> long streamForked(Stream<T> source, Consumer<Stream<T>>... consumers)
{
return StreamSupport.stream(new ForkingSpliterator<>(source, consumers), false).count();
}
private static class ForkingSpliterator<T>
extends AbstractSpliterator<T>
{
private Spliterator<T> sourceSpliterator;
private BlockingQueue<T> queue = new LinkedBlockingQueue<>();
private AtomicInteger nextToTake = new AtomicInteger(0);
private AtomicInteger processed = new AtomicInteger(0);
private boolean sourceDone;
private int consumerCount;
@SafeVarargs
private ForkingSpliterator(Stream<T> source, Consumer<Stream<T>>... consumers)
{
super(Long.MAX_VALUE, 0);
sourceSpliterator = source.spliterator();
consumerCount = consumers.length;
for (int i = 0; i < consumers.length; i++)
{
int index = i;
Consumer<Stream<T>> consumer = consumers[i];
new Thread(new Runnable()
{
@Override
public void run()
{
consumer.accept(StreamSupport.stream(new ForkedConsumer(index), false));
}
}).start();
}
}
@Override
public boolean tryAdvance(Consumer<? super T> action)
{
sourceDone = !sourceSpliterator.tryAdvance(queue::offer);
return !sourceDone;
}
private class ForkedConsumer
extends AbstractSpliterator<T>
{
private int index;
private ForkedConsumer(int index)
{
super(Long.MAX_VALUE, 0);
this.index = index;
}
@Override
public boolean tryAdvance(Consumer<? super T> action)
{
// take next element when it's our turn
while (!nextToTake.compareAndSet(index, index + 1))
{
}
T element;
while ((element = queue.peek()) == null)
{
if (sourceDone)
{
// element is null, and there won't be any more, so "terminate" this sub-stream
return false;
}
}
// push to consumer pipeline
action.accept(element);
if (consumerCount == processed.incrementAndGet())
{
// start next round
queue.poll();
processed.set(0);
nextToTake.set(0);
}
return true;
}
}
}
With the approach used, the consumers work on each element in parallel, but wait for each other before starting on the next element.
Known issue
If one of the consumers is "shorter" than the others (e.g. because it calls limit()) it will also stop the other consumers and leave the threads hanging.
Example
public static void sleep(long millis)
{
try { Thread.sleep((long) (Math.random() * 30 + millis)); } catch (InterruptedException e) { }
}
streamForked(Stream.of("1", "2", "3", "4", "5"),
source -> source.map(word -> { sleep(50); return "fast " + word; }).forEach(System.out::println),
source -> source.map(word -> { sleep(300); return "slow " + word; }).forEach(System.out::println),
source -> source.map(word -> { sleep(50); return "2fast " + word; }).forEach(System.out::println));
fast 1
2fast 1
slow 1
fast 2
2fast 2
slow 2
2fast 3
fast 3
slow 3
fast 4
2fast 4
slow 4
2fast 5
fast 5
slow 5

How do I collect the results of calling an async API sequentially?

I have an async API that essentially returns results through pagination
public CompletableFuture<Response> getNext(int startFrom);
Each Response object contains a list of offsets from startFrom and a flag indicating whether there are more elements remaining and, therefore, another getNext() request to make.
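For reference, here is a minimal sketch of the Response type as implied by the question; the method names simply mirror the calls used below and are not a definitive API:
import java.util.List;

// Hypothetical shape of one page of results, inferred from the question.
public interface Response {
    List<Integer> getOffsets(); // offsets returned for this page
    boolean hasMore();          // true if another getNext() call should be made
}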
I'd like to write a method that goes through all the pages and retrieves all the offsets. I can write it in a synchronous manner like so
int startFrom = 0;
List<Integer> offsets = new ArrayList<>();
for (;;) {
CompletableFuture<Response> future = getNext(startFrom);
Response response = future.get(); // an exception stops everything
if (response.getOffsets().isEmpty()) {
break; // we're done
}
offsets.addAll(response.getOffsets());
if (!response.hasMore()) {
break; // we're done
}
startFrom = getLast(response.getOffsets());
}
In other words, we call getNext() with startFrom at 0. If an exception is thrown, we short-circuit the entire process. Otherwise, if there are no offsets, we complete. If there are offsets, we add them to the master list. If there are no more left to fetch, we complete. Otherwise, we reset the startFrom to the last offset we fetched and repeat.
Ideally, I want to do this without blocking with CompletableFuture::get() and returning a CompletableFuture<List<Integer>> containing all the offsets.
How can I do this? How can I compose the futures to collect their results?
I'm thinking of a "recursive" solution (not actually recursive in execution, but in code):
private CompletableFuture<List<Integer>> recur(int startFrom, List<Integer> offsets) {
CompletableFuture<Response> future = getNext(startFrom);
return future.thenCompose((response) -> {
if (response.getOffsets().isEmpty()) {
return CompletableFuture.completedFuture(offsets);
}
offsets.addAll(response.getOffsets());
if (!response.hasMore()) {
return CompletableFuture.completedFuture(offsets);
}
return recur(getLast(response.getOffsets()), offsets);
});
}
public CompletableFuture<List<Integer>> getAll() {
List<Integer> offsets = new ArrayList<>();
return recur(0, offsets);
}
I don't love this, from a complexity point of view. Can we do better?
I also wanted to give EA Async a shot on this one, as it implements Java support for async/await (inspired by C#). So I just took your initial code and converted it:
public CompletableFuture<List<Integer>> getAllEaAsync() {
int startFrom = 0;
List<Integer> offsets = new ArrayList<>();
for (;;) {
// this is the only thing I changed!
Response response = Async.await(getNext(startFrom));
if (response.getOffsets().isEmpty()) {
break; // we're done
}
offsets.addAll(response.getOffsets());
if (!response.hasMore()) {
break; // we're done
}
startFrom = getLast(response.getOffsets());
}
// well, you also have to wrap your result in a future to make it compilable
return CompletableFuture.completedFuture(offsets);
}
You then have to instrument your code, for example by adding
Async.init();
at the beginning of your main() method.
I must say: this really looks like magic!
Behind the scenes, EA Async notices there is an Async.await() call within the method, and rewrites it to handle all the thenCompose()/thenApply()/recursion for you. The only requirement is that your method must return a CompletionStage or CompletableFuture.
That's really async code made easy!
For the exercise, I made a generic version of this algorithm, but it is rather complex because you need:
an initial value to call the service (the startFrom)
the service call itself (getNext())
a result container to accumulate the intermediate values (the offsets)
an accumulator (offsets.addAll(response.getOffsets()))
a condition to perform the "recursion" (response.hasMore())
a function to compute the next input (getLast(response.getOffsets()))
so this gives:
public <T, I, R> CompletableFuture<R> recur(T initialInput, R resultContainer,
Function<T, CompletableFuture<I>> service,
BiConsumer<R, I> accumulator,
Predicate<I> continueRecursion,
Function<I, T> nextInput) {
return service.apply(initialInput)
.thenCompose(response -> {
accumulator.accept(resultContainer, response);
if (continueRecursion.test(response)) {
return recur(nextInput.apply(response),
resultContainer, service, accumulator,
continueRecursion, nextInput);
} else {
return CompletableFuture.completedFuture(resultContainer);
}
});
}
public CompletableFuture<List<Integer>> getAll() {
return recur(0, new ArrayList<>(), this::getNext,
(list, response) -> list.addAll(response.getOffsets()),
Response::hasMore,
r -> getLast(r.getOffsets()));
}
A small simplification of recur() is possible: replace initialInput with the CompletableFuture returned by the first call, merge the resultContainer and the accumulator into a single Consumer, and then merge the service with the nextInput function.
But this gives a slightly more complex getAll():
private <I> CompletableFuture<Void> recur(CompletableFuture<I> future,
Consumer<I> accumulator,
Predicate<I> continueRecursion,
Function<I, CompletableFuture<I>> service) {
return future.thenCompose(result -> {
accumulator.accept(result);
if (continueRecursion.test(result)) {
return recur(service.apply(result), accumulator, continueRecursion, service);
} else {
return CompletableFuture.completedFuture(null);
}
});
}
public CompletableFuture<List<Integer>> getAll() {
ArrayList<Integer> resultContainer = new ArrayList<>();
return recur(getNext(0),
result -> resultContainer.addAll(result.getOffsets()),
Response::hasMore,
r -> getNext(getLast(r.getOffsets())))
.thenApply(unused -> resultContainer);
}

Implications of weakly consistent ConcurrentSkipListSet

Using a ConcurrentSkipListSet I have observed some weird behaviour that I suspect is caused by the weak consistency of the concurrent set.
The JavaDoc has this to say on that topic:
Most concurrent Collection implementations (including most Queues) also differ from the usual java.util conventions in that their Iterators and Spliterators provide weakly consistent rather than fail-fast traversal:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.
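As a small illustration of what "weakly consistent" allows (a sketch; whether the concurrent addition becomes visible to the iterator is deliberately left unspecified by the JavaDoc):
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

public class WeaklyConsistentDemo {
    public static void main(String[] args) {
        ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
        set.add(1);
        set.add(3);
        Iterator<Integer> it = set.iterator(); // traverses elements as they existed at construction
        set.add(2);                            // modification after the iterator was created
        while (it.hasNext()) {
            // Prints 1 and 3 for sure; 2 may or may not appear.
            // A ConcurrentModificationException is never thrown.
            System.out.println(it.next());
        }
    }
}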
This is the code that I use:
private final ConcurrentSkipListSet<TimedTask> sortedEvents;
public TimedUpdatableTaskList(){
Comparator<TimedTask> comparator =
(task1, task2) -> task1.getExecutionTime().compareTo(task2.getExecutionTime());
sortedEvents = new ConcurrentSkipListSet<>(comparator);
}
public void add(TimedTask task) {
log.trace("Add task {}", task);
sortedEvents.add(task);
}
public void handleClockTick(ClockTick event) {
LocalDateTime now = date.getCurrentDate();
logContent("Task list BEFORE daily processing ("+now+")");
for (Iterator<TimedTask> iterator = sortedEvents.iterator(); iterator.hasNext();) {
TimedTask task = iterator.next();
Preconditions.checkNotNull(task.getExecutionTime(),
"The execution time of the task may not be null");
if (task.getExecutionTime().isBefore(now)) {
log.trace("BEFORE: Execute task {} scheduled for {} on {}",
task, task.getExecutionTime(), now);
try {
task.run();
iterator.remove();
} catch (Exception e) {
log.error("Failed to execute timed task", e);
}
log.trace("AFTER: Execute task {} scheduled for {} on {}",
task, task.getExecutionTime(), now);
}
if (task.getExecutionTime().isAfter(now)) {
break; // List is sorted
}
}
logContent("Task list AFTER daily processing");
}
private void logContent(String prefix) {
StringBuilder sb = new StringBuilder();
sortedEvents.stream().forEach(task ->sb.append(task).append(" "));
log.trace(prefix + ": "+sb.toString());
}
At occasion I can see log output like this:
2018-05-19 13:46:00,453 [pool-3-thread-1] TRACE ... - Add task AIRefitTask{ship=Mercurius, scheduled for: 1350-07-16T08:45}
2018-05-19 13:46:00,505 [pool-3-thread-5] TRACE ... - Task list BEFORE daily processing (1350-07-16T09:45): AIRefitTask{ship=Tidewalker, scheduled for: 1350-07-16T08:45} AIRepairTask{ship=Hackepeter, scheduled for: 1350-07-16T13:45} ch.sahits.game.openpatrician.engine.event.task.WeaponConstructionTask@680da167 ch.sahits.game.openpatrician.engine.player.DailyPlayerUpdater@6e22f1ba AIRepairTask{ship=St. Bonivatius, scheduled for: 1350-07-17T03:45} AIRepairTask{ship=Hackepeter, scheduled for: 1350-07-17T05:45} ch.sahits.game.openpatrician.engine.event.task.WeeklyLoanerCheckTask@47571ace
These are two almost consecutive log lines. Please note that they are executed on different threads. The TimedTask entry that is added is not listed in the second log line.
Am I correct in my assumption that this is due to the weak consistency? If so, would this also imply that iterator.next() retrieves a different entry than the one iterator.remove() deletes?
What I am observing, is that this added entry is never processed and does not show up in the concurrent set at any time.
What would be a good solution to avoid this? What comes to my mind is to create a copy of the set and iterate over that, as it is acceptable that entries are processed in a later iteration, as long as they are processed eventually. Looking at Weakly consistent iterator by ConcurrentHashMap suggests the iteration already happens on a copy of the set, so this might not change anything.
EDIT Sample implementation of a TimedTask:
class AIRefitTask extends TimedTask {
private static final Logger LOGGER = LogManager.getLogger(AIRefitTask.class);
private AsyncEventBus clientServerEventBus;
private ShipWeaponsLocationFactory shipWeaponLocationFactory;
private ShipService shipService;
private final IShip ship;
private final EShipUpgrade level;
private final IShipyard shipyard;
public AIRefitTask(LocalDateTime executionTime, IShip ship, EShipUpgrade upgrade, IShipyard shipyard) {
super();
setExecutionTime(executionTime);
LOGGER.debug("Add AIRefitTask for {} to be done at {}", ship.getName(), executionTime);
this.ship = ship;
this.level = upgrade;
this.shipyard = shipyard;
}
@Override
public void run() {
EShipUpgrade currentLevel = ship.getShipUpgradeLevel();
while (currentLevel != level) {
ship.upgrade();
List<IWeaponSlot> oldWeaponSlots = ship.getWeaponSlots();
List<IWeaponSlot> newWeaponSlots = shipWeaponLocationFactory.getShipWeaponsLocation(ship.getShipType(), level);
ship.setWeaponSlots(newWeaponSlots);
for (IWeaponSlot slot : oldWeaponSlots) {
if (slot.getWeapon().isPresent()) {
EWeapon weapon = (EWeapon) slot.getWeapon().get();
if (slot instanceof SecondaryLargeWeaponSlot) {
if (!shipService.isLargeWeapon(weapon)) { // ignore large weapons in secondary slots
shipService.placeWeapon(weapon, ship);
}
} else {
// Not secondary slot
shipService.placeWeapon(weapon, ship);
}
}
}
currentLevel = ship.getShipUpgradeLevel();
}
ship.setAvailable(true);
shipyard.removeCompletedUpgrade(ship);
LOGGER.debug("Refited ship {}", ship.getName());
clientServerEventBus.post(new RefitFinishedEvent(ship));
}
@Override
public String toString() {
return "AIRefitTask{ship="+ship.getUuid()+", scheduled for: "+getExecutionTime()+"}";
}
}
As @BenManes pointed out in his comment, the issue is with the Comparator used. When the result of the Comparator is 0, even though the two tasks are not equal, the set treats the new entry as a duplicate and silently drops it. In effect, the Comparator should consider the same fields as hashCode and equals.
Use a Comparator implementation like this:
public int compare(TimedTask task1, TimedTask task2) {
int executionTimeBasedComparisonResult = task1.getExecutionTime().compareTo(task2.getExecutionTime());
if (executionTimeBasedComparisonResult == 0) { // two execution times are equal
return task1.getUuid().compareTo(task2.getUuid());
}
return executionTimeBasedComparisonResult;
}
With an implementation like this, the comparison is based primarily on the execution time, and when the two execution times are the same (comparison result is 0), the tasks are ordered by their UUID instead.
For the use case the order of tasks with the same execution time is not relevant.
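Assuming TimedTask exposes getExecutionTime() and getUuid() as used above (and that getUuid() returns a Comparable type, as the compareTo call implies), the same ordering can also be expressed with the standard Comparator combinators; this is just an equivalent formulation of the fix, not a different one:
// Primary order: execution time; tie-breaker: UUID, so distinct tasks never compare as 0.
Comparator<TimedTask> comparator = Comparator
        .comparing(TimedTask::getExecutionTime)
        .thenComparing(TimedTask::getUuid);
sortedEvents = new ConcurrentSkipListSet<>(comparator);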

Finite generated Stream in Java - how to create one?

In Java, one can easily generate an infinite stream with Stream.generate(supplier). However, I would need to generate a stream that will eventually finish.
Imagine, for example, I want a stream of all files in a directory. The number of files can be huge, therefore I can not gather all the data upfront and create a stream from them (via collection.stream()). I need to generate the sequence piece by piece. But the stream will obviously finish at some point, and terminal operations like collect() or findAny() need to work on it, so Stream.generate(supplier) is not suitable here.
Is there any reasonable easy way to do this in Java, without implementing the entire Stream interface on my own?
I can think of a simple hack - doing it with an infinite Stream.generate(supplier), and providing null or throwing an exception when all the actual values are taken. But that would break the standard stream operations; I could use it only with my own operations that are aware of this behaviour.
CLARIFICATION
People in the comments are proposing me takeWhile() operator. This is not what I meant. How to phrase the question better... I am not asking how to filter (or limit) an existing stream, I am asking how to create (generate) the stream - dynamically, without loading all the elements upfront, but the stream would have a finite size (unknown in advance).
SOLUTION
The code I was looking for is
Iterator it = myCustomIteratorThatGeneratesTheSequence();
StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, Spliterator.DISTINCT), false);
I just looked into java.nio.file.Files, how the list(path) method is implemented.
Is there any reasonable easy way to do this in Java, without implementing the entire Stream interface on my own?
A simple .limit() guarantees that it will terminate. But that's not always powerful enough.
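For instance, a trivial sketch of bounding an otherwise infinite generator:
// Emits exactly 100 values produced by the supplier, then terminates,
// so terminal operations such as collect() complete normally.
Stream<Double> bounded = Stream.generate(Math::random).limit(100);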
After the Stream factory methods, the simplest approach for creating custom stream sources without reimplementing the stream processing pipeline is subclassing java.util.Spliterators.AbstractSpliterator<T> and passing it to java.util.stream.StreamSupport.stream(Supplier<? extends Spliterator<T>>, int, boolean).
If you're intending to use parallel streams, note that AbstractSpliterator only yields suboptimal splitting. If you have more control over your source, fully implementing the Spliterator interface can be better.
For example, a snippet along the lines of the sketch below would create a Stream providing an ascending sequence 1, 2, 3, ... (in that particular example you could simply use IntStream.range() instead).
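Here is a reconstruction of what such a snippet might look like, using the primitive AbstractIntSpliterator variant; the class name and the bound are purely illustrative:
import java.util.Spliterators;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

class CountingSpliterator extends Spliterators.AbstractIntSpliterator {
    private final int max;
    private int current = 1;

    CountingSpliterator(int max) {
        super(max, ORDERED | SIZED); // exact size is known up front
        this.max = max;
    }

    @Override
    public boolean tryAdvance(IntConsumer action) {
        if (current > max) {
            return false;             // no more elements: the stream terminates here
        }
        action.accept(current++);     // hand the next element to the pipeline
        return true;
    }
}

// Usage: a finite stream 1, 2, ..., 10
IntStream numbers = StreamSupport.intStream(new CountingSpliterator(10), false);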
But the stream will obviously finish at some point, and terminal operators like (collect() or findAny()) need to work on it.
short-circuiting operations like findAny() can actually finish on an infinite stream, as long as there is any element that matches.
Java 9 introduces an overload of Stream.iterate with a hasNext predicate, which generates finite streams for some simple cases.
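For example (Java 9+), the three-argument iterate overload takes a seed, a hasNext predicate and a next function:
// 1, 2, 3, ..., 100 -- the stream ends as soon as the predicate returns false
Stream<Integer> counted = Stream.iterate(1, i -> i <= 100, i -> i + 1);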
Kotlin code to create Stream of JsonNode from InputStream
private fun InputStream.toJsonNodeStream(): Stream<JsonNode> {
return StreamSupport.stream(
Spliterators.spliteratorUnknownSize(this.toJsonNodeIterator(), Spliterator.ORDERED),
false
)
}
private fun InputStream.toJsonNodeIterator(): Iterator<JsonNode> {
val jsonParser = objectMapper.factory.createParser(this)
return object: Iterator<JsonNode> {
override fun hasNext(): Boolean {
var token = jsonParser.nextToken()
while (token != null) {
if (token == JsonToken.START_OBJECT) {
return true
}
token = jsonParser.nextToken()
}
return false
}
override fun next(): JsonNode {
return jsonParser.readValueAsTree()
}
}
}
Here is a stream which is custom and finite:
package org.tom.stream;
import java.util.*;
import java.util.function.*;
import java.util.stream.*;
public class GoldenStreams {
private static final String IDENTITY = "";
public static void main(String[] args) {
Stream<String> stream = java.util.stream.StreamSupport.stream(new Spliterator<String>() {
private static final int LIMIT = 25;
private int integer = Integer.MAX_VALUE;
{
integer = 0;
}
@Override
public int characteristics() {
return Spliterator.DISTINCT;
}
@Override
public long estimateSize() {
return LIMIT-integer;
}
@Override
public boolean tryAdvance(Consumer<? super String> arg0) {
arg0.accept(IDENTITY+integer++);
return integer < LIMIT;
}
@Override
public Spliterator<String> trySplit() {
System.out.println("trySplit");
return null;
}}, false);
List<String> peeks = new ArrayList<String>();
List<String> reds = new ArrayList<String>();
stream.peek(data->{
peeks.add(data);
}).filter(data-> {
return Integer.parseInt(data)%2>0;
}).peek(data ->{
System.out.println("peekDeux:"+data);
}).reduce(IDENTITY,(accumulation,input)->{
reds.add(input);
String concat = accumulation + ( accumulation.isEmpty() ? IDENTITY : ":") + input;
System.out.println("reduce:"+concat);
return concat;
});
System.out.println("Peeks:"+peeks.toString());
System.out.println("Reduction:"+reds.toString());
}
}
While the author has discarded the takeWhile option, I find it adequate for certain use cases and worth an explanation.
The method takeWhile can be used on any stream and will terminate the stream when the predicate provided to the method returns false. The object which results in a false is not appended to the stream; only the objects which resulted in true are passed downstream.
So one method for generating a finite stream could be to use the Stream.generate method and return a value which signals the end of the stream by being evaluated to false by the predicate provided to takeWhile.
Here's an example, generating all the permutations of an array :
public static Stream<int[]> permutations(int[] original) {
int dim = original.length;
var permutation = original.clone();
int[] controller = new int[dim];
var low = new AtomicInteger(0);
var up = new AtomicInteger(1);
var permutationsStream = Stream.generate(() -> {
while (up.get() < dim) {
if (controller[up.get()] < up.get()) {
low.set(up.get() % 2 * controller[up.get()]);
var tmp = permutation[low.get()];
permutation[low.get()] = permutation[up.get()];
permutation[up.get()] = tmp;
controller[up.get()]++;
up.set(1);
return permutation.clone();
} else {
controller[up.get()] = 0;
up.incrementAndGet();
}
}
return null;
}).takeWhile(Objects::nonNull);
return Stream.concat(
Stream.ofNullable(original.clone()),
permutationsStream
);
}
In this example, I used the null value to signal the end of the stream.
The caller of the method won't receive the null value!
OP could use a similar strategy, and combine it with a visitor pattern.
If it's a flat directory, OP would be better off using Stream.iterate with the seed being the index of the file to yield and Stream.limit on the number of files (which can be known without browsing the directory).
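A rough sketch of that last suggestion; fileCount and fetchFile(int) are hypothetical stand-ins, the point being only that the index drives the iteration and limit() makes the stream finite:
// fileCount is known without browsing the directory,
// fetchFile(i) lazily loads the i-th file only when the element is consumed.
Stream<Path> files = Stream.iterate(0, i -> i + 1)
        .limit(fileCount)
        .map(i -> fetchFile(i));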

Special behavior of a stream if there are no elements

How can I express this with the Java 8 streaming API?
I want to perform itemConsumer for every item of a stream. If there
are no items I want to perform emptyAction.
Of course I could write something like this:
Consumer<Object> itemConsumer = System.out::println;
Runnable emptyAction = () -> {System.out.println("no elements");};
Stream<Object> stream = Stream.of("a","b"); // or Stream.empty()
List<Object> list = stream.collect(Collectors.toList());
if (list.isEmpty())
emptyAction.run();
else
list.stream().forEach(itemConsumer);
But I would prefer to avoid any Lists.
I also thought about setting a flag in a peek method - but that flag would be non-final and therefore not allowed. Using a boolean container also seems to be too much of a workaround.
You could coerce reduce to do this. The logic would be to reduce on false, setting the value to true if any useful data is encountered.
If the result of the reduce is false, then no items have been encountered. If any items were encountered, the result will be true:
boolean hasItems = stream.reduce(false, (o, i) -> {
itemConsumer.accept(i);
return true;
}, (l, r) -> l | r);
if (!hasItems) {
emptyAction.run();
}
This should work fine for parallel streams, as any stream encountering an item would set the value to true.
I'm not sure, however, that I like this as it's a slightly obtuse use of the reduce operation.
An alternative would be to use AtomicBoolean as a mutable boolean container:
final AtomicBoolean hasItems = new AtomicBoolean(false);
stream.forEach(i -> {
itemConsumer.accept(i);
hasItems.set(true);
});
if (!hasItems.get()) {
emptyAction.run();
}
I don't know if I like that more or less however.
Finally, you could have your itemConsumer remember state:
class ItemConsumer implements Consumer<Object> {
private volatile boolean hasConsumedAny;
@Override
public void accept(Object o) {
hasConsumedAny = true;
//magic magic
}
public boolean isHasConsumedAny() {
return hasConsumedAny;
}
}
final ItemConsumer itemConsumer = new ItemConsumer();
stream.forEach(itemConsumer::accept);
if (!itemConsumer.isHasConsumedAny()) {
emptyAction.run();
}
This seems a bit neater, but might not be practical. So maybe a decorator pattern -
class ItemConsumer<T> implements Consumer<T> {
private volatile boolean hasConsumedAny;
private final Consumer<T> delegate;
ItemConsumer(final Consumer<T> delegate) {
this.delegate = delegate;
}
@Override
public void accept(T t) {
hasConsumedAny = true;
delegate.accept(t);
}
public boolean isHasConsumedAny() {
return hasConsumedAny;
}
}
final ItemConsumer<Object> consumer = new ItemConsumer<>(o -> { /* magic */ });
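Wired into the same pattern as before, usage would look roughly like this:
stream.forEach(consumer);
if (!consumer.isHasConsumedAny()) {
    emptyAction.run();
}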
TL;DR: something has to remember whether you encountered anything during the consumption of the Stream, be it:
the Stream itself in case of reduce;
AtomicBoolean; or
the consumer
I think the consumer is probably best placed, from a logic point of view.
A solution without any additional variables:
stream.peek(itemConsumer).reduce((a, b) -> a).orElseGet(() -> {
emptyAction.run();
return null;
});
Note that if the stream is parallel, then itemConsumer could be called simultaneously for different elements in different threads (like in forEach, not in forEachOrdered). Also this solution will fail if the first stream element is null.
There's a simple, straightforward solution:
Spliterator<Object> sp=stream.spliterator();
if(!sp.tryAdvance(itemConsumer))
emptyAction.run();
else
sp.forEachRemaining(itemConsumer);
You can even keep parallel support for the elements after the first, if you wish:
Spliterator<Object> sp=stream.parallel().spliterator();
if(!sp.tryAdvance(itemConsumer))
emptyAction.run();
else
StreamSupport.stream(sp, true).forEach(itemConsumer);
In my opinion, it is much easier to understand than a reduce-based solution.
You could do this:
if(stream.peek(itemConsumer).count() == 0){
emptyAction.run();
}
But it seems that in Java 9, count() may skip the peek() when it can determine the size of the Stream directly (see here), so if you want it to keep working in the future you could use:
if(stream.peek(itemConsumer).mapToLong(e -> 1).sum() == 0){
emptyAction.run();
}
Another attempt to use reduce:
Stream<Object> stream = Stream.of("a","b","c");
//Stream<Object> stream = Stream.empty();
Runnable defaultRunnable = () -> System.out.println("empty Stream");
Consumer<Object> printConsumer = System.out::println;
Runnable runnable = stream.map(x -> toRunnable(x, printConsumer)).reduce((a, b) -> () -> {
a.run();
b.run();
}).orElse(defaultRunnable);
runnable.run(); // prints a, b, c (or empty stream when it is empty)
// for type inference
static <T> Runnable toRunnable(T t, Consumer<T> cons){
return ()->cons.accept(t);
}
This approach does not use peek() which according to Javadoc "mainly exists to support debugging"

How do I lazily concatenate streams?

I'm trying to implement a stream that uses another instance of itself in its implementation. The stream has a few constant elements prepended (with IntStream.concat) to it, so this should work as long as the concatenated stream creates the non-constant part lazily. I think using the StreamSupport.intStream overload taking a Supplier with IntStream.concat (which "creates a lazily concatenated stream") should be lazy enough to only create the second spliterator when elements are demanded from it, but even creating the stream (not evaluating it) overflows the stack. How can I lazily concatenate streams?
I'm attempting to port the streaming prime number sieve from this answer into Java. This sieve uses another instance of itself (ps = postponed_sieve() in the Python code). If I break the initial four constant elements (yield 2; yield 3; yield 5; yield 7;) into their own stream, it's easy to implement the generator as a spliterator:
/**
* based on https://stackoverflow.com/a/10733621/3614835
*/
static class PrimeSpliterator extends Spliterators.AbstractIntSpliterator {
private static final int CHARACTERISTICS = Spliterator.DISTINCT | Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED | Spliterator.SORTED;
private final Map<Integer, Supplier<IntStream>> sieve = new HashMap<>();
private final PrimitiveIterator.OfInt postponedSieve = primes().iterator();
private int p, q, c = 9;
private Supplier<IntStream> s;
PrimeSpliterator() {
super(105097564 /* according to Wolfram Alpha */ - 4 /* in prefix */,
CHARACTERISTICS);
//p = next(ps) and next(ps) (that's Pythonic?)
postponedSieve.nextInt();
this.p = postponedSieve.nextInt();
this.q = p*p;
}
@Override
public boolean tryAdvance(IntConsumer action) {
for (; c > 0 /* overflow */; c += 2) {
Supplier<IntStream> maybeS = sieve.remove(c);
if (maybeS != null)
s = maybeS;
else if (c < q) {
action.accept(c);
return true; //continue
} else {
s = () -> IntStream.iterate(q+2*p, x -> x + 2*p);
p = postponedSieve.nextInt();
q = p*p;
}
int m = s.get().filter(x -> !sieve.containsKey(x)).findFirst().getAsInt();
sieve.put(m, s);
}
return false;
}
}
My first attempt at the primes() method returns an IntStream concatenating a constant stream with a new PrimeSpliterator:
public static IntStream primes() {
return IntStream.concat(IntStream.of(2, 3, 5, 7),
StreamSupport.intStream(new PrimeSpliterator()));
}
Calling primes() results in a StackOverflowError because primes() always instantiates a PrimeSpliterator, but PrimeSpliterator's field initializer always calls primes(). However, there's an overload of StreamSupport.intStream that takes a Supplier, which should allow lazily creating the PrimeSpliterator:
public static IntStream primes() {
return IntStream.concat(IntStream.of(2, 3, 5, 7),
StreamSupport.intStream(PrimeSpliterator::new, PrimeSpliterator.CHARACTERISTICS, false));
}
However, I instead get a StackOverflowError with a different backtrace (trimmed, as it repeats). Note that the recursion is entirely in the call to primes() -- the terminal operation iterator() is never invoked on a returned stream.
Exception in thread "main" java.lang.StackOverflowError
at java.util.stream.StreamSpliterators$DelegatingSpliterator$OfInt.<init>(StreamSpliterators.java:582)
at java.util.stream.IntPipeline.lazySpliterator(IntPipeline.java:155)
at java.util.stream.IntPipeline$Head.lazySpliterator(IntPipeline.java:514)
at java.util.stream.AbstractPipeline.spliterator(AbstractPipeline.java:352)
at java.util.stream.IntPipeline.spliterator(IntPipeline.java:181)
at java.util.stream.IntStream.concat(IntStream.java:851)
at com.jeffreybosboom.projecteuler.util.Primes.primes(Primes.java:22)
at com.jeffreybosboom.projecteuler.util.Primes$PrimeSpliterator.<init>(Primes.java:32)
at com.jeffreybosboom.projecteuler.util.Primes$$Lambda$1/834600351.get(Unknown Source)
at java.util.stream.StreamSpliterators$DelegatingSpliterator.get(StreamSpliterators.java:513)
at java.util.stream.StreamSpliterators$DelegatingSpliterator.estimateSize(StreamSpliterators.java:536)
at java.util.stream.Streams$ConcatSpliterator.<init>(Streams.java:713)
at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:789)
at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:785)
at java.util.stream.Streams$ConcatSpliterator$OfInt.<init>(Streams.java:819)
at java.util.stream.IntStream.concat(IntStream.java:851)
at com.jeffreybosboom.projecteuler.util.Primes.primes(Primes.java:22)
at com.jeffreybosboom.projecteuler.util.Primes$PrimeSpliterator.<init>(Primes.java:32)
at com.jeffreybosboom.projecteuler.util.Primes$$Lambda$1/834600351.get(Unknown Source)
at java.util.stream.StreamSpliterators$DelegatingSpliterator.get(StreamSpliterators.java:513)
at java.util.stream.StreamSpliterators$DelegatingSpliterator.estimateSize(StreamSpliterators.java:536)
at java.util.stream.Streams$ConcatSpliterator.<init>(Streams.java:713)
at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:789)
at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:785)
at java.util.stream.Streams$ConcatSpliterator$OfInt.<init>(Streams.java:819)
at java.util.stream.IntStream.concat(IntStream.java:851)
at com.jeffreybosboom.projecteuler.util.Primes.primes(Primes.java:22)
How can I concatenate streams lazily enough to allow a stream to use another copy of itself in its implementation?
You apparently assume that the Streams API extends its guarantees of laziness even to the instantiation of spliterators; this is not correct. It expects to be able to instantiate the stream's spliterator at any time before the actual consumption begins, for example just to find out the stream's characteristics and reported size. Consumption only begins by invoking trySplit, tryAdvance, or forEachRemaining.
Having that in mind, you are initializing the postponed sieve earlier than you need it. You don't get to use any of its results until the else-if part in tryAdvance. So move that code to the last possible moment that still gives correctness:
@Override
public boolean tryAdvance(IntConsumer action) {
for (; c > 0 /* overflow */; c += 2) {
Supplier<IntStream> maybeS = sieve.remove(c);
if (maybeS != null)
s = maybeS;
else {
if (postponedSieve == null) {
postponedSieve = primes().iterator();
postponedSieve.nextInt();
this.p = postponedSieve.nextInt();
this.q = p*p;
}
if (c < q) {
action.accept(c);
return true; //continue
I think that, with this change, even your first attempt at primes() should work.
If you want to stay with your current approach, you could involve the following idiom:
Stream.<Supplier<IntStream>>of(
()->IntStream.of(2, 3, 5, 7),
()->intStream(new PrimeSpliterator()))
.flatMap(Supplier::get);
You may find that this gives you as much laziness as you need.
I like to use Supplier to do that:
return Stream.<Supplier<Stream<WhatEver>>>of(
() -> generateStreamOfWhatEverAndChangeSomeState(input, state),
() -> generateStreamOfMoreWhatEversDependendingOnMutatedState(state)
).flatMap(Supplier::get);
Since the stream is lazily evaluated, generateStreamOfWhatEverAndChangeSomeState() will finish before generateStreamOfMoreWhatEversDependendingOnMutatedState() starts, and the state will be updated.
I should note that this is probably not what the designers of Stream had in mind. Ideally, a Stream should not change state, only read each item and produce a new item.
