How do I lazily concatenate streams?

I'm trying to implement a stream that uses another instance of itself in its implementation. The stream has a few constant elements prepended (with IntStream.concat) to it, so this should work as long as the concatenated stream creates the non-constant part lazily. I think using the StreamSupport.intStream overload taking a Supplier with IntStream.concat (which "creates a lazily concatenated stream") should be lazy enough to only create the second spliterator when elements are demanded from it, but even creating the stream (not evaluating it) overflows the stack. How can I lazily concatenate streams?
I'm attempting to port the streaming prime number sieve from this answer into Java. This sieve uses another instance of itself (ps = postponed_sieve() in the Python code). If I break the initial four constant elements (yield 2; yield 3; yield 5; yield 7;) into their own stream, it's easy to implement the generator as a spliterator:
/**
* based on https://stackoverflow.com/a/10733621/3614835
*/
static class PrimeSpliterator extends Spliterators.AbstractIntSpliterator {
private static final int CHARACTERISTICS = Spliterator.DISTINCT | Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED | Spliterator.SORTED;
private final Map<Integer, Supplier<IntStream>> sieve = new HashMap<>();
private final PrimitiveIterator.OfInt postponedSieve = primes().iterator();
private int p, q, c = 9;
private Supplier<IntStream> s;
PrimeSpliterator() {
super(105097564 /* according to Wolfram Alpha */ - 4 /* in prefix */,
CHARACTERISTICS);
//p = next(ps) and next(ps) (that's Pythonic?)
postponedSieve.nextInt();
this.p = postponedSieve.nextInt();
this.q = p*p;
}
@Override
public boolean tryAdvance(IntConsumer action) {
for (; c > 0 /* overflow */; c += 2) {
Supplier<IntStream> maybeS = sieve.remove(c);
if (maybeS != null)
s = maybeS;
else if (c < q) {
action.accept(c);
return true; //continue
} else {
s = () -> IntStream.iterate(q+2*p, x -> x + 2*p);
p = postponedSieve.nextInt();
q = p*p;
}
int m = s.get().filter(x -> !sieve.containsKey(x)).findFirst().getAsInt();
sieve.put(m, s);
}
return false;
}
}
My first attempt at the primes() method returns an IntStream concatenating a constant stream with a new PrimeSpliterator:
public static IntStream primes() {
return IntStream.concat(IntStream.of(2, 3, 5, 7),
StreamSupport.intStream(new PrimeSpliterator()));
}
Calling primes() results in a StackOverflowError because primes() always instantiates a PrimeSpliterator, but PrimeSpliterator's field initializer always calls primes(). However, there's an overload of StreamSupport.intStream that takes a Supplier, which should allow lazily creating the PrimeSpliterator:
public static IntStream primes() {
return IntStream.concat(IntStream.of(2, 3, 5, 7),
StreamSupport.intStream(PrimeSpliterator::new, PrimeSpliterator.CHARACTERISTICS, false));
}
However, I instead get a StackOverflowError with a different backtrace (trimmed, as it repeats). Note that the recursion is entirely in the call to primes() -- the terminal operation iterator() is never invoked on a returned stream.
Exception in thread "main" java.lang.StackOverflowError
at java.util.stream.StreamSpliterators$DelegatingSpliterator$OfInt.<init>(StreamSpliterators.java:582)
at java.util.stream.IntPipeline.lazySpliterator(IntPipeline.java:155)
at java.util.stream.IntPipeline$Head.lazySpliterator(IntPipeline.java:514)
at java.util.stream.AbstractPipeline.spliterator(AbstractPipeline.java:352)
at java.util.stream.IntPipeline.spliterator(IntPipeline.java:181)
at java.util.stream.IntStream.concat(IntStream.java:851)
at com.jeffreybosboom.projecteuler.util.Primes.primes(Primes.java:22)
at com.jeffreybosboom.projecteuler.util.Primes$PrimeSpliterator.<init>(Primes.java:32)
at com.jeffreybosboom.projecteuler.util.Primes$$Lambda$1/834600351.get(Unknown Source)
at java.util.stream.StreamSpliterators$DelegatingSpliterator.get(StreamSpliterators.java:513)
at java.util.stream.StreamSpliterators$DelegatingSpliterator.estimateSize(StreamSpliterators.java:536)
at java.util.stream.Streams$ConcatSpliterator.<init>(Streams.java:713)
at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:789)
at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:785)
at java.util.stream.Streams$ConcatSpliterator$OfInt.<init>(Streams.java:819)
at java.util.stream.IntStream.concat(IntStream.java:851)
at com.jeffreybosboom.projecteuler.util.Primes.primes(Primes.java:22)
at com.jeffreybosboom.projecteuler.util.Primes$PrimeSpliterator.<init>(Primes.java:32)
at com.jeffreybosboom.projecteuler.util.Primes$$Lambda$1/834600351.get(Unknown Source)
at java.util.stream.StreamSpliterators$DelegatingSpliterator.get(StreamSpliterators.java:513)
at java.util.stream.StreamSpliterators$DelegatingSpliterator.estimateSize(StreamSpliterators.java:536)
at java.util.stream.Streams$ConcatSpliterator.<init>(Streams.java:713)
at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:789)
at java.util.stream.Streams$ConcatSpliterator$OfPrimitive.<init>(Streams.java:785)
at java.util.stream.Streams$ConcatSpliterator$OfInt.<init>(Streams.java:819)
at java.util.stream.IntStream.concat(IntStream.java:851)
at com.jeffreybosboom.projecteuler.util.Primes.primes(Primes.java:22)
How can I concatenate streams lazily enough to allow a stream to use another copy of itself in its implementation?

You apparently assume that the Streams API extends its guarantees of laziness even to the instantiation of spliterators; this is not correct. It expects to be able to instantiate the stream's spliterator at any time before the actual consumption begins, for example just to find out the stream's characteristics and reported size. Consumption only begins by invoking trySplit, tryAdvance, or forEachRemaining.
With that in mind, you are initializing the postponed sieve earlier than you need it. You don't get to use any of its results until the else if part in tryAdvance. So move that code to the last moment which still gives correctness:
@Override
public boolean tryAdvance(IntConsumer action) {
    for (; c > 0 /* overflow */; c += 2) {
        Supplier<IntStream> maybeS = sieve.remove(c);
        if (maybeS != null)
            s = maybeS;
        else {
            if (postponedSieve == null) {
                // deferred: the recursive primes() call now happens on first demand,
                // not at construction time (the field must therefore no longer be
                // final or initialized in its declaration)
                postponedSieve = primes().iterator();
                postponedSieve.nextInt();
                this.p = postponedSieve.nextInt();
                this.q = p * p;
            }
            if (c < q) {
                action.accept(c);
                return true; //continue
            } else {
                s = () -> IntStream.iterate(q + 2*p, x -> x + 2*p);
                p = postponedSieve.nextInt();
                q = p * p;
            }
        }
        int m = s.get().filter(x -> !sieve.containsKey(x)).findFirst().getAsInt();
        sieve.put(m, s);
    }
    return false;
}
I think that, with this change, even your first attempt at primes() should work.
If you want to stay with your current approach, you could use the following idiom:
Stream.<Supplier<IntStream>>of(
()->IntStream.of(2, 3, 5, 7),
()->intStream(new PrimeSpliterator()))
.flatMapToInt(Supplier::get);
You may find that this gives you as much laziness as you need.
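For instance, primes() itself can then be written as follows (a sketch building on the spliterator above; flatMapToInt invokes each Supplier only when its elements are demanded):
public static IntStream primes() {
    return Stream.<Supplier<IntStream>>of(
            () -> IntStream.of(2, 3, 5, 7),
            () -> StreamSupport.intStream(new PrimeSpliterator()))
        .flatMapToInt(Supplier::get);
}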

I like to use Supplier to do that:
return Stream.<Supplier<Stream<WhatEver>>>of(
() -> generateStreamOfWhatEverAndChangeSomeState(input, state),
() -> generateStreamOfMoreWhatEversDependingOnMutatedState(state)
).flatMap(Supplier::get);
Since streams are evaluated lazily, generateStreamOfWhatEverAndChangeSomeState() will finish before generateStreamOfMoreWhatEversDependingOnMutatedState() starts, so the state will have been updated.
I should note that this is probably not what the designers of Stream had in mind. Ideally, a stream should not change state, only read each item and produce a new item.
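A concrete, runnable sketch of that pattern (the names and values here are purely illustrative):
// the mutation made by the first supplier is visible to the second, because
// flatMap invokes each Supplier only when its elements are demanded
List<Integer> state = new ArrayList<>();
Stream<Integer> s = Stream.<Supplier<Stream<Integer>>>of(
        () -> { state.add(42); return Stream.of(1, 2); },
        () -> Stream.of(state.size()))   // runs after the first stream is drained
    .flatMap(Supplier::get);
s.forEach(System.out::println);          // prints 1, 2, 1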

Related

Why does the CompletableFuture allOf method do a binary search?

I wanted to know whether the allOf method of CompletableFuture does polling or goes into a wait state until all the CompletableFutures passed into the method complete their execution.
I looked at the code of the allOf method in IntelliJ and it is doing some sort of binary search.
Please help me find out what the allOf method of CompletableFuture actually does.
public static CompletableFuture<Void> allOf(CompletableFuture<?>... cfs) {
return andTree(cfs, 0, cfs.length - 1);
}
/** Recursively constructs a tree of completions. */
static CompletableFuture<Void> andTree(CompletableFuture<?>[] cfs, int lo, int hi) {
CompletableFuture<Void> d = new CompletableFuture<Void>();
if (lo > hi) // empty
d.result = NIL;
else {
CompletableFuture<?> a, b;
int mid = (lo + hi) >>> 1;
if ((a = (lo == mid ? cfs[lo] :
andTree(cfs, lo, mid))) == null ||
(b = (lo == hi ? a : (hi == mid+1) ? cfs[hi] :
andTree(cfs, mid+1, hi))) == null)
throw new NullPointerException();
if (!d.biRelay(a, b)) {
BiRelay<?,?> c = new BiRelay<>(d, a, b);
a.bipush(b, c);
c.tryFire(SYNC);
}
}
return d;
}
/** Pushes completion to this and b unless both done. */
final void bipush(CompletableFuture<?> b, BiCompletion<?,?,?> c) {
if (c != null) {
Object r;
while ((r = result) == null && !tryPushStack(c))
lazySetNext(c, null); // clear on failure
if (b != null && b != this && b.result == null) {
Completion q = (r != null) ? c : new CoCompletion(c);
while (b.result == null && !b.tryPushStack(q))
lazySetNext(q, null); // clear on failure
}
}
}
final CompletableFuture<V> tryFire(int mode) {
CompletableFuture<V> d;
CompletableFuture<T> a;
CompletableFuture<U> b;
if ((d = dep) == null ||
!d.orApply(a = src, b = snd, fn, mode > 0 ? null : this))
return null;
dep = null; src = null; snd = null; fn = null;
return d.postFire(a, b, mode);
}
It doesn't do a binary search -- it builds a balanced binary tree with the input futures at the leaves, and inner nodes each of which completes when both of its children have completed.
For some reason that is not apparent from the code, the author of the code must have decided it was most efficient to consider allOf(_,_) between exactly two futures to be his primitive operation, and if he's asked for an allOf(...) between more than two futures, he's manufacturing it as a cascade of these binary primitives.
The tree should be balanced such that no matter which future is the last to complete, there will only be a small number of levels left to collapse before the future at the top can complete. This improves performance in some situations, because it ensures that as much work as possible can be handled before we're completely done, at a point where (if we're lucky) the CPU might just be sitting idle, waiting for something asynchronous to complete.
Balancing the tree is done by having the topmost inner node have about as many leaves under its left child as under its right child -- so both children get about half of the original array, and then the code recursively builds a tree from each half of the array. Splitting in halves can look a bit like the index calculations for a binary search.
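As an illustration (this is not the JDK's code, just a sketch of the same balanced cascade built from public API):
// builds the same shape of tree as andTree(), out of pairwise combinations
static CompletableFuture<Void> allOfTree(CompletableFuture<?>[] cfs, int lo, int hi) {
    if (lo > hi)                  // empty input: already complete
        return CompletableFuture.completedFuture(null);
    if (lo == hi)                 // leaf: a single future
        return cfs[lo].thenApply(x -> (Void) null);
    int mid = (lo + hi) >>> 1;    // split in halves, like the original
    return allOfTree(cfs, lo, mid)
            .runAfterBoth(allOfTree(cfs, mid + 1, hi), () -> {});
}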
The basic structure is obscured slightly by special cases that appear to be designed to
use an optimized code path with fewer allocations when some of the original futures are already completed, and
make sure that allOf(_) with exactly one element will return a fresh CompletableFuture. For most purposes it would work to just return that single element, but the author must have wanted to ensure that users of the library can rely on the object being fresh, in case they use the futures as keys in hash maps or in other logic that depends on being able to tell the output from the inputs, and
have only one throw new NullPointerException(); by using ?: and inline assignments instead of honest if statements. This probably produces slightly smaller bytecode at the expense of readability. Cannot be recommended as a style to learn from, unless you personally pay for the storage cost of the resulting bytecode ...

Split Java stream into two lazy streams without terminal operation

I understand that in general Java streams do not split. However, we have an involved and lengthy pipeline, at the end of which we have two different types of processing that share the first part of the pipeline.
Due to the size of the data, storing the intermediate stream product is not a viable solution. Neither is running the pipeline twice.
Basically, what we are looking for is a solution that is an operation on a stream that yields two (or more) streams that are lazily filled and able to be consumed in parallel. By that, I mean that if stream A is split into streams B and C, when streams B and C consume 10 elements, stream A consumes and provides those 10 elements, but if stream B then tries to consume more elements, it blocks until stream C also consumes them.
Is there any pre-made solution for this problem, or any library we can look at? If not, where would we start to look if we want to implement this ourselves? Or is there a compelling reason not to implement it at all?
I don't know about functionality that would fulfill your blocking requirement, but you might be interested in jOOλ's Seq.duplicate() method:
Stream<T> streamA = Stream.of(/* your data here */);
Tuple2<Seq<T>, Seq<T>> streamTuple = Seq.seq(streamA).duplicate();
Stream<T> streamB = streamTuple.v1();
Stream<T> streamC = streamTuple.v2();
The Streams can be consumed absolutely independently (including consumption in parallel) thanks to the SeqBuffer class that's used internally by this method.
Note that:
SeqBuffer will cache even the elements that are no longer needed because they have already been consumed by both streamB and streamC (so if you cannot afford to keep them in memory, it's not a solution for you);
as I mentioned at the beginning, streamB and streamC will not block one another.
Disclaimer: I am the author of the SeqBuffer class.
You can implement a custom Spliterator in order to achieve such behavior. We will split your streams into the common "source" and the different "consumers". The custom spliterator then forwards the elements from the source to each consumer. For this purpose, we will use a BlockingQueue (see this question).
Note that the difficult part here is not the spliterator/stream, but the syncing of the consumers around the queue, as the comments on your question already indicate. Still, however you implement the syncing, Spliterator helps to use streams with it.
@SafeVarargs
public static <T> long streamForked(Stream<T> source, Consumer<Stream<T>>... consumers)
{
return StreamSupport.stream(new ForkingSpliterator<>(source, consumers), false).count();
}
private static class ForkingSpliterator<T>
extends AbstractSpliterator<T>
{
private Spliterator<T> sourceSpliterator;
private BlockingQueue<T> queue = new LinkedBlockingQueue<>();
private AtomicInteger nextToTake = new AtomicInteger(0);
private AtomicInteger processed = new AtomicInteger(0);
private boolean sourceDone;
private int consumerCount;
@SafeVarargs
private ForkingSpliterator(Stream<T> source, Consumer<Stream<T>>... consumers)
{
super(Long.MAX_VALUE, 0);
sourceSpliterator = source.spliterator();
consumerCount = consumers.length;
for (int i = 0; i < consumers.length; i++)
{
int index = i;
Consumer<Stream<T>> consumer = consumers[i];
new Thread(new Runnable()
{
@Override
public void run()
{
consumer.accept(StreamSupport.stream(new ForkedConsumer(index), false));
}
}).start();
}
}
@Override
public boolean tryAdvance(Consumer<? super T> action)
{
sourceDone = !sourceSpliterator.tryAdvance(queue::offer);
return !sourceDone;
}
private class ForkedConsumer
extends AbstractSpliterator<T>
{
private int index;
private ForkedConsumer(int index)
{
super(Long.MAX_VALUE, 0);
this.index = index;
}
@Override
public boolean tryAdvance(Consumer<? super T> action)
{
// take next element when it's our turn
while (!nextToTake.compareAndSet(index, index + 1))
{
}
T element;
while ((element = queue.peek()) == null)
{
if (sourceDone)
{
// element is null, and there won't be any more, so "terminate" this sub-stream
return false;
}
}
// push to consumer pipeline
action.accept(element);
if (consumerCount == processed.incrementAndGet())
{
// start next round
queue.poll();
processed.set(0);
nextToTake.set(0);
}
return true;
}
}
}
With the approach used, the consumers work on each element in parallel, but wait for each other before starting on the next element.
Known issue
If one of the consumers is "shorter" than the others (e.g. because it calls limit()) it will also stop the other consumers and leave the threads hanging.
Example
public static void sleep(long millis)
{
try { Thread.sleep((long) (Math.random() * 30 + millis)); } catch (InterruptedException e) { }
}
streamForked(Stream.of("1", "2", "3", "4", "5"),
source -> source.map(word -> { sleep(50); return "fast " + word; }).forEach(System.out::println),
source -> source.map(word -> { sleep(300); return "slow " + word; }).forEach(System.out::println),
source -> source.map(word -> { sleep(50); return "2fast " + word; }).forEach(System.out::println));
fast 1
2fast 1
slow 1
fast 2
2fast 2
slow 2
2fast 3
fast 3
slow 3
fast 4
2fast 4
slow 4
2fast 5
fast 5
slow 5

Can you have collections without storing the values in Java?

I have a question about Java collections such as Set or List, and more generally about objects you can use in a for-each loop. Is there any requirement that their elements actually have to be stored somewhere in a data structure, or can they be described by some sort of rule and calculated on the fly when you need them? It feels like this should be possible, but I don't see any of the Java standard collection classes doing anything like this. Am I breaking any sort of contract here?
The thing I'm thinking about using these for is mainly mathematics. Say, for example, I want to have a set representing all prime numbers under 1 000 000. It might not be a good idea to keep these in memory; instead, a method could check whether a particular number is in the collection or not.
I'm also not at all an expert at Java streams, but I feel like these should be usable in Java 8 streams since the objects have very minimal state (the objects in the collection don't even exist until you try to iterate over them or check whether a particular object exists in the collection).
Is it possible to have Collections or Iterators with virtually infinitely many elements, for example "all numbers of the form 6*k+1", "all primes above 10" or "all vectors spanned by this basis"? One other thing I'm thinking about is combining two sets, like the intersection of all primes below 1 000 000 and all integers of the form 2^n-1, to list the Mersenne primes below 1 000 000. I feel like it would be easier to reason about certain mathematical objects this way, if the elements weren't created explicitly until they are actually needed. Maybe I'm wrong.
Here are two mockup classes I wrote to try to illustrate what I want to do. They don't act exactly as I would expect (see output), which makes me think I'm breaking some kind of contract with the Iterable interface or implementing it wrong. Feel free to point out what I'm doing wrong here if you see it, or whether this kind of code is even allowed under the collections framework.
import java.util.AbstractSet;
import java.util.Iterator;
public class PrimesBelow extends AbstractSet<Integer>{
int max;
int size;
public PrimesBelow(int max) {
this.max = max;
}
@Override
public Iterator<Integer> iterator() {
return new SetIterator<Integer>(this);
}
@Override
public int size() {
if(this.size == -1){
System.out.println("Calculating size");
size = calculateSize();
}else{
System.out.println("Accessing calculated size");
}
return size;
}
private int calculateSize() {
int c = 0;
for(Integer p: this)
c++;
return c;
}
public static void main(String[] args){
PrimesBelow primesBelow10 = new PrimesBelow(10);
for(int i: primesBelow10)
System.out.println(i);
System.out.println(primesBelow10);
}
}
import java.util.Iterator;
import java.util.NoSuchElementException;
public class SetIterator<T> implements Iterator<Integer> {
int max;
int current;
public SetIterator(PrimesBelow pb) {
this.max= pb.max;
current = 1;
}
@Override
public boolean hasNext() {
if(current < max) return true;
else return false;
}
@Override
public Integer next() {
while(hasNext()){
current++;
if(isPrime(current)){
System.out.println("returning "+current);
return current;
}
}
throw new NoSuchElementException();
}
private boolean isPrime(int a) {
if(a<2) return false;
for(int i = 2; i < a; i++) if((a%i)==0) return false;
return true;
}
}
Main function gives the output
returning 2
2
returning 3
3
returning 5
5
returning 7
7
Exception in thread "main" java.util.NoSuchElementException
at SetIterator.next(SetIterator.java:27)
at SetIterator.next(SetIterator.java:1)
at PrimesBelow.main(PrimesBelow.java:38)
edit: spotted an error in the next() method. Corrected it and changed the output to the new one.
Well, as you see with your (now fixed) example, you can easily do it with Iterables/Iterators. Instead of having a backing collection, the example would've been nicer with just an Iterable that takes the max number you wish to calculate primes to. You just need to make sure that you handle the hasNext() method properly so you don't have to throw an exception unnecessarily from next().
Java 8 streams make these kinds of things easier nowadays, but there's no reason you can't have a "virtual collection" that's just an Iterable. If you start implementing Collection it becomes harder, but even then it wouldn't be completely impossible, depending on the use cases: e.g. you could implement contains() so that it checks for primality, but you'd have to calculate it and it would be slow for large numbers.
A (somewhat convoluted) example of a semi-infinite set of odd numbers that is immutable and stores no values.
public class OddSet implements Set<Integer> {
    public boolean contains(Object o) {
        return o instanceof Integer && ((Integer) o) % 2 == 1;
    }
    public int size() {
        return Integer.MAX_VALUE;
    }
    public boolean add(Integer i) {
        throw new UnsupportedOperationException();
    }
    public boolean equals(Object o) {
        return o instanceof OddSet;
    }
    // etc. etc.
}
As DwB stated, this is not possible to do with Java's Collections API, as every element must be stored in memory. However, there is an alternative: this is precisely why Java's Stream API was implemented!
Streams allow you to iterate across an infinite amount of objects that are not stored in memory unless you explicitly collect them into a Collection.
From the documentation of IntStream#iterate:
Returns an infinite sequential ordered IntStream produced by iterative application of a function f to an initial element seed, producing a Stream consisting of seed, f(seed), f(f(seed)), etc.
The first element (position 0) in the IntStream will be the provided seed. For n > 0, the element at position n, will be the result of applying the function f to the element at position n - 1.
Here are some examples that you proposed in your question:
import java.util.stream.IntStream;

public class Test {
    public static void main(String[] args) {
        IntStream.iterate(1, k -> k + 6);                                  // all numbers of the form 6*k + 1
        IntStream.iterate(10, i -> i + 1).filter(Test::isPrime);          // all primes above 10
        IntStream.iterate(1, n -> 2 * n + 1).filter(i -> i < 1_000_000);  // all numbers of the form 2^n - 1
        // (with Java 9+, takeWhile(i -> i < 1_000_000) would also make the last one finite)
    }

    private static boolean isPrime(int a) {
        if (a < 2) {
            return false;
        }
        for (int i = 2; i < a; i++) {
            if ((a % i) == 0) {
                return false;
            }
        }
        return true;
    }
}

Finite generated Stream in Java - how to create one?

In Java, one can easily generate an infinite stream with Stream.generate(supplier). However, I would need to generate a stream that will eventually finish.
Imagine, for example, I want a stream of all files in a directory. The number of files can be huge, therefore I cannot gather all the data upfront and create a stream from it (via collection.stream()). I need to generate the sequence piece by piece. But the stream will obviously finish at some point, and terminal operations like collect() or findAny() need to work on it, so Stream.generate(supplier) is not suitable here.
Is there any reasonable easy way to do this in Java, without implementing the entire Stream interface on my own?
I can think of a simple hack - doing it with infinite Stream.generate(supplier), and providing null or throwing an exception when all the actual values are taken. But that would break the standard stream operations; I could use it only with my own operations that are aware of this behaviour.
CLARIFICATION
People in the comments are proposing the takeWhile() operator. This is not what I meant. How to phrase the question better... I am not asking how to filter (or limit) an existing stream; I am asking how to create (generate) the stream dynamically, without loading all the elements upfront, where the stream has a finite size that is unknown in advance.
SOLUTION
The code I was looking for is
Iterator it = myCustomIteratorThatGeneratesTheSequence();
StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, Spliterator.DISTINCT), false);
I just looked into java.nio.file.Files to see how the list(path) method is implemented.
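For reference, a simplified sketch along the lines of what Files.list(path) does internally (not the actual JDK source; IOException handling abbreviated):
DirectoryStream<Path> ds = Files.newDirectoryStream(dir);   // 'dir' is some Path
Stream<Path> files = StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(ds.iterator(), Spliterator.DISTINCT),
        false)
    .onClose(() -> {
        try { ds.close(); } catch (IOException e) { throw new UncheckedIOException(e); }
    });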
Is there any reasonable easy way to do this in Java, without implementing the entire Stream interface on my own?
A simple .limit() guarantees that it will terminate. But that's not always powerful enough.
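For example:
// an infinite generated stream made finite with limit()
Stream.generate(() -> ThreadLocalRandom.current().nextInt())
    .limit(100)
    .forEach(System.out::println);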
After the Stream factory methods, the simplest approach for creating custom stream sources without reimplementing the stream processing pipeline is subclassing java.util.Spliterators.AbstractSpliterator<T> and passing it to java.util.stream.StreamSupport.stream(Supplier<? extends Spliterator<T>>, int, boolean).
If you're intending to use parallel streams, note that AbstractSpliterator only yields suboptimal splitting. If you have more control over your source, fully implementing the Spliterator interface can do better.
For example, the following snippet would create a Stream providing an infinite sequence 1,2,3...
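A minimal sketch of what such a spliterator can look like:
Stream<Integer> naturals = StreamSupport.stream(
        new Spliterators.AbstractSpliterator<Integer>(Long.MAX_VALUE, Spliterator.ORDERED) {
            int next = 1;
            @Override
            public boolean tryAdvance(Consumer<? super Integer> action) {
                action.accept(next++);   // 1, 2, 3, ...
                return true;             // never exhausted; bound it downstream, e.g. with limit()
            }
        }, false);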
in that particular example you could use IntStream.range()
But the stream will obviously finish at some point, and terminal operators like (collect() or findAny()) need to work on it.
short-circuiting operations like findAny() can actually finish on an infinite stream, as long as there is any element that matches.
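For instance:
// terminates despite the infinite source, because findAny() short-circuits
int hit = IntStream.iterate(1, i -> i + 1)
    .filter(i -> i % 1000 == 0)
    .findAny()
    .getAsInt();   // 1000 in a sequential stream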
Java 9 introduces a three-argument Stream.iterate(seed, hasNext, next) to generate finite streams for some simple cases.
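For example:
// Java 9+: iterate(seed, hasNext, next) ends the stream once the predicate fails
Stream.iterate(1, i -> i < 1_000, i -> i * 2)
    .forEach(System.out::println);   // 1, 2, 4, ..., 512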
Kotlin code to create Stream of JsonNode from InputStream
private fun InputStream.toJsonNodeStream(): Stream<JsonNode> {
return StreamSupport.stream(
Spliterators.spliteratorUnknownSize(this.toJsonNodeIterator(), Spliterator.ORDERED),
false
)
}
private fun InputStream.toJsonNodeIterator(): Iterator<JsonNode> {
val jsonParser = objectMapper.factory.createParser(this)
return object: Iterator<JsonNode> {
override fun hasNext(): Boolean {
var token = jsonParser.nextToken()
while (token != null) {
if (token == JsonToken.START_OBJECT) {
return true
}
token = jsonParser.nextToken()
}
return false
}
override fun next(): JsonNode {
return jsonParser.readValueAsTree()
}
}
}
Here is a stream which is custom and finite:
package org.tom.stream;
import java.util.*;
import java.util.function.*;
import java.util.stream.*;
public class GoldenStreams {
private static final String IDENTITY = "";
public static void main(String[] args) {
Stream<String> stream = java.util.stream.StreamSupport.stream(new Spliterator<String>() {
private static final int LIMIT = 25;
private int integer = Integer.MAX_VALUE;
{
integer = 0;
}
@Override
public int characteristics() {
return Spliterator.DISTINCT;
}
@Override
public long estimateSize() {
return LIMIT-integer;
}
@Override
public boolean tryAdvance(Consumer<? super String> arg0) {
arg0.accept(IDENTITY+integer++);
return integer < 25;
}
@Override
public Spliterator<String> trySplit() {
System.out.println("trySplit");
return null;
}}, false);
List<String> peeks = new ArrayList<String>();
List<String> reds = new ArrayList<String>();
stream.peek(data->{
peeks.add(data);
}).filter(data-> {
return Integer.parseInt(data)%2>0;
}).peek(data ->{
System.out.println("peekDeux:"+data);
}).reduce(IDENTITY,(accumulation,input)->{
reds.add(input);
String concat = accumulation + ( accumulation.isEmpty() ? IDENTITY : ":") + input;
System.out.println("reduce:"+concat);
return concat;
});
System.out.println("Peeks:"+peeks.toString());
System.out.println("Reduction:"+reds.toString());
}
}
While the author has discarded the takeWhile option, I find it adequate for certain use cases and worth an explanation.
The method takeWhile can be used on any stream and will terminate the stream when the predicate provided to the method returns false. The object which results in a false is not appended to the stream; only the objects which resulted in true are passed downstream.
So one method for generating a finite stream could be to use the Stream.generate method and return a value which signals the end of the stream by being evaluated to false by the predicate provided to takeWhile.
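A minimal sketch of that idea, assuming some BufferedReader named reader; readLine() returns null at the end of input, which serves as the end-of-stream signal:
Stream<String> lines = Stream.generate(() -> {
    try {
        return reader.readLine();    // null once the input is exhausted
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}).takeWhile(Objects::nonNull);      // Java 9+; drops the null and ends the stream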
Here's an example, generating all the permutations of an array:
public static Stream<int[]> permutations(int[] original) {
int dim = original.length;
var permutation = original.clone();
int[] controller = new int[dim];
var low = new AtomicInteger(0);
var up = new AtomicInteger(1);
var permutationsStream = Stream.generate(() -> {
while (up.get() < dim) {
if (controller[up.get()] < up.get()) {
low.set(up.get() % 2 * controller[up.get()]);
var tmp = permutation[low.get()];
permutation[low.get()] = permutation[up.get()];
permutation[up.get()] = tmp;
controller[up.get()]++;
up.set(1);
return permutation.clone();
} else {
controller[up.get()] = 0;
up.incrementAndGet();
}
}
return null;
}).takeWhile(Objects::nonNull);
return Stream.concat(
Stream.ofNullable(original.clone()),
permutationsStream
);
}
In this example, I used the null value to signal the end of the stream.
The caller of the method won't receive the null value!
OP could use a similar strategy, and combine it with a visitor pattern.
If it's a flat directory, OP would be better off using Stream.iterate with the seed being the index of the file to yield and Stream.limit on the number of files (which can be known without browsing the directory).
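A sketch of that suggestion, with dir as an assumed java.io.File for the flat directory:
String[] names = dir.list();                         // the count is known up front
Stream<File> files = Stream.iterate(0, i -> i + 1)   // seed: the index of the file to yield
    .limit(names.length)                             // bound by the number of files
    .map(i -> new File(dir, names[i]));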

Atomic compareAndSet but with callback?

I know that AtomicReference has compareAndSet, but I feel like what I want to do is this
private final AtomicReference<Boolean> initialized = new AtomicReference<>( false );
...
atomicRef.compareSetAndDo( false, true, () -> {
// stuff that only happens if false
});
this would probably work too, might be better.
atomicRef.compareAndSet( false, () -> {
// stuff that only happens if false
// if I die still false.
return true;
});
I've noticed there's some new functional constructs but I'm not sure if any of them are what I'm looking for.
Can any of the new constructs do this? if so please provide an example.
update
To attempt to simplify my problem, I'm trying to find a less error-prone way to guard code in a "do once for object" or (really) lazy-initializer fashion, and I know that some developers on my team find compareAndSet confusing.
guard code in a "do once for object"
How exactly to implement that depends on what you want other threads attempting to execute the same thing to do in the meantime. If you just let them run past the CAS, they may observe things in an intermediate state while the one thread that succeeded does its action.
or (really) lazy initializer fashion
That construct is not thread-safe if you're using it for lazy initializers, because one thread may set the "is initialized" boolean to true and then still be executing the block while another thread observes the true state but reads an empty result.
You can use AtomicReference::updateAndGet if multiple concurrent/repeated initialization attempts are acceptable, with one object winning in the end and the others being discarded by GC. The update method should be side-effect-free.
Otherwise you should just use the double checked locking pattern with a variable reference field.
Of course you can always package any of these into a higher order function that returns a Runnable or Supplier which you then assign to a final field.
// == FunctionalUtils.java
/** @param mayRunMultipleTimes must be side-effect-free */
public static <T> Supplier<T> instantiateOne(Supplier<T> mayRunMultipleTimes) {
AtomicReference<T> ref = new AtomicReference<>(null);
return () -> {
T val = ref.get(); // fast-path if already initialized
if(val != null)
return val;
return ref.updateAndGet(v -> v == null ? mayRunMultipleTimes.get() : v);
};
}
// == ClassWithLazyField.java
private final Supplier<Foo> lazyInstanceVal = FunctionalUtils.instantiateOne(() -> new Foo());
public Foo getFoo() {
return lazyInstanceVal.get();
}
You can easily encapsulate various custom control-flow and locking patterns this way. Here are two of my own..
compareAndSet returns true if the update was done, and false if the actual value was not equal to the expected value.
So just use
if (ref.compareAndSet(expectedValue, newValue)) {
...
}
That said, I don't really understand your examples, since you're passing true and false to a method taking object references as arguments. And your second example doesn't do the same thing as the first one. If the second is what you want, I think what you're after is
ref.getAndUpdate(value -> {
if (value.equals(expectedValue)) {
return someNewValue(value);
}
else {
return value;
}
});
You’re over-complicating things. Just because there are now lambda expressions, you don’t need to solve everything with lambdas:
private volatile boolean initialized;
…
if(!initialized) synchronized(this) {
if(!initialized) {
// stuff to be done exactly once
initialized=true;
}
}
The double-checked locking pattern might not have a good reputation, but for non-static properties, there are few alternatives.
If you consider multiple threads accessing it concurrently in the uninitialized state and want a guarantee that the action runs only once, and that it has completed before dependent code is executed, an Atomic… object won't help you.
There's only one thread that can successfully perform compareAndSet(false,true), but since failure implies that the flag already has the new value, i.e. is initialized, all other threads will proceed as if the “stuff to be done exactly once” has been done, while it might still be running. The alternative would be reading the flag first and conditionally performing the stuff and compareAndSet afterwards, but that allows multiple concurrent executions of “stuff”. This is also what happens with updateAndGet or accumulateAndGet and its provided function.
To guarantee exactly one execution before proceeding, threads must get blocked if the “stuff” is currently executing. The code above does this. Note that once the “stuff” has been done, there will be no locking anymore, and the performance characteristics of the volatile read are the same as for the Atomic… read.
The only solution which is simpler in programming, is to use a ConcurrentMap:
private final ConcurrentHashMap<String,Boolean> initialized=new ConcurrentHashMap<>();
…
initialized.computeIfAbsent("dummy", ignore -> {
// stuff to do exactly once
return true;
});
It might look a bit oversized, but it provides exactly the required performance characteristics. It will guard the initial computation using synchronized (or well, an implementation dependent exclusion mechanism) but perform a single read with volatile semantics on subsequent queries.
If you want a more lightweight solution, you may stay with the double checked locking shown at the beginning of this answer…
I know this is old, but I've found there is no perfect way to achieve this, more specifically this:
trying to find a less error prone way to guard code in a "do (anything) once..."
I'll add to this "while respecting a happens-before relationship", which is required for instantiating singletons in your case.
IMO the best way to achieve this is by means of a synchronized function:
public<T> T transaction(Function<NonSyncObject, T> transaction) {
synchronized (lock) {
return transaction.apply(nonSyncObject);
}
}
This allows performing atomic "transactions" on the given object.
Other options are double-check spin-locks:
for (;;) {
T t = atomicT.get();
T newT = new T();
if (atomicT.compareAndSet(t, newT)) return;
}
Here, new T(); may get executed repeatedly until the value is set successfully, so it is not really a "do something once".
This would only work for copy-on-write transactions, and could help with "instantiating objects once" (which in reality instantiates many, but ends up referencing the same one) by tweaking the code.
The final option is a worse-performing version of the first one, but this one is a true happens-before AND ONCE (as opposed to the double-checked spin-lock):
public void doSomething(Runnable r) {
while (!atomicBoolean.compareAndSet(false, true)) {}
// Do some heavy stuff ONCE
r.run();
atomicBoolean.set(false);
}
The reason why the first one is the better option is that it is doing what this one does, but in a more optimized way.
As a side note, in my projects I've actually used the code below (similar to #the8472's answer), that at the time I thought safe, and it may be:
public T get() {
T res = ref.get();
if (res == null) {
res = builder.get();
if (ref.compareAndSet(null, res))
return res;
else
return ref.get();
} else {
return res;
}
}
The thing about this code is that, like the copy-on-write loop, it may generate multiple instances, one for each contending thread, but only one is cached: the first one; all the other constructions eventually get GC'd.
Looking at the putIfAbsent method, I see the benefit is skipping 17 lines of code before reaching a synchronized body:
/** Implementation for put and putIfAbsent */
final V putVal(K key, V value, boolean onlyIfAbsent) {
if (key == null || value == null) throw new NullPointerException();
int hash = spread(key.hashCode());
int binCount = 0;
for (Node<K,V>[] tab = table;;) {
Node<K,V> f; int n, i, fh;
if (tab == null || (n = tab.length) == 0)
tab = initTable();
else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
if (casTabAt(tab, i, null,
new Node<K,V>(hash, key, value, null)))
break; // no lock when adding to empty bin
}
else if ((fh = f.hash) == MOVED)
tab = helpTransfer(tab, f);
else {
V oldVal = null;
synchronized (f) {
if (tabAt(tab, i) == f) {
And then the synchronized body itself is another 34 lines:
synchronized (f) {
if (tabAt(tab, i) == f) {
if (fh >= 0) {
binCount = 1;
for (Node<K,V> e = f;; ++binCount) {
K ek;
if (e.hash == hash &&
((ek = e.key) == key ||
(ek != null && key.equals(ek)))) {
oldVal = e.val;
if (!onlyIfAbsent)
e.val = value;
break;
}
Node<K,V> pred = e;
if ((e = e.next) == null) {
pred.next = new Node<K,V>(hash, key,
value, null);
break;
}
}
}
else if (f instanceof TreeBin) {
Node<K,V> p;
binCount = 2;
if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
value)) != null) {
oldVal = p.val;
if (!onlyIfAbsent)
p.val = value;
}
}
}
}
The pro(s) of using a ConcurrentHashMap is that it will undoubtedly work.
