Here is a simple Project Reactor code snippet:
Consumer<String> slowConsumer = x -> {
    try {
        TimeUnit.MILLISECONDS.sleep(50);
    } catch (InterruptedException ignore) {
    }
};
Flux<String> publisher = Flux
        .just(1, 2, 3)
        .parallel(2)
        .runOn(Schedulers.newParallel("writer", 2))
        .flatMap(rail -> Flux.range(1, 300)
                .map(i -> String.format("Rail %d -> Row %d", rail, i))
                .log())
        .sequential()
        .publishOn(Schedulers.newSingle("reader"))
        .doOnNext(slowConsumer);

publisher.subscribe();
My expectation is that everything that happens in "flatMap" should be executed within "writer" threads. However, this is what is logged:
onNext(Rail 2 -> Row 300) [Flux.MapFuseable.1] [writer-2]
...
onNext(Rail 1 -> Row 300) [Flux.MapFuseable.2] [writer-1]
...
onNext(Rail 3 -> Row 256) [Flux.MapFuseable.3] [writer-1]
request(256) [Flux.MapFuseable.3] [reader-1]
onNext(Rail 3 -> Row 257) [Flux.MapFuseable.3] [reader-1]
...
onNext(Rail 3 -> Row 300) [Flux.MapFuseable.3] [reader-1]
Can someone explain this? How come the "reader" thread is processing the tail of the last rail? What am I missing?
Related
I wrote this code:
Flux.range(0, 300)
        .doOnNext(i -> System.out.println("i = " + i))
        .flatMap(i -> Mono.just(i)
                .subscribeOn(Schedulers.elastic())
                .delayElement(Duration.ofMillis(1000)))
        .doOnNext(i -> System.out.println("end " + i))
        .blockLast();
When running it, the first System.out.println shows that the Flux stops emitting numbers at the 256th element; it then waits for the earlier ones to complete before emitting new ones.
Why is this happening?
Why 256?
Why is this happening?
The flatMap operator can be characterized as an operator that (rephrased from the javadoc):
subscribes to its inners eagerly,
does not preserve the ordering of elements,
lets values from different inners interleave.
For this question, the first point is important: Project Reactor restricts the
number of in-flight inner sequences via the concurrency parameter.
While flatMap(mapper) uses the default, the flatMap(mapper, concurrency) overload accepts this parameter explicitly.
flatMap's javadoc describes the parameter as:
The concurrency argument allows to control how many Publisher can be subscribed to and merged in parallel
Consider the following code, using concurrency = 500:
Flux.range(0, 300)
        .doOnNext(i -> System.out.println("i = " + i))
        .flatMap(i -> Mono.just(i)
                        .subscribeOn(Schedulers.elastic())
                        .delayElement(Duration.ofMillis(1000)),
                500
                // ^^^ concurrency
        )
        .doOnNext(i -> System.out.println("end " + i))
        .blockLast();
In this case there is no waiting:
i = 297
i = 298
i = 299
end 0
end 1
end 2
In contrast, if you pass 1 as concurrency, the output will be similar to:
i = 0
end 0
i = 1
end 1
with a one-second wait before each next element is emitted.
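To make this limit visible, here is a small sketch (my addition, not part of the original answer) that adds log() on the outer source of the question's pipeline; with the defaults, the log should show an initial request(256) coming from flatMap's subscription:

Flux.range(0, 300)
        .log() // shows the demand flatMap signals upstream
        .flatMap(i -> Mono.just(i)
                .subscribeOn(Schedulers.elastic())
                .delayElement(Duration.ofMillis(1000)))
        .blockLast();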
Why 256?
256 is the default concurrency of flatMap.
Take a look at Queues.SMALL_BUFFER_SIZE:
public static final int SMALL_BUFFER_SIZE = Math.max(16,
        Integer.parseInt(System.getProperty("reactor.bufferSize.small", "256")));
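Since the default comes from a system property, it can be tuned without changing the pipeline. A hedged sketch: the property is read once, when Reactor's Queues class initializes, so it must be set before any reactive code runs, most reliably on the JVM command line (java -Dreactor.bufferSize.small=512 ...), or programmatically as the very first statement:

public static void main(String[] args) {
    // must run before any Flux/Mono is touched, otherwise the default 256 is already latched
    System.setProperty("reactor.bufferSize.small", "512");
    // ... build and subscribe to the pipeline as before
}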
For the following program, I am trying to figure out why using 2 different streams parallelizes the tasks, while using the same stream and calling join/get on the CompletableFuture makes them take longer (as if they were processed sequentially).
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

public class HelloConcurrency {

    private static Integer sleepTask(int number) {
        System.out.println(String.format("Task with sleep time %d", number));
        try {
            TimeUnit.SECONDS.sleep(number);
        } catch (InterruptedException e) {
            e.printStackTrace();
            return -1;
        }
        return number;
    }

    public static void main(String[] args) {
        List<Integer> sleepTimes = Arrays.asList(1, 2, 3, 4, 5, 6);

        System.out.println("WITH SEPARATE STREAMS FOR FUTURE AND JOIN");
        ExecutorService executorService = Executors.newFixedThreadPool(6);
        long start = System.currentTimeMillis();
        List<CompletableFuture<Integer>> futures = sleepTimes.stream()
                .map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService)
                        .exceptionally(ex -> { ex.printStackTrace(); return -1; }))
                .collect(Collectors.toList());
        executorService.shutdown();
        List<Integer> result = futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
        long finish = System.currentTimeMillis();
        long timeElapsed = (finish - start) / 1000;
        System.out.println(String.format("done in %d seconds.", timeElapsed));
        System.out.println(result);

        System.out.println("WITH SAME STREAM FOR FUTURE AND JOIN");
        ExecutorService executorService2 = Executors.newFixedThreadPool(6);
        start = System.currentTimeMillis();
        List<Integer> results = sleepTimes.stream()
                .map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService2)
                        .exceptionally(ex -> { ex.printStackTrace(); return -1; }))
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
        executorService2.shutdown();
        finish = System.currentTimeMillis();
        timeElapsed = (finish - start) / 1000;
        System.out.println(String.format("done in %d seconds.", timeElapsed));
        System.out.println(results);
    }
}
Output
WITH SEPARATE STREAMS FOR FUTURE AND JOIN
Task with sleep time 6
Task with sleep time 5
Task with sleep time 1
Task with sleep time 3
Task with sleep time 2
Task with sleep time 4
done in 6 seconds.
[1, 2, 3, 4, 5, 6]
WITH SAME STREAM FOR FUTURE AND JOIN
Task with sleep time 1
Task with sleep time 2
Task with sleep time 3
Task with sleep time 4
Task with sleep time 5
Task with sleep time 6
done in 21 seconds.
[1, 2, 3, 4, 5, 6]
The two approaches are quite different; let me try to explain clearly.
1st approach: you spin up async requests for all 6 tasks and only then call join on each future to get the results.
2nd approach: you call join immediately after spinning up the async request for each task. For example, after spinning up the async thread for task 1, calling join forces that task to complete, and only then is task 2 started on an async thread.
Note: if you observe the output closely, in the 1st approach the output appears in random order, since all six tasks were executed asynchronously; in the 2nd approach the tasks were executed sequentially, one after another.
I believe you have an idea of how the stream map operation is performed; the java.util.stream package documentation summarizes it:
To perform a computation, stream operations are composed into a stream pipeline. A stream pipeline consists of a source (which might be an array, a collection, a generator function, an I/O channel, etc), zero or more intermediate operations (which transform a stream into another stream, such as filter(Predicate)), and a terminal operation (which produces a result or side-effect, such as count() or forEach(Consumer)). Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed.
The stream framework does not define the order in which map operations are executed on stream elements, because it is not intended for use cases in which that might be a relevant issue. As a result, the particular way your second version is executing is equivalent, essentially, to
List<Integer> results = new ArrayList<>();
for (Integer sleepTime : sleepTimes) {
    results.add(CompletableFuture
            .supplyAsync(() -> sleepTask(sleepTime), executorService2)
            .exceptionally(ex -> { ex.printStackTrace(); return -1; })
            .join());
}
...which is itself essentially equivalent to
List<Integer> results = new ArrayList<>();
for (Integer sleepTime : sleepTimes) {
    results.add(sleepTask(sleepTime));
}
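The fix, as in the first version, is to materialize all the futures before joining any of them. For completeness, here is a sketch (my illustration, reusing the question's names, not part of the original answers) that makes the two phases explicit with CompletableFuture.allOf:

List<CompletableFuture<Integer>> futures = sleepTimes.stream()
        .map(t -> CompletableFuture.supplyAsync(() -> sleepTask(t), executorService))
        .collect(Collectors.toList());
// wait for all tasks in one call; the per-future joins below then return immediately
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
List<Integer> results = futures.stream()
        .map(CompletableFuture::join)
        .collect(Collectors.toList());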
@Deadpool answered it pretty well; I'm just adding my answer, which may help someone understand it better.
I was able to get an answer by adding more printing to both methods.
TL;DR
2 stream approach: we start all 6 tasks asynchronously and only then call join on each of them, in a separate stream, to get the results.
1 stream approach: we call join immediately after starting each task. For example, after spinning up a thread for task 1, join waits for task 1 to complete, and only then is the second task started on an async thread.
Note: if we observe the output closely, in the 1 stream approach the output appears in sequential order, since all six tasks were executed in order; in the 2 stream approach all tasks were executed in parallel, hence the random order.
Note 2: If we replace stream() with parallelStream() in the 1 stream approach, it will work identically to the 2 stream approach.
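For illustration, a minimal sketch of that parallelStream variant (reusing executorService2 from the question). This is a sketch, not a guarantee: join blocks common-pool workers, so how much the tasks overlap depends on the pool's parallelism on your machine:

List<Integer> results = sleepTimes.parallelStream()
        .map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService2))
        .map(CompletableFuture::join) // joins happen on several worker threads, so tasks overlap
        .collect(Collectors.toList());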
More proof
I added more printing to the streams, which gave the following outputs and confirmed the notes above:
1 stream:
List<Integer> results = sleepTimes.stream()
        .map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService2)
                .exceptionally(ex -> { ex.printStackTrace(); return -1; }))
        .map(f -> {
            int num = f.join();
            System.out.println(String.format("doing join on task %d", num));
            return num;
        })
        .collect(Collectors.toList());
WITH SAME STREAM FOR FUTURE AND JOIN
Task with sleep time 1
doing join on task 1
Task with sleep time 2
doing join on task 2
Task with sleep time 3
doing join on task 3
Task with sleep time 4
doing join on task 4
Task with sleep time 5
doing join on task 5
Task with sleep time 6
doing join on task 6
done in 21 seconds.
[1, 2, 3, 4, 5, 6]
2 streams:
List<CompletableFuture<Integer>> futures = sleepTimes.stream()
        .map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService)
                .exceptionally(ex -> { ex.printStackTrace(); return -1; }))
        .collect(Collectors.toList());

List<Integer> result = futures.stream()
        .map(f -> {
            int num = f.join();
            System.out.println(String.format("doing join on task %d", num));
            return num;
        })
        .collect(Collectors.toList());
WITH SEPARATE STREAMS FOR FUTURE AND JOIN
Task with sleep time 2
Task with sleep time 5
Task with sleep time 3
Task with sleep time 1
Task with sleep time 4
Task with sleep time 6
doing join on task 1
doing join on task 2
doing join on task 3
doing join on task 4
doing join on task 5
doing join on task 6
done in 6 seconds.
[1, 2, 3, 4, 5, 6]
Say I have this:
Flux<GroupedFlux<Integer, Integer>> intsGrouped = Flux.range(0, 12)
.groupBy(i -> i % 3);
and say I have a method:
Mono<Integer> getFromService(Integer key, Integer value);
I want to call getFromService in parallel for each of the groups, but make sure the calls are serial within each group.
For the above example that would be three parallel streams with these input values:
stream 1: 0 -> 3 -> 6 -> 9
stream 2: 1 -> 4 -> 7 -> 10
stream 3: 2 -> 5 -> 8 -> 11
I tried this, but it's not doing what I want:
Flux.range(0, 12)
        .groupBy(i -> i % 3)
        .flatMap(g -> g.flatMap(i -> getFromService(g.key(), i)))
This is calling the service in parallel for all the ints at once. How do I proceed?
Use either concatMap or flatMapSequential instead of the inner .flatMap.
If you want sequential execution within each group (i.e. only one subscription to getFromService at a time within each group), then use .concatMap, like this:
Flux.range(0, 12)
        .groupBy(i -> i % 3)
        .flatMap(g -> g.concatMap(i -> getFromService(g.key(), i)))
If parallel execution within a group is ok, but you just care about the order in which the sequence is emitted, then use flatMapSequential, like this:
Flux.range(0, 12)
        .groupBy(i -> i % 3)
        .flatMap(g -> g.flatMapSequential(i -> getFromService(g.key(), i)))
Another option is to use .flatMap with the concurrency argument set to 1, but I'd recommend one of the above instead.
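For reference, that last variant would look like the following sketch; concatMap above expresses the same intent more clearly:

Flux.range(0, 12)
        .groupBy(i -> i % 3)
        .flatMap(g -> g.flatMap(i -> getFromService(g.key(), i), 1)) // concurrency = 1 within each group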
I have code where I'm polling on an interval until a condition is met, and then in the subscribe I send back the result.
But since it's an interval, the subscription continues.
I was wondering if there's any way to unsubscribe from an Observable interval once it emits something.
Here is the code:
Subscription subscriber = Observable.interval(0, 5, TimeUnit.MILLISECONDS)
        .map(i -> eventHandler.getProcessedEvents())
        .filter(eventsProcessed -> eventsProcessed >= 10)
        .doOnNext(eventsProcessed -> eventHandler.initProcessedEvents())
        .doOnNext(eventsProcessed -> logger.info(null, "Total number of events processed:" + eventsProcessed))
        .subscribe(t -> resumeRequest(asyncResponse));

new TestSubscriber((Observer) subscriber).awaitTerminalEvent(10, TimeUnit.SECONDS);
subscriber.unsubscribe();
For now, as a hack, I use a timer and then unsubscribe, but it's bad!
Regards
You can use the first operator instead of the filter:
Subscription subscriber = Observable.interval(0, 5, TimeUnit.MILLISECONDS)
        .map(i -> eventHandler.getProcessedEvents())
        .first(eventsProcessed -> eventsProcessed >= 10)
        .doOnNext(eventsProcessed -> eventHandler.initProcessedEvents())
        .doOnNext(eventsProcessed -> logger.info(null, "Total number of events processed:" + eventsProcessed))
        .subscribe(t -> resumeRequest(asyncResponse));
This ensures that you only get a single emission if your condition is met. Note that you will get an exception if the interval Observable terminates without your condition ever being met.
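If you would rather avoid that exception, an alternative sketch (my variant, keeping the question's names and omitting the side-effect operators) is to keep the filter and add take(1), which also unsubscribes from the interval after the first matching value but completes quietly on an empty source:

Subscription subscriber = Observable.interval(0, 5, TimeUnit.MILLISECONDS)
        .map(i -> eventHandler.getProcessedEvents())
        .filter(eventsProcessed -> eventsProcessed >= 10)
        .take(1) // cancels the upstream interval after the first match
        .subscribe(t -> resumeRequest(asyncResponse));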
I have an old-style for loop to do some load tests:
for (int i = 0; i < 1000; ++i) {
    if ((i + 1) % 100 == 0) {
        System.out.println("Test number " + i + " started.");
    }
    // The test itself...
}
How can I use the new Java 8 Stream API to do this without the for loop?
Also, using a stream would make it easy to switch to a parallel stream. How do I switch to a parallel stream?
* I'd like to keep the reference to i.
IntStream.range(0, 1000)
        /* .parallel() */
        .filter(i -> (i + 1) % 100 == 0)
        .peek(i -> System.out.println("Test number " + i + " started."))
        /* other operations on the stream, including a terminal one */;
If the test is running on each iteration regardless of the condition (take the filter out):
IntStream.range(0, 1000)
        .peek(i -> {
            if ((i + 1) % 100 == 0) {
                System.out.println("Test number " + i + " started.");
            }
        })
        .forEach(i -> { /* the test */ });
Another approach (if you want to iterate over an index with a predefined step, as @Tunaki mentioned) is:
IntStream.iterate(0, i -> i + 100)
        .limit(1000 / 100)
        .forEach(i -> { /* the test */ });
There is an overloaded method Stream.iterate(seed, hasNext, next) in JDK 9 which fits this situation perfectly: it is designed to produce a finite stream and can replace a plain for loop:
Stream<Integer> stream = Stream.iterate(0, i -> i < 1000, i -> i + 100);
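The same three-argument overload also exists on IntStream in JDK 9+, which avoids boxing:

IntStream.iterate(0, i -> i < 1000, i -> i + 100)
        .forEach(i -> { /* the test */ });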
You can use IntStream as shown below and explained in the comments:
(1) Iterate the IntStream range from 1 to 999 (the upper bound of range is exclusive)
(2) Convert to a parallel stream
(3) Apply a Predicate that allows only integers with (i + 1) % 100 == 0
(4) Map each integer to the string "Test number " + i + " started."
(5) Output to the console
IntStream.range(1, 1000)                                  // iterates 1 to 999
        .parallel()                                       // converts to a parallel stream
        .filter(i -> (i + 1) % 100 == 0)                  // allows only numbers like 99, 199, etc.
        .mapToObj(i -> "Test number " + i + " started.")  // maps each int to a String
        .forEach(System.out::println);                    // prints to the console