Project Reactor: How to delay emission of (throttle) each element?


Consider the following Flux
Flux.range(1, 5)
.parallel(10)
.runOn(Schedulers.parallel())
.map(i -> "https://www.google.com")
.flatMap(uri -> Mono.fromCallable(new HttpGetTask(httpClient, uri)))
HttpGetTask is a Callable whose actual implementation is irrelevant here: it makes an HTTP GET call to the given URI and returns the content if successful.
Now, I'd like to slow down the emission by introducing an artificial delay, such that up to 10 threads are started simultaneously, but each one doesn't complete as soon as HttpGetTask is done. For example, say no thread must finish before 3 seconds. How do I achieve that?

If the requirement is really "not less than 3s" you could add a delay of 3 seconds to the Mono inside the flatMap by using Mono.fromCallable(...).delayElement(Duration.ofSeconds(3)).
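Note that delayElement holds the value back after the Callable has produced it, so each element takes the request time plus 3 seconds rather than a flat 3 seconds. The sketch below reproduces that timing with a plain CompletableFuture instead of Reactor, so it runs on a bare JDK 9+. DelayAfterTask, delayed and elapsedMillis are made-up names, and this is only an analogy for the timing, not a description of how Reactor implements the operator.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: run a task, then hold its result back for a fixed delay.
final class DelayAfterTask {

    static <T> CompletableFuture<T> delayed(Callable<T> task, Duration delay) {
        // Executor that runs whatever is submitted to it only after `delay` has passed.
        Executor later = CompletableFuture.delayedExecutor(delay.toMillis(), TimeUnit.MILLISECONDS);
        return CompletableFuture
                .supplyAsync(() -> {
                    try {
                        return task.call();
                    } catch (Exception e) {
                        throw new CompletionException(e);
                    }
                })
                // The delay starts counting once the task has completed.
                .thenApplyAsync(value -> value, later);
    }

    // Measures how long a trivial task takes when its result is delayed.
    static long elapsedMillis(Duration delay) {
        long start = System.nanoTime();
        delayed(() -> "body", delay).join();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

Calling DelayAfterTask.elapsedMillis(Duration.ofSeconds(3)) should report at least 3000 ms even though the task itself returns immediately.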

Related

Do you have a test to show differences between the reactor map() and flatMap()?

I am still trying to understand the difference between the Reactor map() and flatMap() methods.
First I took a look at the API, but it wasn't really helpful; it confused me even more.
Then I googled a lot, but it seems like nobody has an example that makes the differences understandable, if there are any.
Therefore I tried to write two tests to see the different behaviour of each method.
But unfortunately it isn't working as I hoped it would...
The first test method exercises the flatMap() method:
@Test
void fluxFlatMapTest() {
Flux.just(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
.window(2)
.flatMap(fluxOfInts -> fluxOfInts.map(this::processNumber).subscribeOn(Schedulers.parallel()))
.doOnNext(System.out::println)
.subscribe();
}
The output is as expected, explainable, and looks like this:
9 - parallel-2
1 - parallel-1
4 - parallel-1
25 - parallel-3
36 - parallel-3
49 - parallel-4
64 - parallel-4
81 - parallel-5
100 - parallel-5
16 - parallel-2
The second method should test the output of the map() method to compare with above results of the flatMap() method.
@Test
void fluxMapTest() {
final int start = 1;
final int stop = 100;
Flux.just(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
.window(2)
.map(fluxOfInts -> fluxOfInts.map(this::processNumber).subscribeOn(Schedulers.parallel()))
.doOnNext(System.out::println)
.subscribe();
}
This test method produces output I didn't expect at all:
FluxSubscribeOn
FluxSubscribeOn
FluxSubscribeOn
FluxSubscribeOn
FluxSubscribeOn
There is a little helper method which looks like this:
private String processNumber(Integer x) {
String squaredValueAsString = String.valueOf(x * x);
return squaredValueAsString.concat(" - ").concat(Thread.currentThread().getName());
}
Nothing special here.
I am using Spring Boot 2.3.4 with Java 11 and the reactor implementation for Spring.
Do you have a good explanatory example, or do you know how to change the above tests so that they make sense?
Then please help me out with that.
Thanks a lot in advance!

Reactor, the library underlying WebFlux, uses something called the event loop, which in turn, I believe, is based on an architecture called the LMAX Architecture.
This means that the event loop is a single-threaded event processor. Everything up to the event loop can be multithreaded, but the events themselves are processed by a single thread: the event loop.
Regular Spring Boot applications usually run on Tomcat or Undertow, while WebFlux by default runs on the event-driven server Netty, which in turn uses this event loop to process events for us.
So now that we understand what is underneath everything, we can start talking about map and flatMap.
Map
The API documentation illustrates map with a marble diagram, and its text says:
Transform the items emitted by this Flux by applying a synchronous function to each item.
Which is pretty self-explanatory. We have a Flux of items, and each time map asks for an item to process, it won't ask for another one until it has finished processing the first. Hence synchronous.
The diagram shows that the green circle needs to be converted to a green square before we can ask for the yellow circle to be converted to a yellow square, and so on.
Here is a code example:
Flux.just("a", "b", "c")
.map(value -> value.toUpperCase())
.subscribe(s -> System.out.println(s + " - " + Thread.currentThread().getName()));
// Output
A - main
B - main
C - main
Each item is run on the main thread, and they are processed one after another, synchronously.
flatMap
The API documentation illustrates flatMap with a marble diagram as well, and its text says:
Transform the elements emitted by this Flux asynchronously into Publishers, then flatten these inner publishers into a single Flux through merging, which allow them to interleave.
The documentation describes this along three dimensions:
Generation of inners and subscription: this operator is eagerly subscribing to its inners.
Ordering of the flattened values: this operator does not necessarily preserve original ordering, as inner element are flattened as they arrive.
Interleaving: this operator lets values from different inners interleave (similar to merging the inner sequences).
So what does this mean? Well, it basically means that it will:
Take each item in the Flux and transform it into an individual Mono (publisher) with one item in each.
Order the items as they get processed. flatMap does NOT preserve order, as items can take different amounts of time to be processed on the event loop.
Merge all the processed items back into a Flux for further processing down the line.
Here is a code example:
Flux.just("a", "b", "c")
.flatMap(value -> Mono.just(value.toUpperCase()))
.subscribe(s -> System.out.println(s + " - " + Thread.currentThread().getName()));
// Output
A - main
B - main
C - main
Wait, flatMap prints the same thing as map!
Well, it all comes back to the threading model we talked about earlier. Actually there is only one thread, called the event loop, that handles all events.
Reactor is concurrency agnostic, meaning that any worker can schedule jobs to be handled by the event loop.
So what is a worker? Well, a worker is something a scheduler can spawn. One important thing is that a worker doesn't have to be a thread; it can be, but it doesn't have to be.
In the code examples above, the main thread subscribes to our Flux, which means that the main thread will process this for us and schedule work for the event loop to handle.
In a server environment this doesn't necessarily have to be the case. The important thing to understand here is that Reactor can switch workers (i.e. possibly threads) whenever it needs to.
In my code examples above there is only a main thread, so there is no need to run things on multiple threads or have parallel execution.
If I wish to force it, I can use one of the different schedulers, which all have their uses. Netty starts with as many event loop threads as there are cores on your machine, so under heavy load it can switch workers and cores freely to make full use of all event loops.
flatMap being async does NOT mean parallel. It means that it will schedule all things to be processed by the event loop at the same time, but there is still only one thread executing the tasks.
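This also accounts for the FluxSubscribeOn lines in the question's map() test. The lambda there returns a Flux, so map() emits that inner Flux itself as the element, println prints the object (Flux's toString is its class name), and nothing ever subscribes to it, so processNumber never runs. flatMap() subscribes to each inner Flux and emits its values instead. The same difference in shape can be seen with java.util.stream, used here purely as an analogy: Stream.flatMap is sequential and does not merge concurrently the way Flux.flatMap does. The class and method names are made up for this sketch.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only: java.util.stream stands in for Flux to show the result shape.
final class MapVsFlatMapShape {

    // map keeps one output element per window, and that element is itself a Stream.
    static List<Stream<Integer>> withMap(List<List<Integer>> windows) {
        return windows.stream()
                .map(w -> w.stream().map(x -> x * x))      // Stream<Stream<Integer>>
                .collect(Collectors.toList());
    }

    // flatMap unwraps every inner Stream and emits its values.
    static List<Integer> withFlatMap(List<List<Integer>> windows) {
        return windows.stream()
                .flatMap(w -> w.stream().map(x -> x * x))  // Stream<Integer>
                .collect(Collectors.toList());
    }
}
```

With the windows [1, 2] and [3, 4], withMap returns two unconsumed Stream objects, while withFlatMap returns the four squared values.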
Parallel execution
If I really want to execute something in parallel, I can for instance place it on a parallel Scheduler. This means that it will guarantee multiple workers on multiple cores. But remember there is a setup time for this when your program runs, and it is usually only beneficial if you have heavy computational work, which in turn needs a lot of single-core CPU power.
Code example:
Flux.just("a", "b", "c")
.flatMap(value -> Mono.just(value.toUpperCase()))
.subscribeOn(Schedulers.parallel())
.subscribe(s -> System.out.println(s + " - " + Thread.currentThread().getName()));
// Output
A - parallel-1
B - parallel-1
C - parallel-1
Here we are still running on just one thread, because subscribeOn means that at subscription time the Scheduler picks one thread from its pool and then sticks with it throughout execution.
If we absolutely feel the need to force execution on multiple threads, we can for instance use a parallel Flux.
Flux.range(1, 10)
.parallel(2)
.runOn(Schedulers.parallel())
.subscribe(i -> System.out.println(Thread.currentThread().getName() + " -> " + i));
// Output
parallel-3 -> 2
parallel-2 -> 1
parallel-3 -> 4
parallel-2 -> 3
parallel-3 -> 6
parallel-2 -> 5
parallel-3 -> 8
parallel-2 -> 7
parallel-3 -> 10
parallel-2 -> 9
But remember, this is in most cases not necessary. There is a setup time, and this type of execution is usually only beneficial if you have a lot of CPU-heavy tasks. Otherwise using the default single-threaded event loop will in most cases "probably" be faster.
Dealing with a lot of I/O tasks is usually more about orchestration than raw CPU power.
Most of the information here is taken from the Flux and Mono API documentation.
The Reactor documentation is an amazing and interesting source of information.
Simon Baslé's blog series Flight of the Flux is also a wonderful and interesting read. It also exists in YouTube format.
There are probably some faults here and there, and I have made some assumptions too, especially when it comes to the inner workings of Reactor. But hopefully this will at least clear up some thoughts.
If someone feels things are outright wrong, feel free to edit.

Reactor: function creating Monos to Flux

Basically, I'm making a queue processor in Spring Boot and want to use Reactor for async. I've made a function that needs to loop forever, as it's the one that pulls from the queue and then marks the item as processed.
Here's the blocking version that works (Subscribe returns a Mono):
while(true) {
manager.Subscribe().block();
}
I'm not sure how to turn this into a Flux. I've looked at interval, generate, create, etc., and I can't get anything to work without calling block().
Here's an example of what I've tried:
Flux.generate(() -> manager,
(state, sink) -> {
state.Subscribe().block();
sink.next("done");
return state;
});
Being a newbie to Reactor, I haven't been able to find anything about just looping and processing the Monos one after another without blocking.
Here's what the Subscribe method does using the AWS Java SDK v2:
public Mono Subscribe() {
return Mono.fromFuture(_client.receiveMessage(ReceiveMessageRequest.builder()
.waitTimeSeconds(10)
.queueUrl(_queueUrl)
.build()))
.filter(x -> x.messages() != null)
.flatMap(x -> Mono.when(x.messages()
.stream()
.map(y -> {
_log.warn(y.body());
return Mono.fromFuture(_client.deleteMessage(DeleteMessageRequest.builder()
.queueUrl(_queueUrl)
.receiptHandle(y.receiptHandle())
.build()));
}).collect(Collectors.toList())));
}
Basically, I'm just polling an SQS queue, deleting the messages then I want to do it again. This is all just exploratory for me.
Thanks!

You need two things: a way to subscribe in a loop and a way to ensure that the Subscribe() method is effectively called on each iteration (because the Future needs to be recreated).
repeat() is a built-in operator that resubscribes to its source once the source completes. If the source errors, the repeat cycle stops. The simplest variant keeps resubscribing up to Long.MAX_VALUE times.
The only problem is that in your case the Mono from Subscribe() must be recreated on each iteration.
To do so, you can wrap the Subscribe() call in a defer: it will re-invoke the method each time a new subscription happens, which includes each repeat attempt:
Flux<Stuff> repeated = Mono
.defer(manager::Subscribe)
.repeat();
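The reason a fresh call is needed each round is that a future represents one request that has already been started, so reusing it would never issue a second request. The sketch below shows that idea with plain CompletableFuture rather than Reactor: a Supplier plays the role of defer, and a bounded recursive thenCompose plays the role of repeat without blocking between rounds. RepeatSketch, repeat and countCalls are hypothetical names, and unlike repeat() this version takes an explicit round count.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Analogy only: re-create the future on every round instead of reusing one.
final class RepeatSketch {

    // Asks the factory for a new future each round and chains the next round onto it.
    static <T> CompletableFuture<Void> repeat(Supplier<CompletableFuture<T>> factory, int rounds) {
        if (rounds == 0) {
            return CompletableFuture.completedFuture(null);
        }
        return factory.get().thenCompose(ignored -> repeat(factory, rounds - 1));
    }

    // Counts how often the factory body actually ran.
    static int countCalls(int rounds) {
        AtomicInteger calls = new AtomicInteger();
        repeat(() -> CompletableFuture.supplyAsync(calls::incrementAndGet), rounds).join();
        return calls.get();
    }
}
```

RepeatSketch.countCalls(3) runs the factory body three times, once per round. The join() is only there so the count can be read back.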

RxJava 2 Zip operation in different threads

I have created a really simple example using RxJava 2 (everything I have developed so far used RxJava 1) and I have found the following behavior, which I don't understand at all. I have this Observable with a zip operation:
Observable.zip(getGame(gameId), getDetail(gameId), getReviews(gameId),
(game, detail, reviews) -> new GameInfo(game, detail, reviews))
.subscribeOn(Schedulers.newThread())
.subscribe(sendGameInfo(asyncResponse));
Each of the methods returns an instance of Observable. In theory, I would expect each of the methods (getGame, getDetail, ...) to be executed in parallel in a new thread, but by printing the thread name I noticed that it is always the same thread, so they are not executed in parallel. I suppose this is the expected behavior, but if I want them to run in parallel, is there a way to do it without having to define a Runnable inside each of the Observables?
Thank you very much.

OK, you need to call subscribeOn on every Observable:
Observable.zip(getGame(gameId)
.subscribeOn(Schedulers.from(executor)),
getDetail(gameId)
.subscribeOn(Schedulers.from(executor)),
getReviews(gameId)
.subscribeOn(Schedulers.from(executor)),
(game, detail, reviews) -> new GameInfo(game, detail, reviews))
.subscribeOn(Schedulers.from(executor))
.subscribe(sendGameInfo(asyncResponse));

Strange behavior of Observable#repeat in reactive extensions

I'm messing around with the rx operators and am curious why just(null).repeat() doesn't work as a parameter to any of the built-in operators:
Observable.interval(1, TimeUnit.SECONDS)
.sample(Observable.just(null).repeat())
.subscribe(System.out::println);
I would have expected this to print 0 1 2 3 ... but it just hangs. I imagine it's because the repeat is hogging the default Scheduler, however, if you swap the roles of interval and the just-repeat then it works as expected, printing null once per second:
Observable.just(null).repeat()
.sample(Observable.interval(1, TimeUnit.SECONDS))
.subscribe(System.out::println);
What's going on here?

If you don't specify a scheduler (and no operator is setting one), then all processing happens on the same thread. just(null).repeat() will hog 100% of a CPU core, so nothing else gets a chance to proceed.
In your case, the interval gets produced on the Schedulers.computation() Scheduler, and because it's at the start and no scheduler changes happen afterwards, your repeat is also working on the same thread.
In the second case, everything gets subscribed on the same thread, except the interval, which is on its own scheduler; the rest depends on the internal implementation of sample.
If you use a specific scheduler, it should work:
.sample(Observable.just(null).repeat().subscribeOn(Schedulers.computation()))
Note that if you just want to use nulls instead of the numbers that interval produces, a much more efficient way is to use map instead of sample:
.map(any -> (Object) null)

Multithreaded Pipeline in Java

I have a problem regarding running multiple threads executing a pipeline in Java.
Say the 'input' of the pipeline is:
5 instructions namely: I1, I2, I3, I4, I5
Once I1 has been fetched it is ready for decoding, but the fetch operation will not wait for the decode task to finish. After handing the fetched instruction over to decode, the fetch operation gets the next instruction, I2, and so on.
It is pipelined scheduling with five stages.
How do I simulate such a machine using Java multithreading?

Assuming you want to know how to implement such a thing: It is called a 'pipeline pattern'. If this is not homework, you could reuse an existing implementation of this pattern. One is available at:
http://code.google.com/p/pipelinepattern/
If this is homework, then your teacher might be expecting you to write it on your own from scratch. A good starting point is this two stage pipeline (where one thread reads lines from a file, and the other thread prints the lines):
http://rosettacode.org/wiki/Synchronous_concurrency#Java
In the above example the two stages communicate via a BlockingQueue (i.e. stage 1 writes to the queue, and stage 2 reads from it). If stage 1 is consistently faster than stage 2, the queue will get quite big. You can enforce operation of the stages in lockstep by using a SynchronousQueue instead [see comment #1 to this answer, for why].
If you need a five stage pipeline, you need to extend this by having 5 threads which have 4 queues between them:
in -> [s1] -> q12 -> [s2] -> q23 -> [s3] -> q34 -> [s4] -> q45 -> [s5] -> out
Above, each [s*] represents a stage (a thread) and each qAB represents a Queue being enqueued to by [sA] and dequeued from by [sB].
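A minimal sketch of that layout, assuming each stage only tags the instruction with its name. The stage names, the end marker and the class name PipelineSketch are made up for illustration. The sketch also uses a queue for "in" and "out", so it has six queues rather than four.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical five-stage pipeline: one thread per stage, one queue between neighbours.
final class PipelineSketch {

    // Marker telling a stage that no more instructions will arrive (assumed never to be a real instruction).
    private static final String POISON = "<END>";

    // Illustrative stage names; the question only says there are five stages.
    private static final String[] STAGES = {"fetch", "decode", "execute", "memory", "writeback"};

    static List<String> run(List<String> instructions) {
        int n = STAGES.length;
        // queues.get(0) is "in", queues.get(n) is "out", the rest sit between stages.
        List<BlockingQueue<String>> queues = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            queues.add(new ArrayBlockingQueue<>(instructions.size() + 1));
        }
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String name = STAGES[i];
            BlockingQueue<String> in = queues.get(i);
            BlockingQueue<String> out = queues.get(i + 1);
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String item = in.take();       // wait for the previous stage
                        if (item.equals(POISON)) {
                            out.put(POISON);           // pass the shutdown signal along
                            return;
                        }
                        out.put(item + ">" + name);    // "process" and hand over
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, name);
            t.start();
            threads.add(t);
        }
        try {
            for (String instruction : instructions) {
                queues.get(0).put(instruction);
            }
            queues.get(0).put(POISON);
            List<String> results = new ArrayList<>();
            BlockingQueue<String> last = queues.get(n);
            for (String item = last.take(); !item.equals(POISON); item = last.take()) {
                results.add(item);
            }
            for (Thread t : threads) {
                t.join();
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

PipelineSketch.run(List.of("I1", "I2", "I3", "I4", "I5")) returns the five instructions in their original order, each tagged with all five stage names. While a later stage works on I1, the earlier stages are already free to take I2.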
