Only consume latest item with onBackpressureLatest() - java

I have a producer which emits items periodically and a consumer which is sometimes quite slow. It is important that the consumer only works with recent items. I thought onBackpressureLatest() would be the perfect solution for this problem, so I wrote the following test code:
PublishProcessor<Integer> source = PublishProcessor.create();
source
    .onBackpressureLatest()
    .observeOn(Schedulers.from(Executors.newCachedThreadPool()))
    .subscribe(i -> {
        System.out.println("Consume: " + i);
        Thread.sleep(100);
    });

for (int i = 0; i < 10; i++) {
    System.out.println("Produce: " + i);
    source.onNext(i);
}
I expected it to log something like:
Produce: 0
...
Produce: 9
Consume: 0
Consume: 9
Instead, I get
Produce: 0
...
Produce: 9
Consume: 0
Consume: 1
...
Consume: 9
Neither onBackpressureLatest() nor onBackpressureDrop() has any effect; only onBackpressureBuffer(i) causes an exception.
I use RxJava 2.1.9. Any ideas what the problem or my misunderstanding could be?

observeOn has an internal buffer (128 elements by default) that immediately picks up all source items with ease, so the onBackpressureLatest stage is always fully consumed and never has to drop anything.
Edit:
The smallest buffer you can specify is 1, which should provide the required pattern:
source.onBackpressureLatest()
    .observeOn(Schedulers.from(Executors.newCachedThreadPool()), false, 1)
    .subscribe(v -> { /* ... */ });
(the earlier delay + rebatchRequests combination is practically equivalent to this).
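For reference, a sketch of that delay + rebatchRequests combination (reusing source from the question; delay(long, TimeUnit) and rebatchRequests(int) are standard RxJava 2 Flowable operators):
source.onBackpressureLatest()
    .delay(0, TimeUnit.MILLISECONDS) // hops emissions onto the computation scheduler
    .rebatchRequests(1)              // downstream requests a single item at a time
    .subscribe(v -> {
        System.out.println("Consume: " + v);
        Thread.sleep(100);
    });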

I think the following is supposed to work, but I'm not entirely sure:
PublishProcessor<Integer> source = PublishProcessor.create();
source
    .onBackpressureLatest()
    .switchMap(item -> Flowable.just(item)) // <--
    .observeOn(Schedulers.from(Executors.newCachedThreadPool()))
    .subscribe(i -> {
        System.out.println("Consume: " + i);
        Thread.sleep(100);
    });

Related

Complete all tasks, but no more K tasks at the same time via Project Reactor

I'm a beginner in Project Reactor and I think this should be easy, but I can't find the solution.
I have N expensive tasks to do, and I want to implement something like a bounded semaphore in Java (do not request the next element until the number of running tasks is less than K).
In short: complete all tasks, but run no more than K tasks at the same time.
Flux.range(1, 100)
    .parallel()
    .limit(K) // Something like this
    .doOnNext(i -> expensiveWork(i))
    .subscribe()
I found this post on SO, but it's not for Reactor, although the intent is the same. Please help.
Close to my real case:
httpClient.getMainPageAsMono()
    .flatMapMany(html -> Flux.fromIterable(getLinksFromPage(html)))
    .parallel(k)
    .runOn(Schedulers.boundedElastic())
    .flatMap(link ->
        // ON THIS PART IT EXECUTES ALL LINKS AT THE SAME TIME
        // INSTEAD OF THROTTLING THEM
        client.getAnotherPageByLink(link)
    )
    .....
    .subscribe()
That is, if getLinksFromPage(html) returns 1000 links, the next link should not be requested until one of the running client.getAnotherPageByLink(link) calls has finished.
Using just .parallel() will give you a ParallelFlux, but in order to tell the resulting ParallelFlux where to run each rail (and, by extension, to run rails in parallel) you have to use .runOn(Scheduler scheduler).
So we should use .parallel(int parallelism) with .runOn(Scheduler scheduler):
public static void main(String[] args) throws InterruptedException {
    int k = 3;
    Flux.range(1, 100)
        .parallel(k)                        // k rails
        .runOn(Schedulers.boundedElastic()) // the rails will run on this scheduler
        .doOnNext(i -> expensiveWork(i))
        .subscribe();
    Thread.currentThread().join(); // Just so program won't finish
}

private static void expensiveWork(Integer i) {
    Instant start = Instant.now();
    while (Duration.between(start, Instant.now()).getSeconds() < 5) ;
    System.out.println(Instant.now() + " - " + i + " - Done expensive work");
}
Output:
2021-06-12T13:46:58.445Z - 3 - Done expensive work
2021-06-12T13:46:58.445Z - 1 - Done expensive work
2021-06-12T13:46:58.445Z - 2 - Done expensive work
2021-06-12T13:47:03.453Z - 5 - Done expensive work
2021-06-12T13:47:03.453Z - 6 - Done expensive work
2021-06-12T13:47:03.453Z - 4 - Done expensive work
2021-06-12T13:47:08.453Z - 8 - Done expensive work
2021-06-12T13:47:08.453Z - 7 - Done expensive work
2021-06-12T13:47:08.453Z - 9 - Done expensive work
...
As you can see, we limited the number of tasks that are executed in parallel to k.
What about this solution? I removed parallel from the Flux and instead buffer 10 elements at a time; each buffered group can then be handled in parallel:
public static final void main(String... args) {
    Flux.range(1, 1000)
        .buffer(10)
        .doOnNext(grp -> grp.parallelStream().forEach(p -> System.out.println(Instant.now() + " : " + p)))
        .doOnNext(grp -> sleep(1000)) // Wait for 1 second to see how the algorithm is working
        .doOnNext(grp -> System.out.println("####"))
        .subscribe();
}

private static void sleep(int millis) {
    try {
        Thread.sleep(millis);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
Output is:
2021-06-12T14:16:23.760298200Z : 8
2021-06-12T14:16:23.760298200Z : 4
2021-06-12T14:16:23.760298200Z : 10
2021-06-12T14:16:23.760298200Z : 1
2021-06-12T14:16:23.760298200Z : 3
2021-06-12T14:16:23.760298200Z : 5
2021-06-12T14:16:23.760298200Z : 7
2021-06-12T14:16:23.760298200Z : 2
2021-06-12T14:16:23.760298200Z : 6
2021-06-12T14:16:23.760298200Z : 9
####
2021-06-12T14:16:24.784628Z : 17
2021-06-12T14:16:24.784628Z : 16
2021-06-12T14:16:24.784628Z : 20
2021-06-12T14:16:24.784628Z : 14
2021-06-12T14:16:24.784628Z : 11
2021-06-12T14:16:24.784628Z : 13
2021-06-12T14:16:24.784628Z : 18
2021-06-12T14:16:24.784628Z : 19
2021-06-12T14:16:24.784628Z : 12
2021-06-12T14:16:24.785801500Z : 15
As you can see, each group of 10 elements is processed in parallel, one group per second.
This can easily be accomplished without parallel by using an overloaded version of flatMap that lets you specify the concurrency:
flatMap(Function<? super T,? extends Publisher<? extends V>> mapper, int concurrency)
httpClient.getMainPageAsMono()
    .flatMapMany(html -> Flux.fromIterable(getLinksFromPage(html)))
    .flatMap(link -> client.getAnotherPageByLink(link), k)
    .....
    .subscribe()
Based on the code, this operation is expensive in terms of IO rather than CPU, so using ParallelFlux is not necessary.
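For completeness, a self-contained sketch of flatMap with an explicit concurrency argument (reactor-core only; fetchPage is a made-up placeholder standing in for an IO-bound call like client.getAnotherPageByLink):
import java.time.Duration;
import java.time.Instant;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class FlatMapConcurrencyDemo {
    public static void main(String[] args) {
        int k = 3;
        Flux.range(1, 9)
            .flatMap(FlatMapConcurrencyDemo::fetchPage, k) // at most k inner publishers subscribed at once
            .doOnNext(page -> System.out.println(Instant.now() + " - " + page))
            .blockLast();
    }

    // placeholder for an IO-bound call such as client.getAnotherPageByLink(link)
    private static Mono<String> fetchPage(int link) {
        return Mono.fromCallable(() -> "page-" + link)
            .delayElement(Duration.ofSeconds(1))
            .subscribeOn(Schedulers.boundedElastic());
    }
}
Running it prints the results in groups of 3, one group per second, which shows the concurrency cap without needing ParallelFlux.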

Flux.range waits to emit more elements once 256 elements are reached

I wrote this code:
Flux.range(0, 300)
    .doOnNext(i -> System.out.println("i = " + i))
    .flatMap(i -> Mono.just(i)
        .subscribeOn(Schedulers.elastic())
        .delayElement(Duration.ofMillis(1000))
    )
    .doOnNext(i -> System.out.println("end " + i))
    .blockLast();
When running it, the first System.out.println shows that the Flux stops emitting numbers at the 256th element; it then waits for the older ones to complete before emitting new ones.
Why is this happening?
Why 256?
Why is this happening?
The flatMap operator can be characterized as an operator that (rephrased from the javadoc):
- subscribes to its inners eagerly,
- does not preserve the ordering of elements,
- lets values from different inners interleave.
For this question the first point is important: Project Reactor restricts the number of in-flight inner sequences via the concurrency parameter.
While flatMap(mapper) uses the default value, the flatMap(mapper, concurrency) overload accepts this parameter explicitly.
The flatMap javadoc describes the parameter as:
The concurrency argument allows to control how many Publisher can be subscribed to and merged in parallel
Consider the following code using concurrency = 500
Flux.range(0, 300)
    .doOnNext(i -> System.out.println("i = " + i))
    .flatMap(i -> Mono.just(i)
            .subscribeOn(Schedulers.elastic())
            .delayElement(Duration.ofMillis(1000)),
        500
        // ^^^^^^^^^^
    )
    .doOnNext(i -> System.out.println("end " + i))
    .blockLast();
In this case there is no waiting:
i = 297
i = 298
i = 299
end 0
end 1
end 2
In contrast, if you pass 1 as concurrency, the output will be similar to:
i = 0
end 0
i = 1
end 1
waiting one second before emitting the next element.
Why 256?
256 is the default value of flatMap's concurrency parameter.
Take a look at Queues.SMALL_BUFFER_SIZE:
public static final int SMALL_BUFFER_SIZE = Math.max(16,
        Integer.parseInt(System.getProperty("reactor.bufferSize.small", "256")));
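Since SMALL_BUFFER_SIZE is read from a system property, the default can be changed; a minimal sketch, assuming the property is set before any Reactor class that touches Queues is loaded (a JVM flag such as -Dreactor.bufferSize.small=64 is the safest way):
public static void main(String[] args) {
    // must run before the first Reactor operator is assembled,
    // otherwise Queues.SMALL_BUFFER_SIZE has already been initialized
    System.setProperty("reactor.bufferSize.small", "64");

    // ... build and subscribe to the Flux.range(...).flatMap(...) pipeline as above;
    // flatMap now keeps at most 64 inner sequences in flight by default
}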

Does Observable cache emitted items or not?

Does Observable cache emitted items? I have two tests that lead me to different conclusions:
From test #1 I conclude that it does:
Test #1:
Observable<Long> clock = Observable
    .interval(1000, TimeUnit.MILLISECONDS)
    .take(10)
    .map(i -> i++);

// subscribe for the first time
clock.subscribe(i -> System.out.println("a: " + i));

// subscribe with 2.5 seconds delay
Executors.newScheduledThreadPool(1).schedule(
    () -> clock.subscribe(i -> System.out.println(" b: " + i)),
    2500,
    TimeUnit.MILLISECONDS
);
Output #1:
a: 0
a: 1
a: 2
b: 0
a: 3
b: 1
But the second test shows that we get different values for two observers:
Test #2:
Observable<Integer> observable = Observable
    .range(1, 1000000)
    .sample(7, TimeUnit.MILLISECONDS);

observable.subscribe(i -> System.out.println("Subscriber #1:" + i));
observable.subscribe(i -> System.out.println("Subscriber #2:" + i));
Output #2:
Subscriber #1:72745
Subscriber #1:196390
Subscriber #1:678171
Subscriber #2:336533
Subscriber #2:735521
There are two kinds of Observables: hot and cold. Cold observables tend to generate the same sequence for their Observers unless there are external effects, such as a timer-based action, associated with them.
In the first example, you get the same sequence twice because there are no external effects other than the timer ticks you receive one by one. In the second example, you sample a fast source, and sampling by time has a non-deterministic effect: every nanosecond counts, so even the slightest imprecision leads to different values being reported.
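If the intent is for both subscribers to share one cached sequence, a minimal sketch using the standard replay + autoConnect operators would be:
Observable<Long> clock = Observable
    .interval(1000, TimeUnit.MILLISECONDS)
    .take(10)
    .replay()       // cache emitted items for late subscribers
    .autoConnect(); // start on the first subscription and share the sequence

clock.subscribe(i -> System.out.println("a: " + i));
// a subscriber arriving 2.5 s later now first receives the cached 0, 1, 2
// and then continues with the live items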

Reactive pull with multi-threaded RxJava

I am trying to build a reactive pull Observable in RxJava.
My Observable is like so:
Observable<Command> myObs = Observable.create(s -> {
    Command command;
    int i = 0;
    do {
        command = NetworkOperation1.call(i);
        logger.info("Init command " + i);
        s.onNext(command);
        i++;
    } while (!command.isLast() && i < MAX);
    s.onCompleted();
});
And I want to process it in 4 concurrent batches (buffer), like so:
myObs
    .buffer(10)
    .flatMap(batch -> {
        return Observable
            .from(batch)
            .subscribeOn(Schedulers.io())
            .map(c -> {
                Intermediate m = NetworkOperation2.call(c);
                logger.info("Done intermediate " + m.id);
                return m;
            });
    }, 4);
And then, I need to batch the results in a different size, like so:
.buffer(25)
.subscribeOn(Schedulers.newThread())
.subscribe(list ->
    logger.info("Finished batch with " + list.size()));
The problem is that the Commands in the Observable are processed all at once, while I want them to be processed as they are needed.
Here is the log of what happens (notice all 1000 commands are run at once, instead of being produced as needed):
Init command 0
Init command 1
Init command 2
...
Init command 999
Done intermediate 0
Done intermediate 1
...
Done intermediate 24
Finished batch with 25
Done intermediate 25
Done intermediate 26
...
Done intermediate 49
Finished batch with 25
...
QUESTION: Is there a way to pause the thread of the Observable so it doesn't emit all the commands at once, or something like this? I have tried the request() operator but I can't get it to work.
Thank you.
You need backpressure-aware sources and operators. The operators you are using support backpressure, but your source does not.
Do this instead:
myObs = Observable.range(1, 1000)
    .map(i -> NetworkOperation1.call(i));
Observable.range supports backpressure, so it will only emit when requested to do so.
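If the source really has to drive NetworkOperation1 with increasing indices itself, RxJava 1.x also offers SyncOnSubscribe for writing a backpressure-aware source by hand; a sketch, reusing NetworkOperation1, Command, and MAX from the question:
import rx.Observable;
import rx.observables.SyncOnSubscribe;

Observable<Command> myObs = Observable.create(
    SyncOnSubscribe.<Integer, Command>createStateful(
        () -> 0,                  // initial state: the first index
        (i, observer) -> {        // invoked once per downstream request
            Command command = NetworkOperation1.call(i);
            observer.onNext(command);
            if (command.isLast() || i + 1 >= MAX) {
                observer.onCompleted();
            }
            return i + 1;         // state for the next request
        }));
With this source the "Init command" lines would appear interleaved with the processing, because each call only happens when downstream actually requests an item.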

Strange issue regarding for-comprehension

I'm a newbie to the whole Scala scene but so far I have been loving the ride! However, I got stuck on an issue and haven't yet been able to grasp the reason...
I'm currently working with Kafka and was trying to read data from a topic and pass it around to somewhere else.
The problem: the println in the inner for-comprehension outputs the lines at the bottom, as expected, but all the other println's outside that inner for are skipped, and the function ends up returning nothing at all (I can't even call getClass in the test case!). What might be causing this? I've really run out of ideas...
The related code:
def tryBatchRead(maxMessages: Int = 100, skipMessageOnError: Boolean = true): List[String] = {
  var numMessages = 0L
  var list = List[String]()
  val iter = if (maxMessages >= 0) stream.slice(0, maxMessages) else stream
  for (messageAndTopic <- iter) {
    for (m <- messageAndTopic) {
      println(m.offset.toString + " --- " + new String(m.message))
      list = list ++ List(new String(m.message))
      println("DEBUG " + list)
      numMessages += 1
    }
    println("test1")
  }
  println("test2")
  println("FINISH" + list)
  connector.shutdown()
  println("test3")
  list
}
The output:
6 --- {"user":{"id":"4d9e3582-2d35-4600-b070-e4d92e42c534","age":25,"sex":"M","location":"PT"}}
DEBUG List({"user":{"id":"4d9e3582-2d35-4600-b070-e4d92e42c534","age":25,"sex":"M","location":"PT"}})
7 --- test 2
DEBUG List({"user":{"id":"4d9e3582-2d35-4600-b070-e4d92e42c534","age":25,"sex":"M","location":"PT"}}, test 2)
8 --- {"StartSurvey":{"user":{"id":"6a736fdd-79a0-466a-9030-61b5ac3a3a0e","age":25,"sex":"M","location":"PT"}}}
DEBUG List({"user":{"id":"4d9e3582-2d35-4600-b070-e4d92e42c534","age":25,"sex":"M","location":"PT"}}, test 2, {"StartSurvey":{"user":{"id":"6a736fdd-79a0-466a-9030-61b5ac3a3a0e","age":25,"sex":"M","location":"PT"}}})
Thanks for the help!
I'm not totally sure, but it's VERY likely that you block after reading the last message while awaiting the next one (Kafka streams are basically infinite). Configure a timeout for the Kafka consumer so it gives up if there is no message for some time. There is a consumer.timeout.ms property for that (set it to 3000 ms, for example), which will result in a ConsumerTimeoutException once the waiting limit is reached.
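For illustration, a sketch in Java of setting that property on the old high-level consumer (the zookeeper.connect and group.id values are placeholders):
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

Properties props = new Properties();
props.put("zookeeper.connect", "localhost:2181"); // placeholder
props.put("group.id", "my-group");                // placeholder
props.put("consumer.timeout.ms", "3000");         // stream iterator throws ConsumerTimeoutException after 3 s idle

ConsumerConnector connector =
    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));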
By the way, I would rewrite your code as:
def tryBatchRead(maxMessages: Int = 100): List[String] = {
  // `.take` works fine if the collection has fewer elements than the max
  val batchStream = stream.take(maxMessages)
  // TODO: add a try/catch section, according to the above comments
  // in Scala we usually write a single joined for, instead of multiple nested ones
  val batch = (for {
    messageAndTopic <- batchStream
    msg <- messageAndTopic // are you sure you can iterate message and topic? 0_o
  } yield {
    println(msg.offset.toString + " --- " + new String(msg.message))
    new String(msg.message)
  }).toList
  println("Number of messages: " + batch.length)
  // shutdown has to be done outside; it's a bad idea to implicitly tear down streams in a reading function
  batch
}
I think this is normal behavior, since you are doing a for over a stream which can in theory be infinite (so it will never end, or can hang while waiting for results over I/O...).
IMHO I would rather write for (m <- messageAndTopic.take(maxMessages).toList) instead of for (m <- messageAndTopic).
