I'm experimenting with the RxJava retryWhen operator. There is very little about it on the internet; the only resource worth mentioning is this, and even that falls short of exploring the use cases I'd like to understand. I also threw in asynchronous execution and retry with back-off to make it more realistic.
My setup is simple: I have a class ChuckNorrisJokesRepository that returns a random number of Chuck Norris jokes from a JSON file. My class under test is ChuckNorrisJokesService, shown below. The use cases I'm interested in are as follows:
Succeeds on 1st attempt (no retries)
Fails after 1 retry
Attempts to retry 3 times but succeeds on 2nd hence doesn't retry 3rd time
Succeeds on 3rd retry
Note: The project is available on my GitHub.
ChuckNorrisJokesService.java:
@Slf4j
@Builder
public class ChuckNorrisJokesService {
@Getter
private final AtomicReference<Jokes> jokes = new AtomicReference<>(new Jokes());
private final Scheduler scheduler;
private final ChuckNorrisJokesRepository jokesRepository;
private final CountDownLatch latch;
private final int numRetries;
private final Map<String, List<String>> threads;
public static class ChuckNorrisJokesServiceBuilder {
public ChuckNorrisJokesService build() {
if (scheduler == null) {
scheduler = Schedulers.io();
}
if (jokesRepository == null) {
jokesRepository = new ChuckNorrisJokesRepository();
}
if (threads == null) {
threads = new ConcurrentHashMap<>();
}
requireNonNull(latch, "CountDownLatch must not be null.");
return new ChuckNorrisJokesService(scheduler, jokesRepository, latch, numRetries, threads);
}
}
public void setRandomJokes(int numJokes) {
mergeThreadNames("getRandomJokes");
Observable.fromCallable(() -> {
log.debug("fromCallable - before call. Latch: {}.", latch.getCount());
mergeThreadNames("fromCallable");
latch.countDown();
List<Joke> randomJokes = jokesRepository.getRandomJokes(numJokes);
log.debug("fromCallable - after call. Latch: {}.", latch.getCount());
return randomJokes;
}).retryWhen(errors ->
errors.zipWith(Observable.range(1, numRetries), (n, i) -> i).flatMap(retryCount -> {
log.debug("retryWhen. retryCount: {}.", retryCount);
mergeThreadNames("retryWhen");
return Observable.timer(retryCount, TimeUnit.SECONDS);
}))
.subscribeOn(scheduler)
.subscribe(j -> {
log.debug("onNext. Latch: {}.", latch.getCount());
mergeThreadNames("onNext");
jokes.set(new Jokes("success", j));
latch.countDown();
},
ex -> {
log.error("onError. Latch: {}.", latch.getCount(), ex);
mergeThreadNames("onError");
},
() -> {
log.debug("onCompleted. Latch: {}.", latch.getCount());
mergeThreadNames("onCompleted");
latch.countDown();
}
);
}
private void mergeThreadNames(String methodName) {
threads.merge(methodName,
new ArrayList<>(Arrays.asList(Thread.currentThread().getName())),
(value, newValue) -> {
value.addAll(newValue);
return value;
});
}
}
For brevity, I'll only show the Spock test case for the 1st use case. See my GitHub for the other test cases.
def "succeeds on 1st attempt"() {
setup:
CountDownLatch latch = new CountDownLatch(2)
Map<String, List<String>> threads = Mock(Map)
ChuckNorrisJokesService service = ChuckNorrisJokesService.builder()
.latch(latch)
.threads(threads)
.build()
when:
service.setRandomJokes(3)
latch.await(2, TimeUnit.SECONDS)
Jokes jokes = service.jokes.get()
then:
jokes.status == 'success'
jokes.count() == 3
1 * threads.merge('getRandomJokes', *_)
1 * threads.merge('fromCallable', *_)
0 * threads.merge('retryWhen', *_)
1 * threads.merge('onNext', *_)
0 * threads.merge('onError', *_)
1 * threads.merge('onCompleted', *_)
}
This fails with:
Too few invocations for:
1 * threads.merge('fromCallable', *_) (0 invocations)
1 * threads.merge('onNext', *_) (0 invocations)
What I'm expecting is that fromCallable is called once, it succeeds, onNext is called once, followed by onCompleted. What am I missing?
P.S.: Full disclosure - I've also posted this question on RxJava GitHub.
I solved this after several hours of troubleshooting and with help from ReactiveX member David Karnok.
retryWhen is a complicated, perhaps even buggy, operator. The official doc and at least one answer here use the range operator, which completes immediately if there are no retries to be made (as with my default numRetries of 0), and that completion ends the stream before fromCallable ever runs. See my discussion with David Karnok.
The code is available on my GitHub complete with the following test cases:
Succeeds on 1st attempt (no retries)
Fails after 1 retry
Attempts to retry 3 times but succeeds on 2nd hence doesn't retry 3rd time
Succeeds on 3rd retry
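For readers who want to see the four cases above without RxJava in the picture, here is a minimal plain-Java sketch of the retry semantics I was after. It is not the service's code: RetryDemo, callWithRetry and failTimes are hypothetical names made up for illustration, it blocks the calling thread, and the back-off unit is a parameter so the demo can use milliseconds where the service waits retryCount seconds.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

class RetryDemo {
    /**
     * Calls the task once, then retries up to numRetries times.
     * Before retry number n it waits n * delayUnitMs (linear back-off).
     */
    static <T> T callWithRetry(Callable<T> task, int numRetries, long delayUnitMs) throws Exception {
        for (int retryCount = 0; ; retryCount++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (retryCount >= numRetries) {
                    throw e; // retries exhausted: propagate the last error
                }
                Thread.sleep((retryCount + 1) * delayUnitMs);
            }
        }
    }

    /** Hypothetical task that fails on its first failTimes calls, then returns "joke". */
    static Callable<String> failTimes(int failTimes, AtomicInteger calls) {
        return () -> {
            if (calls.incrementAndGet() <= failTimes) {
                throw new IllegalStateException("simulated failure");
            }
            return "joke";
        };
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Allowed 3 retries, but the 2nd call already succeeds, so no further retries happen.
        String result = callWithRetry(failTimes(1, calls), 3, 10);
        System.out.println(result + " after " + calls.get() + " calls");
    }
}
```

Note that with numRetries set to 0 this sketch still makes the first call, which is the behaviour I expected from the service as well.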
Consider the following code under these conditions:
getOneResponePage(int) produces a Flux<Integer>.
It simulates a request to fetch a page of results from another service.
Its implementation can be treated as a black box for my questions.
(See below for an explanation of its actual purpose.)
getOneResponePage(int) will eventually return an empty Flux<Integer>, which is its way of signalling that no more results will come.
(But it will keep on emitting empty Flux<Integer>.)
package ch.cimnine.test;
import org.junit.Test;
import reactor.core.publisher.Flux;
public class PaginationTest {
@Test
public void main() {
final Flux<Integer> finalFlux = getAllResponses();
finalFlux.subscribe(resultItem -> {
try {
Thread.sleep(200); // Simulate heavy processing
} catch (InterruptedException ignore) {
}
System.out.println(resultItem);
});
}
private Flux<Integer> getAllResponses() {
Flux<Flux<Integer>> myFlux = Flux.generate(
() -> 0, // initial page
(page, sink) -> {
var innerFlux = getOneResponePage(page); // eventually returns a Flux.empty()
// my way to check whether the `innerFlux` is now empty
innerFlux.hasElements().subscribe(
hasElements -> {
if (hasElements) {
System.out.println("hasElements=true");
sink.next(innerFlux);
return;
}
System.out.println("hasElements=false");
sink.complete();
}
);
return page + 1;
}
);
return Flux.concat(myFlux);
}
private Flux<Integer> getOneResponePage(int page) {
System.out.println("Request for page " + page);
// there's only content on the first 3 pages
if (page < 3) {
return Flux
.just(1, 2, 3, 5, 7, 11, 13, 17, 23, 27, 31)
.map(i -> (1000 * page) + i);
}
return Flux.empty();
}
}
Goal
The goal is to have one method that is called getAllResponses() which returns a continuous Flux<T> of results.
The caller should not know – or care – that some pagination logic happens internally.
All other methods will be hidden from the caller.
Questions
Since I'm new to reactive programming, am I thinking this right?
IntelliJ warns me that Calling 'subscribe' in non-blocking context is not recommended. How to do it right?
What's getOneResponePage(int) actually?
In my actual code, getOneResponePage(int) sends a request using org.springframework.web.reactive.function.client.WebClient.
It connects to a service that is returning results.
The service only returns a max of 1000 results per call.
An offset parameter must be sent to get more results.
The API is a bit weird in the sense that the only way to know for sure whether you have all the results is to query it repeatedly with an ever-increasing offset until you get an empty result set.
It will happily return more empty result sets for a still-increasing offset (… until some internal maximum for offset is reached and a 400 Bad Request is returned.)
The actual implementation of getOneResponePage(int) is almost identical to this:
private Flux<ResponseItem> getOneResponePage(int page) {
return webClientInstance
.get()
.uri(uriBuilder -> {
uriBuilder.queryParam("offset", page * LIMIT);
uriBuilder.queryParam("limit", LIMIT);
// …
})
.retrieve()
.bodyToFlux(ResponseItem.class);
}
Try to avoid Flux<Flux<T>>. Another anti-pattern is subscribing explicitly (innerFlux.hasElements().subscribe). Ideally you subscribe only once, typically at the framework layer (e.g. WebFlux subscribes in the underlying HTTP server).
Querying data with a continuously increasing pointer (page number, offset, etc.) is a very common pattern, and you can use the expand operator to implement it. On a Flux, expand is applied to every element, so for pagination a Mono<List<T>> per page is usually more useful: expand is then applied to each page and stops when getOneResponePage returns Mono.empty().
private Flux<Integer> getAllResponses() {
var page = new AtomicInteger(0); // initial page
return getOneResponePage(page.get())
.expand(list -> getOneResponePage(page.incrementAndGet()))
.flatMapIterable(Function.identity());
}
private Mono<List<Integer>> getOneResponePage(int page) {
System.out.println("Request for page " + page);
// there's only content on the first 3 pages
if (page < 3) {
return Flux
.just(1, 2, 3, 5, 7, 11, 13, 17, 23, 27, 31)
.map(i -> (1000 * page) + i)
.collectList();
}
return Mono.empty();
}
If your flow is non-blocking, subscribe on the parallel scheduler with .subscribeOn(Schedulers.parallel()); boundedElastic should be used only to "offload" blocking tasks. For more details, see Difference between boundedElastic() vs parallel() scheduler.
There is no direct way of stopping the outer flow from the inner flow. The closest thing would be to use switchIfEmpty with a Flux.error(new NoSuchElementException()) on the inner sequence, then on the outer sequence use onErrorResume and return an empty Flux if it finds that NoSuchElementException.
Flux.<List<Integer>>just(List.of(1, 2, 3), List.of(), List.of(4, 5, 6))
    .flatMap(list ->
        Flux.fromIterable(list)
            // an empty inner list is turned into an error signal
            .switchIfEmpty(Flux.error(new NoSuchElementException()))
    )
    // the error ends the outer sequence; translate it back into a normal completion
    .onErrorResume(e ->
        e instanceof NoSuchElementException ?
            Flux.empty() : Flux.error(e)
    );
I am writing to an in-memory distributed database in batches of a user-defined size, in a multithreaded environment. I want to limit the number of rows written to, e.g., 1000 rows/sec. The reason for this requirement is that my producer writes too fast and the consumer runs into a leaf-memory error. Is there any standard practice for throttling while batch processing records?
dataStream.map(line => readJsonFromString(line)).grouped(memsqlBatchSize).foreach { recordSet =>
val dbRecords = recordSet.map(m => (m, Events.transform(m)))
dbRecords.map { record =>
try {
Events.setValues(eventInsert, record._2)
eventInsert.addBatch
} catch {
case e: Exception =>
logger.error(s"error adding batch: ${e.getMessage}")
val error_event = Events.jm.writeValueAsString(mapAsJavaMap(record._1.asInstanceOf[Map[String, Object]]))
logger.error(s"event: $error_event")
}
}
// Bulk Commit Records
try {
eventInsert.executeBatch
} catch {
case e: java.sql.BatchUpdateException =>
val updates = e.getUpdateCounts
logger.error(s"failed commit: ${updates.toString}")
updates.zipWithIndex.filter { case (v, i) => v == Statement.EXECUTE_FAILED }.foreach { case (v, i) =>
val error = Events.jm.writeValueAsString(mapAsJavaMap(dbRecords(i)._1.asInstanceOf[Map[String, Object]]))
logger.error(s"insert error: $error")
logger.error(e.getMessage)
}
}
finally {
connection.commit
eventInsert.clearBatch
logger.debug(s"committed: ${dbRecords.length.toString}")
}
}
I was hoping I could pass a user-defined argument such as throttleMax, and if the total number of records written by each thread reaches throttleMax, Thread.sleep() would be called for 1 sec. But this is going to make the entire process very slow. Is there a more effective method for throttling the data load to 1000 rows/sec?
As others have suggested (see the comments on the question), you have better options available to you than throttling here. However, you can throttle an operation in Java with some simple code like the following:
import java.util.Iterator;

/**
 * Given an Iterator `inner`, returns a new Iterator which will emit items upon
 * request, but throttled to at most one item every `minDelayMs` milliseconds.
 */
public static <T> Iterator<T> throttledIterator(Iterator<T> inner, int minDelayMs) {
    return new Iterator<T>() {
        // Start "one delay ago" so that the first item is not held back.
        private long lastEmittedMillis = System.currentTimeMillis() - minDelayMs;

        @Override
        public boolean hasNext() {
            return inner.hasNext();
        }

        @Override
        public T next() {
            // Time still to wait = minimum delay minus time elapsed since the last item.
            long requiredDelayMs = minDelayMs - (System.currentTimeMillis() - lastEmittedMillis);
            if (requiredDelayMs > 0) {
                try {
                    Thread.sleep(requiredDelayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag, then resume
                }
            }
            lastEmittedMillis = System.currentTimeMillis();
            return inner.next();
        }
    };
}
The above code uses Thread.sleep, so it is not suitable for use in a reactive system. In that case, you would want to use the throttle implementation provided by that system, e.g. throttle in Akka.
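If the limit is expressed in rows per second and the writes happen in batches, the same idea can be applied per batch instead of per item. The following is only a sketch under assumptions: RowRateLimiter, reserve and acquire are hypothetical names (not from any library), the window length is fixed at one second, and the clock is injected so the bookkeeping can be checked without actually sleeping. It still blocks with Thread.sleep, so the caveat above applies here too.

```java
import java.util.function.LongSupplier;

class RowRateLimiter {
    private static final long WINDOW_MS = 1000;

    private final int maxRowsPerSecond;
    private final LongSupplier clockMillis; // injectable clock, e.g. System::currentTimeMillis
    private long windowStart;
    private int rowsInWindow;

    RowRateLimiter(int maxRowsPerSecond, LongSupplier clockMillis) {
        this.maxRowsPerSecond = maxRowsPerSecond;
        this.clockMillis = clockMillis;
        this.windowStart = clockMillis.getAsLong();
    }

    /** Books `rows` rows and returns how long the caller must wait (ms) before writing them. */
    synchronized long reserve(int rows) {
        long now = clockMillis.getAsLong();
        if (now - windowStart >= WINDOW_MS) {
            windowStart = now; // the previous window has expired: start a fresh one
            rowsInWindow = 0;
        }
        if (rowsInWindow + rows <= maxRowsPerSecond) {
            rowsInWindow += rows;
            return 0;
        }
        // Budget used up: charge this batch to the next window and wait until it starts.
        // A batch larger than the budget is still admitted, alone, in that window.
        long waitMs = windowStart + WINDOW_MS - now;
        windowStart += WINDOW_MS;
        rowsInWindow = rows;
        return waitMs;
    }

    /** Blocking convenience wrapper; the sleep happens outside the lock. */
    void acquire(int rows) throws InterruptedException {
        long waitMs = reserve(rows);
        if (waitMs > 0) {
            Thread.sleep(waitMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RowRateLimiter limiter = new RowRateLimiter(1000, System::currentTimeMillis);
        for (int batch = 1; batch <= 3; batch++) {
            limiter.acquire(600); // blocks whenever the per-second budget is used up
            System.out.println("batch " + batch + " may be written");
        }
    }
}
```

In the question's loop, the call would sit just before the bulk commit, with the batch size as the argument; one limiter instance shared by all writer threads gives a global limit rather than a per-thread one.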
I have UserConfig that I would like to download during splash screen.
class UserManager {
Single<UserConfig> loadConfig()
}
After the UserConfig is downloaded, the user is redirected to the next screen. I do something like this:
@Override
public void onResume(boolean isNewView) {
subscriptions.add(
userManager.loadConfig().subscribe(config -> {
applyConfig(config);
launchActivity(HomeActivity.class);
}, error -> {
//some error handling
})
);
}
However, I would like to show the splash screen for at least 1 second (if loading took less than 1 s, add an extra delay).
I think .delay() and .delaySubscription() will not work for my case, since they delay every request, whether or not it took less than 1 s.
Try the zip operator:
Returns a Single that emits the results of a specified combiner function applied to two items emitted by two other Singles.
You can do something like
Single
.zip(
Single.timer(1, TimeUnit.SECONDS),
userManager.loadConfig(),
(time, config) -> config
)
.subscribe(
config -> {
applyConfig(config);
launchActivity(HomeActivity.class);
}, error -> {
//some error handling
}
);
My solution uses a Kotlin extension function for the Single type.
The minimum delay applies to errors as well.
/**
* sets the minimum delay on the success or error
*/
fun <T> Single<T>.minDelay(time: Long, unit: TimeUnit, scheduler: Scheduler = Schedulers.computation()): Single<T> {
val timeStart = scheduler.now(TimeUnit.MILLISECONDS)
val delayInMillis = TimeUnit.MILLISECONDS.convert(time, unit)
return Single.zip(
Single.timer(time, unit, scheduler),
this.onErrorResumeNext { error: Throwable ->
val afterError = scheduler.now(TimeUnit.MILLISECONDS)
val millisPassed = afterError - timeStart
val needWaitDelay = delayInMillis - millisPassed
if (needWaitDelay > 0)
Single.error<T>(error)
.delay(needWaitDelay, TimeUnit.MILLISECONDS, scheduler, true)
else
Single.error<T>(error)
},
BiFunction { _, t2 -> t2 }
)
}
I am trying to find a way to skip CompletableFuture based on specific conditions.
For example
public CompletableFuture<Void> delete(Long id) {
CompletableFuture<T> preFetchCf = get(id);
CompletableFuture<Boolean> cf1 = execute();
/*This is where I want different execution path, if result of this future is true go further, else do not*/
// Execute this only if result of cf1 is true
CompletableFuture<T> deleteCf = _delete(id);
// Execute this only if result of cf1 is true
CompletableFuture<T> postDeleteProcess = postDelete(id);
}
What is a good way to achieve this?
I will prepare a different example than the one you used in the question, because the intent of your code is not quite clear from the reader's perspective.
First, suppose the existence of a CompletableFuture<String> that provides the name of a Star Wars character.
CompletableFuture<String> character = CompletableFuture.completedFuture("Luke");
Now, imagine I have two other CompletableFuture that represent different paths I may want to follow depending on whether the first completable future provides a character that is a Jedi.
Supplier<CompletableFuture<String>> thunk1 = () -> CompletableFuture.completedFuture("This guy is a Jedi");
Supplier<CompletableFuture<String>> thunk2 = () -> CompletableFuture.completedFuture("This guy is not a Jedi");
Notice that I wrapped each CompletableFuture in a Supplier so that they are not eagerly evaluated (this concept is known as a thunk).
Now I build my asynchronous chain:
character.thenApply(c -> isJedi(c))
.thenCompose(isJedi -> isJedi ? thunk1.get() : thunk2.get())
.whenComplete((answer, error) -> System.out.println(answer));
The use of thenCompose lets me choose a path based on the boolean result. There I evaluate one of the thunks and cause it to create a new CompletableFuture for the path I care about.
This will print "This guy is a Jedi" to the screen.
So, I believe what you're looking for is the thenCompose method.
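Applied to the shape of the question, the same idea might look like the sketch below. Everything in it is an assumption made for illustration: ConditionalDeleteDemo is a made-up class, execute, doDelete and postDelete are stand-ins for the question's execute(), _delete() and postDelete() that return already-completed futures, and the steps list exists only to show which stages ran.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class ConditionalDeleteDemo {
    // Records which stages ran. Every future below is already completed, so all
    // callbacks run on the calling thread and a plain ArrayList is sufficient here.
    final List<String> steps = new ArrayList<>();

    CompletableFuture<Boolean> execute(boolean outcome) {
        return CompletableFuture.completedFuture(outcome);
    }

    CompletableFuture<Void> doDelete(long id) {
        steps.add("delete " + id);
        return CompletableFuture.completedFuture(null);
    }

    CompletableFuture<Void> postDelete(long id) {
        steps.add("postDelete " + id);
        return CompletableFuture.completedFuture(null);
    }

    CompletableFuture<Void> delete(long id, boolean outcome) {
        return execute(outcome).thenCompose(ok -> {
            if (!ok) {
                // Skip both later stages: complete right away with no result.
                return CompletableFuture.<Void>completedFuture(null);
            }
            // The delete and post-delete futures are only created on this path.
            CompletableFuture<Void> chain = doDelete(id).thenCompose(ignored -> postDelete(id));
            return chain;
        });
    }

    public static void main(String[] args) {
        ConditionalDeleteDemo yes = new ConditionalDeleteDemo();
        yes.delete(42L, true).join();
        System.out.println(yes.steps); // [delete 42, postDelete 42]

        ConditionalDeleteDemo no = new ConditionalDeleteDemo();
        no.delete(42L, false).join();
        System.out.println(no.steps); // []
    }
}
```

The important detail is that the delete futures are created inside the lambda, so nothing is started unless the condition holds; calling _delete(id) eagerly before the chain, as in the question, would start the delete regardless.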
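Not sure if I understand your objective, but why don't you just go with future chaining, as you said in the comment? Something like this, just to illustrate:
import java.util.concurrent.CompletableFuture;
import org.junit.Test;
public class AppTest {
@Test
public void testCompletableFutures() {
// parenthesized so that the cast applies to the product (otherwise id is always 0)
int id = (int) (Math.random() * 1000);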
CompletableFuture<Void> testing = AppTest.execute()
.thenAcceptAsync(result -> {
System.out.println("Result is: " + result);
if(result)
AppTest.delete(id);
else
throw new RuntimeException("Execution failed");
})
.thenApplyAsync(result -> AppTest.postDelete())
.thenAcceptAsync(postDeleteResult -> {
if(postDeleteResult)
System.out.println("Post delete cleanup success");
else
throw new RuntimeException("Post delete failed");
});
}
private static boolean postDelete() {
System.out.println("Post delete cleanup");
return Math.random() > 0.3;
}
private static CompletableFuture<Boolean> delete(int i) {
System.out.println("Deleting id = " + i);
return CompletableFuture.completedFuture(true);
}
private static CompletableFuture<Boolean> execute() {
return CompletableFuture.supplyAsync(() -> Math.random() > 0.5);
}
}
Of course that doesn't make much real-life sense, but I think it works to show a concept.
If you want to skip the second call after execute based on its result, that is clearly not possible without waiting for it, since you need that result. The point is that it should not matter to you whether the call was skipped or not: since it's asynchronous, you are not blocking to wait for that result.
I'm working with Akka (version 2.4.17) to build an observation Flow in Java (let's say of elements of type <T> to stay generic).
My requirement is that this Flow should be customizable to deliver a maximum number of observations per unit of time as soon as they arrive. For instance, it should be able to deliver at most 2 observations per minute (the first that arrive, the rest can be dropped).
I looked very closely at the Akka documentation, and in particular at this page, which details the built-in stages and their semantics.
So far, I tried the following approaches.
With throttle and shaping() mode (to not close the stream when the limit is exceeded):
Flow.of(T.class)
.throttle(2,
new FiniteDuration(1, TimeUnit.MINUTES),
0,
ThrottleMode.shaping())
With groupedWithin and an intermediary custom method:
final int nbObsMax = 2;
Flow.of(T.class)
.groupedWithin(Integer.MAX_VALUE, new FiniteDuration(1, TimeUnit.MINUTES))
.map(list -> {
List<T> listToTransfer = new ArrayList<>();
// keep the last nbObsMax elements of the window (or all of them if there are fewer)
for (int i = Math.max(0, list.size() - nbObsMax); i < list.size(); i++) {
listToTransfer.add(list.get(i));
}
return listToTransfer;
})
.mapConcat(elem -> elem) // Splitting List<T> in a Flow of T objects
Both approaches give me the correct number of observations per unit of time, but the observations are retained and only delivered at the end of the time window (so there is an additional delay).
To give a more concrete example, if the following observations arrive in my Flow:
[Obs1 t=0s] [Obs2 t=45s] [Obs3 t=47s] [Obs4 t=121s] [Obs5 t=122s]
It should only output the following ones as soon as they arrive (processing time can be neglected here):
Window 1: [Obs1 t~0s] [Obs2 t~45s]
Window 2: [Obs4 t~121s] [Obs5 t~122s]
Any help will be appreciated, thanks for reading my first StackOverflow post ;)
I cannot think of a solution out of the box that does what you want. Throttle will emit in a steady stream because of how it is implemented with the bucket model, rather than having a permitted lease at the start of every time period.
To get the exact behavior you are after you would have to create your own custom rate-limit stage (which might not be that hard). You can find the docs on how to create custom stages here: http://doc.akka.io/docs/akka/2.5.0/java/stream/stream-customize.html#custom-linear-processing-stages-using-graphstage
One design that could work is an allowance counter that says how many elements may still be emitted and that you reset every interval. For every incoming element you subtract one from the counter and emit; when the allowance is used up, you keep pulling upstream but discard the elements rather than emitting them. Using TimerGraphStageLogic for the GraphStageLogic lets you set a timed callback that resets the allowance.
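To make the counting logic concrete, here is a plain-Java sketch of such a gate, independent of Akka. The names AllowanceGate and tryEmit are hypothetical, timestamps are passed in explicitly instead of using a timer callback, and (as one possible reading of the design) the window opens at the first emitted element rather than on a fixed schedule.

```java
import java.util.ArrayList;
import java.util.List;

class AllowanceGate {
    private final int maxPerWindow;
    private final long windowMillis;
    private long windowStart;
    private int emitted;

    AllowanceGate(int maxPerWindow, long windowMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns true if an element arriving at nowMillis is emitted, false if it is dropped. */
    boolean tryEmit(long nowMillis) {
        if (emitted > 0 && nowMillis - windowStart >= windowMillis) {
            emitted = 0; // stands in for the timed callback that resets the allowance
        }
        if (emitted >= maxPerWindow) {
            return false; // allowance used up: drop the element
        }
        if (emitted == 0) {
            windowStart = nowMillis; // the first emitted element opens a new window
        }
        emitted++;
        return true;
    }

    public static void main(String[] args) {
        // Arrival times from the question, in milliseconds: 0s, 45s, 47s, 121s, 122s.
        long[] arrivals = {0, 45_000, 47_000, 121_000, 122_000};
        AllowanceGate gate = new AllowanceGate(2, 60_000);
        List<Boolean> decisions = new ArrayList<>();
        for (long arrival : arrivals) {
            decisions.add(gate.tryEmit(arrival));
        }
        System.out.println(decisions); // [true, true, false, true, true]
    }
}
```

With at most 2 elements per minute, Obs1, Obs2, Obs4 and Obs5 pass straight through and Obs3 is dropped, which is the output the question asks for. In an actual stage the reset would come from the timer rather than from comparing timestamps on arrival.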
I think this is exactly what you need: http://doc.akka.io/docs/akka/2.5.0/java/stream/stream-cookbook.html#Globally_limiting_the_rate_of_a_set_of_streams
Thanks to the answer of @johanandren, I've successfully implemented a custom time-based GraphStage that meets my requirements.
I post the code below, if anyone is interested:
import akka.stream.Attributes;
import akka.stream.FlowShape;
import akka.stream.Inlet;
import akka.stream.Outlet;
import akka.stream.stage.*;
import scala.concurrent.duration.FiniteDuration;
public class CustomThrottleGraphStage<A> extends GraphStage<FlowShape<A, A>> {
private final FiniteDuration silencePeriod;
private final int nbElemsMax;
public CustomThrottleGraphStage(int nbElemsMax, FiniteDuration silencePeriod) {
this.silencePeriod = silencePeriod;
this.nbElemsMax = nbElemsMax;
}
public final Inlet<A> in = Inlet.create("TimedGate.in");
public final Outlet<A> out = Outlet.create("TimedGate.out");
private final FlowShape<A, A> shape = FlowShape.of(in, out);
@Override
public FlowShape<A, A> shape() {
return shape;
}
@Override
public GraphStageLogic createLogic(Attributes inheritedAttributes) {
return new TimerGraphStageLogic(shape) {
private boolean open = false;
private int countElements = 0;
{
setHandler(in, new AbstractInHandler() {
@Override
public void onPush() throws Exception {
A elem = grab(in);
if (open || countElements >= nbElemsMax) {
pull(in); // we drop all incoming observations since the rate limit has been reached
}
else {
if (countElements == 0) { // we schedule the next instant to reset the observation counter
scheduleOnce("resetCounter", silencePeriod);
}
push(out, elem); // we forward the incoming observation
countElements += 1; // we increment the counter
}
}
});
setHandler(out, new AbstractOutHandler() {
@Override
public void onPull() throws Exception {
pull(in);
}
});
}
@Override
public void onTimer(Object key) {
if (key.equals("resetCounter")) {
open = false;
countElements = 0;
}
}
};
}
}