Infinite never-failing hot flux - Java

I'm using WebFlux for handling my HTTP requests. As a side effect of the processing I want to add a record to the database, but I do not want to stop processing the user request to achieve that.
Somewhere in the main application flow:
@GetMapping
Flux<Data> getHandler() {
    return doStuff().doOnNext(data -> dataStore.store(data));
}
In a different class I have:
class DataStore {

    private static final Logger LOGGER = LoggerFactory.getLogger(DataStore.class);

    private DataRepository repository;
    private Scheduler scheduler;
    private Sinks.Many<Data> sink;

    public DataStore(DataRepository repository, Scheduler scheduler) {
        this.repository = repository;
        this.scheduler = scheduler; // will be boundedElastic in production
        this.sink = Sinks.many().replay().limit(1000); // buffer size

        // build hot flux
        this.sink.asFlux()
                .map(data -> repository.save(data))
                // retry strategy for random issues with DB connection
                .retryWhen(Retry.backoff(maxRetry, backoffDuration)
                        .doBeforeRetry(signal -> LOGGER.warn("Retrying to save, attempt {}", signal.totalRetries())))
                // give up on saving this item, drop it, try with another one, reset backoff strategy in the meantime
                .onErrorContinue(Exceptions::isRetryExhausted, (e, o) -> LOGGER.error("Dropping data"))
                .subscribeOn(scheduler, true)
                .subscribe(
                        data -> LOGGER.info("Data {} saved.", data),
                        error -> LOGGER.error("Fatal error. Terminating store flux.", error)
                );
    }

    public void store(Data data) {
        sink.tryEmitNext(data);
    }
}
But when writing tests for it I noticed that if the backoff reaches its limit, the flux, instead of dropping the data and continuing, will just stop.
@BeforeEach
public void setup() {
    repository = mock(DataRepository.class);
    dataStore = new DataStore(repository, Schedulers.immediate()); // maxRetry = 4, backoffDuration = Duration.ofMillis(1)
}
@Test
public void test() throws Exception {
    //given
    when(repository.save(any()))
            .thenThrow(new RuntimeException("fail")) // normal store
            .thenThrow(new RuntimeException("fail")) // first retry
            .thenThrow(new RuntimeException("fail")) // second retry
            .thenThrow(new RuntimeException("fail")) // third retry
            .thenThrow(new RuntimeException("fail")) // fourth retry -> should drop data("One")
            .thenAnswer(invocation -> invocation.getArgument(0)) // store data("Two")
            .thenAnswer(invocation -> invocation.getArgument(0)); // store data("Three")
    //when
    dataStore.store(data("One"));   // exhaust 5 retries
    dataStore.store(data("Two"));   // successful store
    dataStore.store(data("Three")); // successful store
    //then
    Thread.sleep(2000); // overkill sleep
    verify(repository, times(7)).save(any()); // assertion fails: data "Two" and "Three" were not saved
}
When running this test my assertion fails and in the logs I can see only
Retrying to save, attempt 0
Retrying to save, attempt 1
Retrying to save, attempt 2
Retrying to save, attempt 3
Dropping data
And there is no sign of successful processing of data "Two" and "Three".
I do not want to retry indefinitely, because I assume that the DB connection may fail from time to time, and I do not want the buffer to overflow.
I know that I can achieve a similar flow without a flux (using a queue, etc.), but the built-in retry with backoff is very tempting.
How can I drop the error from the flux, as onErrorContinue does not seem to be working?

General note - the code in the above question isn't used in a reactive context, and therefore this answer suggests options that would be completely wrong if used in WebFlux or a similar reactive environment.
Firstly, note that onErrorContinue() is almost certainly not what you want - not just in this situation, but generally. It's a specialist operator that almost certainly doesn't quite do what you think it does.
Usually, I'd balk at this line:
.map(data -> repository.save(data))
...as it implies your repository isn't reactive, so you're blocking in a reactive chain - a complete no-no. In this case, because you're using it purely for the convenience of the retry semantics, it's not going to cause issues, but bear in mind that most people used to seeing reactive code will get scared when they see stuff like this!
If you're able to use a reactive repository, that would be better, and then you'd expect to see something like this:
.flatMap(data -> repository.save(data))
...implying that the save() method is returning a non-blocking Mono rather than a plain value after blocking. The norm with retrying would then be to retry on the inner publisher, resuming on an empty publisher if retries are exhausted:
.flatMap(data -> repository.save(data)
        .retryWhen(Retry.backoff(maxRetry, backoffDuration)
                .doBeforeRetry(signal -> LOGGER.warn("Retrying to save, attempt {}", signal.totalRetries())))
        .onErrorResume(Exceptions::isRetryExhausted, e -> Mono.empty())
)
If you're not able or willing to use a reactive repository, then in this case you can still achieve the above by wrapping the blocking call as Mono.fromCallable(() -> repository.save(data)), so the call is deferred and actually re-invoked on each retry - but again, that's a bit of a code smell, and completely forbidden in a standard reactive chain, as you're making something "look" reactive when it's not.
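For illustration, here is a minimal sketch of how the hot flux from the question might look with that wrapping (the names repository, scheduler, maxRetry and backoffDuration are taken from the question; the repository is still assumed to be blocking):
this.sink.asFlux()
        .flatMap(data -> Mono.fromCallable(() -> repository.save(data))
                .subscribeOn(scheduler) // run the blocking save off the emitting thread
                .retryWhen(Retry.backoff(maxRetry, backoffDuration)
                        .doBeforeRetry(signal -> LOGGER.warn("Retrying to save, attempt {}", signal.totalRetries())))
                // retries exhausted: drop just this item, the outer flux keeps running
                .onErrorResume(Exceptions::isRetryExhausted, e -> Mono.empty()))
        .subscribe(
                data -> LOGGER.info("Data {} saved.", data),
                error -> LOGGER.error("Fatal error. Terminating store flux.", error));
Because the retry now happens on the inner Mono, exhausting it only completes that one inner publisher empty; the outer subscription stays alive for subsequent items.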

Related

How to wait for future inside Kafka Stream map()?

I am implementing a Spring Boot application in Java, using Spring Cloud Stream with the Kafka Streams binder.
I need to implement a blocking operation inside of the KStream map method, like so:
public Consumer<KStream<?, ?>> sink() {
    return input -> input
            .mapValues(value -> methodReturningCompletableFuture(value).get())
            .foreach((key, value) -> otherMethod(key, value));
}
completableFuture.get() throws exceptions (InterruptedException, ExecutionException)
How to handle these exceptions so that the chained method doesn't get executed and the Kafka message is not acknowledged? I cannot afford message loss, sending it to a dead letter topic is not an option.
Is there a better way of blocking inside map()?
You can try the branching feature in Kafka Streams to control the execution of the chained methods. For example, here is some pseudo-code that you can try.
You can possibly use this as a starting point and adapt it to your particular use case.
final Map<String, ? extends KStream<?, String>> branches =
        input.split()
                .branch((k, v) -> {
                    try {
                        methodReturningCompletableFuture(v).get();
                        return true;
                    } catch (Exception e) {
                        return false;
                    }
                }, Branched.as("good-records"))
                .defaultBranch();

final KStream<?, String> kStream = branches.get("good-records");
kStream.foreach((key, value) -> otherMethod(key, value));
The idea here is that you will only send the records that didn't throw an exception to the named branch good-records; everything else goes into a default branch, which we simply ignore in this pseudo-code. Then you invoke additional chained methods (as this foreach call shows) only for those "good" records.
This does not solve the problem of not acknowledging the message after an exception is thrown. That seems to be a bit challenging. However, I am curious about that use case. When an exception happens and you handle it, why don't you want to ack the message? The requirements seem to be a bit rigid without using a DLT. The ideal solution here is that you might want to introduce some retries and, once the retries are exhausted, send the record to a DLT, which makes the Kafka Streams consumer acknowledge the message. Then the application moves on to the next offset.
The call methodReturningCompletableFuture(value).get() simply blocks until the future completes (or, with the get(timeout, unit) overload, until the given timeout is reached), assuming that methodReturningCompletableFuture() returns a Future object. Therefore, that is already a good approach to wait inside the KStream map operation. I don't think anything else is necessary to make it wait further.
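If you do want to bound that wait, here is a minimal sketch of the branch predicate with an explicit timeout (the 30-second value and the exception handling are only illustrative assumptions; TimeUnit, ExecutionException and TimeoutException come from java.util.concurrent):
.branch((k, v) -> {
    try {
        // bound the wait so a slow future cannot block the stream thread forever
        methodReturningCompletableFuture(v).get(30, TimeUnit.SECONDS);
        return true;
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
    } catch (ExecutionException | TimeoutException e) {
        return false;
    }
}, Branched.as("good-records"))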

Detect timeouts in a CompletableFuture chain

Is there any possible safe way to detect timeouts in a CompletableFuture chain?
O someValue = CompletableFuture.supplyAsync(() -> {
    ...
    // API Call
    ...
}).thenApply(o -> {
    ...
}).thenApply(o -> {
    // If the chain has timed out, I still have 'o' ready here
    // So at least cache it here, so it's available for the next request
    // Even though the current request will return with a 'null'
    ...
}).get(10, TimeUnit.SECONDS);
// cache 'someValue'
return someValue;
If it completes successfully without a timeout, I can use 'someValue' and do whatever I want with it.
If it times out, it throws a TimeoutException and I have lost the value, even though it's still being processed in the background.
The idea is that even if it times out, since the API call in the thread still completes in the background and returns the response, I can use that value, let's say, for caching.
Not in the way you show, at least. When the exception is thrown, you lose any chance of getting your hands on the results of the API call even if it finishes. Your only chance of caching in a chain like that would be something like the following, which would not help with the timing-out API call itself:
.thenApplyAsync(o -> {
    cache = o;
    // do something
}).thenApplyAsync(o -> {
    cache = o;
    // do something more
}).get(10, TimeUnit.SECONDS);
However, reading through this gave me an idea: what if you did something like the following?
SynchronousQueue<Result> q = new SynchronousQueue<>();

CompletableFuture.supplyAsync(() -> {
    // API call
}).thenAccept(result -> {
    cache.put(result); // cache the value
    q.offer(result);   // offer value to main thread, if still there
});

// Main thread waits 10 seconds for a value to be asynchronously offered into the queue.
// In case of timeout, null is returned, but any operations done
// before q.offer(result) are still performed.
return q.poll(10, TimeUnit.SECONDS);
An API call that doesn't finish in 10 seconds is still processed into the cache, as it is accepted asynchronously and the timeout happens in the main thread rather than in the CompletableFuture chain, even though the original request won't get the results (and I guess has to deal with that gracefully).

Java Stream vs Flux fromIterable

I have a list of usernames and want to fetch user details from the remote service without blocking the main thread. I'm using Spring's reactive client WebClient. For the response, I get a Mono, then subscribe to it and print the result.
private Mono<User> getUser(String username) {
    return webClient
            .get()
            .uri(uri + "/users/" + username)
            .retrieve()
            .bodyToMono(User.class)
            .doOnError(e ->
                    logger.error("Error on retrieving user details {}", username));
}
I have implemented the task in two ways:
Using Java stream:
usernameList.stream()
        .map(this::getUser)
        .forEach(mono ->
                mono.subscribe(System.out::println));
Using Flux.fromIterable:
Flux.fromIterable(usernameList)
        .map(this::getUser)
        .subscribe(mono ->
                mono.subscribe(System.out::println));
It seems the main thread is not blocked in both ways.
What is the difference between Java Stream and Flux.fromIterable in this situation? If both are doing the same thing, which one is recommended to use?
There are no huge differences between the two variants. The Flux.fromIterable variant might give you more options and control over concurrency/retries, etc. - but not really in this case, because calling subscribe here defeats the purpose.
Your question is missing some background about the type of application you're building and in which context these calls are made. If you're building a web application and this is called during request processing, or a batch application - opinions might vary.
In general, I think applications should stay away from calling subscribe, because it disconnects the processing of that pipeline from the rest of the application: if an exception happens, you might not be able to report it, because the resource to use to send that error message might be gone at that point. Or maybe the application is shutting down and you have no way to make it wait for the completion of that task.
If you're building an application that wants to kick off some work whose result is not useful to the current operation (i.e. it doesn't matter whether that work completes or not during the lifetime of the current operation), then subscribe might be an option.
In that case, I'd try and group all operations in a single Mono<Void> operation and then trigger that work:
Mono<Void> logUsers = Flux.fromIterable(userNameList)
        .flatMap(name -> getUser(name)) // flatMap so each Mono<User> is actually subscribed
        .doOnNext(user -> System.out.println(user)) // assuming this is non I/O work
        .then();

logUsers.subscribe(...);
If you're concerned about consuming server threads in a web application, then it's really different - you might want to get the result of that operation to write something to the HTTP response. By calling subscribe, both tasks are now disconnected and the HTTP response might be long gone by the time that work is done (and you'll get an error while writing to the response).
In that case, you should chain the operations with Reactor operators.
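For example, a minimal sketch of keeping everything in one chain inside a WebFlux controller (the mapping, return type and field names here are assumptions for illustration, not part of the question):
@GetMapping("/users")
Flux<User> getUsers() {
    return Flux.fromIterable(usernameList)
            .flatMap(this::getUser); // the framework subscribes when writing the response
}
Here the work stays tied to the HTTP exchange, so errors and completion are reported through the response instead of being lost in a detached subscription.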

Async method followed by a parallelly executed method in Java 8

After spending the day learning about the Java concurrency API, I still don't quite get how I could create the following functionality with the help of the CompletableFuture and ExecutorService classes:
When I get a request on my REST endpoint I need to:
Start an asynchronous task (includes DB query, filtering, etc.), which will give me a list of String URLs at the end
In the meanwhile, respond to the REST caller with HTTP OK, acknowledging that the request was received and is being worked on
When the asynchronous task is finished, I need to send HTTP requests (with the payload the REST caller gave me) to the URLs I got from the job. At most the number of URLs would be around 100, so I need these to happen in parallel.
Ideally I have some synchronized counter which counts how many of the HTTP requests were a success/fail, and I can send this information back to the REST caller (the URL I need to send it back to is provided inside the request payload).
I have the building blocks (methods like getMatchingObjectsFromDB(callerPayload), getURLs(resultOfGetMatchingObjects), sendHttpRequest(Url, methodType), etc.) written already, I just can't quite figure out how to tie step 1 and step 3 together. I would use CompletableFuture.supplyAsync() for step 1, then I would need the CompletableFuture.thenCompose method to start step 3, but it's not clear to me how parallelism can be done with this API. It is rather intuitive with ExecutorService executor = Executors.newWorkStealingPool(); though, which creates a thread pool based on how much processing power is available, and the tasks can be submitted via the invokeAll() method.
How can I use CompletableFuture and ExecutorService together? Or how can I guarantee parallel execution of a list of tasks with CompletableFuture? A demonstrating code snippet would be much appreciated. Thanks.
You should use join() to wait for all threads to finish.
Create a Map<String, Boolean> result to store your request results.
In your controller:
public void yourControllerMethod() {
    CompletableFuture.runAsync(() -> yourServiceMethod());
}
In your service:
// Execute your logic to get List<String> urls
List<CompletableFuture<Void>> futures = urls.stream()
        .map(url -> CompletableFuture.supplyAsync(() -> requestUrl(url))
                .thenAcceptAsync(requestResult -> result.put(url, requestResult))) // assuming requestUrl returns a Boolean success flag
        .collect(toList()); // you have a list of CompletableFuture here
Then use .join() to wait for all threads (remember that your service is already executed in its own thread):
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
Then you can determine which ones succeeded or failed by accessing the result map.
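For instance, a small sketch of evaluating the map afterwards (assuming, as above, that each entry holds a Boolean success flag):
long successes = result.values().stream().filter(Boolean::booleanValue).count();
long failures = result.size() - successes;
// report these counts back to the REST caller's callback URL (step 4 in the question)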
Edit
Please also post your code so that others may understand you.
I've read your code and here are the needed modifications:
When this for loop was not commented out, the receiver webserver got the same request twice.
I don't understand the purpose of this for loop.
Sorry, in my previous answer I did not clean it up. That's just a temporary idea in my head that I forgot to remove at the end :D
Just remove it from your code
// allOf() only accepts arrays, so the List needed to be converted
/* The code never gets over this part (I know allOf() is a blocking call), even long after the receiver got the HTTP request
with the correct payload. I'm not sure yet where exactly the code gets stuck */
Your map should be a ConcurrentHashMap because you're modifying it concurrently later.
Map<String, Boolean> result = new ConcurrentHashMap<>();
If your code still does not work as expected, I suggest removing the parallelStream() part.
CompletableFuture and parallelStream use the common fork-join pool. I think the pool is exhausted.
And you should create your own pool for your CompletableFuture:
Executor pool = Executors.newFixedThreadPool(10);
And execute your request using that pool:
CompletableFuture.supplyAsync(YOURTASK, pool).thenAcceptAsync(Yourtask, pool)
For the sake of completeness, here are the relevant parts of the code, after clean-up and testing (thanks to Mạnh Quyết Nguyễn):
Rest controller class:
@POST
@Path("publish")
public Response publishEvent(PublishEvent eventPublished) {
    /*
      Payload verification, etc.
    */
    // First send the event to the right subscribers, then send the resulting HashMap<String url, Boolean subscriberGotTheRequest> back to the publisher
    CompletableFuture.supplyAsync(() -> EventHandlerService.propagateEvent(eventPublished)).thenAccept(map -> {
        if (eventPublished.getDeliveryCompleteUri() != null) {
            String callbackUrl = Utility
                    .getUri(eventPublished.getSource().getAddress(), eventPublished.getSource().getPort(), eventPublished.getDeliveryCompleteUri(), isSecure,
                            false);
            try {
                Utility.sendRequest(callbackUrl, "POST", map);
            } catch (RuntimeException e) {
                log.error("Callback after event publishing failed at: " + callbackUrl);
                e.printStackTrace();
            }
        }
    });

    // return OK while the event publishing happens in async
    return Response.status(Status.OK).build();
}
Service class:
private static List<EventFilter> getMatchingEventFilters(PublishEvent pe) {
    // query the database, filter the results based on the method argument
}

private static boolean sendRequest(String url, Event event) {
    // send the HTTP request to the given URL, with the given Event payload, return true if the response is positive (status code starts with 2), false otherwise
}

static Map<String, Boolean> propagateEvent(PublishEvent eventPublished) {
    // Get the event relevant filters from the DB
    List<EventFilter> filters = getMatchingEventFilters(eventPublished);
    // Create the URLs from the filters
    List<String> urls = new ArrayList<>();
    for (EventFilter filter : filters) {
        String url;
        try {
            boolean isSecure = filter.getConsumer().getAuthenticationInfo() != null;
            url = Utility.getUri(filter.getConsumer().getAddress(), filter.getPort(), filter.getNotifyUri(), isSecure, false);
        } catch (ArrowheadException | NullPointerException e) {
            e.printStackTrace();
            continue;
        }
        urls.add(url);
    }

    Map<String, Boolean> result = new ConcurrentHashMap<>();
    Stream<CompletableFuture> stream = urls.stream().map(url -> CompletableFuture.supplyAsync(() -> sendRequest(url, eventPublished.getEvent()))
            .thenAcceptAsync(published -> result.put(url, published)));
    CompletableFuture.allOf(stream.toArray(CompletableFuture[]::new)).join();

    log.info("Event published to " + urls.size() + " subscribers.");
    return result;
}
Debugging this was a bit harder than usual; sometimes the code just magically stopped. To fix this, I only put the code parts into the async task which were absolutely necessary, and I made sure the code in the task was using thread-safe constructs. Also, I was a dumb-dumb at first, and my methods inside the EventHandlerService class used the synchronized keyword, which resulted in the CompletableFuture inside the service class method not executing, since it uses a thread pool by default.
A piece of logic marked with synchronized becomes a synchronized block, allowing only one thread to execute at any given time.
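A minimal sketch of that effect, with hypothetical names (not the code above): while the caller holds the lock, the async task cannot enter the other synchronized method on the same object, so waiting for its result inside the lock times out (or deadlocks if you wait without a timeout).
class Service {

    synchronized String slowQuery() {
        return "result"; // runs on a ForkJoinPool thread via supplyAsync
    }

    synchronized void doWork() throws Exception {
        CompletableFuture<String> f = CompletableFuture.supplyAsync(this::slowQuery);
        // slowQuery() cannot start while this thread holds the instance lock
        f.get(1, TimeUnit.SECONDS); // throws TimeoutException
    }
}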

Observable.concat with Exception

I have 2 data sources: DB and server. When I start the application, I call the method from the repository (MyRepository):
public Observable<List<MyObj>> fetchMyObjs() {
    Observable<List<MyObj>> localData = mLocalDataSource.fetchMyObjs();
    Observable<List<MyObj>> remoteData = mRemoteDataSource.fetchMyObjs();
    return Observable.concat(localData, remoteData);
}
I subscribe to it as follows:
mMyRepository.fetchMyObjs()
        .compose(applySchedulers())
        .subscribe(
                myObjs -> {
                    //do something
                },
                throwable -> {
                    //handle error
                }
        );
I expect that the data from the database will be loaded faster, and when the download of data from the network is completed, I will simply update the data in Activity.
When the Internet is connected, everything works well. But when we open the application without a network connection, mRemoteDataSource.fetchMyObjs() throws UnknownHostException and the whole Observable ends with it (the subscriber for localData does not fire, although the logs show that the data was taken from the database). And when I try to call the fetchMyObjs() method again from the MyRepository class (via SwipeRefresh), the subscriber for localData is triggered.
How can I make the subscriber for localData still work when the application starts with the network off?
Try some of the error handling operators:
https://github.com/ReactiveX/RxJava/wiki/Error-Handling-Operators
I'd guess onErrorResumeNext() will be fine, but you have to test it yourself. Maybe something like this would work for you:
Observable<List<MyObj>> remoteData = mRemoteDataSource.fetchMyObjs()
        .onErrorResumeNext(Observable.empty()); // fall back to an empty stream when the remote call fails
Additionally, I am not in a position to judge whether your idea is right or not, but maybe it's worth thinking about rebuilding this flow. Ignoring errors is not the right thing to do - that's for sure ;)
You can observe your chain with observeOn(Scheduler scheduler, boolean delayError) and delayError set to true.
delayError - indicates if the onError notification may not cut ahead of onNext notification on the other side of the scheduling boundary. If true a sequence ending in onError will be replayed in the same order as was received from upstream
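Applied to the subscription above, a minimal sketch (explicit schedulers are used here instead of the applySchedulers() compose, and the main-thread scheduler is just an illustrative choice from RxAndroid):
mMyRepository.fetchMyObjs()
        .subscribeOn(Schedulers.io())
        .observeOn(AndroidSchedulers.mainThread(), true) // delayError = true
        .subscribe(
                myObjs -> {
                    //do something
                },
                throwable -> {
                    //handle error
                }
        );
With delayError set to true, the localData emissions are delivered before the remote UnknownHostException reaches onError.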
