Debounce similar requests with reactor-grpc - java

To offload my database, I would like to debounce similar requests in a gRPC service (say, for instance, requests that share the same id part). The service exposes an API that does not have strong latency requirements. I know how to do that with vanilla gRPC, but I am not sure which part of the Mono API I can use.
The API calling directly the db looks like this:
public Mono<Blob> getBlob(
Mono<MyRequest> request) {
return request
.map(req -> reader.getBlob(req.getId()));
}
I have a feeling I should use delaySubscription but then it does not seem that groupBy is part of the Mono API that gRPC services handle.

It's perfectly OK to detect duplicates without using reactive operators:
// Guava cache as example.
private final Cache<String, Boolean> duplicatesCache = CacheBuilder.newBuilder()
.expireAfterWrite(Duration.ofMinutes(1))
.build();
public Mono<Blob> getBlob(Mono<MyRequest> request) {
return request.map(req -> {
var id = req.getId();
var cacheKey = extractSharedIdPart(id);
// putIfAbsent on the map view is atomic, so two concurrent requests cannot both pass the check.
if (duplicatesCache.asMap().putIfAbsent(cacheKey, Boolean.TRUE) == null) {
return reader.getBlob(id);
} else {
return POISON_PILL; // Any object that represents debounce hit.
// Or use flatMap() + Mono.error() instead.
}
});
}
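For readers without Guava, the same duplicate check can be sketched with the JDK only. This is a minimal illustration, not part of the original answer: the class name DebounceGate, its window parameter, and the explicit nowMillis argument are all hypothetical, and it uses ConcurrentHashMap.compute() so that the test-and-set happens atomically per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper: lets one request per key through in each time window. */
final class DebounceGate {
    private final ConcurrentMap<String, Long> lastAccepted = new ConcurrentHashMap<>();
    private final long windowMillis;

    DebounceGate(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Returns true if the caller should hit the database, false on a debounce hit. */
    boolean tryAcquire(String key, long nowMillis) {
        boolean[] accepted = {false};
        // compute() runs atomically per key, so two concurrent callers cannot both win.
        lastAccepted.compute(key, (k, previous) -> {
            if (previous == null || nowMillis - previous >= windowMillis) {
                accepted[0] = true;
                return nowMillis; // start a new window for this key
            }
            return previous; // still inside the current window
        });
        return accepted[0];
    }

    public static void main(String[] args) {
        DebounceGate gate = new DebounceGate(60_000);
        System.out.println(gate.tryAcquire("blob-42", 0));      // first request goes through
        System.out.println(gate.tryAcquire("blob-42", 1_000));  // duplicate inside the window
        System.out.println(gate.tryAcquire("blob-42", 60_000)); // window has elapsed
    }
}
```

Unlike the expiring cache, this sketch never evicts keys, so a real service with many distinct ids would still need some form of cleanup.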
If for some reason you absolutely want to use reactive operators, you first need to convert incoming gRPC requests into a Flux. This can be achieved using third-party libraries like salesforce/reactive-grpc, or directly:
class MyService extends MyServiceGrpc.MyServiceImplBase {
private FluxSink<Tuple2<MyRequest, StreamObserver<MyResponse>>> sink;
private Flux<Tuple2<MyRequest, StreamObserver<MyResponse>>> flux;
MyService() {
flux = Flux.create(sink -> this.sink = sink);
}
@Override
public void handleRequest(MyRequest request, StreamObserver<MyResponse> responseObserver) {
sink.next(Tuples.of(request, responseObserver));
}
Flux<Tuple2<MyRequest, StreamObserver<MyResponse>>> getFlux() {
return flux;
}
}
Next you subscribe to this flux and use operators you like:
public static void main(String[] args) {
var mySvc = new MyService();
var server = ServerBuilder.forPort(DEFAULT_PORT)
.addService(mySvc)
.build();
server.start();
mySvc.getFlux()
.groupBy(...your grouping logic...)
.flatMap(group -> {
return group.sampleTimeout(...your debounce logic...);
})
.flatMap(...your handling logic...)
.subscribe();
}
But beware of using groupBy with lots of distinct shared id parts:
The groups need to be drained and consumed downstream for groupBy to work correctly. Notably when the criteria produces a large amount of groups, it can lead to hanging if the groups are not suitably consumed downstream (eg. due to a flatMap with a maxConcurrency parameter that is set too low).

Related

Java Reactor API: how to wait for an object to be modified by asynchronous calls to be completely modified before sending it back to the caller?

I'm totally new to the Java Reactor API.
I use a WebClient to retrieve data from an external webservice, which I then map to a DTO of class "LearnDetailDTO".
But before sending back this DTO, I have to modify it with data I get from another webservice. For this, I chain the calls with flatMap(). I get my data from the second webservice, but my DTO is returned before it is modified with the new data.
My problem is: how to wait until all calls to the second webservice are finished and the DTO is modified before sending it back to the caller?
Here is my code:
class Controller {
@GetMapping(value = "/learn/detail/", produces = MediaType.APPLICATION_JSON_VALUE)
public Mono<LearnDetailDTO> getLearnDetail() {
return getLearnDetailDTO();
}
private Mono<LearnDetailDTO> getLearnDetailDTO() {
WebClient client = WebClient.create("https://my_rest_webservice.com");
return client
.get()
.retrieve()
.bodyToMono(LearnDetailDTO.class)
.flatMap(learnDetailDTO -> {
LearnDetailDTO newDto = new LearnDetailDTO(learnDetailDTO );
for (GroupDTO group : newDto.getGroups()) {
String keyCode = group.getKeyCode();
for (GroupDetailDto detail : group.getGroupsDetailList()) {
adeService.getResourcesList(keyCode) // one asynchronous rest call to get resources
.flatMap(resource -> {
Long id = resource.getData().get(0).getId();
return adeService.getEventList(id); // another asynchronous rest call to get an events list with the resource coming from the previous call
})
.subscribe(event -> {
detail.setCreneaux(event.getData());
});
}
}
return Mono.just(newDto);
});
}
I tried to block() my call to adeservice.getEventList() instead of subscribe(), but I get the following error:
block()/blockFirst()/blockLast() are blocking, which is not supported
in thread reactor-http-nio-2
How can I be sure that my newDto object is complete before returning it?
You should not mutate objects in subscribe. The function passed to subscribe is called asynchronously, at some unknown point in the future.
Subscribe should be considered a terminal operation that only connects the pipeline to another part of your system. It should not modify values inside the scope of your data stream.
What you want is a pipeline that collects all events and then maps them to a DTO containing the collected events.
As a rule of thumb, the pipeline result must be composed from results accumulated along the operator chain. You should never have a "subscribe" in the middle of the chain, and you should never mutate an object from it.
Below is a simplified example, so you can take the time to analyze the logic that reaches the goal: accumulating new values asynchronously into a single result. In this example, I've removed any notion of "detail" and connected groups directly to events to simplify the code.
The snippet:
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
public class AccumulateProperly {
// Data object definitions
record Event(String data) {}
record Resource(int id) {}
record Group(String keyCode, List<Event> events) {
// When adding events, do not mutate the object directly. Instead, create a derived version
Group merge(List<Event> newEvents) {
var allEvents = new ArrayList<>(events);
allEvents.addAll(newEvents);
return new Group(keyCode, allEvents);
}
}
record MyDto(List<Group> groups) { }
static Flux<Resource> findResourcesByKeyCode(String keyCode) {
return Flux.just(new Resource(1), new Resource(2));
}
static Flux<Event> findEventById(int id) {
return Flux.just(
new Event("resource_"+id+"_event_1"),
new Event("resource_"+id+"_event_2")
);
}
public static void main(String[] args) {
MyDto dtoInstance = new MyDto(List.of(new Group("myGroup", List.of())));
System.out.println("INITIAL STATE:");
System.out.println(dtoInstance);
// Asynchronous operation pipeline
Mono<MyDto> dtoCompletionPipeline = Mono.just(dtoInstance)
.flatMap(dto -> Flux.fromIterable(dto.groups)
// for each group, find associated resources
.flatMap(group -> findResourcesByKeyCode(group.keyCode())
// For each resource, fetch its associated event
.flatMap(resource -> findEventById(resource.id()))
// Collect all events for the group
.collectList()
// accumulate collected events in a new instance of the group
.map(group::merge)
)
// Collect all groups after they've collected events
.collectList()
// Build a new dto instance from the completed set of groups
.map(completedGroups -> new MyDto(completedGroups))
);
// NOTE: block() is used here only because we are in a main function and want to print
// the pipeline output before the program exits.
// Try to avoid block. Return your mono, or connect it to another Mono or Flux object using
// an operation like flatMap.
dtoInstance = dtoCompletionPipeline.block(Duration.ofSeconds(1));
System.out.println("OUTPUT STATE:");
System.out.println(dtoInstance);
}
}
Its output:
INITIAL STATE:
MyDto[groups=[Group[keyCode=myGroup, events=[]]]]
OUTPUT STATE:
MyDto[groups=[Group[keyCode=myGroup, events=[Event[data=resource_1_event_1], Event[data=resource_1_event_2], Event[data=resource_2_event_1], Event[data=resource_2_event_2]]]]]

What's the correct way to use webflux and reactor

I'm learning WebFlux and Reactor, and I have the three test methods below. "documentOperations.findById" and "documentOperations.delete" are two database operations. I know test1 is bad, as the two db operations are placed in one async method. My question is:
Do test2 and test3 have the same impact on system performance? In other words, which one is better?
private Mono<ServerResponse> test1(ServerRequest request, Contexts contexts) {
return request.body(bodyDocumentExtractor)
.flatMap(doc -> {
Document document = documentOperations.findById(doc.getId());
documentOperations.delete(document.getId());
return ServerResponse.noContent().build();
});
}
private Mono<ServerResponse> test2(ServerRequest request, Contexts contexts) {
return request.body(bodyDocumentExtractor)
.flatMap(doc -> {
return Mono.just(documentOperations.findById(doc.getId()))
.flatMap(document -> {
documentOperations.delete(document.getId());
return ServerResponse.noContent().build();
});
});
}
private Mono<ServerResponse> test3(ServerRequest request, Contexts contexts) {
return request.body(bodyDocumentExtractor)
.flatMap(doc -> {
return Mono.just(documentOperations.findById(doc.getId()));
}).flatMap(document -> {
documentOperations.delete(document.getId());
return ServerResponse.noContent().build();
});
}
None of the examples above are good. All your database calls return concrete types which means that they are all blocking calls.
// returns the concrete type
// thread does the call, needs to wait until we get the value (document)
Document document = documentOperations.findById("1");
If it is non blocking it returns a Mono<T> or a Flux<T>.
// Returns a mono, so we know it's not blocking.
// We can chain on actions with for example flatMap etc.
Mono<Document> document = documentOperations.findById("1");
If you have to use a blocking database client (for instance, for an Oracle database), you need to run the call on its own thread so that it doesn't block any of the main worker threads. This can be done with a scheduler: in this example, when a client subscribes, the work is moved to a separate thread.
Mono<Document> document = Mono.fromCallable(() -> documentOperations.findById("1"))
.subscribeOn(Schedulers.boundedElastic());
So for your example:
private Mono<ServerResponse> test3(ServerRequest request, Contexts contexts) {
return request.body(bodyDocumentExtractor)
// subscribeOn is applied to the inner Mono so that both blocking calls run on boundedElastic
.flatMap(doc -> Mono.fromCallable(() -> documentOperations.findById(doc.getId()))
.flatMap(document -> Mono.fromCallable(() -> documentOperations.delete(document.getId())))
.subscribeOn(Schedulers.boundedElastic()))
.then(ServerResponse.noContent().build());
}
Reactor documentation - Wrap blocking calls
I would expect the code to look something along the lines of the code below based on the assumption that DocumentOperations is reactive.
private Mono<ServerResponse> test3(ServerRequest request, Contexts contexts) {
return request.body(bodyDocumentExtractor)
.flatMap(doc -> documentOperations.findById(doc.getId()))
.flatMap(document -> documentOperations.delete(document.getId()))
.then(ServerResponse.noContent().build());
}

Spring WebFlux - how to determine when my client has finished working

I need to call a certain API with multiple query params simultaneously, and I wanted to use a reactive approach for that. I ended up with a reactive client that calls the endpoint based on the passed SearchQuery, handles pagination of the response by calling for the remaining pages, and returns a Flux<Item>. So far it works fine; however, what I need to do now is:
Collect data for all search queries and save them as initial state
Once the initial data is collected, I need to start repeating those calls in small time intervals and validate each item against initial data. Basically, I need to find new items from here.
But I'm running out of options for how to solve that. I came up with probably the dirtiest solution ever, and I bet there are much better ways to do it.
So first of all, this is relevant code of my client
public Flux<Item> collectData(final SearchQuery query) {
final var iteration = new int[]{0};
return invoke(query, 0).expand(res ->
this.handleResponse(res, query, iteration))
.flatMap(response -> Flux.fromIterable(response.collectItems()));
}
private Mono<ApiResponse> handleResponse(final ApiResponse response, final SearchQuery searchQuery, final int[] iteration) {
return hasNextPage(response) ? invoke(searchQuery, calculateOffset(++iteration[0])) : Mono.empty();
}
private Mono<ApiResponse> invoke(final SearchQuery query, final int offset) {
final var url = offset == 0 ? query.toUrlParams() : query.toUrlParamsWithOffset(offset);
return doInvoke(url).onErrorReturn(ApiResponse.emptyResponse());
}
private Mono<ApiResponse> doInvoke(final String endpoint) {
return webClient.get()
.uri(endpoint)
.retrieve()
.bodyToMono(ApiResponse.class);
}
And here is my service that is using this client
private final Map<String, Item> initialItems = new ConcurrentHashMap<>();
void work() {
final var executorService = Executors.newSingleThreadScheduledExecutor();
queryRepository.getSearchQueries().forEach(query -> {
reactiveClient.collectData(query).subscribe(item -> initialItems.put(item.getId(), item));
});
executorService.scheduleAtFixedRate(() -> {
if(isReady()) {
queryRepository.getSearchQueries().forEach(query -> {
reactiveClient.collectData(query).subscribe(this::process);
});
}
}, 0, 3, TimeUnit.SECONDS);
}
/**
* If after 2 second sleep size of initialItems remains the same,
* that most likely means that initial population phase is over,
* and we can proceed with further data processing
**/
private boolean isReady() {
try {
final var snapshotSize = initialItems.size();
Thread.sleep(2000);
return snapshotSize == initialItems.size();
} catch (Exception e) {
return false;
}
}
I think the code speaks for itself: I just want to finish the first phase, which is the initial data population, and then start processing all incoming data.
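The ordering the question is after ("finish phase one, then start comparing") can be expressed as a completion signal rather than a sleep-and-compare check. The following is only a sketch under stated assumptions: it uses the JDK's CompletableFuture.allOf instead of Reactor, items are reduced to plain String ids, and the names TwoPhaseCollector, collectInitial, and findNew are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch: phase one completes explicitly before phase two starts. */
final class TwoPhaseCollector {

    /** Runs fetch for every query; the returned future completes once all results are stored. */
    static CompletableFuture<Map<String, String>> collectInitial(
            List<String> queries, Function<String, CompletableFuture<List<String>>> fetch) {
        Map<String, String> initial = new ConcurrentHashMap<>();
        CompletableFuture<?>[] pending = queries.stream()
                .map(q -> fetch.apply(q)
                        .thenAccept(ids -> ids.forEach(id -> initial.put(id, q))))
                .toArray(CompletableFuture[]::new);
        // allOf completes only after every per-query future has completed.
        return CompletableFuture.allOf(pending).thenApply(done -> initial);
    }

    /** Phase two: any id that was not seen during phase one is new. */
    static List<String> findNew(Map<String, String> initial, List<String> latest) {
        return latest.stream()
                .filter(id -> !initial.containsKey(id))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Function<String, CompletableFuture<List<String>>> fakeFetch =
                q -> CompletableFuture.supplyAsync(() -> List.of(q + "-1", q + "-2"));
        Map<String, String> initial = collectInitial(List.of("a", "b"), fakeFetch).join();
        System.out.println(initial.size());
        System.out.println(findNew(initial, List.of("a-1", "c-9")));
    }
}
```

The point of the design is that the polling phase is started from the completion of the initial future, so no heuristic such as comparing map sizes after a sleep is needed.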

Using RxJava for request response layer with WebSockets

I'm trying to implement a request -> response layer on top of websockets in Java. I recently stumbled across RxJava, which seems like a nice library to use for this. Down below is my current approach for handling the request response flow (unimportant code omitted for readability)
public class SimpleServer extends WebSocketServer {
Gson gson = new Gson();
Map<String, Function<JsonObject, Void>> requests = new HashMap<>();
private static int count = 0;
public SimpleServer(InetSocketAddress address) {
super(address);
}
@Override
public void onMessage(WebSocket conn, String message) {
String type = ...;
JsonObject payload = ...;
if (type.equals("response")) {
Request request = requests.get(requestId).apply(payload);
}
}
public Single<JsonObject> request(String action) {
return Single.create(source -> {
requests.put(Integer.toString(count++), response -> {
source.onSuccess(response);
return null;
});
broadcast(...);
});
}
}
Is this a viable solution or is there a better way? I was thinking if there was a way to use RxJava for both ways, i.e. the request would listen to an "onMessage" observable or something along those lines.
All help will be greatly appreciated.
You can use RxJava for communication in both directions. Let's start with the simpler one – receiving messages. I recommend you use BehaviorRelay, which behaves both as an Observable and as a Consumer. You can both listen for emitted values and produce values (messages, in our case). A simple implementation might look like this:
public class SimpleServer extends WebSocketServer {
private BehaviorRelay<String> receivedMessages = BehaviorRelay.create();
public SimpleServer(InetSocketAddress address) {
super(address);
}
@Override
public void onMessage(WebSocket conn, String message) {
receivedMessages.accept(message); // "sends" value to the relay
}
public Observable<String> getReceivedMessagesRx() {
return receivedMessages.hide(); // Cast Relay to Observable
}
//...
You can now call function getReceivedMessagesRx() and subscribe for incoming messages.
Now the more interesting part – sending messages. Let's assume you have some Observable that produces the messages you want to send:
// ...
private Disposable senderDisposable = Disposables.disposed(); // (1)
public void setMessagesSender(Observable<String> messagesToSend) { // (2)
senderDisposable = messagesToSend.subscribe(message -> {
broadcast(message);
}, throwable -> {
// handle broadcast error
});
}
public void clear() { // (3)
senderDisposable.dispose();
}
}
What happens here:
(1) Create a Disposable which holds a reference to the running observer of the messages to be sent.
(2) Subscribe to the passed Observable, which emits every time you want to send a message. This function is meant to be called only once. If you want to call it multiple times, handle the disposal of the previous sender or use CompositeDisposable to store multiple disposables.
(3) When you are done working with your server, do not forget to dispose of the messages sender.
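For the request/response correlation the question asks about, one possible shape is a registry of pending requests keyed by request id. The sketch below is an assumption-laden illustration, not the answer's method: it uses the JDK's CompletableFuture in place of RxJava's Single, payloads are plain Strings, and the names PendingRequests, request, and onResponse are hypothetical. It uses ConcurrentHashMap and AtomicInteger because WebSocket callbacks may arrive on a different thread from the caller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

/** Hypothetical registry that matches incoming responses to outstanding requests. */
final class PendingRequests {
    private final ConcurrentMap<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    /** Registers a pending request, hands (id, action) to the transport, and returns the future reply. */
    CompletableFuture<String> request(String action, BiConsumer<String, String> send) {
        String id = Integer.toString(nextId.getAndIncrement());
        CompletableFuture<String> reply = new CompletableFuture<>();
        // Register before sending so that a very fast response cannot be missed.
        pending.put(id, reply);
        send.accept(id, action);
        return reply;
    }

    /** Intended to be called from onMessage for response frames; returns false for unknown ids. */
    boolean onResponse(String requestId, String payload) {
        CompletableFuture<String> reply = pending.remove(requestId);
        return reply != null && reply.complete(payload);
    }

    public static void main(String[] args) {
        PendingRequests requests = new PendingRequests();
        CompletableFuture<String> reply =
                requests.request("ping", (id, action) -> System.out.println("send " + id + " " + action));
        requests.onResponse("0", "pong");
        System.out.println(reply.join());
    }
}
```

Removing the entry when the response arrives keeps the map from growing; a production version would probably also need a timeout for requests that never receive a reply.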

spring webflux how to manage sequential business logic code in reactive world

Is this approach reactive friendly?
I have a reactive controller "save" method calling myService.save(request).
The service layer needs to:
(1) jdbc save (on another scheduler because the code is blocking),
(2) generate a template string (on another scheduler),
(3) send an email (on another scheduler),
(4) finally return the saved entity to the controller layer
I can't chain all my calls in one pipeline, or at least I don't know how to, because I want to send back the result of (1), which is lost as soon as I do ....flatMap(templateService::generateStringTemplate), for example.
So instead I trigger my sub-operations inside (1).
Is this how I am supposed to handle it, or is there a cleverer way to do it in one pipeline?
Below code to support the question. Thanks.
Service called by Controller layer
public Mono<Prospect> save(final Prospect prospect) {
return Mono.fromCallable(
() -> {
Prospect savedProspect = transactionTemplate.execute(status -> prospectRepository.save(prospect));
templateService.generateProspectSubscription(savedProspect)
.map(t ->
EmailPostRequest.builder()
...
.build())
.flatMap(emailService::send)
.subscribe();
return savedProspect;
})
.subscribeOn(jdbcScheduler);
}
TemplateService
public Mono<String> generateProspectSubscription(final Prospect prospect) {
return Mono.fromCallable(
() -> {
Map<String, Object> model = new HashMap<>();
...
Template t = freemarkerConfig.getTemplate(WELCOME_EN_FTL);
String html = FreeMarkerTemplateUtils.processTemplateIntoString(t, model);
return html;
}
).subscribeOn(freemarkerScheduler);
}
EmailService
public Mono<Void> send(final EmailPostRequest e) {
return Mono.fromCallable(
() -> {
MimeMessage message = emailSender.createMimeMessage();
MimeMessageHelper mimeHelper = new MimeMessageHelper(message,
MimeMessageHelper.MULTIPART_MODE_MIXED_RELATED,
StandardCharsets.UTF_8.name());
mimeHelper.setTo(e.getTo());
mimeHelper.setText(e.getText(), true);
mimeHelper.setSubject(e.getSubject());
mimeHelper.setFrom(new InternetAddress(e.getFrom(), e.getPersonal()));
emailSender.send(message);
return Mono.empty();
}
).subscribeOn(emailScheduler).then();
}
EDITED SERVICE
I think this version of the service layer is cleaner, but any comments are appreciated
public Mono<Prospect> save(final Prospect prospect) {
return Mono.fromCallable(
() -> transactionTemplate.execute(status -> prospectRepository.save(prospect)))
.subscribeOn(jdbcScheduler)
.flatMap(savedProspect -> {
templateService.generateProspectSubscription(savedProspect)
.map(t ->
EmailPostRequest.builder()
...
.build())
.flatMap(emailService::send)
.subscribe();
return Mono.just(savedProspect);
}
);
}
This approach is not reactive friendly, as everything you call is a wrapped blocking library.
With this use case, you can't really see the benefit of a reactive runtime, and chances are the performance of your application is worse than that of a blocking one.
If your main motivation is performance, then this is probably counter-productive.
Offloading a lot of blocking I/O work onto specialized Schedulers has a runtime cost in terms of memory (creating more threads) and CPU (context switching). If performance and scalability are your primary concerns, then switching to Spring MVC and leveraging the Flux/Mono support where it fits, or even calling block() operators, is probably a better fit.
If your main motivation is using a specific library, like Spring Framework's WebClient with Spring MVC, then you're better off using .block() operators in selected places rather than wrapping and scheduling everything.
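On the narrower question of how to keep the saved entity while chaining the template and email steps: nesting the follow-up steps inside the callback that receives the saved value keeps that value in scope, so it can be re-emitted once the last step completes. In Reactor this shape is usually written with flatMap and thenReturn(savedProspect). The sketch below shows the same idea with the JDK's CompletableFuture so that it runs without extra dependencies; the class name SaveThenNotify and its parameters are hypothetical stand-ins for the services in the question.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical sketch: one pipeline that ends with the value produced by its first step. */
final class SaveThenNotify {

    /** Saves, renders, sends, and completes with the saved value once the send step is done. */
    static <T> CompletableFuture<T> saveAndNotify(
            CompletableFuture<T> save,
            Function<T, CompletableFuture<String>> render,
            Function<String, CompletableFuture<Void>> send) {
        return save.thenCompose(saved ->
                render.apply(saved)
                        .thenCompose(send)
                        // The saved value is still in scope here, so it can be re-emitted.
                        .thenApply(ignored -> saved));
    }

    public static void main(String[] args) {
        String result = saveAndNotify(
                CompletableFuture.completedFuture("prospect-1"),
                saved -> CompletableFuture.completedFuture("<p>Welcome " + saved + "</p>"),
                html -> CompletableFuture.runAsync(() -> System.out.println("sending " + html))
        ).join();
        System.out.println(result);
    }
}
```

Unlike the fire-and-forget subscribe() in the question, a failure in the render or send step propagates to the caller here; whether that is desirable depends on whether the email is considered part of the save operation.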
