Concurrent processing of Project Reactor's Flux - java

I'm very new to Project Reactor and to reactive programming at large, so I'm probably doing something wrong. I'm struggling to build a flow that does the following:
Given a class Entity:
class Entity {
    private Map<String, String> items;

    public Map<String, String> getItems() {
        return items;
    }
}
read Entity from DB (ListenableFuture<Entity> readEntity())
perform some parallel async processing on every item (boolean processItem(Map.Entry<String, String> item))
when all finished call doneProcessing (void doneProcessing(boolean b))
Currently my code is:
handler = this;
Mono
    .fromFuture(readEntity())
    .doOnError(t -> notifyError("some err-msg", t))
    .doOnSuccess(e -> log.info("Got the Entity: " + e))
    .flatMap(e -> Flux.fromIterable(e.getItems().entrySet()))
    .all(handler::processItem)
    .consume(handler::doneProcessing);
The thing works, but the handler::processItem calls don't run concurrently on all items. I tried using dispatchOn and publishOn with both the io and async SchedulerGroup and with various parameters, but the calls still run serially on one thread.
What am I doing wrong?
Apart from that, I'm sure the above can be improved in general, so any suggestion will be appreciated.
Thanks

You need another flatMap that forks and joins computation for each individual map element:
Mono.fromFuture(readEntity())
    .flatMap(v -> Flux.fromIterable(v.getItems().entrySet()))
    .flatMap(v -> Flux.just(v)
        .publishOn(SchedulerGroup.io())
        .doOnNext(handler::processItem))
    .consume(handler::doneProcessing);
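For reference, the operators above (all, consume, SchedulerGroup) come from a pre-1.0 Reactor milestone. A rough equivalent of the fork/join idea in current Reactor 3, assuming processItem does blocking work, might look like the sketch below (the class, the stubbed processItem body, and the sample items are made up for the demo):

```java
import java.util.Map;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class ParallelAllDemo {
    // stand-in for handler::processItem; the body is a placeholder
    static boolean processItem(Map.Entry<String, String> item) {
        return !item.getValue().isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> items = Map.of("a", "1", "b", "2", "c", "3");

        Boolean allOk = Flux.fromIterable(items.entrySet())
                // fork: each entry becomes its own Mono subscribed on a worker
                // thread, so the inner sources run concurrently
                .flatMap(entry -> Mono.fromCallable(() -> processItem(entry))
                        .subscribeOn(Schedulers.boundedElastic()))
                // join: reduce all results to one boolean, like the old all(...)
                .all(ok -> ok)
                .block(); // block() only for the demo; subscribe in real code

        System.out.println("all items processed ok: " + allOk);
    }
}
```

The key point is the same as in the answer: flatMap subscribes to its inner publishers concurrently, so putting the per-item work (plus a scheduler) inside the inner Mono is what actually fans it out.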

Related

Using a Promise to return java Tinkerpop Gremlin Traversal results

I would like to leverage the .promise(final Function<Traversal<S, E>, T> traversalFunction) method of a Gremlin GraphTraversal. It's not clear to me what function I would use within the promise.
Using the Tinkerpop Client object, I do something like this:
GraphTraversal myTraversal = g.V().hasLabel("myLabel");
client.submitAsync(myTraversal)
    .thenAccept(result -> {
        List<Map<Object, Object>> resultList = new ArrayList<>();
        result.iterator().forEachRemaining(item -> {
            DefaultRemoteTraverser drt = (DefaultRemoteTraverser) item.getObject();
            Map<Object, Object> itemMap = (HashMap) drt.get();
            resultList.add(itemMap);
        });
        outputSuccess(resultList);
    })
    .exceptionally(throwable -> {
        // handle;
        return null;
    });
What would the equivalent look like using .promise()? I looked for a test in the source repo that might provide a clue, but did not see one.
First, note that promise() can only be used when the Traversal is sourced from a remote connection. It will throw an exception in embedded mode, as explained in the javadoc.
The promise() takes a function that processes the Traversal after it has been submitted asynchronously to the server. You're really just providing a terminator to the promise() to get a result into the returned CompletableFuture:
g.V().out().promise(t -> t.toList());
I guess you could chain it to be more exactly like what you had in your example:
g.V().out().promise(t -> t.toList()).
thenAccept(r -> outputSuccess(r)).
exceptionally(...);

Parallel processing end of execution in project reactor

I have the following reactive stream that gets data from a 3rd party API and then populates a hashMap in parallel.
HashMap<String, List<String>> tempHashMap = new HashMap<>();
Flux.fromIterable(cList)
    .parallel(20)
    .runOn(Schedulers.boundedElastic())
    .flatMap(cId -> {
        List<String> lb = api.getlb(p, cId);
        if (!lb.isEmpty()) {
            tempHashMap.put(cId, lb);
        }
        return Flux.just(tempHashMap);
    })
    .sequential()
    .publishOn(Schedulers.single())
    .doOnNext(hashMap -> lb = processMap(hashMap))
    .doOnError(throwable -> log.error("Error while getting list of lb : {} ", throwable.getMessage()))
    .subscribe();
With the addition of sequential(), I had expected the processMap() method to be called just once, after all the parallel processing is completed. However, it is being called on every parallel rail. Can someone help me understand why?
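sequential() only merges the parallel rails back into one Flux; it does not wait for all of them, so doOnNext still fires once per emitted element, and the code above emits the map once per cId. To run processMap exactly once, collect first and let the collected Mono emit a single result. That also removes the shared mutable HashMap, which is not safe to write from multiple threads. A sketch with a stubbed getlb (names are borrowed from the question; the stub body and sample cList are made up):

```java
import java.util.List;
import java.util.Map;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class CollectOnceDemo {
    // stand-in for api.getlb(p, cId)
    static List<String> getlb(String cId) {
        return List.of(cId + "-lb");
    }

    public static void main(String[] args) {
        List<String> cList = List.of("c1", "c2", "c3");

        Map<String, List<String>> result = Flux.fromIterable(cList)
                .parallel(20)
                .runOn(Schedulers.boundedElastic())
                // each rail produces at most one (cId, lb) pair; empty lists are dropped
                .flatMap(cId -> Mono.fromCallable(() -> getlb(cId))
                        .filter(lb -> !lb.isEmpty())
                        .map(lb -> Map.entry(cId, lb)))
                .sequential()
                // join: collectMap emits ONE map after every rail completes,
                // so a downstream processMap would run exactly once
                .collectMap(Map.Entry::getKey, Map.Entry::getValue)
                .block(); // demo only; chain doOnNext(processMap) and subscribe in real code

        System.out.println(result.size()); // 3
    }
}
```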

Get result from db in a loop with completable future

I'm using Spring Boot and Spring Data JPA, and I have logic consisting of three DB queries which I want to run in parallel. I want to use CompletableFuture for this purpose.
In the end I need to build a response object from the results of five DB queries; three of them I'm currently running in a loop.
So I've created these CompletableFutures:
CompletableFuture<Long> totalFuture = CompletableFuture.supplyAsync(() -> myRepository.getTotal());
CompletableFuture<Long> countFuture = CompletableFuture.supplyAsync(() -> myRepository.getCount());
Then I'm planning to use .allOf with these futures. But I have a problem with the loop calls. How do I rewrite the loop so that each request gets its value passed in and the results are sorted into a map by key?
Map<String, Integer> groupCount = new HashMap<>();
request.ids().forEach((key, value) ->
    groupCount.put(key, myRepository.getGroupCountId(value)));
To explain a little more thoroughly, I'm posting a code snippet which I want to chain, but for now it works like this.
List<CompletableFuture<Void>> completableFutures = new ArrayList<>();
Map<String, Integer> groupCount = new ConcurrentHashMap<>();
for (var id : request.ids().entrySet()) {
    completableFutures.add(
        CompletableFuture.runAsync(someOperation, EXECUTOR_SERVICE)
            .thenApply(v -> runQuery(id.getValue()))
            .thenAcceptAsync(res -> groupCount.put(id.getKey(), res)));
}
CompletableFuture.allOf(completableFutures.toArray(new CompletableFuture[0])).get();
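One way to wire the loop is to build one future per id with supplyAsync and join them all with allOf. The sketch below is self-contained: getGroupCountId is a stand-in for the repository call (the *10 body is made up for the demo), and the same allOf can later be combined with totalFuture and countFuture:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class GroupCounts {
    // stand-in for myRepository.getGroupCountId(value); the *10 is made up
    static int getGroupCountId(int value) {
        return value * 10;
    }

    static Map<String, Integer> loadGroupCounts(Map<String, Integer> ids,
                                                ExecutorService pool) {
        Map<String, Integer> groupCount = new ConcurrentHashMap<>();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (Map.Entry<String, Integer> e : ids.entrySet()) {
            // one future per id: the query runs on the pool and the result
            // lands in the concurrent map under the request key
            futures.add(CompletableFuture
                    .supplyAsync(() -> getGroupCountId(e.getValue()), pool)
                    .thenAccept(count -> groupCount.put(e.getKey(), count)));
        }
        // wait for every query before building the response object
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return groupCount;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        System.out.println(loadGroupCounts(Map.of("a", 1, "b", 2), pool));
        pool.shutdown();
    }
}
```

Capturing the loop variable in the lambda (rather than the runAsync result v, which is just Void) is what carries the key and value into each future.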

Spring webflux When to use map over flatmap

I am new to Java reactive programming and I have started learning it with Spring WebFlux.
One thing that keeps bothering me is that map is synchronous while flatMap is asynchronous.
I am learning from the book Spring in Action, chapter 10, and I see the example below.
Flux<Player> playerFlux = Flux
    .just("Michael Jordan", "Scottie Pippen", "Steve Kerr")
    .map(n -> {
        String[] split = n.split("\\s");
        return new Player(split[0], split[1]);
    });
and on the very next line it says
What’s important to understand about map() is that the mapping is performed synchronously,
as each item is published by the source Flux. If you want to perform the
mapping asynchronously, you should consider the flatMap() operation.
and then shows this example
Flux<Player> playerFlux = Flux
    .just("Michael Jordan", "Scottie Pippen", "Steve Kerr")
    .flatMap(n -> Mono.just(n)
        .map(p -> {
            String[] split = p.split("\\s");
            return new Player(split[0], split[1]);
        })
        .subscribeOn(Schedulers.parallel())
    );
Alright, I thought I got the point, but when I was practicing I realized I actually didn't. A lot of questions started to arise when I was trying to populate my class fields.
@Data
@Accessors(chain = true)
public class Response {
    private boolean success;
    private String message;
    private List data;
}
Here is how I was trying to populate it:
Mono<Response> response = Mono.just(new Response())
    .map(res -> res.setSuccess(true))
    .map(res -> res.setMessage("Success"))
    .map(res -> res.setData(new ArrayList()));
After writing this code, one line from the book flashed in my head: map is synchronous. Will this be blocking code? Could it be bad for the entire application? I once read that in a non-blocking application a single piece of blocking code can ruin the whole app.
So I decided to convert it into flatMap, which according to the book should look like this.
Mono<Response> response1 = Mono.just(new Response())
    .flatMap(m -> Mono.just(m)
        .map(res -> res.setSuccess(true)))
    .flatMap(m -> Mono.just(m)
        .map(res -> res.setMessage("Success")))
    .flatMap(m -> Mono.just(m)
        .map(res -> res.setData(new ArrayList())));
Both examples output the same thing, so what is the difference here? Does it mean we should always use flatMap?
Thanks
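One way to frame the distinction: map is for cheap, synchronous, one-to-one transformations, and calling a setter inside map is not "blocking" in the harmful sense (it completes in nanoseconds); flatMap is for mappers that themselves return a Mono or Flux, such as a repository or WebClient call. A small sketch, where lookupGreeting is a made-up stand-in for a genuinely asynchronous call:

```java
import reactor.core.publisher.Mono;

public class MapVsFlatMapDemo {
    // stand-in for an async call, e.g. a repository or WebClient request
    static Mono<String> lookupGreeting(String name) {
        return Mono.just("Hello, " + name);
    }

    public static void main(String[] args) {
        // map: synchronous, CPU-cheap transformation -- the right tool for setters
        String upper = Mono.just("reactor")
                .map(String::toUpperCase)
                .block();

        // flatMap: the mapper returns a Publisher, which flatMap subscribes to
        String greeting = Mono.just("reactor")
                .flatMap(MapVsFlatMapDemo::lookupGreeting)
                .block();

        System.out.println(upper + " / " + greeting); // REACTOR / Hello, reactor
    }
}
```

Under this reading, wrapping each setter in flatMap(Mono.just(...)) adds overhead without adding asynchrony; the book's flatMap-plus-subscribeOn example only pays off when the per-item work is genuinely expensive or itself asynchronous.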

Efficient way to use fork join pool with multiple parallel streams

I am using three streams which need to make HTTP calls. All of the calls are independent, so I use parallel streams and collect the results from the HTTP responses.
Currently I am using three separate parallel streams for these operations.
Map<String, ClassA> mapA = listOfClassX.stream().parallel()
    .map(item -> {
        ClassA instanceA = httpGetCall(item.id);
        return instanceA;
    })
    .collect(Collectors.toConcurrentMap(item -> item.id, item -> item));

Map<String, ClassB> mapB = listOfClassY.stream().parallel()
    .map(item -> {
        ClassB instanceB = httpGetCall(item.id);
        return instanceB;
    })
    .collect(Collectors.toConcurrentMap(item -> item.id, item -> item));

Map<String, ClassC> mapC = listOfClassZ.stream().parallel()
    .map(item -> {
        ClassC instanceC = httpGetCall(item.id);
        return instanceC;
    })
    .collect(Collectors.toConcurrentMap(item -> item.id, item -> item));
It runs the three parallel streams separately, one after another, even though each call is independent.
Will the common fork join pool help optimize the use of the thread pool here?
Is there any other way to optimize the performance of this code further?
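One option, assuming the HTTP calls are blocking: submit each batch as a CompletableFuture on a dedicated ForkJoinPool sized for I/O rather than CPU count. A parallel stream started inside a task of a custom ForkJoinPool uses that pool's workers instead of the common pool, and allOf lets the three batches overlap instead of running back to back. The sketch below uses made-up names (httpGetCall is stubbed, the id lists are sample data):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class ParallelFetchDemo {
    // stand-in for the blocking httpGetCall(id) from the question
    static String httpGetCall(String id) {
        return "response-" + id;
    }

    static CompletableFuture<Map<String, String>> fetchAsync(List<String> ids,
                                                             ForkJoinPool pool) {
        // the parallel stream runs inside a task of the custom pool,
        // so its splits execute on that pool's workers
        return CompletableFuture.supplyAsync(() -> ids.parallelStream()
                .collect(Collectors.toConcurrentMap(id -> id,
                        ParallelFetchDemo::httpGetCall)), pool);
    }

    public static void main(String[] args) {
        ForkJoinPool ioPool = new ForkJoinPool(30); // sized for blocking I/O

        // the three independent batches now overlap instead of running serially
        CompletableFuture<Map<String, String>> a = fetchAsync(List.of("x1", "x2"), ioPool);
        CompletableFuture<Map<String, String>> b = fetchAsync(List.of("y1"), ioPool);
        CompletableFuture<Map<String, String>> c = fetchAsync(List.of("z1"), ioPool);

        CompletableFuture.allOf(a, b, c).join();
        System.out.println(a.join().size() + b.join().size() + c.join().size()); // 4
        ioPool.shutdown();
    }
}
```

Note that routing parallel-stream work to a custom pool this way is a widely used but unofficial behavior of the streams implementation; for heavy I/O fan-out, an explicit ExecutorService with one future per call is the more conventional design.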
