Parallel processing end of execution in project reactor - java

I have the following reactive stream that gets data from a 3rd-party API and then populates a HashMap in parallel:
HashMap<String, List<String>> tempHashMap = new HashMap<>();
Flux.fromIterable(cList)
    .parallel(20)
    .runOn(Schedulers.boundedElastic())
    .flatMap(cId -> {
        List<String> lb = api.getlb(p, cId);
        if (!lb.isEmpty()) {
            tempHashMap.put(cId, lb);
        }
        return Flux.just(tempHashMap);
    })
    .sequential()
    .publishOn(Schedulers.single())
    .doOnNext(hashMap -> lb = processMap(hashMap))
    .doOnError(throwable ->
        log.error("Error while getting list of lb : {}", throwable.getMessage()))
    .subscribe();
With the addition of sequential(), I had expected the processMap() method to be called just once, after all the parallel processing completed. However, it is being called on every parallel thread. Can someone help me understand why?
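For what it's worth, sequential() only merges the 20 parallel rails back into a single Flux; it does not wait for them all to finish, so every element still flows through doOnNext. One way to get a single downstream call is to collect the results first. Below is a minimal self-contained sketch of that shape; the third-party call is replaced by a hypothetical mapper, so this is an illustration of the operator chain, not the original code:

```java
import java.util.List;
import java.util.Map;
import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;

public class CollectOnceSketch {
    public static void main(String[] args) {
        List<String> cList = List.of("c1", "c2", "c3");

        Map<String, List<String>> result = Flux.fromIterable(cList)
                .parallel(20)
                .runOn(Schedulers.boundedElastic())
                // hypothetical stand-in for api.getlb(p, cId)
                .map(cId -> Map.entry(cId, List.of("lb-" + cId)))
                .sequential()
                // collectMap completes exactly once, after all rails are done,
                // so whatever consumes the resulting Mono runs exactly once
                .collectMap(Map.Entry::getKey, Map.Entry::getValue)
                .block();

        System.out.println(result.size()); // 3
    }
}
```

This also removes the shared mutable HashMap, which parallel rails were writing to concurrently without synchronization.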

Related

How to find first match in Flux and stop processing in Reactor

I can't figure out how to stop processing a Flux on the first match.
This is what I have right now:
findAll(): Flux<Object>
findStorageId(Relation r): Mono<Long> | Mono.empty()
isPassing(Relation r): boolean
findAll().flatMap(p -> {
    return Flux.fromStream(p.getRelations().stream()).flatMap(r -> {
        return isPassing(r) ? findStorageId(r) : Mono.empty();
    });
})
.handle((Long storageId, SynchronousSink<Long> sink) -> {
    if (storageId != null) {
        sink.next(storageId);
        sink.complete();
    }
})
.next()
.switchIfEmpty(Mono.error(new RuntimeException("Can't find storageId.")));
I'm trying to understand how I can interrupt processing of the Flux when the first storageId is found. Right now I see that the first flatMap continues to work after finding the first match.
The problem is that flatMap uses concurrency, and its prefetch is more than 1.
In this case, if you don't want to call the database many times but one by one, you need to use concatMap with a prefetch of 1.
public static final String TO_BE_FOUND = "B";
@Override
public void run(String... args) throws Exception {
    Mono<String> storageId =
        Flux.just("A", "B", "C", "D", "A")
            .doOnNext(id -> System.out.printf("processing: %s\n", id))
            .concatMap(s -> findStorageId(s), 1)
            .next()
            .switchIfEmpty(Mono.error(new RuntimeException("Can't find storageId.")));
    storageId.subscribe();
}

private static Mono<String> findStorageId(String s) {
    return TO_BE_FOUND.equals(s)
        ? Mono.just(s + UUID.randomUUID()).delayElement(Duration.ofSeconds(1))
        : Mono.delay(Duration.ofSeconds(1)).flatMap(aLong -> Mono.empty());
}
In this case, concatMap with a prefetch of 1 will request elements one by one and wait for each response.
For me it worked out using flatMap → next → switchIfEmpty; the handle is not needed.
flatMap: the method returns a Mono of String, or an empty Mono
next: returns the first element, or empty if flatMap always returned empty
switchIfEmpty: error handling according to your example
This means your example should work as you posted it, and you don't even need to call handle.
Example code:
We log before passing each element to flatMap, so we can check whether the stream is processed further after the first non-empty mapped Mono.
public static final String TO_BE_FOUND = "B";

public static void main(String[] args) {
    Mono<String> storageId = Flux.just("A", "B", "C", "D", "A")
        .doOnNext(id -> System.out.printf("processing: %s\n", id))
        .flatMap(s -> findStorageId(s))
        .next()
        .switchIfEmpty(
            Mono.error(new RuntimeException("Can't find storageId."))
        );
    storageId.subscribe(id -> System.out.printf("storageId found: %s\n", id));
}

private static Mono<String> findStorageId(String s) {
    return TO_BE_FOUND.equals(s) ? Mono.just(s + UUID.randomUUID()) : Mono.empty();
}
Output when TO_BE_FOUND = "B":
The Flux will not be processed further after the first storageId was found.
processing: A
processing: B
storageId found: B85bcdbcb-2903-4962-96ab-b3a97b0c091f
Output when TO_BE_FOUND = "X":
processing: A
processing: B
processing: C
processing: D
processing: A
12:52:22.555 [main] ERROR reactor.core.publisher.Operators - Operator called default onErrorDropped
reactor.core.Exceptions$ErrorCallbackNotImplemented: java.lang.RuntimeException: Can't find storageId.
Caused by: java.lang.RuntimeException: Can't find storageId.
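As an aside, the ErrorCallbackNotImplemented in that last output appears because subscribe() was given only an onNext consumer, so the error falls through to Reactor's default onErrorDropped hook. A small sketch showing how passing an error consumer handles the failure instead of dropping it:

```java
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class SubscribeWithErrorHandler {
    public static void main(String[] args) {
        Mono<String> storageId = Flux.just("A", "C")
                .flatMap(s -> Mono.<String>empty()) // nothing matches
                .next()
                .switchIfEmpty(Mono.error(new RuntimeException("Can't find storageId.")));

        // The second argument consumes the error, so Reactor no longer
        // falls back to the default onErrorDropped behaviour.
        storageId.subscribe(
                id -> System.out.println("storageId found: " + id),
                err -> System.out.println("lookup failed: " + err.getMessage()));
    }
}
```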

Spring webflux When to use map over flatmap

I am new to Java reactive programming and I have started learning it with Spring WebFlux.
One thing that keeps bothering me is that map is synchronous while flatMap is asynchronous.
I am learning from the book Spring in Action, chapter 10, and I see the example below.
Flux<Player> playerFlux = Flux
    .just("Michael Jordan", "Scottie Pippen", "Steve Kerr")
    .map(n -> {
        String[] split = n.split("\\s");
        return new Player(split[0], split[1]);
    });
and on the very next line it says
What’s important to understand about map() is that the mapping is performed synchronously,
as each item is published by the source Flux. If you want to perform the
mapping asynchronously, you should consider the flatMap() operation.
and it shows this example
Flux<Player> playerFlux = Flux
    .just("Michael Jordan", "Scottie Pippen", "Steve Kerr")
    .flatMap(n -> Mono.just(n)
        .map(p -> {
            String[] split = p.split("\\s");
            return new Player(split[0], split[1]);
        })
        .subscribeOn(Schedulers.parallel())
    );
Alright, I thought I got the point, but when I was practicing I realized I actually hadn't. A lot of questions started to arise when I was trying to populate my class fields.
@Data
@Accessors(chain = true)
public class Response {
    private boolean success;
    private String message;
    private List data;
}
Here is how I was trying to populate it:
Mono<Response> response = Mono.just(new Response())
    .map(res -> res.setSuccess(true))
    .map(res -> res.setMessage("Success"))
    .map(res -> res.setData(new ArrayList()));
After writing this code, one line from the book flashed in my head: map is synchronous. Will this be blocking code? Could it be bad for the entire application? I once read that in a non-blocking application a single piece of blocking code can ruin the whole app.
So I decided to convert it to flatMap; according to the book it should look like this:
Mono<Response> response1 = Mono.just(new Response())
    .flatMap(m -> Mono.just(m)
        .map(res -> res.setSuccess(true))
    )
    .flatMap(m -> Mono.just(m)
        .map(res -> res.setMessage("Success"))
    )
    .flatMap(m -> Mono.just(m)
        .map(res -> res.setData(new ArrayList()))
    );
Both examples produce the same output, so what is the difference here? Should we always use flatMap?
Thanks
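As a rule of thumb: map is for cheap, synchronous, non-blocking transforms (setting a few fields is exactly that, and it does not block), while flatMap is for when the mapping function itself returns a Publisher. A minimal sketch contrasting the two; the Response class here is a plain stand-in for the Lombok version above, and lookupResponse is a hypothetical async call, not anything from the question:

```java
import java.util.ArrayList;
import java.util.List;
import reactor.core.publisher.Mono;

public class MapVsFlatMapSketch {
    // Plain stand-in for the Lombok-generated Response in the question.
    static class Response {
        boolean success;
        String message;
        List<String> data;
        Response setSuccess(boolean s) { this.success = s; return this; }
        Response setMessage(String m) { this.message = m; return this; }
        Response setData(List<String> d) { this.data = d; return this; }
    }

    // Hypothetical asynchronous lookup; this is where flatMap is actually needed.
    static Mono<Response> lookupResponse(String id) {
        return Mono.fromSupplier(() -> new Response().setSuccess(true).setMessage(id));
    }

    public static void main(String[] args) {
        // map: cheap synchronous transform, perfectly fine and non-blocking
        Mono<Response> viaMap = Mono.just(new Response())
                .map(res -> res.setSuccess(true).setMessage("Success").setData(new ArrayList<>()));

        // flatMap: only needed because the mapper returns a Mono itself
        Mono<Response> viaFlatMap = Mono.just("id-123").flatMap(MapVsFlatMapSketch::lookupResponse);

        System.out.println(viaMap.block().message);      // Success
        System.out.println(viaFlatMap.block().message);  // id-123
    }
}
```

So the flatMap version in the question adds operator overhead without adding any asynchrony: wrapping a synchronous setter in Mono.just(...).map(...) does not make it asynchronous.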

Efficient way to use fork join pool with multiple parallel streams

I am using three streams that need to make HTTP requests. All of the calls are independent, so I use parallel streams and collect the results from the HTTP responses.
Currently I am using three separate parallel streams for these operations.
Map<String, ClassA> mapA = listOfClassX.stream().parallel()
    .map(item -> {
        ClassA instanceA = httpGetCall(item.id);
        return instanceA;
    })
    .collect(Collectors.toConcurrentMap(item -> item.id, item -> item));

Map<String, ClassB> mapB = listOfClassY.stream().parallel()
    .map(item -> {
        ClassB instanceB = httpGetCall(item.id);
        return instanceB;
    })
    .collect(Collectors.toConcurrentMap(item -> item.id, item -> item));

Map<String, ClassC> mapC = listOfClassZ.stream().parallel()
    .map(item -> {
        ClassC instanceC = httpGetCall(item.id);
        return instanceC;
    })
    .collect(Collectors.toConcurrentMap(item -> item.id, item -> item));
It runs the three parallel streams separately, one after another, even though each call is independent.
Will the common fork-join pool help optimize the use of the thread pool here?
Is there any other way to optimize the performance of this code further?
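One option is to submit the three independent batches concurrently instead of running them back to back. The sketch below uses CompletableFuture on a shared executor; httpGetCall is a hypothetical stub standing in for the real HTTP client, and in the real code each supplier would run one of the batches above:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ParallelFetchSketch {
    // Hypothetical stand-in for the blocking HTTP call in the question.
    static String httpGetCall(String id) { return "resp-" + id; }

    static Map<String, String> fetchAll(List<String> ids) {
        return ids.stream()
                .collect(Collectors.toMap(Function.identity(), ParallelFetchSketch::httpGetCall));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        // Submit all three batches at once instead of one after another.
        CompletableFuture<Map<String, String>> fa =
                CompletableFuture.supplyAsync(() -> fetchAll(List.of("x1", "x2")), pool);
        CompletableFuture<Map<String, String>> fb =
                CompletableFuture.supplyAsync(() -> fetchAll(List.of("y1")), pool);
        CompletableFuture<Map<String, String>> fc =
                CompletableFuture.supplyAsync(() -> fetchAll(List.of("z1")), pool);

        // join() waits for each batch; all three were running concurrently.
        System.out.println(fa.join().size() + fb.join().size() + fc.join().size()); // 4
        pool.shutdown();
    }
}
```

Note that parallel streams share the common ForkJoinPool, which is sized for CPU-bound work; for blocking I/O like HTTP calls, a dedicated executor sized for the expected number of in-flight requests is usually the safer choice.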

Zip reactive flow with itself

I'm using Java Reactor Core, and I have a reactive Flux of objects. For each object of the Flux I need to make an external query that returns one different object per input. The newly generated Flux then needs to be zipped with the original one, so the items of the two Fluxes must be synchronized and generated in the same order.
I'm just re-using the same flow twice, like this:
Flux<MyObj> aTest = Flux.fromIterable(aListOfObj);
Flux<String> myObjLists = aTest.map(o -> MyRepository.findById(o.name)).map(o -> {
    if (!o.isPresent()) {
        System.out.println("Fallback to empty-object");
        return "";
    }
    List<String> l = o.get();
    if (l.size() > 1) {
        System.out.println("that's bad");
    }
    return l.get(0);
});
Flux.zip(aTest, myObjLists, (a, b) -> doSomethingWith(a, b));
Is this the right way to do it? If myObjLists emits an error, how do I prevent the zip phase from skipping the failing iteration?
I finally opted for using Tuples and Optionals (to prevent null items that would break the Flux), so that I don't need to re-use the initial Flux:
Flux.fromIterable(aListOfObj)
    .map(o -> Tuples.of(o, Optional.ofNullable(MyRepository.findById(o.name))))
    .flatMap(t -> {
        if (!t.getT2().isPresent()) {
            System.out.println("Discarding this item");
            return Flux.empty();
        }
        List<String> l = t.getT2().get();
        if (l.size() > 1) {
            System.out.println("that's bad");
        }
        return Flux.just(Tuples.of(t.getT1(), l.get(0)));
    })
    .map(t -> doSomethingWith(t.getT1(), t.getT2()));
Note that the flatMap could be replaced with a .map().filter() pair, removing tuples with missing Optional items.
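That map().filter() variant might look like the following sketch, where findById is a hypothetical repository stub that returns null for unknown names (it is not the real MyRepository):

```java
import java.util.List;
import java.util.Optional;
import reactor.core.publisher.Flux;
import reactor.util.function.Tuples;

public class TupleZipSketch {
    // Hypothetical repository: returns a list for known names, null otherwise.
    static List<String> findById(String name) {
        return name.startsWith("known") ? List.of(name + "-value") : null;
    }

    public static void main(String[] args) {
        List<String> results = Flux.fromIterable(List.of("known-a", "unknown", "known-b"))
                // pair each input with its (possibly empty) lookup result
                .map(o -> Tuples.of(o, Optional.ofNullable(findById(o))))
                // filter drops the tuples whose lookup came back empty
                .filter(t -> t.getT2().isPresent())
                .map(t -> t.getT1() + ":" + t.getT2().get().get(0))
                .collectList()
                .block();

        System.out.println(results); // [known-a:known-a-value, known-b:known-b-value]
    }
}
```

Because each input travels alongside its own lookup result in the tuple, ordering is preserved by construction and no zip step is needed.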

Concurrent processing of project reactor's flux

I'm very new to Project Reactor and reactive programming at large, so I'm probably doing something wrong. I'm struggling to build a flow that does the following:
Given a class Entity:
class Entity {
    private Map<String, String> items;

    public Map<String, String> getItems() {
        return items;
    }
}
1. read an Entity from the DB (ListenableFuture<Entity> readEntity())
2. perform some parallel async processing on every item (boolean processItem(Map.Entry<String, String> item))
3. when all are finished, call doneProcessing (void doneProcessing(boolean b))
Currently my code is:
handler = this;
Mono
    .fromFuture(readEntity())
    .doOnError(t -> notifyError("some err-msg", t))
    .doOnSuccess(e -> log.info("Got the Entity: " + e))
    .flatMap(e -> Flux.fromIterable(e.getItems().entrySet()))
    .all(handler::processItem)
    .consume(handler::doneProcessing);
It works, but the handler::processItem calls don't run concurrently on all items. I tried using dispatchOn and publishOn with both the io and async SchedulerGroups and with various parameters, but the calls still run serially on one thread.
What am I doing wrong?
Apart from that, I'm sure the above can be improved in general, so any suggestion will be appreciated.
Thanks
You need another flatMap that forks and joins computation for each individual map element:
Mono.fromFuture(readEntity())
    .flatMap(v -> Flux.fromIterable(v.getItems().entrySet()))
    .flatMap(v -> Flux.just(v)
        .publishOn(SchedulerGroup.io())
        .doOnNext(handler::processItem))
    .consume(handler::doneProcessing);
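Note that SchedulerGroup and consume come from an old pre-release Reactor API; in current Reactor 3 the same fork/join shape is expressed with Schedulers and a terminal operator such as all(). A self-contained sketch of the modern equivalent, where processItem and the item map are hypothetical stand-ins:

```java
import java.util.Map;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class ForkJoinFlatMapSketch {
    // Hypothetical per-item processing from the question.
    static boolean processItem(Map.Entry<String, String> item) {
        return !item.getValue().isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> items = Map.of("a", "1", "b", "2", "c", "3");

        // Each entry is forked onto the boundedElastic scheduler; all()
        // joins the per-item results into a single Boolean, emitted once
        // when every item has been processed.
        Boolean allOk = Flux.fromIterable(items.entrySet())
                .flatMap(e -> Mono.fromCallable(() -> processItem(e))
                        .subscribeOn(Schedulers.boundedElastic()))
                .all(b -> b)
                .block();

        System.out.println(allOk); // true
    }
}
```

The inner flatMap is what creates the concurrency: each element gets its own inner publisher, each subscribed on its own worker, and flatMap merges the results as they complete.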
