I am currently working on a project where I need to create a loop using Spring WebFlux to generate a Flux for downstream processing. The loop should sequentially take batches of elements from a source (in this instance a repository) and pass the elements as signals in a Flux. To acquire the elements, we have a repository method which fetches the next batch. When all elements have been processed, the method yields an empty List.
I have identified that I can use Flux::generate in the following manner:
Flux.<List<Object>>generate(sink -> {
    List<Object> batch = repository.fetch();
    if (batch.isEmpty()) {
        sink.complete();
    } else {
        sink.next(batch);
    }
})
...
However, when I use this, the generator lambda runs continuously, buffering batches until I run out of memory.
I have also tried using Flux::create, but I am struggling to find an appropriate approach. I have found that I can do the following:
Consumer<Integer> sinker;

Flux<?> createFlux() {
    return Flux.<List<Object>>create(sink -> sinker = integer -> {
        List<Object> batch = repository.fetch();
        if (batch.isEmpty()) {
            sink.complete();
        } else {
            sink.next(batch);
        }
    })
    ...
    .doOnNext(x -> sinker.accept(x))
    ...
}
Then I just need to call the Consumer initially to initiate the loop.
However, I feel like I am overly complicating a job which should have a fairly standard implementation. Also, this implementation requires a secondary call to get started, and I haven't found a decent way to initiate it from within the pipeline (for instance, using .doOnSubscribe() doesn't work, as it attempts to call the Consumer before it has been assigned).
So in summary, I am looking for a simple way to create an unbounded loop while controlling the backpressure to avoid OutOfMemoryErrors.
I believe I have found a simpler solution which serves my need. The method Mono::repeat(BooleanSupplier) allows me to loop until the list is empty, simply by:
Mono.fromCallable(() -> repository.nextBatch())
    .flatMap(/* do some stuff here */)
    .repeat(() -> repository.hasNext())
If other, more elegant solutions exist, I am still open to suggestions.
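For reference, the pull-driven shape of this loop can be sketched with plain JDK streams, with no Reactor involved; the Supplier below stands in for the repository's fetch method, and all names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BatchLoop {

    // Pull batches from the supplier until an empty batch signals completion.
    // This mirrors the demand-driven semantics the generate/repeat approaches
    // rely on; the Supplier stands in for repository::fetch.
    static List<List<Integer>> drain(Supplier<List<Integer>> fetch) {
        return Stream.generate(fetch)
                .takeWhile(batch -> !batch.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stub repository: seven elements served in batches of at most three.
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6, 7);
        int[] cursor = {0};
        Supplier<List<Integer>> fetch = () -> {
            int end = Math.min(cursor[0] + 3, data.size());
            List<Integer> batch = new ArrayList<>(data.subList(cursor[0], end));
            cursor[0] = end;
            return batch;
        };
        System.out.println(drain(fetch)); // [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```

The key property in both the stream sketch and the reactive version is that each fetch happens only when the next batch is demanded, so nothing accumulates ahead of the consumer.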
The situation is as follows: I have two MongoDB documents: User and Reproduction.
Based on a check of whether the User has a Reproduction entry (document), I want to add another entity in MongoDB.
Currently I am using Spring WebFlux and Spring Reactive Mongo. Please see the code below.
@Autowired
ParserRepository parserRepository;

@Autowired
ReproductionRepository reproductionRepository;

@Autowired
UserRepository userRepository;
public void addParser(Parser parser, String keycloakUserId) {
    Mono<User> userMono = userRepository.findByKeycloakUserId(keycloakUserId);
    Mono<Reproduction> reproductionMono = reproductionRepository.findById(parser.getReproductionId());

    userMono.zipWith(reproductionMono)
            .filter(objects -> objects.getT2().getUserId().equals(objects.getT1().get_id()))
            .then(parserRepository.save(parser))
            .switchIfEmpty(Mono.error(new ParserDoesNotBelongToUserException("Unable to add, since this parser does not belong to you")));
}
My question is as follows: how can the result from the Mono be used to verify that the correct Mono is there, and based on that, save the Parser document? Basically, I want to combine the results from both Mono streams in order to save another document, and do it in a non-blocking way.
The method above doesn't seem to work. What is the best way of doing this scenario with two separate Monos? Any best-practice tips are welcome.
Taken from the Mono#filter docs:
filter(Predicate<? super T> tester)
If this Mono is valued, test the result and replay it if predicate returns true.
So if the filter evaluates to true, it will pass through the value, if false, it will not.
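This keep-or-drop behaviour of filter has a close analogue in Optional.filter, which may make the semantics easier to see; the check method and values here are hypothetical stand-ins for the user/reproduction ownership test:

```java
import java.util.Optional;

public class FilterDemo {

    // Optional.filter mirrors Mono.filter: the value survives only when the
    // predicate passes; otherwise the result is empty and a fallback applies
    // (the Optional analogue of Mono's switchIfEmpty).
    static String check(int userId, int reproductionUserId) {
        return Optional.of(reproductionUserId)
                .filter(ownerId -> ownerId.equals(userId))
                .map(ownerId -> "saved")
                .orElse("error: parser does not belong to user");
    }

    public static void main(String[] args) {
        System.out.println(check(1, 1)); // saved
        System.out.println(check(1, 2)); // error: parser does not belong to user
    }
}
```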
The problem is that you are calling then after. Docs for Mono#then
then(Mono other)
Let this Mono complete then play another Mono.
The key word here is complete, which basically means that whatever the previous step completes with is ignored, as long as it completes. So whether the filter passed the value through or not doesn't really matter; then runs anyway.
I'm guessing you want something like:
userMono.zipWith(reproductionMono).flatMap(objects -> {
    if (objects.getT2().getUserId().equals(objects.getT1().get_id())) {
        return parserRepository.save(parser);
    } else {
        return Mono.error(new ParserDoesNotBelongToUserException("Unable to add, since this parser does not belong to you"));
    }
});
We're using Micronaut with Kafka Streams. To create a streams application with this framework, you build something like this:
@Factory
public class FooTopologyConfig {

    @Singleton
    @Named
    public KStream<String, FooPojo> configureTopology(ConfiguredStreamBuilder builder) {
        KStream<String, FooPojo> stream = builder.stream("foo-topic-in");
        stream.peek((k, v) -> System.out.println(String.format("key: %s, value: %s", k, v)))
              .to("foo-topic-out");
        return stream;
    }
}
This:
- Receives a ConfiguredStreamBuilder (a very light wrapper around StreamsBuilder)
- Builds and returns the stream (we're not actually sure how important returning the stream is, but that's a different question).
ConfiguredStreamBuilder::build() (which invokes the same on StreamsBuilder) is called later by the framework and the returned Topology is not made available for injection by Micronaut.
We want the Topology bean in order to log a description of the topology (via Topology::describe).
Is it safe to do the following?
1. Call ConfiguredStreamBuilder::build (and therefore StreamsBuilder::build) and use the returned instance of Topology to print a human-readable description.
2. Allow the framework to call ConfiguredStreamBuilder::build a second time later, and use the second returned Topology to build the application.
There should be no problem calling build() multiple times. This is common in the internal code of Streams as well as in the tests.
To answer your other question: you only need the KStream returned from builder.stream() if you want to expand on that branch of the topology later.
I'm new to WebFlux and I'm trying to execute multiple Monos with Flux, but I think I'm doing it wrong. Is this the best approach to execute multiple Monos and collect them into a list?
Here is my code:
mainService.getAllBranch()
    .flatMapMany(branchesList -> {
        List<Branch> branchesList2 = (List<Branch>) branchesList.getData();
        List<Mono<Transaction>> trxMonoList = new ArrayList<>();
        branchesList2.forEach(branch -> trxMonoList.add(mainService.getAllTrxByBranchId(branch.branchId)));
        return Flux.concat(trxMonoList); // <--- is there any other way than using concat?
    })
    .collectList()
    .flatMap(resultList -> combineAllList());
interface MainService {
    Mono<RespBody> getAllBranch();
    Mono<RespBody> getAllTrxByBranchId(String branchId); // will call url ex: http://trx.com/{branchId}
}
So far, with the above code, I can explain it like this:
Get all branches.
Iterate through branchesList2 and add a Mono for each branch to trxMonoList.
Return Flux.concat; this is the part I'm not sure is the right way or not, but it's working.
Combine all the lists.
I'm just confused whether this is the proper way to use Flux in my context, or whether there is a better way to achieve what I'm trying to do.
You need to refactor your code a little to make it reactive.
mainService.getAllBranch()
    .flatMapMany(branchesList -> Flux.fromIterable(branchesList.getData()))  // (1)
    .flatMap(branch -> mainService.getAllTrxByBranchId(branch.branchId))     // (2)
    .collectList()
    .flatMap(resultList -> combineAllList());

(1) Create a Flux of branches from the List;
(2) Iterate through each element and call the service.
You shouldn't use the Stream API inside Reactor pipelines, because Reactor has equivalent methods with adaptations and optimizations for multithreading.
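The shape of this refactor, flattening the branch list and then expanding each branch into its transactions, is the same flatten-and-map shape as java.util.stream's flatMap. A small JDK-only sketch of the underlying idea, with hypothetical data and no Reactor involved:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FlattenDemo {

    // Stream.flatMap plays the role Flux.flatMap plays in the answer: each
    // branch id is expanded into its transactions and the results are
    // flattened into one list, with no intermediate List<Mono<...>> to manage.
    static List<String> allTransactions(List<String> branchIds, Map<String, List<String>> trxByBranch) {
        return branchIds.stream()
                .flatMap(id -> trxByBranch.getOrDefault(id, List.of()).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> trxByBranch = Map.of(
                "b1", List.of("t1", "t2"),
                "b2", List.of("t3"));
        System.out.println(allTransactions(List.of("b1", "b2"), trxByBranch)); // [t1, t2, t3]
    }
}
```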
The real problem here is that you shouldn't be hitting a Mono multiple times within a Flux. That will give you problems. If you are designing the API you should fix that to do what you want in a correct reactive manner.
interface MainService {
    Flux<Branch> getAllBranch();
    Flux<Transaction> getAllTrxByBranchId(Flux<String> branchIds);
}
Then your code becomes simpler and the reactive framework will work properly.
mainService.getAllTrxByBranchId(mainService.getAllBranch().map(Branch::getId));
Please bear with me; I don't usually use Spring and haven't used newer versions of Java (by newer I mean anything past probably 1.4).
Anyway, I have an issue where I have to make REST calls to perform a search using multiple parallel requests. I've been looking around online and I see you can use CompletableFuture.
So I created my method to get the objects I need from the REST call, like:
@Async
public CompletableFuture<QueryObject[]> queryObjects(String url) {
    QueryObject[] objects = restTemplate.getForObject(url, QueryObject[].class);
    return CompletableFuture.completedFuture(objects);
}
Now I need to call that with something like:
CompletableFuture<QueryObject[]> page1 = queryController.queryObjects("http://myrest.com/ids=[abc, def, ghi]");
CompletableFuture<QueryObject[]> page2 = queryController.queryObjects("http://myrest.com/ids=[jkl, mno, pqr]");
The problem I have is that the call may only take three ids at a time, and there can be a variable number of ids. So I parse the id list and create a query string like the one above. The trouble is that while I can issue the queries, I don't end up with a known set of future objects on which I can then call CompletableFuture.allOf.
Can anyone tell me the way to do this? I've been at it for a while now and I'm not getting any further.
Happy to provide more info if the above isn't sufficient.
You are not getting any benefit from CompletableFuture the way you're using it right now.
The restTemplate method you're using is synchronous, so it has to finish and return a result before proceeding. Because of that, wrapping the final result in a CompletableFuture doesn't cause it to be executed asynchronously (nor in parallel); you are just wrapping a response that you have already retrieved.
If you want to benefit from async execution, you can use, for example, the AsyncRestTemplate or the WebClient.
A simplified code example:
public ListenableFuture<ResponseEntity<QueryObject[]>> queryForObjects(String url) {
    return asyncRestTemplate.getForEntity(url, QueryObject[].class);
}

public List<QueryObject> queryForList(String[] elements) {
    // Fire off all requests first so they run concurrently; joining inside
    // the same stream stage as the mapping would serialize the calls,
    // because streams process one element through all stages at a time.
    List<CompletableFuture<ResponseEntity<QueryObject[]>>> futures = Arrays.stream(elements)
            .map(element -> queryForObjects("http://myrest.com/ids=[" + element + "]"))
            .map(ListenableFuture::completable)
            .collect(Collectors.toList());

    // ...then join on each future and flatten the bodies into a single list.
    return futures.stream()
            .map(CompletableFuture::join)
            .map(HttpEntity::getBody)
            .filter(Objects::nonNull)
            .flatMap(Arrays::stream)
            .collect(Collectors.toList());
}
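The grouping problem (a variable number of ids, at most three per request) can be handled separately from the async machinery. A JDK-only sketch follows; queryObjects is a stub that echoes its input, since the real call would build a url and go through restTemplate, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class BatchQuery {

    // Split ids into groups of at most `size` elements (three in the question).
    static List<List<String>> partition(List<String> ids, int size) {
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            groups.add(ids.subList(i, Math.min(i + size, ids.size())));
        }
        return groups;
    }

    // Stub for the real async REST call; it just echoes the ids it was given.
    static CompletableFuture<List<String>> queryObjects(List<String> ids) {
        return CompletableFuture.supplyAsync(() -> ids);
    }

    public static void main(String[] args) {
        List<String> ids = List.of("abc", "def", "ghi", "jkl", "mno", "pqr", "stu");

        // One future per group of three ids, collected so that allOf can
        // wait on the whole variable-length set at once.
        List<CompletableFuture<List<String>>> futures = partition(ids, 3).stream()
                .map(BatchQuery::queryObjects)
                .collect(Collectors.toList());

        // allOf completes when every page has completed; join() on each
        // future then returns immediately.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

        List<String> all = futures.stream()
                .flatMap(f -> f.join().stream())
                .collect(Collectors.toList());
        System.out.println(all); // [abc, def, ghi, jkl, mno, pqr, stu]
    }
}
```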
While working with Spring WebFlux, I'm trying to insert some data into the Realm Object Server, which interacts with Java apps via a REST API. Basically I have a set of students, who have a set of subjects, and my objective is to persist those subjects in a non-blocking manner. So I use a microservice exposed via a REST endpoint which provides me with a Flux of student roll numbers; for that Flux, I use another microservice exposed via a REST endpoint that gets me the Flux of subjects; and for each of these subjects, I want to persist them in the Realm server via another REST endpoint. I wanted to make this all non-blocking, which is why I wanted my code to look like this.
void foo() {
    studentService.getAllRollnumbers().flatMap(rollnumber -> {
        return subjectDirectory.getAllSubjects().map(subject -> {
            return dbService.addSubject(subject);
        });
    });
}
But this doesn't work for some reason. Once I call block on things, they fall into place, something like this:
Flux<Done> foo() {
    List<Integer> rollNumbers = studentService.getAllRollnumbers().collectList().block();
    rollNumbers.forEach(rollNumber -> {
        List<Subject> subjects = subjectDirectory.getAllSubjects().collectList().block();
        subjects.forEach(subject -> dbService.addSubject(subject).block());
    });
    return Flux.just(new NotUsed());
}
getAllRollnumbers() returns a Flux of Integer.
getAllSubjects() returns a Flux of Subject.
addSubject() returns a Mono of a DBResponse POJO.
What I can understand is that the thread executing this function is expiring before much of it gets triggered. Please help me make this code work in an async, non-blocking manner.
You are not subscribing to the Publisher at all in the first instance, which is why it is not executing. You can do this (note that the inner operator should be flatMap rather than map, since addSubject returns a Mono that must itself be subscribed):
studentService.getAllRollnumbers().flatMap(rollnumber -> {
    return subjectDirectory.getAllSubjects().flatMap(subject -> {
        return dbService.addSubject(subject);
    });
}).subscribe();
However it is usually better to let the framework take care of the subscription, but without seeing the rest of the code I can't advise.
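The general rule behind this answer is that reactive pipelines are lazy: assembling operators describes work but runs nothing until something subscribes. A rough JDK-only analogy, using a Supplier as a stand-in for a cold publisher:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class LazyDemo {

    // Describe the work without running it; the Supplier only executes when
    // get() is called, the analogue of subscribe() on a cold publisher.
    static Supplier<String> describeWork(List<String> log) {
        return () -> {
            log.add("executed");
            return "result";
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Supplier<String> pipeline = describeWork(log);
        System.out.println(log);            // [] -- assembling ran nothing
        System.out.println(pipeline.get()); // result -- the "subscribe"
        System.out.println(log);            // [executed]
    }
}
```

This is why the question's first snippet appears to do nothing: the chain was assembled and then discarded without ever being subscribed.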