I have a servlet request that basically requests data given by an input date. As I have multiple dates, I have to send multiple requests, and then aggregate the results. For example:
List<Result> results = new ArrayList<>();
for (LocalDate date : dates) {
ServletReq req = new ServletReq(date);
try {
ServletRsp rsp = webservice.send(req);
results.addAll(rsp.getResults());
} catch (SpecificException e) {
//just ignore this result and continue
}
}
Question: how can I parallelize the code above? That is: send multiple ServletReq asynchronously, collect the results into the list, wait for all requests to finish (maybe with a timeout), and ignore the SpecificException.
I started as follows, but I neither know whether this is the right direction, nor did I manage to fully translate the code above, especially regarding the exception that should be ignored.
ExecutorService service = Executors.newCachedThreadPool();
List<CompletableFuture<ServletRsp>> futures = new ArrayList<>();
for (LocalDate date : dates) {
ServletReq req = new ServletReq(date);
CompletableFuture future = CompletableFuture.supplyAsync(() -> webservice.send(req), service);
futures.add(future);
}
CompletableFuture.allOf(futures.toArray(new CompletableFuture[futures.size()])).join();
That's as far as I got. But: how can I call rsp.getResults() on the async result and put everything into the list? And how can I ignore the SpecificException during the async execution? (I cannot modify the webservice.send() method!)
One option: catch the exception within the supplier and return e.g. null. Only do that if you would really do nothing with the exception anyway. When fetching the results via future.get() you then have to deal with null values and ExecutionExceptions.
Eg
CompletableFuture<ServletRsp> future = CompletableFuture.supplyAsync(() -> {
try {
return webservice.send(new ServletReq(date));
} catch (SpecificException e) {
return null;
}
});
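If you go this route and still collect the futures into your futures list, gathering the results afterwards is mostly a matter of skipping the nulls. A minimal sketch, assuming the allOf(...).join() from your snippet has already run:
List<Result> results = futures.stream()
        .map(CompletableFuture::join)              // returns immediately, allOf(...).join() already waited
        .filter(Objects::nonNull)                  // drop responses whose SpecificException was swallowed
        .flatMap(rsp -> rsp.getResults().stream())
        .collect(Collectors.toList());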
Alternatively, rethrow them as a (custom?) RuntimeException so you don't lose them. Then you only deal with exceptions at the end, but some of them are double-wrapped.
Another option: manually complete the future.
E.g.
CompletableFuture<ServletRsp> future = new CompletableFuture<>();
service.execute(() -> {
try {
future.complete(webservice.send(new ServletReq(date)));
} catch (SpecificException e) {
future.completeExceptionally(e);
}
});
futures.add(future);
No wrapping other than the ExecutionException you get from future.get(). CompletableFuture.supplyAsync does essentially the same thing internally, but has no code path for checked exceptions.
A third option: just use the good old ExecutorService and its Callable<T>-based methods (submit, invokeAll), which accept code that throws:
e.g.
List<Callable<ServletRsp>> tasks = dates.stream()
.map(d -> (Callable<ServletRsp>) () -> webservice.send(new ServletReq(d)))
.collect(Collectors.toList());
List<Future<ServletRsp>> completed = service.invokeAll(tasks);
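invokeAll blocks until every task has finished (there is also an overload with a timeout), so afterwards you only need to unwrap the futures and skip the failed ones. A rough sketch along the lines of your original loop:
List<Result> results = new ArrayList<>();
for (Future<ServletRsp> f : completed) {
    try {
        results.addAll(f.get().getResults());
    } catch (ExecutionException e) {
        // e.getCause() is the SpecificException thrown by send() - ignore it, as in the original loop
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}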
I think you're on a good path there.
The issue is that there is no built-in mechanism to nicely collect the results, so you have to do it yourself:
ExecutorService service = Executors.newCachedThreadPool();
List<CompletableFuture<Void>> futures = new ArrayList<>(); // these are only references to tell you when the request finishes
Queue<ServletRsp> results = new ConcurrentLinkedQueue<>(); // this has to be thread-safe
for (LocalDate date : dates) {
ServletReq req = new ServletReq(date);
CompletableFuture<Void> future = CompletableFuture
.supplyAsync(() -> webservice.send(req), service)
.thenAcceptAsync(results::add);
futures.add(future);
}
CompletableFuture.allOf(futures.toArray(new CompletableFuture[futures.size()])).join();
// do stuff with results
I've tried to keep most of the code as you've written it. Maybe it's a bit cleaner with streams:
List<CompletableFuture<Void>> collect = dates.stream()
.map(date -> CompletableFuture
.supplyAsync(() -> webservice.send(new ServletReq(date)), service)
.thenAcceptAsync(results::add))
.collect(Collectors.toList());
// wait for all requests to finish
CompletableFuture.allOf(collect.toArray(new CompletableFuture[collect.size()])).thenAcceptAsync(ignored -> {
//you can also handle the response async.
});
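If you also want the timeout you mentioned, bound the wait on the combined future with get(timeout, unit) instead of join(); the timeout value below is arbitrary:
try {
    CompletableFuture
            .allOf(collect.toArray(new CompletableFuture[0]))
            .get(30, TimeUnit.SECONDS);   // arbitrary timeout
} catch (TimeoutException e) {
    // not everything finished in time; results holds whatever has completed so far
} catch (InterruptedException | ExecutionException e) {
    // handle or log as appropriate
}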
"url":"https://asia-east2-jsondoc.cloudfunctions.net/function-1?delay=1000" //url that takes 1000 ms to return "isParallel": true, "count": "3"
isParallel = true means make parallel calls, false means make sequential calls; count is the number of parallel or sequential calls to make.
I have to call the above endpoint, and the total response time should be about 1 second (so the calls must run concurrently rather than one after another).
How can I call the REST endpoint with concurrent requests? I know how to make a single-threaded call using RestTemplate.
Use RestTemplate with ExecutorService
Using ExecutorService to perform 3 concurrent calls with RestTemplate:
String url = "https://asia-east2-jsondoc.cloudfunctions.net/function-1?delay=1000";
RestTemplate restTemplate = new RestTemplate();
ExecutorService executor = Executors.newFixedThreadPool(3);
Future<String> future1 = executor.submit(() -> restTemplate.getForObject(url, String.class));
Future<String> future2 = executor.submit(() -> restTemplate.getForObject(url, String.class));
Future<String> future3 = executor.submit(() -> restTemplate.getForObject(url, String.class));
String response1 = future1.get();
String response2 = future2.get();
String response3 = future3.get();
executor.shutdown();
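If count is not fixed at 3, the same idea generalizes: submit count tasks and collect the futures. A rough sketch (the count variable and the pool sizing are assumptions here, and the checked exceptions of get() are left unhandled as in the snippet above):
int count = 3; // would come from the request body's "count" field
ExecutorService executor = Executors.newFixedThreadPool(count); // pool at least as large as count
List<Future<String>> futures = new ArrayList<>();
for (int i = 0; i < count; i++) {
    futures.add(executor.submit(() -> restTemplate.getForObject(url, String.class)));
}
List<String> responses = new ArrayList<>();
for (Future<String> f : futures) {
    responses.add(f.get()); // total wall-clock time stays around 1 second because the calls overlap
}
executor.shutdown();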
Use reactive WebClient
Using reactive WebClient to perform 3 concurrent calls and display the response in subscribe callback:
String url = "https://asia-east2-jsondoc.cloudfunctions.net/function-1?delay=5000";
WebClient webClient = WebClient.builder().build();
Mono<String> mono1 = webClient.get().uri(url).retrieve().bodyToMono(String.class);
Mono<String> mono2 = webClient.get().uri(url).retrieve().bodyToMono(String.class);
Mono<String> mono3 = webClient.get().uri(url).retrieve().bodyToMono(String.class);
Flux.merge(mono1, mono2, mono3).subscribe(System.out::println);
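If you need to wait for all three responses (for example to check that the whole exchange takes about one request's delay), you can collect the merged Flux and block at the very edge of the application. A small sketch:
long start = System.currentTimeMillis();
List<String> responses = Flux.merge(mono1, mono2, mono3)
        .collectList()
        .block(); // blocking only at the outermost edge; total time is roughly one delay because the calls run concurrently
System.out.println((System.currentTimeMillis() - start) + " ms");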
I am new to Vert.x and RxJava. I am trying to implement a simple test program. However, I am not able to understand the dynamics of this program. Why do some requests take more than 10 seconds to respond?
Below is my sample Test application
public class Test {
public static void main(String[] args) {
Vertx vertx = Vertx.vertx();
WebClient webClient = WebClient.create(vertx);
Observable<Object> google = hitURL("www.google.com", webClient);
Observable<Object> yahoo = hitURL("www.yahoo.com", webClient);
for (int i = 0; i < 100; i++) {
google.repeat(100).subscribe(timeTaken -> {
if ((Long) timeTaken > 10000) {
System.out.println(timeTaken);
}
}, error -> {
System.out.println(error.getMessage());
});
yahoo.repeat(100).subscribe(timeTaken -> {
if ((Long) timeTaken > 10000) {
System.out.println(timeTaken);
}
}, error -> {
System.out.println(error.getMessage());
});
}
}
public static Observable<Object> hitURL(String url, WebClient webClient) {
return Observable.create(emitter -> {
Long l1 = System.currentTimeMillis();
webClient.get(80, url, "").send(ar -> {
if (ar.succeeded()) {
Long elapsedTime = (System.currentTimeMillis() - l1);
emitter.onNext(elapsedTime);
} else {
emitter.onError(ar.cause());
}
emitter.onComplete();
});
});
}
}
What I want to know is, what is making my response time slow?
The problem here seems to be in the way you are using WebClient and/or the way you are measuring "response" times (depending on what you are trying to achieve here).
Vert.x's WebClient, like most HTTP clients, uses a limited-size connection pool under the hood to send requests. In other words, calling .send(...) does not necessarily start the HTTP request immediately; instead, it might wait in a queue for an available connection. Your measurements include this potential waiting time.
You are using the default pool size, which seems to be 5 (at least in the latest version of Vert.x), and you start 200 concurrent requests almost immediately. It's not surprising that your requests spend most of their time waiting for an available connection.
You might try increasing the pool size if you want to test if I'm right:
WebClient webClient = WebClient.create(vertx, new WebClientOptions().setMaxPoolSize(...));
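For example, a hypothetical sizing (200 simply matches the roughly 200 concurrent subscriptions the test loop creates):
// hypothetical pool size: the test loop keeps ~200 requests in flight at once,
// so allow that many pooled connections to avoid queueing
WebClient webClient = WebClient.create(vertx,
        new WebClientOptions().setMaxPoolSize(200));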
I'm writing some benchmarks for a new project using Lettuce (version 5.1.3), and I'm finding the code below, which uses mget:
@Override
public Set<V> getKeys(final Set<Long> keySet) {
try {
return asyncCommands.mget(keySet.toArray(new Long[0])).get(
5, TimeUnit.SECONDS
).stream().map(keyValue -> keyValue.getValue()).collect(Collectors.toSet());
} catch (Exception e) {
throw new RuntimeException(e);
}
}
to be considerably slower (like over 100x slower) than pipelining the gets myself:
List<RedisFuture<V>> futures = new ArrayList<>(keySet.size());
keySet.forEach(
key -> futures.add(asyncCommands.get(key))
);
asyncCommands.flushCommands();
LettuceFutures.awaitAll(5, TimeUnit.SECONDS,
futures.toArray(new RedisFuture[0]));
final Set<V> collect = futures.stream().map(
future -> {
try {
return future.get(1, TimeUnit.SECONDS);
} catch (Exception e) {
log.warn("", e);
throw new RuntimeException();
}
}
).filter(
Objects::nonNull
).collect(Collectors.toSet());
return collect;
Both of these seem quite slow compared to what the redis server is reporting, but there could be other factors there. The javadocs say that the mget should use pipelining, so why is it so much slower than when I do the pipeline myself? What am I not doing right?
Edit: for the mget I have autoFlushCommands enabled; for the pipelining it is disabled.
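(For reference, the pipelined variant assumes the connection was switched to manual flushing beforehand, roughly like this:)
// assumed setup for the manual pipelining path: disable auto-flush so the
// individual GETs are buffered until flushCommands() is called
asyncCommands.setAutoFlushCommands(false);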
Update: I found out the culprit behind the slow performance is a slow codec. Is there any way I can increase the overall throughput when the codec is slow?
I am trying to work with Kafka Streams and I have created the following Topology:
KStream<String, HistoryEvent> eventStream = builder.stream(applicationTopicName, Consumed.with(Serdes.String(),
historyEventSerde));
eventStream.selectKey((key, value) -> new HistoryEventKey(key, value.getIdentifier()))
.groupByKey()
.reduce((e1, e2) -> e2, Materialized.as(streamByKeyStoreName));
I later start the streams like this:
private void startKafkaStreams(KafkaStreams streams) {
CompletableFuture<KafkaStreams.State> stateFuture = new CompletableFuture<>();
streams.setStateListener((newState, oldState) -> {
if(stateFuture.isDone()) {
return;
}
if(newState == KafkaStreams.State.RUNNING || newState == KafkaStreams.State.ERROR) {
stateFuture.complete(newState);
}
});
streams.start();
try {
KafkaStreams.State finalState = stateFuture.get();
if(finalState != KafkaStreams.State.RUNNING) {
// ...
}
} catch (InterruptedException ex) {
// ...
} catch(ExecutionException ex) {
// ...
}
}
My streams start without an error and eventually reach the RUNNING state, at which point the future is completed. Later I try to access the store that I created in my topology for the KTable:
public KafkaFlowHistory createFlowHistory(String flowId) {
ReadOnlyKeyValueStore<HistoryEventKey, HistoryEvent> store = streams.store(streamByKeyStoreName,
QueryableStoreTypes.keyValueStore());
return new KafkaFlowHistory(flowId, store, event -> topicProducer.send(new ProducerRecord<>(applicationTopicName, flowId, event)));
}
I have verified that createFlowHistory is called after the initialization future has completed in the RUNNING state; however, I am consistently unable to query the store, and KafkaStreams reports the following error:
Exception in thread "main" org.apache.kafka.streams.errors.InvalidStateStoreException:
Cannot get state store flow-event-stream-file-service-test-instance-by-key because the stream thread is PARTITIONS_ASSIGNED, not RUNNING
Apparently the state of the thread has changed. Do I need to take care of this manually when trying to query a store and wait for the internal thread of Kafka to get into the right state?
Older Versions (before 2.2.0)
On startup, Kafka Streams does the following state transitions:
CREATED -> RUNNING -> REBALANCING -> RUNNING
You need to wait for the second RUNNING state before you can query.
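If you don't want to track the transitions yourself, a common workaround is to simply retry the lookup until the store becomes queryable. A rough sketch using the same store API as in your code (the retry interval is arbitrary):
private ReadOnlyKeyValueStore<HistoryEventKey, HistoryEvent> waitForStore(KafkaStreams streams, String storeName)
        throws InterruptedException {
    while (true) {
        try {
            return streams.store(storeName, QueryableStoreTypes.keyValueStore());
        } catch (InvalidStateStoreException e) {
            // store not queryable yet (e.g. the thread is still in PARTITIONS_ASSIGNED) - wait and retry
            Thread.sleep(100);
        }
    }
}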
New Version: as of 2.2.0
The state transition behavior on startup was changed (via https://issues.apache.org/jira/browse/KAFKA-7657) to:
CREATED -> REBALANCING -> RUNNING
Hence, you should not hit this issue any longer.
I am developing a REST API using Spring WebFlux, but I have problems when uploading files. They are stored, but I don't get the expected return value.
This is what I do:
Receive a Flux<Part>
Cast Part to FilePart.
Save the parts with transferTo() (this returns a Mono<Void>)
Map the Mono<Void> to Mono<String>, using the file name.
Return the Flux<String> to the client.
I expect the file names to be returned, but the client gets an empty string.
Controller code
@PostMapping(value = "/muscles/{id}/image")
public Flux<String> updateImage(@PathVariable("id") String id, @RequestBody Flux<Part> file) {
log.info("REST request to update image to Muscle");
return storageService.saveFiles(file);
}
StorageService
public Flux<String> saveFiles(Flux<Part> parts) {
log.info("StorageService.saveFiles({})", parts);
return
parts
.filter(p -> p instanceof FilePart)
.cast(FilePart.class)
.flatMap(file -> saveFile(file));
}
private Mono<String> saveFile(FilePart filePart) {
log.info("StorageService.saveFile({})", filePart);
String filename = DigestUtils.sha256Hex(filePart.filename() + new Date());
Path target = rootLocation.resolve(filename);
try {
Files.deleteIfExists(target);
File file = Files.createFile(target).toFile();
return filePart.transferTo(file)
.map(r -> filename);
} catch (IOException e) {
throw new RuntimeException(e);
}
}
FilePart.transferTo() returns Mono<Void>, which signals when the operation is done - this means the reactive Publisher will only publish an onComplete/onError signal and will never publish a value before that.
This means that the map operation was never executed, because it's only given elements published by the source.
You can return the name of the file and still chain reactive operators, like this:
return part.transferTo(file).thenReturn(part.filename());
It is forbidden to use the block operator within a reactive pipeline and it even throws an exception at runtime as of Reactor 3.2.
Using subscribe as an alternative is not good either, because subscribe will decouple the transferring process from your request processing, making those happen in different execution sequences. This means that your server could be done processing the request and close the HTTP connection while the other part is still trying to read the file part to copy it on disk. This is likely to fail in subtle ways at runtime.
FilePart.transferTo() returns a Mono<Void>, which completes empty, so the map after it was never executed. I solved it by doing this:
private Mono<String> saveFile(FilePart filePart) {
log.info("StorageService.saveFile({})", filePart);
String filename = DigestUtils.sha256Hex(filePart.filename() + new Date());
Path target = rootLocation.resolve(filename);
try {
Files.deleteIfExists(target);
File file = Files.createFile(target).toFile();
return filePart
.transferTo(file)
.doOnSuccess(data -> log.info("do something..."))
.thenReturn(filename);
} catch (IOException e) {
throw new RuntimeException(e);
}
}