Parallel database calls using Java 8 streams and CompletableFuture

I would like to replicate and parallelize the following behavior with Java 8 streams:
for (Animal animal : animalList) {
// find all other animals with the same breed
Collection<Animal> queryResult = queryDatabase(animal.getBreed());
if (animal.getSpecie() == cat) {
catList.addAll(queryResult);
} else {
dogList.addAll(queryResult);
}
}
This is what I have so far
final Executor queryExecutor =
Executors.newFixedThreadPool(Math.min(animalList.size(), 10),
new ThreadFactory(){
public Thread newThread(Runnable r){
Thread t = new Thread(r);
t.setDaemon(true);
return t;
}
});
List<CompletableFuture<Collection<Animal>>> listFutureResult = animalList.stream()
.map(animal -> CompletableFuture.supplyAsync(
() -> queryDatabase(animal.getBreed()), queryExecutor))
.collect(Collectors.toList());
List<Animal> result = listFutureResult.stream()
.map(CompletableFuture::join)
.flatMap(subList -> subList.stream())
.collect(Collectors.toList());
1 - I'm not sure how to split the stream so that I can get 2 different animal lists, one for cats and one for dogs.
2 - Does this solution look reasonable?

First, consider just using
List<Animal> result = animalList.parallelStream()
.flatMap(animal -> queryDatabase(animal.getBreed()).stream())
.collect(Collectors.toList());
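even though it won’t give you the desired concurrency of up to ten (a parallel stream runs on the common Fork/Join pool, which by default is sized to the number of CPU cores). The simplicity might compensate for that. Regarding the other part, it’s as easy as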
Map<Boolean,List<Animal>> result = animalList.parallelStream()
.flatMap(animal -> queryDatabase(animal.getBreed()).stream())
.collect(Collectors.partitioningBy(animal -> animal.getSpecie() == cat));
List<Animal> catList = result.get(true), dogList = result.get(false);
In case you have more species than just cats and dogs, you may use Collectors.groupingBy(Animal::getSpecie) to get a map from species to list of animals.
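The groupingBy variant can be sketched as follows. This is a minimal, self-contained illustration: the Animal class, its String-typed species, and the in-memory queryDatabase stand-in are assumptions made for the example, not the asker's actual types.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupBySpeciesSketch {
    // Hypothetical stand-in for the Animal type from the question.
    static final class Animal {
        private final String name, specie, breed;
        Animal(String name, String specie, String breed) {
            this.name = name; this.specie = specie; this.breed = breed;
        }
        String getName() { return name; }
        String getSpecie() { return specie; }
        String getBreed() { return breed; }
    }

    // Hypothetical in-memory "database"; a real implementation would run a query.
    static final List<Animal> DATABASE = Arrays.asList(
            new Animal("Tom", "cat", "siamese"),
            new Animal("Luna", "cat", "siamese"),
            new Animal("Rex", "dog", "beagle"),
            new Animal("Polly", "parrot", "macaw"));

    static Collection<Animal> queryDatabase(String breed) {
        return DATABASE.stream()
                .filter(a -> a.getBreed().equals(breed))
                .collect(Collectors.toList());
    }

    // One map entry per species instead of a fixed cat/dog split.
    static Map<String, List<Animal>> groupBySpecies(List<Animal> animalList) {
        return animalList.parallelStream()
                .flatMap(animal -> queryDatabase(animal.getBreed()).stream())
                .collect(Collectors.groupingBy(Animal::getSpecie));
    }

    public static void main(String[] args) {
        List<Animal> input = Arrays.asList(DATABASE.get(0), DATABASE.get(2), DATABASE.get(3));
        groupBySpecies(input).forEach((specie, animals) ->
                System.out.println(specie + ": " + animals.size() + " animal(s)"));
    }
}
```

Unlike partitioningBy, the resulting map only contains keys for species that actually occurred, so a lookup for an absent species returns null rather than an empty list.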
If you insist on using your own thread pool, a few things can be improved:
Executor queryExecutor = Executors.newFixedThreadPool(Math.min(animalList.size(), 10),
r -> {
Thread t = new Thread(r);
t.setDaemon(true);
return t;
});
List<Animal> result = animalList.stream()
.map(animal -> CompletableFuture.completedFuture(animal.getBreed())
.thenApplyAsync(breed -> queryDatabase(breed), queryExecutor))
.collect(Collectors.toList()).stream()
.flatMap(cf -> cf.join().stream())
.collect(Collectors.toList());
Your supplyAsync variant required capturing the actual Animal instance, creating a new Supplier for each animal. In contrast, the function passed to thenApplyAsync is invariant, performing the same operation for each parameter value. The code above assumes that getBreed is a cheap operation; if it is not, it wouldn’t be hard to pass the Animal instance to completedFuture and call getBreed() inside the async function instead.
The .map(CompletableFuture::join) can be replaced by a simple chained .join() within the flatMap function. Otherwise, if you prefer method references, you should use them consistently, i.e. .map(CompletableFuture::join).flatMap(Collection::stream).
Of course, this variant also allows using partitioningBy instead of toList.
As a final note, if you invoke shutdown on the executor service after use, there is no need to mark the threads as daemon:
ExecutorService queryExecutor=Executors.newFixedThreadPool(Math.min(animalList.size(),10));
Map<Boolean,List<Animal>> result = animalList.stream()
.map(animal -> CompletableFuture.completedFuture(animal.getBreed())
.thenApplyAsync(breed -> queryDatabase(breed), queryExecutor))
.collect(Collectors.toList()).stream()
.flatMap(cf -> cf.join().stream())
.collect(Collectors.partitioningBy(animal -> animal.getSpecie() == cat));
List<Animal> catList = result.get(true), dogList = result.get(false);
queryExecutor.shutdown();
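If one of the queries can throw, join() rethrows the failure and the shutdown() call above would be skipped. A possible way to guard against that is a try/finally block, sketched below. The String-based queryDatabase is a hypothetical stand-in, and the Math.max(1, ...) guard is an addition for the example, since newFixedThreadPool rejects a pool size of zero.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class ShutdownSketch {
    // Hypothetical stand-in for the database query: two fake rows per breed.
    static List<String> queryDatabase(String breed) {
        return Arrays.asList(breed + "-1", breed + "-2");
    }

    static List<String> queryAll(List<String> breeds) {
        // newFixedThreadPool(0) throws IllegalArgumentException, hence the lower bound of 1.
        ExecutorService queryExecutor =
                Executors.newFixedThreadPool(Math.max(1, Math.min(breeds.size(), 10)));
        try {
            return breeds.stream()
                    .map(breed -> CompletableFuture.supplyAsync(() -> queryDatabase(breed), queryExecutor))
                    .collect(Collectors.toList()).stream() // submit every query before joining any
                    .flatMap(cf -> cf.join().stream())
                    .collect(Collectors.toList());
        } finally {
            // Runs even if a join() rethrows a query failure, so the pool threads can exit.
            queryExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints [siamese-1, siamese-2, beagle-1, beagle-2]
        System.out.println(queryAll(Arrays.asList("siamese", "beagle")));
    }
}
```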

Related

Wrapping and turning a single CompletableFuture<OlderCat> into a bulk operation with a result of CompletableFuture<Map<Cat.name, OlderCat>>

We have an async method:
public CompletableFuture<OlderCat> asyncGetOlderCat(String catName)
Given a list of Cats:
List<Cat> cats;
We would like to create a bulk operation that results in a map from each cat name to its async result:
public CompletableFuture<Map<String, OlderCat>>
We would also like a cat to be left out of the map if asyncGetOlderCat throws an exception for it.
We were following this post and also this one and we came up with this code:
List<Cat> cats = ...
Map<String, CompletableFuture<OlderCat>> completableFutures = cats
.stream()
.collect(Collectors.toMap(Cat::getName,
c -> asyncGetOlderCat(c.getName())
.exceptionally( ex -> /* null?? */ )
));
CompletableFuture<Void> allFutures = CompletableFuture
.allOf(completableFutures.values().toArray(new CompletableFuture[completableFutures.size()]));
return allFutures.thenApply(future -> completableFutures.keySet().stream()
.map(CompletableFuture::join) ???
.collect(Collectors.toMap(????)));
But it is not clear how, in the allFutures callback, we can get access to the cat name and how to match each OlderCat with its catName.
Can it be achieved?

You are almost there. You don't need to put an exceptionally() on the initial futures, but you should use handle() instead of thenApply() after the allOf(), because if any future fails, the allOf() will fail as well.
When processing the futures, you can then just filter out the failing ones from the result, and rebuild the expected map:
Map<String, CompletableFuture<OlderCat>> completableFutures = cats
.stream()
.collect(toMap(Cat::getName, c -> asyncGetOlderCat(c.getName())));
CompletableFuture<Void> allFutures = CompletableFuture
.allOf(completableFutures.values().toArray(new CompletableFuture[0]));
return allFutures.handle((dummy, ex) ->
completableFutures.entrySet().stream()
.filter(entry -> !entry.getValue().isCompletedExceptionally())
.collect(toMap(Map.Entry::getKey, e -> e.getValue().join())));
Note that the calls to join() are guaranteed to be non-blocking, since the function passed to handle() will only be executed after all futures are completed.
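The allOf() plus handle() approach can be tried out with a small runnable sketch. Everything concrete here is an assumption for illustration: cats are represented by their names only, OlderCat is modelled as a plain String, and asyncGetOlderCat fails on purpose for the name "Ghost".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class BulkOlderCatSketch {
    // Hypothetical async lookup: fails for "Ghost", succeeds for every other name.
    static CompletableFuture<String> asyncGetOlderCat(String catName) {
        return CompletableFuture.supplyAsync(() -> {
            if (catName.equals("Ghost")) {
                throw new IllegalStateException("no such cat: " + catName);
            }
            return "older " + catName;
        });
    }

    static CompletableFuture<Map<String, String>> getOlderCats(List<String> catNames) {
        // Note: toMap throws IllegalStateException if two cats share a name.
        Map<String, CompletableFuture<String>> futures = catNames.stream()
                .collect(Collectors.toMap(name -> name, BulkOlderCatSketch::asyncGetOlderCat));

        return CompletableFuture
                .allOf(futures.values().toArray(new CompletableFuture[0]))
                // handle() runs whether allOf() completed normally or exceptionally.
                .handle((ignored, ex) -> futures.entrySet().stream()
                        .filter(e -> !e.getValue().isCompletedExceptionally())
                        .collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue().join())));
    }

    public static void main(String[] args) {
        Map<String, String> result = getOlderCats(Arrays.asList("Tom", "Ghost", "Luna")).join();
        System.out.println(result.keySet()); // contains Tom and Luna, but not Ghost
    }
}
```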

As I understand it, you need a single CompletableFuture that holds all results. The code below does that:
public CompletableFuture<Map<String, OlderCat>> getOlderCats(List<Cat> cats) {
return CompletableFuture.supplyAsync(
() -> {
Map<String, CompletableFuture<OlderCat>> completableFutures = cats
.stream()
.collect(Collectors.toMap(Cat::getName,
c -> asyncGetOlderCat(c.getName())
.exceptionally(ex -> {
ex.printStackTrace();
// if exception happens - return null
// if you don't want null - save failed ones to separate list and process them separately
return null;
}))
);
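return completableFutures
.entrySet()
.stream()
// Collectors.toMap rejects null values, so drop the failed lookups first
.filter(e -> e.getValue().join() != null)
.collect(Collectors.toMap(
Map.Entry::getKey,
e -> e.getValue().join()
));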
}
);
}
This returns a future whose supplier creates the individual completable futures and then waits for all of them at the end.

Efficient way to use fork join pool with multiple parallel streams

I have three streams, each of which makes HTTP requests. All of the calls are independent, so I use parallel streams and collect the results from the HTTP responses.
Currently I am using three separate parallel streams for these operations.
Map<String, ClassA> list1 = listOfClassX.stream().parallel()
.map(item -> {
ClassA instanceA = httpGetCall(item.id);
return instanceA;
})
.collect(Collectors.toConcurrentMap(item -> item.id, item -> item));
Map<String, ClassB> list2 = listOfClassY.stream().parallel()
.map(item -> {
ClassB instanceB = httpGetCall(item.id);
return instanceB;
})
.collect(Collectors.toConcurrentMap(item -> item.id, item -> item));
Map<String, ClassC> list3 = listOfClassZ.stream().parallel()
.map(item -> {
ClassC instanceC = httpGetCall(item.id);
return instanceC;
})
.collect(Collectors.toConcurrentMap(item -> item.id, item -> item));
The three parallel streams run one after another, even though the calls are independent of each other.
Will the common fork join pool help here to make better use of the thread pool?
Is there any other way to optimize the performance of this code further?

For loop including if to parallelStream() expression

Is there a way to parallelize this piece of code:
HashMap<String, Car> cars;
List<Car> snapshotCars = new ArrayList<>();
...
for (final Car car : cars.values()) {
if (car.isTimeInTimeline(curTime)) {
car.updateCalculatedPosition(curTime);
snapshotCars.add(car);
}
}
Update: This is what I tried before asking for assistance:
snapshotCars.addAll(cars.values().parallelStream()
.filter(c -> c.isTimeInTimeline(curTime))
.collect(Collectors.toList()));
How could I integrate this line? ->
car.updateCalculatedPosition(curTime);

Well, assuming that updateCalculatedPosition does not affect state outside of the Car object on which it runs, it may be safe enough to use peek for this:
List<Car> snapshotCars = cars.values()
.parallelStream()
.filter(c -> c.isTimeInTimeline(curTime))
.peek(c -> c.updateCalculatedPosition(curTime))
.collect(Collectors.toCollection(ArrayList::new));
I say this is "safe enough" because the collect dictates which elements will be peeked by peek, and these will necessarily be all the items that passed the filter. However, read this answer for the reason why peek should generally be avoided for "significant" operations.
Your peek-free alternative is to first filter and collect, and then update using the finished collection:
List<Car> snapshotCars = cars.values()
.parallelStream()
.filter(c -> c.isTimeInTimeline(curTime))
.collect(Collectors.toCollection(ArrayList::new));
snapshotCars.parallelStream()
.forEach(c -> c.updateCalculatedPosition(curTime));
This is safer from an API point of view, but less parallel - you only start updating the positions after you have finished filtering and collecting.

If you want parallelized access to a List, you might want to use Collections.synchronizedList to get a thread-safe list:
List<Car> snapshotCars = Collections.synchronizedList(new ArrayList<>());
Then you can use the stream API like so:
cars.values()
.parallelStream()
.filter(car -> car.isTimeInTimeline(curTime))
.forEach(car -> {
car.updateCalculatedPosition(curTime);
snapshotCars.add(car);
});

In addition to RealSkeptic’s answer, you can alternatively use your own collector:
List<Car> snapshotCars = cars.values().parallelStream()
.filter(c -> c.isTimeInTimeline(curTime))
.collect(ArrayList::new,
(l,c) -> { c.updateCalculatedPosition(curTime); l.add(c); },
List::addAll);
Note that .collect(Collectors.toList()) is equivalent (though not necessarily identical) to .collect(Collectors.toCollection(ArrayList::new)) which is equivalent to .collect(ArrayList::new, List::add, List::addAll).
So our custom collector does a similar operation, but replaces the accumulator with a function, which also performs the desired additional operation.
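To see the three-argument collect in action, here is a small runnable sketch. The Car class below is a hypothetical stand-in with just enough state to show that every collected car was updated; the real class will look different.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CollectWithUpdateSketch {
    // Hypothetical Car: "in timeline" when the time falls within [start, end].
    static final class Car {
        final long start, end;
        long calculatedAt = -1; // -1 means "never updated"
        Car(long start, long end) { this.start = start; this.end = end; }
        boolean isTimeInTimeline(long t) { return start <= t && t <= end; }
        void updateCalculatedPosition(long t) { calculatedAt = t; }
    }

    static List<Car> snapshot(Map<String, Car> cars, long curTime) {
        return cars.values().parallelStream()
                .filter(c -> c.isTimeInTimeline(curTime))
                .collect(ArrayList::new,                       // one container per worker
                         (list, c) -> {                        // accumulator: update, then add
                             c.updateCalculatedPosition(curTime);
                             list.add(c);
                         },
                         List::addAll);                        // combiner: merge partial lists
    }

    public static void main(String[] args) {
        Map<String, Car> cars = new HashMap<>();
        cars.put("a", new Car(0, 10));
        cars.put("b", new Car(20, 30));
        cars.put("c", new Car(5, 6));
        System.out.println(snapshot(cars, 5).size()); // prints 2
    }
}
```

Each worker thread accumulates into its own ArrayList, so no synchronized list is needed; the update is safe as long as updateCalculatedPosition only touches the car it is called on.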

Java Multi-Thread Executor InvokeAll Problems

The code I'm having problems with is:
Executor executor = (Executor) callList;
List<ProgState> newProgList = executor.invokeAll(callList).stream()
.map(future -> {try {return future.get();} catch(Exception e){e.printStackTrace(); return null;}})
.filter(p -> p!=null).collect(Collectors.toList());
The method invokeAll(List<Callable<ProgState>>) is undefined for the type Executor
I am told I should use an executor like the one in the code snippet.
The Callables are defined within the following code:
List<Callable<ProgState>> callList = (List<Callable<ProgState>>) lst.stream()
.map(p -> ((Callable<ProgState>)(() -> {return p.oneStep();})))
.collect(Collectors.toList());
Here is the teacher's code:
//prepare the list of callables
List<Callable<PrgState>> callList = prgList.stream().map(p -> (() -> {return p.oneStep();})).collect(Collectors.toList());
//start the execution of the callables
//it returns the list of new created threads
List<PrgState> newPrgList = executor.invokeAll(callList).stream()
.map(future -> { try {
return future.get();
}
catch(Exception e) {
//here you can treat the possible
// exceptions thrown by statements
// execution
return null; // filtered out below
}
})
.filter(p -> p!=null).collect(Collectors.toList());
//add the new created threads to the list of existing threads
prgList.addAll(newPrgList);

If you can use stream(), why not use parallelStream()? It would be much simpler.
List<PrgState> prgStates = prgList.parallelStream()
.map(p -> p.oneStep())
.collect(Collectors.toList());
This way you have no thread pool to configure, start or stop when finished.
Some might suggest that parallelStream() was the main reason for adding Stream and lambdas to Java 8 in the first place. ;)

You can't cast a list of Callables to an Executor. You need to create an ExecutorService, which will in turn pick up the callables and execute them on one or more threads in parallel.
This is what I think you are after:
ExecutorService executor = Executors.newCachedThreadPool();//change executor type as per your need.
List<ProgState> newProgList = executor.invokeAll(callList).stream().map(future -> {...
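A complete version of that idea might look like the sketch below. It relies on invokeAll being declared on ExecutorService (the plain Executor interface only offers execute(Runnable)). The String results stand in for ProgState, and the pool size of 2 is an arbitrary choice for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class InvokeAllSketch {
    static List<String> runAll(List<Callable<String>> callList) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            // invokeAll blocks until every callable has finished (or failed).
            return executor.invokeAll(callList).stream()
                    .map(future -> {
                        try {
                            return future.get();
                        } catch (InterruptedException | ExecutionException e) {
                            return null; // every path of the lambda must return a value
                        }
                    })
                    .filter(Objects::nonNull) // drop the failed tasks
                    .collect(Collectors.toList());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> calls = Arrays.asList(
                () -> "a",
                () -> { throw new IllegalStateException("boom"); },
                () -> "b");
        System.out.println(runAll(calls)); // prints [a, b]
    }
}
```

invokeAll returns the futures in the same order as the input list, which is why the surviving results keep their relative order.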

Execute multiple queries in parallel via Streams

I am having the following method:
public String getResult() {
List<String> serversList = getServerListFromDB();
List<String> appList = getAppListFromDB();
List<String> userList = getUserFromDB();
return getResult(serversList, appList, userList);
}
Here I am calling three methods sequentially. Each of them hits the DB and fetches results, and then I do post-processing on the results I got back. I know how to call these three methods concurrently using Threads, but I would like to use a Java 8 parallel Stream to achieve this. Can someone please guide me on how to do the same via parallel Streams?
EDIT: I just want to call the methods in parallel via a Stream.
private void getInformation() {
method1();
method2();
method3();
method4();
method5();
}

You may utilize CompletableFuture this way:
public String getResult() {
// Create Stream of tasks:
Stream<Supplier<List<String>>> tasks = Stream.of(
() -> getServerListFromDB(),
() -> getAppListFromDB(),
() -> getUserFromDB());
List<List<String>> lists = tasks
// Supply all the tasks for execution and collect CompletableFutures
.map(CompletableFuture::supplyAsync).collect(Collectors.toList())
// Join all the CompletableFutures to gather the results
.stream()
.map(CompletableFuture::join).collect(Collectors.toList());
// Use the results. They are guaranteed to be ordered in the same way as the tasks
return getResult(lists.get(0), lists.get(1), lists.get(2));
}

As already mentioned, a standard parallel stream is probably not the best fit for your use case. I would complete each task asynchronously using an ExecutorService and "join" them when calling the getResult method:
ExecutorService es = Executors.newFixedThreadPool(3);
Future<List<String>> serversList = es.submit(() -> getServerListFromDB());
Future<List<String>> appList = es.submit(() -> getAppListFromDB());
Future<List<String>> userList = es.submit(() -> getUserFromDB());
return getResult(serversList.get(), appList.get(), userList.get());

forEach is the operation intended for side effects, and you can call it on a parallel stream. For example:
listOfTasks.parallelStream().forEach(list->{
submitToDb(list);
});
However, parallelStream uses the common ForkJoinPool which is arguably not good for IO-bound tasks.
Consider using CompletableFuture and supplying an appropriate ExecutorService. It gives more flexibility (continuations, configuration). For example:
ExecutorService executorService = Executors.newCachedThreadPool();
List<CompletableFuture<String>> allFutures = new ArrayList<>();
for(Query query:queries){
CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
// submit query to db
return result;
}, executorService);
allFutures.add(future);
}
CompletableFuture<Void> all = CompletableFuture.allOf(allFutures.toArray(new CompletableFuture[allFutures.size()]));
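The snippet above stops at allOf(), which only yields a CompletableFuture<Void>. One way to get from there to the actual results is sketched below; runQuery and the String query type are hypothetical placeholders for the real database call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class AllOfSketch {
    // Hypothetical stand-in for submitting a query to the database.
    static String runQuery(String query) {
        return "result of " + query;
    }

    static List<String> runAll(List<String> queries) {
        ExecutorService executorService = Executors.newCachedThreadPool();
        try {
            List<CompletableFuture<String>> allFutures = queries.stream()
                    .map(q -> CompletableFuture.supplyAsync(() -> runQuery(q), executorService))
                    .collect(Collectors.toList());

            CompletableFuture<Void> all =
                    CompletableFuture.allOf(allFutures.toArray(new CompletableFuture[0]));

            // Once "all" is done, every individual join() returns immediately.
            CompletableFuture<List<String>> results = all.thenApply(v ->
                    allFutures.stream()
                            .map(CompletableFuture::join)
                            .collect(Collectors.toList()));

            return results.join(); // blocks the caller until every query has finished
        } finally {
            executorService.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints [result of q1, result of q2]
        System.out.println(runAll(Arrays.asList("q1", "q2")));
    }
}
```

If any query throws, the final join() rethrows the failure wrapped in a CompletionException, so callers that need partial results would have to handle failures per future instead.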

It is not quite clear what you mean, but if you just want to process these lists in parallel, you can do something like this:
List<String> list1 = Arrays.asList("1", "234", "33");
List<String> list2 = Arrays.asList("a", "b", "cddd");
List<String> list3 = Arrays.asList("1331", "22", "33");
List<List<String>> listOfList = Arrays.asList(list1, list2, list3);
listOfList.parallelStream().forEach(list -> System.out.println(list.stream().max((o1, o2) -> Integer.compare(o1.length(), o2.length()))));
(This prints the longest element of each list, wrapped in an Optional; the order of the output lines may vary because the stream is parallel.)
