Why is `parallelStream` faster than the `CompletableFuture` implementation?

I wanted to increase the performance of my backend REST API on a certain operation that polled multiple different external APIs sequentially and collected their responses and flattened them all into a single list of responses.
Having just recently learned about CompletableFutures, I decided to give it a go, and compare that solution with the one that involved simply changing my stream for a parallelStream.
Here is the code used for the benchmark test:
package com.foo;

import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

public class ConcurrentTest {

    static final List<String> REST_APIS =
            Arrays.asList("api1", "api2", "api3", "api4", "api5", "api6", "api7", "api8");

    MyTestUtil myTest = new MyTestUtil();
    long millisBefore; // used to benchmark

    @BeforeEach
    void setUp() {
        millisBefore = System.currentTimeMillis();
    }

    @AfterEach
    void tearDown() {
        System.out.printf("time taken : %.4fs\n",
                (System.currentTimeMillis() - millisBefore) / 1000d);
    }

    @Test
    void parallelSolution() { // 4s
        var parallel = REST_APIS.parallelStream()
                .map(api -> myTest.collectOneRestCall())
                .flatMap(List::stream)
                .collect(Collectors.toList());
        System.out.println("List of responses: " + parallel.toString());
    }

    @Test
    void futureSolution() throws Exception { // 8s
        var futures = myTest.collectAllResponsesAsync(REST_APIS);
        System.out.println("List of responses: " + futures.get()); // only blocks here
    }

    @Test
    void originalProblem() { // 32s
        var sequential = REST_APIS.stream()
                .map(api -> myTest.collectOneRestCall())
                .flatMap(List::stream)
                .collect(Collectors.toList());
        System.out.println("List of responses: " + sequential.toString());
    }
}
class MyTestUtil {

    public static final List<String> RESULTS = Arrays.asList("1", "2", "3", "4");

    List<String> collectOneRestCall() {
        try {
            TimeUnit.SECONDS.sleep(4); // simulating the await of the response
        } catch (Exception io) {
            throw new RuntimeException(io);
        } finally {
            return MyTestUtil.RESULTS; // always return something, for this demonstration
        }
    }

    CompletableFuture<List<String>> collectAllResponsesAsync(List<String> restApiUrlList) {
        /* Collecting the list of all the async requests that build a List<String>. */
        List<CompletableFuture<List<String>>> completableFutures = restApiUrlList.stream()
                .map(api -> nonBlockingRestCall())
                .collect(Collectors.toList());

        /* Creating a single Future that contains all the Futures we just created ("flatmap"). */
        CompletableFuture<Void> allFutures = CompletableFuture.allOf(completableFutures
                .toArray(new CompletableFuture[restApiUrlList.size()]));

        /* When all the Futures have completed, we join them to create a merged List<String>. */
        CompletableFuture<List<String>> allCompletableFutures = allFutures
                .thenApply(future -> completableFutures.stream()
                        .map(CompletableFuture::join)
                        .filter(Objects::nonNull) // filter out failed calls, which completed with null
                        .flatMap(List::stream) // creating a List<String> from List<List<String>>
                        .collect(Collectors.toList())
                );

        return allCompletableFutures;
    }

    private CompletableFuture<List<String>> nonBlockingRestCall() {
        /* Manage the Exceptions here to ensure the wrapping Future returns the other calls. */
        return CompletableFuture.supplyAsync(() -> collectOneRestCall())
                .exceptionally(ex -> {
                    return null; // gets managed in the wrapping Future
                });
    }
}
There is a list of 8 (fake) APIs. Each call takes 4 seconds to complete and returns a list of 4 entities (Strings, in our case, for the sake of simplicity).
The results:
stream : 32 seconds
parallelStream : 4 seconds
CompletableFuture : 8 seconds
I'm quite surprised and expected the last two to be almost identical. What exactly is causing that difference? As far as I know, they are both using the ForkJoinPool.commonPool().
My naive interpretation would be that parallelStream, since it is a blocking operation, uses the actual MainThread for its workload and thus has an extra active thread to work with, compared to the CompletableFuture which is asynchronous and thus cannot use that MainThread.

CompletableFuture.supplyAsync() will end up using a ForkJoinPool initialized with a parallelism of Runtime.getRuntime().availableProcessors() - 1 (JDK 11 source).
So it looks like you have an 8-processor machine, and therefore there are 7 threads in the pool.
There are 8 API calls, so only 7 can run at a time on the common pool. For the completable futures test, there will be 8 tasks running with your main thread blocking until they all complete. Seven can execute at once, meaning the eighth has to wait 4 seconds for a free thread, giving the 8-second total.
parallelStream() also uses this same thread pool; however, the difference is that the first task will be executed on the main thread, the one executing the stream's terminal operation, leaving 7 to be distributed to the common pool. So there are just enough threads to run everything in parallel in this scenario. Try increasing the number of tasks to 9 and you will get the 8-second run-time for your test.
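For reference, here is a quick sketch you can run to check those numbers on your own machine (the class name is mine, not from the question):

import java.util.concurrent.ForkJoinPool;

public class PoolCheck {
    public static void main(String[] args) {
        // By default, commonPool() parallelism == availableProcessors() - 1,
        // because the thread that invokes a fork/join task is expected to help out.
        System.out.println("processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
    }
}

On an 8-processor machine this should print 8 and 7, matching the timings above.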

Related

Understanding Parallel Execution of chained CompletableFutures

I have a question about how Java Streams and chained CompletableFutures perform.
My question is this: if I run the following code, calling execute() with 10 items in the list takes ~11 seconds to complete (number of items in the list plus 1). This is because I have two threads working in parallel: the first executes the digItUp operation, and once that's complete, the second executes the fillItBackIn operation, and the first starts processing digItUp on the next item in the list.
If I comment out line 36 (.collect(Collectors.toList())), the execute() method takes ~20 seconds to complete. The threads do not operate in parallel; for each item in the list, the digItUp operation completes, and then the fillItBackIn operation completes in sequence before the next item in the list is processed.
It's unclear to me why the exclusion of (.collect(Collectors.toList())) should change this behavior. Can someone explain?
The complete class:
package com.test;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class SimpleExample {

    private final ExecutorService diggingThreadPool = Executors.newFixedThreadPool(1);
    private final ExecutorService fillingThreadPool = Executors.newFixedThreadPool(1);

    public SimpleExample() {
    }

    public static void main(String[] args) {
        List<Double> holesToDig = new ArrayList<>();
        Random random = new Random();
        for (int c = 0; c < 10; c++) {
            holesToDig.add(random.nextDouble(1000));
        }
        new SimpleExample().execute(holesToDig);
    }

    public void execute(List<Double> holeVolumes) {
        long start = System.currentTimeMillis();
        holeVolumes.stream()
                .map(volume -> {
                    CompletableFuture<Double> digItUpCF =
                            CompletableFuture.supplyAsync(() -> digItUp(volume), diggingThreadPool);
                    return digItUpCF.thenApplyAsync(volumeDugUp -> fillItBackIn(volumeDugUp), fillingThreadPool);
                })
                .collect(Collectors.toList())
                .forEach(cf -> {
                    Double volume = cf.join();
                    System.out.println("Dug a hole and filled it back in. Net volume: " + volume);
                });
        System.out.println("Dug up and filled back in " + holeVolumes.size() + " holes in "
                + (System.currentTimeMillis() - start) + " ms");
    }

    public Double digItUp(Double volume) {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
        }
        System.out.println("Dug hole with volume " + volume);
        return volume;
    }

    public Double fillItBackIn(Double volumeDugUp) {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
        }
        System.out.println("Filled back in hole of volume " + volumeDugUp);
        return 0.0;
    }
}
The reason is that collect(Collectors.toList()) is a terminal operation, hence it triggers the stream pipeline (remember that streams are evaluated lazily). So when you call collect, all of the CompletableFuture instances are constructed and placed in the list. This means you end up with a list of CompletableFutures, each of which is in turn a chain composed of two stages, let's call them X and Y.
Every time the first executor finishes an X stage, it is free to process the X stage of the next CompletableFuture, while the other executor is processing stage Y of the previous one. This is the result we intuitively expect.
On the other hand, when you don't call collect, forEach becomes the terminal operation. In that case every element in the stream is processed sequentially (to confirm, try switching to parallelStream()), hence stages X and Y are executed back to back for the first CompletableFuture. Only when stage Y of the previous stream element has finished will forEach move to the second element in the pipeline, and only then will a new CompletableFuture be mapped from the original Double value.
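A minimal, self-contained sketch of that laziness (class name and timings are illustrative, not from the question):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class LazyStreamDemo {

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3);

        long start = System.currentTimeMillis();
        // Without collect: map is fused with forEach, so each future is
        // created and joined before the next element is even mapped (~1.5s here).
        items.stream()
                .map(LazyStreamDemo::slowFuture)
                .forEach(CompletableFuture::join);
        System.out.println("fused:     " + (System.currentTimeMillis() - start) + " ms");

        start = System.currentTimeMillis();
        // With collect: all futures are created up front, so the async work
        // overlaps and the joins finish together (~0.5s here).
        items.stream()
                .map(LazyStreamDemo::slowFuture)
                .collect(Collectors.toList())
                .forEach(CompletableFuture::join);
        System.out.println("collected: " + (System.currentTimeMillis() - start) + " ms");
    }

    private static CompletableFuture<Integer> slowFuture(int i) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(500); // stand-in for one stage of work
            } catch (InterruptedException e) {
            }
            return i;
        });
    }
}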
Love this question, and M A's answer is awesome! I had a similar use case and was using RxJava there. It worked very well, but my colleagues challenged me to implement it without that. T.T
I tested your example and found a workaround that gets the same performance without collect. The trick is to let the cf.join() be executed in another thread.
.forEach(cf -> CompletableFuture.supplyAsync(cf::join, anotherThreadpool)
        // another thread pool for the join; or omit it to use the default ForkJoinPool.commonPool()
        .thenAccept(v -> System.out.println("Dug a hole and filled it back in. Net volume: " + v))
);
But I have to say, this might lead to problems because it lacks support for backpressure... if the upstream is infinite and fast but the consumer is too slow, all the rapidly created CompletableFutures in the map operator accumulate and get submitted to the first diggingThreadPool, eventually causing RejectedExecutionException, OOM, etc.
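One hedged way to bound that risk (my sketch, not part of the original answer): construct diggingThreadPool as a ThreadPoolExecutor with a small bounded queue and CallerRunsPolicy, so a too-fast producer ends up running tasks itself instead of queueing them without limit:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ExecutorService diggingThreadPool = new ThreadPoolExecutor(
        1, 1,                          // single worker, like Executors.newFixedThreadPool(1)
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(20),  // at most 20 queued digItUp tasks
        new ThreadPoolExecutor.CallerRunsPolicy()); // when full, the caller runs the task itself, which throttles it

This is crude compared to real backpressure, but it keeps the queue from growing without bound.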

Project Reactor, using a Flux sink outside of the creation lambda

When my service starts up, I want to construct a simple pipeline.
I'd like to isolate the Flux sink, or a Processor, to emit events with.
Events will be coming in from multiple threads and should be processed according to the pipeline's subscribeOn() specification, but everything seems to run on the main thread.
What is the best approach? I've attached my attempts below.
(I'm using reactor-core v3.2.8.RELEASE.)
import org.junit.jupiter.api.Test;

import reactor.core.publisher.DirectProcessor;
import reactor.core.publisher.Flux;
import reactor.core.publisher.FluxProcessor;
import reactor.core.publisher.FluxSink;
import reactor.core.scheduler.Schedulers;

/**
 * I want to construct my React pipelines during creation,
 * then emit events over the lifetime of my services.
 */
public class React1Test
{
    /**
     * Attempt 1 - use a DirectProcessor and send items to it.
     * Doesn't work though - seems to always run on the main thread.
     */
    @Test
    public void testReact1() throws InterruptedException
    {
        // Create the flux and sink.
        FluxProcessor<String, String> fluxProcessor = DirectProcessor.<String>create().serialize();
        FluxSink<String> sink = fluxProcessor.sink();

        // Create the pipeline.
        fluxProcessor
                .doOnNext(str -> showDebugMsg(str)) // What thread do ops work on?
                .subscribeOn(Schedulers.elastic())
                .subscribe(str -> showDebugMsg(str)); // What thread does subscribe run on?

        // Give the multi-thread pipeline a second.
        Thread.sleep(1000);

        // Time passes ... things happen ...
        // Pass a few messages to the sink, emulating events.
        sink.next("a");
        sink.next("b");
        sink.next("c");

        // It's multi-thread so wait a sec to receive.
        Thread.sleep(1000);
    }

    // Used down below during Flux.create().
    private FluxSink<String> sink2;

    /**
     * Attempt 2 - use Flux.create() and its FluxSink object.
     * Also seems to always run on the main thread.
     */
    @Test
    public void testReact2() throws InterruptedException
    {
        // Create the flux and sink.
        Flux.<String>create(sink -> sink2 = sink)
                .doOnNext(str -> showDebugMsg(str)) // What thread do ops work on?
                .subscribeOn(Schedulers.elastic())
                .subscribe(str -> showDebugMsg(str)); // What thread does subscribe run on?

        // Give the multi-thread pipeline a second.
        Thread.sleep(1000);

        // Pass a few messages to the sink.
        sink2.next("a");
        sink2.next("b");
        sink2.next("c");

        // It's multi-thread so wait a sec to receive.
        Thread.sleep(1000);
    }

    // Show us what thread we're on.
    private static void showDebugMsg(String msg)
    {
        System.out.println(String.format("%s [%s]", msg, Thread.currentThread().getName()));
    }
}
Output is always:
a [main]
a [main]
b [main]
b [main]
c [main]
c [main]
But what I would expect, is:
a [elastic-1]
a [elastic-1]
b [elastic-2]
b [elastic-2]
c [elastic-3]
c [elastic-3]
Thanks in advance.
You see [main] because you're calling onNext from the main thread.
The subscribeOn you're using only affects the subscription (the moment when create's lambda is triggered).
You will see elastic-* threads logged if you use publishOn instead of subscribeOn.
Also, consider using Processors; storing a sink obtained from Flux.create and similar operators in a field is discouraged.
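A hedged sketch of that suggestion, substituted into testReact1 above: publishOn moves everything downstream of it onto the given Scheduler, regardless of which thread calls sink.next():

fluxProcessor
        .publishOn(Schedulers.elastic())    // downstream operators hop onto elastic-* threads
        .doOnNext(str -> showDebugMsg(str)) // now logs an elastic-* thread
        .subscribe(str -> showDebugMsg(str));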
You can use parallel() and runOn() instead of subscribeOn() to get sink.next() to run multi-threaded.
bsideup is also correct - you can use publishOn() to coerce downstream operators to run on a different Scheduler thread.
Here is my updated code:
import org.junit.jupiter.api.Test;

import reactor.core.publisher.DirectProcessor;
import reactor.core.publisher.Flux;
import reactor.core.publisher.FluxProcessor;
import reactor.core.publisher.FluxSink;
import reactor.core.scheduler.Schedulers;

/**
 * I want to construct my React pipelines during creation,
 * then emit events over the lifetime of my services.
 */
public class React1Test
{
    /**
     * Version 1 - use a DirectProcessor to dynamically emit items.
     */
    @Test
    public void testReact1() throws InterruptedException
    {
        // Create the flux and sink.
        FluxProcessor<String, String> fluxProcessor = DirectProcessor.<String>create().serialize();
        FluxSink<String> sink = fluxProcessor.sink();

        // Create the pipeline.
        fluxProcessor
                .parallel()
                .runOn(Schedulers.elastic())
                .doOnNext(str -> showDebugMsg(str)) // What thread do ops work on?
                .subscribe(str -> showDebugMsg(str)); // What thread does subscribe run on?

        // Give the multi-thread pipeline a second.
        Thread.sleep(1000);

        // Time passes ... things happen ...
        // Pass a few messages to the sink, emulating events.
        sink.next("a");
        sink.next("b");
        sink.next("c");

        // It's multi-thread so wait a sec to receive.
        Thread.sleep(1000);
    }

    // Used down below during Flux.create().
    private FluxSink<String> sink2;

    /**
     * Version 2 - use Flux.create() and its FluxSink object.
     */
    @Test
    public void testReact2() throws InterruptedException
    {
        // Create the flux and sink.
        Flux.<String>create(sink -> sink2 = sink)
                .parallel()
                .runOn(Schedulers.elastic())
                .doOnNext(str -> showDebugMsg(str)) // What thread do ops work on?
                .subscribe(str -> showDebugMsg(str)); // What thread does subscribe run on?

        // Give the multi-thread pipeline a second.
        Thread.sleep(1000);

        // Pass a few messages to the sink.
        sink2.next("a");
        sink2.next("b");
        sink2.next("c");

        // It's multi-thread so wait a sec to receive.
        Thread.sleep(1000);
    }

    // Show us what thread we're on.
    private static void showDebugMsg(String msg)
    {
        System.out.println(String.format("%s [%s]", msg, Thread.currentThread().getName()));
    }
}
Both versions produce the desired multi-threaded output:
a [elastic-2]
b [elastic-3]
c [elastic-4]
b [elastic-3]
a [elastic-2]
c [elastic-4]

Generate infinite sequence of Natural numbers using RxJava

I am trying to write a simple program using RxJava to generate an infinite sequence of natural numbers. So far, I have found two ways to generate a sequence of numbers, using Observable.timer() and Observable.interval(). I am not sure these functions are the right way to approach this problem. I was expecting a simple function like the one we have in Java 8 to generate infinite natural numbers.
IntStream.iterate(1, value -> value + 1).forEach(System.out::println);
I tried using IntStream with Observable, but that does not work correctly. It sends the infinite stream of numbers to the first subscriber only. How can I correctly generate an infinite natural number sequence?
import rx.Observable;
import rx.functions.Action1;

import java.util.stream.IntStream;

public class NaturalNumbers {

    public static void main(String[] args) {
        Observable<Integer> naturalNumbers = Observable.<Integer>create(subscriber -> {
            IntStream stream = IntStream.iterate(1, val -> val + 1);
            stream.forEach(naturalNumber -> subscriber.onNext(naturalNumber));
        });

        Action1<Integer> first = naturalNumber -> System.out.println("First got " + naturalNumber);
        Action1<Integer> second = naturalNumber -> System.out.println("Second got " + naturalNumber);
        Action1<Integer> third = naturalNumber -> System.out.println("Third got " + naturalNumber);

        naturalNumbers.subscribe(first);
        naturalNumbers.subscribe(second);
        naturalNumbers.subscribe(third);
    }
}
The problem is that on naturalNumbers.subscribe(first);, the OnSubscribe you implemented is called, and you do a forEach over an infinite stream, which is why your program never terminates.
One way you could deal with it is to subscribe each observer asynchronously on a different thread. To see the results easily, I had to introduce a sleep into the Stream processing:
Observable<Integer> naturalNumbers = Observable.<Integer>create(subscriber -> {
    IntStream stream = IntStream.iterate(1, i -> i + 1);
    stream.peek(i -> {
        try {
            // Added to visibly see printing
            Thread.sleep(50);
        } catch (InterruptedException e) {
        }
    }).forEach(subscriber::onNext);
});

final Subscription subscribe1 = naturalNumbers
        .subscribeOn(Schedulers.newThread())
        .subscribe(first);
final Subscription subscribe2 = naturalNumbers
        .subscribeOn(Schedulers.newThread())
        .subscribe(second);
final Subscription subscribe3 = naturalNumbers
        .subscribeOn(Schedulers.newThread())
        .subscribe(third);

Thread.sleep(1000);
System.out.println("Unsubscribing");
subscribe1.unsubscribe();
subscribe2.unsubscribe();
subscribe3.unsubscribe();
Thread.sleep(1000);
System.out.println("Stopping");
Observable.Generate is exactly the operator to solve this class of problem reactively. I also assume this is a pedagogical example, since using an iterable for this is probably better anyway.
Your code produces the whole stream on the subscriber's thread. Since it is an infinite stream the subscribe call will never complete. Aside from that obvious problem, unsubscribing is also going to be problematic since you aren't checking for it in your loop.
You want to use a scheduler to solve this problem - certainly do not use subscribeOn since that would burden all observers. Schedule the delivery of each number to onNext - and as a last step in each scheduled action, schedule the next one.
Essentially this is what Observable.generate gives you - each iteration is scheduled on the provided scheduler (which defaults to one that introduces concurrency if you don't specify it). Scheduler operations can be cancelled and avoid thread starvation.
Rx.NET solves it like this (actually there is an async/await model that's better, but it's not available in Java AFAIK):
static IObservable<int> Range(int start, int count, IScheduler scheduler)
{
    return Observable.Create<int>(observer =>
    {
        return scheduler.Schedule(0, (i, self) =>
        {
            if (i < count)
            {
                Console.WriteLine("Iteration {0}", i);
                observer.OnNext(start + i);
                self(i + 1);
            }
            else
            {
                observer.OnCompleted();
            }
        });
    });
}
Two things to note here:
The call to Schedule returns a subscription handle that is passed back to the observer
The Schedule is recursive - the self parameter is a reference to the scheduler used to call the next iteration. This allows for unsubscription to cancel the operation.
Not sure how this looks in RxJava, but the idea should be the same. Again, Observable.generate will probably be simpler for you as it was designed to take care of this scenario.
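For reference, here is a hedged, untested sketch of the same recursive-scheduling idea in RxJava 1.x (matching the question's rx imports; Observable.generate remains the simpler route):

import java.util.concurrent.atomic.AtomicInteger;

import rx.Observable;
import rx.Scheduler;
import rx.functions.Action0;
import rx.schedulers.Schedulers;

Observable<Integer> naturals = Observable.<Integer>create(subscriber -> {
    Scheduler.Worker worker = Schedulers.computation().createWorker();
    subscriber.add(worker); // unsubscribing disposes the worker, which stops the recursion
    AtomicInteger next = new AtomicInteger(1);
    worker.schedule(new Action0() {
        @Override
        public void call() {
            if (!subscriber.isUnsubscribed()) {
                subscriber.onNext(next.getAndIncrement());
                worker.schedule(this); // schedule the next iteration, like the C# self(i + 1)
            }
        }
    });
});

Each emission is a separate scheduled action, so unsubscription is checked between iterations instead of being trapped inside an infinite loop.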
When creating infinite sequences, care should be taken to:
subscribe and observe on different threads; otherwise you will only serve a single subscriber
stop generating values as soon as the subscription terminates; otherwise runaway loops will eat your CPU
The first issue is solved by using subscribeOn(), observeOn() and the various schedulers.
The second issue is best solved by using the library-provided methods Observable.generate() or Observable.fromIterable(). They do the proper checking.
Check this:
Observable<Integer> naturalNumbers =
        Observable.<Integer, Integer>generate(() -> 1, (s, g) -> {
            logger.info("generating {}", s);
            g.onNext(s);
            return s + 1;
        }).subscribeOn(Schedulers.newThread());

Disposable sub1 = naturalNumbers
        .subscribe(v -> logger.info("1 got {}", v));
Disposable sub2 = naturalNumbers
        .subscribe(v -> logger.info("2 got {}", v));
Disposable sub3 = naturalNumbers
        .subscribe(v -> logger.info("3 got {}", v));

Thread.sleep(100);
logger.info("unsubscribing...");
sub1.dispose();
sub2.dispose();
sub3.dispose();
Thread.sleep(1000);
logger.info("done");

Java 8 lambda api

I'm working to migrate from RxJava to Java 8 lambdas. One example I can't find is a way to buffer requests. For example, in RxJava, I can say the following.
Observable.create(getIterator()).buffer(20, 1000, TimeUnit.MILLISECONDS).doOnNext(list -> doWrite(list));
Here we buffer 20 elements into a list, or time out at 1000 milliseconds, whichever happens first.
Observables in Rx are push-based, whereas Java Streams are pull-based. Would it be possible to implement this as my own map operation on streams, or does the inability to emit cause problems here, since the doOnNext has to poll the previous element?
One way to do it would be to use a BlockingQueue and Guava. Using Queues.drain, you can create a Collection that you could then call stream() on and do your transformations. Here's a link: Guava Queues.drain
And here's a quick example:
public void transform(BlockingQueue<Something> input) throws InterruptedException
{
    List<Something> buffer = new ArrayList<>(20);
    Queues.drain(input, buffer, 20, 1000, TimeUnit.MILLISECONDS);
    doWrite(buffer);
}
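To emulate Rx's buffer(20, 1000, MILLISECONDS) on a live queue, the drain would presumably run in a loop on its own thread; a hedged sketch (the method and loop are illustrative, not from the answer):

public void consumeLoop(BlockingQueue<Something> input) throws InterruptedException
{
    while (!Thread.currentThread().isInterrupted()) {
        List<Something> buffer = new ArrayList<>(20);
        // Blocks until 20 elements arrive or 1000 ms elapse, whichever comes first.
        Queues.drain(input, buffer, 20, 1000, TimeUnit.MILLISECONDS);
        if (!buffer.isEmpty()) {
            doWrite(buffer);
        }
    }
}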
simple-react has similar operators, but not this exact one. It's pretty extensible though, so it should be possible to write your own. With the caveat that I haven't written this in an IDE or tested it, a buffer-by-size-with-timeout operator for simple-react would look roughly like this:
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Supplier;

import com.aol.simple.react.async.Queue;
import com.aol.simple.react.async.Queue.ClosedQueueException;
import com.aol.simple.react.stream.traits.LazyFutureStream;
import com.aol.simple.react.util.SimpleTimer;

static <U> LazyFutureStream batchBySizeAndTime(LazyFutureStream<U> stream, int size, long time, TimeUnit unit) {
    Queue queue = stream.toQueue();
    Function<Supplier<U>, Supplier<Collection<U>>> fn = s -> {
        return () -> {
            SimpleTimer timer = new SimpleTimer();
            List<U> list = new ArrayList<>();
            try {
                do {
                    if (list.size() == size) // compare against the size parameter
                        return list;
                    list.add(s.get());
                } while (timer.getElapsedNanoseconds() < unit.toNanos(time));
            } catch (ClosedQueueException e) {
                throw new ClosedQueueException(list);
            }
            return list;
        };
    };
    return stream.fromStream(queue.streamBatch(stream.getSubscription(), fn));
}

What is the best / most elegant way to limit the number of concurrent evaluations (like with a fixedThreadPool) in parallel streams

Assume a lambda expression consumes a certain amount of a resource (like memory) which is limited, requiring us to limit the number of concurrent executions (for example: if the lambda temporarily consumes 100 MB of local memory and we would like to limit it to 1 GB, we do not allow more than 10 concurrent evaluations).
What is the best way to limit the number of concurrent execution, say for example in
IntStream.range(0, numberOfJobs).parallel().forEach(i -> { /*...*/ });
?
Note: An obvious option is to perform a nesting like
double jobsPerThread = (double) numberOfJobs / numberOfThreads;
IntStream.range(0, numberOfThreads).parallel().forEach(threadIndex ->
        IntStream.range((int) (threadIndex * jobsPerThread), (int) ((threadIndex + 1) * jobsPerThread))
                .sequential().forEach(i -> { /*...*/ }));
Is this the only way? It is not that elegant. Actually, I would like to have a
IntStream.range(0, numberOfJobs).parallel(numberOfThreads).forEach(i -> { /*...*/ });
Streams use a ForkJoinPool for parallel operations. By default, they use ForkJoinPool.commonPool(), which does not allow changing the concurrency afterwards. However, you can use your own ForkJoinPool instance: when you execute the stream code within a task running in your own ForkJoinPool, that pool will be used for the stream operations. The following example illustrates this by executing the same operation once using the default behavior and once using a custom pool with a fixed concurrency of 2:
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class InterfaceStaticMethod {
    public static void main(String[] arg) throws Exception {
        Runnable parallelCode = () -> {
            // synchronized: the parallel stream adds to this set from many threads
            Set<String> allThreads = Collections.synchronizedSet(new HashSet<>());
            IntStream.range(0, 1_000_000).parallel().filter(i -> {
                allThreads.add(Thread.currentThread().getName());
                return false;
            }).min();
            System.out.println("executed by " + allThreads);
        };

        System.out.println("default behavior: ");
        parallelCode.run();

        System.out.println("specialized pool:");
        ForkJoinPool pool = new ForkJoinPool(2);
        pool.submit(parallelCode).get();
    }
}
Depending on your use case, using the CompletableFuture utility methods may be easier:
import static java.util.concurrent.CompletableFuture.runAsync;

ExecutorService executor = Executors.newFixedThreadPool(10); // max 10 threads

for (int i = 0; i < numberOfJobs; i++) {
    runAsync(() -> { /* do something with i */ }, executor);
}

// or with a stream:
IntStream.range(0, numberOfJobs)
        .forEach(i -> runAsync(() -> { /* do something with i */ }, executor));
The main difference from your code is that the parallel forEach only returns after the last job is done, whereas runAsync returns as soon as all the jobs have been submitted. There are various ways to change that behaviour if required.
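For instance, a hedged sketch of one such way (process(i) is a hypothetical placeholder for the job): collect the futures and join on allOf, which restores the wait-until-done behaviour of the parallel forEach:

List<CompletableFuture<Void>> futures = IntStream.range(0, numberOfJobs)
        .mapToObj(i -> runAsync(() -> process(i), executor)) // process(i) stands in for the real work
        .collect(Collectors.toList());
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join(); // blocks until every job is done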
