Difference between parallel stream and CompletableFuture


In the book "Java 8 in Action" (by Urma, Fusco and Mycroft), they highlight that parallel streams internally use the common fork/join pool and that, whilst this can be configured globally (e.g. using System.setProperty(...)), it is not possible to specify a value for a single parallel stream.
I have since seen the workaround that involves running the parallel stream inside a custom made ForkJoinPool.
Later in the book, they have an entire chapter dedicated to CompletableFuture, including a case study that compares the performance of a parallelStream vs. a CompletableFuture. It turns out their performance is very similar; they explain that this is because both use, by default, the same common pool (and therefore the same number of threads).
They go on to show a solution and argue that CompletableFuture is better in this circumstance because it can be configured to use a custom Executor, with a thread pool size of the user's choice. When they update the solution to use this, the performance improves significantly.
This made me think: if one were to do the same for the parallel stream version using the workaround above, would the performance benefits be similar, and would the two approaches therefore perform similarly again? In that case, why would one choose CompletableFuture over the parallel stream when it clearly takes more work on the developer's part?

In this case, why would one choose the CompletableFuture over the parallel stream when it clearly takes more work on the developer's part?
IMHO, this depends on the interface you want to support. Suppose you want to support an asynchronous API, e.g.:
CompletableFuture<String> downloadHttp(URL url);
In this case, only a CompletableFuture makes sense, because you may want to do something else unrelated while you wait for the data to come down.
On the other hand, parallelStream() is best for CPU-bound tasks where you want every task to perform a portion of some work, i.e. every thread is doing the same thing with different data. As you mention, it is also easier to use.
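To make the comparison in the question concrete, here is a minimal sketch that sizes the pool explicitly for both approaches. The slowLookup method is a hypothetical stand-in for a blocking call, and the ForkJoinPool trick relies on the observed behaviour that a parallel stream started from inside a ForkJoinPool task runs in that pool; treat it as an implementation detail rather than a documented guarantee.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class CustomPoolComparison {

    // Hypothetical stand-in for a slow, I/O-bound call.
    static int slowLookup(int x) {
        try {
            Thread.sleep(50); // simulate waiting on I/O
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return x * x;
    }

    // Workaround: run the parallel stream inside a custom ForkJoinPool.
    static List<Integer> withParallelStream(List<Integer> inputs, int threads) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.submit(() -> inputs.parallelStream()
                    .map(CustomPoolComparison::slowLookup)
                    .collect(Collectors.toList())).get();
        } finally {
            pool.shutdown();
        }
    }

    // CompletableFuture with an explicitly sized Executor.
    static List<Integer> withCompletableFutures(List<Integer> inputs, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Integer>> futures = inputs.stream()
                    .map(x -> CompletableFuture.supplyAsync(() -> slowLookup(x), executor))
                    .collect(Collectors.toList());
            // Join only after every future has been submitted, so the calls overlap.
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> inputs = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(withParallelStream(inputs, 8));
        System.out.println(withCompletableFutures(inputs, 8));
    }
}
```

Both versions return the squares in input order. The CompletableFuture version needs a little more code, but the pool size is an explicit, supported parameter instead of a side effect of where the stream happens to run.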

Related

CompletableFuture vs ExecutorService

I have found that the complex use cases that CompletableFuture handles can be done with ExecutorService as well, including exception handling.
The only difference I can see between them is that CompletableFuture offers better readability and convenience when coding.
Is there any practical advantage / use-case that one can solve using CompletableFuture, but not with ExecutorService?

Imagine the simple case: you do action A, and when it completes, you do B with the result of A, be that a Throwable or a valid result. Add to that the fact that there is no CompletableFuture yet, and you do not want to block. Is it doable with Futures only? Well, yes. Is it painful to achieve? Very.
Do not get fooled by the "simplicity" of its API. In your previous question you already saw what happens when you do not specify an ExecutorService for some actions. There are many, many more subtle things that come with the package of that "simplicity".
The non-blocking capabilities of CompletableFuture have made it highly usable and widely adopted. Not only by ordinary developers like us, but by the JDK developers too: the entire JDK HTTP client is built around CompletableFuture, for a reason.
And your question is not vs ExecutorService, but should really be vs Future. An ExecutorService can still be passed to methods that CompletableFuture provides. For example read this Q&A for what various methods do and act like.
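The "do A, then B with either the result or the Throwable, without blocking" case above maps directly onto handle/handleAsync, and an ExecutorService can be passed in to control where each stage runs. A minimal sketch; actionA and aThenB are hypothetical names used only for illustration:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThenHandleExample {

    // Hypothetical action A: fails for negative input, otherwise returns a value.
    static int actionA(int input) {
        if (input < 0) {
            throw new IllegalArgumentException("negative input: " + input);
        }
        return input * 2;
    }

    // Action B receives either A's result or A's failure; nothing here blocks the caller.
    static CompletableFuture<String> aThenB(int input, ExecutorService executor) {
        return CompletableFuture
                .supplyAsync(() -> actionA(input), executor)
                .handleAsync((result, error) -> {
                    if (error == null) {
                        return "A succeeded with " + result;
                    }
                    // Failures from supplyAsync arrive wrapped in a CompletionException.
                    Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                            ? error.getCause() : error;
                    return "A failed: " + cause.getMessage();
                }, executor);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> ok = aThenB(21, executor);
            CompletableFuture<String> bad = aThenB(-1, executor);
            // join() is used only to print the outcome for this demo.
            System.out.println(ok.join());  // A succeeded with 42
            System.out.println(bad.join()); // A failed: negative input: -1
        } finally {
            executor.shutdown();
        }
    }
}
```

Achieving the same with plain Futures would mean parking a thread in get() or writing your own callback plumbing.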

The main advantage of CompletableFuture (at least for me) is that it allows you to build complex asynchronous processes with a higher probability of getting them right the first time, while still being able to understand the logic later on.
Of course, most of it can be implemented using ExecutorService, which would require subclassing it, overriding some methods, re-submitting the task... Most of this is covered by CompletableFuture out of the box. Moreover, with every JDK update, best practices get integrated into the CompletableFuture class.
Thus, the practical reason to use CompletableFuture is the simplicity (well, you still need to understand how to use it).
For example :
CompletableFuture.supplyAsync(...)
.completeOnTimeout("foo" , 1, TimeUnit.SECONDS);
is not as simple to implement with ExecutorService (completeOnTimeout is available since Java 9).
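A runnable sketch of the point above, assuming Java 9 or later: the CompletableFuture version attaches a fallback value without blocking anyone, while the ExecutorService version has to block in get(timeout) and handle the timeout itself. slowTask is a hypothetical task that deliberately outlives the timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutFallback {

    // Hypothetical slow task that takes far longer than the timeout.
    static String slowTask() {
        try {
            Thread.sleep(2_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "real result";
    }

    // Non-blocking: the future completes with "foo" if the task is not done in time.
    static CompletableFuture<String> withCompletableFuture() {
        return CompletableFuture.supplyAsync(TimeoutFallback::slowTask)
                .completeOnTimeout("foo", 100, TimeUnit.MILLISECONDS);
    }

    // ExecutorService equivalent: the caller blocks in get() and supplies the fallback itself.
    static String withExecutorService(ExecutorService executor)
            throws InterruptedException, ExecutionException {
        Future<String> future = executor.submit(TimeoutFallback::slowTask);
        try {
            return future.get(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the still-running task
            return "foo";
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(withCompletableFuture().join());
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            System.out.println(withExecutorService(executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

With a 2-second task and a 100 ms timeout, both calls should print foo; the difference is that only the ExecutorService version ties up the calling thread while waiting.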

Why does Java have no async/await?

Using async/await it is possible to code asynchronous functions in an imperative style. This can greatly facilitate asynchronous programming. After it was first introduced in C#, it was adopted by many languages such as JavaScript, Python, and Kotlin.
EA Async is a library that adds async/await like functionality to Java. The library abstracts away the complexity of working with CompletableFutures.
But why has async/await neither been added to Java SE, nor are there any plans to add it in the future?

The short answer is that the designers of Java try to eliminate the need for asynchronous methods instead of facilitating their use.
According to Ron Pressler's talk, asynchronous programming using CompletableFuture causes three main problems:
1. Branching or looping over the results of asynchronous method calls is not possible.
2. Stack traces cannot be used to identify the source of errors, and profiling becomes impossible.
3. It is viral: all methods that make asynchronous calls have to be asynchronous as well, i.e. the synchronous and asynchronous worlds don't mix.
While async/await solves the first problem, it can only partially solve the second and does not solve the third at all (e.g. all methods in C# doing an await have to be marked as async).
But why is asynchronous programming needed at all? Only to prevent threads from blocking, because threads are expensive. Thus, instead of introducing async/await in Java, the designers in Project Loom are working on virtual threads (aka fibers/lightweight threads), which aim to significantly reduce the cost of threads and thus eliminate the need for asynchronous programming. This would also make all three problems above obsolete.
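The first problem is easiest to see with a loop. In the sketch below, fetchPageAsync is a hypothetical asynchronous call; with blocking code, "fetch pages until END" is an ordinary while loop, but staying non-blocking with CompletableFuture forces the loop to be rewritten as recursion through thenCompose.

```java
import java.util.concurrent.CompletableFuture;

public class AsyncLoop {

    // Hypothetical async call: returns page n, or "END" once there are no more pages.
    static CompletableFuture<String> fetchPageAsync(int n) {
        return CompletableFuture.supplyAsync(() -> n < 3 ? "page-" + n : "END");
    }

    // Blocking style: an ordinary while loop (join() blocks the current thread).
    static int countPagesBlocking() {
        int count = 0;
        while (!fetchPageAsync(count).join().equals("END")) {
            count++;
        }
        return count;
    }

    // Non-blocking style: the loop becomes recursion, since each step depends on the previous result.
    static CompletableFuture<Integer> countPagesAsync(int count) {
        return fetchPageAsync(count).thenCompose(page ->
                page.equals("END")
                        ? CompletableFuture.completedFuture(count)
                        : countPagesAsync(count + 1));
    }

    public static void main(String[] args) {
        System.out.println(countPagesBlocking());      // 3
        System.out.println(countPagesAsync(0).join()); // 3
    }
}
```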
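To illustrate the virtual-thread alternative, here is a minimal sketch assuming JDK 21 or later, where virtual threads are a final feature (JEP 444). The code is plain blocking code; the point is that each blocking task gets its own cheap virtual thread, so thousands of them can wait at the same time. blockingWork is a hypothetical stand-in for I/O.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VirtualThreadsSketch {

    // Plain blocking code: sleeps (standing in for I/O) and returns a value.
    static int blockingWork(int i) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(100));
        return i;
    }

    static long sumWithVirtualThreads(int tasks) throws InterruptedException, ExecutionException {
        List<Future<Integer>> futures = new ArrayList<>();
        // One virtual thread per task; close() at the end of the try block waits for all of them.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                int n = i;
                futures.add(executor.submit(() -> blockingWork(n)));
            }
        }
        long sum = 0;
        for (Future<Integer> f : futures) {
            sum += f.get(); // every task has already finished, so this does not wait
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumWithVirtualThreads(10_000)); // 49995000
    }
}
```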

Better late than never!!!
Java is 10+ years late in trying to come up with lighter-weight units of execution that can be executed in parallel. As a side note, Project Loom also aims to expose 'delimited continuations' in Java, which, I believe, is nothing more than the good old 'yield' keyword of C# (again, almost 20 years late!!)
Java does recognize the need to solve the bigger problem that async/await solves (or rather, Tasks in C#, which are the big idea; async/await is more of a syntactic sugar: a highly significant improvement, but still not necessary to solve the actual problem of OS-mapped threads being heavier than desired).
Look at the proposal for Project Loom here: https://cr.openjdk.java.net/~rpressler/loom/Loom-Proposal.html
and navigate to the last section, 'Other Approaches'. You will see why Java does not want to introduce async/await.
Having said this, I don't really agree with the reasoning provided, neither in this proposal nor in Stephan's answer.
First, let us examine Stephan's answer:
1. async/await solves point 1 mentioned there (Stephan also acknowledges this further down the answer).
2. It is extra work for the framework and tools, for sure, but not at all for the programmers. Even with async/await, .NET debuggers are pretty good in this respect.
3. This I only partially agree with. The whole purpose of async/await is to elegantly mix the asynchronous world with synchronous constructs. But yes, you either need to declare the caller as async too or deal directly with the Task in the caller routine. However, Project Loom will not solve this in a meaningful way either. To fully benefit from lightweight virtual threads, even the caller routine must be executing on a virtual thread. Otherwise, what's the benefit? You will end up blocking an OS-backed thread!!! Hence even virtual threads need to be 'viral' in the code. On the contrary, in Java it will be easier not to notice that the routine you are calling is async and will block the calling thread (which is concerning if the calling routine is not itself executing on a virtual thread). The async keyword in C# makes the intent very clear and forces you to decide (in C# you can also block if you want, by asking for Task.Result, but most of the time the calling routine can just as easily be async itself).
Stephan is right when he says async programming is needed to prevent blocking (OS) threads, as (OS) threads are expensive. And that's precisely why virtual threads (or C# tasks) are needed. You should be able to 'block' on these tasks without losing sleep. Of course, to not lose sleep, either the calling routine itself should be a task or the blocking should be on non-blocking IO, with the framework being smart enough not to block the calling thread in that case (the power of continuations).
C# supports this, and the proposed Java feature aims to support it.
According to the proposed Java API, blocking on a virtual thread will require calling the vThread.join() method.
How is it really more beneficial than calling await workDoneByVThread()?
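For reference, this is roughly what the two Java options look like in the API that eventually shipped, assuming JDK 21 or later (Thread.ofVirtual() and Executors.newVirtualThreadPerTaskExecutor()). Thread.join() waits but returns nothing, so the result has to travel through a shared holder; submitting to a virtual-thread executor gives a value-returning Future instead. work() is a hypothetical task.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class JoinVersusFuture {

    // Hypothetical work done on a virtual thread.
    static String work() {
        return "done";
    }

    // join() blocks the caller until the virtual thread ends; the value is passed via a holder.
    static String viaJoin() throws InterruptedException {
        AtomicReference<String> result = new AtomicReference<>();
        Thread vThread = Thread.ofVirtual().start(() -> result.set(work()));
        vThread.join(); // cheap only if the caller is itself running on a virtual thread
        return result.get();
    }

    // Future.get() also blocks the caller, but returns the value directly.
    static String viaFuture() throws InterruptedException, ExecutionException {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            return executor.submit(JoinVersusFuture::work).get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(viaJoin());   // done
        System.out.println(viaFuture()); // done
    }
}
```

In both cases the caller blocks; whether that is cheap depends on whether the caller is itself a virtual thread, which is exactly the concern raised above.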
Now let us look at the Project Loom proposal's reasoning:
Continuations and fibers dominate async/await in the sense that async/await is easily implemented with continuations (in fact, it can be implemented with a weak form of delimited continuations known as stackless continuations, that don't capture an entire call-stack but only the local context of a single subroutine), but not vice-versa
I simply don't understand this statement. If someone does, please let me know in the comments.
For me, async/await is implemented using continuations, and as far as stack traces are concerned, since the fibers/virtual threads/tasks live within the virtual machine, it must be possible to manage that aspect. In fact, .NET tools do manage it.
While async/await makes code simpler and gives it the appearance of normal, sequential code, like asynchronous code it still requires significant changes to existing code, explicit support in libraries, and does not interoperate well with synchronous code
I have already covered this. Making no significant changes to existing code and having no explicit support in libraries will actually mean not using this feature effectively. Unless Java aims to transparently turn all threads into virtual threads, which it can't and isn't doing, this statement does not make sense to me.
As a core idea, I find no real difference between Java virtual threads and C# tasks, so much so that Project Loom is also aiming for a work-stealing scheduler by default, the same as the default scheduler used by .NET (https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskscheduler?view=net-5.0, scroll to the last Remarks section).
The only debate, it seems, is over what syntax should be adopted to consume these.
C# adopted:
A distinct class and interface as compared to existing threads
Very helpful syntactical sugar for marrying async with sync
Java is aiming for:
Same familiar interface of Java Thread
No special constructs apart from try-with-resources support for ExecutorService, so that the results of submitted tasks/virtual threads are automatically waited for (thus blocking the calling thread, virtual or not).
IMHO, Java's choices are worse than those of C#. Having a separate interface and class makes it very clear that the behavior is very different. Retaining the same old interface can lead to subtle bugs when a programmer does not realize that she is now dealing with something different, or when a library implementation changes to take advantage of the new constructs but ends up blocking the calling (non-virtual) thread.
Also, no special language syntax means that async code will remain difficult to read and reason about (I don't know why Java thinks programmers are in love with Java's Thread syntax and will be thrilled to learn that, instead of writing sync-looking code, they will be using the lovely Thread class).
Heck, even JavaScript now has async/await (with all its 'single-threadedness').

I released a new project, JAsync, which implements async/await-style programming in Java and uses Reactor as its low-level framework. It is in the alpha stage, and I need more suggestions and test cases.
This project makes the developer's asynchronous programming experience as close as possible to the usual synchronous programming, including both coding and debugging.
I think my project solves point 1 mentioned by Stephan.
Here is an example:
@RestController
@RequestMapping("/employees")
public class MyRestController {
    @Inject
    private EmployeeRepository employeeRepository;
    @Inject
    private SalaryRepository salaryRepository;

    // A standard JAsync async method must be annotated with the Async annotation and return a JPromise object.
    @Async()
    private JPromise<Double> _getEmployeeTotalSalaryByDepartment(String department) {
        double money = 0.0;
        // A Mono object can be transformed into a JPromise object, so we get a Mono object first.
        Mono<List<Employee>> empsMono = employeeRepository.findEmployeeByDepartment(department);
        // Transform the Mono object into a JPromise object.
        JPromise<List<Employee>> empsPromise = Promises.from(empsMono);
        // Use await, just like in ES and C#, to get the value of the JPromise without blocking the current thread.
        for (Employee employee : empsPromise.await()) {
            // findSalaryByEmployee also returns a Mono object; transform it into a JPromise as above, then await the result.
            Salary salary = Promises.from(salaryRepository.findSalaryByEmployee(employee.id)).await();
            money += salary.total;
        }
        // An async method must return a JPromise object, so we use the just method to wrap the result in a JPromise.
        return JAsync.just(money);
    }

    // This is a normal WebFlux method.
    @GetMapping("/{department}/salary")
    public Mono<Double> getEmployeeTotalSalaryByDepartment(@PathVariable String department) {
        // Use the unwrap method to transform the JPromise object back into a Mono object.
        return _getEmployeeTotalSalaryByDepartment(department).unwrap(Mono.class);
    }
}
In addition to coding, JAsync also greatly improves the debugging experience of async code.
When debugging, you can see all variables in the monitor window just like when debugging normal code. I will try my best to solve point 2 mentioned by Stephan.
For point 3, I don't think it is a big problem. Async/await is popular in C# and ES even though it does not solve this point either.

Task Executor vs Java 8 parallel streaming

I can't find a specific answer to the line of investigation that we've been asked to pursue.
I see that parallel streams may not perform well when using a small number of threads, and that apparently they don't behave well when the DB blocks the next request while processing the current one.
However, I find that the overhead of implementing a Task Executor versus parallel streams is huge; we've implemented a POC that takes care of our concurrency needs with just this one line of code:
List<Map<String, String>> listWithAllMaps = mappedValues.entrySet().parallelStream().map(e -> callPlugins(e))
.collect(Collectors.toList());
Whereas with a Task Executor, we'd need to implement the Runnable interface and write some cumbersome code just to get the runnables to return the values we read from the DB instead of being void, which would take several hours, if not days, of coding and produce less maintainable, more bug-prone code.
However, our CTO is still reluctant to use parallel streams due to unforeseen issues that could come up down the road.
So the question is: in an environment where I need to make several concurrent read-only queries to a database, using different Java components/REST calls for each query, is it preferable in any way to use a Task Executor instead of parallel streams, and if so, why?

Use the TaskExecutor as an Executor for a CompletableFuture.
List<CompletableFuture<Map<String, String>>> futures = mappedValues.entrySet().stream()
        .map(e -> CompletableFuture.supplyAsync(() -> callPlugins(e), taskExecutor))
        .collect(Collectors.toList());
List<Map<String, String>> listWithAllMaps = futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
Not sure how this is cumbersome. Yes, it is a bit more code, but with the advantage that you can easily configure the TaskExecutor and increase the number of threads, the queue size, etc.
DISCLAIMER: I typed this off the top of my head, so some minor things might be off in the code snippet.
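Here is a self-contained version of the snippet above that can be run as-is. It assumes a plain java.util.concurrent fixed thread pool in place of Spring's TaskExecutor (which extends java.util.concurrent.Executor, so it can be passed to supplyAsync the same way), and callPlugins is a hypothetical stub for the real per-entry query.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class PluginCalls {

    // Hypothetical stand-in for the question's callPlugins(e): one read-only query per entry.
    static Map<String, String> callPlugins(Map.Entry<String, String> e) {
        return Map.of(e.getKey(), e.getValue().toUpperCase());
    }

    static List<Map<String, String>> callAll(Map<String, String> mappedValues, ExecutorService executor) {
        List<CompletableFuture<Map<String, String>>> futures = mappedValues.entrySet().stream()
                .map(e -> CompletableFuture.supplyAsync(() -> callPlugins(e), executor))
                .collect(Collectors.toList());
        // Join only after all futures have been created, so the queries run concurrently.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> mappedValues = new LinkedHashMap<>();
        mappedValues.put("a", "x");
        mappedValues.put("b", "y");
        ExecutorService executor = Executors.newFixedThreadPool(4); // pool size is a tuning choice
        try {
            System.out.println(callAll(mappedValues, executor)); // [{a=X}, {b=Y}]
        } finally {
            executor.shutdown();
        }
    }
}
```

Collecting into a list before joining matters: joining inside the same stream pipeline would wait for each query before submitting the next one.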

How to convert an AbstractOnSubscribe to an Operator with backpressure support in RxJava?

I extended AbstractOnSubscribe to create my own OnSubscribe, named OnSubscribeInputStreamToLines, to be used with Observable.create(OnSubscribe<T>); it basically reads an InputStream and calls onNext for each line.
The thing is, I also need to do that when the InputStream is part of another Observable.
The easy solution would be to do the following:
Observable<InputStream> isObservable = ...;
isObservable
.flatMap(is -> Observable.create(new OnSubscribeInputStreamToLines(is)));
The thing is, that would not be very efficient, as it would create an Observable for each InputStream. I was thinking I might be able to do this using Observable.lift.
Is there a way I can easily convert my OnSubscribeInputStreamToLines to an Operator?
I'm mostly worried about backpressure issues, as I would call onNext for each line of an InputStream, and although AbstractOnSubscribe supports backpressure, I couldn't find an AbstractOperator that does the same.

The distinction here is that your OnSubscribeInputStreamToLines is an entry point into the Observable world whereas lift is an in-between operator for an existing sequence. Besides, the whole throughput might be dominated by the IO operation behind InputStream or the string processing in the operation so I wouldn't worry about that thin wrapper.
AbstractOnSubscribe captures the generator-aspect of operators which helps you build backpressure-aware value emitters (cold sources generally) where you can draft out how, when and what values are emitted.
AbstractOperator, on the other hand, can't be generalized this way, because Operators have more freedom in interacting with upstream values and downstream requests. They are highly customized to a specific task and have little to nothing in common. They can be built from a set of primitives, but that's it (I've written hundreds of them).
So don't be afraid of flatMapping over things.

Don't be bothered about creating another Observable for each InputStream. The overhead is probably not as large as you might think, especially compared to the overhead associated with lift.
I don't know the nature of the InputStreams you are consuming but you should probably consider Observable.using() to close those resources safely (on termination or unsubscription).
You are absolutely right to have hesitations about writing a backpressure supporting Operator. It is very tricky ground to be stepping on unless you are composing existing Operators.

Relinquish the thread/CPU until async call completes in Akka and Java?

I'm looking for the Java/Akka equivalent of Python's yield from or gevent's monkey patch.
Update
There has been some confusion in the comments about what this question is asking, so let me restate it:
If I have a future, how do I wait for the future to complete without blocking the thread AND without returning to the caller until the future is complete?
Let's say we have a method that blocks:
public Object foo() {
Object result = someBlockingCall();
return doSomethingWithResult(result);
}
To make this asynchronous, we would replace it with someAsyncCall() and pass it a callback:
public void foo() {
someAsyncCall(new Handler() {
public void onSuccess(Object result) {
Object message = doSomethingWithResult(result);
callerRef.tell(message, ActorRef.noSender());
}
});
}
The call to foo() now returns before the result is ready, so the caller no longer gets the result. We have to get the result back to the caller by passing a message. To convert synchronous code to asynchronous Akka code, a redesign of the caller is required.
What I'd like is async code that looks like synchronous code, like Python's gevent.
I want to write:
public Object foo() {
Future future = someAsyncCall();
// save stack frame, go back to the event loop and relinquish CPU
// so other events can use the thread,
// and come back when future is complete and restore stack frame
return yield(future);
}
This would allow me to make my synchronous code asynchronous without a redesign.
Is this possible?
Note:
The Play framework seems to fake this with async() and AsyncResult. But this won't work in general since I have to write the code that handles the AsyncResult which would look like the above callback handler.

I think trying to get back a more straightforward sync design, although an efficient one, is actually a good intention and a good idea (see for example here).
Quasar has facilities to obtain sync/blocking APIs that are still highly efficient from async APIs (see this blog post), which looks exactly what you're looking for.
The fundamental problem is not that the sync/blocking style itself is bad (actually, async and sync are dual styles and can be transformed into one another, see for example here), but rather that blocking Java's heavyweight threads is not efficient. It is not an abstraction problem but an implementation problem, so instead of giving up the easier thread abstraction only because the implementation is inefficient, I agree it is better for the future of your code to look for more efficient thread implementations.
As Roland hinted, Quasar adds lightweight threads or fibers to the JVM, so you can get the same performance of async frameworks without giving up the thread abstraction and regular imperative control flow constructs (sequence, loops etc.) available in the language.
It also unifies JVM/JDK's threads and its fibers under a common strand interface, so they can interoperate seamlessly, and provides a porting of java.util.concurrent to this unified concept.
On top of strands (either fibers or regular threads), Quasar also offers fully-fledged Erlang-style actors, blocking Go-like channels and dataflow programming, so you can choose the concurrent programming paradigm that best suits your skills and needs without being forced into one.
It also provides bindings for popular and standard technologies (as part of the Comsat project), so you can preserve your code assets because the porting effort will be minimal (if any). For the same reason you can also opt-out easily, should you choose to.
Currently Quasar has bindings for Java 7 and 8, Clojure under the Pulsar project and soon JetBrains' Kotlin. Being based on JVM bytecode instrumentation, Quasar can really work with any JVM language if an integration module is present, and it offers tools to build additional ones.

The answer to your question is “no”, and that is very much by design. Writing a method to be asynchronous means returning the Future as its result; the method itself will not perform the computation but will arrange for the result to be provided later. You can then pass this Future to the right place where it is further used, for example by transforming it using one of the many combinators (like map, recover, etc.).
Awaiting a strict result for the Future will have to block the current thread of execution, no matter which technology you use. With plain JVM threads you will block a real thread from the operating system, with Quasar you will block your Fiber, with Akka you will block your Actor (*); blocking means blocking, no way around that.
(*) In an Actor you would get the result via a message at a later point, and until that point you will have to switch the behavior such that new incoming messages are stashed, rejected or dropped, depending on your use-case.
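As a hedged illustration of "return the Future and transform it", here is the question's foo() rewritten in that style. This uses Java's CompletableFuture as a stand-in rather than Akka's own Future API: thenApply plays roughly the role of map and exceptionally the role of recover. someAsyncCall and doSomethingWithResult are hypothetical stubs modelled on the question.

```java
import java.util.concurrent.CompletableFuture;

public class AsyncFoo {

    // Hypothetical async call standing in for someAsyncCall() from the question.
    static CompletableFuture<Integer> someAsyncCall() {
        return CompletableFuture.supplyAsync(() -> 20);
    }

    static String doSomethingWithResult(Integer result) {
        return "result=" + (result + 1);
    }

    // foo() no longer produces the value itself; it returns a future describing the transformation.
    static CompletableFuture<String> foo() {
        return someAsyncCall()
                .thenApply(AsyncFoo::doSomethingWithResult) // roughly like map
                .exceptionally(error -> "fallback");       // roughly like recover
    }

    public static void main(String[] args) {
        // A real caller would keep composing; join() here only keeps the demo from exiting early.
        foo().thenAccept(System.out::println).join(); // result=21
    }
}
```

Note that the caller of foo() must change too: it receives a future rather than a value, which is exactly the redesign the question hoped to avoid.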
