RxJava .subscribeOn(Schedulers.newThread()) questions - java

I am on plain JDK 8. I have this simple RxJava example:
Observable
.from(Arrays.asList("one", "two", "three"))
.doOnNext(word -> System.out.printf("%s uses thread %s%n", word, Thread.currentThread().getName()))
//.subscribeOn(Schedulers.newThread())
.subscribe(word -> System.out.println(word));
and it prints out the words line by line, intertwined with information about the thread, which is 'main' for all next calls, as expected.
However, when I uncomment the subscribeOn(Schedulers.newThread()) call, nothing is printed at all. Why isn't it working? I would have expected it to start a new thread for each onNext() call and the doOnNext() to print that thread's name. Right now, I see nothing, also for the other schedulers.
When I add the call to Thread.sleep(10000L) at the end of my main, I can see the output, which would suggest the threads used by RxJava are all daemons. Is this the case? Can this be changed somehow, ideally by using a custom ThreadFactory or a similar concept, without having to implement a custom Scheduler?
With the mentioned change, the thread name is always RxNewThreadScheduler-1, whereas the documentation for newThread says "Scheduler that creates a new {@link Thread} for each unit of work". Isn't it supposed to create a new thread for all of the emissions?

As Vladimir mentioned, RxJava standard schedulers run work on daemon threads which terminate in your example because the main thread quits. I'd like to emphasise that they don't schedule each value on a new thread, but they schedule the stream of values for each individual subscriber on a newly created thread. Subscribing a second time would give you "RxNewThreadScheduler-2".
You don't really need to change the default schedulers, but just wrap your own Executor-based scheduler with Schedulers.from() and supply that as a parameter where needed:
ThreadPoolExecutor exec = new ThreadPoolExecutor(
0, 64, 2, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
exec.allowCoreThreadTimeOut(true);
Scheduler s = Schedulers.from(exec);
Observable
.from(Arrays.asList("one", "two", "three"))
.doOnNext(word -> System.out.printf("%s uses thread %s%n", word,
Thread.currentThread().getName()))
.subscribeOn(s)
.subscribe(word -> System.out.println(word));
I've got a series of blog posts about RxJava schedulers which should help you implement a "more permanent" variant.
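For the ThreadFactory part of the question, here is a JDK-only sketch of an executor whose workers are named, non-daemon threads, so the JVM stays alive until the executor is shut down. The class name NonDaemonThreads, the helper namedFactory and the "my-rx-" prefix are made up for illustration. The resulting executor is the kind of object you would hand to Schedulers.from(); that call is left out so the snippet runs without RxJava on the classpath.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class NonDaemonThreads {
    // Hypothetical helper: a factory for named, non-daemon worker threads.
    static ThreadFactory namedFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + counter.incrementAndGet());
            t.setDaemon(false); // non-daemon: the JVM waits for this thread
            return t;
        };
    }

    // Runs one task on the pool and reports which thread executed it.
    static String runOnce() {
        ExecutorService exec = Executors.newFixedThreadPool(2, namedFactory("my-rx-"));
        try {
            Future<String> result = exec.submit(() ->
                    Thread.currentThread().getName()
                            + " daemon=" + Thread.currentThread().isDaemon());
            return result.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            exec.shutdown(); // otherwise the non-daemon workers keep the JVM running
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

The trade-off is that non-daemon workers must be shut down explicitly, which is why the executor is closed in the finally block.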

Contrary to what newcomers often believe, reactive streams are not inherently concurrent, but they are inherently asynchronous. They are also inherently sequential, and concurrency must be configured within the stream. Put simply, reactive streams are naturally sequential at their ends but can be concurrent at their core.
The secret sauce is using the flatMap() operator within the stream. This operator takes an Observable<T> input from the source stream and internally re-emits it as an Observable<Observable<T>> stream, subscribing to all of its instances at once. As long as the flatMap() internal stream is executed in a multi-threaded context, it will concurrently execute the provided Function<T, R> that applies your logic and, finally, re-emit the results on the original stream as its own emissions.
This sounds very complicated (and it is quite a bit at first glance) but simple examples with explanations help to understand the concept.
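As a rough analogy only, and deliberately not RxJava code, the same shape can be sketched with JDK classes: each element gets its own asynchronous "inner" computation on a thread pool, and the results are then gathered back into one sequential list. The class name FlatMapAnalogy and the upper-casing step are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

final class FlatMapAnalogy {
    static List<String> process(List<String> words) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            // "Inner streams": one asynchronous computation per element,
            // all started before any result is awaited.
            List<CompletableFuture<String>> inner = words.stream()
                    .map(w -> CompletableFuture.supplyAsync(() -> w.toUpperCase(), pool))
                    .collect(Collectors.toList());
            // "Merge": wait for each result and hand back a single list.
            return inner.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("one", "two", "three")));
    }
}
```

One difference worth keeping in mind: this sketch joins the futures in input order, whereas flatMap() emits results in whatever order the inner streams produce them.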
Find more details from a similar question here and articles on RxJava2 Schedulers and Concurrency with code sample and detailed explanations on how to use Schedulers sequentially and concurrently.
Hope this helps,
Softjake

public class MainClass {
public static void main(String[] args) {
Scheduler scheduler = Schedulers.from(Executors.newFixedThreadPool(10, Executors.defaultThreadFactory()));
Observable.interval(1,TimeUnit.SECONDS)
.doOnNext(word -> System.out.printf("%s uses thread %s%n", word,
Thread.currentThread().getName()))
.subscribeOn(scheduler)
.observeOn(Schedulers.io())
.doOnNext(word -> System.out.printf("%s uses thread %s%n", word,
Thread.currentThread().getName()))
.subscribe();
}
}

Related

Simple multi-threaded Java app - ExecutorService? Fork/Join? Spliterators?

I am writing a command-line application in Java 8. There's a part that involves some computation, and I believe it could benefit from running in parallel using multiple threads. However, I have not much experience in writing multi-threaded applications, so I hope you could steer me in the right direction how should I design the parallel part of my code.
For simplicity, let's pretend the method in question receives a relatively big array of longs, and it should return a Set containing only prime numbers:
public final static boolean checkIfNumberIsPrime(long number) {
// algorithm implementation, not important here
// ...
}
// a single-threaded version
public Set<Long> extractPrimeNumbers(long[] inputArray) {
Set<Long> result = new HashSet<>();
for (long number : inputArray) {
if (checkIfNumberIsPrime(number)) {
result.add(number);
}
}
return result;
}
Now, I would like to refactor method extractPrimeNumbers() in such way that it would be executed by four threads in parallel, and when all of them are finished, return the result. Off the top of my head, I have the following questions:
Which approach would be more suitable for the task: ExecutorService or Fork/Join? (each element of inputArray[] is completely independent and they can be processed in any order whatsoever)
Assuming there are 1 million elements in inputArray[], should I "ask" thread #1 to process all indexes 0..249999, thread #2 - 250000..499999, thread #3 - 500000..749999 and thread #4 - 750000..999999? Or should I rather treat each element of inputArray[] as a separate task to be queued and then executed by an applicable worker thread?
If a prime number is detected, it should be added to Set result, therefore it needs to be thread-safe (synchronized). So, perhaps it would be better if each thread maintained its own, local result-set, and only when it is finished, it would transfer its contents to the global result, in one go?
Is Spliterator of any use here? Should they be used to partition inputArray[] somehow?
Parallel stream
Use none of these. Parallel streams are going to be enough to deal with this problem much more straightforwardly than any of the alternatives you list.
return Arrays.stream(inputArray).parallel()
.filter(n -> checkIfNumberIsPrime(n))
.boxed()
.collect(Collectors.toSet());
For more info, see The Java™ Tutorials > Aggregate Operations > Parallelism.
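Here is a self-contained version of the parallel-stream approach that can be run as is. The trial-division check is only a placeholder, since the question leaves the real algorithm out, and the class name PrimeFilter is made up.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

final class PrimeFilter {
    // Placeholder primality test (trial division); swap in the real algorithm.
    static boolean checkIfNumberIsPrime(long number) {
        if (number < 2) {
            return false;
        }
        for (long d = 2; d * d <= number; d++) {
            if (number % d == 0) {
                return false;
            }
        }
        return true;
    }

    static Set<Long> extractPrimeNumbers(long[] inputArray) {
        return Arrays.stream(inputArray)      // LongStream over the array
                .parallel()                   // work is split across the common ForkJoinPool
                .filter(PrimeFilter::checkIfNumberIsPrime)
                .boxed()
                .collect(Collectors.toSet()); // the collector merges per-thread partial sets
    }

    public static void main(String[] args) {
        System.out.println(extractPrimeNumbers(new long[] {2, 3, 4, 5, 9, 11, 15, 17}));
    }
}
```

The collector handles the "local result set per thread, merged at the end" idea from the question, so no explicit synchronization is needed.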

Replicate deferred/async launch policies from C++ in Java

In C++ you can start a thread with a deferred or asynchronous launch policy. Is there a way to replicate this functionality in Java?
auto T1 = std::async(std::launch::deferred, doSomething);
auto T2 = std::async(std::launch::async, doSomething);
Descriptions of each--
Asynchronous:
If the async flag is set, then async executes the callable object f on a new thread of execution (with all thread-locals initialized) except that if the function f returns a value or throws an exception, it is stored in the shared state accessible through the std::future that async returns to the caller.
Deferred:
If the deferred flag is set, then async converts f and args... the same way as by std::thread constructor, but does not spawn a new thread of execution. Instead, lazy evaluation is performed: the first call to a non-timed wait function on the std::future that async returned to the caller will cause the copy of f to be invoked (as an rvalue) with the copies of args... (also passed as rvalues) in the current thread (which does not have to be the thread that originally called std::async). The result or exception is placed in the shared state associated with the future and only then it is made ready. All further accesses to the same std::future will return the result immediately.
See the documentation for details.
Future
First of all, we have to observe that std::async is a tool to execute a given task and return a std::future object that holds the result of the computation once it's available.
For example we can call result.get() to block and wait for the result to arrive. Also, when the computation encountered an exception, it will be stored and rethrown to us as soon as we call result.get().
Java provides similar classes, the interface is Future and the most relevant implementation is CompletableFuture.
std::future#get translates roughly to Future#get. Even the exceptional behavior is very similar. While C++ rethrows the exception upon calling get, Java will throw an ExecutionException which has the original exception set as cause.
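A small sketch of that exception behavior, with a made-up class name and a task that always fails: the exception thrown inside the task comes back from get() wrapped in an ExecutionException, and getCause() returns the original.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

final class FutureGetDemo {
    static String causeOfFailure() {
        // The task fails on another thread; nothing is thrown here yet.
        CompletableFuture<Integer> task = CompletableFuture.supplyAsync(() -> {
            throw new IllegalStateException("boom");
        });
        try {
            task.get(); // blocks, then reports the stored failure
            return "no exception";
        } catch (ExecutionException e) {
            Throwable cause = e.getCause(); // the original exception
            return cause.getClass().getSimpleName() + ": " + cause.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(causeOfFailure()); // prints "IllegalStateException: boom"
    }
}
```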
How to obtain a Future?
In C++ you create your future object using std::async. In Java you could use one of the many static helper methods in CompletableFuture. In your case, the most relevant are
CompletableFuture#runAsync, if the task does not return any result and
CompletableFuture#supplyAsync, if the task will return a result upon completion
So in order to create a future that just prints Hello World!, you could for example do
CompletableFuture<Void> task = CompletableFuture.runAsync(() -> System.out.println("Hello World!"));
/*...*/
task.get();
Java not only has lambdas but also method references. Let's say you have a method that computes a heavy math task:
class MyMath {
static int compute() {
// Very heavy, duh
return (int) Math.pow(2, 5);
}
}
Then you could create a future that returns the result once it's available as
CompletableFuture<Integer> task = CompletableFuture.supplyAsync(MyMath::compute);
/*...*/
Integer result = task.get();
async vs deferred
In C++, you have the option to specify a launch policy which dictates the threading behavior for the task. Let us put the memory promises C++ makes aside, because in Java you do not have that much control over memory.
The differences are that async will immediately schedule creation of a thread and execute the task in that thread. The result will be available at some point and is computed while you can continue work in your main task. The exact details whether it is a new thread or a cached thread depend on the compiler and are not specified.
deferred behaves completely differently. Basically nothing happens when you call std::async, no extra thread will be created and the task will not be computed yet. The result will not be made available in the meantime at all. However, as soon as you call get, the task will be computed in your current thread and return a result. Basically as if you had called the method directly yourself, without any async utilities at all.
std::launch::async in Java
That said, let's focus on how to translate this behavior to Java. Let's start with async.
This is the simple one, as it is basically the default and intended behavior offered in CompletableFuture. So you just do runAsync or supplyAsync, depending on whether your method returns a result or not. Let me show the previous examples again:
// without result
CompletableFuture<Void> task = CompletableFuture.runAsync(() -> System.out.println("Hello World!"));
/*...*/ // the task is computed in the meantime in a different thread
task.get();
// with result
CompletableFuture<Integer> task = CompletableFuture.supplyAsync(MyMath::compute);
/*...*/
Integer result = task.get();
Note that there are also overloads of the methods that accept an Executor which can be used if you have your own thread pool and want CompletableFuture to use that instead of its own (see here for more details).
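A minimal sketch of the Executor overload: the supplier runs on a thread from the pool you pass in rather than on the default one. The class name and the thread name "my-pool-worker" are assumptions chosen so the effect is visible.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class OwnPoolDemo {
    static String whereDidItRun() {
        // A single-thread pool whose worker carries a recognizable name.
        ExecutorService pool =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "my-pool-worker"));
        try {
            return CompletableFuture
                    .supplyAsync(() -> Thread.currentThread().getName(), pool)
                    .join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(whereDidItRun()); // prints "my-pool-worker"
    }
}
```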
std::launch::deferred in Java
I experimented a lot trying to mimic this behavior with CompletableFuture, but it does not seem to be possible without creating your own implementation (please correct me if I am wrong though). No matter what, it either executes directly upon creation or not at all.
So I would just propose to use the underlying task interface that you gave to CompletableFuture, for example Runnable or Supplier, directly. In our case, we might also use IntSupplier to avoid the autoboxing.
Here are the two code examples again, but this time with deferred behavior:
// without result
Runnable task = () -> System.out.println("Hello World!");
/*...*/ // the task is not computed in the meantime, no threads involved
task.run(); // the task is computed now
// with result
IntSupplier task = MyMath::compute;
/*...*/
int result = task.getAsInt();
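If you want something closer to std::launch::deferred, where the first get() runs the task in the calling thread and later calls return the stored result, one possible "own implementation" is a small subclass of the JDK's FutureTask. This is only a sketch: DeferredFuture and DeferredDemo are made-up names, and only the untimed get() triggers the computation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: a Future that computes lazily on the first get().
final class DeferredFuture<T> extends FutureTask<T> {
    DeferredFuture(Callable<T> task) {
        super(task);
    }

    @Override
    public T get() throws InterruptedException, ExecutionException {
        run();              // runs the task in the calling thread; a no-op once it has run
        return super.get(); // returns the stored result, or throws ExecutionException
    }
}

final class DeferredDemo {
    // Returns {calls before get, first result, second result, calls after both gets}.
    static int[] demo() {
        AtomicInteger calls = new AtomicInteger();
        DeferredFuture<Integer> task = new DeferredFuture<>(() -> {
            calls.incrementAndGet();
            return (int) Math.pow(2, 5);
        });
        try {
            int before = calls.get(); // nothing has been computed yet
            int first = task.get();   // computed here, in this thread
            int second = task.get();  // served from the stored result
            return new int[] {before, first, second, calls.get()};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.printf("before=%d first=%d second=%d calls=%d%n", r[0], r[1], r[2], r[3]);
    }
}
```

Because it is still a Future, exceptions from the task surface as ExecutionException on get(), just like in the async case.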
Modern multithreading in Java
As a final note I would like to give you a better idea how multithreading is typically used in Java nowadays. The provided facilities are much richer than what C++ offers by default.
Ideally, you should design your system in a way that you do not have to care about such little threading details. You create an automatically managed dynamic thread pool using Executors and then launch your initial task against that (or use the default executor service provided by CompletableFuture). After that, you just set up an operation pipeline on the future object, similar to the Stream API, and then just wait on the final future object.
For example, let us suppose you have a list of file names List<String> fileNames and you want to
read the file
validate its content, skip it if it's invalid
compress the file
upload the file to some web server
check the response status code
and count how many were invalid, not successful and successful. Suppose you have some methods like
class FileUploader {
static byte[] readFile(String name) { /*...*/ }
static byte[] requireValid(byte[] content) throws IllegalStateException { /*...*/ }
static byte[] compressContent(byte[] content) { /*...*/ }
static int uploadContent(byte[] content) { /*...*/ }
}
then we can do so easily by
AtomicInteger successfull = new AtomicInteger();
AtomicInteger notSuccessfull = new AtomicInteger();
AtomicInteger invalid = new AtomicInteger();
// Setup the pipeline
List<CompletableFuture<Void>> tasks = fileNames.stream()
.map(name -> CompletableFuture
.completedFuture(name)
.thenApplyAsync(FileUploader::readFile)
.thenApplyAsync(FileUploader::requireValid)
.thenApplyAsync(FileUploader::compressContent)
.thenApplyAsync(FileUploader::uploadContent)
.<Void>handleAsync((statusCode, exception) -> {
AtomicInteger counter;
if (exception == null) {
counter = statusCode == 200 ? successfull : notSuccessfull;
} else {
counter = invalid;
}
counter.incrementAndGet();
return null; // handleAsync takes a BiFunction, so the lambda must return a value
})
).collect(Collectors.toList());
// Wait until all tasks are done
tasks.forEach(CompletableFuture::join);
// Print the results
System.out.printf("Successful %d, not successful %d, invalid %d%n", successfull.get(), notSuccessfull.get(), invalid.get());
The huge benefit of this is that it can keep all the hardware capacity offered by your system busy. All tasks are executed dynamically and independently, managed by an automatic pool of threads. And you just wait until everything is done.
For asynchronous launch of a thread, in modern Java prefer the use of a high-level java.util.concurrent.ExecutorService.
One way to obtain an ExecutorService is through java.util.concurrent.Executors. Different behaviors are available for ExecutorServices; the Executors class provides methods for some common cases.
Once you have an ExecutorService, you can submit Runnables and Callables to it.
Future<MyReturnValue> myFuture = myExecutorService.submit(myTask);
If I understood you correctly, maybe something like this:
private static CompletableFuture<Void> deferred(Runnable run) {
CompletableFuture<Void> future = new CompletableFuture<>();
future.thenRun(run);
return future;
}
private static CompletableFuture<Void> async(Runnable run) {
return CompletableFuture.runAsync(run);
}
And then using them like:
public static void main(String[] args) throws Exception {
CompletableFuture<Void> def = deferred(() -> System.out.println("run"));
def.complete(null);
System.out.println(def.join());
CompletableFuture<Void> async = async(() -> System.out.println("run async"));
async.join();
}
To get something like a deferred thread, you might try running a thread at a reduced priority.
First, in Java it's often idiomatic to make a task using a Runnable first. You can also use the Callable<T> interface, which allows the thread to return a value (Runnable can't).
public class MyTask implements Runnable {
@Override
public void run() {
System.out.println( "hello thread." );
}
}
Then just create a thread. In Java threads normally wrap the task they execute.
MyTask myTask = new MyTask();
Thread t = new Thread( myTask );
t.setPriority( Thread.currentThread().getPriority()-1 );
t.start();
This should not run until there is a core available to do so, which means it shouldn't run until the current thread is blocked or has run out of things to do. However, you're at the mercy of the OS scheduler here, so the specific behavior is not guaranteed. Most OSs will guarantee that all threads run eventually, so if the current thread takes a long time without blocking, the OS will start it executing anyway.
setPriority() can throw a security exception if you're not allowed to set the priority of a thread (uncommon but possible). So just be aware of that minor inconvenience.
For an async task with a Future I would use an executor service. The helper methods in the class Executors are a convenient way to do this.
First make your task as before.
public class MyCallable implements Callable<String> {
@Override
public String call() {
return "hello future thread.";
}
}
Then use an executor service to run it:
MyCallable myCallable = new MyCallable();
ExecutorService es = Executors.newCachedThreadPool();
Future<String> f = es.submit( myCallable );
You can use the Future object to query the thread, determine its running status and get the value it returns. You will need to shutdown the executor to stop all of its threads before exiting the JVM.
es.shutdown();
I've tried to write this code as simply as possible, without the use of lambdas or clever use of generics. The above should show you what those lambdas are actually implementing. However it's usually considered better to be a bit more sophisticated when writing code (and a bit less verbose) so you should investigate other syntax once you feel you understand the above.

What is the difference between thenApply and thenApplyAsync of Java CompletableFuture?

Suppose I have the following code:
CompletableFuture<Integer> future
= CompletableFuture.supplyAsync( () -> 0);
thenApply case:
future.thenApply( x -> x + 1 )
.thenApply( x -> x + 1 )
.thenAccept( x -> System.out.println(x));
Here the output will be 2. Now in case of thenApplyAsync:
future.thenApplyAsync( x -> x + 1 ) // first step
.thenApplyAsync( x -> x + 1 ) // second step
.thenAccept( x -> System.out.println(x)); // third step
I read in this blog that each thenApplyAsync are executed in a separate thread and 'at the same time'(that means following thenApplyAsyncs started before preceding thenApplyAsyncs finish), if so, what is the input argument value of the second step if the first step not finished?
Where will the result of the first step go if not taken by the second step?
the third step will take which step's result?
If the second step has to wait for the result of the first step then what is the point of Async?
Here x -> x + 1 is just to show the point, what I want know is in cases of very long computation.
The difference has to do with the Executor that is responsible for running the code. Each operator on CompletableFuture generally has 3 versions.
thenApply(fn) - runs fn on a thread defined by the CompletableFuture on which it is called, so you generally cannot know where this will be executed. It might immediately execute if the result is already available.
thenApplyAsync(fn) - runs fn on an environment-defined executor regardless of circumstances. For CompletableFuture this will generally be ForkJoinPool.commonPool().
thenApplyAsync(fn,exec) - runs fn on exec.
In the end the result is the same, but the scheduling behavior depends on the choice of method.
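The three versions can be compared by recording the thread each function runs on. This is a sketch with made-up names (ThenApplyThreads, "custom-exec"); the observation that thenApply on an already-completed future runs in the calling thread reflects current OpenJDK behavior and is not a guarantee of the API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThenApplyThreads {
    // Returns {caller thread, thenApply thread, thenApplyAsync thread, thenApplyAsync(fn, exec) thread}.
    static String[] demo() {
        ExecutorService exec =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "custom-exec"));
        try {
            CompletableFuture<Integer> done = CompletableFuture.completedFuture(0);
            String caller = Thread.currentThread().getName();
            // Result already available: fn typically runs right here, in the caller.
            String plain = done.thenApply(x -> Thread.currentThread().getName()).join();
            // Default asynchronous facility, usually ForkJoinPool.commonPool().
            String async = done.thenApplyAsync(x -> Thread.currentThread().getName()).join();
            // Explicit executor: fn runs on one of its threads.
            String withExec =
                    done.thenApplyAsync(x -> Thread.currentThread().getName(), exec).join();
            return new String[] {caller, plain, async, withExec};
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.printf("caller=%s thenApply=%s thenApplyAsync=%s withExecutor=%s%n",
                r[0], r[1], r[2], r[3]);
    }
}
```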
You're mis-quoting the article's examples, and so you're applying the article's conclusion incorrectly. I see two questions in your question:
What is the correct usage of .then___()
In both examples you quoted, which are not in the article, the second function has to wait for the first function to complete. Whenever you call a.then___(b -> ...), input b is the result of a and has to wait for a to complete, regardless of whether you use the methods named Async or not. The article's conclusion does not apply because you mis-quoted it.
The example in the article is actually
CompletableFuture<String> receiver = CompletableFuture.supplyAsync(this::findReceiver);
receiver.thenApplyAsync(this::sendMsg);
receiver.thenApplyAsync(this::sendMsg);
Notice that both thenApplyAsync calls are applied on receiver, not chained in the same statement. This means both functions can start once receiver completes, in an unspecified order. (Any assumption of order is implementation dependent.)
To put it more clearly:
a.thenApply(b).thenApply(c); means the order is a finishes then b starts, b finishes, then c starts.
a.thenApplyAsync(b).thenApplyAsync(c); will behave exactly the same as above as far as the ordering between a b c is concerned.
a.thenApply(b); a.thenApply(c); means a finishes, then b or c can start, in any order. b and c don't have to wait for each other.
a.thenApplyAsync(b); a.thenApplyAsync(c); works the same way, as far as the order is concerned.
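The four cases above boil down to "chained" versus "branched". A short sketch, using made-up names and the question's x + 1 style functions, shows that a chain feeds each result into the next step while two branches both receive the result of a:

```java
import java.util.concurrent.CompletableFuture;

final class ChainVsBranch {
    // a -> b -> c: each step waits for the previous one, Async or not.
    static int chained() {
        return CompletableFuture.supplyAsync(() -> 0)
                .thenApplyAsync(x -> x + 1)   // receives 0
                .thenApplyAsync(x -> x + 1)   // receives 1
                .join();
    }

    // b and c both hang off a: each receives a's result, in no particular order.
    static int[] branched() {
        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> 0);
        CompletableFuture<Integer> b = a.thenApplyAsync(x -> x + 1);
        CompletableFuture<Integer> c = a.thenApplyAsync(x -> x + 10);
        return new int[] {b.join(), c.join()};
    }

    public static void main(String[] args) {
        int[] r = branched();
        System.out.println("chained: " + chained());            // prints "chained: 2"
        System.out.println("branched: " + r[0] + ", " + r[1]);  // prints "branched: 1, 10"
    }
}
```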
You should understand the above before reading the below. The above concerns asynchronous programming, without it you won't be able to use the APIs correctly. The below concerns thread management, with which you can optimize your program and avoid performance pitfalls. But you can't optimize your program without writing it correctly.
As titled: Difference between thenApply and thenApplyAsync of Java CompletableFuture?
I must point out that the people who wrote the JSR must have confused the technical term "Asynchronous Programming", and picked the names that are now confusing newcomers and veterans alike. To start, there is nothing in thenApplyAsync that is more asynchronous than thenApply from the contract of these methods.
The difference between the two has to do with on which thread the function is run. The function supplied to thenApply may run on any of the threads that
calls complete
calls thenApply on the same instance
while the 2 overloads of thenApplyAsync either
uses a default Executor (a.k.a. thread pool), or
uses a supplied Executor
The takeaway is that for thenApply, the runtime promises to eventually run your function using some executor which you do not control. If you want control of threads, use the Async variants.
If your function is lightweight, it doesn't matter which thread runs your function.
If your function is heavy CPU bound, you do not want to leave it to the runtime. If the runtime picks the network thread to run your function, the network thread can't spend time to handle network requests, causing network requests to wait longer in the queue and your server to become unresponsive. In that case you want to use thenApplyAsync with your own thread pool.
Fun fact: Asynchrony != threads
thenApply/thenApplyAsync, and their counterparts thenCompose/thenComposeAsync, handle/handleAsync, thenAccept/thenAcceptAsync, are all asynchronous! The asynchronous nature of these functions has to do with the fact that an asynchronous operation eventually calls complete or completeExceptionally. The idea came from JavaScript, which is indeed asynchronous but isn't multi-threaded.
This is what the documentation says about CompletableFuture's thenApplyAsync:
Returns a new CompletionStage that, when this stage completes
normally, is executed using this stage's default asynchronous
execution facility, with this stage's result as the argument to the
supplied function.
So, thenApplyAsync has to wait for the previous thenApplyAsync's result:
In your case you first do the synchronous work and then the asynchronous one. So, it does not matter that the second one is asynchronous because it is started only after the synchronous work has finished.
Let's switch it up. In some cases "async result: 2" will be printed first and in some cases "sync result: 2" will be printed first. Here it makes a difference because both call 1 and 2 can run asynchronously, call 1 on a separate thread and call 2 on some other thread, which might be the main thread.
CompletableFuture<Integer> future
= CompletableFuture.supplyAsync(() -> 0);
future.thenApplyAsync(x -> x + 1) // call 1
.thenApplyAsync(x -> x + 1)
.thenAccept(x -> System.out.println("async result: " + x));
future.thenApply(x -> x + 1) // call 2
.thenApply(x -> x + 1)
.thenAccept(x -> System.out.println("sync result: " + x));
The second step (i.e. computation) will always be executed after the first step.
If the second step has to wait for the result of the first step then what is the point of Async?
Async means in this case that you are guaranteed that the method will return quickly and the computation will be executed in a different thread.
When calling thenApply (without async), then you have no such guarantee. In this case the computation may be executed synchronously i.e. in the same thread that calls thenApply if the CompletableFuture is already completed by the time the method is called. But the computation may also be executed asynchronously by the thread that completes the future or some other thread that calls a method on the same CompletableFuture. This answer: https://stackoverflow.com/a/46062939/1235217 explained in detail what thenApply does and does not guarantee.
So when should you use thenApply and when thenApplyAsync? I use the following rule of thumb:
non-async: only if the task is very small and non-blocking, because in this case we don't care which of the possible threads executes it
async (often with an explicit executor as parameter): for all other tasks
In both thenApplyAsync and thenApply, the Function<? super T, ? extends U> fn passed to these methods will be called asynchronously and will not block the thread that registered it (unless the future is already complete when thenApply is called, in which case the function may run right away in the calling thread).
The difference has to do with which thread will be responsible for calling the method Function#apply(T t):
Consider an AsyncHttpClient call as below. Notice the thread names printed below. I hope it gives you clarity on the difference:
// running in the main method
// public static void main(String[] args) ....
CompletableFuture<Response> future =
asyncHttpClient.prepareGet(uri).execute().toCompletableFuture();
log.info("Current Thread " + Thread.currentThread().getName());
//Prints "Current Thread main"
thenApply Will use the same thread that completed the future.
//will use the dispatcher threads from the asyncHttpClient to call `fn.apply`
//The thread that completed the future will be blocked by the execution of the method `Function#apply(T t)`.
future.thenApply(myResult -> {
log.info("Applier Thread " + Thread.currentThread().getName());
return myResult;
})
//Prints: "Applier Thread httpclient-dispatch-8"
thenApplyAsync will use a thread from the Executor pool.
//will use the threads from the CommonPool to call `fn.apply`
//The thread that completed the future WON'T be blocked by the execution of the method `Function#apply(T t)`.
future.thenApplyAsync(myResult -> {
log.info("Applier Thread " + Thread.currentThread().getName());
return myResult;
})
//Prints: "Applier Thread ForkJoinPool.commonPool-worker-7"
future.get() will block the main thread.
//If called, `.get()` may block the main thread if the CompletableFuture is not completed.
future.get();
Conclusion
The Async suffix in the method thenApplyAsync means that the thread completing the future will not be blocked by the execution of the Function#apply(T t) method.
The choice between thenApplyAsync and thenApply depends on whether you want to block the thread completing the future or not.

Am I misusing rxJava by converting an observable into a blocking observable?

My API makes about 100 downstream calls, in pairs, to two separate services. All responses need to be aggregated, before I can return my response to the client. I use hystrix-feign to make the HTTP calls.
I came up with what I believed was an elegant solution until on the rxJava docs I've found the following
BlockingObservable is a variety of Observable that provides blocking operators. It can be useful for testing and demo purposes, but is generally inappropriate for production applications (if you think you need to use a BlockingObservable this is usually a sign that you should rethink your design).
My code looks roughly as follows
List<Observable<C>> observables = new ArrayList<>();
for (RequestPair request : requests) {
Observable<C> zipped = Observable.zip(
feignClientA.sendRequest(request.A()),
feignClientB.sendRequest(request.B()),
(a, b) -> new C(a,b));
observables.add(zipped);
}
Collection<D> apiResponse = new ConcurrentLinkedQueue<>();
Observable
.merge(observables)
.toBlocking()
.forEach(combinedResponse -> apiResponse.add(doSomeWork(combinedResponse)));
return apiResponse;
Few questions based on this setup:
Is toBlocking() justified given my use case
Am I correct in understanding that the actual HTTP calls do not get made until the main thread gets to the forEach()
I've seen that the code in the forEach() block is executed by different threads, but I was not able to verify if there can be more than one thread in the forEach() block. Is the execution there concurrent?
A better option is to return the Observable to be consumed by other operators but you may get away with blocking code (It should, however, run on a background thread.)
public Observable<D> getAll(Iterable<RequestPair> requests) {
return Observable.from(requests)
.flatMap(request ->
Observable.zip(
feignClientA.sendRequest(request.A()),
feignClientB.sendRequest(request.B()),
(a, b) -> new C(a,b)
)
, 8) // maximum concurrent HTTP requests
.map(both -> doSomeWork(both));
}
// for legacy users of the API
public Collection<D> getAllBlocking(Iterable<RequestPair> requests) {
return getAll(requests)
.toList()
.toBlocking()
.first();
}
Am I correct in understanding that the actual HTTP calls do not get made until the main thread gets to the forEach()
Yes, the forEach triggers the whole sequence of operations.
I've seen that the code in the forEach() block is executed by different threads, but I was not able to verify if there can be more than one thread in the forEach() block. Is the execution there concurrent?
Only one thread at a time is allowed to execute the lambda in forEach but you may indeed see different threads entering there.

Calling sequential on parallel stream makes all previous operations sequential

I've got a significant set of data, and want to call a slow but clean method and then call a fast method with side effects on the result of the first one. I'm not interested in intermediate results, so I would like not to collect them.
The obvious solution is to create a parallel stream, make the slow call, make the stream sequential again, and make the fast call. The problem is that ALL the code executes in a single thread; there is no actual parallelism.
Example code:
@Test
public void testParallelStream() throws ExecutionException, InterruptedException
{
ForkJoinPool forkJoinPool = new ForkJoinPool(Runtime.getRuntime().availableProcessors() * 2);
Set<String> threads = forkJoinPool.submit(()-> new Random().ints(100).boxed()
.parallel()
.map(this::slowOperation)
.sequential()
.map(Function.identity())//some fast operation, but must be in single thread
.collect(Collectors.toSet())
).get();
System.out.println(threads);
Assert.assertEquals(Runtime.getRuntime().availableProcessors() * 2, threads.size());
}
private String slowOperation(int value)
{
try
{
Thread.sleep(100);
}
catch (InterruptedException e)
{
e.printStackTrace();
}
return Thread.currentThread().getName();
}
If I remove sequential(), the code executes as expected but, obviously, the non-parallel operation is then called from multiple threads.
Could you recommend some references about such behavior, or maybe some way to avoid temporary collections?
Switching the stream from parallel() to sequential() worked in the initial Stream API design, but it caused many problems and the implementation was finally changed, so it just turns the parallel flag on and off for the whole pipeline. The current documentation is indeed vague, but it was improved in Java 9:
The stream pipeline is executed sequentially or in parallel depending on the mode of the stream on which the terminal operation is invoked. The sequential or parallel mode of a stream can be determined with the BaseStream.isParallel() method, and the stream's mode can be modified with the BaseStream.sequential() and BaseStream.parallel() operations. The most recent sequential or parallel mode setting applies to the execution of the entire stream pipeline.
As for your problem, you can collect everything into intermediate List and start new sequential pipeline:
new Random().ints(100).boxed()
.parallel()
.map(this::slowOperation)
.collect(Collectors.toList())
// Start new stream here
.stream()
.map(Function.identity())//some fast operation, but must be in single thread
.collect(Collectors.toSet());
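Both points can be checked with a small self-contained sketch. The class name is made up and the multiplication merely stands in for the slow operation: the first method shows that the most recent mode call decides the flag for the whole pipeline, the second that a fresh stream over the intermediate list runs its stage on the calling thread only.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class TwoStagePipeline {
    // parallel() followed by sequential(): the last call wins for the entire pipeline.
    static boolean lastCallWins() {
        return IntStream.range(0, 10).parallel().sequential().isParallel();
    }

    // Stage 1 runs in parallel; stage 2 runs on a new, sequential stream.
    static Set<String> secondStageThreads() {
        List<Integer> slowResults = IntStream.range(0, 100).boxed()
                .parallel()
                .map(i -> i * 2)                            // stand-in for the slow operation
                .collect(Collectors.toList());
        return slowResults.stream()                         // sequential by default
                .map(i -> Thread.currentThread().getName()) // record which thread ran stage 2
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println("still parallel? " + lastCallWins()); // prints "still parallel? false"
        System.out.println("stage 2 threads: " + secondStageThreads());
    }
}
```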
In the current implementation a Stream is either all parallel or all sequential. While the Javadoc isn't explicit about this, and it could change in the future, it does say this is possible.
S parallel()
Returns an equivalent stream that is parallel. May return itself, either because the stream was already parallel, or because the underlying stream state was modified to be parallel.
If you need the function to be single threaded, I suggest you use a Lock or synchronized block/method.
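A sketch of the synchronized variant, with made-up names and a multiplication standing in for the slow operation. Note that this lets only one thread at a time execute the guarded step, but it does not pin that step to one particular thread, and the encounter order is not preserved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

final class SynchronizedStage {
    static List<Integer> run() {
        List<Integer> sink = new ArrayList<>(); // not thread-safe on its own
        Object lock = new Object();
        IntStream.range(0, 1000)
                .parallel()
                .map(i -> i * 2)                // stand-in for the slow operation, runs in parallel
                .forEach(v -> {
                    synchronized (lock) {       // the fast step: one thread at a time
                        sink.add(v);
                    }
                });
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run().size()); // prints "1000"
    }
}
```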
