Based on different posts and articles, all I have known is that: thenCompose will run in the same thread with the previous stage while thenComposeAsync will try to start a new thread compared to the previous stage.
Even Java 8 in Action provides the following code to demonstrate that thenCompose will work in the same thread.
Quote from Java 8 in Action Chapter 11.4.3
public List<String> findPrices(String product) {
List<CompletableFuture<String>> priceFutures =
shops.stream()
.map(shop -> CompletableFuture.supplyAsync(
() -> shop.getPrice(product), executor))
.map(future -> future.thenApply(Quote::parse))
.map(future -> future.thenCompose(quote ->
CompletableFuture.supplyAsync(
() -> Discount.applyDiscount(quote), executor)))
.collect(toList());
return priceFutures.stream()
.map(CompletableFuture::join)
.collect(toList());
}
But I tried the following code, the result quite confused me which seems quite opposite that the thenCompose will start new thread while thenComposeAsync will reuse the old thread.
public static void main(String... args) {
testBasic();
testAsync();
}
private static void testBasic() {
out.println("*****************************************");
out.println("********** TESTING thenCompose **********");
ExecutorService executorService = Executors.newCachedThreadPool();
CompletableFuture[] futures = Stream.iterate(0, i -> i+1)
.limit(3)
.map(i -> CompletableFuture.supplyAsync(() -> runStage1(i), executorService))
.map(future -> future.thenCompose(i -> CompletableFuture.supplyAsync(() -> runStage2(i), executorService)))
.map(f -> f.thenAccept(out::println))
.toArray(size -> new CompletableFuture[size]);
CompletableFuture.allOf(futures).join();
}
private static void testAsync() {
out.println("*****************************************");
out.println("******* TESTING thenComposeAsync ********");
ExecutorService executorService = Executors.newCachedThreadPool();
CompletableFuture[] futures = Stream.iterate(0, i -> i+1)
.limit(3)
.map(i -> CompletableFuture.supplyAsync(() -> runStage1(i), executorService))
.map(future -> future.thenComposeAsync(i ->
CompletableFuture.supplyAsync(() -> runStage2(i), executorService)))
.map(f -> f.thenAccept(out::println))
.toArray(size -> new CompletableFuture[size]);
CompletableFuture.allOf(futures).join();
}
private static Integer runStage1(int a) {
String s = String.format("Start: stage - 1 - value: %d - thread name: %s",
a, Thread.currentThread().getName());
out.println(s);
Long start = System.currentTimeMillis();
try {
Thread.sleep(1500 + Math.abs(new Random().nextInt()) % 1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
s = String.format("Finish: stage - 1 - value: %d - thread name: %s - time cost: %d",
a, Thread.currentThread().getName(), (System.currentTimeMillis() - start));
out.println(s);
return Integer.valueOf(a % 3);
}
private static Integer runStage2(int b) {
String s = String.format("Start: stage - 2 - value: %d - thread name: %s",
b, Thread.currentThread().getName());
out.println(s);
Long start = System.currentTimeMillis();
try {
Thread.sleep(200 + Math.abs(new Random().nextInt()) % 1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
s = String.format("Finish: stage - 2 - value: %d - thread name: %s - time cost: %d",
b, Thread.currentThread().getName(), (System.currentTimeMillis() - start));
out.println(s);
return Integer.valueOf(b);
}
And the result is as follows:
*****************************************
********** TESTING thenCompose **********
Start: stage - 1 - value: 1 - thread name: pool-1-thread-2
Start: stage - 1 - value: 0 - thread name: pool-1-thread-1
Start: stage - 1 - value: 2 - thread name: pool-1-thread-3
Finish: stage - 1 - value: 1 - thread name: pool-1-thread-2 - time cost: 1571
Start: stage - 2 - value: 1 - thread name: pool-1-thread-4 // using a new thread?????
Finish: stage - 1 - value: 2 - thread name: pool-1-thread-3 - time cost: 1875
Start: stage - 2 - value: 2 - thread name: pool-1-thread-2
Finish: stage - 1 - value: 0 - thread name: pool-1-thread-1 - time cost: 2198
Start: stage - 2 - value: 0 - thread name: pool-1-thread-3
Finish: stage - 2 - value: 2 - thread name: pool-1-thread-2 - time cost: 442
2
Finish: stage - 2 - value: 1 - thread name: pool-1-thread-4 - time cost: 779
1
Finish: stage - 2 - value: 0 - thread name: pool-1-thread-3 - time cost: 1157
0
*****************************************
******* TESTING thenComposeAsync ********
Start: stage - 1 - value: 0 - thread name: pool-2-thread-1 // all in same thread
Start: stage - 1 - value: 1 - thread name: pool-2-thread-2
Start: stage - 1 - value: 2 - thread name: pool-2-thread-3
Finish: stage - 1 - value: 0 - thread name: pool-2-thread-1 - time cost: 1623
Start: stage - 2 - value: 0 - thread name: pool-2-thread-1
Finish: stage - 1 - value: 1 - thread name: pool-2-thread-2 - time cost: 1921
Start: stage - 2 - value: 1 - thread name: pool-2-thread-2
Finish: stage - 1 - value: 2 - thread name: pool-2-thread-3 - time cost: 1932
Start: stage - 2 - value: 2 - thread name: pool-2-thread-3
Finish: stage - 2 - value: 0 - thread name: pool-2-thread-1 - time cost: 950
0
Finish: stage - 2 - value: 2 - thread name: pool-2-thread-3 - time cost: 678
2
Finish: stage - 2 - value: 1 - thread name: pool-2-thread-2 - time cost: 956
1
Is there something wrong in the demo? Please correct me or if there are some useful resources, do please share.
Related
My idea is to share data from the main to the child thread (from the thread pool).
I found InheritableThreadLocal
Below is my example:
public static void main(String[] args) {
InheritableThreadLocal<String> inheritableThreadLocal =
new InheritableThreadLocal<>();
System.out.println("Thread name: " + Thread.currentThread().getName() + " - Set data into inheritableThreadLocal: " + "ABCXYZ");
inheritableThreadLocal.set("ABCXYZ");
System.out.println();
List<Integer> listIndex = List.of(1, 2);
listIndex
.stream()
.parallel()
.map(index -> {
System.out.println("Thread name: " + Thread.currentThread().getName() + " - Get data from inheritableThreadLocal : " + inheritableThreadLocal.get());
return inheritableThreadLocal.get();
})
.collect(Collectors.toList());
}
When tested I have strange behavior when running this example in different Java versions.
Java 11.0.12:
Thread name: main - Set data into inheritableThreadLocal: ABCXYZ
Thread name: main - Get data from inheritableThreadLocal : ABCXYZ
Thread name: ForkJoinPool.commonPool-worker-3 - Get data from inheritableThreadLocal : ABCXYZ
Java 17.0.1:
Thread name: main - Set data into inheritableThreadLocal: ABCXYZ
Thread name: ForkJoinPool.commonPool-worker-1 - Get data from inheritableThreadLocal : null
Thread name: main - Get data from inheritableThreadLocal : ABCXYZ
I do not know why have different output on different Java version like this (The thread on the pool can not find any data from inheritableThreadLocal area).
Could anyone give me the answer?
I am trying to simulate the delay while emitting items in a specific sequence
Here I am trying to simulate the problem
List<Integer> integers = new ArrayList<>();
integers.add(1);
integers.add(2);
integers.add(3);
integers.add(4);
Disposable d = Observable
.just(integers)
.flatMap(integers1 -> {
return Observable
.zip(Observable.just(1L).concatWith(Observable.interval(10, 5, TimeUnit.SECONDS)),
Observable.fromIterable(integers1), (aLong, integer1) -> {
return new Pair<Long, Integer>(aLong, integer1);
})
.flatMap(longIntegerPair -> {
System.out.println("DATA " + longIntegerPair.getValue());
return Observable.just(longIntegerPair.getValue());
})
.toList()
.toObservable();
})
.subscribe(integers1 -> {
System.out.println("END");
}, throwable -> {
System.out.println("Error " + throwable.getMessage());
});
The output of the above code is
DATA 1
wait for 10 seconds
DATA 2
wait for 5
DATA 3
wait for 5
DATA 4
wait for 5 min
what I am expecting is to perform an operation once 10 seconds or 5 seconds delay is over at each stage but sure where do I inject that part in current flow.
DATA 1
wait for 10 seconds
[perform operation]
DATA 2
wait for 5
[perform operation]
DATA 3
wait for 5
[perform operation]
DATA 4
wait for 5 min
[perform operation]
Use concatMap to make the sequence orderly and use delay to delay the processing of a data to get your printout pattern:
Observable
.zip(Observable.just(-1L).concatWith(Observable.interval(10, 5, TimeUnit.SECONDS)),
Observable.range(1, 5),
(aLong, integer1) -> {
return new Pair<Long, Integer>(aLong, integer1);
}
)
.concatMap(longIntegerPair -> {
System.out.println("DATA " + longIntegerPair.getValue());
return Observable.just(longIntegerPair.getValue())
.delay(longIntegerPair.getKey() < 0 ? 10 : 5, TimeUnit.SECONDS)
.flatMap(value -> {
System.out.println("[performing operation] " + value);
return Observable.just(value);
});
})
.toList()
.toObservable()
.blockingSubscribe();
I'm writing a simple orchestration framework using reactor framework which executes tasks sequentially, and the next task to execute is dependent on the result from previous tasks. I might have multiple paths to choose from based on the outcome of previous tasks. Earlier, I wrote a similar framework based on a static DAG where I passed as list of tasks as iterables and used Flux.fromIterable(taskList). However, this does not give me the flexibility to go dynamic because of the static array publisher.
I'm looking for alternate approaches like do(){}while(condition) to solve for DAG traversal and task decision and I came up with Flux.generate(). I evaluate the next step in generate method and pass the next task downstream. The problem I'm facing now is, Flux.generate does not wait for downstream to complete, but pushes until the condition is set to invalid. And by the time task 1 gets executed, task 2 would have been pushed n times, which is not the expected behavior.
Can someone please point me towards the right direction?
Thanks.
First iteration using List of tasks (static DAG)
Flux.fromIterable(taskList)
.publishOn(this.factory.getSharedSchedulerPool())
.concatMap(
reactiveTask -> {
log.info("Running task =>{}", reactiveTask.getTaskName());
return reactiveTask
.run(ctx);
})
// Evaluates status from previous task and terminates stream or continues.
.takeWhile(context -> evaluateStatus(context))
.onErrorResume(throwable -> buildResponse(ctx, throwable))
.doOnCancel(() -> log.info("Task cancelled"))
.doOnComplete(() -> log.info("Completed flow"))
.subscribe();
Attempt to dynamic dag
Flux.generate(
(SynchronousSink<ReactiveTask<OrchestrationContext>> synchronousSink) -> {
ReactiveTask<OrchestrationContext> task = null;
if (ctx.getLastExecutedStep() == null) {
// first task;
task = getFirstTaskFromDAG();
} else {
task = deriveNextStep(ctx.getLastExecutedStep(), ctx.getDecisionData());
}
if (task.getName.equals("END")) {
synchronousSink.complete();
}
synchronousSink.next(task);
})
.publishOn(this.factory.getSharedSchedulerPool())
.doOnNext(orchestrationContextReactiveTask -> log.info("On next => {}",
orchestrationContextReactiveTask.getTaskName()))
.concatMap(
reactiveTask -> {
log.info("Running task =>{}", reactiveTask.getTaskName());
return reactiveTask
.run(ctx);
})
.onErrorResume(throwable -> buildResponse(ctx, throwable))
.takeUntil(context -> evaluateStatus(context, tasks))
.doOnCancel(() -> log.info("Task cancelled"))
.doOnComplete(() -> log.info("Completed flow")).subscribe();
The problem in above approach is, while task 1 is executing, the onNext() subscriber prints many time because generate is publishing. I want the generate method to wait on results from previous task and submit new task. In non-reactive world, this can be achieve through simple while() loop.
Each Task will perform the following action.
public class ResponseTask extends AbstractBaseTask {
private TaskDefinition taskDefinition;
final String taskName;
public ResponseTask(
StateManager stateManager,
ThreadFactory factory,
) {
this.taskDefinition = taskDefinition;
this.taskName = taskName;
}
public Mono<String> transform(OrchestrationContext context) {
Any masterPayload = Any.wrap(context.getIngestionPayload());
return Mono.fromCallable(() -> stateManager.doTransformation(context, masterPayload);
}
public Mono<OrchestrationContext> execute(OrchestrationContext context, String payload) {
log.info("Executing sleep for task=>{}", context.getLastExecutedStep());
return Mono.delay(Duration.ofSeconds(1), factory.getSharedSchedulerPool())
.then(Mono.just(context));
}
public Mono<OrchestrationContext> run(OrchestrationContext context) {
log.info("Executing task:{}. Last executed:{}", taskName, context.getLastExecutedStep());
return transform(context)
.doOnNext((result) -> log.info("Transformation complete for task=?{}", taskName);)
.flatMap(payload -> {
return execute(context, payload);
}).onErrorResume(throwable -> {
context.setStatus(FAILED);
return Mono.just(context);
});
}
}
EDIT - From #Ikatiforis 's recommendation - I got the following output
Here's the output from my side.
2021-12-02 09:58:14,643 INFO (reactive_shared_pool) [ReactiveEngine lambda$doOrchestration$5:98] On next => Task1
2021-12-02 09:58:14,644 INFO (reactive_shared_pool) [ReactiveEngine lambda$doOrchestration$6:101] Running task =>Task1
2021-12-02 09:58:14,644 INFO (reactive_shared_pool) [AbstractBaseTask run:75] Executing task:Task1. Last executed:Task1
2021-12-02 09:58:14,658 INFO (reactive_shared_pool) [ReactiveEngine lambda$doOrchestration$5:98] On next => Task2
2021-12-02 09:58:14,659 INFO (reactive_shared_pool) [AbstractBaseTask lambda$run$0:83] Transformation complete for task=?Task1
2021-12-02 09:58:14,659 INFO (reactive_shared_pool) [ResponseTask execute:41] Executing sleep for task=>Task1
2021-12-02 09:58:15,661 INFO (reactive_shared_pool) [AbstractBaseTask lambda$run$4:106] Success for task=>Task1
2021-12-02 09:58:15,663 INFO (reactive_shared_pool)
[ReactiveEngine lambda$doOrchestration$6:101] Running task =>Task2
2021-12-02 09:58:15,811 INFO (cassandra-nio-worker-8) [AbstractBaseTask run:75] Executing task:Task2. Last executed:Task2
2021-12-02 09:58:15,811 INFO (reactive_shared_pool) [ReactiveEngine lambda$doOrchestration$5:98] On next => Task2
2021-12-02 09:58:15,812 INFO (reactive_shared_pool) [AbstractBaseTask lambda$run$0:83] Transformation complete for task=?Task2
2021-12-02 09:58:15,812 INFO (reactive_shared_pool) [ResponseTask execute:41] Executing sleep for task=>Task2
2021-12-02 09:58:15,837 INFO (centaurus_reactive_shared_pool) [ReactiveEngine lambda$doOrchestration$9:113] Completed flow
I see couple of problems here --
The sequence of execution is
1. Task does transformations ( runs on Mono.fromCallable)
2. Task induces a delay - Mono.fromDelay()
3. Task completes execution. After this, generate method should evaluate the context and pass on the next task to be executed.
What I see from the output is:
1. Task 1 starts the transformations - Runs on Mono.fromCallable.
2. Task 2 doOnNext is reported - which means the stream already got this task.
3. Task 1 completes.
4. Task 2 starts and executes delay -> the stream does not wait for response from task 2 but completes the flow.
The problem in above approach is, while task 1 is executing, the
onNext() subscriber prints many time because generate is publishing.
This is happening because concatMap requests a number of items upfront(32 by default) instead of requesting elements one by one. If you really need to request one element at the time you can use concatMap(Function<? super T,? extends Publisher<? extends V>> mapper,int prefetch) variant method and provide the prefetch value like this:
.concatMap(reactiveTask -> {
log.info("Running task =>{}", reactiveTask.getTaskName());
return reactiveTask.run(ctx);
}, 1)
Edit
There is also a publishOn method which takes a prefetch value. Take a look at the following Fibonacci generator sample and let me know if it works as you expect:
generateFibonacci(100)
.publishOn(boundedElasticScheduler, 1)
.doOnNext(number -> log.info("On next => {}", number))
.concatMap(number -> {
log.info("Running task => {}", number);
return task(number).doOnNext(num -> log.info("Task completed => {}", num));
}, 1)
.takeWhile(context -> context < 3)
.subscribe();
public Flux<Integer> generateFibonacci(int limit) {
return Flux.generate(
() -> new FibonacciState(0, 1),
(state, sink) -> {
log.info("Generating number: " + state);
sink.next(state.getFormer());
if (state.getLatter() > limit) {
sink.complete();
}
int temp = state.getFormer();
state.setFormer(state.getLatter());
state.setLatter(temp + state.getLatter());
return state;
});
}
Here is the output:
2021-12-02 10:47:51,990 INFO main c.u.p.p.s.c.Test - Generating number: FibonacciState(former=0, latter=1)
2021-12-02 10:47:51,993 INFO pool-1-thread-1 c.u.p.p.s.c.Test - On next => 0
2021-12-02 10:47:51,996 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Running task => 0
2021-12-02 10:47:54,035 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Task completed => 0
2021-12-02 10:47:54,035 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Generating number: FibonacciState(former=1, latter=1)
2021-12-02 10:47:54,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - On next => 1
2021-12-02 10:47:54,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Running task => 1
2021-12-02 10:47:56,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Task completed => 1
2021-12-02 10:47:56,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Generating number: FibonacciState(former=1, latter=2)
2021-12-02 10:47:56,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - On next => 1
2021-12-02 10:47:56,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Running task => 1
2021-12-02 10:47:58,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Task completed => 1
2021-12-02 10:47:58,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Generating number: FibonacciState(former=2, latter=3)
2021-12-02 10:47:58,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - On next => 2
2021-12-02 10:47:58,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Running task => 2
2021-12-02 10:48:00,036 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Task completed => 2
2021-12-02 10:48:00,037 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Generating number: FibonacciState(former=3, latter=5)
2021-12-02 10:48:00,037 INFO pool-1-thread-1 c.u.p.p.s.c.Test - On next => 3
2021-12-02 10:48:00,037 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Running task => 3
2021-12-02 10:48:02,037 INFO pool-1-thread-1 c.u.p.p.s.c.Test - Task completed => 3
2021-12-02 10:52:07,877 INFO pool-1-thread-2 c.u.p.p.s.c.Test - Completed flow
Edit 04122021
You stated:
I'm trying to simulate HTTP / blocking calls. Hence the Mono.delay.
Mono#Delay is not the appropriate method to simulate a blocking call. The delay is introduced through the parallel scheduler and as a result, it does not wait for the task to complete. You can simulate a blocking call like this:
public String get() throws IOException {
HttpsURLConnection connection = (HttpsURLConnection) new URL("https://jsonplaceholder.typicode.com/comments").openConnection();
connection.setRequestMethod("GET");
try(InputStream inputStream = connection.getInputStream()) {
return new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
}
}
Note that as an alternative you could use .limitRate(1) operator instead of the prefetch parameter.
I am trying to understand CompletableFutures and chaining of calls that that return completed futures and I have created the bellow example which kind of simulates two calls to a database.
The first method is supposed to be giving a completable future with a list of userIds and then I need to make a call to another method providing a userId to get the user (a string in this case).
to summarise:
1. fetch the ids
2. fetch a list of the users coresponding to those ids.
I created simple methods to simulate the responses with sleap threads.
Please check the code bellow
public class PipelineOfTasksExample {
private Map<Long, String> db = new HashMap<>();
PipelineOfTasksExample() {
db.put(1L, "user1");
db.put(2L, "user2");
db.put(3L, "user3");
db.put(4L, "user4");
}
private CompletableFuture<List<Long>> returnUserIdsFromDb() {
try {
Thread.sleep(500);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("building the list of Ids" + " - thread: " + Thread.currentThread().getName());
return CompletableFuture.supplyAsync(() -> Arrays.asList(1L, 2L, 3L, 4L));
}
private CompletableFuture<String> fetchById(Long id) {
CompletableFuture<String> cfId = CompletableFuture.supplyAsync(() -> db.get(id));
try {
Thread.sleep(500);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("fetching id: " + id + " -> " + db.get(id) + " thread: " + Thread.currentThread().getName());
return cfId;
}
public static void main(String[] args) {
PipelineOfTasksExample example = new PipelineOfTasksExample();
CompletableFuture<List<String>> result = example.returnUserIdsFromDb()
.thenCompose(listOfIds ->
CompletableFuture.supplyAsync(
() -> listOfIds.parallelStream()
.map(id -> example.fetchById(id).join())
.collect(Collectors.toList()
)
)
);
System.out.println(result.join());
}
}
My question is, does the join call (example.fetchById(id).join()) ruins the nonblocking nature of process. If the answer is positive how can I solve this problem?
Thank you in advance
Your example is a bit odd as you are slowing down the main thread in returnUserIdsFromDb(), before any operation even starts and likewise, fetchById slows down the caller rather than the asynchronous operation, which defeats the entire purpose of asynchronous operations.
Further, instead of .thenCompose(listOfIds -> CompletableFuture.supplyAsync(() -> …)) you can simply use .thenApplyAsync(listOfIds -> …).
So a better example might be
public class PipelineOfTasksExample {
private final Map<Long, String> db = LongStream.rangeClosed(1, 4).boxed()
.collect(Collectors.toMap(id -> id, id -> "user"+id));
PipelineOfTasksExample() {}
private static <T> T slowDown(String op, T result) {
LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(500));
System.out.println(op + " -> " + result + " thread: "
+ Thread.currentThread().getName()+ ", "
+ POOL.getPoolSize() + " threads");
return result;
}
private CompletableFuture<List<Long>> returnUserIdsFromDb() {
System.out.println("trigger building the list of Ids - thread: "
+ Thread.currentThread().getName());
return CompletableFuture.supplyAsync(
() -> slowDown("building the list of Ids", Arrays.asList(1L, 2L, 3L, 4L)),
POOL);
}
private CompletableFuture<String> fetchById(Long id) {
System.out.println("trigger fetching id: " + id + " thread: "
+ Thread.currentThread().getName());
return CompletableFuture.supplyAsync(
() -> slowDown("fetching id: " + id , db.get(id)), POOL);
}
static ForkJoinPool POOL = new ForkJoinPool(2);
public static void main(String[] args) {
PipelineOfTasksExample example = new PipelineOfTasksExample();
CompletableFuture<List<String>> result = example.returnUserIdsFromDb()
.thenApplyAsync(listOfIds ->
listOfIds.parallelStream()
.map(id -> example.fetchById(id).join())
.collect(Collectors.toList()
),
POOL
);
System.out.println(result.join());
}
}
which prints something like
trigger building the list of Ids - thread: main
building the list of Ids -> [1, 2, 3, 4] thread: ForkJoinPool-1-worker-1, 1 threads
trigger fetching id: 2 thread: ForkJoinPool-1-worker-0
trigger fetching id: 3 thread: ForkJoinPool-1-worker-1
trigger fetching id: 4 thread: ForkJoinPool-1-worker-2
fetching id: 4 -> user4 thread: ForkJoinPool-1-worker-3, 4 threads
fetching id: 2 -> user2 thread: ForkJoinPool-1-worker-3, 4 threads
fetching id: 3 -> user3 thread: ForkJoinPool-1-worker-2, 4 threads
trigger fetching id: 1 thread: ForkJoinPool-1-worker-3
fetching id: 1 -> user1 thread: ForkJoinPool-1-worker-2, 4 threads
[user1, user2, user3, user4]
which might be a surprising number of threads on the first glance.
The answer is that join() may block the thread, but if this happens inside a worker thread of a Fork/Join pool, this situation will be detected and a new compensation thread will be started, to ensure the configured target parallelism.
As a special case, when the default Fork/Join pool is used, the implementation may pick up new pending tasks within the join() method, to ensure progress within the same thread.
So the code will always make progress and there’s nothing wrong with calling join() occasionally, if the alternatives are much more complicated, but there’s some danger of too much resource consumption, if used excessively. After all, the reason to use thread pools, is to limit the number of threads.
The alternative is to use chained dependent operations where possible.
public class PipelineOfTasksExample {
private final Map<Long, String> db = LongStream.rangeClosed(1, 4).boxed()
.collect(Collectors.toMap(id -> id, id -> "user"+id));
PipelineOfTasksExample() {}
private static <T> T slowDown(String op, T result) {
LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(500));
System.out.println(op + " -> " + result + " thread: "
+ Thread.currentThread().getName()+ ", "
+ POOL.getPoolSize() + " threads");
return result;
}
private CompletableFuture<List<Long>> returnUserIdsFromDb() {
System.out.println("trigger building the list of Ids - thread: "
+ Thread.currentThread().getName());
return CompletableFuture.supplyAsync(
() -> slowDown("building the list of Ids", Arrays.asList(1L, 2L, 3L, 4L)),
POOL);
}
private CompletableFuture<String> fetchById(Long id) {
System.out.println("trigger fetching id: " + id + " thread: "
+ Thread.currentThread().getName());
return CompletableFuture.supplyAsync(
() -> slowDown("fetching id: " + id , db.get(id)), POOL);
}
static ForkJoinPool POOL = new ForkJoinPool(2);
public static void main(String[] args) {
PipelineOfTasksExample example = new PipelineOfTasksExample();
CompletableFuture<List<String>> result = example.returnUserIdsFromDb()
.thenComposeAsync(listOfIds -> {
List<CompletableFuture<String>> jobs = listOfIds.parallelStream()
.map(id -> example.fetchById(id))
.collect(Collectors.toList());
return CompletableFuture.allOf(jobs.toArray(new CompletableFuture<?>[0]))
.thenApply(_void -> jobs.stream()
.map(CompletableFuture::join).collect(Collectors.toList()));
},
POOL
);
System.out.println(result.join());
System.out.println(ForkJoinPool.commonPool().getPoolSize());
}
}
The difference is that first, all asynchronous jobs are submitted, then, a dependent action calling join on them is scheduled, to be executed only when all jobs have completed, so these join invocations will never block. Only the final join call at the end of the main method may block the main thread.
So this prints something like
trigger building the list of Ids - thread: main
building the list of Ids -> [1, 2, 3, 4] thread: ForkJoinPool-1-worker-1, 1 threads
trigger fetching id: 3 thread: ForkJoinPool-1-worker-1
trigger fetching id: 2 thread: ForkJoinPool-1-worker-0
trigger fetching id: 4 thread: ForkJoinPool-1-worker-1
trigger fetching id: 1 thread: ForkJoinPool-1-worker-0
fetching id: 4 -> user4 thread: ForkJoinPool-1-worker-1, 2 threads
fetching id: 3 -> user3 thread: ForkJoinPool-1-worker-0, 2 threads
fetching id: 2 -> user2 thread: ForkJoinPool-1-worker-1, 2 threads
fetching id: 1 -> user1 thread: ForkJoinPool-1-worker-0, 2 threads
[user1, user2, user3, user4]
showing that no compensation threads had to be created, so the number of threads matches the configured target parallelism.
Note that if the actual work is done in a background thread rather than within the fetchById method itself, you now don’t need a parallel stream anymore, as there is no blocking join() call. For such scenarios, just using stream() will usually result in a higher performance.
Need help making an observable start on the main thread, and then move on to a pool of threads allowing the source to continue emitting new items (regardless if they are still being processed in the pool of threads).
This is my example:
public static void main(String[] args) {
Observable<Integer> source = Observable.range(1,10);
source.map(i -> sleep(i, 10))
.doOnNext(i -> System.out.println("Emitting " + i + " on thread " + Thread.currentThread().getName()))
.observeOn(Schedulers.computation())
.map(i -> sleep(i * 10, 300))
.subscribe( i -> System.out.println("Received " + i + " on thread " + Thread.currentThread().getName()));
sleep(-1, 30000);
}
private static int sleep(int i, int time) {
try {
Thread.sleep(time);
} catch (InterruptedException e) {
e.printStackTrace();
}
return i;
}
which always prints:
Emitting 1 on thread main
Emitting 2 on thread main
Emitting 3 on thread main
Received 10 on thread RxComputationScheduler-1
Emitting 4 on thread main
Emitting 5 on thread main
Emitting 6 on thread main
Received 20 on thread RxComputationScheduler-1
Emitting 7 on thread main
Emitting 8 on thread main
Emitting 9 on thread main
Received 30 on thread RxComputationScheduler-1
Emitting 10 on thread main
Received 40 on thread RxComputationScheduler-1
Received 50 on thread RxComputationScheduler-1
Received 60 on thread RxComputationScheduler-1
Received 70 on thread RxComputationScheduler-1
Received 80 on thread RxComputationScheduler-1
Received 90 on thread RxComputationScheduler-1
Received 100 on thread RxComputationScheduler-1
Although items are emitted on the main thread as supposed, I want them to move on to the computation/IO thread-pool afterwards.
Should be something like this:
I don't think you were slowing down the source emissions enough, and they were emitting so quickly that all items were emitted before the observeOn() had a chance to schedule them.
Try sleeping to 500ms instead of 10ms. You will then see interleaving like you would expect.
public class JavaLauncher {
public static void main(String[] args) {
Observable<Integer> source = Observable.range(1,10);
source.map(i -> sleep(i, 500))
.doOnNext(i -> System.out.println("Emitting " + i + " on thread " + Thread.currentThread().getName()))
.observeOn(Schedulers.computation())
.map(i -> sleep(i * 10, 250))
.subscribe( i -> System.out.println("Received " + i + " on thread " + Thread.currentThread().getName()));
sleep(-1, 30000);
}
private static int sleep(int i, int time) {
try {
Thread.sleep(time);
} catch (InterruptedException e) {
e.printStackTrace();
}
return i;
}
}
OUTPUT
Emitting 1 on thread main
Emitting 2 on thread main
Emitting 3 on thread main
Received 10 on thread RxComputationThreadPool-3
Emitting 4 on thread main
Received 20 on thread RxComputationThreadPool-3
Emitting 5 on thread main
Emitting 6 on thread main
Received 30 on thread RxComputationThreadPool-3
Emitting 7 on thread main
Emitting 8 on thread main
Received 40 on thread RxComputationThreadPool-3
Emitting 9 on thread main
Emitting 10 on thread main
Received 50 on thread RxComputationThreadPool-3
Received 60 on thread RxComputationThreadPool-3
Received 70 on thread RxComputationThreadPool-3
Received 80 on thread RxComputationThreadPool-3
Received 90 on thread RxComputationThreadPool-3
Received 100 on thread RxComputationThreadPool-3
UPDATE - Parallelized Version
public class JavaLauncher {
public static void main(String[] args) {
Observable<Integer> source = Observable.range(1,10);
source.map(i -> sleep(i, 250))
.doOnNext(i -> System.out.println("Emitting " + i + " on thread " + Thread.currentThread().getName()))
.flatMap(i ->
Observable.just(i)
.subscribeOn(Schedulers.computation())
.map(i2 -> sleep(i2 * 10, 500))
)
.subscribe( i -> System.out.println("Received " + i + " on thread " + Thread.currentThread().getName()));
sleep(-1, 30000);
}
private static int sleep(int i, int time) {
try {
Thread.sleep(time);
} catch (InterruptedException e) {
e.printStackTrace();
}
return i;
}
}
OUTPUT
Emitting 1 on thread main
Emitting 2 on thread main
Emitting 3 on thread main
Received 10 on thread RxComputationThreadPool-3
Emitting 4 on thread main
Received 20 on thread RxComputationThreadPool-4
Received 30 on thread RxComputationThreadPool-1
Emitting 5 on thread main
Received 40 on thread RxComputationThreadPool-2
Emitting 6 on thread main
Received 50 on thread RxComputationThreadPool-3
Emitting 7 on thread main
Received 60 on thread RxComputationThreadPool-4
Emitting 8 on thread main
Received 70 on thread RxComputationThreadPool-1
Emitting 9 on thread main
Received 80 on thread RxComputationThreadPool-2
Emitting 10 on thread main
Received 90 on thread RxComputationThreadPool-3
Received 100 on thread RxComputationThreadPool-4