I have a long running task (say an Observable<Integer>) that I want to trigger as few times in my application as possible. I have multiple "views" on the task that process the events that it sends in various ways. I only have one subscribe in my entire application.
How do I ensure that the long running task is only triggered once for each subscription, and is only triggered when required by a subscription?
To make things more concrete, here is a unit-test:
@Test
public void testSubscriptionCount() {
final Counter counter = new Counter();
// Some long running tasks that should be triggered once per subscribe
final Observable<Integer> a = Observable.just(1, 2, 3, 4, 5)
.doOnSubscribe(subscription -> {
counter.increment();
});
// Some "view" on the long running task
final Observable<Integer> b = a.filter(x -> x % 2 == 0);
// Another "view" on the long running task
final Observable<Integer> c = a.filter(x -> x % 2 == 1);
// A view on the views
final Observable<Integer> d = Observable.zip(b, c, (x, y) -> x + y);
d.toList().blockingGet();
assertEquals(1, counter.count); // Fails, counter.count == 2
}
I would like a to only be triggered when one of its views (b, c or d) is subscribed to, but also only once per subscription.
In the code above, the subscription happens twice (I presume that d triggers b and c, which both trigger a independently).
Adding .share() does not solve the problem (although I think it is along the right lines):
// Some long running tasks that should be triggered once per subscribe
final Observable<Integer> a = Observable.just(1, 2, 3, 4, 5)
.doOnSubscribe(subscription -> counter.increment())
.share();
java.lang.AssertionError:
Expected :1
Actual :2
If your goal is to prevent multiple executions when the observer is subscribed in parallel, .share() is what you are looking for:
Observable<Integer> shared = source.share();
// In thread 1:
shared.subscribe(...);
// In thread 2:
shared.subscribe(...);
So long as the source observable has not yet completed when the second subscription happens, the second subscriber will receive the same results as the first, and will not force another execution of the source observable.
The RxJava documentation has much more detailed explanation, but it's basically a wrapper that has some reference counting and only subscribes to the source observable when necessary to avoid concurrent executions.
Also keep in mind that timing plays an important part in which values are actually delivered. As far as I know, .share() does not buffer elements, so any elements delivered before the second subscription are not seen by the second subscriber. To hold on to results for late subscribers you would need something like .replay() or .cache(); .buffer() only batches items into lists and does not replay them.
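To see why .share() does not help in the question's test, here is a toy model of reference-counted sharing in plain Java. It is not RxJava's implementation: ToyShared and its methods are hypothetical names made up for illustration. The model assumes a synchronous source that emits everything and completes inside subscribe, much like Observable.just(...). The first subscriber connects, drains the source and is released before the second subscriber arrives, so the source runs again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model only: NOT RxJava's implementation of share().
class ToyShared {
    private final List<Integer> items;
    private final List<Consumer<Integer>> subscribers = new ArrayList<>();
    int sourceRuns = 0; // how many times the "expensive" source was started

    ToyShared(List<Integer> items) {
        this.items = items;
    }

    // Connect to the source only when going from zero to one subscriber.
    void subscribe(Consumer<Integer> subscriber) {
        subscribers.add(subscriber);
        if (subscribers.size() == 1) {
            runSource();
        }
    }

    // A synchronous source: emits everything, then completes.
    private void runSource() {
        sourceRuns++;
        for (Integer item : items) {
            for (Consumer<Integer> s : new ArrayList<>(subscribers)) {
                s.accept(item);
            }
        }
        subscribers.clear(); // completion: the reference count drops back to zero
    }

    // Mirrors the question: two "views" subscribe one after the other.
    static int runsForTwoSequentialSubscribers() {
        ToyShared shared = new ToyShared(List.of(1, 2, 3, 4, 5));
        List<Integer> evens = new ArrayList<>();
        List<Integer> odds = new ArrayList<>();
        shared.subscribe(x -> { if (x % 2 == 0) evens.add(x); });
        shared.subscribe(x -> { if (x % 2 == 1) odds.add(x); });
        return shared.sourceRuns;
    }

    public static void main(String[] args) {
        System.out.println("source runs: " + runsForTwoSequentialSubscribers());
    }
}
```

In RxJava terms, this is why operators such as publish(selector) or cache() are usually suggested when every view must see a single run of a synchronous source.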
I'd like to run three requests in parallel on three threads and accumulate their results. As each result arrives, I add it to a running total. If the total exceeds 99, the main thread should return "99+" and the threads that have not finished should be stopped. Otherwise the total is kept as an integer while waiting for the remaining threads, and the final sum is returned. This is how I implemented it:
RequestDTO request; //this is http request data
ExecutorService executor = Executors.newFixedThreadPool(3);
//for flagging purpose, for counting how many sub threads end
//but I can't reference it directly just like I did to DTOResponse totalAll;
short asyncFlag = 0;
Cancellable
cancellableThreads1,
cancellableThreads2,
cancellableThreads3;
DTOResponse totalAll = new DTOResponse(); totalAll.total = 0;
LOGGER.info("start threads 1");
cancellableThreads1 =
Uni.createFrom().item(asyncFlag)
.runSubscriptionOn(executor).subscribe().with(consumer ->
{//it runs on new thread
Response response = method1(request).await().indefinitely();
LOGGER.info("got uniMethod1!");
DTOResponse totalTodo = response.readEntity(DTOResponse.class);
Integer total =(Integer) totalTodo.total;
totalAll.total = (Integer) totalAll.total + total;
LOGGER.info("total thread1 done: "+total);
if ((Integer) totalAll.total > 99){
totalAll.total = "99+";
}
//as I mentioned on comments above, I can't refer asyncFlag directly, so I put those as .item() parameter
//then I just refer it as consumer, but no matter how many consumer increase, it not change the asyncFlag on main thread
consumer++;
});
LOGGER.info("thread 1 already running asynchronus");
LOGGER.info("start threads 2");
cancellableThreads2 =
Uni.createFrom().item(asyncFlag)
.runSubscriptionOn(executor).subscribe().with(consumer ->
{//it runs on new thread
Response response = method2(request).await().indefinitely();
LOGGER.info("got uniMethod2!");
DTOResponse totalTodo = response.readEntity(DTOResponse.class);
Integer total =(Integer) totalTodo.total;
totalAll.total = (Integer) totalAll.total + total;
LOGGER.info("total thread2 done: "+total);
if ((Integer) totalAll.total > 99){
totalAll.total = "99+";
}
//as I mentioned on comments above, I can't refer asyncFlag directly, so I put those as .item() parameter
//then I just refer it as consumer, but no matter how many consumer increase, it not change the asyncFlag on main thread
consumer++;
});
LOGGER.info("thread 2 already running asynchronus");
LOGGER.info("start threads 3");
cancellableThreads3 =
Uni.createFrom().item(asyncFlag)
.runSubscriptionOn(executor).subscribe().with(consumer ->
{//it runs on new thread
Response response = method3(request).await().indefinitely();
LOGGER.info("got uniMethod3!");
DTOResponse totalTodo = response.readEntity(DTOResponse.class);
Integer total =(Integer) totalTodo.total;
totalAll.total = (Integer) totalAll.total + total;
LOGGER.info("total thread3 done: "+total);
if ((Integer) totalAll.total > 99){
totalAll.total = "99+";
}
//as I mentioned on comments above, I can't refer asyncFlag directly, so I put those as .item() parameter
//then I just refer it as consumer, but no matter how many consumer increase, it not change the asyncFlag on main thread
consumer++;
});
LOGGER.info("thread 3 already running asynchronus");
do{
//executed by main threads.
//I wanted to block in here until those condition is met
//actually is not blocking thread but forever loop instead
if(totalAll.total instanceof String || asyncFlag >=3){
cancellableThreads1.cancel();
cancellableThreads2.cancel();
cancellableThreads3.cancel();
}
//asyncFlag isn't increase even all of 3 threads has execute consumer++
}while(totalAll.total instanceof Integer && asyncFlag <3);
ResponseBuilder responseBuilder = Response.ok().entity(totalAll);
return Uni.createFrom().item("").onItem().transform(s->responseBuilder.build());
totalAll can be accessed from the sub-threads, but asyncFlag cannot. If I use asyncFlag inside a sub-thread block, my editor underlines it in red with "Local variable asyncFlag defined in an enclosing scope must be final or effectively final Java(536871575)". So I passed it through .item() and incremented consumer instead, but that has no effect on asyncFlag in the main thread. As a result, the loop never ends unless the total turns into a String (the first condition).
You are better off switching gears and using a reactive(-native) approach to your problem.
Instead of subscribing to each Uni and then collecting their results individually while monitoring their progress imperatively, here is the series of steps you should rather use, in a reactive ("rxified") way:
Create all your Uni request-representing objects with whatever concurrency construct you would like: Uni#emitOn
Combine all your request Unis into a Multi by merging them, so that they execute concurrently (not in an ordered fashion): MultiCreateBy#merging
Scan the items emitted by the Multi (your request results) as they arrive, adding each item to an initial seed: MultiOnItem#scan
Keep skipping the running sum until you first see a value exceeding the threshold (99 in your case), at which point you let the result flow through your stream pipeline: MultiSkip#first (note that the skip stage will automatically cancel upstream requests, hence stopping any useless request processing already in flight)
If no item has been emitted downstream, meaning that the sum of the requests has not exceeded the threshold, sum up the initial Uni results (which are cached to avoid re-triggering the requests): UniOnNull#ifNull
Here is a pseudo implementation of the described stages:
public Uni<Response> request() {
RequestDTO request; //this is http request data
Uni<Object> requestOne = method1(request)
.emitOn(executor)
.map(response -> response.readEntity(DTOResponse.class))
.map(dtoResponse -> dtoResponse.total)
.memoize()
.atLeast(Duration.ofSeconds(3));
Uni<Object> requestTwo = method2(request)
.emitOn(executor)
.map(response -> response.readEntity(DTOResponse.class))
.map(dtoResponse -> dtoResponse.total)
.memoize()
.atLeast(Duration.ofSeconds(3));
Uni<Object> requestThree = method3(request)
.emitOn(executor)
.map(response -> response.readEntity(DTOResponse.class))
.map(dtoResponse -> dtoResponse.total)
.memoize()
.atLeast(Duration.ofSeconds(3));
return Multi.createBy()
.merging()
.withConcurrency(1)
.streams(requestOne.toMulti(), requestTwo.toMulti(), requestThree.toMulti())
.onItem()
.scan(() -> 0, (result, itemTotal) -> result + (Integer) itemTotal)
.skip()
.first(total -> total <= 99)
.<Object>map(ignored -> "99+")
.toUni()
.onItem()
.ifNull()
.switchTo(
Uni.combine()
.all()
.unis(requestOne, requestTwo, requestThree)
.combinedWith((one, two, three) -> (Integer) one + (Integer) two + (Integer) three)
)
.map(result -> Response.ok().entity(result).build());
}
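For comparison, the same accumulate-and-short-circuit logic can be sketched with plain java.util.concurrent instead of Mutiny. This is a swapped-in technique, not the pipeline above: the Callable tasks stand in for method1 to method3, and ThresholdSum, sumOrCap, demo and the sample values are hypothetical names and data chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThresholdSum {
    // Runs all tasks concurrently, adds results in completion order, and
    // returns "<threshold>+" as soon as the running total exceeds the threshold.
    static String sumOrCap(List<Callable<Integer>> tasks, int threshold) {
        ExecutorService executor = Executors.newFixedThreadPool(tasks.size());
        CompletionService<Integer> completion = new ExecutorCompletionService<>(executor);
        List<Future<Integer>> futures = new ArrayList<>();
        try {
            for (Callable<Integer> task : tasks) {
                futures.add(completion.submit(task));
            }
            int total = 0;
            for (int i = 0; i < tasks.size(); i++) {
                total += completion.take().get(); // blocks until the next task finishes
                if (total > threshold) {
                    return threshold + "+";
                }
            }
            return Integer.toString(total);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            for (Future<Integer> f : futures) {
                f.cancel(true); // has no effect on tasks that already finished
            }
            executor.shutdownNow();
        }
    }

    // Hypothetical stand-ins for the three remote calls.
    static String demo(int a, int b, int c) {
        List<Callable<Integer>> tasks = List.of(() -> a, () -> b, () -> c);
        return sumOrCap(tasks, 99);
    }

    public static void main(String[] args) {
        System.out.println(demo(50, 60, 10));
        System.out.println(demo(10, 20, 30));
    }
}
```

A shared counter such as asyncFlag is not needed here because the main thread consumes results one by one from the completion service; if a counter had to be updated from inside a lambda, an AtomicInteger would be the usual choice.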
I have a Flink streaming program that has branching processing logic after a long transformation. Will the long transformation be executed multiple times? Pseudo code:
env = getEnvironment();
DataStream<Event> inputStream = getInputStream();
tempStream = inputStream.map(very_heavy_computation_func)
output1 = tempStream.map(func1);
output1.addSink(sink1);
output2 = tempStream.map(func2);
output2.addSink(sink2);
env.execute();
Questions:
How many times would inputStream.map(very_heavy_computation_func) be executed?
Once or twice?
If twice, how can I cache tempStream (or other method) to avoid the previous transformation being executed multiple times?
You can actually answer (1) easily by just trying out more or less exactly your example:
public class TestProgram {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
SingleOutputStreamOperator<Integer> stream = env.fromElements(1, 2, 3)
.map(i -> {
System.out.println("Executed expensive computation for: " + i);
return i;
});
stream.map(i -> i).addSink(new PrintSinkFunction<>());
stream.map(i -> i).addSink(new PrintSinkFunction<>());
env.execute();
}
}
produces (on my machine, for example):
Executed expensive computation for: 3
Executed expensive computation for: 1
Executed expensive computation for: 2
9> 3
8> 2
8> 2
9> 3
7> 1
7> 1
You can also find a more technical answer here which explains how records are replicated to downstream operators, rather than running the source/operator multiple times.
For the following program, I am trying to figure out why using two separate streams runs the tasks in parallel, while creating the CompletableFutures and calling join/get on them in the same stream takes as long as if they were processed sequentially.
public class HelloConcurrency {
private static Integer sleepTask(int number) {
System.out.println(String.format("Task with sleep time %d", number));
try {
TimeUnit.SECONDS.sleep(number);
} catch (InterruptedException e) {
e.printStackTrace();
return -1;
}
return number;
}
public static void main(String[] args) {
List<Integer> sleepTimes = Arrays.asList(1,2,3,4,5,6);
System.out.println("WITH SEPARATE STREAMS FOR FUTURE AND JOIN");
ExecutorService executorService = Executors.newFixedThreadPool(6);
long start = System.currentTimeMillis();
List<CompletableFuture<Integer>> futures = sleepTimes.stream()
.map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService)
.exceptionally(ex -> { ex.printStackTrace(); return -1; }))
.collect(Collectors.toList());
executorService.shutdown();
List<Integer> result = futures.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList());
long finish = System.currentTimeMillis();
long timeElapsed = (finish - start)/1000;
System.out.println(String.format("done in %d seconds.", timeElapsed));
System.out.println(result);
System.out.println("WITH SAME STREAM FOR FUTURE AND JOIN");
ExecutorService executorService2 = Executors.newFixedThreadPool(6);
start = System.currentTimeMillis();
List<Integer> results = sleepTimes.stream()
.map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService2)
.exceptionally(ex -> { ex.printStackTrace(); return -1; }))
.map(CompletableFuture::join)
.collect(Collectors.toList());
executorService2.shutdown();
finish = System.currentTimeMillis();
timeElapsed = (finish - start)/1000;
System.out.println(String.format("done in %d seconds.", timeElapsed));
System.out.println(results);
}
}
Output
WITH SEPARATE STREAMS FOR FUTURE AND JOIN
Task with sleep time 6
Task with sleep time 5
Task with sleep time 1
Task with sleep time 3
Task with sleep time 2
Task with sleep time 4
done in 6 seconds.
[1, 2, 3, 4, 5, 6]
WITH SAME STREAM FOR FUTURE AND JOIN
Task with sleep time 1
Task with sleep time 2
Task with sleep time 3
Task with sleep time 4
Task with sleep time 5
Task with sleep time 6
done in 21 seconds.
[1, 2, 3, 4, 5, 6]
The two approaches are quite different; let me try to explain clearly.
1st approach: you start the async requests for all 6 tasks and only then call join on each of them to get the results.
2nd approach: you call join immediately after starting the async request for each task. For example, after starting the async thread for task 1, calling join makes the stream wait for that task to complete, and only then is the second task started.
Note: if you look closely at the output, in the 1st approach the output appears in random order since all six tasks were executed asynchronously, whereas in the 2nd approach all tasks were executed sequentially, one after another.
I believe you have an idea how stream map operation is performed, or you can get more information from here or here
To perform a computation, stream operations are composed into a stream pipeline. A stream pipeline consists of a source (which might be an array, a collection, a generator function, an I/O channel, etc), zero or more intermediate operations (which transform a stream into another stream, such as filter(Predicate)), and a terminal operation (which produces a result or side-effect, such as count() or forEach(Consumer)). Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed.
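The laziness described in that quote is easy to observe. Here is a minimal sketch (the class name LazyStreamDemo and the counter are made up for illustration): building the pipeline runs nothing, and the mapping function executes only once a terminal operation is invoked.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreamDemo {
    // Returns { calls before the terminal operation, calls after it }.
    static int[] countMapCalls() {
        AtomicInteger calls = new AtomicInteger();

        // Only describes the pipeline; the lambda is not invoked yet.
        Stream<Integer> pipeline = List.of(1, 2, 3).stream()
                .map(x -> {
                    calls.incrementAndGet();
                    return x * 10;
                });
        int before = calls.get();

        // The terminal operation pulls the elements through map().
        pipeline.collect(Collectors.toList());
        int after = calls.get();

        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] counts = countMapCalls();
        System.out.println("before collect: " + counts[0] + ", after collect: " + counts[1]);
    }
}
```

This is also why, in the question's second version, no future is created until collect starts pulling elements through both map stages.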
The stream framework does not define the order in which map operations are executed on stream elements, because it is not intended for use cases in which that might be a relevant issue. As a result, the particular way your second version is executing is equivalent, essentially, to
List<Integer> results = new ArrayList<>();
for (Integer sleepTime : sleepTimes) {
results.add(CompletableFuture
.supplyAsync(() -> sleepTask(sleepTime), executorService2)
.exceptionally(ex -> { ex.printStackTrace(); return -1; })
.join());
}
...which is itself essentially equivalent to
List<Integer> results = new ArrayList<>();
for (Integer sleepTime : sleepTimes) {
results.add(sleepTask(sleepTime));
}
@Deadpool answered it pretty well; I'm just adding my answer in case it helps someone understand it better.
I was able to get an answer by adding more printing to both methods.
TLDR
2 stream approach: we start all 6 tasks asynchronously and then call join on each of them in a separate stream to get the results.
1 stream approach: we call join immediately after starting each task. For example, after spinning up a thread for task 1, calling join waits for task 1 to complete, and only then is the second task started on an async thread.
Note: if we look closely at the output, in the 1 stream approach the output appears in sequential order since all six tasks were executed in order. In the 2 stream approach all tasks were executed in parallel, hence the random order.
Note 2: If we replace stream() with parallelStream() in the 1 stream approach, it behaves much like the 2 stream approach, provided the parallel stream has enough worker threads to start all the tasks at once.
More proof
I added more printing to the streams which gave the following outputs and confirmed the note above :
1 stream:
List<Integer> results = sleepTimes.stream()
.map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService2)
.exceptionally(ex -> { ex.printStackTrace(); return -1; }))
.map(f -> {
int num = f.join();
System.out.println(String.format("doing join on task %d", num));
return num;
})
.collect(Collectors.toList());
WITH SAME STREAM FOR FUTURE AND JOIN
Task with sleep time 1
doing join on task 1
Task with sleep time 2
doing join on task 2
Task with sleep time 3
doing join on task 3
Task with sleep time 4
doing join on task 4
Task with sleep time 5
doing join on task 5
Task with sleep time 6
doing join on task 6
done in 21 seconds.
[1, 2, 3, 4, 5, 6]
2 streams:
List<CompletableFuture<Integer>> futures = sleepTimes.stream()
.map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService)
.exceptionally(ex -> { ex.printStackTrace(); return -1; }))
.collect(Collectors.toList());
List<Integer> result = futures.stream()
.map(f -> {
int num = f.join();
System.out.println(String.format("doing join on task %d", num));
return num;
})
.collect(Collectors.toList());
WITH SEPARATE STREAMS FOR FUTURE AND JOIN
Task with sleep time 2
Task with sleep time 5
Task with sleep time 3
Task with sleep time 1
Task with sleep time 4
Task with sleep time 6
doing join on task 1
doing join on task 2
doing join on task 3
doing join on task 4
doing join on task 5
doing join on task 6
done in 6 seconds.
[1, 2, 3, 4, 5, 6]
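The difference can also be checked without measuring elapsed time, using a CountDownLatch that opens only once every task has started. This is a sketch with hypothetical names (OverlapCheck, meetOthers, allTasksOverlap), and the 300 ms wait is an arbitrary choice. With the 2 stream approach all tasks meet at the latch; with join in the same pipeline the early tasks wait alone and give up.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

class OverlapCheck {
    // A task reports true only if every other task started while it was still running.
    private static boolean meetOthers(CountDownLatch latch) {
        latch.countDown();
        try {
            return latch.await(300, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    static boolean allTasksOverlap(boolean joinInSamePipeline) {
        List<Integer> ids = List.of(1, 2, 3, 4);
        CountDownLatch latch = new CountDownLatch(ids.size());
        ExecutorService pool = Executors.newFixedThreadPool(ids.size());
        try {
            List<Boolean> results;
            if (joinInSamePipeline) {
                // 1 stream: each future is joined before the next one is created.
                results = ids.stream()
                        .map(id -> CompletableFuture.supplyAsync(() -> meetOthers(latch), pool))
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList());
            } else {
                // 2 streams: create every future first, then join them.
                List<CompletableFuture<Boolean>> futures = ids.stream()
                        .map(id -> CompletableFuture.supplyAsync(() -> meetOthers(latch), pool))
                        .collect(Collectors.toList());
                results = futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList());
            }
            return results.stream().allMatch(ok -> ok);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("2 streams overlap: " + allTasksOverlap(false));
        System.out.println("1 stream overlaps: " + allTasksOverlap(true));
    }
}
```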
Using RxJava 2, I'm trying to create an asynchronous event bus.
I have a singleton object with a PublishSubject property. Emitters can send an event to the bus by calling onNext on the subject.
If subscribers have a long task to execute, I want my bus to dispatch the tasks on multiple threads so that they execute concurrently. That is, I want work on an item to start immediately after the item is emitted, even if the work on the previous item has not completed.
However, even using observeOn with a scheduler, I cannot run my tasks concurrently.
Sample code:
public void test() throws Exception {
Subject<Integer> busSubject = PublishSubject.<Integer>create().toSerialized();
busSubject.observeOn(Schedulers.computation())
.subscribe(new LongTaskConsumer());
for (int i = 1; i < 5; i++) {
System.out.println(i + " - event");
busSubject.onNext(i);
Thread.sleep(1000);
}
Thread.sleep(1000);
}
private static class LongTaskConsumer implements Consumer<Integer> {
@Override
public void accept(Integer i) throws Exception {
System.out.println(i + " - start work");
System.out.println(i + " - computation on thread " + Thread.currentThread().getName());
Thread.sleep(2000);
System.out.println(i + " - end work");
}
}
Prints:
1 - event
1 - start work
1 - computation on thread RxComputationThreadPool-1
2 - event
3 - event
1 - end work
2 - start work
2 - computation on thread RxComputationThreadPool-1
4 - event
2 - end work
3 - start work
3 - computation on thread RxComputationThreadPool-1
3 - end work
4 - start work
4 - computation on thread RxComputationThreadPool-1
4 - end work
Which means that the work on item 2 waited for the end of work on item 1, even if the event 2 was already emitted.
When the call below happens, one worker is created from Schedulers.computation() and is used for the whole stream. That's why all of the work you submitted is done on RxComputationThreadPool-1.
busSubject.observeOn(Schedulers.computation())
.subscribe(new LongTaskConsumer());
To schedule work on multiple threads:
busSubject.flatMap(x ->
Observable.just(x)
.subscribeOn(Schedulers.computation())
.doOnNext(somethingIntensive))
.subscribe(new LongTaskConsumer());
Note also that the intensive work is performed inside the flatMap rather than in the LongTaskConsumer because all items will arrive serially to LongTaskConsumer.
There are other approaches to doing work in parallel that you may want to investigate depending on how many events are hitting the PublishSubject.
Does an Observable cache emitted items? I have two tests that lead me to different conclusions:
From test #1 I conclude that it does:
Test #1:
Observable<Long> clock = Observable
.interval(1000, TimeUnit.MILLISECONDS)
.take(10)
.map(i -> i++);
//subscribe for the first time
clock.subscribe(i -> System.out.println("a: " + i));
//subscribe with 2.5 seconds delay
Executors.newScheduledThreadPool(1).schedule(
() -> clock.subscribe(i -> System.out.println(" b: " + i)),
2500,
TimeUnit.MILLISECONDS
);
Output #1:
a: 0
a: 1
a: 2
b: 0
a: 3
b: 1
But the second test shows that we get different values for two observers:
Test #2:
Observable<Integer> observable = Observable
.range(1, 1000000)
.sample(7, TimeUnit.MILLISECONDS);
observable.subscribe(i -> System.out.println("Subscriber #1:" + i));
observable.subscribe(i -> System.out.println("Subscriber #2:" + i));
Output #2:
Subscriber #1:72745
Subscriber #1:196390
Subscriber #1:678171
Subscriber #2:336533
Subscriber #2:735521
There are two kinds of Observables: hot and cold. A cold observable tends to generate the same sequence for each of its Observers unless external effects, such as a timer-based action, are associated with it. So nothing is cached: each subscription runs the sequence again from the start.
In the first example, you get the same sequence twice because there are no external effects other than the timer ticks, which you receive one by one. In the second example, you sample a fast source, and sampling by time has a non-deterministic effect: each nanosecond counts, so even the slightest imprecision leads to a different value being reported.
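The distinction can be illustrated with a toy model in plain Java. This is not RxJava code: ColdVsHot, subscribeCold and HotSource are hypothetical names made up for the sketch. The cold source re-runs its generator for every subscriber, while the hot source pushes values only to whoever is listening at that moment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model only: illustrates the cold/hot distinction, not RxJava internals.
class ColdVsHot {
    // Cold: every subscription re-runs the generator from the beginning.
    static void subscribeCold(Consumer<Integer> observer) {
        for (int i = 0; i < 4; i++) {
            observer.accept(i);
        }
    }

    // Hot: values go to the observers registered at emission time.
    static class HotSource {
        private final List<Consumer<Integer>> observers = new ArrayList<>();

        void subscribe(Consumer<Integer> observer) {
            observers.add(observer);
        }

        void emit(int value) {
            observers.forEach(o -> o.accept(value));
        }
    }

    static List<Integer> lateColdSubscriber() {
        List<Integer> first = new ArrayList<>();
        List<Integer> late = new ArrayList<>();
        subscribeCold(first::add);
        subscribeCold(late::add); // subscribes after the first run has finished
        return late;              // regenerated from the start, not cached
    }

    static List<Integer> lateHotSubscriber() {
        HotSource hot = new HotSource();
        List<Integer> late = new ArrayList<>();
        hot.subscribe(x -> { });  // an early subscriber
        hot.emit(0);
        hot.emit(1);
        hot.subscribe(late::add); // arrives after 0 and 1 were emitted
        hot.emit(2);
        hot.emit(3);
        return late;              // only sees what was emitted after it subscribed
    }

    public static void main(String[] args) {
        System.out.println("late cold subscriber: " + lateColdSubscriber());
        System.out.println("late hot subscriber:  " + lateHotSubscriber());
    }
}
```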