Does an Observable cache emitted items or not? - java

Does an Observable cache emitted items? I have two tests that lead me to different conclusions:
From test #1 I conclude that it does:
Test #1:
Observable<Long> clock = Observable
.interval(1000, TimeUnit.MILLISECONDS)
.take(10)
.map(i -> i++);
//subscribe for the first time
clock.subscribe(i -> System.out.println("a: " + i));
//subscribe with 2.5 seconds delay
Executors.newScheduledThreadPool(1).schedule(
() -> clock.subscribe(i -> System.out.println(" b: " + i)),
2500,
TimeUnit.MILLISECONDS
);
Output #1:
a: 0
a: 1
a: 2
b: 0
a: 3
b: 1
But the second test shows that we get different values for two observers:
Test #2:
Observable<Integer> observable = Observable
.range(1, 1000000)
.sample(7, TimeUnit.MILLISECONDS);
observable.subscribe(i -> System.out.println("Subscriber #1:" + i));
observable.subscribe(i -> System.out.println("Subscriber #2:" + i));
Output #2:
Subscriber #1:72745
Subscriber #1:196390
Subscriber #1:678171
Subscriber #2:336533
Subscriber #2:735521

There are two kinds of Observables: hot and cold. A cold Observable does not cache anything; it starts producing its sequence anew for each Observer, so every Observer tends to see the same sequence unless external effects, such as timer-based actions, are involved.
In the first example, you get the same sequence twice because the only external effect is the timer ticks, which you receive one by one. In the second example, you sample a fast source, and time-based sampling is non-deterministic: every nanosecond counts, so even the slightest timing imprecision leads to a different value being reported.
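To make the "no cache, fresh run per subscriber" idea concrete, here is a minimal sketch in plain Java rather than RxJava. ColdSource, counter and collect are hypothetical names invented for this illustration; they only mimic how a cold source behaves and say nothing about how RxJava is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ColdSourceSketch {
    /** Hypothetical stand-in for a cold observable: it stores no items. */
    interface ColdSource<T> {
        void subscribe(Consumer<T> observer);
    }

    /** Every call to subscribe() re-runs the loop from zero for that observer. */
    static ColdSource<Long> counter(int count) {
        return observer -> {
            for (long i = 0; i < count; i++) {
                observer.accept(i);
            }
        };
    }

    /** Subscribes once and gathers everything that subscription receives. */
    static List<Long> collect(ColdSource<Long> source) {
        List<Long> seen = new ArrayList<>();
        source.subscribe(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        ColdSource<Long> clock = counter(3);
        System.out.println(collect(clock)); // prints [0, 1, 2]
        System.out.println(collect(clock)); // prints [0, 1, 2] again: a new run, not a replay
    }
}
```

The second subscriber sees 0 first for the same reason "b" starts at 0 in test #1: the sequence is regenerated, not remembered.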

Related

How to have a delay occur POST-emission?

I would like for every emission from my Observable to have a delay occur AFTER its emission. i.e. For every item emitted from Observable.fromIterable(listOf(1,2,3)) I would like a variable/dynamic delay AFTERWARDS.
i.e. given a list of timers or delays [3, 5, 10], I would like for the following timeline to occur with my observable:
1 -> Wait 3 seconds -> 2 -> Wait 5 seconds -> 3 -> Wait 10 seconds
Please note that this is NOT the same as:
Wait 3 seconds -> 1 -> Wait 5 seconds -> 2 -> Wait 10 seconds -> 3
which can be easily achieved with .zip + .delay or Observable.timer
Using zip() is the right way to go. When you want the delay after the first value is emitted, then you can use startWith() to add a zero delay for the first combination. The code might look like this:
Observable<Integer> values = Observable.fromArray(1,2,3,4,5);
Observable<Long> delays = Observable.fromArray(300L, 400L, 500L, 600L);
Observable<Integer> delayedValues = values.zipWith(
delays.startWith(0L),
new BiFunction<Integer, Long, Integer>() {
@Override
public Integer apply(Integer v, Long d) throws Exception {
Thread.sleep(d);
return v;
}
}
);
System.out.println("Begin subscribing");
long startTime = System.currentTimeMillis();
delayedValues.subscribe(v -> {
long currentTime = System.currentTimeMillis();
long diff = currentTime - startTime;
System.out.println("["+diff+"] "+v);
});
System.out.println("After subscribing");
This will generate the following output:
Begin subscribing
[35] 1
[336] 2
[736] 3
[1236] 4
[1836] 5
After subscribing
As you can see, the first value is printed immediately, followed by a 300 ms delay before the next value, and so on with the remaining delays until the end is reached.
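The timestamps in that output follow from simple arithmetic: zip pairs the k-th value with the k-th entry of [0, 300, 400, 500, 600], and because each sleep blocks the pipeline, a value appears after the running total of the delays before it. A small sketch of that calculation in plain Java (DelayOffsetsSketch and emissionOffsets are names made up for this illustration; no RxJava is involved):

```java
import java.util.Arrays;

class DelayOffsetsSketch {
    /**
     * Given the delay to wait AFTER each emission, returns the offset (in ms
     * from subscription) at which each value is expected to appear.
     * The leading 0 plays the role of startWith(0L).
     */
    static long[] emissionOffsets(long... delaysAfter) {
        long[] offsets = new long[delaysAfter.length + 1];
        for (int i = 0; i < delaysAfter.length; i++) {
            offsets[i + 1] = offsets[i] + delaysAfter[i];
        }
        return offsets;
    }

    public static void main(String[] args) {
        // prints [0, 300, 700, 1200, 1800]
        System.out.println(Arrays.toString(emissionOffsets(300L, 400L, 500L, 600L)));
    }
}
```

The measured values (35, 336, 736, 1236, 1836) are these offsets plus roughly 35 ms of startup overhead.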

Flux.range waits to emit more element once 256 elements are reached

I wrote this code:
Flux.range(0, 300)
.doOnNext(i -> System.out.println("i = " + i))
.flatMap(i -> Mono.just(i)
.subscribeOn(Schedulers.elastic())
.delayElement(Duration.ofMillis(1000))
)
.doOnNext(i -> System.out.println("end " + i))
.blockLast();
When running it, the first System.out.println shows that the Flux stops emitting numbers at the 256th element, then waits for the older ones to complete before emitting new ones.
Why is this happening?
Why 256?
Why is this happening?
The flatMap operator can be characterized as an operator that (rephrased from the javadoc):
subscribes to its inners eagerly.
does not preserve ordering of elements.
lets values from different inners interleave.
For this question the first point is important. Project Reactor restricts the number of in-flight inner sequences via the concurrency parameter.
While flatMap(mapper) uses the default value, the flatMap(mapper, concurrency) overload accepts this parameter explicitly.
The flatMap javadoc describes the parameter as:
The concurrency argument allows to control how many Publisher can be subscribed to and merged in parallel
Consider the following code using concurrency = 500
Flux.range(0, 300)
.doOnNext(i -> System.out.println("i = " + i))
.flatMap(i -> Mono.just(i)
.subscribeOn(Schedulers.elastic())
.delayElement(Duration.ofMillis(1000)),
500
// ^^^^^^^^^^
)
.doOnNext(i -> System.out.println("end " + i))
.blockLast();
In this case there is no waiting:
i = 297
i = 298
i = 299
end 0
end 1
end 2
In contrast, if you pass 1 as concurrency, the output will be similar to:
i = 0
end 0
i = 1
end 1
with one second of waiting before each new element is emitted.
Why 256?
256 is the default value for concurrency of flatMap.
Take a look at Queues.SMALL_BUFFER_SIZE:
public static final int SMALL_BUFFER_SIZE = Math.max(16,
Integer.parseInt(System.getProperty("reactor.bufferSize.small", "256")));
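As an analogy only, and not Reactor's actual implementation, the effect of a concurrency cap can be sketched with a plain JDK Semaphore: the source may hand out a new item only while fewer than `concurrency` inner tasks are in flight. ConcurrencyCapSketch and emittedBeforeFirstWait are hypothetical names for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

class ConcurrencyCapSketch {
    /** Returns how many items the source emits before it first has to wait for an inner task. */
    static int emittedBeforeFirstWait(int items, int concurrency) {
        Semaphore inFlight = new Semaphore(concurrency);   // one permit per allowed inner task
        CountDownLatch release = new CountDownLatch(1);    // inner tasks stay "busy" until released
        ExecutorService pool = Executors.newCachedThreadPool();
        int emittedBeforeWait = items;
        boolean waited = false;
        try {
            for (int i = 0; i < items; i++) {
                if (!inFlight.tryAcquire()) {
                    // The cap is reached: remember how far we got, then let inner tasks finish.
                    if (!waited) {
                        emittedBeforeWait = i;
                        waited = true;
                    }
                    release.countDown();
                    inFlight.acquireUninterruptibly();
                }
                pool.submit(() -> {
                    try {
                        release.await();                   // simulated slow inner sequence
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.release();                // completion frees a slot for the source
                    }
                });
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return emittedBeforeWait;
    }

    public static void main(String[] args) {
        System.out.println(emittedBeforeFirstWait(300, 256)); // prints 256
        System.out.println(emittedBeforeFirstWait(300, 500)); // prints 300
    }
}
```

With a cap of 256 the source stalls after 256 items, and with a cap of 500 all 300 items go out without waiting, which mirrors the two behaviors described above.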

Why is CompletableFuture join/get faster in separate streams than using one stream

For the following program, I am trying to figure out why using two separate streams parallelizes the tasks, while using a single stream and calling join/get on the CompletableFuture makes them take as long as if they were processed sequentially.
public class HelloConcurrency {
private static Integer sleepTask(int number) {
System.out.println(String.format("Task with sleep time %d", number));
try {
TimeUnit.SECONDS.sleep(number);
} catch (InterruptedException e) {
e.printStackTrace();
return -1;
}
return number;
}
public static void main(String[] args) {
List<Integer> sleepTimes = Arrays.asList(1,2,3,4,5,6);
System.out.println("WITH SEPARATE STREAMS FOR FUTURE AND JOIN");
ExecutorService executorService = Executors.newFixedThreadPool(6);
long start = System.currentTimeMillis();
List<CompletableFuture<Integer>> futures = sleepTimes.stream()
.map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService)
.exceptionally(ex -> { ex.printStackTrace(); return -1; }))
.collect(Collectors.toList());
executorService.shutdown();
List<Integer> result = futures.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList());
long finish = System.currentTimeMillis();
long timeElapsed = (finish - start)/1000;
System.out.println(String.format("done in %d seconds.", timeElapsed));
System.out.println(result);
System.out.println("WITH SAME STREAM FOR FUTURE AND JOIN");
ExecutorService executorService2 = Executors.newFixedThreadPool(6);
start = System.currentTimeMillis();
List<Integer> results = sleepTimes.stream()
.map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService2)
.exceptionally(ex -> { ex.printStackTrace(); return -1; }))
.map(CompletableFuture::join)
.collect(Collectors.toList());
executorService2.shutdown();
finish = System.currentTimeMillis();
timeElapsed = (finish - start)/1000;
System.out.println(String.format("done in %d seconds.", timeElapsed));
System.out.println(results);
}
}
Output
WITH SEPARATE STREAMS FOR FUTURE AND JOIN
Task with sleep time 6
Task with sleep time 5
Task with sleep time 1
Task with sleep time 3
Task with sleep time 2
Task with sleep time 4
done in 6 seconds.
[1, 2, 3, 4, 5, 6]
WITH SAME STREAM FOR FUTURE AND JOIN
Task with sleep time 1
Task with sleep time 2
Task with sleep time 3
Task with sleep time 4
Task with sleep time 5
Task with sleep time 6
done in 21 seconds.
[1, 2, 3, 4, 5, 6]
The two approaches are quite different; let me try to explain them clearly.
1st approach: you start the async requests for all 6 tasks first, and only then call join on each of them to get the results.
2nd approach: you call join immediately after starting the async request for each task. For example, after starting the async thread for task 1, join waits for that thread to complete its task, and only then is the second task started.
Note: if you look closely at the output, in the 1st approach the output appears in random order because all six tasks were executed asynchronously, whereas in the 2nd approach the tasks were executed sequentially, one after another.
I assume you know how the stream map operation is performed; if not, the Java stream documentation describes it:
To perform a computation, stream operations are composed into a stream pipeline. A stream pipeline consists of a source (which might be an array, a collection, a generator function, an I/O channel, etc), zero or more intermediate operations (which transform a stream into another stream, such as filter(Predicate)), and a terminal operation (which produces a result or side-effect, such as count() or forEach(Consumer)). Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed.
The stream framework does not define the order in which map operations are executed on stream elements, because it is not intended for use cases in which that might be a relevant issue. As a result, the particular way your second version is executing is equivalent, essentially, to
List<Integer> results = new ArrayList<>();
for (Integer sleepTime : sleepTimes) {
results.add(CompletableFuture
.supplyAsync(() -> sleepTask(sleepTime), executorService2)
.exceptionally(ex -> { ex.printStackTrace(); return -1; })
.join());
}
...which is itself essentially equivalent to
List<Integer> results = new ArrayList<>();
for (Integer sleepTime : sleepTimes) {
results.add(sleepTask(sleepTime));
}
@Deadpool answered it pretty well; I am just adding my answer, which may help someone understand it better.
I was able to get an answer by adding more printing to both methods.
TL;DR
2 stream approach: We start all 6 tasks asynchronously and then call join on each of them in a separate stream to get the results.
1 stream approach: We call join immediately after starting each task. For example, after starting a thread for task 1, calling join makes the stream wait for task 1 to complete, and only then is the second task started asynchronously.
Note: Also, if we look closely at the output, in the 1 stream approach the output appears in sequential order because the six tasks were executed one after another, whereas in the 2 stream approach all tasks were executed in parallel, hence the random order.
Note 2: If we replace stream() with parallelStream() in the 1 stream approach, it behaves like the 2 stream approach, provided the parallel stream has enough worker threads to process all six elements at once.
More proof
I added more printing to the streams which gave the following outputs and confirmed the note above :
1 stream:
List<Integer> results = sleepTimes.stream()
.map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService2)
.exceptionally(ex -> { ex.printStackTrace(); return -1; }))
.map(f -> {
int num = f.join();
System.out.println(String.format("doing join on task %d", num));
return num;
})
.collect(Collectors.toList());
WITH SAME STREAM FOR FUTURE AND JOIN
Task with sleep time 1
doing join on task 1
Task with sleep time 2
doing join on task 2
Task with sleep time 3
doing join on task 3
Task with sleep time 4
doing join on task 4
Task with sleep time 5
doing join on task 5
Task with sleep time 6
doing join on task 6
done in 21 seconds.
[1, 2, 3, 4, 5, 6]
2 streams:
List<CompletableFuture<Integer>> futures = sleepTimes.stream()
.map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService)
.exceptionally(ex -> { ex.printStackTrace(); return -1; }))
.collect(Collectors.toList());
List<Integer> result = futures.stream()
.map(f -> {
int num = f.join();
System.out.println(String.format("doing join on task %d", num));
return num;
})
.collect(Collectors.toList());
WITH SEPARATE STREAMS FOR FUTURE AND JOIN
Task with sleep time 2
Task with sleep time 5
Task with sleep time 3
Task with sleep time 1
Task with sleep time 4
Task with sleep time 6
doing join on task 1
doing join on task 2
doing join on task 3
doing join on task 4
doing join on task 5
doing join on task 6
done in 6 seconds.
[1, 2, 3, 4, 5, 6]
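The ordering seen in the logs above can also be reproduced without any sleeps, which makes it easier to check. The sketch below is an illustration with made-up names (JoinOrderSketch, singlePipeline, twoPipelines); it records, on the calling thread, when each future is submitted and when it is joined. It relies on how sequential streams behave in OpenJDK in practice (one element travels through the whole pipeline before the next is pulled), which is an observed behavior and not a documented guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class JoinOrderSketch {
    /** One pipeline: each element is submitted and then joined before the next one is touched. */
    static List<String> singlePipeline(List<Integer> inputs) {
        List<String> events = new ArrayList<>();
        inputs.stream()
                .map(n -> {
                    events.add("submit " + n);
                    return CompletableFuture.supplyAsync(() -> n);
                })
                .map(f -> {
                    int n = f.join();
                    events.add("join " + n);
                    return n;
                })
                .collect(Collectors.toList());
        return events;
    }

    /** Two pipelines: collect() forces every submission to happen before the first join. */
    static List<String> twoPipelines(List<Integer> inputs) {
        List<String> events = new ArrayList<>();
        List<CompletableFuture<Integer>> futures = inputs.stream()
                .map(n -> {
                    events.add("submit " + n);
                    return CompletableFuture.supplyAsync(() -> n);
                })
                .collect(Collectors.toList());
        futures.stream()
                .map(f -> {
                    int n = f.join();
                    events.add("join " + n);
                    return n;
                })
                .collect(Collectors.toList());
        return events;
    }

    public static void main(String[] args) {
        List<Integer> inputs = Arrays.asList(1, 2, 3);
        System.out.println(singlePipeline(inputs));
        System.out.println(twoPipelines(inputs));
    }
}
```

In the single pipeline the submissions and joins alternate, so no two tasks can ever overlap. In the two-pipeline version every task is already running by the time the first join blocks.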

How to create an async event bus using RxJava 2?

Using RxJava 2, I'm trying to create an asynchronous event bus.
I have a singleton object with a PublishSubject property. Emitters can send an event to the bus by calling onNext on the subject.
If subscribers have a long task to execute, I want my bus to dispatch the tasks on multiple threads so that they execute concurrently. That is, I want work on an item to start immediately after the item is emitted, even if work on the previous item has not completed.
However, even using observeOn with a scheduler, I cannot run my tasks concurrently.
Sample code:
public void test() throws Exception {
Subject<Integer> busSubject = PublishSubject.<Integer>create().toSerialized();
busSubject.observeOn(Schedulers.computation())
.subscribe(new LongTaskConsumer());
for (int i = 1; i < 5; i++) {
System.out.println(i + " - event");
busSubject.onNext(i);
Thread.sleep(1000);
}
Thread.sleep(1000);
}
private static class LongTaskConsumer implements Consumer<Integer> {
@Override
public void accept(Integer i) throws Exception {
System.out.println(i + " - start work");
System.out.println(i + " - computation on thread " + Thread.currentThread().getName());
Thread.sleep(2000);
System.out.println(i + " - end work");
}
}
Prints:
1 - event
1 - start work
1 - computation on thread RxComputationThreadPool-1
2 - event
3 - event
1 - end work
2 - start work
2 - computation on thread RxComputationThreadPool-1
4 - event
2 - end work
3 - start work
3 - computation on thread RxComputationThreadPool-1
3 - end work
4 - start work
4 - computation on thread RxComputationThreadPool-1
4 - end work
Which means that the work on item 2 waited for the end of the work on item 1, even though event 2 had already been emitted.
When the call below happens, one worker is created from Schedulers.computation() and is used for the whole stream. That's why all of the work you submitted is done on RxComputationThreadPool-1.
busSubject.observeOn(Schedulers.computation())
.subscribe(new LongTaskConsumer());
To schedule work on multiple threads:
busSubject.flatMap(x ->
Observable.just(x) // Observable rather than Flowable, because busSubject is an Observable-based Subject
.subscribeOn(Schedulers.computation())
.doOnNext(somethingIntensive))
.subscribe(new LongTaskConsumer());
Note also that the intensive work is performed inside the flatMap rather than in the LongTaskConsumer because all items will arrive serially to LongTaskConsumer.
There are other approaches to doing work in parallel that you may want to investigate depending on how many events are hitting the PublishSubject.
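The difference between one worker for the whole stream and a worker per item can be illustrated with plain JDK executors. This is an analogy only: a single-thread executor stands in for the single worker that observeOn uses, and a fixed pool stands in for the per-item subscribeOn inside flatMap. WorkerSketch and maxConcurrent are hypothetical names, and the 500 ms wait is an arbitrary value chosen for the demonstration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class WorkerSketch {
    /** Submits `tasks` jobs and returns the largest number seen running at the same time. */
    static int maxConcurrent(ExecutorService executor, int tasks) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        CountDownLatch allStarted = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            executor.submit(() -> {
                max.accumulateAndGet(running.incrementAndGet(), Math::max);
                allStarted.countDown();
                try {
                    // Wait briefly for the other jobs to start; on a single worker they cannot.
                    allStarted.await(500, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return max.get();
    }

    /** One worker for everything, like a single observeOn worker. */
    static int onSingleWorker() {
        return maxConcurrent(Executors.newSingleThreadExecutor(), 3);
    }

    /** One thread available per job, like scheduling each item separately. */
    static int onPool() {
        return maxConcurrent(Executors.newFixedThreadPool(3), 3);
    }

    public static void main(String[] args) {
        System.out.println("single worker: " + onSingleWorker());
        System.out.println("pool of 3:     " + onPool());
    }
}
```

On the single worker the jobs never overlap, however many are queued, which is the same effect as every item being handled on RxComputationThreadPool-1 in the question.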

How do I control the subscription count in RxJava 2?

I have a long running task (say an Observable<Integer>) that I want to trigger as few times in my application as possible. I have multiple "views" on the task that process the events that it sends in various ways. I only have one subscribe in my entire application.
How do I ensure that the long running task is only triggered once for each subscription, and is only triggered when required by a subscription?
To make things more concrete, here is a unit-test:
@Test
public void testSubscriptionCount() {
final Counter counter = new Counter();
// Some long running tasks that should be triggered once per subscribe
final Observable<Integer> a = Observable.just(1, 2, 3, 4, 5)
.doOnSubscribe(subscription -> {
counter.increment();
});
// Some "view" on the long running task
final Observable<Integer> b = a.filter(x -> x % 2 == 0);
// Another "view" on the long running task
final Observable<Integer> c = a.filter(x -> x % 2 == 1);
// A view on the views
final Observable<Integer> d = Observable.zip(b, c, (x, y) -> x + y);
d.toList().blockingGet();
assertEquals(1, counter.count); // Fails, counter.count == 2
}
I would like a to only be triggered when one of its views (b, c or d) is subscribed to, but also only once per subscription.
In the code above, the subscription happens twice (I presume that d triggers b and c, which both trigger a independently).
Adding .share() does not solve the problem (although I think it is along the right lines):
// Some long running tasks that should be triggered once per subscribe
final Observable<Integer> a = Observable.just(1, 2, 3, 4, 5)
.doOnSubscribe(subscription -> counter.increment())
.share();
java.lang.AssertionError:
Expected :1
Actual :2
If your goal is to prevent multiple executions when observers subscribe in parallel, .share() is what you are looking for:
Observable<Integer> shared = source.share();
// In thread 1:
shared.subscribe(...);
// In thread 2:
shared.subscribe(...);
So long as the source observable has not yet completed when the second subscription happens, the second subscriber will receive the same results as the first, and will not force another execution of the source observable.
The RxJava documentation has a much more detailed explanation, but it's basically a wrapper that does some reference counting and only subscribes to the source observable when necessary, to avoid concurrent executions.
Also keep in mind that timing plays an important part in which values are actually delivered. I don't believe .share() does any buffering of elements, so if elements are delivered prior to the second subscription, the second subscriber will not get those elements. You'd have to use .replay(), .cache() or some other means of holding onto results for late subscribers.
