When my service starts up, I want to construct a simple pipeline.
I'd like to isolate the Flux sink, or a Processor, to emit events with.
Events will be coming in from multiple threads and should be processed according to the pipeline's subscribeOn() specification, but everything seems to run on the main thread.
What is the best approach? I've attached my attempts below.
(I'm using reactor-core v3.2.8.RELEASE.)
import org.junit.jupiter.api.Test;
import reactor.core.publisher.DirectProcessor;
import reactor.core.publisher.Flux;
import reactor.core.publisher.FluxProcessor;
import reactor.core.publisher.FluxSink;
import reactor.core.scheduler.Schedulers;
/**
* I want to construct my React pipelines during creation,
* then emit events over the lifetime of my services.
*/
public class React1Test
{
/**
* Attempt 1 - use a DirectProcessor and send items to it.
* Doesn't work though - seems to always run on the main thread.
*/
@Test
public void testReact1() throws InterruptedException
{
// Create the flux and sink.
FluxProcessor<String, String> fluxProcessor = DirectProcessor.<String>create().serialize();
FluxSink<String> sink = fluxProcessor.sink();
// Create the pipeline.
fluxProcessor
.doOnNext(str -> showDebugMsg(str)) // What thread do ops work on?
.subscribeOn(Schedulers.elastic())
.subscribe(str -> showDebugMsg(str)); // What thread does subscribe run on?
// Give the multi-thread pipeline a second.
Thread.sleep(1000);
// Time passes ... things happen ...
// Pass a few messages to the sink, emulating events.
sink.next("a");
sink.next("b");
sink.next("c");
// It's multi-thread so wait a sec to receive.
Thread.sleep(1000);
}
// Used down below during Flux.create().
private FluxSink<String> sink2;
/**
* Attempt 2 - use Flux.create() and its FluxSink object.
* Also seems to always run on the main thread.
*/
@Test
public void testReact2() throws InterruptedException
{
// Create the flux and sink.
Flux.<String>create(sink -> sink2 = sink)
.doOnNext(str -> showDebugMsg(str)) // What thread do ops work on?
.subscribeOn(Schedulers.elastic())
.subscribe(str -> showDebugMsg(str)); // What thread does subscribe run on?
// Give the multi-thread pipeline a second.
Thread.sleep(1000);
// Pass a few messages to the sink.
sink2.next("a");
sink2.next("b");
sink2.next("c");
// It's multi-thread so wait a sec to receive.
Thread.sleep(1000);
}
// Show us what thread we're on.
private static void showDebugMsg(String msg)
{
System.out.println(String.format("%s [%s]", msg, Thread.currentThread().getName()));
}
}
Output is always:
a [main]
a [main]
b [main]
b [main]
c [main]
c [main]
But what I would expect is:
a [elastic-1]
a [elastic-1]
b [elastic-2]
b [elastic-2]
c [elastic-3]
c [elastic-3]
Thanks in advance.
You see [main] because you're calling onNext from the main thread.
The subscribeOn you're using only affects the subscription itself (that is, the thread on which create's lambda is triggered).
You will see elastic-* threads logged if you use publishOn instead of subscribeOn.
Also, consider using Processors: storing the sink obtained from Flux.create and similar operators as a field is discouraged.
You can use parallel() and runOn() instead of subscribeOn() to get sink.next() to run multi-threaded.
bsideup is also correct - you can use publishOn() to coerce downstream operators to run on one different Scheduler thread.
Here is my updated code:
import org.junit.jupiter.api.Test;
import reactor.core.publisher.DirectProcessor;
import reactor.core.publisher.Flux;
import reactor.core.publisher.FluxProcessor;
import reactor.core.publisher.FluxSink;
import reactor.core.scheduler.Schedulers;
/**
* I want to construct my React pipelines during creation,
* then emit events over the lifetime of my services.
*/
public class React1Test
{
/**
* Version 1 - use a DirectProcessor to dynamically emit items.
*/
@Test
public void testReact1() throws InterruptedException
{
// Create the flux and sink.
FluxProcessor<String, String> fluxProcessor = DirectProcessor.<String>create().serialize();
FluxSink<String> sink = fluxProcessor.sink();
// Create the pipeline.
fluxProcessor
.parallel()
.runOn(Schedulers.elastic())
.doOnNext(str -> showDebugMsg(str)) // What thread do ops work on?
.subscribe(str -> showDebugMsg(str)); // What thread does subscribe run on?
// Give the multi-thread pipeline a second.
Thread.sleep(1000);
// Time passes ... things happen ...
// Pass a few messages to the sink, emulating events.
sink.next("a");
sink.next("b");
sink.next("c");
// It's multi-thread so wait a sec to receive.
Thread.sleep(1000);
}
// Used down below during Flux.create().
private FluxSink<String> sink2;
/**
* Version 2 - use Flux.create() and its FluxSink object.
*/
@Test
public void testReact2() throws InterruptedException
{
// Create the flux and sink.
Flux.<String>create(sink -> sink2 = sink)
.parallel()
.runOn(Schedulers.elastic())
.doOnNext(str -> showDebugMsg(str)) // What thread do ops work on?
.subscribe(str -> showDebugMsg(str)); // What thread does subscribe run on?
// Give the multi-thread pipeline a second.
Thread.sleep(1000);
// Pass a few messages to the sink.
sink2.next("a");
sink2.next("b");
sink2.next("c");
// It's multi-thread so wait a sec to receive.
Thread.sleep(1000);
}
// Show us what thread we're on.
private static void showDebugMsg(String msg)
{
System.out.println(String.format("%s [%s]", msg, Thread.currentThread().getName()));
}
}
Both versions produce the desired multi-threaded output:
a [elastic-2]
b [elastic-3]
c [elastic-4]
b [elastic-3]
a [elastic-2]
c [elastic-4]
2 threads are started. dataListUpdateThread adds the number 2 to a List. processFlowThread sums the values in the same List and prints the summed list to the console. Here is the code:
import akka.NotUsed;
import akka.actor.ActorSystem;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutionException;
import static java.lang.Thread.sleep;
public class SourceExample {
private final static ActorSystem system = ActorSystem.create("SourceExample");
private static void delayOneSecond() {
try {
sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
private static void printValue(CompletableFuture<Integer> integerCompletableFuture) {
try {
System.out.println("Sum is " + integerCompletableFuture.get().intValue());
} catch (ExecutionException | InterruptedException e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
final List<Integer> dataList = new ArrayList<>();
final Thread dataListUpdateThread = new Thread(() -> {
while (true) {
dataList.add(2);
System.out.println(dataList);
delayOneSecond();
}
});
dataListUpdateThread.start();
final Thread processFlowThread = new Thread(() -> {
while (true) {
final Source<Integer, NotUsed> source = Source.from(dataList);
final Sink<Integer, CompletionStage<Integer>> sink =
Sink.fold(0, (agg, next) -> agg + next);
final CompletionStage<Integer> sum = source.runWith(sink, system);
printValue(sum.toCompletableFuture());
delayOneSecond();
}
});
processFlowThread.start();
}
}
I've tried to create the simplest example to frame the question. dataListUpdateThread could be populating the List from a REST service or Kafka topic instead of just adding the value 2 to the List. Instead of using Java threads, how should this scenario be implemented? In other words, how do I share dataList with the Akka Stream for processing?
Mutating the collection passed to Source.from is only ever going to accomplish this by coincidence: if the collection is ever exhausted, Source.from will complete the stream. This is because it's intended for finite, strictly evaluated data (the use cases are basically: a) simple examples for the docs and b) situations where you want to bound resource consumption when performing an operation for a collection in the background (think a list of URLs that you want to send HTTP requests to)).
NB: I haven't written Java to any great extent since the Java 7 days, so I'm not providing Java code, just an outline of approaches.
As mentioned in a prior answer, Source.queue is probably the best option (besides using something like Akka HTTP or an Alpakka connector). In a case such as this, the stream's materialized value is a future that won't be completed until the stream completes, and Source.queue will never complete the stream on its own (because there's no way for it to know that its reference is the only reference). Introducing a KillSwitch and propagating it through viaMat and toMat would give you the ability to decide, from outside the stream, when to complete it.
An alternative to Source.queue is Source.actorRef, which lets you complete the stream by sending a distinguished message (akka.Done.done() in the Java API is pretty common for this). That source materializes as an ActorRef to which you can tell messages, and those messages (at least those which match the type of the stream) will be available for the stream to consume.
With both Source.queue and Source.actorRef, it's often useful to prematerialize them. The alternative, in a situation like your example where you also want the materialized value of the sink, is to make heavy use of the Mat operators to customize materialized values (in Scala, it's possible to use tuples to at least simplify combining multiple materialized values, but in Java, once you get beyond a pair (as you would with queue), I'm pretty sure you'd have to define a class just to hold the three (queue, kill switch, future for completed value) materialized values).
It's also worth noting that, since Akka Streams run on actors in the background (and thus get scheduled as needed onto the ActorSystem's threads), there's almost never a reason to create a thread on which to run a stream.
I have an Observable that at some point has to write things to the cache, and we would like to wait until those writes are done before finishing the whole operation on the observable (for reporting purposes).
For the purpose of test, the cache write Completable looks like this:
Completable.create(
emitter ->
new Thread(
() -> {
try {
Thread.sleep(2000);
doSomething();
emitter.onComplete();
} catch (InterruptedException e) {
e.printStackTrace();
}
})
.start());
Since I have several cache writes, I try to merge them in a container class:
public class CacheInsertionResultsTracker {
private Completable cacheInsertResultsCompletable;
public CacheInsertionResultsTracker() {
this.cacheInsertResultsCompletable = Completable.complete();
}
public synchronized void add(Completable cacheInsertResult) {
this.cacheInsertResultsCompletable = this.cacheInsertResultsCompletable.mergeWith(cacheInsertResult);
}
public Completable getCompletable() {
return this.cacheInsertResultsCompletable;
}
}
And I try to merge it with the Observable in the following way:
CacheInsertionResultsTracker tracker = new ...;
observable
.doOnNext(next->tracker.add(next.writeToCache(...)))
.mergeWith(Completable.defer(()->tracker.getCompletable()))
.subscribe(
// on next
this::logNextElement,
// on error
this::finishWithError,
// on complete
this::finishWithSuccess
);
How could I make sure that by the time finishWithSuccess is called the doSomething is completed?
The problem is that the Completable reference is updated every time I add a new one, and it happens after the mergeWith runs...
The solution that seems to work for our use case is to use concatWith + defer:
observable
.doOnNext(next->tracker.add(next.writeToCache(...)))
.concatWith(Completable.defer(()->tracker.getCompletable()))
.subscribe(
// on next
this::logNextElement,
// on error
this::finishWithError,
// on complete
this::finishWithSuccess
);
Concat assures that the subscription to the Completable happens only after the Observable is done, and defer defers getting the final Completable till this subscription (so all the objects are already added to the tracker).
Based on the comments, you could replace the Completable cache with a ReplaySubject<Completable>, apply a timeout to detect inactivity, and have the observable sequence end.
ReplaySubject<Completable> cache = ReplaySubject.create();
cache.onNext(completable);
observable.mergeWith(
cache.flatMapCompletable(v -> v)
.timeout(10, TimeUnit.MILLISECONDS, Completable.complete())
)
Edit:
Your updated example implies you want to run Completables in response to items in the main observable, isolated to that sequence, and wait for all of them to complete. This is a typical use case for flatMap:
observable.flatMap(
next -> next.writeToCache(...).andThen(Observable.just(next))
)
.subscribe(
// on next
this::logNextElement,
// on error
this::finishWithError,
// on complete
this::finishWithSuccess
);
I wanted to increase the performance of my backend REST API on a certain operation that polled multiple different external APIs sequentially and collected their responses and flattened them all into a single list of responses.
Having just recently learned about CompletableFutures, I decided to give it a go, and compare that solution with the one that involved simply changing my stream for a parallelStream.
Here is the code used for the benchmark-test:
package com.foo;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
public class ConcurrentTest {
static final List<String> REST_APIS =
Arrays.asList("api1", "api2", "api3", "api4", "api5", "api6", "api7", "api8");
MyTestUtil myTest = new MyTestUtil();
long millisBefore; // used to benchmark
@BeforeEach
void setUp() {
millisBefore = System.currentTimeMillis();
}
@AfterEach
void tearDown() {
System.out.printf("time taken : %.4fs\n",
(System.currentTimeMillis() - millisBefore) / 1000d);
}
@Test
void parallelSolution() { // 4s
var parallel = REST_APIS.parallelStream()
.map(api -> myTest.collectOneRestCall())
.flatMap(List::stream)
.collect(Collectors.toList());
System.out.println("List of responses: " + parallel.toString());
}
@Test
void futureSolution() throws Exception { // 8s
var futures = myTest.collectAllResponsesAsync(REST_APIS);
System.out.println("List of responses: " + futures.get()); // only blocks here
}
@Test
void originalProblem() { // 32s
var sequential = REST_APIS.stream()
.map(api -> myTest.collectOneRestCall())
.flatMap(List::stream)
.collect(Collectors.toList());
System.out.println("List of responses: " + sequential.toString());
}
}
class MyTestUtil {
public static final List<String> RESULTS = Arrays.asList("1", "2", "3", "4");
List<String> collectOneRestCall() {
try {
TimeUnit.SECONDS.sleep(4); // simulating the await of the response
} catch (Exception io) {
throw new RuntimeException(io);
} finally {
return MyTestUtil.RESULTS; // always return something, for this demonstration
}
}
CompletableFuture<List<String>> collectAllResponsesAsync(List<String> restApiUrlList) {
/* Collecting the list of all the async requests that build a List<String>. */
List<CompletableFuture<List<String>>> completableFutures = restApiUrlList.stream()
.map(api -> nonBlockingRestCall())
.collect(Collectors.toList());
/* Creating a single Future that contains all the Futures we just created ("flatmap"). */
CompletableFuture<Void> allFutures = CompletableFuture.allOf(completableFutures
.toArray(new CompletableFuture[restApiUrlList.size()]));
/* When all the Futures have completed, we join them to create merged List<String>. */
CompletableFuture<List<String>> allCompletableFutures = allFutures
.thenApply(future -> completableFutures.stream()
.filter(Objects::nonNull) // we filter out the failed calls
.map(CompletableFuture::join)
.flatMap(List::stream) // creating a List<String> from List<List<String>>
.collect(Collectors.toList())
);
return allCompletableFutures;
}
private CompletableFuture<List<String>> nonBlockingRestCall() {
/* Manage the Exceptions here to ensure the wrapping Future returns the other calls. */
return CompletableFuture.supplyAsync(() -> collectOneRestCall())
.exceptionally(ex -> {
return null; // gets managed in the wrapping Future
});
}
}
There is a list of 8 (fake) APIs. Each response takes 4 seconds to execute and returns a list of 4 entities (Strings, in our case, for the sake of simplicity).
The results:
stream : 32 seconds
parallelStream : 4 seconds
CompletableFuture : 8 seconds
I'm quite surprised and expected the last two to be almost identical. What exactly is causing that difference? As far as I know, they are both using the ForkJoinPool.commonPool().
My naive interpretation would be that parallelStream, since it is a blocking operation, uses the actual MainThread for its workload and thus has an extra active thread to work with, compared to the CompletableFuture which is asynchronous and thus cannot use that MainThread.
CompletableFuture.supplyAsync() will end up using a ForkJoinPool initialized with a parallelism of Runtime.getRuntime().availableProcessors() - 1 (JDK 11 source).
So it looks like you have an 8-processor machine, which means there are 7 threads in the pool.
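To check this on a given machine, a minimal sketch like the following prints both numbers. The class name CommonPoolInfo is only illustrative, and the reported parallelism can differ from availableProcessors() - 1 if the java.util.concurrent.ForkJoinPool.common.parallelism system property is set.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {
    /** Target parallelism of the common pool that supplyAsync() uses by default. */
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("availableProcessors    = " + cpus);
        System.out.println("commonPool parallelism = " + commonPoolParallelism());
    }
}
```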
There are 8 API calls, so only 7 can run at a time on the common pool. And for the completable futures test, there will be 8 tasks running with your main thread blocking until they all complete. 7 will be able to execute at once meaning one has to wait for 4 seconds.
parallelStream() also uses this same thread pool; the difference is that the first task will be executed on the main thread that is executing the stream's terminal operation, leaving 7 to be distributed to the common pool. So there are just enough threads to run everything in parallel in this scenario. Try increasing the number of tasks to 9 and you will get the 8-second run time for your test.
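One way to stop depending on the size of the common pool is to pass a dedicated executor to supplyAsync(). The sketch below is an illustration, not the asker's code: slowCall, collectAll and the 200 ms delay are made-up stand-ins for the blocking REST call, and the pool is sized with one thread per task on the assumption that the tasks spend their time waiting.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

class DedicatedPoolExample {
    /** Hypothetical stand-in for a slow, blocking REST call. */
    static List<String> slowCall(String api) {
        try {
            TimeUnit.MILLISECONDS.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return List.of(api + "-1", api + "-2");
    }

    /** Starts one task per API on its own pool, then flattens the results in input order. */
    static List<String> collectAll(List<String> apis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, apis.size()));
        try {
            // Submit everything first so all calls are in flight at once.
            List<CompletableFuture<List<String>>> futures = apis.stream()
                    .map(api -> CompletableFuture.supplyAsync(() -> slowCall(api), pool))
                    .collect(Collectors.toList());
            // Only now block on the results.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .flatMap(List::stream)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> apis = List.of("api1", "api2", "api3", "api4", "api5", "api6", "api7", "api8");
        long start = System.nanoTime();
        List<String> responses = collectAll(apis);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(responses.size() + " responses in about " + elapsedMs + " ms");
    }
}
```

With a pool this size, all eight calls can wait concurrently regardless of how many processors the machine reports.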
I have the following observable:
ScheduledExecutorService executorService = Executors.newScheduledThreadPool( 1 );
Observable<List<Widget>> findWidgetsObservable = Observable.create( emitter -> {
executorService.scheduleWithFixedDelay( emitFindWidgets( emitter ), 0, 30, TimeUnit.SECONDS );
} );
private Runnable emitFindWidgets( ObservableEmitter<List<Widget>> emitter ) {
return () -> {
emitter.onNext( Collections.emptyList() ); // dummy empty array
};
}
And I'm returning it in a graphql-java subscription resolver like so:
ConnectableObservable<List<Widget>> connectableObservable = findWidgetsObservable.share().publish();
Disposable connectionDisposable = connectableObservable.connect();
return connectableObservable.toFlowable( BackpressureStrategy.LATEST )
The graphql subscription works as expected and emits data to the JavaScript graphql client, but when the client unsubscribes, my Runnable continues seemingly infinitely. That said, the flowable's doOnCancel() event handler IS being run.
In order to remedy this problem, I've attempted to do the following within the flowable's doOnCancel():
Disposable connectionDisposable = connectableObservable.connect();
return connectableObservable.toFlowable( BackpressureStrategy.LATEST ).doOnCancel( () -> {
findWidgetsObservable.toFuture().cancel( true );
connectionDisposable.dispose();
})
However, the Runnable continues emitting indefinitely. Is there any way I can solve this problem and completely stop the emissions?
I did have one thought: scheduleWithFixedDelay returns a ScheduledFuture, which has a cancel() method, but I'm not sure that there's any way I can call it when the scheduling itself is scoped within an observable! Any help is appreciated.
The runnable keeps on emitting because you are scheduling the emission on a scheduler that is not known or bound to the observable stream.
When you dispose your connection, you stop receiving the items from upstream because the connection to the upstream observable is cut. But since you are scheduling the emitter to run repeatedly on a separate scheduler, the runnable keeps running.
You can describe the custom scheduling behavior using a custom scheduler and pass it to subscribeOn(Your-Custom-Scheduler).
Also, as you mentioned, you can invoke cancel() on the ScheduledFuture in doOnDispose().
But you should switch schedulers explicitly in the observable chain. Otherwise, it becomes harder to debug.
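The cancel() part can be sketched with the JDK alone. The example below is a simplified illustration rather than the asker's code: it schedules a counter instead of an emitter, and the names (CancelScheduledTask, runsAfterCancel) and delays are made up. In RxJava 2, the same handle.cancel(true) call could be registered inside Observable.create via emitter.setCancellable(...), so that it runs when the subscription is disposed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CancelScheduledTask {
    /**
     * Runs a periodic task for runMillis, cancels it, and returns how many
     * additional runs happen during the following graceMillis.
     */
    static int runsAfterCancel(long runMillis, long graceMillis) {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
        AtomicInteger runs = new AtomicInteger();
        try {
            // Keep the handle: it is the only way to stop the periodic task
            // without shutting down the whole executor.
            ScheduledFuture<?> handle = executor.scheduleWithFixedDelay(
                    runs::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);
            Thread.sleep(runMillis);

            handle.cancel(true);          // no further runs are scheduled
            Thread.sleep(20);             // let a run that was already in flight finish
            int countAtCancel = runs.get();

            Thread.sleep(graceMillis);
            return runs.get() - countAtCancel;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("runs after cancel: " + runsAfterCancel(50, 100));
    }
}
```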
In the past I have written some Java programs that use two threads.
The first thread (producer) reads data from an API (a C library), creates a Java object, and sends the object to the other thread.
The C API delivers an (infinite) event stream.
The threads use a LinkedBlockingQueue as a pipeline to exchange the objects (put, poll).
The second thread (consumer) processes the objects.
(I also find the code more readable when it is split across the threads: the first thread deals with the C API and produces
proper Java objects, while the second thread is free of C API handling and deals only with the data.)
Now I'm interested in how I can realize the scenario above with the new Stream API in Java 8,
assuming I want to keep the two threads (producer/consumer)!
The first thread writes into the stream. The second thread reads from the stream.
I also hope that this technique lets me combine explicit parallelism (producer/consumer)
with some implicit parallelism within the stream (e.g. stream.parallel()).
I don't have much experience with the new Stream API,
so I experimented with the code below to try out the idea above.
I use 'generate' to access the C API and feed the results into the Java stream.
In the consumer thread I use .parallel() to test and handle implicit parallelism. It looks fine, but see below.
Questions:
1. Is 'generate' the best way for the producer in this scenario?
2. I don't understand how to terminate/close the stream in the producer
if the API reports an error AND I want to shut down the whole pipeline.
Do I use stream.close() or throw an exception?
2.1 I used stream.close(), but 'generate' keeps running after closing.
The only way I found to terminate the generate part is to throw an exception.
This exception propagates into the stream, and the consumer receives it
(this is fine for me; the consumer can recognize it and terminate).
But in this case, the producer has produced more than the consumer has processed by the time the exception arrives.
2.2 If the consumer uses implicit parallelism (stream.parallel()), the producer produces many more items,
so I don't see any solution for this problem (accessing the C API, checking the error, making a decision).
2.3 The exception thrown in the producer arrives at the consumer stream, but not all inserted objects are processed.
Once more: the idea is to have explicit parallelism with the threads,
while internally I can use the new features and parallel processing where possible.
Thanks for thinking about this problem too.
package sandbox.test;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.LongStream;
public class MyStream {
private volatile LongStream stream = null;
private AtomicInteger producerCount = new AtomicInteger(0);
private AtomicInteger consumerCount = new AtomicInteger(0);
private AtomicInteger apiError = new AtomicInteger(0);
public static void main(String[] args) throws InterruptedException {
MyStream appl = new MyStream();
appl.create();
}
private static void sleep(long sleep) {
try {
Thread.sleep(sleep);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
private static void apiError(final String pos, final int iteration) {
RuntimeException apiException = new RuntimeException("API error pos=" + pos + " iteration=" + iteration);
System.out.println(apiException.getMessage());
throw apiException;
}
final private int simulateErrorAfter = 10;
private Thread produce() {
Thread thread = new Thread(new Runnable() {
@Override
public void run() {
System.out.println("Producer started");
stream = LongStream.generate(() -> {
int localCount;
// Detect error, while using stream.parallel() processing
int error = apiError.get();
if ( error > 0 )
apiError("1", error);
// ----- Accessing the C API here -----
localCount = producerCount.incrementAndGet(); // C API access; delegate for accessing the C API
// ----- Accessing the C API here -----
// Checking error code from C API
if ( localCount > simulateErrorAfter ) { // Simulate an API error
producerCount.decrementAndGet();
stream.close();
apiError("2", apiError.incrementAndGet());
}
System.out.println("P: " + localCount);
sleep(200L);
return localCount;
});
System.out.println("Producer terminated");
}
});
thread.start();
return thread;
}
private Thread consume() {
Thread thread = new Thread(new Runnable() {
@Override
public void run() {
try {
stream.onClose(new Runnable() {
@Override
public void run() {
System.out.println("Close detected");
}
}).parallel().forEach(l -> {
sleep(1000);
System.out.println("C: " + l);
consumerCount.incrementAndGet();
});
} catch (Exception e) {
// Capturing the stream end
System.out.println(e);
}
System.out.println("Consumer terminated");
}
});
thread.start();
return thread;
}
private void create() throws InterruptedException {
Thread producer = produce();
while ( stream == null )
sleep(10);
Thread consumer = consume();
producer.join();
consumer.join();
System.out.println("Produced: " + producerCount);
System.out.println("Consumed: " + consumerCount);
}
}
You need to understand some fundamental points about the Stream API:
All operations applied on a stream are lazy and won’t do anything until the terminal operation is applied. There is no sense in creating the stream using a “producer” thread, as this thread won’t do anything. All actions are performed within your “consumer” thread and the background threads started by the Stream implementation itself. The thread that created the Stream instance is completely irrelevant.
Closing a stream has no relevance for the Stream operation itself, i.e. does not shut down threads. It is meant to release additional resources, e.g. closing the file associated with the stream returned by Files.lines(…). You can schedule such cleanup actions using onClose and the Stream will invoke them when you call close but that’s it. For the Stream class itself it has no meaning.
Streams do not model a scenario like “one thread is writing and another one is reading”. Their model is “one thread is calling your Supplier, followed by calling your Consumer and another thread does the same, and x other threads too…”
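The first two points can be observed with a small sequential sketch. The class and variable names are made up for illustration, and a simple counter stands in for the C API access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.LongStream;

class LazyStreamDemo {
    /** Records what happens at each stage of a generate() pipeline. */
    static List<String> trace() {
        List<String> events = new ArrayList<>();
        AtomicLong supplierCalls = new AtomicLong();
        AtomicBoolean closeHandlerRan = new AtomicBoolean();

        // Building the pipeline calls nothing: the supplier has not run yet.
        LongStream source = LongStream.generate(supplierCalls::incrementAndGet)
                .onClose(() -> closeHandlerRan.set(true));
        events.add("supplier calls before terminal op: " + supplierCalls.get());

        // The terminal operation sum() is what drives the supplier,
        // on the thread that calls it.
        LongStream firstThree = source.limit(3);
        events.add("sum of first three: " + firstThree.sum());
        events.add("close handler ran before close(): " + closeHandlerRan.get());

        // close() only runs the registered onClose handlers.
        firstThree.close();
        events.add("close handler ran after close(): " + closeHandlerRan.get());
        return events;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```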
If you want to implement a producer/consumer scheme with distinct producer and consumer threads, you are better off using Threads or an ExecutorService and a thread-safe queue.
But you can still use Java 8 features. E.g. there is no need to implement Runnables using inner classes; you can use lambda expressions for them.
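A minimal sketch of that queue-based scheme, written with lambdas, might look like the following. The names and the "poison pill" end marker are assumptions for illustration; the loop over numbers stands in for reading from the C API. Because the producer enqueues the end marker in a finally block, the consumer still processes every item that was queued before the producer stopped, even if it stopped because of an error.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class QueuePipeline {
    /** Hypothetical end-of-stream marker; must never occur as real data. */
    private static final long POISON = Long.MIN_VALUE;

    /** Produces 1..items on one thread, sums them on another, and returns the sum. */
    static long produceAndConsume(int items) {
        BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
        AtomicLong sum = new AtomicLong();

        Thread producer = new Thread(() -> {
            try {
                for (long i = 1; i <= items; i++) {
                    queue.put(i);        // stand-in for an object built from the C API
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                queue.offer(POISON);     // always signal the end; the queue is unbounded
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (long v = queue.take(); v != POISON; v = queue.take()) {
                    sum.addAndGet(v);    // process one item
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum.get();
    }

    public static void main(String[] args) {
        System.out.println("Sum of consumed items: " + produceAndConsume(10));
    }
}
```

If the per-item work is heavy, the consumer could hand each item to an ExecutorService, which keeps the explicit producer/consumer split while still processing items in parallel.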