Am I misusing rxJava by converting an observable into a blocking observable? - java

My API makes about 100 downstream calls, in pairs, to two separate services. All responses need to be aggregated, before I can return my response to the client. I use hystrix-feign to make the HTTP calls.
I came up with what I believed was an elegant solution, until I found the following in the RxJava docs:
BlockingObservable is a variety of Observable that provides blocking operators. It can be useful for testing and demo purposes, but is generally inappropriate for production applications (if you think you need to use a BlockingObservable this is usually a sign that you should rethink your design).
My code looks roughly as follows
List<Observable<C>> observables = new ArrayList<>();
for (RequestPair request : requests) {
Observable<C> zipped = Observable.zip(
feignClientA.sendRequest(request.A()),
feignClientB.sendRequest(request.B()),
(a, b) -> new C(a,b));
observables.add(zipped);
}
Collection<D> apiResponse = new ConcurrentLinkedQueue<>();
Observable
.merge(observables)
.toBlocking()
.forEach(combinedResponse -> apiResponse.add(doSomeWork(combinedResponse)));
return apiResponse;
A few questions based on this setup:
Is toBlocking() justified given my use case?
Am I correct in understanding that the actual HTTP calls do not get made until the main thread gets to the forEach()?
I've seen that the code in the forEach() block is executed by different threads, but I was not able to verify if there can be more than one thread in the forEach() block. Is the execution there concurrent?

A better option is to return the Observable so that it can be consumed by other operators, but you may get away with blocking code. (It should, however, run on a background thread.)
public Observable<D> getAll(Iterable<RequestPair> requests) {
return Observable.from(requests)
.flatMap(request ->
Observable.zip(
feignClientA.sendRequest(request.A()),
feignClientB.sendRequest(request.B()),
(a, b) -> new C(a,b)
)
, 8) // maximum concurrent HTTP requests
.map(both -> doSomeWork(both));
}
// for legacy users of the API
public Collection<D> getAllBlocking(Iterable<RequestPair> requests) {
return getAll(requests)
.toList()
.toBlocking()
.first();
}
Am I correct in understanding that the actual HTTP calls do not get made until the main thread gets to the forEach()
Yes, the forEach triggers the whole sequence of operations.
I've seen that the code in the forEach() block is executed by different threads, but I was not able to verify if there can be more than one thread in the forEach() block. Is the execution there concurrent?
Only one thread at a time is allowed to execute the lambda in forEach but you may indeed see different threads entering there.
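For readers without RxJava on the classpath, the same shape (pair two calls, combine each pair, aggregate everything, and block only once at the edge) can be sketched with the JDK's CompletableFuture. This is a swapped-in technique, not the RxJava or hystrix-feign API; callA, callB, and the String payloads are hypothetical stand-ins for the two services and the C/D types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class PairAggregation {

    // Hypothetical stand-ins for the two downstream services.
    static String callA(int id) { return "a" + id; }
    static String callB(int id) { return "b" + id; }

    // Non-blocking variant: returns a future of the aggregated list.
    static CompletableFuture<List<String>> getAll(List<Integer> ids, ExecutorService pool) {
        List<CompletableFuture<String>> pairs = new ArrayList<>();
        for (int id : ids) {
            CompletableFuture<String> a = CompletableFuture.supplyAsync(() -> callA(id), pool);
            CompletableFuture<String> b = CompletableFuture.supplyAsync(() -> callB(id), pool);
            // thenCombine plays the role of Observable.zip for one pair.
            pairs.add(a.thenCombine(b, (x, y) -> x + "+" + y));
        }
        // allOf completes once every pair has completed, so join() below does not block.
        return CompletableFuture.allOf(pairs.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> pairs.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    // Blocking adapter at the edge, in the spirit of getAllBlocking above.
    static List<String> getAllBlocking(List<Integer> ids) {
        ExecutorService pool = Executors.newFixedThreadPool(8); // caps concurrent calls
        try {
            return getAll(ids, pool).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(getAllBlocking(Arrays.asList(1, 2, 3)));
    }
}
```

The pool size of 8 mirrors the concurrency limit passed to flatMap in the answer; it is an arbitrary choice here.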

Related

Join multiple parallel object into a single List

I've a map of key-value and iterating over keys, and calling service and based on the response, I am adding all the response to some uberList
How can I execute the different operations concurrently? Will changing stream() to parallelStream() do the trick? Does it synchronize when it adds to uberList?
The idea is to minimize the response time.
List<MyClass> uberList = new LinkedList<>();
Map<String, List<MyOtherClass>> map = new HashMap();
//Populate map
map.entrySet().stream().filter(s -> s.getValue().size() > 0 && s.getValue().values().size() > 0).forEach(
y -> {
// Do stuff
if(noError) {
uberList.add(MyClass3);
}
}
);
//Do stuff on uberList
How can I execute the different operations concurrently?
One thread can do one task at a time. If you want to do multiple operations concurrently, you have to offload the work to other threads.
You can either create new Threads yourself or use an ExecutorService, which manages a thread pool, queues the tasks, and executes them for you.
Will changing stream() to parallelStream() do the trick?
Yes, it will. Internally, parallelStream() uses ForkJoinPool.commonPool() to run the tasks for you. Keep in mind, though, that parallelStream() is documented as returning a "possibly parallel" stream, so parallelism is not guaranteed (the current implementation does return a parallel one).
Does it synchronize when it adds to uberList?
Synchronization inside a forEach pipeline is up to you. Normally you do not want to call collection.add() inside forEach to build a collection. Instead, use the .map().collect(toX()) methods. That frees you from the synchronization part:
It does not need to know about your local variable (uberList in this case) and does not modify it during execution, which avoids a lot of strange concurrency bugs.
You can freely change the collection type in the .collect() step, which gives you more control over the result type.
It does not require a thread-safe or synchronized collection when used with a parallel stream, because "multiple intermediate results may be instantiated, populated, and merged so as to maintain isolation of mutable data structures" (Read more about this here)
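A minimal sketch of the points above, assuming nothing beyond the JDK: a parallel stream gathers its results through collect(), and no shared list is touched. The class and method names are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelCollect {

    // Squares 0..n-1 on a parallel stream and gathers them with collect().
    // No shared list is mutated, so no explicit synchronization is needed.
    static List<Integer> squares(int n) {
        return IntStream.range(0, n)
                .parallel()
                .mapToObj(i -> i * i)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> result = squares(10_000);
        System.out.println(result.size()); // 10000
        System.out.println(result.get(3)); // 9
    }
}
```

Because the source is ordered, collect(Collectors.toList()) also keeps the encounter order, which an unsynchronized add() from forEach would not.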
So what you want is to execute multiple similar service calls at the same time and collect the results into a list.
You can do that simply with a parallel stream:
uberList = map.entrySet().stream()
.parallel() // Use .stream().parallel() to force parallelism; .parallelStream() does not guarantee that the returned stream is parallel
.filter(yourCondition)
.map(e -> yourService.methodCall(e))
.collect(Collectors.toList());
Pretty cool, isn't it?
But as I stated, the default parallel stream uses ForkJoinPool.commonPool() for queueing and executing tasks.
The bad part is that if yourService.methodCall(e) does heavy I/O (an HTTP call, even a DB call...) or is otherwise long-running, it may exhaust the pool, and other incoming tasks will sit in the queue waiting for execution.
So typically every other task that depends on the common pool (not only your own yourService.methodCall(e), but all other parallel streams) will slow down because of the queueing time.
To solve this problem, you can force the parallel execution onto your own fork-join pool:
ForkJoinPool forkJoinPool = new ForkJoinPool(4); // Typically set it to Runtime.getRuntime().availableProcessors()
uberList = forkJoinPool.submit(() -> {
return map.entrySet().stream()
.parallel() // Use .stream().parallel() to force parallelism; .parallelStream() does not guarantee that the returned stream is parallel
.filter(yourCondition)
.map(e -> yourService.methodCall(e))
.collect(Collectors.toList());
}).get();
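A self-contained version of the snippet above might look like the following sketch. slowCall is a hypothetical stand-in for yourService.methodCall(e). Note that a parallel stream running in the pool it was submitted from is how the current implementation behaves, not something the stream API documents as a guarantee.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CustomPoolStream {

    // Hypothetical stand-in for yourService.methodCall(e): a short blocking call.
    static String slowCall(int i) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "r" + i;
    }

    // Runs the parallel stream from inside a dedicated pool and shuts the pool down afterwards.
    static List<String> run(int n, int parallelism)
            throws InterruptedException, ExecutionException {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() -> {
                return IntStream.range(0, n)
                        .parallel()
                        .mapToObj(CustomPoolStream::slowCall)
                        .collect(Collectors.toList());
            }).get();
        } finally {
            pool.shutdown(); // a pool created per call must be released
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(5, 2)); // [r0, r1, r2, r3, r4]
    }
}
```

In a long-lived service the pool would normally be created once and reused rather than built per call.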
You probably don't want to use parallelStream for concurrency, only for parallelism. (That is: use it for tasks where you want to use multiple physical processors efficiently on a task that's conceptually sequential, not for tasks where you want multiple things going on at the same time conceptually.)
In your case you would probably be better off using an ExecutorService, or more specifically com.google.common.util.concurrent.ListeningExecutorService from Google Guava (warning: I haven't tried to compile the code below, so there may be syntax errors):
int MAX_NUMBER_OF_SIMULTANEOUS_REQUESTS = 100;
ListeningExecutorService myExecutor =
MoreExecutors.listeningDecorator(
Executors.newFixedThreadPool(MAX_NUMBER_OF_SIMULTANEOUS_REQUESTS));
List<ListenableFuture<Optional<MyClass>>> futures = new ArrayList<>();
for (Map.Entry<String, List<MyOtherClass>> entry : map.entrySet()) {
if (entry.getValue().size() > 0 && entry.getValue().values().size() > 0) {
futures.add(myExecutor.submit(() -> {
// Do stuff
if(noError) {
return Optional.of(MyClass3);
} else {
return Optional.empty();
}
}));
}
}
List<MyClass> uberList = Futures.successfulAsList(futures)
.get(1, TimeUnit.MINUTES /* adjust as necessary */)
.stream()
.filter(Optional::isPresent)
.map(Optional::get)
.collect(Collectors.toList());
The advantage of this code is that it allows you to explicitly specify that the tasks should all start at the "same time" (at least conceptually) and allows you to control your concurrency explicitly (how many simultaneous requests are allowed? What do we do if some of the tasks fail? How long are we willing to wait? etc). Parallel streams aren't really for that.
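The same controls (bounded concurrency, tolerance for failed tasks, an overall timeout) can also be sketched with the JDK alone, without Guava. This swaps Futures.successfulAsList for ExecutorService.invokeAll with a timeout; runAll and demo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BoundedFanOut {

    // Runs all tasks on a bounded pool and keeps the results of tasks that succeeded in time.
    static <T> List<T> runAll(List<Callable<T>> tasks, int maxConcurrent,
                              long timeout, TimeUnit unit) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish or the timeout expires;
            // tasks that have not completed by then are cancelled.
            for (Future<T> future : pool.invokeAll(tasks, timeout, unit)) {
                if (future.isCancelled()) {
                    continue; // timed out
                }
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    // the task threw; skip it, as successfulAsList would
                }
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }

    // Three tasks, one of which fails.
    static List<String> demo() throws InterruptedException {
        List<Callable<String>> tasks = new ArrayList<>();
        tasks.add(() -> "ok-1");
        tasks.add(() -> { throw new IllegalStateException("boom"); });
        tasks.add(() -> "ok-3");
        return runAll(tasks, 2, 1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo()); // [ok-1, ok-3]
    }
}
```

Unlike successfulAsList, this drops failed entries instead of leaving null placeholders, so positions no longer line up with the input.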
A parallel stream will help with concurrent execution. But it is not recommended to add elements to an outside list from a forEach loop; if you do that, you have to synchronize the external list yourself. A better way is to use map and collect the result into a list. In that case, parallelStream takes care of synchronisation.
List<MyClass> uberList = map.entrySet().parallelStream()
    .filter(s -> s.getValue().size() > 0 && s.getValue().values().size() > 0)
    .map(y -> {
        // Do stuff
        return MyClass3;
    })
    .filter(t -> /* check the no-error condition */ true)
    .collect(Collectors.toList());

How do I memoize the value emitted by a Single?

Say I have an expensive calculation that creates an object. I want to give the caller some flexibility as to where that happens, with subscribeOn(). But I also don't want to make that calculation more than once, because of side effects (e.g. the object is backed by some external data store).
I can write
MyObject myObject = MyObject.createExpensively(params);
return Single.just(myObject);
but this does the expensive work on the calling thread.
I can write
Callable<MyObject> callable = () -> MyObject.createExpensively(params);
return Single.fromCallable(callable);
but this will invoke createExpensively() (with side effects) once per subscription, which isn't what I want if there are multiple subscribers.
If I want to ensure that createExpensively() is only called once, and its side effects only occur once, what's the pattern I'm looking for here?
You could use Single.cache():
Single.fromCallable(() -> MyObject.createExpensively(params)).cache();
cache() -> Stores the success value or exception from the current Single and replays it to late SingleObservers. Please have a look here for more info.
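To illustrate the idea behind this (run the work at most once, replay the value to later callers), here is a plain-JDK sketch. It is not how RxJava implements cache(); memoize is a hypothetical helper, and unlike Single.cache() it does not replay exceptions: if the delegate throws, the next call tries again.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class MemoizeDemo {

    // Wraps a supplier so the delegate runs at most once; later calls replay the stored value.
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private boolean computed;
            private T value;

            @Override
            public synchronized T get() {
                if (!computed) {
                    value = delegate.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> expensive = memoize(() -> {
            calls.incrementAndGet(); // stands in for the side effect
            return "object";
        });
        expensive.get();
        expensive.get();
        System.out.println(calls.get()); // 1
    }
}
```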

Sequential streams and shared state

The javadoc for java.util.stream implies that "behavioral operations" in a stream pipeline must usually be stateless. However, the examples it shows of how not to write a pipeline all seem to involve parallel streams.
To what extent does this apply to sequential streams?
In particular, I was looking over a colleague's code that looked essentially like this:
List<SomeClass> list = ...;
Map<SomeClass, String> map = new HashMap<>();
list.stream()
.filter(x -> [some boolean expression])
.forEach(x -> {
if (map.containsKey(x)) {
throw new UserDefinedException("duplicates detected in input");
} else {
map.put(x, aStringFunction(x));
}
});
[The author had tried using Collectors.toMap(), but it threw an IllegalStateException when there were duplicates, and neither of us knew about the toMap that takes a mergeFunction. That last would have been the best solution, but I'd like an answer anyway because of the more general principle involved.]
I was nervous about this code, since it wasn't clear to me whether the execution of the block in the forEach could overlap for different elements, even for a sequential stream. The javadoc for forEach() is a bit ambiguous whether synchronization is necessary for accessing shared state in a sequential stream. Eventually the author changed the code to use a ConcurrentHashMap and map.putIfAbsent().
My question is: was I right to be nervous, or is the code above trustworthy?
Suppose the expression in the filter() did something that used some shared state. Can we trust that it will work OK when using a sequential stream?
A sequential stream by definition executes everything in the caller thread, so as long as you are not going to parallelize the stream in the future, you can safely use shared state without additional synchronization or concurrent collections. So the current code is safe. Note, however, that it looks dirty.
If you rely on your forEach to be executed sequentially, consider using forEachOrdered instead even if the stream is sequential. Not only will that get the explicit guarantee from the api that the code will be executed sequentially, it will make the code more self-documenting and provide some measure of protection against somebody coming along and changing your stream to parallel.
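For reference, the toMap overload with a mergeFunction mentioned in the question can express the duplicate check without any shared state. A minimal sketch, where DuplicateInputException is a hypothetical stand-in for UserDefinedException and String::length stands in for aStringFunction:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapDuplicates {

    // Hypothetical stand-in for the question's UserDefinedException.
    static class DuplicateInputException extends RuntimeException {
        DuplicateInputException(String message) {
            super(message);
        }
    }

    // Maps each non-empty string to its length; a repeated key triggers the merge function.
    static Map<String, Integer> lengths(List<String> input) {
        return input.stream()
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toMap(
                        s -> s,
                        String::length,
                        (first, second) -> {
                            throw new DuplicateInputException("duplicates detected in input");
                        }));
    }

    public static void main(String[] args) {
        System.out.println(lengths(Arrays.asList("a", "bb")).get("bb")); // 2
        try {
            lengths(Arrays.asList("a", "a"));
        } catch (DuplicateInputException e) {
            System.out.println(e.getMessage()); // duplicates detected in input
        }
    }
}
```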

RxJava .subscribeOn(Schedulers.newThread()) questions

I am on plain JDK 8. I have this simple RxJava example:
Observable
.from(Arrays.asList("one", "two", "three"))
.doOnNext(word -> System.out.printf("%s uses thread %s%n", word, Thread.currentThread().getName()))
//.subscribeOn(Schedulers.newThread())
.subscribe(word -> System.out.println(word));
and it prints out the words line by line, intertwined with information about the thread, which is 'main' for all next calls, as expected.
However, when I uncomment the subscribeOn(Schedulers.newThread()) call, nothing is printed at all. Why isn't it working? I would have expected it to start a new thread for each onNext() call and the doOnNext() to print that thread's name. Right now, I see nothing, also for the other schedulers.
When I add the call to Thread.sleep(10000L) at the end of my main, I can see the output, which would suggest the threads used by RxJava are all daemons. Is this the case? Can this be changed somehow, but using a custom ThreadFactory or similar concept, and not have to implement a custom Scheduler?
With the mentioned change, the thread name is always RxNewThreadScheduler-1, whereas the documentation for newThread says "Scheduler that creates a new {@link Thread} for each unit of work". Isn't it supposed to create a new thread for all of the emissions?
As Vladimir mentioned, RxJava standard schedulers run work on daemon threads which terminate in your example because the main thread quits. I'd like to emphasise that they don't schedule each value on a new thread, but they schedule the stream of values for each individual subscriber on a newly created thread. Subscribing a second time would give you "RxNewThreadScheduler-2".
You don't really need to change the default schedulers, but just wrap your own Executor-based scheduler with Schedulers.from() and supply that as a parameter where needed:
ThreadPoolExecutor exec = new ThreadPoolExecutor(
0, 64, 2, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
exec.allowCoreThreadTimeOut(true);
Scheduler s = Schedulers.from(exec);
Observable
.from(Arrays.asList("one", "two", "three"))
.doOnNext(word -> System.out.printf("%s uses thread %s%n", word,
Thread.currentThread().getName()))
.subscribeOn(s)
.subscribe(word -> System.out.println(word));
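Since the question asks about a custom ThreadFactory: the executor handed to Schedulers.from() can be built with one, which is where the daemon flag and thread names are controlled. The sketch below covers only the JDK side; NamedThreadFactory is a hypothetical class, not part of RxJava.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final boolean daemon;
    private final AtomicInteger counter = new AtomicInteger();

    public NamedThreadFactory(String prefix, boolean daemon) {
        this.prefix = prefix;
        this.daemon = daemon;
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread thread = new Thread(task, prefix + "-" + counter.incrementAndGet());
        thread.setDaemon(daemon); // false: the JVM waits for this thread before exiting
        return thread;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService exec =
                Executors.newFixedThreadPool(2, new NamedThreadFactory("worker", false));
        exec.submit(() -> System.out.println(Thread.currentThread().getName())).get(); // worker-1
        exec.shutdown(); // non-daemon workers keep the JVM alive until the pool is shut down
    }
}
```

With non-daemon threads the main method no longer needs a Thread.sleep to see the output, but the executor must be shut down explicitly for the JVM to exit.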
I've got a series of blog posts about RxJava schedulers which should help you implement a "more permanent" variant.
Contrary to what newcomers often believe, reactive streams are not inherently concurrent, but they are inherently asynchronous. They are also inherently sequential, and concurrency must be configured within the stream. Put simply, reactive streams are naturally sequential at their ends but can be concurrent at their core.
The secret sauce is using the flatMap() operator within the stream. This operator takes an Observable<T> input from the source stream and internally re-emits it as an Observable<Observable<T>> stream, subscribing to all of its instances at once. As long as the flatMap() internal stream is executed in a multi-threaded context, it will concurrently execute the provided Function<T, R> that applies your logic and, finally, re-emit the results on the original stream as its own emissions.
This sounds very complicated (and it is quite a bit at first glance) but simple examples with explanations help to understand the concept.
Find more details from a similar question here and articles on RxJava2 Schedulers and Concurrency with code sample and detailed explanations on how to use Schedulers sequentially and concurrently.
Hope this helps,
Softjake
public class MainClass {
public static void main(String[] args) {
Scheduler scheduler = Schedulers.from(Executors.newFixedThreadPool(10, Executors.defaultThreadFactory()));
Observable.interval(1,TimeUnit.SECONDS)
.doOnNext(word -> System.out.printf("%s uses thread %s%n", word,
Thread.currentThread().getName()))
.subscribeOn(scheduler)
.observeOn(Schedulers.io())
.doOnNext(word -> System.out.printf("%s uses thread %s%n", word,
Thread.currentThread().getName()))
.subscribe();
}
}

Incremental Future of list extensions

I essentially have a Future<List<T>> that is fetched in batches from the server. For some clients I'd like to provide incremental results while it loads in addition to the whole collection when future is fulfilled.
Is there a common Future extension defined somewhere for this? What are typical patterns/combinators exist for such futures?
I assume that given IncrementalListFuture<T> I can easily define map operation. What else comes to your mind?
Is there a common Future extension defined somewhere for this?
I assume you are talking about incremental results from an ExecutorService. You should consider using an ExecutorCompletionService which allows you to be informed as soon as one of the Future objects is get-able.
To quote from the javadocs:
CompletionService<Result> ecs = new ExecutorCompletionService<Result>(e);
for (Callable<Result> s : solvers) {
ecs.submit(s);
}
int n = solvers.size();
for (int i = 0; i < n; ++i) {
// this waits for one of the futures to finish and provide a result
Future<Result> future = ecs.take();
Result result = future.get();
if (result != null) {
// do something with the result
}
}
Sorry. I initially misread the question and thought that you were asking about a List<Future<?>>. It may be that you could refactor your code to actually return a number of Futures so I'll leave this for posterity.
I would not pass back the list in this case in a Future. You aren't going to be able to get the return until the job finishes.
If possible, I would pass in some sort of BlockingQueue so both the caller and the thread can access it:
final BlockingQueue<T> queue = new LinkedBlockingQueue<T>();
// build out job with the queue
threadPool.submit(new SomeJob(queue));
threadPool.shutdown();
// now we can consume from the queue as it is built:
while (true) {
T result = queue.take();
// you could use some constant result object to mean that the job finished
if (result == SOME_END_OBJECT) {
break;
}
// provide intermediate results
}
You could also have some sort of SomeJob.take() method which calls through to a BlockingQueue defined inside of your job class.
// the blocking queue in this case is hidden inside your job object
T result = someJob.take();
...
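Put together, the queue-plus-sentinel approach might look like the sketch below. The batches are passed in as a ready-made list to keep it self-contained; in the real case the producer thread would fetch them from the server. All names here are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class IncrementalBatches {

    // Sentinel marking the end of the stream; compared by identity, never by equals().
    private static final List<String> END = new ArrayList<>();

    static List<String> consumeAll(List<List<String>> batches) throws InterruptedException {
        BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>();

        Thread producer = new Thread(() -> {
            try {
                for (List<String> batch : batches) {
                    queue.put(batch); // publish each batch as soon as it is available
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<String> all = new ArrayList<>();
        while (true) {
            List<String> batch = queue.take();
            if (batch == END) {
                break;
            }
            all.addAll(batch); // intermediate results can be handed to clients here
        }
        producer.join();
        return all;
    }

    public static void main(String[] args) throws InterruptedException {
        List<List<String>> batches =
                Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        System.out.println(consumeAll(batches)); // [a, b, c]
    }
}
```

If the producer can fail before enqueueing the sentinel, the consumer would wait forever, so production code would add a timeout via poll() or enqueue the sentinel in a finally block.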
Here's what I would do:
In the thread that populates the List, make it thread-safe by wrapping the list using Collections.synchronizedList
Make the list publicly available, but not modifiable, by adding a public method to the thread which returns the list wrapped by Collections.unmodifiableList
Instead of giving clients a Future<List<T>>, give them a handle to the thread, or some kind of wrapper of it, so that they can call the public method above.
Alternatively, as Gray has suggested, BlockingQueues are great for thread coordination like this. This may require more changes to your client code, however.
To answer my own question: there has been lots of development in this area recently. Among most used are: Play iteratees (http://www.playframework.org/documentation/2.0/Iteratees) and Rx for .NET (http://msdn.microsoft.com/en-us/data/gg577609.aspx)
Instead of Future they define something like:
interface Observable<T> {
Disposable subscribe(Observer<T> observer);
}
interface Observer<T> {
void onCompleted();
void onError(Exception error);
void onNext(T value);
}
and lots of combinators.
Alternatively to Observables you can take a look at twitter's approach.
They use Spool, which is an asynchronous version of the Stream.
Basically it is a simple trait similar to the List
trait Spool[+A] {
def head: A
/**
* The (deferred) tail of the spool. Invalid for empty spools.
*/
def tail: Future[Spool[A]]
}
that allows you to do functional stuff like map, filter and foreach on top of it.
Future is really designed to return a single (atomic) result, not for communicating intermediate results in this manner. What you will really want to do is to use multiple futures, one per batch.
We have a similar requirement where we have a bunch of things that we need to get from different remote servers, and each will return at a different time. We don't want to wait until the last one has returned, but rather process them in the order they return. For this we created the AsyncCompleter, which takes an Iterable<Callable<T>> and returns an Iterable<T> that blocks on iteration, completely abstracting away usage of the Future interface.
If you look at how that class is implemented, you'll see how to use a CompletionService to receive results from an Executor in the order in which they become available, if you need to build this for yourself.
edit: just saw that the second half of Gray's answer is similar, basically using an ExecutorCompletionService
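The idea of an iteration that blocks until the next result is ready can be sketched on top of ExecutorCompletionService as follows. This is not the AsyncCompleter source, only an illustration of the pattern it describes; inCompletionOrder and demo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionOrder {

    // Submits every task up front and returns an iterator that yields results
    // in the order the tasks finish, blocking in next() when none is ready yet.
    static <T> Iterator<T> inCompletionOrder(List<Callable<T>> tasks, ExecutorService pool) {
        CompletionService<T> service = new ExecutorCompletionService<>(pool);
        for (Callable<T> task : tasks) {
            service.submit(task);
        }
        return new Iterator<T>() {
            private int remaining = tasks.size();

            @Override
            public boolean hasNext() {
                return remaining > 0;
            }

            @Override
            public T next() {
                if (remaining == 0) {
                    throw new NoSuchElementException();
                }
                remaining--;
                try {
                    return service.take().get(); // waits for the next finished task
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
        };
    }

    // Two tasks of different duration; the quick one usually comes out first.
    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            tasks.add(() -> { Thread.sleep(300); return "slow"; });
            tasks.add(() -> "fast");
            List<String> seen = new ArrayList<>();
            inCompletionOrder(tasks, pool).forEachRemaining(seen::add);
            return seen;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A failed task surfaces as an exception from next() here; skipping failures instead is an equally reasonable design, depending on what the clients expect.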
