Say I have an expensive calculation that creates an object. I want to give the caller some flexibility as to where that happens, with subscribeOn(). But I also don't want to make that calculation more than once, because of side effects (e.g. the object is backed by some external data store).
I can write
MyObject myObject = MyObject.createExpensively(params);
return Single.just(myObject);
but this does the expensive work on the calling thread.
I can write
Callable<MyObject> callable = () -> MyObject.createExpensively(params);
return Single.fromCallable(callable);
but this will invoke createExpensively() (with side effects) once per subscription, which isn't what I want if there are multiple subscribers.
If I want to ensure that createExpensively() is only called once, and its side effects only occur once, what's the pattern I'm looking for here?
You could use Single.cache():
Single.fromCallable(() -> MyObject.createExpensively(params)).cache();
cache() stores the success value or exception from the current Single and replays it to late SingleObservers, so createExpensively() runs only once, on the first subscription.
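For intuition about what cache() buys you, here is a minimal JDK-only sketch of the same run-once-and-replay idea. It is not RxJava and not how Single.cache() is implemented; OnceAsync and get(Executor) are hypothetical names, and the executor argument merely stands in for subscribeOn().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper: runs the supplier at most once and shares the outcome.
final class OnceAsync<T> {
    private final Supplier<T> supplier;
    private final AtomicReference<CompletableFuture<T>> ref = new AtomicReference<>();

    OnceAsync(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    // The first caller starts the work on its executor; later callers get the same future.
    CompletableFuture<T> get(Executor executor) {
        CompletableFuture<T> existing = ref.get();
        if (existing != null) {
            return existing;
        }
        CompletableFuture<T> created = new CompletableFuture<>();
        if (!ref.compareAndSet(null, created)) {
            return ref.get(); // another thread won the race and owns the computation
        }
        executor.execute(() -> {
            try {
                created.complete(supplier.get());
            } catch (Throwable t) {
                created.completeExceptionally(t); // failures are replayed as well
            }
        });
        return created;
    }
}
```

As with cache(), a failure is remembered too, so a later caller sees the original exception rather than triggering a retry.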
I am trying to support concurrency on a hashmap that gets periodically cleared. I have a cache that stores data for a period of time. After every 5 minutes, the data in this cache is sent to the server. Once I flush, I want to clear the cache. The problem is when I am flushing, data could potentially be written to this map while I am doing that with an existing key. How would I go about making this process thread safe?
data class A(val a: AtomicLong, val b: AtomicLong) {
    fun changeA() {
        a.incrementAndGet()
    }
}
class Flusher {
    // getOrPut and clear need a MutableMap
    private val cache: MutableMap<String, A> = ConcurrentHashMap()
    private val lock = Any()
    fun retrieveA(key: String): A {
        synchronized(lock) {
            // initial counter values are assumed here
            return cache.getOrPut(key) { A(AtomicLong(0), AtomicLong(0)) }
        }
    }
    fun flush() {
        synchronized(lock) {
            // send data to network request
            cache.clear()
        }
    }
}
// Existence of multiple classes like CacheChanger
class CacheChanger{
fun incrementData(){
flusher.retrieveA("x").changeA()
}
}
I am worried that the above cache is not properly synchronized. Are there better/right ways to lock this cache so that I don't lose out on data? Should I create a deepcopy of cache and clear it?
Since the above data could be being changed by another changer, could that not lead to problems?
You can get rid of the lock.
In the flush method, instead of reading the entire map (e.g. through an iterator) and then clearing it, remove each element one by one.
I'm not sure whether you can use the iterator's remove method (I'll check that in a moment), but you can take the key set, iterate over it, and invoke cache.remove() for each key. This returns the stored value and removes it from the cache atomically.
The tricky part is making sure that the object of class A is not modified just before it is sent over the network. You can do it as follows:
When you get some x through retrieveA and modify the object, you need to make sure it is still in the cache. Simply invoke retrieve one more time. If you get exactly the same object, it's fine. If it's different, the object was removed and sent over the network, but you don't know whether the modification was sent too, or only the state of the object prior to the modification. Still, I think in your case you can simply repeat the whole process (apply the change and check whether the objects are the same). But it depends on the specifics of your application.
If you don't want to increment twice, then when sending the data over the network you'll have to read the content of the counter a, store it in a local variable, and decrease a by that amount (usually it will end up at zero). Then in the CacheChanger, when you get a different object from the second retrieve, you can check whether the value is zero (your modification was taken into account) or non-zero, which means your modification came a fraction of a second too late and you'll have to repeat the process.
You could also replace incrementAndGet with a compare-and-swap (compareAndSet on AtomicLong), but this could yield slightly worse performance. In this approach, instead of incrementing, you try to swap in a value that is greater by one. And before sending over the network, you try to swap the value to -1 to mark it as invalid. If that second swap fails, someone has changed the value concurrently, so you need to read it again in order to send the freshest value over the network, and you repeat the process in a loop (breaking out only once the swap to -1 succeeds). For the swap to greater-by-one, you also repeat the process in a loop until the swap succeeds. If it fails, either somebody else swapped in some greater value, or the Flusher swapped in -1. In the latter case you know that you have to call retrieveA one more time to get a new object.
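The remove-one-by-one flush combined with the -1 marker might look roughly like this in Java. This is a sketch under assumptions: the cache is reduced to one AtomicLong counter per key, there is a single flushing thread, and CounterCache, increment and flush are illustrative names rather than anything from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free counter cache; assumes exactly one flushing thread.
final class CounterCache {
    private static final long FLUSHED = -1L; // marks a counter the flusher has already read

    private final ConcurrentHashMap<String, AtomicLong> cache = new ConcurrentHashMap<>();

    void increment(String key) {
        while (true) {
            AtomicLong counter = cache.computeIfAbsent(key, k -> new AtomicLong(0));
            long current = counter.get();
            if (current == FLUSHED) {
                continue; // counter was removed and sent; fetch a fresh one
            }
            if (counter.compareAndSet(current, current + 1)) {
                return; // increment landed before any flush sealed this counter
            }
        }
    }

    // Removes every entry one by one and returns the values to send over the network.
    Map<String, Long> flush() {
        Map<String, Long> snapshot = new HashMap<>();
        for (String key : cache.keySet()) {
            AtomicLong counter = cache.remove(key); // atomic: returns the value and unmaps it
            if (counter == null) {
                continue;
            }
            long value;
            do {
                value = counter.get();
            } while (!counter.compareAndSet(value, FLUSHED)); // seal against late increments
            snapshot.put(key, value);
        }
        return snapshot;
    }
}
```

Because the counter is unmapped before it is sealed, a writer that observes -1 always gets a new counter on its next computeIfAbsent call, so no increment is lost or counted twice.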
The easiest solution (but with a worse performance) is to rely completely on locks.
You can change ConcurrentHashMap to a regular HashMap.
Then you have to apply all your changes directly in the function retrieve:
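fun retrieveA(key: String, mod: (A) -> Unit): A {
    synchronized(lock) {
        // initial counter values are assumed here
        val obj: A = cache.getOrPut(key) { A(AtomicLong(0), AtomicLong(0)) }
        // the modification runs under the same lock as flush()
        mod(obj)
        // getOrPut has already stored obj, so no extra put is needed
        return obj
    }
}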
I hope it compiles (I'm not an expert on Kotlin).
Then you use it like:
class CacheChanger {
fun incrementData() {
flusher.retrieveA("x") { it.changeA() }
}
}
Ok, I admit this code is not really Kotlin ;) you should use a Kotlin lambda instead of the Consumer interface. It's been some time since I played a bit with Kotlin. If someone could fix it I'd be very grateful.
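For comparison, the lock-only variant can be sketched in Java as follows. It assumes the cached value is reduced to a single long counter per key; LockedCounterCache, update and flush are hypothetical names. Instead of clearing the map, flush swaps in a fresh one, which also avoids the need for a deep copy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongUnaryOperator;

// Hypothetical lock-based cache: every read-modify-write and every flush share one lock.
final class LockedCounterCache {
    private final Object lock = new Object();
    private Map<String, Long> cache = new HashMap<>();

    // Reads, modifies and stores the counter for the key inside one critical section.
    void update(String key, LongUnaryOperator change) {
        synchronized (lock) {
            long current = cache.getOrDefault(key, 0L);
            cache.put(key, change.applyAsLong(current));
        }
    }

    // Swaps in an empty map under the lock and returns the old contents for sending.
    Map<String, Long> flush() {
        synchronized (lock) {
            Map<String, Long> drained = cache;
            cache = new HashMap<>();
            return drained;
        }
    }
}
```

The network call can then happen outside the lock on the returned map, since no writer can reach that map any more.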
I call a list of methods in a loop, wherein each return an Observable & I add them to a List for processing later.
Code:
List<Observable> productViewObservables = new ArrayList<>();
for (ProductEnricher enricher : orchestrationStrategy.getEnrichers()) {
    productViewObservables.add(enricher.asyncEnrich(productView, productId));
}
But I'm not sure whether the Observable responses get added to that list in the same order that I invoke them, which is essential for my processing. Can someone clarify this?
Yes, they are added in the same order, because Observable is just a regular object. Your code is simple sequential code - you are not executing any Observable yet in your example, they are just objects which represent computation that might be done in the future.
If you would like to run these Observables (using subscribe method) then you get into asynchronous world and the results might not be that obvious.
If you want to run your List of Observables sequentially, you can use the concat method on Observable, which takes an Iterable as an argument.
My API makes about 100 downstream calls, in pairs, to two separate services. All responses need to be aggregated, before I can return my response to the client. I use hystrix-feign to make the HTTP calls.
I came up with what I believed was an elegant solution until on the rxJava docs I've found the following
BlockingObservable is a variety of Observable that provides blocking operators. It can be useful for testing and demo purposes, but is generally inappropriate for production applications (if you think you need to use a BlockingObservable this is usually a sign that you should rethink your design).
My code looks roughly as follows
List<Observable<C>> observables = new ArrayList<>();
for (RequestPair request : requests) {
Observable<C> zipped = Observable.zip(
feignClientA.sendRequest(request.A()),
feignClientB.sendRequest(request.B()),
(a, b) -> new C(a,b));
observables.add(zipped);
}
Collection<D> apiResponse = new ConcurrentLinkedQueue<>();
Observable
.merge(observables)
.toBlocking()
.forEach(combinedResponse -> apiResponse.add(doSomeWork(combinedResponse)));
return apiResponse;
Few questions based on this setup:
Is toBlocking() justified given my use case?
Am I correct in understanding that the actual HTTP calls do not get made until the main thread gets to the forEach()?
I've seen that the code in the forEach() block is executed by different threads, but I was not able to verify if there can be more than one thread in the forEach() block. Is the execution there concurrent?
A better option is to return the Observable so it can be consumed by other operators, but you may get away with blocking code (which should, however, run on a background thread).
public Observable<D> getAll(Iterable<RequestPair> requests) {
return Observable.from(requests)
.flatMap(request ->
Observable.zip(
feignClientA.sendRequest(request.A()),
feignClientB.sendRequest(request.B()),
(a, b) -> new C(a,b)
)
, 8) // maximum concurrent HTTP requests
.map(both -> doSomeWork(both));
}
// for legacy users of the API
public Collection<D> getAllBlocking(Iterable<RequestPair> requests) {
return getAll(requests)
.toList()
.toBlocking()
.first();
}
Am I correct in understanding that the actual HTTP calls do not get made until the main thread gets to the forEach()
Yes, the forEach triggers the whole sequence of operations.
I've seen that the code in the forEach() block is executed by different threads, but I was not able to verify if there can be more than one thread in the forEach() block. Is the execution there concurrent?
Only one thread at a time is allowed to execute the lambda in forEach but you may indeed see different threads entering there.
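If you want to see the same shape without RxJava, a rough JDK-only analogue uses CompletableFuture: thenCombine plays the role of zip, a fixed-size pool caps the number of concurrent calls, and join() is the blocking step at the API boundary. PairAggregator, fetchAll and the function parameters are hypothetical stand-ins for the Feign clients and doSomeWork, not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical JDK-only analogue of zip + merge + blocking collect.
final class PairAggregator {
    // Calls both services for every request, combines each pair, and waits for all results.
    static <Q, A, B, C> List<C> fetchAll(List<Q> requests,
                                         Function<Q, A> serviceA,
                                         Function<Q, B> serviceB,
                                         BiFunction<A, B, C> combine,
                                         int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads); // caps concurrency
        try {
            List<CompletableFuture<C>> pending = new ArrayList<>();
            for (Q request : requests) {
                CompletableFuture<A> a = CompletableFuture.supplyAsync(() -> serviceA.apply(request), pool);
                CompletableFuture<B> b = CompletableFuture.supplyAsync(() -> serviceB.apply(request), pool);
                pending.add(a.thenCombine(b, combine)); // completes once both halves are in
            }
            List<C> results = new ArrayList<>();
            for (CompletableFuture<C> future : pending) {
                results.add(future.join()); // the only place the caller blocks
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike merge, this returns results in request order, because the futures are joined in the order they were created.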
I'm trying to get the latest value of a given Observable and get it to emit
immediately once it's called. Given the code below as an example:
return Observable.just(myObservable.last())
.flatMap(myObservable1 -> {
return myObservable1;
})
.map(o -> o.x) // Here I want to end up with a T object instead of Observable<T> object
This does not work because by doing this the flatMap will emit myObservable1 which in turn will have
to emit to reach the map.
I don't know if doing such a thing is even possible. Does anyone have any clue on how to achieve this goal? Thank you
The last() method will not be of any help here, as it waits for the Observable to terminate before giving you the last item emitted.
Assuming that you do not have control over the emitting observable, you could simply create a BehaviorSubject, subscribe it to the observable that emits the data you want to listen to, and then subscribe to the created subject. Since a Subject is both an Observable and an Observer, you will get what you want.
I think (I do not have the time to check it now) you may have to manually unsubscribe from the original observable, as the BehaviorSubject will not unsubscribe automatically once all of its subscribers unsubscribe.
Something like this:
BehaviorSubject<T> subject = BehaviorSubject.create();
hotObservable.subscribe(subject);
subject.subscribe(thing -> {
// Here just after subscribing
// you will receive the last emitted item, if there was any.
// You can also always supply the first item to the behavior subject
});
http://reactivex.io/RxJava/javadoc/rx/subjects/BehaviorSubject.html
In RxJava, subscriber.onXXX is called asynchronously. This means that if your Observable emits items on a new thread, you can never get the last item before returning unless you block the thread and wait for the item. But if the Observable emits items synchronously and you don't change its thread with subscribeOn and observeOn,
as in this code:
Observable.just(1,2,3).subscribe();
then you can get the last item like this:
Integer getLast(Observable<Integer> o) {
    final int[] ret = new int[1];
    // only works if o emits synchronously on the calling thread
    o.last().subscribe(i -> ret[0] = i);
    return ret[0];
}
Doing this is a bad idea, though. RxJava prefers that you do the work asynchronously.
What you actually want to achieve here is to take an asynchronous task and transform it to a synchronous one.
There are several ways to achieve it, each with its pros and cons:
Use toBlocking(). This means that the calling thread will be BLOCKED until the stream finishes. To get only one item, simply use first(), as it completes once an item is delivered.
Let's say your entire stream is Observable<T> getData();
Then a method that gets the last value immediately will look like this:
public T getLastItem(){
return getData().toBlocking().first();
}
Please don't use last(), as it will wait for the stream to complete and only then emit the last item.
If your stream is a network request and it hasn't produced any item yet, this will block your thread! So only use it when you are sure that an item is available immediately (or if you really want to block).
Another option is to simply cache the last result, something like this:
getData().subscribe(t -> cachedT = t); // somewhere in the code; keeps saving the last item delivered
public T getLastItem(){
return cachedT;
}
If no item has been sent by the time you request it, you will get null or whatever initial value you have set.
The problem with this approach is that the subscribe phase might happen after the get, which can create a race condition if the two are used on different threads.
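One way to make the cached-value option safe across threads is to hold the value in an AtomicReference and return an Optional instead of null. This is a minimal sketch; LatestValueCache and its method names are hypothetical, and the subscribe call in the comment assumes the getData() stream from above.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for the most recent item of a stream.
final class LatestValueCache<T> {
    private final AtomicReference<T> latest = new AtomicReference<>();

    // Intended as the onNext callback, e.g. getData().subscribe(cache::onNext).
    void onNext(T value) {
        latest.set(value); // writes are visible to readers on other threads
    }

    // Never blocks; empty until the first item has been delivered.
    Optional<T> latest() {
        return Optional.ofNullable(latest.get());
    }
}
```

This fixes visibility between threads, but a caller that asks before the first item arrives still gets an empty result and has to decide what to do with it.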
I essentially have a Future<List<T>> that is fetched in batches from the server. For some clients I'd like to provide incremental results while it loads in addition to the whole collection when future is fulfilled.
Is there a common Future extension defined somewhere for this? What are typical patterns/combinators exist for such futures?
I assume that given IncrementalListFuture<T> I can easily define map operation. What else comes to your mind?
Is there a common Future extension defined somewhere for this?
I assume you are talking about incremental results from an ExecutorService. You should consider using an ExecutorCompletionService which allows you to be informed as soon as one of the Future objects is get-able.
To quote from the javadocs:
CompletionService<Result> ecs = new ExecutorCompletionService<Result>(e);
for (Callable<Result> s : solvers) {
ecs.submit(s);
}
int n = solvers.size();
for (int i = 0; i < n; ++i) {
// this waits for one of the futures to finish and provide a result
Future<Result> future = ecs.take();
Result result = future.get();
if (result != null) {
// do something with the result
}
}
Sorry. I initially misread the question and thought that you were asking about a List<Future<?>>. It may be that you could refactor your code to actually return a number of Futures so I'll leave this for posterity.
I would not pass back the list in this case in a Future. You aren't going to be able to get the return until the job finishes.
If possible, I would pass in some sort of BlockingQueue so both the caller and the thread can access it:
final BlockingQueue<T> queue = new LinkedBlockingQueue<T>();
// build out job with the queue
threadPool.submit(new SomeJob(queue));
threadPool.shutdown();
// now we can consume from the queue as it is built:
while (true) {
T result = queue.take();
// you could use some constant result object to mean that the job finished
if (result == SOME_END_OBJECT) {
break;
}
// provide intermediate results
}
You could also have some sort of SomeJob.take() method which calls through to a BlockingQueue defined inside of your job class.
// the blocking queue in this case is hidden inside your job object
T result = someJob.take();
...
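A self-contained version of the queue approach might look like the sketch below. The list of batches stands in for the server round trips, and IncrementalLoader, load and the onItem callback are hypothetical names; items are assumed to be non-null, since a BlockingQueue rejects null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical loader that reports items incrementally and also returns the full list.
final class IncrementalLoader {
    private static final Object END = new Object(); // end-of-stream marker, compared by identity

    // Hands each item to onItem as soon as its batch arrives, then returns everything.
    static <T> List<T> load(List<List<T>> batches, Consumer<T> onItem) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.execute(() -> {
            try {
                for (List<T> batch : batches) { // one iteration per simulated server batch
                    queue.addAll(batch);
                }
            } finally {
                queue.add(END); // always signal completion, even if a batch fails
            }
        });
        pool.shutdown();

        List<T> all = new ArrayList<>();
        try {
            while (true) {
                Object next = queue.take(); // blocks until the producer adds something
                if (next == END) {
                    break;
                }
                @SuppressWarnings("unchecked")
                T item = (T) next;
                onItem.accept(item); // intermediate result for interested clients
                all.add(item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        }
        return all;
    }
}
```

Clients that only want the final collection ignore the callback; clients that want incremental results pass one in.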
Here's what I would do:
In the thread that populates the List, make it thread-safe by wrapping the list using Collections.synchronizedList
Make the list publicly available, but not modifiable, by adding a public method to the thread which returns the list wrapped by Collections.unmodifiableList
Instead of giving clients a Future<List<T>>, give them a handle to the thread, or some kind of wrapper of it, so that they can call the public method above.
Alternatively, as Gray has suggested, BlockingQueues are great for thread coordination like this. This may require more changes to your client code, however.
To answer my own question: there has been lots of development in this area recently. Among most used are: Play iteratees (http://www.playframework.org/documentation/2.0/Iteratees) and Rx for .NET (http://msdn.microsoft.com/en-us/data/gg577609.aspx)
Instead of Future they define something like:
interface Observable<T> {
Disposable subscribe(Observer<T> observer);
}
interface Observer<T> {
void onCompleted();
void onError(Exception error);
void onNext(T value);
}
and lots of combinators.
Alternatively to Observables you can take a look at twitter's approach.
They use Spool, which is an asynchronous version of the Stream.
Basically it is a simple trait similar to the List
trait Spool[+A] {
def head: A
/**
* The (deferred) tail of the spool. Invalid for empty spools.
*/
def tail: Future[Spool[A]]
}
that allows you to do functional stuff like map, filter and foreach on top of it.
Future is really designed to return a single (atomic) result, not for communicating intermediate results in this manner. What you will really want to do is to use multiple futures, one per batch.
We have a similar requirement where we have a bunch of things that we need to get from different remote servers, and each will return at a different time. We don't want to wait until the last one has returned, but rather process them in the order they return. For this we created the AsyncCompleter which takes an Iterable<Callable<T>> and returns an Iterable<T> that blocks on iteration, completely abstracting usage of the Future interface.
If you look at how that class is implemented, you'll see how to use a CompletionService to receive results from an Executor in the order in which they become available, if you need to build this for yourself.
edit: just saw that the second half of Gray's answer is similar, basically using an ExecutorCompletionService
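If you do want to build it yourself, the idea can be sketched as below. This is not the AsyncCompleter source, only an illustration of the same pattern on top of ExecutorCompletionService; CompletionOrder and of are hypothetical names, and the returned Iterable is meant to be consumed once by a single thread.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: yields task results in the order the tasks finish.
final class CompletionOrder {
    static <T> Iterable<T> of(Executor executor, List<Callable<T>> tasks) {
        CompletionService<T> service = new ExecutorCompletionService<>(executor);
        for (Callable<T> task : tasks) {
            service.submit(task); // all tasks start right away
        }
        AtomicInteger remaining = new AtomicInteger(tasks.size());
        return () -> new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return remaining.get() > 0;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                remaining.decrementAndGet();
                try {
                    return service.take().get(); // blocks until the next task completes
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
        };
    }
}
```

A plain for-each loop over the result then processes each value as soon as it is available, with no Future in sight.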