How to aggregate different async sources into a Single using RxJava2? - java

Let's say I have this synchronous method:
public FruitBowl getFruitBowl() {
Apple apple = getApple(); // IO intensive
Banana banana = getBanana(); // CPU intensive
return new FruitBowl(apple, banana);
}
I can use the Java concurrency API to turn it into an async method, which would turn out somewhat like this:
public Future<FruitBowl> getFruitBowl() {
Future<Apple> appleFuture = getAppleAsync(); // IO intensive
Future<Banana> bananaFuture = getBananaAsync(); // CPU intensive
return createFruitBowlAsync(appleFuture, bananaFuture); // Awaits appleFuture and bananaFuture and then returns a new FruitBowl
}
What is the idiomatic Rx way of doing this while taking advantage of its schedulers (io and computation) and returning a Single?

You can use the zip operator, giving each async operation its own scheduler via subscribeOn. If you don't, the methods are executed one after the other, on the same thread.
I would create an Observable version of both methods, returning Observable<Apple> and Observable<Banana> respectively, and use them like this:
// Schedulers.newThread() is used here; Schedulers.io() for the IO-bound call and
// Schedulers.computation() for the CPU-bound one would match the question's workloads.
Observable.zip(
    getAppleObservable().subscribeOn(Schedulers.newThread()),
    getBananaObservable().subscribeOn(Schedulers.newThread()),
    (apple, banana) -> new FruitBowl(apple, banana))
    .subscribe(/* do your work here with FruitBowl object */);
Here are more details about how to parallelize operations with the zip operator
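For comparison, the Future-based variant from the question can be sketched with the JDK's CompletableFuture. This is not the Rx answer, only a minimal sketch of what createFruitBowlAsync could look like; the Apple, Banana and FruitBowl classes and the two executors below are stand-ins made up for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class FruitBowlDemo {
    // Hypothetical stand-ins for the types named in the question.
    static class Apple {}
    static class Banana {}
    static class FruitBowl {
        final Apple apple;
        final Banana banana;
        FruitBowl(Apple apple, Banana banana) {
            this.apple = apple;
            this.banana = banana;
        }
    }

    static Apple getApple() { return new Apple(); }    // stands in for the IO-intensive call
    static Banana getBanana() { return new Banana(); } // stands in for the CPU-intensive call

    // Starts both calls on their own executor and combines the results once both are done.
    static CompletableFuture<FruitBowl> getFruitBowl(ExecutorService ioPool, ExecutorService cpuPool) {
        CompletableFuture<Apple> apple = CompletableFuture.supplyAsync(FruitBowlDemo::getApple, ioPool);
        CompletableFuture<Banana> banana = CompletableFuture.supplyAsync(FruitBowlDemo::getBanana, cpuPool);
        return apple.thenCombine(banana, FruitBowl::new);
    }

    public static void main(String[] args) {
        ExecutorService ioPool = Executors.newCachedThreadPool();
        ExecutorService cpuPool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            FruitBowl bowl = getFruitBowl(ioPool, cpuPool).join();
            System.out.println("Bowl complete: " + (bowl.apple != null && bowl.banana != null));
        } finally {
            ioPool.shutdown();
            cpuPool.shutdown();
        }
    }
}
```

The two executors play the role that the io and computation schedulers play in Rx: one pool sized for blocking work, one sized to the number of cores.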

Related

Is it possible to block/wait an already existing asynchronous function?

SomeLibrary lib = new SomeLibrary();
lib.doSomethingAsync(); // some function from a library I got and what it does is print 1-5 asynchronously
System.out.println("Done");
// output
// Done
// 1
// 2
// 3
// 4
// 5
I want to be clear that I didn't make the doSomethingAsync() function and it's out of my ability to change it. I want to find a way to block this async function and print Done after the numbers 1 to 5 because as you see Done is being instantly printed. Is there a way to do this in Java?
You can use a CountDownLatch as follows:
final CountDownLatch wait = new CountDownLatch(1);
SomeLibrary lib = new SomeLibrary(wait);
lib.doSomethingAsync(); // some function from a library I got and what it does is print 1-5 asynchronously
// NOTE: doSomethingAsync must call wait.countDown() once its work is finished
wait.await(); // blocks here until wait.countDown() is called
System.out.println("Done");
In the SomeLibrary constructor:
private CountDownLatch wait;
public SomeLibrary(CountDownLatch _wait) {
    this.wait = _wait;
}
In method doSomethingAsync():
public void doSomethingAsync(){
//TODO something
...
this.wait.countDown();
return;
}
This is achieved in a couple of ways in standard libraries:
Completion Callback
Clients can often provide a function to be invoked after the async task is complete. This function usually receives some information about the work done as its input.
Future.get()
Async functions return a Future for client synchronization. You can read more about them here.
Do check whether either of these options is available (perhaps as an overloaded version?) in the method you wish to invoke. It is not uncommon for libraries to include both sync and async versions of some business logic, so you could search for that too.
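If the library does offer a completion callback, the callback can count down a latch that the caller waits on, so the library itself does not need to be modified. A minimal sketch, assuming a hypothetical doSomethingAsync(Runnable onComplete) overload; the SomeLibrary class below is a stand-in written for the example, not the asker's actual library.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class BlockOnAsyncDemo {
    // Hypothetical stand-in: prints 1-5 on another thread, then invokes the callback.
    static class SomeLibrary {
        void doSomethingAsync(Runnable onComplete) {
            new Thread(() -> {
                for (int i = 1; i <= 5; i++) {
                    System.out.println(i);
                }
                onComplete.run();
            }).start();
        }
    }

    // Blocks the caller until the callback fires or the timeout elapses.
    // Returns true if the async work signalled completion in time.
    static boolean runAndWait(SomeLibrary lib, long timeoutSeconds) {
        CountDownLatch done = new CountDownLatch(1);
        lib.doSomethingAsync(done::countDown);
        try {
            return done.await(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        boolean finished = runAndWait(new SomeLibrary(), 10);
        System.out.println(finished ? "Done" : "Timed out");
    }
}
```

Waiting with a timeout avoids hanging forever if the callback is never invoked.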

CompletionService vs CompletableFuture

I have 1000 big files to be processed in order as mentioned below:
1. First, those files need to be copied to a different directory in parallel. I am planning to use an ExecutorService with 10 threads for this.
2. As soon as any file has been copied to the other location (#1), I will submit it for further processing to an ExecutorService with 10 threads.
3. Finally, another action needs to be performed on these files in parallel, so #2 gets its input from #1, and #3 gets its input from #2.
Now, I can use CompletionService here, so I can process the thread results from #1 to #2 and #2 to #3 in the order they are getting completed. CompletableFuture says we can chain asynchronous tasks together which sounds like something I can use in this case.
I am not sure if I should implement my solution with CompletableFuture (since it is relatively new and ought to be better) or if CompletionService is sufficient. And why should I choose one over the other in this case?
It would probably be best if you tried both approaches and then choose the one you are more comfortable with. Though it sounds like CompletableFutures are better suited for this task because they make chaining processing steps / stages really easy. For example in your case the code could look like this:
ExecutorService copyingExecutor = ...
// Not clear from the requirements, but let's assume you have
// a separate executor for this
ExecutorService processingExecutor = ...
public CompletableFuture<MyResult> process(Path file) {
return CompletableFuture
.supplyAsync(
() -> {
// Retrieve destination path where file should be copied to
Path destination = ...
try {
Files.copy(file, destination);
} catch (IOException e) {
throw new UncheckedIOException(e);
}
return destination;
},
copyingExecutor
)
.thenApplyAsync(
copiedFile -> {
// Process the copied file
...
},
processingExecutor
)
// This separate stage does not make much sense, so unless you have
// yet another executor for this or this stage is applied at a different
// location in your code, it should probably be merged with the
// previous stage
.thenApply(
previousResult -> {
// Process the previous result
...
}
);
}
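To run such a chain for many files and wait for all of them, the individual futures can be collected and combined with CompletableFuture.allOf. A minimal sketch, in which the three stage methods are placeholders operating on strings rather than real file operations, and all names are made up for the example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class FilePipelineDemo {
    // Placeholder stages: real code would copy and process files here.
    static String copy(String file) { return "copied/" + file; }
    static String process(String copied) { return copied + ".processed"; }
    static String finish(String processed) { return processed + ".done"; }

    // Starts one three-stage pipeline per file and waits for all of them.
    static List<String> runAll(List<String> files, ExecutorService copyPool, ExecutorService workPool) {
        List<CompletableFuture<String>> futures = files.stream()
                .map(f -> CompletableFuture.supplyAsync(() -> copy(f), copyPool)
                        .thenApplyAsync(FilePipelineDemo::process, workPool)
                        .thenApplyAsync(FilePipelineDemo::finish, workPool))
                .collect(Collectors.toList());
        // allOf completes once every pipeline has completed; join() throws if any of them failed.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        // Every future is already complete here, so join() returns immediately.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService copyPool = Executors.newFixedThreadPool(10);
        ExecutorService workPool = Executors.newFixedThreadPool(10);
        try {
            System.out.println(runAll(Arrays.asList("a.bin", "b.bin"), copyPool, workPool));
        } finally {
            copyPool.shutdown();
            workPool.shutdown();
        }
    }
}
```

Each file moves to its next stage as soon as its previous stage finishes, independently of the other files, which is the behaviour the question wanted from CompletionService.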

Selecting Data from Room without observing

I need to select data from a table, manipulate it and then insert it into another table. This only happens when the app is opened for the first time that day and this isn't going to be used in the UI. I don't want to use LiveData because it doesn't need to be observed but when I was looking into how to do it, most people say I should use LiveData. I've tried using AsyncTask but I get the error "Cannot access database on the main thread since it may potentially....".
Here is the code for my AsyncTask
public class getAllClothesArrayAsyncTask extends AsyncTask<ArrayList<ClothingItem>, Void, ArrayList<ClothingItem>[]> {
private ClothingDao mAsyncDao;
getAllClothesArrayAsyncTask(ClothingDao dao) { mAsyncDao = dao;}
@Override
protected ArrayList<ClothingItem>[] doInBackground(ArrayList<ClothingItem>... arrayLists) {
List<ClothingItem> clothingList = mAsyncDao.getAllClothesArray();
ArrayList<ClothingItem> arrayList = new ArrayList<>(clothingList);
return arrayLists;
}
}
And this is how I'm calling it in my activity
mClothingViewModel = new ViewModelProvider(this).get(ClothingViewModel.class);
clothingItemArray = mClothingViewModel.getClothesArray();
What is the best practice in this situation?
Brief summary:
Room really doesn't allow anything (query|insert|update|delete) on the main thread. You can switch this check off with allowMainThreadQueries() on RoomDatabase.Builder, but you shouldn't.
If you don't care about the UI, at minimum you can put your Room code (a Runnable) into one of: Thread, Executor, AsyncTask (but it was deprecated last year)... I've put examples below.
For a one-shot DB operation like this, I think the best practices are Coroutines (for those who use Kotlin) and RxJava (for those who use Java, with Single|Maybe as return types). They offer many more possibilities, but you have to invest time to understand these mechanisms.
To observe a data stream from Room there are LiveData, Coroutines Flow, and RxJava (Flowable).
Several examples of thread-switching with lambdas enabled (if for some reason you don't want to learn the more advanced stuff):
Just a Thread
new Thread(() -> {
    List<ClothingItem> clothingList = mAsyncDao.getAllClothesArray();
    // ... next operations
}).start(); // without start() the Runnable never runs
Executor
Executors.newSingleThreadExecutor().submit(() -> {
List<ClothingItem> clothingList = mAsyncDao.getAllClothesArray();
// ... next operations
});
AsyncTask
AsyncTask.execute(() -> {
List<ClothingItem> clothingList = mAsyncDao.getAllClothesArray();
// ... next operations
});
If you use Repository pattern you can put all this thread-switching there
Another useful link to read about life after AsyncTask deprecation

how to run multiple synchronous functions asynchronously?

I am writing in Java on the Vertx framework, and I have an architecture question regarding blocking code.
I have a JsonObject which consists of 10 objects, like so:
{
"system":"CD0",
"system":"CD1",
"system":"CD2",
"system":"CD3",
"system":"CD4",
"system":"CD5",
"system":"CD6",
"system":"CD7",
"system":"CD8",
"system":"CD9"
}
I also have a synchronous function which gets an object from the JsonObject, and consumes a SOAP web service, while sending the object to it.
The SOAP web service gets the content (e.g. CD0), and after a few seconds returns an Enum.
I then want to take the returned enum value and save it in some sort of data structure (like a hash table).
What I ultimately want is a function that will iterate over all the JsonObject's objects and, for each one, run the blocking code in parallel.
I want it to run in parallel so that even if one of the calls needs to wait 20 seconds, it won't block the other calls.
How can I do such a thing in Vert.x?
P.S.: I would appreciate it if you corrected any mistakes I made.
Why not use RxJava and "zip" the separate calls? Vert.x has great support for RxJava too. Assuming that you call the same method 10 times with a different String argument, each call returning another String, you could do something like this:
private Single<String> callWs(String arg) {
    return Single.fromCallable(() -> {
        //DO CALL WS
        return "yourResult";
    }).subscribeOn(Schedulers.io()); // without subscribeOn, zip would run the blocking calls one after another
}
and then just use it with some array of arguments:
String[] array = new String[10]; //get your arguments
List<Single<String>> wsCalls = new ArrayList<>();
for (String s : array) {
wsCalls.add(callWs(s));
}
Single.zip(wsCalls, r -> r).subscribe(allYourResults -> {
    // do whatever you like with the results
});
More about zip function and reactive programming in general: reactivex.io
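The same fan-out can also be sketched with plain java.util.concurrent, which makes the threading explicit. This is not the Vert.x or RxJava API, only a JDK-level illustration; the Status enum and the callWs stub are assumptions standing in for the real SOAP call and its return type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelCallsDemo {
    enum Status { OK, FAILED } // hypothetical enum returned by the web service

    // Stand-in for the blocking SOAP call; sleeps briefly to simulate latency.
    static Status callWs(String system) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Status.FAILED;
        }
        return Status.OK;
    }

    // Runs one blocking call per system on its own pool thread and collects the results.
    static Map<String, Status> callAll(List<String> systems) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, systems.size()));
        Map<String, Status> results = new ConcurrentHashMap<>();
        try {
            List<CompletableFuture<Void>> calls = new ArrayList<>();
            for (String system : systems) {
                calls.add(CompletableFuture.runAsync(() -> results.put(system, callWs(system)), pool));
            }
            // Wait until every call has stored its result.
            CompletableFuture.allOf(calls.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(callAll(Arrays.asList("CD0", "CD1", "CD2")));
    }
}
```

Because each call has its own thread, a slow call delays only the final join, not the other calls. The results go into a ConcurrentHashMap since several threads write to it at once.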

Ideas on concurrent datastructure

I am not sure if I can put my question in the clearest fashion, but I will try my best.
Let's say I am retrieving some information from a third-party API. The retrieved information will be huge. To gain performance, instead of retrieving all the info in one go, I will retrieve it in a paged fashion (the API gives me that facility, basically an iterator). The return type is basically a list of objects.
My aim is to process the information I have in hand (that includes comparing, storing in the DB, and many other operations) while I wait for the next paged response.
My question to the expert community is: what data structure do you prefer in such a case? Also, does a framework like Spring Batch help in getting performance gains in such cases?
I know the question is a bit vague, but I am looking for general ideas, tips, and pointers.
In these cases, the data structure for me is java.util.concurrent.CompletionService.
For purposes of example, I'm going to assume a couple of additional constraints:
You want only one outstanding request to the remote server at a time
You want to process the results in order.
Here goes:
// a class that knows how to update the DB given a page of results
class DatabaseUpdater implements Callable<Object> { ... }
// a background thread to do the work
final CompletionService<Object> exec = new ExecutorCompletionService<>(
        Executors.newSingleThreadExecutor());
// first call
List<Object> results = ThirdPartyAPI.getPage( ... );
// Start loading those results to DB on background thread
exec.submit(new DatabaseUpdater(results));
while( you need to ) {
    // Another call to remote service (reuse the variable; redeclaring it here would not compile)
    results = ThirdPartyAPI.getPage( ... );
    // wait for existing work to complete; get() rethrows any failure
    exec.take().get();
    // send more work to background thread
    exec.submit(new DatabaseUpdater(results));
}
// wait for the last task to complete
exec.take().get();
This is just a simple two-thread design. The first thread is responsible for getting data from the remote service and the second is responsible for writing to the database.
Any exception thrown by DatabaseUpdater is propagated to the main thread, wrapped in an ExecutionException, when get() is called on the Future returned by exec.take().
Good luck.
In terms of doing the actual parallelism, one very useful construct in Java is the ThreadPoolExecutor. A rough sketch of what that might look like is this:
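public class YourApp {
    // static, so it can be instantiated from the static main method
    static class Processor implements Runnable {
        private final Widget toProcess;
        public Processor(Widget toProcess) {
            this.toProcess = toProcess;
        }
        @Override
        public void run() {
            // commit the Widget to the DB, etc
        }
    }
    public static void main(String[] args) {
        // With an unbounded queue the pool never grows past its core size,
        // so the core size is set to 10 here to actually get 10 threads.
        ThreadPoolExecutor executor =
                new ThreadPoolExecutor(10, 10, 30,
                        TimeUnit.SECONDS,
                        new LinkedBlockingDeque<>());
        while(thereAreStillWidgets()) {
            ArrayList<Widget> widgets = doExpensiveDatabaseCall();
            for(Widget widget : widgets) {
                Processor processor = new Processor(widget);
                executor.execute(processor);
            }
        }
        executor.shutdown(); // let queued work finish, then release the threads
    }
}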
But as I said in a comment: calls to an external API are expensive. It's very likely that the best strategy is to pull all the Widget objects down from the API in one call, and then process them in parallel once you've got them. Doing more API calls gives you the overhead of sending the data all the way from the server to you, every time -- it's probably best to pay that cost the fewest number of times that you can.
Also, keep in mind that if you're doing DB operations, it's possible that your DB doesn't allow for parallel writes, so you might get a slowdown there.
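One detail worth checking when sizing a ThreadPoolExecutor: with an unbounded work queue, extra tasks are queued instead of triggering new threads, so the pool does not grow beyond corePoolSize and maximumPoolSize has no effect. A small sketch that measures this, where the class and method names are made up for the example:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolGrowthDemo {
    // Submits `tasks` tasks that all block on a latch, then reports
    // the largest number of worker threads the pool ever created.
    static int largestPoolSize(int core, int max, int tasks) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                core, max, 30, TimeUnit.SECONDS, new LinkedBlockingDeque<>());
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> {
                try {
                    release.await(); // keep the worker busy until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int largest = executor.getLargestPoolSize();
        release.countDown(); // let all tasks finish
        executor.shutdown();
        return largest;
    }

    public static void main(String[] args) {
        System.out.println(largestPoolSize(1, 10, 20));  // prints 1
        System.out.println(largestPoolSize(10, 10, 20)); // prints 10
    }
}
```

To get real parallelism with an unbounded queue, set corePoolSize to the number of threads you want, or use a bounded queue if the pool should grow under load.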
