I'm writing a server-side program using Twitter Finagle. I don't use the full Twitter server stack, just the part that enables asynchronous processing (Future, Function, etc.). I want the Future objects to have timeouts, so I wrote this:
Future<String> future = Future.value(some_input).flatMap(time_consuming_function1);
future.get(Duration.apply(5, TimeUnit.SECONDS));
time_consuming_function1 runs for longer than 5 seconds. But future doesn't time out after 5 seconds and it waits till time_consuming_function1 has finished.
I think this is because future.get(timeout) only cares about how long the future took to create, not the whole operation chain. Is there a way to timeout the whole operation chain?
Basically if you call map/flatMap on a satisfied Future, the code is executed immediately.
In your example, you're satisfying your future immediately when you call Future.value(some_input), so flatMap executes the code immediately and the call to get doesn't need to wait for anything. Also, everything is happening in one thread. A more appropriate use would be like this:
import scala.concurrent.ops._
import com.twitter.conversions.time._
import com.twitter.util.{Future,Promise}
val p = new Promise[String]
val longOp = (s: String) => {
val p = new Promise[String]
spawn { Thread.sleep(5000); p.setValue("Received: " + s) }
p
}
val both = p flatMap longOp
both.get(1 second) // p is not complete, so longOp hasn't been called yet, so this will fail
p.setValue("test") // we set p, but we have to wait for longOp to complete
both.get(1 second) // this fails because longOp isn't done
both.get(5 seconds) // this will succeed
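The first point above, that a continuation attached to an already-satisfied future runs immediately on the calling thread, is not specific to Finagle. As a rough analogy only (this uses the JDK's CompletableFuture rather than Finagle's API, and the class and method names are made up for the illustration), a minimal sketch of the same behaviour:

```java
import java.util.concurrent.CompletableFuture;

public class EagerContinuationDemo {
    // Returns true if the continuation ran on the thread that attached it.
    static boolean ranOnCallingThread() {
        String caller = Thread.currentThread().getName();
        // The source future is already complete, like Future.value(some_input).
        CompletableFuture<String> alreadyDone = CompletableFuture.completedFuture("input");
        // thenApply therefore runs the function right here, before returning.
        CompletableFuture<String> mapped =
                alreadyDone.thenApply(s -> Thread.currentThread().getName());
        return mapped.isDone() && caller.equals(mapped.join());
    }

    public static void main(String[] args) {
        System.out.println("ran on calling thread: " + ranOnCallingThread());
    }
}
```

Because the work is already finished by the time the mapped future is returned, a timeout applied afterwards has nothing left to wait for.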
Related
SomeLibrary lib = new SomeLibrary();
lib.doSomethingAsync(); // some function from a library I got and what it does is print 1-5 asynchronously
System.out.println("Done");
// output
// Done
// 1
// 2
// 3
// 4
// 5
I want to be clear that I didn't make the doSomethingAsync() function and it's out of my ability to change it. I want to find a way to block this async function and print Done after the numbers 1 to 5 because as you see Done is being instantly printed. Is there a way to do this in Java?
You can use CountDownLatch as follows:
final CountDownLatch wait = new CountDownLatch(1);
SomeLibrary lib = new SomeLibrary(wait);
lib.doSomethingAsync(); // some function from a library I got and what it does is print 1-5 asynchronously
// NOTE: doSomethingAsync must call wait.countDown() before it returns
wait.await(); // blocks here until wait.countDown() has been called
System.out.println("Done");
In Constructor SomeLibrary :
private CountDownLatch wait;
public SomeLibrary(CountDownLatch _wait) {
this.wait = _wait;
}
In method doSomethingAsync():
public void doSomethingAsync(){
//TODO something
...
this.wait.countDown();
return;
}
Standard libraries usually handle this in one of two ways:
Completion Callback
Clients can often provide a function to be invoked after the async task is complete. This function usually receives some information about the work done as its input.
Future.get()
Async functions return a Future for client synchronization. You can read more about them here.
Do check whether either of these options is available (perhaps as an overloaded version?) for the method you wish to invoke. It is not uncommon for libraries to include both sync and async versions of some business logic, so you could search for that too.
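If the library does offer a completion callback, blocking until it fires could look roughly like the sketch below. The doSomethingAsync(out, onDone) signature here is a hypothetical stand-in written for this example, not the real library's API; only CountDownLatch is a real JDK class.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BlockOnCallback {
    // Hypothetical stand-in for a library call that accepts a completion callback.
    static void doSomethingAsync(StringBuffer out, Runnable onDone) {
        new Thread(() -> {
            for (int i = 1; i <= 5; i++) {
                out.append(i).append('\n');
            }
            onDone.run(); // signal completion to the caller
        }).start();
    }

    static String runAndWait() {
        StringBuffer out = new StringBuffer();
        CountDownLatch done = new CountDownLatch(1);
        doSomethingAsync(out, done::countDown);
        try {
            // Block until the callback fires, but give up after five seconds.
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("async work did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        out.append("Done");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(runAndWait());
    }
}
```

The latch lives entirely in the calling code, so nothing inside the library has to change as long as it accepts a callback.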
I have a use case in my Spring boot application as follows:
I would like to fetch the id field value from the response with the following function:
String id = getIdFromResponse(response);
If I don't get any id in the response, then I check if the id field is present in the request argument with the following function:
String id = getIdFromRequest(request);
As of now, I am invoking them sequentially. But I would like to run these two functions in parallel and stop as soon as I get an id from either of them.
I am wondering if there is any way to implement this using streams in Java 8.
You can use something like this:
String id = Stream.<Supplier<String>>of(
() -> getIdFromResponse(response),
() -> getIdFromRequest(request)
)
.parallel()
.map(Supplier::get)
.filter(Objects::nonNull)
.findFirst()
.orElseThrow();
The suppliers are needed, because when you don't use them, then both requests are still executed sequentially.
I also assumed that your methods return null when nothing is found, so I had to filter these values out with .filter(Objects::nonNull).
Depending on your use case, you can replace .orElseThrow() (the no-argument form requires Java 10 or later) with something different, like .orElse(null).
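A self-contained, Java 8 compatible sketch of the same idea follows. The class name, the firstNonNull helper and the two stand-in suppliers are assumptions made for the example. Note that findFirst() respects encounter order, so when the first supplier yields a non-null value that value is returned even if the second one finished sooner; findAny() relaxes that.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;

public class FirstNonNullDemo {
    // Evaluates the suppliers on a parallel stream and returns the first
    // non-null value in encounter order, or null if every supplier returned null.
    static String firstNonNull(List<Supplier<String>> sources) {
        return sources.parallelStream()
                .map(Supplier::get)
                .filter(Objects::nonNull)
                .findFirst()
                .orElse(null);
    }

    static String demo() {
        // Stand-ins for getIdFromResponse(response) and getIdFromRequest(request).
        List<Supplier<String>> sources = Arrays.asList(
                () -> null,
                () -> "id-from-request");
        return firstNonNull(sources);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```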
There is no need to use Stream API as long as there exists a method exactly for this.
ExecutorService::invokeAny(Collection<? extends Callable<T>>)
Executes the given tasks, returning the result of one that has completed successfully (i.e., without throwing an exception), if any do. Upon normal or exceptional return, tasks that have not completed are cancelled.
List<Callable<String>> collection = Arrays.asList(
() -> getIdFromResponse(response),
() -> getIdFromRequest(request)
);
// you want the same number of threads as the size of the collection
ExecutorService executorService = Executors.newFixedThreadPool(collection.size());
String id = executorService.invokeAny(collection);
Three notes:
There is also an overloaded method with timeout throwing TimeoutException if no result is available in time: invokeAny(Collection<? extends Callable<T>>, long, TimeUnit)
You need to handle ExecutionException and InterruptedException from the invokeAny method.
Don't forget to shut down the service once you are done.
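Putting the three notes together, a minimal sketch might look like this. The class name, the five-second timeout and the stand-in lookups are assumptions for the example. One thing to keep in mind: invokeAny treats a task that returns null as having completed successfully, so here a lookup that finds no id throws instead of returning null.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class InvokeAnyDemo {
    // Returns the result of whichever lookup succeeds first, or null if none does.
    static String firstId(List<Callable<String>> lookups) {
        ExecutorService pool = Executors.newFixedThreadPool(lookups.size());
        try {
            return pool.invokeAny(lookups, 5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException | TimeoutException e) {
            // Every lookup failed, or none finished in time.
            return null;
        } finally {
            pool.shutdownNow(); // always release the threads
        }
    }

    static String demo() {
        // Stand-ins for getIdFromResponse(response) and getIdFromRequest(request).
        List<Callable<String>> lookups = Arrays.asList(
                () -> { throw new NoSuchElementException("no id in response"); },
                () -> "id-from-request");
        return firstId(lookups);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```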
If you want to be in full control over when to enable the alternative evaluation, you may use CompletableFuture:
CompletableFuture<String> job
= CompletableFuture.supplyAsync(() -> getIdFromResponse(response));
String id;
try {
id = job.get(300, TimeUnit.MILLISECONDS);
}
catch(TimeoutException ex) {
// did not respond within the specified time, set up alternative
id = job.applyToEither(
CompletableFuture.supplyAsync(() -> getIdFromRequest(request)), s -> s).join();
}
catch(InterruptedException|ExecutionException ex) {
// handle error
}
The second job is only submitted when the first did not complete within the specified time. Then, whichever job responds first will provide the result value.
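To see the fallback path in isolation, here is a small runnable sketch of the same pattern. The resolve helper, the 50 ms timeout and the never-completing primary future are assumptions chosen so the fallback is always taken.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class FallbackAfterTimeout {
    // Waits for the primary job; on timeout, starts the fallback and takes
    // whichever of the two completes first.
    static String resolve(CompletableFuture<String> primary,
                          Supplier<String> fallback,
                          long timeoutMillis) {
        try {
            return primary.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return primary
                    .applyToEither(CompletableFuture.supplyAsync(fallback), s -> s)
                    .join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    static String demo() {
        // A primary job that never completes, to force the timeout branch.
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();
        return resolve(neverCompletes, () -> "id-from-request", 50);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```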
The problem I am facing is as follows:
I have two observables: one fetches data from the network and the other from the db. The second one might be empty, but the lack of the first one is considered an error. When the result from the network arrives, I need to compare it with the latest results from the db (if present), and if they differ I want to store them (if the db observable is empty I want to store the network results anyway).
Is there any dedicated operator that handles a case like this?
So far I tried a solution with zipWith (which is not working as expected if db is empty), buffer (which is working but is far from ideal),
and I also tried flatmapping (which requires additional casting in the subscriber).
Below is the solution with buffer.
Observable.concat(ratesFromNetwork(), latestRatesFromDB())
.buffer(3000, 2)
.filter(buffer -> !(buffer.size() == 2 && !buffer.get(0).differentThan(buffer.get(1))))
.map(buffer -> buffer.get(0))
.subscribe(this::save,
(ex) -> System.out.println(ex.getMessage()),
() -> System.out.println("completed"));
If I modify latestRatesFromDb so that it returns an Optional instead of an Observable, the whole problem becomes trivial because I can filter using this result. It seems that there is no way to filter in an asynchronous way (or did I miss something?)
Okay, here is how I would go about writing this.
Firstly, whatever class has the differentThan function should be changed to override equals instead. Otherwise you can't use a lot of basic methods with these objects.
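As a sketch of what that change could look like, assuming a hypothetical Rates class with a currency code and a value (the real class and its fields are not shown in the question):

```java
import java.util.Objects;

public final class Rates {
    // Hypothetical fields; the real class is not shown in the question.
    private final String currency;
    private final double value;

    public Rates(String currency, double value) {
        this.currency = currency;
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Rates)) {
            return false;
        }
        Rates that = (Rates) other;
        return currency.equals(that.currency)
                && Double.compare(value, that.value) == 0;
    }

    // Keep hashCode consistent with equals so hash-based lookups agree with it.
    @Override
    public int hashCode() {
        return Objects.hash(currency, value);
    }
}
```

With this in place, differentThan(other) is simply !equals(other).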
For the purpose of this example I wrote all the observables using the Integer class as my type parameter. I then use a scheduler to write two mock methods:
static Observable<Integer> ratesFromNetwork(Scheduler scheduler) {
return Observable.<Integer>create(sub -> {
sub.onNext(2);
sub.onCompleted();
}).delay(99, TimeUnit.MILLISECONDS, scheduler);
}
static Observable<Integer> latestRatesFromDB(Scheduler scheduler) {
return Observable.<Integer>create(sub -> {
sub.onNext(1);
sub.onCompleted();
}).delay(99, TimeUnit.MILLISECONDS, scheduler);
}
As you can see both are similar, however, they will emit different values.
lack of the first one is considered an error
The best way to achieve this is to use a timeout. You can log the error immediately here and continue:
final Observable<Integer> networkRate = ratesFromNetwork(scheduler)
.timeout(networkTimeOut, TimeUnit.MILLISECONDS, scheduler)
.doOnError(e -> System.err.println("Failed to get rates from network."));
When the timeout elapses, Rx signals a TimeoutException. doOnError logs where the error originated and then lets it propagate through the rest of the sequence.
The second one might be empty
In this case I would use a similar strategy, but stop the error from propagating by using the method onErrorResumeNext. Now you can make sure the observable emits at least one value by using firstOrDefault. In this method, use some dummy value that you expect never to match the network results.
final Observable<Integer> databaseRate = latestRatesFromDB(scheduler)
.timeout(databaseTimeOut, TimeUnit.MILLISECONDS, scheduler)
.doOnError(e -> System.err.println("Failed to get rates from database"))
.onErrorResumeNext(Observable.empty())
.firstOrDefault(-1);
Now, by using the distinct method, any value equal to one already emitted is dropped, so the network rate only gets through when it differs from the database rate (which is why you need to override equals).
databaseRate.concatWith(networkRate).distinct().skip(1)
.subscribe(i -> System.out.println("Updating to " + i),
System.err::println,
() -> System.out.println("completed"));
Here the database rate is placed before the network rate to take advantage of distinct. A skip is then added to always ignore the database rate value.
Complete Code:
final long networkTimeOut = 100;
final long databaseTimeOut = 100;
final TestScheduler scheduler = new TestScheduler();
final Observable<Integer> networkRate = ratesFromNetwork(scheduler)
.timeout(networkTimeOut, TimeUnit.MILLISECONDS, scheduler)
.doOnError(e -> System.err.println("Failed to get rates from network."));
final Observable<Integer> databaseRate = latestRatesFromDB(scheduler)
.timeout(databaseTimeOut, TimeUnit.MILLISECONDS, scheduler)
.doOnError(e -> System.err.println("Failed to get rates from database"))
.onErrorResumeNext(Observable.empty())
.firstOrDefault(-1);
databaseRate.concatWith(networkRate).distinct().skip(1)
.subscribe(i -> System.out.println("Updating to " + i),
System.err::println,
() -> System.out.println("completed"));
scheduler.advanceTimeBy(200, TimeUnit.MILLISECONDS);
When networkTimeOut and databaseTimeOut are greater than 100 it prints:
Updating to 2
completed
When networkTimeOut is less than 100 it prints:
Failed to get rates from network.
java.util.concurrent.TimeoutException
When databaseTimeOut is less than 100 it prints:
Failed to get rates from database
Updating to 2
completed
And if you modify latestRatesFromDB and ratesFromNetwork to return the same value, it simply prints:
completed
And if you don't care about forcing timeouts or logging then it boils down to:
latestRatesFromDB().firstOrDefault(dummyValue)
.concatWith(ratesFromNetwork())
.distinct().skip(1)
.subscribe(this::save,
System.err::println,
() -> System.out.println("completed"));
I am writing a parser for a website. It has many pages (I call them IndexPages), and each page has a lot of links (about 300 to 400 links per IndexPage). I use Java's ExecutorService to invoke 12 Callables concurrently for one IndexPage. Each Callable just fires an HTTP request to one link and does some parsing and db storing. When the first IndexPage is finished, the program moves on to the second IndexPage, until no next IndexPage is found.
When running, it seems OK; I can observe the threads working and being scheduled well. Parsing and storing each link takes just 1 to 2 seconds.
But as time goes by, I observed that each Callable (parsing/storing) takes longer and longer. Take this picture for example: sometimes it takes 10 or more seconds to finish a Callable (the green bar is RUNNING, the purple bar is WAITING). And my PC is bogging down; everything becomes sluggish.
This is my main algorithm :
ExecutorService executorService = Executors.newFixedThreadPool(12);
String indexUrl = // Set initial (1st page) IndexPage
while(true)
{
String nextPage = // parse next page in the indexUrl
Set<Callable<Void>> callables = new HashSet<>();
for(String url : getUrls(indexUrl))
{
Callable callable = new ParserCallable(url , … and some DAOs);
callables.add(callable);
}
try {
executorService.invokeAll(callables);
} catch (InterruptedException e) {
e.printStackTrace();
}
if (nextPage == null)
break;
indexUrl = nextPage;
} // true
executorService.shutdown();
The algorithm is simple and self-explanatory. I wonder what may cause this situation. Is there any way to prevent such performance degradation?
The CPU/Memory/Heap shows reasonable usage.
Environments , FYI.
==================== updated ====================
I've changed my implementation from ExecutorService to ForkJoinPool:
ForkJoinPool pool=new ForkJoinPool(12);
String indexUrl = // Set initial (1st page) IndexPage
while(true)
{
Set<Callable<Void>> callables = new HashSet<>();
for(String url : getUrls(indexUrl))
{
Callable callable = new ParserCallable(url , DAOs...);
callables.add(callable);
}
pool.invokeAll(callables);
String nextPage = // parse next page in this indexUrl
if (nextPage == null)
break;
indexUrl = nextPage;
} // true
It takes longer than the ExecutorService solution. ExecutorService takes about 2 hours to finish all pages, while ForkJoinPool takes 3 hours, and each Callable still takes longer and longer to complete (from 1 second to 5, 6 or even 10 seconds). I don't mind that it takes longer; I just hope it takes constant time (not longer and longer) to finish a job.
I am wondering whether creating a lot of (non-thread-safe) GregorianCalendar, Date and SimpleDateFormat objects in the parser causes some threading issue. But I don't reuse these objects or pass them between threads, so I still cannot find the reason.
Based on the heap you have a memory issue. ExecutorService.invokeAll collects all of the results of the Callable instances into a List and returns that List when they all complete. You may want to consider simply calling ExecutorService.submit since you don't seem to care about the results of each Callable.
I can't see why a Callable is needed to parse your index pages, since your caller does not expect any result from ParserCallable. You would need a bit of exception handling, but that can still be managed with a Runnable.
When you pass a Callable to invokeAll, the executor wraps it in a FutureTask and returns a Future for it, which is never used here.
You should be able to improve the implementation by using Runnable, which avoids this additional work:
ExecutorService executor = Executors.newFixedThreadPool(12);
for(String url : getUrls(indexUrl)) {
Runnable worker = new ParserRunnable(url , … and some DAOs);
executor.execute(worker);
}
class ParserRunnable implements Runnable{
}
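A fuller, runnable sketch of this fire-and-forget approach is shown below. The class name, the counter and the one-minute wait are assumptions for the example, and the task body is only a placeholder for the real fetch/parse/store work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FireAndForgetDemo {
    // Submits one Runnable per URL and waits for all of them to finish,
    // without collecting a Future per task.
    static int parseAll(List<String> urls) {
        AtomicInteger parsed = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(12);
        for (String url : urls) {
            executor.execute(() -> {
                // Placeholder for the real work on this url: fetch, parse, store.
                parsed.incrementAndGet();
            });
        }
        executor.shutdown(); // no new tasks; already queued tasks still run
        try {
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            executor.shutdownNow();
        }
        return parsed.get();
    }

    public static void main(String[] args) {
        System.out.println(parseAll(Arrays.asList("url-1", "url-2", "url-3")));
    }
}
```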
As I understand it, if you have 40 pages, each with ~300 URLs, you will create ~12,000 Callables? While that is probably not too many Callables, it is a lot of HTTP connections and database connections.
I think you should try using one Callable per page. You'll still gain a ton by running them in parallel. I don't know what you are using for the HTTP request, but you might be able to reuse system resources there instead of opening and closing 12,000 of them.
And especially for the DB. You'll have just 40 connections. You might even be able to be super efficient by collecting the ~300 records locally, then using a batch update.
The only model that I can come up with for running multiple similar processes (SIMD) using
Java Futures (java.util.concurrent.Future<T>) is as follows:
class Job implements Callable<T> {
public T call() {
// ...
}
}
List<Job> jobs = // ...
List<Future<T>> futures = executorService.invokeAll(jobs); // executorService: an ExecutorService instance
for (Future<T> future : futures) {
T t = future.get();
// Do something with t ...
}
The problem with this model is that if job 0 takes a long time to complete, but jobs 1, 2, and 3 have already completed, the for loop will wait to get the return value from job 0.
Is there any model that allows me to get each Future result as it becomes available without just calling Future.isDone() and busy waiting (or calling Thread.sleep()) if none are ready yet?
You can try out the ExecutorCompletionService:
http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ExecutorCompletionService.html
You would simply submit your tasks and call take until you've received all Futures.
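A minimal sketch of that submit-and-take loop follows. The class name and job labels are made up for the example, and the latch exists only to make the ordering reproducible: the slow job cannot finish until the fast job's result has been taken.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionOrderDemo {
    // Returns the job results in the order in which the jobs completed.
    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletionService<String> completed = new ExecutorCompletionService<>(pool);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Job 0 is submitted first but blocks until it is released.
            completed.submit(() -> { release.await(); return "job 0 (slow)"; });
            completed.submit(() -> "job 1 (fast)");

            List<String> order = new ArrayList<>();
            // take() blocks until some job has finished, whichever it is.
            order.add(completed.take().get());
            release.countDown(); // now let the slow job finish
            order.add(completed.take().get());
            return order;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each take() hands back a Future that is already done, so the get() that follows does not block.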
Consider using ListenableFuture from Guava. They let you basically add a continuation to execute when the future has completed.
Why don't you add what you want done to the job?
class Job implements Runnable {
public void run() {
// ...
T result = ....
// do something with the result.
}
}
That way it will process the result as soon as it is available, concurrently. ;)
A CompletionService can be polled for available results.
If all you want is the results as they become available however, we wrote an AsyncCompleter that abstracts away the detail of completion service usage. It lets you submit an Iterable<Callable<T>> of jobs and returns an Iterable<T> of results that blocks on next() and returns the results in completion order.