I'm parallelizing a fairly complex program to make it faster. For most of it I use an ExecutorService. Until now this worked well, but then I noticed that a single line of code makes my program run half as fast as it could: the line with exactScore.get().
I don't know why, but it sometimes needs more than 0.1 s just to get the double value out of the Future object.
Why is this? How can I make it run faster? Is there a way to write directly into the Double[] while multithreading?
Thanks
int processors = Runtime.getRuntime().availableProcessors();
ExecutorService service = Executors.newFixedThreadPool(processors);
// initialize output
Double[] presortedExScores = new Double[sortedHeuScores.length];
for (int i = 0; i < sortedHeuScores.length; i++) {
    final int index = i;
    final Collection<MolecularFormula> formulas_for_exact_method = multimap.get(sortedHeuScores[i]);
    for (final MolecularFormula formula : formulas_for_exact_method) {
        Future<Double> exactScore = service.submit(new Callable<Double>() {
            @Override
            public Double call() throws Exception {
                return getScore(computeTreeExactly(computeGraph(formula)));
            }
        });
        presortedExScores[index] = exactScore.get();
    }
}
That is to be expected. It isn't "slower" then; it is just doing its job.
From the javadoc for get():
Waits if necessary for the computation to complete, and then retrieves its result.
Long story short: it seems that you do not understand the concepts you are using in your code. The idea of a Future is that it does things at some point in the future.
And by calling get() you express: I don't mind waiting now until the results of that computation "behind" that Future become available.
Thus: you have to step back and look into your code again; to understand how your different "threads of activity" really work; and how/when they come back together.
One idea that comes to mind: right now you are creating your Future objects in a loop, and directly after creating each Future you call get() on it. That completely contradicts the idea of creating multiple Futures. In other words, instead of going:
foreach X
    create future X.i
    wait/get future X.i
you could do something like
foreach X
    create future X.i
foreach X
    wait/get future X.i
In other words: allow your futures to really do things in parallel; instead of enforcing sequential processing.
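The two-loop pseudocode above could look roughly like this in Java. This is only a sketch: expensiveScore is a hypothetical stand-in for the getScore(computeTreeExactly(computeGraph(formula))) call from the question, and the int inputs replace the formulas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubmitThenCollect {

    // Hypothetical stand-in for the expensive per-formula computation.
    static double expensiveScore(int input) throws InterruptedException {
        Thread.sleep(50); // simulate work
        return input * 2.0;
    }

    static double[] scoreAll(int[] inputs) throws InterruptedException, ExecutionException {
        ExecutorService service =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            // Phase 1: submit everything; nothing blocks here.
            List<Future<Double>> futures = new ArrayList<>();
            for (final int input : inputs) {
                Callable<Double> task = () -> expensiveScore(input);
                futures.add(service.submit(task));
            }
            // Phase 2: collect the results in submission order.
            double[] results = new double[inputs.length];
            for (int i = 0; i < results.length; i++) {
                results[i] = futures.get(i).get(); // waits only if task i is still running
            }
            return results;
        } finally {
            service.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(scoreAll(new int[] {1, 2, 3, 4})));
    }
}
```

Each get() still blocks, but by the time the second loop runs, all tasks have already been handed to the pool, so the waiting overlaps with the work of the other tasks.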
If that doesn't help "enough", then as said: you have to look at your overall design, and determine if there are ways to further "pull apart" things. Right now all activity happens "closely" together; and surprise: when you do a lot of work at the same time, that takes time. But as you might guess: such a re-design could be a lot of work; and is close to impossible without knowing more about your problem/code base.
A more sophisticated approach would be to write code where each Future has a way of expressing "I am done"; then you would only start all the Futures and wait until the last one comes back. But as said, I can't design a full solution for you here.
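One way to get the "start everything, then wait until the last one comes back" behaviour is CompletableFuture.allOf (Java 8+). The following is a minimal sketch, not a full design; expensiveScore and the pool size of 4 are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class WaitForAll {

    // Hypothetical stand-in for one unit of expensive work.
    static double expensiveScore(int input) {
        return input * 2.0;
    }

    static List<Double> scoreAll(List<Integer> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // size chosen arbitrarily
        try {
            List<CompletableFuture<Double>> futures = new ArrayList<>();
            for (Integer input : inputs) {
                futures.add(CompletableFuture.supplyAsync(() -> expensiveScore(input), pool));
            }
            // allOf completes when the last future completes; join() waits once for all of them.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            List<Double> results = new ArrayList<>();
            for (CompletableFuture<Double> future : futures) {
                results.add(future.join()); // already complete, returns immediately
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(scoreAll(Arrays.asList(1, 2, 3)));
    }
}
```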
The other really important take-away here: don't just blindly use some code that "happens" to work. One essence of programming is to understand each and any concept used in your source code. You should have a pretty good idea what your code is doing before running it and finding "oh, that get() makes things slow".
I have two snippets of code which are technically the same, but the second one takes 1 second longer than the first. The first one executes in 6 seconds and the second in 7.
Double yearlyEarnings = employmentService.getYearlyEarningForUserWithEmployer(userId, emp.getId());
CompletableFuture<Double> earlyEarningsInHomeCountryCF = currencyConvCF.thenApplyAsync(currencyConv -> {
return currencyConv * yearlyEarnings;
});
The above one takes 6s and the next takes 7s
Here is the link to code
CompletableFuture<Double> earlyEarningsInHomeCountryCF = currencyConvCF.thenApplyAsync(currencyConv -> {
Double yearlyEarnings = employmentService.getYearlyEarningForUserWithEmployer(userId, emp.getId());
return currencyConv * yearlyEarnings;
});
Please explain why the second code consistently takes 1s more (extra time) as compared to the first one
Below is the signature of the method getYearlyEarningForUserWithEmployer. Just sharing, but it should not have any effect.
Double getYearlyEarningForUserWithEmployer(long userId, long employerId);
Your question is horribly incomplete, but from what we can guess, it’s entirely plausible that the second variant takes longer, if we assume that currencyConvCF represents an asynchronous operation which might be running concurrently while your code fragments are executed and you’re talking about the overall time it takes to complete all operations, including the one represented by the CompletableFuture returned by thenApplyAsync (earlyEarningsInHomeCountryCF).
In the first variant you are invoking getYearlyEarningForUserWithEmployer while the operation represented by currencyConvCF might be still running concurrently. The multiplication will happen when both operations completed.
In the second variant, the getYearlyEarningForUserWithEmployer invocation is part of the operation passed to currencyConvCF.thenApplyAsync, thus it will not start before the operation represented by currencyConvCF has been completed, so no operation will run concurrently. If we assume that getYearlyEarningForUserWithEmployer takes a significant time, say one second, and has no internal dependencies to the other operation, it’s not surprising when the overall operation takes longer in that variant.
It seems, what you actually want to do is something like:
CompletableFuture<Double> earlyEarningsInHomeCountryCF = currencyConvCF.thenCombineAsync(
CompletableFuture.supplyAsync(
() -> employmentService.getYearlyEarningForUserWithEmployer(userId, emp.getId())),
(currencyConv, yearlyEarnings) -> currencyConv * yearlyEarnings);
so getYearlyEarningForUserWithEmployer is not executed sequentially in the initiating thread but both source operations can run asynchronously before the final multiplication applies.
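To see the difference between the chained and the combined variant in isolation, here is a self-contained sketch. slowRate and slowEarnings are hypothetical stand-ins for the currency conversion and for getYearlyEarningForUserWithEmployer; the 200 ms delays are made up.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

class CombineDemo {

    // Hypothetical slow lookups; both are independent of each other.
    static double slowRate() { pause(200); return 1.5; }
    static double slowEarnings() { pause(200); return 1000.0; }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Second variant of the question: the earnings lookup starts only after the rate is available.
    static CompletableFuture<Double> chained() {
        return CompletableFuture.supplyAsync(CombineDemo::slowRate)
                .thenApplyAsync(rate -> rate * slowEarnings());
    }

    // Both lookups start right away; the multiplication runs once both have completed.
    static CompletableFuture<Double> combined() {
        return CompletableFuture.supplyAsync(CombineDemo::slowRate)
                .thenCombineAsync(CompletableFuture.supplyAsync(CombineDemo::slowEarnings),
                        (rate, earnings) -> rate * earnings);
    }

    static long elapsedMillis(Supplier<CompletableFuture<Double>> variant) {
        long start = System.nanoTime();
        variant.get().join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("chained:  " + elapsedMillis(CombineDemo::chained) + " ms");
        System.out.println("combined: " + elapsedMillis(CombineDemo::combined) + " ms");
    }
}
```

Both variants compute the same value; only the point at which the second lookup is allowed to start differs.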
However, when you are invoking get right afterwards in the initiating thread, like in your linked code on github, that asynchronous processing of the second operation has no benefit. Instead of waiting for the completion, your initiating thread can just perform the independent operation as the second code variant of your question already does and you will likely be even faster when not spawning an asynchronous operation for something as simple as a single multiplication, i.e. use instead:
CompletableFuture<Double> currencyConvCF = /* a true asynchronous operation */;
return employmentService.getYearlyEarningForUserWithEmployer(userId, emp.getId())
    * currencyConvCF.join();
Whatever Holger said does make sense, but not for the problem I posted. I do agree that the question is not written in the best way.
The problem was that the order in which the futures were written caused a consistent increase in time.
Ideally, the order of the futures should not matter as long as the code is written in a correct reactive fashion.
The cause of the problem was Java's default ForkJoinPool, which Java uses by default to run all CompletableFutures. If I run all the CompletableFutures with a custom pool, I get almost the same time, irrespective of the order in which the future statements were written.
I still need to find out what the limitations of ForkJoinPool are and why my custom pool of 20 threads performs better.
I'll update my answer when I find the right reason.
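For reference, running the stages on a custom pool means passing an Executor as the extra argument to the *Async methods. The sketch below uses made-up constant values in place of the real lookups. Whether pool size is the actual cause is not confirmed here; one plausible factor is that the common ForkJoinPool defaults to a parallelism of (available processors - 1), so blocking calls can occupy all of its workers.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CustomPoolDemo {

    // Every stage is given the pool explicitly, so nothing runs on ForkJoinPool.commonPool().
    static double combineOnPool(ExecutorService pool) {
        CompletableFuture<Double> rate =
                CompletableFuture.supplyAsync(() -> 1.5, pool);     // hypothetical conversion rate
        CompletableFuture<Double> earnings =
                CompletableFuture.supplyAsync(() -> 1000.0, pool);  // hypothetical yearly earnings
        return rate.thenCombineAsync(earnings, (r, e) -> r * e, pool).join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(20); // pool size taken from the post above
        try {
            System.out.println(combineOnPool(pool));
        } finally {
            pool.shutdown();
        }
    }
}
```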
I have an ExecutorService that I use to multithread the processing of some text, and the unit of work to be applied to a given chunk of text is defined as my ParserCallable, which returns a PayLoad.
So I have a
List<Future<PayLoad>> list;
which is populated by handing off chunks of work to each ParserCallable.
I want to debug ParserCallable in this multithreaded environment, and I don't care which instance, but I'm not sure how I can step into the execution of any of the ParserCallables from my code.
My code looks something like
List<Future<PayLoad>> list = new ArrayList<Future<PayLoad>>();
ExecutorService executor = Executors.newFixedThreadPool(25);
for (int i = 0; i < blocks_of_work.size(); i++) {
    Callable<PayLoad> worker = new ParserCallable(blocks_of_work.get(i));
    Future<PayLoad> submit = executor.submit(worker);
    list.add(submit);
}
How I can debug any given ParserCallable in order to troubleshoot some errors I'm getting? Based on my code I'm not sure how to step into one of these callables.
You need to put a break point in the callable itself.
From a computer science theoretic standpoint, it's possible to enable smooth debugging (after all why does your language care whether the operation was sync or async?) but I don't believe this is supported in any mainstream language, like Java, yet.
I essentially have a Future<List<T>> that is fetched in batches from the server. For some clients I'd like to provide incremental results while it loads, in addition to the whole collection when the future is fulfilled.
Is there a common Future extension defined somewhere for this? What typical patterns/combinators exist for such futures?
I assume that, given an IncrementalListFuture<T>, I can easily define a map operation. What else comes to mind?
Is there a common Future extension defined somewhere for this?
I assume you are talking about incremental results from an ExecutorService. You should consider using an ExecutorCompletionService which allows you to be informed as soon as one of the Future objects is get-able.
To quote from the javadocs:
CompletionService<Result> ecs = new ExecutorCompletionService<Result>(e);
for (Callable<Result> s : solvers) {
    ecs.submit(s);
}
int n = solvers.size();
for (int i = 0; i < n; ++i) {
    // this waits for one of the futures to finish and provide a result
    Future<Result> future = ecs.take();
    Result result = future.get();
    if (result != null) {
        // do something with the result
    }
}
Sorry. I initially misread the question and thought that you were asking about a List<Future<?>>. It may be that you could refactor your code to actually return a number of Futures so I'll leave this for posterity.
I would not pass back the list in this case in a Future. You aren't going to be able to get the return until the job finishes.
If possible, I would pass in some sort of BlockingQueue so both the caller and the thread can access it:
final BlockingQueue<T> queue = new LinkedBlockingQueue<T>();
// build out job with the queue
threadPool.submit(new SomeJob(queue));
threadPool.shutdown();
// now we can consume from the queue as it is built:
while (true) {
    T result = queue.take();
    // you could use some constant result object to mean that the job finished
    if (result == SOME_END_OBJECT) {
        break;
    }
    // provide intermediate results
}
You could also have some sort of SomeJob.take() method which calls through to a BlockingQueue defined inside of your job class.
// the blocking queue in this case is hidden inside your job object
T result = someJob.take();
...
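For completeness, here is a runnable sketch of the BlockingQueue hand-off described above, with the producing job included. The producer, the item names and the END sentinel are all made up for illustration; a real job would put the rows of each fetched batch on the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

class IncrementalQueueDemo {

    // Sentinel marking the end of the stream; a distinct instance, compared by identity.
    static final String END = new String("END");

    // Hypothetical job: publishes each item as soon as it is "fetched".
    static Runnable producer(final BlockingQueue<String> queue, final int items) {
        return () -> {
            try {
                for (int i = 0; i < items; i++) {
                    queue.put("item-" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                // Always signal completion so the consumer cannot wait forever.
                // offer() never blocks and always succeeds on the unbounded queue used here.
                queue.offer(END);
            }
        };
    }

    static List<String> consume(int items) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.submit(producer(queue, items));
        pool.shutdown(); // accept no further tasks; the submitted one still runs
        List<String> seen = new ArrayList<>();
        while (true) {
            String item = queue.take(); // blocks until the next item or the sentinel arrives
            if (item == END) {
                break;
            }
            seen.add(item); // an intermediate result is available here
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(consume(3));
    }
}
```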
Here's what I would do:
In the thread that populates the List, make it thread-safe by wrapping the list using Collections.synchronizedList.
Make the list publicly available, but not modifiable, by adding a public method to the thread which returns the list wrapped by Collections.unmodifiableList.
Instead of giving clients a Future<List<T>>, give them a handle to the thread, or some kind of wrapper of it, so that they can call the public method above.
Alternatively, as Gray has suggested, BlockingQueues are great for thread coordination like this. This may require more changes to your client code, however.
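A minimal sketch of the synchronizedList / unmodifiableList idea from this answer. BatchLoader and partialResults are hypothetical names, and the "fetching" is faked by adding integers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BatchLoader implements Runnable {

    // Thread-safe list written by the loader, plus a read-only live view for clients.
    private final List<Integer> items = Collections.synchronizedList(new ArrayList<Integer>());
    private final List<Integer> readOnly = Collections.unmodifiableList(items);
    private final int batches;

    BatchLoader(int batches) {
        this.batches = batches;
    }

    @Override
    public void run() {
        for (int b = 0; b < batches; b++) {
            items.add(b); // hypothetical: one fetched item per batch
        }
    }

    // Clients see items as they arrive but cannot modify the list.
    List<Integer> partialResults() {
        return readOnly;
    }

    public static void main(String[] args) throws InterruptedException {
        BatchLoader loader = new BatchLoader(3);
        Thread thread = new Thread(loader);
        thread.start();
        thread.join();
        System.out.println(loader.partialResults());
    }
}
```

One caveat: iterating over a synchronizedList while another thread appends to it is not safe without synchronizing on that list, so clients are better off using size() and get(i), or taking a copy of the view, rather than iterating over the live view.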
To answer my own question: there has been lots of development in this area recently. Among the most used are Play iteratees (http://www.playframework.org/documentation/2.0/Iteratees) and Rx for .NET (http://msdn.microsoft.com/en-us/data/gg577609.aspx).
Instead of Future they define something like:
interface Observable<T> {
    Disposable subscribe(Observer<T> observer);
}
interface Observer<T> {
    void onCompleted();
    void onError(Exception error);
    void onNext(T value);
}
and lots of combinators.
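To make the Observable/Observer shape a bit more concrete for the batch-loading case, here is a toy Java sketch built on the two interfaces above. Everything else (Disposable as a one-method interface, fromBatches, Collector) is hypothetical, and delivery is synchronous for brevity; this is not how Rx itself is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BatchObservableDemo {

    // Hypothetical source: pushes every element of every batch, then signals completion.
    // A real loader would push from its fetch thread as each batch arrives.
    static Observable<Integer> fromBatches(final List<List<Integer>> batches) {
        return observer -> {
            try {
                for (List<Integer> batch : batches) {
                    for (Integer value : batch) {
                        observer.onNext(value); // incremental result
                    }
                }
                observer.onCompleted(); // the whole collection has been delivered
            } catch (Exception e) {
                observer.onError(e);
            }
            return () -> { }; // nothing left to cancel once delivery has finished
        };
    }

    // Observer that records what it has seen so far.
    static final class Collector implements Observer<Integer> {
        final List<Integer> values = new ArrayList<>();
        boolean completed;
        Exception error;

        @Override public void onNext(Integer value) { values.add(value); }
        @Override public void onCompleted() { completed = true; }
        @Override public void onError(Exception e) { error = e; }
    }

    public static void main(String[] args) {
        Collector collector = new Collector();
        fromBatches(Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4))).subscribe(collector);
        System.out.println(collector.values + " completed=" + collector.completed);
    }
}

interface Disposable {
    void dispose();
}

interface Observable<T> {
    Disposable subscribe(Observer<T> observer);
}

interface Observer<T> {
    void onCompleted();
    void onError(Exception error);
    void onNext(T value);
}
```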
Alternatively to Observables you can take a look at twitter's approach.
They use Spool, which is an asynchronous version of the Stream.
Basically it is a simple trait similar to the List
trait Spool[+A] {
  def head: A
  /**
   * The (deferred) tail of the spool. Invalid for empty spools.
   */
  def tail: Future[Spool[A]]
}
that allows you to do functional stuff like map, filter and foreach on top of it.
Future is really designed to return a single (atomic) result, not for communicating intermediate results in this manner. What you will really want to do is to use multiple futures, one per batch.
We have a similar requirement where we have a bunch of things that we need to get from different remote servers, and each will return at a different time. We don't want to wait until the last one has returned, but rather process them in the order they return. For this we created the AsyncCompleter, which takes an Iterable<Callable<T>> and returns an Iterable<T> that blocks on iteration, completely abstracting usage of the Future interface.
If you look at how that class is implemented, you'll see how to use a CompletionService to receive results from an Executor in the order in which they become available, if you need to build this for yourself.
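If you'd rather build it yourself, the core of the idea looks roughly like this. This is a sketch using ExecutorCompletionService directly, not the AsyncCompleter source; the delayed helper and the task names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionOrderDemo {

    // Hypothetical task that "fetches" a named result after the given delay.
    static Callable<String> delayed(final String name, final long millis) {
        return () -> {
            Thread.sleep(millis);
            return name;
        };
    }

    // Returns the results in the order the tasks finish, not the order they were submitted.
    // Assumes a non-empty task list (a fixed pool needs at least one thread).
    static List<String> inCompletionOrder(List<Callable<String>> tasks)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        try {
            CompletionService<String> ecs = new ExecutorCompletionService<>(pool);
            for (Callable<String> task : tasks) {
                ecs.submit(task);
            }
            List<String> results = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                results.add(ecs.take().get()); // blocks until the next task has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(inCompletionOrder(
                Arrays.asList(delayed("slow", 500), delayed("fast", 10))));
    }
}
```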
edit: just saw that the second half of Gray's answer is similar, basically using an ExecutorCompletionService
Is it clearer to sleep near a function call in a loop or in the function call itself? Personally I lean toward sleeping near the call rather than in the call, because there is nothing about "getApple()" that implies that it should sleep for some amount of time before returning the apple. I think it would be clearer to have:
for ( int i = 0; i < 10; ++i ) {
    getApple();
    sleep();
}
than...
for ( int i = 0; i < 10; ++i ) {
    getApple();
}
Apple getApple() {
    sleep(1);
    return new Apple();
}
Of course, this would be different if the method were getAppleSlowly() or something.
Please let me know what you think.
Some additional information (also in the comments below):
The wait is not required to get an apple. The wait is to avoid a rate limit on queries per minute to an API, but it is not necessary to sleep if you're only getting one apple. The sleep-in-get way makes it impossible to get an apple without sleeping, even if it is unnecessary. However, it has the benefit of making sure that no matter what, methods can call it without worrying about going over the rate limit. But that seems like an argument for renaming to getAppleSlowly().
The ideal is for one method to do one thing. Since getting apples and sleeping are two different things, I agree with you that it'd be better to have them be two different methods.
I would not put the sleep into the function itself as long as the function name does not suggest that there will be a delay.
There might be other callers to the function in the future who do not expect it to sleep.
I think this is a fair question, and it's very bad to put a sleep inside of a method where you might not know it's there (imagine trying to debug the slowness of your application in a few months when you have forgotten what you have done). The sleep should be only where you understand why it's sleeping (and presumably you have a good reason for that).
This is a very interesting question and I think that both of your solutions are somewhat flawed. The "getApple(); sleep();" solution suffers from forcing each getApple() to pause before processing even if we will never again do a getApple(). The "sleep(); return new Apple();" solution has a similar overhead on the first Apple we get. The optimal solution is something like:
for ( int i = 0; i < 10; ++i ) {
    getApple();
}
Apple getApple() {
    long sleepTime = needToSleep();
    if ( sleepTime > 0 ) {
        sleep(sleepTime);
    }
    return new Apple();
}
/**
 * Checks if last query was made less than THRESHOLD ago and
 * returns the difference in millis that we need to sleep.
 */
long needToSleep() {
    return ( lastQueryInMillis + THRESHOLD ) - System.currentTimeMillis();
}
I would be inclined to get the whole "how long do I sleep to avoid the API throttle" thing behind some interface and make some other class wholly responsible for enforcing it.
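Something along these lines, as a rough sketch. Throttle, MinIntervalThrottle and AppleClient are made-up names, and the String return value stands in for the real API query.

```java
import java.util.concurrent.TimeUnit;

class AppleClient {

    private final Throttle throttle;

    AppleClient(Throttle throttle) {
        this.throttle = throttle;
    }

    String getApple() throws InterruptedException {
        throttle.acquire(); // returns immediately unless the previous query was too recent
        return "apple";     // hypothetical stand-in for the real API query
    }

    public static void main(String[] args) throws InterruptedException {
        AppleClient client = new AppleClient(new MinIntervalThrottle(100));
        for (int i = 0; i < 3; i++) {
            System.out.println(client.getApple());
        }
    }
}

// The only thing callers need to know about rate limiting.
interface Throttle {
    void acquire() throws InterruptedException;
}

// Enforces a minimum interval between two consecutive acquire() calls.
class MinIntervalThrottle implements Throttle {

    private final long minIntervalNanos;
    private long lastCallNanos;
    private boolean calledBefore;

    MinIntervalThrottle(long minIntervalMillis) {
        this.minIntervalNanos = TimeUnit.MILLISECONDS.toNanos(minIntervalMillis);
    }

    @Override
    public synchronized void acquire() throws InterruptedException {
        if (calledBefore) {
            long waitNanos = minIntervalNanos - (System.nanoTime() - lastCallNanos);
            if (waitNanos > 0) {
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            }
        }
        lastCallNanos = System.nanoTime();
        calledBefore = true;
    }
}
```

This way getApple() does not sleep on the first call, and a caller that has its own way of staying under the limit can pass in a Throttle that does nothing.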
I am doing some Java performance comparison between my classes, and wondering if there is some sort of Java Performance Framework to make writing performance measurement code easier?
I.e., what I am doing now is trying to measure the effect of making a method "synchronized", as in PseudoRandomUsingSynch.nextInt(), compared to using an AtomicInteger as my "synchronizer".
So I am trying to measure how long it takes to generate random integers using 3 threads accessing a synchronized method looping for say 10000 times.
I am sure there is a much better way doing this. Can you please enlighten me? :)
public static void main( String [] args ) throws InterruptedException, ExecutionException {
    PseudoRandomUsingSynch rand1 = new PseudoRandomUsingSynch((int)System.currentTimeMillis());
    int n = 3;
    ExecutorService execService = Executors.newFixedThreadPool(n);
    long timeBefore = System.currentTimeMillis();
    for(int idx=0; idx<100000; ++idx) {
        Future<Integer> future = execService.submit(rand1);
        Future<Integer> future1 = execService.submit(rand1);
        Future<Integer> future2 = execService.submit(rand1);
        int random1 = future.get();
        int random2 = future1.get();
        int random3 = future2.get();
    }
    long timeAfter = System.currentTimeMillis();
    long elapsed = timeAfter - timeBefore;
    out.println("elapsed:" + elapsed);
}
the class
public class PseudoRandomUsingSynch implements Callable<Integer> {
    private int seed;
    public PseudoRandomUsingSynch(int s) { seed = s; }
    public synchronized int nextInt(int n) {
        byte [] s = DonsUtil.intToByteArray(seed);
        SecureRandom secureRandom = new SecureRandom(s);
        return ( secureRandom.nextInt() % n );
    }
    @Override
    public Integer call() throws Exception {
        return nextInt((int)System.currentTimeMillis());
    }
}
Regards
Ignoring the question of whether a microbenchmark is useful in your case (Stephen C's points are very valid), I would point out:
Firstly, don't listen to people who say 'it's not that hard'. Yes, microbenchmarking on a virtual machine with JIT compilation is difficult. It's actually really difficult to get meaningful and useful figures out of a microbenchmark, and anyone who claims it's not hard is either a supergenius or doing it wrong. :)
Secondly, yes, there are a few such frameworks around. One worth looking at (though it's in a very early pre-release stage) is Caliper, by Kevin Bourrillion and Jesse Wilson of Google. It looks really impressive from a few early looks at it.
More micro-benchmarking advice - micro benchmarks rarely tell you what you really need to know ... which is how fast a real application is going to run.
In your case, I imagine you are trying to figure out if your application will perform better using an Atomic object than using synchronized ... or vice versa. And the real answer is that it most likely depends on factors that a micro-benchmark cannot measure. Things like the probability of contention, how long locks are held, the number of threads and processors, and the amount of extra algorithmic work needed to make atomic update a viable solution.
EDIT - in response to this question.
so is there a way i can measure all these probability of contention, locks held duration, etc ?
In theory yes. Once you have implemented the entire application, it is possible to instrument it to measure these things. But that doesn't give you your answer either, because there isn't a predictive model you can plug these numbers into to give the answer. And besides, you've already implemented the application by then.
But my point was not that measuring these factors allows you to predict performance. (It doesn't!) Rather, it was that a micro-benchmark does not allow you to predict performance either.
In reality, the best approach is to implement the application according to your intuition, and then use profiling as the basis for figuring out where the real performance problems are.
OpenJDK guys have developed a benchmarking tool called JMH:
http://openjdk.java.net/projects/code-tools/jmh/
This provides a framework that is quite easy to set up, and there are a couple of samples showing how to use it.
http://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/
Nothing can prevent you from writing the benchmark wrong, but they did a great job at eliminating the non-obvious mistakes (such as false sharing between threads, preventing dead code elimination etc).
These guys designed a good JVM measurement methodology so you won't fool yourself with bogus numbers, and then published it as a Python script so you can re-use their smarts -
Statistically Rigorous Java Performance Evaluation (pdf paper)
You probably want to move the loop into the task. As it is you just start all the threads and almost immediately you're back to single threaded.
Usual microbenchmarking advice: Allow for some warm up. As well as average, deviation is interesting. Use System.nanoTime instead of System.currentTimeMillis.
Specific to this problem is how much the threads fight. With a large number of contending threads, cas loops can perform wasted work. Creating a SecureRandom is probably expensive, and so might System.currentTimeMillis to a lesser extent. I believe SecureRandom should already be thread safe, if used correctly.
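To illustrate these points (loop inside the task, warm-up, System.nanoTime), here is a rough hand-rolled sketch. It deliberately swaps the SecureRandom-based generator from the question for a plain counter, so that it measures only synchronized versus AtomicInteger; the class names are made up, and a harness such as JMH would still give more trustworthy numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class CounterBenchmark {

    interface Counter {
        int next();
    }

    static final class SyncCounter implements Counter {
        private int value;
        @Override public synchronized int next() { return ++value; }
    }

    static final class AtomicCounter implements Counter {
        private final AtomicInteger value = new AtomicInteger();
        @Override public int next() { return value.incrementAndGet(); }
    }

    // Runs `threads` tasks that each call next() `iterations` times; returns elapsed nanoseconds.
    // The measured time includes starting the pool's worker threads.
    static long run(final Counter counter, int threads, final int iterations) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> {
                    int last = 0;
                    for (int i = 0; i < iterations; i++) {
                        last = counter.next(); // the loop lives inside the task
                    }
                    return last; // returned so the loop is not trivially dead code
                });
            }
            long start = System.nanoTime();
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                future.get();
            }
            return System.nanoTime() - start;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int threads = 3;
        int iterations = 100_000;
        // Warm-up runs give the JIT a chance to compile the hot loop before measuring.
        for (int i = 0; i < 5; i++) {
            run(new SyncCounter(), threads, iterations);
            run(new AtomicCounter(), threads, iterations);
        }
        System.out.println("synchronized: " + run(new SyncCounter(), threads, iterations) / 1_000 + " us");
        System.out.println("atomic:       " + run(new AtomicCounter(), threads, iterations) / 1_000 + " us");
    }
}
```

A single measurement per variant is still noisy; repeating the measured runs and looking at the mean and deviation, as suggested above, would be the next step.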
In short, you are thus searching for a "Java unit performance testing tool"?
Use JUnitPerf.
Update: in case it's not clear yet: it also supports concurrent (multithreading) testing. Here's an extract of the chapter "LoadTest" of the aforementioned link which includes a code sample:
For example, to create a load test of 10 concurrent users with each user running the ExampleTestCase.testOneSecondResponse() method for 20 iterations, and with a 1 second delay between the addition of users, use:
int users = 10;
int iterations = 20;
Timer timer = new ConstantTimer(1000);
Test testCase = new ExampleTestCase("testOneSecondResponse");
Test repeatedTest = new RepeatedTest(testCase, iterations);
Test loadTest = new LoadTest(repeatedTest, users, timer);