Java ExecutorService to solve Recursive Fibonacci Series

Java ExecutorService to solve Recursive Fibonacci Series - java

I need to find out the number based on some index in the Fibonacci Series recursively using threads and I tried the following code, but the program never ends. Please let me know if I am missing something.
Code:
import java.math.BigInteger;
import java.util.concurrent.*;
public class MultiThreadedFib {
private ExecutorService executorService;
public MultiThreadedFib(final int numberOfThreads) {
executorService = Executors.newFixedThreadPool(numberOfThreads);
}
public BigInteger getFibNumberAtIndex(final int index)
throws InterruptedException, ExecutionException {
Future<BigInteger> indexMinusOne = executorService.submit(
new Callable<BigInteger>() {
public BigInteger call()
throws InterruptedException, ExecutionException {
return getNumber(index - 1);
}
});
Future<BigInteger> indexMinusTwo = executorService.submit(
new Callable<BigInteger>() {
public BigInteger call()
throws InterruptedException, ExecutionException {
return getNumber(index - 2);
}
});
return indexMinusOne.get().add(indexMinusTwo.get());
}
public BigInteger getNumber(final int index)
throws InterruptedException, ExecutionException {
if (index == 0 || index == 1)
return BigInteger.valueOf(index);
return getFibNumberAtIndex(index - 1).add(getFibNumberAtIndex(index - 2));
}
}
Fixed it (Thanks to fiver)
Instead of calling getNumber(int) from the call method, I am calling to a dynamic programming algorithm that computes the number at that index.
The code for that is:
public class DynamicFib implements IFib {
private Map<Integer, BigInteger> memoize = new HashMap<Integer, BigInteger>();
public DynamicFib() {
memoize.put(0, BigInteger.ZERO);
memoize.put(1, BigInteger.ONE);
}
public BigInteger getFibNumberAtIndex(final int index) {
if (!memoize.containsKey(index))
memoize.put(index, getFibNumberAtIndex(index - 1).add(getFibNumberAtIndex(index - 2)));
return memoize.get(index);
}
}

This recursion will overflow the stack very fast. This is because you are computing lower fibonacci numbers over and over again - exponentially many number of times.
One effective way to avoid that is to use memoized recursion (a dynamic programming approach)
Basically use a static array to hold the already computed fibonacci numbers and whenever you need one, take it from the array, if it's already computed. If not, then compute it and store it in the array. This way each number will be computed only once.
(You can use other data structure instead of array, of course, i.e. hashtable)

What you are doing is replacing simple recursion with recursion via threads / tasks.
Until you get to the fib(0) and fib(1) cases, each task submits two more tasks, and then waits for them to complete. While it is waiting, it is still using a thread. Since the thread pool is bounded, you soon get to the point where calls to submit block ... and the whole computation locks up.
In addition to that, you've got a bug in indexMinusTwo which would result in the computation giving the wrong answer.
But still the recursive multithreaded procedure takes much longer than the memoized recursive non-multithreaded one.. any tip to improve performance?
Even assuming that you "fixed" the above problem (e.g. by using an unbounded thread pool) there is no way that you will be able to do a multi-threaded version of fibonacci that performs better than a single-threaded version that uses memoization. The computation is simply not suited to parallelization.

Threads work best when you have independant tasks to perform. The fibonacci series by definition does not have any degrees of parallelism. Each f(n) depends on the previous two values. As such it is not possible to calculate f(n) faster using multiple threads than using one (unless you have an inefficient algo)
The only thing you could make parallel potentially the + operation for large numbers, however this is likely to be a) complex b) difficult to make faster than the single threaded solution.
The fastest/simplest way to calculate fibonacci numbers is to use a loop in one thread. You don't need to use recusrion or memorize values.

Related

Massive tasks alternative pattern for Runnable or Callable

For massive parallel computing I tend to use executors and callables. When I have thousand of objects to be computed I feel not so good to instantiate thousand of Runnables for each object.
So I have two approaches to solve this:
I. Split the workload into a small amount of x-workers giving y-objects each. (splitting the object list into x-partitions with y/x-size each)
public static <V> List<List<V>> partitions(List<V> list, int chunks) {
final ArrayList<List<V>> lists = new ArrayList<List<V>>();
final int size = Math.max(1, list.size() / chunks + 1);
final int listSize = list.size();
for (int i = 0; i <= chunks; i++) {
final List<V> vs = list.subList(Math.min(listSize, i * size), Math.min(listSize, i * size + size));
if(vs.size() == 0) break;
lists.add(vs);
}
return lists;
}
II. Creating x-workers which fetch objects from a queue.
Questions:
Is creating thousand of Runnables really expensive and to be avoided?
Is there a generic pattern/recommendation how to do it by solution II?
Are you aware of a different approach?

Creating thousands of Runnable (objects implementing Runnable) is not more expensive than creating a normal object.
Creating and running thousands of Threads can be very heavy, but you can use Executors with a pool of threads to solve this problem.

As for the different approach, you might be interested in java 8's parallel streams.

Combining various answers here :
Is creating thousand of Runnables really expensive and to be avoided?
No, it's not in and of itself. It's how you will make them execute that may prove costly (spawning a few thousand threads certainly has its cost).
So you would not want to do this :
List<Computation> computations = ...
List<Thread> threads = new ArrayList<>();
for (Computation computation : computations) {
Thread thread = new Thread(new Computation(computation));
threads.add(thread);
thread.start();
}
// If you need to wait for completion:
for (Thread t : threads) {
t.join();
}
Because it would 1) be unnecessarily costly in terms of OS ressource (native threads, each having a stack on the heap), 2) spam the OS scheduler with a vastly concurrent workload, most certainly leading to plenty of context switchs and associated cache invalidations at the CPU level 3) be a nightmare to catch and deal with exceptions (your threads should probably define an Uncaught exception handler, and you'd have to deal with it manually).
You'd probably prefer an approach where a finite Thread pool (of a few threads, "a few" being closely related to your number of CPU cores) handles many many Callables.
List<Computation> computations = ...
ExecutorService pool = Executors.newFixedSizeThreadPool(someNumber)
List<Future<Result>> results = new ArrayList<>();
for (Computation computation : computations) {
results.add(pool.submit(new ComputationCallable(computation));
}
for (Future<Result> result : results {
doSomething(result.get);
}
The fact that you reuse a limited number threads should yield a really nice improvement.
Is there a generic pattern/recommendation how to do it by solution II?
There are. First, your partition code (getting from a List to a List<List>) can be found inside collection tools such as Guava, with more generic and fail-proofed implementations.
But more than this, two patterns come to mind for what you are achieving :
Use the Fork/Join Pool with Fork/Join tasks (that is, spawn a task with your whole list of items, and each task will fork sub tasks with half of that list, up to the point where each task manages a small enough list of items). It's divide and conquer. See: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ForkJoinTask.html
If your computation were to be "add integers from a list", it could look like (there might be a boundary bug in there, I did not really check) :
public static class Adder extends RecursiveTask<Integer> {
protected List<Integer> globalList;
protected int start;
protected int stop;
public Adder(List<Integer> globalList, int start, int stop) {
super();
this.globalList = globalList;
this.start = start;
this.stop = stop;
System.out.println("Creating for " + start + " => " + stop);
}
#Override
protected Integer compute() {
if (stop - start > 1000) {
// Too many arguments, we split the list
Adder subTask1 = new Adder(globalList, start, start + (stop-start)/2);
Adder subTask2 = new Adder(globalList, start + (stop-start)/2, stop);
subTask2.fork();
return subTask1.compute() + subTask2.join();
} else {
// Manageable size of arguments, we deal in place
int result = 0;
for(int i = start; i < stop; i++) {
result +=i;
}
return result;
}
}
}
public void doWork() throws Exception {
List<Integer> computation = new ArrayList<>();
for(int i = 0; i < 10000; i++) {
computation.add(i);
}
ForkJoinPool pool = new ForkJoinPool();
RecursiveTask<Integer> masterTask = new Adder(computation, 0, computation.size());
Future<Integer> future = pool.submit(masterTask);
System.out.println(future.get());
}
Use Java 8 parallel streams in order to launch multiple parallel computations easily (under the hood, Java parallel streams can fall back to the Fork/Join pool actually).
Others have shown how this might look like.
Are you aware of a different approach?
For a different take at concurrent programming (without explicit task / thread handling), have a look at the actor pattern. https://en.wikipedia.org/wiki/Actor_model
Akka comes to mind as a popular implementation of this pattern...

#Aaron is right, you should take a look into Java 8's parallel streams:
void processInParallel(List<V> list) {
list.parallelStream().forEach(item -> {
// do something
});
}
If you need to specify chunks, you could use a ForkJoinPool as described here:
void processInParallel(List<V> list, int chunks) {
ForkJoinPool forkJoinPool = new ForkJoinPool(chunks);
forkJoinPool.submit(() -> {
list.parallelStream().forEach(item -> {
// do something with each item
});
});
}
You could also have a functional interface as an argument:
void processInParallel(List<V> list, int chunks, Consumer<V> processor) {
ForkJoinPool forkJoinPool = new ForkJoinPool(chunks);
forkJoinPool.submit(() -> {
list.parallelStream().forEach(item -> processor.accept(item));
});
}
Or in shorthand notation:
void processInParallel(List<V> list, int chunks, Consumer<V> processor) {
new ForkJoinPool(chunks).submit(() -> list.parallelStream().forEach(processor::accept));
}
And then you would use it like:
processInParallel(myList, 2, item -> {
// do something with each item
});
Depending on your needs, the ForkJoinPool#submit() returns an instance of ForkJoinTask, which is a Future and you may use it to check for the status or wait for the end of your task.
You'd most probably want the ForkJoinPool instantiated only once (not instantiate it on every method call) and then reuse it to prevent CPU choking if the method is called multiple times.

Is creating thousand of Runnables really expensive and to be avoided?
Not at all, the runnable/callable interfaces have only one method to implement each, and the amount of "extra" code in each task depends on the code you are running. But certainly no fault of the Runnable/Callable interfaces.
Is there a generic pattern/recommendation how to do it by solution II?
Pattern 2 is more favorable than pattern 1. This is because pattern 1 assumes that each worker will finish at the exact same time. If some workers finish before other workers, they could just be sitting idle since they only are able to work on the y/x-size queues you assigned to each of them. In pattern 2 however, you will never have idle worker threads (unless the end of the work queue is reached and numWorkItems < numWorkers).
An easy way to use the preferred pattern, pattern 2, is to use the ExecutorService invokeAll(Collection<? extends Callable<T>> list) method.
Here is an example usage:
List<Callable<?>> workList = // a single list of all of your work
ExecutorService es = Executors.newCachedThreadPool();
es.invokeAll(workList);
Fairly readable and straightforward usage, and the ExecutorService implementation will automatically use solution 2 for you, so you know that each worker thread has their use time maximized.
Are you aware of a different approach?
Solution 1 and 2 are two common approaches for generic work. Now, there are many different implementation available for you choose from (such as java.util.Concurrent, Java 8 parallel streams, or Fork/Join pools), but the concept of each implementation is generally the same. The only exception is if you have specific tasks in mind with non-standard running behavior.

Java Concurrency in Practice : Listing 3.12 and 3.13

We have these objects:
// Listing 3.12
#Immutable
class OneValueCache {
private final BigInteger lastNumber;
private final BigInteger[] lastFactors;
public OneValueCache(BigInteger i, BigInteger[] factors) {
lastNumber = i;
lastFactors = Arrays.copyOf(factors, factors.length);
}
public BigInteger[] getFactors(BigInteger i) {
if (lastNumber == null || !lastNumber.equals(i)) {
return null;
} else {
return Arrays.copyOf(lastFactors, lastFactors.length);
}
}
}
// Listing 3.13
#ThreadSafe
public class VolatileCachedFactorizer implements Servlet {
private volatile OneValueCache cache =
new OneValueCache(null, null);
public void service(ServletRequest req, ServletResponse resp) {
BigInteger i = extractFromRequest(req);
BigInteger[] factors = cache.getFactors(i);
if (factors == null) {
factors = factor(i);
cache = new OneValueCache(i, factors);
}
encodeIntoResponse(resp, factors);
}
}
In the book it is said that VolatileCachedFactorizer is thread save. Why?
For example:
Thread A reads the cache.getFactors(i);
Then thread B reads the cache.getFactors(i);
Then thread B writes cache.
Then thread A writes cache.
Can this flow be considered thread save? What do I not understand?

Short answer:
Thread safety is not really an absolute. You have to determine the desired behavior, and then ask whether the implementation gives you that behavior that in the presence of multithreading.
Longer answer:
So, what's the desired behavior here? Is it just that the right answer is always given, or is it also that it's always implemented exactly once if two threads ask for it in a row?
If it's the latter — that is, if you really want to save every bit of CPU — then you're right, this isn't thread-safe. Two requests could come in at the same time (or close enough to it) to get the factors for the same number N, and if the timings worked out, both threads could end up calculating that number.
But with a single-value cache, you already have the problem of recalculating things you already knew. For instance, what if three requests come in, for N, K, and N again? The request for K would invalidate the cache at N, and so you'd have to recalculate it.
So, this cache is really optimized for "streaks" of the same value, and as such the cost of twice-calculating the first couple (or even few!) answers in that streak might be an acceptable cost: in return, you get code that's free of any blocking and pretty simple to understand.
What's crucial is that it never gives you the wrong answer. That is, if you ask for N and K at the same time, the response for K should never give you the answer for N. This implementation gets you that guarantee, so I would call it thread safe.

It's thread-safety. The ultimate goal is to ensure that the service method output is correct. The OneValueCache always keep the right relationship between lastNumber and lastFactors.

concurrent application not as fast as a singlethreaded

I've implemented a pipeline approach. I'm going to traverse a tree and I need certain values which aren't available beforehand... so I have to traverse the tree in parallel (or before) and once more for every node I want to save values (descendantCount for example).
As such I'm interating through the tree, then from the constructor I'm calling a method which invokes a new Thread started through an ExecutorService. The Callable which is submitted is:
#Override
public Void call() throws Exception {
// Get descendants for every node and save it to a list.
final ExecutorService executor =
Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
int index = 0;
final Map<Integer, Diff> diffs = mDiffDatabase.getMap();
final int depth = diffs.get(0).getDepth().getNewDepth();
try {
boolean first = true;
for (final AbsAxis axis = new DescendantAxis(mNewRtx, true); index < diffs.size()
&& ((diffs.get(index).getDiff() == EDiff.DELETED && depth < diffs.get(index).getDepth()
.getOldDepth()) || axis.hasNext());) {
if (axis.getTransaction().getNode().getKind() == ENodes.ROOT_KIND) {
axis.next();
} else {
if (index < diffs.size() && diffs.get(index).getDiff() != EDiff.DELETED) {
axis.next();
}
final Future<Integer> submittedDescendants =
executor.submit(new Descendants(mNewRtx.getRevisionNumber(), mOldRtx
.getRevisionNumber(), axis.getTransaction().getNode().getNodeKey(), mDb
.getSession(), index, diffs));
final Future<Modification> submittedModifications =
executor.submit(new Modifications(mNewRtx.getRevisionNumber(), mOldRtx
.getRevisionNumber(), axis.getTransaction().getNode().getNodeKey(), mDb
.getSession(), index, diffs));
if (first) {
first = false;
mMaxDescendantCount = submittedDescendants.get();
// submittedModifications.get();
}
mDescendantsQueue.put(submittedDescendants);
mModificationQueue.put(submittedModifications);
index++;
}
}
mNewRtx.close();
} catch (final AbsTTException e) {
LOGWRAPPER.error(e.getMessage(), e);
}
executor.shutdown();
return null;
}
Therefore for every node it's creating a new Callable which traverses the tree for every node and counts descendants and modifications (I'm actually fusing two tree-revisions together). Well, mDescendantsQueue and mModificationQueue are BlockingQueues. At first I've only had the descendantsQueue and traversed the tree once more to get modifications of every node (counting modifications made in the subtree of the current node). Then I thought why not do both in parallel and implement a pipelined approach. Sadly the performance seemed to have decreased everytime I've implemented another multithreaded "step".
Maybe because an XML-tree usually isn't that deep and the Concurrency-Overhead is too heavy :-/
At first I did everything sequential, which was the fastest:
- traversing the tree
- for every node traverse the descendants and compute descendantCount and modificationCount
After using a pipelined approach with BlockingQueues it seems the performance has decreased, but I haven't actually made any time measures and I would have to revert many changes to go back :( Maybe the performance increases with more CPUs, because I only have a Core2Duo for testing right now.
best regards,
Johannes

Probably this should help: Amadahl's law, what it basically says it that the increase in productivity depends (inversely proportional) to the percentage of the code which has to be processed by synchronization. Hence even by increasing by increasing more computing resources, it wont end up to the better result. Ideally if the ratio of ( the synchronized part to the total part) is low, then with (number of processors +1) should give the best output (unless you are using network or other I/O in which case you can increase the size of the pool).
So just follow it up from the above link and see if it helps

From your description it sounds like you're recursively creating threads, each of which processes one node and then spawns a new thread? Is this correct? If so, I'm not surprised that you're suffering from performance degradation.
A simple recursive descent method might actually be the best way to do this. I can't see how multithreading will gain you any advantages here.

ExecutorService slow multi thread performance

I am trying to execute a simple calculation (it calls Math.random() 10000000 times). Surprisingly running it in simple method performs much faster than using ExecutorService.
I have read another thread at ExecutorService's surprising performance break-even point --- rules of thumb? and tried to follow the answer by executing the Callable using batches, but the performance is still bad
How do I improve the performance based on my current code?
import java.util.*;
import java.util.concurrent.*;
public class MainTest {
public static void main(String[]args) throws Exception {
new MainTest().start();;
}
final List<Worker> workermulti = new ArrayList<Worker>();
final List<Worker> workersingle = new ArrayList<Worker>();
final int count=10000000;
public void start() throws Exception {
int n=2;
workersingle.add(new Worker(1));
for (int i=0;i<n;i++) {
// worker will only do count/n job
workermulti.add(new Worker(n));
}
ExecutorService serviceSingle = Executors.newSingleThreadExecutor();
ExecutorService serviceMulti = Executors.newFixedThreadPool(n);
long s,e;
int tests=10;
List<Long> simple = new ArrayList<Long>();
List<Long> single = new ArrayList<Long>();
List<Long> multi = new ArrayList<Long>();
for (int i=0;i<tests;i++) {
// simple
s = System.currentTimeMillis();
simple();
e = System.currentTimeMillis();
simple.add(e-s);
// single thread
s = System.currentTimeMillis();
serviceSingle.invokeAll(workersingle); // single thread
e = System.currentTimeMillis();
single.add(e-s);
// multi thread
s = System.currentTimeMillis();
serviceMulti.invokeAll(workermulti);
e = System.currentTimeMillis();
multi.add(e-s);
}
long avgSimple=sum(simple)/tests;
long avgSingle=sum(single)/tests;
long avgMulti=sum(multi)/tests;
System.out.println("Average simple: "+avgSimple+" ms");
System.out.println("Average single thread: "+avgSingle+" ms");
System.out.println("Average multi thread: "+avgMulti+" ms");
serviceSingle.shutdown();
serviceMulti.shutdown();
}
long sum(List<Long> list) {
long sum=0;
for (long l : list) {
sum+=l;
}
return sum;
}
private void simple() {
for (int i=0;i<count;i++){
Math.random();
}
}
class Worker implements Callable<Void> {
int n;
public Worker(int n) {
this.n=n;
}
#Override
public Void call() throws Exception {
// divide count with n to perform batch execution
for (int i=0;i<(count/n);i++) {
Math.random();
}
return null;
}
}
}
The output for this code
Average simple: 920 ms
Average single thread: 1034 ms
Average multi thread: 1393 ms
EDIT: performance suffer due to Math.random() being a synchronised method.. after changing Math.random() with new Random object for each thread, the performance improved
The output for the new code (after replacing Math.random() with Random for each thread)
Average simple: 928 ms
Average single thread: 1046 ms
Average multi thread: 642 ms

Math.random() is synchronized. Kind of the whole point of synchronized is to slow things down so they don't collide. Use something that isn't synchronized and/or give each thread its own object to work with, like a new Random.

You'd do well to read the contents of the other thread. There's plenty of good tips in there.
Perhaps the most significant issue with your benchmark is that according to the Math.random() contract, "This method is properly synchronized to allow correct use by more than one thread. However, if many threads need to generate pseudorandom numbers at a great rate, it may reduce contention for each thread to have its own pseudorandom-number generator"
Read this as: the method is synchronized, so only one thread is likely to be able to usefully use it at the same time. So you do a bunch of overhead to distribute the tasks, only to force them again to run serially.

When you use multiple threads, you need to be aware of the overhead of using additional threads. You also need to determine if your algorithm has work which can be preformed in parallel or not. So you need to have work which can be run concurrently which is large enough that it will exceed the overhead of using multiple threads.
In this case, the simplest workaround is to use a separate Random in each thread. The problem you have is that as a micro-benchmark, your loop doesn't actually do anything and the JIT is very good at discarding code which doesn't do anything. A workaround for this is to sum the random results and return it from the call() as this is usually enough to prevent the JIT from discarding the code.
Lastly if you want to sum lots of numbers, you don't need to save them and sum them later. You can sum them as you go.

Java Threading Tutorial Type Question

I am fairly naive when it comes to the world of Java Threading and Concurrency. I am currently trying to learn. I made a simple example to try to figure out how concurrency works.
Here is my code:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class ThreadedService {
private ExecutorService exec;
/**
* #param delegate
* #param poolSize
*/
public ThreadedService(int poolSize) {
if (poolSize < 1) {
this.exec = Executors.newCachedThreadPool();
} else {
this.exec = Executors.newFixedThreadPool(poolSize);
}
}
public void add(final String str) {
exec.execute(new Runnable() {
public void run() {
System.out.println(str);
}
});
}
public static void main(String args[]) {
ThreadedService t = new ThreadedService(25);
for (int i = 0; i < 100; i++) {
t.add("ADD: " + i);
}
}
}
What do I need to do to make the code print out the numbers 0-99 in sequential order?

Thread pools are usually used for operations which do not need synchronization or are highly parallel.
Printing the numbers 0-99 sequentially is not a concurrent problem and requires threads to be synchronized to avoid printing out of order.
I recommend taking a look at the Java concurrency lesson to get an idea of concurrency in Java.

The idea of threads is not to do things sequentially.
You will need some shared state to coordinate. In the example, adding instance fields to your outer class will work in this example. Remove the parameter from add. Add a lock object and a counter. Grab the lock, increment print the number, increment the number, release the number.

The simplest solution to your problem is to use a ThreadPool size of 1. However, this isn't really the kind of problem one would use threads to solve.
To expand, if you create your executor with:
this.exec = Executors.newSingleThreadExecutor();
then your threads will all be scheduled and executed in the order they were submitted for execution. There are a few scenarios where this is a logical thing to do, but in most cases Threads are the wrong tool to use to solve this problem.
This kind of thing makes sense to do when you need to execute the task in a different thread -- perhaps it takes a long time to execute and you don't want to block a GUI thread -- but you don't need or don't want the submitted tasks to run at the same time.

The problem is by definition not suited to threads. Threads are run independently and there isn't really a way to predict which thread is run first.
If you want to change your code to run sequentially, change add to:
public void add(final String str) {
System.out.println(str);
}
You are not using threads (not your own at least) and everything happens sequentially.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.