concurrent application not as fast as a singlethreaded

concurrent application not as fast as a singlethreaded - java

I've implemented a pipeline approach. I'm going to traverse a tree and I need certain values which aren't available beforehand... so I have to traverse the tree in parallel (or before) and once more for every node I want to save values (descendantCount for example).
As such I'm interating through the tree, then from the constructor I'm calling a method which invokes a new Thread started through an ExecutorService. The Callable which is submitted is:
#Override
public Void call() throws Exception {
// Get descendants for every node and save it to a list.
final ExecutorService executor =
Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
int index = 0;
final Map<Integer, Diff> diffs = mDiffDatabase.getMap();
final int depth = diffs.get(0).getDepth().getNewDepth();
try {
boolean first = true;
for (final AbsAxis axis = new DescendantAxis(mNewRtx, true); index < diffs.size()
&& ((diffs.get(index).getDiff() == EDiff.DELETED && depth < diffs.get(index).getDepth()
.getOldDepth()) || axis.hasNext());) {
if (axis.getTransaction().getNode().getKind() == ENodes.ROOT_KIND) {
axis.next();
} else {
if (index < diffs.size() && diffs.get(index).getDiff() != EDiff.DELETED) {
axis.next();
}
final Future<Integer> submittedDescendants =
executor.submit(new Descendants(mNewRtx.getRevisionNumber(), mOldRtx
.getRevisionNumber(), axis.getTransaction().getNode().getNodeKey(), mDb
.getSession(), index, diffs));
final Future<Modification> submittedModifications =
executor.submit(new Modifications(mNewRtx.getRevisionNumber(), mOldRtx
.getRevisionNumber(), axis.getTransaction().getNode().getNodeKey(), mDb
.getSession(), index, diffs));
if (first) {
first = false;
mMaxDescendantCount = submittedDescendants.get();
// submittedModifications.get();
}
mDescendantsQueue.put(submittedDescendants);
mModificationQueue.put(submittedModifications);
index++;
}
}
mNewRtx.close();
} catch (final AbsTTException e) {
LOGWRAPPER.error(e.getMessage(), e);
}
executor.shutdown();
return null;
}
Therefore for every node it's creating a new Callable which traverses the tree for every node and counts descendants and modifications (I'm actually fusing two tree-revisions together). Well, mDescendantsQueue and mModificationQueue are BlockingQueues. At first I've only had the descendantsQueue and traversed the tree once more to get modifications of every node (counting modifications made in the subtree of the current node). Then I thought why not do both in parallel and implement a pipelined approach. Sadly the performance seemed to have decreased everytime I've implemented another multithreaded "step".
Maybe because an XML-tree usually isn't that deep and the Concurrency-Overhead is too heavy :-/
At first I did everything sequential, which was the fastest:
- traversing the tree
- for every node traverse the descendants and compute descendantCount and modificationCount
After using a pipelined approach with BlockingQueues it seems the performance has decreased, but I haven't actually made any time measures and I would have to revert many changes to go back :( Maybe the performance increases with more CPUs, because I only have a Core2Duo for testing right now.
best regards,
Johannes

Probably this should help: Amadahl's law, what it basically says it that the increase in productivity depends (inversely proportional) to the percentage of the code which has to be processed by synchronization. Hence even by increasing by increasing more computing resources, it wont end up to the better result. Ideally if the ratio of ( the synchronized part to the total part) is low, then with (number of processors +1) should give the best output (unless you are using network or other I/O in which case you can increase the size of the pool).
So just follow it up from the above link and see if it helps

From your description it sounds like you're recursively creating threads, each of which processes one node and then spawns a new thread? Is this correct? If so, I'm not surprised that you're suffering from performance degradation.
A simple recursive descent method might actually be the best way to do this. I can't see how multithreading will gain you any advantages here.

Related

Massive tasks alternative pattern for Runnable or Callable

For massive parallel computing I tend to use executors and callables. When I have thousand of objects to be computed I feel not so good to instantiate thousand of Runnables for each object.
So I have two approaches to solve this:
I. Split the workload into a small amount of x-workers giving y-objects each. (splitting the object list into x-partitions with y/x-size each)
public static <V> List<List<V>> partitions(List<V> list, int chunks) {
final ArrayList<List<V>> lists = new ArrayList<List<V>>();
final int size = Math.max(1, list.size() / chunks + 1);
final int listSize = list.size();
for (int i = 0; i <= chunks; i++) {
final List<V> vs = list.subList(Math.min(listSize, i * size), Math.min(listSize, i * size + size));
if(vs.size() == 0) break;
lists.add(vs);
}
return lists;
}
II. Creating x-workers which fetch objects from a queue.
Questions:
Is creating thousand of Runnables really expensive and to be avoided?
Is there a generic pattern/recommendation how to do it by solution II?
Are you aware of a different approach?

Creating thousands of Runnable (objects implementing Runnable) is not more expensive than creating a normal object.
Creating and running thousands of Threads can be very heavy, but you can use Executors with a pool of threads to solve this problem.

As for the different approach, you might be interested in java 8's parallel streams.

Combining various answers here :
Is creating thousand of Runnables really expensive and to be avoided?
No, it's not in and of itself. It's how you will make them execute that may prove costly (spawning a few thousand threads certainly has its cost).
So you would not want to do this :
List<Computation> computations = ...
List<Thread> threads = new ArrayList<>();
for (Computation computation : computations) {
Thread thread = new Thread(new Computation(computation));
threads.add(thread);
thread.start();
}
// If you need to wait for completion:
for (Thread t : threads) {
t.join();
}
Because it would 1) be unnecessarily costly in terms of OS ressource (native threads, each having a stack on the heap), 2) spam the OS scheduler with a vastly concurrent workload, most certainly leading to plenty of context switchs and associated cache invalidations at the CPU level 3) be a nightmare to catch and deal with exceptions (your threads should probably define an Uncaught exception handler, and you'd have to deal with it manually).
You'd probably prefer an approach where a finite Thread pool (of a few threads, "a few" being closely related to your number of CPU cores) handles many many Callables.
List<Computation> computations = ...
ExecutorService pool = Executors.newFixedSizeThreadPool(someNumber)
List<Future<Result>> results = new ArrayList<>();
for (Computation computation : computations) {
results.add(pool.submit(new ComputationCallable(computation));
}
for (Future<Result> result : results {
doSomething(result.get);
}
The fact that you reuse a limited number threads should yield a really nice improvement.
Is there a generic pattern/recommendation how to do it by solution II?
There are. First, your partition code (getting from a List to a List<List>) can be found inside collection tools such as Guava, with more generic and fail-proofed implementations.
But more than this, two patterns come to mind for what you are achieving :
Use the Fork/Join Pool with Fork/Join tasks (that is, spawn a task with your whole list of items, and each task will fork sub tasks with half of that list, up to the point where each task manages a small enough list of items). It's divide and conquer. See: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ForkJoinTask.html
If your computation were to be "add integers from a list", it could look like (there might be a boundary bug in there, I did not really check) :
public static class Adder extends RecursiveTask<Integer> {
protected List<Integer> globalList;
protected int start;
protected int stop;
public Adder(List<Integer> globalList, int start, int stop) {
super();
this.globalList = globalList;
this.start = start;
this.stop = stop;
System.out.println("Creating for " + start + " => " + stop);
}
#Override
protected Integer compute() {
if (stop - start > 1000) {
// Too many arguments, we split the list
Adder subTask1 = new Adder(globalList, start, start + (stop-start)/2);
Adder subTask2 = new Adder(globalList, start + (stop-start)/2, stop);
subTask2.fork();
return subTask1.compute() + subTask2.join();
} else {
// Manageable size of arguments, we deal in place
int result = 0;
for(int i = start; i < stop; i++) {
result +=i;
}
return result;
}
}
}
public void doWork() throws Exception {
List<Integer> computation = new ArrayList<>();
for(int i = 0; i < 10000; i++) {
computation.add(i);
}
ForkJoinPool pool = new ForkJoinPool();
RecursiveTask<Integer> masterTask = new Adder(computation, 0, computation.size());
Future<Integer> future = pool.submit(masterTask);
System.out.println(future.get());
}
Use Java 8 parallel streams in order to launch multiple parallel computations easily (under the hood, Java parallel streams can fall back to the Fork/Join pool actually).
Others have shown how this might look like.
Are you aware of a different approach?
For a different take at concurrent programming (without explicit task / thread handling), have a look at the actor pattern. https://en.wikipedia.org/wiki/Actor_model
Akka comes to mind as a popular implementation of this pattern...

#Aaron is right, you should take a look into Java 8's parallel streams:
void processInParallel(List<V> list) {
list.parallelStream().forEach(item -> {
// do something
});
}
If you need to specify chunks, you could use a ForkJoinPool as described here:
void processInParallel(List<V> list, int chunks) {
ForkJoinPool forkJoinPool = new ForkJoinPool(chunks);
forkJoinPool.submit(() -> {
list.parallelStream().forEach(item -> {
// do something with each item
});
});
}
You could also have a functional interface as an argument:
void processInParallel(List<V> list, int chunks, Consumer<V> processor) {
ForkJoinPool forkJoinPool = new ForkJoinPool(chunks);
forkJoinPool.submit(() -> {
list.parallelStream().forEach(item -> processor.accept(item));
});
}
Or in shorthand notation:
void processInParallel(List<V> list, int chunks, Consumer<V> processor) {
new ForkJoinPool(chunks).submit(() -> list.parallelStream().forEach(processor::accept));
}
And then you would use it like:
processInParallel(myList, 2, item -> {
// do something with each item
});
Depending on your needs, the ForkJoinPool#submit() returns an instance of ForkJoinTask, which is a Future and you may use it to check for the status or wait for the end of your task.
You'd most probably want the ForkJoinPool instantiated only once (not instantiate it on every method call) and then reuse it to prevent CPU choking if the method is called multiple times.

Is creating thousand of Runnables really expensive and to be avoided?
Not at all, the runnable/callable interfaces have only one method to implement each, and the amount of "extra" code in each task depends on the code you are running. But certainly no fault of the Runnable/Callable interfaces.
Is there a generic pattern/recommendation how to do it by solution II?
Pattern 2 is more favorable than pattern 1. This is because pattern 1 assumes that each worker will finish at the exact same time. If some workers finish before other workers, they could just be sitting idle since they only are able to work on the y/x-size queues you assigned to each of them. In pattern 2 however, you will never have idle worker threads (unless the end of the work queue is reached and numWorkItems < numWorkers).
An easy way to use the preferred pattern, pattern 2, is to use the ExecutorService invokeAll(Collection<? extends Callable<T>> list) method.
Here is an example usage:
List<Callable<?>> workList = // a single list of all of your work
ExecutorService es = Executors.newCachedThreadPool();
es.invokeAll(workList);
Fairly readable and straightforward usage, and the ExecutorService implementation will automatically use solution 2 for you, so you know that each worker thread has their use time maximized.
Are you aware of a different approach?
Solution 1 and 2 are two common approaches for generic work. Now, there are many different implementation available for you choose from (such as java.util.Concurrent, Java 8 parallel streams, or Fork/Join pools), but the concept of each implementation is generally the same. The only exception is if you have specific tasks in mind with non-standard running behavior.

Implementing a swap method for an indexable concurrent skip list

I'm implementing a concurrent skip list map based on Java's ConcurrentSkipListMap, the differences being that I want the list to allow duplicates, and I also want the list to be indexable (so that finding the Nth element of the list takes O(lg(n)) time, instead of O(n) time as with a standard skip list). These modifications aren't presenting a problem.
In addition, the skip list's keys are mutable. For example, if the list elements are the integers {0, 4, 7}, then the middle element's key can be changed to any value in [0, 7] without prompting a change to the list structure; if the key changes to (-inf, -1] or [8, +inf) then the element is removed and re-added to maintain the list order. Rather than implementing this as a removal followed by a O(lg(n)) insert, I implement this as a removal followed by a linear traversal followed by an O(1) insert (with an expected runtime of O(1) - 99% of the time the node will be swapped with an adjacent node).
Inserting a completely new node is rare (after startup), and deleting a node (without immediately re-adding it) never occurs; almost all of the operations are elementAt(i) to retrieve the element at the ith index, or operations to swap nodes after a key is modified.
The problem I'm running into is in how to implement the key modification class(es). Conceptually, I'd like to do something like
public class Node implements Runnable {
private int key;
private Node prev, next;
private BlockingQueue<Integer> queue;
public void update(int i) {
queue.offer(i);
}
public void run() {
while(true) {
int temp = queue.take();
temp += key;
if(prev.getKey() > temp) {
// remove node, update key to temp, perform backward linear traversal, and insert
} else if(next.getKey() < temp) {
// remove node, update key to temp, perform forward linear traveral, and insert
} else {
key = temp; // node doesn't change position
}
}
}
}
(The insert sub-method being called from run uses CAS in order to handle the problem of two nodes attempting to simultaneously insert at the same location (similar to how the ConcurrentSkipListMap handles conflicting inserts) - conceptually this is the same as if the first node locked the nodes adjacent to the insertion point, except that the overhead is reduced for the case where there's no conflict.)
This way I can ensure that the list is always in order (it's okay if a key update is a bit delayed, because I can be certain that the update will eventually happen; however, if the list becomes unordered then things might go haywire). The problem being that implementing the list this way will generate an awful lot of threads, one per Node (with several thousand nodes in the list) - most of them will be blocking at any given point in time, but I'm concerned that several thousand blocking threads will still result in too high of an overhead.
Another option is to make the update method synchronized and remove the Runnable interface from Node, so that rather than having two threads enqueuing updates in the Node which then takes care of processing these updates on its separate thread, the two threads would instead take turns executing the Node#update method. The problem is that this could potentially create a bottleneck; if eight different threads all decided to update the same node at once then the queue implementation would scale just fine, but the synchronized implementation would block seven out of the eight threads (and would then block six threads, then five, etc).
So my question is, how would I implement something like the queue implementation except with a reduced number of threads, or else how would I implement something like the synchronized implementation except without the potential bottleneck problem.

I think I may be able to solve this with a ThreadPoolExecutor, something like
public class Node {
private int key;
private Node prev, next;
private ConcurrentLinkedQueue<Integer> queue;
private AtomicBoolean lock = new AtomicBoolean(false);
private ThreadPoolExecutor executor;
private UpdateNode updater = new UpdateNode();
public void update(int i) {
queue.offer(i);
if(lock.compareAndSet(false, true)) {
executor.execute(updater);
}
}
private class UpdateNode implements Runnable {
public void run() {
do {
try {
int temp = key;
while(!queue.isEmpty()) {
temp += queue.poll();
}
if(prev.getKey() > temp) {
// remove node, update key to temp, perform backward linear traversal, and insert
} else if(next.getKey() < temp) {
// remove node, update key to temp, perform forward linear traveral, and insert
} else {
key = temp; // node doesn't change position
}
} finally {
lock.set(false);
}
} while (!queue.isEmpty() && lock.compareAndSet(false, true));
}
}
}
This way I have the advantages of the queue approach without having a thousand threads sitting blocked - I instead execute a UpdateNode each time I need to update a node (unless there's already an UpdateNode being executed on that Node, hence the AtomicBoolean that's acting as a lock), and rely on the ThreadPoolExecutor to make it inexpensive to create several thousand runnables.

How to correctly use synchronized?

This piece of code:
synchronized (mList) {
if (mList.size() != 0) {
int s = mList.size() - 1;
for (int i = s; i > 0; i -= OFFSET) {
mList.get(i).doDraw(canv);
}
getHead().drawHead(canv);
}
}
Randomly throws AIOOBEs. From what I've read, the synchronized should prevent that, so what am I doing wrong?
Edits:
AIOOBE = Array Index Out Of Bounds Exception
The code's incomplete, cut down to what is needed. But to make you happy, OFFSET is 4, and just imagine that there is a for-loop adding a bit of data at the beginning. And a second thread reading and / or modifying the list.
Edit 2:
I've noticed it happens when the list is being drawn and the current game ends. The draw-thread hasn't drawn all elements when the list is emptied. Is there a way of telling the game to wait with emtying the list untill it's empty?
Edit 3:
I've just noticed that I'm not sure if this is a multi-threading problem. Seems I only have 2 threads, one for calculating and drawing and one for user input.. Gonna have to look into this a bit more than I thought.

What you're doing looks right... but that's all:
It doesn't matter on what object you synchronize, it needn't be the list itself.
What does matter is if all threads always synchronize on the same object, when accessing a shared resource.
Any access to SWING (or another graphic library) must happen in the AWT-Thread.
To your edit:
I've noticed it happens when the list is being drawn and the current game ends. The draw-thread hasn't drawn all elements when the list is emptied. Is there a way of telling the game to wait with emtying the list untill it's empty?
I think you mean "...wait with emptying the list until the drawing has completed." Just synchronize the code doing it on the same lock (i.e., the list itself in your case).
Again: Any access to a shared resource must be protected somehow. It seems like you're using synchronized just here and not where you're emptying the list.

The safe solution is to only allow one thread to create objects, add and remove them from a List after the game has started.
I had problems myself with random AIOOBEs erros and no synchornize could solve it properly plus it was slowing down the response of the user.
My solution, which is now stable and fast (never had an AIOOBEs since) is to make UI thread inform the game thread to create or manipulate an object by setting a flag and coordinates of the touch into the persistent variables.
Since the game thread loops about 60 times per second this proved to be sufficent to pick up the message from the UI thread and do something.
This is a very simple solution and it works great!

My suggestion is to use a BlockingQueue and I think you are looking for this solution also. How you can do it? It is already shown with an example in the javadoc :)
class Producer implements Runnable {
private final BlockingQueue queue;
Producer(BlockingQueue q) { queue = q; }
public void run() {
try {
while (true) { queue.put(produce()); }
} catch (InterruptedException ex) { ... handle ...}
}
Object produce() { ... }
}
class Consumer implements Runnable {
private final BlockingQueue queue;
Consumer(BlockingQueue q) { queue = q; }
public void run() {
try {
while (true) { consume(queue.take()); }
} catch (InterruptedException ex) { ... handle ...}
}
void consume(Object x) { ... }
}
class Setup {
void main() {
BlockingQueue q = new SomeQueueImplementation();
Producer p = new Producer(q);
Consumer c1 = new Consumer(q);
Consumer c2 = new Consumer(q);
new Thread(p).start();
new Thread(c1).start();
new Thread(c2).start();
}
}
The beneficial things for you are, you need not to worry about synchronizing your mList. BlockingQueue offers 10 special method. You can check it in the doc. Few from javadoc:
BlockingQueue methods come in four forms, with different ways of handling operations that cannot be satisfied immediately, but may be satisfied at some point in the future: one throws an exception, the second returns a special value (either null or false, depending on the operation), the third blocks the current thread indefinitely until the operation can succeed, and the fourth blocks for only a given maximum time limit before giving up.
To be in safe side: I am not experienced with android. So not certain whether all java packages are allowed in android. But at least it should be :-S, I wish.

You are getting Index out of Bounds Exception because there are 2 threads that operate on the list and are doing it wrongly.
You should have been synchronizing at another level, in such a way that no other thread can iterate through the list while other thread is modifying it! Only on thread at a time should 'work on' the list.
I guess you have the following situation:
//piece of code that adds some item in the list
synchronized(mList){
mList.add(1, drawableElem);
...
}
and
//code that iterates you list(your code simplified)
synchronized (mList) {
if (mList.size() != 0) {
int s = mList.size() - 1;
for (int i = s; i > 0; i -= OFFSET) {
mList.get(i).doDraw(canv);
}
getHead().drawHead(canv);
}
}
Individually the pieces of code look fine. They seam thread-safe. But 2 individual thread-safe pieces of code might not be thread safe at a higher level!
It's just you would have done the following:
Vector v = new Vector();
if(v.length() == 0){ v.length() itself is thread safe!
v.add("elem"); v.add() itself is also thread safe individually!
}
BUT the compound operation is NOT!
Regards,
Tiberiu

Java ExecutorService to solve Recursive Fibonacci Series

I need to find out the number based on some index in the Fibonacci Series recursively using threads and I tried the following code, but the program never ends. Please let me know if I am missing something.
Code:
import java.math.BigInteger;
import java.util.concurrent.*;
public class MultiThreadedFib {
private ExecutorService executorService;
public MultiThreadedFib(final int numberOfThreads) {
executorService = Executors.newFixedThreadPool(numberOfThreads);
}
public BigInteger getFibNumberAtIndex(final int index)
throws InterruptedException, ExecutionException {
Future<BigInteger> indexMinusOne = executorService.submit(
new Callable<BigInteger>() {
public BigInteger call()
throws InterruptedException, ExecutionException {
return getNumber(index - 1);
}
});
Future<BigInteger> indexMinusTwo = executorService.submit(
new Callable<BigInteger>() {
public BigInteger call()
throws InterruptedException, ExecutionException {
return getNumber(index - 2);
}
});
return indexMinusOne.get().add(indexMinusTwo.get());
}
public BigInteger getNumber(final int index)
throws InterruptedException, ExecutionException {
if (index == 0 || index == 1)
return BigInteger.valueOf(index);
return getFibNumberAtIndex(index - 1).add(getFibNumberAtIndex(index - 2));
}
}
Fixed it (Thanks to fiver)
Instead of calling getNumber(int) from the call method, I am calling to a dynamic programming algorithm that computes the number at that index.
The code for that is:
public class DynamicFib implements IFib {
private Map<Integer, BigInteger> memoize = new HashMap<Integer, BigInteger>();
public DynamicFib() {
memoize.put(0, BigInteger.ZERO);
memoize.put(1, BigInteger.ONE);
}
public BigInteger getFibNumberAtIndex(final int index) {
if (!memoize.containsKey(index))
memoize.put(index, getFibNumberAtIndex(index - 1).add(getFibNumberAtIndex(index - 2)));
return memoize.get(index);
}
}

This recursion will overflow the stack very fast. This is because you are computing lower fibonacci numbers over and over again - exponentially many number of times.
One effective way to avoid that is to use memoized recursion (a dynamic programming approach)
Basically use a static array to hold the already computed fibonacci numbers and whenever you need one, take it from the array, if it's already computed. If not, then compute it and store it in the array. This way each number will be computed only once.
(You can use other data structure instead of array, of course, i.e. hashtable)

What you are doing is replacing simple recursion with recursion via threads / tasks.
Until you get to the fib(0) and fib(1) cases, each task submits two more tasks, and then waits for them to complete. While it is waiting, it is still using a thread. Since the thread pool is bounded, you soon get to the point where calls to submit block ... and the whole computation locks up.
In addition to that, you've got a bug in indexMinusTwo which would result in the computation giving the wrong answer.
But still the recursive multithreaded procedure takes much longer than the memoized recursive non-multithreaded one.. any tip to improve performance?
Even assuming that you "fixed" the above problem (e.g. by using an unbounded thread pool) there is no way that you will be able to do a multi-threaded version of fibonacci that performs better than a single-threaded version that uses memoization. The computation is simply not suited to parallelization.

Threads work best when you have independant tasks to perform. The fibonacci series by definition does not have any degrees of parallelism. Each f(n) depends on the previous two values. As such it is not possible to calculate f(n) faster using multiple threads than using one (unless you have an inefficient algo)
The only thing you could make parallel potentially the + operation for large numbers, however this is likely to be a) complex b) difficult to make faster than the single threaded solution.
The fastest/simplest way to calculate fibonacci numbers is to use a loop in one thread. You don't need to use recusrion or memorize values.

Java Threadpools with competeting queues

I have a situation where I'd like to use an extension of Java's fixed thread pools. I have N groups of runnable objects that I'd like to compete for resources. However, I'd like the total number of threads used to remain constant. The way that I would like this to work is outlined here
Allocate an object with N threads and M queues;
Schedule job n on queue m.
Have a pointer to the first queue
Repeat
a. If the maximum number of threads is currently in use wait.
b. Pop off a job on the current queue
c. Move the pointer one queue over (or from the last queue to the first)
First, does something like this already exist? Second if not, I'm nervous about writing my own because I know writing my own thread pools can be dangerous. Can anyone point me to some good examples for writing my own.

Your best bet is probably creating your own implementation of a Queue that cycles through other queues. For example (in pseudo-code):
class CyclicQueue {
Queue queues[];
int current = 0;
CyclicQueue(int size) {
queues = new Queue[size];
for(int i=0; i<size; i++)
queues[i] = new LinkedList<T>();
}
T get() {
int i = current;
T value;
while( (value = queues[i].poll() == null) {
i++;
if(i == current)
return null;
}
return value;
}
}
Of course, with this, if you want blocking you'll need to add that in yourself.
In which case, you'll probably want a custom Queue for each queue which can notify the parent queue that value has been added.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.