Multithreading and recursion together - java

I have recursive code that processes a tree structure in a depth first manner. The code basically looks like this:
void function(TreeNode curr)
{
if (curr.children != null && !curr.children.isEmpty())
{
for (TreeNode n : curr.children)
{
//do some stuff
function(n);
}
}
else
{
//do some other processing
}
}
I want to use threads to make this complete faster. Most of the time is spent traversing so I don't want to just create a thread to handle "the other processing" because it doesn't take that long. I think I want to fork threads at "do some stuff" but how would that work?

This is a good case for the Fork/Join framework, which is to be included in Java 7. It is also available as a standalone library for use with Java 6.
Something like this:
public class TreeTask extends RecursiveAction {
private final TreeNode node;
private final int level;
public TreeTask(TreeNode node, int level) {
this.node = node;
this.level = level;
}
public void compute() {
// It makes sense to switch to single-threaded execution after some threshold
if (level > THRESHOLD) {
function(node);
return; // the subtree has been handled sequentially, so do not fork
}
if (node.children != null && !node.children.isEmpty()) {
List<TreeTask> subtasks = new ArrayList<TreeTask>(node.children.size());
for (TreeNode n : node.children) {
// do some stuff
subtasks.add(new TreeTask(n, level + 1));
}
invokeAll(subtasks); // Invoke and wait for completion
} else {
//do some other processing
}
}
}
...
ForkJoinPool p = new ForkJoinPool(N_THREADS);
p.invoke(new TreeTask(root, 0));
The key point of the fork/join framework is work stealing: while waiting for its subtasks to complete, a thread executes other tasks. This lets you write the algorithm in a straightforward way while avoiding the thread exhaustion that a naive approach with ExecutorService would run into.
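For reference, here is a minimal self-contained sketch of the same idea that can be compiled as is. It counts leaves instead of doing real work; the Node type, the THRESHOLD value and the build helper are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class LeafCounter {
    // Hypothetical minimal node type standing in for TreeNode.
    static final class Node {
        final List<Node> children = new ArrayList<>();
    }

    // Hypothetical cutoff: deeper levels are processed in the current thread.
    static final int THRESHOLD = 3;

    static final class CountTask extends RecursiveTask<Integer> {
        private final Node node;
        private final int level;

        CountTask(Node node, int level) {
            this.node = node;
            this.level = level;
        }

        @Override
        protected Integer compute() {
            if (node.children.isEmpty()) return 1;                  // a leaf
            if (level > THRESHOLD) return countSequentially(node);  // stop forking
            List<CountTask> subtasks = new ArrayList<>();
            for (Node child : node.children) {
                subtasks.add(new CountTask(child, level + 1));
            }
            int sum = 0;
            // invokeAll runs all subtasks and returns once they have completed.
            for (CountTask task : invokeAll(subtasks)) {
                sum += task.join();
            }
            return sum;
        }
    }

    static int countSequentially(Node node) {
        if (node.children.isEmpty()) return 1;
        int sum = 0;
        for (Node child : node.children) sum += countSequentially(child);
        return sum;
    }

    static int countLeaves(Node root) {
        return ForkJoinPool.commonPool().invoke(new CountTask(root, 0));
    }

    // Builds a complete binary tree with the given depth (test helper only).
    static Node build(int depth) {
        Node n = new Node();
        if (depth > 0) {
            n.children.add(build(depth - 1));
            n.children.add(build(depth - 1));
        }
        return n;
    }
}
```

Using the common pool here is only a convenience; a dedicated ForkJoinPool with an explicit parallelism works the same way.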

In the // do some stuff code block where you work on the individual Node, what you could do instead is submit the Node to some sort of ExecutorService (in the form of a Runnable which will work on the Node).
You can configure the ExecutorService that you use to be backed by a pool of a certain number of threads, allowing you to decouple the "handling" logic (along with logic around creating threads, how many to create, etc) from your tree-parsing logic.

This solution assumes that the processing only happens at the leaf nodes and that the actual recursion of the tree doesn't take a long time.
I would have the caller thread do the recursion and hand the leaves to a thread pool backed by a BlockingQueue, whose workers process them. Note that I'm not handling InterruptedException in a couple of places here.
public void processTree(TreeNode top) {
final LinkedBlockingQueue<Runnable> queue =
new LinkedBlockingQueue<Runnable>(MAX_NUM_QUEUED);
// create a pool that starts at 1 thread and grows to MAX_NUM_THREADS
ExecutorService pool =
new ThreadPoolExecutor(1, MAX_NUM_THREADS, 0L, TimeUnit.MILLISECONDS, queue,
new RejectedExecutionHandler() {
public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
queue.put(r); // block if we run out of space in the pool
}
});
walkTree(top, pool);
pool.shutdown();
// i think this will join with all of the threads
pool.awaitTermination(WAIT_TILL_CHILDREN_FINISH_MILLIS, TimeUnit.MILLISECONDS);
}
private void walkTree(final TreeNode curr, ExecutorService pool) {
if (curr.children == null || curr.children.isEmpty()) {
pool.submit(new Runnable() {
public void run() {
processLeaf(curr);
}
});
return;
}
for (TreeNode child : curr.children) {
walkTree(child, pool);
}
}
private void processLeaf(TreeNode leaf) {
// ...
}
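As an alternative to a hand-written RejectedExecutionHandler, the JDK ships ThreadPoolExecutor.CallerRunsPolicy, which runs a rejected task in the submitting thread and thereby slows down the tree walk when the pool is saturated. The following is only a sketch under assumed sizes; the class name, the constants and the dummy leaf job are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedLeafPool {
    // Hypothetical sizes chosen only for illustration.
    static final int MAX_THREADS = 4;
    static final int MAX_QUEUED = 16;

    // Submits leafCount dummy leaf jobs and returns how many of them ran.
    static int processLeaves(int leafCount) {
        AtomicInteger processed = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                MAX_THREADS, MAX_THREADS, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(MAX_QUEUED),
                // When the queue is full, the submitting thread runs the task itself.
                new ThreadPoolExecutor.CallerRunsPolicy());
        for (int i = 0; i < leafCount; i++) {
            pool.execute(processed::incrementAndGet); // stands in for processLeaf(curr)
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("leaf tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }
}
```

The trade-off is that the walking thread is occasionally busy with a leaf instead of traversing, which is usually acceptable when leaf processing is short.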

Related

In Loom, can I use virtual threads for Recursive[Action/Task]?

Is it possible to use RecursiveAction, for example, in conjunction with -- instead of the fork/join pool -- a pool of virtual threads (before I attempt a poorly-designed, custom effort)?
RecursiveAction is a subclass of ForkJoinTask which is, as the name suggests and the documentation even says literally, an
Abstract base class for tasks that run within a ForkJoinPool.
While the ForkJoinPool can be customized with a thread factory, it’s not the standard thread factory, but a special factory for producing ForkJoinWorkerThread instances. Since these threads are subclasses of Thread, they can’t be created with the virtual thread factory.
So, you can’t use RecursiveAction with virtual threads. The same applies to RecursiveTask. But it’s worth rethinking what using these classes with virtual threads would gain you.
The main challenge, decomposing your task into sub-tasks, is on you anyway. What these classes provide are features specifically for dealing with the Fork/Join pool and balancing the workload across the available platform threads. When you want to perform each sub-task on its own virtual thread, you don’t need this. So you can easily implement a recursive task with virtual threads without the built-in classes, e.g.
record PseudoTask(int from, int to) {
public static CompletableFuture<Void> run(int from, int to) {
return CompletableFuture.runAsync(
new PseudoTask(from, to)::compute, Thread::startVirtualThread);
}
protected void compute() {
int mid = (from + to) >>> 1;
if(mid == from) {
// simulate actual processing with potentially blocking operations
LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(500));
}
else {
CompletableFuture<Void> sub1 = run(from, mid), sub2 = run(mid, to);
sub1.join();
sub2.join();
}
}
}
This example just doesn’t care about limiting the subdivision nor avoiding blocking join() calls and it still performs well when running, e.g. PseudoTask.run(0, 1_000).join(); You might notice that with larger ranges, the techniques known from the other recursive task implementations can be useful here too, where the sub-task is rather cheap.
E.g., you may only submit one half of the range to another thread and process the other half locally, like
record PseudoTask(int from, int to) {
public static CompletableFuture<Void> run(int from, int to) {
return CompletableFuture.runAsync(
new PseudoTask(from, to)::compute, Thread::startVirtualThread);
}
protected void compute() {
CompletableFuture<Void> f = null;
for(int from = this.from, mid; ; from = mid) {
mid = (from + to) >>> 1;
if (mid == from) {
// simulate actual processing with potentially blocking operations
LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(500));
break;
} else {
CompletableFuture<Void> sub1 = run(from, mid);
if(f == null) f = sub1; else f = CompletableFuture.allOf(f, sub1);
}
}
if(f != null) f.join();
}
}
which makes a notable difference when running, e.g. PseudoTask.run(0, 1_000_000).join(); which will use only 1 million threads in the second example rather than 2 million. But, of course, that’s a discussion on a different level than with platform threads, where neither approach would work reasonably.
Another upcoming option is the StructuredTaskScope, which allows you to spawn sub-tasks and wait for their completion:
record PseudoTask(int from, int to) {
public static void run(int from, int to) {
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
new PseudoTask(from, to).compute(scope);
scope.join();
} catch (InterruptedException e) {
throw new IllegalStateException(e);
}
}
protected Void compute(StructuredTaskScope<Object> scope) {
for(int from = this.from, mid; ; from = mid) {
mid = (from + to) >>> 1;
if (mid == from) {
// simulate actual processing with potentially blocking operations
LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(500));
break;
} else {
var sub = new PseudoTask(from, mid);
scope.fork(() -> sub.compute(scope));
}
}
return null;
}
}
Here, the tasks do not wait for the completion of their sub-tasks; only the root task waits for the completion of all tasks. But this feature is in incubator state and hence may take even longer than the virtual threads feature to become production-ready.

How to stop execution when working queue is empty and all threads have stopped working?

While writing a state-space search like algorithm, I have a working queue with node elements. I have multiple threads with access to that queue that pop an element, do some transformations and checks to it, and may add more nodes to be visited to the queue.
I want the program to stop whenever the queue is empty, and all threads have stopped working (since they could add more elements in which case we would need the other threads to help handling these new nodes).
How should I go about making that check? I was thinking of keeping some AtomicBitSet that tracks which threads are working and which are not, and stopping execution when the bitset is empty. I would set and unset it as follows, in the run method of my handlers:
while (!bitset.isAllUnset()) {
Node node = queue.poll();
if (node == null) {
bitset.unset(THREAD_INDEX);
} else {
bitset.set(THREAD_INDEX);
// HANDLE THE NODE
}
}
Is there any recommended method to go about this?
One approach you could take:
Create a ThreadPool and push the initial Task to your Queue.
Keep one Thread (your main Thread) as a Monitor on the ThreadPool.
The job of this Thread is to start new Threads, and hand each one its Task, as long as the Queue is not empty and the ThreadPool still has capacity left.
A started Thread does its job and writes its results back to the Queue.
Afterwards it is returned to the pool, and you have to wake up your Monitor.
Your main Thread then tries to start new Threads again, as long as the ThreadPool has not reached its limit and the Task Queue is not empty.
Use an ExecutorService to which you submit Runnables that read the queue and stop running when the queue is empty.
Then call the executor service's shutdown() and awaitTermination() methods; the latter blocks until all submitted tasks have finished.
Or use CompletableFuture:
CompletableFuture.allOf(
CompletableFuture.runAsync(() -> {
while (!queue.isEmpty()) handle(queue.poll());
})
).join(); // block until the task has completed
I think this is actually rather complicated. I do not know how to write a correct version using a set based approach. For example, the following approach is wrong:
public class ThreadsStopWorkingWrong {
ConcurrentLinkedQueue queue = new ConcurrentLinkedQueue();
ConcurrentHashMap activeThreads = new ConcurrentHashMap();
volatile int prozessedCount = 0;
volatile boolean stop = false;
@Interleave(group = ThreadsStopWorkingWrong.class, threadCount = 1)
public void readFromQueue() {
int prozessAdditionalElements = 1;
while (!stop) {
Object element = queue.poll();
if (element != null) {
activeThreads.put(Thread.currentThread(), "");
if (prozessAdditionalElements > 0) {
prozessAdditionalElements--;
queue.offer("2");
}
prozessedCount++;
} else {
activeThreads.remove(Thread.currentThread());
}
}
}
@Interleave(group = ThreadsStopWorkingWrong.class, threadCount = 1)
public void waitTillProzessed() throws InterruptedException {
while (!queue.isEmpty() && !activeThreads.isEmpty()) {
Thread.sleep(1);
}
assertEquals(2, prozessedCount);
}
@Test
public void test() throws InterruptedException {
queue.offer("1");
Thread worker = new Thread(() -> readFromQueue());
worker.start();
waitTillProzessed();
worker.join();
}
}
The problem is that right after a thread polls an element out of the queue, it has not yet added itself to the active set. For a moment both the queue and the set are empty, so the condition !queue.isEmpty() && !activeThreads.isEmpty() is false and the waiting loop exits too early. What works is using a message counter as in the following example:
public class ThreadsStopWorkingCorrect {
ConcurrentLinkedQueue queue = new ConcurrentLinkedQueue();
AtomicLong messageCount = new AtomicLong();
volatile int prozessedCount = 0;
volatile boolean stop = false;
@Interleave(group = ThreadsStopWorkingCorrect.class, threadCount = 1)
public void readFromQueue() {
int prozessAdditionalElements = 1;
while (!stop) {
Object element = queue.poll();
if (element != null) {
if (prozessAdditionalElements > 0) {
prozessAdditionalElements--;
queue.offer("2");
messageCount.incrementAndGet();
}
prozessedCount++;
messageCount.decrementAndGet();
}
}
}
@Interleave(group = ThreadsStopWorkingCorrect.class, threadCount = 1)
public void waitTillProzessed() throws InterruptedException {
while (messageCount.get() > 0) {
Thread.sleep(1);
}
assertEquals(2, prozessedCount);
}
@Test
public void test() throws InterruptedException {
queue.offer("1");
messageCount.incrementAndGet();
Thread worker = new Thread(() -> readFromQueue());
worker.start();
waitTillProzessed();
worker.join();
}
}
I tested both examples with vmlens, a tool I wrote to test multithreaded software; hence the Interleave annotations.
In the set-based version, some thread interleavings lead to prozessedCount==0.
In the counter-based version, the prozessedCount is always 2.
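The counter idea extends to several workers if the counter tracks every node that is queued or still being handled, and is incremented before a node is offered. Below is a minimal sketch under that assumption; the class name and the toy "binary tree of a given depth" workload are hypothetical stand-ins for the real state-space search.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class PendingCounterSearch {
    // Explores a toy binary tree of the given depth with the given number of
    // worker threads and returns how many nodes were handled.
    static int explore(int depth, int workers) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        AtomicInteger pending = new AtomicInteger(); // nodes queued or in flight
        AtomicInteger handled = new AtomicInteger();

        pending.incrementAndGet(); // count the root before it becomes visible
        queue.offer(depth);

        Runnable worker = () -> {
            while (pending.get() > 0) {
                Integer remaining = queue.poll();
                if (remaining == null) {
                    Thread.yield(); // queue is empty, but another worker may still add nodes
                    continue;
                }
                handled.incrementAndGet();
                if (remaining > 0) {
                    pending.addAndGet(2); // count the children before offering them
                    queue.offer(remaining - 1);
                    queue.offer(remaining - 1);
                }
                pending.decrementAndGet(); // this node is finished
            }
        };

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(worker);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return handled.get();
    }
}
```

Because a node's children are counted before the node itself is counted down, the counter can only reach zero when the queue is empty and no worker is mid-node. The busy-wait with Thread.yield() keeps the sketch short; a blocking queue with a timeout would be gentler on the CPU.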

Iterate through threads run via ThreadPoolTaskExecutor

I have a ThreadPoolTaskExecutor and when I create a Process which implements Runnable I run it via: executor.execute(process).
Now, before calling execute I want to check one field of the Process object and compare it with ALL other currently running processes executed by my ThreadPoolTaskExecutor. How can I do that without creating a concurrency problem?
Code:
public class MyApp {
ThreadPoolTaskExecutor executor;
//...
public void runProcesses() {
Process firstone = new Process(1);
Process nextOne = new Process(1);
// iterate through all processes started via executor and currently running,
// verify if there is any process.getX() == 1, if not run it
executor.execute(firstone );
// wait until firstone ends because it has the same value of x
executor.execute(nextOne); // this can't be performed until the first one ends
}
}
public class Process {
private int x;
//...
public Process (int x){
this.x = x;
}
public int getX(){
return this.x;
}
}
I was thinking about creating a simple Set of started processes and adding each new one to it. But I don't know how to determine whether a process is still running, or how to remove it from the set when it is done. So now I'm thinking about iterating through the running threads, but I have no idea how.
I think that your initial idea is pretty good and can be made to work with not too much code.
It will require some tinkering in order to decouple "is a Runnable for this value already running" from "execute this Runnable", but here's a rough illustration that doesn't take care of that:
Implement equals() and hashCode() in Process, so that instances can safely be used in unordered sets and maps.
Create a ConcurrentMap<Process, Boolean>
You won't be using Collections.newSetFromMap(new ConcurrentHashMap<Process, Boolean>()) because you'd want to use the map's putIfAbsent() method.
Try to add each Process that you will be submitting to the map using putIfAbsent(), and bail if the returned value is not null.
A non-null return value means that there's already an equivalent Process in the map (and therefore being processed).
The trivial and not very clean solution will be to inject a reference to the map in each Process instance and have putIfAbsent(this, true) as the first thing you do in your run() method.
Remove from the map each Process that has finished processing.
The trivial and not very clean solution will be to inject a reference to the map in each Process instance and have remove(this) as the last thing you do in your run() method.
Other solutions can have Process implement Callable and return its unique value as a result, so that it can be removed from the map, or use CompletableFuture and its thenAccept() callback.
Here's a sample that illustrates the trivial and not very clean solution described above (code too long to paste directly here).
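The putIfAbsent()/remove() steps described above can be sketched roughly as follows. To keep the sketch short it keys the map by the x value instead of by Process instances with equals() and hashCode(); the RunningGuard class and its method name are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class RunningGuard {
    // Keys are the x values of tasks that are currently running.
    private final ConcurrentMap<Integer, Boolean> running = new ConcurrentHashMap<>();

    // Runs the work unless a task with the same x is already registered.
    // Returns true if the work ran, false if it was skipped.
    boolean runIfNotRunning(int x, Runnable work) {
        if (running.putIfAbsent(x, Boolean.TRUE) != null) {
            return false; // an equivalent task is still in progress
        }
        try {
            work.run();
            return true;
        } finally {
            running.remove(x); // always unregister, even if work throws
        }
    }
}
```

A task submitted to the executor would wrap its body in runIfNotRunning; skipped tasks are simply dropped here, so queueing them for later would need extra code.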
Though @Dimitar provided a very good solution to this problem, I want to add another approach.
Given your requirements, it seems you need to keep all submitted Processes, partition them by x into separate queues, and execute the processes in each queue one by one.
The API of ThreadPoolExecutor lets you extend the executor's behaviour, and I came up with the following implementation of ThreadPoolExecutor:
ThreadPoolExecutor executor = new ThreadPoolExecutor(2, 2,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue<>()) {
private final ConcurrentMap<Integer, Queue<Runnable>> processes = new ConcurrentHashMap<>();
@Override
public void execute(Runnable command) {
if (command instanceof Process) {
int id = ((Process) command).getX();
Queue<Runnable> t = new ArrayDeque<>();
Queue<Runnable> queue = this.processes.putIfAbsent(id, t);
if (queue == null) {
queue = t;
}
synchronized (queue) {
queue.add(command);
if (!processes.containsKey(id)) {
processes.put(id, queue);
}
if (queue.size() == 1) {
super.execute(queue.peek()); // removal of current process would be done in #afterExecute
}
}
} else {
super.execute(command);
}
}
@Override
protected void afterExecute(Runnable r, Throwable t) {
super.afterExecute(r, t);
if (r instanceof Process) {
int id = ((Process) r).getX();
Queue<Runnable> queue = this.processes.get(id);
synchronized (queue) {
queue.poll(); // remove completed prev process
Runnable nextProcess = queue.peek(); // retrieve next process
if (nextProcess != null) {
super.execute(nextProcess);
} else {
this.processes.remove(id);
}
}
}
}
}

AtomicReference to a mutable object and visibility

Say I have an AtomicReference to a list of objects:
AtomicReference<List<Object>> batch = new AtomicReference<List<Object>>(new ArrayList<Object>());
Thread A adds elements to this list: batch.get().add(o);
Later, thread B takes the list and, for example, stores it in a DB: insertBatch(batch.get());
Do I have to do additional synchronization when writing (Thread A) and reading (Thread B) to ensure thread B sees the list the way A left it, or is this taken care of by the AtomicReference?
In other words: if I have an AtomicReference to a mutable object, and one thread changes that object, do other threads see this change immediately?
Edit:
Maybe some example code is in order:
public void process(Reader in) throws IOException {
List<Future<AtomicReference<List<Object>>>> tasks = new ArrayList<Future<AtomicReference<List<Object>>>>();
ExecutorService exec = Executors.newFixedThreadPool(4);
for (int i = 0; i < 4; ++i) {
tasks.add(exec.submit(new Callable<AtomicReference<List<Object>>>() {
@Override public AtomicReference<List<Object>> call() throws IOException {
final AtomicReference<List<Object>> batch = new AtomicReference<List<Object>>(new ArrayList<Object>(batchSize));
Processor.this.parser.parse(in, new Parser.Handler() {
@Override public void onNewObject(Object event) {
batch.get().add(event);
if (batch.get().size() >= batchSize) {
dao.insertBatch(batch.getAndSet(new ArrayList<Object>(batchSize)));
}
}
});
return batch;
}
}));
}
List<Object> remainingBatches = new ArrayList<Object>();
for (Future<AtomicReference<List<Object>>> task : tasks) {
try {
AtomicReference<List<Object>> remainingBatch = task.get();
remainingBatches.addAll(remainingBatch.get());
} catch (ExecutionException e) {
Throwable cause = e.getCause();
if (cause instanceof IOException) {
throw (IOException)cause;
}
throw (RuntimeException)cause;
}
}
// these haven't been flushed yet by the worker threads
if (!remainingBatches.isEmpty()) {
dao.insertBatch(remainingBatches);
}
}
What happens here is that I create four worker threads to parse some text (this is the Reader in parameter to the process() method). Each worker saves the lines it has parsed in a batch, and flushes the batch when it is full (dao.insertBatch(batch.getAndSet(new ArrayList<Object>(batchSize)));).
Since the number of lines in the text isn't a multiple of the batch size, the last objects end up in a batch that isn't flushed, since it's not full. These remaining batches are therefore inserted by the main thread.
I use AtomicReference.getAndSet() to replace the full batch with an empty one. Is this program correct with regard to threading?
Um... it doesn't really work like this. AtomicReference guarantees that the reference itself is visible across threads i.e. if you assign it a different reference than the original one the update will be visible. It makes no guarantees about the actual contents of the object that reference is pointing to.
Therefore, read/write operations on the list contents require separate synchronization.
Edit: So, judging from your updated code and the comment you posted, setting the local reference to volatile is sufficient to ensure visibility.
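One way to stay within what AtomicReference does guarantee is to never mutate the list in place and instead publish a new unmodifiable list on every change. This is only a sketch of that alternative, not what the question's code does; the SnapshotBatch class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class SnapshotBatch {
    private final AtomicReference<List<String>> batch =
            new AtomicReference<>(Collections.<String>emptyList());

    // Publishes a new unmodifiable list holding the old items plus the new one.
    // The copy is fully built before the reference is swapped in.
    void add(String item) {
        batch.updateAndGet(old -> {
            List<String> copy = new ArrayList<>(old);
            copy.add(item);
            return Collections.unmodifiableList(copy);
        });
    }

    // Atomically takes the current snapshot and resets the batch to empty.
    List<String> drain() {
        return batch.getAndSet(Collections.<String>emptyList());
    }
}
```

Every add copies the list, so this only suits small batches; for larger ones, confining each list to a single thread or using a concurrent collection is the cheaper route.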
I think that, forgetting all the code here, your exact question is this:
Do I have to do additional synchronization when writing (Thread A) and
reading (Thread B) to ensure thread B sees the list the way A left it,
or is this taken care of by the AtomicReference?
So, the exact response to that is: YES, atomics take care of visibility. And that is not my opinion but the JDK documentation's:
The memory effects for accesses and updates of atomics generally follow the rules for volatiles, as stated in The Java Language Specification, Third Edition (17.4 Memory Model).
I hope this helps.
Adding to Tudor's answer: You will have to make the ArrayList itself threadsafe or - depending on your requirements - even larger code blocks.
If you can get away with a threadsafe ArrayList you can "decorate" it like this:
batch = java.util.Collections.synchronizedList(new ArrayList<Object>());
But keep in mind: Even "simple" constructs like this are not threadsafe with this:
Object o = batch.get(batch.size()-1);
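A compound operation like that can be made atomic by holding the list's own lock for its whole duration, which is the same manual synchronization the Collections.synchronizedList documentation asks for when iterating. A minimal sketch, with a hypothetical helper class:

```java
import java.util.List;

class LastElement {
    // Returns the last element, or null if the list is empty, as one atomic step.
    // The argument is assumed to be a list returned by Collections.synchronizedList.
    static <T> T last(List<T> syncList) {
        synchronized (syncList) { // same monitor the synchronized wrapper locks on
            return syncList.isEmpty() ? null : syncList.get(syncList.size() - 1);
        }
    }
}
```

Without the synchronized block, another thread could remove an element between the size() and get() calls.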
The AtomicReference will only help you with the reference to the list, it will not do anything to the list itself. More particularly, in your scenario, you will almost certainly run into problems when the system is under load where the consumer has taken the list while the producer is adding an item to it.
This sounds to me like you should be using a BlockingQueue. You can then limit the memory footprint if your producer is faster than your consumer, and let the queue handle all contention.
Something like:
ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<Object> (50);
// ... Producer
queue.put(o);
// ... Consumer
List<Object> queueContents = new ArrayList<Object> ();
// Grab everything waiting in the queue in one chunk. Should never be more than 50 items.
queue.drainTo(queueContents);
Added
Thanks to @Tudor for pointing out the architecture you are using. ... I have to admit it is rather strange. You don't really need AtomicReference at all as far as I can see. Each thread owns its own ArrayList until it is passed on to dao, at which point it is replaced, so there is no contention at all anywhere.
I am a little concerned about you creating four parsers on a single Reader. I hope you have some way of ensuring each parser does not affect the others.
I personally would use some form of producer-consumer pattern as I have described in the code above. Something like this perhaps.
static final int PROCESSES = 4;
static final int batchSize = 10;
public void process(Reader in) throws IOException, InterruptedException {
final List<Future<Void>> tasks = new ArrayList<Future<Void>>();
ExecutorService exec = Executors.newFixedThreadPool(PROCESSES);
// Queue of objects.
final ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<Object> (batchSize * 2);
// The final object to post.
final Object FINISHED = new Object();
// Start the producers.
for (int i = 0; i < PROCESSES; i++) {
tasks.add(exec.submit(new Callable<Void>() {
@Override
public Void call() throws IOException {
Processor.this.parser.parse(in, new Parser.Handler() {
@Override
public void onNewObject(Object event) {
queue.add(event);
}
});
// Post a finished down the queue.
queue.add(FINISHED);
return null;
}
}));
}
// Start the consumer.
tasks.add(exec.submit(new Callable<Void>() {
@Override
public Void call() throws IOException, InterruptedException { // queue.take() can be interrupted
List<Object> batch = new ArrayList<Object>(batchSize);
int finishedCount = 0;
// Until all threads finished.
while ( finishedCount < PROCESSES ) {
Object o = queue.take();
if ( o != FINISHED ) {
// Batch them up.
batch.add(o);
if ( batch.size() >= batchSize ) {
dao.insertBatch(batch);
// If insertBatch takes a copy we could merely clear it.
batch = new ArrayList<Object>(batchSize);
}
} else {
// Count the finishes.
finishedCount += 1;
}
}
// Finished! Post any incomplete batch.
if ( batch.size() > 0 ) {
dao.insertBatch(batch);
}
return null;
}
}));
// Wait for everything to finish.
exec.shutdown();
// Wait until all is done.
boolean finished = false;
do {
try {
// Wait up to 1 second for termination.
finished = exec.awaitTermination(1, TimeUnit.SECONDS);
} catch (InterruptedException ex) {
}
} while (!finished);
}

Producer-consumer problem with a twist

The producer is finite, as should be the consumer.
The problem is when to stop, not how to run.
Communication can happen over any type of BlockingQueue.
Can't rely on poisoning the queue (PriorityBlockingQueue)
Can't rely on locking the queue (SynchronousQueue)
Can't rely on offer/poll exclusively (SynchronousQueue)
Probably even more exotic queues in existence.
Creates a queued seq on another (presumably lazy) seq s. The queued
seq will produce a concrete seq in the background, and can get up to
n items ahead of the consumer. n-or-q can be an integer n buffer
size, or an instance of java.util.concurrent BlockingQueue. Note
that reading from a seque can block if the reader gets ahead of the
producer.
http://clojure.github.com/clojure/clojure.core-api.html#clojure.core/seque
My attempts so far + some tests: https://gist.github.com/934781
Solutions in Java or Clojure appreciated.
class Reader {
private final ExecutorService ex = Executors.newSingleThreadExecutor();
private final List<Object> completed = new ArrayList<Object>();
private final BlockingQueue<Object> doneQueue = new LinkedBlockingQueue<Object>();
private int pending = 0;
public synchronized Object take() {
removeDone();
queue();
Object rVal;
if(completed.isEmpty()) {
try {
rVal = doneQueue.take();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
pending--;
} else {
rVal = completed.remove(0);
}
queue();
return rVal;
}
private void removeDone() {
Object current = doneQueue.poll();
while(current != null) {
completed.add(current);
pending--;
current = doneQueue.poll();
}
}
private void queue() {
while(pending < 10) {
pending++;
ex.submit(new Runnable() {
@Override
public void run() {
doneQueue.add(compute());
}
private Object compute() {
//do actual computation here
return new Object();
}
});
}
}
}
Not exactly an answer I'm afraid, but a few remarks and more questions. My first answer would be: use clojure.core/seque. The producer needs to communicate end-of-seq somehow for the consumer to know when to stop, and I assume the number of produced elements is not known in advance. Why can't you use an EOS marker (if that's what you mean by queue poisoning)?
If I understand your alternative seque implementation correctly, it will break when elements are taken off the queue outside your function, since channel and q will be out of step in that case: channel will hold more #(.take q) elements than there are elements in q, causing it to block. There might be ways to ensure channel and q are always in step, but that would probably require implementing your own Queue class, and it adds so much complexity that I doubt it's worth it.
Also, your implementation doesn't distinguish between normal EOS and abnormal queue termination due to thread interruption - depending on what you're using it for you might want to know which is which. Personally I don't like using exceptions in this way — use exceptions for exceptional situations, not for normal flow control.
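For comparison, this is roughly what the EOS-marker approach looks like in Java. It is only a sketch and assumes a FIFO queue, so it does not cover the PriorityBlockingQueue case from the question; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class EndOfStreamDemo {
    // Unique sentinel compared by identity; never a legitimate element.
    private static final Object EOS = new Object();

    // Pushes the items through a small bounded queue on a producer thread
    // and returns them in the order the consumer received them.
    static List<Object> produceAndConsume(List<?> items) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>(4);
        Thread producer = new Thread(() -> {
            try {
                for (Object item : items) queue.put(item);
                queue.put(EOS); // normal end of sequence
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<Object> consumed = new ArrayList<>();
        try {
            for (Object o = queue.take(); o != EOS; o = queue.take()) {
                consumed.add(o);
            }
            producer.join();
        } catch (InterruptedException e) {
            // Abnormal termination stays distinguishable from a normal EOS.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted before end of stream", e);
        }
        return consumed;
    }
}
```

The consumer stops on the marker, while an interruption surfaces as an exception, which keeps the two kinds of termination apart.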
