RecursiveAction/ForkJoinPool with Blocking I/O - java

I've got a Java client that needs to recursively call a server to retrieve a large data graph - requiring approximately one thousand calls. I have no control of the server, and this is required for a time-critical crash recovery scenario.
My problem is that I need my original thread to block until all calls have completed.
The RecursiveAction and ForkJoinPool abstractions from java.util.concurrent are precisely what I need, except that they are designed for CPU parallelism and forbid use of blocking I/O.
So, what would be the best way to implement recursive network calls, with the initiating thread blocking until all calls have completed?
Additional context info:
I can't modify the server.
The server allows and supports this kind of heavy querying.
I will restrict the number of concurrent network calls to something like 10-30.
Caching the data on disk is not feasible.
Additional thoughts: would a single-phase Phaser be appropriate, in conjunction with a ThreadPoolExecutor? Call tasks would call Phaser.register(), make the call, submit child tasks and then call Phaser.arrive(). The initiating thread would call Phaser.awaitAdvance(1). Would this be the most appropriate approach?

Got this working well using JDK1.7 Phasers. The pattern I used was something like this:
private void loadGraphFromServer() {
final Phaser phaser = new Phaser(1); // "1" registers the calling thread
for (final Item item : getDataListFromServer()) {
phaser.register();
executorService.submit(new Runnable() {
public void run() {
try {
getMoreDataFromServer(item.getSomeId());
// more nested loops/tasks/calls here...
}
finally {
phaser.arrive();
}
}
});
}
phaser.arriveAndAwaitAdvance(); // blocks until all tasks are complete
}

I would try using a fixed size executor pool. You can set the maximum size to 10 to 30 threads and lot it up with all 1000 requests, or add 1 request which create two more, and those two more etc. You can wait for all these request to complete with shutdown() and awaitTermination()

Related

WorkerThread: Wait for processing done (BlockingQueue)

im an building a multithreaded application, using WorkerThreads which process Tasks from BlockingQueues. The worker looks as follws (as an abstract class. subclasses implement processItem()).
abstract class WorkerThread extends Thread {
BlockingQueue<Task> q;
int tasksInSystem; // globally available
public void run() {
while(!interrupted()) {
Task t = q.take();
process(t);
tasksInSystem--;
}
}
abstract void process(Task t);
}
The special thing is that i'd like to wait for all tasks to complete.
My first idea was to:
count each added task
decrease the counter when processing completed.
But:
But there are different types of Tasks and different worker implementations and multiple queues. So I would have to maintain tons of different counters.
What I'd like to have:
q.waitForEmptyAndCompleted()
That would require the queue to keep track of the Tasks "in flight" and require the Worker Processes to signal when they are done (instead of tasksInsystem---;).
The worker is not able to increase that counter, because he would have to count the tasks after he took them from the queue. But another thread may become running right after the take() call, such that the worker was not able to increase the counter beforehand.
Hence, the counter increase and take() must be tied together (atomar). Which leads me to a specialized BlockingQueue.
I didn't find a premade solution. So my best guess is to implement my own BlockingQueue. Is there something that I could use instead (to avoid implementing and testing a thread-safe blocking queue on my own)? Or do you have any idea to implement that wait call differently?
OK, since general ExecutorService is not enough perhaps ForkJoinPool will work, it does not expose queue explicitly, but should be very easy to use given what you have described.
Key method is awaitQuiescence(long timeout, TimeUnit unit) which will wait until all submitted tasks have finished execution.

Java Executor with throttling/throughput control

I'm looking for a Java Executor that allows me to specify throttling/throughput/pacing limitations, for example, no more than say 100 tasks can be processed in a second -- if more tasks get submitted they should get queued and executed later. The main purpose of this is to avoid running into limits when hitting foreign APIs or servers.
I'm wondering whether either base Java (which I doubt, because I checked) or somewhere else reliable (e.g. Apache Commons) provides this, or if I have to write my own. Preferably something lightweight. I don't mind writing it myself, but if there's a "standard" version out there somewhere I'd at least like to look at it first.
Take a look at guavas RateLimiter:
A rate limiter. Conceptually, a rate limiter distributes permits at a
configurable rate. Each acquire() blocks if necessary until a permit
is available, and then takes it. Once acquired, permits need not be
released. Rate limiters are often used to restrict the rate at which
some physical or logical resource is accessed. This is in contrast to
Semaphore which restricts the number of concurrent accesses instead of
the rate (note though that concurrency and rate are closely related,
e.g. see Little's Law).
Its threadsafe, but still #Beta. Might be worth a try anyway.
You would have to wrap each call to the Executor with respect to the rate limiter. For a more clean solution you could create some kind of wrapper for the ExecutorService.
From the javadoc:
final RateLimiter rateLimiter = RateLimiter.create(2.0); // rate is "2 permits per second"
void submitTasks(List<Runnable> tasks, Executor executor) {
for (Runnable task : tasks) {
rateLimiter.acquire(); // may wait
executor.execute(task);
}
}
The Java Executor doesn't offer such a limitation, only limitation by amount of threads, which is not what you are looking for.
In general the Executor is the wrong place to limit such actions anyway, it should be at the moment where the Thread tries to call the outside server. You can do this for example by having a limiting Semaphore that threads wait on before they submit their requests.
Calling Thread:
public void run() {
// ...
requestLimiter.acquire();
connection.send();
// ...
}
While at the same time you schedule a (single) secondary thread to periodically (like every 60 seconds) releases acquired resources:
public void run() {
// ...
requestLimiter.drainPermits(); // make sure not more than max are released by draining the Semaphore empty
requestLimiter.release(MAX_NUM_REQUESTS);
// ...
}
no more than say 100 tasks can be processed in a second -- if more
tasks get submitted they should get queued and executed later
You need to look into Executors.newFixedThreadPool(int limit). This will allow you to limit the number of threads that can be executed simultaneously. If you submit more than one thread, they will be queued and executed later.
ExecutorService threadPool = Executors.newFixedThreadPool(100);
Future<?> result1 = threadPool.submit(runnable1);
Future<?> result2 = threadPool.submit(runnable2);
Futurte<SomeClass> result3 = threadPool.submit(callable1);
...
Snippet above shows how you would work with an ExecutorService that allows no more than 100 threads to be executed simultaneously.
Update:
After going over the comments, here is what I have come up with (kinda stupid). How about manually keeping a track of threads that are to be executed ? How about storing them first in an ArrayList and then submitting them to the Executor based on how many threads have already been executed in the last one second.
So, lets say 200 tasks have been submitted into our maintained ArrayList, We can iterate and add 100 to the Executor. When a second passes, we can add few more threads based on how many have completed in theExecutor and so on
Depending on the scenario, and as suggested in one of the previous responses, the basic functionalities of a ThreadPoolExecutor may do the trick.
But if the threadpool is shared by multiple clients and you want to throttle, to restrict the usage of each one of them, making sure that one client won't use all the threads, then a BoundedExecutor will do the work.
More details can be found in the following example:
http://jcip.net/listings/BoundedExecutor.java
Personally I found this scenario quite interesting. In my case, I wanted to stress that the interesting phase to throttle is the consuming side one, as in classical Producer/Consumer concurrent theory. That's the opposite of some of the suggested answers before. This is, we don't want to block the submitting thread, but block the consuming threads based in a rate (tasks/second) policy. So, even if there are tasks ready in the queue, executing/consuming Threads may block waiting to meet the throtle policy.
That said, I think a good candidate would be the Executors.newScheduledThreadPool(int corePoolSize). This way you would need a simple queue in front of the executor (a simple LinkedBlockingQueue would suit), and then schedule a periodic task to pick actual tasks from the queue (ScheduledExecutorService.scheduleAtFixedRate). So, is not an straightforward solution, but it should perform goog enough if you try to throttle the consumers as discussed before.
Can limit it inside Runnable:
public static Runnable throttle (Runnable realRunner, long delay) {
Runnable throttleRunner = new Runnable() {
// whether is waiting to run
private boolean _isWaiting = false;
// target time to run realRunner
private long _timeToRun;
// specified delay time to wait
private long _delay = delay;
// Runnable that has the real task to run
private Runnable _realRunner = realRunner;
#Override
public void run() {
// current time
long now;
synchronized (this) {
// another thread is waiting, skip
if (_isWaiting) return;
now = System.currentTimeMillis();
// update time to run
// do not update it each time since
// you do not want to postpone it unlimited
_timeToRun = now+_delay;
// set waiting status
_isWaiting = true;
}
try {
Thread.sleep(_timeToRun-now);
} catch (InterruptedException e) {
e.printStackTrace();
} finally {
// clear waiting status before run
_isWaiting = false;
// do the real task
_realRunner.run();
}
}};
return throttleRunner;
}
Take from JAVA Thread Debounce and Throttle

Java's FutureTask composability

I try to work with Java's FutureTask, Future, Runnable, Callable and ExecutorService types.
What is the best practice to compose those building blocks?
Given that I have multiple FutureTasks and and I want to execute them in sequence.
Ofcourse I could make another FutureTask which is submitting / waiting for result for each subtask in sequence, but I want to avoid blocking calls.
Another option would be to let those subtasks invoke a callback when they complete, and schedule the next task in the callback. But going that route, how to I create a proper outer FutureTask object which also handles exceptions in the subtask without producing that much of a boilerplate?
Do I miss something here?
Very important thing, though usually not described in tutorials:
Runnables to be executed on an ExecutorService should not block. This is because each blocking switches off a working thread, and if ExecutorService has limited number of working threads, there is a risk to fall into deadlock (thread starvation), and if ExecutorService has unlimited number of working threads, then there is a risk to run out of memory. Blocking operations in the tasks simply destroy all advantages of ExecutorService, so use blocking operations on usual threads only.
FutureTask.get() is blocking operation, so can be used on ordinary threads and not from an ExecutorService task. That is, it cannot serve as a building block, but only to deliver result of execution to the master thread.
Right approach to build execution from tasks is to start next task when all input data for the next task is ready, so that the task do not have to block waiting for input data. So you need a kind of a gate which stores intermediate results and starts new task when all arguments have arrived. Thus tasks do not bother explicitly to start other tasks. So a gate, which consists of input sockets for arguments and a Runnable to compute them, can be considered as a right building block for computations on ExcutorServices.
This approach is called dataflow or workflow (if gates cannot be created dynamically).
Actor frameworks like Akka use this approach but are limited in the fact that an actor is a gate with single input socket.
I have written a true dataflow library published at https://github.com/rfqu/df4j.
I tried to do something similar with a ScheduledFuture, trying to cause a delay before things were displayed to the user. This is what I come up with, simply use the same ScheduledFuture for all your 'delays'. The code was:
public static final ScheduledExecutorService scheduler = Executors
.newScheduledThreadPool(1);
public ScheduledFuture delay = null;
delay = scheduler.schedule(new Runnable() {
#Override
public void run() {
//do something
}
}, 1000, TimeUnit.MILLISECONDS);
delay = scheduler.schedule(new Runnable() {
#Override
public void run() {
//do something else
}
}, 2000, TimeUnit.MILLISECONDS);
Hope this helps
Andy
The usual approach is to:
Decide about ExecutorService (which type, how many threads).
Decide about the task queue (for how long it could be non-blocking).
If you have some external code that waits for the task result:
* Submit tasks as Callables (this is non blocking as long as you do not run out of the queue).
* Call get on the Future.
If you want some actions to be taken automatically after the task is finished:
You can submit as Callables or Runnables.
Just add that you need to do at the end as the last code inside the task. Use
Activity.runOnUIThread these final actions need to modify GUI.
Normally, you should not actively check when you can submit one more task or schedule callback in order just to submit them. The thread queue (blocking, if preferred) will handle this for you.

Having threads run from event listeners in java?

I have a program that creates hundreds of instances of a class, each of which listens to another thread which simply fires an event on a regular timed schedule (so that they all perform at the same speed). What I'd like is for each of the hundreds of instances to be its own thread, so that when an event is fired, they can all work in parallel. What makes sense to me is to have these classes extend the Thread class and then have this code inside them...
public class IteratorStepListener implements StepEventListener {
public void actionPerformed(ActionEvent e) {
start();
}
}
public void run() {
doStuff();
}
This doesn't seem to work though. Clearly I'm not understanding something basic here. What's the proper way to do this?
Okay, first thing: overcome the notion that your hundreds of threads will run in parallel. At the very best, they will run concurrently, ie, time-sliced. As you get into the hundreds of threads, you will see the bearings on the scheduling algorithm start to glow; in the thousands they'll smoke and eventually seize up, and you'll get no more threads.
Now, that said, we don't have near enough code to understand what you're really doing, but one thing that I note is you don't seem to be making new Threads. Remember that a thread is an object; the canonical way to start a thread is
Thread t = new Thread(Runnable r);
t.run();
What it looks like is that you're trying to run() the same thread over and over again; this way lies madness. Have a look at Wiki on Event Driven Programming. If you really want to have a separate thread for handling each event, you'll want a scheme something like this (pseudocode):
processEvents: function
eventQueue: queue of Events
event: implements Runnable
-- something produces events and puts them on the queue
loop -- forever
do
Event ev := eventQueue.front
new Thread(ev).run();
od
end -- processEvents
It sounds like the event is going to be fired more than once... but you can't start the same thread more than once.
It sounds like your listener should implement the interface but start a thread directly in actionPerformed (or better, use an Executor so that it could use a thread pool). So instead of your current implementation, you could use:
// Assuming the listener implements runnable; you may want to
// delegate that to a separate class for separation of concerns.
public void actionPerformed(ActionEvent e) {
new Thread(this).start();
}
or
public void actionPerformed(ActionEvent e) {
executor.execute(this);
}
What I'd like is for each of the hundreds of instances to be its own thread, so that when an event is fired, they can all work in parallel.
I don't think this is a good approach.
Unless you have hundreds of processors, the threads cannot possibly all work in parallel. You'll end up with the threads running them one at a time (one per processor), or time-slicing between processors.
Each thread actually ties down a significant slice of the JVM's resources, even when inactive. IIRC, the default stack size is about 1 Mbyte.
The example code in your question shows the event calling start() on the thread. Unfortunately, you can only call start() on a thread once. Once the thread has terminated it cannot be restarted.
A better approach would be to create an executor with a bounded thread pool, and have each event cause a new task to be submitted to the executor. Something like this:
ThreadPoolExecutor executor = new ThreadPoolExecutor(corePoolSize, maxPoolSize,
keepAliveTime, timeUnit, workQueue);
...
public class IteratorStepListener implements StepEventListener, Runnable {
public void actionPerformed(ActionEvent e) {
executor.submit(this);
}
public void run() {
doStuff();
}
}
You can't use threads like that in Java. This is because Java threads directly map to underlying OS threads (at least on JVM implementations that I'm aware of), and OS threads can't scale like that. A rule of thumb is, you want to keep total number of threads within hundred or something in an app. A few hundred is probably ok. A few thousand gets usually problematic, depending on the HW you are using.
The use of threads like you described is a valid implementation strategy in languages like Erlang for example. Meanwhile, if you are stuck with Java this time, creating a shared thread pool and submitting your tasks to this instead of allowing all tasks to run concurrently might be a good alternative. In this case, you can choose a suitable number of threads (best number depends on the nature of the task. If you have no idea, number of CPU core available times 2 is a good start), and have that number of tasks run concurrently.
If you absolutely need all tasks to proceed concurrently, it could get a little complicated, but that's doable as well.

What are the advantages of using an ExecutorService?

What is the advantage of using ExecutorService over running threads passing a Runnable into the Thread constructor?
ExecutorService abstracts away many of the complexities associated with the lower-level abstractions like raw Thread. It provides mechanisms for safely starting, closing down, submitting, executing, and blocking on the successful or abrupt termination of tasks (expressed as Runnable or Callable).
From JCiP, Section 6.2, straight from the horse's mouth:
Executor may be a simple interface, but it forms the basis for a flexible and powerful framework for asynchronous task execution that supports a wide variety of task execution policies. It provides a standard means of decoupling task submission from task execution, describing tasks as Runnable. The Executor implementations also provide lifecycle support and hooks for adding statistics gathering, application management, and monitoring.
...
Using an Executor is usually the easiest path to implementing a producer-consumer design in your application.
Rather than spending your time implementing (often incorrectly, and with great effort) the underlying infrastructure for parallelism, the j.u.concurrent framework allows you to instead focus on structuring tasks, dependencies, potential parallelism. For a large swath of concurrent applications, it is straightforward to identify and exploit task boundaries and make use of j.u.c, allowing you to focus on the much smaller subset of true concurrency challenges which may require more specialized solutions.
Also, despite the boilerplate look and feel, the Oracle API page summarizing the concurrency utilities includes some really solid arguments for using them, not least:
Developers are likely to already
understand the standard library
classes, so there is no need to learn
the API and behavior of ad-hoc
concurrent components. Additionally,
concurrent applications are far
simpler to debug when they are built
on reliable, well-tested components.
Java concurrency in practice is a good book on concurrency. If you haven't already, get yourself a copy. The comprehensive approach to concurrency presented there goes well beyond this question, and will save you a lot of heartache in the long run.
An advantage I see is in managing/scheduling several threads. With ExecutorService, you don't have to write your own thread manager which can be plagued with bugs. This is especially useful if your program needs to run several threads at once. For example you want to execute two threads at a time, you can easily do it like this:
ExecutorService exec = Executors.newFixedThreadPool(2);
exec.execute(new Runnable() {
public void run() {
System.out.println("Hello world");
}
});
exec.shutdown();
The example may be trivial, but try to think that the "hello world" line consists of a heavy operation and you want that operation to run in several threads at a time in order to improve your program's performance. This is just one example, there are still many cases that you want to schedule or run several threads and use ExecutorService as your thread manager.
For running a single thread, I don't see any clear advantage of using ExecutorService.
The following limitations from traditional Thread overcome by Executor framework(built-in Thread Pool framework).
Poor Resource Management i.e. It keep on creating new resource for every request. No limit to creating resource. Using Executor framework we can reuse the existing resources and put limit on creating resources.
Not Robust : If we keep on creating new thread we will get StackOverflowException exception consequently our JVM will crash.
Overhead Creation of time : For each request we need to create new resource. To creating new resource is time consuming. i.e. Thread Creating > task. Using Executor framework we can get built in Thread Pool.
Benefits of Thread Pool
Use of Thread Pool reduces response time by avoiding thread creation during request or task processing.
Use of Thread Pool allows you to change your execution policy as you need. you can go from single thread to multiple thread by just replacing ExecutorService implementation.
Thread Pool in Java application increases stability of system by creating a configured number of threads decided based on system load and available resource.
Thread Pool frees application developer from thread management stuff and allows to focus on business logic.
Source
Below are some benefits:
Executor service manage thread in asynchronous way
Use Future callable to get the return result after thread completion.
Manage allocation of work to free thread and resale completed work from thread for assigning new work automatically
fork - join framework for parallel processing
Better communication between threads
invokeAll and invokeAny give more control to run any or all thread at once
shutdown provide capability for completion of all thread assigned work
Scheduled Executor Services provide methods for producing repeating invocations of runnables and callables
Hope it will help you
Is it really that expensive to create a new thread?
As a benchmark, I just created 60,000 threads with Runnables with empty run() methods. After creating each thread, I called its start(..) method immediately. This took about 30 seconds of intense CPU activity. Similar experiments have been done in response to this question. The summary of those is that if the threads do not finish immediately, and a large number of active threads accumulate (a few thousand), then there will be problems: (1) each thread has a stack, so you will run out of memory, (2) there might be a limit on the number of threads per process imposed by the OS, but not necessarily, it seems.
So, as far as I can see, if we're talking about launching say 10 threads per second, and they all finish faster than new ones start, and we can guarantee that this rate won't be exceeded too much, then the ExecutorService doesn't offer any concrete advantage in visible performance or stability. (Though it may still make it more convenient or readable to express certain concurrency ideas in code.) On the other hand, if you might be scheduling hundreds or thousands of tasks per second, which take time to run, you could run into big problems straight away. This might happen unexpectedly, e.g. if you create threads in response to requests to a server, and there is a spike in the intensity of requests that your server receives. But e.g. one thread in response to every user input event (key press, mouse motion) seems to be perfectly fine, as long as the tasks are brief.
ExecutorService also gives access to FutureTask which will return to the calling class the results of a background task once completed. In the case of implementing Callable
public class TaskOne implements Callable<String> {
#Override
public String call() throws Exception {
String message = "Task One here. . .";
return message;
}
}
public class TaskTwo implements Callable<String> {
#Override
public String call() throws Exception {
String message = "Task Two here . . . ";
return message;
}
}
// from the calling class
ExecutorService service = Executors.newFixedThreadPool(2);
// set of Callable types
Set<Callable<String>>callables = new HashSet<Callable<String>>();
// add tasks to Set
callables.add(new TaskOne());
callables.add(new TaskTwo());
// list of Future<String> types stores the result of invokeAll()
List<Future<String>>futures = service.invokeAll(callables);
// iterate through the list and print results from get();
for(Future<String>future : futures) {
System.out.println(future.get());
}
Prior to java 1.5 version, Thread/Runnable was designed for two separate services
Unit of work
Execution of that unit of work
ExecutorService decouples those two services by designating Runnable/Callable as unit of work and Executor as a mechanism to execute ( with lifecycling) the unit of work
Executor Framework
//Task
Runnable someTask = new Runnable() {
#Override
public void run() {
System.out.println("Hello World!");
}
};
//Thread
Thread thread = new Thread(someTask);
thread.start();
//Executor
Executor executor = new Executor() {
#Override
public void execute(Runnable command) {
Thread thread = new Thread(someTask);
thread.start();
}
};
Executor is just an interface which accept Runnable. execute() method can just call command.run() or working with other classes which use Runnable(e.g. Thread)
interface Executor
execute(Runnable command)
ExecutorService interface which extends Executor and adds methods for managing - shutdown() and submit() which returns Future[About] - get(), cancel()
interface ExecutorService extends Executor
Future<?> submit(Runnable task)
shutdown()
...
ScheduledExecutorService extends ExecutorService for planning executing tasks
interface ScheduledExecutorService extends ExecutorService
schedule()
Executors class which is a Factory to provide ExecutorService realisations for running async tasks[About]
class Executors
newFixedThreadPool() returns ThreadPoolExecutor
newCachedThreadPool() returns ThreadPoolExecutor
newSingleThreadExecutor() returns FinalizableDelegatedExecutorService
newWorkStealingPool() returns ForkJoinPool
newSingleThreadScheduledExecutor() returns DelegatedScheduledExecutorService
newScheduledThreadPool() returns ScheduledThreadPoolExecutor
...
Conclusion
Working with Thread is an expensive operation for CPU and memory.
ThreadPoolExecutor consist of Task Queue(BlockingQueue) and Thread Pool(Set of Worker) which have better performance and API to handle async tasks
Creating a large number of threads with no restriction to the maximum threshold can cause application to run out of heap memory. Because of that creating a ThreadPool is much better solution. Using ThreadPool we can limit the number of threads can be pooled and reused.
Executors framework facilitate process of creating Thread pools in java. Executors class provide simple implementation of ExecutorService using ThreadPoolExecutor.
Source:
What is Executors Framework

Categories

Resources