Is there a Java class such that:
(1) Executable tasks can be added via an id, where all tasks with the same id are guaranteed to never run concurrently
(2) The number of threads can be limited to a fixed amount
A naive solution using a Map would easily solve (1), but it would make (2) difficult to manage. Similarly, all thread pooling classes that I know of pull from a single queue, meaning (1) is not guaranteed.
Solutions involving external libraries are welcome.
For each id, you need a SerialExecutor, described in the documentation of java.util.concurrent.Executor. All serial executors delegate work to a ThreadPoolExecutor with given corePoolSize.
An optimized version of SerialExecutor can be found in my code samples.
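As a rough sketch of that idea, the SerialExecutor below is adapted from the example in the java.util.concurrent.Executor javadoc. KeyedSerialExecutor is a hypothetical helper (not a JDK class) that keeps one SerialExecutor per id, all delegating to one shared pool. Tasks with the same id run one at a time in submission order, while the size of the shared pool caps the total number of threads. This sketch never removes idle ids from its map.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;

// Adapted from the Executor javadoc: runs tasks one at a time, in submission
// order, by handing them to a delegate executor one after another.
class SerialExecutor implements Executor {
    private final Queue<Runnable> tasks = new ArrayDeque<>();
    private final Executor delegate;
    private Runnable active; // task currently handed to the delegate, or null

    SerialExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized void execute(Runnable r) {
        // Wrap the task so that the next one is scheduled when it finishes.
        tasks.add(() -> {
            try {
                r.run();
            } finally {
                scheduleNext();
            }
        });
        if (active == null) {
            scheduleNext();
        }
    }

    private synchronized void scheduleNext() {
        if ((active = tasks.poll()) != null) {
            delegate.execute(active);
        }
    }
}

// Hypothetical helper: one SerialExecutor per id, all sharing one bounded pool.
class KeyedSerialExecutor<K> {
    private final ConcurrentMap<K, SerialExecutor> perKey = new ConcurrentHashMap<>();
    private final Executor pool;

    KeyedSerialExecutor(Executor pool) {
        this.pool = pool;
    }

    void execute(K id, Runnable task) {
        perKey.computeIfAbsent(id, k -> new SerialExecutor(pool)).execute(task);
    }
}
```

With Executors.newFixedThreadPool(n) as the shared pool, at most n tasks run at once overall, and at most one per id. Do not shut the pool down while tasks are still queued, since each SerialExecutor hands its next task to the pool only after the previous one completes.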
If you don't find something that does this out of the box, it shouldn't be hard to roll your own. One thing you could do is wrap each task in a simple class that reads from a queue unique to each id, e.g.:
public static class SerialCaller<T> implements Callable<T> {
private final BlockingQueue<Callable<T>> delegates;
public SerialCaller(BlockingQueue<Callable<T>> delegates) {
this.delegates = delegates;
}
public T call() throws Exception {
return delegates.take().call();
}
}
It should be easy to maintain a map of ids to queues for submitting tasks. That satisfies condition (1), and then you can look for simple solutions to condition (2), such as Executors.newFixedThreadPool.
I think that the simplest solution is to just have a separate queue for each index and a separate executor (with one thread) for each queue.
The only thing you could achieve with a more complex solution would be to use fewer threads, but if the number of indexes is small and bounded that's probably not worth the effort.
Yes, there is such a library now: https://github.com/jano7/executor
int maxTasks = 10;
ExecutorService underlyingExecutor = Executors.newFixedThreadPool(maxTasks);
KeySequentialBoundedExecutor executor = new KeySequentialBoundedExecutor(maxTasks, underlyingExecutor);
Runnable task = new Runnable() {
@Override
public void run() {
// do something
}
};
executor.execute(new KeyRunnable<>("ID-1", task)); // execute the task by the underlying executor
executor.execute(new KeyRunnable<>("ID-2", task)); // execution is not blocked by the task for ID-1
executor.execute(new KeyRunnable<>("ID-1", task)); // execution starts when the previous task for ID-1 completes
In C++ you can start a thread with a deferred or asynchronous launch policy. Is there a way to replicate this functionality in Java?
auto T1 = std::async(std::launch::deferred, doSomething);
auto T2 = std::async(std::launch::async, doSomething);
Descriptions of each:
Asynchronous:
If the async flag is set, then async executes the callable object f on a new thread of execution (with all thread-locals initialized) except that if the function f returns a value or throws an exception, it is stored in the shared state accessible through the std::future that async returns to the caller.
Deferred:
If the deferred flag is set, then async converts f and args... the same way as by std::thread constructor, but does not spawn a new thread of execution. Instead, lazy evaluation is performed: the first call to a non-timed wait function on the std::future that async returned to the caller will cause the copy of f to be invoked (as an rvalue) with the copies of args... (also passed as rvalues) in the current thread (which does not have to be the thread that originally called std::async). The result or exception is placed in the shared state associated with the future and only then it is made ready. All further accesses to the same std::future will return the result immediately.
See the documentation for details.
Future
First of all, we have to observe that std::async is a tool to execute a given task and return a std::future object that holds the result of the computation once it's available.
For example, if result is such a future, we can call result.get() to block and wait for the result to arrive. Also, if the computation throws an exception, the exception is stored and rethrown to us as soon as we call result.get().
Java provides similar classes, the interface is Future and the most relevant implementation is CompletableFuture.
std::future#get translates roughly to Future#get. Even the exceptional behavior is very similar: while C++ rethrows the original exception upon calling get, Java throws an ExecutionException which has the original exception set as its cause.
How to obtain a Future?
In C++ you create your future object using std::async. In Java you could use one of the many static helper methods in CompletableFuture. In your case, the most relevant are
CompletableFuture#runAsync, if the task does not return any result and
CompletableFuture#supplyAsync, if the task will return a result upon completion
So in order to create a future that just prints Hello World!, you could for example do
CompletableFuture<Void> task = CompletableFuture.runAsync(() -> System.out.println("Hello World!"));
/*...*/
task.get();
Java not only has lambdas but also method references. Let's say you have a method that computes a heavy math task:
class MyMath {
static int compute() {
// Very heavy, duh
return (int) Math.pow(2, 5);
}
}
Then you could create a future that returns the result once it's available as
CompletableFuture<Integer> task = CompletableFuture.supplyAsync(MyMath::compute);
/*...*/
Integer result = task.get();
async vs deferred
In C++, you have the option to specify a launch policy which dictates the threading behavior for the task. Let us put the memory promises C++ makes aside, because in Java you do not have that much control over memory.
The difference is that async immediately schedules the task to run on a separate thread. The result becomes available at some point and is computed while you continue working in your main thread. Whether this is a brand-new thread or a reused one is up to the implementation and is not specified.
deferred behaves completely differently. Basically nothing happens when you call std::async: no extra thread is created and the task is not computed yet. The result is not made available in the meantime at all. However, as soon as you call get, the task is computed in your current thread and returns a result, basically as if you had called the function directly yourself, without any async utilities at all.
std::launch::async in Java
That said, let's focus on how to translate this behavior to Java. Let's start with async.
This is the simple one, as it is basically the default and intended behavior offered in CompletableFuture. So you just do runAsync or supplyAsync, depending on whether your method returns a result or not. Let me show the previous examples again:
// without result
CompletableFuture<Void> task = CompletableFuture.runAsync(() -> System.out.println("Hello World!"));
/*...*/ // the task is computed in the meantime in a different thread
task.get();
// with result
CompletableFuture<Integer> task = CompletableFuture.supplyAsync(MyMath::compute);
/*...*/
Integer result = task.get();
Note that there are also overloads of these methods that accept an Executor, which you can use if you have your own thread pool and want CompletableFuture to use it instead of its default one (see here for more details).
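A minimal sketch of the Executor overload, assuming a fixed pool of two threads. CustomExecutorExample is a hypothetical class, and its compute() is a stand-in for MyMath.compute() from above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class CustomExecutorExample {
    // Stand-in for MyMath.compute() from the text.
    static int compute() {
        return (int) Math.pow(2, 5);
    }

    // Runs the work via supplyAsync(Supplier, Executor) on our own pool
    // instead of CompletableFuture's default executor.
    static <T> T supplyOnOwnPool(Supplier<T> work) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<T> task = CompletableFuture.supplyAsync(work, pool);
            return task.get(); // blocks until a pool thread has produced the result
        } finally {
            pool.shutdown(); // let the pool's threads exit
        }
    }
}
```

For example, supplyOnOwnPool(CustomExecutorExample::compute) yields 32, computed on one of the pool's threads (the default thread factory names them starting with "pool-").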
std::launch::deferred in Java
I experimented quite a bit trying to mimic this behavior with CompletableFuture, but it does not seem to be possible without creating your own implementation (please correct me if I am wrong, though). Whatever I tried, the task either ran right away upon creation or not at all.
So I would just propose to use the underlying task interface that you gave to CompletableFuture, for example Runnable or Supplier, directly. In our case, we might also use IntSupplier to avoid the autoboxing.
Here are the two code examples again, but this time with deferred behavior:
// without result
Runnable task = () -> System.out.println("Hello World!");
/*...*/ // the task is not computed in the meantime, no threads involved
task.run(); // the task is computed now
// with result
IntSupplier task = MyMath::compute;
/*...*/
int result = task.getAsInt();
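If you do want a Future-like object with deferred semantics, a small wrapper is enough. The hypothetical Deferred<T> class below (not a JDK type) does nothing on construction; the first get() runs the Supplier on the calling thread and caches the result, or the RuntimeException it threw, for later calls. This roughly mirrors std::launch::deferred; it is a sketch, not a full Future implementation (no cancellation or timed waits).

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical Deferred<T>: lazy, compute-once, runs on whichever thread calls get() first.
final class Deferred<T> {
    private Supplier<? extends T> supplier;
    private T value;
    private RuntimeException failure;
    private boolean done;

    Deferred(Supplier<? extends T> supplier) {
        this.supplier = Objects.requireNonNull(supplier);
    }

    synchronized T get() {
        if (!done) {
            try {
                value = supplier.get(); // evaluated lazily, in the current thread
            } catch (RuntimeException e) {
                failure = e; // stored, like the shared state of a std::future
            }
            done = true;
            supplier = null; // the supplier is no longer needed
        }
        if (failure != null) {
            throw failure;
        }
        return value;
    }
}
```

For example, new Deferred<>(MyMath::compute) does no work until its first get().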
Modern multithreading in Java
As a final note I would like to give you a better idea how multithreading is typically used in Java nowadays. The provided facilities are much richer than what C++ offers by default.
Ideally, you should design your system so that you do not have to care about such low-level threading details. You create an automatically managed thread pool using Executors and then launch your initial task against it (or use the default executor provided by CompletableFuture). After that, you just set up an operation pipeline on the future object, similar to the Stream API, and then wait on the final future object.
For example, let us suppose you have a list of file names List<String> fileNames and you want to
read the file
validate its content, skip it if it's invalid
compress the file
upload the file to some web server
check the response status code
and count how many were invalid, unsuccessful, and successful. Suppose you have some methods like
class FileUploader {
static byte[] readFile(String name) { /*...*/ }
static byte[] requireValid(byte[] content) throws IllegalStateException { /*...*/ }
static byte[] compressContent(byte[] content) { /*...*/ }
static int uploadContent(byte[] content) { /*...*/ }
}
then we can do so easily by
AtomicInteger successful = new AtomicInteger();
AtomicInteger notSuccessful = new AtomicInteger();
AtomicInteger invalid = new AtomicInteger();
// Set up the pipeline
List<CompletableFuture<Void>> tasks = fileNames.stream()
    .map(name -> CompletableFuture
        .completedFuture(name)
        .thenApplyAsync(FileUploader::readFile)
        .thenApplyAsync(FileUploader::requireValid)
        .thenApplyAsync(FileUploader::compressContent)
        .thenApplyAsync(FileUploader::uploadContent)
        .<Void>handleAsync((statusCode, exception) -> {
            AtomicInteger counter;
            if (exception == null) {
                counter = statusCode == 200 ? successful : notSuccessful;
            } else {
                counter = invalid;
            }
            counter.incrementAndGet();
            return null; // handleAsync takes a BiFunction, so a value must be returned
        })
    ).collect(Collectors.toList());
// Wait until all tasks are done
tasks.forEach(CompletableFuture::join);
// Print the results
System.out.printf("Successful %d, not successful %d, invalid %d%n", successful.get(), notSuccessful.get(), invalid.get());
The big benefit of this approach is that it can keep the available hardware busy: all tasks are executed dynamically and independently, managed by an automatic pool of threads, and you just wait until everything is done.
For asynchronous launch of a thread, in modern Java prefer the use of a high-level java.util.concurrent.ExecutorService.
One way to obtain an ExecutorService is through java.util.concurrent.Executors. Different behaviors are available for ExecutorServices; the Executors class provides methods for some common cases.
Once you have an ExecutorService, you can submit Runnables and Callables to it.
Future<MyReturnValue> myFuture = myExecutorService.submit(myTask);
If I understood you correctly, maybe something like this:
private static CompletableFuture<Void> deferred(Runnable run) {
CompletableFuture<Void> future = new CompletableFuture<>();
future.thenRun(run);
return future;
}
private static CompletableFuture<Void> async(Runnable run) {
return CompletableFuture.runAsync(run);
}
And then using them like:
public static void main(String[] args) throws Exception {
CompletableFuture<Void> def = deferred(() -> System.out.println("run"));
def.complete(null);
System.out.println(def.join());
CompletableFuture<Void> async = async(() -> System.out.println("run async"));
async.join();
}
To get something like a deferred thread, you might try running a thread at a reduced priority.
First, in Java it's often idiomatic to describe the work as a task, typically a Runnable. You can also use the Callable<T> interface, which allows the task to return a value (Runnable can't).
public class MyTask implements Runnable {
@Override
public void run() {
System.out.println( "hello thread." );
}
}
Then just create a thread. In Java threads normally wrap the task they execute.
MyTask myTask = new MyTask();
Thread t = new Thread( myTask );
t.setPriority( Math.max( Thread.MIN_PRIORITY, Thread.currentThread().getPriority()-1 ) );
t.start();
This should not run until a core is available to do so, which means it shouldn't run until the current thread blocks or runs out of things to do. However, you're at the mercy of the OS scheduler here, so this behavior is not guaranteed. Most OSs ensure that all threads run eventually, so if the current thread runs for a long time without blocking, the OS will start executing the new thread anyway.
setPriority() can throw a SecurityException if you're not allowed to set the priority of a thread (uncommon but possible). So just be aware of that minor inconvenience.
For an asynchronous task with a Future I would use an executor service. The helper methods in the class Executors are a convenient way to do this.
First make your task as before.
public class MyCallable implements Callable<String> {
@Override
public String call() {
return "hello future thread.";
}
}
Then use an executor service to run it:
MyCallable myCallable = new MyCallable();
ExecutorService es = Executors.newCachedThreadPool();
Future<String> f = es.submit( myCallable );
You can use the Future object to query the task, determine its status and get the value it returns. You will need to shut down the executor to stop all of its threads before exiting the JVM.
es.shutdown();
I've tried to write this code as simply as possible, without the use of lambdas or clever use of generics. The above should show you what those lambdas are actually implementing. However it's usually considered better to be a bit more sophisticated when writing code (and a bit less verbose) so you should investigate other syntax once you feel you understand the above.
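Putting this answer's pieces together, here is a sketch of a complete round trip, still without lambdas: submit the callable, block on get(), then shut the pool down. FutureDemo is a hypothetical wrapper class used only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MyCallable implements Callable<String> {
    @Override
    public String call() {
        return "hello future thread.";
    }
}

class FutureDemo {
    static String runOnce() throws Exception {
        ExecutorService es = Executors.newCachedThreadPool();
        try {
            Future<String> f = es.submit(new MyCallable());
            // get() blocks until call() has returned on a pool thread
            return f.get();
        } finally {
            es.shutdown(); // no new tasks; idle threads are allowed to exit
        }
    }
}
```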
I have the following classes:
WorkerTask.java
public interface WorkerTask extends Task {
// Constants
public static final short WORKERTASK_SPIDER = 1;
public static final short WORKERTASK_PARSER = 2;
public static final short WORKERTASK_PRODUCT = 3;
public int getType();
}
WorkerPool.java
class WorkerPool {
private ThreadPoolExecutor executorPool_;
//----------------------------------------------------
public WorkerPool(int poolSize)
{
executorPool_ = new ThreadPoolExecutor(
poolSize,5,10,TimeUnit.SECONDS,
new ArrayBlockingQueue<Runnable>(10000000,false),
Executors.defaultThreadFactory()
);
}
//----------------------------------------------------
public void assign(WorkerTask workerTask) {
executorPool_.execute(new WorkerThread(workerTask));
}
//----------------------------------------------------
public void removeTasks(int siteID) {
executorPool_.getQueue().removeIf(...);
}
}
I want to call the method removeTasks to remove certain pending tasks, but I have no idea how to use the method removeIf. Its documentation says: Removes all of the elements of this collection that satisfy the given predicate, but I have no idea how to create the Predicate parameter. Any idea?
If you had a Queue<WorkerTask>, you could do something like this:
queue.removeIf(task -> task.getSiteID() == siteID)
There are several problems. One problem is that the queue you get from getQueue() is BlockingQueue<Runnable> and not Queue<WorkerTask>. If you're submitting Runnable instances to the pool, the queue might contain references to your actual tasks; if so, you could downcast them to WorkerTask. However, this isn't guaranteed. Furthermore, the class doc for ThreadPoolExecutor says (under "Queue maintenance"):
Method getQueue() allows access to the work queue for purposes of monitoring and debugging. Use of this method for any other purpose is strongly discouraged. Two supplied methods, remove(Runnable) and purge() are available to assist in storage reclamation when large numbers of queued tasks become cancelled.
Looking at the remove(Runnable) method, its doc says
It may fail to remove tasks that have been converted into other forms before being placed on the internal queue.
This suggests that you should hang onto the Runnable instances that have been submitted in order to call remove() on them later. Or, call submit(Runnable) to get a Future and keep those instances around in order to cancel them.
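A sketch of the second option, assuming a hypothetical SiteTaskTracker wrapper around your ThreadPoolExecutor: it records the Future of every submitted task per site ID, and cancelPending() calls cancel(false) on them and then purge(). cancel(false) does not interrupt a task that has already started; it only prevents queued tasks from running. The sketch also never prunes completed Futures, and it does not close the submit/cancel race described in the next paragraph.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical: remembers the Future of every task submitted for a site.
class SiteTaskTracker {
    private final ThreadPoolExecutor pool;
    private final Map<Integer, List<Future<?>>> bySite = new ConcurrentHashMap<>();

    SiteTaskTracker(ThreadPoolExecutor pool) {
        this.pool = pool;
    }

    void submit(int siteId, Runnable task) {
        Future<?> f = pool.submit(task);
        bySite.computeIfAbsent(siteId, k -> new CopyOnWriteArrayList<>()).add(f);
    }

    // Cancels every not-yet-finished task for the site, then asks the pool
    // to drop cancelled tasks from its work queue. Returns how many were cancelled.
    int cancelPending(int siteId) {
        int cancelled = 0;
        List<Future<?>> futures = bySite.remove(siteId);
        if (futures != null) {
            for (Future<?> f : futures) {
                if (f.cancel(false)) {
                    cancelled++;
                }
            }
        }
        pool.purge();
        return cancelled;
    }
}
```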
But there is also a second problem that probably renders this approach inadequate. Suppose you've found a way to remove or cancel the matching tasks from the queue. Another thread might have decided to submit a new task that matches, but hasn't submitted it yet. There's a race condition here. You might be able to cancel the enqueued tasks, but after you've done so, you can't guarantee that new matching tasks haven't been submitted.
Here's an alternative approach. Presumably, when you cancel (or whatever) a site ID, there's some logic somewhere to stop submitting new tasks that match that site ID. The problem is how to deal with matching tasks that are "in-flight," that is, that are in the queue or are about to be enqueued.
Instead of trying to cancel the matching tasks, change the task so that if its site ID has been canceled, the task turns into a no-op. You could record the cancellation of a site ID in, say, a ConcurrentHashMap. Any task would check this map before beginning its work, and if the site ID is present, it'd simply return. Adding a site ID to the map would have the immediate effect of ensuring that no new task on that site ID will commence. (Tasks that have already started will run to completion.) Any in-flight tasks will eventually drain from the queue without causing any actual work to occur.
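A minimal sketch of this no-op approach, assuming a hypothetical SiteCancellation helper. The set returned by ConcurrentHashMap.newKeySet() is backed by a ConcurrentHashMap, and guard() wraps each task so it checks the set just before doing any work.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for turning tasks of cancelled sites into no-ops.
class SiteCancellation {
    private final Set<Integer> cancelledSites = ConcurrentHashMap.newKeySet();

    void cancelSite(int siteId) {
        cancelledSites.add(siteId);
    }

    // Wrap the real work; the check happens when the task starts, not when it is queued.
    Runnable guard(int siteId, Runnable work) {
        return () -> {
            if (cancelledSites.contains(siteId)) {
                return; // site cancelled while the task was waiting: do nothing
            }
            work.run();
        };
    }
}
```

You would submit guard(siteId, realTask) to the pool instead of realTask itself.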
A predicate is a function that receives an input and returns a boolean value.
If you are using Java 8 you can use lambda expressions:
elem -> elem.id == siteID
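To make the Predicate concrete for a queue of Runnable, the sketch below assumes a hypothetical SiteTask class that carries a site id; the predicate only downcasts elements that actually are SiteTask instances. It works on any Queue<Runnable>, but the caveats above about using ThreadPoolExecutor.getQueue() this way still apply.

```java
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical task type that carries a site id.
class SiteTask implements Runnable {
    final int siteId;

    SiteTask(int siteId) {
        this.siteId = siteId;
    }

    @Override
    public void run() {
        // real work would go here
    }
}

class PredicateDemo {
    // Removes every SiteTask for the given site and returns how many were removed.
    static int removeSite(Queue<Runnable> queue, int siteId) {
        // A Predicate<Runnable> is just a function from Runnable to boolean.
        Predicate<Runnable> matches =
                r -> r instanceof SiteTask && ((SiteTask) r).siteId == siteId;
        int before = queue.size();
        queue.removeIf(matches);
        return before - queue.size();
    }
}
```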
I'm using Spring ThreadPoolTaskExecutor in order to execute my threads.
I want to group my threads in several groups, and that every group will have different max allowed threads.
For example, something like this:
for (MyTask myTask : myTaskList){
threadPoolTaskExecutor.setMaxThreadsForGroup(myTask.getGroupName(), myTask.getMaxThreads());
threadPoolTaskExecutor.execute(myTask, myTask.getGroupName());
}
Somehow the threadPoolTaskExecutor should know to allow only myTask.getMaxThreads() to every group named myTask.getGroupName(), and the max threads in all tasks all together should not exceed what defined for threadPoolTaskExecutor in the applicationContext.xml
Is it possible to do this in a simple way?
Thanks
I can see two ways of doing this. The first (and simpler) way is to create a Map<String, ExecutorService> which maps your group names to a specific executor with a max thread limit. This doesn't satisfy your requirement to have a max number of threads overall, but I would argue that this requirement might be an unreasonable one since you can never have total control over the number of threads running in your Java application anyway.
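A sketch of the first approach, assuming a hypothetical GroupExecutors class built on plain JDK executors rather than Spring's ThreadPoolTaskExecutor: each group name gets its own fixed pool the first time it is seen, so the per-group limit holds, but there is no overall cap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical: one fixed-size pool per group name, created on first use.
class GroupExecutors {
    private final Map<String, ExecutorService> byGroup = new ConcurrentHashMap<>();

    void execute(String group, int maxThreads, Runnable task) {
        // maxThreads only matters the first time a group is seen
        byGroup.computeIfAbsent(group, g -> Executors.newFixedThreadPool(maxThreads))
               .execute(task);
    }

    void shutdownAll() {
        byGroup.values().forEach(ExecutorService::shutdown);
    }
}
```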
The second way, while more complex, gives you more control. You can have a single executor with a max pool size, and instead of submitting your jobs to it directly, you submit worker tasks which take the real tasks off a BlockingQueue and process them until there are none left. The number of worker tasks you submit will be equal to the group thread limit. The pseudo-code might look like this:
ExecutorService executor = ...
int groupThreadLimit = 3;
final BlockingQueue<Runnable> groupTaskQueue = ...;
// Add all your tasks to the groupTaskQueue.
for(int i = 0; i < groupThreadLimit; i++) {
executor.execute(new Runnable() {
public void run() {
while(true) {
Runnable r = groupTaskQueue.poll();
if(r == null) {
return; // All tasks complete or being processed. Queue empty.
}
r.run();
}
}
});
}
The only slight drawback with this technique is that once the worker tasks start, they won't yield until all the subtasks are complete. This might be bad if you have a fair usage policy and want to avoid starvation.
This question already has answers here: How to wait for all threads to finish, using ExecutorService? (Closed 5 years ago.)
I need to submit a number of tasks and then wait until all results are available. Each of them adds a String to a Vector (which is synchronized by default). Then I need to start a new task for each result in the Vector, but only once all the previous tasks have finished.
I want to use a Java Executor; in particular I tried using Executors.newFixedThreadPool(100) in order to use a fixed number of threads (I have a variable number of tasks, which can be 10 or 500), but I'm new to executors and I don't know how to wait for task termination.
This is something like a pseudocode of what my program needs to do:
ExecutorService e = Executors.newFixedThreadPool(100);
while(true){
/*do something*/
for(...){
<start task>
}
<wait for all task termination>
for each String in result{
<start task>
}
<wait for all task termination>
}
I can't do a e.shutdown because I'm in a while(true) and I need to reuse the executorService...
Can you help me? Can you suggest a guide or book about Java executors?
The ExecutorService gives you a mechanism to execute multiple tasks simultaneously and get a collection of Future objects back (representing the asynchronous computation of the task).
Collection<Callable<Object>> tasks = new LinkedList<Callable<Object>>();
//populate tasks
for (Future<Object> f : executorService.invokeAll(tasks)) { //invokeAll() blocks until all of the given tasks have completed
f.get();
}
If you have Runnables instead of Callables, you can easily turn a Runnable into a Callable<Object> using the method:
Callable<Object> c = Executors.callable(runnable);
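Applied to the two-phase loop from the question, a sketch might look like this. TwoPhase and its string-building tasks are hypothetical stand-ins for the real work; collecting results from the Futures replaces the shared Vector, and the same pool is reused across calls without being shut down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class TwoPhase {
    // Phase 1 produces strings; phase 2 starts one task per phase-1 result.
    static List<String> run(ExecutorService pool, int n) throws Exception {
        List<Callable<String>> first = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int id = i;
            first.add(() -> "item-" + id);
        }
        List<String> intermediate = new ArrayList<>();
        for (Future<String> f : pool.invokeAll(first)) { // waits for every phase-1 task
            intermediate.add(f.get());
        }

        List<Callable<String>> second = new ArrayList<>();
        for (String s : intermediate) {
            second.add(() -> s.toUpperCase());
        }
        List<String> results = new ArrayList<>();
        for (Future<String> f : pool.invokeAll(second)) { // waits for every phase-2 task
            results.add(f.get());
        }
        return results;
    }
}
```

In the question's while(true) loop you would call something like run(pool, ...) on every iteration and only shut the pool down once the loop ends.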
Can you suggest me a guide/book about java executors??
I can answer this part:
Java Concurrency in Practice by Brian Goetz (with Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes and Doug Lea) is most likely your best bet.
It's not only about executors, though; it covers the java.util.concurrent package in general, as well as basic concurrency concepts and techniques, and some advanced topics such as the Java memory model.
Rather than submitting Runnables or Callables to an Executor directly and storing the corresponding Future return values I'd recommend using a CompletionService implementation to retrieve each Future when it completes. This approach decouples the production of tasks from the consumption of completed tasks, allowing for example new tasks to originate on a producer thread over a period of time.
Collection<Callable<Result>> workItems = ...
ExecutorService executor = Executors.newSingleThreadExecutor();
CompletionService<Result> compService = new ExecutorCompletionService<Result>(executor);
// Add work items to Executor.
for (Callable<Result> workItem : workItems) {
compService.submit(workItem);
}
// Consume results as they complete (this would typically occur on a different thread).
for (int i=0; i<workItems.size(); ++i) {
Future<Result> fut = compService.take(); // Will block until a result is available.
Result result = fut.get(); // Extract result; this will not block.
}
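A self-contained sketch of the same pattern, assuming tasks that simply square a number (CompletionDemo is hypothetical). Results are consumed in completion order, which need not match submission order, and the executor is shut down once every result has been taken.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionDemo {
    // Submits n tasks that square their index and sums the results as they complete.
    static int sumOfSquares(int n) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            CompletionService<Integer> cs = new ExecutorCompletionService<>(executor);
            for (int i = 1; i <= n; i++) {
                final int x = i;
                cs.submit(() -> x * x);
            }
            int sum = 0;
            for (int i = 0; i < n; i++) {
                sum += cs.take().get(); // take() blocks until some task has finished
            }
            return sum;
        } finally {
            executor.shutdown();
        }
    }
}
```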
When you submit to an executor service, you'll get a Future object back.
Store those objects in a collection, and then call get() on each in turn. get() blocks until the underlying job completes, and so the result is that calling get() on each will complete once all underlying jobs have finished.
e.g.
Collection<Future> futures = ...
for (Future f : futures) {
Object result = f.get();
// maybe do something with the result. This could be a
// genericised Future<T>
}
System.out.println("Tasks completed");
Once all these have completed, then begin your second submission. Note that this might not be an optimal use of your thread pool, since it will become dormant, and then you're re-populating it. If possible try and keep it busy doing stuff.
ExecutorService executor = ...
//submit tasks
executor.shutdown(); // previously submitted tasks are executed,
// but no new tasks will be accepted
while(!executor.awaitTermination(1, TimeUnit.SECONDS))
;
There's no easy way to do what you want without creating custom ExecutorService.
This is how I normally iterate over a collection:
for(Iterator iterator = collectionthing.iterator(); iterator.hasNext();){
Object item = iterator.next(); // process item
}
I believe most of us do this. Is there a better approach than iterating sequentially? Is there a Java library that can make this run in parallel on a multi-core CPU?
Looking forward to your feedback.
Java's multithreading is quite low level in this respect. The best you could do is something like this:
ExecutorService executor = Executors.newFixedThreadPool(10);
for (final Object item : collectionThingy) {
executor.submit(new Runnable() {
@Override
public void run() {
// do stuff with item
}
});
}
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
This is Java 6 code. If running on Java 5, drop the @Override annotation (in Java 5 it can't be used on methods that implement interface methods, but in Java 6 it can).
What this does is create a task for each item in the collection. A thread pool (size 10) is created to run those tasks; you can replace it with any executor you want. Lastly, the thread pool is shut down and the code blocks until all of the tasks have finished.
The last call throws InterruptedException, which you will need to catch or declare. ExecutionException only comes into play if you keep the Futures returned by submit() and call get() on them.
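For reference, a compilable sketch of the executor approach above with the waiting and interruption handled explicitly. ParallelIteration is hypothetical, and summing the items stands in for "do stuff with item"; a bounded one-minute wait replaces Long.MAX_VALUE so a stuck task cannot block the caller forever.

```java
import java.util.Collection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class ParallelIteration {
    // Hypothetical example: "do stuff with item" becomes adding it to a shared total.
    static long sumInParallel(Collection<Integer> items) {
        ExecutorService executor = Executors.newFixedThreadPool(10);
        final AtomicLong total = new AtomicLong(); // safe to update from many threads
        for (final Integer item : items) {
            executor.submit(new Runnable() {
                @Override
                public void run() {
                    total.addAndGet(item);
                }
            });
        }
        executor.shutdown(); // accept no new tasks; already submitted ones still run
        try {
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // restore the interrupt flag for callers
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return total.get();
    }
}
```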
In most cases, the added complexity wouldn't be worth the potential performance gain. However, if you needed to process a Collection in multiple threads, you could possibly use Executors to do this, which would run all the tasks in a pool of threads:
int numThreads = 4;
ExecutorService threadExecutor = Executors.newFixedThreadPool(numThreads);
for(Iterator iterator = collectionthing.iterator(); iterator.hasNext();){
Runnable runnable = new CollectionThingProcessor(iterator.next());
threadExecutor.execute(runnable);
}
As part of the fork-join framework JDK7 should (although not certain) have parallel arrays. This is designed to allow efficient implementation of certain operations across arrays on many-core machines. But just cutting the array into pieces and throwing it at a thread pool will also work.
Sorry, Java does not have this sort of language-level support for automatic parallelism. If you want it, you will have to implement it yourself using libraries and threads.