Ensure unique tasks in ExecutorService - java

I have a scenario in which the same task gets assigned to an ExecutorService multiple times. I want to avoid that. Is there a way to do it?
I have Tasks with a String constructor.
Task task1 = new Task("A");
Then I execute this task:
executor.execute(task1);
Then I create another task with the same string.
Task task2 = new Task("A");
Let's say I cannot prevent this from happening.
Now I execute this task:
executor.execute(task2);
I want only one of these tasks to be executed, since both tasks are equivalent.
How?

At first, I would have implemented a queue interface and passed it to the executor service. My implementation would use a HashSet to remember which tasks have already been seen, next to a regular collection that holds the tasks as the queue requires. Adding to my queue would therefore involve checking the HashSet first. Maybe a LinkedHashSet that rolls off the eldest entries...
However, ExecutorService.submit() doesn't fully rely on the queue. See the Javadoc of ThreadPoolExecutor. The queue could refuse the task, but the executor could spawn a thread instead and still accept the task. Besides, a custom queue implies you can intervene in the executor's construction anyway.
So, presuming you control the executor's class, it seems you must extend an ExecutorService and keep that knowledge there instead of in a custom queue. You only need to override submit() and throw the rejection exception (RejectedExecutionException) for duplicates.
Of course, this reveals the rejection to the submitter. You should deal with that gracefully. You cannot hide this fact because you cannot return a Future if there was no submission. Unless you wire up the old Future to the new one (your knowledge hashmap contains Futures). This may cause difficulties though since every Future has a 'done' state and is cancel-able... Your own Future would be delegating to the original Futures. I think task rejection is simpler.
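A minimal sketch of the rejection approach, under some assumptions: the Task class with a String key is hypothetical (modelled on the question), and instead of subclassing an executor this sketch wraps one (composition), which is a swapped-in simplification of "extend an ExecutorService". Keys are remembered forever here; a real version would probably remove a key once its task finishes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class UniqueTaskDemo {

    /** Hypothetical task type: two tasks with the same key count as duplicates. */
    static final class Task implements Runnable {
        final String key;
        Task(String key) { this.key = key; }
        @Override public void run() { System.out.println("running " + key); }
    }

    /** Wraps an ExecutorService and rejects any key it has already accepted. */
    static final class UniqueTaskExecutor {
        private final ExecutorService delegate;
        private final Set<String> seen = ConcurrentHashMap.newKeySet();

        UniqueTaskExecutor(ExecutorService delegate) { this.delegate = delegate; }

        void execute(Task task) {
            // add() is atomic and returns false if the key was already present.
            if (!seen.add(task.key)) {
                throw new RejectedExecutionException("duplicate task: " + task.key);
            }
            try {
                delegate.execute(task);
            } catch (RejectedExecutionException e) {
                seen.remove(task.key); // the pool itself refused it, so forget the key
                throw e;
            }
        }
    }

    /** Submits one task per key and returns how many were rejected as duplicates. */
    static int countRejected(String... keys) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        UniqueTaskExecutor executor = new UniqueTaskExecutor(pool);
        int rejected = 0;
        for (String key : keys) {
            try {
                executor.execute(new Task(key));
            } catch (RejectedExecutionException e) {
                rejected++; // the submitter has to deal with the rejection
            }
        }
        pool.shutdown(); // already accepted tasks still run
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + countRejected("A", "A", "B"));
    }
}
```

As the answer notes, the submitter sees the rejection and has to handle it; the sketch simply counts it.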

Related

Does completableFuture in Java 8 scale to multiple cores?

Let's say I have a single thread that calls a bunch of methods that return CompletableFuture, I add all of them to a list, and at the end I call CompletableFuture.allOf(...) on the list and join() the result. Can the futures in the list scale to multiple cores? In other words, are the futures scheduled onto multiple cores to take advantage of parallelism?
A CompletableFuture represents a task that is associated with some Executor. If you do not specify an executor explicitly (for example, you use CompletableFuture.supplyAsync(Supplier) instead of CompletableFuture.supplyAsync(Supplier, Executor)), then the common ForkJoinPool is used as the executor. This pool can be obtained via ForkJoinPool.commonPool(), and by default its parallelism is one less than the number of available processors reported by Runtime.availableProcessors() (that number counts hardware threads, so it is typically twice the core count when hyperthreading is enabled). So in general, yes, if you use all defaults, multiple cores will be used for your completable futures.
CompletableFuture itself is not scheduled to a thread (or core). Tasks are. To achieve parallelism, you need to create multiple tasks. If your methods which return CompletableFuture submit tasks like
return CompletableFuture.supplyAsync(this::calculate);
then multiple tasks are started.
If they just create CompletableFuture like
return new CompletableFuture();
then no tasks are started and no parallelism present.
CompletableFuture objects created by CompletableFuture{handle, thenCombine, thenCompose, thenApply} are not connected to parallel tasks, so parallelism is not increased.
CompletableFuture objects created by CompletableFuture{handleAsync, thenCombineAsync, thenComposeAsync, thenApplyAsync} are connected to parallel tasks, but these tasks are executed strictly after the task corresponding to this CompletableFuture object, so they cannot increase parallelism.
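To illustrate the supplyAsync case, here is a small sketch that starts several independent tasks on the default common pool and waits for all of them. The calculate method and the sumAll helper are hypothetical stand-ins for the methods mentioned in the question.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class AllOfDemo {

    /** Hypothetical CPU-bound work standing in for "calculate". */
    static long calculate(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    /** Starts one asynchronous task per index and waits for all of them. */
    static long sumAll(int tasks, int n) {
        // Each supplyAsync call submits a separate task to ForkJoinPool.commonPool().
        List<CompletableFuture<Long>> futures = IntStream.range(0, tasks)
                .mapToObj(i -> CompletableFuture.supplyAsync(() -> calculate(n)))
                .collect(Collectors.toList());

        // allOf takes the futures themselves (as an array), not a list size.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

        // Every future is complete at this point, so join() does not block.
        return futures.stream().mapToLong(CompletableFuture::join).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumAll(8, 1_000));
    }
}
```

Whether the tasks actually run on different cores depends on the pool's parallelism and on the machine; the code only makes that possible by creating independent tasks.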
Having a bunch of CompletableFutures doesn't tell you anything about how they will be completed.
There are two kinds of completion:
Explicit, through cancel, complete, completeExceptionally, obtrudeException and obtrudeValue on an instance, or by obtaining a future with the completedFuture static method
Implicit, through executing a provided function, whether it returns normally or exceptionally, or through the completion of a previous future
For instance:
exceptionally completes normally without running the provided function if the previous future completes normally
Every other chaining method, except for handle and whenComplete and their *Async variations, complete exceptionally without running the provided function if the previous future, or any of the previous futures in the combining (*Both*, *Combine* and *Either*) methods, complete exceptionally
Otherwise, the future completes when the provided function runs and completes either normally or exceptionally
If the futures you have were created without a function or they're not chained to another future, or in other words, if they don't have a function associated, then they will only complete explicitly, and as such it makes no sense to say if this kind of completable future runs, much less if they may use multiple threads.
On the other hand, if the futures have a function, it depends on how they were created:
If they're all independent and use the ForkJoinPool.commonPool() (or a cached thread pool or similar) as the executor, then they will probably run in parallel, possibly using as many active threads as the number of cores
If they all have a dependency on each other (except for one) or if the executor is single-threaded, then they'll run one at a time
Anything in between is valid, such as:
some futures may depend on each other, or on some other internal future you have no knowledge of
some futures may have been created with e.g. a fixed thread pool executor where you'll see a limited degree of concurrently running tasks
Invoking join does not tell a future to start running, it just waits for it to complete.
So, to finally answer your question:
If the future has a function associated, then it may already be running, it may or may not run its function depending on how it was chained and the completion of the previous future, and it may never run if it doesn't have a function or if it was completed before it had a chance to run its function
The futures that are already running or that will run do so:
On the provided executor when chained with the *Async methods or when created with the *Async static methods that take an executor
On the ForkJoinPool.commonPool() when chained with the *Async methods or when created with the *Async static methods that don't take an executor
On the same thread as where the future they depend on is completed when chained without the *Async methods in case the future is not yet complete
On the current thread if the future they depend on is already completed when chained without the *Async methods
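Two of the cases listed above can be observed directly. The following sketch assumes a hypothetical single-thread executor whose thread is named "my-pool", and records the thread name seen by each chained function: one chained without *Async on an already completed future, one chained with *Async and an explicit executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WhereItRunsDemo {

    /** Returns { caller thread, thread used by thenApply, thread used by thenApplyAsync }. */
    static String[] threadNames() {
        // Hypothetical executor; the thread name is only there to make it recognisable.
        ExecutorService pool =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "my-pool"));
        try {
            String caller = Thread.currentThread().getName();
            CompletableFuture<String> done = CompletableFuture.completedFuture("x");

            // The source is already complete and the chaining is non-async,
            // so the function runs right here on the current thread.
            String sync = done.thenApply(v -> Thread.currentThread().getName()).join();

            // *Async with an executor: the function runs on that executor.
            String async = done
                    .thenApplyAsync(v -> Thread.currentThread().getName(), pool)
                    .join();

            return new String[] {caller, sync, async};
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] names = threadNames();
        System.out.println("caller=" + names[0]
                + " thenApply=" + names[1]
                + " thenApplyAsync=" + names[2]);
    }
}
```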
In my opinion, the explicit completion methods should have been segregated into, e.g., a CompletionSource interface, with, e.g., a CompletableFutureSource class that implements it and provides a future, much like the relation between .NET's TaskCompletionSource and its Task.
As things are now, most probably you can tamper with the completable futures you have by completing them in a way not originally intended. For this reason, you should not use a CompletableFuture after you expose it publicly; it's your API user's CompletableFuture from then on.

Queries on Java Future and RejectionHandler

I had some queries regarding Future usage. Please go through below example before addressing my queries.
http://javarevisited.blogspot.in/2015/01/how-to-use-future-and-futuretask-in-Java.html
The main purpose of using thread pools and Executors is to execute tasks asynchronously without blocking the main thread. But once you use a Future, it blocks the calling thread. Do we have to create a separate thread or thread pool to analyse the results of Callable tasks? Or is there a better solution?
Since the Future call blocks the caller, is it worth using this feature? If I want to analyse the result of a task, I can make a synchronous call and check its result without a Future.
What is the best way to handle rejected tasks with a RejectedExecutionHandler? If a task is rejected, is it good practice to submit it to another thread or thread pool, or to submit it to the current ThreadPoolExecutor again?
Please correct me if my thought process is wrong about this feature.
Your question is about performing an action when an asynchronous action has been done. Futures on the other hand are good if you have an unrelated activity which you can perform while the asynchronous action is running. Then you may regularly poll the action represented by the Future via isDone() and do something else if not or call the blocking get() if you have no more unrelated work for your current thread.
If you want to schedule an on-completion action without blocking the current thread, you may instead use CompletableFuture which offers such functionality.
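As a rough illustration of an on-completion action, the sketch below chains a callback with thenApply and thenAccept instead of blocking on get(). The computation (6 * 7) and the AtomicReference used to capture the result are made up for the example; the final join() exists only so the demo can return the value.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

public class OnCompletionDemo {

    static String runWithCallback() {
        AtomicReference<String> seen = new AtomicReference<>();

        CompletableFuture<Void> pipeline = CompletableFuture
                .supplyAsync(() -> 6 * 7)          // runs asynchronously
                .thenApply(n -> "result=" + n)     // runs once the value is available
                .thenAccept(seen::set);            // on-completion action, no blocking get()

        // The calling thread is free to do unrelated work here.

        // Only for the demo: wait so the captured value can be returned.
        pipeline.join();
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(runWithCallback());
    }
}
```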
CompletableFuture is the solution for queries 1 and 2, as suggested by @Holger.
I want to update about RejectedExecutionHandler mechanism regarding query 3.
Java provides four types of Rejection Handler policies as per javadocs.
In the default ThreadPoolExecutor.AbortPolicy, the handler throws a runtime RejectedExecutionException upon rejection.
In ThreadPoolExecutor.CallerRunsPolicy, the thread that invokes execute itself runs the task. This provides a simple feedback control mechanism that will slow down the rate that new tasks are submitted.
In ThreadPoolExecutor.DiscardPolicy, a task that cannot be executed is simply dropped.
In ThreadPoolExecutor.DiscardOldestPolicy, if the executor is not shut down, the task at the head of the work queue is dropped, and then execution is retried (which can fail again, causing this to be repeated.)
CallerRunsPolicy: If the task queue is frequently full, this policy will degrade performance, because rejected tasks are executed by the submitting thread itself (often the main thread), which cannot submit further work in the meantime. If running the rejected task is critical for your application and your task queue is bounded, you can use this policy.
DiscardPolicy: If silently dropping a non-critical task does not bother you, you can use this policy.
DiscardOldestPolicy: Discards the oldest queued task and retries executing the new one.
If none of them suits your needs, you can implement your own RejectedExecutionHandler.
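A custom handler only needs to implement the single method of RejectedExecutionHandler. The sketch below uses a hypothetical CountingHandler that merely counts rejections; the pool sizes (one thread, queue capacity one) are chosen so that the third submission is rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CustomRejectionDemo {

    /** Counts rejected tasks instead of throwing; a real handler might log or retry. */
    static final class CountingHandler implements RejectedExecutionHandler {
        final AtomicInteger rejected = new AtomicInteger();

        @Override
        public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
            rejected.incrementAndGet();
        }
    }

    /** Submits three tasks to a pool that can hold only two and returns the rejection count. */
    static int submitThree() {
        CountingHandler handler = new CountingHandler();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1), handler);

        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await(); // keep the only worker busy
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        pool.execute(blocker);    // occupies the single worker thread
        pool.execute(() -> { });  // waits in the queue
        pool.execute(() -> { });  // queue full and pool at maximum: handler is invoked

        release.countDown();
        pool.shutdown();
        return handler.rejected.get();
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + submitThree());
    }
}
```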

CompletionService without regular polling

Use case: tasks are generated in one thread, need to be distributed for computation to many threads and finally the generating task shall reap the results and mark the tasks as done.
I found the class ExecutorCompletionService which fits the use case nearly perfectly --- except that I see no good solution for non-idle waiting. Let me explain.
In principle my code would look like
while (true) {
    MyTask t = generateNextTask();
    if (t != null) {
        completionService.submit(t);
    }
    // poll() returns a Future, or null if no task has completed yet
    Future<MyTask> finished;
    while (null != (finished = completionService.poll())) {
        retireTask(finished);
    }
}
Both generateNextTask() and completionService.poll() may return null: the former if there are currently no new tasks available, the latter if no task has yet returned from the CompletionService.
In these cases, the loop degenerates into an ugly idle-wait. I could poll() with a timeout or add a Thread.sleep() for the double-null case, but I consider this a bad workaround, because it still wastes CPU and, due to the wait, is not as responsive as possible.
Suppose I replace generateNextTask() by a poll() on a BlockingQueue. Is there a good way to poll the queue and the CompletionService in parallel, so that I am woken up for work as soon as something becomes available on either end?
Actually this reminds me of Selector. Is something like it available for queues?
You should use CompletionService.take() to wait until the next task completes and retrieve its Future. poll() is the non-blocking version, returning null if no task is currently completed.
Also, your code seems to be inefficient, because you produce and consume tasks one at a time, instead of allowing multiple tasks to be processed in parallel. Consider having a different thread for task generation and for task results consumption.
-- Edit --
I think that given the constraints you mention in your comments, you can't achieve all your requirements.
Requiring the main thread to be producer and consumer, and disallowing any busy loop or timed loop, you can't avoid the scenario where a blocking wait for a task completion takes too long and no other task gets processed in the meanwhile.
Since you "can replace generateNextTask() by a poll() on a BlockingQueue", I assume incoming tasks can be put in a queue by some other thread, and the problem is, you cannot execute take() on 2 queues simultaneously. The solution is to simply put both incoming and finished tasks in the same queue. To differentiate, wrap them in objects of different types, and then check that type in the loop after take().
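The single-queue idea might look roughly like the following sketch. The event wrapper types (NewTask, Finished, NoMoreTasks), the producer thread, and the toy computation (squaring an int) are all assumptions made for illustration; the point is that the main thread blocks in one take() and wakes up for either kind of event.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleQueueDemo {

    /** Hypothetical wrappers that distinguish the kinds of queue entries. */
    interface Event { }
    static final class NewTask implements Event {
        final int input;
        NewTask(int input) { this.input = input; }
    }
    static final class Finished implements Event {
        final int result;
        Finished(int result) { this.result = result; }
    }
    static final class NoMoreTasks implements Event { }

    /** Squares each input on a worker pool and returns the sum of the results. */
    static int run(int... inputs) {
        BlockingQueue<Event> events = new LinkedBlockingQueue<>();
        ExecutorService workers = Executors.newFixedThreadPool(4);

        // Stand-in for whatever thread generates the incoming tasks.
        Thread producer = new Thread(() -> {
            for (int i : inputs) {
                events.add(new NewTask(i));
            }
            events.add(new NoMoreTasks());
        });
        producer.start();

        int pending = 0;
        int sum = 0;
        boolean producing = true;
        try {
            while (producing || pending > 0) {
                Event e = events.take(); // blocks until either kind of event arrives
                if (e instanceof NewTask) {
                    int in = ((NewTask) e).input;
                    pending++;
                    // The worker reports completion through the same queue.
                    workers.execute(() -> events.add(new Finished(in * in)));
                } else if (e instanceof Finished) {
                    pending--;
                    sum += ((Finished) e).result; // "retire" the task
                } else {
                    producing = false;
                }
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        } finally {
            workers.shutdown();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(1, 2, 3));
    }
}
```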
This solution works, but we can go further. You said you don't want to use 2 threads for handling tasks; then you can use zero threads. Let the wrappers implement Runnable and, instead of checking the type, just call take().run(). This way your thread becomes a single-threaded Executor. But we already have an Executor (the CompletionService), so can we use it? The problem is that handling of incoming and finished tasks should be done serially, not in parallel. So we need the SerialExecutor described in api/java/util/concurrent/Executor, which accepts Runnables and executes them serially, but on another executor. This way no thread is wasted.
And finally, you mentioned Selector as possible solution. I must say, it is an outdated approach. Learn dataflow and actor computing. Nice introduction is here. Look at Dataflow4java project of mine, it has MultiPortActorTest.java example, where class Accum does what you need, with all the boilerplate with wrapper Runnables and serial executors hidden in the supporting library.
What you need is a ListenableFuture from Guava. ListenableFutureExplained

How to use Thread Pool concept in Java?

I am creating an HTTP proxy server in Java. I have a class named Handler which is responsible for processing the requests and responses passing between the web browser and the web server. I also have another class named Copy which copies an InputStream to an OutputStream. Both classes implement the Runnable interface. I would like to use thread pooling in my design, but I don't know how to go about it. Any hint or idea would be highly appreciated.
I suggest you look at Executor and ExecutorService. They add a lot of good stuff to make it easier to use Thread pools.
...
@Azad provided some good information and links. You should also buy and read the book Java Concurrency in Practice (often abbreviated as JCiP). Note to stackoverflow big-wigs - how about some revenue link to Amazon???
Below is my brief summary of how to use and take advantage of ExecutorService with thread pools. Let's say you want 8 threads in the pool.
You can create one using the full featured constructors of ThreadPoolExecutor, e.g.
ExecutorService service = new ThreadPoolExecutor(8,8, more args here...);
or you can use the simpler but less customizable Executors factories, e.g.
ExecutorService service = Executors.newFixedThreadPool(8);
One advantage you immediately get is the ability to shutdown() or shutdownNow() the thread pool, and to check this status via isShutdown() or isTerminated().
If you don't care much about the Runnable you wish to run, or they are very well written, self-contained, never fail or log any errors appropriately, etc... you can call
execute(Runnable r);
If you do care about the result (say, it calculates pi or downloads an image from a webpage) and/or you care whether there was an Exception, you should use one of the submit methods that return a Future. That allows you, at some time in the future, to check whether the task isDone() and to retrieve the result via get(). If there was an Exception, get() will throw it (wrapped in an ExecutionException). Note - even if your Future doesn't "return" anything (it is of type Void), it may still be good practice to call get() (ignoring the void result) to test for an Exception.
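A short sketch of submit() and get(), including the ExecutionException wrapping described above. The describe helper and the pool size of 8 (taken from the running example) are illustrative only.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubmitDemo {

    /** Runs the task on a pool and reports either its result or its failure. */
    static String describe(Callable<Integer> task) {
        ExecutorService service = Executors.newFixedThreadPool(8);
        try {
            Future<Integer> future = service.submit(task);
            // get() blocks until the task is done.
            return "ok: " + future.get();
        } catch (ExecutionException e) {
            // The task's own exception is available as the cause.
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            service.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(() -> 42));
        System.out.println(describe(() -> {
            throw new IllegalStateException("boom");
        }));
    }
}
```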
However, checking the Future is a bit of a chicken-and-egg problem. The whole point of a thread pool is to submit tasks without blocking. But Future.get() blocks, and Future.isDone() raises the questions of which thread is calling it, and what that thread does if the task isn't done - do you sleep() and block?
If you are submitting a known chunk of related tasks simultaneously, e.g., you are performing some big mathematical calculation like a matrix multiply that can be done in parallel, and there is no particular advantage to obtaining partial results, you can call invokeAll(). The calling thread will then block until all the tasks are complete, after which you can call Future.get() on all the Futures.
What if the tasks are more disjointed, or you really want to use the partial results? Use ExecutorCompletionService, which wraps an ExecutorService. As tasks get completed, they are added to a queue. This makes it easy for a single thread to poll and remove events from the queue. JCiP has a great example of a web page app that downloads all the images in parallel and renders them as soon as they become available, for responsiveness.
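The completion-queue behaviour can be sketched as follows; squaring integers is a made-up workload, and the pool size is arbitrary. Results are consumed in the order in which tasks finish, not the order in which they were submitted.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionServiceDemo {

    /** Squares each input in parallel and sums the results as they become available. */
    static int sumOfSquares(int... inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<Integer> ecs = new ExecutorCompletionService<>(pool);
        try {
            for (int n : inputs) {
                ecs.submit(() -> n * n);
            }
            int sum = 0;
            // One take() per submitted task; take() blocks until some task has finished.
            for (int i = 0; i < inputs.length; i++) {
                sum += ecs.take().get();
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1, 2, 3, 4));
    }
}
```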
I hope the following will help you:
Interface Executor
An object that executes submitted Runnable tasks. This interface provides a way of decoupling task submission from the mechanics of how each task will be run, including details of thread use, scheduling, etc. An Executor is normally used instead of explicitly creating threads. For example, rather than invoking new Thread(new RunnableTask()).start() for each of a set of tasks, you might use:
Executor executor = anExecutor;
executor.execute(new RunnableTask1());
executor.execute(new RunnableTask2());
...
class ScheduledThreadPoolExecutor
A ThreadPoolExecutor that can additionally schedule commands to run after a given delay, or to execute periodically. This class is preferable to Timer when multiple worker threads are needed, or when the additional flexibility or capabilities of ThreadPoolExecutor (which this class extends) are required.
Delayed tasks execute no sooner than they are enabled, but without any real-time guarantees about when, after they are enabled, they will commence. Tasks scheduled for exactly the same execution time are enabled in first-in-first-out (FIFO) order of submission.
and
Interface ExecutorService
An Executor that provides methods to manage termination and methods that can produce a Future for tracking progress of one or more asynchronous tasks.
An ExecutorService can be shut down, which will cause it to stop accepting new tasks. After being shut down, the executor will eventually terminate, at which point no tasks are actively executing, no tasks are awaiting execution, and no new tasks can be submitted.
Edited:
You can find examples of using Executor and ExecutorService here, here and here. This question will also be useful for you.

In Java, how do I wait for all tasks, but halt on first error?

I have a series of concurrent tasks to run. If any one of them fails, I want to interrupt them all and await termination. But assuming none of them fail, I want to wait for all of them to finish.
ExecutorCompletionService seems like almost what I want here, but there doesn't appear to be a way to tell if all of my tasks are done, except by keeping a separate count of the number of tasks. (Note that both of the examples in the Javadoc for ExecutorCompletionService keep track of the count "n" of the tasks, and use that to determine if the service is finished.)
Am I overlooking something, or do I really have to write this code myself?
Yes, you do need to keep track if you're using an ExecutorCompletionService. Typically, you would call get() on the futures to see if an error occurred. Without iterating over the tasks, how else could you tell that one failed?
If your series of tasks is of a known size, then you should use the second example in the javadoc.
However, if you don't know the number of tasks which you will submit to the CompletionService, then you have a sort of Producer-Consumer problem. One thread is producing tasks and placing them in the ECS, another would be consuming the task futures via take(). A shared Semaphore could be used, allowing the Producer to call release() and the Consumer to call acquire(). Completion semantics would depend on your application, but a volatile or atomic boolean on the producer to indicate that it is done would suffice.
I suggest a Semaphore over wait/notify with poll() because there is a non-deterministic delay between the time a task is produced and the time that task's future is available for consumption. Therefore the consumer and producer need to be just slightly smarter.
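For the known-size case, a fail-fast wait could be sketched like this. The runAll helper is hypothetical; it counts the submitted tasks as the answer describes, stops at the first failure, and uses shutdownNow() to interrupt whatever is still running before awaiting termination. Tasks only stop early if they respond to interruption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FailFastDemo {

    /** Returns true if every task succeeded, false as soon as one of them fails. */
    static boolean runAll(List<Callable<Void>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<Void> ecs = new ExecutorCompletionService<>(pool);
        for (Callable<Void> task : tasks) {
            ecs.submit(task);
        }
        try {
            // The number of submitted tasks tells us when everything is done.
            for (int i = 0; i < tasks.size(); i++) {
                ecs.take().get(); // throws ExecutionException if that task failed
            }
            return true;
        } catch (ExecutionException e) {
            return false; // first failure: stop waiting for the others
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupts running tasks and discards queued ones; harmless if all finished.
            pool.shutdownNow();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        Callable<Void> ok = () -> null;
        Callable<Void> slow = () -> {
            Thread.sleep(60_000); // ends early when interrupted
            return null;
        };
        Callable<Void> bad = () -> {
            throw new IllegalStateException("boom");
        };
        System.out.println(runAll(Arrays.asList(ok, ok)));
        System.out.println(runAll(Arrays.asList(slow, bad)));
    }
}
```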
