Replicate deferred/async launch policies from C++ in Java


In C++ you can start a thread with a deferred or asynchronous launch policy. Is there a way to replicate this functionality in Java?
auto T1 = std::async(std::launch::deferred, doSomething);
auto T2 = std::async(std::launch::async, doSomething);
Descriptions of each:
Asynchronous:
If the async flag is set, then async executes the callable object f on a new thread of execution (with all thread-locals initialized) except that if the function f returns a value or throws an exception, it is stored in the shared state accessible through the std::future that async returns to the caller.
Deferred:
If the deferred flag is set, then async converts f and args... the same way as by std::thread constructor, but does not spawn a new thread of execution. Instead, lazy evaluation is performed: the first call to a non-timed wait function on the std::future that async returned to the caller will cause the copy of f to be invoked (as an rvalue) with the copies of args... (also passed as rvalues) in the current thread (which does not have to be the thread that originally called std::async). The result or exception is placed in the shared state associated with the future and only then it is made ready. All further accesses to the same std::future will return the result immediately.
See the documentation for details.

Future
First of all, we have to observe that std::async is a tool to execute a given task and return a std::future object that holds the result of the computation once it's available.
For example, we can call result.get() to block and wait for the result to arrive. Also, if the computation throws an exception, it will be stored and rethrown to us as soon as we call result.get().
Java provides similar classes: the interface is Future, and the most relevant implementation is CompletableFuture.
std::future#get translates roughly to Future#get. Even the exceptional behavior is very similar. While C++ rethrows the exception upon calling get, Java will throw an ExecutionException which has the original exception set as its cause.
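To illustrate the exception behavior described above, here is a minimal sketch. The class name FutureExceptionDemo and the "boom" message are made up for the example; only the CompletableFuture and ExecutionException calls come from the JDK.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

class FutureExceptionDemo {
    // Runs a task that always fails and returns the message of the original exception.
    static String failingTaskMessage() {
        CompletableFuture<Integer> task = CompletableFuture.supplyAsync(() -> {
            throw new IllegalStateException("boom");
        });
        try {
            task.get();
            return "no exception";
        } catch (ExecutionException e) {
            // The exception thrown inside the task is available as the cause.
            return e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(failingTaskMessage());
    }
}
```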
How to obtain a Future?
In C++ you create your future object using std::async. In Java you could use one of the many static helper methods in CompletableFuture. In your case, the most relevant are
CompletableFuture#runAsync, if the task does not return any result and
CompletableFuture#supplyAsync, if the task will return a result upon completion
So in order to create a future that just prints Hello World!, you could for example do
CompletableFuture<Void> task = CompletableFuture.runAsync(() -> System.out.println("Hello World!"));
/*...*/
task.get();
Java not only has lambdas but also method references. Let's say you have a method that computes a heavy math task:
class MyMath {
static int compute() {
// Very heavy, duh
return (int) Math.pow(2, 5);
}
}
Then you could create a future that returns the result once it's available as
CompletableFuture<Integer> task = CompletableFuture.supplyAsync(MyMath::compute);
/*...*/
Integer result = task.get();
async vs deferred
In C++, you have the option to specify a launch policy which dictates the threading behavior for the task. Let us put the memory promises C++ makes aside, because in Java you do not have that much control over memory.
The difference is that async will immediately schedule creation of a thread and execute the task in that thread. The result will be available at some point and is computed while you continue work in your main task. The exact details of whether it is a new thread or a cached thread depend on the standard library implementation and are not specified.
deferred behaves completely differently. Basically nothing happens when you call std::async: no extra thread will be created and the task will not be computed yet. The result will not be made available in the meantime at all. However, as soon as you call get, the task will be computed in your current thread and return a result. Basically as if you had called the method directly yourself, without any async utilities at all.
std::launch::async in Java
That said, let's focus on how to translate this behavior to Java. Let's start with async.
This is the simple one, as it is basically the default and intended behavior offered in CompletableFuture. So you just do runAsync or supplyAsync, depending on whether your method returns a result or not. Let me show the previous examples again:
// without result
CompletableFuture<Void> task = CompletableFuture.runAsync(() -> System.out.println("Hello World!"));
/*...*/ // the task is computed in the meantime in a different thread
task.get();
// with result
CompletableFuture<Integer> task = CompletableFuture.supplyAsync(MyMath::compute);
/*...*/
Integer result = task.get();
Note that there are also overloads of these methods that accept an Executor, which can be used if you have your own thread pool and want CompletableFuture to use that instead of its default pool (see the CompletableFuture documentation for more details).
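As a minimal sketch of the Executor overload mentioned above: the class name CustomExecutorDemo and the pool size of 2 are arbitrary choices for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CustomExecutorDemo {
    // Runs the computation on a caller-supplied pool instead of the default one.
    static int computeOnOwnPool() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<Integer> task =
                CompletableFuture.supplyAsync(() -> (int) Math.pow(2, 5), pool);
            return task.join(); // like get(), but without checked exceptions
        } finally {
            pool.shutdown(); // a pool you create yourself must also be shut down by you
        }
    }

    public static void main(String[] args) {
        System.out.println(computeOnOwnPool());
    }
}
```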
std::launch::deferred in Java
I experimented a lot trying to mimic this behavior with CompletableFuture, but it does not seem to be possible without creating your own implementation (please correct me if I am wrong though). No matter what, it either executes directly upon creation or not at all.
So I would just propose to use the underlying task interface that you gave to CompletableFuture, for example Runnable or Supplier, directly. In our case, we might also use IntSupplier to avoid the autoboxing.
Here are the two code examples again, but this time with deferred behavior:
// without result
Runnable task = () -> System.out.println("Hello World!");
/*...*/ // the task is not computed in the meantime, no threads involved
task.run(); // the task is computed now
// with result
IntSupplier task = MyMath::compute;
/*...*/
int result = task.getAsInt();
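If you still want a Future-shaped object with deferred behavior, one possible sketch is to build on java.util.concurrent.FutureTask, which does nothing until its run() method is called. The DeferredFuture and DeferredDemo names below are hypothetical, not JDK classes. Only the untimed get() triggers the computation here, which roughly mirrors the C++ rule that timed waits do not start a deferred task.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class DeferredDemo {
    // Hypothetical helper: a Future whose task runs lazily on the thread that first calls get().
    static class DeferredFuture<V> extends FutureTask<V> {
        DeferredFuture(Callable<V> task) {
            super(task);
        }

        @Override
        public V get() throws InterruptedException, ExecutionException {
            run(); // FutureTask.run() does nothing if the task has already run
            return super.get();
        }
    }

    static int computeDeferred() {
        DeferredFuture<Integer> task = new DeferredFuture<>(() -> (int) Math.pow(2, 5));
        // Nothing has run yet: no thread was started and the task is still pending.
        if (task.isDone()) {
            throw new IllegalStateException("task ran too early");
        }
        try {
            return task.get(); // the task is computed now, on the calling thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(computeDeferred());
    }
}
```

Later calls to get() return the stored result (or rethrow the stored exception) without running the task again.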
Modern multithreading in Java
As a final note I would like to give you a better idea how multithreading is typically used in Java nowadays. The provided facilities are much richer than what C++ offers by default.
Ideally, you should design your system in a way that you do not have to care about such little threading details. You create an automatically managed dynamic thread pool using Executors and then launch your initial task against that (or use the default executor service provided by CompletableFuture). After that, you just set up an operation pipeline on the future object, similar to the Stream API, and then just wait on the final future object.
For example, let us suppose you have a list of file names List<String> fileNames and you want to
read the file
validate its content, skip it if it's invalid
compress the file
upload the file to some web server
check the response status code
and count how many were invalid, not successful and successful. Suppose you have some methods like
class FileUploader {
static byte[] readFile(String name) { /*...*/ }
static byte[] requireValid(byte[] content) throws IllegalStateException { /*...*/ }
static byte[] compressContent(byte[] content) { /*...*/ }
static int uploadContent(byte[] content) { /*...*/ }
}
then we can do so easily by
AtomicInteger successful = new AtomicInteger();
AtomicInteger notSuccessful = new AtomicInteger();
AtomicInteger invalid = new AtomicInteger();
// Set up the pipeline
List<CompletableFuture<Void>> tasks = fileNames.stream()
    .map(name -> CompletableFuture
        .completedFuture(name)
        .thenApplyAsync(FileUploader::readFile)
        .thenApplyAsync(FileUploader::requireValid)
        .thenApplyAsync(FileUploader::compressContent)
        .thenApplyAsync(FileUploader::uploadContent)
        .<Void>handleAsync((statusCode, exception) -> {
            AtomicInteger counter;
            if (exception == null) {
                counter = statusCode == 200 ? successful : notSuccessful;
            } else {
                counter = invalid;
            }
            counter.incrementAndGet();
            return null; // handleAsync takes a BiFunction, so it has to return a value
        })
    ).collect(Collectors.toList());
// Wait until all tasks are done
tasks.forEach(CompletableFuture::join);
// Print the results
System.out.printf("Successful %d, not successful %d, invalid %d%n", successful.get(), notSuccessful.get(), invalid.get());
The big benefit of this is that the work can be spread across all the cores your system offers. All tasks are executed dynamically and independently, managed by an automatic pool of threads, and you just wait until everything is done.

For asynchronous launch of a thread, in modern Java prefer the use of a high-level java.util.concurrent.ExecutorService.
One way to obtain an ExecutorService is through java.util.concurrent.Executors. Different behaviors are available for ExecutorServices; the Executors class provides methods for some common cases.
Once you have an ExecutorService, you can submit Runnables and Callables to it.
Future<MyReturnValue> myFuture = myExecutorService.submit(myTask);
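A self-contained sketch of the same idea follows. The SubmitDemo class and the "done" result are placeholders for the example, standing in for myTask and MyReturnValue above.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubmitDemo {
    // Submits a Callable to an ExecutorService and waits for its result.
    static String submitAndWait() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(() -> "done");
            return future.get(); // blocks until the task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown(); // let the worker thread exit
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAndWait());
    }
}
```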

If I understood you correctly, maybe something like this:
private static CompletableFuture<Void> deferred(Runnable run) {
CompletableFuture<Void> future = new CompletableFuture<>();
future.thenRun(run);
return future;
}
private static CompletableFuture<Void> async(Runnable run) {
return CompletableFuture.runAsync(run);
}
And then using them like:
public static void main(String[] args) throws Exception {
CompletableFuture<Void> def = deferred(() -> System.out.println("run"));
def.complete(null);
System.out.println(def.join());
CompletableFuture<Void> async = async(() -> System.out.println("run async"));
async.join();
}

To get something like a deferred thread, you might try running a thread at a reduced priority.
First, in Java it's often idiomatic to make a task using a Runnable first. You can also use the Callable<T> interface, which allows the thread to return a value (Runnable can't).
public class MyTask implements Runnable {
@Override
public void run() {
System.out.println( "hello thread." );
}
}
Then just create a thread. In Java threads normally wrap the task they execute.
MyTask myTask = new MyTask();
Thread t = new Thread( myTask );
t.setPriority( Thread.currentThread().getPriority()-1 );
t.start();
This should not run until there is a core available to do so, which means it shouldn't run until the current thread is blocked or runs out of things to do. However, you're at the mercy of the OS scheduler here, so the specific behavior is not guaranteed. Most OSs will guarantee that all threads run eventually, so if the current thread takes a long time without blocking, the OS will start it executing anyway.
setPriority() can throw a security exception if you're not allowed to set the priority of a thread (uncommon but possible). So just be aware of that minor inconvenience.
For an async task with a Future I would use an executor service. The helper methods in the class Executors are a convenient way to do this.
First make your task as before.
public class MyCallable implements Callable<String> {
@Override
public String call() {
return "hello future thread.";
}
}
Then use an executor service to run it:
MyCallable myCallable = new MyCallable();
ExecutorService es = Executors.newCachedThreadPool();
Future<String> f = es.submit( myCallable );
You can use the Future object to query the thread, determine its running status and get the value it returns. You will need to shutdown the executor to stop all of its threads before exiting the JVM.
es.shutdown();
I've tried to write this code as simply as possible, without the use of lambdas or clever use of generics. The above should show you what those lambdas are actually implementing. However it's usually considered better to be a bit more sophisticated when writing code (and a bit less verbose) so you should investigate other syntax once you feel you understand the above.

Related

Can multiple threads perform a single task, so that the task gets done quickly? (Java)

I have a task, it consumes lots of seconds to complete. So I want to know whether using the multithreading concept in Java, can I use multiple threads to perform that single task, so that my task would get done quickly.
Here is my Example. Thread Class:
public class ThreadImpl implements Runnable {
@Override
public void run() {
String value="a";
for (int i = 0; i < 2147483646; i++) {
value = value + "b";
}
System.out.println("thread task completed " + value);
}
}
Main Class
public class Main {
public static void main(String[] args) {
Thread t = new Thread(new ThreadImpl());
t.start();
}
}
I want to understand, is it really possible to create multiple threads and execute that task public void run() alone simultaneously? If yes, how to achieve this?

Can I use multiple thread to perform [a] single task.
That depends on what you mean by "task." Your example shows a single function that performs a single computation when it is called by a thread. (Note: It's not a very interesting computation, but it's a computation none the less.) If that same function is called by more than one thread, then all that will happen is, each thread will perform the identical computation because that's the only thing that your function was written to do. No time will be saved.
In order for multiple threads to cooperate on solving some problem, you have to write the code that instructs them to cooperate. You have to write one or more functions that can perform parts of the computation; you have to write code that assigns the different parts to the different threads; and you likely will need to write code that combines the partial results into a single, final result. There is no magic in Java that will automatically rewrite your single-threaded code to do those things for you. You have to write it yourself.
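As an illustration of those three pieces (a function for one part, code that hands out the parts, code that combines the partial results), here is a minimal sketch that sums an array in chunks. The ChunkedSum class and its parallelSum method are made up for the example, and it assumes workers is at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedSum {
    static long parallelSum(int[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (data.length + workers - 1) / workers; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            // Assign one slice of the array to each worker.
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(data.length, w * chunk);
                final int to = Math.min(data.length, from + chunk);
                Callable<Long> job = () -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                };
                parts.add(pool.submit(job));
            }
            // Combine the partial results into the final result.
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4));
    }
}
```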

A sample snippet using parallel streams
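// Caution: value[0] is read and written by several threads without synchronization, so updates can be lost
final String[] value = {"a"};
Stream.iterate(1,x -> x+1 )
.limit(10)
.parallel()
.forEach(y -> value[0] = value[0] + "b" );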
You can change the limit value to the number you require. On my machine I see a 5x performance gain with parallel when the limit is 100000.
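Note that the forEach above updates a shared array element from several threads at once, which is a data race. A hedged alternative is to let the stream do the combining through a collector, so no shared mutable state is involved. The ParallelJoin class and its build method are illustrative names.

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ParallelJoin {
    // Builds "a" followed by count copies of "b" without sharing mutable state between threads.
    static String build(int count) {
        return "a" + Stream.generate(() -> "b")
                .limit(count)
                .parallel()
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(build(10));
    }
}
```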

How to use the method "removeIf" using a Predicate in a ArrayBlockingQueue

I have the following classes:
WorkerTask.java
public interface WorkerTask extends Task {
// Constants
public static final short WORKERTASK_SPIDER = 1;
public static final short WORKERTASK_PARSER = 2;
public static final short WORKERTASK_PRODUCT = 3;
public int getType();
}
WorkerPool.java
class WorkerPool {
private ThreadPoolExecutor executorPool_;
//----------------------------------------------------
public WorkerPool(int poolSize)
{
executorPool_ = new ThreadPoolExecutor(
poolSize,5,10,TimeUnit.SECONDS,
new ArrayBlockingQueue<Runnable>(10000000,false),
Executors.defaultThreadFactory()
);
}
//----------------------------------------------------
public void assign(WorkerTask workerTask) {
executorPool_.execute(new WorkerThread(workerTask));
}
//----------------------------------------------------
public void removeTasks(int siteID) {
executorPool_.getQueue().removeIf(...);
}
}
I want to call the method removeTasks to remove certain amount of pending tasks but I have no idea of how to use the method removeIf. It says: Removes all of the elements of this collection that satisfy the given predicate, but I have no idea how to create the parameter Predicate. Any idea?

If you had a Queue<WorkerTask>, you could do something like this:
queue.removeIf(task -> task.getSiteID() == siteID)
There are several problems. One problem is that the queue you get from getQueue() is BlockingQueue<Runnable> and not Queue<WorkerTask>. If you're submitting Runnable instances to the pool, the queue might contain references to your actual tasks; if so, you could downcast them to WorkerTask. However, this isn't guaranteed. Furthermore, the class doc for ThreadPoolExecutor says (under "Queue maintenance"):
Method getQueue() allows access to the work queue for purposes of monitoring and debugging. Use of this method for any other purpose is strongly discouraged. Two supplied methods, remove(Runnable) and purge() are available to assist in storage reclamation when large numbers of queued tasks become cancelled.
Looking at the remove(Runnable) method, its doc says
It may fail to remove tasks that have been converted into other forms before being placed on the internal queue.
This suggests that you should hang onto the Runnable instances that have been submitted in order to call remove() on them later. Or, call submit(Runnable) to get a Future and save those instances around in order to cancel them.
But there is also a second problem that probably renders this approach inadequate. Suppose you've found a way to remove or cancel the matching tasks from the queue. Another thread might have decided to submit a new task that matches, but hasn't submitted it yet. There's a race condition here. You might be able to cancel the enqueued tasks, but after you've done so, you can't guarantee that new matching tasks haven't been submitted.
Here's an alternative approach. Presumably, when you cancel (or whatever) a site ID, there's some logic somewhere to stop submitting new tasks that match that site ID. The problem is how to deal with matching tasks that are "in-flight," that is, that are in the queue or are about to be enqueued.
Instead of trying to cancel the matching tasks, change the task so that if its site ID has been canceled, the task turns into a no-op. You could record the cancellation of a site ID in, say, a ConcurrentHashMap. Any task would check this map before beginning its work, and if the site ID is present, it'd simply return. Adding a site ID to the map would have the immediate effect of ensuring that no new task on that site ID will commence. (Tasks that have already started will run to completion.) Any in-flight tasks will eventually drain from the queue without causing any actual work to occur.
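A minimal sketch of that no-op approach could look like the following. The SiteCancellation class, cancelSite and guarded are hypothetical names, and a concurrent set backed by ConcurrentHashMap is used in place of a map since only the keys matter.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SiteCancellation {
    // Thread-safe set of site IDs whose tasks should no longer do any work.
    private static final Set<Integer> cancelledSites = ConcurrentHashMap.newKeySet();

    static void cancelSite(int siteID) {
        cancelledSites.add(siteID);
    }

    // Wraps the real work so that it turns into a no-op once the site has been cancelled.
    static Runnable guarded(int siteID, Runnable work) {
        return () -> {
            if (cancelledSites.contains(siteID)) {
                return; // cancelled: drain from the queue without doing anything
            }
            work.run();
        };
    }

    public static void main(String[] args) {
        Runnable task = guarded(7, () -> System.out.println("working on site 7"));
        task.run();      // does the work
        cancelSite(7);
        task.run();      // silently skipped
    }
}
```

In the pool from the question, the guarded Runnable would be what gets passed to execute, so queued tasks for a cancelled site return immediately when a worker picks them up.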

A predicate is a function that receives an input and returns a boolean value.
If you are using Java 8 you can use lambda expressions:
elem -> elem.id == siteID
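For a fuller picture, here is a small sketch of a Predicate passed to removeIf on a plain ArrayBlockingQueue. The SiteTask class and its siteID field are stand-ins invented for the example, since the question does not show how a task exposes its site ID.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Predicate;

class RemoveIfDemo {
    // Stand-in for a task that knows which site it belongs to.
    static class SiteTask {
        final int siteID;

        SiteTask(int siteID) {
            this.siteID = siteID;
        }
    }

    // Removes all queued tasks for the given site and returns how many were removed.
    static int removeForSite(BlockingQueue<SiteTask> queue, int siteID) {
        Predicate<SiteTask> matchesSite = task -> task.siteID == siteID;
        int before = queue.size();
        queue.removeIf(matchesSite);
        return before - queue.size();
    }

    public static void main(String[] args) {
        BlockingQueue<SiteTask> queue = new ArrayBlockingQueue<>(10);
        queue.add(new SiteTask(1));
        queue.add(new SiteTask(2));
        queue.add(new SiteTask(1));
        System.out.println(removeForSite(queue, 1) + " removed, " + queue.size() + " left");
    }
}
```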

How to limit a thread's execution time and terminate it if it runs too long?

A general case for my question was, how do we detect if a particular function call has been taking too long so that we want to terminate it?
Off the top of my head, I think of using a thread to run that function, and killing the thread if it runs too long, as defined below:
class MyThread extends Thread
{
public void run()
{
someFunction();
}
}
And say someFunction might be:
public void someFunction()
{
// Unknown code that could take arbitrarily long time.
}
In the code above, someFunction() might take no time to finish, or takes forever, so say I want to stop it when it's taking too long.
However, in a Java thread implementation, apparently I can't use a shared variable or a timestamp to give the thread a sense of time, because someFunction() runs atomically and any check-against-timestamp code can only go after someFunction(), which makes it useless: by the time that code executes, someFunction() is already done.
NOTE that I also want someFunction() to be agnostic. That is, someFunction() shouldn't be worrying about how much time it runs. It simply shouldn't be aware of it at all.
Can anyone provide some insight in how I can accomplish this functionality?

I would use an ExecutorService to run the thread. Then I would get back a Future and use get() with a timeout to cancel it.
ExecutorService es = Executors.newFixedThreadPool(1); // You only asked for 1 thread
Future<?> future = es.submit( new MyThread() );
try {
future.get(timeout, TimeUnit.SECONDS); // This waits timeout seconds; returns null
} catch(TimeoutException e) {
future.cancel(true);
}
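A self-contained variant of this idea is sketched below, with hypothetical names TimeoutDemo and runWithTimeout. One caveat worth stating: cancel(true) only interrupts the worker thread, so the task actually stops early only if the code it runs reacts to interruption (for example, blocking calls such as Thread.sleep do).

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutDemo {
    // Returns true if the work finished within the timeout, false if it was cancelled.
    static boolean runWithTimeout(Runnable work, long timeoutMillis) {
        ExecutorService es = Executors.newSingleThreadExecutor();
        Future<?> future = es.submit(work);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            es.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Runnable slow = () -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                // interrupted by cancel(true): stop early
            }
        };
        System.out.println(runWithTimeout(slow, 100));
    }
}
```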

Does Java have an indexable multi-queue thread pool?

Is there a Java class such that:
(1) Executable tasks can be added via an id, where all tasks with the same id are guaranteed to never run concurrently
(2) The number of threads can be limited to a fixed amount
A naive solution of a Map would easily solve (1), but it would be difficult to manage (2). Similarly, all thread pooling classes that I know of will pull from a single queue, meaning (1) is not guaranteed.
Solutions involving external libraries are welcome.

For each id, you need a SerialExecutor, described in the documentation of java.util.concurrent.Executor. All serial executors delegate work to a ThreadPoolExecutor with given corePoolSize.
An optimized version of SerialExecutor can be found in my code samples.
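As a rough sketch of that idea: the inner SerialExecutor below is adapted from the example in the java.util.concurrent.Executor Javadoc, while the KeyedSerialExecutor wrapper and its per-id map are assumptions added for illustration, not an existing JDK class. The map is never cleaned up here, which a production version would probably want to address.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

class KeyedSerialExecutor {
    private final Executor pool;
    private final Map<String, SerialExecutor> byId = new ConcurrentHashMap<>();

    KeyedSerialExecutor(Executor pool) {
        this.pool = pool;
    }

    // Tasks with the same id run one after another; different ids may run concurrently.
    void execute(String id, Runnable task) {
        byId.computeIfAbsent(id, key -> new SerialExecutor(pool)).execute(task);
    }

    // Adapted from the SerialExecutor example in the Executor Javadoc.
    static class SerialExecutor implements Executor {
        private final Queue<Runnable> tasks = new ArrayDeque<>();
        private final Executor executor;
        private Runnable active;

        SerialExecutor(Executor executor) {
            this.executor = executor;
        }

        @Override
        public synchronized void execute(Runnable r) {
            tasks.add(() -> {
                try {
                    r.run();
                } finally {
                    scheduleNext(); // hand the next queued task to the shared pool
                }
            });
            if (active == null) {
                scheduleNext();
            }
        }

        private synchronized void scheduleNext() {
            if ((active = tasks.poll()) != null) {
                executor.execute(active);
            }
        }
    }
}
```

Passing something like Executors.newFixedThreadPool(n) as the pool would bound the number of threads, which covers requirement (2).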

If you don't find something that does this out of the box, it shouldn't be hard to roll your own. One thing you could do is to wrap each task in a simple class that reads on a queue unique per id, e.g.:
public static class SerialCaller<T> implements Callable<T> {
private final BlockingQueue<Callable<T>> delegates;
public SerialCaller(BlockingQueue<Callable<T>> delegates) {
this.delegates = delegates;
}
public T call() throws Exception {
return delegates.take().call();
}
}
It should be easy to maintain a map of ids to queues for submitting tasks. That satisfies condition (1), and then you can look for simple solutions to condition (2), such as Executors.newFixedThreadPool.

I think that the simplest solution is to just have a separate queue for each index and a separate executor (with one thread) for each queue.
The only thing you could achieve with a more complex solution would be to use fewer threads, but if the number of indexes is small and bounded that's probably not worth the effort.

Yes, there is such a library now: https://github.com/jano7/executor
int maxTasks = 10;
ExecutorService underlyingExecutor = Executors.newFixedThreadPool(maxTasks);
KeySequentialBoundedExecutor executor = new KeySequentialBoundedExecutor(maxTasks, underlyingExecutor);
Runnable task = new Runnable() {
@Override
public void run() {
// do something
}
};
executor.execute(new KeyRunnable<>("ID-1", task)); // execute the task by the underlying executor
executor.execute(new KeyRunnable<>("ID-2", task)); // execution is not blocked by the task for ID-1
executor.execute(new KeyRunnable<>("ID-1", task)); // execution starts when the previous task for ID-1 completes

Incremental Future of list extensions

I essentially have a Future<List<T>> that is fetched in batches from the server. For some clients I'd like to provide incremental results while it loads in addition to the whole collection when future is fulfilled.
Is there a common Future extension defined somewhere for this? What are typical patterns/combinators exist for such futures?
I assume that given IncrementalListFuture<T> I can easily define map operation. What else comes to your mind?

Is there a common Future extension defined somewhere for this?
I assume you are talking about incremental results from an ExecutorService. You should consider using an ExecutorCompletionService which allows you to be informed as soon as one of the Future objects is get-able.
To quote from the javadocs:
CompletionService<Result> ecs = new ExecutorCompletionService<Result>(e);
for (Callable<Result> s : solvers) {
ecs.submit(s);
}
int n = solvers.size();
for (int i = 0; i < n; ++i) {
// this waits for one of the futures to finish and provide a result
Future<Result> future = ecs.take();
Result result = future.get();
if (result != null) {
// do something with the result
}
}
Sorry. I initially misread the question and thought that you were asking about a List<Future<?>>. It may be that you could refactor your code to actually return a number of Futures so I'll leave this for posterity.
I would not pass back the list in this case in a Future. You aren't going to be able to get the return until the job finishes.
If possible, I would pass in some sort of BlockingQueue so both the caller and the thread can access it:
final BlockingQueue<T> queue = new LinkedBlockingQueue<T>();
// build out job with the queue
threadPool.submit(new SomeJob(queue));
threadPool.shutdown();
// now we can consume from the queue as it is built:
while (true) {
T result = queue.take();
// you could use some constant result object to mean that the job finished
if (result == SOME_END_OBJECT) {
break;
}
// provide intermediate results
}
You could also have some sort of SomeJob.take() method which calls through to a BlockingQueue defined inside of your job class.
// the blocking queue in this case is hidden inside your job object
T result = someJob.take();
...

Here's what I would do:
In the thread that populates the List, make it thread-safe by wrapping the list using Collections.synchronizedList
Make the list publicly available, but not modifiable, by adding a public method to the thread which returns the list wrapped by Collections.unmodifiableList
Instead of giving clients a Future<List<T>>, give them a handle to the thread, or some kind of wrapper of it, so that they can call the public method above.
Alternatively, as Gray has suggested, BlockingQueues are great for thread coordination like this. This may require more changes to your client code, however.

To answer my own question: there has been lots of development in this area recently. Among most used are: Play iteratees (http://www.playframework.org/documentation/2.0/Iteratees) and Rx for .NET (http://msdn.microsoft.com/en-us/data/gg577609.aspx)
Instead of Future they define something like:
interface Observable<T> {
Disposable subscribe(Observer<T> observer);
}
interface Observer<T> {
void onCompleted();
void onError(Exception error);
void onNext(T value);
}
and lots of combinators.

Alternatively to Observables you can take a look at twitter's approach.
They use Spool, which is an asynchronous version of the Stream.
Basically it is a simple trait similar to the List
trait Spool[+A] {
def head: A
/**
* The (deferred) tail of the spool. Invalid for empty spools.
*/
def tail: Future[Spool[A]]
}
that allows you to do functional stuff like map, filter and foreach on top of it.

Future is really designed to return a single (atomic) result, not for communicating intermediate results in this manner. What you will really want to do is to use multiple futures, one per batch.
We have a similar requirement where we have a bunch of things that we need to get from different remote servers, and each will return at a different time. We don't want to wait until the last one has returned, but rather process them in the order they return. For this we created the AsyncCompleter, which takes an Iterable<Callable<T>> and returns an Iterable<T> that blocks on iteration, completely abstracting usage of the Future interface.
If you look at how that class is implemented, you'll see how to use a CompletionService to receive results from an Executor in the order in which they become available, if you need to build this for yourself.
edit: just saw that the second half of Gray's answer is similar, basically using an ExecutorCompletionService
