I read the source of org.apache.nutch.parse.ParseUtil.runParser(Parser p, Content content).
Do these two method calls do the same thing:
Instruction 1:
t.interrupt();
Instruction 2:
task.cancel(true);
The source of the org.apache.nutch.parse.ParseUtil.runParser(Parser p, Content content) is:
ParseCallable pc = new ParseCallable(p, content);
FutureTask<ParseResult> task = new FutureTask<ParseResult>(pc);
ParseResult res = null;
Thread t = new Thread(task);
t.start();
try {
res = task.get(MAX_PARSE_TIME, TimeUnit.SECONDS);
} catch (TimeoutException e) {
LOG.warn("TIMEOUT parsing " + content.getUrl() + " with " + p);
} catch (Exception e) {
task.cancel(true);
res = null;
t.interrupt();
} finally {
t = null;
pc = null;
}
return res;
They don't usually do the same thing, as they act at different abstraction levels (a task is a higher-level abstraction than a thread). In this case, however, the two calls seem to be redundant.
FutureTask.cancel() tells the task that it no longer needs to run and (if true is passed as the argument) will attempt to interrupt the Thread on which the task is currently running (if any).
t.interrupt() attempts to interrupt the Thread t.
In this case it seems to be redundant. If the task is still running, then cancel(true) should interrupt the thread, in which case the extra interrupt() call is unnecessary (unless the code running in the thread somehow ignores one interrupt but halts on two, which is unlikely).
If the task is already complete at that point, then both cancel() and interrupt() will have no effect.
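A minimal standalone sketch (not the Nutch code, just an illustration) showing that cancel(true) delivers exactly that interrupt to the thread running the task:
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
public class CancelVsInterrupt {
    public static void main(String[] args) throws Exception {
        FutureTask<String> task = new FutureTask<String>(() -> {
            try {
                TimeUnit.SECONDS.sleep(10);
                return "finished";
            } catch (InterruptedException e) {
                System.out.println("task saw the interrupt");
                return "interrupted";
            }
        });
        Thread t = new Thread(task);
        t.start();
        TimeUnit.MILLISECONDS.sleep(200); // let the task start sleeping
        task.cancel(true);                // interrupts t, just as t.interrupt() would
        t.join();
    }
}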
Here, I'd like to draw a conclusion:
when we pass true as the argument to FutureTask.cancel(), we get the same effect as calling interrupt().
Why?
Let's peek into the source of the cancel() method.
We see that cancel() calls:
innerCancel(mayInterruptIfRunning);
Inside innerCancel(mayInterruptIfRunning) we find the instructions below:
if (mayInterruptIfRunning) {
Thread r = runner;
if (r != null)
r.interrupt();
}
So, in my case, cancel() does indeed call interrupt().
Background
Building a data pipeline where each message received is to be processed asynchronously.
Trying to simulate the behavior by
Reading message from file
Processing with CompletableFuture
Code
BufferedReader reader = null;
ExecutorService service = Executors.newFixedThreadPool(4);
try {
String filepath = str[0];
FileReaderAsync fileReaderAsync = new FileReaderAsync();
reader = new BufferedReader(new FileReader(filepath));
Random r = new Random();
String line;
while ((line = reader.readLine()) != null) {
Integer val = Integer.valueOf(line.trim());
int randomInt = r.nextInt(5);
Thread.sleep(randomInt * 100);
CompletableFuture.supplyAsync(() -> {
System.out.println("Square : " + val);
return val * val;
}, service)
.thenApplyAsync(value -> {
System.out.println(":::::::Double : " + value);
return 2 * value;
}, service)
.thenAccept(value -> {
System.out.println("Answer : " + value);
});
}
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
reader.close();
} catch (Exception e) {
throw new RuntimeException(e.getMessage());
}
}
For simplicity just pasting main method code, assume variables are declared and in scope.
Issues
Code
The program works fine but does not exit. I tried commenting out the async logic and just reading the file; that works fine and the program exits.
Design
In a streaming pipeline, will this async model work for each incoming message if each message is passed to a CompletableFuture for processing?
Or will it block until the current message has been processed?
Is it required to introduce another queue and then consume from that, instead of consuming incoming messages as they flow in?
Edit 1
Added
public void shutdown() {
service.shutdown();
}
and
reader.close();
fileReaderAsync.shutdown();
which did the trick.
Problem
You're using a thread pool created by:
ExecutorService service = Executors.newFixedThreadPool(4);
Which by default is configured to use non-daemon threads. And as documented by java.lang.Thread:
When a Java Virtual Machine starts up, there is usually a single non-daemon thread (which typically calls the method named main of some designated class). The Java Virtual Machine continues to execute threads until either of the following occurs:
The exit method of class Runtime has been called and the security manager has permitted the exit operation to take place.
All threads that are not daemon threads have died, either by returning from the call to the run method or by throwing an exception that propagates beyond the run method.
In other words, any non-daemon thread that is still alive will also keep the JVM alive.
Solution
There are at least two solutions to your problem.
Shutdown the Thread Pool
You can shutdown the thread pool when you're finished with it.
service.shutdown(); // Calls ExecutorService#shutdown()
The #shutdown() method starts a graceful shutdown. It prevents any new tasks from being submitted but allows any already-submitted tasks to complete. Once all tasks are complete the pool will terminate (i.e. all threads will be allowed to die). If you want to wait for all tasks to complete before continuing then you can call #awaitTermination(long,TimeUnit) after calling #shutdown() / #shutdownNow().
If you want to try and immediately shutdown the pool then call #shutdownNow(). Any currently-executing tasks will be cancelled and any submitted-but-not-yet-started tasks are simply not executed (and are in fact returned to you in a list). Note whether a task responds to cancellation depends on how that task was implemented.
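A rough sketch of the graceful variant (service is the pool from the question; the 30-second limit is an arbitrary choice):
try {
    service.shutdown();                                      // stop accepting new tasks
    if (!service.awaitTermination(30, TimeUnit.SECONDS)) {   // wait for submitted tasks to finish
        List<Runnable> neverStarted = service.shutdownNow(); // then interrupt whatever is left
        System.err.println(neverStarted.size() + " task(s) never started");
    }
} catch (InterruptedException e) {
    service.shutdownNow();
    Thread.currentThread().interrupt();
}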
Use Daemon Threads
A daemon thread will not keep the JVM alive. You can configure the thread pool to use daemon threads via a ThreadFactory.
ExecutorService service = Executors.newFixedThreadPool(4, r -> {
Thread t = new Thread(r); // may want to name the threads
t.setDaemon(true);
return t;
});
Note you should still shutdown the thread pool when finished with it, regardless.
You have 4 threads in the pool, but the Thread.sleep() blocks the main thread. Your program reads a line, blocks for up to 400 ms, and then fires the async code, which does not require any async-ness at all and in fact creates a lot of overhead.
Do not use Thread.sleep() in an async program.
But I tried to get the idea of your code and I can offer this:
public int calcWork(final int x) {
    return x * x;
}
public void iter_async_rec(final BufferedReader reader) {
    String line;
    try {
        line = reader.readLine();
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
    if (line != null) {
        int i = Integer.parseInt(line.trim()); // validation still required
        CompletableFuture.supplyAsync(() -> calcWork(i))
                .thenApplyAsync(value -> { System.out.println(value); return value; })
                .thenRunAsync(() -> iter_async_rec(reader));
    }
}
In addition: Most of the time it is the best choice to just use the standard executors. The given sample will not improve speed, on the contrary.
Maybe have a look at the reactive idea (e.g. RxJava)!?
Is it possible in standard java8 to execute multiple callables on single thread concurrently?
i.e. when one callable sleeps, start working on other callable.
My current experiment, which does not work:
ExecutorService executor = Executors.newSingleThreadExecutor();
List<Future> fs = new ArrayList<>();
for (int i = 0; i < 2; i++) {
final int nr = i;
fs.add(executor.submit(() -> {
System.out.println("callable-" + nr + "-start");
try { Thread.sleep(10_000); } catch (InterruptedException e) { }
System.out.println("callable-" + nr + "-end");
return nr;
}));
}
try { executor.awaitTermination(5, TimeUnit.SECONDS); } catch (InterruptedException e) { }
Results in:
callable-0-start
callable-0-end
callable-1-start
callable-1-end
I want to have:
callable-0-start
callable-1-start
callable-0-end
callable-1-end
Notes:
I kind of expect an answer: "No, it's not possible. This is not how threads work. Once a thread is assigned some executable code, it runs until completion, exception or cancellation. There can be no mid-flight switching between callables/runnables. Thread.sleep only allows other threads to run on the CPU/core." (explicit confirmation would put my mind to rest)
Naturally, this is "toy" example.
This is about understanding, not some specific problem that I have.
What you attempt to do is to emulate deprecated functionality from older java versions. Back then it was possible to stop, suspend or resume a Thread. But from the javadoc of Thread.stop:
This method is inherently unsafe. Stopping a thread with Thread.stop causes it to unlock all of the monitors that it has locked (as a natural consequence of the unchecked ThreadDeath exception propagating up the stack). If any of the objects previously protected by these monitors were in an inconsistent state, the damaged objects become visible to other threads, potentially resulting in arbitrary behavior. Many uses of stop should be replaced by code that simply modifies some variable to indicate that the target thread should stop running. The target thread should check this variable regularly, and return from its run method in an orderly fashion if the variable indicates that it is to stop running. If the target thread waits for long periods (on a condition variable, for example), the interrupt method should be used to interrupt the wait.
As this excerpt describes, the risks of doing what you want are serious, and therefore this behavior has been deprecated.
I would suggest that, instead of trying to force a running thread into some sort of halting position from the outside, you think about a thread-pool API that allows you to package your code segments properly, so that their state can be unloaded from a thread and later resumed. For example, create a Ticket, an elementary job which a thread would always complete before beginning another, and a TicketChain that sequentially connects tickets and stores the state. Then write a handler that handles tickets one by one. In case a ticket cannot currently be done (e.g. because not all data is present, or some lock cannot be acquired), the thread can skip it until a later point in time when said conditions might be true.
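A very rough, hypothetical sketch of that idea (Ticket, isReady and TicketHandler are made-up names, not an existing API):
// A Ticket is one elementary unit of work; a worker only starts it when it is
// actually ready to run, otherwise the ticket is put back and retried later.
interface Ticket {
    boolean isReady(); // e.g. data present, lock available
    void run();        // one elementary step; should not block for long
}
class TicketHandler implements Runnable {
    private final java.util.concurrent.BlockingQueue<Ticket> tickets;
    TicketHandler(java.util.concurrent.BlockingQueue<Ticket> tickets) {
        this.tickets = tickets;
    }
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Ticket t = tickets.take();
                if (t.isReady()) {
                    t.run();        // complete this ticket before taking another
                } else {
                    tickets.put(t); // skip it for now and retry later
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop the handler
            }
        }
    }
}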
Building on the answer from @TreffnonX
One way to achieve desired stdout result is using CompletableFuture
(callable code must be explicitly split into separate functions):
ExecutorService executor = Executors.newSingleThreadExecutor();
CompletableFuture<Integer>[] fs = new CompletableFuture[2];
for(int i=0; i<2; i++) {
final Integer ii = i;
fs[i] = (CompletableFuture.completedFuture(ii)
.thenApply((Integer x) -> { System.out.println("callable-" + x + "-start");return x; })
.thenApplyAsync((Integer x) -> { try { Thread.sleep(1_000); } catch (InterruptedException e) {Thread.currentThread().interrupt();} return x; }, executor)
.thenApply((Integer x) -> { System.out.println("callable-" + x + "-end");return x; }));
}
CompletableFuture.allOf(fs).join();
try { executor.awaitTermination(5, TimeUnit.SECONDS); } catch (InterruptedException e) { }
Result:
callable-0-start
callable-1-start
callable-0-end
callable-1-end
I want to have a thread which does some I/O work when it is interrupted by the main thread and then goes back to sleep/wait until it is interrupted again.
So I have come up with an implementation, which does not seem to be working. The code snippet is below.
Note - here flag is a public variable that can be accessed via the thread class, which is in the main class.
// in the main function this is how I am calling it
if(!flag) {
thread.interrupt();
}
//this is how my thread class is implemented
class IOworkthread extends Thread {
@Override
public void run() {
while(true) {
try {
flag = false;
Thread.sleep(1000);
} catch (InterruptedException e) {
flag = true;
try {
// doing my I/O work
} catch (Exception e1) {
// print the exception message
}
}
}
}
}
In the above snippet, the second try-catch block also catches an InterruptedException. This means that both the first and the second try-catch blocks are catching the interrupt, but I only expected the interrupt to occur during the first try-catch block.
Can you please help me with this?
EDIT
If you feel that there can be another solution for my objective, I will be happy to know about it :)
If it's important to respond fast to the flag you could try the following:
class IOworkthread extends Thread { // implements Runnable would be better here, but that's another story
@Override
public void run() {
while(true) {
try {
flag = false;
Thread.sleep(1000);
}
catch (InterruptedException e) {
flag = true;
}
//after the catch block the interrupted state of the thread should be reset and there should be no exceptions here
try {
// doing I/O work
}
catch (Exception e1) {
// print the exception message
// here of course other exceptions could appear but if there is no Thread.sleep() used here there should be no InterruptedException in this block
}
}
}
}
This behaves differently because the interrupted status of the thread is already cleared when the InterruptedException is thrown, so once the catch block has run there is no pending interrupt left.
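A tiny standalone demo of that (not part of your code): inside the catch block isInterrupted() already reports false, because the status was cleared at the moment the exception was thrown.
Thread t = new Thread(() -> {
    try {
        Thread.sleep(10_000);
    } catch (InterruptedException e) {
        // the interrupted status was cleared when the exception was thrown
        System.out.println("still interrupted? " + Thread.currentThread().isInterrupted()); // prints false
    }
});
t.start();
t.interrupt();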
It does sound like a producer/consumer construct. You seem to have it the wrong way around, though: the IO should be driving the algorithm. Since you stay very abstract about what your code actually does, I'll need to stick to that.
So let's say your "distributed algorithm" works on data of type T; that means that it can be described as a Consumer<T> (the method name in this interface is accept(T value)). Since it can run concurrently, you want to create several instances of that; this is usually done using an ExecutorService. The Executors class provides a nice set of factory methods for creating one, let's use Executors.newFixedThreadPool(parallelism).
Your "IO" thread runs to create input for the algorithm, meaning it is a Supplier<T>. We can run it in an Executors.newSingleThreadExecutor().
We connect these two using a BlockingQueue<T>; this is a FIFO collection. The IO thread puts elements in, and the algorithm instances take out the next one that becomes available.
This makes the whole setup look something like this:
void run() throws InterruptedException {
    int parallelism = 4; // or whatever
    ExecutorService algorithmExecutor = Executors.newFixedThreadPool(parallelism);
    ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
    // this queue will accept up to 4 elements
    // this might need to be changed depending on performance of each
    BlockingQueue<T> queue = new ArrayBlockingQueue<T>(parallelism);
    ioExecutor.submit(new IoExecutor(queue)); // in practice a concrete subclass, since IoExecutor below is abstract
    // take elements from the queue until there is nothing left
    T nextElement = getNextElement(queue);
    while (nextElement != null) {
        final T element = nextElement; // must be (effectively) final to be captured by the lambda
        algorithmExecutor.submit(() -> new AlgorithmInstance().accept(element));
        nextElement = getNextElement(queue);
    }
    // wait until algorithms have finished running, then clean up
    algorithmExecutor.shutdown();
    algorithmExecutor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
    ioExecutor.shutdown(); // the io thread should have terminated by now already
}
T getNextElement(BlockingQueue<T> queue) {
    int timeOut = 1; // adjust depending on your IO
    T result = null;
    while (result == null) {
        try {
            // poll() returns null on timeout; retry, we will get a value eventually
            result = queue.poll(timeOut, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop waiting if we are interrupted
            break;
        }
    }
    return result;
}
Now this doesn't actually answer your question because you wanted to know how the IO thread can be notified when it can continue reading data.
This is achieved by the limit to the BlockingQueue<> which will not accept elements after this has been reached, meaning the IO thread can just keep reading and try to put in elements.
abstract class IoExecutor<T> implements Runnable {
    private final BlockingQueue<T> queue;
    public IoExecutor(BlockingQueue<T> q) { queue = q; }
    @Override
    public void run() {
        try {
            while (hasMoreData()) {
                T data = readData();
                // this will block if the queue is full, so IO will pause
                queue.put(data);
            }
            // signal the end of input here; note that a BlockingQueue rejects null,
            // so use a dedicated sentinel ("poison pill") value the consumer
            // recognizes rather than putting null into the queue
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // put() was interrupted; stop producing
        }
    }
    protected abstract boolean hasMoreData();
    protected abstract T readData();
}
As a result during runtime you should at all time have 4 threads of the algorithm running, as well as (up to) 4 items in the queue waiting for one of the algorithm threads to finish and pick them up.
I've written the following multi-threaded program. I want to cancel all the threads if one of the threads sends back false as its return value. However, even though I'm cancelling each individual task, it's not working. What changes do I need to make in order to cancel the threads?
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
public class BeamWorkerThread implements Callable<Boolean> {
private List<BeamData> beamData;
private String threadId;
public BeamWorkerThread(
List<BeamData> beamData, String threadId) {
super();
this.beamData = beamData;
this.threadId = threadId;
}
@Override
public Boolean call() throws Exception {
Boolean result = true;
DataValidator validator = new DataValidator();
Iterator<BeamData> it = beamData.iterator();
BeamData data = null;
while(it.hasNext()){
data = it.next();
if(!validator.validateDensity(data.getBin_ll_lat(), data.getBin_ll_lon(), data.getBin_ur_lat(), data.getBin_ur_lon())){
result = false;
break;
}
}
return result;
}
}
ExecutorService threadPool = Executors.newFixedThreadPool(100);
List<Future<Boolean>> results = new ArrayList<Future<Boolean>>();
long count = 0;
final long RowLimt = 10000;
long threadCount = 1;
while ((beamData = csvReader.read(
BeamData.class, headers1, processors)) != null) {
if (count == 0) {
beamDataList = new ArrayList<BeamData>();
}
beamDataList.add(beamData);
count++;
if (count == RowLimt) {
results.add(threadPool
.submit(new BeamWorkerThread(
beamDataList, "thread:"
+ (threadCount++))));
count = 0;
}
}
results.add(threadPool.submit(new BeamWorkerThread(
beamDataList, "thread:" + (threadCount++))));
System.out.println("Number of threads" + threadCount);
for (Future<Boolean> fs : results)
try {
if(fs.get() == false){
System.out.println("Thread is false");
for(Future<Boolean> fs1 : results){
fs1.cancel(true);
}
}
} catch(CancellationException e){
} catch (InterruptedException e) {
} catch (ExecutionException e) {
} finally {
threadPool.shutdownNow();
}
}
My comments
Thanks all for your input, I'm overwhelmed by the response. I do know that a well-implemented thread takes an app to new highs, while a bad implementation brings the app to its knees. I agree I'm having a fancy idea, but I don't have another option. I have 10 million plus records, hence I have both a memory constraint and a time constraint, and I need to tackle both. So rather than swallowing the whole data set I'm breaking it into chunks, and if one record is invalid I don't want to waste time processing the remaining millions. I find @Mark Peters' suggestion is an option; I made the changes accordingly, i.e. added a flag to interrupt the task, but I'm pretty confused about how the list of futures works. What I understand is that the loop over the future list only starts once all the threads have returned their values. In that case, there is no way to cancel all the tasks halfway through from the main loop. I need to pass a reference of some shared object to each thread, and if one thread finds invalid data, use the thread references to call the cancel method of each task to set its interrupt flag.
while(it.hasNext() && !cancelled) {
if(!validate){
// loop through each thread reference and call Cancel method
}
}
Whatever attempt you make to cancel all the remaining tasks, it will fail if your code is not carefully written to be interruptible. What that exactly entails is beyond just one StackOverflow answer. Some guidelines:
do not swallow InterruptedException. Make its occurrence break the task;
if your code does not spend much time within interruptible methods, you must insert explicit Thread.interrupted() checks and react appropriately.
Writing interruptible code is in general not beginner's stuff, so take care.
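Applied to the call() method from the question, the explicit check could look roughly like this (a sketch; throwing on interrupt is just one possible reaction):
@Override
public Boolean call() throws Exception {
    DataValidator validator = new DataValidator();
    Iterator<BeamData> it = beamData.iterator();
    while (it.hasNext()) {
        // explicit check, because nothing in this loop blocks on an interruptible method
        if (Thread.interrupted()) {
            throw new InterruptedException("validation cancelled");
        }
        BeamData data = it.next();
        if (!validator.validateDensity(data.getBin_ll_lat(), data.getBin_ll_lon(),
                data.getBin_ur_lat(), data.getBin_ur_lon())) {
            return false;
        }
    }
    return true;
}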
Cancelling the Future will not interrupt running code. It primarily serves to prevent the task from being run in the first place.
While you can provide a true as a parameter, which will interrupt the thread running the task, that only has an effect if the thread is blocked in code that throws an InterruptedException. Other than that, nothing implicitly checks the interrupted status of the thread.
In your case, there is no blocking; it's busy work that is taking time. One option would be to have a volatile boolean that you check at each stage of your loop:
public class BeamWorkerThread implements Callable<Boolean> {
private volatile boolean cancelled = false;
@Override
public Boolean call() throws Exception {
//...
while(it.hasNext() && !cancelled) {
//...
}
}
public void cancel() {
cancelled = true;
}
}
Then you would keep references to your BeamWorkerThread objects and call cancel() on it to preempt its execution.
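For example, building on your submission loop (the workers list is the only new piece; everything else mirrors your code):
List<BeamWorkerThread> workers = new ArrayList<>();
List<Future<Boolean>> results = new ArrayList<>();
// when submitting each chunk, remember the task itself as well as its Future
BeamWorkerThread worker = new BeamWorkerThread(beamDataList, "thread:" + (threadCount++));
workers.add(worker);
results.add(threadPool.submit(worker));
// later, as soon as one chunk reports invalid data, preempt all the others
for (Future<Boolean> f : results) {
    try {
        if (Boolean.FALSE.equals(f.get())) {
            workers.forEach(BeamWorkerThread::cancel); // flips the volatile flag in every task
            break;
        }
    } catch (InterruptedException | ExecutionException e) {
        e.printStackTrace();
    }
}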
Why don't I like interrupts?
Marko mentioned that the cancelled flag above is essentially reinventing Thread.interrupted(). It's a valid criticism. Here's why I prefer not to use interrupts in this scenario.
1. It's dependent on certain threading configurations.
If your task represents a cancellable piece of code that can be submitted to an executor, or called directly, using Thread.interrupt() to cancel execution in the general case assumes that the code receiving the interrupt will be the code that should know how to cleanly cancel the task.
That might be true in this case, but we only know so because we know how both the cancel and the task work internally. But imagine we had something like this:
Task does piece of work
Listeners are notified on-thread for that first piece of work
First listener decides to cancel the task using Thread.interrupt()
Second listener does some interruptible piece of work, and is interrupted. It logs but otherwise ignores the interrupt.
Task does not receive interrupt, and task is not cancelled.
In other words, I feel that interrupt() is too global of a mechanism. Like any shared global state, it makes assumptions about all of the actors. That's what I mean by saying that using interrupt() exposes/couples to details about the run context. By encapsulating it in a cancel() method applicable only for that task instance, you eliminate that global state.
2. It's not always an option.
The classic example here is an InputStream. If you have a task that blocks on reading from an InputStream, interrupt() will do nothing to unblock it. The only way to unblock it is to manually close the stream, and that's something best done in a cancel() method for the task itself. Having one way to cancel a task (e.g. Cancellable), regardless of its implementation, seems ideal to me.
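A sketch of that idea (Cancellable here is a hand-rolled interface, not a JDK type, and ReadTask is hypothetical):
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
// Hypothetical Cancellable: one uniform way to cancel a task, whatever it blocks on.
interface Cancellable {
    void cancel();
}
class ReadTask implements Callable<byte[]>, Cancellable {
    private final InputStream in;
    ReadTask(InputStream in) { this.in = in; }
    @Override
    public byte[] call() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) { // interrupt() will NOT unblock this read
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
    @Override
    public void cancel() {
        try {
            in.close(); // closing the stream typically makes the blocked read() fail, ending the task
        } catch (IOException ignored) {
        }
    }
}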
Use the ExecutorService.shutdownNow() method. It will stop the executor from accepting more submissions, interrupt the threads that are currently executing tasks, and return a list of the tasks that were submitted but never started (those are simply not run). Of course, you will have to discard this executor as it cannot be restarted.
The interrupt may not terminate the execution immediately if the thread is not blocked on an interruptible method (not waiting on a monitor), and it certainly will not if you swallow the InterruptedException that is raised in that case.
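Applied to your loop, that could look roughly like the following sketch (the Boolean.FALSE comparison and the printout are illustrative, not taken from your code):
for (Future<Boolean> fs : results) {
    try {
        if (Boolean.FALSE.equals(fs.get())) {
            // one chunk failed validation: interrupt the running tasks and drop the queued ones
            List<Runnable> neverStarted = threadPool.shutdownNow();
            System.out.println(neverStarted.size() + " chunk(s) were never processed");
            break;
        }
    } catch (InterruptedException | ExecutionException e) {
        e.printStackTrace();
    }
}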
I'm trying to understand how to ensure that a specific action completes in a certain amount of time. Seems like a simple job for java's new util.concurrent library. However, this task claims a connection to the database and I want to be sure that it properly releases that connection upon timeout.
so to call the service:
int resultCount = -1;
ExecutorService executor = null;
try {
executor = Executors.newSingleThreadExecutor();
FutureTask<Integer> task = new CopyTask<Integer>();
executor.execute(task);
try {
resultCount = task.get(2, TimeUnit.MINUTES);
} catch (Exception e) {
LOGGER.fatal("Migrate Events job crashed.", e);
task.cancel(true);
return;
}
} finally {
if (executor != null) {
executor.shutdown();
}
}
The task itself simply wraps a callable; here is the call method:
@Override
public Integer call() throws Exception {
    Session session = null;
    try {
        session = getSession();
        ... execute sql against session ...
    } finally {
        if (session != null) {
            session.release();
        }
    }
}
So, my question for those who've made it this far is: is session.release() guaranteed to be called in the case that the task fails due to a TimeoutException? I postulate that it is not, but I would love to be proven wrong.
Thanks
edit: The problem I'm having is that occasionally the SQL in question is not finishing due to weird DB problems. So what I want to do is simply close the connection, let the DB roll back the transaction, get some rest and reattempt this at a later time. So I'm treating the get(...) as if it were killing the thread. Is that wrong?
When you call task.get() with a timeout, that timeout only applies to the attempt to obtain the results (in your current thread), not the calculation itself (in the worker thread). Hence your problem here; if a worker thread gets into some state from which it will never return, then the timeout simply ensures that your polling code will keep running but will do nothing to affect the worker.
Your call to task.cancel(true) in the catch block is what I was initially going to suggest, and this is good coding practice. Unfortunately this only sets a flag on the thread that may/should be checked by well-behaved long-running, cancellable tasks, but it doesn't take any direct action on the other thread. If the SQL executing methods don't declare that they throw InterruptedException, then they aren't going to check this flag and aren't going to be interruptable via the typical Java mechanism.
Really all of this comes down to the fact that the code in the worker thread must support some mechanism of stopping itself if it's run for too long. Supporting the standard interrupt mechanism is one way of doing this; checking some boolean flag intermittently, or other bespoke alternatives, would work too. However there is no guaranteed way to cause another thread to return (short of Thread.stop, which is deprecated for good reason). You need to coordinate with the running code to signal it to stop in a way that it will notice.
In this particular case, I expect there are probably some parameters you could set on the DB connection so that the SQL calls will time out after a given period, meaning that control returns to your Java code (probably with some exception) and so the finally block gets called. If not, i.e. there's no way to make the database call (such as PreparedStatement.execute()) return control after some predetermined time, then you'll need to spawn an extra thread within your Callable that can monitor a timeout and forcibly close the connection/session if it expires. This isn't very nice though and your code will be a lot cleaner if you can get the SQL calls to cooperate.
(So ironically, despite you supplying a good amount of code to support this question, the really important part is the bit you redacted: "... execute sql against session ..." :-))
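With plain JDBC, for example, the per-statement timeout looks roughly like this (the SQL string and the connection variable are placeholders, and Hibernate-style session APIs typically expose an equivalent query-timeout setting):
try (PreparedStatement ps = connection.prepareStatement("... the migration SQL ...")) {
    ps.setQueryTimeout(120); // seconds; support depends on the JDBC driver
    ps.execute();            // a SQLException is raised if the timeout is exceeded,
                             // so the finally block releasing the session still runs
}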
You cannot interrupt a thread from the outside, so the timeout will have no effect on the code down in the JDBC layer (perhaps even over in JNI-land somewhere.) Presumably eventually the SQL work will end and the session.release() will happen, but that may be long after the end of your timeout.
The finally block will eventually execute.
When your task takes longer than 2 minutes, a TimeoutException is thrown, but the actual thread continues to perform its work and eventually it will call the finally block. Even if you cancel the task and force an interrupt, the finally block will still be called.
Here's a small example based in your code. You can test these situations:
public static void main(String[] args) {
int resultCount = -1;
ExecutorService executor = null;
try {
executor = Executors.newSingleThreadExecutor();
FutureTask<Integer> task = new FutureTask<Integer>(new Callable<Integer>() {
@Override
public Integer call() throws Exception {
try {
Thread.sleep(10000);
return 1;
} finally {
System.out.println("FINALLY CALLED!!!");
}
}
});
executor.execute(task);
try {
resultCount = task.get(1000, TimeUnit.MILLISECONDS);
} catch (Exception e) {
System.out.println("Migrate Events job crashed: " + e.getMessage());
task.cancel(true);
return;
}
} finally {
if (executor != null) {
executor.shutdown();
}
}
}
Your example says:
copyRecords.cancel(true);
I assume this was meant to say:
task.cancel(true);
Your finally block will be called assuming that the contents of the try block are interruptible. Some operations are (like wait()), some operations are not (like InputStream#read()). It all depends on the operation that the code is blocking on when the task is interrupted.
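A standalone demo of that difference, using a CPU-bound loop as the stand-in for a non-interruptible operation (values and printouts are illustrative):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class InterruptibleOrNot {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // blocked in sleep(): cancel(true) interrupts it, so the finally runs right away
        Future<?> sleeping = pool.submit(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                System.out.println("sleep interrupted");
            } finally {
                System.out.println("sleeping task: finally");
            }
        });
        // busy computation: nothing checks the interrupt flag, so it runs to completion
        Future<?> busy = pool.submit(() -> {
            long x = 0;
            for (long i = 0; i < 2_000_000_000L; i++) { x += i; }
            System.out.println("busy task finished: " + x);
        });
        Thread.sleep(500);
        sleeping.cancel(true); // prints almost immediately
        busy.cancel(true);     // no visible effect until the loop ends on its own
        pool.shutdown();
    }
}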