How to properly handle thread interrupts


I am working on an application that at some point starts a worker thread. This thread's behaviour will vary greatly depending on the parameters used to start it, but the following properties always apply:
It will do some minor I/O operations
It will spend minor time in 3rd party libraries
It may create some worker threads for a certain subtask (these threads will not be reused after their task is finished)
It will spend most of its time crunching numbers (there are no blocking calls present)
Due to the possible long duration (5 minutes up to several hours, depending on the input), we want to be able to abort the calculation. If we choose to abort it, we no longer care about the output, and the thread is in fact wasting valuable resources as long as it keeps running. Since the code is under our control, the advised way is to use interrupts to indicate an abort.
While most examples on the web deal with a worker thread that is looping over some method, this is not the case for me (similar question here). There are also very few blocking calls in this work thread, in which case this article advises to manually check the interrupt flag. My question is: How to deal with this interrupt?
I see several options, but can't decide which is the most "clean" approach. Despite my practical example, I'm mainly interested in the "best practice" on how to deal with this.
Throw some kind of unchecked exception: this would kill the thread in a quick and easy way, but it reminds me of the ThreadDeath approach used by the deprecated Thread#stop() method, with all its related problems. I can see this approach being acceptable in owned code (due to the known logic flow), but not in library code.
Throw some kind of checked exception: this would kill the thread in a quick and easy way, and alleviates the ThreadDeath-like problems by forcing programmers to deal with this event. However, it places a big burden on the code, requiring the exception to be declared everywhere. There is a reason not everything throws an InterruptedException.
Exit the methods with a "best result so far" or empty result. Because of the number of classes involved, this will be a very hard task. If not enough care is taken, NullPointerExceptions might arise from empty results, leading to the same problems as point 1. Finding these causes would be next to impossible in large code bases.

I suggest you check Thread.currentThread().isInterrupted() periodically at points where you know it is safe to stop, and stop if it is set.
You could do this in a method which checks this flag and throws a custom unchecked exception or error.
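A minimal sketch of that idea, assuming a custom unchecked exception is acceptable in your code base. The names Checkpoints, ComputationAbortedException, checkpoint() and crunch() are made up for illustration; crunch() merely stands in for the real number crunching.

```java
final class Checkpoints {
    /** Hypothetical unchecked exception signalling that the computation was aborted. */
    static final class ComputationAbortedException extends RuntimeException {
        ComputationAbortedException() {
            super("computation aborted by interrupt");
        }
    }

    /** Call at points where it is safe to stop; throws if this thread was interrupted. */
    static void checkpoint() {
        // isInterrupted() does not clear the flag, so code further up can still see it.
        if (Thread.currentThread().isInterrupted()) {
            throw new ComputationAbortedException();
        }
    }

    /** Stand-in for the real work: sums 0..n-1, checking the flag every 1024 iterations. */
    static long crunch(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            if ((i & 1023) == 0) {
                checkpoint();
            }
            sum += i;
        }
        return sum;
    }
}
```

Checking only every so many iterations keeps the overhead low; how often to check is a trade-off between responsiveness and cost that depends on your workload.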

What about using an ExecutorService to execute the task? Check out the methods that let you specify a timeout, e.g.:
ExecutorService executor = Executors.newSingleThreadExecutor();
executor.invokeAll(Arrays.asList(new Task()), 10, TimeUnit.MINUTES); // Timeout of 10 minutes.
executor.shutdown();
Here Task must implement Callable: invokeAll does not accept a plain Runnable (you can wrap one with Executors.callable(...)).
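For reference, a self-contained sketch of the timeout approach. TimeoutDemo and runWithTimeout are hypothetical names; the executor calls themselves are the standard java.util.concurrent API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class TimeoutDemo {
    /** Runs one task with a time limit; returns true if it finished, false if it was cancelled. */
    static boolean runWithTimeout(Callable<Void> task, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // invokeAll blocks until the task is done or the timeout expires;
            // a task that has not completed by then is cancelled.
            List<Future<Void>> futures = executor.invokeAll(Arrays.asList(task), timeout, unit);
            return !futures.get(0).isCancelled();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Note that cancellation here only interrupts the worker thread. A task that crunches numbers without blocking still has to check the interrupt flag itself, otherwise it keeps running after the timeout.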

Related

I've been taught that conditions in concurrency do not necessarily need to be written in a loop, against what the oracle doc says

So basically I am learning a bit more serious concurrency (studying how things actually work, instead of just using random stuff if needed).
When I asked my professor about this, he told me that he and his colleagues hadn't been able to reproduce a spurious wakeup, and that he believes the line is a leftover that was never deleted (it was there, Java got "better", it's no longer needed, but the line is still in the docs), so it no longer applies.
Link:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Condition.html
It's right below the point called:
Implementation Considerations
In his opinion, a condition that looks something like this:
lock.lock();
if (p > q) {
    lock.newCondition().await();
}
would be perfectly fine. Since he says a spurious wakeup cannot happen, there would be no need for a loop:
lock.lock();
while (p > q) {
    lock.newCondition().await();
}
I am MORE than likely mixing things up and misunderstanding both the doc and my teacher, but I have spent some time trying to understand the reasoning behind each position and can't come up with an "answer" of my own; I just end up believing one or the other (not that it matters, it's pure I-want-to-learn).
My teacher does spend time telling us how explaining concurrency in Java is pretty silly, but I didn't choose it either, so there's that.

Would be perfectly fine, since he says a spurious wake up can not happen, it wouldnt be needed a loop:
Your teacher is wrong for two reasons:
Spurious wakeups do happen. They may not happen on the architecture that they tested on, but if you don't take them into account, you will see problems when you move your application to a different piece of hardware or a different OS revision. It may also be that the spurious wakeups happen occasionally during an exceptional kernel event, such as a signal getting delivered at precisely the wrong time. Again, your application may run fine in testing, but when you move it into production with a much higher load, the frequency of the exceptional event may increase...
The underlying problem is that certain native thread implementations may choose to wake up all conditions associated with an application instead of the specific one that was notified. This is well documented in the javadocs for Object.wait():
As in the one argument version, interrupts and spurious wakeups are possible, and this method should always be used in a loop:
Here's one example of an architecture that has this limitation. I'll quote from this interesting blog entry:
Internally, wait is implemented as a call to the 'futex' system call. Each blocking system call on Linux returns abruptly when the process receives a signal -- because calling signal handler from kernel call is tricky. What if the signal handler calls some other system function? And a new signal arrives? It's easy to run out of kernel stack for a process. Exactly because each system call can be interrupted, when glibc calls any blocking function, like 'read', it does it in a loop, and if 'read' returns EINTR, calls 'read' again.
The while loop is also very important to protect against race conditions, especially in multi-threaded producer/consumer models. If you have multiple threads consuming from a queue (for example), a notification that there are items in the queue may wake up a thread, but by the time it is able to get the lock, another thread has already dequeued the item.
This is well documented on my page here with a sample program that demonstrates the race condition without the use of while.
Producer Consumer Thread Race Conditions
In your example, thread A may be waiting in await() while another thread B may be waiting to get the lock(). Thread C has the lock and is adding to the queue.
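// B is here waiting for the lock
lock.lock();
// condition must be created once via lock.newCondition() and shared with the
// signalling thread; calling newCondition() here would make a fresh one each time
while (p > q) {
    // A is here waiting for the signal
    condition.await();
}
// dequeue
lock.unlock();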
Then, if the producer adds something to the queue and calls signal(), thread A moves from the WAITING state to the BLOCKED state to reacquire the lock. But it may be behind thread B, which is already waiting for it. Once the lock is released, thread B dequeues the element, not thread A. When thread A then gets a chance to dequeue, the queue is empty. Without the while loop, you can get out-of-bounds exceptions or other problems by trying to dequeue from an empty queue.
See my link for more explicit details of the race.
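As a rough, self-contained illustration of the pattern (not the sample program from the linked page), here is a small queue guarded by a lock and a single Condition. GuardedQueue is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class GuardedQueue<T> {
    private final ReentrantLock lock = new ReentrantLock();
    // One Condition, created once and shared by waiters and signallers.
    private final Condition notEmpty = lock.newCondition();
    private final Queue<T> items = new ArrayDeque<>();

    void put(T item) {
        lock.lock();
        try {
            items.add(item);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    T take() throws InterruptedException {
        lock.lock();
        try {
            // Re-check after every wakeup: the wakeup may be spurious, or another
            // consumer may have taken the item before this thread got the lock back.
            while (items.isEmpty()) {
                notEmpty.await();
            }
            return items.remove();
        } finally {
            lock.unlock();
        }
    }
}
```

With an if instead of the while, a consumer that loses the race would call items.remove() on an empty queue and get a NoSuchElementException.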

It is still necessary. Your professor is not necessarily incorrect, but has created a strawman argument to knock down.
There are two reasons why you must protect your conditions in a loop.
The first is spurious wake-up. Your professor seems to have been unable to reproduce this, and it may well not be a problem on the platforms he tests on. This does not mean it is unreproducible on all platforms.
The second is that between the times that you wake up and actually go to do the logic, the condition may no longer be true. You must guard against this potential race condition. This is also notoriously difficult to reproduce in the lab, and will probably only happen in bizarre circumstances in production.

ThreadPoolExecutor hangs

I'm trying to debug a race condition where one of our application's poller threads never returns, causing future pollers to never get scheduled. In abstract terms, to hide our business logic while still capturing the problem, here is our code path.
We have to update some state X of resource Y on a remote server. We have a resource manager, which changes the resource state and updates X as a side effect of the change. This manager polls the resource continually, and when it believes the resource has been updated, it uses a ThreadPoolExecutor to do the work. This thread pool executor has a reasonably sized blocking queue but a fairly small maximum number of threads. According to the thread dump, the hang happens in an invokeAll call (among other things).
We have reason to believe that all of the core/max threads in this pool executor are busy doing other work (more resource state updates, if you will).
Since invokeAll returns futures which we wait on, the question is: does invokeAll hang if the executor's blocking queue is big enough to accept the work passed in via invokeAll, but there are not enough threads available to run it?

As other users have pointed out, without some code (even pseudo-code), and a clearer understanding of what "state X" is, and what "resource Y" is, it is virtually impossible for anybody here to provide an intelligent answer. In short, you need an SSCCE. Nevertheless, I'll do my best here ;-). And if you do post code and/or provide more info, I'll update my answer accordingly.
From the Java 7 ExecutorService#invokeAll javadoc:
Executes the given tasks, returning a list of Futures holding their status and results when all complete. Future.isDone() is true for each element of the returned list. Note that a completed task could have terminated either normally or by throwing an exception. The results of this method are undefined if the given collection is modified while this operation is in progress.
From your description (and again, I can't tell for sure because of the lack of details), one of your worker threads is hanging. Since you're calling invokeAll(...), the calling thread blocks until every submitted task has completed, so it is waiting for the hung task to finish. But it never does. The same applies to tasks that are merely sitting in the queue because all pool threads are busy: invokeAll does not return until they have actually run. Now, as to why you're getting a hung thread, that's an entirely different issue, and we would definitely need to see some code. HTH.
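To make the queueing point concrete, here is a small sketch under assumed conditions: a pool with a single worker that is kept busy and a queue with plenty of room. BusyPoolDemo and queuedTaskStarves are hypothetical names, and the timed invokeAll overload is used only so that the demo terminates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BusyPoolDemo {
    /** Returns true if the queued task was still waiting (and got cancelled) at the timeout. */
    static boolean queuedTaskStarves() {
        // One worker thread and a roomy queue: submissions are accepted but may not run.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(100));
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Occupy the only worker until the latch is released.
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException ignored) {
                    // shutting down
                }
            });
            Callable<String> work = () -> "done";
            // The task fits in the queue, yet invokeAll keeps waiting for it to complete.
            List<Future<String>> futures =
                    pool.invokeAll(Arrays.asList(work), 200, TimeUnit.MILLISECONDS);
            return futures.get(0).isCancelled();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();
            pool.shutdownNow();
        }
    }
}
```

With the untimed invokeAll(tasks) overload, the caller in this situation would simply stay blocked for as long as the worker stays busy.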

Runnable or Executor Service

Which is easier and more suitable to use for running things in another thread, notably so that the program waits for the result but doesn't lock up an ui.
There may be a method that is better than either of these also, but I don't know of them.
Thanks :)

Runnable represents the code to be executed.
Executor and its implementations represent execution strategies.
This means that the former is actually consumed by the latter. What you probably meant is: between plain threads and executors, which is more suitable?
The answer to this question is basically: it depends.
Executors are sophisticated tools, which let you choose how many concurrent tasks may be running and tune different aspects of the execution context. They also provide facilities to monitor the tasks' execution, by returning a token (called a Future, or sometimes a promise) which lets the code requesting the task execution query for that task's completion.
Threads are a less elaborate (or more bare-bones) solution for executing code asynchronously. You can still have them return a Future by hand, or simply check whether the thread is still running.
So, depending on how much sophistication you require, you will pick one or the other: Executors for more streamlined requirements (many tasks to execute and monitor), Threads for one-shot or simpler situations.
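A small side-by-side sketch of the two styles, each computing the same value on another thread. ThreadVsExecutor, viaThread and viaExecutor are hypothetical names used only for this comparison.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ThreadVsExecutor {
    /** Plain thread: we hand-roll the result holder and wait with join(). */
    static int viaThread() {
        int[] result = new int[1];
        Thread t = new Thread(() -> result[0] = 6 * 7);
        t.start();
        try {
            t.join(); // join() also makes the worker's write to result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    /** Executor: submit() hands back a Future that carries the result. */
    static int viaExecutor() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> future = executor.submit(() -> 6 * 7);
            return future.get(); // blocks until the task has finished
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```

Both join() and Future.get() block the caller, so in a UI application neither should be called on the UI thread; the usual approach is to have the task hand its result back to the UI thread when it is done.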

Should ScheduledExecutorService.scheduleAt* methods re-schedule tasks if the task throws RuntimeException/Error?

The other day I was implementing an important service in my application, that should continue to run no matter what. So I used the following construct:
ScheduledExecutorService ses =
Executors.newSingleThreadScheduledExecutor();
//If the thread dies, another will take over
ses.scheduleAtFixedRate(importantPeriodicTask, 1, 1, TimeUnit.NANOSECONDS);
...only to find out that when importantPeriodicTask actually throws a RuntimeException or Error, the ScheduledExecutorService will stop executing this task (it will cease to be scheduled).
This is of course exactly what the javadoc says:
If any execution of the task encounters an exception, subsequent executions are suppressed.
So shame on me, but I couldn't understand why the authors implemented ScheduledExecutorService like this.
Granted, a RuntimeException or Error should generally not be caught, especially an Error. But in reality, RuntimeExceptions in particular are thrown quite commonly in production deployments, and I feel it is almost always desirable that, while that particular operation should fail, the app itself should not fail because of that isolated error.
It is true that the suppression of one periodic task does not affect other periodic tasks. But given the nature of most periodic tasks, shouldn't these tasks be perceived as a "service", rather than as isolated tasks?
In other words, shouldn't only that one execution of importantPeriodicTask fail, while the task itself continues to be rescheduled?

In my opinion the current behavior is reasonable. RuntimeExceptions usually refer to bugs. They can practically occur anywhere in the task's code. If the task is stateful for example, it may leave its state inconsistent, and subsequent executions will have an unexpected behavior. In general, I don't like code that tries to recover from its own bugs, but that's my opinion.
If you wish to change the behavior of ScheduledExecutorService, take a look at the following generic solution:
http://www.javaspecialists.eu/archive/Issue154.html
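If you only want the simplest form of this, a wrapper Runnable is enough. The following is a minimal sketch, not the solution from the linked article; ResilientTasks, resilient and runsDespiteFailures are hypothetical names, and it assumes that logging the failure and carrying on is acceptable for your task.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ResilientTasks {
    /** Wraps a task so that a RuntimeException in one run does not suppress later runs. */
    static Runnable resilient(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // Assumption: reporting the failure and continuing is good enough here.
                System.err.println("periodic task failed: " + e);
            }
        };
    }

    /** Schedules a task that always throws and reports how often it ran in about 300 ms. */
    static int runsDespiteFailures() {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        Runnable failing = () -> {
            runs.incrementAndGet();
            throw new IllegalStateException("boom");
        };
        try {
            ses.scheduleAtFixedRate(resilient(failing), 0, 50, TimeUnit.MILLISECONDS);
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            ses.shutdownNow();
        }
        return runs.get();
    }
}
```

Errors are deliberately not caught, so something like an OutOfMemoryError still stops the schedule. Whether swallowing RuntimeExceptions is wise depends on whether the task can be left in an inconsistent state, as discussed above.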

Are Thread.stop and friends ever safe in Java?

The stop(), suspend(), and resume() methods in java.lang.Thread are deprecated because they are unsafe. The Oracle-recommended workaround is to use Thread.interrupt(), but that approach doesn't work in all cases. For example, if you are calling a library method that doesn't explicitly or implicitly check the interrupted flag, you have no choice but to wait for the call to finish.
So, I'm wondering if it is possible to characterize situations where it is (provably) safe to call stop() on a Thread. For example, would it be safe to stop() a thread that did nothing but call find(...) or match(...) on a java.util.regex.Matcher?
(If there are any Oracle engineers reading this ... a definitive answer would be really appreciated.)
EDIT: Answers that simply restate the mantra that you should not call stop() because it is deprecated, unsafe, whatever are missing the point of this question. I know that it is genuinely unsafe in the majority of cases, and that if there is a viable alternative you should always use that instead.
This question is about the subset cases where it is safe. Specifically, what is that subset?

Here's my attempt at answering my own question.
I think that the following conditions should be sufficient for a single thread to be safely stopped using Thread.stop():
The thread execution must not create or mutate any state (i.e. Java objects, class variables, external resources) that might be visible to other threads in the event that the thread is stopped.
The thread execution must not notify any other thread during its normal execution.
The thread must not start or join other threads, or interact with them using stop, suspend or resume.
(The term thread execution above covers all application-level code and all library code that is executed by the thread.)
The first condition means that a stopped thread will not leave any external data structures or resources in an inconsistent state. This includes data structures that it might be accessing (reading) within a mutex. The second condition means that a stoppable thread cannot leave some other thread waiting. But it also forbids use of any synchronization mechanism other than simple object mutexes.
A stoppable thread must have a way to deliver the results of each computation to the controlling thread. These results are created / mutated by the stoppable thread, so we simply need to ensure that they are not visible following a thread stop. For example, the results could be assigned to private members of the Thread object and "guarded" with a flag that is atomically set by the thread to say it is "done".
EDIT: These conditions are pretty restrictive. For example, for a "regex evaluator" thread to be safely stopped, we must guarantee that the regex engine does not mutate any externally visible state. The problem is that it might do, depending on how you implement the thread!
The Pattern.compile(...) methods might update a static cache of compiled patterns, and if they did they would (should) use a mutex to do it. (Actually, the OpenJDK 6.0 version doesn't cache Patterns, but Sun might conceivably change this.)
If you try to avoid 1) by compiling the regex in the control thread and supplying a pre-instantiated Matcher, then the regex thread does mutate externally visible state.
In the first case, we would probably be in trouble. For example, suppose that a HashMap was used to implement the cache and that the thread was interrupted while the HashMap was being reorganized.
In the second case, we would be OK provided that the Matcher had not been passed to some other thread, and provided that the controller thread didn't try to use the Matcher after stopping the regex matcher thread.
So where does this leave us?
Well, I think I have identified conditions under which threads are theoretically safe to stop. I also think that it is theoretically possible to statically analyse the code of a thread (and the methods it calls) to see if these conditions will always hold. But, I'm not sure if this is really practical.
Does this make sense? Have I missed something?
EDIT 2
Things get a bit more hairy when you consider that the code that we might be trying to kill could be untrusted:
We can't rely on "promises"; e.g. annotations on the untrusted code that it is either killable, or not killable.
We actually need to be able to stop the untrusted code from doing things that would make it unkillable ... according to the identified criteria.
I suspect that this would entail modifying JVM behaviour (e.g. implementing runtime restrictions on what threads are allowed to lock or modify), or a full implementation of the Isolates JSR. That's beyond the scope of what I was considering as "fair game".
So let's rule the untrusted code case out for now. Or at least, acknowledge that malicious code can do things to render itself not safely killable, and put that problem to one side.

The lack of safety comes from the idea of critical sections:
Take mutex
do some work, temporarily while we work our state is inconsistent
// all consistent now
Release mutex
If you blow away the thread and it happened to be in a critical section, then the object is left in an inconsistent state, which means it is not safely usable from that point on.
For it to be safe to kill the thread you need to understand the entire processing of whatever is being done in that thread, to know that there are no such critical sections in the code. If you are using library code, then you may not be able to see the source and know that it's safe. Even if it's safe today it may not be tomorrow.
(Very contrived) Example of possible unsafety. We have a linked list, it's not cyclic. All the algorithms are really zippy because we know it's not cyclic. During our critical section we temporarily introduce a cycle. We then get blown away before we emerge from the critical section. Now all the algorithms using the list loop forever. No library author would do that surely! How do you know? You cannot assume that code you use is well written.
In the example you point to, it's surely possible to write the required functionality in an interruptible way. More work, but possible to be safe.
I'll take a flyer: there is no documented subset of Objects and methods that can be used in cancellable threads, because no library author wants to make the guarantees.
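One commonly suggested way to get interruptible regex matching without rewriting java.util.regex is to wrap the input in a CharSequence that checks the interrupt flag on every charAt call. This is only a sketch: it relies on the matcher reading its input through charAt, which is an implementation detail and not a documented guarantee, and the class name and the choice of IllegalStateException are my own assumptions.

```java
import java.util.regex.Pattern;

final class InterruptibleCharSequence implements CharSequence {
    private final CharSequence inner;

    InterruptibleCharSequence(CharSequence inner) {
        this.inner = inner;
    }

    @Override
    public char charAt(int index) {
        // Abort the match as soon as the matching thread has been interrupted.
        if (Thread.currentThread().isInterrupted()) {
            throw new IllegalStateException("regex matching interrupted");
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return new InterruptibleCharSequence(inner.subSequence(start, end));
    }

    @Override
    public String toString() {
        return inner.toString();
    }

    /** Convenience helper: matches the whole input, aborting if the thread is interrupted. */
    static boolean matches(String regex, CharSequence input) {
        return Pattern.compile(regex).matcher(new InterruptibleCharSequence(input)).matches();
    }
}
```

A watchdog would then call interrupt() on the matching thread rather than stop(), and the matching thread would catch the exception at a point of its own choosing. The extra check on every character access has a cost, so this is a trade-off rather than a free fix.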

Maybe there's something I don't know, but as java.sun.com said, it is unsafe because anything this thread is handling is at serious risk of being damaged. Other objects, connections, opened files... for obvious reasons, like "don't shut down your Word without saving first".
For this find(...) example, I don't really think it would be a catastrophe to simply kick it away with a sutiless .stop()...

A concrete example would probably help here. If anyone can suggest a good alternative to the following use of stop I'd be very interested. Re-writing java.util.regex to support interruption doesn't count.
import java.util.regex.*;
import java.util.*;

public class RegexInterruptTest {
    private static class BadRegexException extends RuntimeException { }

    public static void main(String[] args) {
        final Thread mainThread = Thread.currentThread();
        TimerTask interruptTask = new TimerTask() {
            @Override
            public void run() {
                System.out.println("Stopping thread.");
                // Doesn't work: the regex engine never checks the interrupt flag.
                // mainThread.interrupt();
                // Does work but is deprecated and nasty.
                // Note: stop(Throwable) throws UnsupportedOperationException on
                // Java 8+ and was removed in Java 11.
                mainThread.stop(new BadRegexException());
            }
        };
        // Daemon timer that fires after two seconds unless cancelled first.
        Timer interruptTimer = new Timer(true);
        interruptTimer.schedule(interruptTask, 2000L);
        // Nested quantifiers plus a non-matching tail cause heavy backtracking.
        String s = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab";
        String exp = "(a+a+){1,100}";
        Pattern p = Pattern.compile(exp);
        Matcher m = p.matcher(s);
        try {
            System.out.println("Match: " + m.matches());
            interruptTimer.cancel();
        } catch (BadRegexException bre) {
            System.out.println("Oooops");
        } finally {
            System.out.println("All over");
        }
    }
}

There are ways to use Thread.stop() in a relatively stable manner without leaking memory or file descriptors (FDs are exceptionally leak-prone on *NIX), but you should rely on it only if you are forced to manage 3rd-party code. Never use it if you have control over the code itself.
I use Thread.stop() along with interrupt() and some more hacks, such as adding custom logging handlers to re-throw the trapped ThreadDeath, adding an UncaughtExceptionHandler, running the code in its own ThreadGroup (sync over 'em), etc.
But that deserves an entirely new topic.
But in this case it's the Java designers telling you; and they're more authoritative on their language than either of us :)
Just a note: quite a few of them are pretty clueless

If my understanding is right, the problem has to do with synchronization locks not being released as the generated ThreadDeath error propagates up the stack.
Taking that for granted, it's inherently unsafe because you can never know whether or not any "inner method call" you happened to be in at the very moment stop() was invoked and effectuated, was effectively holding some synchronization lock, and then what the java engineers say is, seemingly, unequivocally right.
What I personally don't understand is why it should be impossible to release any synchronization lock as this particular type of exception propagates up the stack, thereby passing all the '}' method/synchronization block delimiters, which do cause any locks to be released for any other type of exception.
I have a server written in java, and if the administrator of that service wants a "cold shutdown", then it is simply NECESSARY to be able to stop all running activity no matter what. Consistency of any object's state is not a concern because all I'm trying to do is to EXIT. As fast as I can.

There is no safe way to kill a thread.
Nor is there a subset of situations where it is safe. Even if it works 100% of the time while testing on Windows, it may corrupt JVM process memory under Solaris or leak thread resources under Linux.
One should always remember that underneath the Java Thread there is a real, native, unsafe thread.
That native thread works with native, low-level, data and control structures. Killing it may leave those native data structures in an invalid state, without a way to recover.
There is no way for the Java machine to take all possible consequences into account, as the thread may allocate/use resources not only within the JVM process, but within the OS kernel as well.
In other words, if the native thread library doesn't provide a safe way to kill() a thread, Java cannot provide any guarantees better than that. And all the native implementations known to me state that killing a thread is a dangerous business.

All forms of concurrency control can be provided by the Java synchronization primitives by constructing more complex concurrency controls that suit your problem.
The reasons for deprecation are clearly given in the link you provide. If you're willing to accept the reasons why, then feel free to use those features.
However, if you choose to use those features, you also accept that support for those features could stop at any time.
Edit: I'll reiterate the reason for deprecation as well as how to avoid them.
Since the only danger is that objects that can be referenced by the stopped thread could be corrupted, simply clone the String before you pass it to the Thread. If no shared objects exist, the threat of corrupted objects in the program outside the stopped Thread is no longer there.
