I'm working on a cross-platform application in Java which currently works nicely on Windows, Linux and Mac OS X. I'm trying to work out a nice way to do detection (and handling) of 'crashes'. Is there an easy, cross-platform way to detect 'crashes' in Java and to do something in response?
I guess by 'crashes' I mean uncaught exceptions. However, the code does use some JNI, so it'd be nice to be able to catch crashes from bad JNI code, but I have a feeling that's JVM-specific.
For simple catch-all handling, you can use the following static method in Thread. From the Javadoc:
static void setDefaultUncaughtExceptionHandler(Thread.UncaughtExceptionHandler eh)
Set the default handler invoked when a thread abruptly terminates due to an uncaught exception, and no other handler has been defined for that thread.
This is a very broad way to deal with errors or unchecked exceptions that may not be caught anywhere else.
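For example, a minimal sketch of wiring this up at startup (the reporting logic is just a placeholder):

    import java.io.PrintWriter;
    import java.io.StringWriter;

    public class CrashHandlerDemo {
        public static void main(String[] args) {
            // Install a JVM-wide fallback for exceptions no other handler catches.
            Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> {
                StringWriter trace = new StringWriter();
                throwable.printStackTrace(new PrintWriter(trace));
                // Replace this with your own reporting: write a crash log, phone home, etc.
                System.err.println("Uncaught in " + thread.getName() + ":\n" + trace);
            });

            // Trigger a crash on a worker thread to show the handler firing.
            new Thread(() -> { throw new IllegalStateException("boom"); }, "worker").start();
        }
    }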
Side-note: It's better if the code can catch, log and/or recover from exceptions closer to the source of the problem. I would reserve this kind of generalized crash handling for totally unrecoverable situations (i.e. subclasses of java.lang.Error). Try to avoid the possibility of a RuntimeException ever going completely uncaught, since it might be possible--and preferable--for the software to survive that.
For handling uncaught exceptions you can provide a ThreadGroup that overrides ThreadGroup.uncaughtException(...). You can then catch any uncaught exceptions and handle them appropriately (e.g. send a crash log home).
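A sketch of that approach; the class name and the reporting hook are illustrative only:

    public class ThreadGroupCrashDemo {
        // ThreadGroup whose uncaughtException(...) reports crashes before default handling.
        static class ReportingThreadGroup extends ThreadGroup {
            ReportingThreadGroup(String name) { super(name); }

            @Override
            public void uncaughtException(Thread t, Throwable e) {
                // Hypothetical reporting hook: write a crash log, send it home, etc.
                System.err.println("Crash in thread " + t.getName() + ": " + e);
                super.uncaughtException(t, e); // keep the default stack-trace printing
            }
        }

        public static void main(String[] args) {
            ThreadGroup group = new ReportingThreadGroup("app");
            // Threads created in this group fall back to the group's handler.
            new Thread(group, () -> { throw new IllegalStateException("boom"); }, "worker").start();
        }
    }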
I can't help you on the JNI front. There's probably a way using a native wrapper executable that launches the JVM, but that executable would need to know about all the possible JVMs it could be calling, how they indicate crashes, where crash logs are placed, and so on.
Not sure if this is what you need, but you can also detect if an exception has occurred from within your native code. See http://java.sun.com/javase/6/docs/technotes/guides/jni/spec/functions.html#wp5234 for more info.
I've seen the JNA Crash protection feature description starting with "It is not uncommon when defining a new library and writing tests to encounter memory access errors which crash the VM". This suggests to me that this crash protection is really intended for debugging / early phases of development. Is it safe to leave it on in production, or is there a large performance cost that I should be aware of / other reason to turn it off?
For Windows, it's on by default and safe to leave that way, because it uses structured exception handling to catch the errors (you should still attempt to shut down gracefully, because there's no guarantee that the caught error is in any way recoverable).
For platforms that use signal handling to trap errors, you should probably not use it in production, since the VM itself uses those same signals. In development you typically need to use the jsig library in order for JNA to be able to properly trap the signals without interfering with the JVM (or vice versa) and it's unlikely you'll be able to do this in production.
There is no performance cost, it's more of a reliability issue.
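If you want to check at startup whether protection actually took effect on the current platform, something along these lines should work, using JNA's Native.setProtected/isProtected:

    import com.sun.jna.Native;

    public class JnaProtectionCheck {
        public static void main(String[] args) {
            // Ask JNA to convert native faults into Java errors where the platform allows it.
            Native.setProtected(true);

            // On platforms without support (e.g. where only signal handling is available
            // and libjsig isn't set up), this stays false and the request is ignored.
            System.out.println("JNA protected mode active: " + Native.isProtected());
        }
    }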
A great deal has been written about the wisdom of exceptions in general and the use of checked vs. unchecked exceptions in Java in particular, but I'm interested in seeing a defense of the decision to make thread termination the default policy instead of application termination, the way it is in C++. This choice seems extremely dangerous to me: some condition that the programmer didn't plan for randomly causes some part of the program to die after logging a stack trace, while the rest of the program soldiers resolutely on. What could go wrong? My intuition and experience say that a lot of things can go wrong here, and that the default policy is the sort of thing that should only be selected by someone who has a specific reason for choosing it. So what's the upside to this strategy, which has such a seemingly large downside? Am I overestimating the risk?
EDIT: based on the answers so far, I feel that I need to be more focused in my description of the dangers that I perceive; I'm talking about the case of an application which uses multiple threads (e.g. in a thread pool) to update shared state. I recognize that this policy does not present a problem for single-threaded applications.
EDIT2: You can see that there is an awareness of these risks among the language maintainers from the explanation for why the Thread.stop() method was deprecated (found here: http://docs.oracle.com/javase/7/docs/technotes/guides/concurrency/threadPrimitiveDeprecation.html). The exact same issues apply when a thread dies unexpectedly due to uncaught exceptions. They must have designed the JVM so that all monitors are automatically unlocked when a thread dies, which seems like a poor implementation choice; having a thread die while it has a monitor locked should be an indication that the entire program should die because the alternative is almost certain to be internal inconsistency in some shared state.
#BD, I'm not sure what your experience says about this because you haven't explained it here, but here is what I have experienced as a developer:
Generally, it is a bad idea to make an application fail because one of its components has failed (temporarily or permanently) for any reason, such as a DB restart or a file being replaced. For example, if I introduce a new type of trade into the system and some issue comes up, it shouldn't shut down my application.
Applications like web/application servers should be able to keep working and responding to users even if one of their deployments is throwing weird exceptions.
As for your worry about exceptions, most applications have a health monitoring system which watches things like CPU/disk/RAM usage or errors in the logs and fires alerts accordingly.
I hope this resolves your confusion.
From discussing this issue with a co-worker and also reviewing the answers received so far, I have formed a supposition here, and would like to get some feedback.
I suspect that the decision to make this behavior the default has its roots in the philosophy that defined the early development of the language, as well as its early environment.
As part of the original philosophy, programmers/designers were expected to use checked exceptions, and the language enforces that checked exceptions which may be emitted by a method call (i.e. have been declared in the method definition) must be handled in the calling method, or else be declared by it to "officially" pass the responsibility to higher-level callers. Common practice has moved sharply away from the use of checked exceptions, not to mention that one of the most commonly occurring exceptions in practice, NullPointerException, is unchecked.
As a result, programmers must now assume that any method call can generate an unchecked exception, and the corollary is that any code which updates shared data in a concurrent context must implement transactional semantics for those updates in order to be fully correct. My experience is that most developers don't really understand this, even if they do understand the basics of multi-threaded development, such as avoiding deadlock when managing critical sections with synchronization.
The default uncaught exception handler behavior exacerbates the problem by masking its effects: in C++ it wouldn't matter if an uncaught exception corrupted shared state, because the program is dead anyway, but in Java the program will continue to limp along as best it can despite the fact that it is very likely no longer operating correctly.
The environmental factor is that single-threaded programs were likely the norm when the language was first developed, so the default behavior masqueraded as the correct one. The rise of multi-core architectures and increased usage of thread pools exposes the threat more broadly, and commonly applied approaches such as use of immutable objects can only go so far to solve it (hint for some: ConcurrentMap is probably not as safe as you think it is). My experience so far is that people who deny this risk are not being paranoid enough relative to the actual safety requirements of their code, but I would love to be proved wrong.
I suspect that modifying uncaught exception handlers to terminate the program should be the standard procedure required by most development organizations; at the very least this should be done for thread pools which are known to update shared state based on incoming inputs.
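As an illustration of what I mean, here is a minimal fail-fast sketch; the choice of Runtime.halt over System.exit is my own assumption about what "terminate" should mean when shared state may already be corrupt:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class FailFastHandlerDemo {
        public static void main(String[] args) {
            // Fail fast: any uncaught exception takes the whole process down
            // instead of silently killing a single pool thread.
            Thread.setDefaultUncaughtExceptionHandler((t, e) -> {
                e.printStackTrace();
                // Runtime.halt avoids shutdown hooks that might touch corrupted shared state;
                // System.exit(1) is the gentler alternative.
                Runtime.getRuntime().halt(1);
            });

            ExecutorService pool = Executors.newFixedThreadPool(2);
            // Tasks submitted via execute() propagate to the uncaught exception handler.
            pool.execute(() -> { throw new IllegalStateException("shared state may be corrupt"); });
        }
    }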
Normal (no GUI, no container) applications exit on an uncaught exception - the default behavior is already what you want.
For a GUI-based application, it is nicer to show an error message and handle the error more usefully - for example, by submitting a defect report with some additional information.
The behavior is fully changeable by providing a thread-specific exception handler - that could include exiting the application.
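For example, a rough sketch of the GUI case; the dialog contents and the idea of submitting a defect report from here are placeholders:

    import javax.swing.JOptionPane;
    import javax.swing.SwingUtilities;

    public class GuiCrashDialogDemo {
        public static void main(String[] args) {
            // Per-thread handler: show the error to the user instead of just dying.
            Thread.currentThread().setUncaughtExceptionHandler((t, e) ->
                    // A real application might attach logs and offer a defect report here.
                    SwingUtilities.invokeLater(() ->
                            JOptionPane.showMessageDialog(null,
                                    "Something went wrong: " + e.getMessage(),
                                    "Error", JOptionPane.ERROR_MESSAGE)));

            throw new IllegalStateException("demo failure");
        }
    }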
Here are some useful notes
I've run into the issue where I have a program (not written by me, by someone else) I want to run 24/7, but sometimes it crashes. Normally, this wouldn't be an issue because I can simply create a process watcher that checks if it crashed, and then restarts it if necessary.
But, this particular program sometimes throws an exception and outputs it into the graphical interface that's integrated into it. In this instance, the program doesn't crash at all. The interface stays up, but the actual server functionality is unavailable.
Is there any way I can intercept this information from this process?
You want to use the JVM Tool Interface (JVM TI). I can't give you the code to catch your exception, but this is where to look. You'll have to do some detective work to find the class that throws the exception, or at least to find some indicator that it has been thrown.
Edit: You can also try calling the vendor to see if they know of a way. You can also look to see if it is writing the exception to a log file, which you could then watch.
This may or may not work, but when the application displays its error and the server stops working, does the memory usage drop? If so, you could probably just add some logic to your process monitor to call the Windows command tasklist to see if the memory usage drops below some threshold. You'll have to check how much memory the program normally uses and how much it uses after the error, though.
Since you said the server functionality stops working, another option would be to write a simple program that just pings the server however often you want to make sure it is still up. If not, kill the process and restart it.
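Something like the following sketch could serve as that pinger; the host, port, polling interval and restart command are all assumptions you would need to adapt:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class ServerWatchdog {
        public static void main(String[] args) throws Exception {
            // Hypothetical values: adjust host, port, interval, and restart command.
            String host = "localhost";
            int port = 8080;

            while (true) {
                if (!isUp(host, port)) {
                    // Kill and restart the server process; the command here is an assumption.
                    new ProcessBuilder("restart-server.bat").inheritIO().start().waitFor();
                }
                Thread.sleep(30_000); // poll every 30 seconds
            }
        }

        static boolean isUp(String host, int port) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 2_000);
                return true;
            } catch (IOException e) {
                return false;
            }
        }
    }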
I assume you have no access to the source code, so if it is outputting to the GUI the answer is no. Even if you could attach to the running process you would need to intercept the exception, but it is caught and sent to the GUI, not thrown from the application.
In theory, you could screen scrape the application. I don't know of any specific tools for doing this, but they may be out there.
Edit: I may have been wrong above, check out a post here where they get the stack from a running thread. You probably won't be able to capture the exception this way, but if you're lucky the stack trace will look very different when the program is operating normally compared to when an exception has been thrown.
Edit 2: I submitted a second, more accurate answer. See below.
Is the other program Java? Look at AspectJ; you may be able to hack something together with it if you have control over the program's startup.
Without the ability to rebuild the app, you are generally out of luck unless you do some extensive hacking. Here is one option I can think of.
Most likely the application replaces System.out and/or System.err with its own stream implementation. If that's the case, you can try to locate the class for this stream and replace it with your own wrapper with the same name (you can rename the original class using jarjar). In the wrapper you can provide console output to detect the exception.
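If you can control how the JVM is launched (e.g. by starting the app through a small wrapper main of your own), a simpler variant of the same idea is to swap in a System.err that watches for stack traces. A sketch, where the launcher class and the real application's entry point are assumptions:

    import java.io.PrintStream;

    public class ErrWatchingLauncher {
        public static void main(String[] args) throws Exception {
            PrintStream realErr = System.err;
            System.setErr(new PrintStream(realErr, true) {
                @Override
                public void println(String line) {
                    super.println(line);
                    if (line != null && line.contains("Exception")) {
                        // Hypothetical hook: flag the process as unhealthy so a watcher restarts it.
                        realErr.println("[watchdog] exception detected in output");
                    }
                }
            });

            // Hand control to the real application's entry point (class name is an assumption).
            // com.example.ServerMain.main(args);
        }
    }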
I started looking into JNI, and from what I understand, if a problem occurs with the loaded DLL, the JVM may terminate on the spot.
That is, the process cannot be protected the way it can be when catching an exception.
So if my understanding is correct, my question is whether there is a standard approach/pattern for this situation when using JNI.
Or, to state it differently, are processes using JNI designed in a way that avoids these issues?
Or are such problems simply not expected to occur?
Thank you.
Yes, the JVM will just terminate, which is one of the reasons why JNI code is really hard to debug. If you are using C++ you can use exceptions and map them to a Java exception, which at least gives you some level of safety, but it doesn't help with things like bad memory access.
From an architecture point of view I suggest decoupling your code from JNI as much as possible. Create a class/procedure structure that is entirely testable from C++/C and let the JNI code do only the conversion work. If the JVM then crashes, you at least know where to look.
The principles are no different from any multi-threaded C application:
Always check all your input thoroughly.
Always free up temporary memory you allocated.
Make sure your functions are re-entrant.
Don't rely on undefined behaviour.
The Java virtual machine offers you no extra protection for your native code; if it fails or leaks, your VM will fail or leak.
You can have exactly the same spectrum of error handling in a JNI library as in anything else.
You can use try/catch. If you are on Windows, you can use SEH. If you are on Linux, you can call sigaction.
Still, if you mess up and there's a SIGSEGV, your JVM is probably toast whether you try to catch that signal or not.
Following on from this question https://softwareengineering.stackexchange.com/questions/37294/logging-why-and-what
I was wondering what actually happens to an error that occurs during the runtime of a Java Enterprise Edition application.
Does the JVM store a log of all the errors?
Or are the errors forgotten?
It depends on where the output is directed. If output is just being pushed to a console window then yes, it is all but lost. An enterprise application, however, would be using a logging framework to handle all output, making any exception available in the logs provided by the framework.
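As a small illustration of the difference, a sketch using java.util.logging (any logging framework works the same way in spirit):

    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class LoggingDemo {
        private static final Logger LOG = Logger.getLogger(LoggingDemo.class.getName());

        public static void main(String[] args) {
            try {
                Integer.parseInt("not a number");
            } catch (NumberFormatException e) {
                // Without a handler like this the stack trace would only hit the console;
                // routed through the logging framework it ends up wherever the logs are kept.
                LOG.log(Level.SEVERE, "Failed to parse configuration value", e);
            }
        }
    }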
Application servers usually have a big catch-all net to mop up any unhandled exceptions. However, if an exception is allowed to bubble up without ever hitting a catch clause, the thread it came from will die and the exception will be passed to the thread's UncaughtExceptionHandler, if one exists.
An error happens while the program is running and is handled through exceptions.
Exceptions: ignore the error, handle the exception, and go back to the method that was called.