Java - under which circumstances may a JVM abruptly crash?

I'm running a daemon Java process on my Ubuntu machine:
java -cp (...) &> err.log&
The process runs for a random period of time and then just disappears. There is nothing in the logs or in err.log, and no JVM crash file (hs_err_*.log) is created. My two questions are:
1) Under which circumstances can a Java process abruptly finish?
2) Is there any way to know what happened to the process (knowing its PID)? Does UNIX keep information about finished processes somehow?

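1) Under which circumstances can a Java process abruptly finish?
Either when it exits on its own, which I guess you have ruled out, or when it is killed with SIGKILL, which cannot be caught and therefore leaves no hs_err file behind. On Linux, this might be the OOM killer. Did you look at the system message logs (for example with dmesg)?
2) Is there any way to know what happened to the process (knowing its PID)?
Generally not, unless you configure some tracing tools to get that information.
Does UNIX keep information about finished processes somehow?
No, but depending on the Unix variant you are using, that might be something simple to add.
In your example, you can just print the process exit status with echo $? (since the JVM runs in the background, first run wait $! from the same shell so that $? holds the JVM's status).
If it is 265, that would mean the process was killed with signal 9 (265 - 256). That is how ksh93 reports a death by signal; bash and most other shells use 128 + signal number instead, so the same SIGKILL shows up there as 137.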
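To make the arithmetic concrete, here is a small Java sketch that decodes an exit status, assuming the bash convention of 128 + signal number. OpenJDK's Process.exitValue() uses the same convention on Unix for a child killed by a signal, so the helper works for statuses obtained either way. The class name ExitStatus and its signal table (a handful of common Linux signals) are illustrative only, and the helper cannot tell a real signal death from a program that simply calls exit(137) itself.

```java
import java.util.Map;

/** Hypothetical helper: interprets an exit status using the 128 + N signal convention. */
public class ExitStatus {
    // A few common Linux signal numbers (not exhaustive).
    private static final Map<Integer, String> SIGNALS = Map.of(
            1, "SIGHUP",
            2, "SIGINT",
            3, "SIGQUIT",
            6, "SIGABRT",
            9, "SIGKILL",
            11, "SIGSEGV",
            15, "SIGTERM");

    public static String describe(int status) {
        if (status == 0) {
            return "exited normally (0)";
        }
        // Linux signal numbers run from 1 to 64, so 129..192 look like signal deaths.
        if (status > 128 && status <= 128 + 64) {
            int sig = status - 128;
            String name = SIGNALS.getOrDefault(sig, "signal " + sig);
            return "killed by " + name + " (" + status + " = 128 + " + sig + ")";
        }
        return "exited with status " + status;
    }

    public static void main(String[] args) {
        for (int s : new int[] {0, 1, 137, 143}) {
            System.out.println(s + ": " + describe(s));
        }
    }
}
```

For example, describe(137) returns "killed by SIGKILL (137 = 128 + 9)", which is what you would expect after an OOM-killer kill reported by bash.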

I would write a simple shell script that alerts me when the JVM terminates, perhaps by sending an email with the JVM's exit code.
#!/bin/sh
# Launch the JVM and wait for it to exit
java -cp ...
status=$?
# Do something with the exit code: log it to a file or mail it to yourself
echo "$(date): JVM exited with status $status" >> jvm-exit.log
The exit code might reveal something.
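If you would rather keep the launcher in Java (for instance because you already have a Java entry point), the same idea can be sketched with ProcessBuilder: start the child, wait for it, and append a timestamped line with its exit status to a log file. The class LaunchAndLog, the method runAndLog and the log path are hypothetical; the shell script remains the lighter option, since the launcher is itself one more process that could be killed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.util.List;

/** Hypothetical launcher: runs a command, waits for it, and appends its exit status to a log. */
public class LaunchAndLog {
    public static int runAndLog(List<String> command, Path log)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .inheritIO()              // child's stdin/stdout/stderr are shared with this JVM
                .start();
        int status = p.waitFor();         // blocks until the child exits
        String line = Instant.now() + " " + String.join(" ", command)
                + " exited with status " + status + System.lineSeparator();
        Files.writeString(log, line, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return status;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder command and log path; substitute your own java invocation.
        int status = runAndLog(List.of("sh", "-c", "exit 3"), Path.of("jvm-exit.log"));
        System.out.println("status = " + status);
    }
}
```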

I would run it as a daemon with YAJSW: it offers several ways to monitor memory usage and more, has restart options, and lets you enable logging on the wrapper process, so you get plenty of information when there is an issue.

Related

Creating a Nohup Process in Java

Using ProcessBuilder, I've been trying to create an independent process that doesn't get terminated when the JVM gets terminated, but nothing seems to work.
I've tried /usr/bin/nohup commands, but that still seems to terminate when the JVM that launched it is terminated. Is there any way to accomplish this in Java?
Well, first things first, let's write a test script that validates what you're seeing:
$ cat /tmp/test.sh
#!/bin/bash
for sig in SIGINT SIGTERM SIGHUP; do
trap "echo Caught $sig" $sig
done
echo Traps registered, sleeping
sleep 10
echo Done sleeping, exiting
When I invoke this via:
new ProcessBuilder("/tmp/test.sh").inheritIO().start().waitFor(1, TimeUnit.SECONDS);
I see the Java process terminate after 1 second (since waitFor() timed out), but the subprocess keeps going (the Done sleeping message is printed to the console after the JVM exits). This lines up with what's discussed in this question.
So the first thing I'd suggest doing is validating your initial assumption that the subprocess is in fact being killed; perhaps something else is going wrong that causes it to die for another reason. Using .inheritIO() can help debugging, in case the subprocess is generating error messages you're not seeing.
All that said, nohup may not be what you want, as 'that other guy' notes; nohup only causes a process to ignore SIGHUP, and it will still respect other signals, notably SIGINT and SIGTERM (Process.destroy() sends a SIGTERM, for instance). See this question for more.
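To see what those signals look like from the Java side, here is a small sketch (Linux/Unix assumed, with a sleep binary on the PATH; the class DestroyDemo is hypothetical) that starts a child and ends it with Process.destroy(), which sends SIGTERM, or Process.destroyForcibly(), which sends SIGKILL. On Unix, exitValue() reports a signal death as 128 + signal number, so the two cases should come back as 143 and 137. Neither signal is SIGHUP, which is why nohup alone does not protect a child from them.

```java
import java.util.concurrent.TimeUnit;

/** Shows the exit value a child reports after destroy() and destroyForcibly() on Unix. */
public class DestroyDemo {
    static int killWith(boolean forcibly) throws Exception {
        Process p = new ProcessBuilder("sleep", "30").start();
        if (forcibly) {
            p.destroyForcibly(); // SIGKILL on Unix: cannot be trapped or ignored
        } else {
            p.destroy();         // SIGTERM on Unix: the child may trap or ignore it
        }
        if (!p.waitFor(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("child did not exit");
        }
        return p.exitValue();    // 128 + signal number for a death by signal
    }

    public static void main(String[] args) throws Exception {
        System.out.println("destroy():         " + killWith(false)); // expected 143 (128 + 15)
        System.out.println("destroyForcibly(): " + killWith(true));  // expected 137 (128 + 9)
    }
}
```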
As with all problems in programming, introducing more layers would probably help :)
Create a shell script that handles the desired disowning logic, e.g.
$ cat start-and-disconnect.sh
#!/bin/bash
# obviously tweak this as necessary to get the behavior you want,
# such as redirecting output or using disown instead of nohup
nohup "$#" &
The advantage here is that Bash (or whatever shell you prefer) has better process-management control than Java, and now you can test it in isolation, without needing to compile and run your Java application.
Once you're happy with the script, your Java code simply becomes:
new ProcessBuilder("/path/to/start-and-disconnect.sh", "/path/to/your_binary.sh")
.inheritIO().start().waitFor();
You can safely call .waitFor() to wait for the call to complete, since start-and-disconnect.sh will exit after starting its subprocess.

kill -3 or jstack : What is the difference?

I want to get the thread dump of my web app that running on a jboss server.
I found two solutions for my problem :
Using the unix command : kill -3
Using the jstack tool that exists in the JDK.
Can anyone explain to me the difference between theses two methods?
Thanks in advance !
The jstack command can get a thread dump of a program running on a remote machine, and it also works on Windows.
kill -3 only works on local programs, and on Windows there is no kill.
From the Oracle page of jstack:
The output from the jstack pid option is the same as that obtained by pressing Ctrl+\ at the application console (standard input) or by sending the process a QUIT signal.
Also remember that Ctrl+\ is equivalent to a SIGQUIT.
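From "What is kill -3?" (unix.se):
kill -l shows us all signals. Following this hint, 3 means SIGQUIT.
So basically both of them do the same thing, i.e. ask the JVM for a thread dump (not a core dump). The practical difference is where the dump ends up: with kill -3 the target JVM prints it to its own standard output (wherever that is redirected, e.g. a console log), whereas jstack prints it in the terminal where you run jstack. Here are some pointers related to jstack: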
Jstack performs deadlock detection by default.
Regarding official support, from the jstack man page:
Prints Java thread stack traces for a Java process, core file, or remote debug server. This command is experimental and unsupported.
This utility is unsupported and might not be available in future release of the JDK. In Windows Systems where the dbgeng.dll file is not present, Debugging Tools For Windows must be installed so these tools work.
Regarding the output difference, it's basically the same thing: there is a one-to-one mapping between the outputs. Comparing the output of both tools for the same application gives the following mapping between thread statuses:
kill -3 | Jstack
------------------------------
RUNNABLE | IN_NATIVE
TIMED_WAITING | BLOCKED
WAITING | BLOCKED (PARK)
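A related option, neither kill -3 nor jstack, is to have the JVM dump its own threads through the standard java.lang.management API, which also offers deadlock detection. This can help on platforms where you cannot send SIGQUIT, or when you want the dump in your own log. A minimal sketch (the class SelfThreadDump is hypothetical); note that ThreadInfo.toString() prints only a truncated stack per thread and its format differs from jstack's.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** In-process thread dump plus deadlock check via ThreadMXBean. */
public class SelfThreadDump {
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // Include locked monitors and ownable synchronizers when the JVM supports them.
        for (ThreadInfo ti : mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported())) {
            sb.append(ti); // thread name, state and a truncated stack trace
        }
        long[] deadlocked = mx.findDeadlockedThreads(); // null when no deadlock is found
        sb.append(deadlocked == null
                ? "No deadlocks found\n"
                : "Deadlocked threads: " + deadlocked.length + "\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```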
On Windows, you can kill a process with "taskkill /PID {yourpid} /F". The process ID can be obtained from netstat -ano, or you can use VisualVM to find it.

killing a bash process does not kill the process the bash is currently running

The scenario is as follows: I have a java daemon, which is supposed to not terminate. However, in case of an unexpected error, the crashed JVM should be restarted by a script. So I wrote a command which starts a background bash which has a loop starting the JVM (so when the JVM terminates, it will be restarted again).
/bin/bash -c "while true; do java ...; done" &
In order to be able to stop the daemon, I thought of killing this background bash process (by saving its process ID in a file). This works insofar as the background bash doesn't restart the JVM, but it still doesn't kill the currently running process, so bash seems to finish its current command before it reacts to the kill. I would like the currently running JVM to be killed, too.
Since I don't want to manage two PIDs (one for the background bash and one for the currently running JVM), is there a way to "force kill" that by design also stops the current command? (I couldn't find such a thing in man kill.)
There are a number of process-management tools built for exactly this purpose: runit, daemontools, upstart... even an entry in the SysV inittab table.
All of these will automate restarting immediately on shutdown, track desired status as opposed to current status (and attempt to signal startup or shutdown as-desired), manage signal delivery, etc.
You can trap signals in bash and trigger events on them, but that only handles the subset which can be trapped (you can't trap a KILL, for instance). The better thing is to use a tool built-to-purpose.
The ProcessManagement page of the wooledge.org wiki (used by irc.freenode.org's #bash channel) has some other concrete suggestions on doing this yourself in bash... though it too suggests runit, daemontools, and their kin as the best-practices approach.
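The answer's recommendation stands (runit, daemontools and similar tools are the better choice), but the core problem a supervisor must solve can be sketched in Java: keep a single PID (the supervising JVM), restart the child when it exits, and forward termination to the running child. The class Supervisor and its parameters are hypothetical. The forwarding relies on a shutdown hook, which runs on SIGTERM, SIGINT or SIGHUP but never on SIGKILL, so kill -9 on the supervisor would still leave the child running, just as with the bash loop.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical single-PID supervisor: restarts a child command and forwards shutdown to it. */
public class Supervisor {
    private final Object lock = new Object();
    private Process current;  // guarded by lock
    private boolean stopping; // guarded by lock

    /** Stops restarting and sends SIGTERM (Process.destroy()) to the running child, if any. */
    public void stop() {
        synchronized (lock) {
            stopping = true;
            if (current != null) {
                current.destroy();
            }
        }
    }

    /** Starts the command up to maxRuns times, pausing backoffMillis between runs. */
    public int run(List<String> command, int maxRuns, long backoffMillis)
            throws IOException, InterruptedException {
        int runs = 0;
        while (runs < maxRuns) {
            Process p;
            synchronized (lock) {
                if (stopping) {
                    break; // stop() was called: do not start another child
                }
                p = new ProcessBuilder(command).inheritIO().start();
                current = p;
            }
            runs++;
            int status = p.waitFor();
            System.err.println("run " + runs + ": child exited with status " + status);
            if (runs < maxRuns) {
                Thread.sleep(backoffMillis); // avoid a tight restart loop
            }
        }
        return runs;
    }

    public static void main(String[] args) throws Exception {
        Supervisor s = new Supervisor();
        // Shutdown hooks run on SIGTERM/SIGINT/SIGHUP, but never on SIGKILL.
        Runtime.getRuntime().addShutdownHook(new Thread(s::stop));
        // Hypothetical command line; replace with your own java invocation.
        s.run(List.of("java", "-cp", "app.jar", "com.example.Main"), Integer.MAX_VALUE, 5_000);
    }
}
```

Holding the lock while starting the child ensures that stop() either prevents the next start or sees the new child and destroys it; there is no window in which a freshly started child escapes the shutdown.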
Why not use cron to start your app, and manage only 1 pid, the one belonging to your app? That way you'll always be killing the correct process.
To elaborate, you could create a bash script to manage your app: start|stop|status. On start, it saves the Java PID to a file. Then you can schedule a cron job to verify the status of the app and, if the PID no longer exists, relaunch it.
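The "status" step of such a script boils down to reading the PID file and checking whether that process is still alive. A minimal Java sketch of the same check, assuming Java 11 or later (for Files.readString) and a hypothetical PID file path; the class PidFileStatus is illustrative only. One caveat applies to any PID-file scheme: if the PID has been recycled by an unrelated process, this check reports it as running.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical "status" check for a cron job: is the PID stored in a PID file still running? */
public class PidFileStatus {
    public static boolean isRunning(Path pidFile) throws IOException {
        if (!Files.exists(pidFile)) {
            return false;
        }
        long pid = Long.parseLong(Files.readString(pidFile).trim());
        // ProcessHandle.of returns Optional.empty() when no such process exists.
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    /** What the "start" side would do from inside the app: record its own PID. */
    public static void writeOwnPid(Path pidFile) throws IOException {
        Files.writeString(pidFile, Long.toString(ProcessHandle.current().pid()));
    }

    public static void main(String[] args) throws IOException {
        Path pidFile = Path.of(args.length > 0 ? args[0] : "app.pid"); // hypothetical default path
        System.out.println(isRunning(pidFile) ? "running" : "not running");
    }
}
```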
Isn't this the default behaviour of bash? I thought zsh, for example, does the opposite and doesn't send a SIGHUP to all child processes. Maybe you can try this answer: write a little script and start it with disown?
see this question: Tie the life of a process to the shell that started it
I didn't test it, but I need zsh on my webserver because I start it manually and exit my shell with a double Ctrl-D.

How can you find out if a Java program failed on a Linux server?

Last night I ran a big overnight batch program, written in Java, on a Linux-based server. I can't find anything in my error logs that suggests an error was encountered in my Java application.
Is there a way in Linux to see if a program exited unexpectedly?
The program is one of many programs that are run overnight from a cron job (crontab), and it runs from its own main method. It catches a series of exceptions, prints messages with System.err.println and exits with status 1 if any of these are hit.
NB: I always use a Logger in my own code; unfortunately, I'm dealing with legacy code written by someone else.
If the JVM crashed, there will be an hs_err_pid<pid>.log file in the working directory of the application by default. (This is unlikely to be the case.)
If the application logged an error before exiting, you need to understand where your application places its logs and read those (as they can be anywhere on your system)
There's no easy mechanism to discover what you're after, if whatever tool you used to start the java JVM didn't bother recording the exit status for you.
If you're running the auditd(8) server to provide audit logging, and your auditd(8) is configured to log abnormal exits and your java JVM exited abnormally -- signal-based termination -- then you can look for ANOM_ABEND events in /var/log/audit/audit.log:
# ausearch -m ANOM_ABEND
/sbin/audispd permissions should be 0750
----
time->Tue Nov 8 18:42:22 2011
type=ANOM_ABEND msg=audit(1320806542.571:264): auid=4294967295 uid=1000 gid=1000 ses=4294967295 pid=11955 comm="regex" sig=11
...
For future executions you might want to do something like this:
java /path/to/whatever.jar && echo `date` >> /path/to/dir/success || echo `date` >> /path/to/dir/failure
This will echo the date of success or failure into a log file -- assuming that your application uses the standard Unix-style exit(0) for success and anything else for failure.
Because you've run your programs out of cron(8), there's a good chance that the standard error of the program has in fact been captured and mailed somewhere.
Check the crontab(5) for the user account that runs the program. (If it is run out of /etc/crontab or /etc/cron.d/, then in those files.) Look for the MAILTO variable. If it doesn't exist, then cron(8) tried to deliver mail to the crontab(5) owner. If it does exist, then cron(8) tried to deliver mail to whoever is specified with the variable.
Look in /var/spool/mail/ for the user's mailbox, if the server doesn't seem like it's got an email setup in place -- there might be enough for local delivery.
It is the virtual machine process (named java) whose termination status you want to check. You can write a trivial script with two commands: the first invokes the Java VM to run the Java program, and the second records its exit status with echo $?.
If you write the application yourself, you should use a logger that writes to a file.
See this tutorial on how to use Log4j with a file appender. In your code, you need to catch and log exceptions.
See this issue.
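As a stand-in for the Log4j tutorial (this swaps in the JDK's built-in java.util.logging rather than Log4j), here is a minimal sketch of a batch entry point that logs to a file, records any uncaught exception, and exits with status 1 on failure so that cron wrappers such as the && / || line can tell success from failure. The class BatchMain, the method runBatch and the file name batch.log are hypothetical.

```java
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

/** Sketch using java.util.logging (instead of Log4j) to log batch failures to a file. */
public class BatchMain {
    static final Logger LOG = Logger.getLogger(BatchMain.class.getName());

    static void configure(String logFile) throws IOException {
        FileHandler fh = new FileHandler(logFile, true); // true = append to an existing file
        fh.setFormatter(new SimpleFormatter());
        LOG.addHandler(fh);
        // Log anything that escapes a thread (including main) before that thread dies.
        Thread.setDefaultUncaughtExceptionHandler((t, e) ->
                LOG.log(Level.SEVERE, "Uncaught exception in thread " + t.getName(), e));
    }

    public static void main(String[] args) throws IOException {
        configure("batch.log"); // hypothetical log path
        try {
            runBatch();
        } catch (Exception e) {
            LOG.log(Level.SEVERE, "Batch failed", e);
            System.exit(1); // non-zero so the caller (cron, a wrapper script) sees the failure
        }
    }

    static void runBatch() throws Exception {
        // Placeholder for the real batch work.
    }
}
```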

Why does the JVM return exit status code 143?

A Java application running as a scheduled task on Windows 2003 crashed without any logs or anything else that would help to find out what happened. The only information available is that the application returned code 143 (8F). That error code was retrieved from the scheduled tasks log.
Does anyone know what that error code (143) stands for? Is it possible that a user logging off could cause the application to be terminated?
Thanks,
143 often means that the application was terminated by a SIGTERM signal: by convention, a process killed by signal N reports 128 + N, and SIGTERM is signal 15 (128 + 15 = 143). See also https://unix.stackexchange.com/questions/10231/when-does-the-system-send-a-sigterm-to-a-process
However, please note that an application might use 143 for its own custom result.
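If you want the application to at least leave a trace when it is terminated this way, a JVM shutdown hook can log the event. A minimal sketch, assuming stderr is captured somewhere (for a scheduled task you would typically redirect it to a file); the class ShutdownTrace is hypothetical. Shutdown hooks run on a normal exit, on System.exit and on SIGTERM/SIGINT/SIGHUP, but not on SIGKILL, so a missing trace would itself hint at a kill -9 or similar.

```java
import java.time.Instant;

/** Sketch: record that the JVM is shutting down, so a SIGTERM (exit 143) leaves a trace. */
public class ShutdownTrace {
    static Thread install() {
        Thread hook = new Thread(() -> System.err.println(Instant.now()
                + " JVM shutting down (normal exit, System.exit, or a signal such as SIGTERM)"));
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        // Send `kill <pid>` (SIGTERM) during this sleep: the hook prints and the exit status is 143.
        Thread.sleep(60_000);
    }
}
```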
JVM error code 143 means Internal field must be valid. This is discussed on the OTN discussion forums. However, the conclusion seems to be something killed your process.
I suspect this could indeed be caused by a user logging off.
A user logging off sends the CTRL_LOGOFF_EVENT signal to all running processes. From https://msdn.microsoft.com/en-us/library/windows/desktop/aa376876(v=vs.85).aspx:
The system also sends the CTRL_LOGOFF_EVENT control signal to every process during a log-off operation.
Now, under certain circumstances it will terminate the Java application with error code 143 (SIGTERM). See https://bugs.openjdk.java.net/browse/JDK-6871190.
Well, anyway, what you need to stop this from happening is to start Java with the -Xrs option. From https://www.ibm.com/support/knowledgecenter/SSYKE2_8.0.0/com.ibm.java.win.80.doc/diag/appendixes/cmdline/Xrs.html:
Setting -Xrs prevents the Java™ run time environment from handling any internally or externally generated signals such as SIGSEGV and SIGABRT.
So you should start your Java application with something like:
>java -Xrs -jar myapplication.jar
PS:
The relation between SIGTERM and 143 number is explained in https://unix.stackexchange.com/questions/10231/when-does-the-system-send-a-sigterm-to-a-process#comment13523_10231.
