I have a Java application that is mysteriously dying without any exceptions in the logs. I run it in the background via a bash script that wraps nohup, like below:
nohup java -Xms6g -Xmx6g -jar myapp.jar 2>> stderr.txt >> /dev/null & echo $! > /tmp/myapp-pid
The java application is quite memory intensive and so has been configured with 6GB of heap space (running on a 64 bit JVM). It runs fine for about 8 hours and then silently dies. No exceptions in the logs, nothing.
From the main method the app enters an infinite while loop, polls AWS SQS for messages and processes them. This is all wrapped in a try-catch, and I log in the catch. The application seems to exit after it completes an iteration of the loop, because the last line it logs is always 'Successfully processed'.
while(true) {
    try {
        // Logic to poll SQS and process the message
    } catch (MyCustomException e) {
        // Write to SQS dead letter queue (was throwing at this point)
        // Delete message from original SQS
    } catch (Throwable e) {
        LOG.error(...);
    } finally {
        LOG.info("Processing time was...");
    }
}
I'm not sure where to begin as I would've thought it would log something. Can anyone provide some pointers or maybe some JVM settings to configure so that I can start investigating?
I am wondering if things outside the code may be causing the error. Like perhaps a JVM crash?
Update
It seems this was indeed a programming error. I didn't think it was causing the issue, so I hadn't included it in the code above (I have just added it), but I had another catch clause for a custom exception that I had created. Within that catch I was attempting to move the SQS message to the dead letter queue, but I did not have permission to write to it, so the call threw inside the catch block, and I was not handling that exception.
Thanks to all those who helped by suggesting what may have gone wrong!
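To make the failure mode from the update concrete, here is a minimal sketch. It uses hypothetical stand-ins (an empty MyCustomException class and a moveToDeadLetterQueue() method that always throws) rather than the real SQS calls. It illustrates that an exception thrown inside one catch clause is not handled by a sibling catch (Throwable) of the same try: finally still runs, and then the exception leaves the loop body. Wrapping the recovery step in its own try keeps the loop alive.

```java
final class CatchBlockEscape {

    /** Hypothetical stand-in for the asker's custom exception. */
    static class MyCustomException extends Exception {
        MyCustomException(String message) { super(message); }
    }

    /** Hypothetical stand-in for the dead letter queue write that lacked permission. */
    static void moveToDeadLetterQueue() {
        throw new IllegalStateException("access denied to dead letter queue");
    }

    /** Mirrors the original loop body: the handler itself throws, so no catch logs it. */
    static void unguardedIteration(StringBuilder log) {
        try {
            throw new MyCustomException("bad message");
        } catch (MyCustomException e) {
            moveToDeadLetterQueue(); // throws; the sibling catch (Throwable) does not apply
        } catch (Throwable t) {
            log.append("error logged;");
        } finally {
            log.append("finally ran;");
        }
    }

    /** Same body, but the risky recovery step has its own try/catch. */
    static void guardedIteration(StringBuilder log) {
        try {
            throw new MyCustomException("bad message");
        } catch (MyCustomException e) {
            try {
                moveToDeadLetterQueue();
            } catch (RuntimeException inner) {
                log.append("handler failure logged;");
            }
        } catch (Throwable t) {
            log.append("error logged;");
        } finally {
            log.append("finally ran;");
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        try {
            unguardedIteration(log);
        } catch (RuntimeException escaped) {
            log.append("escaped the loop body: ").append(escaped.getMessage()).append(';');
        }
        guardedIteration(log);
        System.out.println(log);
    }
}
```

In the unguarded variant the only trace of the problem is whatever the JVM prints for the uncaught exception on stderr, which is easy to miss when only the application log is being watched.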
Without more code it is hard to say what actually happens.
But by definition, finally is always executed, including in case of failure. Maybe you are simply missing the exception that is written before it.
Try moving the log call out of the finally block and into the try block.
while(true) {
    try {
        // Logic to poll SQS and process the message
        LOG.info("Successfully processed");
    } catch (Throwable e) {
        // As mentioned in the comments, for debugging try logging at info level here as well.
        // Maybe the error level is disabled (although this is very unlikely,
        // since error is normally written whenever info is written).
        LOG.info(...);
    } finally {
        // Clean up.
    }
}
These are two ideas that may help you investigate the issue further.
Is your system running out of memory? Try running the application from a wrapper script that logs the exit code, for example with echo $? >&2 right after the java command returns (note that $! is the PID of the last background job, not an exit code).
Also, running dmesg can tell you whether the OOM killer chose your application as a victim.
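If you would rather keep the wrapper in Java, a rough sketch of the same idea follows. The class name ExitCodeHint and the describe helper are made up for illustration. It assumes a Unix-like system, where shells (and, as far as I know, the JDK's Process on Linux) report a child killed by a signal as 128 plus the signal number, so 137 would point at SIGKILL, the signal the OOM killer uses.

```java
final class ExitCodeHint {

    /** Interprets an exit status using the common Unix convention of 128 + signal number. */
    static String describe(int exitCode) {
        if (exitCode == 0) {
            return "normal exit";
        }
        if (exitCode > 128 && exitCode < 256) {
            int signal = exitCode - 128;
            String note = (signal == 9) ? " (SIGKILL, which is what the OOM killer sends)" : "";
            return "terminated by signal " + signal + note;
        }
        return "application exit code " + exitCode;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java ExitCodeHint <command> [args...]");
            return;
        }
        // args is the command to supervise, e.g. java -Xms6g -Xmx6g -jar myapp.jar
        Process child = new ProcessBuilder(args).inheritIO().start();
        int exitCode = child.waitFor();
        System.err.println("child exited: " + describe(exitCode));
    }
}
```

A status above 128 is only a hint; dmesg remains the place to confirm that the OOM killer was involved.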
Related
I got the error "Process finished with exit code 1" when I was running my Java code. I am using Intellij IDEA 2018.3. Below is the error log I got.
When you run a Java application in IntelliJ IDEA, the IDE prints the process exit code to the console after the program finishes. If the program terminates without any exception, exit code 0 is printed. Otherwise, a non-zero integer may be printed.
To get more information on the exit code try putting a try catch around the SpringApplication.run() function like this:
try
{
    SpringApplication.run(Application.class, args);
}
catch (Throwable throwable)
{
    System.out.println(throwable.toString());
    throwable.printStackTrace();
}
For Spring Boot projects, a common cause is org.springframework.beans.factory.BeanCreationException.
Search for BeanCreationException, set a breakpoint in each of its constructors, and debug the project.
You will then find the 'beanName' of the bean with the problem, and you can focus on that bean.
for example:
Say I have the following code:
try {
    //Do something with File
} catch (FileNotFoundException e) {
    outputInfo("Error in IO Redirection", true);
    e.printStackTrace();
    System.exit(1);
}
My program exits right after this catch block. It is a single-threaded program (one main method) and is not expected to recover from such an exception.
Should I really be using System.exit(1); ?
If you expect someone else to run your program, and they rely on the process status code to know if your program has succeeded or failed, then you should use System.exit(1);
http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#exit%28int%29
Terminates the currently running Java Virtual Machine. The argument
serves as a status code; by convention, a nonzero status code
indicates abnormal termination.
One of the reasons to use a non zero exit code on failure of an application is that they can be used in batch files. If your application is a console application always use proper exit code. You don't know how it will be used in future.
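As a small illustration of that advice, one possible layout is to keep the work in a method that returns a status code and to call System.exit in exactly one place. This is only a sketch: the class name, the specific codes 1 and 2, and the default file name are assumptions, not a convention from the question.

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

final class ExitStatusDemo {

    /** Does the work and returns a status code instead of exiting directly. */
    static int run(String path) {
        try (FileReader reader = new FileReader(path)) {
            reader.read();
            return 0; // success
        } catch (FileNotFoundException e) {
            System.err.println("Error in IO Redirection: " + e.getMessage());
            return 1; // non-zero: abnormal termination
        } catch (IOException e) {
            System.err.println("I/O failure: " + e.getMessage());
            return 2;
        }
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "input.txt"; // hypothetical default
        System.exit(run(path));
    }
}
```

A calling script can then branch on the status, and the run method stays testable because it never terminates the JVM itself.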
When I use 4 threads for my program there are usually no problems, but today I increased it to 8 and noticed that 1-3 threads stop working without throwing any exceptions. Is there any way to find out why they are stopping? Is there any way to make the thread restart?
This is how the structure of my thread is
public void run()
{
    Main.logger.info(threadName + ": New Thread started (inside run)");
    while (true)
    {
        try
        {
            //all my code
            //all my code
            //all my code
        }
        catch(Exception e)
        {
            Main.logger.error("Exception: " + e);
            try
            {
                Thread.sleep(10000);
            }
            catch (InterruptedException e1)
            {
                e1.printStackTrace();
            }
        }
        finally
        {
            try
            {
                webClient.closeAllWindows();
                Thread.sleep(3000);
                Main.logger.info(threadName + ": Closed browser!");
            }
            catch (Exception e)
            {
                Main.logger.error("Exception: " + e);
            }
        }
    }// end while
}
Regards!
Note that an Error is not an Exception; it's a Throwable.
So, if you catch Exception, Errors will still get through:
private void m() {
    try {
        m(); // recursively calling m() will throw a StackOverflowError
    } catch (Exception e) {
        // this block won't get executed,
        // because StackOverflowError is not an Exception!
    }
}
To catch "everything", change your code to this:
try {
    ...
} catch (Throwable e) {
    // this block will execute when anything "bad" happens
}
Note that there might be little you can do if an Error occurs. Excerpt from javadoc for Error:
An Error is a subclass of Throwable that indicates serious problems that a reasonable application should not try to catch. Most such errors are abnormal conditions. The ThreadDeath error, though a "normal" condition, is also a subclass of Error because most applications should not try to catch it.
Is there any way to find out why they are stopping?
That's a bit tricky.
A Java thread can terminate for two reasons:
it can return from its run() method,
it can terminate due to an exception being thrown and not caught on the thread's stack.
You can detect the latter case by using an "UncaughtExceptionHandler" for the thread, but the former case can't be positively detected unless you modify your thread's run() method to log the event ... or something like that.
Another way to figure out what is going on would be to attach a debugger to the JVM and have it report the uncaught exception to you.
(I suspect that the reason you are not seeing any exceptions is that your threads' run methods are not catching / logging all exceptions, AND they don't have an uncaught exception handler.)
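A minimal sketch of the handler approach, assuming a hypothetical worker that fails with an IllegalStateException (the class and thread names are made up). The handler is invoked on the dying thread just before it terminates, so it is a reasonable place to log the cause.

```java
import java.util.concurrent.atomic.AtomicReference;

final class UncaughtHandlerDemo {

    /** Runs a worker that dies from an unchecked exception and returns what the handler saw. */
    static Throwable runFailingWorker() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("simulated failure"); // hypothetical failure
        }, "worker-1");
        worker.setUncaughtExceptionHandler((thread, error) -> {
            // Replace with your logger; stderr keeps the sketch self-contained.
            System.err.println(thread.getName() + " died: " + error);
            seen.set(error);
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("handler saw: " + runFailingWorker());
    }
}
```

Thread.setDefaultUncaughtExceptionHandler can be used instead to cover every thread that has no handler of its own. Note that a handler also sees Errors, which catch (Exception e) in run() lets through.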
Is there any way to make the thread restart?
No. There is no way to restart a Thread that has terminated.
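What you can do is start a new Thread that runs the same Runnable. The sketch below (class and method names are illustrative only) shows both halves: a second start() on a finished thread is rejected with IllegalThreadStateException, while a fresh Thread around the same task runs normally.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class RestartDemo {

    /** Joins a thread, restoring the interrupt flag if the wait is interrupted. */
    static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if calling start() a second time on a finished thread is rejected. */
    static boolean secondStartRejected(Runnable task) {
        Thread first = new Thread(task);
        first.start();
        join(first);
        try {
            first.start(); // a Thread object can only be started once
            return false;
        } catch (IllegalThreadStateException expected) {
            return true;
        }
    }

    /** "Restarting" in practice: run the same Runnable on a brand-new Thread. */
    static Thread startReplacement(Runnable task) {
        Thread replacement = new Thread(task);
        replacement.start();
        return replacement;
    }

    public static void main(String[] args) {
        AtomicInteger runs = new AtomicInteger();
        Runnable task = () -> runs.incrementAndGet();
        System.out.println("second start rejected: " + secondStartRejected(task));
        join(startReplacement(task));
        System.out.println("task ran " + runs.get() + " times");
    }
}
```

An uncaught exception handler is one place from which such a replacement could be started, although a thread pool from java.util.concurrent replaces dead workers for you.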
If you are running from the command line, you can dump the state of all threads to the console. On Windows you do this by pressing Ctrl+Break; on Linux, by sending the QUIT signal to the process with 'kill'.
Please refer to An Introduction to Java Stack Traces
Sending a signal to the Java Virtual Machine On UNIX platforms you can
send a signal to a program by using the kill command. This is the quit
signal, which is handled by the JVM. For example, on Solaris you can
use the command kill -QUIT process_id, where process_id is the process
number of your Java program.
Alternatively you can enter the key sequence Ctrl+\ in the window
where the Java program was started. Sending this signal instructs a
signal handler in the JVM, to recursively print out all the
information on the threads and monitors inside the JVM.
To generate a stack trace on Windows 95, or Windows NT platforms,
enter the key sequence Ctrl+Break in the window where the Java
program is running, or click the Close button on the window.
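When sending a signal is inconvenient (for example, a background service without a console), a similar snapshot can be taken from inside the application with Thread.getAllStackTraces(). This is a different technique from the signal-based dump quoted above, and the output format below is my own, not the JVM's.

```java
import java.util.Map;

final class ThreadDumpDemo {

    /** Builds a plain-text snapshot of every live thread's name, state and stack. */
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append("\" state=")
               .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Calling this periodically, or from an uncaught exception handler, shows which worker threads are still alive and what each one is blocked on.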
The thread priority on one of them could be too high; try setting them all to the same level, though.
Deadlock is also possible if the threads coordinate with or wait on each other.
Historical Context: This problem ended up being not at all what I thought it was. The cause and solution are below, but the original posting is left for reference.
I'm developing a simple framework for periodically polling a directory for .properties files, then performing SQL queries and sending e-mails based on their configurations. Because each .properties file has the same spectrum of operations, they are all interpreted by the same Task class. But since they each represent different logical operations, they each get separate log files.
This is accomplished by sharing one instance of a log4j RollingFileAppender, and dynamically changing its output file based on a value in the .properties file. Since this is a single-threaded application, this works fine.
However, I've noticed that in certain situations, this RollingFileAppender will become closed, and the application will continue on obliviously except that now no logging takes place. I've only managed to catch this in action once, thanks to the console output, since usually this service is running as a background process on a Linux server. Here's what happened:
1) StartScheduler, the main class, creates a new instance of TaskPoller every minute.
2) TaskPoller scans the directory, loads a little information from each .properties file, and determines if it should be run. It also has its own separate RollingFileAppender, which it retrieves via Logger.getLogger(TaskPoller.class). If a Task should be run, then it instantiates a Task object, passing in the specific .properties file to be run.
3) Task gets its RollingFileAppender, then calls fileAppender.setFile("newtaskname.log") and fileAppender.activateOptions() to change the output file location. Then, during its execution, something like this happens:
[TaskPoller]
...
task = new Task(fileName); //Points RollingFileAppender to the right place
if (!task.Execute())
    logger.warn(fileName + " returned with an error code."); //Succeeds
[Task.Execute]
...
try {
    dbDAO.Connect();
} catch (Exception e) {
    logger.fatal("Database connection error.", e); //Different RFA; Fails
    return false;
}
[DBDAO.Connect throws SQLException, ClassNotFoundException]
...
try {
    Class.forName(dbDriver); //Dynamically loaded jdbc driver class name
    connection = DriverManager.getConnection(urlString, userName, password);
} catch (SQLException e) {
    if (connection != null)
        try { connection.close(); } catch (Exception e2) { ; }
    throw e;
}
What's happening is that during DBDAO.Connect(), sometimes I'll get a com.mysql.jdbc.exceptions.jdbc4.CommunicationsException (or some other unexpected exception from whichever jdbc class is loaded). This will not be caught by Connect(), but it will be caught by Execute().
Somehow, this process causes Task's RollingFileAppender to become closed. The only thing I can think of that's special to this situation, as opposed to its consistent and stable normal operation, is that the exception being thrown isn't declared as thrown by Connect(). But I don't think that should cause a log4j Appender to close.
So my question is, what could be causing this appender to unexpectedly close in methods that have nothing to do with its configuration?
--Edit--
It looks like I've been misdirected entirely; the problem is somewhere in the interactions between Quartz, which I was using to have TaskPoller fire every minute, and log4j. I don't entirely understand its cause yet, but this solution seems to solve this problem. It just didn't manifest itself as an observed problem until now, so I thought it had something to do with what was happening recently.
The real cause of this problem is an interaction between the Quartz scheduler and the way I was using log4j. It turns out, if you modify log4j's properties (which I was doing by calling fileAppender.setFile(fileName) and fileAppender.activateOptions()) on a Quartz worker thread (even if Quartz is configured to only have a single thread running at a time), things break down. This is fixed by reloading the log4j properties on each new instance of the worker thread before using it, which I accomplished like so:
[Task() Constructor]
Properties props = new Properties();
URL url = ClassLoader.getSystemResource("log4j.properties");
try {
    props.load(url.openStream());
    PropertyConfigurator.configure(props);
} catch (Exception e) {
    //The logger that never got renamed never stopped working.
    Logger.getLogger(TaskPoller.class).error("Diagnostics!");
}
logger = Logger.getLogger(Task.class);
I have written a java class where if a method throws an exception, an email is sent, via java mail, with a report to the administrators.
It works - my question is w.r.t elegance - to catch the exception thrown by the main method, the sendEmail() method resides in the catch block of the main method. The sendEmail() method has its own try-catch block.
In effect - it looks like below - is there a more beautiful way of writing this?
try {
    foo();
} catch (Exception e) {
    try {
        sendEmail();
    } catch (Exception e2) {
        log(e2.getMessage());
    }
}
If you want something "more elegant", one simple suggestion is to have your sendEmail helper method catch and log the email exceptions. (I don't imagine you want the exceptions to propagate ... or do some other recovery ...)
However, there is something more important to say. What you are implementing here is the wrong approach to reporting errors.
If something goes badly wrong with your application there is a chance that you will SPAM the administrator with multiple emails reporting the same problem over, and over, and over ...
By sending emails from deep within your code, you are making it hard for the administrator to integrate your application's error reporting.
A better approach is to report the problem via a Java logging framework such as Log4J. If the administrator wants to, they can configure a monitoring system such as LogWatch or Nagios. Such a system will detect and classify errors and anomalies (like your application's errors) in the various log streams, de-duplicate them, and, if the administrator configures it, send a notification via email, pager or whatever.
Java can have nested try / catch blocks.
If you'd like, you can move the try / catch block around sendEmail() into a separate method. When the try / catch blocks are more complex, this makes the code easier to understand.
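A rough sketch of that refactoring is below. To stay self-contained it uses java.util.logging rather than Log4J, and a hypothetical Mailer interface in place of the real JavaMail code; sendEmailQuietly and runAndReport are illustrative names, not anything from the question.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ReportingDemo {

    private static final Logger LOG = Logger.getLogger(ReportingDemo.class.getName());

    /** Hypothetical mail transport, so the sketch runs without a mail server. */
    interface Mailer {
        void send(String body) throws Exception;
    }

    /** Sends the report; failures are logged here and never rethrown. Returns true on success. */
    static boolean sendEmailQuietly(Mailer mailer, String body) {
        try {
            mailer.send(body);
            return true;
        } catch (Exception e) {
            LOG.log(Level.WARNING, "Could not send error report", e);
            return false;
        }
    }

    /** The caller now has a single, flat try/catch. Returns true if the work succeeded. */
    static boolean runAndReport(Runnable work, Mailer mailer) {
        try {
            work.run();
            return true;
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "Work failed", e);
            sendEmailQuietly(mailer, "Work failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        Mailer brokenMailer = body -> { throw new java.io.IOException("smtp down"); };
        boolean ok = runAndReport(() -> { throw new IllegalStateException("boom"); }, brokenMailer);
        System.out.println("work succeeded: " + ok);
    }
}
```

Because the failure is logged before any email is attempted, the log-based monitoring suggested in the other answer still works even when the mail server is down.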