Historical Context: This problem ended up being not at all what I thought it was. The cause and solution are below, but the original posting is left for reference.
I'm developing a simple framework for periodically polling a directory for .properties files, then performing SQL queries and sending e-mails based on their configurations. Because each .properties file has the same spectrum of operations, they are all interpreted by the same Task class. But since they each represent different logical operations, they each get separate log files.
This is accomplished by sharing one instance of a log4j RollingFileAppender, and dynamically changing its output file based on a value in the .properties file. Since this is a single-threaded application, this works fine.
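For reference, the retargeting looks roughly like this (log4j 1.x; a sketch only, and the appender name "taskAppender" is just an illustrative placeholder, not the actual name from my configuration):
RollingFileAppender fileAppender =
        (RollingFileAppender) Logger.getLogger(Task.class).getAppender("taskAppender");
fileAppender.setFile("newtaskname.log"); // point the shared appender at this task's log file
fileAppender.activateOptions();          // re-open the underlying writer on the new file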
However, I've noticed that in certain situations, this RollingFileAppender will become closed, and the application will continue on obliviously except that now no logging takes place. I've only managed to catch this in action once, thanks to the console output, since usually this service is running as a background process on a Linux server. Here's what happened:
1) StartScheduler, the main class, creates a new instance of TaskPoller every minute.
2) TaskPoller scans the directory, loads a little information from each .properties file, and determines if it should be run. It also has its own separate RollingFileAppender, which it retrieves via Logger.getLogger(TaskPoller.class). If a Task should be run, then it instantiates a Task object, passing in the specific .properties file to be run.
3) Task gets its RollingFileAppender, then calls fileAppender.setFile("newtaskname.log") and fileAppender.activateOptions() to change the output file location. Then, during its execution, something like this happens:
[TaskPoller]
...
task = new Task(fileName); //Points RollingFileAppender to the right place
if (!task.Execute())
    logger.warn(fileName + " returned with an error code."); //Succeeds
[Task.Execute]
...
try {
    dbDAO.Connect();
} catch (Exception e) {
    logger.fatal("Database connection error.", e); //Different RFA; Fails
    return false;
}
[DBDAO.Connect throws SQLException, ClassNotFoundException]
...
try {
    Class.forName(dbDriver); //Dynamically loaded jdbc driver class name
    connection = DriverManager.getConnection(urlString, userName, password);
} catch (SQLException e) {
    if (connection != null)
        try { connection.close(); } catch (Exception e2) { ; }
    throw e;
}
What's happening is that during DBDAO.Connect(), sometimes I'll get a com.mysql.jdbc.exceptions.jdbc4.CommunicationsException (or some other unexpected exception from whichever jdbc class is loaded). This will not be caught by Connect(), but it will be caught by Execute().
Somehow, this process causes Task's RollingFileAppender to become closed. The only thing I can think of that's special to this situation, as opposed to its consistent and stable normal operation, is that the exception being thrown isn't declared as thrown by Connect(). But I don't think that should cause a log4j Appender to close.
So my question is, what could be causing this appender to unexpectedly close in methods that have nothing to do with its configuration?
--Edit--
It looks like I've been misdirected entirely; the problem is somewhere in the interactions between Quartz, which I was using to have TaskPoller fire every minute, and log4j. I don't entirely understand its cause yet, but [this solution][1] seems to solve the problem. It just hadn't manifested itself as an observable problem until now, which is why I assumed it was related to what I'd been changing recently.
The real cause of this problem is an interaction between the Quartz scheduler and the way I was using log4j. It turns out, if you modify log4j's properties (which I was doing by calling fileAppender.setFile(fileName) and fileAppender.activateOptions()) on a Quartz worker thread (even if Quartz is configured to only have a single thread running at a time), things break down. This is fixed by reloading the log4j properties on each new instance of the worker thread before using it, which I accomplished like so:
[Task() Constructor]
Properties props = new Properties();
URL url = ClassLoader.getSystemResource("log4j.properties");
try {
    props.load(url.openStream());
    PropertyConfigurator.configure(props);
} catch (Exception e) {
    //The logger that never got renamed never stopped working.
    Logger.getLogger(TaskPoller.class).error("Diagnostics!");
}
logger = Logger.getLogger(Task.class);
Related
I have a Docker container with a Java application that uses a DB to persist some data. My application has a class that extends another one that is not my code (specifically SinkTask, a class from Kafka that is used to transfer data from Kafka to another system). When the application starts, it opens a connection to the database. Sometimes the database closes the connection and tasks start to fail. The exceptions thrown by these failures are caught in one part of my code, and I can think of different ways to handle them:
1. Simply executing the code from within the application that stops and starts the connection again
2. Restarting the Docker container, creating a new connection in the process
I think the best solution is number 1. However, I wanted to know how I could trigger the second situation. My guess is that I should throw, in the catch block, a new Exception capable of terminating the application (remember that the SinkTask part of the code is out of my control). Would this be a good solution? Which kind of Exception should I throw in this case?
This is the part of the code where I catch the exception
private void commitCollections() {
    for (SinkCollection sc : collections.values()) {
        try {
            commitCollection(sc);
        } catch (Exception e) {
            LOG.error("Error flushing collection " + sc.getTableName(), e);
        }
    }
    transactionRecordCount = 0;
    try {
        connection.commit();
    } catch (SQLException e) {
        LOG.error("Commit error", e);
    }
}
Throwing an Exception and letting it propagate in order to terminate the application is a perfectly reasonable solution. IMO, using System.exit(exit_code) would be better, because it clearly states what that code is doing.
In addition, docker will display the exit_code in the status of the container (docker ps -a), thus helping differentiate between different error conditions. When an uncaught exception is thrown the exit code is always 1.
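For example, a minimal sketch of that approach in the commit error handler (the specific exit code value, and the choice to exit unconditionally on a commit failure, are illustrative assumptions, not part of the original code):
private static final int DB_CONNECTION_LOST_EXIT_CODE = 3; // illustrative value

try {
    connection.commit();
} catch (SQLException e) {
    LOG.error("Commit error", e);
    // Let the container die with a distinguishable exit code so that
    // `docker ps -a` shows why it stopped and the orchestrator can restart it.
    System.exit(DB_CONNECTION_LOST_EXIT_CODE);
}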
Hope that helps.
I have the following scenario. My application reads from a configuration file in which I have defined queues, their exclusivity, thread count and some other details. When the application starts, it reads that configuration and creates a DirectMessageListenerContainer for each entry. I save these containers in a map, keyed by a custom name I have given each of them.
On startup, if any failure happens, the application fails to run, which is what I want.
Now, about the problem. I created a reload method which allows users to change the configuration through JMX without restarting the application. When the configuration file has been changed and the reload method is called, the following process is performed: the validity of the new configuration is checked, and if it is correct, it is used to set up the new containers. To do so, I first stop all the containers, then destroy them, and after that I initialize the new containers. That's it. The issue is what happens when an exception occurs during stopping, destroying, or any of the subsequent steps. I handle the exception, but it leaves the current setup broken or half-applied. I would like to have a rollback feature, but I am not sure how that can be done, because after checking the validity of the new configuration I set it as the current one.
I could save the current settings, check whether the new ones work, and if not, initialize the previous ones again. However, I could run into another exception while initializing the previous configuration.
Here's the reload function. RabbitManager is a class I created; nothing special, it just performs actions like stop, destroy, etc.
public String reloadConfiguration() {
    Rules newRules;
    // checking validity of new rules, setting it, handling exceptions...
    try {
        // setting new rules
        // rules variable saves the current rules
        rules = newRules;
        // basically calls stop in all the containers
        rabbitManager.stopAll();
        // basically calls destroy in all the containers
        rabbitManager.destroyAllContainers();
        rabbitManager
            .init(rules)          // initializes an empty map and sets rules as new rule.
            .registerListeners(); // reads rules and creates DirectMessageListenerContainer for each setting
        log.info("Configuration has been successfully changed, and stopped");
        // returns are for jConsole/monitoring
        return "Configuration has been successfully changed, and stopped";
    } catch (Exception ex) {
        log.error("Exception occurred - " + ex.getMessage(), ex);
        // returns are for jConsole/monitoring
        return "Exception : " + ex.getMessage();
    }
}
I hope my question is clear. If anything else is needed, or you think the current approach has issues, or maybe I am missing some point, please let me know. Sorry for the vague title, though.
Why don't you just stop() the containers and only destroy them once the new ones are good? If the new config fails, just start() the old containers after destroying the new ones.
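A sketch of that flow, reusing the shape of your reload method (saveContainers(), destroySaved() and restoreAndStart() are illustrative helpers on RabbitManager that do not exist in your code; validation of the new rules is assumed to have happened already):
public String reloadConfiguration(Rules newRules) {
    Map<String, DirectMessageListenerContainer> old = rabbitManager.saveContainers(); // keep the old containers aside
    rabbitManager.stopAll();                   // stop, but do NOT destroy, the old containers yet
    try {
        rabbitManager
            .init(newRules)                    // build the new containers
            .registerListeners();
        rabbitManager.destroySaved(old);       // the new setup works, so drop the old containers now
        rules = newRules;
        return "Configuration has been successfully changed";
    } catch (Exception ex) {
        log.error("Reload failed, rolling back - " + ex.getMessage(), ex);
        rabbitManager.destroyAllContainers();  // discard whatever was half-built
        rabbitManager.restoreAndStart(old);    // put the old containers back and start() them again
        return "Rollback performed, exception : " + ex.getMessage();
    }
}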
I have a Java application that's mysteriously dying without any exceptions in the logs. I'm running it in the background via a bash script that wraps a nohup call like the one below:
nohup java -Xms6g -Xmx6g -jar myapp.jar 2>> stderr.txt >> /dev/null & echo $! > /tmp/myapp-pid
The java application is quite memory intensive and so has been configured with 6GB of heap space (running on a 64 bit JVM). It runs fine for about 8 hours and then silently dies. No exceptions in the logs, nothing.
From the main method the app enters an infinite while loop, polls AWS SQS for messages and processes them. This is all wrapped in a try-catch, and I am logging in the catch. The application seems to exit after it completes an iteration of the loop, since it logs the last line of the iteration; i.e. the log always ends with 'Successfully processed'.
while(true) {
    try {
        // Logic to poll SQS and process the message
    } catch (MyCustomException e) {
        // Write to SQS dead letter queue (was throwing at this point)
        // Delete message from original SQS
    } catch (Throwable e) {
        LOG.error(...);
    } finally {
        LOG.info("Processing time was...");
    }
}
I'm not sure where to begin as I would've thought it would log something. Can anyone provide some pointers or maybe some JVM settings to configure so that I can start investigating?
I am wondering if things outside the code may be causing the error. Like perhaps a JVM crash?
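One thing that might help narrow this down (just a sketch; none of this is in the current code) is installing a default uncaught-exception handler and a shutdown hook at the top of main, so that anything which kills the main thread, or any orderly JVM exit, leaves a trace in the log:
public static void main(String[] args) {
    // Log anything that escapes a thread entirely (including the polling loop).
    Thread.setDefaultUncaughtExceptionHandler((thread, throwable) ->
            LOG.error("Uncaught exception on thread " + thread.getName(), throwable));
    // Log orderly JVM shutdowns. Note this hook will NOT run if the process is
    // SIGKILLed, e.g. by the Linux OOM killer, which is itself a useful clue.
    Runtime.getRuntime().addShutdownHook(new Thread(() ->
            LOG.info("JVM is shutting down")));

    // ... existing SQS polling loop ...
}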
Update
It seems this was indeed a programming error. I didn't think it was causing the issue, so I hadn't included it in the code path above (I've added it now), but I did have another catch clause for a custom Exception that I had created. Within that catch I was attempting to move the SQS message to the dead letter queue, but I did not have permission on that queue, so a new exception was thrown inside the catch block, and that one was never handled.
Thanks to all those who helped suggest what may have gone wrong!
Without more code it is hard to say what actually happens.
But by definition, the finally block is always executed, which also means in case of failure. Maybe you are simply missing the exception that is logged before it.
Try moving the logging that is currently in the finally block into the try block, so it only runs when processing actually succeeds:
while(true) {
    try {
        // Logic to poll SQS and process the message
        LOG.info("Successfully processed");
    } catch (Throwable e) {
        // As mentioned in the comments, for debugging try logging at info level here
        // as well. Maybe error level is disabled (although this should be very
        // unlikely, since error is normally written whenever info is written).
        LOG.info(...);
    } finally {
        // Clean up.
    }
}
Here are two ideas which may help you investigate your issue further.
Could your system be running out of memory? Try running the application from a wrapper script that waits for the JVM and logs its exit code (echo $? >&2 after the java command returns).
Also, running dmesg could tell you whether the OOM killer chose your application as a victim.
How does asynchronous JMS work? I have the sample code below:
public class JmsAdapter implements MessageListener, ExceptionListener
{
    private ConnectionFactory connFactory = null;
    private Connection conn = null;
    private Session session = null;
    private MessageConsumer consumer = null;

    public void receiveMessages()
    {
        try
        {
            this.session = this.conn.createSession(true, Session.SESSION_TRANSACTED);
            this.conn.setExceptionListener(this);
            Destination destination = this.session.createQueue("SOME_QUEUE_NAME");
            this.consumer = this.session.createConsumer(destination);
            this.consumer.setMessageListener(this);
            this.conn.start();
        }
        catch (JMSException e)
        {
            //Handle JMS Exceptions Here
        }
    }

    @Override
    public void onMessage(Message message)
    {
        try
        {
            //Do Message Processing Here

            //Message successfully processed... Go ahead and commit the transaction.
            this.session.commit();
        }
        catch (SomeApplicationException e)
        {
            //Message processing failed.
            //Do whatever you need to do here for the exception.

            //NOTE: You may need to check the redelivery count of this message first
            //and just commit it after it fails a predefined number of times (make sure you
            //store it somewhere if you don't want to lose it). This way your process isn't
            //handling the same failed message over and over again.
            this.session.rollback();
        }
    }

    @Override
    public void onException(JMSException e)
    {
        //Required by ExceptionListener: handle asynchronous connection problems here.
    }
}
But I'm new to Java and JMS. I'll probably consume messages in the onMessage method, but I don't know exactly how it works.
Do I need to add a main method to the JmsAdapter class? After adding the main method, do I need to create a jar and then run it as "java -jar abc.jar"?
Any help is much appreciated.
UPDATE: What I want to know is: if I add a main method, should I simply call receiveMessages() in main? And then, after running, will the listener keep running? And if there are messages, will they be retrieved automatically in the onMessage method?
Also, if the listener is continuously listening, doesn't it consume CPU? In the case of threads, when we create a thread and put it to sleep, the CPU utilization is zero; how does it work in the case of a listener?
Note: I only have a Tomcat server and will not be using any JMS server. I'm not sure whether the listener needs a specific JMS server such as JBoss, but in any case, please assume that I will not have anything except Tomcat.
Thanks!
You need to learn to walk before you start trying to run.
Read / do a tutorial on Java programming. This should explain (among other things) how to compile and run a Java program from the command line.
Read / do a tutorial on JMS.
Read the Oracle material on how to create an executable JAR file.
Figure out what it is you are trying to do ... and design your application.
Looking at what you've shown and told us:
You could add a main method to that class, but to make an executable JAR file, you've got to create your JAR file with a manifest entry that specifies the name of the class with the main method.
There's a lot more that you have to do before that code will work:
add code to (at least) log the exceptions that you are catching
add code to process the messages
add code to initialize the connection factory and connection objects
And like I said above, you probably need some kind of design ... so that you don't end up with everything in a "kitchen sink" class.
if I add main method, should I simply call receiveMessages() in main?
That is one approach. But like I said, you really need to design your application.
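For illustration, a minimal main added to JmsAdapter might look like the sketch below. The ActiveMQ connection factory and broker URL are assumptions purely for the example; substitute whichever JMS provider you actually end up using.
public static void main(String[] args) throws Exception {
    JmsAdapter adapter = new JmsAdapter();
    // Assumption for the sketch: an ActiveMQ broker reachable at this URL.
    adapter.connFactory = new org.apache.activemq.ActiveMQConnectionFactory("tcp://localhost:61616");
    adapter.conn = adapter.connFactory.createConnection();
    adapter.receiveMessages(); // registers this object as the MessageListener and starts the connection

    // Block the main thread so the JVM (and the JMS delivery threads) stay alive.
    new java.util.concurrent.CountDownLatch(1).await();
}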
And then after running, will the listener keep on running?
It is not entirely clear. It should keep running as long as the main thread is alive, but it is not immediately obvious what happens when your main method returns. (It depends on whether the JMS threads are created as daemon threads, and that's not specified.)
And if there are messages, will it retrieve automatically in onMessage method?
It would appear that each message is retrieved (read from the socket) before your onMessage method is called.
Also, if the listener is continuously listening, doesn't it take CPU???
Not if it is implemented properly.
In case of threads, when we create a thread & put it in sleep, the CPU utilization is zero, how doe it work in case of listener?
At a certain level, a listener thread will make a system call that waits for data to arrive on a network socket. I don't know exactly how it is implemented, but this could be as simple as a read() call on the network socket's InputStream. No CPU is used by a thread while it waits in a blocking system call.
This link looks like a pretty good place with examples using Oracle AQ. There's an examples section that tells you how to set up the examples and run them. Hopefully this can help.
Link to Oracle Advanced Queueing
I have written a Java class where, if a method throws an exception, an email with a report is sent to the administrators via JavaMail.
It works. My question is about elegance: to handle the exception thrown by the main method, the sendEmail() call resides in the catch block of the main method, and sendEmail() has its own try-catch block.
In effect, it looks like the code below. Is there a more elegant way of writing this?
try {
    foo();
} catch (Exception e) {
    try {
        sendEmail();
    } catch (Exception mailException) {
        log(mailException.getMessage());
    }
}
If you want something "more elegant", one simple suggestion is to have your sendEmail helper method catch and log the email exceptions. (I don't imagine you want the exceptions to propagate ... or do some other recovery ...)
However, there is something more important to say. What you are implementing here is the wrong approach to reporting errors.
If something goes badly wrong with your application there is a chance that you will SPAM the administrator with multiple emails reporting the same problem over, and over, and over ...
By sending emails from deep within your code, you are making it hard for the administrator to integrate your application's error reporting.
A better approach is to report the problem via a Java logging framework such as Log4J. If the administrator wants to, he or she can configure some kind of monitoring system like Logwatch, Nagios, etc. Such a monitoring system will detect and classify errors and anomalies (like your application's errors) in the various logger streams, de-duplicate them, and, if the administrator so configures it, send a notification via email, pager or whatever.
Java can have nested try / catch blocks.
If you'd like, you can move the sendEmail try / catch block into another method; when the try / catch blocks are more complex, this makes the code easier to understand.
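A sketch of that refactoring (notifyAdmin and buildReport are illustrative names, and log(...) stands in for whatever logging you already use):
private void notifyAdmin(Exception cause) {
    try {
        sendEmail(buildReport(cause));     // your existing JavaMail sending code
    } catch (Exception mailFailure) {
        // Never let a failure in the reporting path mask the original problem.
        log(mailFailure.getMessage());
    }
}

// The call site then stays flat:
try {
    foo();
} catch (Exception e) {
    notifyAdmin(e);
}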