application server restart on OOM exception - java

Can we automatically restart a WebSphere Application Server v6.1 on an OOM exception after the heap dump is created? We have an enterprise application hosted on WebSphere Application Server. Recently we have been facing OOM exceptions, and from time to time the app server gets restarted automatically after the heap dump is generated. But lately the automatic restart is not happening and the restart has to be done manually. Can you please let me know what the issue may be?

There is no built-in, parameter-based option in WAS 6.1 that does what you are asking for. That capability comes in v7.0.
A better approach that I (and many others) follow is to write a basic Java program that monitors SystemOut.log/SystemErr.log for the string "OutOfMemory" or "in total in the server that may be hung". If the log contains either of those strings, then (i) stop the server, (ii) rotate the logs, and (iii) start the server.
Schedule this Java program to run every 2 or 5 minutes.
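If you do go down that route, here is a rough sketch of what such a monitor could look like; the profile path, server name, and stop/start script locations are placeholders that you would need to adapt to your own installation:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class OomLogMonitor {
        // Placeholder locations - adjust for your own profile and server name.
        private static final String BIN = "/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/bin/";
        private static final Path SYSOUT =
                Paths.get("/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/logs/server1/SystemOut.log");

        public static void main(String[] args) throws Exception {
            String log = new String(Files.readAllBytes(SYSOUT), StandardCharsets.UTF_8);
            if (log.contains("OutOfMemory")
                    || log.contains("in total in the server that may be hung")) {
                run(BIN + "stopServer.sh", "server1");
                // Rotate the log so the next run does not match the same error again.
                Files.move(SYSOUT, SYSOUT.resolveSibling("SystemOut." + System.currentTimeMillis() + ".log"));
                run(BIN + "startServer.sh", "server1");
            }
        }

        private static void run(String... cmd) throws Exception {
            new ProcessBuilder(cmd).inheritIO().start().waitFor();
        }
    }

Scheduling it through cron (or the Windows task scheduler) every few minutes matches the approach described above.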
I don't recommend this method, and it is not good practice either. I would recommend that the WAS administrator inform the application/data team so they can fix the underlying issue, providing them the logs, thread dumps, hprof files, etc.
But most of the time it is difficult and time consuming for the data/application team to fix it immediately, so the WAS administrator ends up following this kind of method.

Related

Automatic restart of Spring Boot app after fatal exception

In the Node.js world there are tools like Supervisor - a daemon process that checks whether your app is running and, if it isn't (closed, crashed), restarts it instantly. That's a very nice way of temporarily handling critical errors in production, where one feature might fail but the rest of the system keeps running.
I come from a PHP background, where all you had to do when something broke was press the back button in the browser.
How do I achieve this behavior in Spring Boot? So far what I've noticed is that when the app hits an unhandled exception, it crashes and the whole server goes down. I know those are the kind of errors one should fix ASAP, but sometimes that's just not possible right away, and the system needs to keep running.
Are there any tools that work like Node.js supervisor?
In the past I sometimes used the Tanuki Java Service Wrapper, which worked quite nicely. Otherwise you have the option of either monitoring the process and automatically restarting it if it fails (how depends on your system environment), or catching Throwable at the highest level of your application - which is not a good idea, because you will also catch fatal cases that are intended to kill your JVM, e.g. OutOfMemory...
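If you do not want to pull in the wrapper or an OS-level supervisor, a very crude restart loop along these lines can stand in for it; the jar name and the back-off delay are just placeholders for illustration:

    import java.util.concurrent.TimeUnit;

    public class RestartWrapper {
        public static void main(String[] args) throws Exception {
            while (true) {
                // Placeholder command line - point it at your own Spring Boot jar.
                Process app = new ProcessBuilder("java", "-jar", "app.jar")
                        .inheritIO()
                        .start();
                int exitCode = app.waitFor();
                if (exitCode == 0) {
                    break; // clean shutdown, do not restart
                }
                System.err.println("Application exited with code " + exitCode + ", restarting...");
                TimeUnit.SECONDS.sleep(5); // small back-off before restarting
            }
        }
    }

On Linux, a systemd unit with Restart=on-failure achieves the same effect without any extra code.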

Application in Tomcat is not responding

We are trying to access an application in Tomcat on a different host, but it is not loading even though Tomcat is running. It had been running fine for the past 3 months. We restarted Tomcat and now it is working fine.
But we have not been able to zero in on what happened.
Any idea how to trace / what might have caused this?
The CPU usage was normal and the Tomcat memory usage was 1205640.
The memory settings of Tomcat are 1024-2048 (min-max).
We are using tomcat 7.
Help much appreciated....thanks in advance.....cheers!!
...also - I'm not sure about Windows - you may be running out of file descriptors. This typically happens when streams are not properly closed in finally blocks.
In addition, check with netstat whether you have a lot of sockets remaining open or accumulating in a wait state.
Less likely, the application is creating threads and never releasing them.
The application is leaking something (memory, file descriptors, sockets, threads,...) and running over a limit.
There are different ways to track this down. A profiler may help, or, more simply, run JVM dumps at regular intervals and check what is accumulating. The excellent MAT will help you analyze the heap dumps.
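For the "dumps at regular intervals" approach, the standard JDK tools are usually enough (replace <pid> with the Tomcat process id):

    jmap -dump:format=b,file=heap.hprof <pid>
    jstack <pid> > threads.txt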
Memory leak problems are not uncommon. If your Tomcat instance had been running for three months and the contained application suddenly became unresponsive, that may well be what happened. One option (if your resources allow it) is to monitor that Tomcat instance through JMX using jconsole to see how it behaves.
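The same beans jconsole shows can also be polled programmatically. A minimal sketch, assuming it runs inside the Tomcat JVM (for example started as a daemon thread from a ServletContextListener) so it sees the same memory, thread, and file descriptor counts; the file descriptor counter is a HotSpot-specific interface that is not available on Windows:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.OperatingSystemMXBean;
    import java.lang.management.ThreadMXBean;

    public class ResourceLogger implements Runnable {
        public void run() {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    long heapUsed = memory.getHeapMemoryUsage().getUsed();
                    int threadCount = threads.getThreadCount();
                    long openFds = -1;
                    // Open file descriptor counts are only exposed on Unix-like JVMs.
                    if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
                        openFds = ((com.sun.management.UnixOperatingSystemMXBean) os)
                                .getOpenFileDescriptorCount();
                    }
                    System.out.println("heapUsed=" + heapUsed
                            + " threads=" + threadCount + " openFds=" + openFds);
                    Thread.sleep(60000L); // log once a minute
                }
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        }
    }

It writes one line a minute to catalina.out; graphing those three numbers over a day or two usually makes it obvious which resource is leaking.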

How to catch OutOfMemory errors on Amazon EBS (Elastic BeanStalk)

Here's a tricky one for you - we have a Java web application deployed on Tomcat web servers on Amazon Elastic Beanstalk, and we believe we have a memory leak because the JVM seems to crash every night with an OutOfMemory exception.
The problem is that after the crash, Elastic Beanstalk automatically scraps the old EC2 instance and starts a fresh one, so all the logs and info get scrapped too...
I am now developing a custom CloudWatch metric to monitor the memory of the JVM (you would think there would be a ready-made one...), but that won't help me generate heap dumps.
Has anyone gone through a similar problem and knows how to catch these errors on EBS?
This certainly sounds like unusual EC2 (not EBS) instance behaviour. It's interesting that if Tomcat falls over, the machine instance gets affected (in terms of stopping or terminating).
This is what I would suggest to diagnose:
get a running instance ready to examine / play with
take a look at the "Termination Protection" setting - is it enabled or not? That could explain the "scrapping" part of your problem (if by scrapping you mean the instance terminates and is removed). You can find this in the properties of your EC2 instance in the AWS console.
take a look at the Java memory settings your Tomcat server is configured with. Perhaps the maximum (-Xmx) is bigger than the memory the virtual machine actually has!? If so, Tomcat may literally be running the machine out of memory, which could explain the EC2 response to your out-of-memory condition. I assume you mean "stopped" rather than "scrapped", otherwise how would you know you are getting an out-of-memory error?
if you manually kill the Tomcat/Java process on a working instance, does the instance stay operational (or do you get booted off and the instance gets stopped)? If something happens simply because you stop Tomcat, it means some monitoring process is kicking in and taking the machine down explicitly.
use -XX:+HeapDumpOnOutOfMemoryError (note the plus sign; the -XX:- form disables it) to produce a dump file on OOM - this will help you work out where your leak is and hopefully fix the root cause. See the flag sketch after this answer.
Good luck. Hope that helps.
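For reference, the flags would look roughly like this, passed through whatever mechanism you use to set the container's JVM options (the dump path is a placeholder; make sure it points somewhere that is collected before the instance gets replaced):

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/tomcat7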
Consider a log collection service like Sumo Logic. The log files you specify are collected and made available for analysis online, so even if your EC2 instances get replaced you can still do forensics to see what happened to them.

Memory Leak in multiple apps

I have a memory leak in two apps in Tomcat 6.0.35 server that appeared "out of nowhere". One app is Solr and the other is our own software. I'm hoping someone has seen this before as it's been happening to me for the last few weeks and I have to keep restarting Tomcat in a production environment.
It appeared on our original server despite the fact that none of the code related to thread or DB connection handling had been touched. As the old server this app ran on was due to be retired, I migrated the site to a new server and a "cleaner" environment, with the idea that this would clear out any legacy issues. But it continues to happen.
Just before Tomcat shuts down the catalina.out log is filled with errors like:
2012-04-25 21:46:00,300 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.
2012-04-25 21:46:00,339 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] appears to have started a thread named [com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2] but has failed to stop it. This is very likely to create a memory leak.
2012-04-25 21:46:00,470 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation.
During that migration we went from Solr 1.4 to Solr 3.6 in an attempt to fix the problem. When the errors above start filling the log, the Solr error below follows right behind, repeated 10-15 times, and then Tomcat stops responding and I have to shut it down and start it back up to get it to respond.
2012-04-25 21:46:00,527 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/solr] created a ThreadLocal with key of type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value [org.apache.solr.schema.DateField$ThreadLocalDateFormat#1f1e90ac]) and a value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat#6b2ed43a]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
My research has turned up a lot of suggestions about changing the code that manages threads to make sure it kills off pooled DB connections etc., but this code has not been changed in nearly 12 months. Also, the Solr application is crashing too, and that's third-party code, so my thinking is that this is environmental (a jar conflict, versioning, a fat-fingered config?).
My last change was updating the MySQL Connector/J driver to the latest version, as some memory leak bugs existed around pooling in earlier releases, but the server just crashed again only a few hours later.
One thing I just noticed is that I'm seeing thousands of sessions in the Tomcat web manager, but that could be a red herring.
If anyone has seen this any help is very much appreciated.
[Edit]
I think I found the source of the problem. It wasn't a memory leak after all. I've taken over an application from another development team that uses c3p0 for database connection pooling via Hibernate. c3p0 has a bug/feature whereby, if you don't release DB connections, it can go into a waiting state once all the connections (MaxPoolSize: default is 15) are in use, and it will wait indefinitely for a connection to become available. Hence my stall.
I upped MaxPoolSize first from 25 to 100 and my application ran for several days without a hang, and then from 100 to 1000, and it has been running steadily ever since (over 2 weeks).
This isn't the complete solution, as I still need to find out why it's running out of pooled connections, so I also set c3p0's unreturnedConnectionTimeout to 4 hours, which enforces a 4-hour time limit on every connection regardless of whether it is active or not. If it's an active connection, c3p0 will close it and it can be re-opened again.
It's not pretty, and c3p0 doesn't recommend it, but it gives me some breathing space to find the actual source of the problem.
Note: when using c3p0 with Hibernate, the settings are stored in your persistence.xml file, but not all settings can be put there. Some settings (e.g. unreturnedConnectionTimeout) must go in c3p0.properties.
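For anyone hitting the same thing, the split looks roughly like this; the values mirror the ones above, but double-check the key names against your Hibernate/c3p0 versions. The pool size can go through the hibernate.c3p0.* keys inside the <properties> element of persistence.xml:

    <property name="hibernate.c3p0.max_size" value="1000"/>

while unreturnedConnectionTimeout (in seconds) has to live in a c3p0.properties file on the classpath:

    c3p0.unreturnedConnectionTimeout=14400
    c3p0.debugUnreturnedConnectionStackTraces=true

The second property is optional; it logs the stack trace of the code that checked out a connection that was never returned, which helps track down the real culprit.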
You state that the sequence of events is:
errors appear
Tomcat stops responding
restart is required
However, the memory leak error messages only get reported when the web application is stopped. Therefore, something is triggering the web applications to stop (or reload). You need to figure out what is triggering this and stop it.
Regarding the actual leaks, you may find this useful:
http://people.apache.org/~markt/presentations/2010-11-04-Memory-Leaks-60mins.pdf
It looks like both your app and Solr have some leaks that need to be fixed. The presentation will give you some pointers. I would also consider an upgrade to the latest 7.0.x release; the memory leak detection has been improved and not all of the improvements have been back-ported to 6.0.x yet.

How to obtain a Java stack trace from a client running under web start?

I wanted to get ideas from the SO community about this issue.
Here is the problem:
We have a user on the other side of the world launching our app through WebStart. The user, however, is complaining that her whole application freezes up and becomes unresponsive. Usually, the client is doing a lot of database queries to a distributed database.
Questions:
If we ask her to do a CTRL-Break on her application, where would the JVM write the stack trace to?
Would it be enough just to use JConsole?
Would implementing JMX beans on the client be overkill? Would it actually help in troubleshooting issues in production?
Right now the users are running on JRE 1.5.0-b08, but we do plan on migrating to JRE 6 in a couple of months.
What do you think?
José, you can get a lot of information from the JVM in a number of ways.
The best might be to enable debugging in the remote JVM. You can set the JVM arguments using the j2se element in the JNLP descriptor XML, as shown here. Since you can set -Xdebug you have a good start; I've never tried remote debugging on a Web Start app, so setting up the remote part may be a little bit of an issue.
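As a rough sketch, the descriptor entry could look something like this (the transport and port are placeholders, and unsigned applications may restrict which VM arguments are honoured):

    <j2se version="1.5+" java-vm-args="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000"/>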
You could also set some things up yourself by adding a separate thread to talk to you remotely and send debugging messages.
You could use a native java or log4j remote logger.
If it's hanging the way you describe, though, the odds are very high that what's happening is a network hangup of some sort. Can you put some tracing/debugging onto your end of the conversation?
Instead of these debugging suggestions, why not install an uncaught exception handler for your threads? See java.lang.Thread:
static void setDefaultUncaughtExceptionHandler(Thread.UncaughtExceptionHandler eh)
Here's the relevant javadoc:
http://java.sun.com/javase/6/docs/api/java/lang/Thread.html#setDefaultUncaughtExceptionHandler(java.lang.Thread.UncaughtExceptionHandler)
If you install that in your code (and another one inside Swing's EDT), you can then write some Java code to e-mail the trace to yourself, save it on a server, show it to the user, etc.
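A minimal sketch of that idea; the reporting part (here it just prints) is where the e-mail or upload logic would go:

    public class Main {
        public static void main(String[] args) {
            Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
                public void uncaughtException(Thread t, Throwable e) {
                    // Replace this with code that e-mails the trace, posts it to a
                    // server, or shows it to the user.
                    System.err.println("Uncaught exception in thread " + t.getName());
                    e.printStackTrace();
                }
            });
            // ... launch the application as usual ...
        }
    }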
You need to have the Java Console displayed (run javaws from the command line and enable the console in the Preferences dialog), then hit "v" in the console to dump the thread stacks.
