I have a memory leak in two apps on a Tomcat 6.0.35 server that appeared "out of nowhere". One app is Solr and the other is our own software. I'm hoping someone has seen this before, as it's been happening to me for the last few weeks and I have to keep restarting Tomcat in a production environment.
It appeared on our original server despite the fact that none of the code related to thread or DB connection operation had been touched. As the old server this app runs on was due to be retired, I migrated the site to a new server and a "cleaner" environment with the idea that it would clear out any legacy stuff. But it continues to happen.
Just before Tomcat shuts down the catalina.out log is filled with errors like:
2012-04-25 21:46:00,300 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.
2012-04-25 21:46:00,339 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] appears to have started a thread named [com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2] but has failed to stop it. This is very likely to create a memory leak.
2012-04-25 21:46:00,470 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation.
During that migration we went from Solr 1.4 to Solr 3.6 in an attempt to fix the problem. When the errors above start filling the log, the Solr error below follows right behind, repeated 10-15 times, and then Tomcat stops responding and I have to shut it down and start it up again to get it to respond.
2012-04-25 21:46:00,527 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/solr] created a ThreadLocal with key of type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value [org.apache.solr.schema.DateField$ThreadLocalDateFormat#1f1e90ac]) and a value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat#6b2ed43a]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
My research has brought up a lot of suggestions about changing the code that manages threads to make sure it kills off pooled DB connections etc., but this code has not been changed in nearly 12 months. Also, the Solr application is crashing, and that's 3rd party, so my thinking is that this is environmental (a jar conflict, versioning, a fat-fingered config?).
My last change was updating MySQL Connector/J to the latest version, as some memory leak bugs existed around pooling in earlier releases, but the server just crashed again only a few hours later.
One thing I just noticed is I'm seeing thousands of sessions in the Tomcat web manager but that could be a red herring.
If anyone has seen this any help is very much appreciated.
[Edit]
I think I found the source of the problem. It wasn't a memory leak after all. I've taken over an application from another development team that uses c3p0 for database pooling via Hibernate. c3p0 has a bug/feature: if you don't release DB connections, c3p0 can go into a waiting state once all the connections (MaxPoolSize: default is 15) are used. It will wait indefinitely for a connection to become available. Hence my stall.
I upped MaxPoolSize first from 25 to 100 and my application ran for several days without a hang, then from 100 to 1000, and it's been running steadily ever since (over 2 weeks).
This isn't the complete solution, as I need to find out why it's running out of pooled connections, so I also set c3p0's unreturnedConnectionTimeout to 4 hours, which enforces a 4-hour time limit on all connections regardless of whether they're active or not. If it's an active connection, it will close it and re-open it.
Not pretty, and the c3p0 docs don't recommend it, but it gives me some breathing space to find the source of the problem.
Note: when using c3p0 with Hibernate, the settings are stored in your persistence.xml file, but not all settings can be put there. Some settings (e.g. unreturnedConnectionTimeout) must go in c3p0.properties.
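To make the split concrete, here is a sketch of where each setting goes. The property values are examples only; the pool-size keys go through Hibernate's persistence.xml properties, while the unreturned-connection settings only take effect from a c3p0.properties file on the classpath:

```xml
<!-- Sketch: persistence.xml fragment (values are examples, not recommendations) -->
<properties>
  <property name="hibernate.c3p0.max_size" value="100"/>
  <property name="hibernate.c3p0.min_size" value="5"/>
  <property name="hibernate.c3p0.timeout" value="300"/>
</properties>
```

```properties
# c3p0.properties (on the classpath) - these don't work from persistence.xml
# 14400 seconds = the 4-hour limit described above
c3p0.unreturnedConnectionTimeout=14400
# logs the stack trace that checked out each unreturned connection -
# very useful for finding the code path that leaks them
c3p0.debugUnreturnedConnectionStackTraces=true
```

debugUnreturnedConnectionStackTraces is worth enabling while hunting the real cause, since it points at the exact call site that borrowed each timed-out connection.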
You state that the sequence of events is:
errors appear
Tomcat stops responding
restart is required
However, the memory leak error messages only get reported when the web application is stopped. Therefore, something is triggering the web applications to stop (or reload). You need to figure out what is triggering this and stop it.
Regarding the actual leaks, you may find this useful:
http://people.apache.org/~markt/presentations/2010-11-04-Memory-Leaks-60mins.pdf
It looks like both your app and Solr have some leaks that need to be fixed. The presentation will provide you with some pointers. I would also consider an upgrade to the latest 7.0.x: the memory leak detection has been improved, and not all improvements have made it into 6.0.x yet.
Related
I have been fighting with this issue for ages and I cannot for the life of me figure out what the problem is. Let me set the stage for the stack we are using:
Web-based Java 8 application
GWT
Hibernate 4.3.11
MySQL
MongoDB
Spring
Tomcat 8 (incl. Tomcat connection pooling instead of c3p0, for example)
Hibernate Search / Lucene
Terracotta and EhCache
The problem is that every couple of days (sometimes every second day, sometimes once every 10 days, it varies) in the early hours of the morning, our application "locks up". To clarify, it does not crash, you just cannot log in or do anything for that matter. All background tasks - everything - just halt. If we attempt to login when it is in this state, we can see in our log file that it is authenticating us as a valid user, but no response is ever sent so the application just "spins".
The only pattern we have found to date related to when these "lock ups" occur is that it happens when our morning scheduled tasks or SAP imports are running. It is not always the same process that is running though, sometimes the lock up happens during one of our SAP imports and sometimes during internal scheduled task execution. All that these things have in common are that they run outside of business hours (between 1am and 6am) and that they are quite intensive processes.
We are using JavaMelody for monitoring and what we see every time is that starting at different times in this 1 - 6am window, the number of used jdbc connections just start to spike (as per the attached image). Once that starts, it is just a matter of time before the lock up occurs and the only way to solve it is to bounce Tomcat thereby restarting the application.
As far as I can tell, memory, CPU, etc. are all fine when the lock up occurs; the only thing that looks like it has an issue is the constantly increasing number of used jdbc connections.
I have checked the code for our transaction management so many times to ensure that transactions are being closed off correctly (the transaction management code is quite old fashioned: explicit begin and commit in try block, rollback in catch blocks and entity manager close in a finally block). It all seems correct to me so I am really, really stumped. In addition to this, I have also recently explicitly configured the Hibernate connection release mode properly to after_transaction, but the issue still occurs.
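For reference, the old-fashioned pattern described above looks like this. This is a minimal, self-contained sketch: Tx is a stand-in for an EntityManager-like resource so the control flow can be shown on its own, and all names here are hypothetical, not the real application's code:

```java
// Sketch of the begin/commit in try, rollback in catch, close in finally
// pattern. Tx is a hypothetical stand-in for an EntityManager/transaction.
public class TxPattern {
    static class Tx {
        boolean active, committed, rolledBack, closed;
        void begin()    { active = true; }
        void commit()   { committed = true; active = false; }
        void rollback() { rolledBack = true; active = false; }
        void close()    { closed = true; }
    }

    static Tx runInTx(Runnable work) {
        Tx tx = new Tx();
        try {
            tx.begin();
            work.run();
            tx.commit();
        } catch (RuntimeException e) {
            if (tx.active) tx.rollback(); // roll back only if still active
            // real code would rethrow or log here
        } finally {
            tx.close(); // ALWAYS release the connection back to the pool
        }
        return tx;
    }

    public static void main(String[] args) {
        Tx ok  = runInTx(() -> {});
        Tx bad = runInTx(() -> { throw new RuntimeException("boom"); });
        System.out.println(ok.committed + " " + bad.rolledBack + " " + bad.closed);
        // prints: true true true
    }
}
```

If every code path really follows this shape, connections cannot pile up from the transaction layer itself, which is why the spike usually points at something else holding connections (as it turned out to below).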
The other weird thing is that we run several instances of the same application for different clients and this issue only happens regularly for one client. They are our client with by far the most data to be processed though and although all clients run these scheduled tasks, this big client is the only one with SAP imports. That is why I originally thought that the SAP imports were the issue, but it locked up just after 1am this morning and that was a couple hours before the imports even start running. In this case it locked up during an internal scheduled task executing.
Does anyone have any idea what could be causing this strange behavior? I have looked into everything I can think of but to no avail.
After some time and a lot of trial and error, my team and I managed to sort out this issue. Turns out that the spike in JDBC connections was not the cause of the lock-ups but was instead a consequence of the lock-ups. Apache Terracotta was the culprit. It was just becoming unresponsive it seems. It might have been a resource allocation issue, but I don't think so since this was happening on servers that were low usage as well and they had more than enough resources available.
Fortunately we actually no longer needed Terracotta, so I removed it. As I said in the question, we were getting these lock-ups every couple of days - at least once per week, every week. Since removing it we have had no such lock-ups for 4 months and counting. So if anyone else experiences the same issue and you are using Terracotta, try dropping it and things might come right, as they did in my case.
As coladict said, you need to look at the "Opened jdbc connections" page in the JavaMelody monitoring report before the server locks up.
Sorry if you need to do that at 2 or 3 in the morning, but perhaps you can run a wget command automatically during the night.
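For example, a cron entry could snapshot the monitoring page during the 1-6am window so nobody has to be awake for it. This is a sketch: the host, context path, part parameter, and output directory are assumptions you would adjust for your own JavaMelody install:

```shell
# Hypothetical crontab entries: capture the JavaMelody report every 15 minutes
# between 01:00 and 05:59 (note: % must be escaped as \% inside crontab).
*/15 1-5 * * * wget -q -O /var/log/melody/report-$(date +\%Y\%m\%d-\%H\%M).html \
  "http://localhost:8080/yourapp/monitoring?part=connections"
```

After a lock-up you can then diff the captured pages to see exactly when the connection count started climbing and which connections were open at the time.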
On continuous redeploys of the application, I've observed a memory leak while using VisualVM 1.3.8 to figure out why the Tomcat installation occasionally runs into PermGen problems. I discovered it through VisualVM and also through the following errors constantly appearing during testing:
04-Nov-2015 16:35:27.828 WARNING [http-apr-8000-exec-9] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [admin-rrs-sm] appears to have started a thread named [Abandoned connection cleanup thread] but has failed to stop it. This is very likely to create a memory leak.
and also
04-Nov-2015 16:29:24.624 WARNING [http-apr-8000-exec-8] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesJdbc The web application [admin-rrs-sm] registered the JDBC driver [com.mysql.jdbc.Driver] but failed to unregister it when the web application was stopped. To prevent a memory leak, the JDBC Driver has been forcibly unregistered.
For some reason, even though the old application context has been "destroyed", the entire application is still being held in memory, and, using VisualVM, it turns out that the contextClassLoader closest to the GC root is being referred to by com.mysql.jdbc.AbandonedConnectionCleanupThread. For the latest test, here was the stack trace:
Stack trace of thread:
java.lang.Object.wait(Native Method)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:142)
com.mysql.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:43)
Since my initial code, before it was packaged as a WAR, was in Clojure, I'm not sure how exactly I should deal with this. For a start, nothing happened when I tried removing the MySQL connector .jar from the WAR and deploying the application that way, with the same .jar inside Tomcat's /lib folder; I still kept getting the same errors. On a slightly less related note, Eclipse's MAT tells me the biggest objects are clojure.lang.LazySeq, clojure.lang.Namespace and clojure.lang.Var. Is this normal, and is it actually related to my problem?
EDIT: Casual OQL scraping of my heap dumps has led me to discover an absurd number of threads being kept in memory, presumably because the AbandonedConnectionCleanupThread was never cleared out despite being so-called "deregistered" by Tomcat. Even defining some sort of destroy handler in the :ring { } options object of my project.clj, checking the existing documentation of org.clojure/java.jdbc, and not putting mysql-connector-java into the WAR file has not given me a single direction. I'm not asking for any code, but rather I'd just like some pointers as to where to start so I can implement a destroy handler myself.
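One common direction for this class of warning is to deregister the JDBC drivers yourself when the webapp shuts down (in Ring you would call this from your :destroy handler). Below is a minimal sketch using only java.sql; the Connector/J-specific cleanup-thread call is left as a comment because its exact name and package vary between driver versions, so check the one you ship:

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.Enumeration;

public class JdbcCleanup {
    // Deregister every JDBC driver visible to this class loader so the
    // webapp's class loader can be garbage collected on undeploy.
    public static int deregisterDrivers() {
        int count = 0;
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver d = drivers.nextElement();
            try {
                DriverManager.deregisterDriver(d);
                count++;
            } catch (Exception e) {
                // log and continue; one failed driver shouldn't stop cleanup
            }
        }
        // For MySQL Connector/J you would also stop its cleanup thread here,
        // e.g. com.mysql.jdbc.AbandonedConnectionCleanupThread.shutdown();
        // (method name and package vary by driver version - check yours)
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Deregistered " + deregisterDrivers() + " driver(s)");
    }
}
```

Note that moving the connector jar to Tomcat's /lib, as tried above, only helps if the webapp also stops triggering driver registration itself; otherwise the driver stays pinned to the webapp class loader and the cleanup thread keeps it alive.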
Can we automatically restart a WebSphere Application Server v6.1 on an OOM exception after the heap dump is created? We have an enterprise application hosted on WebSphere Application Server. Recently we have been facing OOM exceptions, and from time to time the app server restarts automatically after the heap dump is generated. But lately the app server restart is not happening automatically and has to be done manually. Can you please let me know what the issue may be?
There is no built-in/parameter-based option in WAS 6.1 that does what you're asking. It comes in v7.0.
A better way that I (and many others) follow is to write a basic Java program to monitor SystemOut.log/SystemErr.log for the particular string "OutOfMemory" or "in total in the server that may be hung". If the log contains either of those strings, then (i) stop the server, (ii) rotate the logs, (iii) start the server.
Schedule this Java program to run every 2 or 5 minutes.
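The detection half of such a watcher could be sketched like this. The trigger strings are the ones from above; the class name, log path handling, and restart hook are hypothetical and would be adapted to your environment:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OomLogWatcher {
    // Trigger strings from the answer above
    private static final String[] PATTERNS = {
        "OutOfMemory",
        "in total in the server that may be hung"
    };

    // Returns true if the log contains any of the trigger strings.
    public static boolean needsRestart(Path log) throws IOException {
        try (var lines = Files.lines(log)) {
            return lines.anyMatch(line -> {
                for (String p : PATTERNS) {
                    if (line.contains(p)) return true;
                }
                return false;
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get(args[0]);
        if (needsRestart(log)) {
            // Here you would invoke your stop-server / rotate-logs /
            // start-server scripts (environment-specific, not shown).
            System.out.println("OOM detected - restart required");
        }
    }
}
```

In production you would also remember the last byte offset scanned so each run only reads new log lines instead of the whole file.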
I don't recommend this method; it's not good practice either. Ideally the WAS administrator should inform the relevant application/data team so they can fix the issue, providing the logs, thread dumps, hprofs, etc.
But most of the time it is difficult and time-consuming for the data/application team to fix it immediately, so the WAS administrator has to follow this kind of method.
We are trying to access an application on Tomcat from a different host, but it is not loading even though Tomcat is running. It had been running fine for the past 3 months. We restarted Tomcat and now it is working fine again.
But we have not been able to zero in on what happened.
Any idea how to trace this, or what might have caused it?
The CPU usage was normal and the Tomcat memory was 1205640.
The memory settings of Tomcat are 1024-2048 (min-max).
We are using Tomcat 7.
Help much appreciated... thanks in advance... cheers!!
...also - not sure on Windows - you may be running out of file descriptors. This typically happens when streams are not properly closed in finally blocks.
In addition, check with netstat if you have a lot of sockets remaining open or accumulating in wait state.
Less likely, the application is creating threads and never releasing them.
The application is leaking something (memory, file descriptors, sockets, threads,...) and running over a limit.
There are different ways to track this down. A profiler may help; more simply, run JVM dumps at regular intervals and check what is accumulating. The excellent MAT will help you analyze the dumps.
Memory leak problems are not uncommon. If your Tomcat instance was running for three months and suddenly the contained application became unresponsive, maybe that was the case. One solution, if your resources allow it, could be monitoring that Tomcat instance through JMX using jconsole to see how it behaves.
I have a web application running across multiple locations.
I can see many connections piling up by running this command on Linux:
ps -ef | grep LOCAL
It shows me the count of active Oracle connections with process IDs, and the connection count has been growing by 5-7 every hour. After a few hours, the application slows down and eventually the Tomcat server needs to be restarted.
As I am able to see the connections growing, is there any way to get the source of these connections, to find out what classes or objects have created these laid-up connections?
I am not using Tomcat connection pooling. I tried generating thread dumps by issuing kill -3 <tomcat pid>, but they were of no use to me as I am not able to understand them; I even tried thread analyzers.
Is there any simple way to get the originator classes associated with these laid-up connections, to get a small hint, using some Tomcat feature or by any other means?
In JProfiler, you could use the JDBC probe to get the stack trace that opened a connection. You would select the connection in the timeline
and jump to the events view,
where you can select the "Connection opened" event. In the lower pane, the associated stack trace is shown.
Disclaimer: My company develops JProfiler.
You could search for uses of javax.sql.DataSource.getConnection() using your IDE.
If you start tomcat in debug mode, you can look for instances of the connection class (and see them increasing). Also, putting a breakpoint on the constructor will catch them in the act of being created.
But really you should be using a connection pool. That is the easiest solution to your problems.
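If you can't attach a profiler or debugger in production, a lightweight alternative is to record a stack trace every time a connection is handed out, so any connection still open later can be traced back to the code that created it. This is a sketch: ConnectionTracker and its method names are hypothetical, and the wiring into your DataSource wrapper is omitted:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConnectionTracker {
    // Map each live connection to the stack trace that opened it.
    private static final Map<Object, Throwable> OPENED = new ConcurrentHashMap<>();

    // Call from your DataSource wrapper's getConnection().
    public static <T> T opened(T conn) {
        OPENED.put(conn, new Throwable("connection opened here"));
        return conn;
    }

    // Call when the connection is closed/returned.
    public static void closed(Object conn) {
        OPENED.remove(conn);
    }

    // Dump the origin of every connection still checked out;
    // returns how many are outstanding.
    public static int dumpLeaks() {
        for (Throwable t : OPENED.values()) {
            t.printStackTrace(); // shows which class/method opened it
        }
        return OPENED.size();
    }
}
```

Calling dumpLeaks() from a debug endpoint while the count is climbing shows exactly which code paths opened the connections that were never returned. (If you do move to a pool, c3p0's debugUnreturnedConnectionStackTraces and the Tomcat JDBC pool's logAbandoned give you this for free.)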
Perhaps these two tools can help you determine what is slowing your server application's performance.
jmeter
ab benchmarking tool
Performance might also have slowed due to some simple implementation issues. You might want to use NIO (buffer-oriented, non-blocking IO) instead of blocking IO for web applications, and you might be doing a lot of string concatenation (use StringBuffer).
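The concatenation point can be shown in a few lines (StringBuilder is the unsynchronized successor to StringBuffer and is usually preferred in single-threaded code):

```java
public class ConcatDemo {
    // O(n^2): each += copies the entire string built so far
    static String slow(int n) {
        String s = "";
        for (int i = 0; i < n; i++) s += i;
        return s;
    }

    // O(n): appends into one growable buffer
    static String fast(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(i);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(slow(5).equals(fast(5)));
        // prints: true
    }
}
```

Both produce the same result; the difference only shows up as CPU time and garbage-collection pressure once n gets large, which is exactly the kind of hidden cost that degrades a busy web application.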