Pitfals of deploying/redploying app to Tomcat without restarting

Pitfals of deploying/redploying app to Tomcat without restarting - java

I have read that it is possible with Tomcat 5.5+ to deploy a war to a Tomcat server without a restart. That sounds fantastic but I guess I am too skeptical about this functionality and it's reliability. My previous experience (with Websphere) was that it was a best practice to restart the server to avoid memory problems, etc. So I wanted to get feedback as to what pitfalls might exist with Tomcat.
(To be clear about my experience, I developed java web apps for 5 years for a large company that partitioned the app developers from the app server engineers - we used Websphere - so I don't have a lot of experience with running/configuring any app servers myself)

In general, there are multiple type of leaks and they apply to redeploy-scenarios. For production systems, it's really the best to perform restarts if possible, as there are so many different components and libraries used in todays applications that it's very hard to find them all and even harder to fix them. Esp. if you haven't got access to all source code.
Memory leaks
Thread and ThreadLocal leaks
ClassLoader leaks
System resource leaks
Connection leaks
ClassLoader leaks are the ones which bite at redeployment.
They can be caused by everything. Really, i mean everything:
Timers: Timers have Threads and Threads created at runtime inherit the current context class loader, which means the WebappClassloader of Tomcat.
ThreadLocals: ThreadLocals are bound to the thread. App servers use Thread pools. When a ThreadLocal is bound to a Thread and the Thread is given back to the pool, the ThreadLocal will stay there if nobody removes() it properly. Happens quite often and very hard to find (ThreadLocals do not have a name, except the rarely used Spring NamedThreadLocal). If the ThreadLocal holds a class loaded by the WebappClassloader, you got a ClassLoader leak.
Caches: e.g. EhCache CacheManager
Reflection: JavaBeans Introspector (e.g. holding Class or Method caches)
JDBC Drivers: they shouldn't be in the .war file anyway. Leak due to static registry
Static libraries which cache ClassLoaders, such as Commons-Logging LogFactory
Specific to Tomcat, my experience is as follows:
For simple apps with "clean" libraries, it works fine in Tomcat
Tomcat tries very hard to clean up classes loaded by the WebappClassloader. For example, all static fields of classes are set to null when a webapp is undeployed. This sometimes leads to NullPointerExceptions when code is run while the undeployment is happening, e.g. background jobs using a Logger
Tomcat has a Listener which cleans up even more stuff. Its called org.apache.catalina.core.JreMemoryLeakPreventionListener and was submitted recently to Tomcat 6.x
I wrote a blog post about my experience with leaks when doing redeployment stresstesting - trying to "fix" all possible leaks of an enterprise-grade Java Web Application.

Hot deployment is very nice as it usually is much faster than bringing the server up and down.
mhaller has written a lot about avoiding leaks. Another issue is for active users to have their session survive the application "reboot". There are several things that must be taken care of, but which all in all means that their session must be serializable and THEN deserialize properly afterwards. This can be a bit tricky if you have stateful database connections etc, but if your code is robust against database hickups anyway that shouldn't be too bad.
Also note that some IDE's allow updating code inside the WAR (in the same way as applications) when saving a modified source file, instead of having to redeploy. MyEclipse does this rather nicely.

Related

Java equivalent of .net recycle

What is the Java equivalent of the .net Recycle for web apps in IIS.
This is when using Java on a linux machine outside of IIS.
Is it just to stop and start the application?

Not really - IIS and JVM work in different ways. When you say recycling in IIS, it's basically restarting the Worker process. Each Web Application deployed to IIS is under an application pool and worker process.
In case of java, it's not like that. The whole App server runs on a jvm and you have different Web applications deployed into the App server which runs within the app server.
You could use DB connection pools or Apache commons pool for pooling (Some of your expensive objects you reuse) which can be refreshed but not exactly in a way like IIS.
Even though this would be a nice feature - in reality if you ever
reach a situation of needing to refresh application pool, your
code/dll(may be 3rd party) is the culprit. There would definitely be a
memory leak which needs to be addressed! Also when you recycle the session state might be lost. Apparently users logged in would get logged out (and if they are in the middle of a transaction they might loose data).So it could lead to a very volatile situation!
Update
You could use stuff like Terracotta which handles memory management.

Usually JEE servlet containers offer the option of reloading the application. i.e. Tomcat has a Reload button for each app on the manager console.
It also has a button that triggers a full garbage collection, by the way.

Why do some webservers complain about memory leaks they create?

The title might be a bit strong, but let me explain how I understand what happens. I guess this happened with Tomcat (and the message cited comes from Tomcat), but I'm not sure anymore.
TL;DR At the bottom there's a summary why I'm claiming that it is the web servers' fault.
I might be wrong (but without the possibility of being wrong there would be no reason to ask):
An application uses a library
the library uses a ThreadLocal
the ThreadLocal refers to an object from the library
each object refers to its ClassLoader
The webserver
pools its worker threads for efficiency
lends an arbitrary thread to an application
does nothing special (w.r.t. the thread pool) when an application stops or redeploys
If I understand it correctly, after a redeploy the old "dirty" threads continue to be reused. Their ThreadLocals refer to the old classes which refer to their ClassLoader which refer to the whole old class hierarchy. So a lot of stuff stays in the PermGen space which over time leads to an OutOfMemoryError. Is this right so far?
I'm assuming two things:
the redeploy frequency is a few time per hour
the thread creation overhead is a fraction of a millisecond
So a complete thread pool renewal upon each redeploy costs a fraction of a millisecond a few times per hour, i.e., there's a time overhead of 0.0001 * 12/3600 * 100% i.e. 0.000033%.
But instead of accepting this tiny overhead, there are countless problems. Is my calculation wrong or what am I overlooking?
As a warning we get the message
The web application ... created a ThreadLocal with key of type ... and a value of type ... but failed to remove it when the web application was stopped.
which should be better stated as
The web server ... uses a thread pool but failed to renew it after stopping (or redeploying) an application.
Or am I wrong? The time overhead is negligible even when all threads get recreated from time to time. But clearing their ThreadLocals before they are provided to the applications would suffice and be even faster.
Summary
There are some real problems (recently this one) and the user can do nothing about it. The library writers sometimes can and sometimes can not. IMHO the web servers could solve it pretty easily. The thing happens and has a cause. So I'm blaming the only one party which could do anything about it.
Proposal for what the web server should exactly do
The title of this question is more provocative than correct, but it has its point. And so does the answer by raphw. This linked question has another open bounty.
I think the web servers could solve it as follows:
ensure that each thread gets reused (or killed) sometime
store a LastCleanupTimestamp in a ThreadLocal (for new threads it's the creation time)
when re-using a thread, check if the cleanup timestamp is below some threshold (e.g., now minus some delta, e.g., 1 hour)
if so, clean all ThreadLocals and set a new LastCleanupTimestamp
This would assure that no such leak exists longer than delta plus the duration of the longest request plus the thread turnaround time. The cost would compose as follows:
checking a single ThreadLocal (i.e., some nanoseconds) per request
cleaning all ThreadLocals reflectively (i.e., some more nanoseconds once each delta per thread)
the cost from removing the data possibly useful for the application which stored them. This can't break an application as no application can assume to see a thread containing the thread locals it has set (since it can't even assume to see the thread itself anymore), but it may cost time needed to recreate the data (e.g., a cached DateFormat instance if someone still uses such a terrible thing).
It could be switched off by simply setting the thresold, if no app has been undeployed or redeployed recently.

TL;DR It's not web servers that create memory leaks. It's you.
Let me first state the problem more explicitly: ThreadLocal variables often refer to an instance of a Class that was loaded by a ClassLoader that was meant to be exclusively used by a container's application. When this application gets undeployed, the ThreadLocal reference gets orphaned. Since each instance keeps a reference to its Class and since each Class keeps a reference to its ClassLoader and since each ClassLoader keeps a reference to all classes it ever loaded, the entire class tree of the undeployed application cannot get garbage collected and the JVM instance suffers a memory leak.
Looking at this problem, you can optimize for either:
Allow as many requests per second as possible even throughout a redeploy (thus keep response time short and reuse threads from a thread pool)
Make sure that threads stay clean by discarding threads once they were used when a redeploy occurred (thus patch forgotten manual cleaning)
Most developers of web applications would argue that the first is more important since the second can be achieved by writing good code. And what would happen when a redeploy would happen concurrently to long lasting requests? You cannot shut down the old thread pool since this would interrupt running requests. (There is no globally defined maximum for how long a request cycle can take.) In the end, you would need a quite complex protocol for that and that would bring its own problems.
The ThreadLocal induced leak can however be avoided by always writing:
myThreadLocal.set( ... );
try {
// Do something here.
} finally {
myThreadLocal.remove();
}
That way, your thread will always turn out clean. (On a side note, this is almost like creating global variables: It is almost always a terrible idea. There are some web frameworks like for example Wicket that make a lot of use of this. Web frameworks like this are terrible to use when you need to do things concurrently and get very unintuitive for others to use. There is a trend away from the typical Java one thread per request model such as demonstrated with Play and Netty. Do not get stuck with this anti-pattern. Do use ThreadLocal sparingly! It is almost always a sign of bad design.)
You should further be aware that memory leaks that are induced by ThreadLocal are not always detected. Memory leaks are detected by scanning the web server's worker thread pool for ThreadLocal variables. If a ThreadLocal variable was found the variable's Class reveals its ClassLoader. If this ClassLoader or one of its parents is that of the web application that just got undeployed, the web server can safely assume a memory leak.
However, imagine that you stored some large array of Strings in a ThreadLocal variable. How can the web server assume that this array belongs to your application? The String.class was of course loaded with the JVM's bootstrap ClassLoader instance and cannot be associated with a particular web application. By removing the array, the web server might break some other application that is running in the same container. By not removing it, the web server might leak a large amount of memory. (This time, it is not a ClassLoader and its Classes that are leaked. Depending on the size of the array, this leak might however even be worse.)
And it gets worse. This time, imagine that you stored an ArrayList in your ThreadLocal variable. The ArrayList is part of the Java standard library and therefore loaded with the system ClassLoader. Again, there is no way of telling that the instance belongs to a particular web application. However, this time your ClassLoader and all its Classes will leak as well as all instances of such classes that are stored in the thread local ArrayList. This time, the web server even cannot certainly determine that a memory leak occurred when it finds that the ClassLoader was not garbage collected since garbage collection can only be recommended to a JVM (via System#gc()) but not enforced.
Renewing the thread pool is not as cheap as you might assume.
A web application cannot just go and throw away all threads in a thread pool whenever an application is undeployed. What if you stored some values in those threads? When a web application recycles a thread, it should (I am not sure if all web servers do this) find all non-leaking thread local variables and reregister them in the replaced Thread. The numbers you stated about efficiency would therefore not longer hold.
At the same time, the web server need to implement some logic that manages the replacement of all thread pool's Threads what does neither work in favor of your proposed time calculation. (You might have to deal with long lasting requests - think of running an FTP server in a servlet container -- such that this thread pool transition logic might be active for quite a long time.)
Furthermore, ThreadLocal is not the only possibility of creating a memory leak in a servlet container.
Setting a shut down hook is another example. (And it is unfortunately a common one. Here, you should manually remove the shut down hook when your application is undeployed. This problem would not be solved by discarding threads.) Shut down hooks are furthermore instances of custom subclasses of Thread that were always loaded by an application's class loader.
In general, any application that keeps a reference to an object that was loaded by a child class loader might create a memory leak. (This is generally possible via Thread#getContextClassLoader().) In the end, it is the developer's resposibility to not cause memory leaks, even in Java applications where many developer's misinterpret the automatic garbage collection as there are no memory leaks. (Think of Jochua Bloch's famous stack implementation example.)
After this general statement, I want to comment on Tomcat's memory leak protection:
Tomcat does not promise you to detect all memory leaks but covers specific types of such leaks as they are listed in their wiki. What Tomcat actually does:
Each Thread in the JVM is examined, and the internal structures of the
Thread and ThreadLocal classes are introspected to see if either the
ThreadLocal instance or the value bound to it were loaded by the
WebAppClassLoader of the application being stopped.
Some versions of Tomcat even try to compensate for the leak:
Tomcat 6.0.24 to 6.0.26 modify internal structures of the JDK
(ThreadLocalMap) to remove the reference to the ThreadLocal instance,
but this is unsafe (see #48895) so that it became optional and
disabled by default from 6.0.27. Starting with Tomcat 7.0.6, the
threads of the pool are renewed so that the leak is safely fixed.
However, you have to properly configure Tomcat to do so. The wiki entry on its memory leak protection even warns you how you can break other applications when TimerThreads are involved or how you might leak memory leaks when starting your own Threads or ThreadPoolExecutors or when using common dependencies for several web applications.
All the clean up work offered by Tomcat is a last resort! Its nothing you want to have in your production code.
Summarized: It is not Tomcat that creates a memory leak, it is your code. Some versions of Tomcat try to compensate for such leaks which are detectable if it is configured to do so. However, it is your responsibility to take care of memory leaks and you should see Tomcat's warnings as an invitation to fix your code rather than to reconfigure Tomcat to clean up your mess. If Tomcat detects memory leaks in your application, there might even be more. So take a heap and thread dump out of your application and find out where your code is leaking.

ThreadLocal memory leak in Glassfish

Will the ThreadLocal cause memory leak in Glassfish server like it leaks in Tomcat? Why?
http://wiki.apache.org/tomcat/MemoryLeakProtection

Yes, it will leak and Glassfish won't even warn you according to this relatively recent Glassfish JIRA issue:
http://java.net/jira/browse/GLASSFISH-14128
What needs to be said however is that the ThreadLocal specific leaking is not a 'bug' in app/web servers per se, but a problem with code in components running in those containers (whether these components are servlets, session beans or whatever).
What app servers/web containers try to do in general is to shield developer from writing a lot of maintenance code and to make him focus on business logic. There needs to be however some understanding on his part of how the application server works (thread pools, classloaders, deploy/undeploy mechanism, ...) so that stuff like this ThreadLocal issue is done properly or avoided. It is not always easy and it can be very tricky. I remember reading about a memory leak issue in Glassfish? related to use of custom log levels.
What Apache Tomcat does is that it has a helper mechanism to warn user/deal with some commonly occurring memory leak issues in user code. But even in the link provided in the question, you may read that not all possible ThreadLocal memory leaks are done automatically using this mechanism.
Glassfish does not seem to have this added functionality yet.

This problem causes all sorts of issues. I posted about it a while ago
I need help finding my memory leak using MAT
We're manually freeing the objects ourselves. I think I saw in the GF bug lists that this had been fixed in the 3.1x release.

Running webapps in separate processes

I'd like to run a web container where each webapp runs in its own process (JVM). Incoming requests get forwarded by a proxy webapp running on port 80 to individual webapps, each (webapp) running on its own port in its own JVM.
This will solve three problems:
Webapps using JNI (where the JNI code changes between restarts) cannot be restarted. There is no way to guarantee that the old webapp has been garbage-collected before loading the new webapp, so when the code invokes System.loadLibrary() the JVM throws: java.lang.UnsatisfiedLinkError: Native Library x already loaded in another classloader.
Libraries leak memory every time a webapp is reloaded, eventually forcing a full server restart. Tomcat has made headway in addressing this problem but it will never be completely fixed.
Faster restarts. The mechanism I'm proposing would allow near-instant webapp restarts. We no longer have to wait for the old webapp to finish unloading, which is the slowest part.
I've posted a RFE here and here. I'd like to know what you think.
Does any existing web container do this today?

I'm closing this question because I seem to have run into a dead end: http://tomcat.10.n6.nabble.com/One-process-per-webapp-td2084881.html
As a workaround, I'm manually launching a separate Jetty instance per webapp.

Can't you just deploy one app per container and then use DNS entries and reverse proxies to do the exact same thing? I believe Weblogic has something like this in the form of managed domains.

No, AFAIK, none of them do, probably because Java web containers emphasize following the servlet API - which spins off a thread per http request. What you want would be a fork at the JVM level - and that simply isn't a standard Java idiom.

If I understand correctly you are asking for the standard features for enterprise quality servers such IBM's WebSphere Network Deployment (disclaimer I work for IBM) where you can distribute applications across many JVMs, and those JVMs can in fact be distributed across many physical machines.
I'm not sure that your fundamental premise is correct though. It's not necessary to restart a whole JVM in order to deploy a new version of an application. Many app servers will use a class-loader strategy that allows them to discard a version of an app and load a new one.

tomcat isolate webapps

multiple webapp running on same tomcat using same jvm. sometime, one webapp that have memory leak will cause entire jvm to crash and affect other webapps. any recommendation how to isolated that without need to use multiple jvm and tomcat

Within the same JVM everything shares the the same memory. There is no system to allocate separate pools or quota.
If one of your applications behaves really badly in this regard, the only thing you can do is run it isolated in a separate JVM (separate Tomcat).

Are the applications running as separate processes? Or the same one?
First off you should look at profiling to find the memory leak https://stackoverflow.com/questions/1716597/java-memory-leak-detection-tools.
However, as a quick solution from inside you could use Runtime.getRuntime().totalMemory() to see how much memory is in use, and if it grows above a certain limit, and you know which app is causing the problem, you could restart that app.
You could also try running System.gc() which is a terrible way to do it, and really shouldn't be used as it can be ignored by the JVM.

To the best of my knowledge, the short answer is: No, it can't be done. Tomcat uses a single memory space for all running apps.
My knee-jerk response is that you should fix the memory leak rather than trying to isolate the misbehaving app. Cure is better than quarantine. As I don't know the details of your problem, maybe this isn't practical for some reason.

You can't isolate apps in the same JVM (though you can do things like instrument a particular apps ClassLoader for diagnostics)
If your concern is administration/configuration though (and not total memory consumption) you can run multiple instances of Tomcat off the same install by using catalina.home and catalina.base

JSR 121 was designed to solve this, but it hasn't been implemented yet.

There is no standard system in Java to truly isolate memory used by web applications.
However, you could write some byte-code weaving logic to track how much memory a particular app has allocated. If it goes over a particular threshold, you could throw an exception and stop the app from allocating anymore memory. What do you want to do if you could track all the memory consumed by a web app? What are you trying to implement?
Note that this would only really work effectively for figuring out how much memory a webapp has allocated, not how much it is currently consuming in the system. In order to get that metric, you'd have to byte-code weave finalize() for all objects. Since finalize() gets run in a best-effort fashion by the JVM, this may not get you the most accurate value should the system be under load. The JVM would deprioritize these finalize threads and your value will never get updated even though objects have been cleaned up.

To bring this up to date, it is now possible to run multiple applications on a single JVM. Applications run in isolated java virtual containers which protect your applications from 'noisy neighbours' as well as allowing you to share resources across your applications. This gives you isolation, elasticity and increased application density for Apache Tomcat. Download it from www.elasticat.com NB I do work for Waratek who developed this new JVM

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.