ClassLoader Leak - Are they worth solving? - java

ClassLoader leaks usually result in java.lang.OutOfMemoryError: PermGen. In the instance of working on application servers you may see this as a result of many redeploys of a common application. The explanation and possible resolutions to this problem can be seen on these two links. (among others)
http://blogs.oracle.com/fkieviet/entry/classloader_leaks_the_dreaded_java
http://dev.eclipse.org/blogs/memoryanalyzer/2008/05/17/the-unknown-generation-perm/
Now for the most part they are easy to get around. Simply increase the -XX:MaxPermSize and when the inevitable happens, restart the JVM completely. The problem with trying to solve this is that in large applications many classes can cause the classloader to leak and thus the classes to stay within the permgen.
Two questions arise from this:
Is it reasonable to say that an issue like this is better to just increase the max perm size and restart where necessary or should finding a resolution be a higher priority?
Are there easier ways to resolve a classloader leak?

It really depends on the application, or rather, the deployment process being used. Many applications are only ever redeplyoed during development, new releases happen once every few months, and the application server is restarted for other reasons far more often than the app is deployed. In those circumstances, chasing Classloader leaks is a waste of time.
Of course, if you plan on implementing a continuous deployment process, especially in a high-availability environment, then Classloader leaks are something you really need to tackle. But there are a lot of other things you need to do better than most projects before that becomes an issue.

#biziclop is right. You need to be pragmatic about this.
If the problem is only in test servers, you can probably dismiss this as not worth the effort to solve.
If the problem is in production servers then you need a solution or a workaround. The solution is hard work, but the workarounds may be less work:
Workaround #1 - don't do hot deploys to production servers; only do full redeployments and restarts.
Workaround #2 - periodically do a full restart of the production servers to avoid running out of permgen space. Combine this with increasing the permgen space.
In a well resourced / well run environment you should be doing all of your testing on separate servers. If the downtime of a full deployment is a concern, you should be minimizing redeployment disruptions using server replication and progressive redeployment. Hot deployments to production should be unnecessary.
If you are in the position where you have no test environment and are doing frequent hot deploys to a production machine to minimize downtime, you are skating thin ice. The chances are that you will eventually make a mistake that results in damage which takes a long time to recover from ...

Those are one of the worst leaks... but any leak is evil. So, I, personally, resolve them. Profiling helps as well.
There are no easy ways per se but:
Threads go into threadGroups +starter thread for each module to ensure new Threads() have that group.
Special care of the Thread.inheritedAccessControlContext (which holds a reference to the classloader)
WeakReferences when you need to keep classes, actually use WeakReferences for listeners, so no one can skip de-registers (and use only annon. clasess). Having the framework for WeakListeners does help.
Extra care for DB drives, java.security.Provider
few more tricks (incl. dynamic enhance of class files but that's overkill usually)
bottom line:
leaks are evil.

Yes, there are easier - and more proper - ways to resolve the leaks. Add the ClassLoader Leak Prevention library to your project, and it should take care of the problem for you!
In case you want to track down the leaks yourself, this blog series will be of help.

I'd approach the problem pragmatically:
Is it causing problems in production environments?
Have you got enough time and resources to track it down?
If the answer to both these questions is yes, then by all means go for it. If it's one yes, one no, it's probably up to the management to decide, if both are nos, don't bother.

Related

Multi-WAR tomcat vs Docker containers

I'm wondering if a Docker solution is faster and more memory efficient than my current Tomcat deployment. I will explain both solutions.
The current:
I have a Tomcat server with about 20 WAR's deployed. The WAR's are Spring Boot applications. It takes up a lot of memory and boottime and money too.
The docker alternative:
The alternative I'm thinking about is a docker host with 20 docker containers, one for each app. It seems Spring recommends using JAR's on JDK images.
Now, does Docker, or containerization in general, improve memory and speed?
One improvement I am expecting is that applications can start in parallel. This will hopefully speed up boot-time (assuming multi-core hardware). Am I right here?
Secondly I'm wondering which approach will handle memory most efficient.
What happens when I have multiple WAR's, sharing the exact same dependency? Will Tomcat reuse dependency memory for that? And will Docker?
Memory (and thus likely CPU) efficiency can be debated and probably needs to be measured. Let me give some insight.
Let's assume you create 20 containers, one for each of the war's you want to run. At that time you have 20 different JVMs in memory. Depending whether they come from the same container image or from different ones, the OS recognizes they are the same, and the codebase could be shared. So this depends on whether you bake your wars into the container images or have one image only and mount the wars at runtime.
What about permgen space, heap or other memory regions? I doubt the OS can share much between the processes here. And the JVMs cannot share on their level since the docker container isolation would not allow them to talk to each other. So shared memory on JVM level is lost.
With that, every JVM would start up and run the JIT for hotspot code locations, and no synergy between the applications can be used. With a bigger codebase in memory, also the CPU would have to jump more between processes, invalidating the cache more often.
All in all I believe dockerizing your setup is an improvement in application isolation. You can more easily install/uninstall your stuff, and one application running havoc cannot impact the others. But performance-wise, you should notice lower execution times and higher memory usage. To what extent might only be benchmarked.

ThreadLocal memory leak in Glassfish

Will the ThreadLocal cause memory leak in Glassfish server like it leaks in Tomcat? Why?
http://wiki.apache.org/tomcat/MemoryLeakProtection
Yes, it will leak and Glassfish won't even warn you according to this relatively recent Glassfish JIRA issue:
http://java.net/jira/browse/GLASSFISH-14128
What needs to be said however is that the ThreadLocal specific leaking is not a 'bug' in app/web servers per se, but a problem with code in components running in those containers (whether these components are servlets, session beans or whatever).
What app servers/web containers try to do in general is to shield developer from writing a lot of maintenance code and to make him focus on business logic. There needs to be however some understanding on his part of how the application server works (thread pools, classloaders, deploy/undeploy mechanism, ...) so that stuff like this ThreadLocal issue is done properly or avoided. It is not always easy and it can be very tricky. I remember reading about a memory leak issue in Glassfish? related to use of custom log levels.
What Apache Tomcat does is that it has a helper mechanism to warn user/deal with some commonly occurring memory leak issues in user code. But even in the link provided in the question, you may read that not all possible ThreadLocal memory leaks are done automatically using this mechanism.
Glassfish does not seem to have this added functionality yet.
This problem causes all sorts of issues. I posted about it a while ago
I need help finding my memory leak using MAT
We're manually freeing the objects ourselves. I think I saw in the GF bug lists that this had been fixed in the 3.1x release.

tomcat isolate webapps

multiple webapp running on same tomcat using same jvm. sometime, one webapp that have memory leak will cause entire jvm to crash and affect other webapps. any recommendation how to isolated that without need to use multiple jvm and tomcat
Within the same JVM everything shares the the same memory. There is no system to allocate separate pools or quota.
If one of your applications behaves really badly in this regard, the only thing you can do is run it isolated in a separate JVM (separate Tomcat).
Are the applications running as separate processes? Or the same one?
First off you should look at profiling to find the memory leak https://stackoverflow.com/questions/1716597/java-memory-leak-detection-tools.
However, as a quick solution from inside you could use Runtime.getRuntime().totalMemory() to see how much memory is in use, and if it grows above a certain limit, and you know which app is causing the problem, you could restart that app.
You could also try running System.gc() which is a terrible way to do it, and really shouldn't be used as it can be ignored by the JVM.
To the best of my knowledge, the short answer is: No, it can't be done. Tomcat uses a single memory space for all running apps.
My knee-jerk response is that you should fix the memory leak rather than trying to isolate the misbehaving app. Cure is better than quarantine. As I don't know the details of your problem, maybe this isn't practical for some reason.
You can't isolate apps in the same JVM (though you can do things like instrument a particular apps ClassLoader for diagnostics)
If your concern is administration/configuration though (and not total memory consumption) you can run multiple instances of Tomcat off the same install by using catalina.home and catalina.base
JSR 121 was designed to solve this, but it hasn't been implemented yet.
There is no standard system in Java to truly isolate memory used by web applications.
However, you could write some byte-code weaving logic to track how much memory a particular app has allocated. If it goes over a particular threshold, you could throw an exception and stop the app from allocating anymore memory. What do you want to do if you could track all the memory consumed by a web app? What are you trying to implement?
Note that this would only really work effectively for figuring out how much memory a webapp has allocated, not how much it is currently consuming in the system. In order to get that metric, you'd have to byte-code weave finalize() for all objects. Since finalize() gets run in a best-effort fashion by the JVM, this may not get you the most accurate value should the system be under load. The JVM would deprioritize these finalize threads and your value will never get updated even though objects have been cleaned up.
To bring this up to date, it is now possible to run multiple applications on a single JVM. Applications run in isolated java virtual containers which protect your applications from 'noisy neighbours' as well as allowing you to share resources across your applications. This gives you isolation, elasticity and increased application density for Apache Tomcat. Download it from www.elasticat.com NB I do work for Waratek who developed this new JVM

Trying to cause java.lang.OutOfMemoryException

I am trying to reproduce java.lang.OutOfMemoryException in Jboss4, which one of our client got, presumably by running the J2EE applications over days/weeks.
I am trying to find a way for the webapp to spitout java.lang.OutOfMemoryException in a matter of minutes (instead of days/weeks).
One thing come into mind is to write a selenium script and has the script bombards the webapps.
One other thing that we can do is to reduce JVM heap size, but we would prefer not to do this, as we want to see the limit of our system.
Any suggestions?
ps: I don't have access to the source code, as we just provide a hosting service (of course I could decompile the class files...)
If you don't have access to the source code of the J2EE app in question, the options that come to mind are:
Reduce the amount of RAM available to the JVM. You've already identified this one and said you don't want to do it.
Create a J2EE app (it could probably just be a JSP) and configure it to run within the same JVM as the target app, and have that app allocate a ridiculous amount of memory. That will reduce the amount of memory available to the target app, hopefully such that it fails in the way you're trying to force.
Try to use some profiling tools to investigate memory leakage. Also good to investigate memory damps that was taken after OOM happens and logs. IMHO: reducing memory is not the rightest way to investigate cose you can get issues not connected with real production one.
Do both, but in a controlled fashion :
Reduce the available memory to the absolute minimum (using -Xms1M -Xmx2M, as an example, but I fear your app won't even load with such limitations)
Do controlled "nuclear irradiation" : do Selenium scripts or each of the known working urls before to attack the presumed guilty one.
Finally, unleash the power that shall not be raised : start VisualVM and any other monitoring software you can think of (DB execution is a usual suspect).
If you are using Sun Java 6, you may want to consider attaching to the application with jvisualvm in the JDK. This will allow you to do in-place profiling without needing to alter anything in your scenario, and may possibly immediately reveal the culprit.
If you don't have the source use decompile it, at least if you think the terms of usage allows this and you live in a free country. You can use:
Java Decompiler or JAD.
In addition to all the others I must say that even if you can reproduce an OutOfMemory error, and find out where it occurred, you probably haven't found out anything worth knowing.
The trouble is that an OOM occurs when an allocation can not take place. The real problem however is not that allocation, but the fact that other allocations, in other parts of the code, have not been de-allocated (de-referenced and garbage collected). The failed allocation here might have nothing to do with the source of the trouble (no pun intended).
This problem is larger in your case as it might take weeks before trouble starts, suggesting either a sparsely used application, or an abnormal code path, or a relatively HUGE amount of memory in relation to what would be necessary if the code was OK.
It might be a good idea to ask around why this amount of memory is configured for JBoss and not something different. If it's recommended by the supplier than maybe they already know about the leak and require this to mitigate the effects of the bug.
For these kind of errors it really pays to have some idea in which code path the problem occurs so you can do targeted tests. And test with a profiler so you can see during run-time which objects (Lists, Maps and such) are growing without shrinking.
That would give you a chance to decompile the correct classes and see what's wrong with them. (Closing or cleaning in a try block and not a finally block perhaps).
In any case, good luck. I think I'd prefer to find a needle in a haystack. When you find the needle you at least know you have found it:)
The root of the problem is most likely a memory leak in the webapp that the client is running. In order to track it down, you need to run the app with a representative workload with memory profiling enabled. Take some snapshots, and then use the profiler to compare the snapshots to see where objects are leaking. While source-code would be ideal, you should be able to at least figure out where the leaking objects are being allocated. Then you need to track down the cause.
However, if your customer won't release binaries so that you can run an identical system to what he is running, you are kind of stuck, and you'll need to get the customer to do the profiling and leak detection himself.
BTW - there is not a lot of point causing the webapp to throw an OutOfMemoryError. It won't tell you why it is happening, and without understanding "why" you cannot do much about it.
EDIT
There is not point "measuring the limits", if the root cause of the memory leak is in the client's code. Assuming that you are providing a servlet hosting service, the best thing to do is to provide the client with instructions on how to debug memory leaks ... and step out of the way. And if they have a support contract that requires you to (in effect) debug their code, they ought to provide you with the source code to do your job.

Can the JVM provide snapshot persistence?

Is it possible to dump an image of a running JVM and later restore the previous state by loading the image into the JVM? I'm fairly certain the answer is negative, but would love to be wrong.
With all the dynamic languages available for the JVM comes an increase in interactivity, being able to save a coding session would help save time manually restoring the VM to a previous session.
There was a JSR 323 proposed for this a while back but it was rejected. You can find some links in those articles about the research behind this and what it would take. It was mostly rejected as an idea that was too immature.
I have heard of at least one startup (unfortunately don't recall the name) that was working on a virtualization technology over a hypervisor (probably Xen) that was getting pretty close to being able to move JVMs, including even things like file system refs and socket endpoints. Because they were at the hypervisor level, they had access to all of that stuff. By hooking that and the JVM, they had most of the pieces. I think they might have gone under though.
The closest thing you can get today is Terracotta, which allows you to cluster a portion of your JVM heap, storing it in a server array, which can be made persistent. On JVM startup, you connect to the cluster and can continue using whatever portions of your heap are specified as clustered. The actual objects are faulted in on an as-needed basis.
Not possible at present. In general, pausing and restarting a memory image of a process in a different context is incredibly hard to achieve: what are you going to do with open OS resources? Transfers to machines with different instruction sets? database connections?
Also images of the running JVM are probably quite large - maybe much larger than the subset of the state you are actually interested in. So it's not a good idea from a performance perspective.
A much better strategy is to have code that persists and recreates the application state: this is relatively feasible with most JVM dynamic languages. I do so similar stuff in Clojure, where you have an interactive environment (REPL) and it is quite possible to create and run a sequence of operations that rebuild the application state that you want in another JVM.
This is currently not possible in any of the JVMs I know. It would not be very difficult to implement something like this in the JVM if programs run disconnected from their environments. However, many programs have hooks into their environment (think file handles, database connections) which would make implementing something like this very hairy.
As of early 2023, there's some progress in this space and it seems a lot of things can at least be tried, even if without claims for their production readiness.
One such feature is called CRaC. You can check their docs or even get an OpenJDK build that includes the feature. The project has its own repo under OpenJDK and looks quite promising.
Another vendors/products to check:
Azul ReadyNow!
OpenJ9 InstantOn
What's also really exciting, is AWS Lambda SnapStart. It doesn't give you full snapshoting capabilities, and is intrinsically vendor-specific, but it's what a ton of Java engineering who use AWS Lambda were waiting for so long.

Categories

Resources