I am developing an application that creates a new classloader, executes some code with classes loaded using this classloader and then throws it away, starting another classloader and doing the same. I would like to reclaim all memory and kill all threads left over from one use of the classloader.
My problem is that if the code loaded by this classloader is poorly written (or rather not expecting this use), it can leave some threads behind. The current problem in one such library is that there is a threadpool that cannot be shut down, stored in a static final variable (shutdown() checks whether this is the 'default' threadpool stored in this variable and fails if it is so). I know how to modify final variables, but there's the danger that the variable lookup has already been optimized out.
I can't even shoot those threads using Thread.stop() (I am fine with inconsistent state of anything loaded in the dying classloader) because the threadpool would recreate the thread. I don't want to rewrite the shutdown code using reflection, and I am also reluctant to null out all fields and hope that it will crash. Generally, I'd like to keep the 'hacks' to a minimum.
I know this is a problem for application servers undeploying applications - is there any good solution?
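Not a solution, but a low-risk first step is to find out which threads are actually left behind. A minimal sketch, assuming the leftover threads still carry the discarded loader as their context class loader (a new thread inherits it from the thread that created it unless someone changes it); the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

final class LeftoverThreads {
    // Returns all live threads whose context class loader is the given loader.
    static List<Thread> findThreadsOf(ClassLoader loader) {
        List<Thread> result = new ArrayList<>();
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            if (thread.isAlive() && thread.getContextClassLoader() == loader) {
                result.add(thread);
            }
        }
        return result;
    }
}
```

Calling findThreadsOf(dyingLoader) after the work is done at least tells you which library created which thread, before deciding how to get rid of it.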
I have a small Java application running a set of computationally heavy tasks. For processing the tasks, I use an external library which does most of the computation via native methods and some C code. Unfortunately, after solving one task, the library suffers from heavy memory leaks and can therefore only solve one task per application execution.
The memory problem is known to the developers of the library, but it is not fixed yet and maybe never will be (it has something to do with the Java garbage collector not working properly with the native interface). Since there is no alternative to this particular library, I am looking for ways to solve the tasks through sequential application executions.
Currently, I have a bash wrapper script, which gets a list of tasks that should be executed and for each task the script calls the application with just this single task to execute.
Since tasks often need the results from previous tasks, this involves serializing and deserializing execution results to files. This does not seem to be good practice to me, also because the user has basically no way to interact with the program control flow.
Does anybody have an idea how I can do this sequential task execution inside one single Java application? I guess this would involve starting a new JVM for each task execution, hopefully only transferring the task result and not the memory leaks from the new JVM to my application.
Edit providing further information:
Changing the root of the problem: Unfortunately, the library is not open source and I have neither access to the native methods nor to the java interface api.
New processes / JVMs: Is that the same in this context? I do not have much experience with the Java process API or with starting new JVMs. My assumption is that this would involve starting a separate Java program with its own main function using ProcessBuilder.start()?
Exchange of data: It is only a couple of kilobytes so performance is not an issue. Still, a solution without files would be preferable, but if I understand correctly memory mapped files also use local files. Sockets on the other hand do sound promising.
Funnily enough, I've faced the same issue. By definition, when you are forced to use a faulty library that you cannot upgrade, you have to accept that nothing you do will be best practice or nice.
The solution we came up with was to isolate calls to the library in its own process. This process was a child of a master process. The master process contains the good code and the child the bad. We were then able to keep track of the number of invocations of the child process and tear it down once it reached a certain number. We knew that we could get away with X invocations before the child process was corrupt.
Because of the nature of our problem, bringing up a fresh process enabled us to have another X invocations before repeating.
Any state was returned to the master process on a successful invocation. Any state gathered during an unsuccessful invocation was discarded and we started again.
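The bookkeeping described above (count the invocations, tear the worker down after X of them, throw it away after a failure) could look roughly like this. It is only a sketch: RecyclingWorker and its methods are made-up names, and the worker is modelled as any AutoCloseable so that the real child-process handling stays out of the picture:

```java
import java.util.function.Supplier;

final class RecyclingWorker<W extends AutoCloseable> {
    private final Supplier<W> factory;   // creates a fresh worker (e.g. a child process wrapper)
    private final int maxInvocations;    // the "X" we know we can get away with
    private W current;
    private int invocations;

    RecyclingWorker(Supplier<W> factory, int maxInvocations) {
        this.factory = factory;
        this.maxInvocations = maxInvocations;
    }

    // Returns a worker that is safe to use for one more invocation.
    synchronized W next() throws Exception {
        if (current == null || invocations >= maxInvocations) {
            discard();
            current = factory.get();
        }
        invocations++;
        return current;
    }

    // Tears the current worker down, e.g. after an unsuccessful invocation.
    synchronized void discard() throws Exception {
        if (current != null) {
            current.close();
            current = null;
        }
        invocations = 0;
    }
}
```

The master calls next() before each invocation and discard() whenever an invocation fails, so the following call starts from a fresh worker.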
Again, none of the above is "nice" but it worked for us.
For what it's worth, if I did this again, I'd use Akka and remote actors which would make all the sub-process, remoting etc far simpler.
That depends. Do you have the source code of this external application, i.e. can you recompile it? The easiest approach is obviously to fix the leak at its root. This might however be impractical. If the library, as you say, is implemented via native methods and some C code, I do not think that the problem has something to do with the Java garbage collector not properly working. Native methods and C code do not normally store their data on the JVM's heap and are therefore not garbage collected, i.e. it is the job of the library to clean up after itself.
If the leak is indeed in the bit of Java code that the library exposes, then there is a way. Memory leaks in Java occur by forgetting about references, e.g. consider the following example:
class Foo {
    private ExpensiveObject eo;

    Foo(ExpensiveObject eo) {
        this.eo = eo;
    }
}
The ExpensiveObject is alive (at least) as long as the Foo instance referencing it. If you (or your library) do not isolate instance life-cycles well enough, you get into trouble. If you do not have a chance to refactor, you can however use reflection to clean up the biggest mess from another place in your code:
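import java.lang.reflect.Field;

// getDeclaredField and set throw checked reflection exceptions, so declare them.
void release(Foo foo) throws ReflectiveOperationException {
    // Look up the private field by name and bypass its access check.
    Field f = Foo.class.getDeclaredField("eo");
    f.setAccessible(true);
    // Drop the reference so that the ExpensiveObject can be collected.
    f.set(foo, null);
}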
This should however be considered a last-resort as it is quite a hack.
Alternatively, a better approach is normally to fork another instance of a JVM to do the dirty work. It seems like you are doing something similar already. By forking a JVM, you isolate the use of memory on a process level. Once the process dies, all memory is released by the OS. The problem with this approach is normally platform compatibility but as you already use a native library, this does not worsen your situation.
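Forking a JVM from Java can be sketched as follows. The assumptions: the child is started with the same java binary as the parent (located through the java.home system property), it prints its result to standard output, and a non-zero exit code means failure. ChildJvm and run are made-up names, and readAllBytes needs Java 9 or later:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class ChildJvm {
    // Starts a new JVM with the given arguments and returns everything it printed.
    static String run(List<String> jvmArguments) throws IOException, InterruptedException {
        String javaBinary = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        List<String> command = new ArrayList<>();
        command.add(javaBinary);
        command.addAll(jvmArguments);

        // Merge stderr into stdout so that a single stream has to be drained.
        Process child = new ProcessBuilder(command).redirectErrorStream(true).start();
        // Read the output to the end before waiting, otherwise a full pipe can block the child.
        byte[] output = child.getInputStream().readAllBytes();
        int exitCode = child.waitFor();
        if (exitCode != 0) {
            throw new IOException("child JVM exited with code " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }
}
```

A task would then be started with something like ChildJvm.run(List.of("-cp", System.getProperty("java.class.path"), "com.example.TaskRunner", "task-1")), where com.example.TaskRunner stands for a hypothetical main class that wraps the leaking library. A few kilobytes of result fit comfortably into the child's standard output, so no file is needed.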
You say that you currently use files to communicate between these different processes. Why do you need to store data in a file? Rather consider using sockets or memory-mapped files (NIO), if performance is important for this matter.
The title might be a bit strong, but let me explain how I understand what happens. I guess this happened with Tomcat (and the message cited comes from Tomcat), but I'm not sure anymore.
TL;DR At the bottom there's a summary why I'm claiming that it is the web servers' fault.
I might be wrong (but without the possibility of being wrong there would be no reason to ask):
An application uses a library
the library uses a ThreadLocal
the ThreadLocal refers to an object from the library
each object refers to its ClassLoader
The webserver
pools its worker threads for efficiency
lends an arbitrary thread to an application
does nothing special (w.r.t. the thread pool) when an application stops or redeploys
If I understand it correctly, after a redeploy the old "dirty" threads continue to be reused. Their ThreadLocals refer to the old classes, which refer to their ClassLoader, which refers to the whole old class hierarchy. So a lot of stuff stays in the PermGen space, which over time leads to an OutOfMemoryError. Is this right so far?
I'm assuming two things:
the redeploy frequency is a few times per hour
the thread creation overhead is a fraction of a millisecond
So a complete thread pool renewal upon each redeploy costs a fraction of a millisecond a few times per hour, i.e., there's a time overhead of 0.0001 * 12/3600 * 100% i.e. 0.000033%.
But instead of accepting this tiny overhead, there are countless problems. Is my calculation wrong or what am I overlooking?
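The estimate can be reproduced with a few lines. A sketch: the class and method names are made up, 0.1 ms and 12 redeploys per hour are the figures assumed above, and the pool size parameter is my addition to show how the number scales when every thread of the pool is recreated:

```java
final class RenewalOverhead {
    // Share of wall-clock time (in percent) spent on recreating the pool's threads.
    static double overheadPercent(double secondsPerThread, int poolSize, double redeploysPerHour) {
        double secondsPerHour = secondsPerThread * poolSize * redeploysPerHour;
        return secondsPerHour / 3600.0 * 100.0;
    }

    public static void main(String[] args) {
        // The figure from the question: roughly 0.000033 %.
        System.out.println(overheadPercent(0.0001, 1, 12));
        // The same assumptions with 200 threads in the pool: roughly 0.0067 %.
        System.out.println(overheadPercent(0.0001, 200, 12));
    }
}
```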
As a warning we get the message
The web application ... created a ThreadLocal with key of type ... and a value of type ... but failed to remove it when the web application was stopped.
which should be better stated as
The web server ... uses a thread pool but failed to renew it after stopping (or redeploying) an application.
Or am I wrong? The time overhead is negligible even when all threads get recreated from time to time. But clearing their ThreadLocals before they are provided to the applications would suffice and be even faster.
Summary
There are some real problems (recently this one) and the user can do nothing about it. The library writers sometimes can and sometimes can not. IMHO the web servers could solve it pretty easily. The thing happens and has a cause. So I'm blaming the only one party which could do anything about it.
Proposal for what the web server should exactly do
The title of this question is more provocative than correct, but it has its point. And so does the answer by raphw. This linked question has another open bounty.
I think the web servers could solve it as follows:
ensure that each thread gets reused (or killed) sometime
store a LastCleanupTimestamp in a ThreadLocal (for new threads it's the creation time)
when re-using a thread, check if the cleanup timestamp is below some threshold (e.g., now minus some delta, e.g., 1 hour)
if so, clean all ThreadLocals and set a new LastCleanupTimestamp
This would assure that no such leak exists longer than delta plus the duration of the longest request plus the thread turnaround time. The cost would compose as follows:
checking a single ThreadLocal (i.e., some nanoseconds) per request
cleaning all ThreadLocals reflectively (i.e., some more nanoseconds once each delta per thread)
the cost from removing the data possibly useful for the application which stored them. This can't break an application as no application can assume to see a thread containing the thread locals it has set (since it can't even assume to see the thread itself anymore), but it may cost time needed to recreate the data (e.g., a cached DateFormat instance if someone still uses such a terrible thing).
It could be switched off simply by adjusting the threshold if no app has been undeployed or redeployed recently.
TL;DR It's not web servers that create memory leaks. It's you.
Let me first state the problem more explicitly: ThreadLocal variables often refer to an instance of a Class that was loaded by a ClassLoader that was meant to be exclusively used by a container's application. When this application gets undeployed, the ThreadLocal reference gets orphaned. Since each instance keeps a reference to its Class and since each Class keeps a reference to its ClassLoader and since each ClassLoader keeps a reference to all classes it ever loaded, the entire class tree of the undeployed application cannot get garbage collected and the JVM instance suffers a memory leak.
Looking at this problem, you can optimize for either:
Allow as many requests per second as possible even throughout a redeploy (thus keep response time short and reuse threads from a thread pool)
Make sure that threads stay clean by discarding threads once they were used when a redeploy occurred (thus patch forgotten manual cleaning)
Most developers of web applications would argue that the first is more important since the second can be achieved by writing good code. And what would happen if a redeploy happened concurrently with long-running requests? You cannot shut down the old thread pool since this would interrupt running requests. (There is no globally defined maximum for how long a request cycle can take.) In the end, you would need quite a complex protocol for that, and that would bring its own problems.
The ThreadLocal induced leak can however be avoided by always writing:
myThreadLocal.set( ... );
try {
    // Do something here.
} finally {
    myThreadLocal.remove();
}
That way, your thread will always turn out clean. (On a side note, this is almost like creating global variables: It is almost always a terrible idea. There are some web frameworks like for example Wicket that make a lot of use of this. Web frameworks like this are terrible to use when you need to do things concurrently and get very unintuitive for others to use. There is a trend away from the typical Java one thread per request model such as demonstrated with Play and Netty. Do not get stuck with this anti-pattern. Do use ThreadLocal sparingly! It is almost always a sign of bad design.)
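To see why the finally block matters with pooled threads, here is a small self-contained sketch. It uses a single-threaded executor as a stand-in for a container's worker pool; the class name, the CONTEXT variable and the "request-1" value are made up for illustration:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalCleanupDemo {
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Runs one "request" that sets the thread local, then returns what the
    // next task on the same pooled thread finds in it.
    static String valueSeenByNextTask(boolean cleanUp) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable request = () -> {
                CONTEXT.set("request-1");
                try {
                    // Handle the request here.
                } finally {
                    if (cleanUp) {
                        CONTEXT.remove();
                    }
                }
            };
            pool.submit(request).get();

            Callable<String> nextTask = CONTEXT::get;
            return pool.submit(nextTask).get();
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the cleanup, the second task still finds the value of the first one on the reused thread; with remove() in the finally block it finds nothing.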
You should further be aware that memory leaks that are induced by ThreadLocal are not always detected. Memory leaks are detected by scanning the web server's worker thread pool for ThreadLocal variables. If a ThreadLocal variable was found the variable's Class reveals its ClassLoader. If this ClassLoader or one of its parents is that of the web application that just got undeployed, the web server can safely assume a memory leak.
However, imagine that you stored some large array of Strings in a ThreadLocal variable. How can the web server assume that this array belongs to your application? The String.class was of course loaded with the JVM's bootstrap ClassLoader instance and cannot be associated with a particular web application. By removing the array, the web server might break some other application that is running in the same container. By not removing it, the web server might leak a large amount of memory. (This time, it is not a ClassLoader and its Classes that are leaked. Depending on the size of the array, this leak might however even be worse.)
And it gets worse. This time, imagine that you stored an ArrayList in your ThreadLocal variable. The ArrayList is part of the Java standard library and therefore loaded by the bootstrap ClassLoader. Again, there is no way of telling that the instance belongs to a particular web application. However, if the list holds instances of your application's classes, this time your ClassLoader and all its Classes will leak as well as all instances of such classes that are stored in the thread local ArrayList. This time, the web server cannot even determine with certainty that a memory leak occurred when it finds that the ClassLoader was not garbage collected, since garbage collection can only be recommended to a JVM (via System#gc()) but not enforced.
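The class-loader check and its blind spot can be illustrated with a small sketch. This is not Tomcat's actual code, only the idea described above; LeakDetectionHeuristic, looksLikeLeakOf and AppObject are made-up names:

```java
import java.util.ArrayList;
import java.util.List;

final class LeakDetectionHeuristic {
    // Stands in for a class that belongs to the web application.
    static final class AppObject { }

    // True if the value's class was loaded by the given loader or by one of its children.
    static boolean looksLikeLeakOf(Object threadLocalValue, ClassLoader webAppLoader) {
        ClassLoader loader = threadLocalValue.getClass().getClassLoader();
        while (loader != null) {
            if (loader == webAppLoader) {
                return true;
            }
            loader = loader.getParent();
        }
        // A null loader means the bootstrap class loader (String, ArrayList, ...).
        return false;
    }

    public static void main(String[] args) {
        ClassLoader appLoader = AppObject.class.getClassLoader();
        List<Object> wrapped = new ArrayList<>();
        wrapped.add(new AppObject());

        System.out.println(looksLikeLeakOf(new AppObject(), appLoader)); // detected
        System.out.println(looksLikeLeakOf(wrapped, appLoader));         // missed: the list hides it
    }
}
```

The list itself comes from the bootstrap class loader, so the check sees nothing suspicious although the element inside keeps the application's loader reachable.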
Renewing the thread pool is not as cheap as you might assume.
A web server cannot just go and throw away all threads in a thread pool whenever an application is undeployed. What if you stored some values in those threads? When a web server recycles a thread, it should (I am not sure if all web servers do this) find all non-leaking thread local variables and reregister them in the replacement Thread. The numbers you stated about efficiency would therefore no longer hold.
At the same time, the web server needs to implement some logic that manages the replacement of all of the thread pool's Threads, which does not work in favor of your proposed time calculation either. (You might have to deal with long-lasting requests - think of running an FTP server in a servlet container - such that this thread pool transition logic might be active for quite a long time.)
Furthermore, ThreadLocal is not the only possibility of creating a memory leak in a servlet container.
Setting a shutdown hook is another example. (And it is unfortunately a common one. Here, you should manually remove the shutdown hook when your application is undeployed. This problem would not be solved by discarding threads.) Shutdown hooks are furthermore typically instances of custom subclasses of Thread that were loaded by the application's class loader.
In general, any application that keeps a reference to an object that was loaded by a child class loader might create a memory leak. (This is generally possible via Thread#getContextClassLoader().) In the end, it is the developer's responsibility not to cause memory leaks, even in Java applications, where many developers misinterpret automatic garbage collection as meaning that there are no memory leaks. (Think of Joshua Bloch's famous stack implementation example.)
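Removing the hook on undeploy could look like this. A minimal sketch: HookOwner and its start/stop methods are made-up names standing in for whatever lifecycle callbacks your container offers (for example a context listener):

```java
final class HookOwner {
    // The hook keeps a reference to this class and thereby to its class loader.
    private final Thread hook = new Thread(() -> System.out.println("flushing state"));

    // Call when the application starts.
    void start() {
        Runtime.getRuntime().addShutdownHook(hook);
    }

    // Call when the application is undeployed; returns false if the hook was not registered.
    boolean stop() {
        return Runtime.getRuntime().removeShutdownHook(hook);
    }
}
```

Once stop() has run, the JVM no longer holds the hook thread, so it no longer pins the application's class loader.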
After this general statement, I want to comment on Tomcat's memory leak protection:
Tomcat does not promise you to detect all memory leaks but covers specific types of such leaks as they are listed in their wiki. What Tomcat actually does:
Each Thread in the JVM is examined, and the internal structures of the
Thread and ThreadLocal classes are introspected to see if either the
ThreadLocal instance or the value bound to it were loaded by the
WebAppClassLoader of the application being stopped.
Some versions of Tomcat even try to compensate for the leak:
Tomcat 6.0.24 to 6.0.26 modify internal structures of the JDK
(ThreadLocalMap) to remove the reference to the ThreadLocal instance,
but this is unsafe (see #48895) so that it became optional and
disabled by default from 6.0.27. Starting with Tomcat 7.0.6, the
threads of the pool are renewed so that the leak is safely fixed.
However, you have to properly configure Tomcat to do so. The wiki entry on its memory leak protection even warns you how you can break other applications when TimerThreads are involved or how you might leak memory when starting your own Threads or ThreadPoolExecutors or when using common dependencies for several web applications.
All the clean up work offered by Tomcat is a last resort! It's nothing you want to have in your production code.
Summarized: It is not Tomcat that creates a memory leak, it is your code. Some versions of Tomcat try to compensate for such leaks which are detectable if it is configured to do so. However, it is your responsibility to take care of memory leaks and you should see Tomcat's warnings as an invitation to fix your code rather than to reconfigure Tomcat to clean up your mess. If Tomcat detects memory leaks in your application, there might even be more. So take a heap and thread dump out of your application and find out where your code is leaking.
With java.util.logging, I initialize the handler in init() and close it in destroy(), and it works perfectly fine: a log file is created, etc. If the user refreshes the page normally, it still creates just one log file.
However, if the user refreshes the page with the applet a couple of times in quick succession, it seems like destroy() does not get called or maybe hasn't finished its task, and since init() gets called again, it assumes the previous file is still locked and creates a new log file.
I tried to use both destroy() and finalize() to close the handler but it does not work.
Does anyone have any idea how to solve this issue?
Another minor question is: What actually happens if init() has not finished and the page gets refreshed? Is it going to continue the process and eventually fail to call destroy(), or does it just stop right there?
Quote from Java Tutorials:
The Java Plug-in software creates a worker thread for every Java
applet .
In multithreaded environment you should be very careful with shared resources. Best and easiest approach is not to share anything (scales best and no deadlocks possible).
I assume that you initialize your handler each time in the "init" method. If that's true, you should use one static shared logger (check this link). It will help to improve the situation a bit, but if you start more than one browser with your applet, a new log file will still be created. Also, this workaround is not recommended by Oracle and is only preserved for backward compatibility.
Recommended and easy to implement solution - "each applet should have its own logger and write to its own file". Code for log file name generation:
private SimpleDateFormat dateFormat = new SimpleDateFormat("yyyyMMdd");

private String generateFileName() {
    return String.format("applets-log/%s-%s.log", dateFormat.format(new Date()), UUID.randomUUID());
}
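Wiring such a per-instance file into java.util.logging could look roughly like this. A sketch under assumptions: the target directory already exists and its path contains no % characters (FileHandler treats % as a pattern escape); PerInstanceLog and the "applet-" prefix are made-up names:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.UUID;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

final class PerInstanceLog implements AutoCloseable {
    private final Logger logger = Logger.getAnonymousLogger();
    private final FileHandler handler;
    final Path file;

    // Creates a log file that no other applet instance will ever try to open.
    PerInstanceLog(Path directory) throws IOException {
        file = directory.resolve("applet-" + UUID.randomUUID() + ".log");
        handler = new FileHandler(file.toString());
        handler.setFormatter(new SimpleFormatter());
        logger.setUseParentHandlers(false);
        logger.addHandler(handler);
    }

    void info(String message) {
        logger.info(message);
    }

    // Releases the file; safe to call even if init() of another instance is already running.
    @Override
    public void close() {
        logger.removeHandler(handler);
        handler.close();
    }
}
```

An applet would create the object in init() and call close() in destroy(). Because every instance has its own file, a destroy() that comes late (or never) cannot collide with the next init().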
Also, Best Practices For Applet Development.
Answer to your minor question (changed):
According to the discussion of this old bug in the Java plugin, an applet can be terminated at any moment, with some predefined interval for cleanup. So you should put resource cleanup code in your "stop" or "destroy" method, but you shouldn't rely on that code being executed.
Applets lifecycle is controlled by browser and applets should not be
given capabilities to run when its hosting document is destroyed by
browser.
Since 6u10, both old plugin and new plugin enforce applet shutdown
after a fixed amount of time (1000ms in old plugin and 200ms in new
plugin) for applet to stop.
Hope you are not testing in FF. Read here:
https://bugzilla.mozilla.org/show_bug.cgi?id=638070
You've simply run into a fundamental limitation with multi-threaded environments.
You really cannot tell when destroy() or finalize() will be called relative to other threads. When the browser reloads the page, it may load the applet in a new thread. If the user hits reload twice quickly, it may create 2 new threads, call init() on the 2nd one (which the user actually sees) before calling init() on the one the user never sees and before calling destroy() on the previous one. At the other end of the life cycle, finalize() is called by the garbage collection thread possibly very long after the object is no longer needed. You are working in a multi-threaded environment and you cannot count on any order of operations between threads.
To quote the Javadoc:
An applet is a small program that is intended not to be run on its own, but rather to be embedded inside another application.
It is really the outer application that should control creating/opening and closing the log if you are going to have only one log file. If the outer application is a web browser, then you cannot solve the problem you are having. Then again, if you are running the applet in a web browser, you should not be writing logs to the file system. That is just generally impolite behavior.
If you absolutely had to have log files for applets inside a web browser, the easiest solution is for each call to init() to create a new file specific to that invocation of the applet. If you wanted to get ambitious you could use lock files to indicate which files were in use and at destroy() time concatenate the unlocked log files into one bigger one, but then again you have the problem of coordinating the concatenation processes across threads.
I have a simple test run of some medium-complexity code that won't terminate, i.e. the main method finishes but the process does not die.
Here is a rundown of the code (which is too long to be pasted here):
ProcessBuilder is used to create a bunch of subprocesses. They all die properly (if you can believe VisualVM).
We use log4j.
The main algorithm runs inside a FutureTask on which run and later get are called.
We don't explicitly use RMI, even though the list of threads seems to suggest so.
Obviously, I can call System.exit(0), but I'd like to know what is amiss here. I have not been able to produce a minimum failing example. Also, I can not identify an obvious culprit from the thread list; maybe you can?
Edit: See here for a thread dump.
Scorpion led me to the right answer:
RMI Reaper is something like a garbage collector for remote objects, e.g. instances of (subclasses of) UnicastRemoteObject. It is a non-daemon thread and therefore blocks JVM termination if there are still exported objects which can not be cleaned up.
You can explicitly force remote objects to be cleaned up in this sense by passing them to UnicastRemoteObject.unexportObject(., true). If you do this on all previously exported objects, RMI Reaper terminates and the JVM is free to shut down.
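A minimal sketch of the export/unexport pair, assuming the object is exported on an anonymous port; the Ping interface and the class names are made up for illustration:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

interface Ping extends Remote {
    String ping() throws RemoteException;
}

final class PingImpl implements Ping {
    @Override
    public String ping() {
        return "pong";
    }
}

final class RmiCleanup {
    // Exports an object and removes it again; returns whether the unexport succeeded.
    static boolean exportThenUnexport() throws RemoteException {
        Ping impl = new PingImpl();
        // Port 0 lets RMI pick an anonymous port; from here on the object is kept exported.
        UnicastRemoteObject.exportObject(impl, 0);
        // force = true unexports even if calls are still pending or in progress.
        return UnicastRemoteObject.unexportObject(impl, true);
    }
}
```

In a real program the unexport call belongs wherever the exported object goes out of use, typically in a finally block or a shutdown routine that walks over all exported objects.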
You mention FutureTask. The first thing that comes to my mind is: are you using ExecutorService and forgetting to shut it down?
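In case it is the executor: its worker threads are non-daemon by default and keep the JVM alive until shutdown() is called. A sketch of the pattern, with made-up names and a trivial computation standing in for the main algorithm:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

final class ExecutorShutdownDemo {
    // Runs a FutureTask on a pool and makes sure the pool cannot outlive the call.
    static int runAndShutDown() throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            FutureTask<Integer> task = new FutureTask<>(() -> 6 * 7);
            executor.execute(task);
            return task.get();
        } finally {
            // Without this call the idle worker threads block JVM termination.
            executor.shutdown();
        }
    }
}
```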
The second thing that comes to my mind is: are you reading all the streams from the process to the end? I worked with subprocesses long ago, and I don't remember exactly, but I had problems similar to what you described, and by reading the streams to the end the problem would mysteriously disappear!
This is mostly out of curiosity.
I was wondering if anyone has encountered any good usage for Object.finalize() except for debugging/logging/profiling purposes?
If you haven't encountered any what would you say a good usage would be ?
If your Java object uses JNI to instruct native code to allocate native memory, you need to use finalize to make sure it gets freed.
Late to the party here but thought I would still chime in:
One of the best uses I have found for finalizers is to call explicit termination methods which, for whatever reason, were not called. When this occurs, we also log the issue because it is a BUG!
Because:
There is no guarantee that finalizers will be executed promptly (or technically at all), per the language specification
Execution is largely dependent on the JVM implementation
Execution can sometimes be delayed if the GC has a lower thread priority
This leaves only a handful of tasks that they can address without much risk.
close external connections (db, socket etc)
close open files. maybe even try to write some additional information.
logging
if this class runs external processes that should exist only while object exists you can try to kill them here.
But it is just a fallback that is used if the "normal" mechanism did not work. The normal mechanism should be initiated explicitly.
Release resources that should be released manually in normal circumstances, but were not released for some reason. Perhaps also write a warning to the log.
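Such a safety net could be sketched like this. TrackedResource is a made-up class; note that Object.finalize() has been deprecated since Java 9, so the sketch only illustrates the pattern discussed here and nothing guarantees that the finalizer ever runs:

```java
import java.util.logging.Logger;

final class TrackedResource implements AutoCloseable {
    private static final Logger LOG = Logger.getLogger(TrackedResource.class.getName());
    private boolean closed;

    synchronized boolean isClosed() {
        return closed;
    }

    // The normal, explicit release path; calling it twice is harmless.
    @Override
    public synchronized void close() {
        if (closed) {
            return;
        }
        closed = true;
        // Release the underlying resource here.
    }

    // Fallback only: runs late or never, and an unclosed instance is a bug worth logging.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            if (!isClosed()) {
                LOG.warning("TrackedResource was not closed explicitly, releasing it in finalize()");
                close();
            }
        } finally {
            super.finalize();
        }
    }
}
```

Callers are still expected to use try-with-resources; the warning in the log points at the places where they did not.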
I use it to write back data to a database when using soft references for caching database-backed objects.
I see one good use for finalize(): freeing resources that are available in large amounts and are not exclusive.
For example, by default there are 1024 file handles available for a Linux process and about 10000 for Windows. That is quite a lot, so for most applications if you open a file, you don't have to call .close() (and use the ugly try...finally blocks), and you'll be OK - finalize() will free it for you some time later. However for some pieces of code (like intensive server applications), releasing resources with .close() is a must, otherwise finalize() may be called too late for you and you may run out of file handles.
The same technique is used by Swing - operating system resources for displaying windows and drawing aren't released by any .close() method, but just by finalize(), so you don't have to worry about all .close() or .dispose() methods like in SWT for example.
However, when there is very limited number of resources, or you must 'lock' resource to use it, also remember to 'unlock' it. For example if you create a file lock on a file, remember also to remove this lock, otherwise nobody else will be able to read or write this file and this can lead to deadlocks - then you can't rely on finalize() to remove this lock for you - you must do it manually at the right place.