I have a complex Java application that runs for a long time. The application performs the same operation, with small variations, over and over again.
My problem is that Windows Task Manager shows a large number of leaked handles for the Java process (not file handles).
After running the application under stress for some time, I get strange failures: the application gets stuck or disappears, or I get a FileNotFoundException when trying to open a file ("Insufficient system resources exist to complete the requested service").
I used Windows Task Manager to investigate and found that for the java.exe process the handle count increases very fast, while the number of threads and the amount of used RAM do not increase.
I then used Sysinternals Process Explorer to understand what these handles are. They are not file handles, but thousands of handles of type 'Mutant', with names like \BaseNamedObjects\??????n.
According to several sites on the web, 'Mutant' in Process Explorer means mutex.
My Java app does not intentionally create any Mutexes.
The next step was to use profiling tools in order to narrow down the source of the leaks.
I used "J Optimizer" & "Java VirtualVM". with both applications, I cannot detect the leaking handles. they have memory leak detectors, But I can't find a way to detect leaking handles.
My question is:
How can I debug this problem? How can I discover what causes the leaking handles?
Thank you
If you're curious about how the leaking references are allocated, see this: How to view memory allocation stacktrace in Java VisualVM
I'm not sure how you would trace references that fail to be deallocated, though.
Related
We have a major challenge that has been stumping us for months now.
A couple of months ago, we took over the maintenance of a legacy application; the last developer to touch the code left the company several years ago.
This application needs to be more or less always online. It was developed many years ago, without staging or test environments and without a redundant infrastructure setup.
We're dealing with a legacy Java EJB application running on Payara application server (Glassfish derivative) on an Ubuntu server.
Within the last year or two, it has been necessary to restart Payara approximately once a week, and the Ubuntu server once a month.
This is due to a memory leak which slows down the application over a period of around a week. The GUI becomes almost entirely non-responsive, but a restart of Payara fixes this, at least for a while.
However, after each Payara restart there is still some kind of residual memory use: the baseline memory usage increases, thereby reducing the time between Payara restarts. About once a month we thus do a full Ubuntu reboot, which fixes the issue.
Naturally we want to find the memory leak, but we are unable to run a profiler on the server because it's resource intensive, and would need to run for several days in order to capture the memory leak.
We have also tried several times to dump the heap using the "gcore" command, but it always results in a segfault, after which we have to reboot the Ubuntu server.
What other options / approaches do we have to figure out which objects in the heap are not being garbage collected?
I would try to clone the server in some way to another system where you can run tests without affecting clients. It could even be a system with fewer resources, if you want to trigger a resource-based problem.
To be able to observe the memory leak without having to wait for days, I would create a load test, maybe with Apache JMeter, to compress a week's worth of accesses into a day or even hours or minutes (I don't know whether the base load is at a level where that is feasible for the server and network infrastructure).
First, set up the load test to generate a "regular" mix of requests as seen in the wild. Once you can trigger the loss of responsiveness, you can try to find out whether specific requests are more likely than others to cause the leak. (It could also be that some basic component that is reused in nearly every call contains the leak, in which case you will not be able to pin down "the" leaking call.)
Then you can instrument this test server with a profiler.
As another approach (which you could pursue in parallel), you can use a static code analysis tool like SonarQube to scan the source code for typical memory leak patterns.
One other idea comes to mind, but it has many preconditions: if you have recorded typical scenarios for the backend calls, if you have enough development resources, and if it is a stateless web application where each call can be inspected more or less individually, you could set up partial integration tests that simulate the incoming web calls, with database and file access but, if possible, without the application server, and record the increase in heap usage after each call. Statistically you might be able to identify the "bad" call this way. (This is something I would try only as a last resort.)
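To make that last idea concrete, here is a minimal sketch of how the per-call heap growth could be recorded in such a test (the Runnable stands in for whatever invokes your backend logic; the numbers are only rough, since System.gc() is merely a hint to the JVM):

public class HeapDeltaRecorder {

    // Returns the approximate growth of used heap (in bytes) caused by one call.
    // Forcing a GC before and after only gives a rough figure.
    public static long measure(Runnable call) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();
        call.run();
        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        return after - before;
    }
}

Calls whose delta stays high across many repetitions are the ones worth a closer look.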
Apart from heap dumps, have you tried any real-time application performance monitoring (APM) such as AppDynamics, or an open-source alternative like https://github.com/scouter-project/scouter?
An alternative approach would be to look for known issues in your stack, e.g. Payara issues like https://github.com/payara/Payara/issues/4098, or perhaps the Ubuntu patch level you are currently running the app on.
You can use jmap, a tool bundled with the JDK, to check the memory. From the documentation:
jmap prints shared object memory maps or heap memory details of a given process or core file or a remote debug server.
For more information, see the documentation or the Stack Overflow question How to analyse the heap dump using jmap in java.
There is also a tool called jhat which can be used to analyse the Java heap.
From the documentation:
The jhat command parses a java heap dump file and launches a webserver. jhat enables you to browse heap dumps using your favorite webbrowser. jhat supports pre-designed queries (such as 'show all instances of a known class "Foo"') as well as OQL (Object Query Language) - a SQL-like query language to query heap dumps. Help on OQL is available from the OQL help page shown by jhat. With the default port, OQL help is available at http://localhost:7000/oqlhelp/
See the jhat documentation, or How to analyze the heap dump using jhat.
I am working on a Java web application based on Java 6/Tomcat 6.0. It's a web-based document management system. Customers may upload any kind of file to the web application. After a file is uploaded, a new thread is spawned in which the uploaded file is analyzed. The analysis is done using a third-party library.
This third-party library works fine in about 90% of the analysis jobs, but sometimes (depending on the uploaded file) its logic starts to consume all remaining memory, leading to an OutOfMemoryError.
As the whole application runs in a single JVM, the OutOfMemoryError does not only affect the analysis jobs but also impacts other features. In the worst case, the application crashes completely or is left in an inconsistent state.
I am now looking for a fairly quick (but safe) way to handle those OutOfMemoryErrors. Replacing the library is currently not an option (which is why I have mentioned neither the name of the library nor what kind of analysis it does). Does anybody have an idea of how to work around this error?
I've been thinking about launching a separate process (via java.lang.ProcessBuilder) to get a new JVM. If the third-party library causes an OutOfMemoryError there, it would not affect the web application. On the other hand, this means additional effort to synchronize the new process with the analysis part of the web application. Does anybody have experience with such a setup (especially with regard to its stability)?
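To make the idea more concrete, this is roughly what I have in mind (analyzer.jar and com.example.AnalyzerMain are placeholders for a thin command-line wrapper around the third-party library):

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;

public class ExternalAnalyzerLauncher {

    // Runs the analysis in a separate JVM so that an OutOfMemoryError there
    // cannot take down the web application.
    public static boolean analyze(File input, File output) throws Exception {
        Process p = new ProcessBuilder(
                "java", "-Xmx256m", "-cp", "analyzer.jar",
                "com.example.AnalyzerMain",
                input.getAbsolutePath(), output.getAbsolutePath())
            .redirectErrorStream(true)
            .start();

        // Drain the child's output so it cannot block on a full pipe.
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        while (r.readLine() != null) { /* optionally log the child's output */ }

        // A real version would also enforce a timeout (e.g. a watchdog thread).
        return p.waitFor() == 0;
    }
}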
Some more information:
1) The analysis part can be summarized as a kind of text extraction. The module receives a file reference as input and writes the analysis result into a text file, which is then processed further by the web application's business logic. Currently the workflow is synchronous: the business logic waits for the third-party library to complete its job. There is no queuing or other asynchronous approach.
2) I am quite sure that the third-party library causes the OutOfMemoryError. I've tested the analysis part in isolation with files of different sizes. The file that causes the error is quite small (about 4 MB). In further tests with that particular file, the analysis crashes with the error in a JVM with 256 MB of heap, while the same test passes with 512 MB. However, increasing the heap size only helps for a short time, as a larger test file again makes the test fail with an OutOfMemoryError.
3) A limit on the size of uploaded files is in place, but of course it cannot be as low as 4 MB per file. The same goes for the OS and architecture: the system has to work on both 32- and 64-bit systems (Windows and Linux).
It depends on both the client and the server as well as the design of the web app. You need to answer a few questions:
What is supposed to happen as a result of the analysis, and when is it supposed to happen?
Does the client wait for the result of the analysis?
What is returned to the client?
You also need to determine the nature of the OOM.
You might want to handle the file upload and the file analysis separately. For instance, your webapp can store the uploaded file somewhere in the file system and defer the analysis to a web service, which is passed a reference to the file location. The web service may or may not be called asynchronously, depending on how and when the client that uploaded the file needs to be notified if the analysis fails.
All of these factors go into your determination.
Other considerations: what JVM are you using, what is the OS, and how is it configured in terms of system memory? Is the JVM 32- or 64-bit, what is the maximum file size allowed on upload, and which garbage collectors have you tried?
It is possible that you can solve this problem from an infrastructure perspective rather than by changing the code: limiting the maximum upload size, moving from 32- to 64-bit, changing the garbage collector, upgrading libraries after determining whether one of them has a bug or memory leak, etc.
One other glaring red flag: you say "a thread is spawned". While this is possible, it is often frowned upon in the Java EE world, because spawning threads yourself can interfere with how the container manages resources. Make sure you are not causing the issue yourself: try loading a file independently in a test environment, using a file known to cause problems (if one can be identified). This will help you determine whether the problem lies in the third-party library or in the design.
Why not have a (possibly clustered) application per third-party library that handles the file analysis? Those applications are called remotely (possibly asynchronously) from your main application; they are passed a URL pointing to the file they should analyze, and they return their analysis results.
When a file upload completes, the analysis job is put into a queue. When an analysis application comes back up after a crash, it simply resumes consuming messages from the queue.
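As a rough illustration of the queueing side, the upload could publish a job over JMS along these lines (the JNDI names jms/AnalysisConnectionFactory and jms/AnalysisQueue are placeholders for whatever your broker actually provides):

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;

public class AnalysisJobPublisher {

    // Puts the URL of an uploaded file onto a queue; the analysis application
    // consumes from the same queue and simply resumes after a crash.
    public static void publish(String fileUrl) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/AnalysisConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/AnalysisQueue");

        Connection conn = cf.createConnection();
        try {
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage(fileUrl));
        } finally {
            conn.close();
        }
    }
}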
Now, I am aware that there is no such thing as "exiting" an app in Android. By this I mean that the process corresponding to an app is kept in memory even after all the activities in that app are destroyed. (For sake of simplicity, let's keep services and such out of the picture). The process is only killed when the system decides to do so in order to reclaim memory.
However, once all my activities have been destroyed, I would assume that the process corresponding to my app is no longer "active". By this I mean that since my app is not doing any work, I assume the process no longer performs allocations. Is this assumption correct?
I used the simple default HelloWorld example that Eclipse ADT gives me via the New Android Project Wizard and saw that this is not the case. Even after I close the app, I can still track allocations in DDMS. Can anyone explain the reason for this?
The Allocation Tracker has hints for you: the 'Thread Id' and 'Allocated in' columns.
Watch these, and you'll learn which thread and which method performed the allocation.
My inactive app shows allocations in DdmServer, which indicates that the memory is being used by the DDMS service itself.
If you see other kinds of allocations, check whether your app has outstanding threads or other tasks still running in the background. If so, make sure to clean them up in Activity.onDestroy.
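For example, if the app starts its own worker thread, a cleanup roughly like the following (the workerThread field is purely illustrative) keeps the idle process from doing allocations of its own:

import android.app.Activity;

public class MainActivity extends Activity {

    private Thread workerThread; // hypothetical background worker started elsewhere

    @Override
    protected void onDestroy() {
        // Stop anything the Activity started so the process no longer
        // performs allocations after the last Activity is gone.
        if (workerThread != null) {
            workerThread.interrupt();
            workerThread = null;
        }
        super.onDestroy();
    }
}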
There is code running within the process because DDMS is attached to it. That code is the "remote" part of the remote debugging facility. Since there is code running there, that code will allocate memory and you will see those allocations.
If the debugger wasn't attached to the process, the OS could destroy the process if it wanted or needed to. However, because the debugger is attached, the process won't go away while you are watching it.
This is an example of the Observer effect, where you get unexpected results just because you are watching ;-)
There is a Java Struts application running on Tomcat that has some memory errors. Sometimes it becomes slow and hoards all of Tomcat's memory until it crashes.
I know how to find and fix "normal code errors" using tests, debugging, etc., but I don't know how to deal with memory errors (How can I reproduce them? How can I test for them? Which places in the code are most likely to create a memory error?).
To put it in one question: where can I start? Thanks
EDIT:
A snapshot sent by the IT department (I don't have direct access to the production application)
Use one of the many profilers. They hook into the JVM and can tell you things like how many new objects are being created per second, what type they are, etc.
Here's just one of many: http://www.ej-technologies.com/products/jprofiler/overview.html
I've used this one and it's OK.
http://kohlerm.blogspot.com/
It is quite a good intro to finding memory leaks using the Eclipse Memory Analyzer.
If you prefer video tutorials, try YouTube; although the material is Android-specific, it is very informative.
If your application becomes slow, you could create a heap dump and compare it to another heap dump created when the system is in a healthy state. Look for differences in the larger data structures.
You should run it under a profiler (JProfiler or YourKit, for example) for some time and watch memory/resource usage. Also try taking thread dumps.
There are a couple of options. A profiler is one of them; another is to dump the Java heap to a file and analyze it with a special tool (e.g. IBM provides a very good tool called Memory Analyzer that produces a very detailed report of the memory allocated at the time of a JVM crash - http://www.ibm.com/developerworks/java/jdk/tools/memoryanalyzer/).
A third option is to start your server with the JMX server enabled and connect to it via JConsole; with this approach you can monitor memory usage/allocation at runtime. JConsole ships with the standard Sun JDK in the bin directory (here you can find how to connect to Tomcat via JConsole - Connecting remote tomcat JMX instance using jConsole).
I have an application running on WebSphere Application Server 6.0, and it crashes nearly every day because of an OutOfMemoryError. From verbose GC it is certain that there are memory leaks (many of them).
Unfortunately, the application is provided by an external vendor, and getting things fixed is a slow and painful process. As part of that process I need to gather the logs and heap dumps each time the OOM occurs.
Now I'm looking for a way to automate this. The fundamental problem is how to detect the OOM condition. One way would be to create a shell script that periodically searches for new heap dumps, but that approach seems a bit dirty to me. Another approach might be to leverage JMX somehow, but I have little or no experience in this area and no real idea how to do it.
Or is there some kind of trigger/hook for this in WAS? Thank you very much for any advice!
You can pass the following arguments to the JVM on startup and a heap dump will be automatically generated on an OutOfMemoryError. The second argument lets you specify the path for the heap dump file. By using this at least you could check for the existence of a specific file to see if a heap dump has occurred.
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=<value>
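As a rough sketch of the "check for the existence of a specific file" idea, a small watcher could poll the dump directory and react once a dump appears (the directory path and polling interval are placeholders; .hprof is what the HotSpot flag above produces):

import java.io.File;
import java.io.FilenameFilter;

public class HeapDumpWatcher {

    // Polls the directory given to -XX:HeapDumpPath and reports once a
    // .hprof file shows up.
    public static void main(String[] args) throws InterruptedException {
        File dumpDir = new File("/opt/was/dumps");
        FilenameFilter hprof = new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return name.endsWith(".hprof");
            }
        };
        while (true) {
            File[] dumps = dumpDir.listFiles(hprof);
            if (dumps != null && dumps.length > 0) {
                // Here you would trigger log collection / notification.
                System.out.println("Heap dump found: " + dumps[0].getAbsolutePath());
                break;
            }
            Thread.sleep(60000L); // check once a minute
        }
    }
}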
I see two options if you want heap dumping automated but #Mark's solution with heap dump on OOM isn't satisfactory.
You can use the MemoryMXBean to detect high memory pressure, and then programmatically create a heap dump if the usage (or usage delta) seems high (see the sketch at the end of this answer).
You can periodically get memory usage info and generate heap dumps with a cron'd shell script using jmap (works both locally and remotely).
It would be nice if you could have a callback on OOM, but, uhm, that callback probably would just crash with an OOM error. :)
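A minimal sketch of the first option, assuming a HotSpot-based JVM (the HotSpotDiagnosticMXBean is Sun/Oracle-specific and may not exist on other vendors' JVMs); the 90% threshold and the dump path are arbitrary placeholders:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import com.sun.management.HotSpotDiagnosticMXBean;

public class MemoryPressureDumper {

    // Checks heap usage via MemoryMXBean and triggers a heap dump once usage
    // crosses a threshold (assumes -Xmx is set so getMax() is well-defined).
    public static void checkAndDump() throws Exception {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        if ((double) heap.getUsed() / heap.getMax() > 0.9) {
            HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            diag.dumpHeap("/tmp/pressure-" + System.currentTimeMillis() + ".hprof", true);
        }
    }
}

You would call something like this from a scheduled task inside the application.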
Have you looked at JConsole? It uses JMX to give you visibility into a variety of JVM metrics, including memory info. It would probably be worth monitoring your application with it to begin with, to get a feel for how and when memory is consumed. You may find the memory is consumed uniformly over the day, or only when certain features are used.
Take a look at the detecting low memory section of the above link.
If you need to, you can then write a JMX client to watch the application automatically and trigger whatever actions are required. JConsole will indicate which JMX attributes you need to poll.
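If you go that route, the poller can be quite small; here is a rough sketch (the host and port are placeholders, and the target JVM has to be started with the remote JMX options enabled):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteMemoryPoller {

    // Connects to a JVM that exposes remote JMX and prints its heap usage.
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://apphost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            System.out.println("Heap used: " + memory.getHeapMemoryUsage().getUsed());
        } finally {
            connector.close();
        }
    }
}

Run it from cron and have it trigger whatever action you need when usage crosses your threshold.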
An alternative to waiting until the application has crashed may be to script a controlled restart, say every night, if you're optimistic that it can survive for twelve hours.
Maybe WebSphere can even do that for you!?
You could add a listener class (a session-scoped or application-scoped attribute listener) that is called each time a new object is added to session/application scope.
In it, you can check the total memory used by the app (and log it) and also request a GC (note that invoking System.gc() does not guarantee that a collection will actually run).
(The above covers the logging part and GC based on usage growth.)
For scheduled GC:
In addition, you can keep a timer task class that runs every few hours and requests a GC.
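A rough sketch of such a listener (it still needs to be registered in web.xml, and the 80% threshold is arbitrary):

import javax.servlet.http.HttpSessionAttributeListener;
import javax.servlet.http.HttpSessionBindingEvent;

public class MemoryLoggingListener implements HttpSessionAttributeListener {

    // Logs heap usage whenever something is put into a session and requests
    // a GC when usage looks high; System.gc() is only a request.
    public void attributeAdded(HttpSessionBindingEvent event) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println("Session attribute '" + event.getName()
                + "' added, heap used: " + used + " bytes");
        if (used > rt.maxMemory() * 0.8) {
            System.gc();
        }
    }

    public void attributeRemoved(HttpSessionBindingEvent event) { }

    public void attributeReplaced(HttpSessionBindingEvent event) { }
}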
Our experience with ITCAM has been less than stellar from the monitoring perspective. We dumped it in favor of CA Wily Introscope.
Have you had a look at the jvisualvm tool in the latest Java 6 JDKs?
It is great for inspecting running code.
I'd dispute that you need the heap dumps exactly when the OOM occurs. Periodically gathering the information over time should give a picture of what's going on.
As has been observed, various tools exist for analysing these problems. I have had success with ITCAM for WebSphere; as an IBMer I have ready access to it. We were able to identify the exact lines of code in our problem situation very quickly.
If there's any way you can get a tool of that nature then that's the way to go.
It should be possible to write a simple program that gets the process list from the kernel and scans it to see whether your WAS process is still running. On a Unix box you could probably whip up something in Perl in a few minutes (if you know Perl); I'm not sure how difficult it would be under Windows. Run it as a scheduled task every five minutes or so, and if the process doesn't show up, have it fork off another process that deals with the heap dump and restarts WAS.