Detecting memory-leak programmatically

Detecting memory-leak programmatically - java

If, on purpose, I create an application that crunches data while suffering from memory-leaks, I can notice that the memory as reported by, say:
Runtime.getRuntime().freeMemory()
starts oscillating between 1 and 2 MB of free memory.
The application then enters a loop that goes like this: GC, processing some data, GC, etc. but because the GC happens so often, the application basically isn't doing much else anymore. Even the GUI takes age to respond (and, no, I'm not talking about EDT issues here, it's really the VM basically stuck in some endless GC'ing mode).
And I was wondering: is there a way to programmatically detect that the JVM doesn't have enough memory anymore?
Note that I'm not talking about ouf-of-memory errors nor about detecting the memory leak itself.
I'm talking about detecting that an application is running so low on memory that it is basically calling the GC all the time, leaving hardly any time to do something else (in my hypothetical example: crunching data).
Would it work, for example, to repeatedly read how much memory is available during, say, one minute, and see that if the number has been "oscillating" between different values all below, say, 4 MB, conclude that there's been some leak and that the application has become unusable?

And I was wondering: is there a way to programmatically detect that the JVM doesn't have enough memory anymore?
I don't think so. You can find out roughly how much heap memory is free at any given instant, but AFAIK you cannot reliably determine when you are running out of memory. (Sure, you can do things like scraping the GC log files, or trying to pick patterns in the free memory oscillations. But these are likely to be unreliable and fragile in the face of JVM changes.)
However, there is another (and IMO better) approach.
In recent versions of Hotspot (version 1.6 and later, I believe), you can tune the JVM / GC so that it will give up and throw an OOME sooner. Specifically, the JVM can be configured to check that:
the ratio of free heap to total heap is greater than a given threshold after a full GC, and/or
the time spent running the GC is less than a certain percentage of the total.
The relevant JVM parameters are "UseGCOverheadLimit", "GCTimeLimit" and "GCHeapFreeLimit". Unfortunately, Hotspot's tuning parameters are not well documented on the public web, but these ones are all listed here.
Assuming that you want your application to do the sensible thing ... give up when it doesn't have enough memory to run properly anymore ... then just launch the JVM with a smaller "GCTimeLimitor" or "GCHeapFreeLimit" than the defaults.
EDIT
I've discovered that the MemoryPoolMXBean API allows you to look at the peak usage of individual memory pools (heaps), and set thresholds. However, I've never tried this, and the APIs have lots of hints that suggest that not all JVMs implement the full API. So, I would still recommend the HotSpot tuning option approach (see above) over this one.

You can use getHeapMemoryUsage.

I see two attack vectors.
Either monitor your memory consumption.
When you more or less constantly use lots of the available memory it is very likely that you have a memory leak (or are just using too much memory). The vm will constantly try to free some memory without much success => constant high memory usage.
You need to distinguish that from a large zigzag pattern which happens often without being an indicator of memory problem. Basically you use more an more memory, but when gc finds time to do its job it finds lots of garbage to bring out, so everything is fine.
The other attack vector is to monitor how often and what kind of success the gc runs. If it runs often with only small gains in memory, it is likely you have a problem.
I don't know if you can access this kind of information directly from your program. But if nothing else I think you can specify parameters on startup which makes the gc log information into a file which in turn could get parsed.

What you could do is spawn a thread that wakes up periodically and calculates the amount of used memory and records the result. Then you can do regression analysis on the result to estimate the rate of memory growth in your application. If you know the rate of growth, and the maximum amount of memory, you can predict (with some confidence) when your application will run out of memory.

You can pass arguments to your java virtual machine that gives you GC diagnostics such as
-verbose:gc This flag turns on the logging of GC information. Available
in all JVMs.
-XX:+PrintGCTimeStamps Prints the times at which the GCs happen
relative to the start of the
application.
If you capture that output in a file, in your application you can periodcly read that file and parse it to know when the GC has happened. So you can work out the average time between every GC

I think the JVM does exactly this for you and throws java.lang.OutOfMemoryError: GC overhead limit exceeded. So if you catch OutOfMemoryError and check for that message then you have what you want, don't you?
See this question for more details

i've been using plumbr for memory leak detection and it's been a great experience, though the licence is very expensive: http://plumbr.eu/

Related

Should you cap the java heap size if you don't need that much?

Let's assume I have plenty of memory on a production Unix box. And I have a Java backend application that does not use all that much heap. Initial tests show that it seems fine with 100MB.
However, when uncapped, the memory grows until 1GB and beyond.
I probably wouldn't care if it wasn't for the fact that every now and then the processing stream that the application is part of seems to choke. One possible (very vague) explanation is that the culprit is the mentioned Java application.
Question : Could it be that leaving the heap unnecessary high defers the garbage collection for so long that, when it finally kicks in, it has "so much to do" that it visibly impacts the performance?
I should probably mention that we are still running Java 1.4 (pretty old system).

If you don't need it cap it. Yes you are correct giving too much heap space to a Java program 'may' cause the garbage collector threads to run for a longer period of time. What is 'too much' depends on the requirements of your program. I have no hard data to back this up, I have seen this happen in production level Java based servers in the past. Java 1.7 (the latest version) may not present the same issues as Java 1.4.

You are correct that the GC time grows with the size of the heap. Bigger heap means more work for GC. But, even with heap of several GBs you should see Full GC cycles take somewhere around 2-3s. Do you see such "chokes" or are your chokes much longer?
In general, it is tolerable to have GC time <5% of total application run time.
Furthermore, it is hard to blame GC, it would be helpful if you could show us some GC logs.

UsageMemory threashold in JConsole

I am looking into how to use JConsole to detect memory leaks.
I see that in Memory Pool in my MBeans I can define UsageThreashold for my Tenured Generation.
So if my application exceeds this threashold the heap memory becomes red in the Memory tab.
Question: How does this help? I mean how am I supposed to use this setting to analyze my memory? How am I supposed to figure out this value?

In my opinion I don't think that UsageThreashold parameter is the most helpful for you to detect memory leaks (but if someone knows some tricks with it, please do share). In my experience that parameter is more helpful to visually understand if my application is getting way too near my max heap size and I'm in danger of getting an OutOfMemoryException.
Still regarding using JConsole to search for memory leaks, I don't think there's a silver bullet for the process. But what I usually do is the following:
If exists a memory leak, it means that the objects (the ones that are leaking) won't get collected, hence, your Tenured Generation won't fully recover after any amount of GCs.
With the application running I connect JConsole and try to spot a leak by observing the memory tab, if after several computations of my application and also after various GCs occurring (including pressing the Perform GC button, which will result in a full gc) the memory never goes below, or at least to the memory value, it started tracking there's a great possibility that something is leaking. When the leak is big, you can even see a "stair graph" pattern in your memory.
Keep in mind that if your application has long computations running, which may consume memory this analyzes must be done carefully. You must understand when those processes have finished. For example, just run one of those computations and track the total evolution of memory, before, during and afterwards.
Also, I suggest you to try visualVM instead, because it also allows you to create heap dumps, which you can use in order to understand which objects are still in memory and explore the references graph to understand why they are not being collected.

you can use JMAP to see the histogram and/or to create heap dumps and study your memory consumption with tools like Eclipse MAT or YourKit.
JConsole is used more for monitoring and running MBeans and less for analysis and in my expirence JVisualvm is better for that since you can use it for sampling your code and see what methods are CPU consuming.

Garbage Collector going crazy after a few hours

Our JBoss 3.2.6 application server is having some performance issues and after turning on the verbose GC logging and analyzing these logs with GCViewer we've noticed that after a while (7 to 35 hours after a server restart) the GC going crazy. It seems that initially the GC is working fine and doing a GC every hour or so but at a certain point it starts going crazy and performing full GC's every minute. As this only happens in our production environment have not been able to try turning off explicit GCs (-XX:-DisableExplicitGC) or modify the RMI GC interval yet but as this happens after a few hours it does not seem to be caused by the know RMI GC issues.
Any ideas?
Update:
I'm not able to post the GCViewer output just yet but it does not seem to be hitting the max heap limitations at all. Before the GC goes crazy it is GC-ing just fine but when the GC goes crazy the heap doesn't get above 2GB (24GB max).
Besides RMI are there any other ways explicit GC can be triggered? (I checked our code and no calls to System.gc() are being made)

Is your heap filling up? Sometimes the VM will get stuck in a 'GC loop' when it can free up just enough memory to prevent a real OutOfMemoryError but not enough to actually keep the application running steadily.
Normally this would trigger an "OutOfMemoryError: GC overhead limit exceeded", but there is a certain threshold that must be crossed before this happens (98% CPU time spent on GC off the top of my head).
Have you tried enlarging heap size? Have you inspected your code / used a profiler to detect memory leaks?

You almost certainly have a memory leak and the if you let the application server continue to run it will eventually crash with an OutOfMemoryException. You need to use a memory analysis tool - one example would be VisualVM - and determine what is the source of the problem. Usually memory leaks are caused by some static or global objects that never release object references that they store.
Good luck!
Update:
Rereading your question it sounds like things are fine and then suddenly you get in this situation where GC is working much harder to reclaim space. That sounds like there is some specific operation that occurs that consumes (and doesn't release) a large amount of heap.
Perhaps, as #Tim suggests, your heap requirements are just at the threshold of max heap size, but in my experience, you'd need to pretty lucky to hit that exactly. At any rate some analysis should determine whether it is a leak or you just need to increase the size of the heap.

Apart from the more likely event of a memory leak in your application, there could be 1-2 other reasons for this.
On a Solaris environment, I've once had such an issue when I allocated almost all of the available 4GB of physical memory to the JVM, leaving only around 200-300MB to the operating system. This lead to the VM process suddenly swapping to the disk whenever the OS had some increased load. The solution was not to exceed 3.2GB. A real corner-case, but maybe it's the same issue as yours?
The reason why this lead to increased GC activity is the fact that heavy swapping slows down the JVM's memory management, which lead to many short-lived objects escaping the survivor space, ending up in the tenured space, which again filled up much more quickly.

I recommend when this happens that you do a stack dump.
More often or not I have seen this happen with a thread population explosion.
Anyway look at the stack dump file and see whats running. You could easily setup some cron jobs or monitoring scripts to run jstack periodically.
You can also compare the size of the stack dump. If it grows really big you have something thats making lots of threads.
If it doesn't get bigger you can at least see which objects (call stacks) are running.
You can use VisualVM or some fancy JMX crap later if that doesn't work but first start with jstack as its easy to use.

Debugging a strange memory leak - Java/Tomcat

I'm experiencing a very odd problem with a Java application running under Tomcat.
We tried to update the production code from a fresh newly produced in a 1-week sprint, the application has been running over months without hiccups and then this new code makes our Linux servers start swapping after some time.
The very strange thing is that when looking at VisualVM for memory usage it never exceeds the maximum heap size, the JVM does not throw an OutOfMemory, the machine only starts swapping and the JVM keeps running even after that.
So, it seems that's leaking memory from somewhere, it seems like it's from the new code but it's odd that it's not inside the JVM, any ideas in how to debug that?
Thanks!

Swapping is not a conclusive indicator of leakage. It results from low physical memory. Use vmstat on Linux to get swap usage. Try using a different machine, experiment with configurations --swap size, physical memory size, address space.
If you are confident that the problem is in your program try this:
Estimate the median and peak memory that your program should use. You must be able to account for all deviations from these metrics. If you cannot, proceed to step 3.
Assuming you did step 1 correctly and were able to account for all deviations, you can rule out the leak (sorry about such vague suggestions but debugging is only as good as the detective). You should now focus on GC tuning. First, enable GC logging. See if your heap is actually full and where the GC is spending most of its time collecting. This may be a good starting point to start optimizations. Try to see if adjusting GC options helps. Try experimenting with collection algorithms, max/min heap sizes, gen ratios etc. Only experiment when you have ruled out a leak (step 1).
Assuming you did step 1 correctly and were not able to account for all deviations, you can assume that you have a leak somwhere. Use a memory profiler to see what objects contribute to the heap size growth most. Leave a profiler running for an extended period of time --have your program handle some requests it routinely expects to get and then leave it relatively isolated after that. If the memory level keeps on growing you may have a leak somewhere. If not, then it is probably not a memory leak. Can you pin point the part of your program that may be creating them? If yes, try sending several requests that only target that part of your program. Does it replicate the problem deterministically? If no, repeat step 3. If yes, use divide and conquer and reapply step 3 till you can find the class/method that are the culprits. It can be a certain combination of multiple portions as well (meaning that individually they may look innocent but together they may form a brilliant crime syndicate).
Hope this helps, if not then please leave a comment to my post.
All the very best on your exercise!

I would suggest you look into creating heap dumps without using jvisualvm. For Unix-based Oracle JVM's this is normally done by sending a signal 3 to the JVM using kill.
For full details see http://www.startux.de/index.php/java/45-java-heap-dumpyvComment45
You can then see if the patterns changes.
If you do not get an idea from this, then this might be because you are storing a sub-string from a very large original string (which carries the underlying string array around), or because you hold on to operating system resources like open database connections etc.
You have checked your connection pool looks good?

If you aren't using it, I'd recommend using visual VM version 1.3.2 and all the plug-ins. It's a big jump up from earlier versions.
What happens to the perm gen space?
What are the memory settings you're using? Min and max, of course, but what about perm space size?

Can Sun JVM handle gigantic heap sizes without problems, and how?

I have heard several people claiming that you can not scale the JVM heap size up. I've heard claims of the practical limit being 4 gigabytes (I heard an IBM consultant say that), 10 gigabytes, 32 gigabytes, and so on... I simply can not believe any of those numbers and have been wondering about the issue now for a while.
So, I have three part question I would hope someone with experience could answer:
Given the following case how would you tune the heap and GC settings?
Would there be noticeable hickups (pauses of JVM etc) that would be noticed by the end users?
Should this really still work? I think it should.
The case:
64 bit platform
64 cores
64 gigabytes of memory
The application server is client facing (ie. Jboss/tomcat web application server) - complete pauses of JVM would probably be noticed by end users
Sun JVM, probably 1.5
To prove I am not asking you guys to do my homework this is what I came up with:
-XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:-EliminateZeroing -Xmn768m -Xmx55000m
CMS should reduce the amount of pauses, although it comes with overhead. The other settings for CMS seem to default automatically to the number of CPUs so they seem sane to me. The rest that I added are extras that might do good or bad generally for performance, and they should probably be tested.
Definitely.

I think it's going to be difficult for anybody to give you anything more than general advice, without having further knowledge of your application.
What I would suggest is that you use VisualGC (or the VisualGC plugin for VisualVM) to actually look at what the garbage collection is doing when your app is running. Once you have a greater understanding of how the GC is working alongside your application, it'll be far easier to tune it.

#1. Given the following case how would you tune the heap and GC settings?
First, having 64 gigabytes of memory doesn't imply that you have to use them all for one JVM. Actually, it rather means you can run many of them. Then, it is impossible to answer your question without any access to your machine and application to measure and analyse things (knowing what your application is doing isn't enough). And no, I'm not asking to get access to your environment :)
#2. Would there be noticeable hickups (pauses of JVM etc) that would be noticed by the end users?
The goal of tuning is to find a good compromise between frequency and duration of (major) GCs. With a ~55g heap, GC won't be frequent but will take noticeable time, for sure (the bigger the heap, the longer the major GC). Using a Parallel or Concurrent garbage collector will help on multiprocessor systems but won't entirely solve this issue. Why do you need ~55g (this is mega ultra huge for a webapp IMO), that's my question. I'd rather run many clustered JVMs to handle load if required (at some point, the database will become the bottleneck anyway with a data oriented application).
#3. Should this really still work? I think it should.
Hmm... not sure I get the question. What is "this"? Instantiating a JVM with a big heap? Yes, it should. Is it equivalent to running several JVMs? No, certainly not.
PS: 4G is the maximum theoretical heap limit for the 32-bit JVM running on a 64-bit operating system (see Why can't I get a larger heap with the 32-bit JVM?)
PPS: On 64-bit VMs, you have 64 bits of addressability to work with resulting in a maximum Java heap size limited only by the amount of physical memory and swap space your system provides. (see How large a heap can I create using a 64-bit VM?)

Obviously heap size is not unlimited and the larger is the heap size, the more your JVM will eventually spend on GC. Though I think it is possible to set heap size quite high on 64-bit JVM, I still think it's not really practical. The advice here is better to have several JVMs running with the same parameters i.e. cluster of JBoss/Tomcat nodes running on the same physical machine and you will get better throughput.
EDIT: Also your GC behavior depends on the taxonomy of your heap. If you have a lot of short-living objects and each request to the server creates a lot of those, then your GC will collect a lot of garbage very often and thus on large heap size this will result in longer pauses. If you have very many long-living objects (e.g. caching most of your data in memory) and the amount of short-living objects is not that big, then having bigger heap size is OK.

As Chris Rice already wrote, I wouldn't expect any obvious problems with the GC for heap sizes up to 32-64GB, although there may of course be some point of your application logic, which can cause problems.
Not directly related to GC, but I would still recommend you to perform a realistic load test on your production system. I used to work on a project, where we had a similar setup (relatively large, clustered JBoss/Tomcat setup to serve a public web application) and without exaggeration, JBoss is not behaving very well under high load or with a high number of concurrent calls if you are using EJBs. JBoss is spending a lot of time in synchronized blocks when accessing and managing the EJB instance pools and if you opt for a cluster, it will even wait for intra-cluster network communication within these synchronized blocks. Be especially aware of poorly performing state replication, if you are using SFSBs.

Only to add some more switches I would use by default: -Xms55g can help to reduce the rampup time because it frees Java from the need to check if it can fall back to the initial size and allows also better internal initial sizing of memory areas.
Additionally we made good experiences with NewSize to give you a large young size to get rid of short term garbage: -XX:NewSize=1g Additionally most webapps create a lot of short time garbage that will never survive the request processing. You can even make that bigger. With Xms55g, the VM reserves a large chunk already. Maybe downsizing can help.
-Xincgc helps to clean the young generation incrementally and return the cpu often to the user threads.
-XX:CMSInitiatingOccupancyFraction=70 If you really fill all that memory, try to start CMS garbage collection earlier.
-XX:+CMSIncrementalMode puts the CMS into incremental mode to return the cpu to the user threads more often.
Attach to the process with jstat -gc -h 10 <pid> 1s and watch the GC working.
Will you really fill up the memory? I assume that 64cpus for request processing might even be able to work with less memory. What do you store in there?

Depending on your GC pause analysis, you may wish to implement Incremental mode whereby the long pause may be broken out over a period of time.

I have found memory architecture plays a part in large memory sizes. Applications in general don't perform as well if they use more than one memory bank. The JVM appears to suffer as well, esp the GC which has to sweep the whole memory.
If you have an application which doesn't fit into one memory bank, your application has to pull in memory which is not local to a processor and use memory local to another processor.
On linux you can run numactl --hardware to see the layout of processors and memory banks.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.