Java GC:old generation becomes larger and larger and cannot be reclaimed - java

I am writing my servlet program and use jconsole and jmap to monitor its memory status.I find that when my program is running , Memory Pool "PS Old Gen" is becoming larger and larger and finally my servlet cannot respond to any request.
This is the shotcut of my JConsole output:
When I click the "Perform GC" Button, Nothing happened.
So , to see the details ,I use jmap to dump the details:
And this is my JConsole VM Summary output:
Any one can help me to get out what may be the problem?Your know , the GC "PS MarkSweep" and "PS SCavenge" is the default GC for my Server JVM.
Thank you.
I find a very weird phenomenon:During 15 hours from 18:00 of yesterday to 09::00 of today ,it seems that GC on "PS Old Gen" never occurred , which make the used memory of old generation is becoming larger and larger.i have just manually click the "Perform GC" button ,it seems that this GC is quite effective and reclaim a lot of memory. But why the Old generation GC didn't happen automatically for such a long time? We can see that before 18:00 of yesterday , the old generation GC is working properly.

Assuming that you did not add the option -histo:live when you were taking the jmap, which will result in taking the report with garbage + live objects, and referring to the memory drop happened when you manually click the "Perform GC" button, I doubt that the application don't have a memory leak but a bad object promotion rate from the Young Gen to Old Gen. Eventually the Old Gen will be filled up and will run full GC resulting the application to go unresponsive.
If my assumption is correct, I think your strategy should be to minimize promoting the objects to old gen. Rather than worrying about what to do to clear out the Old Gen which is more expensive. Referring the below comments you have mentioned, I think you application has a small Memory Footprint (< 0.5G ) relative to the max allocated memory 7G .
"All my data-intensive variables are defined in method. When method returned , these variables should be reclaimed, right?"
So there are few things you can do.
Tune the application to minimize the response times of your transactions so that the objects will be garbage collected before been promoted to the Old Gen
Increase the Young Gen size. Since you have around 7 GB to play with, why don't you allocate around 2 - 3 GB to Young Gen for a start ( i.e -XX:NewSize=2g ). A larger New Size will reduce the frequency of PCSacavenge ( Young Collections) and will reduce the rate of aging the live objects.
Then start adjusting the -XX:MaxTenuringThreshold=n . You can us the gc.log with -XX:+PrintTenuringDistribution. Size the survivor ratios -XX:SurvivorRatio=n. Note that by default -XX:+UseAdaptiveSizePolicy is on and this will alter the initial size of Survivor Ratios dynamically. Or else you can skip sizing the Survivor Ratios leaving the AdaptiveSizePolicy to do the job. But I'm not a big fan of AdaptiveSizePolicy.
Along with AdaptiveSizePolicy you can use -XX:MaxGCPauseMillis=n in order to give an indication to the Garbage Collector regarding the pauses you are expecting in your application when clearing the Old Gen. In this way the Collector will try to achieve the MaxGCPauseMillis by not waiting until there is too much work to do.
Or else you can switch to CMS collector which is built to handle response time issues like these.
Well I think if first two steps resolves your problem then you can leave the rest aside. You must not spoil a well running app by adding some additional stuff. Important thing is you have to tune the GC step by step.

Your memory leak happes within MongoDB code. The huge number of map entries you see are most probably the internals of BasicDBObject (ranked #6 in your dump), which extends HashMap. You may be able to resolve the issue by reconfiguring the MongoDB component.

Related

Survivor Space suddenly growing and increasing response times

I have a web application deployed on a wildfly-10.1.0 application server and doing some load tests with jmeter. At a certain user count for a short amount of time response times grow rapidly and plunge to a low level again. This behavior repeats itself several times. Interesting about this is that the size of the survivor space of the wildlfy jvm is consistent with the response time (see picture).
I already tried tuning the heap sizes of the wildfly jvm and of the young and old generation but the behavior stays the same. Has anyone an idea as to what leads the survivor space to grow in such a way and what I have to change in order to keep my response times low?
http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
Major garbage collection are also Stop the World events. Often a major collection is much slower because it involves all live objects. So for Responsive applications,major garbage collections should be minimized.
keep your objects life time as small as possible, objects with long life time tends to be moved to old gen (survivor), old gen require major garbage collection to perform GC.

Java heap size growing too big with Infinispan cache

I am using an Infinispan cache to store values. The code writes to the cache every 10 minutes and the cache reaches a size of about 400mb.
It has a time to live of about 2 hours, and the maximum entries is 16 million although currently in my tests the number of entries doesn't go above 2 million or so (I can see this by checking the mbeans/metrics in jconsole).
When I start jboss the java heap size is 1.5Gb to 2Gb. The -Xmx setting for the maximum allocated memory to jboss is 4Gb.
When I disable the Infinispan cache the heap memory usage stays flat at around 1.5Gb to 2Gb. It is very constant and stays at that level.
=> The problem is: when I have the Infinispan cache enabled the java heap size grows to about 3.5Gb/4Gb which is way more than expected.
I have done a heap dump to check the size of the cache in Eclipse MAT and it is only 300 or 400mb (which is ok).
So I would expect the memory usage to go to 2.5Gb and stay steady at that level, since the initial heap size is 2Gb and the maximum cache size should only be around 500mb.
However it continues to grow and grow over time. Every 2 or 3 hours a garbage collection is done and that brings the usage down to about 1 or 1.5Gb but it then increases again within 30 minutes up to 3.5Gb.
The number of entries stays steady at about 2 million so it is not due to just more entries going in to the cache. (Also the number of evictions stays at 0).
What could be holding on to this amount of memory if the cache is only 400-500mb?
Is it a problem with my garbage collection settings? Or should I look at Infinispan settings?
Thanks!
Edit: you can see the heap size over time here.
What is strange is that even after what looks like a full GC, the memory shoots back up again to 3Gb. This corresponds to more entries going into the cache.
Edit: It turns out this has nothing to do with Infinispan. I narrowed down the problem to a single line of code that is using a lot of memory (about 1Gb more than without the call).
But I do think more and more memory is being taken by the Infinispan cache, naturally because more entries are being added over the 2 hour time to live.
I also need to have upwards of 50 users query on Infinispan. When the heap reaches a high value like this (even without the memory leak mentioned above), I know it's not an error scenario in java however I need as much memory available as possible.
Is there any way to "encourage" a heap dump past a certain point? I have tried using GC options to collect at a given proportion of heap for the old gen but in general the heap usage tends to creep up.
Probably what you're seeing is the JVM not collecting objects which have been evicted from the cache. Cache's in general have a curious relationship with the prevailing idea of generational GC.
The generational GC idea is that, broadly speaking, there are two types of objects in the JVM - short lived ones, which are used and thrown away quickly, and longer lived ones, which are usually used throughout the lifetime of the application. In this model you want to tune your GC so that you put most of your effort attempting to identify the short lived objects. This means that you avoid looking at the long-lived objects as much as possible.
Cache's disrupt this pattern by having some intermediate-length object lifespans (i.e. a few seconds / minutes / hours, depending on your cache). These objects often get promoted to the tenured generation, where they're not usually looked at until a full GC becomes necessary, even after they've been evicted from the cache.
If this is what's happening then you've a couple of choices:
ignore it, let the full GC semantics do its thing and just be aware that this is what's happening.
try to tune the GC so that it takes longer for objects to get promoted to the tenured generation. There are some GC flags which can help with that.

Does this memory usage pattern indicate that my Java application leaks memory?

I have a Java application that waits for the user to hit a key and then runs a task. Once done, it goes back and waits again. I was looking at memory profile for this application with jvisualvm, and it showed an increasing pattern.
Committed memory size is 16MB.
Used memory, on application startup, was 2.7 MB, and then it climbed with intermediate drops (garbage collection). Once this sawtooth pattern approached close to 16MB, a major drop occurred and the memory usage fell close to 4 MB. This major drop point has been increasing though. 4MB, 6MB, 8MB. The usage never goes beyond 16 MB but the whole sawtooth pattern is on a climb towards 16 MB.
Do I have a memory leak?
Since this is my first time posting to StackOverflow, do not have enough reputation to post an image.
Modern SunOracle JVMs use what is called a generational garbage collector:
When the collector runs it first tries a partial collection only releases memory that was allocated recently
recently created objects that are still active get 'promoted'
Once an object has been promoted a few times, it will no longer get cleaned up by partial collections even after it is ready for collection
These objects, called tenured, are only cleaned up when a full collection becomes necessary in order to make enough room for the program to continue running
So basically, bits of your program that stick around long enough to get missed by the fast 'partial' collections will hang around until JVM decides it has to do a full collection. If you let it go long enough you should eventually see the full collection happen and usage drop back down to your original starting point.
If that never happens and you eventually get an Out Of Memory exception, then you probably have a memory leak :)
That kind of sawtooth pattern is commonly observed and is not an indication of memory leak.
Because garbage collecting in big chunks is more efficient than constantly collecting small amounts, the JVM does the collecting in batches. That's why you see this pattern.
As stated by others, this behavior is normal. This is a good description of the garbage collection process. To summarize, the JVM usese a generational garbage collector. The vast majority of objects are very short-lived, and those that survive longer tend to last much longer. Knowing this, the GC will check the newer generation first to avoid having to repeatedly check the older objects which are less likely to be inaccessible. After a period of time, the survivors move to the older generation. This increasing saw-tooth is exactly what you're seeing- the rising troughs are due to the older generation growing larger as the survivors are being moved to it. If your program ran long enough eventually checking the newer generation wouldn't free up enough memory and it would have to GC the old generation as well.
Hope that helps.

What is the normal behavior of Java GC and Java Heap Space usage?

I am unsure whether there is a generic answer for this, but I was wondering what the normal Java GC pattern and java heap space usage looks like. I am testing my Java 1.6 application using JMeter. I am collecting JMX GC logs and plotting them with JMeter JMX GC and Memory plugin extension. The GC pattern looks quite stable with most GC operations being 30-40ms, occasional 90ms. The memory consumption goes in a saw-tooth pattern. The JHS usage grows constantly upwards e.g. to 3GB and every 40 minutes the memory usage does a free-fall drop down to around 1GB. The max-min delta however grows, so the sawtooth height constantly grows. Does it do a full GC every 40mins?
Most of your descriptions in general, are how the GC works. However, none of your specific observations, especially numbers, hold for general case.
To start with, each JVM has one or several GC implementations and you could choose which one to use. Take the mostly applied one i.e. SUN JVM (I like to call it this way) and the common server GC pattern as example.
Firstly, the memory are divided into 4 regions.
A young generation which holds all of the recently created objects. When this generation is full, GC does a stop-the-world collection by stopping your program from working, execute a black-gray-white algorithm and get the obselete objects and remove them. So this is your 30-40 ms.
If an object survived a certain rounds of GC in the young gen, it would be moved into a swap generation. The swap generation holds the objects until another number of GCs - then move them to the old generation. There are 2 swap generations which does a double buffering kind of thing to facilitate the young gen to work faster. If young gen dumps stuff to swap gen and found swap gen is mostly full, a GC would happen on swap gen and potentially move the survived objects to old gen. This most likely makes your 90ms, though I am not 100% sure how swap gen works. Someone correct me if I am wrong.
All the objects survived swap gen would be moved to the old generation. The old generation would only be GC-ed until it's mostly full. In your case, every 40 min.
There is another "permanent gen" which is used to load your jar target byte code and resources.
All size of the areas can be adjusted by JVM parameters.
You can try to use VisualVM which would give you a dynamic idea of how it works.
P.S. not all JVM / GC works the same way. If you use G1 collector, or JRocket, it might happens slightly different, but the general idea holds.
Java GC work in terms of generations of objects. There are young, tenure and permament generations. It seems like in your case: every 30-40ms GC process only young generation (and transfers survived objects into tenure generation). And every 40 mins it performs full collecting (it causes stop-the-world pause). Note: it happens not by time, but by percentage of used memory.
There are several JVM options, which allows you to chose generation's sizes, type of GC (there are several algorithms for GC, in java 1.6 Serial GC is used by default, for example -XX:-UseConcMarkSweepGC), parameters of GC work.
You'd better try to find good articles about generations and different types of GC (algorithms are really different, some of them allow to avoid stop-the-world pauses at all!)
yes, most likely. Instead of guessing you can use jstat to monitor your GCs.
I suggest you use a memory profiler to ensure there is nothing simple you can do ti improve the amount of garbage you are producing.
BTW, If you increase the size of the young generation, you can reduce how much garbage makes it into the tenured space reducing the frequency of full collections. You may find you less than one full collection per day if you tune it enough.
For a more extreme case, I have tuned a trading system to less than one collection per day (minor or major)

How to reduce java concurrent mode failure and excessive gc

In Java, the concurrent mode failure means that the concurrent collector failed to free up enough memory space form tenured and permanent gen and has to give up and let the full stop-the-world gc kicks in. The end result could be very expensive.
I understand this concept but never had a good comprehensive understanding of
A) what could cause a concurrent mode failure and
B) what's the solution?.
This sort of unclearness leads me to write/debug code without much of hints in mind and often has to shop around those performance flags from Foo to Bar without particular reasons, just have to try.
I'd like to learn from developers here how your experience is? If you had encountered such performance issue, what was the cause and how you addressed it?
If you have coding recommendations, please don't be too general. Thanks!
The first thing about CMS that I have learned is it needs more memory than the other collectors, about 25 to 50% more is a good starting point. This helps you avoid fragmentation, since CMS does not do any compaction like the stop the world collectors would. Second, do things that help the garbage collector; Integer.valueOf instead of new Integer, get rid of anonymous classes, make sure inner classes are not accessing inaccessible things (private in the outer class) stuff like that. The less garbage the better. FindBugs and not ignoring warnings will help a lot with this.
As far as tuning, I have found that you need to try several things:
-XX:+UseConcMarkSweepGC
Tells JVM to use CMS in tenured gen.
Fix the size of your heap: -Xmx2048m -Xms2048m This prevents GC from having to do things like grow and shrink the heap.
-XX:+UseParNewGC
use parallel instead of serial collection in the young generation. This will speed up your minor collections, especially if you have a very large young gen configured. A large young generation is generally good, but don't go more than half of the old gen size.
-XX:ParallelCMSThreads=X
set the number of threads that CMS will use when it is doing things that can be done in parallel.
-XX:+CMSParallelRemarkEnabled remark is serial by default, this can speed you up.
-XX:+CMSIncrementalMode allows application to run more by pasuing GC between phases
-XX:+CMSIncrementalPacing allows JVM to figure change how often it collects over time
-XX:CMSIncrementalDutyCycleMin=X Minimm amount of time spent doing GC
-XX:CMSIncrementalDutyCycle=X Start by doing GC this % of the time
-XX:CMSIncrementalSafetyFactor=X
I have found that you can get generally low pause times if you set it up so that it is basically always collecting. Since most of the work is done in parallel, you end up with basically regular predictable pauses.
-XX:CMSFullGCsBeforeCompaction=1
This one is very important. It tells the CMS collector to always complete the collection before it starts a new one. Without this, you can run into the situation where it throws a bunch of work away and starts again.
-XX:+CMSClassUnloadingEnabled
By default, CMS will let your PermGen grow till it kills your app a few weeks from now. This stops that. Your PermGen would only be growing though if you make use of Reflection, or are misusing String.intern, or doing something bad with a class loader, or a few other things.
Survivor ratio and tenuring theshold can also be played with, depending on if you have long or short lived objects, and how much object copying between survivor spaces you can live with. If you know all your objects are going to stick around, you can configure zero sized survivor spaces, and anything that survives one young gen collection will be immediately tenured.
Quoted from "Understanding Concurrent Mark Sweep Garbage Collector Logs"
The concurrent mode failure can either
be avoided by increasing the tenured
generation size or initiating the CMS
collection at a lesser heap occupancy
by setting
CMSInitiatingOccupancyFraction to a
lower value
However, if there is really a memory leak in your application, you're just buying time.
If you need fast restart and recovery and prefer a 'die fast' approach I would suggest not using CMS at all. I would stick with '-XX:+UseParallelGC'.
From "Garbage Collector Ergonomics"
The parallel garbage collector
(UseParallelGC) throws an
out-of-memory exception if an
excessive amount of time is being
spent collecting a small amount of the
heap. To avoid this exception, you can
increase the size of the heap. You can
also set the parameters
-XX:GCTimeLimit=time-limit and -XX:GCHeapFreeLimit=space-limit
Sometimes OOM pretty quick and got killed, sometime suffers long gc period (last time was over 10 hours).
It sounds to me like a memory leak is at the root of your problems.
A CMS failure won't (as I understand it) cause an OOM. Rather a CMS failure happens because the JVM needs to do too many collections too quickly, and CMS could not keep up. One situation where lots of collection cycles happen in a short period is when your heap is nearly full.
The really long GC time sounds weird ... but is theoretically possible if your machine was thrashing horribly. However, a long period of repeated GCs is quite plausible if your heap is very nearly full.
You can configure the GC to give up when the heap is 1) at max size and 2) still close to full after a full GC has completed. Try doing this if you haven't done so already. It won't cure your problems, but at least your JVM will get the OOM quickly, allowing a faster service restart and recovery.
EDIT - the option to do this is -XX:GCHeapFreeLimit=nnn where nnn is a number between 0 and 100 giving the minimum percentage of the heap that must be free after the GC. The default is 2. The option is listed in the aptly titled "The most complete list of -XX options for Java 6 JVM" page. (There are lots of -XX options listed there that don't appear in the Sun documentation. Unfortunately the page provides few details on what the options actually do.)
You should probably start looking to see if your application / webapp has memory leaks. If it has, your problems won't go away unless those leaks are found and fixed. In the long term, fiddling with the Hotspot GC options won't fix memory leaks.
I've found using -XX:PretenureSizeThreshold=1m to make 'large' object go immediately to tenured space greatly reduced my young GC and concurrent mode failures since it tends not to try to dump the young + 1 survivor amount of data (xmn=1536m survivorratio=3 maxTenuringThreashould=5) before a full CMS cycle can complete. Yes my survivor space is large, but about once ever 2 days something comes in the app that will need it (and we run 12 app servers each day for 1 app).

Categories

Resources