I have a Java app which shows different GC behaviors in different environments. In one environment, the heap usage graph is a slow sawtooth with major GCs every 10 hours or so, only when the heap is >90% full. In another environment, the JVM does major GCs every hour on the dot (the heap is normally between 10% and 30% at these times).
My question is, what are the factors which cause the JVM to decide to do a major GC?
Obviously it collects when the heap is nearly full, but there is some other cause at play which I am guessing is related to an hourly scheduled task within my app (although there is no spike in memory usage at this time).
I assume GC behaviour depends heavily on the JVM; I am using:
Java HotSpot(TM) 64-Bit Server VM 1.7.0_21 Oracle Corporation
No specific GC options, so using the default settings for 64-bit server (PS MarkSweep and PS Scavenge)
Other info:
This is a web app running in Tomcat 6.
Perm gen hovers around 10% in both environments.
The environment with the sawtooth behaviour has a 7 GB max heap; the other has 14 GB.
Please, no guesswork. The JVM must have rules for deciding when to perform a major GC, and these rules must be encoded deep in the source somewhere. If anyone knows what they are, or where they are documented, please share!
I have found four conditions that can cause a major GC (given my JVM config):
1. The old gen area is full (even if it can be grown, a major GC will still be run first)
2. The perm gen area is full (even if it can be grown, a major GC will still be run first)
3. Someone is manually calling System.gc(): a bad library or something related to RMI (see links 1, 2 and 3)
4. The young gen areas are all full and nothing is ready to be moved into old gen (see link 1)
As others have commented, cases 1 and 2 can be improved by allocating plenty of heap and permgen, and setting -Xms and -Xmx to the same value (along with the perm equivalents) to avoid dynamic heap resizing.
Case 3 can be avoided using the -XX:+DisableExplicitGC flag.
Case 4 requires more involved tuning, e.g., -XX:NewRatio=N (see Oracle's tuning guide).
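To check whether the hourly major GCs line up with the hourly task (or with an RMI-driven System.gc()), one option is to sample the collection counters from inside the JVM. The sketch below is an illustration rather than part of the original answer: it uses the standard GarbageCollectorMXBean API, the class name GcCounter is hypothetical, and collector names such as "PS Scavenge" and "PS MarkSweep" depend on which collector the JVM selected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcCounter {

    /** Collection count per collector, e.g. "PS Scavenge" (young) and "PS MarkSweep" (old) with the parallel collector. */
    static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<String, Long>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount()); // -1 if undefined for this collector
        }
        return counts;
    }

    /** Prints which collectors ran between two snapshots, timestamped to compare with scheduled jobs. */
    static void report(Map<String, Long> before, Map<String, Long> after) {
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long old = before.get(e.getKey());
            long delta = e.getValue() - (old == null ? 0L : old);
            if (delta > 0) {
                System.out.println(new Date() + " " + e.getKey() + " ran " + delta + " time(s)");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> before = snapshot();
        for (int i = 0; i < 3; i++) {
            Thread.sleep(1000); // in a real app, poll e.g. once a minute from a background thread
            Map<String, Long> after = snapshot();
            report(before, after);
            before = after;
        }
    }
}
```

In a long-running application the polling loop would run on a background thread; comparing the timestamps of old-collector increments with the scheduler's log shows whether the two coincide.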
Garbage collection is a pretty complicated topic, and while you could learn all the details about this, I think what’s happening in your case is pretty simple.
Sun’s Garbage Collection Tuning guide, under the “Explicit Garbage Collection” heading, warns:
applications can interact with garbage collection … by invoking full garbage collections explicitly … This can force a major collection to be done when it may not be necessary … One of the most commonly encountered uses of explicit garbage collection occurs with RMI … RMI forces full collections periodically
That guide says that the default time between garbage collections is one minute, but the sun.rmi Properties reference, under sun.rmi.dgc.server.gcInterval says:
The default value is 3600000 milliseconds (one hour).
If you’re seeing major collections every hour in one application but not another, it’s probably because the application is using RMI, possibly only internally, and you haven’t added -XX:+DisableExplicitGC to the startup flags.
Disable explicit GC, or test this hypothesis by setting -Dsun.rmi.dgc.server.gcInterval=7200000 and observing if GCs happen every two hours instead.
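A quick way to confirm which interval a given JVM was started with is to read the system property directly. This is a minimal sketch (the class name is hypothetical) that falls back to the one-hour default quoted above when the property is not set. It only reports the configured value and does not prove that RMI is actually active in the process.

```java
public class RmiGcInterval {

    // Default quoted from the sun.rmi properties reference above: one hour.
    static final long DOCUMENTED_DEFAULT_MS = 3600000L;

    /** Value of sun.rmi.dgc.server.gcInterval as seen by this process, or the documented default. */
    static long serverInterval() {
        // Long.getLong reads System.getProperty on every call and parses it as a long
        return Long.getLong("sun.rmi.dgc.server.gcInterval", DOCUMENTED_DEFAULT_MS);
    }

    public static void main(String[] args) {
        System.out.println("sun.rmi.dgc.server.gcInterval = " + serverInterval() + " ms");
    }
}
```

Started with -Dsun.rmi.dgc.server.gcInterval=7200000, it should report 7200000; if the major GCs then move to a two-hour cadence, RMI is the likely trigger.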
It depends on your configuration, since HotSpot configures itself differently in different Java environments. For example, on a machine with at least 2 GB of memory and two processors, some JVMs will run in '-server' mode instead of the default '-client' mode, which sizes the memory spaces (generations) differently and affects when garbage collection occurs.
A full GC can occur automatically, but also when you call the garbage collector from your code (e.g. using System.gc()). When it happens automatically depends on how the minor collections are behaving.
There are at least two algorithms being used. If you are using defaults, a copying algorithm is used for minor collections, and a mark-sweep algorithm for major collections.
A copying algorithm consists of copying used memory from one block to another, and then clearing the space containing the blocks with no references to them. The copying algorithm in the JVM uses a large area for objects that are created for the first time (called Eden), and two smaller ones (called survivors). Surviving objects are copied once from Eden and several times from the survivor spaces during each minor collection until they become tenured and are copied to another space (called tenured space) where they can only be removed in a major collection.
Most of the objects in Eden die quickly, so the first collection copies the surviving objects to the survivor spaces (which are much smaller by default). There are two survivor spaces, s1 and s2. Every time Eden fills, the surviving objects from Eden and s1 are copied to s2, and Eden and s1 are cleared. The next time, survivors from Eden and s2 are copied back to s1. Objects keep being copied back and forth between s1 and s2 until they reach a certain number of copies, until they no longer fit in the survivor space, or until some other criterion is met. Then they are copied to the tenured generation.
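The survivor ping-pong described above can be illustrated with a toy model. This is a simplified sketch of the idea, not HotSpot's implementation: objects are plain Java objects with an age counter, the tenuring threshold and survivor capacity are hypothetical parameters, and Eden plus the "from" survivor space are merged into one list.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of survivor aging and promotion; not how HotSpot is implemented internally. */
public class TenuringModel {

    static final class Obj {
        final String name;
        final boolean live; // does the application still reference it?
        int age;            // minor collections survived so far
        Obj(String name, boolean live) { this.name = name; this.live = live; }
    }

    final int tenuringThreshold;  // copies before promotion (cf. -XX:MaxTenuringThreshold)
    final int survivorCapacity;   // how many objects fit in one survivor space
    final List<Obj> young = new ArrayList<Obj>();   // Eden plus the "from" survivor space
    final List<Obj> tenured = new ArrayList<Obj>(); // old generation

    TenuringModel(int tenuringThreshold, int survivorCapacity) {
        this.tenuringThreshold = tenuringThreshold;
        this.survivorCapacity = survivorCapacity;
    }

    void allocate(Obj o) { young.add(o); }

    /** One minor GC: dead objects are left behind; live ones age and are copied or promoted. */
    void minorGc() {
        List<Obj> toSpace = new ArrayList<Obj>();
        for (Obj o : young) {
            if (!o.live) continue;                    // garbage: simply not copied
            o.age++;
            boolean oldEnough = o.age >= tenuringThreshold;
            boolean noRoom = toSpace.size() >= survivorCapacity;
            if (oldEnough || noRoom) tenured.add(o);  // promotion to the tenured space
            else toSpace.add(o);                      // copied into the other survivor space
        }
        young.clear();         // Eden and the "from" space are now empty
        young.addAll(toSpace); // the "to" space becomes the next "from" space
    }

    public static void main(String[] args) {
        TenuringModel m = new TenuringModel(3, 10);
        m.allocate(new Obj("session", true));
        m.allocate(new Obj("request-temp", false));
        for (int gc = 1; gc <= 3; gc++) {
            m.minorGc();
            System.out.println("after GC " + gc + ": young=" + m.young.size() + " tenured=" + m.tenured.size());
        }
    }
}
```

The noRoom branch models the "doesn't fit" case: when the survivor space is full, live objects are promoted early, which is one way the tenured space fills faster than expected.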
The tenured objects are not affected by the minor collections. They accumulate until the area gets full (or the garbage collector is called). Then the JVM will run a mark-sweep algorithm in a major collection which will preserve only the surviving objects that still have references.
If you have larger objects that don't fit into the survivors, they might be copied directly to the tenured space, which will fill more quickly and you will get major collections more frequently.
Also, the sizes of the survivor spaces, the number of copies between s1 and s2, the Eden size relative to s1 and s2, and the size of the tenured generation may all be configured differently in different environments by JVM ergonomics, which may automatically select -server or -client behavior. You might try running both JVMs as -server (or both as -client) and check whether they still behave differently.
Even if this will get down votes... My best guess (you will have to test this) would be that the heap needs to expand and when this happens a full gc will be triggered. Not all memory is allocated at once to JVM.
You can test this by setting -Xms and -Xmx to the same value, for example 7 GB each (-Xms7g -Xmx7g).
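To see from inside the application whether the heap can still be resized, compare the committed heap with its ceiling. This is a minimal sketch (class name hypothetical) using Runtime.totalMemory() and Runtime.maxMemory(). With -Xms equal to -Xmx, committed memory should stay at or very close to the maximum, although collectors may report slightly different figures.

```java
public class HeapHeadroom {

    /** True while committed heap is below the ceiling, i.e. the JVM may still grow the heap. */
    static boolean canStillGrow(long committedBytes, long maxBytes) {
        // maxMemory() returns Long.MAX_VALUE when there is no inherent limit
        return maxBytes == Long.MAX_VALUE || committedBytes < maxBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory(); // heap currently committed (starts near -Xms)
        long max = rt.maxMemory();         // ceiling the heap may grow to (close to -Xmx)
        System.out.printf("committed=%d MB max=%d MB canStillGrow=%b%n",
                committed >> 20, max >> 20, canStillGrow(committed, max));
    }
}
```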
Related
I would like to understand why GC gets triggered even though I have plenty of unused heap. I have allocated 1.7 GB of RAM, yet I often see 10% GC CPU usage.
I use -XX:+UseG1GC with Java 17.
JVMs always have some GC threads running (unless you use Epsilon GC, which performs no GC at all; I do not recommend it unless you know why you need it), because the JVM manages memory for you.
In G1 the heap is divided into two spaces: young and old. Objects are created in the young space (G1 allocates very large "humongous" objects directly in old regions). When the young space fills up (it always does eventually, unless your application produces zero garbage), a GC is triggered that cleans unreferenced objects from the young space and promotes some objects that are still referenced to the old space.
The spikes in the right-hand screenshot correspond to young collection events (where unreferenced objects get cleaned). The young space is usually much smaller than the old space, so it fills frequently. That is why you see those spikes even though much more memory is free.
DISCLAIMER: This is a very high-level explanation of memory management in the JVM. Some important concepts have not been mentioned.
You can read more about the G1 collector here.
Also take a look at the jstat tool, which will help you understand what is happening in your heap.
(Committed and Max lines are the same)
I am looking at the memory usage of a Java application in New Relic. Here are several questions:
# 1
The committed PS Survivor Space heap varied over the past few days. Shouldn't it be constant, since it is configured by the JVM?
# 2
From what I am understanding, the heap memory should decrease when there is a garbage collection. The memory of Eden could decrease when a major gc or a minor gc happens, while the memory of Old could decrease when a major gc happens.
But if you look at the Old memory usage, sometime between June 6th and 7th the memory went up and later went down. This means a major GC happened, right? However, there was still lots of unused memory left; it didn't seem to come close to the limit. How was the major GC triggered, then? The same goes for Eden memory usage: it never reached the limit, but it still decreased.
The application fetches a file from other places. This file could be large and be processed in memory. Could this explain the issue above?
You need to provide more information about your configuration to answer this definitively. I will assume you are using the HotSpot JVM from Oracle with the G1 collector. Posting the flags you start the JVM with would also be useful.
The key term here is 'committed'. This is memory reserved by the JVM, but not necessarily in use (or even mapped to physical pages, it's just a range of virtual memory that can be used by the JVM). There's a good description of this in the MemoryUsage class of the java.lang.management package (check the API docs). It says, "committed represents the amount of memory (in bytes) that is guaranteed to be available for use by the Java virtual machine. The amount of committed memory may change over time (increase or decrease). The Java virtual machine may release memory to the system..." This is why you see it change.
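The used/committed/max distinction can be observed from inside the application with the same java.lang.management API. A minimal sketch (class name hypothetical); note that getMax() may return -1 when no limit is defined.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class CommittedVsUsed {

    /** Formats a MemoryUsage in KB; getMax() is -1 when the limit is undefined. */
    static String describe(MemoryUsage u) {
        String max = u.getMax() < 0 ? "undefined" : (u.getMax() / 1024) + "K";
        return "used=" + (u.getUsed() / 1024) + "K committed=" + (u.getCommitted() / 1024) + "K max=" + max;
    }

    public static void main(String[] args) {
        // used <= committed always holds; committed may grow or shrink over time (up to max, if defined)
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
        System.out.println("heap:     " + describe(heap));
        System.out.println("non-heap: " + describe(nonHeap));
    }
}
```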
Assuming you are using G1, the collector performs incremental compaction. You are correct that if the collector could not keep up with allocation in the old gen and was running low on space, it would perform a full compacting collection. That is not happening here, as the last graph shows you are using nowhere near the allocated heap space. Instead, to avoid a full collection, G1 marks live objects concurrently with your application and then reclaims and compacts old regions incrementally during its regular (mixed) collections. This is why you see usage go up (as your application instantiates more objects) and then go down (as the G1 collector reclaims space from objects that are no longer needed). For a more detailed explanation of how G1 works, there is a good read in the documentation: https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/g1_gc.html.
We have a fairly big application running on a JBoss 7 application server. In the past, we were using ParallelGC but it was giving us trouble in some servers where the heap was large (5 GB or more) and usually nearly filled up, we would get very long GC pauses frequently.
Recently, we made improvements to our application's memory usage and in a few cases added more RAM to some of the servers where the application runs, but we also started switching to G1 in the hopes of making these pauses less frequent and/or shorter. Things seem to have improved but we are seeing a strange behaviour which did not happen before (with ParallelGC): the Perm Gen seems to fill up pretty quickly and once it reaches the max value a Full GC is triggered, which usually causes a long pause in the application threads (in some cases, over 1 minute).
We have been using 512 MB of max perm size for a few months, and during our analysis the perm size would usually stop growing at around 390 MB with ParallelGC. After we switched to G1, however, the behaviour above started happening. I tried increasing the max perm size to 1 GB and even 1.5 GB, but the Full GCs still happen (they are just less frequent).
In this link you can see some screenshots of the profiling tool we are using (YourKit Java Profiler). Notice how when the Full GC is triggered the Eden and the Old Gen have a lot of free space, but the Perm size is at the maximum. The Perm size and the number of loaded classes decrease drastically after the Full GC, but they start rising again and the cycle is repeated. The code cache is fine, never rises above 38 MB (it's 35 MB in this case).
Here is a segment of the GC log:
2013-11-28T11:15:57.774-0300: 64445.415: [Full GC 2126M->670M(5120M), 23.6325510 secs]
[Eden: 4096.0K(234.0M)->0.0B(256.0M) Survivors: 22.0M->0.0B Heap: 2126.1M(5120.0M)->670.6M(5120.0M)]
[Times: user=10.16 sys=0.59, real=23.64 secs]
You can see the full log here (from the moment we started up the server, up to a few minutes after the full GC).
Here's some environment info:
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
Startup options: -Xms5g -Xmx5g -Xss256k -XX:PermSize=1500M -XX:MaxPermSize=1500M -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -Xloggc:gc.log
So here are my questions:
Is this the expected behaviour with G1? I found another post on the web of someone questioning something very similar and saying that G1 should perform incremental collections on the Perm Gen, but there was no answer...
Is there something I can improve/correct in our startup parameters? The server has 8 GB of RAM, and it doesn't seem we are lacking hardware: the application performs fine until a full GC is triggered, which is when users experience big lags and start complaining.
Causes of growing Perm Gen
Lots of classes, especially JSPs.
Lots of static variables.
There is a classloader leak.
For those that don't know, here is a simple way to think about how the PermGen fills up. The Young Gen doesn't get enough time to let things expire, so they get moved up to Old Gen space. The Perm Gen holds the classes for the objects in the Young and Old Gen. When the objects in the Young or Old Gen get collected and the class is no longer referenced, it gets 'unloaded' from the Perm Gen. If the Young and Old Gen don't get GC'd, then neither does the Perm Gen, and once it fills up it needs a full stop-the-world GC. For more info see Presenting the Permanent Generation.
Switching to CMS
I know you are using G1 but if you do switch to the Concurrent Mark Sweep (CMS) low pause collector -XX:+UseConcMarkSweepGC, try enabling class unloading and permanent generation collections by adding -XX:+CMSClassUnloadingEnabled.
The Hidden Gotcha
If you are using JBoss, RMI/DGC has the gcInterval set to 1 min. The RMI subsystem forces a full garbage collection once per minute. This in turn forces promotion instead of letting it get collected in the Young Generation.
You should change this to at least 1 hr, if not 24 hrs, so that the GC can do proper collections.
-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000
List of every JVM option
To see all the options, run this from the cmd line.
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version
If you want to see what JBoss is using, add the following to your bin/standalone.conf.bat (the JAVA_OPTS line below uses Windows batch syntax). You will get a list of every JVM option and what it is set to. NOTE: the flags must be passed to the JVM you want to inspect. If you run the command externally, you won't see what is happening in the JVM that JBoss is running in.
set "JAVA_OPTS= -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal %JAVA_OPTS%"
There is a shortcut to use when we are only interested in the modified flags.
-XX:+PrintCommandLineFlags
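Since the note above stresses that the flags must be read from the JVM that JBoss is actually running in, another option is to query that JVM from code deployed in it. This sketch is an illustration (class name hypothetical) and relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, so it assumes a HotSpot JVM on Java 7 or later. getInputArguments() returns only what was passed on the command line, whereas getVMOption() reports the effective value of one flag and where that value came from.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.List;

public class ShowJvmArgs {

    /** Flags as passed on this JVM's command line (not the defaults chosen by ergonomics). */
    static List<String> jvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Effective value of one HotSpot flag, e.g. "MaxHeapSize"; throws IllegalArgumentException if unknown. */
    static String flag(String name) {
        HotSpotDiagnosticMXBean hs = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = hs.getVMOption(name);
        return option.getValue() + " (origin: " + option.getOrigin() + ")";
    }

    public static void main(String[] args) {
        for (String arg : jvmArgs()) {
            System.out.println("arg: " + arg);
        }
        System.out.println("MaxHeapSize = " + flag("MaxHeapSize"));
    }
}
```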
Diagnostics
Use jmap to determine which classes are consuming permanent generation space. The output shows:
class loader
# of classes
bytes
parent loader
alive/dead
type
totals
jmap -permstat JBOSS_PID >& permstat.out
JVM Options
These settings worked for me, but how your system is set up and what your application is doing will determine whether they are right for you.
-XX:SurvivorRatio=8 – Sets survivor space ratio to 1:8, resulting in larger survivor spaces (the smaller the ratio, the larger the space). The SurvivorRatio is the size of the Eden space compared to one survivor space. Larger survivor spaces allow short lived objects a longer time period to die in the young generation.
-XX:TargetSurvivorRatio=90 – Allows 90% of the survivor spaces to be occupied instead of the default 50%, allowing better utilization of the survivor space memory.
-XX:MaxTenuringThreshold=31 – To prevent premature promotion from the young to the old generation. Allows short lived objects a longer time period to die in the young generation (and hence, avoid promotion). A consequence of this setting is that minor GC times can increase due to additional objects to copy. This value and survivor space sizes may need to be adjusted so as to balance overheads of copying between survivor spaces versus tenuring objects that are going to live for a long time. The default settings for CMS are SurvivorRatio=1024 and MaxTenuringThreshold=0 which cause all survivors of a scavenge to be promoted. This can place a lot of pressure on the single concurrent thread collecting the tenured generation. Note: when used with -XX:+UseBiasedLocking, this setting should be 15.
-XX:NewSize=768m – allow specification of the initial young generation sizes
-XX:MaxNewSize=768m – allow specification of the maximum young generation sizes
Here is a more extensive JVM options list.
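The sizing implied by the options above can be worked out directly. Taking SurvivorRatio as Eden size divided by one survivor space (as described for -XX:SurvivorRatio), the young generation is Eden plus two survivors, so each survivor is young / (ratio + 2). This sketch (class name hypothetical) ignores the alignment and rounding HotSpot applies; with NewSize=768m and SurvivorRatio=8 it gives roughly 614 MB of Eden and about 77 MB per survivor space.

```java
public class SurvivorSizing {

    /** One survivor space: young = eden + 2 * survivor, and eden = ratio * survivor. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden gets what is left after the two survivor spaces (so integer rounding goes to Eden). */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    /** Survivor occupancy targeted by -XX:TargetSurvivorRatio (a percentage). */
    static long targetSurvivorOccupancy(long survivorBytes, int targetSurvivorRatio) {
        return survivorBytes * targetSurvivorRatio / 100;
    }

    public static void main(String[] args) {
        long young = 768L * 1024 * 1024; // -XX:NewSize=768m / -XX:MaxNewSize=768m from the list above
        long survivor = survivorBytes(young, 8);
        long eden = edenBytes(young, 8);
        System.out.printf("eden=%d MB, each survivor=%d MB, target occupancy=%d MB%n",
                eden >> 20, survivor >> 20, targetSurvivorOccupancy(survivor, 90) >> 20);
    }
}
```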
Is this the expected behaviour with G1?
I don't find it surprising. The base assumption is that stuff put into permgen almost never becomes garbage, so you'd expect permgen GC to be a "last resort", i.e. something the JVM only does when it is forced into a full GC. (OK, this argument is nowhere near a proof ... but it's consistent with the following.)
I've seen lots of evidence that other collectors have the same behaviour; e.g.
permgen garbage collection takes multiple Full GC
What is going on with java GC? PermGen space is filling up?
I found another post on the web of someone questioning something very similar and saying that G1 should perform incremental collections on the Perm Gen, but there was no answer...
I think I found the same post. But someone's opinion that it ought to be possible is not really instructive.
Is there something I can improve/correct in our startup parameters?
I doubt it. My understanding is that this is inherent in the permgen GC strategy.
I suggest that you either track down and fix what is using so much permgen in the first place ... or switch to Java 8 in which there isn't a permgen heap anymore: see PermGen elimination in JDK 8
While a permgen leak is one possible explanation, there are others; e.g.
overuse of String.intern(),
application code that is doing a lot of dynamic class generation; e.g. using DynamicProxy,
a huge codebase ... though that wouldn't cause permgen churn as you seem to be observing.
I would first try to find the root cause for the PermGen getting larger before randomly trying JVM options.
You could enable class-loading logging (-verbose:class, -XX:+TraceClassLoading, -XX:+TraceClassUnloading, ...) and check the output.
In your test environment, you could try monitoring (over JMX) when classes get loaded (java.lang:type=ClassLoading LoadedClassCount). This might help you find out which part of your application is responsible.
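Besides attaching JConsole, the same counters can be read in-process. This is a sketch (class name hypothetical) that reads LoadedClassCount by its JMX name, as a JMX client would, and also uses the typed ClassLoadingMXBean. If "currently loaded" keeps climbing while "unloaded" stays flat across full GCs, that is consistent with, though not proof of, a classloader leak.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ClassCountWatch {

    /** Reads java.lang:type=ClassLoading / LoadedClassCount by name from the platform MBean server. */
    static int loadedViaJmx() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("java.lang:type=ClassLoading");
            return (Integer) server.getAttribute(name, "LoadedClassCount");
        } catch (JMException e) {
            throw new IllegalStateException("ClassLoading MBean not readable", e);
        }
    }

    public static void main(String[] args) {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        System.out.println("currently loaded:     " + cl.getLoadedClassCount());
        System.out.println("loaded since start:   " + cl.getTotalLoadedClassCount());
        System.out.println("unloaded since start: " + cl.getUnloadedClassCount());
        System.out.println("via JMX attribute:    " + loadedViaJmx());
    }
}
```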
You could also try listing all the classes using the JVM tools (sorry but I still mostly use jrockit and there you would do it with jrcmd. Hope Oracle have migrated those helpful features to Hotspot...)
In summary, find out what generates so many classes and then think how to reduce that / tune the gc.
Cheers,
Dimo
I agree with the answer above in that you should really try to find what is actually filling your permgen, and I'd heavily suspect it's about some classloader leak that you want to find a root cause for.
There's a thread in the JBoss forums that goes through a couple of such diagnosed cases and how they were fixed. This answer and this article discuss the issue in general as well. The article mentions what is possibly the easiest test you can do:
Symptom
This will happen only if you redeploy your application without restarting the application server. The JBoss 4.0.x series suffered from just such a classloader leak. As a result I could not redeploy our application more than twice before the JVM would run out of PermGen memory and crash.
Solution
To identify such a leak, un-deploy your application and then trigger a full heap dump (make sure to trigger a GC before that). Then check if you can find any of your application objects in the dump. If so, follow their references to their root, and you will find the cause of your classloader leak. In the case of JBoss 4.0 the only solution was to restart for every redeploy.
This is what I'd try first, IF you think redeployment might be related. This blog post is an earlier one that does the same thing but also discusses the details. Based on your post, though, it may be that you're not actually redeploying anything and permgen is just filling up by itself. In that case, examining the classes and anything else added to permgen might be the way to go (as already mentioned in the previous answer).
If that doesn't give more insight, my next step would be to try the Plumbr tool. They also offer a sort of guarantee that it will find the leak for you.
You should start the java command in your server.bat with -verbose:gc.
I think I have a memory leak.
(they say the first step is admitting the problem, right?)
Anyway, I think I do: see the attached image of heap usage by region.
Green is Eden, blue/red is S0/S1, purple is old. I have unlimited tenuring (>15), lots of time passed between memory being allocated and it spilling to old gen. Hence - a memory leak. I think.
So, the question: how can I analyze what is leaking? As you can see, my Eden is very active. Lots of objects are being created and destroyed all the time.
Is there a way of taking a heap dump of the old gen only? Or somehow identify the old gen in a full heap dump (if so, with what tool)?
Edit 1:
Clarification: I'm not doing anything that should retain objects in memory. Everything I allocate after the initial startup should die young.
Edit 2:
New findings: I took a heap dump, GCed like crazy and took another. The second one shows a significantly reduced level of old gen usage. The main difference between the two were objects held by finalizers.
Don't finalizers run in young GC cycles? Do they always wait for a full GC to be cleaned?
Seeing some things propagate to the old gen isn't a huge concern. After your old gen reaches a certain threshold, a full GC will kick off. If that isn't able to reclaim the memory, then you have an issue. The fact that some memory is promoted during a young collection shouldn't be alarming.
lots of time passed between memory being allocated and it spilling to old gen. Hence - a memory leak. I think
Not really. Just because memory is being added to the old gen doesn't mean there is a memory leak. It is normal for older objects to be promoted to the old gen during young collections. This may just be your application still ramping up: in large-scale applications there may be features that are not used every day, which may get loaded into memory later than you expected.
That being said, if you really are concerned with any memory being added to the old gen and want to investigate further, I would recommend running this application on a demo environment. Attach a profiler (VisualVM will work) and load test (JMeter is good and free) your application. If you look at the objects you can get an idea of what generation an object is. You also want to see what happens when your old gen reaches a threshold where a full GC will kick off (normally in the 70%-90% range). If your old gen recovers back to the 20% threshold, then there is no leak. In some cases the old gen may never reach the point where a full GC gets kicked off, but instead level off as you expected. The load test will help identify that.
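One way to watch the "does the old gen recover?" signal without a profiler is to read each heap pool's usage as of its most recent collection. This sketch (class name hypothetical) uses MemoryPoolMXBean.getCollectionUsage(), which returns null for pools that do not support it. Pool names vary by collector, and before the first collection of a pool the figures may simply be zero.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class AfterGcOccupancy {

    /** Percent of max used right after the pool's last GC, or -1 if not available. */
    static double percentAfterLastGc(MemoryUsage afterGc) {
        if (afterGc == null || afterGc.getMax() <= 0) return -1;
        return 100.0 * afterGc.getUsed() / afterGc.getMax();
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            // getCollectionUsage(): usage as of the most recent GC of this pool (null if unsupported)
            MemoryUsage afterGc = pool.getCollectionUsage();
            System.out.printf("%-25s after last GC: %.1f%%%n", pool.getName(), percentAfterLastGc(afterGc));
        }
    }
}
```

Logged periodically, an old-gen figure that keeps rising from one full GC to the next is the "doesn't recover" case; one that drops back to a stable baseline is not.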
If it doesn't recover and you confirm you have a memory leak then you will want to capture a heap dump (hprof) and use a tool like MAT (Memory Analyzer Tool) to analyze the dump to find the culprit.
Using JVisualVM (part of the JDK since Java 6 Build 10 or something like that), you can look at the TYPE of objects that are in memory. That will help you track down where the leak is. Of course, it takes a lot of digging into the code, but it's the best tool I've used that is always available and reliable.
Watch out for objects being passed around, it could be that you have a handle that's being kept in a list or array that's not being cleared out. I find that if I watch the number of objects being created, and kept, in JVisualVM over a period of a few minutes, I usually get an idea of where in the code to go dig for the offending objects not being released.
I've read a few articles, and I understood the following (please correct me and/or edit the question if I'm wrong):
The java heap is segmented like this:
Young Generation: objects that are created go here, this part is frequently and inexpensively garbage collected
Old Generation: objects that survive the garbage collections of the Young generation go here, this area is garbage collected less frequently and using a more CPU demanding process/algorithm (I believe it's called mark-sweep)
Edit: as stated by another user, PermGen is not a part of the region called heap
PermGen: this area is filled with your app's class metadata and many other things that do not depend on application usage.
So, knowing this... why does my PermGen space grow when the app is under heavy load? From what I said above, this space should not keep filling up as the app load increases, but as I said at the beginning, I'm probably wrong about some assumptions.
In fact, if the PermGen space is growing, is there a way to garbage collect or reset it?
Actually, in Sun's JVM Permanent Generation (PermGen) is completely separate from the heap. Are you sure you aren't looking at the Tenured Generation? It would be suspicious indeed if your Permanent Generation kept growing.
If your perm gen IS growing constantly, it is a difficult area to dig into. Generally it should grow when new classes are loaded for the first time (and potentially certain uses of reflection could also cause this). Interned strings are also stored in perm gen.
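For reference, this is what interning does: String.intern() returns the single canonical copy of a string from a JVM-wide pool, which is why interning many distinct strings makes that pool grow. In HotSpot up to JDK 6 the pool lived in the permanent generation; JDK 7 moved it to the main Java heap. A minimal illustration (class name hypothetical):

```java
public class InternDemo {
    public static void main(String[] args) {
        String literal = "permgen";                                        // string literals are interned
        String built = new StringBuilder("perm").append("gen").toString(); // new object, not interned
        System.out.println(built == literal);          // false: two distinct String objects
        System.out.println(built.equals(literal));     // true: same characters
        System.out.println(built.intern() == literal); // true: intern() returns the pooled instance
    }
}
```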
If you happen to be on Solaris, you could use jmap -permstat to dump out perm gen statistics, but that option does not appear to be available on Windows (and potentially other platforms). Here is the documentation on jmap for Java 6
From Sun's guide on JConsole (which will let you view the size of these pools):
For the HotSpot Java VM, the memory pools for serial garbage collection are the following.
Eden Space (heap): The pool from which memory is initially allocated for most objects.
Survivor Space (heap): The pool containing objects that have survived the garbage collection of the Eden space.
Tenured Generation (heap): The pool containing objects that have existed for some time in the survivor space.
Permanent Generation (non-heap): The pool containing all the reflective data of the virtual machine itself, such as class and method objects. With Java VMs that use class data sharing, this generation is divided into read-only and read-write areas.
Code Cache (non-heap): The HotSpot Java VM also includes a code cache, containing memory that is used for compilation and storage of native code.
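The pools JConsole shows can also be listed programmatically. A minimal sketch (class name hypothetical); the exact pool names depend on the collector and JDK version, and JDK 8 and later report Metaspace instead of a permanent generation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class ListPools {

    /** One line per memory pool: type (HEAP or NON_HEAP), name, and current usage. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            String used = usage == null ? "n/a" : (usage.getUsed() >> 10) + "K";
            lines.add(pool.getType() + "  " + pool.getName() + "  used=" + used);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```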
The most common causes I've seen are:
Custom classloaders that don't carefully free up older classes after loading new ones.
Classes remaining in PermGen after redeploying an application multiple times (more common in Dev than Prod)
Heavy use of Proxy classes, which are created synthetically at runtime. It's easy to create new Proxy classes when a single class definition could be reused for multiple instances.
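The Proxy point can be demonstrated directly: java.lang.reflect.Proxy caches the generated class per class loader and interface list, so creating many proxy instances for the same interface and loader reuses one class. The leak pattern is generating proxies through fresh class loaders (or fresh interface combinations) instead. The Greeter interface and the class name here are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class ProxyReuse {

    /** Hypothetical interface used only for this demo. */
    public interface Greeter {
        String greet(String name);
    }

    /** Creates a new proxy instance; the proxy class itself comes from Proxy's per-loader cache. */
    static Greeter newGreeter(final String prefix) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) {
                if (!"greet".equals(method.getName())) {
                    // toString/equals/hashCode also arrive here; this demo does not support them
                    throw new UnsupportedOperationException(method.getName());
                }
                return prefix + args[0];
            }
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] {Greeter.class}, handler);
    }

    public static void main(String[] args) {
        Greeter a = newGreeter("Hello, ");
        Greeter b = newGreeter("Hi, ");
        System.out.println(a.greet("Ann"));               // Hello, Ann
        System.out.println(a.getClass() == b.getClass()); // true: one cached proxy class, two instances
    }
}
```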
This is one of the more annoying problems to debug. There are a lot of reasons you could be seeing growing permgen use. Here are 2 links I found very useful in both understanding how leaks happen as well as tracking down what is causing them.
http://frankkieviet.blogspot.com/2006/10/how-to-fix-dreaded-permgen-space.html
http://frankkieviet.blogspot.com/2006/10/classloader-leaks-dreaded-permgen-space.html
Are you doing something funky with the classloader chain? Are you calling intern() on a bunch of strings?
If you are working with Java EE application it's probably a classloader leak.
You might find the following additional links useful:
http://www.zeroturnaround.com/blog/rjc201/
http://www.ibm.com/developerworks/java/library/j-dclp3/index.html
The most common causes I've seen are:
Java classes are loaded
JAXBContext.newInstance
String.intern()
This is a very common problem when you are manipulating the classloader. This is seen a lot in Java EE apps when you are redeploying hibernate/cglib. For more info check out
http://opensource.atlassian.com/confluence/spring/display/DISC/Memory+leak+-+classloader+won%27t+let+go