High GC pause time in GC logs [duplicate] - java

We have a Java-based web application running on JBoss with a maximum heap size of about 1.2 GB (total machine physical memory is 2 GB). At some point the application stops responding (to clients) for several minutes. After some analysis we found out that the culprit is the Full GC. Here's an excerpt from the verbose GC log:
74477.402: [Full GC [PSYoungGen: 3648K->0K(332160K)] [PSOldGen: 778476K->589497K(819200K)] 782124K->589497K(1151360K) [PSPermGen: 102671K->102671K(171328K)], 646.1546860 secs] [Times: user=3.84 sys=3.72, real=646.17 secs]
What I don't understand is how the real time spent on the Full GC can be about 11 minutes (646 seconds) while the user+sys times are just 7.5 seconds. 7.5 seconds sounds like a much more reasonable amount of time for cleaning <200 MB out of the old generation. Where does all the other time go?
Thanks a lot.

Where does all the other time go?
It is most likely that your application is causing virtual memory thrashing. Basically, your application needs significantly more pages of virtual memory than there are physical pages available to hold them. As a result, it spends most of its time waiting for virtual-memory pages to be read from and written to disk.
For more information, read this Wikipedia page.
The cure is to either reduce virtual memory usage or increase the amount of physical memory on the system. For example, you could:
run fewer applications on the machine,
reduce Java application heap sizes, or
if you are running in a virtual machine, increase the VM's allocation of physical memory.
(Note however that reducing the JVM heap size can be a two-edged sword. If you reduce the heap size too much the application will either die from OutOfMemoryErrors, spend too much time garbage collecting, or suffer from not being able to cache things effectively.)
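As a concrete illustration of the second option (the numbers here are purely hypothetical, not a recommendation): on a 2 GB machine you might trial a reduced heap such as

-Xms768m -Xmx768m

and then watch the GC log together with the OS paging statistics to confirm the thrashing stops before settling on final sizes.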

Related

Full GC does not seem to execute (out of memory)

I have 2 web servers (4 cores / 16 GB RAM / CentOS 7) behind a load balancer using a round-robin algorithm.
The applications are built with Java, running on Apache/Tomcat.
The servers, Apache, Tomcat and the webapps have the same configuration, with heap size: -Xms12840m -Xmx12840m
The problem is that the 1st server runs out of memory: the kernel kills the Java process (OOM killer), while the 2nd server is more stable.
I tried to monitor and analyse a heap dump using VisualVM, and also the GC with jstat.
In the heap dump I didn't find any memory leak, which does not mean that there isn't one.
But with VisualVM / Monitor I can observe that a full GC is done on the 2nd server when the old generation is full, which is not the case on the 1st server. In fact, it seems the first server is constantly busier than the second despite the round-robin algorithm used by the load balancer.
So, on the 1st server it seems that the JVM does not have time to perform a full GC before the out of memory.
By default, the ratio between the young/old generation is 1:2
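(With the default NewRatio of 2 and -Xmx12840m, that works out to roughly 4280m of young generation and about 8560m of old generation.)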
Minor GCs on the young generation are fine: when Eden is full, a minor GC is done. But when the old generation grows to near 100%, there is no full GC.
So, how can I tune the GC in order to avoid the out of memory?
Why is the full GC not done on server 1?
Is it because of a peak of requests on the server, so that the JVM is not able to perform a full GC in time?
Thanks for your help.

Unexpected Full GC (Ergonomic)

I'm running an application that consists of 3 stages: load initial data, update it using POST HTTP requests, and serve it using GET requests.
After all data is set up (initial load + POST updates) I call System.gc() and I see
[Full GC (System.gc()) 2051102K->1082623K(3298304K), 13.3521960 secs]
So I expect that I need about 1000M for the old gen, and I run my app with this setup (in a 4 GB RAM Docker container):
java -XX:MaxNewSize=2550m -XX:NewSize=2550m -Xms3750m -Xmx3750m -XX:+PrintGCTimeStamps -XX:+PrintGC -cp my.jar my.My
So I expect that, of the 3750M of heap, 2550M will be used as new space to serve GET requests and 1200M will be left for the old gen.
While my application is running I get
[Full GC (Ergonomics) 2632447K->1086729K(3298304K), 2.3360398 secs]
So it looks very strange to me that a full GC is triggered. My understanding is that, because GET requests don't add any long-lived data, only minor GCs should be needed when the new generation is full. So I don't understand:
1) Why is it running Full GC (Ergonomics)? Does it want to change the generation layout?
2) Why is it running when only 2632447K of 3298304K are occupied?
A few points:
GET/POST has nothing to do with it; the logic you implemented behind GET/POST is what matters, e.g. the GET handler might have a memory leak.
Growing the heap doesn't necessarily reduce GC (it doesn't scale linearly).
A full GC usually freezes for roughly ~1 second per live GB, so the fact that you're seeing a 13-second freeze for about 1 GB of live data is worth checking.
The next step should be enabling the GC log and analysing it to get a better understanding of the issue.
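For example, adding something along these lines to the command line (these are the same family of flags as the -XX:+PrintGC option already in use; the log path is just a placeholder) would produce a log that GC analysis tools can read:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Xloggc:gc.log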

JVM heap used percentage - when to generate alert

We have an application which is deployed on a Tomcat 8 application server, and the monitoring server (Zabbix) is currently configured to generate an alert if heap memory is 90% utilized.
Certain alerts were generated, which prompted us to do a heap dump analysis. Nothing really came out of the heap dump; there was no memory leak. There were a lot of unreachable objects which had not been cleaned up because no GC had run.
JVM configurations:
-Xms8192m -Xmx8192m -XX:PermSize=128M -XX:MaxPermSize=256m
-XX:+UseParallelGC -XX:NewRatio=3 -XX:+PrintGCDetails
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/app/apache-tomcat-8.0.33
-XX:ParallelGCThreads=2
-Xloggc:/app/apache-tomcat-8.0.33/logs/gc.log
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps -XX:GCLogFileSize=50m -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=30
We tried running garbage collection manually using the jcmd command, and it cleared up the memory. GC log after running jcmd:
2016-11-04T03:06:31.751-0400: 1974627.198: [Full GC (System.gc()) [PSYoungGen: 18528K->0K(2049024K)] [ParOldGen: 5750601K->25745K(6291456K)] 5769129K->25745K(8340480K), [Metaspace: 21786K->21592K(1069056K)], 0.1337369 secs] [Times: user=0.19 sys=0.00, real=0.14 secs]
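(The System.gc() tag in the log line above is what a manually requested collection looks like; with jcmd this is typically triggered with something like: jcmd <pid> GC.run)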
Questions:
Is there any configuration above due to which GC is not running automatically?
What is the reason for this behavior? I understand that Java will do GC when it needs to. But if it is not running GC even when the heap is 90% utilized, what should the alert threshold be (and does it even make sense to have an alert based on heap utilization)?
When the garbage collector decides to collect differs per garbage collector. I have not been able to find any hard promises about when your (parallel) garbage collector runs. Many garbage collectors also tune themselves on several different variables, which can influence when they run.
As you have noted yourself, your application can have high heap usage and still run fine. What you are looking for in an application is that the garbage collector is still efficient, meaning it can clean up quite a lot of garbage in a single run.
Some aspects of garbage collection
Most garbage collectors have two or more strategies, one for 'young' objects and one for 'old' objects. When a young object has not been collected in the latest (several) collections, it becomes an old object. The idea behind this is that if an object has not been collected yet, it probably won't be collected next time either (most objects either live really short or really long). The garbage collector does a very efficient, but not perfect, cleaning of the young objects. When that doesn't free up enough memory, a more costly garbage collection is done on all (young and old) objects.
This will often generate a saw-tooth pattern in a graph of heap usage over time (example taken from this site):
Here you see many small drops in heap size and a slowly growing heap. Every now and then a large collection is done and there is a large drop. The actual 'used' memory is the amount of memory left after a large collection.
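As a rough sketch of the young/old idea (the class and names here are purely illustrative):

import java.util.HashMap;
import java.util.Map;

public class GenerationSketch {
    // Long-lived: survives many minor GCs and is eventually promoted to the old generation.
    private static final Map<Integer, String> CACHE = new HashMap<>();

    public static void main(String[] args) {
        for (int i = 0; i < 1_000_000; i++) {
            // Short-lived: becomes garbage almost immediately and is reclaimed cheaply
            // by young-generation (minor) collections.
            String temp = "request-" + i;
            if (i % 10_000 == 0) {
                CACHE.put(i, temp); // only a small fraction is retained long enough to be promoted
            }
        }
        System.out.println("retained entries: " + CACHE.size());
    }
}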
Aspects to measure
This leads to the following aspects you can look at when determining the health of your application:
The amount of time spent by your application garbage collecting (both in total and as a percentage of CPU time).
The amount of memory available right after a garbage collect.
A quick increase in the number of large garbage collections.
In most cases you will need to monitor the behavior of your application under load to see what good values are for you.
The parallel garbage collector uses a similar condition to determine if all is still well:
If more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, then an OutOfMemoryError is thrown.
All of these statistics you can see nicely using VisualVM and JConsole. I am not sure which of them you can use as triggers in your monitoring tools.
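One low-overhead way to watch the same numbers on the server itself (assuming the JDK tools are available there) is jstat, for example:

jstat -gcutil <pid> 5000

This prints the occupancy of each generation as a percentage, plus cumulative GC counts and times, every 5 seconds. A full-GC time column that keeps climbing while the old-generation column barely drops after each collection is a much better alerting signal than raw heap utilization.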

GC spinning all the time despite much free heap

I have an application running with -mx7000m. I can see it has allocated a 5.5 GB heap. Yet for some reason it's GCing constantly, and with CMS that turns out to be quite CPU intensive. So it already has 5.5 GB of heap allocated, but somehow it's spending all the CPU time trying to keep the used heap as small as possible, which is around 2 GB - and the other 3.5 GB are allocated by the JVM but unused.
I have a few servers with the exact same configuration and 2 out of 5 are behaving this way. What could explain it?
It's Java 6, the flags are: -server -XX:+PrintGCDateStamps -Duser.timezone=America/Chicago -Djava.awt.headless=true -Djava.io.tmpdir=/whatever -Xloggc:logs/gc.log -XX:+PrintGCDetails -mx7000m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC
The default threshold for triggering a CMS GC is 70% full (in Java 6). A rule of thumb is that the heap size should be about 2.5x the heap used after a full GC (but your use case is likely to be different).
So in your case, say you have
- 2.5 GB of young generation space
- 3 GB of tenured space.
When your tenured space reaches 70% full, or ~2.1 GB, the JVM will start cleaning up that region.
The setting involved is -XX:CMSInitiatingOccupancyFraction=70.
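If you want the collector to honour that threshold strictly, rather than treat it only as a hint for its own heuristics, it is usually combined with a second flag, for example:

-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly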
However, if you want to reduce the impact of GC, the simplest thing to do is to create less garbage, i.e. use a memory profiler and ensure your allocation rate is as low as possible. Your system will behave very differently if you are creating as much garbage as the CPUs can handle versus, say, 100 MB/s, 1 MB/s or less.
The reason your servers behave differently might be that the relative sizes of the regions are different; if, say, you had 0.5 GB young and 5.0 GB tenured, you shouldn't be seeing this. The difference could also be purely down to how busy the machine was when you started the process, or what it has done since then.

Optimizing Tomcat / Garbage Collection

Our server has 128GB of RAM and 64 cores, running Tomcat 7.0.30 and Oracle jdk1.6.0_38, on CentOS 6.3.
Every 60 minutes we were seeing a garbage collection that took 45-60 seconds. Adding -XX:-UseConcMarkSweepGC increased page load times by about 10% but got that down to about 3 seconds, which is an acceptable trade-off.
Our config:
-Xms30g
-Xmx30g
-XX:PermSize=8g
-XX:MaxPermSize=8g
-Xss256k
-XX:-UseConcMarkSweepGC
We set the heap at 30 GB to keep 32-bit addressing (I read that above 32 GB, 64-bit addressing takes up more memory, so you have to go to about 48 GB to see improvements).
Using VisualGC I can see that the Eden space is cycling through every 30 - 60 minutes, but not much happens with the Survivor 0, Survivor 1, Old Gen, and Perm Gen.
We have a powerful server. What other optimizations can we make to further decrease the 3 second GC time?
Any recommendations to improve performance or scaling?
Any other output or config info that would help?
It might sound counter-intuitive, but have you tried allocating a lot less memory? E.g. do you really need a 30 GB heap? If you can get along with 4 GB or even less, garbage collection might be more frequent, but when it happens it will be a lot faster. Typically I find this more desirable than allocating a lot of memory and suffering from the time it takes to clean it up.
Even if this does not help you because you really need 30 GB of memory, others might come along with a similar problem and might benefit from allocating less.
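As a purely hypothetical starting point for such an experiment (keeping the existing permanent-generation settings unchanged), you could trial something like:

-Xms4g -Xmx4g -XX:PermSize=8g -XX:MaxPermSize=8g

and watch GC frequency, pause times and any OutOfMemoryErrors under realistic load before committing to it.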
It seems that you need incremental GC to reduce pauses:
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
and for tracing without VisualGC, this always worked well for me (output goes to catalina.out):
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
2013-01-05T22:52:13.954+0100: 15918369.557: [GC 15918369.557: [DefNew: 65793K->227K(98304K), 0.0031220 secs] 235615K->170050K(491520K), 0.0033220 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
After that you can play with these:
-XX:NewSize=ABC -XX:MaxNewSize=ABC
-XX:SurvivorRatio=ABC
-XX:NewRatio=ABC
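For example (illustrative numbers only, to be validated against your own GC logs), a 30 GB heap could be partitioned explicitly with something like:

-XX:NewSize=10g -XX:MaxNewSize=10g -XX:SurvivorRatio=8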
Reference: Virtual Machine Garbage Collection Tuning
