I am using a 16 GB heap along with G1GC.
For some batches I am seeing continuous GC pauses (G1 Evacuation Pause) multiple times within a second, which is drastically slowing down the batch.
I have noticed that despite the evacuation pauses, memory doesn't seem to be freed up (as shown in the screenshot below, it stays at 14 GB). So I tried increasing the heap from 16 GB to 40 GB, and even then the evacuation pause for the young generation is continuous; it only drops from multiple times per second to once or twice per second.
I tried increasing -XX:MaxGCPauseMillis to 3 seconds, but that has not helped.
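For reference, the relevant JVM options described above would look roughly like this (a sketch; other application-specific flags are omitted):
-Xmx16g -XX:+UseG1GC -XX:MaxGCPauseMillis=3000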
Any suggestions on what can help with this situation?
Related: in New Relic, where we previously could see the GC pause times, currently only GC CPU usage is shown.
When I run "jmap -heap ", I get the following:
If you notice, PS Old Generation is just a little over 9% used while Eden is ~4.5% used.
At what percentage of Eden does minor GC occur?
At what percentage of PS Old generation does the stop-the-world GC occur?
The exact percentage varies based on the algorithm you are using.
Refer to https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/
Best solution: go to your Java bin directory, e.g. java-1.8.0-oracle/bin,
and run jstat -gc -t PID 1s to see for yourself when major and minor GCs are getting triggered.
It will give you the current memory sizes and GC occurrence counts for all major and minor GC parameters.
Refer to the sample output below for an example.
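For illustration, a typical invocation and the columns to watch (the exact column set varies by JDK version; 12345 is a placeholder PID):
jstat -gc -t 12345 1s
# Key columns on JDK 8: EC/EU = Eden capacity/used (KB), OC/OU = old gen capacity/used,
# YGC/YGCT = young GC count and total time, FGC/FGCT = full GC count and total time.
# When EU approaches EC, YGC ticks up (a minor GC ran); when FGC ticks up, a full GC ran.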
Keeping it simple:
Minor GC is triggered when the JVM does not have space for a new object, for example when Eden is getting full.
Try the command jstat -gc -t 5000 1s and watch the behavior of the young generation.
Regarding "At what percentage of PS Old generation does the stop-the-world GC occur?":
Even minor GC stops application threads, leading to a stop-the-world pause (but it is comparatively negligible).
A stop-the-world event occurs every time a minor, major, or full GC is triggered; to check when a full GC will be triggered, watch the old generation with the jstat command above.
Major GC is cleaning the Tenured space.
Full GC is cleaning the entire Heap – both Young and Tenured spaces.
You can set flags to modify these thresholds, depending on the algorithm you are using for GC.
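As a sketch of the kind of flags meant here (the values are illustrative and each flag is specific to one collector):
# CMS: start a concurrent old-gen collection once the old generation is about 70% full
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
# G1: start a concurrent marking cycle when total heap occupancy reaches about 45% (the default)
-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=45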
We have a Web Java based application running on JBoss with allowed maximum heap size of about 1.2 GB (total machine physical memory is 2 GB). At some point the application stops responding (to clients) for several minutes. After some analysis we found out that the culprit is the Full GC. Here's an excerpt from the verbose GC log:
74477.402: [Full GC [PSYoungGen: 3648K->0K(332160K)] [PSOldGen: 778476K->589497K(819200K)] 782124K->589497K(1151360K) [PSPermGen: 102671K->102671K(171328K)], 646.1546860 secs] [Times: user=3.84 sys=3.72, real=646.17 secs]
What I don't understand is how the real time spent on the Full GC can be about 11 minutes (646 seconds) while the user+sys times are just 7.5 seconds. 7.5 seconds sounds to me like a much more logical amount of time to spend cleaning <200 MB from the old generation. Where does all the other time go?
Thanks a lot.
Where does all the other time go?
It is most likely that your application is causing virtual memory thrashing. Basically, your application needs significantly more pages of virtual memory than there are physical pages available to hold them. As a result, it is spending most of the time waiting for vm pages to be read from and written to disc.
For more information, read the Wikipedia page on thrashing.
The cure is to either reduce virtual memory usage or increase the amount of physical memory on the system. For example, you could:
run fewer applications on the machine,
reduce Java application heap sizes, or
if you are running in a virtual machine, increase the VM's allocation of physical memory.
(Note however that reducing the JVM heap size can be a two-edged sword. If you reduce the heap size too much the application will either die from OutOfMemoryErrors, spend too much time garbage collecting, or suffer from not being able to cache things effectively.)
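If you want to confirm that the machine is actually paging, standard Linux tools will show it, for example:
# watch the si/so (swap-in/swap-out) columns; sustained non-zero values indicate thrashing
vmstat 1
# check how much swap is currently in use
free -m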
I have a jetty application processing about 2k requests a second. Machine has 8 cores and JVM heap size is 8GB. There are a lot of memory mapped files and internal caching so that takes up most of the heap space (4.5 GB).
Here are the stats after the application is stable and the JVM is done tuning Young and Old gen spaces:
Young Generation : 2.6GB
Old Generation : 5.4GB
I'm seeing that my young GC is invoked every 3 seconds and the entire Eden space is cleared (i.e. very little data is promoted to the old generation). I understand that filling up the young generation so quickly means I'm allocating way too many objects and that this is an issue. But there is definitely no memory leak in my application, since the servers have been up for 2 weeks with no OOM crashes.
Young GC is a stop-the-world event, so my understanding is that all threads are paused during this time. When I monitor latencies from the logs, I can see that every 2-3 seconds about 6-9 requests have a response time of > 100 ms (my average response time is < 10 ms). And when full GC is called, I see that 6-9 requests have a response time of > 3 seconds (that's how long full GC takes, and since it's invoked very rarely, it is not an issue here).
My question is: since my jetty application has a thread pool of size 200 and no bounded request queue, shouldn't a young GC have an accordion effect on my response times? Will a 100 ms delay be added to all the requests in my queue?
If so, what is the best way to measure response times from the moment a request is added to the queue until the response is sent? The 6-9 request figure I mentioned above is from checking the logs: from when the application logic is invoked to just before the response is sent, I maintain start and end time variables, subtract the two, and dump the result to the logs.
One way would be to check my load balancer. But since these servers are behind an ELB, I don't really have much access here other than average response times which don't really help me.
You should enable GC logging for your application. Try adding the following JVM command line arguments:
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCCause -XX:+PrintGCApplicationStoppedTime -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=<REPLACE_ME> -XX:GCLogFileSize=20M -Xloggc:<path_to_gc_log_dir>/gc.log
Then look at the events in the GC logs and try to correlate them with your application logs.
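With -XX:+PrintGCApplicationStoppedTime enabled, the log gains lines roughly like the one below (the numbers are invented for illustration); these cover every safepoint pause, so correlating their timestamps with slow requests shows how much of the latency was pure pause time:
2017-01-01T12:00:00.123+0000: 1234.567: Total time for which application threads were stopped: 0.1234567 seconds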
Our server has 128GB of RAM and 64 cores, running Tomcat 7.0.30 and Oracle jdk1.6.0_38, on CentOS 6.3.
Every 60 minutes we were seeing a garbage collection that took 45-60 seconds. Adding -XX:+UseConcMarkSweepGC increased page load times by about 10% but got that down to about 3 seconds, which is an acceptable trade-off.
Our config:
-Xms30g
-Xmx30g
-XX:PermSize=8g
-XX:MaxPermSize=8g
-Xss256k
-XX:+UseConcMarkSweepGC
We set the heap at 30 GB to keep 32-bit object addressing (compressed oops); I read that above 32 GB the 64-bit addressing takes up more memory, so you have to go to about 48 GB to see improvements.
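For what it's worth, whether compressed oops are actually in effect for a given heap size can be checked on reasonably recent JDKs with something like:
java -Xmx30g -XX:+PrintFlagsFinal -version | grep UseCompressedOops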
Using VisualGC I can see that the Eden space is cycling through every 30 - 60 minutes, but not much happens with the Survivor 0, Survivor 1, Old Gen, and Perm Gen.
We have a powerful server. What other optimizations can we make to further decrease the 3 second GC time?
Any recommendations to improve performance or scaling?
Any other output or config info that would help?
It might sound counter-intuitive, but have you tried allocating a lot less memory? E.g. do you really need a 30G heap? In case you can get along with 4G or even less: Garbage collection might be more frequent, but when it happens it will be a lot faster. Typically I find this more desirable than allocating a lot of memory, suffering from the time it takes to clean it up.
Even if this will not help you because you really need 30G of memory, others might come along with a similar problem and they might benefit from allocating less.
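As a sketch, trying a much smaller heap is just a matter of something like this (the 4 GB figure is purely illustrative):
# compare pause frequency and duration in the GC log against the 30 GB configuration
-Xms4g -Xmx4g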
Seems that you need Incremental GC to reduce pauses:
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
And for tracing without VisualGC, this has always worked well for me (output in catalina.out):
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
2013-01-05T22:52:13.954+0100: 15918369.557: [GC 15918369.557: [DefNew: 65793K->227K(98304K), 0.0031220 secs] 235615K->170050K(491520K), 0.0033220 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
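Reading that line, roughly:
# DefNew: 65793K->227K(98304K)  -> young gen usage dropped from ~64 MB to ~0.2 MB (capacity 96 MB)
# 235615K->170050K(491520K)     -> total heap usage dropped from ~230 MB to ~166 MB (capacity 480 MB)
# 0.0033220 secs                -> the whole young collection took about 3 ms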
After that, you can play with these:
-XX:NewSize=ABC -XX:MaxNewSize=ABC
-XX:SurvivorRatio=ABC
-XX:NewRatio=ABC
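For example (the sizes are purely illustrative and should be tuned against your own GC logs):
# fix the young generation at 10 GB of the 30 GB heap and keep the default survivor ratio
-XX:NewSize=10g -XX:MaxNewSize=10g -XX:SurvivorRatio=8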
Reference: Virtual Machine Garbage Collection Tuning