The application is using -XX:+UseParallelGC.
One GC took significantly longer in real time (~1.05 secs) than in user + sys time:
[Times: user=0.04 sys=0.01, real=1.05 secs]
The previous PSYoungGen collections reclaimed about the same amount of memory in much less time. I don't have any flag enabled that would show whether the JIT caused this. There was load on the machine at the time, but it came from the same JVM process that ran the GC.
What could cause something like this, and what can I do to figure out what is happening if it occurs again?
[GC
Desired survivor size 720864 bytes, new threshold 1 (max 15)
[PSYoungGen: 18864K->454K(21121K)] 38016K->17722K(64833K), 0.0224350 secs] [Times: user=0.04 sys=0.01, real=1.05 secs]
To diagnose this better in the future, run with -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCDetails -XX:+PrintGCCause for more useful logs. The JVM version you're using may also be relevant.
Beyond that, you'll have to monitor system stats, mostly paging and CPU utilization, to see whether other processes or disk I/O are starving the JVM.
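If it helps, here is a minimal sketch for spotting such entries in an existing log. It assumes the `[Times: user=... sys=..., real=... secs]` format shown in the question; the class name `GcTimesCheck` and the 0.5 s threshold are made up for illustration and are not part of any JDK tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags GC log entries whose wall-clock time is far
// larger than the CPU time the collector actually consumed.
class GcTimesCheck {
    private static final Pattern TIMES = Pattern.compile(
        "\\[Times: user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+) secs\\]");

    /** Returns real - (user + sys) in seconds, or NaN if the line has no [Times: ...] block. */
    static double unexplainedSeconds(String logLine) {
        Matcher m = TIMES.matcher(logLine);
        if (!m.find()) {
            return Double.NaN;
        }
        double user = Double.parseDouble(m.group(1));
        double sys = Double.parseDouble(m.group(2));
        double real = Double.parseDouble(m.group(3));
        return real - (user + sys);
    }

    public static void main(String[] args) {
        String line = "[PSYoungGen: 18864K->454K(21121K)] 38016K->17722K(64833K), 0.0224350 secs] "
            + "[Times: user=0.04 sys=0.01, real=1.05 secs]";
        double gap = unexplainedSeconds(line);
        // The 0.5 s threshold is an arbitrary assumption for illustration.
        if (gap > 0.5) {
            System.out.printf("suspicious pause: %.2f s not spent on CPU%n", gap);
        }
    }
}
```

With a parallel collector, user + sys is normally equal to or larger than real, so a large positive gap suggests the GC threads were waiting (for the CPU, for paging, or for I/O) rather than working.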
I'm running Java 7 (Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)) on a Linux server, 2.6.32-504.el6.x86_64 (RHEL), with a few GC switches enabled as shown below.
The problem is a significant increase in the time spent pausing application threads (> 3 sec). Based on the safepoint statistics, it appears to be related to the vmop phase.
I have observed that there isn't much GC overhead, nor any allocation failures; only minor collections take place during program execution. The GC log pasted below shows the GC just before the long pause and the GC with the actual delay.
Questions
1. Could this time sink be caused by the server freezing up or becoming unresponsive? This assumes that the real time of 3.02 sec is accurate and that nothing indicates GC overhead ([Times: user=0.02 sys=0.00, real=3.02 secs]).
2. Is there a utility that monitors a system's responsiveness, or a recommended algorithm for measuring server responsiveness?
3. What causes an increase in vmop time?
4. Does the JVM perform any disk I/O when initiating a garbage collection, that is, before pausing the application threads at a safepoint? Or can high disk I/O activity on the system during GC delay the pausing of application threads?
Server Configurations:
Please note there are several applications running on this server, this is not a dedicated server for the mentioned application.
model name: Intel(R) Xeon(R) CPU X5365 @ 3.00GHz / 8 Core
             total       used       free     shared    buffers     cached
Mem:      24602892   22515868    2087024        244     165796   10801380
-/+ buffers/cache:   11548692   13054200
GC Options enabled:
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/opt/swxsmf_fep/working/gk-gc-CMS.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime\
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCApplicationConcurrentTime
Previous GC (showing no issues)
2015-04-08T19:05:24.622+0100: 522569.387: Application time: 16.4710580 seconds
2015-04-08T19:05:24.622+0100: 522569.387: [GC2015-04-08T19:05:24.622+0100: 522569.387: [ParNew: 102798K->79K(115456K), 0.0018020 secs] 105218K->2499K(371776K), 0.0019090 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2015-04-08T19:05:24.624+0100: 522569.389: Total time for which application threads were stopped: 0.0021910 seconds
GC where real time > 3 Sec
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
522588.500: GenCollectForAllocation [ 22 0 0 ] [ 0 0 0 0 3019 ] 0
2015-04-08T19:05:43.747+0100: 522588.512: Application time: 19.1232430 seconds
2015-04-08T19:05:43.748+0100: 522588.512: [GC2015-04-08T19:05:46.765+0100: 522591.530: [ParNew: 102735K->77K(115456K), 0.0017640 secs] 105155K->2497K(371776K), 3.0195450 secs] [Times: user=0.02 sys=0.00, real=3.02 secs]
2015-04-08T19:05:46.767+0100: 522591.532: Total time for which application threads were stopped: 3.0198060 seconds
Any input on this would be very much appreciated. Please let me know if you require any further details.
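One detail worth pulling out of the slow entry: it carries two JVM-uptime stamps, one where the GC entry opens and one where ParNew starts. The sketch below (the class name `GcPhaseGap` is hypothetical, and the regex assumes the three-decimal uptime format shown in these logs) computes the difference between them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcPhaseGap {
    // Matches JVM-uptime stamps such as "522588.512: [" (seconds since JVM start).
    // Date stamps like "19:05:43.748+0100: " do not match because of the "+0100".
    private static final Pattern UPTIME = Pattern.compile("(\\d+\\.\\d{3}): \\[");

    static List<Double> uptimeStamps(String logLine) {
        List<Double> stamps = new ArrayList<Double>();
        Matcher m = UPTIME.matcher(logLine);
        while (m.find()) {
            stamps.add(Double.parseDouble(m.group(1)));
        }
        return stamps;
    }

    /** Seconds between the first and the last uptime stamp on the line (0 if fewer than two). */
    static double gapSeconds(String logLine) {
        List<Double> s = uptimeStamps(logLine);
        return s.size() < 2 ? 0.0 : s.get(s.size() - 1) - s.get(0);
    }

    public static void main(String[] args) {
        String slow = "2015-04-08T19:05:43.748+0100: 522588.512: [GC2015-04-08T19:05:46.765+0100: "
            + "522591.530: [ParNew: 102735K->77K(115456K), 0.0017640 secs] "
            + "105155K->2497K(371776K), 3.0195450 secs]";
        System.out.printf("gap before ParNew started: %.3f s%n", gapSeconds(slow));
    }
}
```

For the slow entry the gap is about 3.018 s, while ParNew itself reports 0.0017640 secs. That suggests the time was lost after the GC operation began but before the young collection actually started, rather than in the collection work.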
1. Time spent stopping threads is generally time your app isn't responding. So yes, I'd expect to see the app hang.
2. Have you tried jHiccup?
3. (And 4.) Here's something that came to mind: http://www.evanjones.ca/jvm-mmap-pause.html. It describes a stall (a pause in "real" time) caused by writing hsperf data during GC. There's also a repro case you can try on your machines.
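To illustrate the kind of measurement such a responsiveness tool relies on, here is a minimal sketch: a thread repeatedly asks to sleep for a fixed interval and records how much longer than requested it actually took. This is not jHiccup's implementation, only the underlying idea; the class name `HiccupMeter` and its parameters are made up.

```java
import java.util.concurrent.TimeUnit;

class HiccupMeter {
    /** Extra delay beyond the requested sleep, never negative. */
    static long hiccupNanos(long requestedNanos, long observedNanos) {
        return Math.max(0L, observedNanos - requestedNanos);
    }

    /** Sleeps `samples` times for `intervalMillis` and returns the worst overshoot in nanoseconds. */
    static long worstHiccupNanos(int samples, long intervalMillis) {
        long requested = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
        long worst = 0L;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
            long observed = System.nanoTime() - start;
            worst = Math.max(worst, hiccupNanos(requested, observed));
        }
        return worst;
    }

    public static void main(String[] args) {
        long worst = worstHiccupNanos(1000, 1);
        System.out.printf("worst hiccup: %.3f ms%n", worst / 1e6);
    }
}
```

Running the same loop in a separate, otherwise idle JVM on the same machine would help separate stalls caused by the whole server from stalls caused by this particular JVM.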
I have a Java application that connects to an external web service every 2 minutes to request data. Once received, the data is cached locally for business use. The web service response is huge (24 MB), and I am using axis2 BeanUtil.deserialize to read the response before transforming and caching it.
The problem is that a Full GC kicks in twice in every 2-minute cycle and takes around 2-3 secs.
I am using -Xms512M -Xmx2048M
I have seen application response times slow down during the refresh cycle / Full GC.
I want to know whether Full GC is bad for an application, and whether one should tune GC parameters to avoid it.
I have tried various parameters, including NewRatio and UseConcMarkSweepGC, but I can't see any impact on the Full GCs or their duration. Any pointers on this problem would be a great help.
GC trace:
2014-02-05T14:36:32.375+0100: 190.237: [GC 883801K->683780K(1542528K), 0.0884010 secs]
2014-02-05T14:36:36.544+0100: 194.406: [GC 916868K->890923K(1542528K), 0.1139410 secs]
2014-02-05T14:36:38.135+0100: 195.997: [GC 1106423K->1053715K(1542528K), 0.2047740 secs]
2014-02-05T14:36:38.340+0100: 196.202: [Full GC 1053715K->432946K(1589760K), 1.7281700 secs]
2014-02-05T14:36:40.900+0100: 198.762: [GC 666034K->647586K(1589760K), 0.0627390 secs]
2014-02-05T14:36:41.502+0100: 199.364: [GC 1000586K->884464K(1589760K), 0.1304590 secs]
2014-02-05T14:36:42.753+0100: 200.615: [GC 1117552K->1078064K(1589760K), 0.1406750 secs]
2014-02-05T14:36:42.894+0100: 200.756: [Full GC 1078064K->604578K(1864192K), 2.1971060 secs]
2014-02-05T14:36:46.148+0100: 204.010: [GC 837666K->666754K(1864192K), 0.0348680 secs]
2014-02-05T14:36:46.965+0100: 204.828: [GC 899842K->706109K(1864192K), 0.0825090 secs]
Regards,
Amber
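To put numbers on how much of the pause time comes from Full GCs, a small sketch like the following can total the pauses per collection type. It assumes the simple `[GC before->after(capacity), secs]` format of the trace above; the class name `PauseSummary` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PauseSummary {
    // Matches e.g. "[Full GC 1053715K->432946K(1589760K), 1.7281700 secs]"
    private static final Pattern ENTRY = Pattern.compile(
        "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    /** Returns {minorPauseSeconds, fullPauseSeconds} summed over all matching lines. */
    static double[] sumPauses(String[] lines) {
        double minor = 0.0;
        double full = 0.0;
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line);
            if (!m.find()) {
                continue; // ignore lines that are not GC entries
            }
            double secs = Double.parseDouble(m.group(5));
            if ("Full GC".equals(m.group(1))) {
                full += secs;
            } else {
                minor += secs;
            }
        }
        return new double[] {minor, full};
    }

    public static void main(String[] args) {
        String[] trace = {
            "2014-02-05T14:36:38.135+0100: 195.997: [GC 1106423K->1053715K(1542528K), 0.2047740 secs]",
            "2014-02-05T14:36:38.340+0100: 196.202: [Full GC 1053715K->432946K(1589760K), 1.7281700 secs]",
            "2014-02-05T14:36:42.753+0100: 200.615: [GC 1117552K->1078064K(1589760K), 0.1406750 secs]",
            "2014-02-05T14:36:42.894+0100: 200.756: [Full GC 1078064K->604578K(1864192K), 2.1971060 secs]"
        };
        double[] sums = sumPauses(trace);
        System.out.printf("minor: %.3f s, full: %.3f s%n", sums[0], sums[1]);
    }
}
```

Feeding it the whole log before and after each parameter change gives a quick way to check whether a setting had any effect on Full GC time.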
Environment Details:
OS: Linux RedHat
Java: JRE 6 Update 21
I am using the following GC settings for my app.
-server -d64 -Xms8192m -Xmx8192m -javaagent:lib/instrum.jar -XX\:MaxPermSize=256m -XX\:+UseParNewGC -X\:+ParallelRefProcEnabled -XX\:+UseConcMarkSweepGC -XX\:MaxGCPauseMillis=250 -XX\:+CMSIncrementalMode -XX\:+CMSIncrementalPacing -XX\:+CMSParallelRemarkEnabled -verbose\:gc -Xloggc\:/tmp/my-gc.log -XX\:DisableExplicitGC -XX\:+PrintGCTimeStamps -XX\:+PrintGCDetails -XX\:+UseCompressedOops
With these settings, there is a single Full GC at the beginning of the application run:
2.946: [Full GC 2.946: [CMS: 0K->7394K(8111744K), 0.1364080 secs] 38550K->7394K(8360960K), [CMS Perm : 21247K->21216K(21248K)], 0.1365530 secs] [Times: user=0.10 sys=0.04, real=0.14 secs]
This is followed by 4-5 successful CMS collections, but after that there is no trace of CMS in the logs; there are entries only for minor collections.
379022.293: [GC 379022.293: [ParNew: 228000K->4959K(249216K), 0.0152000 secs] 7067945K->6845720K(8360960K) icms_dc=0 , 0.0153940 secs]
The heap keeps growing and has reached 7 GB. We have to restart the application because we cannot afford an OOM or any kind of breakdown in the production system.
I cannot understand why the CMS collector has stopped cleaning. Any clues or suggestions are welcome. Thanks in advance.
======================================================================================
Updated 23rd Jan.
Thanks, everyone, for the responses so far. I have set up the application in a test environment and tested it with the following sets of JVM options:
Option #1
-server -d64 -Xms8192m -Xmx8192m -javaagent\:instrum.jar -XX\:MaxPermSize\=256m -XX\:+UseParNewGC -XX\:+UseConcMarkSweepGC -verbose\:gc -Xloggc\:my-gc.log -XX\:+PrintGCTimeStamps -XX\:+PrintGCDetails
Option #2
-server -d64 -Xms8192m -Xmx8192m -javaagent\:instrum.jar -XX\:MaxPermSize\=256m -XX\:+UseParNewGC -XX\:+UseConcMarkSweepGC -verbose\:gc -Xloggc\:my-gc.log -XX\:+DisableExplicitGC -XX\:+PrintGCTimeStamps -XX\:+PrintGCDetails
I ran the test with both settings for 2 days in parallel. These are my observations:
Option #1
The heap memory is stable, but there were 90 ConcurrentMarkSweep collections, on which the JVM spent 24 minutes. That's too high. I also see the following lines in the GC logs, and the pattern repeats every hour...
318995.941: [GC 318995.941: [ParNew: 230230K->8627K(249216K), 0.0107540 secs] 5687617K->5466913K(8360960K), 0.0109030 secs] [Times: user=0.11 sys=0.00, real=0.01 secs]
319050.363: [GC 319050.363: [ParNew: 230195K->9076K(249216K), 0.0118420 secs] 5688481K->5468316K(8360960K), 0.0120470 secs] [Times: user=0.12 sys=0.01, real=0.01 secs]
319134.118: [GC 319134.118: [ParNew: 230644K->8503K(249216K), 0.0105910 secs] 5689884K->5468704K(8360960K), 0.0107430 secs] [Times: user=0.11 sys=0.00, real=0.01 secs]
319159.250: [Full GC (System) 319159.250: [CMS: 5460200K->5412132K(8111744K), 19.1981050 secs] 5497326K->5412132K(8360960K), [CMS Perm : 72243K->72239K(120136K)], 19.1983210 secs] [Times: user=19.14 sys=0.06, real=19.19 secs]
I don't see the concurrent mark and sweep entries in the logs. Does this mean CMS switched to the throughput collector? If so, why?
Option #2:
Since I saw the Full GC (System) entries, I thought of adding -XX\:+DisableExplicitGC. But with that option the collection is not happening, and the current heap size is 7.5G. What I am wondering is why CMS is doing a Full GC instead of a concurrent collection.
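The collection count and time quoted for Option #1 look like the per-collector counters the JVM exposes through java.lang.management; assuming that is where they came from, the sketch below reads the same counters in-process and shows how an explicit System.gc() call (which is what the "(System)" tag in the log denotes) changes them. The class name `GcCounters` is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcCounters {
    /** One line per collector: name, number of collections, accumulated time in ms. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("%s: count=%d, time=%d ms%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print("before:\n" + describe());
        System.gc(); // explicit request; ignored when -XX:+DisableExplicitGC is set
        System.out.print("after:\n" + describe());
    }
}
```

Logging these counters periodically, together with a stack trace or profiler hit on System.gc(), is one way to find out which component issues the hourly explicit collection.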
This is a theory ...
I suspect that those CMS collections were not entirely successful. The event at 12477.056 looks like the CMS might have decided that it is not going to be able to work properly due to the 'pre-clean' step taking too long.
If that caused the CMS to decide to switch off, then I expect it will revert to using the classic "throughput" GC algorithm. And there's a good chance it would wait until the heap is full and then run a full GC. In short, if you'd just let it continue, it would have been OK (modulo that you'd get big GC pauses every now and then).
I suggest you run your application on a test server with the same heap size and other GC parameters, and see what happens when the server hits the limit. Does it actually throw an OOME?
CMS is running for you :P
You are using incremental mode on CMS (although really you should not bother, as it's likely punishing your throughput).
The icms_dc in your posted log line is a giveaway: the only thing in the JVM that logs this is the CMS collector. It is saying that for that GC run you did a small amount of tenured cleanup interwoven with the application.
This part of your log relates to ParNew (the giveaway there is the heap size):
379022.293: [GC 379022.293: [ParNew: 228000K->4959K(249216K), 0.0152000 secs]
This part is the overall heap occupancy, tagged with the incremental CMS (iCMS) duty cycle:
7067945K->6845720K(8360960K) icms_dc=0 , 0.0153940 secs]
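As a rough cross-check of what that entry says, the sketch below reads the two occupancy triples as young generation and whole heap (the usual layout of a ParNew entry) and estimates how much was promoted to the old generation. It assumes the old generation was not collected during the pause; the class name `ParNewEntry` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParNewEntry {
    // Matches occupancy triples such as "228000K->4959K(249216K)".
    private static final Pattern TRIPLE = Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    /** Returns {youngBefore, youngAfter, heapBefore, heapAfter} in KB. */
    static long[] occupancies(String logLine) {
        Matcher m = TRIPLE.matcher(logLine);
        long[] out = new long[4];
        for (int i = 0; i < 2; i++) {
            if (!m.find()) {
                throw new IllegalArgumentException("expected two occupancy triples");
            }
            out[2 * i] = Long.parseLong(m.group(1));
            out[2 * i + 1] = Long.parseLong(m.group(2));
        }
        return out;
    }

    /** Young-gen space freed minus whole-heap space freed: what stayed behind in the old gen. */
    static long promotedKb(String logLine) {
        long[] o = occupancies(logLine);
        return (o[0] - o[1]) - (o[2] - o[3]);
    }

    public static void main(String[] args) {
        String line = "379022.293: [GC 379022.293: [ParNew: 228000K->4959K(249216K), 0.0152000 secs] "
            + "7067945K->6845720K(8360960K) icms_dc=0 , 0.0153940 secs]";
        long[] o = occupancies(line);
        System.out.println("promoted (KB): " + promotedKb(line));        // 816
        System.out.println("old gen after (KB): " + (o[3] - o[1]));      // 6840761
    }
}
```

Under that reading, almost all of the space freed in this entry came from the young generation, and the old generation sits at roughly 6.8 GB afterwards.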
I would ask: why are you using iCMS? Do you have a lot of Soft/Weak/Phantom references (if not, why are you using the ParallelRefProcEnabled flag)? And have you actually seen an out-of-memory error or an insufferable pause?
Try backing down to CompressedOops, ParNewGC and CMS without anything else fancy and see if that works out for you.
I can see that the initial heap size (-Xms) is 8192m and the max heap size (-Xmx) is 8192m, which might be one of the reasons why the GC is still waiting to start sweeping.
I would suggest decreasing the heap size and then checking whether the GC kicks in.
When you set the maximum size, it allocates that amount of virtual memory immediately.
When you set the minimum size, it has already allocated the maximum size. All the minimum size does is to take minimal steps to free up memory until this maximum is reached. This could be reducing the number of full GCs because you told it to use up to 8 GB freely.
You have a lot of options turned on (some of them are the default). I suggest you strip back to a minimum set, as they can have odd interactions when you turn lots of them on.
I would start with (assuming you have Solaris)
-mx8g -javaagent:lib/instrum.jar -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -verbose\:gc -Xloggc\:/tmp/my-gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
The option -server is the default on server-class machines, -XX:+UseCompressedOops is the default on recent versions of Java, and -XX:MaxGCPauseMillis=250 is just a hint.
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
I'm seeing the following symptoms on an application's GC log file with the Concurrent Mark-Sweep collector:
4031.248: [CMS-concurrent-preclean-start]
4031.250: [CMS-concurrent-preclean: 0.002/0.002 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
4031.250: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 4036.346: [CMS-concurrent-abortable-preclean: 0.159/5.096 secs] [Times: user=0.00 sys=0.01, real=5.09 secs]
4036.346: [GC[YG occupancy: 55964 K (118016 K)]4036.347: [Rescan (parallel) , 0.0641200 secs]4036.411: [weak refs processing, 0.0001300 secs]4036.411: [class unloading, 0.0041590 secs]4036.415: [scrub symbol & string tables, 0.0053220 secs] [1 CMS-remark: 16015K(393216K)] 71979K(511232K), 0.0746640 secs] [Times: user=0.08 sys=0.00, real=0.08 secs]
The preclean process keeps aborting continuously. I've tried adjusting CMSMaxAbortablePrecleanTime to 15 seconds, from the default of 5, but that has not helped. The current JVM options are as follows...
-Djava.awt.headless=true
-Xms512m
-Xmx512m
-Xmn128m
-XX:MaxPermSize=128m
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:BiasedLockingStartupDelay=0
-XX:+DoEscapeAnalysis
-XX:+UseBiasedLocking
-XX:+EliminateLocks
-XX:+CMSParallelRemarkEnabled
-verbose:gc
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
-Xloggc:gc.log
-XX:+CMSClassUnloadingEnabled
-XX:+CMSPermGenPrecleaningEnabled
-XX:CMSInitiatingOccupancyFraction=50
-XX:ReservedCodeCacheSize=64m
-Dnetworkaddress.cache.ttl=30
-Xss128k
It appears the concurrent-abortable-preclean never gets a chance to run. I read through https://blogs.oracle.com/jonthecollector/entry/did_you_know which had a suggestion of enabling CMSScavengeBeforeRemark, but the side effects of pausing did not seem ideal. Could anyone offer up any suggestions?
Also I was wondering if anyone had a good reference for grokking the CMS GC logs, in particular this line:
[1 CMS-remark: 16015K(393216K)] 71979K(511232K), 0.0746640 secs]
I'm not clear on which memory regions those numbers refer to.
Edit: Found a link on this: http://www.sun.com/bigadmin/content/submitted/cms_gc_logs.jsp
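One way to sanity-check a reading of that line is to compare it with the configured sizes. The sketch below assumes the first pair is old-generation occupancy (capacity) and the second pair is whole-heap occupancy (capacity); the class name `RemarkLine` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RemarkLine {
    private static final Pattern REMARK = Pattern.compile(
        "CMS-remark: (\\d+)K\\((\\d+)K\\)\\] (\\d+)K\\((\\d+)K\\)");

    /** Returns {oldUsed, oldCapacity, heapUsed, heapCapacity} in KB. */
    static long[] parse(String logLine) {
        Matcher m = REMARK.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("not a CMS-remark entry");
        }
        long[] out = new long[4];
        for (int i = 0; i < 4; i++) {
            out[i] = Long.parseLong(m.group(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        long[] r = parse("[1 CMS-remark: 16015K(393216K)] 71979K(511232K), 0.0746640 secs]");
        System.out.println("old capacity (KB):   " + r[1]);          // 393216 = 384 MB = 512m - 128m
        System.out.println("young used (KB):     " + (r[2] - r[0])); // 55964
        System.out.println("young capacity (KB): " + (r[3] - r[1])); // 118016
    }
}
```

The derived young-generation figures match the `YG occupancy: 55964 K (118016 K)` part of the same log entry, and the old capacity matches -Xmx512m minus -Xmn128m, which supports that reading.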
[Times: user=0.00 sys=0.01, real=5.09 secs]
I would try to investigate why CMS-concurrent-abortable-preclean gets neither user nor sys CPU time in 5 seconds.
My suggestion is to start from a 'clean' set of JVM CMS startup flags, like
-Djava.awt.headless=true
-Xms512m
-Xmx512m
-Xmn128m
-Xss128k
-XX:MaxPermSize=128m
-XX:+UseConcMarkSweepGC
-XX:+HeapDumpOnOutOfMemoryError
-Xloggc:gc.log
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
then check if the problem reproduces and keep tweaking one parameter at a time.
As someone has already mentioned, the first step would be to increase the CMSInitiatingOccupancyFraction.
As a second step, I would use the flag -XX:+PrintTenuringDistribution and make sure that there is no premature promotion from the young generation to the old one. This would lead to old-to-young references, which might lead to a longer abortable preclean phase.
If there is such premature promotion, try to adjust the ratio between the eden and the survivor spaces.
There is a good explanation here about this phenomenon:
Quote:
So when the system load is light(which means there will be no
minor gc), precleaning will always time out and full gc will always
fail. cpu is waste.
It won't fail. It'll be less parallel (i.e. less efficient, and would
have a longer pause time, for lesser work).
So all in all: this seems to be normal operation. The thread just waits up to 5 seconds for a minor GC to happen, and there is no big issue when this does not happen: the JVM chooses a different (less efficient) strategy to continue with the GC.
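The waiting is visible in the log itself. Assuming the two figures in `0.159/5.096 secs` are the time the phase actually spent working and the wall-clock duration respectively, a sketch like this (hypothetical class name `PrecleanPhase`) shows how much of the phase was idle:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrecleanPhase {
    private static final Pattern PHASE = Pattern.compile(
        "\\[CMS-concurrent-abortable-preclean: ([0-9.]+)/([0-9.]+) secs\\]");

    /** Fraction of the phase's wall-clock time not spent working, or NaN if the line does not match. */
    static double idleFraction(String logLine) {
        Matcher m = PHASE.matcher(logLine);
        if (!m.find()) {
            return Double.NaN;
        }
        double busy = Double.parseDouble(m.group(1));
        double wall = Double.parseDouble(m.group(2));
        return wall == 0.0 ? 0.0 : (wall - busy) / wall;
    }

    public static void main(String[] args) {
        String line = "CMS: abort preclean due to time 4036.346: "
            + "[CMS-concurrent-abortable-preclean: 0.159/5.096 secs] "
            + "[Times: user=0.00 sys=0.01, real=5.09 secs]";
        System.out.printf("idle: %.1f%%%n", idleFraction(line) * 100.0);
    }
}
```

For the entry in the question, about 97% of the 5 seconds is idle time, which is consistent with a phase that is mostly waiting for a minor GC on a lightly loaded system.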
For the service I'm using I added:
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80
This configures the JVM to start marking only once the old generation is 80% full, and it's worth giving it a try.
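To make the percentages concrete for the 393216K old generation from the question, here is a simplified calculation. It treats the fraction as a plain percentage of old-generation capacity; the real trigger logic inside HotSpot is more involved, and the class name `CmsThreshold` is hypothetical.

```java
class CmsThreshold {
    /** Old-generation occupancy (KB) at which a concurrent cycle would start, as a plain percentage. */
    static long initiatingOccupancyKb(long oldGenCapacityKb, int fractionPercent) {
        if (fractionPercent < 0 || fractionPercent > 100) {
            throw new IllegalArgumentException("fraction must be between 0 and 100");
        }
        return oldGenCapacityKb * fractionPercent / 100;
    }

    public static void main(String[] args) {
        long oldGenKb = 393216L; // old-generation capacity taken from the CMS-remark line
        System.out.println("50% -> " + initiatingOccupancyKb(oldGenKb, 50) + "K"); // 196608K
        System.out.println("80% -> " + initiatingOccupancyKb(oldGenKb, 80) + "K"); // 314572K
    }
}
```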
I know the Java VM has "-Xmx" and "-Xms" for setting the size of the heap. It also has a feature called "ergonomics" that can intelligently adjust the size of the heap. But I have a problem at hand that requires a heap of strictly fixed size.
Here are the command-line arguments:
"-Xms2m -Xmx2m -XX:+PrintGCDetails"
However, by observing the GC logs, it seems the size of the heap was not fixed at 2048K. See, e.g. 2368K, 2432K, 2176K, etc:
[GC [PSYoungGen: 480K->72K(704K)] 1740K->1332K(2368K), 0.0032190 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 560K->64K(768K)] 2094K->1598K(2432K), 0.0033090 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 544K->32K(768K)] 1675K->1179K(2176K), 0.0009960 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
Is there a way to do the "strict sizing" (no more no less) of a Java heap?
I believe the JVM will manage the heap as you intend, but the problem in your example is that the max heap size is effectively ignored as too low.
On Sun's Windows JVM version 1.6.0_06, I believe the minimum 'max heap size' is approximately 6MB (i.e. -Xmx6m). If you attempt a setting lower than this, the heap may actually grow larger. (I had thought the minimum was 16m, but a little experimentation shows that values as low as 6m appear to work.)
If you set -Xms8m and -Xmx8m, however, I think you'll find the heap stays at that size.
There are other options along with -Xmx and -Xms that determine the initial heap size. Check the jvm tuning guide for details.
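To check what the JVM actually does with a given -Xms/-Xmx pair, it can help to print the heap figures from inside the process. A minimal sketch using the standard java.lang.management API (the class name `HeapReport` is made up):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapReport {
    /** Current heap usage figures as reported by the JVM. */
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Formats init, committed and max in KB, the unit used in the GC log. */
    static String format(MemoryUsage u) {
        return String.format("init=%dK committed=%dK max=%dK",
            u.getInit() / 1024, u.getCommitted() / 1024, u.getMax() / 1024);
    }

    public static void main(String[] args) {
        // Run as, for example: java -Xms8m -Xmx8m HeapReport
        System.out.println(format(heap()));
    }
}
```

If committed stays equal to init and max over the lifetime of the program, the heap is effectively fixed; if the JVM silently raised a too-small setting, it will show up here.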
I guess it is just too small. Try something higher, like 16m or 64m. Additionally, the internal and the external size are two different things. The heap will not be full all the time, so usage below Xmx is always possible, even below Xms if the program has just started. But externally, you will see that the Xms amount of memory has been allocated.
When I first read that I thought it said 2 GB ;)
Don't forget Java uses at least 10 - 30 MB of non heap space so the few hundred K you save might not make as much difference as you think.
As far as I know this is not possible using the standard VM. Unfortunately I have no references at hand right now.