I'm seeing the following symptoms on an application's GC log file with the Concurrent Mark-Sweep collector:
4031.248: [CMS-concurrent-preclean-start]
4031.250: [CMS-concurrent-preclean: 0.002/0.002 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
4031.250: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 4036.346: [CMS-concurrent-abortable-preclean: 0.159/5.096 secs] [Times: user=0.00 sys=0.01, real=5.09 secs]
4036.346: [GC[YG occupancy: 55964 K (118016 K)]4036.347: [Rescan (parallel) , 0.0641200 secs]4036.411: [weak refs processing, 0.0001300 secs]4036.411: [class unloading, 0.0041590 secs]4036.415: [scrub symbol & string tables, 0.0053220 secs] [1 CMS-remark: 16015K(393216K)] 71979K(511232K), 0.0746640 secs] [Times: user=0.08 sys=0.00, real=0.08 secs]
The preclean process keeps aborting continuously. I've tried raising CMSMaxAbortablePrecleanTime from its default of 5 seconds to 15, but that has not helped. The current JVM options are as follows:
-Djava.awt.headless=true
-Xms512m
-Xmx512m
-Xmn128m
-XX:MaxPermSize=128m
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:BiasedLockingStartupDelay=0
-XX:+DoEscapeAnalysis
-XX:+UseBiasedLocking
-XX:+EliminateLocks
-XX:+CMSParallelRemarkEnabled
-verbose:gc
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
-Xloggc:gc.log
-XX:+CMSClassUnloadingEnabled
-XX:+CMSPermGenPrecleaningEnabled
-XX:CMSInitiatingOccupancyFraction=50
-XX:ReservedCodeCacheSize=64m
-Dnetworkaddress.cache.ttl=30
-Xss128k
It appears the concurrent abortable preclean never gets a chance to run to completion. I read through https://blogs.oracle.com/jonthecollector/entry/did_you_know, which suggested enabling CMSScavengeBeforeRemark, but the side effect of forcing a pause did not seem ideal. Could anyone offer any suggestions?
Also I was wondering if anyone had a good reference for grokking the CMS GC logs, in particular this line:
[1 CMS-remark: 16015K(393216K)] 71979K(511232K), 0.0746640 secs]
I'm not clear on what memory regions those numbers refer to.
Edit: Found a link to this: http://www.sun.com/bigadmin/content/submitted/cms_gc_logs.jsp
[Times: user=0.00 sys=0.01, real=5.09 secs]
I would start by investigating why the CMS-concurrent-abortable-preclean phase gets neither user nor sys CPU time over those 5 seconds of real time.
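That symptom can be spotted mechanically by comparing CPU time to wall-clock time in the [Times: ...] fragments. Here is a small, hypothetical helper (the class and method names are my own, not from any GC tool) that computes the ratio; a ratio near zero over a long real time means the phase was mostly waiting, not working:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts user/sys/real from a GC "[Times: ...]" fragment
// and reports the fraction of wall-clock time the GC threads actually got on CPU.
public class GcTimesParser {
    private static final Pattern TIMES = Pattern.compile(
        "user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+) secs");

    public static double cpuToRealRatio(String logLine) {
        Matcher m = TIMES.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no [Times: ...] section found");
        }
        double user = Double.parseDouble(m.group(1));
        double sys  = Double.parseDouble(m.group(2));
        double real = Double.parseDouble(m.group(3));
        if (real == 0.0) {
            return 1.0; // sub-centisecond GC; nothing to diagnose
        }
        return (user + sys) / real;
    }

    public static void main(String[] args) {
        String line = "[Times: user=0.00 sys=0.01, real=5.09 secs]";
        // A ratio near zero means the phase spent ~5 seconds mostly waiting,
        // consistent with abortable preclean idling while it waits for a scavenge.
        System.out.printf("cpu/real ratio: %.4f%n", cpuToRealRatio(line));
    }
}
```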
My suggestion is to start from a 'clean' set of JVM CMS startup flags, like:
-Djava.awt.headless=true
-Xms512m
-Xmx512m
-Xmn128m
-Xss128k
-XX:MaxPermSize=128m
-XX:+UseConcMarkSweepGC
-XX:+HeapDumpOnOutOfMemoryError
-Xloggc:gc.log
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
then check if the problem reproduces and keep tweaking one parameter at a time.
As someone has already mentioned, the first step would be to increase the CMSInitiatingOccupancyFraction.
As a second step, I would use the flag -XX:+PrintTenuringDistribution and make sure that there is no premature promotion from the young generation to the old one. Premature promotion creates old-to-young references, which can lead to a longer abortable preclean phase.
If there is such premature promotion, try adjusting the ratio between the eden and survivor spaces.
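For reference, HotSpot splits the young generation into eden plus two survivor spaces according to -XX:SurvivorRatio (eden size divided by one survivor space's size). A minimal sketch of that arithmetic, using the question's -Xmn128m and a hypothetical SurvivorRatio of 6 (alignment is ignored here):

```java
public class YoungGenSizing {
    // SurvivorRatio = eden / (one survivor space); the young gen holds
    // eden plus two survivor spaces, so the denominator is ratio + 2.
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes * survivorRatio / (survivorRatio + 2);
    }

    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long youngGen = 128L * 1024 * 1024;  // -Xmn128m, as in the question
        int ratio = 6;                       // hypothetical -XX:SurvivorRatio=6
        System.out.println("eden     = " + edenBytes(youngGen, ratio) / (1024 * 1024) + " MB");
        System.out.println("survivor = " + survivorBytes(youngGen, ratio) / (1024 * 1024) + " MB each");
    }
}
```

Lowering the ratio makes the survivor spaces larger, giving objects more room to die young before being promoted.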
There is a good explanation here about this phenomenon:
Quote:
So when the system load is light (which means there will be no minor GC), precleaning will always time out and the full GC will always fail. CPU is wasted.

It won't fail. It'll just be less parallel (i.e. less efficient, with a longer pause time, for less work).
So, all in all, this seems to be normal operation: the thread simply waits up to 5 seconds for a minor GC to happen, and there is no big issue when one does not occur; the JVM falls back to a different (less efficient) strategy to continue the GC.
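The behavior described above is essentially "wait with a deadline, then fall back". Here is a toy model in plain Java (invented for illustration only, not JVM code) of waiting a bounded time for a minor-GC signal and aborting gracefully when none arrives:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Toy model of the abortable-preclean behavior: wait up to a deadline for a
// "minor GC" signal; if none arrives, abort the wait and take the less
// efficient path instead of failing.
public class AbortablePhase {
    enum Outcome { SCAVENGE_HAPPENED, ABORTED_FELL_BACK }

    static Outcome runPhase(CountDownLatch minorGcSignal, long timeoutMs) {
        try {
            if (minorGcSignal.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                return Outcome.SCAVENGE_HAPPENED;  // remark rides on a fresh young gen
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Outcome.ABORTED_FELL_BACK;          // "abort preclean due to time"
    }

    public static void main(String[] args) {
        // No minor GC within the window: the phase aborts, but nothing fails.
        System.out.println(runPhase(new CountDownLatch(1), 50));
    }
}
```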
For the service I'm using I added:
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80
This configures the JVM to start the concurrent marking only after the old generation is 80% full, and it's worth giving it a try.
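In concrete terms, the trigger point is just a fraction of the old-generation capacity. A quick sketch of that arithmetic (the 384 MB old-gen figure is an assumption derived from the question's 512m heap minus its 128m young gen):

```java
public class CmsTrigger {
    // With -XX:+UseCMSInitiatingOccupancyOnly, CMS starts a concurrent cycle
    // once old-gen occupancy crosses the configured fraction of its capacity.
    static long triggerBytes(long oldGenCapacityBytes, int occupancyFraction) {
        return oldGenCapacityBytes * occupancyFraction / 100;
    }

    public static void main(String[] args) {
        long oldGen = 384L * 1024 * 1024;  // assumed: 512m heap minus 128m young gen
        System.out.println("CMS cycle starts above ~" +
            triggerBytes(oldGen, 80) / (1024 * 1024) + " MB old-gen occupancy");
    }
}
```

Without UseCMSInitiatingOccupancyOnly, the fraction is only the starting point and the JVM adapts the trigger over time, which is why the two flags are usually set together.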
Related
The application is using -XX:+UseParallelGC.
There was a GC where the real time took significantly longer than user + sys (~1.05 secs):
[Times: user=0.04 sys=0.01, real=1.05 secs]
The PSYoungGen collections before this one cleaned a similar amount of memory in much less time. I don't have any flags enabled to check whether JIT activity caused this. There was load on the machine at the time, but it came from the same JVM process that did the GC.
What could cause something like this, and what can I do in the future to figure out what's happening (if it happens again)?
[GC
Desired survivor size 720864 bytes, new threshold 1 (max 15)
[PSYoungGen: 18864K->454K(21121K)] 38016K->17722K(64833K), 0.0224350 secs] [Times: user=0.04 sys=0.01, real=1.05 secs]
To be able to diagnose this better in the future, run with -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCDetails -XX:+PrintGCCause for more useful logs. Which JVM version you're using may also be relevant.
Beyond that, you'll have to monitor system stats, mostly paging and CPU utilization, to see whether other processes or disk I/O are starving the JVM.
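One lightweight way to watch for starvation from inside the process is a stall probe: a thread that sleeps for a fixed interval and reports how far past the deadline it actually woke up. This is a sketch under my own assumptions (class and method names are invented); large overshoots hint at GC pauses, swapping, or CPU starvation by other processes:

```java
// Minimal pause/starvation probe: measure how much longer than the requested
// interval a sleep actually took. Overshoot should normally be a few ms at most.
public class StallProbe {
    static long measureOvershootMillis(long intervalMs) {
        long start = System.nanoTime();
        try {
            Thread.sleep(intervalMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return Math.max(0, elapsedMs - intervalMs);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println("overshoot: " + measureOvershootMillis(100) + " ms");
        }
    }
}
```

In a real service this would run on a daemon thread and log (or alert) when the overshoot exceeds some threshold.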
I am running two Tomcat servers on a single machine (RAM size 16 GB), with the memory settings given below.
JAVA_MEM_OPTIONS="-Xmx4096M -Xms4096M -XX:NewSize=512M -XX:MaxNewSize=512M -XX:PermSize=256M -XX:MaxPermSize=256M -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -XX:SoftRefLRUPolicyMSPerMB=5 -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC"
top shows that 46.5% (7.44 GB) and 39.2% (6.24 GB) of memory are used by the Java processes:
Tasks: 120 total, 1 running, 119 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.8%us, 0.4%sy, 0.0%ni, 90.9%id, 0.2%wa, 0.0%hi, 0.7%si, 0.0%st
Mem: 16424048k total, 16326072k used, 97976k free, 28868k buffers
Swap: 1959920k total, 1957932k used, 1988k free, 1082276k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26275 user1 25 0 8007m 7.3g 7832 S 13.6 46.5 785:31.89 java
28050 user2 25 0 7731m 6.1g 13m S 9.0 39.2 817:10.47 java
The GC log shows each Java process is using only 4 GB of heap. Once the system memory usage goes up, it never comes down.
user1 - java process
89224.492: [GC 89224.492: [ParNew: 443819K->17796K(471872K), 0.0367070 secs] 2032217K->1608465K(4141888K), 0.0368920 secs] [Times: user=0.13 sys=0.00, real=0.03 secs]
89228.247: [GC 89228.247: [ParNew: 437252K->22219K(471872K), 0.0607080 secs] 2027921K->1615327K(4141888K), 0.0609240 secs] [Times: user=0.15 sys=0.00, real=0.06 secs]
user2 - java process
89202.170: [GC 89202.170: [ParNew: 444989K->22909K(471872K), 0.0510290 secs] 2361057K->1945258K(4141888K), 0.0512370 secs] [Times: user=0.19 sys=0.00, real=0.05 secs]
89207.894: [GC 89207.894: [ParNew: 442365K->15912K(471872K), 0.0422190 secs] 2364714K->1945162K(4141888K), 0.0424260 secs] [Times: user=0.15 sys=0.00, real=0.04 secs]
How can a Java process use this much memory when the heap is capped at 4 GB, and how do I debug the cause of the problem?
PS: At times I am executing shell scripts from Java code. Could that lead to this sort of problem?
There are several possible causes:
Having a large number of threads. Thread stack space is allocated outside of the heap/perm space; the stack size allocated per thread is controlled with the -Xss option.
Using JNI. Take a look at this question: Java app calls C++ DLL via JNI; how best to allocate memory?
There are probably more, but in practice the first case is the most likely reason.
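To get a feel for how quickly thread stacks add up, here is a back-of-the-envelope sketch (the -Xss512k figure below is a hypothetical example, not taken from the question's settings):

```java
public class NonHeapEstimate {
    // Rough estimate: each Java thread reserves its own stack (-Xss) outside
    // the heap, so many threads inflate process RSS well beyond -Xmx.
    static long stackReservationBytes(int threadCount, long xssBytes) {
        return threadCount * xssBytes;
    }

    public static void main(String[] args) {
        long xss = 512L * 1024;             // hypothetical -Xss512k
        int visible = Thread.activeCount(); // threads in the current group only
        System.out.println("live threads visible here: " + visible);
        System.out.println("1000 such threads would reserve ~" +
            stackReservationBytes(1000, xss) / (1024 * 1024) + " MB of stack");
    }
}
```

On the PS question: forking shell scripts also matters, because each fork briefly needs address space for the child process, which can show up as large VIRT figures in top.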
I have a method which hits the DB and fetches a lot of records into memory for processing. After I fetch the records and before I start processing, I get the following log message. What does it mean?
164575.034: [GC (Allocation Failure) 4937664K->3619624K(5602816K), 0.0338580 secs]
Options:
java.opts=-d64 -Xmx8g -XX:+PrintGCTimeStamps -verbose:gc -XX:MaxPermSize=512m -XX:+UseParallelGC -XX:+UseParallelOldGC
It basically tells you that the JVM had to run a GC in order to satisfy an allocation: the requested memory would not have fit otherwise. In other words, "Allocation Failure" is simply the reason the GC was triggered, not an error.
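The mechanism can be illustrated with a toy allocator (entirely invented for illustration; the sizes and the collect() rule are made up): when a request does not fit in the remaining space, a collection runs first, and only if the request still does not fit is the allocation refused.

```java
// Toy model of "GC (Allocation Failure)": an allocator that runs a collection
// when a request does not fit, then retries the allocation once.
public class ToyHeap {
    private final long capacity;
    private long used;
    private int gcCount;

    ToyHeap(long capacity) { this.capacity = capacity; }

    boolean allocate(long size) {
        if (used + size > capacity) {
            collect();            // the "Allocation Failure" trigger
        }
        if (used + size > capacity) {
            return false;         // would be an OutOfMemoryError for real
        }
        used += size;
        return true;
    }

    private void collect() {
        gcCount++;
        used = used / 2;          // pretend half the objects were garbage
    }

    int gcCount() { return gcCount; }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap(100);
        for (int i = 0; i < 10; i++) {
            heap.allocate(20);
        }
        System.out.println("collections triggered: " + heap.gcCount()); // prints 3
    }
}
```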
Environment Details:
OS: Linux RedHat
Java: JRE 6 Update 21
I am using following GC setting for my app.
-server -d64 -Xms8192m -Xmx8192m -javaagent:lib/instrum.jar -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+ParallelRefProcEnabled -XX:+UseConcMarkSweepGC -XX:MaxGCPauseMillis=250 -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:+CMSParallelRemarkEnabled -verbose:gc -Xloggc:/tmp/my-gc.log -XX:DisableExplicitGC -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseCompressedOops
With these settings, there is a single Full GC at the beginning of the application:
2.946: [Full GC 2.946: [CMS: 0K->7394K(8111744K), 0.1364080 secs] 38550K->7394K(8360960K), [CMS Perm : 21247K->21216K(21248K)], 0.1365530 secs] [Times: user=0.10 sys=0.04, real=0.14 secs]
This is followed by 4-5 successful CMS collections, but after that there is no trace of CMS in the logs; there are only entries for minor collections:
379022.293: [GC 379022.293: [ParNew: 228000K->4959K(249216K), 0.0152000 secs] 7067945K->6845720K(8360960K) icms_dc=0 , 0.0153940 secs]
The heap is growing continuously and has reached 7 GB. We have to restart the application, as we cannot afford an OOM or any kind of breakdown in the production system.
I am not able to understand why the CMS collector has stopped cleaning. Any clues/suggestions are welcome. Thanks in advance.
======================================================================================
Updated 23rd Jan.
Thanks everyone for the responses so far. I have set up the application in a test environment and tested it with the following sets of JVM options:
Option #1
-server -d64 -Xms8192m -Xmx8192m -javaagent:instrum.jar -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -verbose:gc -Xloggc:my-gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
Option #2
-server -d64 -Xms8192m -Xmx8192m -javaagent:instrum.jar -XX:MaxPermSize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -verbose:gc -Xloggc:my-gc.log -XX:+DisableExplicitGC -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
I ran the tests with both settings in parallel for 2 days. These are my observations:
Option #1
The heap memory is stable, but there were 90 ConcurrentMarkSweep collections and the JVM spent 24 minutes in them. That's too high. I also see the following lines in the GC logs, and the pattern repeats every hour...
318995.941: [GC 318995.941: [ParNew: 230230K->8627K(249216K), 0.0107540 secs] 5687617K->5466913K(8360960K), 0.0109030 secs] [Times: user=0.11 sys=0.00, real=0.01 secs]
319050.363: [GC 319050.363: [ParNew: 230195K->9076K(249216K), 0.0118420 secs] 5688481K->5468316K(8360960K), 0.0120470 secs] [Times: user=0.12 sys=0.01, real=0.01 secs]
319134.118: [GC 319134.118: [ParNew: 230644K->8503K(249216K), 0.0105910 secs] 5689884K->5468704K(8360960K), 0.0107430 secs] [Times: user=0.11 sys=0.00, real=0.01 secs]
319159.250: [Full GC (System) 319159.250: [CMS: 5460200K->5412132K(8111744K), 19.1981050 secs] 5497326K->5412132K(8360960K), [CMS Perm : 72243K->72239K(120136K)], 19.1983210 secs] [Times: user=19.14 sys=0.06, real=19.19 secs]
I don't see the concurrent mark and sweep phase logs. Does this mean CMS switched to the throughput collector? If so, why?
Option #2:
Since I saw the Full GC (System) entries, I thought of adding -XX:+DisableExplicitGC. But with that option the collection does not happen at all, and the current heap size is 7.5 GB. What I am wondering is why CMS is doing a Full GC instead of a concurrent collection.
This is a theory ...
I suspect that those CMS collections were not entirely successful. The event at 12477.056 looks like the CMS decided it was not going to be able to work properly because the preclean step was taking too long.
If that caused the CMS to switch itself off, I'd expect it to revert to the classic "throughput" GC algorithm. And there's a good chance it would then wait until the heap is full and run a full GC. In short, if you had just let it continue, it would probably have been OK (modulo big GC pauses every now and then).
I suggest you run your application on a test server with the same heap size and other GC parameters, and see what happens when the server hits the limit. Does it actually throw an OOME?
CMS is running for you :P
You are using incremental mode on CMS (although you really should not bother, as it is likely punishing your throughput).
The icms_dc in your posted log line is the giveaway; the only thing in the JVM that logs this is the CMS collector. It is saying that for that GC run, a small amount of tenured-generation cleanup was interwoven with the application.
This part of your log relates to parallel new (the giveaway there is the heap size):
379022.293: [GC 379022.293: [ParNew: 228000K->4959K(249216K), 0.0152000 secs]
and this part is incremental CMS (iCMS):
7067945K->6845720K(8360960K) icms_dc=0 , 0.0153940 secs]
I would ask: why are you using iCMS? Do you have a lot of soft/weak/phantom references (or why are you using the ParallelRefProcEnabled flag)? And have you actually seen an OutOfMemoryError or an insufferable pause?
Try backing down to CompressedOops, ParNewGC, and CMS without anything else fancy, and see if that works out for you.
I can see that the initial heap size (-Xms) is 8192m and the max heap size (-Xmx) is also 8192m, which might be one of the reasons why the GC is still waiting to start sweeping.
I would suggest decreasing the initial heap size and then checking whether the GC kicks in.
When you set the maximum size, the JVM reserves that amount of virtual memory immediately.
When you set the minimum size equal to the maximum, the JVM has effectively committed the maximum size from the start; all the minimum setting does is tell the JVM to take minimal steps to free memory until that maximum is reached. This can reduce the number of full GCs, because you told it to use up to 8 GB freely.
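The reserved-versus-committed distinction is visible from inside the JVM via java.lang.Runtime. A small sketch (the exact numbers printed will vary by JVM and flags):

```java
public class HeapReport {
    // maxMemory()   -- the -Xmx ceiling (reserved up front as virtual memory)
    // totalMemory() -- what the JVM has actually committed so far
    // freeMemory()  -- committed but currently unused
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();
        long total = rt.totalMemory();
        long used = total - rt.freeMemory();
        System.out.printf("max=%dM committed=%dM used=%dM%n",
            max / (1024 * 1024), total / (1024 * 1024), used / (1024 * 1024));
    }
}
```

With -Xms equal to -Xmx, committed should equal the ceiling from startup, which is exactly the behavior described above.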
You have a lot of options turned on (some of them defaults). I suggest you strip back to a minimal set, as options can have odd interactions when you turn on lots of them.
I would start with (assuming you have Solaris)
-mx8g -javaagent:lib/instrum.jar -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -verbose:gc -Xloggc:/tmp/my-gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
The option -server is the default on server-class machines, -XX:+UseCompressedOops is the default on recent versions of Java, and -XX:MaxGCPauseMillis=250 is just a hint.
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
I know the Java VM has -Xmx and -Xms for setting the size of the heap. It also has a feature called "ergonomics" that can intelligently adjust the size of the heap. But I have a problem at hand that requires a heap with a strictly fixed size.
Here are the command-line arguments:
"-Xms2m -Xmx2m -XX:+PrintGCDetails"
However, the GC logs show that the size of the heap was not fixed at 2048K; see, e.g., 2368K, 2432K, 2176K:
[GC [PSYoungGen: 480K->72K(704K)] 1740K->1332K(2368K), 0.0032190 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 560K->64K(768K)] 2094K->1598K(2432K), 0.0033090 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
[GC [PSYoungGen: 544K->32K(768K)] 1675K->1179K(2176K), 0.0009960 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
Is there a way to do "strict sizing" (no more, no less) of a Java heap?
I believe the JVM will manage the heap as you intend, but the problem in your example is that the max heap size is effectively ignored because it is too low.
On Sun's Windows JVM, version 1.6.0_06, I believe the minimum "max heap size" is approximately 6 MB (i.e. -Xmx6m). If you attempt a lower setting than this, the heap may actually grow larger. (I had thought the minimum was 16m, but a little experimentation shows that values as low as 6m appear to work.)
If you set -Xms8m and -Xmx8m, however, I think you'll find the heap stays at that size.
There are other options along with -Xmx and -Xms that determine the initial heap size. Check the jvm tuning guide for details.
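If a strictly fixed heap really matters to the application, one option is to verify the effective ceiling at startup and refuse to run when it deviates too much from what was requested. This is a sketch under my own assumptions (the expected size and tolerance are made-up examples):

```java
public class HeapGuard {
    // Fail fast when the JVM's effective heap ceiling (Runtime.maxMemory())
    // differs from the requested size by more than the given tolerance.
    static boolean withinTolerance(long actualBytes, long expectedBytes, double tolerance) {
        return Math.abs(actualBytes - expectedBytes) <= expectedBytes * tolerance;
    }

    public static void main(String[] args) {
        long expected = 8L * 1024 * 1024;   // hoped-for -Xmx8m
        long actual = Runtime.getRuntime().maxMemory();
        if (!withinTolerance(actual, expected, 0.25)) {
            System.out.println("heap ceiling is " + actual / (1024 * 1024)
                + " MB, not the requested 8 MB -- refusing to start");
        }
    }
}
```

Note that maxMemory() commonly reports slightly less than -Xmx (a survivor space is excluded), which is why a tolerance is needed rather than an exact comparison.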
I guess it is just too small. Try something higher, like 16m or 64m. Also, the internal and the external sizes are two different things. The heap will not be full all the time, so the used size can be less than -Xmx, and even less than -Xms right after the program has started. Externally, however, you will see that the -Xms amount of memory has been allocated.
When I first read that I thought it said 2 GB ;)
Don't forget Java uses at least 10-30 MB of non-heap space, so the few hundred K you save might not make as much difference as you think.
As far as I know this is not possible using the standard VM. Unfortunately I have no references at hand right now.