I have an application running with -mx7000m. I can see it has allocated a 5.5 GB heap, yet for some reason it is GCing constantly, and since it uses CMS that turns out to be quite CPU intensive. So it already has 5.5 GB of heap allocated, but somehow it spends all its CPU time trying to keep the used heap as small as possible, which is around 2 GB - and the other 3.5 GB are allocated by the JVM but unused.
I have a few servers with the exact same configuration and 2 out of 5 are behaving this way. What could explain it?
It's Java 6, the flags are: -server -XX:+PrintGCDateStamps -Duser.timezone=America/Chicago -Djava.awt.headless=true -Djava.io.tmpdir=/whatever -Xloggc:logs/gc.log -XX:+PrintGCDetails -mx7000m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC
The default threshold for triggering a CMS GC is 70% occupancy (in Java 6). A rule of thumb is that the heap size should be about 2.5x the heap used after a full GC (though your use case is likely to be different).
So in your case, say you have
- 2.5 GB of young generation space
- 3 GB of tenured space.
When your tenured space reaches 70% occupancy, i.e. about 2.1 GB, CMS will start cleaning up that region.
The setting involved is -XX:CMSInitiatingOccupancyFraction=70.
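If you want to pin that threshold rather than let the JVM adapt it, set it together with -XX:+UseCMSInitiatingOccupancyOnly. A minimal sketch reusing the flags from your command line (myapp.jar is just a placeholder):

# Start CMS once tenured occupancy hits 70%, and always honour that fixed value
java -server -mx7000m -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=70 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -Xloggc:logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -jar myapp.jar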
However, if you want to reduce the impact of GC, the simplest thing to do is to create less garbage: use a memory profiler and make sure your allocation rate is as low as possible. Your system will behave very differently if you are creating as much garbage as the CPUs can handle compared with, say, 100 MB/s or 1 MB/s or less.
The reason your servers behave differently might be that the relative sizes of the regions differ; if you had, say, 0.5 GB young and 5.0 GB tenured, you shouldn't be seeing this. The difference could come down purely to how busy each machine was when the process started, or what it has done since then.
Related
It is related to my previous question
I set -Xms to 512 MB and -Xmx to 6 GB for one Java process. I have three such processes.
My total RAM is 32 GB. Of that, 2 GB is always occupied.
I ran the free command to confirm that at least 27 GB is free. But my jobs require only 18 GB at most at any time.
It was running fine. Each job occupied around 4 to 5 GB but used around 3 to 4 GB. I understand that -Xmx doesn't mean the process will always occupy 6 GB.
When another process, X, was started on the same server by another user, it occupied 14 GB. Then one of my processes failed.
I understand that I need to either add RAM or schedule the colliding jobs so they don't run at the same time.
My question is: how can I force my job to always use the full 6 GB, and why does it throw a GC overhead limit error in this case?
I used VisualVM to monitor them, and jstat as well.
Any advice is welcome.
Simple answer: -Xmx is not a hard limit on the JVM's total memory. It only limits the size of the heap available to Java objects inside the JVM. Lower your -Xmx and you may stabilize the process memory at a size that suits you.
Long answer: the JVM is a complex machine. Think of it as an OS for your Java code. The virtual machine needs extra memory for its own housekeeping (e.g. GC metadata), for thread stacks, and for "off-heap" memory (e.g. memory allocated by native code through JNI, buffers), etc.
-Xmx only limits the heap size for objects: the memory that's dealt with directly in your Java code. Everything else is not accounted for by this setting.
There's a newer JVM setting, -XX:MaxRAM, that tries to keep the entire process memory within that limit.
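A quick way to see its effect on the ergonomic defaults (the 4g value is just an example):

# Print the heap size the JVM would pick if it assumed only 4 GB of RAM were present
java -XX:MaxRAM=4g -XX:+PrintFlagsFinal -version | grep -E 'MaxRAM|MaxHeapSize'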
From your other question:
It is multi-threaded: 100 reader and 100 writer threads. Each one has its own connection to the database.
Keep in mind that the OS' I/O buffers also need memory for their own function.
If you have over 200 threads, you also pay the price: N*(Stack size), and approx. N*(TLAB size) reserved in Young Gen for each thread (dynamically resizable):
java -Xss1024k -XX:+PrintFlagsFinal 2> /dev/null | grep -i tlab
size_t MinTLABSize = 2048
intx ThreadStackSize = 1024
Approximately half a gigabyte just for this (and probably more)!
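Rough arithmetic behind that estimate (illustrative, using the defaults printed above and assuming roughly 200 application threads):

# 200 threads * 1024 KB stack  ->  about 200 MB of thread stacks
# + per-thread TLAB reservations in the young generation (sized dynamically, upwards from MinTLABSize)
# + JVM-internal threads (GC, JIT compiler, ...) with their own stacks
# which is how the total creeps towards ~0.5 GB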
Thread Stack Size (in Kbytes). (0 means use default stack size)
[Sparc: 512; Solaris x86: 320 (was 256 prior in 5.0 and earlier); Sparc 64 bit: 1024; Linux amd64: 1024 (was 0 in 5.0 and earlier); all others 0.] - Java HotSpot VM Options; Linux x86 JDK source
In short: -Xss (stack size) defaults depend on the VM and OS environment.
Thread Local Allocation Buffers (TLABs) are more intricate; they reduce allocation contention and resource locking. For an explanation of the setting and of how TLABs work, see: TLAB allocation, and TLABs and Heap Parsability.
Further reading: "Native Memory Tracking" and Q: "Java using much more memory than heap size"
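If you are on Java 8 or later, Native Memory Tracking gives a per-category breakdown of that non-heap usage. A sketch (app.jar and <pid> are placeholders):

# Start the JVM with NMT enabled (summary level has modest overhead)
java -XX:NativeMemoryTracking=summary -Xmx6g -jar app.jar &

# Later, ask the running JVM for the breakdown (thread stacks, GC, code cache, internal, ...)
jcmd <pid> VM.native_memory summary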
why does it throw GC limit reached error in this case.
"GC overhead limit exceeded". In short: each GC cycle reclaimed too little memory and the ergonomics decided to abort. Your process needs more memory.
When another process, X, was started on the same server by another user, it occupied 14 GB. Then one of my processes failed.
One more point: when running multiple large-memory processes back-to-back, consider this:
java -Xms28g -Xmx28g <...>;
# above process finishes
java -Xms28g -Xmx28g <...>; # crashes, can't allocate enough memory
When the first process finishes, your OS needs some time to zero out the memory freed by the ending process before it can hand those physical memory regions to the second process. Until that happens, you cannot start another "big" process that immediately asks for the full 28 GB of heap (observed on WinNT 6.1). This can be worked around in a few ways:
Reduce -Xms so the big allocation happens later in the second process's lifetime (see the sketch after this list)
Reduce overall -Xmx heap
Delay the start of the second process
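A sketch of the first option, keeping the placeholder from above:

# Small initial heap, same maximum: the big allocation happens later in the process's lifetime
java -Xms4g -Xmx28g <...>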
We currently have problems with a Java native memory leak. The server is quite big (40 CPUs, 128 GB of memory). The Java heap size is 64 GB and we run a very memory-intensive application that reads a lot of data into strings with about 400 threads and discards them from memory after a few minutes.
So the heap fills up very fast, but what is on the heap becomes obsolete and can be GCed very fast, too. We have to use G1 to avoid stop-the-world pauses lasting minutes.
Now, that seems to work fine - the heap is big enough to run the application for days and nothing is leaking there. Nevertheless, the Java process keeps growing over time until all 128 GB are used and the application crashes with an allocation failure.
I've read a lot about native Java memory leaks, including the glibc issue with the maximum number of malloc arenas (we are on Debian Wheezy with glibc 2.13, so no fix is possible here by setting MALLOC_ARENA_MAX=1 or 4 without a dist upgrade).
So we tried jemalloc, which gave us profiling graphs for inuse-space and inuse-objects.
I can't figure out what the issue is here - does anyone have an idea?
If I set MALLOC_CONF="narenas:1" for jemalloc as an environment variable for the Tomcat process running our app, could it still somehow be using the glibc malloc implementation anyway?
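In other words, is preloading it like this (library path just an example from our Debian setup, and catalina.sh stands for however the Tomcat instance is started) enough to take glibc malloc out of the picture for the JVM?

export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
export MALLOC_CONF=narenas:1
bin/catalina.sh start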
This is our G1 setup - maybe there is some issue here?
-XX:+UseCompressedOops
-XX:+UseNUMA
-XX:NewSize=6000m
-XX:MaxNewSize=6000m
-XX:NewRatio=3
-XX:SurvivorRatio=1
-XX:InitiatingHeapOccupancyPercent=55
-XX:MaxGCPauseMillis=1000
-XX:PermSize=64m
-XX:MaxPermSize=128m
-XX:+PrintCommandLineFlags
-XX:+PrintFlagsFinal
-XX:+PrintGC
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution
-XX:-UseAdaptiveSizePolicy
-XX:+UseG1GC
-XX:MaxDirectMemorySize=2g
-Xms65536m
-Xmx65536m
Thanks for your help!
We never called System.gc() explicitly, and we have meanwhile stopped using G1, specifying nothing other than -Xms and -Xmx.
We are therefore using nearly all of the 128 GB for the heap now. The Java process memory usage is high - but constant for weeks. I'm sure this is a G1 or at least a general GC issue. The only disadvantage of this "solution" is long GC pauses, but they decreased from up to 90 s to about 1-5 s after increasing the heap, which is OK for the benchmark we run on our servers.
Before that, I played around with the -XX:ParallelGCThreads option, which had a significant influence on the speed of the memory leak when decreasing it from 28 (the default for 40 CPUs) down to 1. The memory graphs looked somewhat like a hand fan when using different values on different instances...
Examining a heap dump, I realized it has a lot of objects waiting for finalization; most of them are instances from libraries, like JDBC connections and so on.
Knowing that the instances on that queue are basically objects of classes that implement finalize(), why would they simply not be finalized?
A few days ago I raised the memory of one such instance. Initially it had 1 GB with the new generation set to 256 MB (-Xmx1g -XX:NewSize=256m -XX:MaxNewSize=256m). As we added some heavy caching functionality, we raised the memory assigned to that instance to 3 GB (-Xmx3G -XX:NewSize=512m -XX:MaxNewSize=512m). From that moment we started to see some out-of-memory errors. Investigating a bit, I found a lot of java.lang.ref.Finalizer objects and objects waiting for finalization.
How could these be related to each other? Are they even related?
why would they simply not be finalized?
Some components take longer to finalize, especially anything that involves IO. JDBC connections are relatively heavyweight network resources, so they take even longer.
I suggest you use a connection pool (most JDBC libraries have one built in). This way you are not creating and destroying connections all the time.
Note on units: 1 Gb is 1 gigabit, i.e. 128 MB (megabytes), and 256 mb would be 256 millibits, or about 1/4 of a bit. -XX:NewSize=512m is 512 MB, not 256 MB, and -XX:MaxNewSize=512 wouldn't work as it is just 512 bytes; most likely you used -XX:MaxNewSize=512m.
3 Gb is 3 gigabits, but assuming you meant 3 GB, note that it is not the same as -Xmx1G, which is 1 GB (8 Gb).
Object.finalize() is called by the garbage collector in the final step of cleanup. The GC runs periodically (depending on which GC you are using; on Java 7 and 8 it's probably CMS, or G1 if you configured it so). Having a lot of objects 'waiting for finalization' may mean that you have a large heap and enough free memory that the GC has not needed to run (most likely with CMS, as G1 runs small cleanups much more frequently).
Add GC tracing to your JVM startup parameters and monitor how often it runs: -XX:+PrintGCDetails -XX:+PrintGCTimeStamps See: http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html
If you are using lots of small objects with a heap over 1 GB, you may want to consider the G1 garbage collector, as it is better suited to such a task and is designed to avoid the long 'stop the world' pauses of CMS.
I understand that a larger heap means longer GC pauses. I'm okay with that -- my code is doing analysis of some data, and all I care about is minimizing the total time spent doing garbage collection; the length of a single pause doesn't make a difference to me.
Can making the heap too large hurt performance? My understanding is that "young" objects get GC'd quickly, but "old" objects can take longer, so my worry is that a large heap will push some short-lived objects into the longer-lived space. I do a lot of allocation of strings that get thrown away quickly (on the order of 60 GB over the course of a single run) and so I don't want to increase GC time spent on those.
I'm testing on a machine with 8 GB of RAM, so I've been running my code with -Xms4g -Xmx4g, and as of my last profiled run, I spent about 20% of my runtime doing garbage collection. I found that increasing the heap to 5 GB helped reduce it. The production server will have 32 GB of RAM, and much higher memory requirements.
Can I safely run it with -Xms31g -Xmx31g, or might that end up hurting performance?
Can making the heap too large hurt performance?
When you go over 31 GB you can lose compressed oops, which can mean you have to jump to around 48 GB just to get more usable memory. I try to stay under 31 GB if I can.
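You can check where that cut-off lies on your own JVM (64-bit HotSpot assumed):

# Compressed oops are typically still on at 31 GB and switched off at 32 GB
java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
java -Xmx32g -XX:+PrintFlagsFinal -version | grep UseCompressedOops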
My understanding is that "young" objects get GC'd quickly, but "old" objects can take longer, so my worry is that a large heap will push some short-lived objects into the longer-lived space.
For this reason I tend to have large young generations, e.g. up to 24 GB.
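For example (sizes are illustrative and MyApp is a placeholder):

# Fixed 31 GB heap with a 24 GB young generation
java -Xms31g -Xmx31g -Xmn24g -verbose:gc MyApp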
Can I safely run it with -Xms31g -Xmx31g, or might that end up hurting performance?
On a 32 GB machine this would be very bad. By the time you include the off-heap memory the JVM uses, the OS, and the disk cache, you are likely to find that a heap over 24-28 GB will hurt performance. I would start with 24 GB and see how that goes; you might find you can reduce it with little effect, given that 5 GB runs OK now.
You might find moving your data off heap will help GC times. I have run systems with 1 GB heap and 800 GB off heap, but it depends on your application's requirements.
I spent about 20% of my runtime doing garbage collection
I suggest you reduce your allocation rate. Using a memory profiler you can reduce your allocation rate to below 300 MB/s, but less than 30 MB/s is better. For an extreme system you might want less than 1 GB/hour as this would allow you to run all day without a minor collection.
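The arithmetic behind that figure, with illustrative numbers:

# 24 GB young generation / 1 GB per hour of allocation -> roughly 24 hours between minor collections
# the same young generation at 300 MB/s fills in roughly 80 seconds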
We have a fairly big application running on a JBoss 7 application server. In the past we were using ParallelGC, but it was giving us trouble on some servers where the heap was large (5 GB or more) and usually nearly full: we would frequently get very long GC pauses.
Recently, we made improvements to our application's memory usage and in a few cases added more RAM to some of the servers where the application runs, but we also started switching to G1 in the hopes of making these pauses less frequent and/or shorter. Things seem to have improved but we are seeing a strange behaviour which did not happen before (with ParallelGC): the Perm Gen seems to fill up pretty quickly and once it reaches the max value a Full GC is triggered, which usually causes a long pause in the application threads (in some cases, over 1 minute).
We have been using 512 MB of max perm size for a few months, and during our analysis the perm size would usually stop growing at around 390 MB with ParallelGC. After we switched to G1, however, the behaviour above started happening. I tried increasing the max perm size to 1 GB and even 1.5 GB, but the Full GCs are still happening (they are just less frequent).
In this link you can see some screenshots of the profiling tool we are using (YourKit Java Profiler). Notice how when the Full GC is triggered the Eden and the Old Gen have a lot of free space, but the Perm size is at the maximum. The Perm size and the number of loaded classes decrease drastically after the Full GC, but they start rising again and the cycle is repeated. The code cache is fine, never rises above 38 MB (it's 35 MB in this case).
Here is a segment of the GC log:
2013-11-28T11:15:57.774-0300: 64445.415: [Full GC 2126M->670M(5120M), 23.6325510 secs]
[Eden: 4096.0K(234.0M)->0.0B(256.0M) Survivors: 22.0M->0.0B Heap: 2126.1M(5120.0M)->670.6M(5120.0M)]
[Times: user=10.16 sys=0.59, real=23.64 secs]
You can see the full log here (from the moment we started up the server, up to a few minutes after the full GC).
Here's some environment info:
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
Startup options: -Xms5g -Xmx5g -Xss256k -XX:PermSize=1500M -XX:MaxPermSize=1500M -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -Xloggc:gc.log
So here are my questions:
Is this the expected behaviour with G1? I found another post on the web of someone questioning something very similar and saying that G1 should perform incremental collections on the Perm Gen, but there was no answer...
Is there something I can improve/correct in our startup parameters? The server has 8 GB of RAM, and it doesn't seem that we are short on hardware; the application's performance is fine until a full GC is triggered, and that's when users experience big lags and start complaining.
Causes of growing Perm Gen
Lots of classes, especially JSPs.
Lots of static variables.
There is a classloader leak.
For those who don't know, here is a simple way to think about how the PermGen fills up. The Young Gen doesn't get enough time to let things expire, so they get moved up to the Old Gen space. The Perm Gen holds the classes for the objects in the Young and Old Gen. When the objects in the Young or Old Gen get collected and a class is no longer being referenced, it gets 'unloaded' from the Perm Gen. If the Young and Old Gen don't get GC'd, then neither does the Perm Gen, and once it fills up it needs a Full, stop-the-world GC. For more info see Presenting the Permanent Generation.
Switching to CMS
I know you are using G1 but if you do switch to the Concurrent Mark Sweep (CMS) low pause collector -XX:+UseConcMarkSweepGC, try enabling class unloading and permanent generation collections by adding -XX:+CMSClassUnloadingEnabled.
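A sketch of what that could look like, reusing the heap and perm-gen sizes from your startup options:

java -Xms5g -Xmx5g -Xss256k -XX:PermSize=1500M -XX:MaxPermSize=1500M \
     -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log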
The Hidden Gotcha
If you are using JBoss, RMI/DGC has the gcInterval set to 1 minute. The RMI subsystem forces a full garbage collection once per minute. This in turn forces promotion instead of letting objects get collected in the Young Generation.
You should change this to at least 1 hour, if not 24 hours, in order for the GC to do proper collections.
-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000
List of every JVM option
To see all the options, run this from the cmd line.
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version
If you want to see what JBoss is using, add the following to your JBoss startup configuration (standalone.conf.bat; the set syntax below is for Windows). You will get a list of every JVM option and what it is set to. NOTE: the flags must be set on the JVM you want to look at. If you run the command externally, you won't see what is happening in the JVM that JBoss is running on.
set "JAVA_OPTS= -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal %JAVA_OPTS%"
There is a shortcut to use when we are only interested in the modified flags.
-XX:+PrintCommandLineFlags
Diagnostics
Use jmap to determine which classes are consuming permanent generation space. The output will show:
class loader
# of classes
bytes
parent loader
alive/dead
type
totals
jmap -permstat JBOSS_PID >& permstat.out
JVM Options
These settings worked for me, but how your system is set up and what your application is doing will determine whether they are right for you.
-XX:SurvivorRatio=8 – Sets survivor space ratio to 1:8, resulting in larger survivor spaces (the smaller the ratio, the larger the space). The SurvivorRatio is the size of the Eden space compared to one survivor space. Larger survivor spaces allow short lived objects a longer time period to die in the young generation.
-XX:TargetSurvivorRatio=90 – Allows 90% of the survivor spaces to be occupied instead of the default 50%, allowing better utilization of the survivor space memory.
-XX:MaxTenuringThreshold=31 – To prevent premature promotion from the young to the old generation . Allows short lived objects a longer time period to die in the young generation (and hence, avoid promotion). A consequence of this setting is that minor GC times can increase due to additional objects to copy. This value and survivor space sizes may need to be adjusted so as to balance overheads of copying between survivor spaces versus tenuring objects that are going to live for a long time. The default settings for CMS are SurvivorRatio=1024 and MaxTenuringThreshold=0 which cause all survivors of a scavenge to be promoted. This can place a lot of pressure on the single concurrent thread collecting the tenured generation. Note: when used with -XX:+UseBiasedLocking, this setting should be 15.
-XX:NewSize=768m – allows specification of the initial young generation size
-XX:MaxNewSize=768m – allows specification of the maximum young generation size
Here is a more extensive JVM options list.
Is this the expected behaviour with G1?
I don't find it surprising. The base assumption is that stuff put into permgen almost never becomes garbage. So you'd expect that permgen GC would be a "last resort"; i.e. something the JVM would only do if it was forced into a full GC. (OK, this argument is nowhere near a proof ... but it's consistent with the following.)
I've seen lots of evidence that other collectors have the same behaviour; e.g.
permgen garbage collection takes multiple Full GC
What is going on with java GC? PermGen space is filling up?
I found another post on the web of someone questioning something very similar and saying that G1 should perform incremental collections on the Perm Gen, but there was no answer...
I think I found the same post. But someone's opinion that it ought to be possible is not really instructive.
Is there something I can improve/correct in our startup parameters?
I doubt it. My understanding is that this is inherent in the permgen GC strategy.
I suggest that you either track down and fix what is using so much permgen in the first place ... or switch to Java 8 in which there isn't a permgen heap anymore: see PermGen elimination in JDK 8
While a permgen leak is one possible explanation, there are others; e.g.
overuse of String.intern(),
application code that is doing a lot of dynamic class generation; e.g. using DynamicProxy,
a huge codebase ... though that wouldn't cause permgen churn as you seem to be observing.
I would first try to find the root cause for the PermGen getting larger before randomly trying JVM options.
You could enable classloading logging (-verbose:class, -XX:+TraceClassLoading -XX:+TraceClassUnloading, ...) and check the output.
In your test environment, you could try monitoring (over JMX) when classes get loaded (java.lang:type=ClassLoading LoadedClassCount). This might help you find out which part of your application is responsible.
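If JMX is not convenient, jstat exposes the same counters from the command line (<pid> is the JVM's process id, 10s the sampling interval):

# Columns: classes Loaded / Bytes, Unloaded / Bytes, Time
jstat -class <pid> 10s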
You could also try listing all the classes using the JVM tools (sorry, but I still mostly use JRockit, where you would do it with jrcmd; hopefully Oracle has migrated those helpful features to HotSpot...).
In summary, find out what generates so many classes and then think about how to reduce that / tune the GC.
Cheers,
Dimo
I agree with the answer above that you should really try to find what is actually filling your permgen, and I strongly suspect it is some classloader leak whose root cause you want to find.
There's a thread in the JBoss forums that goes through a couple of such diagnosed cases and how they were fixed. This answer and this article discuss the issue in general as well. That article mentions possibly the easiest test you can do:
Symptom
This will happen only if you redeploy your application without restarting the application server. The JBoss 4.0.x series suffered from just such a classloader leak. As a result I could not redeploy our application more than twice before the JVM would run out of PermGen memory and crash.
Solution
To identify such a leak, un-deploy your application and then trigger a full heap dump (make sure to trigger a GC before that). Then check if you can find any of your application objects in the dump. If so, follow their references to their root, and you will find the cause of your classloader leak. In the case of JBoss 4.0 the only solution was to restart for every redeploy.
This is what I'd try first, IF you think redeployment might be related. This blog post is an earlier one doing the same thing, but it discusses the details as well. Based on your posting, though, it might be that you're not actually redeploying anything and permgen is just filling up by itself. In that case, examining the classes plus anything else added to permgen might be the way to go (as already mentioned in the previous answer).
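In practice that boils down to something like the following (pid and dump file name are placeholders):

# After undeploying: force a full GC, then dump only the live objects
jcmd <pid> GC.run
jmap -dump:live,format=b,file=after-undeploy.hprof <pid>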
If that doesn't give more insight, my next step would be trying out the Plumbr tool. They offer a sort of guarantee that they'll find the leak for you, as well.
You should start your server.bat (i.e. the java command it runs) with -verbose:gc.