We have recently been testing out the G1 garbage collector with the following configuration:
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseG1GC -XX:MaxGCPauseMillis=1250 -XX:+PrintTenuringDistribution -Xloggc:${logdir}/gc-$(date +%Y_%m_%d-%H_%M).log -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics -XX:+PrintPromotionFailure -XX:+PrintAdaptiveSizePolicy -XX:+PrintHeapAtGC -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=15 -XX:ParallelGCThreads=8 -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8M
JAVA_OPTS_HEAP: -Xms16g -Xmx16g
We recently came across an issue where two Java processes were running with the above configuration on a box with 48 GB of RAM. Both processes went on to consume around 20-22 GB of RAM each (with a few small processes consuming the remaining memory), filling the entire RAM. This triggered disk swapping, which finally led to OOM and the process getting killed.
This is worrying because neither NMT nor the GC logs give us any meaningful clue about where this memory is going. In the NMT stats, the application memory was under 16 GB and the metaspace usage was under 1 GB.
We tried setting -XX:MaxMetaspaceSize=2G, but that didn't help either. The RAM usage seems to grow unbounded when the process runs for days.
From other questions it does seem that the G1 garbage collector tends to consume more memory, but the disk swapping is a worrying issue. Could someone please provide some pointers on how this could be resolved?
Too long for a comment, so I'm posting it as an answer.
There is a good read which explains why a Java process might consume more memory than -Xmx. Based on the information you provided, I believe this is also the reason in your case.
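One way to narrow down where the extra memory goes is to compare the NMT figures against the process RSS over time. A rough sketch, assuming NMT is already enabled on the process (e.g. via -XX:NativeMemoryTracking=summary) and with <pid> as a placeholder:
jcmd <pid> VM.native_memory baseline
(some hours or days later)
jcmd <pid> VM.native_memory summary.diff
ps -o rss= -p <pid>
A gap between the NMT committed total and the RSS that keeps growing points to memory the JVM does not track itself, e.g. native allocations made by libraries or glibc malloc arena overhead.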
For G1 there is an OBE, Getting Started with the G1 Garbage Collector, with details about how the G1 GC works. Have a look at the Recommended Use Cases for G1 section there; maybe you would not benefit from using G1 at all.
Cited from the OBE (Oracle By Example):
If you are using CMS or ParallelOldGC and your application is not experiencing long garbage collection pauses, it is fine to stay with your current collector.
Here you can find testing results of G1, Parallel, ConcMarkSweep, Serial and Shenandoah garbage collectors in terms of scaling and resource consumption, as well as some suggestions on what settings can be applied to improve results. So you can choose the most appropriate one for your project and reduce memory usage.
Related
I built a simple application using Spring Boot. The ZGC garbage collector I use when deploying to a Linux server uses a lot of memory. I tried to limit the maximum heap memory to 500 MB with -Xmx500m, but the Java program still used more than 1 GB. When I used the G1 collector, it only used 350 MB. I don't know why. Is this a bug in JDK 11, or is there a problem with my startup parameters?
Runtime environment
operating system: CentOS Linux release 7.8.2003
JDK version: jdk11
springboot version: v2.3.0.RELEASE
Here is my Java startup command
java -Xms128m -Xmx500m \
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
-jar app.jar
Here is a screenshot of the memory usage at run time
Heap memory usage
https://github.com/JoyfulAndSpeedyMan/assets/blob/master/2020-07-13%20201259.png?raw=true
System memory usage
https://github.com/JoyfulAndSpeedyMan/assets/blob/master/2020-07-13%20201357.png?raw=true
Here's what happens when you use the default garbage collector
Java startup command
java -Xms128m -Xmx500m \
-jar app.jar
Heap memory usage
https://github.com/JoyfulAndSpeedyMan/assets/blob/master/2020-07-13%20202442.png?raw=true
System memory usage
https://github.com/JoyfulAndSpeedyMan/assets/blob/master/2020-07-13%20202421.png?raw=true
By default, JDK 11 uses the G1 garbage collector. Theoretically, shouldn't G1 be more memory-intensive than ZGC? Why am I seeing the opposite? Did I misunderstand something? Since I'm a beginner with the JVM, I don't understand why.
ZGC employs a technique known as colored pointers. The idea is to use some free bits in 64-bit pointers into the heap for embedded metadata. However, when dereferencing such pointers, these bits need to be masked, which implies some extra work for the JVM.
To avoid the overhead of masking pointers, ZGC uses a multi-mapping technique. Multi-mapping is when multiple ranges of virtual memory are mapped to the same range of physical memory.
ZGC uses 3 views of Java heap ("marked0", "marked1", "remapped"), i.e. 3 different "colors" of heap pointers and 3 virtual memory mappings for the same heap.
As a consequence, the operating system may report 3x larger memory usage. For example, for a 512 MB heap, the reported committed memory may be as large as 1.5 GB, not counting memory besides the heap. Note: multi-mapping affects the reported used memory, but physically the heap will still use 512 MB in RAM. This sometimes leads to a funny effect that RSS of the process looks larger than the amount of physical RAM.
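On Linux you can see this effect directly (the exact command below is just an example): the Rss totals in /proc/<pid>/smaps count the multi-mapped heap pages once per mapping, while the Pss totals apportion shared pages and should be much closer to the real physical usage:
awk '/^Rss:/ {rss += $2} /^Pss:/ {pss += $2} END {print "Rss:", rss, "kB"; print "Pss:", pss, "kB"}' /proc/<pid>/smaps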
See also:
ZGC: A Scalable Low-Latency Garbage Collector by Per Lidén
Understanding Low Latency JVM GCs by Jean Philippe Bempel
The JVM uses much more memory than just the heap - read this excellent answer to understand JVM memory consumption better: Java using much more memory than heap size (or size correctly Docker memory limit)
You'll need to go beyond the heap inspection and use things like Native Memory Tracking to get a clearer picture.
I don't know what the particular issue with your application is, but ZGC is often mentioned as being a good fit for large heaps.
It's also a brand-new collector that has received many changes recently - I'd upgrade to JDK 14 if you want to use it (see the "Change Log" here: https://wiki.openjdk.java.net/display/zgc/Main)
This is a result of the throughput-latency-footprint tradeoff. When choosing between these 3 things, you can only pick 2.
ZGC is a concurrent GC with low pause times. Since you don't want to give up throughput either, you pay for low latency and high throughput with a larger footprint. So there is nothing surprising in such high memory consumption.
G1 is not a low-pause collector, so you shift that tradeoff towards footprint and get bigger pause times but win some memory.
The amount of OS memory the JVM uses (i.e. the committed heap) depends on how often the GC runs (and also on whether it uncommits unneeded memory if the application starts to use less), which is tunable. Unfortunately ZGC isn't (currently) as aggressive about this by default as G1, but both have some tuning options that you can try.
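For example, newer JDKs have flags that make both collectors return unused memory to the OS more eagerly; treat the following as illustrative starting points and check that the flags exist in your JDK version:
ZGC (JDK 13 and later): -XX:+ZUncommit -XX:ZUncommitDelay=300
G1 (JDK 12 and later): -XX:G1PeriodicGCInterval=60000
Combined with an -Xms smaller than -Xmx, this lets the committed heap shrink back down when the application is idle.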
P.S. As others have noted, the RES htop column is misleading, but the VisualVM chart shows the real picture.
We currently have a problem with a Java native memory leak. The server is quite big (40 CPUs, 128 GB of memory). The Java heap size is 64 GB and we run a very memory-intensive application that reads a lot of data into strings with about 400 threads and throws it away after some minutes.
So the heap fills up very fast, but objects on the heap become obsolete and can be GCed very quickly too. We have to use G1 to avoid stop-the-world pauses lasting minutes.
Now, that seems to work fine - the heap is big enough to run the application for days, and nothing is leaking there. However, the Java process keeps growing over time until all of the 128 GB are used and the application crashes with an allocation failure.
I've read a lot about native Java memory leaks, including the glibc issue with the maximum number of arenas (we have wheezy with glibc 2.13, so no fix is possible here by setting MALLOC_ARENA_MAX=1 or 4 without a dist upgrade).
So we tried jemalloc, which gave us profiling graphs for inuse-space and inuse-objects.
I don't get what the issue is here - does anyone have an idea?
If I set MALLOC_CONF="narenas:1" for jemalloc as an environment variable for the Tomcat process running our app, could the process somehow still be using the glibc malloc implementation anyway?
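For reference, jemalloc only takes effect when it is actually preloaded, so we wire it into the Tomcat start script roughly like this (the library path and the profiling options are illustrative; the prof settings need a jemalloc build with profiling support):
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
export MALLOC_CONF=narenas:1,prof:true,lg_prof_interval:30
bin/catalina.sh start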
This is our G1 setup - maybe there is some issue here?
-XX:+UseCompressedOops
-XX:+UseNUMA
-XX:NewSize=6000m
-XX:MaxNewSize=6000m
-XX:NewRatio=3
-XX:SurvivorRatio=1
-XX:InitiatingHeapOccupancyPercent=55
-XX:MaxGCPauseMillis=1000
-XX:PermSize=64m
-XX:MaxPermSize=128m
-XX:+PrintCommandLineFlags
-XX:+PrintFlagsFinal
-XX:+PrintGC
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution
-XX:-UseAdaptiveSizePolicy
-XX:+UseG1GC
-XX:MaxDirectMemorySize=2g
-Xms65536m
-Xmx65536m
Thanks for your help!
We never called System.gc() explicitly, and we have meanwhile stopped using G1, specifying nothing other than -Xms and -Xmx.
We are therefore using nearly all of the 128 GB for the heap now. The Java process memory usage is high, but it has been constant for weeks. I'm sure this is a G1, or at least a general GC, issue. The only disadvantage of this "solution" is high GC pauses, but they decreased from up to 90 s to about 1-5 s as we increased the heap, which is OK for the benchmark we run on our servers.
Before that, I played around with the -XX:ParallelGCThreads option, which had a significant influence on how fast the memory leaked when decreasing it from 28 (the default for 40 CPUs) down to 1. With different values on different instances, the memory graphs looked somewhat like a hand fan...
I am experiencing regularly high minor GC pause times (~9 seconds), even though there is no excessive I/O activity.
The application is a server written in Java, executing 3 transactions/second.
Heap parameters are:
-Xms1G
-Xmx14G
-XX:+UseConcMarkSweepGC
-XX:+DisableExplicitGC
-XX:+PrintGC
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
What are the possible reasons for such minor gc pause times values?
As the other answer says, without GC log snippets it's not possible to answer this definitively. Some things to think about:
Assuming no impact from the underlying OS (scheduling, CPU thrashing), the time taken for a minor collection is proportional to the amount of live data in the young generation when the collector runs (every live object in the young generation gets copied during a minor GC). Looking at the graph of your old generation, you are seeing consistent growth, which indicates you are promoting significant amounts of data during minor GCs. You're either creating a lot of long-lived objects or you're maintaining references unnecessarily.
To reduce the pauses you could try reducing the size of the Eden space (so there is less potential data to copy on each minor GC) and also reduce the tenuring threshold so that objects get moved out of the survivor spaces more quickly. The downside of this will be your minor GCs will happen more frequently so you will probably see a degradation in throughput.
I would also change the -Xms value. You clearly need more than 1 GB in your heap, so it would be best to set it to 14 GB to avoid the heap having to be resized by the JVM as the amount of data increases.
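As a concrete starting point (the numbers below are illustrative and need to be validated against your GC logs), that could look like:
-Xms14g -Xmx14g
-XX:NewSize=1g -XX:MaxNewSize=1g
-XX:MaxTenuringThreshold=2
-XX:+PrintTenuringDistribution
The tenuring distribution output then shows whether objects really leave the survivor spaces faster and how the promotion volume changes.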
For questions in the category "Why is my GC pause that long?", you should always provide some snippets from the GC logs.
As pure speculation, here are a few reasons why a minor GC may be abnormally slow:
The JVM was suspended from execution by the OS (e.g. CPU starvation, swapping, a virtual server freeze)
There is some issue with bringing the JVM to a safepoint (unlikely, looking at your pause pattern, though)
Object survival spikes
Reference object processing overhead (you need to add -XX:+PrintReferenceGC to get reference processing info into the GC log; see the example flags below)
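A small set of diagnostic flags that helps check the last two points could look like this (flag names as of JDK 7/8; they were removed or replaced by unified logging in later versions):
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
-XX:+PrintReferenceGC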
Try using
-XX:+UseG1GC -XX:MaxGCPauseMillis=1000
This will try to keep max GC pause below 1s.
You need to assign enough memory using -Xmx and set the MaxGCPauseMillis as per need.
IMHO:
There is not sufficient evidence that there is indeed a minor GC "pause time" problem. What is shown is garbage collection time, and GC time != GC pause time. Garbage collection activity time and garbage collection pause time are two different beasts - or, in technical terms, they relate to different JVM ergonomics/performance goals.
Minor GC pause time is commonly negligible; it is the major GC pause time that affects application responsiveness.
There is something called JVM ergonomics goals, and the "throughput goal" is one of the three goals. So I think that, at most, what can be said in this case is that the throughput goal is not being met very well for the young generation, as a lot of time is spent on GC.
I have a Java web (Tomcat) application that consumes 10 GB of heap memory (it represents a data structure in memory).
But it works when I set Xmx to, e.g., 20G.
When I set the heap to, e.g., 11G, my application cannot load all the data - it enters continuous full GC. I suppose the garbage collector also needs memory headroom to make full GC efficient.
Is that correct? How much memory should I add for the GC to work efficiently?
Thanks.
To find the optimal OldGen size:
Enable GC logging -XX:+PrintTenuringDistribution -XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:LogFile=jvm.log -XX:+HeapDumpOnOutOfMemoryError -Xloggc:gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -showversion
Calculate the Live Data Size = occupancy of the OldGen after a FullGC (see Advanced JVM Tuning for more details). Btw., try to get an average by observing the OldGen occupancy over several FullGC cycles.
The optimal OldGen size is roughly 2x to 3x the Live Data Size.
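Applied to your case as a rough worked example: if the live data size really is around 10 GB, this rule of thumb asks for an old generation of roughly 20-30 GB, so an 11 GB heap has essentially no headroom (hence the continuous full GCs), while 20 GB sits at the lower end of the recommended range.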
It's better if you also set the permanent generation size, as you have lots of data to load. The permanent generation is used to hold reflective data of the VM itself, such as class objects and method objects. These reflective objects are allocated directly into the permanent generation, and it is sized independently from the other generations.
-Xms10G -Xmx20G -XX:PermSize=5G -XX:MaxPermSize=12G
Recent JVMs have a lot of -XX parameters for garbage collection (see here for example), but what are the options that can make a client-side Swing application really perform better?
I should note that one of the things that really annoys me in client-side Java applications is the long delay of stop-the-world garbage collection. In IntelliJ IDEA I have seen it take three minutes or more.
EDIT: Thanks for all the responses. Just to report back: I enabled the CMS garbage collector for IDEA (a good common reference for the type of application that most people reading this question are familiar with) using the settings suggested here. I also set -XX:+StringCache to see if it would reduce memory requirements.
In general, the observation is that regular running performance is not noticeably degraded. The memory reduction from the string cache option is huge; however, CMS is not thorough and ends up requiring a stop-the-world garbage collection cycle (back to the three-minute wait) to clear out the memory (400 MB in one run).
However, given the reduced memory footprint, I might be able to just set a smaller maximum heap, which would keep the stop-the-world collections smaller.
IDEA 8.1.4 comes with JDK 1.6.0_12, so I didn't test G1 yet. Also, my machine only has 2 cores, so a G1 approach won't really be maximized. Time to hit the boss up for a better machine ;).
There is no single answer to this question; it depends heavily on what your application is doing and how it manages its objects. Maybe have a look at How does garbage collection work and Parallel and concurrent garbage collectors to understand the various options.
Then, check the Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning document that expands on GC tuning concepts and techniques for Java SE 6 that were introduced in the Tuning Garbage Collection with the 5.0 Java Virtual Machine document.
If you want to keep garbage collection pauses short, the concurrent collector is likely the right direction as it performs most of its work concurrently (i.e., while the application is still running). But finding the best setup will require profiling (consider measuring the GC throughput, the max and average pause time, the frequency of full GCs and their duration too).
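As a sketch of what such a setup might look like for a desktop application (the values are illustrative only and should be adjusted based on profiling):
-Xms256m -Xmx512m
-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
Starting the concurrent cycle earlier (the occupancy fraction) trades some extra GC work for a lower chance of a long stop-the-world fallback collection.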
(EDIT: Having read a comment from the OP, I think that reading My advice on JVM heap tuning, keep your fingers off the knobs! from the performance guru Kirk Pepperdine would be a good idea.)
Garbage collection tuning is more of an art than a science, and it really depends on your application and its usage. If the standard stop-the-world collectors bother you, why not switch to CMS (concurrent mark-sweep) or the new G1 collector?
The best way is to change the parameters and attach a profiler to examine the application behaviour.
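For example, jstat gives a quick, low-overhead view of GC behaviour while you experiment with parameters (sampling every second here, with <pid> as a placeholder):
jstat -gcutil <pid> 1000
The YGC/YGCT and FGC/FGCT columns show how frequent and how expensive the young and full collections are under each configuration.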
This is quite automatic and works for us:
-server -Xss4096k -Xms12G -Xmx12G -XX:MaxPermSize=512m -XX:+HeapDumpOnOutOfMemoryError -verbose:gc -Xmaxf1 -XX:+UseCompressedOops -XX:+DisableExplicitGC -XX:+AggressiveOpts -XX:+ScavengeBeforeFullGC -XX:CMSFullGCsBeforeCompaction=10 -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:+CMSParallelRemarkEnabled -XX:GCTimeRatio=19 -XX:+UseAdaptiveSizePolicy -XX:MaxGCPauseMillis=500 -XX:+PrintGCTaskTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationConcurrentTime -XX:+PrintTenuringDistribution -Xloggc:gc.log
There is no "best" option (if there was, anyone would use it, right?) but maybe an option which helps in your case. But here are some tips:
Use the latest VM. The GC code got better with every release.
Use the client jvm.dll (available since Java 1.5 in jre/bin/client/). This should be the default.
Allocating and freeing objects in Java is cheap. It's expensive to keep them around.
If you want better performance then give the garbage collector less work. Consider using a pool of objects rather than constantly creating and dumping them, and make sure you need every object you create.