Can Java Garbage Collector disturb HPA scaling down? - java

I have a Spring API that makes heavy use of memory, deployed on a Kubernetes cluster.
I configured the autoscaler (HPA) to use memory consumption as the scaling criterion. Running a load test, everything works well at scale-up time; however, at scale-down time the memory does not go down and consequently the extra pods are not removed. If I run the tests again, new pods are created, but never removed.
Doing a local analysis with VisualVM, I believe the problem is related to the GC. Locally the GC works correctly during the test, but at the end of the requests it stops running and leaves garbage behind, and it only runs again after a long time. So I believe this leftover garbage is preventing the HPA from scaling down.
Does anyone have any tips on what may be causing this effect or something that I can try?
PS: In the profiler I see no indication of a memory leak, and when I run the GC manually the leftover garbage is removed.
Here are some additional details:
Java Version: 11
Spring Version: 2.3
Kubernetes Version: 1.17
Docker Image: openjdk:11-jre-slim
HPA Requests Memory: 1Gi
HPA Limits Memory: 2Gi
HPA Memory Utilization Metrics: 80%
HPA Min Pods: 2
HPA Max Pods: 8
JVM OPTS: -Xms256m -Xmx1G
VisualVM after the load test (screenshot)
New Relic resident memory after the load test (screenshot)

There most likely isn't a memory leak.
The JVM requests memory from the operating system up to the limit set by the -Xmx... command-line option. After each major GC run, the JVM looks at the ratio of heap memory in use to the (current) heap size:
If the ratio is too close to 1 (i.e. the heap is too full), the JVM requests memory from the OS to make the heap larger. It does this "eagerly".
If the ratio is too close to 0 (i.e. the heap is too large), the JVM may shrink the heap and return some memory to the OS. It does this "reluctantly". Specifically, it may take a number of full GC runs before the JVM decides to release memory.
I think that what you are seeing is the effect of the JVM's heap sizing policy. If the JVM is idle, there won't be enough full GC runs to trigger the JVM to shrink the heap, and memory won't be given back to the OS.
You could try to encourage the JVM to give memory back by calling System.gc() a few times. But running a full GC is CPU intensive. And if you do manage to get the JVM to shrink the heap, then expanding the heap again (for the next big request) will entail more full GCs.
So my advice would be: don't try that. Use some other criterion to trigger your autoscaling ... if that makes sense for your application.
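If you still want to experiment, heap shrinking can be nudged with the standard HotSpot free-ratio options. A minimal sketch, mirroring the -Xms/-Xmx values above (app.jar is a placeholder; with G1 on JDK 11 these ratios mainly take effect on full GCs, so the effect can still be limited):
java -Xms256m -Xmx1g \
-XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 \
-jar app.jar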
The other thing to note is that a JVM + application may use a significant amount of non-heap memory; e.g. the executable and shared native libraries, the native (C++) heap, Java thread stack, Java metaspace, and so on. None of that usage is constrained by the -Xmx option.
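To see where that non-heap memory goes, Native Memory Tracking gives a per-category breakdown. A hedged sketch ("<pid>" is a placeholder; NMT adds a small overhead):
java -XX:NativeMemoryTracking=summary -Xms256m -Xmx1g -jar app.jar
jcmd <pid> VM.native_memory summary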

Related

The Java ZGC garbage collector USES a lot of memory

I built a simple application using Spring Boot. When I deploy it to a Linux server, the ZGC garbage collector uses a lot of memory. I tried to limit the maximum heap memory to 500MB with -Xmx500m, but the Java program still used more than 1GB. When I used the G1 collector, it only used 350MB. I don't know why; is this a bug in JDK 11, or is there a problem with my startup parameters?
Runtime environment
operating system: CentOS Linux release 7.8.2003
JDK version: jdk11
springboot version: v2.3.0.RELEASE
Here is my Java startup command
java -Xms128m -Xmx500m \
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
-jar app.jar
Here is a screenshot of the memory usage at run time
Heap memory usage
https://github.com/JoyfulAndSpeedyMan/assets/blob/master/2020-07-13%20201259.png?raw=true
System memory usage
https://github.com/JoyfulAndSpeedyMan/assets/blob/master/2020-07-13%20201357.png?raw=true
Here's what happens when you use the default garbage collector
Java startup command
java -Xms128m -Xmx500m \
-jar app.jar
Heap memory usage
https://github.com/JoyfulAndSpeedyMan/assets/blob/master/2020-07-13%20202442.png?raw=true
System memory usage
https://github.com/JoyfulAndSpeedyMan/assets/blob/master/2020-07-13%20202421.png?raw=true
By default, JDK 11 uses the G1 garbage collector. Theoretically, shouldn't G1 be more memory intensive than ZGC? Why am I seeing the opposite? Did I misunderstand something? Since I'm a beginner with the JVM, I don't understand why.
ZGC employs a technique known as colored pointers. The idea is to use some free bits in 64-bit pointers into the heap for embedded metadata. However, when dereferencing such pointers, these bits need to be masked, which implies some extra work for the JVM.
To avoid the overhead of masking pointers, ZGC uses a multi-mapping technique. Multi-mapping is when multiple ranges of virtual memory are mapped to the same range of physical memory.
ZGC uses 3 views of Java heap ("marked0", "marked1", "remapped"), i.e. 3 different "colors" of heap pointers and 3 virtual memory mappings for the same heap.
As a consequence, the operating system may report 3x larger memory usage. For example, for a 512 MB heap, the reported committed memory may be as large as 1.5 GB, not counting other memory besides the heap. Note: multi-mapping affects the reported used memory, but physically the heap will still use 512 MB of RAM. This sometimes leads to the funny effect that the RSS of the process looks larger than the amount of physical RAM.
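One way to see the difference is to compare what the JVM reports for the heap with what the OS reports for the process. A minimal sketch, assuming a recent JDK ("<pid>" is a placeholder):
jcmd <pid> GC.heap_info    # heap capacity and usage as the JVM sees it
ps -o rss= -p <pid>        # resident set size as the OS sees it (inflated by multi-mapping under ZGC)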
See also:
ZGC: A Scalable Low-Latency Garbage Collector by Per Lidén
Understanding Low Latency JVM GCs by Jean Philippe Bempel
The JVM uses much more than just the heap memory - read this excellent answer to understand JVM memory consumption better: Java using much more memory than heap size (or size correctly Docker memory limit)
You'll need to go beyond heap inspection and use things like Native Memory Tracking to get a clearer picture.
I don't know what the particular issue with your application is, but ZGC is often mentioned as being good for large heaps.
It's also a brand new collector that has had many changes recently - I'd upgrade to JDK 14 if you want to use it (see "Change Log" here: https://wiki.openjdk.java.net/display/zgc/Main)
This is a result of the throughput-latency-footprint tradeoff. When choosing between these 3 things, you can only pick 2.
ZGC is a concurrent GC with low pause times. Since you don't want to give up throughput, you trade latency and throughput for footprint. So, there is nothing surprising in such high memory consumption.
G1 is not a low-pause collector, so you shift that tradeoff towards footprint and get bigger pause times but win some memory.
The amount of OS memory the JVM uses (i.e. "committed heap") depends on how often the GC runs (and also on whether it uncommits unneeded memory if the app starts to use less), which is tunable. Unfortunately ZGC isn't (currently) as aggressive about this by default as G1, but both have some tuning options that you can try.
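For example, a hedged sketch of those tuning options (availability depends on the JDK version: ZUncommit/ZUncommitDelay only landed in JDK 13, and G1PeriodicGCInterval in JDK 12; app.jar is a placeholder):
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
-XX:+ZUncommit -XX:ZUncommitDelay=300 \
-Xmx500m -jar app.jar
java -XX:G1PeriodicGCInterval=60000 -Xmx500m -jar app.jar
The first command tells ZGC to return unused heap to the OS after a delay (in seconds); the second makes G1 run periodic GCs when the JVM is idle so the heap gets a chance to shrink.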
P.S. As others have noted, the RES htop column is misleading, but the VisualVM chart shows the real picture.

GC and memory behaviour for Java and PHP app?

This question is about garbage collection behaviour when a request needs more memory than is allocated to the pod. If the GC is not able to free memory, will it keep running continuously or throw an out-of-memory error?
One pod contains a Java-based app and another contains a PHP-based one. In the Java case, the -Xmx value is the same as the pod limit.
I can only talk about Java GC. (PHP's GC behavior will be different.)
If the GC is not able to free memory, will it continue to run at regular intervals or throw an out-of-memory error?
It depends on the JVM options.
A JVM starts with an initial size for the heap and will expand it as required. However, it will only expand the heap up to a fixed maximum size. That maximum size is determined when the JVM starts from either an option (-Xmx) or default heap size rules. It can't be changed after startup.
As the heap space used gets close to the limit, GC is likely to occur more and more frequently. The default behaviour on a modern JVM is to monitor the percentage of time spent doing garbage collection. If it exceeds a (configurable) threshold, then you will get an OOME with a message about the GC overhead limit having been exceeded. This can happen even if there is enough space to "limp along" for a bit longer.
You can turn off the GC overhead limit, but it is inadvisable.
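For reference, a hedged sketch of that switch (the flag exists in HotSpot, mainly affecting the throughput collectors; the error it suppresses is typically reported as "java.lang.OutOfMemoryError: GC overhead limit exceeded"; app.jar is a placeholder):
java -Xmx1g -XX:-UseGCOverheadLimit -jar app.jar    # disables the overhead check (not recommended)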
The JVM will also throw an OOME if it simply doesn't have enough heap space after doing a full garbage collection.
Finally, a JVM will throw an OOME if it tries to grow the heap and the OS refuses to give it the memory it requested. This could happen because:
the OS has run out of RAM
the OS has run out of swap space
the process has exceeded a ulimit, or
the process group (container) has exceeded a container limit.
The JVM is only marginally aware of the memory available in its environment. On a bare metal OS or a VM under a hypervisor, the default heap size depends on the amount of RAM. On a bare metal OS, that is physical RAM. On a VM, it will be whatever the guest OS sees as its physical memory.
With Kubernetes, the memory available to an application is likely to be further limited by cgroups or similar. I understand that recent Java releases have tweaks that make them more suitable for running in containers. I think this means that they can use the cgroup memory limits rather than the physical memory size when calculating a default heap size.
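If you want to check what a container-aware JVM actually computes, something like the following may help (a sketch; the os+container log tag and -XX:+PrintFlagsFinal are available on roughly JDK 10 and later, and the container lines only appear when the JVM detects a cgroup limit):
java -Xlog:os+container=info -version                        # shows the detected cgroup memory limit
java -XX:+PrintFlagsFinal -version | grep -i maxheapsize     # shows the default heap size the JVM picked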

Kubernetes Pod memory usage does not fall when jvm runs garbage collection

I'm struggling to understand why my Java application is slowly consuming all the memory available to the pod, causing Kubernetes to mark the pod as out of memory. The JVM (OpenJDK 8) is started with the following arguments:
-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2
I'm monitoring the memory used by the pod and also the JVM memory, and I was expecting to see some correlation, e.g. after a major garbage collection the pod memory used would also fall. However, I don't see this. I've attached some graphs below:
Pod memory:
Total JVM memory
Detailed Breakdown of JVM (sorry for all the colours looking the same...thanks Kibana)
What I'm struggling with is: why, when there is a significant reduction in heap memory just before 16:00, does the pod's memory not also fall?
It looks like you are creating a pod with a resource limit of 1GB of memory.
You are setting -XX:MaxRAMFraction=2, which means you are allocating 50% of the available memory to the JVM, which seems to match what you are graphing as Memory Limit.
The JVM then reserves around 80% of that, which is what you are graphing as Memory Consumed.
When you look at Memory Consumed you will not see the effect of internal garbage collection (as in your second graph), because memory freed by the GC is released back to the JVM's heap but is still reserved (committed) by the process.
Is it possible that there is a memory leak in your Java application? It may be causing more memory to get reserved over time, until the JVM limit (512MB) is reached and your pod gets OOM killed.
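As an aside, -XX:+UseCGroupMemoryLimitForHeap and MaxRAMFraction are deprecated on newer JDKs; on JDK 10+ (and recent 8 updates) the percentage-based flags are the usual replacement. A hedged sketch (MaxRAMPercentage=50.0 mirrors MaxRAMFraction=2; app.jar is a placeholder):
java -XX:MaxRAMPercentage=50.0 -XX:InitialRAMPercentage=25.0 -jar app.jar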

Java memory increases in Task Manager but not in profiler

We developed a highly CPU intensive Java server application that seems to have a serious memory leak. As time passes, the application appears to eat up more and more memory (as seen in Windows Task Manager), but if I analyse it with a specialized Java profiler the memory seems to stay the same. For example, in Task Manager I see the application taking over 8GB of memory and growing, but in the Java profiler I see that heap memory is at most 2GB. I tried all possible combinations of JAVA_OPTS (-Xmx, -Xms, all types of GC) and nothing worked. Is the Java process not releasing memory back to the OS? Is there any way to force it to do so?
1)
I suggest you set -Xmx2100m and observe the heap usage under load.
The JVM may take as much OS memory as it decides it needs to perform well, until it reaches the -Xmx limit. In modern JVMs the default -Xmx is calculated from the total memory available in the OS, so it may be a large value.
I think your app does not have a memory leak; your JVM simply allocates a lot of memory, because it can.
Observe your JVM through jvisualvm.
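To correlate what Task Manager shows with what the JVM has actually committed, jstat can also help; a minimal sketch ("<pid>" is a placeholder, the interval is in milliseconds):
jstat -gc <pid> 5000    # prints heap generation capacities and usage every 5 seconds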
2)
Second suggestion: do you use any JNI code? Does your app call any native library (i.e. a DLL under Windows)?

JVM performance with these garbage collection settings

I have an enterprise level Java application that serves a few thousand users per day. This is a JAXB web service on weblogic 10.3.6 (Java 1.6 JVM), using Hibernate to hit an Oracle database. It also calls other web services.
We have tuned the following GC settings on our production system:
-server -Xms2048m -Xmx2048m -XX:PermSize=512m -XX:MaxPermSize=512m
What is the effect of this GC sizing? The hardware has more than enough capacity to handle it.
I know that this sets the heap size and perm gen at a stable level. But what's the impact of that when you eventually have to do garbage collection?
To me it seems that it would make GC happen less frequently, but take longer when it does happen. Does that sound correct?
I would say please monitor the GC before deciding on the sizing, as you never know how the application will behave under load. Have a look at this link and this one; they have some good references about GC and tools to calculate the same.
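On a Java 6 JVM, monitoring usually just means turning on GC logging; a minimal sketch of the pre-Java-9 logging flags, added alongside your existing options (e.g. via the WebLogic start script):
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log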
it would make GC happen less frequently, but take longer when it does happen
It might; it depends on your use case. You might even find that GC pauses are shorter in some rare cases.
A 2 GB heap isn't that much, and I would use up to 26 GB without worrying about heap size. Above that size, memory accesses are a little slower or references use more memory (this is related to the limit for compressed object pointers).
Setting -Xmx & -Xms and PermSize & MaxPermSize to equal sizes stops the JVM from resizing the heaps as demand changes. These resizes are expensive as they trigger a full GC.
-server allows the JVM to use the server compiler, which performs more aggressive optimizations before compiling your code to native instructions. Nowadays any machine with 2 or more cores and 2GB+ of memory has the server compiler on by default.
Increasing the memory doesn't always fix a problem. Sometimes adding more memory will be an overhead.
If you need details regarding GC, you can try this link
The very reason to tune something is to improve your application's performance and thereby achieve your throughput and latency goals.
