Can the OS stop a Java process from garbage collecting?

I'm monitoring a production system with AppDynamics, and we just had the system slow to a crawl and almost freeze up. Just prior to this event, AppDynamics showed all GC activity (minor and major alike) flatline for several minutes... and then come back to life.
Even during periods of ultra low load on the system, we still see our JVMs doing some GC activity. We've never had it totally flatline and drop to 0.
Also, the network I/O flatlined at the same instant as the GC/memory flatline.
So I ask: can something at the system level cause a JVM to freeze, or cause its garbage collection to hang/freeze? This is on a CentOS machine.

Does your OS have swapping enabled?
I've noticed HUGE problems with Java once it fills up all the RAM on an OS with swapping enabled; it will actually devastate Windows systems, effectively locking them up and causing a reboot.
My theory is this:
The OS RAM gets near full.
The OS requests memory back from Java.
This triggers a full GC in Java to attempt to release memory.
The full GC touches nearly every piece of the VM's memory, even items that have been swapped out.
The system tries to swap data back into memory for the VM (on a system that is already out of RAM).
This keeps snowballing.
At first it doesn't affect the system much, but if you try to launch an app that wants a bunch of memory, it can take a really long time, and your system just keeps degrading.
Multiple large VMs can make this worse; I run 3 or 4 huge ones and my system now starts to seize when I get over 60-70% RAM usage.
This is conjecture but it describes the behavior I've seen after days of testing.
The effect is that all the swapping seems to "prevent" GC. More accurately, the OS is spending most of the GC time swapping, which makes it look like the JVM is hanging, doing nothing, during GC.
A fix: set -Xmx to a lower value, and drop it until you leave enough room to avoid swapping. This has always fixed my problem; if it doesn't fix yours then I'm wrong about the cause of your problem :)
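As an illustration of that fix (the sizes and jar name below are placeholders, not recommendations): on a box with, say, 8 GB of RAM, cap the heap so the OS and other processes keep a few GB to themselves, and watch the si/so columns of vmstat to confirm the swapping stops.

    # before: heap allowed to grow until the OS starts swapping
    java -Xmx7g -jar myapp.jar

    # after: leave headroom for the OS so the JVM's heap never gets swapped out
    java -Xms4g -Xmx4g -jar myapp.jar

    # on CentOS, confirm: non-zero si/so here means the machine is still swapping
    vmstat 1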

It is really difficult to find the exact cause of your problem without more information.
But I can try to answer your question:
Can the OS block the garbage collection?
It is very unlikely that your OS blocks the garbage collector thread and lets the other threads run. You should not investigate that way.
Can the OS block the JVM?
Yes, it perfectly can, and it does so a lot, but so fast that you think the processes are all running at the same time. The JVM is a process like any other and is under the control of the OS. You have to check the CPU used by the application when it hangs (with monitoring on the server, not in the JVM). If it is very low, then I see 2 causes (but there are more):
Your server doesn't have enough RAM and is swapping (RAM <-> disk); the process becomes extremely slow. In this case CPU will be high on the server but low for the JVM.
Another process or server grabs the resources and your application or server receives nothing. Check the priorities on CentOS.
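A rough sketch of what to check on the CentOS server itself while the application hangs (standard tools only; nothing here is specific to your application):

    top          # overall CPU vs. the JVM's CPU, plus priorities in the NI column
    vmstat 1     # si/so columns: non-zero values mean the box is swapping
    free -m      # how much RAM and swap is actually in use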

In theory, YES, it can. But in practice, it never should.
In most Java virtual machines, application threads are not the only threads that are running. Apart from the application threads, there are compilation threads, finalizer threads, garbage collection threads, and some more. Scheduling decisions for allocating CPU cores to these threads and other threads from other programs running on the machine are based on many parameters (thread priorities, their last execution time, etc.), which try to be fair to all threads. So, in practice, no thread in the system should be waiting for CPU allocation for an unreasonably long time, and the operating system should not block any thread for an unlimited amount of time.
There is minimal activity that the garbage collection threads (and other VM threads) need to do. They need to check periodically to see if a garbage collection is needed. Even if the application threads are all suspended, there could be other VM threads, such as the JIT compiler thread or the finalizer thread, that do work and, hence, allocate objects and trigger garbage collection. This is particularly true for meta-circular JVMs that implement VM threads in Java rather than in C/C++.
Moreover, most modern JVMs use a generational garbage collector (a garbage collector that partitions the heap into separate spaces and puts objects with different ages in different parts of the heap). This means that as objects get older, they need to be moved to other, older spaces. Hence, even if there is no need to collect objects, a generational garbage collector may move objects from one space to another.
Of course, the details of each garbage collector differ from JVM to JVM. To rub more salt into the wound, some JVMs support more than one type of garbage collector. But seeing minimal garbage collection activity in an idle application is no surprise.
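If you want to see that baseline activity yourself, a tiny idle program with GC logging enabled makes it visible. A minimal sketch (the class is made up for illustration; -Xlog:gc is the JDK 9+ unified logging syntax, older JVMs use -verbose:gc instead):

    // IdleGcWatch.java: sits idle and prints heap usage every few seconds.
    // Run with: java -Xlog:gc IdleGcWatch
    // Any GC lines that appear come from background JVM activity, not your code.
    public class IdleGcWatch {
        public static void main(String[] args) throws InterruptedException {
            Runtime rt = Runtime.getRuntime();
            while (true) {
                long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
                System.out.println("used heap: " + usedMb + " MB");
                Thread.sleep(5_000);
            }
        }
    }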

Related

What is the difference between G1GC options -XX:ParallelGCThreads vs -XX:ConcGCThreads

When configuring G1GC we have two kinds of thread counts:
-XX:ParallelGCThreads and -XX:ConcGCThreads
What is the difference, and how do they impact GC?
Any reference is appreciated.
The G1 algorithm has phases, some of which are "stop the world" phases that stop the application during garbage collection, and it also has phases which happen concurrently while the application is running (candidate marking, etc.). With that information in mind:
The ParallelGCThreads option affects the number of threads used for phases when application threads are stopped, and the ConcGCThreads flag affects the number of threads used for concurrent phases.
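Both are plain command-line flags. A hedged example (the counts here are only illustrative; sensible values depend on your core count, and by default the JVM derives ConcGCThreads from ParallelGCThreads):

    java -XX:+UseG1GC -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2 -jar myapp.jar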
These are JVM tuning settings: we tell the JVM how many threads to use for that particular type of garbage collection work.
I hope you are already aware of what garbage collection is; when the JVM runs garbage collection, the behavior depends on which algorithm is set as your JVM's default collector.
You might already know that there are various kinds of garbage collectors available, like G1, CMS, etc.
So, based on your setting (here, the number of threads) the GC algorithm will try to use that many threads for heap cleanup. While the JVM runs a full GC, it halts the processing of other threads.
Now suppose your application is live and performing very heavy tasks, with multiple users using it for multiple purposes (say, a very busy app), and the JVM runs a full GC. In that case, all worker threads will come to a pause while GC cleans up. In this period, if all threads are taken up by the JVM, users will see a delay in response. So you can tell the JVM: hey, use only this many threads for that type (CMS or parallel) of garbage collection run.
To learn more about GC types, how they differ and what suits your needs, refer to some good articles and the docs from Oracle.
Here is one reference for the options you mentioned.
-XX:ParallelGCThreads: Sets the number of threads used during parallel phases of the garbage collectors. The default value varies with the platform on which the JVM is running.
-XX:ConcGCThreads: Number of threads concurrent garbage collectors will use. The default value varies with the platform on which the JVM is running.

Preventing Docker Container CPU Resets?

This is a tricky one and is a little hard to explain but I will give it a shot to see if anyone out there has had a similar issue + fix.
Quick background:
Running a large Java Spring app on Tomcat in a Docker container. The other containers are simple: one for a JMS queue and the other for MySQL. I run on Windows and have given Docker as much CPU as I have (and memory too). I have set JAVA_OPTS for Catalina to max out memory, as well as memory limits in my docker-compose, but the issue seems to be CPU related.
When the app is idling it normally is sitting around 103% CPU (8 Cores, 800% max). There is a process we use which (using a Thread Pool) runs some workers to go out and run some code. On my local host (no docker in between) it runs very fast and flies, spitting out logs at a good clip.
Problem:
When running in Docker and watching docker stats -a, I can see the CPU start to ramp up when this process begins. Meanwhile in the logs, everything is flying by as expected while the CPU grows and grows. It seems to get close to 700% and then it kind of dies, but it doesn't. When it hits this threshold I see the CPU drop drastically down to < 5%, where it stays for a little while. At this time logs stop printing, so I assume nothing is happening. Eventually it will kick back in, go back to ~120%, and continue its process like nothing happened, sometimes re-spiking to ~400%.
What I am trying
I have played around with the memory settings with no success, but it seems more like a CPU issue. I know Java in Docker is a bit wonky, but I have given it all the room I can on my beefy dev box, where locally this process runs without a hitch. I find it odd that the CPU spikes then dies, but the container itself doesn't die or reset. Has anyone seen a similar issue or know some ways to further attack this CPU issue with Docker?
Thanks.
There is an issue with resource allocation for JVMs in containers, which occurs because the JVM refers to the overall system metrics instead of the container metrics. In Java 7 and 8, JVM ergonomics applies the system (instance) metrics, such as the number of cores and memory, instead of the Docker-allocated resources (cores and memory). As a result, the JVM initializes a number of parameters based on core count and memory, as below.
JVM memory footprint:
- Perm/metaspace
- JIT bytecode
- Heap size (JVM ergonomics: ¼ of instance memory)
CPU:
- No. of JIT compiler threads
- No. of garbage collection threads
- No. of threads in the common fork-join pool
Therefore, the containers tend to become unresponsive due to high CPU, or the container gets terminated by an OOM kill. The reason for this is that the container's cgroups and namespaces, which limit the memory and CPU cycles, are ignored by the JVM. Therefore, the JVM tends to take the resources of the whole instance instead of limiting itself to the Docker-allocated resources.
Example
Assume two containers are running on a 4-core instance with 8 GB of memory, and that at Docker initialization each container is given a hard limit of 1 GB of memory and 2048 CPU cycles. Here, each container sees 4 cores, and each JVM allocates memory, JIT compiler threads and GC threads separately according to those stats: the JVM sees the overall number of cores on that instance (4) and uses that value to initialize the default thread counts we saw earlier. Accordingly, the JVM metrics of the two containers will be as mentioned below.
- 4 * 2 JIT compiler threads
- 4 * 2 garbage collection threads
- 2 GB heap size * 2 (¼ of the instance's full memory instead of the Docker-allocated memory)
In terms of memory
As per the above example, the JVM will gradually increase heap usage because it sees a 2 GB max heap size, which is a quarter of the instance memory (8 GB). Once the memory usage of a container reaches its hard limit of 1 GB, the container will be terminated by an OOM kill.
In terms of CPU
As per the above example, each JVM is initialized with 4 garbage collection threads and 4 JIT compiler threads. However, Docker allocates only 2048 CPU cycles. Therefore, this leads to high CPU, more context switching and an unresponsive container, and finally the container gets terminated due to high CPU.
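To check what a given JVM actually believes it has, you can run a small program like the one below inside the container (the class name is illustrative). On Java 7/8 without container support, it will typically report the host's core count and a max heap derived from the host's memory, not the container's limits.

    // ContainerView.java: print what this JVM thinks it can use.
    public class ContainerView {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            System.out.println("available processors: " + rt.availableProcessors());
            System.out.println("max heap (MB): " + rt.maxMemory() / (1024 * 1024));
        }
    }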
Solution
Basically, there are two OS-level mechanisms, namely cgroups and namespaces, which handle that kind of situation. However, Java 7 and 8 do not respect cgroups and namespaces, but releases after JDK 1.8.0_131 are able to honor the cgroup memory limit via JVM parameters (-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap). However, this only provides a solution for the memory issue and does not address the CPU set issue.
With OpenJDK 9, the JVM will automatically detect CPU sets. Especially under orchestration, you can also manually override the default thread counts to match the CPU allocation of the container by using JVM flags (-XX:ParallelGCThreads, -XX:ConcGCThreads).
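A sketch of how the pieces fit together on Java 8u131+ (the image name, limits and thread counts are placeholders, not recommendations):

    # constrain the container, then tell the JVM a consistent story
    docker run --cpus=2 --memory=1g my-java-image \
        java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap \
             -XX:ParallelGCThreads=2 -XX:ConcGCThreads=1 \
             -jar app.jar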

What happens when GC doesn't (have to) run and program finishes its execution?

Consider a very short program where I am allocating a little bit of memory. I have been taught that GC runs in circumstances when a program allocates a lot of memory and the allocation reaches a limit.
I don't know what that limit is exactly, but I think it must be high enough so that GC doesn't run frequently and slow down the execution of the program.
My question is: what happens when the allocation doesn't reach the level at which GC prepares to run during the lifetime of a program? Does it result in a memory leak?
Garbage collection doesn't run only when there is no more memory; a parallel collector actually runs in another thread and collects when it determines it has time to do so, but that's not the only strategy.
See Java garbage collector - When does it collect?
Nothing specific happens. Roughly speaking: the memory allocated for the Java process is returned to the operating system.
More specifically: the JVM uses native memory for the Java process's allocations. After the Java process terminates, this memory becomes free for other processes in the operating system.
I suggest you read more about these things, e.g. here: http://www.ibm.com/developerworks/library/j-nativememory-linux/
If your question is: what happens to the memory if your JVM ends before the GC is running?
Very simple: the complete memory is "returned" to the operating system and becomes available to it again.
In simple words:
When you start your JVM, xx MB of memory are given to it
When your JVM finds that it runs out of memory, the GC kicks in, to get rid of garbage (so that the JVM can continue to run)
When your JVM exits, those xx MB go back to the operating system
For step 3, it is absolutely irrelevant if/how often step 2 happened.
Finally: "heap" is how the JVM internally uses memory. The OS just knows about the absolute memory chunks it gave to the JVM.

Java desktop APP hanging OS

I have a desktop application which is 'kinda' memory hungry, always performing background tasks every 15 seconds. When analyzing it through JVisualVM, the used heap is around 60mb right after every garbage collection and about 210mb right before GC, though it can increase if the app is used very often. The heap size is always around 380/400mb. This pretty much stays the same all the time, at least when analyzing the application on my local machine.
The problem is that for some clients, on rare occasions, the system hangs, apparently because the OS doesn't have enough memory for itself: the memory is committed to the heap (even if not actually used by the application itself).
So, on a 4 GB machine, where the OS itself and other apps consume about 2 GB, using two or three instances of the app can easily cause memory usage to go up to 95%, and this is where I think it is hanging the OS.
Is there a way to make the heap size be just what the app needs? Or am I talking BS and the problem is happening for some other reason?
You can try to set the limit on the heap smaller. This will force the garbage collector to act but will make the memory footprint of your programs smaller.
There is a question here on SO with information on how to set that.
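For illustration only (the numbers are placeholders; pick values based on what JVisualVM shows the app really needs, e.g. comfortably above the ~210 MB peak mentioned above), the cap is set when the app is launched:

    java -Xms64m -Xmx256m -jar desktop-app.jar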

Java Garbage Collection- CPU Spikes - Longer connection establishment times

We have a pool of servers that sits behind the load balancer. The machines in this pool do garbage collection every 6 seconds on average. It takes almost half a second to garbage collect. We also see a CPU spike during garbage collection.
The client machines see the average time to make a connection to the server spike by almost 10% during the day.
Theory: the CPU is busy doing GC, and that's why it cannot allocate a connection faster.
Is it a valid theory?
JVM: IBM
GC algorithm: gencon
Nursery: 5 GB
Heap size: 18 GB
I'd say with that many allocations all bets are off; it could absolutely get worse over time. I mean, if you are doing GC every 6 seconds all day long, that seems problematic.
Do you have access to that code? Can it be rewritten to reuse objects and be more intelligent about allocation? I've done a few embedded systems, and the trick is to NEVER call new once the system is up and running (quite doable if you have control over the entire system).
If you don't have access to the code, check into some of the GC tuning options available (including the selection of the garbage collector used), both those distributed with the JDK and 3rd-party options. You may be able to improve performance with a few command-line modifications.
It's possible I guess.
Given garbage collection is such an intensive process, is there any reason for it to occur every 6 seconds? I'm not familiar with the IBM JVM or the particular collection algorithm you are using so I can't really comment on those. However, there are some good tuning documents provided by Sun (now offered by Oracle) that discuss the different types of collectors and when you would use them. See this link for some ideas.
One way to prove your theory could be to add some code that logs the time a connection was requested and the time when it was actually allocated. If the GC-related CPU spikes seem to coincide with longer connection allocation times, that would support your theory. Your problem will then become how to get around it.
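A minimal sketch of that kind of logging, assuming connections come from a javax.sql.DataSource (the wrapper class and names are hypothetical; adapt it to whatever pool you actually use):

    import java.sql.Connection;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    // TimedDataSource: a hypothetical wrapper that delegates getConnection()
    // and logs how long the pool took to hand a connection back.
    public class TimedDataSource {
        private final DataSource delegate;

        public TimedDataSource(DataSource delegate) {
            this.delegate = delegate;
        }

        public Connection getConnection() throws SQLException {
            long start = System.nanoTime();
            Connection conn = delegate.getConnection();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // Compare these timestamps against the GC log to see whether slow
            // checkouts line up with collections.
            System.out.println(System.currentTimeMillis() + " connection acquired in " + elapsedMs + " ms");
            return conn;
        }
    }

If the slow acquisitions cluster around the half-second collections in the GC log, that is decent evidence for the theory.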
