A use case for a manual GC invocation?

A use case for a manual GC invocation? - java

I've read why is it bad practice to call System.gc(), and many others, e.g. this one describing a really disastrous misuse of System.gc(). However, there are cases when the GC takes too long and avoiding long pauses, e.g., by avoiding garbage is not exactly trivial and makes the code harder to maintain.
IMHO calling GC manually is fine in the following common scenario:
There are multiple interchangeable webserves with a failover in front of them.
Every server uses a few gigabytes of heap and the STW pauses take much longer than an average request.
The failover has no idea when GC is going to happen.
The failover can exempt a server when told to.
The algorithm seems to be trivial: Periodically select a server, let no more requests be send to it, let it finished its running requests, let it do its GC, and re-activate the server.
I wonder if I am missing something?1,2
What are the alternatives?
Long running requests could be a problem, but let's assume there are none. Or simply limit waiting to some period comparable with what GC takes. Making a slow request even slower doesn't sound too bad.
An option like -XX:+DisableExplicitGC could make the algorithm useless, but just don't use it (my use case includes dedicated servers I'm in charge of).

For low latency trading systems I use the GC in an atypical manner.
You want to avoid any collection, even minor ones during the trading day. A way to do this is to create less than 300 KB of garbage a second. This is around 1 GB per hour, or up to 24 GB per day. When you use a 24 GB Eden space it means there is no minor/major GCs. However to ensure a GC occurs at a time which is planned and acceptable, a System.gc() is called at say 5 AM each morning and you have a clean Eden space for the next day.
There are times, when you create more garbage than expected e.g. failing to reconnect to a data source, and you might get a small number of minor collections. However this only happens when something is going wrong.
For more details http://vanillajava.blogspot.co.uk/2011/06/how-to-avoid-garbage-collection.html
by avoiding garbage is not exactly trivial and makes the code harder to maintain.
Avoiding garbage entirely is near impossible. However 300 KB/s is not so hard for a JVM. (You can have more than one JVM these days on one machine with 24 GB Eden spaces)
Note if you can keep below 50 KB/s of garbage you can run all week with out a GC.
Periodically select a server, let no more requests be send to it, let it finished its running requests, let it do its GC, and re-activate the server.
You can treat a GC as a failure to meet your SLA condition. In this case you can remove a server when you determine this is about to happen from your cluster, Full GC it and return it to the cluster.

However, there are cases when the GC takes too long and avoiding long pauses
You have to distinguish between pauses caused by young-only, mixed/concurrent phase and full GCs.
In most cases it's the full GCs that you want to avoid while the other ones are acceptable, which can often be achieved with some GC-tuning and optimizing the code to avoid large allocation bursts.
In principle G1 should be able to run forever on young/mixed cycles and a full GC could be considered a soft failure. CMS can at least do so for many days with careful tuning, but may eventually succumb to fragmentation and require a full GC for compacting.
In cases where even the young GC pauses are not acceptable or garbage piles up too fast for the concurrent phase to handle with acceptable pause times then the strategy you outline may be a viable workaround.
Note that there also other use-cases for manually triggering GCs, such as GC-managed native resources, e.g. direct byte buffers, although those are fairly troublesome in general.
Also note that not all System.gc() calls are created equal, there is the ExplicitGCInvokesConcurrent option too.

Related

Impact of Java streams from GC perspective or handling short-lived objects by the GC

There are some articles available online where they mention some of the cons of using Stream-s over old loop-s:
https://blog.jooq.org/2015/12/08/3-reasons-why-you-shouldnt-replace-your-for-loops-by-stream-foreach/
https://jaxenter.com/java-performance-tutorial-how-fast-are-the-java-8-streams-118830.html
But is there any impact from the GC perspective? As I assume (is it correct?) that every stream call creates some short-lived objects underneath. If the particular code fragment which uses streams is called frequently by the underlying system could it cause eventually some performance issue from the GC perspective or put extra pressure on GC? Or the impact is minimal and could be ignored most of the time?
Are there any articles covering this more in detail?

To be fair, it's very complicated to give an answer when Holger already linked the main idea via his answer; still I will try to.
Extra pressure on GC - may be. Extra time for a GC cycle to execute - most probably not. Ignorable? I'd say totally. In the end what you care from a GC - that it takes little time to reclaim lots of space, preferably with super tiny stop-the-world events.
Let's talk about the potential overhead in the GC main two phases : mark and evacuation/realocation (Shenandoah/ZGC). First mark phase, where GC finds out what is garbage (by actually identifying what is alive).
If objects that were created by the Stream internals are not reachable, they will never be scanned (zero overhead here) and if they are reachable, scanning them will be extremely fast. The other side of the story is: when you create an Object and GC might touch it while it's running in the mark phase, the slow path of a LoadBarrier (in case of Shenandoah) will be active. This will add some tens of ns I assume to the total time of that particular phase of the GC as well as some space in the SATB queues. Aleksey Shipilev in one talk said that he tried to measure the overhead from executing a single barrier and could not, so he measured 3 and the time was in the region of tens of ns. I don't know the exact details of ZGC, but a LoadBarrier is there in place too.
The main point is that this mark phase is done in a concurrent fashion, while the application is running, so you application will still run perfectly fine. And even if some GC code will be triggered to do something specific work (Load Barrier), it will be extremely fast and completely transparent to you.
The second phase is "compactation", or making space for future allocations. What a GC does is move live objects from regions with the most garbage (Shenandoah for sure) to regions that are empty. But only live objects. So if a certain region has 100 objects and only 1 is alive, only 1 will be moved, then that entire region is going to be marked as free. So potentially if the Stream implementation generated only garbage (i.e.: not currently alive), it is "free lunch" for GC, it will not even know it existed.
The better picture here is that this phase is still done concurrently. To keep the "concurrency" active, you need to know how much was allocated from start to end of a GC cycle. This amount is the minimum "extra" space you need to have on top of the java process in order for a GC to be happy.
So overall, you are looking at a super tiny impact; if any at all.

How to deal with long Full Garbage Collection cycle in Java

We inherited a system which runs in production and started to fail every 10 hours recently. Basically, our internal software marks the system that is has failed if it is unresponsive for a minute. We found that our problem that our Full GC cycles last for 1.5 minutes, we use 30 GB heap. Now the problem is that we cannot optimize a lot in a short period of time and we cannot partition of our service quickly but we need to get rid of 1.5 minutes pauses as soon as possible as our system fails because of these pauses in production. For us, an acceptable delay is 20 milliseconds but not more. What will be the quickest way to tweak the system? Reduce the heap to trigger GCs frequently? Use System.gc() hints? Any other solutions? We use Java 8 default settings and we have more and more users - i.e. more and more objects created.
Some GC stat

You have a lot of retained data. There is a few options which are worth considering.
increase the heap to 32 GB, this has little impact if you have free memory. Looking again at your totals it appears you are using 32 GB rather than 30 GB, so this might not help.
if you don't have plenty of free memory, it is possible a small portion of your heap is being swapped as this can increase full GC times dramatically.
there might be some simple ways to make the data structures more compact. e.g. use compact strings, use primitives instead of wrappers e.g. long for a timestamp instead of Date or LocalDateTime. (long is about 1/8th the size)
if neither of these help, try moving some of the data off heap. e.g. Chronicle Map is a ConcurrentMap which uses off heap memory can can reduce you GC times dramatically. i.e. there is no GC overhead for data stored off heap. How easy this is to add highly depends on how your data is structured.
I suggest analysing how your data is structured to see if there is any easy ways to make it more efficient.

There is no one-size-fits-all magic bullet solution to your problem: you'll need to have a good handle on your application's allocation and liveness patterns, and you'll need to know how that interacts with the specific garbage collection algorithm you are running (function of version of Java and command line flags passed to java).
Broadly speaking, a Full GC (that succeeds in reclaiming lots of space) means that lots of objects are surviving the minor collections (but aren't being leaked). Start by looking at the size of your Eden and Survivor spaces: if the Eden is too small, minor collections will run very frequently, and perhaps you aren't giving an object a chance to die before its tenuring threshold is reached. If the Survivors are too small, objects are going to be promoted into the Old gen prematurely.
GC tuning is a bit of an art: you run your app, study the results, tweak some parameters, and run it again. As such, you will need a benchmark version of your application, one which behaves as close as possible to the production one but which hopefully doesn't need 10 hours to cause a full GC.
As you stated that you are running Java 8 with the default settings, I believe that means that your Old collections are running with a Serial collector. You might see some very quick improvements by switching to a Parallel collector for the Old generation (-XX:+UseParallelOldGC). While this might reduce the 1.5 minute pause to some number of seconds (depending on the number of cores on your box, and the number of threads you specify for GC), this will not reduce your max pause to to 20ms.

When this happened to me, it was due to a memory leak caused by a static variable eating up memory. I would go through all recent code changes and look for any possible memory leaks.

What the frequency of the Garbage Collection in Java?

Page 6 of the the document Memory Management in the Java
HotSpot™ Virtual Machine contains the following paragraphs:
Young generation collections occur relatively frequently and are
efficient and fast because the young generation space is usually small
and likely to contain a lot of objects that are no longer referenced.
Objects that survive some number of young generation collections are
eventually promoted, or tenured, to the
old generation. See Figure 1. This generation is typically larger than the young generation and its occupancy
grows more slowly. As a result, old generation collections are infrequent, but take significantly longer to
complete
Could someone please define what "frequent" and "infrequent" mean in the statements above? Are we talking microseconds, milliseconds, minutes, days?

It is not possible to give a definite answer to this. It really depends on a lot of factors, including the platform (JVM version, settings, etc), the application, and the workload.
At one extreme, it is possible for an application to never trigger a garbage collector. It might simply sit there doing nothing, or it might perform an extremely long computation in which no objects are created after the JVM initialization and application startup.
At the other extreme it is theoretically possible for one garbage collection end and another one to start within few nanoseconds. For example, this could happen if your application is in the last stages of dying from a full heap, or if it is allocating pathologically large arrays.
So:
Are we talking microseconds, milliseconds, minutes, days?
Possibly all of the above, though the first two would definitely be troubling if you observed them in practice.
A well behaved application should not run the GC too often. If your application is triggering a young space collection more than once or twice a second, then this could lead to performance problems. And too frequent "full" collections is worse because their impact is greater. However, it is certainly plausible for a poorly designed / implemented application to behave like this.
There is also the issue that the interval between GC runs is not always meaningful. For instance some of the HotSpot GCs actually have GC threads running concurrently with normal application threads. If you have enough cores, enough RAM and enough memory bus bandwidth, then a constantly running concurrent GC may not appreciably affect application performance.
Terminology note:
Strictly speaking a concurrent GC is one where the GC can run at the same time as the application threads.
Strictly speaking a parallel GC is one where the GC itself uses multiple threads.
A GC can be concurrent without being parallel, and vice versa.

Its a relative term. Young collections could be many times a seconds up to a few hours. Old generations collections can be every few seconds, up to daily. You should expect to have many more young collections than old collections in a most systems.
Its highly unlikely to be many days. If the GC occurs too often e.g. << 100 ms apart you get get a OutOfMemoryError: GC Overhead Exceeded as the JVM prevenets that from happening.

As it is, the terms "frequent" , "infrequent" are relative. And the timings are, in fact, not fixed. It depends on the system in question. It depends on lots of things like:
Your heap size and settings for different parts of the heap (young, old gen, perm gen)
Your application's memory behaviour. How many objects does it create and how fast? how long those objects are referenced etc?
If your application is monster memory eater, gc would run as if its running for its life. If your application does not demand too much of memory, then gc would run at intervals decided by how full the memory is.

TL DL: "Frequent" and "infrequent" are relative terms that depends on the memory allocation rate and the heap size. If you want a precise answer, you need to measure it yourself for your particular application.
Let's say your app has two modes, mode-1 allocates memory and does computation and mode-2 sits idle.
If mode-1 allocation is smaller than the heap available, no gc need to occur until it finishes. Maybe it used so little RAM that it could do a second round of mode-1 without collection. However, eventually you'll run out of free heap, and jvm will perform an "infrequent" collection.
However, if mode-1 allocation is a significant fraction of, or larger, than the young-generation heap, collection would happen more "frequently". During the young gen collection, allocations that survive (imagine data is needed through the entire mode-1 operation), will be promoted to old-gen, giving the young-gen more room. Young-gen allocation and collection can now continue. Eventually old-gen heap would run out, and must be collected, thus "infrequently".
So then, how frequent is frequent? It depends on the allocation rate and the heap size. If jvm is bumping into the heap limit often, it'll collect often. If there is plenty of heap (let's say 100GB), then jvm doesn't need to collect for a long long time. The down side is that when it finally does a collection, it might take a long time to free 100GB, stopping the jvm for many seconds (or minutes!). The current JVMs are smarter than that and would occasionanlly force a collection (preferably in mode-2). And with parallel collectors, it could happen all the time if necessary.
Ultimately, the frequency is task and heap dependent, as well as how various vm parameters are set. If you want a precise answer, you must measure them yourself for your particular application.

Because spec says "relatively frequently" and infrequent (regarding Young generation), we can't estimate the frequency in absolute units like microseconds, milliseconds, minutes or days

Java's Serial garbage collector performing far better than other garbage collectors?

I'm testing an API, written in Java, that is expected to minimize latency in processing messages received over a network. To achieve these goals, I'm playing around with the different garbage collectors that are available.
I'm trying four different techniques, which utilize the following flags to control garbage collection:
1) Serial: -XX:+UseSerialGC
2) Parallel: -XX:+UseParallelOldGC
3) Concurrent: -XX:+UseConcMarkSweepGC
4) Concurrent/incremental: -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
I ran each technique over the course of five hours. I periodically used the list of GarbageCollectorMXBean provided by ManagementFactory.getGarbageCollectorMXBeans() to retrieve the total time spent collecting garbage.
My results? Note that "latency" here is "Amount of time that my application+the API spent processing each message plucked off the network."
Serial: 789 GC events totaling 1309 ms; mean latency 47.45 us, median latency 8.704 us, max latency 1197 us
Parallel: 1715 GC events totaling 122518 ms; mean latency 450.8 us, median latency 8.448 us, max latency 8292 us
Concurrent: 4629 GC events totaling 116229 ms; mean latency 707.2 us, median latency 9.216 us, max latency 9151 us
Incremental: 5066 GC events totaling 200213 ms; mean latency 515.9 us, median latency 9.472 us, max latency 14209 us
I find these results to be so improbable that they border on absurd. Does anyone know why I might be having these kinds of results?
Oh, and for the record, I'm using Java HotSpot(TM) 64-Bit Server VM.

I'm working on a Java application that is expected to maximize throughput and minimize latency
Two problems with that:
Those are often contradictory goals, so you need to decide how important each is against the other (would you sacrifice 10% latency to get 20% throughput gain or vice versa? Are you aiming for some specific latency target, beyond which it doesn't matter whether it's any faster? Things like that.)
Your haven't given any results around either of these
All you've shown is how much time is spent in the garbage collector. If you actually achieve more throughput, you would probably expect to see more time spent in the garbage collector. Or to put it another way, I can make a change in the code to minimize the values you're reporting really easily:
// Avoid generating any garbage
Thread.sleep(10000000);
You need to work out what's actually important to you. Measure everything that's important, then work out where the trade-off lies. So the first thing to do is re-run your tests and measure latency and throughput. You may also care about total CPU usage (which isn't the same as CPU in GC of course) but while you're not measuring your primary aims, your results aren't giving you particularly useful information.

I don't find this surprising at all.
The problem with serial garbage collection is that while it's running, nothing else can run at all (aka "stops the world"). That has a good point though: it keeps the amount of work spent on garbage collection to just about its bare minimum.
Almost any sort of parallel or concurrent garbage collection has to do a fair amount of extra work to ensure that all modifications to the heap appear atomic to the rest of the code. Instead of just stopping everything for a while, it has to stop just those things that depend on a particular change, and then for just long enough to carry out that specific change. It then lets that code start running again, gets to the next point that it's going to make a change, stops other pieces of code that depend on it, and so on.
The other point (though in this case, probably a fairly minor one) is that as you process more data, you generally expect to generate more garbage, and therefore spend more time doing garbage collection. Since the serial collector does stop all other processing while it does its job, that not only makes the garbage collection fast, but also prevents any more garbage from being generated during that time.
Now, why do I say that's probably a minor contributor in this case? That's pretty simple: the serial collector only used up a little over a second out of five hours. Even though nothing else got done during that ~1.3 seconds, that's such a small percentage of five hours that it probably didn't make any much (if any) real difference to your overall throughput.
Summary: the problem with serial garbage collection isn't that it uses excessive time overall -- it's that it can be very inconvenient if it stops the world right when you happen to need fast response. At the same time, I should add that as long as your collection cycles are short, this can still be fairly minimal. In theory, the other forms of GC mostly limit your worst case, but in fact (e.g., by limiting the heap size) you can often limit your maximum latency with a serial collector as well.

There was an excellent talk by a Twitter engineer at the 2012 QCon Conference on this topic - you can watch it here.
It discussed the various "generations" in the Hotspot JVM memory and garbage collection (Eden, Survivor, Old). In particular note that the "Concurrent" in ConcurrentMarkAndSweep only applies to the Old generation, i.e. objects that hang around for a while.
Short-lived objects are GCd from the "Eden" generation - this is cheap, but is a "stop-the-world" GC event regardless of which GC algorithm you have chosen!
The advice was to tune the young generation first e.g. allocate lots of new Eden so there's more chance for objects to die young and be reclaimed cheaply. Use +PrintGCDetails, +PrintHeapAtGC, +PrintTenuringDistribution... If you get more than 100% survivor then there wasn't room, so objects get quickly promoted to Old - this is Bad.
When tuning for the Old generatiohn, if latency is top priority, it was recommended to try ParallelOld with auto-tune first (+AdaptiveSizePolicy etc), then try CMS, then maybe the new G1GC.

You can not say one GC is better than the other. it depends on your requirements and your application.
but if u want to maximize throughput and minimize latency: GC is your enemy! you should not call GC at all and also try to prevent JVM from calling GC.
go with serial and use object pools.

With serial collection, only one thing happens at a time. For example, even when multiple CPUs are
available, only one is utilized to perform the collection. When parallel collection is used, the task of
garbage collection is split into parts and those subparts are executed simultaneously, on different
CPUs. The simultaneous operation enables the collection to be done more quickly, at the expense of
some additional complexity and potential fragmentation.
While the serial GC uses only one thread to process a GC, the parallel GC uses several threads to process a GC, and therefore, faster. This GC is useful when there is enough memory and a large number of cores. It is also called the "throughput GC."

How to reduce java concurrent mode failure and excessive gc

In Java, the concurrent mode failure means that the concurrent collector failed to free up enough memory space form tenured and permanent gen and has to give up and let the full stop-the-world gc kicks in. The end result could be very expensive.
I understand this concept but never had a good comprehensive understanding of
A) what could cause a concurrent mode failure and
B) what's the solution?.
This sort of unclearness leads me to write/debug code without much of hints in mind and often has to shop around those performance flags from Foo to Bar without particular reasons, just have to try.
I'd like to learn from developers here how your experience is? If you had encountered such performance issue, what was the cause and how you addressed it?
If you have coding recommendations, please don't be too general. Thanks!

The first thing about CMS that I have learned is it needs more memory than the other collectors, about 25 to 50% more is a good starting point. This helps you avoid fragmentation, since CMS does not do any compaction like the stop the world collectors would. Second, do things that help the garbage collector; Integer.valueOf instead of new Integer, get rid of anonymous classes, make sure inner classes are not accessing inaccessible things (private in the outer class) stuff like that. The less garbage the better. FindBugs and not ignoring warnings will help a lot with this.
As far as tuning, I have found that you need to try several things:
-XX:+UseConcMarkSweepGC
Tells JVM to use CMS in tenured gen.
Fix the size of your heap: -Xmx2048m -Xms2048m This prevents GC from having to do things like grow and shrink the heap.
-XX:+UseParNewGC
use parallel instead of serial collection in the young generation. This will speed up your minor collections, especially if you have a very large young gen configured. A large young generation is generally good, but don't go more than half of the old gen size.
-XX:ParallelCMSThreads=X
set the number of threads that CMS will use when it is doing things that can be done in parallel.
-XX:+CMSParallelRemarkEnabled remark is serial by default, this can speed you up.
-XX:+CMSIncrementalMode allows application to run more by pasuing GC between phases
-XX:+CMSIncrementalPacing allows JVM to figure change how often it collects over time
-XX:CMSIncrementalDutyCycleMin=X Minimm amount of time spent doing GC
-XX:CMSIncrementalDutyCycle=X Start by doing GC this % of the time
-XX:CMSIncrementalSafetyFactor=X
I have found that you can get generally low pause times if you set it up so that it is basically always collecting. Since most of the work is done in parallel, you end up with basically regular predictable pauses.
-XX:CMSFullGCsBeforeCompaction=1
This one is very important. It tells the CMS collector to always complete the collection before it starts a new one. Without this, you can run into the situation where it throws a bunch of work away and starts again.
-XX:+CMSClassUnloadingEnabled
By default, CMS will let your PermGen grow till it kills your app a few weeks from now. This stops that. Your PermGen would only be growing though if you make use of Reflection, or are misusing String.intern, or doing something bad with a class loader, or a few other things.
Survivor ratio and tenuring theshold can also be played with, depending on if you have long or short lived objects, and how much object copying between survivor spaces you can live with. If you know all your objects are going to stick around, you can configure zero sized survivor spaces, and anything that survives one young gen collection will be immediately tenured.

Quoted from "Understanding Concurrent Mark Sweep Garbage Collector Logs"
The concurrent mode failure can either
be avoided by increasing the tenured
generation size or initiating the CMS
collection at a lesser heap occupancy
by setting
CMSInitiatingOccupancyFraction to a
lower value
However, if there is really a memory leak in your application, you're just buying time.
If you need fast restart and recovery and prefer a 'die fast' approach I would suggest not using CMS at all. I would stick with '-XX:+UseParallelGC'.
From "Garbage Collector Ergonomics"
The parallel garbage collector
(UseParallelGC) throws an
out-of-memory exception if an
excessive amount of time is being
spent collecting a small amount of the
heap. To avoid this exception, you can
increase the size of the heap. You can
also set the parameters
-XX:GCTimeLimit=time-limit and -XX:GCHeapFreeLimit=space-limit

Sometimes OOM pretty quick and got killed, sometime suffers long gc period (last time was over 10 hours).
It sounds to me like a memory leak is at the root of your problems.
A CMS failure won't (as I understand it) cause an OOM. Rather a CMS failure happens because the JVM needs to do too many collections too quickly, and CMS could not keep up. One situation where lots of collection cycles happen in a short period is when your heap is nearly full.
The really long GC time sounds weird ... but is theoretically possible if your machine was thrashing horribly. However, a long period of repeated GCs is quite plausible if your heap is very nearly full.
You can configure the GC to give up when the heap is 1) at max size and 2) still close to full after a full GC has completed. Try doing this if you haven't done so already. It won't cure your problems, but at least your JVM will get the OOM quickly, allowing a faster service restart and recovery.
EDIT - the option to do this is -XX:GCHeapFreeLimit=nnn where nnn is a number between 0 and 100 giving the minimum percentage of the heap that must be free after the GC. The default is 2. The option is listed in the aptly titled "The most complete list of -XX options for Java 6 JVM" page. (There are lots of -XX options listed there that don't appear in the Sun documentation. Unfortunately the page provides few details on what the options actually do.)
You should probably start looking to see if your application / webapp has memory leaks. If it has, your problems won't go away unless those leaks are found and fixed. In the long term, fiddling with the Hotspot GC options won't fix memory leaks.

I've found using -XX:PretenureSizeThreshold=1m to make 'large' object go immediately to tenured space greatly reduced my young GC and concurrent mode failures since it tends not to try to dump the young + 1 survivor amount of data (xmn=1536m survivorratio=3 maxTenuringThreashould=5) before a full CMS cycle can complete. Yes my survivor space is large, but about once ever 2 days something comes in the app that will need it (and we run 12 app servers each day for 1 app).

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.