While performing GC, the JVM goes over the live objects and sweeps the unmarked (unreachable) objects.
According to How to Tune Java Garbage Collection:
"The execution time of Full GC is relatively longer than that of Minor GC"
Will this always be the case?
If we have ~100 objects in the Old Generation space, and the average number of live objects (objects created and then swept) in the Eden space is more than 100, is that still true?
In addition, suppose we perform the compact phase; a rule of thumb says that, for better performance, it is cheaper to copy a small number of large objects than a large number of small objects.
So what am I missing here?
"The execution time of Full GC is relatively longer than that of Minor
GC"
Yes.
For garbage collection, memory is divided into generations, i.e. separate pools holding objects of different ages. Almost all commonly used configurations use two generations: one for young objects (the Young Generation) and one for old objects (the Old Generation).
Different algorithms can be used to perform garbage collection in the different generations, each algorithm optimized based on commonly observed characteristics for that particular generation.
Generational garbage collection exploits the following observations, known as the weak generational hypothesis, regarding applications written in several programming languages, including the Java programming language:
Most allocated objects are not referenced (considered live) for long, that is, they die young.
Few references from older to younger objects exist.
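To make the first observation concrete, here is a small, illustrative Java snippet (my own example, not from the referenced paper): the StringBuilder and the intermediate String created in each iteration become unreachable almost immediately and die young in Eden, while the list that accumulates a few results lives much longer and would eventually be tenured.

    import java.util.ArrayList;
    import java.util.List;

    public class DieYoungDemo {
        public static void main(String[] args) {
            List<String> survivors = new ArrayList<>();   // long-lived, would eventually be tenured
            for (int i = 0; i < 1_000_000; i++) {
                // The StringBuilder and its intermediate buffer are garbage
                // as soon as the iteration ends: they "die young" in Eden.
                StringBuilder temp = new StringBuilder("item-");
                temp.append(i);
                if (i % 100_000 == 0) {
                    survivors.add(temp.toString());       // only a few objects are kept alive
                }
            }
            System.out.println("kept " + survivors.size() + " long-lived strings");
        }
    }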
Young generation collections occur relatively frequently and are efficient and fast because the young generation space is usually small and likely to contain a lot of objects that are no longer referenced.
Objects that survive some number of young generation collections are eventually promoted, or tenured, to the old generation.
This generation is typically larger than the young generation and its occupancy grows more slowly. As a result, old generation collections are infrequent, but take significantly longer to complete.
The garbage collection algorithm chosen for a young generation typically puts a premium on speed, since young generation collections are frequent.
On the other hand, the old generation is typically managed by an algorithm that is more space efficient, because the old generation takes up most of the heap and old generation algorithms have to work well with low garbage densities.
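If you want to see this difference in your own application, you can enable GC logging and compare the reported pause times of minor and full collections yourself. The flags below are a hedged sketch assuming a Java 8 HotSpot JVM (newer JDKs use the unified -Xlog:gc* option instead), and yourapp.jar is just a placeholder name.

    # Java 8 HotSpot example; on Java 9+ use -Xlog:gc* instead
    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar yourapp.jar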
Read this white paper for a better understanding. The above content is referenced from there.
I've read a few articles about how garbage collection works and still don't understand how using generations helps. As I understand it, the main idea is that we start collection from the youngest generation and move on to older generations. But why did the authors of this idea decide that starting from the youngest generation is the most efficient way?
The older the generation, the more times the object has been used, and the more likely it is to be needed again.
Removing a recently created object, on the other hand, is cheap, since it may just be a temporary (locally scoped) object.
The authors start with the youngest generation first simply because that's what fills up first after your application starts; in reality, however, which generation is being swept, and when, is non-deterministic as your application runs.
The important points with generational GC are:
the young generation uses a copying collector, which copies objects from Eden and the current survivor space into a space it considers to be empty (the unused survivor space), and is therefore fast, so the GC pause is minimal.
add to this the fact that most objects die young: the pause required to copy the small number of surviving objects out of Eden and the current survivor space is short, as only objects with live references are copied, after which Eden and the previous survivor space can be wiped.
after being copied several times, objects are promoted to the tenured (old) generation. Eventually the tenured generation fills up; however, this time there is no clean space to copy the objects to, so the garbage collector has to sweep and compact within the generation, which is slow (compared to the copy performed in Eden and the survivor space), meaning a longer pause.
the good news, based on the "most objects die young" heuristic, is that major GCs happen much less frequently than minor GCs, keeping GC pauses to a minimum over the lifetime of an application.
there's also the benefit that all new objects are allocated at the top of the heap, meaning minimal instructions are required to do so, with defragmentation occurring naturally as part of the copy process.
Both these pages, Oracle Garbage Collection Tuning and Useful JVM Flags – Part 5 (Young Generation Garbage Collection), describe this.
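If you want to watch the copy-several-times-then-promote behaviour described above, a couple of HotSpot flags can help. This is only a hedged illustration: the threshold value is made up, -XX:+PrintTenuringDistribution is a Java 8 era flag (newer JDKs expose the same information through unified -Xlog GC logging), and yourapp.jar is a placeholder.

    # Illustrative only: promote after at most 5 minor collections and log the survivor age distribution
    java -verbose:gc -XX:MaxTenuringThreshold=5 -XX:+PrintTenuringDistribution -jar yourapp.jar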
Read this one.
Using different generations makes the allocation of objects easy and fast, as MOST of the allocations are done in a single region of the heap - Eden. Based on the observation from the Weak Generational Hypothesis that most objects die young, collections in the Young generation contain mostly garbage, so they reclaim a lot of memory, and the region is relatively small compared to the whole heap, which means the time taken to scan the live objects is also small. That's why Young generation GCs are fast.
For more details on GC and generations, you can refer to this
I am unsure whether there is a generic answer for this, but I was wondering what the normal Java GC pattern and Java heap space usage look like. I am testing my Java 1.6 application using JMeter. I am collecting JMX GC logs and plotting them with the JMeter JMX GC and Memory plugin extension. The GC pattern looks quite stable, with most GC operations taking 30-40 ms and an occasional 90 ms. The memory consumption follows a saw-tooth pattern. The Java heap space usage grows constantly upwards, e.g. to 3 GB, and every 40 minutes the memory usage does a free-fall drop down to around 1 GB. The max-min delta, however, grows, so the sawtooth height constantly increases. Does it do a full GC every 40 minutes?
Most of your description is, in general, how the GC works. However, none of your specific observations, especially the numbers, hold for the general case.
To start with, each JVM has one or several GC implementations, and you can choose which one to use. Take the most widely used one, i.e. the SUN JVM (I like to call it this way), and the common server GC pattern as an example.
Firstly, the memory is divided into 4 regions.
A young generation, which holds all of the recently created objects. When this generation is full, the GC does a stop-the-world collection: it stops your program from working, runs a black-gray-white marking algorithm, finds the obsolete objects and removes them. So this is your 30-40 ms.
If an object survives a certain number of rounds of GC in the young gen, it is moved into a "swap" (survivor) generation. The survivor generation holds the objects until another number of GCs have passed, and then moves them to the old generation. There are 2 survivor spaces, which do a double-buffering kind of thing to help the young gen work faster. If the young gen dumps stuff into the survivor space and finds it mostly full, a GC happens on the survivor space and potentially moves the surviving objects to the old gen. This most likely accounts for your 90 ms, though I am not 100% sure how the survivor spaces work. Someone correct me if I am wrong.
All the objects that survive the survivor spaces are moved to the old generation. The old generation is only GC-ed when it's mostly full. In your case, every 40 min.
There is another "permanent gen" which is used to hold the byte code and resources loaded from your JARs.
The sizes of all these areas can be adjusted via JVM parameters.
You can try to use VisualVM which would give you a dynamic idea of how it works.
P.S. Not all JVMs / GCs work the same way. If you use the G1 collector, or JRockit, things might happen slightly differently, but the general idea holds.
Java GC works in terms of generations of objects. There are young, tenured and permanent generations. It seems like in your case: every 30-40 ms the GC processes only the young generation (and transfers surviving objects into the tenured generation), and every 40 minutes it performs a full collection (which causes a stop-the-world pause). Note: it happens not by time, but by percentage of used memory.
There are several JVM options which allow you to choose the generations' sizes, the type of GC (there are several GC algorithms; in Java 1.6 the Serial GC is used by default; for example -XX:+UseConcMarkSweepGC selects another one), and the parameters of how the GC works.
You'd better try to find good articles about generations and the different types of GC (the algorithms are really different; some of them allow you to avoid stop-the-world pauses altogether!)
Yes, most likely. Instead of guessing, you can use jstat to monitor your GCs.
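For example, a minimal jstat invocation could look like the line below (a hedged sketch: <pid> is your Java process id and 1000 is a sampling interval in milliseconds). The YGC/FGC columns count young and full collections, so you can check directly whether a full collection lines up with the 40-minute drop.

    jstat -gcutil <pid> 1000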
I suggest you use a memory profiler to ensure there is nothing simple you can do to reduce the amount of garbage you are producing.
BTW, if you increase the size of the young generation, you can reduce how much garbage makes it into the tenured space, reducing the frequency of full collections. You may find you have less than one full collection per day if you tune it enough.
For a more extreme case, I have tuned a trading system to less than one collection per day (minor or major).
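As a hedged illustration of that kind of tuning (the sizes below are made up, not a recommendation, and yourapp.jar is a placeholder): -Xmn fixes the young generation size explicitly, while -XX:NewRatio sets the old-to-young ratio instead, so you would normally use one or the other.

    # Either size the young generation explicitly ...
    java -Xms4g -Xmx4g -Xmn2g -jar yourapp.jar
    # ... or express it as a ratio of old to young generation
    java -Xms4g -Xmx4g -XX:NewRatio=1 -jar yourapp.jar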
I have some questions about the GC Algorithm:
First, when we use parameters such as UseSerialGC, UseParallelGC, UseParallelOldGC and so on, we specify a GC algorithm. Can each of them do GC in all generations? Is that right?
For example, if I use java -XX:+UseSerialGC, all generation will use serial GC as the GC Algorithm.
Second, can I use ParallelGC in the Old Generation and SerialGC in the young generation?
Lastly, as the title says, what's the difference between ParallelGC and ParallelOldGC?
Take a look at the HotSpot VM Options:
-XX:+UseParallelGC = Use parallel garbage collection for scavenges. (Introduced in 1.4.1).
-XX:+UseParallelOldGC = Use parallel garbage collection for the full collections. Enabling this option automatically sets -XX:+UseParallelGC. (Introduced in 5.0 update 6.)
where Scavenges = Young generation GC.
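As a hedged example of how these options appear on a command line (yourapp.jar is just a placeholder; the exact defaults depend on your JVM version, as the Oracle quote further down explains):

    # Parallel scavenges (young generation collections)
    java -XX:+UseParallelGC -verbose:gc -jar yourapp.jar
    # Parallel full collections as well; this automatically sets -XX:+UseParallelGC
    java -XX:+UseParallelOldGC -verbose:gc -jar yourapp.jar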
Well, after a lot of searching and research, what I have come to understand is as below:
-XX:+UseParallelGC - This enables the GC to use multiple threads in the young generation, but for the old/tenured generation a serial mark and compact algorithm is still used.
-XX:+UseParallelOldGC - This enables the GC to use the parallel mark and compact algorithm in the old/tenured generation.
Let's understand -
The algorithm and the memory arrangement that work in the Young generation, such as mark-and-copy and the swapping of survivor spaces, do not work for the Old generation, for several reasons:
Low mortality - In the Old Generation, the "mortality rate" is significantly lower than in the Young Generation. In a typical Java application most objects die quickly and few live longer. The objects which survive the young generation and are promoted to the old generation tend to live a long time, which leads to a much lower mortality rate in the old generation compared to the young generation.
Significant size - The Old Generation is significantly larger than the Young Generation. Because the Young Generation clears up quickly, relatively little space is needed for the many short-lived objects (small Young Generation). In the Old Generation, objects accumulate over time, so there must be much more space in the old generation than in the Young Generation (big old generation).
Little allocation - In the Old Generation less allocation happens than in the Young Generation. This is because objects arise in the Old Generation only when the Garbage Collector promotes surviving objects from the Young to the Old Generation. In the Young Generation, on the other hand, all the objects that the application creates with new are allocated, i.e. the majority of allocations occur there.
Taking these differences into account, an algorithm was chosen for the Young Generation that finishes garbage collection as quickly as possible, because it has to be invoked often due to the high mortality rate [point (1)]. In addition, the algorithm must enable the most efficient possible memory allocation [point (3)], because a lot is allocated in the Young Generation. The mark-and-copy algorithm on the Young Generation has these properties.
On the other hand, this algorithm does not make sense for the Old Generation. The situation is different there: the garbage collector has to take care of many objects in the Old Generation [point (2)] and most of them are still alive; only a small part has become unreachable and can be released [point (1)]. If the garbage collector were to copy all the surviving objects on each garbage collection, just as mark-and-copy does, it would spend a lot of time copying without gaining much.
Therefore, the mark-and-sweep algorithm is used on the old generation, where nothing is copied and the unreachable objects are simply released. Since this algorithm leads to fragmentation of the heap, a variation of the mark-and-sweep algorithm was additionally devised in which, following the sweep phase, a compaction is performed, reducing the fragmentation. This algorithm is called the mark-and-compact algorithm.
A mark-and-compact algorithm can be time consuming, as it needs to traverse the object graph in the following four stages:
Marking.
Calculation of new locations.
Reference adjustments.
Moving.
In the calculation-of-new-locations phase, whenever the collector finds a free space, it tries to find an object which can be moved into that space (defragmentation), and it stores the resulting pair for use in the later phases. This is what makes the algorithm consume more time.
Though mark-and-compact solves some issues specific to the tenured generation, it has a serious drawback: it is an STW (stop-the-world) event and consumes a lot of time, so it can seriously impact the application.
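To make the four stages a little more concrete, here is a tiny, self-contained Java simulation of a sliding mark-and-compact pass. It is only a sketch under invented names, not HotSpot code: the "heap" is faked as an array of objects with sizes and reference indices, and the reference-adjustment phase is trivial in this model.

    import java.util.*;

    public class MarkCompactSketch {

        static final class Obj {
            final int size;              // how many heap "slots" the object occupies
            final List<Integer> refs;    // indices of objects this object references
            int address;                 // current start slot in the simulated heap
            int newAddress;              // computed in the "calculate new locations" phase
            boolean marked;

            Obj(int address, int size, Integer... refs) {
                this.address = address;
                this.size = size;
                this.refs = new ArrayList<>(Arrays.asList(refs));
            }
        }

        public static void main(String[] args) {
            // objects at slots 0, 4, 6, 11; object 0 references 2, object 2 references 3
            Obj[] heap = {
                new Obj(0, 4, 2),    // live (root)
                new Obj(4, 2),       // garbage
                new Obj(6, 5, 3),    // live, reachable from 0
                new Obj(11, 3)       // live, reachable from 2
            };
            int[] roots = {0};

            // Phase 1: marking - trace reachability from the roots.
            Deque<Integer> stack = new ArrayDeque<>();
            for (int r : roots) stack.push(r);
            while (!stack.isEmpty()) {
                Obj o = heap[stack.pop()];
                if (o.marked) continue;
                o.marked = true;
                for (int ref : o.refs) stack.push(ref);
            }

            // Phase 2: calculate new locations - slide live objects towards slot 0.
            int free = 0;
            for (Obj o : heap) {
                if (o.marked) { o.newAddress = free; free += o.size; }
            }

            // Phase 3: reference adjustment - in this toy model references are array
            // indices, so nothing changes here; a real collector would rewrite every
            // pointer field to point at the referent's newAddress.

            // Phase 4: moving - "copy" each live object to its new address.
            for (Obj o : heap) {
                if (o.marked) o.address = o.newAddress;
            }

            for (int i = 0; i < heap.length; i++) {
                System.out.printf("obj %d: %s at slot %d%n", i,
                        heap[i].marked ? "live" : "reclaimed", heap[i].address);
            }
        }
    }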
Alternative algorithms for the old generation
In order to reduce these pause times, alternatives to the serial mark-and-compact algorithm have been considered:
A parallel mark-and-compact algorithm that still halts all application threads, but then handles the marking and the subsequent compaction with multiple garbage collector threads. While this is still a stop-the-world approach, the resulting pause is shorter on a multi-core or multi-processor machine than with the serial mark-and-compact algorithm. This parallel algorithm for the Old Generation (called "ParallelOld") has been available since Java 5 Update 6 and is selected with the option -XX:+UseParallelOldGC.
A concurrent mark-and-sweep algorithm that runs at least partially concurrently with the application, without stopping its threads, and only occasionally needs short stop-the-world phases. This concurrent mark-and-sweep algorithm (called "CMS") has been around since Java 1.4.1; it is switched on with the option -XX:+UseConcMarkSweepGC. Importantly, this is just a mark-and-sweep algorithm; compaction does not take place, which leads to the already discussed problem of fragmentation.
So, in a nutshell, -XX:+UseParallelOldGC is an indication to use multiple threads while doing major collections with the mark-and-compact algorithm. If only -XX:+UseParallelGC is used instead, minor (young) collections are parallel, but major collections are still single threaded.
I hope this answers it.
Those are two GC policies applied to different regions of a Java heap, namely the New and Old generations. Here's a link that helps to clarify which options imply other ones. It's helpful, especially when starting out, for understanding what you're getting when you specify, say, ParallelOldGC or ParNewGC.
http://www.fasterj.com/articles/oraclecollectors1.shtml
From Oracle Java SE 8 docs:
https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/collectors.html
The parallel collector (also known as the throughput collector) performs minor collections in parallel, which can significantly reduce garbage collection overhead. It is intended for applications with medium-sized to large-sized data sets that are run on multiprocessor or multithreaded hardware. The parallel collector is selected by default on certain hardware and operating system configurations, or can be explicitly enabled with the option -XX:+UseParallelGC.
Parallel compaction is a feature that enables the parallel collector to perform major collections in parallel. Without parallel compaction, major collections are performed using a single thread, which can significantly limit scalability. Parallel compaction is enabled by default if the option -XX:+UseParallelGC has been specified. The option to turn it off is -XX:-UseParallelOldGC.
So if you specify -XX:+UseParallelGC, by default major collections will also be done using multiple threads. The reverse is also true, i.e. if you specify -XX:+UseParallelOldGC, minor collections will also be done using multiple threads.
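As a hedged illustration on the command line (yourapp.jar is a placeholder), the two invocations below differ only in whether parallel compaction for major collections is left at its default or explicitly turned off:

    # Parallel minor collections; parallel compaction for major collections is on by default
    java -XX:+UseParallelGC -jar yourapp.jar
    # Keep parallel minor collections, but fall back to single-threaded major collections
    java -XX:+UseParallelGC -XX:-UseParallelOldGC -jar yourapp.jar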
As I understand, a generational GC divides objects into generations.
And on each cycle, GC runs on only one generation.
Why? Why is garbage collecting only one generation enough?
P.S.: I understand all this from here.
If you read the link I provided in the earlier question you had about generational GC, you will understand why it does so; a cycle happens when the white set memory is filled up.
To optimize for this scenario, memory is managed in generations, or memory pools holding objects of different ages. Garbage collection occurs in each generation when the generation fills up. Objects are allocated in a generation for younger objects or the young generation, and because of infant mortality most objects die there. When the young generation fills up it causes a minor collection. Minor collections can be optimized assuming a high infant mortality rate. The costs of such collections are, to the first order, proportional to the number of live objects being collected. A young generation full of dead objects is collected very quickly. Some surviving objects are moved to a tenured generation. When the tenured generation needs to be collected there is a major collection that is often much slower because it involves all live objects.
Basically, each object is assigned to a generation (based on the hypothesis about the object) and placed into a memory pool for that particular generation. When that memory pool fills up, a GC cycle begins: the objects that still have references are moved to another memory pool, and fresh objects are added.
It's not always enough -- it's just that it's usually enough, so it saves time by not examining objects that are likely to stay alive anyway.
Every object has a generation, saying how many garbage collections it has survived. If an object has survived a few garbage collections, chances are that it will also survive the next one.
MSDN has a great explanation:
A generational garbage collector makes the following assumptions:
The newer an object is, the shorter its lifetime will be.
The older an object is, the longer its lifetime will be.
Newer objects tend to have strong relationships to each other and are frequently accessed around the same time.
Compacting a portion of the heap is faster than compacting the whole heap.
Because of this, you could save some time by only trying to collect younger objects, and collecting the older generations only if that doesn't free up enough memory.
The answer is there really.
It has been empirically observed that in many programs, the most recently created objects are also those most likely to become unreachable quickly (known as infant mortality or the generational hypothesis).
And
Generational garbage collection is a heuristic approach, and some unreachable objects may not be reclaimed on each cycle. It may therefore occasionally be necessary to perform a full mark and sweep or copying garbage collection to reclaim all available space.
Basically, generational collection gives you better performance over a full garbage collection at the cost of completeness. That's why a mixture of the two is used in practice.
I understand that the time taken by a YGC is proportional to the number of live objects in Eden. I also understand how the live objects are figured out in major collections (all the objects in thread stacks and static objects, plus further objects reachable from those objects transitively).
But I don't understand how the live objects are figured out in a young generation collection.
If it parsed the thread stacks, then it would need to parse Eden + the tenured space, which I don't think is the case. So how does the JVM find the live objects in Eden and copy them to the To Survivor space?
how the live objects are figured out in young generation collection ?
A good high-level description of how generational collection is implemented in HotSpot may be found in this article.
In general, a generational collector marks the young generation as follows (assuming we have just two generations):
It marks young objects and traces references starting with the thread stack frames and the static frames. When it finds a reference to an old generation object it ignores it.
Then it repeats the process for references in the old generation that refer to young generation objects. The tricky bit is identifying these references in the old generation without marking the entire old generation.
Now we have marked all of the objects in the new generation that are reachable ... and the rest (in that generation) can be reclaimed.
In HotSpot, old generation objects that contain young generation references are identified using the "Card Table". The old generation is divided into regions of 512 bytes, and each region has a "Card". If the region contains any old -> new generation pointers, a bit in the Card is set. Objects in regions with the Card bit set are then traced during a new generation collection.
The tricky thing is maintaining the Card table as new space references are written to old generation objects. In HotSpot, this is implemented using a software write-barrier that sets the appropriate Card's dirty bit whenever a new space reference is written into the memory region corresponding to the Card. As the linked article notes, this makes setting a reference field in an object more expensive, but it is apparently worth it due to the time saved by being able to collect only the new generation most of the time.
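As a rough illustration of that mechanism, here is a tiny Java sketch of a card table with a software write barrier. Every name in it is invented, the 512-byte card size is simply taken from the description above, and it only models the bookkeeping, not a real collector.

    import java.util.function.IntConsumer;

    // Invented names; a toy model of the card-marking idea, not HotSpot's actual code.
    final class CardTableSketch {
        static final int CARD_SIZE = 512;   // bytes of old generation covered by one card

        private final byte[] cards;         // one dirty flag per 512-byte region

        CardTableSketch(int oldGenSizeBytes) {
            cards = new byte[(oldGenSizeBytes + CARD_SIZE - 1) / CARD_SIZE];
        }

        // Software write barrier: call this after storing a (possibly young) reference
        // into an old-generation object located at 'fieldOffset' bytes into the old gen.
        void writeBarrier(int fieldOffset) {
            cards[fieldOffset / CARD_SIZE] = 1;          // mark the covering card dirty
        }

        // During a minor GC, only the dirty cards are scanned for old -> young pointers.
        void scanDirtyCards(IntConsumer scanRegionStartingAt) {
            for (int i = 0; i < cards.length; i++) {
                if (cards[i] == 1) {
                    scanRegionStartingAt.accept(i * CARD_SIZE);  // trace refs in this region
                    cards[i] = 0;                                // clean the card afterwards
                }
            }
        }

        public static void main(String[] args) {
            CardTableSketch table = new CardTableSketch(8 * 1024);  // a pretend 8 KB old generation
            table.writeBarrier(1300);                               // store into an old object at offset 1300
            table.scanDirtyCards(start ->
                    System.out.println("minor GC scans old-gen region starting at " + start));
            // prints: minor GC scans old-gen region starting at 1024
        }
    }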
To trace just the youngest generation, the garbage collector scans the same root set (stacks and registers) and ALSO all the older (non-collected) generations that have been modified since the previous young generation scan. Only those objects that have been modified can possibly point at young generation objects, as unmodified objects cannot possibly point at objects that were created after their last modification.
So the tricky part is, how does the GC know which objects have been modified since the last GC? There are a number of techniques that can be used, but they basically come down to keeping track of writes to old generation objects. This can be done by trapping writes (write barriers) or just keeping track of all write targets (write buffers, card marking), all of which add overhead to the execution of the program while the GC is not running (so it doesn't show up as GC pause time, but does show up in the total elapsed time). Hardware support helps a lot, if it's available. The tracking need not be exact, as long as every modified older generation object is scanned (scanning unmodified objects is a waste of time, but won't hurt anything).
I am quoting the relevant text from the article by Brian Goetz here.
Tracing garbage collectors, such as copying, mark-sweep, and mark-compact, all start scanning from the root set, traversing references between objects, until all live objects have been visited. A generational tracing collector starts from the root set, but does not traverse references that lead to objects in the older generation, which reduces the size of the object graph to be traced.
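To illustrate that last point, here is a small, hypothetical Java sketch of such a generational trace: marking starts from the ordinary roots plus the young objects recorded in a remembered set (the old-to-young references discussed above), and it simply refuses to follow edges into the old generation. None of the names are real collector code.

    import java.util.*;

    // Hypothetical model: each node knows which generation it lives in.
    public class GenerationalTraceSketch {

        enum Gen { YOUNG, OLD }

        static final class Node {
            final Gen gen;
            final List<Node> refs = new ArrayList<>();
            boolean marked;
            Node(Gen gen) { this.gen = gen; }
        }

        // Mark all reachable YOUNG objects, starting from the ordinary roots plus the
        // young objects recorded in the remembered set (old -> young references).
        static void markYoung(List<Node> roots, List<Node> rememberedYoungTargets) {
            Deque<Node> stack = new ArrayDeque<>();
            stack.addAll(roots);
            stack.addAll(rememberedYoungTargets);
            while (!stack.isEmpty()) {
                Node n = stack.pop();
                if (n.gen == Gen.OLD || n.marked) continue;   // do not traverse into the old generation
                n.marked = true;
                stack.addAll(n.refs);
            }
        }

        public static void main(String[] args) {
            Node oldCache = new Node(Gen.OLD);
            Node youngA = new Node(Gen.YOUNG);
            Node youngB = new Node(Gen.YOUNG);   // only reachable via the old object
            Node youngDead = new Node(Gen.YOUNG);

            youngA.refs.add(oldCache);           // young -> old edge, ignored by the trace
            oldCache.refs.add(youngB);           // old -> young edge, found via the remembered set

            markYoung(List.of(youngA), List.of(youngB));

            System.out.println("youngA marked:    " + youngA.marked);    // true (root)
            System.out.println("youngB marked:    " + youngB.marked);    // true (remembered set)
            System.out.println("youngDead marked: " + youngDead.marked); // false, will be reclaimed
        }
    }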