Generational Garbage Collection

As I understand, a generational GC divides objects into generations.
And on each cycle, GC runs on only one generation.
Why? Why is garbage collecting only one generation enough?
P.S.: I understand all of this from here.

If you read the link I provided in your earlier question about generational GC, you will understand why it does so: a cycle starts when the white set memory fills up.
To optimize for this scenario, memory is managed in generations, or memory pools holding objects of different ages. Garbage collection occurs in each generation when the generation fills up. Objects are allocated in a generation for younger objects or the young generation, and because of infant mortality most objects die there. When the young generation fills up it causes a minor collection. Minor collections can be optimized assuming a high infant mortality rate. The costs of such collections are, to the first order, proportional to the number of live objects being collected. A young generation full of dead objects is collected very quickly. Some surviving objects are moved to a tenured generation. When the tenured generation needs to be collected there is a major collection that is often much slower because it involves all live objects.
Basically, objects are divided into generations (based on a hypothesis about their lifetimes), and each generation has its own memory pool. When a pool fills up, a GC cycle begins for that generation: objects that are still referenced are moved to another pool, and the freed space is used for fresh objects.
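
To make the cycle described above concrete, here is a minimal toy model in Java. It is only a sketch under stated assumptions: the class `ToyGenerationalHeap`, the `reachable` flag (standing in for "still referenced by the program"), and the capacity and threshold values are all made up for illustration and are not how the JVM is implemented. A real collector finds live objects by tracing from roots.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a two-generation heap. All names are illustrative, not JVM APIs. */
final class ToyGenerationalHeap {
    /** A simulated object; 'reachable' stands in for "still referenced by the program". */
    static final class Obj {
        boolean reachable = true;
        int age = 0; // number of minor collections survived so far
    }

    private final int youngCapacity;
    private final int tenureThreshold;
    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    int minorCollections = 0;

    ToyGenerationalHeap(int youngCapacity, int tenureThreshold) {
        this.youngCapacity = youngCapacity;
        this.tenureThreshold = tenureThreshold;
    }

    /** Allocates in the young generation; a full young generation triggers a minor collection. */
    Obj allocate() {
        if (young.size() >= youngCapacity) {
            minorCollect();
        }
        Obj o = new Obj();
        young.add(o);
        return o;
    }

    /** Examines only the young generation; the old generation is not touched at all. */
    void minorCollect() {
        minorCollections++;
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (!o.reachable) {
                continue; // dead objects are simply not copied
            }
            o.age++;
            if (o.age >= tenureThreshold) {
                old.add(o); // survived often enough: promote to the old generation
            } else {
                survivors.add(o);
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    /** Allocates ten objects, of which only every fifth stays reachable. */
    static int[] demo() {
        ToyGenerationalHeap heap = new ToyGenerationalHeap(4, 2);
        for (int i = 0; i < 10; i++) {
            Obj o = heap.allocate();
            o.reachable = (i % 5 == 0);
        }
        return new int[] { heap.minorCollections, heap.old.size(), heap.young.size() };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("minor collections=" + r[0] + ", promoted=" + r[1] + ", young=" + r[2]);
    }
}
```

Note that the work done in `minorCollect` depends on the number of survivors, not on the amount of garbage, which is the point the quoted paragraph makes about collection cost. The toy does not model a major collection or running out of memory.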

It's not always enough -- it's just that it's usually enough, so it saves time by not examining objects that are likely to stay alive anyway.
Every object has a generation, saying how many garbage collections it has survived. If an object has survived a few garbage collections, chances are that it will also survive the next one.
MSDN has a great explanation:
A generational garbage collector makes the following assumptions:
The newer an object is, the shorter its lifetime will be.
The older an object is, the longer its lifetime will be.
Newer objects tend to have strong relationships to each other and are frequently accessed around the same time.
Compacting a portion of the heap is faster than compacting the whole heap.
Because of this, you could save some time by only trying to collect younger objects, and collecting the older generations only if that doesn't free up enough memory.

The answer is there really.
It has been empirically observed that in many programs, the most recently created objects are also those most likely to become unreachable quickly (known as infant mortality or the generational hypothesis).
And
Generational garbage collection is a heuristic approach, and some unreachable objects may not be reclaimed on each cycle. It may therefore occasionally be necessary to perform a full mark and sweep or copying garbage collection to reclaim all available space.
Basically, generational collection gives you better performance over a full garbage collection at the cost of completeness. That's why a mixture of the two is used in practice.

Related

Does Garbage Collectors (GC) collect only objects or also overwrite the data stored by it?

HotSpot's Garbage Collectors (GC)
When HotSpot's Garbage Collectors (GC) runs does it only collect objects or overwrite stored data to prevent memory dumping?

This is a rather broad question because there are many different algorithms for GC.
Let's take Hotspot and G1 as an example.
A minor GC copies objects from Eden space to a survivor space, between survivor spaces and promotes objects to the old generation (depending on object age). In all of these, the memory used by those objects will subsequently be overwritten but not deallocated.
A major GC will copy objects from one region (which is a logical area of memory) to another to compact objects, eliminating fragmentation. Again, the memory used by these objects will be overwritten at some point in the future.
Some collectors like Zing from Azul (who I work for) uncommit unused memory when the heap usage shrinks and all allocated pages are no longer required. Not all GCs do this, though. This returns memory pages to the OS allowing them to be used for other applications.

Why are immutable objects 'more efficient' for generational GC?

"Young GC becomes inefficient if we have tenured objects referring to younger generations" is quoted as one of the reasons to favor immutable objects.
What exactly happens when the collector comes across such an object in the old generation?
Why should it be any more cumbersome than collecting an older object referring to an object in the young generation?

To collect the Eden space (of the young gen.) any live objects are copied from the Eden space to one of the survivor spaces. Objects already in a survivor space are copied from the 'from' space to the 'to' space unless they are old enough to be promoted to the old generation (in which case they are copied there).
All of this involves object relocation. To do this safely any objects in the old generation that point to objects in the new generation (that are being relocated during a minor GC) must have those references updated. The more objects that have references to objects being relocated, the more work the GC has to do during a minor GC.
If you use only immutable objects the number of objects that will contain pointers from the old gen. to the young gen. will be very small (most likely zero). There are only two ways this could happen:
An object is promoted to the old gen. whilst an object it refers to is still in a survivor space.
An object is large enough to be allocated directly in the old gen. and refers to an object in the young gen.
To summarise the answer, by using immutable objects you're reducing the possible number of object references that the GC has to update during a minor collection, therefore improving its efficiency.
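
One way to picture the bookkeeping behind this is a remembered set maintained by a write barrier. The Java sketch below is a simplified illustration under assumptions: `RememberedSetSketch`, `Node` and `writeRef` are hypothetical names, and production collectors use more compact structures (HotSpot's generational collectors use card marking, as far as I know) rather than a set of objects.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Simplified remembered set: tracks old objects that point into the young generation. */
final class RememberedSetSketch {
    /** Simulated heap object with a single reference field. */
    static final class Node {
        final boolean inOldGen;
        Node ref;
        Node(boolean inOldGen) { this.inOldGen = inOldGen; }
    }

    // Identity-based set: an old object is recorded once, however often it is written.
    private final Set<Node> remembered =
            Collections.newSetFromMap(new IdentityHashMap<>());

    /** Simulated write barrier: every reference store goes through this method. */
    void writeRef(Node holder, Node target) {
        holder.ref = target;
        if (holder.inOldGen && target != null && !target.inOldGen) {
            remembered.add(holder); // a minor GC must visit and possibly update this holder
        }
    }

    int rememberedCount() { return remembered.size(); }

    /** Long-lived mutable objects that keep being pointed at fresh young objects. */
    static int demoMutable() {
        RememberedSetSketch heap = new RememberedSetSketch();
        Node cacheA = new Node(true);
        Node cacheB = new Node(true);
        heap.writeRef(cacheA, new Node(false));
        heap.writeRef(cacheB, new Node(false));
        heap.writeRef(cacheA, new Node(false)); // same holder again
        return heap.rememberedCount();
    }

    /** Immutable style: a new object is built pointing at an existing, older one. */
    static int demoImmutable() {
        RememberedSetSketch heap = new RememberedSetSketch();
        Node oldValue = new Node(true);
        Node fresh = new Node(false);
        heap.writeRef(fresh, oldValue); // young -> old needs no entry for a minor GC
        return heap.rememberedCount();
    }
}
```

In this toy, `demoMutable()` leaves two old objects that a minor collection has to examine, while `demoImmutable()` leaves none, because an immutable object can only be constructed with references to objects that already exist.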

What exactly happens when the collector comes across such an object in the old generation? How does it handle it?
Typically it can't. Before the collector considers collecting objects in the older generation, it promotes anything it could not reap in the younger generation, so by the time the collector considers whether the parent object can be collected, they are no longer in the younger generation. I think the issue is what happens when it comes across the object in the younger generation: it has to skip it, and could skip it hundreds or thousands of times during young generation GCs before it has to do one on the older generation.
Why should it be any more cumbersome than collecting a younger object referring to an object in the tenured generation?
Being referenced by an older generation object means it is effectively frozen in the young generation. Being referenced by a younger generation object is no issue, as the younger generations are all resolved before it starts collecting from the older generation.
I think as long as you are disciplined about de-referencing all your unused objects then it will not hurt your GC efficiency but that can be a lot of extra work in a big application.

Why gc on old generation takes longer than gc on young generation

While performing GC, the JVM goes over the live objects and sweeps unmarked objects.
According to:
How to Tune Java Garbage Collection
"The execution time of Full GC is relatively longer than that of Minor GC"
Will this always be the case?
If we have ~100 objects in the Old Generation space, and the average number of live objects (created and swept) in the Eden space is more than 100, is that still true?
In addition, suppose we perform a compact phase; a rule of thumb says that, for better performance, it is better to copy a small number of large objects than a large number of small objects.
So what am I missing here?

"The execution time of Full GC is relatively longer than that of Minor
GC"
Yes.
With generational garbage collection, memory is divided into generations, i.e. separate pools holding objects of different ages. The most widely used configurations use two generations: one for young objects (Young Generation) and one for old objects (Old Generation).
Different algorithms can be used to perform garbage collection in the different generations, each algorithm optimized based on commonly observed characteristics for that particular generation.
Generational garbage collection exploits the following observations, known as the weak generational hypothesis, regarding applications written in several programming languages, including the Java programming language:
Most allocated objects are not referenced (considered live) for long, that is, they die young.
Few references from older to younger objects exist.
Young generation collections occur relatively frequently and are efficient and fast because the young generation space is usually small and likely to contain a lot of objects that are no longer referenced.
Objects that survive some number of young generation collections are eventually promoted, or tenured, to the old generation.
This generation is typically larger than the young generation and its occupancy grows more slowly. As a result, old generation collections are infrequent, but take significantly longer to complete.
The garbage collection algorithm chosen for a young generation typically puts a premium on speed, since young generation collections are frequent.
On the other hand, the old generation is typically managed by an algorithm that is more space efficient, because the old generation takes up most of the heap and old generation algorithms have to work well with low garbage densities.
Read this white paper for a better understanding. The above content is referenced from there.
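
If you want to check the "frequent but fast" versus "infrequent but slow" claim on your own JVM, the standard `java.lang.management` API exposes per-collector counts and accumulated times. The following is a small sketch; the class name `GcStats` and the allocation loop are my own, and which collectors are listed (and how they are named) depends on the JVM and the GC you run with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Prints how often each collector ran and how much time it took in total. */
final class GcStats {
    /** One line per collector: name, number of collections, accumulated time in ms. */
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Produce a lot of short-lived garbage so that young collections are likely.
        long sink = 0;
        for (int i = 0; i < 2_000_000; i++) {
            byte[] tmp = new byte[64];
            sink += tmp.length;
        }
        System.out.println("allocated bytes: " + sink);
        snapshot().forEach(System.out::println);
    }
}
```

Dividing `timeMs` by `count` for each collector gives a rough average pause per collection, which you can compare between the young and old generation collectors. Both values may be reported as -1 if the JVM does not track them.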

some questions on Garbage Collection internals?

I am trying to understand how the garbage collection process works. I came across a good link.
Most articles say that during a minor GC objects are moved from eden to a survivor space, and during a major GC objects are moved from survivor to tenured space; otherwise the memory of all unreachable objects is reclaimed. I have three questions (asked in one go, as they are related) based on the above statements:
1) Minor vs. major GC? What is the difference between the two, such that one is called major and the other minor? As per my understanding, a minor collection happens in parallel with the running application, while a major collection makes the application pause for that period.
2) What actually happens when an object is moved from eden to a survivor space? Does the memory location of the object change internally?
3) Why don't we have just one space instead of three, i.e. eden, survivor and tenured? I know there must be a reason behind it, but I am missing it. My point is: when GC runs, it collects unreachable objects and leaves the reachable ones in that space. Just one space seems to be sufficient. So what advantage do three different spaces provide over one?

1) A minor GC occurs on the young generation; a major GC occurs on the old generation. Whether it runs in parallel with the application depends on the kind of GC; only collectors such as CMS and G1 can work concurrently.
2) Yes, moving an object during GC changes its physical location, so all pointers to this object are updated.
3) This is to avoid frequent and long application freezes during GC. If it were one big heap, the application would often freeze for long periods of time. The JVM creates objects in a small young generation, where GCs occur frequently but quickly. Most objects created by the JVM die quickly and never reach the old generation, so a major GC happens rarely, or may never happen at all.
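
To illustrate point 2, here is a rough Java sketch of what "moving an object and updating all pointers to it" means. It is a toy under assumptions: the heap is an array, an "address" is an array index, and the liveness flags are passed in by hand, whereas a real collector determines liveness by tracing from roots. `RelocationSketch`, `Cell` and `evacuate` are hypothetical names.

```java
/** Toy evacuation: live cells are copied to a compact space and references are rewritten. */
final class RelocationSketch {
    /** Simulated heap cell: a payload plus the address (index) of one referenced cell, or -1. */
    static final class Cell {
        final String payload;
        final int ref;
        Cell(String payload, int ref) {
            this.payload = payload;
            this.ref = ref;
        }
    }

    /**
     * Copies the live cells of fromSpace into a fresh, compact toSpace and
     * rewrites every reference so that it points at the new address.
     */
    static Cell[] evacuate(Cell[] fromSpace, boolean[] live) {
        // Pass 1: decide the new address of every live cell (a forwarding table).
        int[] forwarding = new int[fromSpace.length];
        int next = 0;
        for (int i = 0; i < fromSpace.length; i++) {
            forwarding[i] = live[i] ? next++ : -1;
        }
        // Pass 2: copy the live cells and translate their references.
        Cell[] toSpace = new Cell[next];
        for (int i = 0; i < fromSpace.length; i++) {
            if (!live[i]) {
                continue; // garbage is never copied
            }
            Cell oldCell = fromSpace[i];
            int newRef = oldCell.ref < 0 ? -1 : forwarding[oldCell.ref];
            toSpace[forwarding[i]] = new Cell(oldCell.payload, newRef);
        }
        return toSpace;
    }

    /** A (at 0) points to B (at 2); C (at 4) points to A; cells 1 and 3 are garbage. */
    static Cell[] demo() {
        Cell[] from = {
            new Cell("A", 2), new Cell("garbage", -1), new Cell("B", -1),
            new Cell("garbage", -1), new Cell("C", 0)
        };
        boolean[] live = { true, false, true, false, true };
        return evacuate(from, live);
    }
}
```

After `demo()`, B has moved from address 2 to address 1, and A's reference has been updated accordingly. The program itself never sees this, since Java code only deals in references, not addresses.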

Source for my answers is this Oracle article on GC basics, so these answers would apply for HotSpot. No clue as to other VMs, although I would guess that the general idea might remain the same if the same implementation techniques were used in other VMs.
Minor vs Major GC collection? What is the difference between two that one is called major and other is called minor collection?
Minor GC is GC of the young generation, where new objects are allocated. Major GC is GC of all live objects, including the permanent generation (which is a bit interesting to me, but that's what the article says). Also, it appears that both major and minor GC are stop-the-world events.
What actually happens when object is moved from eden to survivor space? Does the memory location of object is changed internally?
I can't seem to find a reference at the moment, but I would assume so. Allowing for memory location to be changed lets compaction be performed, which improves memory allocation performance and ease. Allowing each space to be compacted separately makes sense, so I would guess that moving an object from one part of the heap to another would involve physically moving the object from one memory location to another.
Why not just one space exist instead of three (i.e eden, survivor and tenured space) exist?
Short answer: efficiency. If you have only one space, you'd have to check all objects when you GC, which becomes inefficient if you have lots of long-lived objects (and you're almost guaranteed to have a decent number in a long-running application), as those long-lived objects are likely to still be reachable from one GC to the next. Splitting the heap allows for GC to be optimized, as most of the GC efforts can be concentrated where object life can be assumed to be short (i.e. young generation), with longer-living objects being GC'd less frequently.

How generation help garbage collector?

I've read a few articles about how garbage collection works and still don't understand how using generations helps. As I understand it, the main idea is that we start collection from the youngest generation and move on to older generations. But why did the authors of this idea decide that starting from the youngest generation is the most efficient way?

The older the generation, the more often its objects have been used, and the more likely they are to be needed again.
Removing a recently created object makes no sense; maybe it's a temporary (scope: local) object.

The authors start with the youngest generation first simply because that's what gets filled up first after your application starts, however in reality which generation is being swept and when is non-deterministic as your application runs.
The important points with generational GC are:
The young generation uses a copying collector, which copies objects from eden and the current survivor space into a space it considers empty (the unused survivor space); this is fast, so the GC pause is minimal.
Add to this the fact that most objects die young: the pause required to copy the few surviving objects from eden and the current survivor space is small, as only objects with live references are copied, after which eden and the previous survivor space can be wiped.
After being copied several times, objects are copied to the tenured (old) generation. Eventually the tenured generation will fill up; however, this time there is no clean space to copy the objects to, so the garbage collector has to sweep and compact within the generation, which is slow (compared to the copy performed in eden and the survivor spaces), meaning a longer pause.
The good news, based on the "most objects die young" heuristic, is that major GCs happen much less frequently than minor ones, keeping GC pauses to a minimum over the lifetime of an application.
There's also the benefit that all new objects are allocated at the top of the heap, meaning minimal instructions are required to do so, with defragmentation occurring naturally as part of the copy process.
Both these pages, Oracle Garbage Collection Tuning and Useful JVM Flags – Part 5 (Young Generation Garbage Collection), describe this.
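
The "allocated at the top of the heap" point is often called bump-pointer allocation. Here is a minimal Java sketch of the idea, assuming a single thread and a fixed-size space; `BumpAllocator` and its methods are made-up names, and real JVMs add refinements such as thread-local allocation buffers.

```java
/** Toy bump-pointer allocator: allocation is a bounds check plus one addition. */
final class BumpAllocator {
    private final int capacity;
    private int top = 0; // offset of the first free byte

    BumpAllocator(int capacityBytes) {
        this.capacity = capacityBytes;
    }

    /**
     * Returns the start offset of the new block, or -1 if the space is full
     * (the point at which a real VM would trigger a minor collection).
     */
    int allocate(int size) {
        if (size < 0 || size > capacity - top) {
            return -1;
        }
        int start = top;
        top += size; // "bump" the pointer past the new block
        return start;
    }

    /** Once the survivors have been copied elsewhere, the whole space is freed at once. */
    void reset() {
        top = 0;
    }

    int used() {
        return top;
    }
}
```

Because survivors are copied out and the space is then reset as a whole, there are never holes to search for, which is why no free list is needed and fragmentation does not build up in the young generation.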

Read this one.
Using different generations makes the allocation of objects easy and fast, as most allocations are done in a single region of the heap: Eden. Based on the observation that most objects die young (the weak generational hypothesis), collections in the young generation find more garbage and thus reclaim more memory, and the young generation is relatively small compared to the whole heap, which means that the time taken to scan its objects is also short. That's why young generation GCs are fast.
For more details on GC and generations, you can refer to this
