I've read a few articles about how garbage collection works and still don't understand how using generations helps. As I understand it, the main idea is that we start collection from the youngest generation and move to the older generations. But why did the authors of this idea decide that starting from the youngest generation is the most efficient way?
The older the generation, the longer the object has been in use and the more likely it is to be needed again.
Removing a recently created object makes no sense; it may be a temporary (locally scoped) object.
The authors start with the youngest generation simply because that's what fills up first after your application starts; in reality, which generation is swept, and when, is non-deterministic as your application runs.
The important points with generational GC are:
The young generation uses a copying collector, which copies live objects from eden and the current survivor space into a space it considers empty (the unused survivor space); it is therefore fast and the GC pause is minimal.
Add to this the fact that most objects die young, so the pause required to copy the small number of surviving objects from eden and the current survivor space is short, since only objects with live references are copied; afterwards, eden and the previous survivor space can be wiped.
After being copied several times, objects are promoted to the tenured (old) generation. Eventually the tenured generation will fill up; however, this time there is no clean space to copy the objects to, so the garbage collector has to sweep and compact within the generation, which is slow (compared to the copy performed in eden and the survivor spaces), meaning a longer pause.
The good news, based on the "most objects die young" heuristic, is that major GCs happen much less frequently than minor ones, keeping GC pauses to a minimum over the lifetime of an application.
There is also the benefit that all new objects are allocated at the top of the heap, meaning minimal instructions are required to do so, with defragmentation occurring naturally as part of the copy process.
Both these pages, Oracle Garbage Collection Tuning and Useful JVM Flags – Part 5 (Young Generation Garbage Collection), describe this.
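As a rough illustration of these points, here is a small, hypothetical program that creates mostly short-lived objects plus a handful of long-lived ones. Running it with GC logging enabled (for example -verbose:gc, or -Xlog:gc on JDK 9+) should show a stream of short minor collections and rarely, if ever, a full collection, which is the behaviour described above. The class name and allocation sizes are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Run with GC logging to watch minor vs. major collections, e.g.
//   java -verbose:gc AllocationDemo        (older JDKs)
//   java -Xlog:gc AllocationDemo           (JDK 9+)
public class AllocationDemo {
    public static void main(String[] args) {
        List<byte[]> retained = new ArrayList<>(); // long-lived references
        for (int i = 0; i < 1_000_000; i++) {
            byte[] temp = new byte[1024];          // short-lived; dies young
            if (i % 10_000 == 0) {
                retained.add(temp);                // a few objects survive and may get tenured
            }
        }
        System.out.println("Retained " + retained.size() + " long-lived arrays");
    }
}
```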
Read this one.
Using different generations makes the allocation of objects easy and fast, as MOST allocations are done in a single region of the heap: Eden. Based on the observation that most objects die young (the weak generational hypothesis), collections in the young generation find more garbage, so they reclaim more memory, and the region is relatively small compared to the whole heap, so the time taken to scan the objects is also short. That's why young generation GCs are fast.
For more details on GC and generations, you can refer to this
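To make the "allocation in a single region is easy and fast" point concrete, here is a toy sketch of bump-the-pointer allocation, the technique HotSpot uses inside Eden (and inside thread-local allocation buffers). The class and method names are invented for illustration; a real JVM does this in native code on raw memory.

```java
// A toy model of bump-the-pointer allocation in an Eden-like region.
public class BumpAllocator {
    private final byte[] space;   // stands in for the Eden region
    private int top = 0;          // next free offset ("top of the heap")

    public BumpAllocator(int sizeBytes) {
        this.space = new byte[sizeBytes];
    }

    /** Returns the start offset of the new object, or -1 if the region is full. */
    public int allocate(int objectSize) {
        if (top + objectSize > space.length) {
            return -1;            // Eden exhausted: a real JVM would trigger a minor GC here
        }
        int start = top;
        top += objectSize;        // allocation is just a pointer bump
        return start;
    }

    /** After a minor GC, Eden is empty again: just reset the pointer. */
    public void reset() {
        top = 0;
    }
}
```

Because allocation is only a bounds check plus an addition, creating an object in Eden costs very few instructions, which is why most allocations are so cheap.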
Related
What is the criteria to put a young object in an old region making it an old object or keep it in survivor regions?
Point 4 of the Young GC section of the official tutorial states:
"Live objects are evacuated (i.e., copied or moved) to one or more
survivor regions. If the aging threshold is met, some of the objects
are promoted to old generation regions."
But I can't find what that criteria is.
EDIT:
Amit Bhati pointed me to the MaxTenuringThreshold parameter. I didn't understand much about it from the official docs, but I think I've started to understand how it works.
With your help I think I found the answer here:
-XX:InitialTenuringThreshold=7 Sets the initial tenuring threshold for use in adaptive GC sizing in the parallel young collector. The
tenuring threshold is the number of times an object survives a young
collection before being promoted to the old, or tenured, generation.
-XX:MaxTenuringThreshold=n Sets the maximum tenuring threshold for use in adaptive GC sizing. The current largest value is 15. The default
value is 15 for the parallel collector and is 4 for CMS.
It is under the Debugging Options title :)
Under Garbage First (G1) Garbage Collection Options you can find this:
-XX:MaxTenuringThreshold=n Maximum value for tenuring threshold. The default value is 15.
It is not very descriptive if you have not read the InitialTenuringThreshold description in the other section. It seems InitialTenuringThreshold is not a valid G1 option, but I think the algorithm is the one described there.
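If you want to check which threshold values your JVM actually settled on, a small HotSpot-specific sketch like the following can read the flags at runtime. It uses the com.sun.management diagnostic bean, so it won't work on non-HotSpot VMs, and flag availability can vary by JVM version and collector.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Run with, for example, -XX:MaxTenuringThreshold=10 and compare the output.
public class TenuringFlags {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean bean = ManagementFactory
                .getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        for (String flag : new String[] {"MaxTenuringThreshold", "InitialTenuringThreshold"}) {
            // getVMOption returns the value the JVM is actually using for the flag
            System.out.println(flag + " = " + bean.getVMOption(flag).getValue());
        }
    }
}
```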
The following doc is good at explaining how to alter (reduce) the rate at which items are promoted from the survivor spaces to the Old Gen in the G1 collector.
http://java-is-the-new-c.blogspot.co.uk/2013/07/tuning-and-benchmarking-java-7s-garbage.html (the section entitled Tuning The Young Generation)
As the above answers say, MaxTenuringThreshold is the key setting, but it is only an upper limit and will be ignored if your young generation isn't big enough for it to be honoured. In that case you'd need to increase either the overall young generation via NewRatio, or just the survivor spaces via SurvivorRatio.
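For a feel of what SurvivorRatio actually does to the sizes, here is a tiny worked example. The arithmetic mirrors the HotSpot definition (eden is SurvivorRatio times the size of one survivor space, and the young generation holds eden plus two survivor spaces); the concrete numbers are just assumptions for the illustration.

```java
// Sketch of young-generation sizing given -Xmn and -XX:SurvivorRatio.
public class SurvivorSizing {
    public static void main(String[] args) {
        long youngGenBytes = 100L * 1024 * 1024; // assume -Xmn100m
        int survivorRatio = 8;                   // assume -XX:SurvivorRatio=8

        // young = eden + 2 * survivor, and eden = survivorRatio * survivor
        long survivor = youngGenBytes / (survivorRatio + 2);
        long eden = survivor * survivorRatio;

        System.out.printf("eden ~ %d MB, each survivor ~ %d MB%n",
                eden / (1024 * 1024), survivor / (1024 * 1024));
        // prints: eden ~ 80 MB, each survivor ~ 10 MB
    }
}
```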
From the Javadocs:
The heap space is divided into the old and the new generation. The new
generation includes the new object space (eden), and two survivor
spaces. The JVM allocates new objects in the eden space, and moves
longer lived objects from the new generation to the old generation.
The young generation uses a fast copying garbage collector which
employs two semi-spaces (survivor spaces) in the eden, copying
surviving objects from one survivor space to the second. Objects that
survive multiple young space collections are tenured, meaning they are
copied to the tenured generation. The tenured generation is larger and
fills up less quickly. So, it is garbage collected less frequently;
and each collection takes longer than a young space only collection.
Collecting the tenured space is also referred to as doing a full
generation collection.
The frequent young space collections are quick (a few milliseconds),
while the full generation collection takes a longer (tens of
milliseconds to a few seconds, depending upon the heap size).
Other GC algorithms, such as the Concurrent Mark Sweep (CMS)
algorithm, are incremental. They divide the full GC into several
incremental pieces. This provides a high probability of small pauses.
This process comes with an overhead and is not required for enterprise
web applications.
Also check this article: Java Garbage Collectors – Moving to Java7 Garbage-First (G1) Collector
The young generation comprises of one Eden and two Survivor spaces.
The live objects in Eden are copied to the initially empty survivor
space, labeled S1 in the figure, except for ones that are too large to
fit comfortably in the S1 space. Such objects are directly copied to
the old generation. The live objects in the occupied survivor space
(labeled S0) that are still relatively young are also copied to the
other survivor space, while objects that are relatively old are copied
to the old generation. If the S1 space becomes full, the live objects
from Eden or S0 that have not been copied to it are tenured,
regardless of their age. Any objects remaining in Eden or the S0 space
after live objects have been copied are not live and need not be
examined. Figure below illustrates the heap after young generation
collection:
The young generation collection leads to stop the world pause. After
collection, eden and one survivor space are empty. Now let’s see how
CMS handles old generation collection. It essentially consists of two
major steps – marking all live objects and sweeping them.
While performing GC, the JVM goes over the live objects and sweeps the unmarked objects.
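To illustrate those two steps, here is a toy, single-threaded mark-and-sweep sketch over a hand-built object graph. It is only a model of the idea described above (real collectors work on raw heap memory, run much of the CMS mark phase concurrently, and use card tables and other machinery); all names are invented for the example.

```java
import java.util.*;

// Toy mark-and-sweep: mark everything reachable from the roots, then
// discard ("sweep") whatever is left unmarked.
public class MarkSweepSketch {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked = false;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: traverse the object graph from the GC roots.
    static void mark(Collection<Obj> roots) {
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (!o.marked) {
                o.marked = true;
                stack.addAll(o.refs);
            }
        }
    }

    // Sweep phase: anything not marked is garbage and is removed from the "heap".
    static void sweep(List<Obj> heap) {
        heap.removeIf(o -> !o.marked);
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.refs.add(b);                         // a -> b is reachable, c is not
        List<Obj> heap = new ArrayList<>(List.of(a, b, c));
        mark(List.of(a));                      // a is the only root
        sweep(heap);
        heap.forEach(o -> System.out.println("survived: " + o.name)); // a and b survive
    }
}
```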
According to:
How to Tune Java Garbage Collection
"The execution time of Full GC is relatively longer than that of Minor GC"
Will this always be the case?
If we have ~100 objects in the old generation space, and the average number of live objects in the eden space (objects created and then swept) is more than 100, is that still true?
In addition, suppose we perform the compact phase; a rule of thumb says that for better performance it is preferable to copy a small number of large objects rather than a large number of small objects.
So what am I missing here ?
"The execution time of Full GC is relatively longer than that of Minor
GC"
Yes.
With generational garbage collection, memory is divided into generations, i.e. separate pools holding objects of different ages. Almost all commonly used configurations use two generations: one for young objects (the young generation) and one for old objects (the old generation).
Different algorithms can be used to perform garbage collection in the different generations, each algorithm optimized based on commonly observed characteristics for that particular generation.
Generational garbage collection exploits the following observations, known as the weak generational hypothesis, regarding applications written in several programming languages, including the Java programming language:
Most allocated objects are not referenced (considered live) for long, that is, they die young.
Few references from older to younger objects exist.
Young generation collections occur relatively frequently and are efficient and fast because the young generation space is usually small and likely to contain a lot of objects that are no longer referenced.
Objects that survive some number of young generation collections are eventually promoted, or tenured, to the old generation.
This generation is typically larger than the young generation and its occupancy
grows more slowly. As a result, old generation collections are infrequent, but take significantly longer to complete.
The garbage collection algorithm chosen for a young generation typically puts a premium on speed, since young generation collections are frequent.
On the other hand, the old generation is typically managed by an algorithm
that is more space efficient, because the old generation takes up most of the heap and old generation algorithms have to work well with low garbage densities.
Read this white paper for a better understanding. The above content is referenced from there.
I am trying to understand how the garbage collection process works. I came across a good link.
Most of the articles say that during a minor GC collection, objects are moved from eden to the survivor space, and during a major GC collection, objects are moved from the survivor space to the tenured space; otherwise, the memory of all unreachable objects is reclaimed. I have three questions (asked in a single go as they are related) based on the above statements:
1) Minor vs. major GC collection? What is the difference between the two such that one is called a major and the other a minor collection? As per my understanding, a minor collection happens in parallel with the application run, while a major collection makes the application pause during that period.
2) What actually happens when an object is moved from eden to the survivor space? Is the memory location of the object changed internally?
3) Why not have just one space instead of three, i.e. eden, survivor and tenured? I know there must be a reason behind it, but I am missing it. My point is that when GC runs, it collects the unreachable objects and leaves the reachable ones in that space. Just one space seems to be sufficient. So what advantage do three different spaces provide over one?
1) Minor GC occurs in the new generation; major GC occurs in the old generation. Whether it runs in parallel with the application or not depends on the kind of GC; only CMS and G1 can work concurrently.
2) Yes, moving an object during GC changes its physical location, so all pointers to this object will be updated.
3) This is to avoid frequent and long application freezes during GC. If it were one big heap, the application would often freeze for long periods of time. The JVM creates objects in the small young generation; GCs in it occur frequently but are quick. Most objects created by the JVM die quickly and never make it to the old generation, so a major GC happens rarely, or may never happen at all.
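A small sketch related to point 2: the JVM may physically move an object during GC, but the reference and the object's identity are preserved transparently, so Java code never observes the move. This assumes the allocation loop actually triggers a few young collections, which it usually will with default heap sizes.

```java
// The identity hash code is recorded in the object header the first time it
// is requested, so it stays stable even if the object is copied between spaces.
public class MovingObjects {
    public static void main(String[] args) {
        Object longLived = new Object();
        int hashBefore = System.identityHashCode(longLived);

        // Churn out lots of garbage to encourage a few young collections,
        // during which longLived is likely to be copied between spaces.
        for (int i = 0; i < 1_000_000; i++) {
            byte[] garbage = new byte[512];
        }

        // The reference and identity hash are unchanged, even though the
        // object's physical address may well have changed.
        System.out.println(hashBefore == System.identityHashCode(longLived)); // true
    }
}
```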
The source for my answers is this Oracle article on GC basics, so these answers apply to HotSpot. I have no clue about other VMs, although I would guess that the general idea remains the same if the same implementation techniques are used.
Minor vs. major GC collection? What is the difference between the two such that one is called a major and the other a minor collection?
Minor GC is GC of the young generation, where new objects are allocated. Major GC is GC of all live objects, including the permanent generation (which is a bit interesting to me, but that's what the article says). Also, it appears that both major and minor GC are stop-the-world events.
What actually happens when an object is moved from eden to the survivor space? Is the memory location of the object changed internally?
I can't seem to find a reference at the moment, but I would assume so. Allowing for memory location to be changed lets compaction be performed, which improves memory allocation performance and ease. Allowing each space to be compacted separately makes sense, so I would guess that moving an object from one part of the heap to another would involve physically moving the object from one memory location to another.
Why not have just one space instead of three (i.e. eden, survivor and tenured space)?
Short answer: efficiency. If you have only one space, you'd have to check all objects when you GC, which becomes inefficient if you have lots of long-lived objects (and you're almost guaranteed to have a decent number in a long-running application), as those long-lived objects are likely to still be reachable from one GC to the next. Splitting the heap allows for GC to be optimized, as most of the GC efforts can be concentrated where object life can be assumed to be short (i.e. young generation), with longer-living objects being GC'd less frequently.
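One way to see this split in practice is to ask the JVM for its per-generation collector statistics via the standard management API. The bean names differ by collector (for example "PS Scavenge"/"PS MarkSweep" for the parallel collector, or "G1 Young Generation"/"G1 Old Generation" for G1); the allocation loop is just there to force some minor collections, and the numbers you get will depend on your heap settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints one line per collector; the young collector's count is typically
// far higher than the old collector's, which is often still zero.
public class GcCounters {
    public static void main(String[] args) {
        // Allocate some garbage so at least a few young collections happen.
        for (int i = 0; i < 2_000_000; i++) {
            String s = new String("temporary-" + i);
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
        }
    }
}
```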
We know that there are a few main memory domains: Young, Tenured (Old Gen) and PermGen.
The Young domain is divided into Eden and Survivor spaces (of which there are two).
OldGen is for surviving objects.
MaxTenuringThreshold keeps objects from being finally copied to the OldGen space too early. It's pretty clear and understandable.
But how does it work? How does the garbage collector deal with objects that are still surviving up to MaxTenuringThreshold, and where are they located?
Are objects copied back and forth between the Survivor spaces during garbage collection, or does it happen some other way?
Each object in the Java heap has a header which is used by the garbage collection (GC) algorithm. The young space collector (which is responsible for object promotion) uses a few bits from this header to track the number of collections the object has survived (32-bit JVMs use 4 bits for this, 64-bit probably a few more).
During a young space collection, every live object is copied. The object may be copied to one of the survivor spaces (the one which is empty before the young GC) or to the old space. For each object being copied, the GC algorithm increases its age (the number of collections survived), and if the age is above the current tenuring threshold it is copied (promoted) to the old space. The object can also be copied to the old space directly if the survivor space gets full (overflow).
The journey of an object has the following pattern:
allocated in eden
copied from eden to a survivor space due to a young GC
copied from one survivor space to the other survivor space due to a young GC (this can happen a few times)
promoted from a survivor space (or possibly eden) to the old space due to a young GC (or a full GC)
The actual tenuring threshold is dynamically adjusted by the JVM, but MaxTenuringThreshold sets an upper limit on it.
If you set MaxTenuringThreshold=0, all objects will be promoted immediately.
I have a few articles about Java garbage collection where you can find more details.
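Here is a toy sketch of that journey and the promotion rule, written as plain Java just to make the aging logic concrete. None of these names are JVM internals; in the real JVM the age lives in the object header and the copying is done by the collector itself.

```java
// Toy model of the aging/promotion decision made during a young collection.
public class TenuringSketch {
    static final int MAX_TENURING_THRESHOLD = 15; // upper limit, like -XX:MaxTenuringThreshold

    static class HeapObject {
        int age = 0;          // "collections survived" counter kept in the object header
        boolean inOldGen = false;
    }

    // Called for every live object during a young collection.
    static void copyDuringYoungGc(HeapObject obj, int currentThreshold, boolean survivorFull) {
        obj.age++;
        if (obj.age >= currentThreshold || survivorFull) {
            obj.inOldGen = true;   // promoted (tenured) to the old generation
        }
        // otherwise it is copied to the empty survivor space and stays young
    }

    public static void main(String[] args) {
        HeapObject obj = new HeapObject();
        int threshold = 7;         // the JVM adjusts this dynamically, capped at the max
        for (int gc = 0; gc < 10 && !obj.inOldGen; gc++) {
            copyDuringYoungGc(obj, Math.min(threshold, MAX_TENURING_THRESHOLD), false);
        }
        System.out.println("promoted after surviving " + obj.age + " young collections");
    }
}
```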
(Disclaimer: This covers HotSpot VM only)
As Alexey states, the tenuring threshold that is actually used is determined dynamically by the JVM, so there is very little value in setting it. For most applications the default value of 15 will be high enough, as usually way more objects survive the collection.
When many objects survive the collection, the survivor spaces overflow directly into the old generation. This is called premature promotion and is an indicator of a problem. However, it can seldom be solved by tuning MaxTenuringThreshold.
In those cases, SurvivorRatio can sometimes be used to increase the size of the survivor spaces, allowing the tenuring to actually work.
However, most often enlarging the young generation is the only good choice (from a configuration perspective).
From a coding perspective, you should avoid excess object allocation to let tenuring work as designed.
To answer exactly what you asked:
When an object reaches its JVM-determined tenuring threshold, it is copied to the old generation. Before that, it is copied to the empty survivor space on each young collection. Objects that have survived for a while but are de-referenced before reaching the threshold are cleaned from the survivor space very efficiently.
As I understand, a generational GC divides objects into generations.
And on each cycle, GC runs on only one generation.
Why? Why is garbage collecting only one generation enough?
P.S.: I understood all this from here.
If you read the link I provided in the earlier question you had about generational GC, you will understand why it does so; the cycle occurs when the white set memory fills up.
To optimize for this scenario, memory is managed in generations, or memory pools holding objects of different ages. Garbage collection occurs in each generation when the generation fills up. Objects are allocated in a generation for younger objects or the young generation, and because of infant mortality most objects die there. When the young generation fills up it causes a minor collection. Minor collections can be optimized assuming a high infant mortality rate. The costs of such collections are, to the first order, proportional to the number of live objects being collected. A young generation full of dead objects is collected very quickly. Some surviving objects are moved to a tenured generation. When the tenured generation needs to be collected there is a major collection that is often much slower because it involves all live objects.
Basically, objects are divided into generations (based on the hypothesis about the object) and placed into the memory pool for a particular generation. When that memory pool fills up, the GC cycle begins; objects that still have references are moved to another memory pool, and fresh objects are then added.
It's not always enough -- it's just that it's usually enough, so it saves time by not examining objects that are likely to stay alive anyway.
Every object has a generation, saying how many garbage collections it has survived. If an object has survived a few garbage collections, chances are that it will also survive the next one.
MSDN has a great explanation:
A generational garbage collector makes the following assumptions:
The newer an object is, the shorter its lifetime will be.
The older an object is, the longer its lifetime will be.
Newer objects tend to have strong relationships to each other and are frequently accessed around the same time.
Compacting a portion of the heap is faster than compacting the whole heap.
Because of this, you could save some time by only trying to collect younger objects, and collecting the older generations only if that doesn't free up enough memory.
The answer is there really.
It has been empirically observed that in many programs, the most recently created objects are also those most likely to become unreachable quickly (known as infant mortality or the generational hypothesis).
And
Generational garbage collection is a heuristic approach, and some unreachable objects may not be reclaimed on each cycle. It may therefore occasionally be necessary to perform a full mark and sweep or copying garbage collection to reclaim all available space.
Basically, generational collection gives you better performance over a full garbage collection at the cost of completeness. That's why a mixture of the two is used in practice.