Why Major Garbage collection is slower than Minor?

I have gone through this link but am still confused about what actually happens during minor and major GC.
Say I have 100 objects in the young generation, of which 85 are unreachable. When a minor GC runs, it will reclaim the memory of those 85 objects and move the 15 live objects to the old (tenured) generation.
Now 15 objects exist in the old generation, of which 3 become unreachable. Say a major GC takes place. It will leave the 12 live objects as they are and reclaim the memory of the 3 unreachable objects. Major GC is said to be slower than minor GC. My question is why? Is it because a major GC generally operates on a greater number of objects than a minor GC, since minor GCs occur more frequently?
As per my understanding, major GC should be faster, since it has less work to do (i.e. reclaiming memory from unreachable objects) than minor GC, given the high mortality rate in the young generation.

1) A minor GC will first move the 15 live objects to one of the survivor spaces, e.g. SS1. The next minor GC will move those that are still alive to SS2, the one after that will move the survivors back to SS1, and so forth. Only objects that survive several (e.g. 8) such relocations (minor GCs) finally go to the old generation.
2) A major GC typically happens when the JVM cannot allocate an object in the old generation because there is no free space in it. To reclaim memory from dead objects, the GC goes over all objects in the old generation. Since the old generation is usually several times larger than the young generation, it may hold several times more objects, so GC processing may take several times longer.
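The survivor-space aging described in point 1 can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `TenuringSketch`, its fields, and the way the threshold is applied are hypothetical names and rules invented for illustration, and they do not reflect how HotSpot is implemented. The threshold of 8 is simply the example figure used in the answer above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of survivor-space aging. Hypothetical; not HotSpot's implementation. */
class TenuringSketch {
    /** A simulated object that only records reachability and its age in minor GCs. */
    static final class Obj {
        final String name;
        boolean reachable;
        int age = 0;
        Obj(String name, boolean reachable) {
            this.name = name;
            this.reachable = reachable;
        }
    }

    final int tenuringThreshold;               // assumed promotion age
    List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>(); // the currently occupied survivor space
    final List<Obj> old = new ArrayList<>();

    TenuringSketch(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    void allocate(Obj o) {
        eden.add(o);
    }

    /** Copies live objects out of eden and the occupied survivor space; returns how many were copied. */
    int minorGc() {
        List<Obj> toSurvivor = new ArrayList<>();
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        int copied = 0;
        for (Obj o : candidates) {
            if (!o.reachable) {
                continue; // dead objects are simply left behind
            }
            o.age++;
            copied++;
            if (o.age >= tenuringThreshold) {
                old.add(o);        // promoted to the old generation
            } else {
                toSurvivor.add(o); // copied to the empty survivor space
            }
        }
        eden = new ArrayList<>();  // eden is wiped
        fromSurvivor = toSurvivor; // the two survivor spaces swap roles
        return copied;
    }

    public static void main(String[] args) {
        TenuringSketch heap = new TenuringSketch(8);
        for (int i = 0; i < 100; i++) {
            heap.allocate(new Obj("obj" + i, i < 15)); // 15 live, 85 dead
        }
        for (int gc = 1; gc <= 8; gc++) {
            int copied = heap.minorGc();
            System.out.println("minor GC " + gc + ": copied=" + copied + ", old=" + heap.old.size());
        }
    }
}
```

In this model the 15 live objects from the question stay in the survivor spaces for the first seven collections and reach the old generation only on the eighth, while the 85 dead objects cost nothing to reclaim.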

My question is why? Is it because a major GC generally operates on a greater number of objects than a minor GC, since minor GCs occur more frequently?
You pretty much hit the nail on the head. From the Oracle article, emphasis mine:
Often a major collection is much slower because it involves all live objects.
So not only does a major GC analyze those 15 objects in the old generation, it also goes through the young generation (again) and permgen and GCs those areas of the heap. Minor GC only analyzes the young generation, so there generally wouldn't be as many objects to look at.
As per my understanding, major GC should be faster, since it has less work to do (i.e. reclaiming memory from unreachable objects) than minor GC, given the high mortality rate in the young generation.
I think I understand why you think that. I could imagine that major GC could be run very soon after a minor GC, when objects are promoted to an almost-full old generation. Thus, the young generation would (presumably) not contain too many objects to collect.
However, if I'm remembering things correctly, the old generation is usually larger than the young generation, so not only does the GC have to analyze more space, it also has to go over permgen again, as well as the remaining objects in the young generation (again). So that would probably be why major GC is slower -- simply because there's more stuff to do. You might be able to make major GC faster than minor GC by changing the sizes of the generation spaces such that the young generation is larger than both the old generation and permgen, but I don't think that would be a common setting to use...
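One way to check the "major is slower than minor" claim on your own JVM is to read the per-collector counters exposed through the standard `java.lang.management` API. The sketch below is a minimal example, not a benchmark: the class name `GcStats` and the allocation loop in `main` are illustrative assumptions, and the collector names printed depend on which GC your JVM is running (typically one bean covers young collections and another covers old or full collections).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Reads collection counts and accumulated times per collector. Class name is hypothetical. */
class GcStats {
    /**
     * Maps collector name to {collection count, accumulated time in ms}.
     * A value of -1 means the collector does not report that figure.
     */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            stats.put(gc.getName(), new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return stats;
    }

    public static void main(String[] args) {
        // Produce short-lived garbage so that some young collections are likely to run.
        for (int i = 0; i < 2_000_000; i++) {
            byte[] junk = new byte[128];
        }
        snapshot().forEach((name, s) ->
                System.out.printf("%s: count=%d, timeMs=%d%n", name, s[0], s[1]));
    }
}
```

Dividing the accumulated time by the count gives a rough average pause per collector, which can then be compared between the young and old collectors.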

Related

Understanding GC: Allocation failure and filled OldGen with temporary String objects

I have followed up with a couple of good questions and their answers but I still have a doubt.
This is what I understand and would like to see if the understanding is correct.
GC (Allocation Failure) kicks in whenever new memory is to be allocated on YoungGen.
Also, depending on their size, some objects might have to be pushed to OldGen, and significantly larger objects could be allocated directly in OldGen.
Application Behavior: The reason for 'Allocation Failure' was the creation of huge strings. On debugging further with JFR and HeapDump, everything points to a lot of char[] and String objects which are created in our system on a temporary basis (i.e. YoungGen candidates). Some of these strings are indeed huge (~25KB each). However, there was enough space available in the YoungGen as per the error message, and the heap was not even close to the maximum memory possible.
During the same time, OldGen was increasing and was not getting cleaned even after full GC. There could be another memory leak but there is nothing that points to that. So, I don't understand why OldGen remains at the same level even after the full GC.
Apart from the validation of my understanding, the question is: Can the creation of a lot of temporary String/char[] objects (via strA + strB, new String()/StringBuilder().toString(), String.split(), String.substring(), Stream->buffer conversion etc.) cause GC to run very frequently even when the application has a lot of memory available in the YoungGen and heap in general? If yes, when and what are the alternatives?
Thanks!

I would say that the answer is a conditional yes.
Remember that young gen is split into 3 parts (eden, S0 and S1), which means that you do not have as much memory in young gen as you might think. If you overflow one of the survivor spaces, the remainder will be pushed to old gen (premature promotion), filling up old gen. Note also that promotion from young gen to old gen is based on the number of GC cycles an object has survived. If you have frequent young gen GCs in which objects that are supposed to be short-lived are moved to old gen (because you have not finished with the temp objects yet), then you will fill up old gen. Note also that just because you do a full GC, there is no guarantee that you will actually get any memory back.
So, use a tool like Censum to analyse your GC logs, and look especially for premature promotion.
It might be that you will have to resize your young gen/old gen ratio.
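Before resizing anything, it can help to see how the heap is actually divided on your JVM. The following sketch lists the heap memory pools through the standard `MemoryPoolMXBean` API; the class name `HeapPools` and the output format are assumptions made for this example, and the pool names (eden, survivor, old or tenured) vary with the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Lists heap pools with their current usage. Class name and format are hypothetical. */
class HeapPools {
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + "KB";
            lines.add(String.format("%s: used=%dKB committed=%dKB max=%s",
                    pool.getName(), usage.getUsed() / 1024, usage.getCommitted() / 1024, max));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

If the survivor pools turn out to be small relative to the volume of temporary objects that are still in use at each minor GC, that is consistent with the premature promotion described above, though the GC logs remain the more reliable evidence.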

some questions on Garbage Collection internals?

I am trying to understand how the garbage collection process works, and came across a good link.
Most articles say that during a minor GC objects are moved from eden to a survivor space, that during a major GC objects are moved from a survivor space to the tenured space, and that the memory of all unreachable objects is reclaimed. I have three questions based on the above statements (asked together because they are related):
1) Minor vs major GC: what is the difference between the two, such that one is called major and the other minor? As per my understanding, a minor collection happens in parallel with the running application, while a major collection pauses the application for its duration.
2) What actually happens when an object is moved from eden to a survivor space? Is the object's memory location changed internally?
3) Why are there three spaces (i.e. eden, survivor and tenured) instead of just one? I know there must be a reason behind it, but I am missing it. My point is that when the GC runs, it could collect the unreachable objects and leave the reachable ones where they are, so just one space seems sufficient. What advantage do three different spaces provide over one?

1) Minor GC occurs on the young generation, major GC occurs on the old generation. Whether it runs in parallel with the application depends on the kind of GC; only CMS and G1 can work concurrently.
2) Yes, moving an object during GC changes its physical location, so all pointers to this object are updated.
3) This is to avoid frequent and long application freezes during GC. If there were one big heap, the application would often freeze for long periods of time. The JVM creates objects in a small young generation, where GCs occur frequently but quickly. Most objects die quickly and never reach the old generation, so major GC happens rarely, or may never happen at all.
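Point 2 (an object's location changes and every pointer to it is updated) can be sketched with a toy heap. Everything here is an assumption made for illustration: the class `RelocationSketch` is hypothetical, "addresses" are array indices, each simulated object has exactly one reference field (`-1` meaning null), and the forwarding table is a simplification of what a real copying collector does.

```java
import java.util.Arrays;

/** Toy relocation of live objects with pointer fix-up. Hypothetical; not a real collector. */
class RelocationSketch {
    /**
     * fromSpace[i] holds the address that object i references, or -1 for null.
     * Live objects are copied into toSpace starting at address 0, and all
     * references to them are rewritten. Returns the roots with their new addresses.
     */
    static int[] copyLive(int[] fromSpace, int[] roots, int[] toSpace) {
        int[] forwarding = new int[fromSpace.length]; // old address -> new address
        Arrays.fill(forwarding, -1);
        int free = 0; // next free address in toSpace

        // Pass 1: copy every object reachable from a root.
        for (int root : roots) {
            int addr = root;
            while (addr != -1 && forwarding[addr] == -1) {
                forwarding[addr] = free;
                toSpace[free] = fromSpace[addr]; // field still holds the OLD address
                free++;
                addr = fromSpace[addr];          // follow the reference
            }
        }

        // Pass 2: rewrite the copied fields to point at the new addresses.
        for (int i = 0; i < free; i++) {
            if (toSpace[i] != -1) {
                toSpace[i] = forwarding[toSpace[i]];
            }
        }

        // The roots themselves must be updated as well.
        int[] newRoots = new int[roots.length];
        for (int i = 0; i < roots.length; i++) {
            newRoots[i] = roots[i] == -1 ? -1 : forwarding[roots[i]];
        }
        return newRoots;
    }

    public static void main(String[] args) {
        // Object 3 references object 4; objects 0, 1 and 2 are unreachable.
        int[] from = {3, -1, -1, 4, -1};
        int[] to = new int[from.length];
        int[] newRoots = copyLive(from, new int[] {3}, to);
        System.out.println("root moved from 3 to " + newRoots[0]);
        System.out.println("its reference now points to " + to[newRoots[0]]);
    }
}
```

In this run the root object moves from address 3 to address 0, and its reference, which used to hold address 4, is rewritten to the referent's new address 1. The unreachable objects are never visited.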

Source for my answers is this Oracle article on GC basics, so these answers would apply for HotSpot. No clue as to other VMs, although I would guess that the general idea might remain the same if the same implementation techniques were used in other VMs.
Minor vs major GC: what is the difference between the two, such that one is called major and the other minor?
Minor GC is GC of the young generation, where new objects are allocated. Major GC is GC of all live objects, including the permanent generation (which is a bit interesting to me, but that's what the article says). Also, it appears that both major and minor GC are stop-the-world events.
What actually happens when an object is moved from eden to a survivor space? Is the object's memory location changed internally?
I can't seem to find a reference at the moment, but I would assume so. Allowing for memory location to be changed lets compaction be performed, which improves memory allocation performance and ease. Allowing each space to be compacted separately makes sense, so I would guess that moving an object from one part of the heap to another would involve physically moving the object from one memory location to another.
Why are there three spaces (i.e. eden, survivor and tenured) instead of just one?
Short answer: efficiency. If you have only one space, you'd have to check all objects when you GC, which becomes inefficient if you have lots of long-lived objects (and you're almost guaranteed to have a decent number in a long-running application), as those long-lived objects are likely to still be reachable from one GC to the next. Splitting the heap allows for GC to be optimized, as most of the GC efforts can be concentrated where object life can be assumed to be short (i.e. young generation), with longer-living objects being GC'd less frequently.

How generation help garbage collector?

I've read a few articles about how garbage collection works and still don't understand how using generations helps. As I understand it, the main idea is that we start collection from the youngest generation and move to older generations. But why did the authors of this idea decide that starting from the youngest generation is the most efficient way?

The older the generation, the more times its objects have been used, and the more likely they are to be needed again.
Removing a recently created object makes no sense; maybe it's a temporary (scope: local) object.

The authors start with the youngest generation first simply because that's what gets filled up first after your application starts; in reality, however, which generation is collected and when is non-deterministic as your application runs.
The important points with generational GC are:
The young generation uses a copying collector, which copies objects from eden and the current survivor space into a space it considers empty (the unused survivor space); this is fast, so the GC pause is minimal.
Add to this the fact that most objects die young, so the pause required to copy the small number of surviving objects from eden and the current survivor space is short: only objects with live references are copied, after which eden and the previous survivor space can be wiped.
After being copied several times, objects are copied to the tenured (old) generation. Eventually the tenured generation will fill up; this time, however, there is no clean space to copy the objects to, so the garbage collector has to sweep and compact within the generation, which is slow (compared to the copy performed in eden and the survivor spaces) and means a longer pause.
The good news, based on the "most objects die young" heuristic, is that major GCs happen much less frequently than minor ones, keeping GC pauses to a minimum over the lifetime of an application.
There is also the benefit that all new objects are allocated at the top of the heap, meaning minimal instructions are required to do so, with defragmentation occurring naturally as part of the copy process.
Both these pages, Oracle Garbage Collection Tuning and Useful JVM Flags – Part 5 (Young Generation Garbage Collection), describe this.
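The last point above, cheap allocation at the top of a region that is wiped wholesale after a copy, can be sketched as a bump-pointer allocator. This is a simplified illustration rather than the JVM's actual allocator: the class `BumpRegion` and its methods are hypothetical, and sizes are plain integers standing in for bytes.

```java
/** Toy bump-pointer allocator over a fixed-size region. Hypothetical names throughout. */
class BumpRegion {
    private final int capacity;
    private int top = 0; // offset of the next free byte

    BumpRegion(int capacity) {
        this.capacity = capacity;
    }

    /**
     * Allocates by advancing a single pointer. Returns the offset of the new
     * object, or -1 when the region is full (the point at which a real JVM
     * would run a minor GC).
     */
    int allocate(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        if (size > capacity - top) {
            return -1;
        }
        int offset = top;
        top += size;
        return offset;
    }

    /** Once survivors have been copied elsewhere, the whole region is reclaimed in one step. */
    void reset() {
        top = 0;
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        BumpRegion eden = new BumpRegion(32);
        System.out.println(eden.allocate(16)); // first object at offset 0
        System.out.println(eden.allocate(8));  // next object directly after it
        System.out.println(eden.allocate(16)); // does not fit
        eden.reset();                          // "wipe" the region
        System.out.println(eden.allocate(32)); // the full region is available again
    }
}
```

Allocation is a bounds check plus an addition, and reclaiming the region does not depend on how many dead objects it contained, which is why the cost of a young collection is dominated by the number of survivors that have to be copied.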

Read this one.
Using different generations makes the allocation of objects easy and fast, as MOST allocations are done in a single region of the heap: eden. Based on the observation from the weak generational hypothesis that most objects die young, collections in the young generation find more garbage and so reclaim more memory, and the young generation is relatively small compared to the whole heap, which means the time taken to scan its objects is also less. That's why young generation GCs are fast.
For more details on GC and generations, you can refer to this

What is the normal behavior of Java GC and Java Heap Space usage?

I am unsure whether there is a generic answer for this, but I was wondering what the normal Java GC pattern and java heap space usage looks like. I am testing my Java 1.6 application using JMeter. I am collecting JMX GC logs and plotting them with JMeter JMX GC and Memory plugin extension. The GC pattern looks quite stable with most GC operations being 30-40ms, occasional 90ms. The memory consumption goes in a saw-tooth pattern. The JHS usage grows constantly upwards e.g. to 3GB and every 40 minutes the memory usage does a free-fall drop down to around 1GB. The max-min delta however grows, so the sawtooth height constantly grows. Does it do a full GC every 40mins?

Most of your descriptions, in general, match how the GC works. However, none of your specific observations, especially the numbers, hold in the general case.
To start with, each JVM has one or several GC implementations, and you can choose which one to use. Take the most widely used one, i.e. the Sun JVM (as I like to call it), and the common server GC pattern as an example.
First, the memory is divided into 4 regions.
A young generation, which holds all of the recently created objects. When this generation is full, the GC does a stop-the-world collection: it stops your program, executes a black-gray-white (tri-color marking) algorithm to find the obsolete objects, and removes them. So this is your 30-40 ms.
If an object survives a certain number of GC rounds in the young gen, it is moved into a swap generation (what HotSpot calls a survivor space). The swap generation holds the objects for another number of GCs, then moves them to the old generation. There are 2 swap generations, which do a double-buffering kind of thing to help the young gen work faster. If the young gen dumps objects into a swap gen and finds it mostly full, a GC happens on the swap gen and potentially moves the surviving objects to the old gen. This most likely accounts for your 90 ms, though I am not 100% sure how the swap gen works. Someone correct me if I am wrong.
All objects that survive the swap gen are moved to the old generation. The old generation is only GC-ed when it is mostly full. In your case, every 40 min.
There is another region, the "permanent gen", which is used to hold the bytecode and resources loaded from your jars.
The sizes of all the areas can be adjusted with JVM parameters.
You can try VisualVM, which will give you a dynamic picture of how it works.
P.S. Not all JVMs / GCs work the same way. If you use the G1 collector, or JRockit, things might happen slightly differently, but the general idea holds.

Java GC works in terms of generations of objects. There are young, tenured and permanent generations. It seems that in your case the 30-40 ms GC operations process only the young generation (and transfer surviving objects into the tenured generation), and every 40 minutes the GC performs a full collection (which causes a stop-the-world pause). Note: this is triggered not by time, but by the percentage of memory used.
There are several JVM options that let you choose the generation sizes, the type of GC (there are several GC algorithms; in Java 1.6 the Serial GC is the default on client-class machines, and another can be selected with, for example, -XX:+UseConcMarkSweepGC), and the parameters of GC operation.
You'd do well to find good articles about generations and the different types of GC (the algorithms really are different; some of them allow you to avoid most stop-the-world pauses!)

Yes, most likely. Instead of guessing, you can use jstat to monitor your GCs.
I suggest you use a memory profiler to check whether there is anything simple you can do to reduce the amount of garbage you are producing.
BTW, if you increase the size of the young generation, you can reduce how much garbage makes it into the tenured space, reducing the frequency of full collections. You may find you get less than one full collection per day if you tune it enough.
For a more extreme case, I have tuned a trading system to less than one collection per day (minor or major).
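Besides external tools such as jstat, the saw-tooth pattern from the question can also be observed from inside the application by sampling heap usage. The sketch below uses the standard `MemoryMXBean`; the class `HeapSampler`, its method name, and the sampling parameters are assumptions made for this example, and the readings are only as precise as the JVM's own accounting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.Arrays;

/** Samples used heap at a fixed interval. Class and method names are hypothetical. */
class HeapSampler {
    /** Returns the used heap in bytes for each sample; stops early if the thread is interrupted. */
    static long[] sampleUsedBytes(int samples, long intervalMillis) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long[] used = new long[samples];
        for (int i = 0; i < samples; i++) {
            used[i] = memory.getHeapMemoryUsage().getUsed();
            if (i < samples - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    return Arrays.copyOf(used, i + 1);  // return what was collected so far
                }
            }
        }
        return used;
    }

    public static void main(String[] args) {
        for (long bytes : sampleUsedBytes(5, 200)) {
            System.out.println((bytes / 1024) + " KB used");
        }
    }
}
```

A sudden drop between two consecutive samples suggests that a collection ran in that interval; comparing the size of the drop with the per-collector counters helps tell a minor collection from a full one.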

Generational Garbage Collection

As I understand, a generational GC divides objects into generations.
And on each cycle, GC runs on only one generation.
Why? Why is garbage collecting only one generation enough?
P.S: I understand all these from here .

If you read the link I provided in the earlier question you had about generational GC, you will understand why it does so; the cycle starts when the white set memory is filled up.
To optimize for this scenario, memory is managed in generations, or memory pools holding objects of different ages. Garbage collection occurs in each generation when the generation fills up. Objects are allocated in a generation for younger objects or the young generation, and because of infant mortality most objects die there. When the young generation fills up it causes a minor collection. Minor collections can be optimized assuming a high infant mortality rate. The costs of such collections are, to the first order, proportional to the number of live objects being collected. A young generation full of dead objects is collected very quickly. Some surviving objects are moved to a tenured generation. When the tenured generation needs to be collected there is a major collection that is often much slower because it involves all live objects.
Basically, objects are divided into generations (based on the hypothesis about object lifetimes) and placed into a memory heap for their particular generation. When that memory heap fills up, a GC cycle begins, and the objects that are still referenced are moved to another memory heap and fresh objects are added.

It's not always enough -- it's just that it's usually enough, so it saves time by not examining objects that are likely to stay alive anyway.
Every object has a generation, saying how many garbage collections it has survived. If an object has survived a few garbage collections, chances are that it will also survive the next one.
MSDN has a great explanation:
A generational garbage collector makes the following assumptions:
The newer an object is, the shorter its lifetime will be.
The older an object is, the longer its lifetime will be.
Newer objects tend to have strong relationships to each other and are frequently accessed around the same time.
Compacting a portion of the heap is faster than compacting the whole heap.
Because of this, you could save some time by only trying to collect younger objects, and collecting the older generations only if that doesn't free up enough memory.

The answer is there really.
It has been empirically observed that in many programs, the most recently created objects are also those most likely to become unreachable quickly (known as infant mortality or the generational hypothesis).
And
Generational garbage collection is a heuristic approach, and some unreachable objects may not be reclaimed on each cycle. It may therefore occasionally be necessary to perform a full mark and sweep or copying garbage collection to reclaim all available space.
Basically, generational collection gives you better performance over a full garbage collection at the cost of completeness. That's why a mixture of the two is used in practice.
