I have been going through garbage collection in Java (JDK 6 HotSpot JVM). I have a few questions which I hope the community will help me resolve.
What I understand:
1) Heap is divided into:
a) Young generation (Eden and Survivor): new objects and arrays are created in the young generation. Minor garbage collection operates in the young generation. Objects that are still alive are moved from the eden space to the survivor space.
b) Old generation/Tenured generation: a major collection moves the still-alive objects from the young generation to the old generation.
2) Non-heap is divided into:
a) Code cache
b) Perm generation.
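For context, the sizes of these regions are tunable with standard JDK 6 HotSpot flags; a hypothetical invocation (all sizes made up, app.jar is a placeholder) might be:

    java -Xms512m -Xmx512m -Xmn128m -XX:PermSize=64m -XX:MaxPermSize=128m -jar app.jar

Here -Xms/-Xmx bound the whole heap, -Xmn sizes the young generation, and -XX:PermSize/-XX:MaxPermSize size the permanent generation.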
What I want to know:
1) What if the survivor space gets full? How will minor garbage collection work then?
2) When and how is the perm generation garbage collected?
3) Also, what happens to the stack? Where is it stored or residing, and how is its size controlled?
When the survivor space is full, objects are moved into the old generation. Technically, though, most of the time when an object gets moved from the survivor space into the old generation, it's not because the survivor space is full, but because the object has survived a certain number of minor collections, usually 10–15.
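If you want to watch this happening, HotSpot can print the age distribution of the objects in the survivor space at each minor collection (MyApp is a placeholder):

    java -XX:+PrintTenuringDistribution -XX:+PrintGCDetails MyApp

Each minor GC then reports how many bytes sit at each object age, i.e. how close the objects are to promotion.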
Very rarely. The perm generation mostly holds the binary code for Java classes, so space can only be freed up if a bunch of classes are unloaded from memory. Most programs use the same set of classes throughout the life of the program, so collecting the permanent generation is generally a waste of time. Basically, Java will only do a collection here if it's about to run out of memory.
The stack is something outside the heap, and its size is controlled by the fact that objects are only stored on the stack if they are guaranteed to have a limited lifetime. These are mostly local variables. Suppose you have a local StringBuilder variable that you use to build up the return value of a method. You never pass it outside your own method, and you call stringBuilder.toString() to create a new object at the end of the method. Since Java can tell that the StringBuilder object won't outlive the running of the method, it can put it on the stack and deallocate it immediately when the method returns, instead of handing it off to the garbage collector.
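As a concrete sketch of the pattern just described (the method and names are mine; whether the JIT actually avoids the heap allocation depends on escape analysis kicking in):

    // The StringBuilder never escapes this method, so escape analysis
    // may let the JIT stack-allocate or scalar-replace it.
    String buildGreeting(String name) {
        StringBuilder sb = new StringBuilder();
        sb.append("Hello, ").append(name);
        return sb.toString(); // only the resulting String escapes
    }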
The stack size is fixed at the point the stack is created. If you ever try to use more space than is available on the stack, you will get a StackOverflowError.
The stack is a part of memory where local (automatic) variables are created and method arguments are passed. When a thread starts, it gets a default stack size, which is fixed for that thread. On today's operating systems, the default stack size is generally 1 MB, which is enough for most threads. When the stack limit is exceeded under abnormal conditions, this is known as a stack overflow.
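To make this concrete, the per-thread stack size can be set with the real -Xss flag (the 512k value here is arbitrary), and unbounded recursion will exhaust it:

    // Run with: java -Xss512k Recurse
    public class Recurse {
        static int depth(int n) {
            return depth(n + 1); // no base case: each frame consumes stack space
        }
        public static void main(String[] args) {
            depth(0); // throws java.lang.StackOverflowError
        }
    }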
I have followed up with a couple of good questions and their answers, but I still have a doubt.
This is what I understand, and I would like to check whether my understanding is correct.
GC (Allocation Failure) kicks in whenever new memory is to be allocated in the YoungGen.
Also, depending on the size of the object, some objects might have to be pushed to the OldGen, and significantly larger objects could be moved directly to the OldGen.
Application behavior: the reason for 'Allocation Failure' was the creation of huge strings. On debugging further with JFR and a heap dump, everything points to a lot of char[] and String objects which are created in our system on a temporary basis (i.e. YoungGen candidates). Some of these strings are indeed huge (~25KB each). However, there was enough space available in the YoungGen as per the error message, and the heap is not even close to the maximum memory possible.
During the same time, the OldGen was increasing and was not getting cleaned even after a full GC. There could be another memory leak, but there is nothing that points to that. So I don't understand why the OldGen remains at the same level even after a full GC.
Apart from validating my understanding, the question is: can the creation of a lot of temporary String/char[] objects (via strA + strB, new String()/StringBuilder().toString(), String.split(), String.substring(), Stream-to-buffer conversion, etc.) cause GC to run very frequently even when the application has a lot of memory available in the YoungGen and the heap in general? If yes, when, and what are the alternatives?
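For concreteness, these are the kinds of temporary allocations I mean (strA, strB and line are placeholders):

    String joined = strA + strB;              // new char[] plus new String
    String built = new StringBuilder()
            .append(strA).append(strB)
            .toString();                      // another short-lived pair
    String[] parts = line.split(",");         // one String per token
    String head = line.substring(0, 10);      // yet another temporary String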
Thanks!
I would say that the answer is a conditional yes.
Remember that the young gen is split into 3 parts: eden, S0 and S1, which means that you do not have as much memory in the young gen as you might think. If you overflow one of the survivor spaces, the remainder will be pushed to the old gen (premature promotion), filling up the old gen. Note also that promotion from the young gen to the old gen is based on the number of GC cycles survived. If you have frequent young gen GCs where objects that are supposed to be short-lived get moved to the old gen (because you have not finished with the temp objects), then you will fill up the old gen. Note too that just because you do a full GC, there is no guarantee that you will actually get any memory back.
So, use a tool like Censum to analyse your GC logs, and look especially for premature promotion.
It might be that you will have to resize your young gen/old gen ratio, for example as shown below.
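For example (all sizes hypothetical), the ratios can be adjusted with real HotSpot flags:

    java -Xmx4g -XX:NewRatio=2 -XX:SurvivorRatio=6 -XX:+PrintGCDetails -jar app.jar

-XX:NewRatio=2 makes the old generation twice the size of the young generation, and -XX:SurvivorRatio=6 makes each survivor space one sixth the size of eden; a lower survivor ratio means larger survivor spaces, which helps against the overflow described above.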
An object can be promoted from the Young Generation to the Old Generation when it reaches the tenuring threshold, or when the "to" survivor space is full while it is being transferred.
Therefore, my question is: in order to improve performance, if I know my object will be frequently used (referenced), is it possible to automatically/manually allocate an object in the Old/Permanent Generation, so that not allocating it in Eden would delay the need for a minor garbage collection, thus delaying the "stop the world" event and improving the application's performance?
Generally:
No - not for a specific single object.
In more detail:
Roughly, an allocation looks like the following (a simplified sketch follows this list):
Use the thread-local allocation buffer (TLAB), if tlab_top + size <= tlab_end. This is the fastest path: allocation is just an increment of the tlab_top pointer.
If the TLAB is almost full, create a new TLAB in the Eden space and retry in the fresh TLAB.
If the TLAB's remaining space is not enough for this object but is still too big to discard, try to allocate the object directly in the Eden space. Allocation in the Eden space needs to be done using an atomic operation, since Eden is shared between all threads.
If allocation in the Eden space fails (eden_top + size > eden_end), typically a minor collection occurs.
If there is not enough space in the Eden space even after a young GC, an attempt is made to allocate directly in the old generation.
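Here is that sketch: a heavily simplified Java model of the decision order above. It is illustrative only; HotSpot implements this in C++/assembly, and every name and threshold below is made up.

    // Illustrative model of the allocation path above; not real JVM code.
    class AllocationSketch {
        long tlabTop, tlabEnd;   // current thread-local allocation buffer
        long edenTop, edenEnd;   // shared Eden space (the real JVM uses CAS here)
        static final long TLAB_WASTE_LIMIT = 64; // assumed discard threshold

        long allocate(long size) {
            if (tlabTop + size <= tlabEnd) {   // 1. fast path: bump the pointer
                long obj = tlabTop;
                tlabTop += size;
                return obj;
            }
            if (tlabEnd - tlabTop < TLAB_WASTE_LIMIT) {
                newTlabFromEden();             // 2. retire the near-full TLAB
                return allocate(size);         //    and retry in a fresh one
            }
            if (edenTop + size <= edenEnd) {   // 3. TLAB too full for this object,
                long obj = edenTop;            //    but worth keeping: allocate in
                edenTop += size;               //    Eden (atomically, in reality)
                return obj;
            }
            runMinorGc();                      // 4. Eden exhausted: minor GC
            if (edenTop + size <= edenEnd) {
                return allocate(size);
            }
            return allocateInOldGen(size);     // 5. last resort: old generation
        }

        void newTlabFromEden()        { /* omitted */ }
        void runMinorGc()             { /* omitted */ }
        long allocateInOldGen(long s) { return -1; /* omitted */ }
    }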
"Hack":
The following parameter:
-XX:PretenureSizeThreshold=size
This parameter defaults to 0, so it is deactivated. If set, it defines a size threshold (in bytes) above which objects are allocated directly into the old generation.
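For example (the threshold value is chosen arbitrarily), to pretenure objects larger than 1 MB:

    java -XX:PretenureSizeThreshold=1048576 -jar app.jar

As far as I know, this threshold is honoured by the serial and ParNew young collectors but ignored by Parallel Scavenge.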
BUT:
You should use this with care: setting this parameter wrongly may change your GC's behaviour drastically. And: only a few percent of objects survive their first GC, so most objects don't have to be copied during the young GC.
Therefore, the young GC is very fast and you should not really need to "optimize" it by forcing object allocation to old generation.
Java parameters:
If you want to get an overview of the possible Java parameters, run the following:
java -XX:+PrintVMOptions -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal -version
This will print all flags you can set.
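Since the list is very long, piping it through a filter helps when you are after a single flag (Unix-like shell assumed):

    java -XX:+PrintFlagsFinal -version | grep TenuringThreshold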
Different garbage collectors:
Also keep in mind that there are different garbage collectors out there, and that Java 9 is planned to use the Garbage-First (G1) GC as the default collector, which again may handle big objects differently (by allocating them into humongous regions).
Additional source:
Stack Overflow question: Size of Huge Objects directly allocated to Old Generation
You cannot create an object directly in the old generation; it has to go through the eden space and survivor spaces (the young generation) before reaching the old generation. However, if you know that your objects are long-lived (for example, if you have implemented something like a cache), you can set the following JVM parameters:
-XX:InitialTenuringThreshold=7: Sets the initial tenuring threshold to use in adaptive GC sizing in the parallel young collector. The tenuring threshold is the number of times an object survives a young collection before being promoted to the old, or tenured, generation.
-XX:MaxTenuringThreshold=n: Sets the maximum tenuring threshold for use in adaptive GC sizing. The current largest value is 15. The default value is 15 for the parallel collector and is 4 for CMS.
Source: http://www.oracle.com/technetwork/articles/java/vmoptions-jsp-140102.html
So, you can reduce your application's tenuring threshold, for example as shown below.
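A concrete (hypothetical) invocation that promotes survivors after only three minor collections:

    java -XX:InitialTenuringThreshold=3 -XX:MaxTenuringThreshold=3 -Xmx4g -jar app.jar

Pairing this with -XX:+PrintTenuringDistribution lets you verify that objects really stop churning through the survivor spaces.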
I actually did this, and the stop-the-world time for minor GCs was reduced (I had a huge 250GB JVM, so the effect was quite profound).
I am trying to understand how the garbage collection process works. I came across a good link.
Most articles say that during a minor GC, objects are moved from eden to a survivor space, and during a major GC, objects are moved from a survivor space to the tenured space; otherwise, the memory of all unreachable objects is reclaimed. I have three questions (asked in a single go as they are related) based on the above statements:
1) Minor vs major GC collection? What is the difference between the two, such that one is called major and the other minor? As per my understanding, a minor collection happens in parallel with the application run, while a major collection pauses the application for its duration.
2) What actually happens when an object is moved from eden to a survivor space? Is the memory location of the object changed internally?
3) Why do three spaces (eden, survivor and tenured) exist instead of just one? I know there must be a reason behind it, but I am missing it. My point is: when GC runs, it collects the unreachable objects and leaves the reachable ones in that space. Just one space seems sufficient, so what advantage do three different spaces provide over one?
1) Minor GC occurs in the new generation; major GC occurs in the old generation. Whether it runs in parallel with the application or not depends on the kind of GC; only CMS and G1 can work concurrently.
2) Yes, moving an object during GC changes its physical location, so all pointers to this object will be updated.
3) This is to avoid frequent and long application freezes during GC. If it were one big heap, then the application would often freeze for long periods of time. The JVM creates objects in the small young generation; GCs there occur frequently but finish quickly. Most objects created by the JVM die quickly and never get to the old generation, so major GC happens rarely, or it may never happen at all.
Source for my answers is this Oracle article on GC basics, so these answers would apply for HotSpot. No clue as to other VMs, although I would guess that the general idea might remain the same if the same implementation techniques were used in other VMs.
Minor vs major GC collection? What is the difference between the two, such that one is called major and the other minor?
Minor GC is GC of the young generation, where new objects are allocated. Major GC is GC of all live objects, including the permanent generation (which is a bit interesting to me, but that's what the article says). Also, it appears that both major and minor GC are stop-the-world events.
What actually happens when an object is moved from eden to a survivor space? Is the memory location of the object changed internally?
I can't seem to find a reference at the moment, but I would assume so. Allowing the memory location to be changed lets compaction be performed, which improves memory allocation performance and ease. Allowing each space to be compacted separately makes sense, so I would guess that moving an object from one part of the heap to another involves physically moving the object from one memory location to another.
Why do three spaces (eden, survivor and tenured) exist instead of just one?
Short answer: efficiency. If you have only one space, you'd have to check all objects when you GC, which becomes inefficient if you have lots of long-lived objects (and you're almost guaranteed to have a decent number in a long-running application), as those long-lived objects are likely to still be reachable from one GC to the next. Splitting the heap allows the GC to be optimized, as most of the GC effort can be concentrated where object lifetimes can be assumed to be short (i.e. the young generation), with longer-lived objects being GC'd less frequently.
I've read a few articles about how garbage collection works and still don't understand how using generations helps. As I understand it, the main idea is that we start collection from the youngest generation and move on to older generations. But why did the authors of this idea decide that starting from the youngest generation is the most efficient way?
The older the generation, the more times its objects have survived collection, and the more likely they are to be needed again. Removing recently created objects makes sense, as they may well be temporary (local-scope) objects.
The authors start with the youngest generation first simply because that is what fills up first after your application starts; however, in reality, which generation is swept and when is non-deterministic as your application runs.
The important points with generational GC are:
the young generation uses a copying collector, which copies objects from eden and the current survivor space into a space it considers empty (the unused survivor space); it is therefore fast, and the GC pause is minimal.
add to this the fact that most objects die young, so the pause required to copy the small number of surviving objects out of eden and the current survivor space is short, as only objects with live references are copied, after which eden and the previous survivor space can be wiped.
after being copied several times, objects are moved to the tenured (old) generation. Eventually the tenured generation fills up; however, this time there is no clean space to copy the objects to, so the garbage collector has to sweep and compact within the generation, which is slow (compared to the copy performed in eden and the survivor spaces), meaning a longer pause.
the good news, based on the "most objects die young" heuristic, is that major GCs happen much less frequently than minor ones, keeping GC pauses to a minimum over the lifetime of an application (a tiny illustration follows this list).
there's also the benefit that all new objects are allocated at the top of the heap, meaning minimal instructions are required to do so, with defragmentation occurring naturally as part of the copy process.
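As a contrived sketch of the "die young" pattern (consume is a hypothetical method), every temporary String below becomes garbage within one iteration, so a minor collection finds almost nothing alive to copy:

    // Each temporary String is allocated in eden and dies within one
    // iteration, so minor GCs have almost nothing to copy out.
    for (int i = 0; i < 1000000; i++) {
        String tmp = "value-" + i;
        consume(tmp); // hypothetical sink; tmp is unreachable afterwards
    }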
Both these pages, Oracle Garbage Collection Tuning and Useful JVM Flags – Part 5 (Young Generation Garbage Collection), describe this.
Read this one.
Using different generations makes the allocation of objects easy and fast, as MOST allocations are done in a single region of the heap: eden. Based on the observation of the weak generational hypothesis that most objects die young, collections in the young generation find mostly garbage and so reclaim more memory; and since the young generation is relatively small compared to the heap, the time taken to scan the objects is also less. That's why young generation GCs are fast.
For more details on GC and generations, you can refer to this.
I've read an extensive amount of documentation about the HotSpot GC in Java SE 6 and 7. When talking about strategies for obtaining contiguous regions of free memory, two 'competing' approaches are presented: evacuation (usually applied to the young gen), where live objects are copied from a 'from' space to an empty 'to' space, and compaction (the fall-back of CMS), where live objects are moved to one side within a fragmented region to form contiguous blocks of used and unused memory.
Both approaches take time proportional to the size of the live set. The difference is that evacuation requires 2x the space of the live set, whereas compaction does not.
Why do we need the evacuation technique at all? The amount of copying that needs to be done is the same, yet it requires reserving more heap space, and it does not allow for faster remapping of references.
True, evacuation can be executed in parallel (whereas compaction cannot, or at least not as easily), but this trait is never mentioned and seems not that important (considering that remapping is much more expensive than moving).
One big problem is that with evacuation the vacated space is indeed vacant, while with compaction some other object Y may be moved into the space where object X was. This makes it a lot harder to correct pointers, since one can't simply use the fact that a pointer points to an invalid location to clue the code in that it needs to be updated. And one can't store a "forwarding pointer" in the "invalid" location.
This makes GC much less concurrent -- the app must be in "GC freeze" for a longer period of time.
Compaction is more suitable in cases where the number of reclaimable objects is expected to be low (e.g. the tenured generation), because after a few GC cycles the long-lived objects tend to occupy the lower portion of the heap, and hence less work needs to be done by the collector. If a copying collector were used in such a case, it would perform very poorly, because almost the same set of surviving objects from the previous cycles would need to be copied again and again from one location to another.
Copying is suitable when the number of reclaimable objects is very high (e.g. the young generation), since very few surviving objects need to be copied. If compaction were used in such a case, it might perform poorly because the surviving objects may be scattered across the heap.
Other than that, as mentioned in Hot Licks' answer, a copying collector lets us store a forwarding pointer, which prevents the collector from running into an infinite loop (or copying an object twice) when another object from the same "from" space refers to an already moved object.
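A minimal sketch of that forwarding-pointer mechanism (everything here is invented for illustration; real collectors stash the pointer in the object header):

    // Toy model of evacuation with forwarding pointers; not HotSpot code.
    class EvacuationSketch {
        static class Obj {
            Obj forwardee;                 // set once the object has been copied
            byte[] payload = new byte[16];
        }

        Obj evacuate(Obj obj) {
            if (obj.forwardee != null) {
                return obj.forwardee;      // already copied: just follow the pointer
            }
            Obj copy = new Obj();          // allocate in the empty 'to' space
            copy.payload = obj.payload;    // move the contents
            obj.forwardee = copy;          // leave a marker in the vacated slot
            return copy;
        }
    }

Any later reference that still points at the old copy can be fixed by reading forwardee, which is exactly what compaction cannot do once another object has been moved into that slot.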
Also, compaction cannot begin until all the live objects have been identified, whereas live objects can be copied to their new locations as soon as they are identified (using multiple threads).