Java - Marked/Unmarked Objects in Concurrent Mark Sweep GC - java

I was going through this link from oracle and just trying to understand/confirm some points.
1) For CMS phases - If an object is marked as "Reachable" it also means that the object is live? Or "Live" and "Reachable" aren't "One and the Same" ?
2) If something is not marked as "Reachable" that is by default, Unreachable ? Or the simple principle "If I haven't marked you as Reachable, you are unreachable" ?
3) Even though it isn't explicitly mentioned, I am assuming that after a certain threshold (maybe some timestamp or some counter) is met, all old generation objects NOT marked as "Reachable" are cleaned?
I must say that the link is quite nice but I guess I am one of those readers who looks for explicitly "Yes/No" statements. So if anyone can confirm with a simple yes/no to those questions above it will do :).
Thanks a lot.

If an object is not marked, it is "Unreachable".
"Unreachable" objects are not dead yet. They still live in memory, but they are useless since no object holds a reference to them. Dead in this context means "kicked out of the old generation space".
With the CMS GC, you can set an old generation usage threshold with a JVM option; it has a default value. Once memory usage reaches the threshold, the collector starts to sweep out "Unreachable" objects (they are then released from memory).

The general formal definition of what is colloquially called “live”, “not garbage” or “dead”, “garbage” considers only reachability.
Compare with The Java® Language Specification, §12.6.1. Implementing Finalization:
Every object can be characterized by two attributes: it may be reachable, finalizer-reachable, or unreachable, and it may also be unfinalized, finalizable, or finalized.
A reachable object is any object that can be accessed in any potential continuing computation from any live thread.
A finalizer-reachable object can be reached from some finalizable object through some chain of references, but not from any live thread.
An unreachable object cannot be reached by either means.
An unfinalized object has never had its finalizer automatically invoked.
A finalized object has had its finalizer automatically invoked.
A finalizable object has never had its finalizer automatically invoked, but the Java Virtual Machine may eventually automatically invoke its finalizer.
So yes¹, reachable means “live” and unreachable means “dead” or “garbage” and yes¹, not being reachable implies being unreachable and marking the reachable objects is the most straight-forward way to test unreachability.
¹ just because you said you like the answer in terms of “yes” or “no”
The 3rd point can’t be answered with “yes” or “no”, as there is no such thing as “cleaning”.
Unfinalized objects are referenced by a special list. If these objects are only reachable through this special reference, they get enqueued for finalization, which turns them finalizable. These objects are not unreachable yet.
Note that the JVM optimizes this step, as the majority of all objects don't actually need finalization. If a class inherits the finalize() method from java.lang.Object or has an empty finalize() method, it is considered a “trivial finalizer” and instances of this class are not added to the list of unfinalized objects in the first place. This also applies to finalize() methods consisting of a sole super.finalize() call to another trivial finalizer.
So unreachable objects are objects whose finalizer has been executed already or having a “trivial finalizer”. In either case, no action is needed to “clean” them. These objects are not like garbage on the street that has to be picked up and put into the bin. The memory location still contains what it contained when the object was alive, but it’s unused. In fact, it was already unused before the garbage collector detected it.
The key point in ending an object's life cycle is to make the memory available to new allocations. The sweeping of the CMS implies going through the memory and adding the addresses of unreachable objects to a list of free memory. This phase starts directly after the marking but, as the C in CMS suggests, runs concurrently.
An alternative is compacting where other still reachable objects are moved to the location of unreachable objects. And copying will move (aka copy) all reachable objects to a new memory region, making the entire source region available to new allocations.
Common to all alternatives is that they are not doing anything with the garbage to “clean” it. Even when being part of the free memory, their memory will still contain whatever it had before, until being actually occupied and hence overwritten by another object.
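To make the "nothing gets cleaned" point concrete, here is a minimal toy sketch of a sweep phase. It is not how HotSpot's CMS is implemented; the ToyHeap class, its fixed-size slots and all method names are assumptions made purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: a "heap" of fixed-size slots with one mark bit per slot.
final class ToyHeap {
    private final String[] slots;       // slot contents (stand-in for object data)
    private final boolean[] marked;     // set by the mark phase for reachable slots
    private final boolean[] free;       // true once a slot sits in the free list
    private final Deque<Integer> freeList = new ArrayDeque<>();

    ToyHeap(String... contents) {
        this.slots = contents.clone();
        this.marked = new boolean[contents.length];
        this.free = new boolean[contents.length];
    }

    void mark(int index) {
        marked[index] = true;
    }

    // Sweep: record every unmarked slot as free. The slot contents are not touched.
    void sweep() {
        for (int i = 0; i < slots.length; i++) {
            if (!marked[i] && !free[i]) {
                free[i] = true;
                freeList.add(i);
            }
            marked[i] = false;          // reset the mark bit for the next cycle
        }
    }

    // Allocation reuses a free slot and only then overwrites the old contents.
    int allocate(String value) {
        Integer index = freeList.poll();
        if (index == null) {
            throw new IllegalStateException("toy heap is full");
        }
        free[index] = false;
        slots[index] = value;
        return index;
    }

    String peek(int index) {
        return slots[index];
    }

    int freeCount() {
        return freeList.size();
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap("a", "b", "c");
        heap.mark(0);                           // pretend only slot 0 was found reachable
        heap.sweep();
        System.out.println(heap.freeCount());   // slots 1 and 2 are now free
        System.out.println(heap.peek(1));       // the old contents are still there
        heap.allocate("d");
        System.out.println(heap.peek(1));       // overwritten by the new allocation
    }
}
```

The sweep never reads or clears the garbage itself; it only updates bookkeeping, and the old bytes disappear when a later allocation overwrites them.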

Related

Importance of phantomreference in java

I wanted to understand the statement below. What does it mean? (Link)
An object which overrides finalize() must now be determined to be
garbage in at least two separate garbage collection cycles in order to
be collected. When the first cycle determines that it is garbage, it
becomes eligible for finalization. Because of the (slim, but
unfortunately real) possibility that the object was "resurrected"
during finalization, the garbage collector has to run again before the
object can actually be removed. And because finalization might not
have happened in a timely fashion, an arbitrary number of garbage
collection cycles might have happened while the object was waiting for
finalization. This can mean serious delays in actually cleaning up
garbage objects, and is why you can get OutOfMemoryErrors even when
most of the heap is garbage.
What phantomreference solves
With PhantomReference, this situation is impossible -- when a PhantomReference
is enqueued, there is absolutely no way to get a pointer to the now-dead object (which is good, because it isn't in memory any longer).
Because PhantomReference cannot be used to resurrect an object, the object can
be instantly cleaned up during the first garbage collection cycle in which it is found to be phantomly reachable.
Please help me understand the problem & the solution
Thanks
Contrary to popular belief, finalize methods are not triggered when their associated objects are garbage-collected, but rather when their associated objects would have been garbage-collected but for the existence of their non-default finalize methods. Objects cannot actually be garbage-collected until the system can be 100% certain that no reference to them will ever exist, but the act of running a finalize method creates a strong rooted reference to the object in question which will exist at least until the method exits. If during the execution of finalize a reference to the object gets stored elsewhere, that reference could continue to exist indefinitely. Consequently, no object whose finalize method is going to be called, nor any other object to which such an object holds a direct or indirect strong reference, can be collected until after the finalize method has run and the next GC cycle confirms that no reference to the object exists anymore.
The PhantomReference class serves to encapsulate a different paradigm: rather than keeping an object alive so the system can notify it that it's been abandoned and the only reason it's still alive is so it can receive notification of abandonment, objects requiring cleanup should create helper objects to process notification of their abandonment. If the helper objects avoid keeping references to any outside objects they don't "own", their existence won't interfere with the collection of their parent object, or other objects to which the parents hold direct or indirect references. The helper objects generally won't hold enough information to let them "do much", but that's good because they shouldn't have to do much. Instead, their design should be focused on performing the cleanup that will be required if their parent is abandoned.
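The helper-object idea described above can be sketched as follows. This is a minimal illustration, not a recommended cleanup framework: the CleanupRef class and the resource name string are hypothetical, and whether the reference gets enqueued within the wait depends on the JVM actually collecting the object.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

final class PhantomDemo {
    // Hypothetical helper: holds only what cleanup needs, never the referent itself.
    static final class CleanupRef extends PhantomReference<Object> {
        final String resourceName;

        CleanupRef(Object referent, ReferenceQueue<Object> queue, String resourceName) {
            super(referent, queue);
            this.resourceName = resourceName;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object resourceOwner = new Object();
        CleanupRef ref = new CleanupRef(resourceOwner, queue, "hypothetical-resource");

        // A phantom reference never hands out its referent, so the object
        // cannot be resurrected through it.
        System.out.println(ref.get());      // always null

        resourceOwner = null;               // drop the only strong reference
        System.gc();                        // a request only; collection is not guaranteed

        // Wait up to one second; the reference may or may not be enqueued by then.
        Reference<?> enqueued = queue.remove(1000);
        if (enqueued == ref) {
            System.out.println("clean up " + ref.resourceName);
        } else {
            System.out.println("not enqueued yet");
        }
    }
}
```

Since Java 9, java.lang.ref.Cleaner wraps the same pattern (phantom references plus a queue-draining thread) behind a simpler API.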

How does Java solve retain cycles in garbage collection?

I know that a retain cycle (at least in Objective-C and Swift) is when two objects claim ownership of one another (they have references to each other). And in Objective-C we can solve the issue by declaring one of them weak.
From what I have read and understood, the Java GC is not affected by retain cycles, and we do not have to worry about weak references. How does it solve it?
The Java (JVM) garbage collector works by looking for "reachable" objects - from the root(s) of the object tree. If they can't be reached (if they have no outside object references) then entire object graphs can be discarded.
Essentially it just traverses the tree from the root(s) to the leaf nodes and marks all objects it encounters. Any memory in the heap not taken up by marked objects is swept (marked as free). This is called mark and sweep.
This can't be done easily in Objective-C because it uses reference counting rather than mark and sweep, and reference counting has its flaws.
The reason there can be no retain cycles is because if they aren't linked to the "tree" anywhere, they aren't marked and can be discarded.
The garbage collector looks for reachable objects, starting from the roots (typically: variables on the call stack or global variables). So if two objects reference each other but are not otherwise reachable they won't be flagged as "live" and will be collected.
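The marking idea from these answers can be sketched with a toy object graph. This is only a simulation in plain Java, not JVM internals; the Node class and the mark method are hypothetical names introduced for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class ReachabilityDemo {
    // Hypothetical stand-in for a heap object with outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Mark phase: collect everything reachable from the roots, by identity.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (marked.add(node)) {         // already-marked nodes are skipped, so cycles terminate
                pending.addAll(node.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node kept = new Node("kept");
        root.refs.add(kept);

        Node a = new Node("a");             // a and b reference each other ...
        Node b = new Node("b");
        a.refs.add(b);
        b.refs.add(a);                      // ... but no root leads to either of them

        Set<Node> marked = mark(Collections.singletonList(root));
        System.out.println(marked.contains(kept));  // true
        System.out.println(marked.contains(a));     // false
        System.out.println(marked.contains(b));     // false
    }
}
```

The traversal never visits a or b, so their mutual references are simply never looked at. That is why a cycle needs no special handling in a tracing collector.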
As the name suggests, Garbage Collection refers to the removal of objects
which are no longer in use. It is a well known fact that, irrespective
of their scope, Java stores objects on the heap. Thus, if we keep
on creating objects without clearing the heap, our computers might run
out of heap space and we get ‘Out of Memory’ error. Garbage Collection
in Java is a mechanism which is controlled and executed by the Java
Virtual Machine (JVM) to release the heap space occupied by the
objects which are no more in use. In contrast to C++, garbage
collection in Java relieves the developer from the memory management
related activities. The JVM executes this process with the help of a
daemon thread called the ‘Garbage Collector’. The garbage collector
thread first invokes the finalize method of the object. This performs
the cleanup activity on the said object. As a developer we cannot
force the JVM to run the garbage collector thread. There are
methods, e.g. Runtime.gc() or System.gc(), but none of these guarantees
the execution of the garbage collector thread. These methods are used to
send garbage collection requests to the JVM. It is up to the Java
Virtual machine when it will initiate the garbage collection process.
Take a look at this stuff
How Garbage Collection works in Java
In basic terms, Garbage Collection works by walking the object graphs from a number of predefined roots. Anything not accessible from those roots is garbage, therefore one object referencing another is irrelevant unless either can be accessed from one or more roots.
It's all explained in more detail in How Garbage Collection Really Works.
The behavior of a tracing garbage collector may be viewed as analogous to that of a bowling alley pinsetter, which automatically sweeps up all pins that have been knocked over without disrupting pins that are still standing. Rather than trying to identify knocked-over pins, the pinsetter grabs all of the pins that are still standing, lifts them off the alley, and then runs a sweeper bar over the alley surface, removing wholesale any pins that might happen to be there without knowing or caring where they are.
A tracing GC works by visiting a certain set of "rooted" object references (which are regarded as always "reachable") and objects that are reachable via references held in reachable objects. The GC will mark such objects and protect their contents somehow. Once all such objects have been visited, the system will then visit some "special" objects (e.g. lists of weak or phantom references, or references to objects with finalizers) and others which are reachable from them but weren't reachable from ordinary rooted references, and then regard any storage which hasn't been guarded as eligible for reuse.
The system will need to specially treat objects that were reachable from special objects but weren't reachable from ordinary ones, but otherwise won't need to care about "ordinary" objects that become eligible for collection. If an object doesn't have a finalizer and isn't targeted by a weak or phantom reference, the GC may reuse its associated storage without ever bothering to look at any of it. There's no need for the GC to worry about the possibility that a group of objects that aren't reachable via any rooted references might hold references to each other because the GC wouldn't bother examining any of those references even if they existed.

How to keep alive Java objects?

I'm just thinking about a way of keeping away Java objects from garbage collection even if it is not being referred for a reasonable amount of time.
How to do that?
Have a static container in your main class that you put a reference to the objects in. It can be a Map, List, whatever. Then you'll always have a reference to the object, and it won't be reclaimed. (Why you would want to do this is another question...)
Which is to say: As long as a reachable reference to an object exists, it will not be garbage-collected. If your code has a reference and tries to use it, the object will be there. You don't have to do anything special to make that happen (nor should you). (A reachable reference means that the reference is from something that is, itself, reachable from something other than the things it references. Put more simply: The GC understands about circular references and so can clean up A and B even if they refer to each other, as long as nothing else refers to either of them.)
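A minimal sketch of the static-container approach, assuming a hypothetical KeepAlive class with pin and unpin methods (none of these names come from a library):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class KeepAlive {
    // A static field is a GC root for as long as the class stays loaded, so
    // everything in this list remains strongly reachable.
    private static final List<Object> PINNED =
            Collections.synchronizedList(new ArrayList<>());

    private KeepAlive() {
    }

    // Registers the object and returns it for convenient chaining.
    static <T> T pin(T object) {
        PINNED.add(object);
        return object;
    }

    // After removal the object becomes collectable again, provided nothing
    // else still refers to it.
    static boolean unpin(Object object) {
        return PINNED.remove(object);
    }

    static boolean isPinned(Object object) {
        return PINNED.contains(object);
    }

    public static void main(String[] args) {
        Object session = pin(new Object());
        System.out.println(isPinned(session));  // true
        unpin(session);
        System.out.println(isPinned(session));  // false
    }
}
```

Forgetting to call unpin is exactly how such a container turns into a memory leak, which is why the answer above questions whether you want this at all.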
[...] even if it is not being referred for a reasonable amount of time.
If there's any chance whatsoever that an object will be accessed in the future, the object will not be garbage collected.
This is due to the fact that if you have a reference to the object, it won't be garbage collected, and if you don't have a reference to the object, there's no way you will be able to access it ever.
In other words, an ordinary reference will never mystically turn into a null just because the garbage collector observed that the object hadn't been accessed for a long time and thought it was time to reclaim it.
You could also create a static instance of the object in its own class. For example if it is a singleton, having a static instance field in the class.
There are mechanisms that will hold a reference to an object, but still allow it to be garbage collected, if there are no other references otherwise.
Look at WeakReference and SoftReference. If you want more details on reachability as far as the jvm is concerned, see:
http://download.oracle.com/javase/6/docs/api/java/lang/ref/package-summary.html#reachability
As far as time is concerned, the garbage collector doesn't know or care about how often an object is used. Either another object has a reference to the target (even if it's not using it), or there are no references to the target. If there are no references to the object, it could never be used again, and will eventually be freed (even if you wanted to, you couldn't obtain a reference to the object again). The longer an object has lived, the longer it may take for the JVM to free it, due to generational garbage collection.
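The difference between strong, weak and soft references can be sketched as below. Only the first two printed values are certain; what happens after System.gc() depends on the JVM, so the example prints the outcome instead of assuming it. The class name WeakDemo is made up for the example.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class WeakDemo {
    public static void main(String[] args) {
        Object first = new Object();
        Object second = new Object();
        WeakReference<Object> weak = new WeakReference<>(first);
        SoftReference<Object> soft = new SoftReference<>(second);

        // While a strong reference exists, get() always returns the referent.
        System.out.println(weak.get() == first);    // true
        System.out.println(soft.get() == second);   // true

        first = null;       // only the weak reference is left for the first object
        second = null;      // only the soft reference is left for the second object
        System.gc();        // a request; the JVM is free to ignore it

        // Weak referents are usually cleared by the next collection, soft
        // referents usually only when memory gets tight. Neither is guaranteed.
        System.out.println("weak cleared: " + (weak.get() == null));
        System.out.println("soft cleared: " + (soft.get() == null));
    }
}
```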
I'm just thinking about a way of keeping away Java objects from garbage collection even if it is not being referred for a reasonable amount of time.
On the face of it, this question doesn't make sense. If an object is not referenced (or, more precisely, if it is not reachable) then the garbage collector will collect it. If you want to prevent an object from being garbage collected then you have to make sure that it is reachable. (Actually, it has to be strongly reachable to guarantee that it won't be GC'ed: see @Austen Holmes's answer and the page that he references.)
But actually, I think that you are confusing "referred" / referenced / reachable with accessed or used; i.e. with the act of accessing a field or calling a method of the object. If that is what you are asking, then I can assure you that the garbage collector neither knows nor cares whether your code has recently accessed / used an object.
The reachability criterion is actually about whether your code could access the object at some point in the future, and (therefore) whether the object needs to be kept so that this will work. The reachability rule means that if an object could be accessed, then it will be kept. It makes no difference how long it was since you last accessed it.

Java "dead" objects not being garbage collected

I know that during garbage collection in Java, objects that don't have any more references to them are marked as "dead" so that they can be deleted from memory by the garbage collector.
My question is if, during a garbage collection phase, all of the "dead" objects get deleted from memory or some of them survive? Why would a "dead" object survive a garbage collection phase?
LATER EDIT
Thank you for all of your answers. I can deduce that the main reason why "dead" objects would not be deleted is due to timing or spacing limitations of the way the Garbage Collector operates.
However, supposing that the Garbage Collector can reach all of the "dead" objects, I was wondering if there is a way to declare, reference, use, dereference, etc.. an object such that somehow it would skip the deletion phase even though it is "dead". I was thinking maybe objects belonging to classes which have static methods or inner classes or something like that may be kept in memory for some reason, even though they have no references to them.
Is such a scenario possible?
Thank you
My question is if, during a garbage collection phase, all of the "dead" objects get deleted from memory or some of them survive? Why would a "dead" object survive a garbage collection phase?
All current HotSpot GCs are generational collectors. Quoting from Wikipedia:
"It has been empirically observed that in many programs, the most recently created objects are also those most likely to become unreachable quickly (known as infant mortality or the generational hypothesis). A generational GC (also known as ephemeral GC) divides objects into generations and, on most cycles, will place only the objects of a subset of generations into the initial white (condemned) set. Furthermore, the runtime system maintains knowledge of when references cross generations by observing the creation and overwriting of references. When the garbage collector runs, it may be able to use this knowledge to prove that some objects in the initial white set are unreachable without having to traverse the entire reference tree. If the generational hypothesis holds, this results in much faster collection cycles while still reclaiming most unreachable objects."
What this means for your question is that most GC cycles collect only garbage objects in young generations. A garbage object in the oldest generation can survive multiple GC cycles ... until the old generation is finally collected. (And in the new G1 GC, apparently the old generation is collected a bit at a time ... which can delay reclamation even further.)
Other causes for (notionally) unreachable objects to survive include:
Unreachable objects with (unexecuted) finalizers are attached to a finalization queue by the garbage collector for processing after the GC has finished.
Objects that are softly, weakly or phantom referenced are actually still reachable, and are handled by their respective reference queue managers after the GC has finished.
Objects that are reachable by virtue of JNI global references, etcetera. (thanks @bestss)
Various hidden references exist that relate instances, their classes and their classloaders.
There is a hidden reference from an inner instance to its outer instance.
There is a hidden reference from a class to the intern'd String objects that represent its string literals.
However, these are all consequences of the definition of reachability:
"A reachable object is any object that can be accessed in any potential continuing computation from any live thread." - JLS 12.6.1
It is also worth noting that the rules for the GC have an element of conservativeness about them. They say that a reachable object won't be deleted, but they don't say that an object that is (strictly) unreachable will be deleted. This allows for cases where an object cannot be accessed but the runtime system is unable to figure that out.
Your followup question:
However, supposing that the Garbage Collector can reach all of the "dead" objects, I was wondering if there is a way to declare, reference, use, dereference, etc.. an object such that somehow it would skip the deletion phase even though it is "dead".
"Dead" is not a well-defined term. If the garbage collector can reach the objects, they are by definition reachable. They will not be deleted while they are still reachable.
If they are both dead AND reachable (whatever "dead" means!) then the fact that they are reachable means they won't be deleted.
What you are proposing doesn't make sense.
I was thinking maybe objects belonging to classes which have static methods or inner classes or something like that may be kept in memory for some reason, even though they have no references to them. Is such a scenario possible?
Static methods don't have references ... unless they happen to be on the call stack. Then the local variables may contain references just like any other method call. Normal reachability rules apply.
Static fields are GC roots, for as long as the class itself exists. Normal reachability rules apply.
Instances of inner classes are no different from instances of other classes from a GC perspective. There can be a reference to an outer class instance in an inner class instance, but that leads to normal reachability.
In summary, there are some unexpected "causes" for reachability, but they are all a logical consequence of the definition of reachability.
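The hidden inner-to-outer reference mentioned above can be made visible with a small sketch. The class names OuterHolder, Inner and Nested are hypothetical; the example only shows that an inner instance can always get back to its enclosing instance, which is what keeps the outer object reachable.

```java
final class OuterHolder {
    final byte[] payload = new byte[1024];  // stands in for something large

    // Non-static inner class: every instance is tied to an enclosing OuterHolder.
    final class Inner {
        OuterHolder enclosing() {
            return OuterHolder.this;        // uses the implicit reference to the outer instance
        }
    }

    // Static nested class: carries no implicit reference to any OuterHolder.
    static final class Nested {
    }

    Inner newInner() {
        return new Inner();
    }

    public static void main(String[] args) {
        Inner inner = new OuterHolder().newInner();
        // No variable names the OuterHolder instance any more, yet it stays
        // reachable through the inner instance.
        System.out.println(inner.enclosing().payload.length);   // 1024
    }
}
```

If the inner object does not need its enclosing instance, declaring the class static (like Nested here) avoids keeping the outer object alive by accident.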
As the System.gc() javadoc says
When control returns from the method
call, the Java Virtual Machine has
made a best effort to reclaim space
from all discarded objects.
From which you can infer that a call to the garbage collector does not ensure that all unused objects will be reclaimed. As garbage collection can differ completely between implementations, no definitive answer can be given. There are even Java implementations without any garbage collection.
One potential explanation for an unreachable object not being collected is time. As of Java 1.5 the amount of time the JVM spends garbage collecting can be limited using one of the following options...
-XX:MaxGCPauseMillis=<nnn>
-XX:GCTimeRatio=<nnn>
Both options are explained in detail here
There are dead objects in the "young" generation and there are dead objects in the "old" generation. If the GC being performed is a "minor GC", only dead objects from the young generation will be collected.
Additionally, you can use finalize() method to stop VM from collecting your object by throwing exception from finalize() (at least, this is how I understand Object.finalize() javadoc: Any exception thrown by the finalize method causes the finalization of this object to be halted, but is otherwise ignored).
The behaviour of the garbage collector is not fully specified. If a particular implementation chooses not to collect certain objects, it is allowed to do so. This could be done to avoid spending large periods of time in the garbage collector, which could have detrimental effects on the operation of the application.
Imagine you had a collection which contained millions of small objects, most of which were not referenced anywhere else. If the only reference to that collection were cleared, would you want the GC to spend a long time cleaning out those millions of small objects, or would you want it to do so over the course of several calls? In most cases, the latter would be better for the application.

can any unused object escape from Garbage Collector?

Is there any possibility that an object which is not referenced anywhere still exists on the heap? I mean, is there a possibility that an unused object escapes the garbage collector and stays on the heap until the end of the application?
I want to know because, if there is, I can be more cautious while coding.
If an object is no longer referenced, it does still exist on the heap, but it is also free to be garbage-collected (unless we are talking Class objects, which live in PermGen space and never get garbage-collected - but this is generally not something you need to worry about).
There is no guarantee on how soon that will be, but your application will not run out of memory before memory from those objects is reclaimed.
However, garbage collection does involve overhead, so if you are creating more objects than you need to and can easily create fewer, then by all means do so.
Edit: in response to your comment, if an object is truly not referenced by anything, it will be reclaimed during garbage collection (assuming you are using the latest JVM from Sun; I can't speak toward other implementations). The reason why is as follows: all objects are allocated contiguously on the heap. When GC is to happen, the JVM follows all references to "mark" objects that it knows are reachable - these objects are then moved into another, clean area. The old area is then considered to be free memory. Anything that cannot be found via a reference cannot be moved. The point is that the GC does not need to "find" the unreferenced objects. If anything, I would be more worried about objects that are still referenced when they are not intended to be, which will cause memory leaks.
You should know that, before a JVM throws an out-of-memory exception, it will have garbage collected everything possible.
If an instance is no longer referenced, it is a possible candidate for garbage collection. This means that sooner or later it can be removed, but there are no guarantees. If you do not run out of memory, the garbage collector might not even run, thus the instance may be there until the program ends.
The GC system is very good at finding unreferenced objects. There is a tiny, tiny chance that you end up keeping a weird mix of references where the garbage collector cannot decide for sure whether the object is still referenced or not. But this would be a bug in the GC system and nothing you should worry about while coding.
It depends on when and how often the object is used. If you allocate something and then deallocate it (i.e., remove all references to it) immediately after, it will stay in the "new" part of the heap and will probably be knocked out on the next garbage collection run.
If you allocate an object at the beginning of your program and keep it around for a while (if it survives through several garbage collections), it will get promoted to "old" status. Objects in that part of the heap are less likely to be collected later.
If you want to know all the nitty-gritty details, check out some of Sun's GC documentation.
Yes; imagine something like this:
Foo foo = new Foo();
// do some work here
while(1) {};
foo.someOp(); // if this is the only reference to foo,
// it's theoretically impossible to reach here, so it
// should be GC-ed, but all GC systems I know of will
// not GC it
I am using definition of: garbage = object that can never be reached in any execution of the code.
Garbage collection intentionally makes few guarantees about WHEN the objects are collected. If memory never gets too tight, it's entirely possible that an unreferenced object won't be collected by the time the program ends.
The garbage collector will eventually reclaim all unreachable objects. Note the "eventually": this may take some time. You can somewhat force the issue with System.gc() but this is rarely a good idea (if used without discretion, then performance may decrease).
What can happen is that an object is "unused" (as in: the application will not use it anymore) while still being "reachable" (the GC can find a path of references from one of its roots -- static fields, local variables -- to the object). If you are not too messy with your objects and structures then you will not encounter such situations. A rule of thumb would be: if the application seems to take too much RAM, run a profiler on it; if thousands of instances of the same class have accumulated without any apparent reason, then there may be some fishy code somewhere. Correction often involves explicitly setting a field to null to avoid referencing an object for too long.
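A classic instance of "unused but still reachable" is a hand-written stack that keeps popped elements in its backing array. The sketch below is a hypothetical SimpleStack, not code from the question; it shows the explicit null assignment that the rule of thumb above refers to.

```java
import java.util.Arrays;

final class SimpleStack {
    private Object[] elements = new Object[4];
    private int size;

    void push(Object value) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);   // grow the backing array
        }
        elements[size++] = value;
    }

    Object pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        Object value = elements[--size];
        // Without this line the array would still reference the popped object,
        // keeping it reachable although the application no longer uses it.
        elements[size] = null;
        return value;
    }

    // Exposed only so the example can show what the backing array holds.
    Object rawSlot(int index) {
        return elements[index];
    }

    public static void main(String[] args) {
        SimpleStack stack = new SimpleStack();
        stack.push("job");
        System.out.println(stack.pop());        // job
        System.out.println(stack.rawSlot(0));   // null
    }
}
```

Nulling out references is only needed in cases like this, where a class manages its own storage; ordinary local variables go out of scope on their own.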
This is theoretically possible (there is no guarantee the GC will always find all objects), but should not worry you for any real application - it usually does not happen and certainly does not affect a significant chunk of memory.
In theory, the garbage collector will find all unused objects. There could, of course, be bugs in the garbage collector…
That said, "In theory there is no difference between theory and practice, in practice, there is." Under some, mostly older, garbage collectors, if an object definition manages to reach the permanent generation, then it will no longer be garbage collected under any circumstances. This only applied to Class definitions that were loaded, not to regular objects that were granted tenured status.
Correspondingly, if you have a static reference to an object, that takes up space in the "regular" object heap, this could conceivably cause problems, since you only need to hold a reference to the class definition from your class definition, and that static data cannot be garbage collected, even if you don't actually refer to any instances of the class itself.
In practice though, this is a very unlikely event, and you shouldn't need to worry about it. If you are super concerned about performance, then creating lots of "long-lived" objects, that is, those that escape "escape-analysis", will create extra work for the garbage collector. For 99.99% of coders this is a total non-issue though.
My advice - Don't worry about it.
Reason - It is possible for a non-referenced object to stay on the heap for some time, but it is very unlikely to adversely affect you because it is guaranteed to be reclaimed before you get an out of memory error.
In general, all objects to which there are no live hard references, will be garbage-collected. This is what you should assume and code for. However, the exact moment this happens is not predictable.
Just for completeness, two tricky situations [which you are unlikely to run into] come into my mind:
Bugs in JVM or garbage collector code
So called invisible references - they rarely matter but I did have to take them into account one or two times during the last 5 years in a performance-sensitive application I work on
