Circular References in Java

Given an aggregation of class instances which refer to each other in a complex, circular, fashion: is it possible that the garbage collector may not be able to free these objects?
I vaguely recall this being an issue with the JVM in the past, but I thought it was resolved years ago. Yet some investigation with jhat has revealed a circular reference as the cause of a memory leak that I am now faced with.
Note: I have always been under the impression that the JVM was capable of resolving circular references and freeing such "islands of garbage" from memory. However, I am posing this question just to see if anyone has found any exceptions.

Only a very naive implementation would have a problem with circular references. Wikipedia has a good article on the different GC algorithms. If you really want to learn more, try the book Garbage Collection: Algorithms for Automatic Dynamic Memory Management. Java has had a good garbage collector since 1.2 and an exceptionally good one in 1.5 and Java 6.
The hard part of improving GC is reducing pauses and overhead, not basic things like circular references.

The garbage collector knows where the root objects are: statics, locals on the stack, and so on. If the objects aren't reachable from a root, then they will be reclaimed. If they are reachable, then they need to stick around.

Ryan, judging by your comment to Circular References in Java, you fell into the trap of referencing objects from a class, which was probably loaded by the bootstrap/system classloader. Every class is referenced by the classloader that loaded the class, and can thus be garbage-collected only if the classloader is no longer reachable. The catch is that the bootstrap/system classloader is never garbage collected, therefore, objects reachable from classes loaded by the system classloader cannot be garbage-collected either.
The reasoning for this behavior is explained in the JLS; see, for example, the Third Edition, §12.7: http://java.sun.com/docs/books/jls/third_edition/html/execution.html#12.7.
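To make the classloader trap concrete, here is a minimal, hypothetical sketch (the class and field names are invented, not taken from the question). An object reachable from a static field of a class loaded by the system classloader cannot be collected, circular references or not, until it is removed from that static structure:

import java.util.ArrayList;
import java.util.List;

// A class loaded by the system classloader; its static field acts as a GC root
// for everything reachable from it.
class GlobalRegistry {
    static final List<Object> CACHE = new ArrayList<>();
}

public class ClassloaderRetentionDemo {
    public static void main(String[] args) {
        Object payload = new byte[10 * 1024 * 1024];
        GlobalRegistry.CACHE.add(payload);

        payload = null; // the local reference is gone...
        System.gc();    // ...but the byte[] stays reachable via the static CACHE

        GlobalRegistry.CACHE.clear(); // only now does the byte[] become collectable
    }
}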

If I remember correctly, then according to the specifications, there are only guarantees about what the JVM can't collect (anything reachable), not what it will collect.
Unless you are working with real-time JVMs, most modern garbage collectors should be able to handle complex reference structures and identify "subgraphs" that can be eliminated safely. The efficiency, latency, and likelihood of doing this improve over time as more research ideas make their way into standard (rather than research) VMs.

No, at least using Sun's official JVM, the garbage collector will be able to detect these cycles and free the memory as soon as there are no longer any references from the outside.

The Java specification says that the garbage collector can collect your object only if it is not reachable from any thread.
Reachable means there is a reference, or chain of references, that leads from A to B, and it can go via C, D, ... Z for all it cares.
The JVM not collecting things has not been a problem for me since 2000, but your mileage may vary.
Tip: Java serialization caches objects to make object-mesh transfer efficient. If you have many large, transient objects and all your memory is getting hogged, reset your serializer to clear its cache.
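For that serialization tip, the relevant call is ObjectOutputStream.reset(), which clears the stream's internal handle table. A minimal sketch (the method and variable names are invented for illustration):

import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.List;

public class SerializerResetSketch {
    // Writes many large objects, periodically resetting the stream so its
    // back-reference cache does not pin every object already written.
    static void writeAll(List<? extends Serializable> bigObjects, OutputStream sink) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            int count = 0;
            for (Serializable obj : bigObjects) {
                out.writeObject(obj);
                if (++count % 100 == 0) {
                    out.reset(); // already-written objects become eligible for GC again
                }
            }
        }
    }
}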

A circular reference happens when one object refers to another, and that other one refers to the first object. For example:
class A {
    private B b;
    public void setB(B b) {
        this.b = b;
    }
}

class B {
    private A a;
    public void setA(A a) {
        this.a = a;
    }
}

public class Main {
    public static void main(String[] args) {
        A one = new A();
        B two = new B();
        // Make the objects refer to each other (creates a circular reference)
        one.setB(two);
        two.setA(one);
        // Throw away the references from the main method; the two objects are
        // still referring to each other
        one = null;
        two = null;
    }
}
Java's garbage collector is smart enough to clean up objects that refer to each other in a cycle, as long as no live threads hold references to them anymore. So having a circular reference like this does not create a memory leak.

Just to amplify what has already been said:
The application I've been working on for six years recently changed from Java 1.4 to Java 1.6, and we've discovered that we've had to add static references to things that we didn't even realize were garbage collectable before. We didn't need the static reference before because the garbage collector used to suck, and it is just so much better now.

Reference-counting GCs are notorious for this issue. Notably, Sun's JVM doesn't use a reference-counting GC.
If an object cannot be reached from a GC root (typically, at a minimum, through the classloaders if nothing else), then it will be destroyed: it simply isn't copied to the new heap during a typical copying Java GC.

The garbage collector is a very sophisticated piece of software -- it has been tested against a huge JCK test suite. It is NOT perfect, BUT there is a very good chance that as long as the Java compiler (javac) will compile all of your classes and the JVM can instantiate them, you should be good.
Then again, if you are holding references to the root of this object graph, the memory will NOT be freed, BUT if you know what you're doing, you should be OK.

Related

How to make an object eligible for garbage collection from inside the object?

I'm looking for a way to delete an object in Java, make it eligible for GC. I have a Java class that needs to have its delete() method called.
public class SomeObj {
    // some implementation stuff
    //...
    void delete() {
        // remove yourself from some lists in the program
        //...
        this = null; // <- this line is illegal
        //delete this; <- if I was in C++, I could do this
    }
}
How should I do this? Apparently, I'm going to have to refactor my code because this smells like bad design.
For better or worse, Java is a language that runs in a garbage-collecting environment. An object has some kind of existence in an application so long as it is reachable via references. Once it is no longer reachable -- when no other object holds a reference to it -- it is "deleted" so far as the application is concerned.
That the object still has some after-life in the heap is a matter for the garbage collector, not the application. An application that depends on being able to control the existence of objects to which there are no references is broken in some logical sense.
The usual, semi-legitimate reason for wanting to nudge an unreferenced object out of the heap for good is to conserve heap space. There have been many, many occasions when I've known better than the garbage collector ever could that an object was really finished with. Objects that store temporary results with method scope are a good example. I'm primarily a C and C++ developer, and I really want a method on java.lang.Object called ImDoneWithYouNow(). Sadly, it doesn't exist, and we have to rely on the GC implementation to take care of memory management.
You don't need (and really shouldn't have) a "destructor". Once no other object references the object in question, it becomes eligible for garbage collection, and will be removed by the garbage collector when it sees fit.
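As a hedged sketch of the usual alternative (the class and method names are invented): instead of trying to delete itself, the object deregisters from whatever long-lived structures hold it, and the caller drops its own reference; once nothing reachable points at it, the GC may reclaim it.

import java.util.ArrayList;
import java.util.List;

public class Deregistration {
    // A long-lived structure that would otherwise keep instances reachable.
    static final List<Deregistration> REGISTRY = new ArrayList<>();

    Deregistration() {
        REGISTRY.add(this);
    }

    // Instead of "delete this": remove the object from everything that references it.
    void dispose() {
        REGISTRY.remove(this);
    }

    public static void main(String[] args) {
        Deregistration obj = new Deregistration();
        obj.dispose();
        obj = null; // the caller drops its reference too; the instance is now eligible for GC
    }
}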

Is unused object available for garbage collection when it's still visible in stack?

In the following example there are two functionally equivalent methods:
public class Question {
    public static String method1() {
        String s = new String("s1");
        // some operations on s1
        s = new String("s2");
        return s;
    }

    public static String method2() {
        final String s1 = new String("s1");
        // some operations on s1
        final String s2 = new String("s2");
        return s2;
    }
}
However, in the first (method1), the string "s1" is clearly available for garbage collection before the return statement. In the second (method2), "s1" is still reachable (though from a code-review perspective it's no longer used).
My question is: is there anything in the JVM spec which says that once a variable on the stack is no longer used, the object it refers to could become available for garbage collection?
EDIT:
Sometimes variables can refer to objects with a large footprint, like a fully rendered image, and that has an impact on memory.
I'm asking because of practical considerations. I have a large chunk of memory-greedy code in one method and am wondering if I could help the JVM (a bit) just by splitting this method into a few small ones.
I really prefer code where no reassignment is done since it's easier to read and reason about.
UPDATE: per JLS §12.6.1:
Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner
So it looks like it's possible for the GC to reclaim an object which is still visible. I doubt, however, that this optimisation is done during offline compilation (it would screw up debugging); most likely it will be done by the JIT.
No, because your code could conceivably retrieve it and do something with it, and the abstract JVM does not consider what code is coming ahead. However, a very, very, very clever optimizing JVM might analyze the code ahead and find that there is no way s1 could ever be referenced, and garbage collect it. You definitely can't count on this, though.
If you're talking about the interpreter, then in the second case S1 remains "referenced" until the method exits and the stack frame is rolled up. (That is, in the standard interpreter -- it's entirely possible for GC to use liveness info from method verification. And, in addition (and more likely), javac may do its own liveness analysis and "share" interpreter slots based on that.)
In the case of the JITC, however, an even mildly optimizing one might recognize that S1 is unused and recycle that register for S2. Or it might not. The GC will examine register contents, and if S1 has been reused for something else then the old S1 object will be reclaimed (if not otherwise referenced). If the S1 location has not been reused then the S1 object might not be reclaimed.
"Might not" because, depending on the JVM, the JITC may or may not provide the GC with a map of where object references are "live" in the program flow. And this map, if provided, may or may not precisely identify the end of the "live range" (the last point of reference) of S1. Many different possibilities.
Note that this potential variability does not violate any Java principles -- GC is not required to reclaim an object at the earliest possible opportunity, and there's no practical way for a program to be sensitive to precisely when an object is reclaimed.
The VM is free to optimize the code to nullify s1 before method exit (as long as it's correct), so s1 might be eligible for garbage collection earlier.
However, that is hardly necessary. Many method invocations will have happened before the next GC; all those stack frames will have been popped anyway, so there is no need to worry about a specific local variable in a specific method invocation.
As far as Java the language is concerned, garbage can live forever without impacting program semantics. That's why the JLS hardly talks about garbage at all.
in the first of them (method1), the string "s1" is clearly available for garbage collection before the return statement
It isn't clear at all. I think you are confusing 'unused' with 'unreachable'. They aren't necessarily the same thing.
Formally speaking the variable is live until its enclosing scope terminates, so it isn't available for garbage collection until then.
However "a Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner" JLS #12.6.1.
Basically, stack frames and the static area are considered roots by the GC. So if an object is referenced from any active stack frame, it's considered alive. The problem with reclaiming some objects from an active stack frame is that the GC works in parallel with the application (the mutator). How would the GC find out that an object is unused while its method is still in progress? That would require synchronization, which would be VERY heavy and complex; in fact it would break the idea of the GC working in parallel with the mutator. Every thread might also keep variables in processor registers, and to implement your logic those would have to be added to the GC roots as well. I can't even imagine how to implement it.
To answer your question: if you have any logic which produces a lot of objects that are unused afterwards, separate it into a distinct method (see the sketch below). This is actually good practice.
You should also take into account optimizations by the JVM (like EJP pointed out). There is also escape analysis, which might prevent an object from being heap-allocated at all. But relying on these for your code's performance is bad practice.
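A minimal, hypothetical sketch of that advice (the names are invented): once the memory-greedy work is extracted into its own method, its locals disappear with the stack frame as soon as the method returns, so the GC can reclaim the temporaries without any special liveness analysis.

public class SplitMethod {
    public static void main(String[] args) {
        int result = memoryGreedyPart();
        // By the time we get here, the frame of memoryGreedyPart() is gone,
        // so nothing on the stack keeps its temporary objects alive.
        System.out.println(result);
    }

    private static int memoryGreedyPart() {
        byte[] scratch = new byte[64 * 1024 * 1024]; // large temporary data
        return scratch.length;                       // only the small result escapes
    }
}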

java cyclic reference and garbage collections

Let's consider the following 2 cyclic referencing examples:
Straightforward cyclic referencing
class A {
    B b;
}

class B {
    A a;
}
WeakReferencing
class A {
    B b;
}

class B {
    WeakReference<A> aRef;
}
The following SO question, answered by @Jon Skeet, makes it clear that the straightforward example will also be garbage collected as long as no "GC walk" from a known root reaches the cycle.
My question is as follows:
Is there any reason performance or otherwise to use or not to use the idiom represented in example 2 - the one employing a WeakReference?
Is there any reason performance or otherwise to use or not to use the idiom represented in example 2
The Java Reference types have a couple of performance implications:
They use more space than regular references.
They are significantly more work for the garbage collector than ordinary references.
I also believe that they can cause the collection of objects to be delayed by one or more GC cycles ... depending on the GC implementation.
In addition, the application has to deal with the possibility that a WeakReference may be broken.
By contrast, there are no performance or space overheads for normal cyclic references as you use them in your first example.
In summary, your weak reference idiom reduces performance and increases program complexity ... with no tangible benefits that I can see.
My guess is that this Question derives from the mistaken notion that cyclic references are more expensive than non-cyclic references in Java ... or that they are somehow problematic. (What other logical reason would cause one to propose an "idiom" like this?) In fact, this is not the case. Java garbage collectors don't suffer from the problems of reference counting; e.g. C++ "smart pointers". Cyclic references are handled correctly (i.e. without leaking memory) and efficiently in Java.
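For completeness, a minimal sketch (building hypothetically on example 2) of what "dealing with a broken WeakReference" means: every use of the reference has to check whether get() still returns the referent.

import java.lang.ref.WeakReference;

class A {
    B b;
}

class B {
    WeakReference<A> aRef;

    void useA() {
        A a = (aRef != null) ? aRef.get() : null; // may be null once the A has been collected
        if (a != null) {
            // safe to use a here
        } else {
            // the referent is gone; the application must cope with its absence
        }
    }
}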
The problem is that you do not know when the GC will clear the WeakReference objects.
They may be cleared as soon as you create them! The GC is very eager to collect them.
Or you can keep a strong (root) reference to the object to prevent it from being garbage collected.
Or check its status through a ReferenceQueue.
It's like the finalize method: you do not know when the GC will execute it.
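A minimal sketch of the ReferenceQueue approach mentioned above (simplified, assumed usage): register the WeakReference with a queue and poll the queue to learn when the referent has been cleared.

import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class QueueSketch {
    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);

        referent = null; // drop the strong reference
        System.gc();     // request a collection (not guaranteed to run)

        // If the GC cleared the reference, it is eventually enqueued here.
        if (queue.poll() == ref) {
            System.out.println("referent was collected and the reference enqueued");
        }
    }
}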
Sources:
http://pawlan.com/monica/articles/refobjs/
http://docs.oracle.com/javase/7/docs/api/java/lang/ref/WeakReference.html

Two Object references point to each other

In Objective-C, there is a chance that two different references can point to each other.
But is this possible in Java? I mean, can two object references point to each other? If so, when are they going to be garbage collected?
And, in the case of nested classes, two objects (the inner class's and the outer class's) are linked to each other -- how are these objects garbage collected?
I assume you are talking about circular references. Java's GC considers objects "garbage" if they aren't reachable through a chain starting at a GC root. Even though objects may point to each other to form a cycle, they're still eligible for GC if cut off from the root.
There are four kinds of GC roots in Java:
Local variables are kept alive by the stack of a thread. This is not a real object virtual reference and thus is not visible. For all intents and purposes, local variables are GC roots.
Active Java threads are always considered live objects and are therefore GC roots. This is especially important for thread local variables.
Static variables are referenced by their classes. This fact makes them de facto GC roots. Classes themselves can be garbage-collected, which would remove all referenced static variables. This is of special importance when we use application servers, OSGi containers or class loaders in general.
JNI References are Java objects that the native code has created as part of a JNI call. Objects thus created are treated specially because the JVM does not know if it is being referenced by the native code or not. Such objects represent a very special form of GC root.
You can also read here for more information.
Yes, you can do this. Like this:
class Pointy {
    public Pointy other;
}

Pointy one = new Pointy();
Pointy two = new Pointy();
one.other = two;
two.other = one;
They're garbage collected when both objects are not pointed at by anything other than one another, or other objects which are "unreachable" from current running code. The Java garbage collectors are "tracing" garbage collectors, which means they can discover this sort of issue.
Conversely, reference-counted systems (like Objective C without its "modern" garbage collection -- I don't know what the default is) cannot normally detect this sort of issue, so the objects can be leaked.
Of course you can have objects reference each other. You could simply pass each object's this reference to the other, which is perfectly valid.
However, that doesn't mean that the objects are still accessible from a GC root. Think of it as a (graph) tree: if you cut off a complete branch from the trunk, the whole branch is lost, no matter how many objects are involved or are maintaining references to each other.

Long lived Java WeakReferences

I am currently trying to diagnose a slow memory leak in my application. The facts I have so far are as follows.
I have a heap dump from a 4 day run of the application.
This heap dump contains ~800 WeakReference objects which point to objects (all of the same type, which I will call Foo for the purposes of this question) retaining 40mb of memory.
Eclipse Memory Analysis Tool shows that each of the Foo objects referred to by these WeakReferences is not referred to by any other objects. My expectation is that this should make these Foo objects Weakly Reachable and thus they should be collected at the next GC.
Each of these Foo objects has a timestamp which shows that they were allocated over the course of the 4 day run. I also have logs during this time which confirm that Garbage Collection was happening.
A huge number of Foo objects are being created by my application and only a very small fraction of them are ending up in this state within the heap dump. This suggests to me that the root cause is some sort of race condition.
My application uses JNI to call through to a native library. The JNI code calls NewGlobalRef 4 times during start of day initialisation to get references to Java classes which it uses.
What could possibly cause these Foo objects not to be collected despite only being referenced by WeakReferences (according to the Eclipse Memory Analyzer Tool)?
EDIT1:
@mindas
The WeakReference I am using is equivalent to the following example code.
public class FooWeakRef extends WeakReference<Foo>
{
    public long longA;
    public long longB;
    public String stringA;

    public FooWeakRef(Foo xiObject, ReferenceQueue<Foo> xiQueue)
    {
        super(xiObject, xiQueue);
    }
}
Foo does not have a finalizer and any finalizer would not be a consideration so long as the WeakRefs have not been cleared. An object is not finalizable when it is weakly reachable. See this page for details.
@kasten The WeakReferences are cleared before the object is finalizable. My heap dump shows that this has not happened.
@jarnbjo I refer to the WeakReference Javadoc:
"Suppose that the garbage collector determines at a certain point in time that an object is weakly reachable. At that time it will atomically clear all weak references to that object and all weak references to any other weakly-reachable objects from which that object is reachable through a chain of strong and soft references."
This suggests to me that the GC should be detecting the fact that my Foo objects are "Weakly reachable" and "At that time" clearing the weak references.
EDIT 2
@j flemm - I know that 40mb doesn't sound like much, but I am worried that 40mb in 4 days means 4000mb in 100 days. All of the docs I have read suggest that objects which are weakly reachable should not hang around for several days. I am therefore interested in any other explanations of how an object could be strongly referenced without the reference showing up in a heap dump.
I am going to try allocating some large objects when some of these dangling Foo objects are present and see whether the JVM collects them. However, this test will take a couple of days to setup and complete.
EDIT 3
@jarnbjo - I understand that I have no guarantee about when the JDK will notice that an object is weakly reachable. However, I would expect that an application under heavy load for 4 days would provide enough opportunities for the GC to notice that my objects are weakly reachable. After 4 days I am strongly suspicious that the remaining weakly referenced objects have been leaked somehow.
EDIT 4
@j flemm - That's really interesting! Just to clarify, are you saying that GC is happening in your app and is not clearing Soft/Weak refs? Can you give me any more details about what JVM + GC config you are using? My app is using a memory bar at 80% of the heap to trigger GC. I was assuming that any GC of the old gen would clear Weak refs. Are you suggesting that a GC only collects Weak refs once the memory usage is above a higher threshold? Is this higher limit configurable?
EDIT 5
@j flemm - Your comment about clearing out WeakRefs before SoftRefs is consistent with the Javadoc, which states:
SoftRef: "Suppose that the garbage collector determines at a certain point in time that an object is softly reachable. At that time it may choose to clear atomically all soft references to that object and all soft references to any other softly-reachable objects from which that object is reachable through a chain of strong references. At the same time or at some later time it will enqueue those newly-cleared soft references that are registered with reference queues."
WeakRef: "Suppose that the garbage collector determines at a certain point in time that an object is weakly reachable. At that time it will atomically clear all weak references to that object and all weak references to any other weakly-reachable objects from which that object is reachable through a chain of strong and soft references. At the same time it will declare all of the formerly weakly-reachable objects to be finalizable. At the same time or at some later time it will enqueue those newly-cleared weak references that are registered with reference queues."
For clarity, are you saying that the Garbage Collector runs when your app has more than 50% free memory and in this case it does not clear WeakRefs? Why would the GC run at all when your app has >50% free memory? I think your app is probably just generating a very low amount of garbage and when the collector runs it is clearing WeakRefs but not SoftRefs.
EDIT 6
@j flemm - The other possible explanation for your app's behaviour is that the young gen is being collected but that your Weak and Soft refs are all in the old gen and are only cleared when the old gen is being collected. For my app I have stats showing that the old gen is being collected, which should mean that WeakRefs get cleared.
EDIT 7
I am starting a bounty on this question. I am looking for any plausible explanations for how WeakRefs could fail to be cleared while GC is happening. If the answer is that this is impossible I would ideally like to be pointed at the appropriate bits of OpenJDK which show WeakRefs being cleared as soon as an object is determined to be weakly reachable and that weak reachability is resolved every time GC runs.
I have finally got round to checking the Hotspot JVM source code and found the following code.
In referenceProcessor.cpp:
void ReferenceProcessor::process_discovered_references(
  BoolObjectClosure*           is_alive,
  OopClosure*                  keep_alive,
  VoidClosure*                 complete_gc,
  AbstractRefProcTaskExecutor* task_executor) {
  NOT_PRODUCT(verify_ok_to_handle_reflists());

  assert(!enqueuing_is_done(), "If here enqueuing should not be complete");
  // Stop treating discovered references specially.
  disable_discovery();

  bool trace_time = PrintGCDetails && PrintReferenceGC;

  // Soft references
  {
    TraceTime tt("SoftReference", trace_time, false, gclog_or_tty);
    process_discovered_reflist(_discoveredSoftRefs, _current_soft_ref_policy, true,
                               is_alive, keep_alive, complete_gc, task_executor);
  }

  update_soft_ref_master_clock();

  // Weak references
  {
    TraceTime tt("WeakReference", trace_time, false, gclog_or_tty);
    process_discovered_reflist(_discoveredWeakRefs, NULL, true,
                               is_alive, keep_alive, complete_gc, task_executor);
  }
The function process_discovered_reflist has the following signature:
void
ReferenceProcessor::process_discovered_reflist(
  DiscoveredList               refs_lists[],
  ReferencePolicy*             policy,
  bool                         clear_referent,
  BoolObjectClosure*           is_alive,
  OopClosure*                  keep_alive,
  VoidClosure*                 complete_gc,
  AbstractRefProcTaskExecutor* task_executor)
This shows that WeakRefs are being unconditionally cleared by ReferenceProcessor::process_discovered_references.
Searching the Hotspot code for process_discovered_reference shows that the CMS collector (which is what I am using) calls this method from the following call stack.
CMSCollector::refProcessingWork
CMSCollector::checkpointRootsFinalWork
CMSCollector::checkpointRootsFinal
This call stack looks like it is invoked every time a CMS collection is run.
Assuming this is true, the only explanation for a long lived weakly referenced object would be either a subtle JVM bug or if the GC had not been run.
You might want to check whether you have a leaked-classloader issue. You can find more on this topic in this blog post.
You need to clarify what the link is between Foo and WeakReference. The case
class Wrapper<T> extends WeakReference<T> {
    private final T referent;
    public Wrapper(T referent) {
        super(referent);
        this.referent = referent;
    }
}
is very different from just
class Wrapper<T> extends WeakReference<T> {
    public Wrapper(T referent) {
        super(referent);
    }
}
or its inlined version, WeakReference<Foo> wr = new WeakReference<Foo>(foo).
So I assume your case is not like I described in my first code snippet.
As you have said you are working with JNI, you might want to check whether you have any unsafe finalizers. Every finalizer should have a finally block calling super.finalize(), and it's easy to slip up.
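A minimal sketch of the safe-finalizer pattern being described (illustrative only; the class is invented, and finalizers themselves are best avoided):

public class NativeHolder {
    private long nativeHandle; // hypothetical handle owned by JNI code

    @Override
    protected void finalize() throws Throwable {
        try {
            // release native resources here, e.g. free the handle via a native call
        } finally {
            super.finalize(); // always chain, even if the cleanup above throws
        }
    }
}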
You probably need to tell us more about the nature of your objects to offer better ideas.
Try SoftReference instead. Javadoc says: All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError.
WeakReference doesn't have such guarantees, which makes them more suitable for caches, but sometimes SoftReferences are better.
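A hedged sketch of what a SoftReference-based cache might look like (names invented; not the poster's code): values are held softly, so the VM may drop them under memory pressure instead of throwing OutOfMemoryError.

import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        return (ref == null) ? null : ref.get(); // null means "evicted or never cached"
    }
}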
@iirekm No: WeakReferences are 'weaker' than SoftReferences, meaning that a WeakReference will always be garbage collected before a SoftReference.
More info in this post: Understanding Java's Reference classes: SoftReference, WeakReference, and PhantomReference
Edit: (after reading comments) Yes surely Weak References are 'Weaker' than SoftReferences, typo. :S
Here's some use cases to throw further light on the subject:
SoftReference: In-memory cache (Object stays alive until VM deems that there's not enough heap mem)
WeakReference: Auto-clearing Listeners (Object should be cleared on next GC cycle after deemed being Weakly reachable)
PhantomReference: Avoiding out-of-memory errors when handling unusually large objects (when it is enqueued in the reference queue, we know the host object is about to be cleared, so it is safe to allocate another large object). Think of it as a finalize() alternative, without the ability to bring dead objects back to life (as you potentially could with finalize).
This being said, nothing prevents the VM (please correct me if I'm wrong) from letting the weakly reachable objects stay alive as long as it is not running out of memory (as in the original author's case).
This is the best resource I could find on the subject: http://www.pawlan.com/monica/articles/refobjs/
Edit 2: Added "to be" in front of cleared in PhantomRef
I am not acquainted with Java, but you may be using a generational garbage collector, which will leave your Foo and FooWeakRef objects alone (not collected) as long as
they have been promoted to an older generation, and
there is enough memory to allocate new objects in the younger generations.
Does the log that indicates that garbage collection occurred discriminate between major and minor collections?
For non-believers who claim that weak references are cleared before soft references:
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class Test {
    /**
     * @param args
     */
    public static void main(String[] args) {
        ReferenceQueue<Object> q = new ReferenceQueue<Object>();
        Map<Reference<?>, String> referenceToId = new HashMap<Reference<?>, String>();
        for (int i = 0; i < 100; ++i) {
            Object obj = new byte[10 * 1024 * 1024]; // 10M
            SoftReference<Object> sr = new SoftReference<Object>(obj, q);
            referenceToId.put(sr, "soft:" + i);
            WeakReference<Object> wr = new WeakReference<Object>(obj, q);
            referenceToId.put(wr, "weak:" + i);
            for (;;) {
                Reference<?> ref = q.poll();
                if (ref == null) {
                    break;
                }
                System.out.println("cleared reference " + referenceToId.get(ref) + ", value=" + ref.get());
            }
        }
    }
}
If you run it with either -client or -server, you'll see that soft references are always cleared before weak references, which also agrees with the Javadoc: http://download.oracle.com/javase/1.4.2/docs/api/java/lang/ref/package-summary.html#reachability
Typically soft/weak references are used in connection with Maps to make various kinds of caches. If the keys in your Map are compared with the == operator (or the unoverridden .equals from Object), then it's best to use a Map which operates on SoftReference keys (e.g. from Apache Commons): when the object 'disappears', no other object will ever be equal in the '==' sense to the old one. If the keys of your Map are compared with a meaningful .equals() implementation, like String or Date, many other objects may match the 'disappearing' one, so it's better to use the standard WeakHashMap.
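As a small, hedged sketch of the WeakHashMap case described above (keys compared with a meaningful equals, here String): the entry disappears on its own once no strong reference to the key remains.

import java.util.Map;
import java.util.WeakHashMap;

public class WeakCacheSketch {
    public static void main(String[] args) {
        Map<String, byte[]> cache = new WeakHashMap<>();

        String key = new String("report-2024"); // deliberately not the interned literal itself
        cache.put(key, new byte[1024]);

        key = null;   // once no strong reference to the key remains...
        System.gc();  // ...a later GC may clear the entry automatically

        System.out.println("entries left: " + cache.size()); // may print 0 or 1, timing-dependent
    }
}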
