As far as I know, static fields (along with threads, local variables, method arguments, and JNI references) act as GC roots.
I cannot provide a link that would confirm this, but I have read a lot of articles on it.
Why can't a non-static field act as a GC root?
First off, we need to be sure we're on the same page as to what a tracing garbage collection algorithm does in its mark phase.
At any given moment, a tracing GC has a number of objects that are known to be alive, in the sense that they are reachable by the running program as it stands right now. The main step of the mark phase involves following the non-static fields of those objects to find more objects, and those new objects will now also be known to be alive. This step is repeated recursively until no new live objects are found by traversing the existing live objects. All objects in memory not proved live are considered dead. (The GC then moves to the next phase, which is called the sweep phase. We don't care about that phase for this answer.)
Now this alone is not enough to execute the algorithm. In the beginning, the algorithm has no objects that it knows to be alive, so it can't start following anyone's non-static fields. We need to specify a set of objects that are considered known to be alive from the start. We choose those objects axiomatically, in the sense that they don't come from a previous step of the algorithm -- they come from outside. Specifically, they come from the semantics of the language. Those objects are called roots.
In a language like Java, there are two sets of objects that are definite GC roots. Anything that is accessible by a local variable that's still in scope is obviously reachable (within its method, which still hasn't returned), therefore it's alive, therefore it's a root. Anything that is accessible through a static field of a class is also obviously reachable (from anywhere), therefore it's alive, therefore it's a root.
But if non-static fields were considered roots as well, what would happen?
Say you instantiate an ArrayList<E>. Inside, that object has a non-static field that points to an Object[] (the backing array that represents the storage of the list). At some point, a GC cycle starts. In the mark phase, the Object[] is marked as alive because it is pointed to by the ArrayList<E> private non-static field. The ArrayList<E> is not pointed to by anything, so it fails to be considered alive. Thus, in this cycle, the ArrayList<E> is destroyed while the backing Object[] survives. Of course, at the next cycle, the Object[] also dies, because it is not reachable by any root. But why do this in two cycles? If the ArrayList<E> was dead in the first cycle and if Object[] is used only by a dead object, shouldn't the Object[] also be considered dead in the same move, to save time and space?
That's the point here. If we want to be maximally efficient (in the context of a tracing GC), we need to get rid of as many dead objects as possible in a single GC.
To do that, a non-static field should keep an object alive only if the enclosing object (the object that contains the field) has been proved to be alive. By contrast, roots are objects we call alive axiomatically (without proof) in order to kick-start the algorithm's marking phase. It is in our best interest to limit the latter category to the bare minimum that doesn't break the running program.
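To make the mark phase concrete, here is a minimal sketch of it over a toy object graph. ToyObject and ToyMarker are made-up names for illustration; a real collector walks actual heap objects and is far more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model of a heap object: a name plus its outgoing (non-static) reference fields.
final class ToyObject {
    final String name;
    final List<ToyObject> fields = new ArrayList<>();

    ToyObject(String name) {
        this.name = name;
    }
}

final class ToyMarker {
    // Mark phase: start from the roots and follow fields until nothing new turns up.
    static Set<ToyObject> mark(List<ToyObject> roots) {
        Set<ToyObject> alive = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<ToyObject> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            ToyObject current = pending.pop();
            if (alive.add(current)) {           // first visit: now proved alive
                pending.addAll(current.fields); // its fields may reach more objects
            }
        }
        return alive;
    }
}
```

If a toy "ArrayList" object holding a toy "Object[]" is not reachable from any root, neither of the two ends up in the returned set, so both can be reclaimed in the same cycle. Treating fields as roots would have kept the array alive for one extra cycle.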
For example, say you have this code:
class Foo {
    Bar bar = new Bar();

    public static void main(String[] args) {
        Foo foo = new Foo();
        System.gc();
    }

    public void test() {
        Integer a = 1;
        bar.counter++; // access to the non-static field
    }
}

class Bar {
    int counter = 0;
}
When the garbage collection starts, we get one root that's the local variable Foo foo. That's it, that's our only root.
We follow the root to find the instance of Foo, which is marked as alive and then we attempt to find its non-static fields. We find one of them, the Bar bar field.
We follow the field to find the instance of Bar, which is marked as alive, and then we attempt to find its non-static fields. We find that it contains no fields of reference type, so the GC doesn't need to bother with that object anymore.
Since we can't find any new live objects in this round of recursion, the mark phase can end.
Alternatively:
class Foo {
    Bar bar = new Bar();

    public static void main(String[] args) {
        Foo foo = new Foo();
        foo.test();
    }

    public void test() {
        Integer a = 1;
        bar.counter++; // access to the non-static field
        System.gc();
    }
}

class Bar {
    int counter = 0;
}
When the garbage collection starts, the local variable Integer a is a root and the Foo this reference (the implicit reference that all non-static methods get) is also a root. The local variable Foo foo from main is also a root because main hasn't returned yet.
We follow the roots to find the instance of Integer and the instance of Foo (we find the Foo instance twice, but this doesn't matter for the algorithm), which are marked as alive, and then we attempt to follow their non-static fields. Let's say the instance of Integer has no fields that refer to other objects. The instance of Foo gives us one Bar field.
We follow the field to find the instance of Bar, which is marked as alive, and then we attempt to find its non-static fields. We find that it contains no fields of reference type, so the GC doesn't need to bother with that object anymore.
Since we can't find any new live objects in this round of recursion, the mark phase can end.
A non-static field lives inside the instance that contains it and is reachable only through that instance, so it cannot be a GC root in its own right.
Related
I just read this article: The Truth About Garbage Collection
In section "A.3.3 Invisible" it is explained how and when an object gets into the invisible state.
In the code below, the object assigned to the variable foo will become invisible after leaving the try/catch block and will remain strongly referenced until the run method exits (which will never happen, because the while loop runs forever).
public void run() {
    try {
        Object foo = new Object();
        foo.doSomething(); // pseudocode: stands for any method call on foo
    } catch (Exception e) {
        // whatever
    }
    while (true) {
        // do stuff
    } // loop forever
}
It is stated in this article:
However, an efficient implementation of the JVM is unlikely to zero
the reference when it goes out of scope.
Why is that not efficient?
My attempt at an explanation is as follows:
Say the stack for this method contains four elements, with the now invisible object being at the bottom.
If you want to collect the object instantly, you would have to pop and store three elements, pop and discard the fourth element and then push the three still valid elements back onto the stack.
If you collect the invisible object after control flow has left the run method, the VM could simply pop all four elements and discard them.
Local variables are not on the operand stack; they live in the local variable area of the activation frame and, in the case of references, are accessed via the aload and astore bytecodes. Zeroing a local variable therefore does not involve any pushing and popping.
Zeroing is inefficient because it is not needed:
it would not cause an immediate garbage collection cycle
the zero may soon be overwritten by another value as dictated by the logic of the program.
Going out of scope means that the local variable is no longer part of the root set for garbage collection. As such, the value it held immediately before going out of scope (zero or a valid reference) is immaterial; it won't be examined anyway.
EDIT:
Some comments on the last statement.
Indeed, at the bytecode level there are no scopes, and a local variable slot may remain part of the root set until the method returns. Of course, a JVM implementation can determine when a local variable slot is dead (i.e. on every possible path to method return the variable is either never accessed again or is overwritten before being read) and not consider it part of the root set, but it is by no means required to do so.
The very simple answer is: because it is inefficient.
There are many garbage collector algorithms and some may collect aggressively. Some JIT compilers allocate objects on the stack, but the most obvious issue in your case is: doSomething() may actually keep (leak) a reference to the object elsewhere.
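As a rough illustration of that last point, here is a sketch of a method that leaks a reference. Leaky, REGISTRY and LeakDemo are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: doSomething() publishes `this` to a static registry.
final class Leaky {
    static final List<Leaky> REGISTRY = new ArrayList<>();

    void doSomething() {
        REGISTRY.add(this); // the object is now reachable from a static field
    }
}

final class LeakDemo {
    static int run() {
        {
            Leaky foo = new Leaky();
            foo.doSomething();
        } // foo is out of scope here, but the object it referred to is still reachable
        return Leaky.REGISTRY.size();
    }
}
```

Whatever the JVM does with the local variable slot, the object cannot be collected while the registry still holds it.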
I need to call methods of a class with multiple methods very often in a simulation loop.
Some of these methods need to access temporary objects for storing information in them. After leaving the method the stored data is not needed anymore.
For example:
class MyClass {
    void method1() {
        ...
        SomeObject temp = new SomeObject();
        ...
    }

    void method2() {
        ...
        SomeObject temp = new SomeObject();
        SomeObject temp2 = new SomeObject();
        ...
    }
}
I need to optimize as much as possible. The most expensive (removable) problem is that too many allocations happen.
I assume it would be better not to allocate the space needed for those objects every time so I want to keep them.
Would it be more efficient to store them in a static way or not?
Like:
class MyClass {
    private (static?) SomeObject temp;
    private (static?) SomeObject temp2;
    methods...
}
Or is there even a better way? Thank you for your help!
Edit based on answers:
The actual problem is not the memory footprint but the garbage collector cleaning up the mess.
SomeObject is a Point2D-like class, nothing memory expensive (in my opinion).
I am not sure whether it is better to use (possibly static) class-level objects as placeholders or some more advanced method that I am not aware of.
I would be wary of premature optimization in this example. The typical downsides are that it makes the code more complex and harder to read (and complexity makes bugs more likely), and it may not offer the speedup you expected. For a simple object such as one representing a 2D point coordinate, I wouldn't worry about reuse. Typically reuse gains the most benefit if you are working with a large amount of memory, avoiding lengthy, expensive constructors, or pulling object construction out of a tight loop that is executed frequently.
Some different strategies you could try:
Push responsibility to caller: One way would be to have the caller pass in a pre-initialized object, making the method parameter final. However, whether this will work depends on what you need to do with the object.
Pointer to temporary object as method parameter: Another way would be to have the caller pass an object as a parameter whose purpose is essentially to be a pointer to an object where the method should do its temporary storage. I think this technique is more commonly used in C++, but it works similarly in Java and sometimes shows up in places like graphics programming.
Object Pool: One common way to reuse temporary objects is to use an object pool, where objects are allocated from a fixed bank of "available" objects. This has some overhead, but if the objects are large and frequently used for only short periods of time, such that memory fragmentation might be a concern, the pool's overhead may be small enough compared to repeated allocation to be worth considering.
Member Variable: If you are not concerned about concurrent calls to the method (or have used synchronization to prevent them), you could emulate the C++ism of a "local static" variable by creating a member variable of the class for your storage. It makes the code less readable and leaves slightly more room for accidental interference from other parts of your code that use the variable, but it has lower overhead than an object pool and does not require changes to your method signature. If you do this, you may optionally also wish to use the transient keyword on the variable to indicate that it does not need to be serialized.
I would shy away from a static variable for the temporary unless the method is also static, because it may carry an undesirable memory overhead for the entire time your program runs, and it has the same downsides as a member variable, only more so, since the variable is shared by all instances of the class.
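The "temporary object as method parameter" strategy could look roughly like the sketch below. Vec2 is a hypothetical stand-in for the Point2D-like SomeObject from the question, and Simulation, distance and totalLength are made-up names.

```java
// Hypothetical stand-in for the Point2D-like SomeObject from the question.
final class Vec2 {
    double x, y;

    Vec2 set(double x, double y) {
        this.x = x;
        this.y = y;
        return this;
    }
}

final class Simulation {
    // The caller owns `scratch`; the method overwrites it instead of allocating.
    static double distance(double ax, double ay, double bx, double by, Vec2 scratch) {
        scratch.set(bx - ax, by - ay);
        return Math.sqrt(scratch.x * scratch.x + scratch.y * scratch.y);
    }

    static double totalLength(double[] xs, double[] ys) {
        Vec2 scratch = new Vec2(); // one allocation for the whole loop
        double total = 0;
        for (int i = 1; i < xs.length; i++) {
            total += distance(xs[i - 1], ys[i - 1], xs[i], ys[i], scratch);
        }
        return total;
    }
}
```

Whether this actually beats plain allocation is something to measure: a JIT compiler can sometimes eliminate such short-lived allocations on its own through escape analysis.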
Keep in mind that temp and temp2 are not themselves objects, but variables pointing to an object of type SomeObject. The way you are planning to do it, the only difference would be that temp and temp2 would be instance variables instead of local variables. Calling
temp = new SomeObject();
would still allocate a new SomeObject on the heap.
Additionally, making them static or instance variables instead of local would cause the last assigned SomeObjects to be kept strongly reachable (for instance variables, as long as your class instance is reachable), preventing them from being garbage collected until the variables are reassigned.
Optimizing in this way probably isn't effective. Currently, once temp and temp2 are out of scope, the SomeObjects they point to will be eligible for garbage collection.
If you're still interested in memory optimization, you will need to show what the SomeObject is in order to get advice as to how you could cache the information it's holding.
How large are these objects? It seems to me that you could have class-level objects (not necessarily static; I'll come back to that). For SomeObject, you could have a method that purges its contents. When you are done using it in one place, call the method to purge its contents.
As for static: will multiple callers use this class and have different values? If so, don't use static.
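A minimal sketch of that idea, assuming a hypothetical Scratch class with a clear() method and a hypothetical Worker that reuses it:

```java
// Hypothetical temporary object with a method that purges its contents.
final class Scratch {
    double x, y;

    void clear() {
        x = 0;
        y = 0;
    }
}

final class Worker {
    // Instance field, not static: each Worker gets its own scratch object.
    final Scratch temp = new Scratch();

    double method1(double a, double b) {
        temp.x = a;
        temp.y = b;
        double result = temp.x + temp.y;
        temp.clear(); // done with it: purge so no stale data leaks into the next call
        return result;
    }
}
```

Because temp is an instance field, two Worker objects never share scratch data. A static field would be shared by every caller.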
First, you need to make sure that you really have this problem. The benefit of a garbage collector is that it takes care of all temporary objects automatically.
Anyway, suppose you run a single-threaded application and you use at most MAX_OBJECTS at any given time. One solution could be like this:
public class ObjectPool {
    private static final int MAX_OBJECTS = 5;
    private final Object[] pool = new Object[MAX_OBJECTS];
    private int position = 0;

    public Object getObject() {
        // advance to the next slot (wraps around after MAX_OBJECTS calls)
        position = (position + 1) % MAX_OBJECTS;
        // create the object lazily the first time this slot is used
        if (pool[position] == null) {
            pool[position] = new Object();
        }
        // return the object in the current slot
        return pool[position];
    }

    // make it a singleton
    private ObjectPool() {}
    private static final ObjectPool instance = new ObjectPool();
    public static ObjectPool getInstance() { return instance; }
}
And here is the usage example:
public class ObjectPoolTest {
    public static void main(String[] args) {
        for (int n = 0; n < 6; n++) {
            Object o = ObjectPool.getInstance().getObject();
            // print the iteration index alongside the identity hash code
            System.out.println(n + ") " + o.hashCode());
        }
    }
}
Here is sample output (the hash codes will differ from run to run):
0) 1660364311
1) 1340465859
2) 2106235183
3) 374283533
4) 603737068
5) 1660364311
Notice that the first and the last numbers are the same: call number MAX_OBJECTS + 1 returns the same temporary object as the first call.
Can I override one object with another if they are instances of the same class and have the same size, using sun.misc.Unsafe?
edit:
By "override" I mean to "delete" the first object and to fill the memory with the second one. Is it possible?
By "override" I mean to "delete" the first object and to fill the memory
with the second one. Is it possible?
Yes and no.
Yes - If you allocate some memory with Unsafe and write a long, then write another long in it (for example), then yes, you have deleted the first object and filled the memory with a second object. This is similar to what you can do with ByteBuffer. Of course, long is a primitive type, so it is probably not what you mean by 'object'.
Java allows this because it has control over the allocated memory.
No - Java works with references to objects and only provides references to these objects. Moreover, it tends to move objects around in memory (e.g., during garbage collection).
There is no way to get the 'physical address' and move memory content from one object address to another, if that's what you are trying. Moreover, you can't actually 'delete' the object, because it may be referenced from elsewhere in the code.
However, there is always the possibility of having reference A point to another objectB instead of objectA with A = objectB; You can even make this atomic with Unsafe.compareAndSwapObject(...).
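If you don't want to touch Unsafe at all, the standard library offers the same compare-and-swap on a single reference through java.util.concurrent.atomic.AtomicReference. A small sketch (CasDemo is just a name made up for the example):

```java
import java.util.concurrent.atomic.AtomicReference;

final class CasDemo {
    static boolean[] run() {
        Object objectA = new Object();
        Object objectB = new Object();
        AtomicReference<Object> a = new AtomicReference<>(objectA);

        // Succeeds: a currently holds objectA, so it is swapped to objectB.
        boolean first = a.compareAndSet(objectA, objectB);
        // Fails: a no longer holds objectA, so nothing changes.
        boolean second = a.compareAndSet(objectA, new Object());

        return new boolean[] { first, second, a.get() == objectB };
    }
}
```

Note that this only redirects one reference; neither object is overwritten in memory.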
Workaround - Now, let's imagine that references A1, A2 and A3 point to the same objectA. If you want all of them to suddenly point to objectB, you can't use Unsafe.compareAndSwapObject(...), because only A1 would point to objectB, while A2 and A3 would still point to objectA. It would not be atomic.
There is a workaround:
public class AtomicReferenceChange {
    public static Object myReference = new Object();

    public static void changeObject(Object newObject) {
        myReference = newObject;
    }

    public static void main(String[] args) {
        System.out.println(AtomicReferenceChange.myReference);
        AtomicReferenceChange.changeObject("333");
        System.out.println(AtomicReferenceChange.myReference);
    }
}
Instead of having multiple references to the same object, you could define a public static reference and have your code use AtomicReferenceChange.myReference everywhere. If you want to change the referenced object atomically, use the static method changeObject(...).
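The same indirection also works without a global static, if every user holds a shared handle instead of a direct reference. This is only a sketch of a variation on the workaround, with a hypothetical Client class; it is not something the Unsafe API gives you.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical client that keeps a handle instead of a direct reference.
final class Client {
    private final AtomicReference<Object> handle;

    Client(AtomicReference<Object> handle) {
        this.handle = handle;
    }

    Object current() {
        return handle.get(); // always reads through the shared handle
    }
}
```

Create one AtomicReference, pass it to every Client, and a single set(...) on it is seen by all of them on their next current() call.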
public class App1
{
    public static void main(String[] args)
    {
        Point point_1 = new Point(5,5);
        Point point_2 = new Point(7,8);
        Circle circle_1 = new Circle(point_2, 10);
        point_1 = null;
        point_2 = null;
    }
}
How many object references exist after this code has executed? Why?
After this code has executed, exactly none, since it will have exited :-)
If you mean at the point just before exit, there's a reference on the stack to your circle and a reference in your circle to the second point, assuming the constructor stores it.
Despite formulation problems, the snippet is actually quite instructive on certain aspects of garbage collectibility. Let's take a look at it line-by-line.
Point point_1 = new Point(5,5);
So we've declared a reference variable point_1, and it points to a new Point. Let's assume for now that the constructor of Point doesn't do anything fancy and simply sets the fields final int x, y to the given values.
Thus, we now have something like this: point_1 refers to [aPoint(5 5)].
Now let's take a look at the next line:
Point point_2 = new Point(7,8);
Now we have something like this: point_1 refers to [aPoint(5 5)] and point_2 refers to [aPoint(7 8)].
Now let's take a look at the next line:
Circle circle_1 = new Circle(point_2, 10);
Here again we don't quite know how Circle is implemented, but it's reasonable to assume that it has a final Point center and final int radius fields, and with the Point center specifically, it simply sets the reference to the given Point (i.e. no defensive copying since Point is immutable).
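So now we may have something like this: point_1 refers to [aPoint(5 5)], circle_1 refers to [aCircle(10)], and both point_2 and [aCircle(10)].center refer to [aPoint(7 8)].
Then with the next two statements, we set point_1 and point_2 to null: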
point_1 = null;
point_2 = null;
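So now we have something like this: point_1 and point_2 are null, circle_1 refers to [aCircle(10)], [aCircle(10)].center refers to [aPoint(7 8)], and nothing refers to [aPoint(5 5)].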
We can now observe that:
The object [aPoint(5 5)] is no longer reachable
The object [aPoint(7 8)], though no longer referred to by point_2, is still referred to by [aCircle(10)].center.
Garbage collectibility is defined by whether or not an object is reachable by a live reference. The object [aPoint(5 5)], we can strongly assume (based on how we think Point is implemented), is no longer reachable, so it is eligible for collection (it's garbage! No one can "pick it up" now!).
On the other hand, the object [aPoint(7 8)] is still referred to by [aCircle(10)].center, so we can say that it's NOT eligible for collection (it's not garbage! Someone is still "hanging on" to it!).
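For reference, the walk-through above relies on implementations along these lines. This is only a sketch of the assumptions made in this answer, not the actual classes from the question, and ReachabilityDemo is a made-up wrapper.

```java
// Assumed implementation: immutable, nothing fancy in the constructor.
final class Point {
    final int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

// Assumed implementation: stores the given Point without copying it.
final class Circle {
    final Point center;
    final int radius;

    Circle(Point center, int radius) {
        this.center = center;
        this.radius = radius;
    }
}

final class ReachabilityDemo {
    static Circle run() {
        Point point_1 = new Point(5, 5);
        Point point_2 = new Point(7, 8);
        Circle circle_1 = new Circle(point_2, 10);
        point_1 = null; // [aPoint(5 5)] is now unreachable
        point_2 = null; // [aPoint(7 8)] is still reachable via circle_1.center
        return circle_1;
    }
}
```

If Circle made a defensive copy of its center instead, the original [aPoint(7 8)] would become unreachable too, which is why the assumptions matter.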
Moral
So no, setting a reference to null definitely does NOT automatically make the object it previously referred to eligible for collection. It depends on whether or not there are any other references to the object.
Certainly, though, setting a reference to null CAN help make an object eligible for collection, e.g. when that reference is the last remaining reference to the object.
You do NOT, however, ALWAYS have to set a reference to null to make garbage collection "work". When a variable goes out of scope, the reference is no longer live, so in those cases explicitly setting it to null is simply redundant code.
The classic example where explicitly setting to null DOES help is the Stack example: when the top element is popped from the Stack, the Stack should no longer refer to the object from its internal data structure.
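A stripped-down version of that Stack example might look like the following. This is a sketch loosely modeled on the one in Effective Java, with a made-up slot(...) accessor added only so the effect can be observed.

```java
import java.util.Arrays;
import java.util.EmptyStackException;

// Minimal array-backed stack.
final class SimpleStack {
    private Object[] elements = new Object[16];
    private int size = 0;

    void push(Object e) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size);
        }
        elements[size++] = e;
    }

    Object pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        Object result = elements[--size];
        elements[size] = null; // drop the obsolete reference so the GC can reclaim it
        return result;
    }

    // Exposed only so the effect can be observed in this sketch.
    Object slot(int index) {
        return elements[index];
    }
}
```

Without the assignment to null, the backing array would keep the popped object reachable even though the stack no longer logically contains it.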
See also
Effective Java 2nd Edition, Item 6: Eliminate obsolete object references
Related questions
Does variable = null set it for garbage collection
The answer is:
Define what you mean for an object reference to "exist".
It is impossible to know how many object references were even created, without details of the Point and Circle classes.
The answer is irrelevant, because after the main method exits none of the objects will be reachable ... whether or not the references still "exist".
We might infer that at the point in time immediately before the main method returns there will be one reachable reference to a Circle object and one reachable reference to a Point. But one has to make some (reasonable) assumptions about how those two classes are implemented to make that inference. (For example, one has to assume that the respective constructors don't add the Point and Circle reference to some static data structure.)
Are objects cleaned up when references to them are nulled?
No. Objects are cleaned up when the garbage collector runs, and it determines that the objects in question are no longer reachable. In this sense, "reachable" means that you can get to the object by following a chain of references to the object starting from:
a static attribute of some class
a local variable of some method that is currently being executed by some thread
an attribute of some other reachable object, or
an element of some other reachable array.
(I've simplified the explanations of GC and reachability a bit to avoid confusing the OP with things he/she won't understand yet.)
I recently wrote a class for an assignment in which I had to store names in an ArrayList (in java). I initialized the ArrayList as an instance variable private ArrayList<String> names. Later when I checked my work against the solution, I noticed that they had initialized their ArrayList in the run() method instead.
I thought about this for a bit and I kind of feel it might be a matter of taste, but in general how does one choose in situations like this? Does one take up less memory or something?
PS I like the instance variables in Ruby that start with an @ symbol: they are lovelier.
(meta-question: What would be a better title for this question?)
In the words of the great Knuth "Premature optimization is the root of all evil".
Just worry that your program functions correctly and that it does not have bugs. This is far more important than an obscure optimization that will be hard to debug later on.
But to answer your question: if you initialize the field where it is declared, the memory is allocated whenever an instance of your class is created, as part of construction. If you initialize in a method, the memory allocation occurs later, when you call this specific method.
So it is only a question of initializing later... this is called lazy initialization in the industry.
Initialization
As a rule of thumb, try to initialize variables when they are declared.
If the value of a variable is intended never to change, make that explicit with use of the final keyword. This helps you reason about the correctness of your code, and while I'm not aware of compiler or JVM optimizations that recognize the final keyword, they would certainly be possible.
Of course, there are exceptions to this rule. For example, a variable may be assigned in an if–else or a switch. In a case like that, a "blank" declaration (one with no initialization) is preferable to an initialization that is guaranteed to be overwritten before the dummy value is read.
/* DON'T DO THIS! */
Color color = null;
switch(colorCode) {
    case RED: color = new Color("crimson"); break;
    case GREEN: color = new Color("lime"); break;
    case BLUE: color = new Color("azure"); break;
}
color.fill(widget);
Now you have a NullPointerException if an unrecognized color code is presented. It would be better not to assign the meaningless null. The compiler would produce an error at the color.fill() call, because it would detect that you might not have initialized color.
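The "blank" declaration version could be sketched as follows. To keep it self-contained it uses plain strings and made-up int constants instead of the Color class above.

```java
final class ColorNames {
    static final int RED = 0, GREEN = 1, BLUE = 2;

    static String nameFor(int colorCode) {
        final String name; // blank declaration: no dummy value
        switch (colorCode) {
            case RED:   name = "crimson"; break;
            case GREEN: name = "lime";    break;
            case BLUE:  name = "azure";   break;
            default:
                throw new IllegalArgumentException("Unknown color code: " + colorCode);
        }
        return name; // compiles only because every path above assigns or throws
    }
}
```

Remove the default branch and the compiler rejects the return statement, because name might not have been initialized. That is exactly the check the null initialization defeats.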
In order to answer your question in this case, I'd have to see the code in question. If the solution initialized it inside the run() method, it must have been used either as temporary storage, or as a way to "return" the results of the task.
If the collection is used as temporary storage, and isn't accessible outside of the method, it should be declared as a local variable, not an instance variable, and most likely, should be initialized where it's declared in the method.
Concurrency Issues
For a beginning programming course, your instructor probably wasn't trying to confront you with the complexities of concurrent programming—although if that's the case, I'm not sure why you were using a Thread. But, with current trends in CPU design, anyone who is learning to program needs to have a firm grasp on concurrency. I'll try to delve a little deeper here.
Returning results from a thread's run method is a bit tricky. This method belongs to the Runnable interface, and there's nothing stopping multiple threads from executing the run method of a single instance. The resulting concurrency issues are part of the motivation behind the Callable interface introduced in Java 5. It's much like Runnable, but can return a result in a thread-safe manner, and throw an Exception if the task can't be executed.
It's a bit of a digression, but if you are curious, consider the following example:
class Oops extends Thread { /* Note that Thread implements "Runnable" */
    private int counter = 0;
    private Collection<Integer> state = ...;

    public void run() {
        state.add(counter);
        counter++;
    }

    public static void main(String... argv) throws Exception {
        Oops oops = new Oops();
        oops.start();
        Thread t2 = new Thread(oops); /* Now pass the same Runnable to a new Thread. */
        t2.start(); /* Execute the "run" method of the same instance again. */
        ...
    }
}
By the end of the main method you pretty much have no idea what the "state" of the Collection is. Two threads are working on it concurrently, and we haven't specified whether the collection is safe for concurrent use. If we initialize it inside the run method, at least we can say that eventually, state will contain one element, but we can't say whether it's 0 or 1.
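For comparison, here is roughly how a result can be handed back with Callable. This is a sketch only: the names and the single-thread executor are choices made for this example, and the lambda syntax needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CallableDemo {
    static List<String> collectNames() throws Exception {
        // The list is local to the task, so no other thread can touch it while it is built.
        Callable<List<String>> task = () -> {
            List<String> names = new ArrayList<>();
            names.add("Ada");
            names.add("Grace");
            return names;
        };

        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<List<String>> result = executor.submit(task);
            return result.get(); // blocks until the task finishes, then hands over the list
        } finally {
            executor.shutdown();
        }
    }
}
```

The collection is created inside the task and only becomes visible to the caller through Future.get(), so there is no shared instance variable to fight over.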
From wikibooks:
There are three basic kinds of scope for variables in Java:
local variable, declared within a method in a class, valid for (and occupying storage only for) the time that method is executing. Every time the method is called, a new copy of the variable is used.
instance variable, declared within a class but outside any method. It is valid for and occupies storage for as long as the corresponding object is in memory; a program can instantiate multiple objects of the class, and each one gets its own copy of all instance variables. This is the basic data structure rule of Object-Oriented programming; classes are defined to hold data specific to a "class of objects" in a given system, and each instance holds its own data.
static variable, declared within a class as static, outside any method. There is only one copy of such a variable no matter how many objects are instantiated from that class.
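A tiny illustration of the three kinds side by side (Counter is a made-up class for this example):

```java
final class Counter {
    static int instancesCreated = 0; // static variable: one copy shared by all instances
    int hits = 0;                    // instance variable: one copy per object

    Counter() {
        instancesCreated++;
    }

    int hitTwice() {
        int local = 0; // local variable: a fresh copy on every call
        local++;
        hits += 2;
        return local;
    }
}
```

However often hitTwice() is called, local starts from zero each time, hits accumulates per object, and instancesCreated accumulates across all objects.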
So yes, memory consumption can differ: if the ArrayList inside run() is local, it only occupies storage while run() is executing, whereas an instance variable occupies storage for as long as the object is in memory.
I am not sure I completely understand your problem.
But as far as I understand it right now, the performance/memory benefit will be rather minor. Therefore I would definitely favour whichever option is easier to write and read.
So do what suits you the best. Only address performance/memory optimisation when needed.
My personal rule of thumb for instance variables is to initialize them, at least with a default value, either:
at declaration time, i.e.
private ArrayList<String> myStrings = new ArrayList<String>();
in the constructor
If it's something that really is an instance variable, and represents state of the object, it is then completely initialized by the time the constructor exits. Otherwise, you open yourself to the possibility of trying to access the variable before it has a value. Of course, that doesn't apply to primitives where you will get a default value automatically.
For static (class-level) variables, initialize them in the declaration or in a static initializer. I use a static initializer if I have to do calculations or other work to get a value. Initialize in the declaration if you're just calling new Foo() or setting the variable to a known value.
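As a small sketch of the two styles for static variables (Squares, SIZE and TABLE are names made up for the example):

```java
final class Squares {
    // Simple value: initialize in the declaration.
    static final int SIZE = 5;

    // Needs a loop to compute: use a static initializer block.
    static final int[] TABLE;
    static {
        TABLE = new int[SIZE];
        for (int i = 0; i < SIZE; i++) {
            TABLE[i] = i * i;
        }
    }
}
```

The static block runs once, when the class is initialized, so TABLE is fully populated before any code can read it.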
You should avoid lazy initialization where you can. It leads to problems later.
But if you must do it because the initialization is too expensive, do it like this:
Static fields:
// Lazy initialization holder class idiom for static fields
private static class FieldHolder {
    static final FieldType field = computeFieldValue();
}

static FieldType getField() { return FieldHolder.field; }
Instance fields:
// Double-check idiom for lazy initialization of instance fields
private volatile FieldType field;

FieldType getField() {
    FieldType result = field;
    if (result == null) { // First check (no locking)
        synchronized(this) {
            result = field;
            if (result == null) // Second check (with locking)
                field = result = computeFieldValue();
        }
    }
    return result;
}
According to Joshua Bloch's book "Effective Java™ Second Edition" (ISBN-13: 978-0-321-35668-0):
"Use lazy initialization judiciously"