Prevent Java 7 from premature GC


Similar to Can JIT be prevented from optimising away method calls? I'm attempting to track memory usage of long-lived data store objects, however I'm finding that if I initialize a store, log the system memory, then initialize another store, sometimes the compiler (presumably the JIT) is smart enough to notice that these objects are no longer needed.
public class MemTest {
public static void main(String[] args) {
logMemory("Initial State");
MemoryHog mh = new MemoryHog();
logMemory("Built MemoryHog");
MemoryHog mh2 = new MemoryHog();
logMemory("Built Second MemoryHog"); // by here, mh may be GCed
}
}
Now the suggestion in the linked thread is to keep a pointer to these objects, but the GC appears to be smart enough to tell that the objects aren't used by main() anymore. I could add a call to these objects after the last logMemory() call, but that's a rather manual solution - every time I test an object, I have to do some sort of side-effect triggering call after the final logMemory() call, or I may get inconsistent results.
I'm looking for general case solutions; I understand that adding a call like System.out.println(mh.hashCode()+mh2.hashCode()) at the end of the main() method would be sufficient, but I dislike this for several reasons. First, it introduces an external dependency on the testing above - if the SOUT call is removed, the behavior of the JVM during the memory logging calls may change. Second, it's prone to user-error; if the objects being tested above change, or new ones are added, the user must remember to manually update this SOUT call as well, or they'll introduce difficult to detect inconsistencies in their test. Finally, I dislike that this solution prints at all - it seems like an unnecessary hack that I can avoid with a better understanding of the JIT's optimizations. To the last point, Patricia Shanahan's answer offers a reasonable solution (explicitly print that the output is for memory sanity purposes) but I'd still like to avoid it if possible.
So my initial solution is to store these objects in a static list, and then iterate over them in the main class's finalize method*, like so:
public class MemTest {
private static ArrayList<Object> objectHolder = new ArrayList<>();
public static void main(String[] args) {
logMemory("Initial State", null);
MemoryHog mh = new MemoryHog();
logMemory("Built MemoryHog", mh); // adds mh to objectHolder
MemoryHog mh2 = new MemoryHog();
logMemory("Built Second MemoryHog", mh2); // adds mh2 to objectHolder
}
protected void finalize() throws Throwable {
for(Object o : objectHolder) {
o.hashCode();
}
}
}
But now I've only offloaded the problem one step - what if the JIT optimizes away the loop in the finalize method, and decides these objects don't need to be saved? Admittedly, maybe simply holding the objects in the main class is enough for Java 7, but unless it's documented that the finalize method can't be optimized away, there's still nothing theoretically preventing the JIT/GC from getting rid of these objects early, since there are no side effects in the contents of my finalize method.
One possibility would be to change the finalize method to:
protected void finalize() throws Throwable {
int codes = 0;
for(Object o : objectHolder) {
codes += o.hashCode();
}
System.out.println(codes);
}
As I understand it (and I could be wrong here), calling System.out.println() will prevent the JIT from getting rid of this code, since it's a method with external side effects, so even though it doesn't impact the program, it can't be removed. This is promising, but I don't really want some sort of gibberish being output if I can help it. The fact that the JIT can't (or shouldn't!) optimize away System.out.println() calls suggests to me that the JIT has a notion of side effects, and if I can tell it this finalize block has such side effects, it should never optimize it away.
So my questions:
Is holding a list of objects in the main class enough to prevent them from ever being GCed?
Is looping over those objects and calling something trivial like .hashCode() in the finalize method enough?
Is computing and printing some result in this method enough?
Are there other methods (like System.out.println) the JIT is aware of that cannot be optimized away, or even better, is there some way to tell the JIT not to optimize away a method call / code block?
*Some quick testing confirms, as I suspected, that the JVM doesn't generally run the main class's finalize method, it abruptly exits. The JIT/GC may still not be smart enough to GC my objects simply because the finalize method exists, even if it doesn't get run, but I'm not confident that's always the case. If it's not documented behavior, I can't reasonably trust it will remain true, even if it's true now.

Here's a plan that may be overkill, but should be safe and reasonably simple:
Keep a List of references to the objects.
At the end, iterate over the list summing the hashCode() results.
Print the sum of the hash codes.
Printing the sum ensures that the final loop cannot be optimized out. The only thing you need to do for each object creation is put it in a List add call.
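A minimal sketch of that plan, assuming a hypothetical helper class KeepAliveDemo and using byte arrays as stand-ins for the question's MemoryHog (neither name choice comes from the original post):

```java
import java.util.ArrayList;
import java.util.List;

public class KeepAliveDemo {
    // Static list: strong references that stay reachable for the life of the class.
    private static final List<Object> HELD = new ArrayList<>();

    // Register an object under test and hand it straight back to the caller.
    public static <T> T hold(T obj) {
        HELD.add(obj);
        return obj;
    }

    // Sum the hash codes of everything held so far.
    public static int sumOfHashCodes() {
        int sum = 0;
        for (Object o : HELD) {
            sum += o.hashCode();
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] first = hold(new byte[1 << 20]);   // stand-in for the first MemoryHog
        byte[] second = hold(new byte[1 << 20]);  // stand-in for the second MemoryHog
        // ... the logMemory(...) calls from the question would sit between these lines ...

        // Printing the sum makes the loop above contribute to observable output.
        System.out.println("keep-alive checksum (ignore): " + sumOfHashCodes());
    }
}
```

Wrapping each construction in hold(...) is the only per-object change; the final print stays the same no matter how many objects are tested.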

Yes, it would be legal for mh to be garbage collected at that point. At that point, there is no code that could possibly use the variable. If the JVM can detect this, then the corresponding MemoryHog object will be treated as unreachable ... if the GC were to run at that point.
A later call like System.out.println(mh) would be sufficient to inhibit collection of the object. So would using it in a "computation"; e.g.
if (mh == mh2) { System.out.println("the sky is falling!"); }
Is holding a list of objects in the main class enough to prevent them from ever being GCed?
It depends on where the list is declared. If the list were a local variable, and it became unreachable before mh, then putting the object into the list would make no difference.
Is looping over those objects and calling something trivial like .hashCode() in the finalize method enough?
By the time the finalize method is called, the GC has already decided that the object is unreachable. The only way that the finalize method could prevent the object being deleted would be to add it to some other (reachable) data structure or assign it to a (reachable) variable.
Are there other methods (like System.out.println) the JIT is aware of that cannot be optimized away,
Yes ... anything that makes the object reachable.
or even better, is there some way to tell the JIT not to optimize away a method call / code block?
No way to do that ... apart from making sure that the method call or code block does something that contributes to the computation being performed.
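That holds for Java 7, which is what the question targets. As a side note for newer runtimes only: Java 9 added java.lang.ref.Reference.reachabilityFence(Object), which keeps an object strongly reachable up to the call without printing or computing anything. The sketch below assumes a Java 9+ JVM; the class name FenceDemo and the byte-array stand-in for MemoryHog are made up for illustration.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

public class FenceDemo {
    // Returns true if the object was still present after a GC request.
    public static boolean aliveAcrossGc() {
        Object hog = new byte[1 << 20];                 // stand-in for MemoryHog
        WeakReference<Object> probe = new WeakReference<>(hog);

        System.gc();                                    // only a request to the JVM
        boolean alive = probe.get() != null;

        // Requires Java 9+: keeps 'hog' strongly reachable up to this point,
        // without printing or computing anything.
        Reference.reachabilityFence(hog);
        return alive;
    }

    public static void main(String[] args) {
        System.out.println("alive across GC: " + aliveAcrossGc());
    }
}
```

The weak reference is only a probe: it does not keep the object alive itself, so it reports whether the strong reference was still honoured at the GC point.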
UPDATE
First, what is going on here is not really JIT optimization. Rather, the JIT is emitting some kind of "map" that the GC is using to determine when local variables (i.e. variables on the stack) are dead ... depending on the program counter (PC).
Your examples to inhibit collection all involve blocking the JIT via SOUT, I'd like to avoid that somewhat hacky solution.
Hey ... ANYTHING that depends on the exact timing of when things are garbage collected is a hack. You are not supposed to do that in a properly engineered application.
I updated my code to make it clear that the list that's holding my objects is a static variable of the main class, but it seems if the JIT's smart enough it could still theoretically GC these values once it knows the main method doesn't need them.
I disagree. In practice, the JIT cannot determine that a static will never be referenced. Consider these cases:
Before the JIT runs, it appears that nothing will use static s again. After the JIT has run, the application loads a new class that refers to s. If the JIT "optimized" the s variable, the GC would treat it as unreachable, and either null it or create a dangling reference. When the dynamically loaded class then looked at s it would then see the wrong value ... or worse.
If the application ... or any libraries used by the application ... uses reflection, then it can refer to the value of any static variable without this being detectable by the JIT.
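The reflection case can be sketched as follows. The class name ReflectiveStaticRead and its methods are hypothetical; the point is only that the field name arrives as a run-time string, so no static analysis of the code can rule out a later read of the list.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class ReflectiveStaticRead {
    // Same shape as the question's static holder list.
    private static final List<Object> objectHolder = new ArrayList<>();

    public static void hold(Object o) {
        objectHolder.add(o);
    }

    // Look the field up by a name that is only known at run time.
    public static int sizeOfStaticList(String fieldName) {
        try {
            Field field = ReflectiveStaticRead.class.getDeclaredField(fieldName);
            field.setAccessible(true);
            List<?> list = (List<?>) field.get(null);   // null receiver: static field
            return list.size();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        hold(new Object());
        // The name could come from the command line, a config file, the network ...
        String name = args.length > 0 ? args[0] : "objectHolder";
        System.out.println(name + " holds " + sizeOfStaticList(name) + " object(s)");
    }
}
```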
So while it is theoretically possible to do this optimization in a small number of cases:
in the vast majority of cases, you can't, and
in the few cases that you can, the pay-off (in terms of performance improvement) is most likely negligible.
I similarly updated my code to clarify that I'm talking about the finalize method of the main class.
The finalize method of the main class is irrelevant because:
you are not creating an instance of the main class, and
the finalize method CANNOT refer to the local variables of another method (e.g. the main method).
... its existence prevents the JIT from nuking my static list.
Not true. The static list can't be nuked anyway; see above.
As I understand it, there's something special about SOUT that the JIT is aware of that prevents it from optimizing such calls away.
There is nothing special about sout. It is just something that we KNOW influences the results of the computation, and that we therefore KNOW the JIT cannot legally optimize away.

Related

Are static methods/fields/blocks part of metaspace? Is metaspace a part of the heap, and is it in native memory?

Here in 2021, and let's say Java 13:
Are static methods/fields/blocks part of metaspace? Is metaspace a part of the heap, and is it in native memory?
(I've read many topics here dating from 2011, back in the PermGen era, so I want to know how it is in 2021 with Java 13.)

static methods and blocks aren't a thing in the same way fields are. Thus, you've asked 2 utterly unrelated questions:
Where do methods and other code go, static or not?
Where do (static) fields go?
Where do methods and other code go?
Think about it: A method is just a block of code, and it is static; even a non-static method is the same actual 'content' for any instance. It's just that in a non-static method, any reference to 'a field' is syntax sugared to this.x, and the this ref points at a different object.
There is no functional difference between a and b here:
private class Foo {
int x;
public void a() {
System.out.println(this.x);
}
public static void b(Foo instance) {
System.out.println(instance.x);
}
}
So, all methods and blocks are in this sense 'static': They exist only once in memory no matter how many instances exist, and regardless of whether a method is static or not.
It would be an utter waste of gigantic amounts of memory if e.g. having a few million instances of java.lang.String in memory meant that your computer is holding a few million copies of the toLowerCase() method in memory.
So, that's not how it works. There'll be only one toLowerCase() in memory. Even though that is not a static method.
What's in memory, specifically, is the entire class, as in, the bytecode of it. In addition, more can be in memory: Java has a so-called hotspot compiler, which means that java keeps continuous track of various statistics about a method: how often it is invoked, for example, and even whether a method that is overridable (it is not marked final, is not private, and is not in a final class) is never actually overridden, as in, no class is loaded that does that. That's all tracked. From time to time the JVM will take a moment and do a fairly intelligent rewrite of a method into optimized machine code, making assumptions based on that bookkeeping. For example, it'll 'hardcode' links to methods that could be overridden but never are, but it will then invalidate these optimized machine code blocks if later on these conclusions cease to be true (for example, now you DO use a class that overrides that method).
The point is: The original bytecode must remain as a hotspotted take may become invalid later, but the whole point of hotspotting is to keep the optimized machine-code (the hotspotted code) around for future executions as well, so now there are 2 separate 'takes' on the same method in memory somewhere: The basic bytecode, and the optimized variant of it.
Where all this goes is not specified. Who knows where it goes - the java language spec and the JVM spec simply do not state it. Note that the command line options of java (the executable) aren't in any spec either. Certainly the -X and the -XX options aren't specced at all. The idea that there is a hotspotted variant isn't specced either; it's just how just about every JVM implementation out there operates.
So where does it go? You'd have to peruse the manual of your JVM implementor. It's not something that fits within the domain of 'a java question'. However, generally, yes, that is precisely what 'metaspace' / 'permgen' are about.
Where do static fields go?
On heap. They do not exist in permgen or metaspace. It's just that they are 'associated' with the instance of the java.lang.Class, effectively (I'm oversimplifying a tad), instead of any particular instance. That Class is never getting unloaded unless you're using dynamic classloading, and therefore, that variable is never eligible for garbage collection as you'd expect. Nevertheless, the ref exists in heap.

Is unused object available for garbage collection when it's still visible in stack?

In the following example there are two functionally equivalent methods:
public class Question {
public static String method1() {
String s = new String("s1");
// some operations on s1
s = new String("s2");
return s;
}
public static String method2() {
final String s1 = new String("s1");
// some operations on s1
final String s2 = new String("s2");
return s2;
}
}
However, in the first of them (method1), string "s1" is clearly available for garbage collection before the return statement. In the second (method2), string "s1" is still reachable (though from a code review perspective it's not used anymore).
My question is: is there anything in the JVM spec which says that once a variable is unused further down the stack, it could be available for garbage collection?
EDIT:
Sometimes variables can refer to objects like a fully rendered image, and that has an impact on memory.
I'm asking because of practical considerations. I have a large chunk of memory-greedy code in one method, and I'm wondering whether I could help the JVM (a bit) just by splitting this method into a few small ones.
I really prefer code where no reassignment is done since it's easier to read and reason about.
UPDATE: per jls-12.6.1:
Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner
So it looks like it's possible for the GC to reclaim an object which is still visible. I doubt, however, that this optimisation is done during offline compilation (it would screw up debugging); most likely it will be done by the JIT.

No, because your code could conceivably retrieve it and do something with it, and the abstract JVM does not consider what code is coming ahead. However, a very, very, very clever optimizing JVM might analyze the code ahead and find that there is no way s1 could ever be referenced, and garbage collect it. You definitely can't count on this, though.

If you're talking about the interpreter, then in the second case S1 remains "referenced" until the method exits and the stack frame is rolled up. (That is, in the standard interpreter -- it's entirely possible for GC to use liveness info from method verification. And, in addition (and more likely), javac may do its own liveness analysis and "share" interpreter slots based on that.)
In the case of the JITC, however, an even mildly optimizing one might recognize that S1 is unused and recycle that register for S2. Or it might not. The GC will examine register contents, and if S1 has been reused for something else then the old S1 object will be reclaimed (if not otherwise referenced). If the S1 location has not been reused then the S1 object might not be reclaimed.
"Might not" because, depending on the JVM, the JITC may or may not provide the GC with a map of where object references are "live" in the program flow. And this map, if provided, may or may not precisely identify the end of the "live range" (the last point of reference) of S1. Many different possibilities.
Note that this potential variability does not violate any Java principles -- GC is not required to reclaim an object at the earliest possible opportunity, and there's no practical way for a program to be sensitive to precisely when an object is reclaimed.
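A rough way to observe this variability is a WeakReference probe. The sketch below is only an experiment under assumptions (the class and method names are made up, and System.gc() is merely a request): the first method uses the local after the GC request, so the object is live across it; the second never touches the local again, and what it reports depends on the JVM, the JIT state, and the GC.

```java
import java.lang.ref.WeakReference;

public class LivenessProbe {
    // The local is used after the GC request, so it is live across it.
    public static boolean usedLater() {
        Object s1 = new Object();
        WeakReference<Object> probe = new WeakReference<>(s1);
        System.gc();
        boolean alive = probe.get() != null;
        System.out.println("later use of s1: " + s1.hashCode());
        return alive;
    }

    // The local is never touched again; whether the object survives is up to the JVM.
    public static boolean neverUsedAgain() {
        Object s1 = new Object();
        WeakReference<Object> probe = new WeakReference<>(s1);
        System.gc();
        return probe.get() != null;
    }

    public static void main(String[] args) {
        System.out.println("used later, still alive:       " + usedLater());
        System.out.println("never used again, still alive: " + neverUsedAgain());
    }
}
```

No particular result should be expected from the second line of output; either answer is permitted.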

The VM is free to optimize the code to nullify s1 before method exit (as long as it's correct), so s1 might be eligible for garbage collection earlier.
However that is hardly necessary. Many method invocations must have happened before the next GC; all the stack frames have been cleared anyway, no need to worry about a specific local variable in a specific method invocation.
As far as Java the language is concerned, garbage can live forever without impacting program semantics. That's why the JLS hardly talks about garbage at all.

in first of them string "s1" is clearly available for garbage collection before return statement
It isn't clear at all. I think you are confusing 'unused' with 'unreachable'. They aren't necessarily the same thing.
Formally speaking the variable is live until its enclosing scope terminates, so it isn't available for garbage collection until then.
However "a Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner" JLS #12.6.1.

Basically, stack frames and the static area are considered roots by the GC. So if an object is referenced from any stack frame, it's considered alive. The problem with reclaiming some objects from an active stack frame is that the GC works in parallel with the application (the mutator). How do you think the GC should find out that an object is unused while the method is in progress? That would require synchronization, which would be VERY heavy and complex; in fact, it would break the idea of the GC working in parallel with the mutator. Every thread might keep variables in processor registers. To implement your logic, they would also have to be added to the GC roots. I can't even imagine how to implement it.
To answer your question: if you have any logic which produces a lot of objects that are not used afterwards, separate it into a distinct method. This is actually a good practice.
You should also take into account optimizations by the JVM (as EJP pointed out). There is also escape analysis, which might prevent an object from being allocated on the heap at all. But relying on them for your code's performance is a bad practice.
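A small sketch of the "separate it into a distinct method" advice, with a made-up workload (counting even squares) standing in for the memory-greedy logic. Once the helper returns, its frame is gone and only the int result is still referenced by the caller.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitMethodDemo {
    // All temporaries live in this frame only; nothing but the count escapes.
    public static int countEvenSquares(int n) {
        List<Long> squares = new ArrayList<>();
        for (long i = 0; i < n; i++) {
            squares.add(i * i);
        }
        int evens = 0;
        for (long square : squares) {
            if (square % 2 == 0) {
                evens++;
            }
        }
        return evens;
    }

    public static void main(String[] args) {
        int evens = countEvenSquares(1_000);
        // The rest of a long method can continue here without the temporary
        // list being referenced from this frame.
        System.out.println("even squares: " + evens);
    }
}
```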

Why is adding to a static ArrayList in a constructor a leak?

I am using the NetBeans IDE and it is giving me a warning that does not make sense to me. The warning states "Leaking this in constructor". The following code is the basic setup (I just removed code irrelevant to the issue). Basically I just want to keep a list of all Square objects made. Is this a warning I need to worry about? Or is it just the possible cause of a memory leak depending on the situation?
Either way, can someone explain why this would be considered a leak?
public class Square {
private static ArrayList<Square> squares;
public Square() {
if(squares == null) {
squares = new ArrayList<>();
}
squares.add(this); // I get a warning on this line
}
}
I know it is just a warning, but I don't like to ignore warnings unless I fully understand what is going on and can make the informed choice for a specific situation.
Thanks!

(not really an answer, but...)
If your objective is really to maintain, in a list, the list of all squares you create, there is a better way to achieve that:
public class Square
{
private static final List<Square> allSquares = new ArrayList<Square>();
// Constructor: private!
private Square() {}
// Create a square
public static Square newSquare()
{
Square ret = new Square();
allSquares.add(ret);
return ret;
}
}
You will note that there is no this escape in the constructor.
For creating a new square, you will then do:
Square mySquare = Square.newSquare();

I don't think the warning concerns garbage collection, although that is indeed a problem here. The instances will never be GC'ed (unless the whole ClassLoader is collected).
The warning is saying that this is being passed to another method from within the constructor. Before the constructor finishes, this is not necessarily a fully-formed and initialized object according to the logic enshrined in the constructor. Anything in the constructor is intended to happen before anything else gets its hands on the object. But something else is getting to use this before the constructor finishes. That could cause surprising bugs.

can someone explain why this would be considered a leak?
It is a (potential) leak because the squares list and any object in the list won't be garbage collected. If there is no other code to either remove objects from the list, or clear or null the list, then objects will leak via the list.
Perhaps you need to understand what "memory leak" means in the context of a garbage collected language. In a language like C or C++, a storage leak happens when objects are lost; i.e. the code that should have freed / disposed of the object fails to do so. In a garbage collected language, a leak occurs when the GC fails to free / dispose of an object because it appears to still be in use; i.e. the GC can still find the object by tracing.
However on rereading the question, I agree with Sean Owen. The message "Leaking this in constructor" is most likely talking about the fact that the constructor is making the object's reference visible before the constructor has completed. This is also referred to as "unsafe publication". It can be a source of insidious concurrency bugs. (It could even be a problem in a single-threaded application; e.g. if you create a subclass of Square ...)
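The single-threaded subclass case can be illustrated with a sketch like the one below. ColoredSquare, register and LABELS_SEEN are hypothetical names added for the example; the registration step plays the role of squares.add(this), except that the registry also looks at the object right away.

```java
import java.util.ArrayList;
import java.util.List;

public class ThisEscapeDemo {
    // What outside code observed at the moment each square registered itself.
    public static final List<String> LABELS_SEEN = new ArrayList<>();

    static void register(Square s) {
        LABELS_SEEN.add(s.label());
    }

    static class Square {
        Square() {
            register(this);          // 'this' escapes before construction has finished
        }
        String label() { return "square"; }
    }

    static class ColoredSquare extends Square {
        private final String color;
        ColoredSquare(String color) {
            super();                 // runs register(this) while 'color' is still null
            this.color = color;
        }
        @Override
        String label() { return color + " square"; }
    }

    public static String labelSeenDuringConstruction(String color) {
        ColoredSquare sq = new ColoredSquare(color);
        String during = LABELS_SEEN.get(LABELS_SEEN.size() - 1);
        System.out.println("during construction: " + during + ", afterwards: " + sq.label());
        return during;
    }

    public static void main(String[] args) {
        labelSeenDuringConstruction("red");
    }
}
```

The registry sees a half-built object: the subclass field has not been assigned yet when the superclass constructor hands out this.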

Disclaimer: this is just a guess!
If this is the only place that squares is modified, then it means that Square objects can never be garbage-collected, as there'll always be at least one reference to each object. If so, perhaps your IDE is smart enough to spot this.

Why is adding to a static ArrayList in a constructor a memory leak?
Nope. That's not the root cause of the warning you are getting. However there are multiple other problems.
this should not be passed outside the constructor. (Sporadic issues can arise if you don't follow this.)
The Square class holds a list of all its created objects. That means for each object created there exists at least one reference.
Square aSq = new Square(); // two references, aSq and reference in ArrayList
new Square(); // one reference in ArrayList
So, as long as the class is present in memory, none of the objects created will ever be garbage collected, hence the memory leak.

I assume you understand that static variables are referenced by Class objects, which are referenced by ClassLoaders. So unless your ClassLoader itself qualifies for garbage collection, or the ClassLoader drops your referencing Class by some means, it will never be garbage collected. So, every time your constructor is called, you are adding data to a list that will almost never get garbage collected.

For whatever reason, I think this is a bad practice.
The raison d'etre for OOP, is encapsulation of a set of contracts. And those contracts must be clearly visible to the subscriber of the class.
When someone instantiates new Square(), it would not be obvious to the programmer that the instantiation is also adding this instance into a non-garbage-collected list.
So, I the programmer happily instantiate Square frequently and trivially believing that it would be garbage collected as long as I consciously abrogate all references to each of those trivially instantiated Squares. But I am wrong - because despite all my conscious efforts, none of the instances would be garbage collected.
You know, garbage collection has afforded us programmers some habitual short-cut practices. For example, the following while structure exemplifies a simplification of trivially instantiated objects. The programmer has no qualms repeatedly reinstantiating String ins, because he/she knows the previous instance would be garbage collected.
String ins = new String();
while(c != null && ins != null) {
ins = new String();
readInputInto(ins);
}
However, replace String with your Square and put it into the hands of the common programmer. The programmer would assume the same garbage-collectability as String!
Therefore, it is not good practice to try to do too much in your constructor, especially hiding an action that has significant impact inside the constructor.
You should simply have the instance added to the list outside the constructor.

How to ensure finalize() is always called (Thinking in Java exercise)

I'm slowly working through Bruce Eckel's Thinking in Java 4th edition, and the following problem has me stumped:
Create a class with a finalize( ) method that prints a message. In main( ), create an object of your class. Modify the previous exercise so that your finalize( ) will always be called.
This is what I have coded:
public class Horse {
boolean inStable;
Horse(boolean in){
inStable = in;
}
public void finalize(){
if (!inStable) System.out.print("Error: A horse is out of its stable!");
}
}
public class MainWindow {
public static void main(String[] args) {
Horse h = new Horse(false);
h = new Horse(true);
System.gc();
}
}
It creates a new Horse object with the boolean inStable set to false. Now, in the finalize() method, it checks to see if inStable is false. If it is, it prints a message.
Unfortunately, no message is printed. Since the condition evaluates to true, my guess is that finalize() is not being called in the first place. I have run the program numerous times, and have seen the error message print only a couple of times. I was under the impression that when System.gc() is called, the garbage collector will collect any objects that aren't referenced.
Googling a correct answer gave me this link, which gives much more detailed, complicated code. It uses methods I haven't seen before, such as System.runFinalization(), Runtime.getRuntime(), and System.runFinalizersOnExit().
Is anybody able to give me a better understanding of how finalize() works and how to force it to run, or walk me through what is being done in the solution code?

When the garbage collector finds an object that is eligible for collection but has a finalizer it does not deallocate it immediately. The garbage collector tries to complete as quickly as possible, so it just adds the object to a list of objects with pending finalizers. The finalizer is called later on a separate thread.
You can tell the system to try to run pending finalizers immediately by calling the method System.runFinalization after a garbage collection.
But if you want to force the finalizer to run, you have to call it yourself. The garbage collector does not guarantee that any objects will be collected or that the finalizers will be called. It only makes a "best effort". However it is rare that you would ever need to force a finalizer to run in real code.
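If the check must run deterministically, one alternative to finalize() (a swapped-in technique, not what the exercise asks for) is an explicit cleanup method called via try-with-resources, available since Java 7. The sketch below reuses the Horse idea from the question; the closeCalls counter is an assumption added only to make the behaviour observable.

```java
public class ExplicitCleanupDemo {
    // Counts how many times cleanup actually ran (illustration only).
    public static int closeCalls = 0;

    static class Horse implements AutoCloseable {
        private final boolean inStable;

        Horse(boolean inStable) {
            this.inStable = inStable;
        }

        // Runs when the try block exits, independent of the garbage collector.
        @Override
        public void close() {
            closeCalls++;
            if (!inStable) {
                System.out.println("Error: A horse is out of its stable!");
            }
        }
    }

    public static void main(String[] args) {
        try (Horse h = new Horse(false)) {
            // work with the horse here
        }
    }
}
```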

Outside of toy scenarios, it's generally not possible to ensure that a finalize will always be called on objects to which no "meaningful" references exist, because the garbage collector has no way of knowing which references are "meaningful". For example, an ArrayList-like object might have a "clear" method which sets its count to zero, and makes all elements within the backing array eligible to be overwritten by future Add calls, but doesn't actually clear the elements in that backing array. If the object has an array of size 50, and its Count is 23, then there may be no execution path by which code could ever examine the references stored in the last 27 slots of the array, but there would be no way for the garbage-collector to know that. Consequently, the garbage-collector would never call finalize on objects in those slots unless or until the container overwrote those array slots, the container abandoned the array (perhaps in favor of a smaller one), or all rooted references to the container itself were destroyed or otherwise ceased to exist.
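The container scenario can be sketched like this. SlotList is a hypothetical 50-slot container whose lazyClear() only resets the count; a WeakReference is used as a probe to show that the element stays strongly reachable through the stale slot, so a finalizer on it would not run.

```java
import java.lang.ref.WeakReference;
import java.util.Arrays;

public class LazyClearDemo {
    static class SlotList {
        private final Object[] slots = new Object[50];
        private int count;

        void add(Object o) { slots[count++] = o; }

        // Resets the count but leaves the old references in the backing array.
        void lazyClear() { count = 0; }

        // Contrast: drops the references so the elements can become unreachable.
        void eagerClear() { Arrays.fill(slots, 0, count, null); count = 0; }
    }

    // Held in a static field so the container itself stays reachable.
    private static final SlotList LIST = new SlotList();

    public static boolean staleSlotSurvivesGc() {
        Object element = new Object();
        WeakReference<Object> probe = new WeakReference<>(element);
        LIST.add(element);
        element = null;          // no local reference remains
        LIST.lazyClear();        // code can no longer reach the element via the list API
        System.gc();
        return probe.get() != null;   // still strongly referenced from slots[0]
    }

    public static void main(String[] args) {
        System.out.println("element still held after lazyClear: " + staleSlotSurvivesGc());
    }
}
```

Swapping lazyClear() for eagerClear() removes the stale reference, after which the element may be collected whenever the GC gets around to it.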
There are various means to encourage the system to call finalize on any objects for which no strong rooted references happen to exist (which seems to be the point of the question, and which other answers have already covered), but I think it's important to note the distinction between the set of objects to which strong rooted references exist, and the set of objects that code may be interested in. The two sets largely overlap, but each set can contain objects not in the other. Objects' finalizers run when the GC determines that the objects would no longer exist but for the existence of finalizers; that may or may not coincide with the time they cease being of interest to anyone. While it would be helpful if one could cause finalizers to run on all objects that have ceased to be of interest, that is in general not possible.

A call to the garbage collector method (System.gc()) suggests that the Java Virtual Machine expend effort toward recycling unused objects in order to make the memory they currently occupy available for quick reuse (i.e. it's just a suggestion to the JVM and does not bind it to perform the action then and there; it may or may not do so). When control returns from the method call, the Java Virtual Machine has made a best effort to reclaim space from all discarded objects. finalize() is called by the garbage collector on an object when garbage collection determines that there are no more references to the object.

Run the constructor and System.gc() more than twice:
public class Horse {
boolean inStable;
Horse(boolean in){
inStable = in;
}
public void finalize(){
if (!inStable) System.out.print("Error: A horse is out of its stable!");
}
}
public class MainWindow {
public static void main(String[] args) {
for (int i=0;i<100;i++){
Horse h = new Horse(false);
h = new Horse(true);
System.gc();
}
}
}

Here's what worked for me (partially, but it does illustrate the idea):
class OLoad {
public void finalize() {
System.out.println("I'm melting!");
}
}
public class TempClass {
public static void main(String[] args) {
new OLoad();
System.gc();
}
}
The line new OLoad(); does the trick, as it creates an object with no reference attached. This helps System.gc() run the finalize() method as it detects an object with no reference. Saying something like OLoad o1 = new OLoad(); will not work as it will create a reference that lives until the end of main(). Unfortunately, this only works most of the time. As others pointed out, there's no way to ensure finalize() will always be called, except to call it yourself.

When is a Java local variable eligible for GC?

Given the following program:
import java.io.*;
import java.util.*;
public class GCTest {
public static void main(String[] args) throws Exception {
List cache = new ArrayList();
while (true) {
cache.add(new GCTest().run());
System.out.println("done");
}
}
private byte[] run() throws IOException {
Test test = new Test();
InputStream is = test.getInputStream();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buff = new byte[256];
int len = 0;
while (-1 != (len = is.read(buff))) { // read into buff so len is a byte count, not a byte value
baos.write(buff, 0, len);
}
return baos.toByteArray();
}
private class Test {
private InputStream is;
public InputStream getInputStream() throws FileNotFoundException {
is = new FileInputStream("GCTest.class");
return is;
}
protected void finalize() throws IOException {
System.out.println("finalize");
is.close();
is = null;
}
}
}
Would you expect finalize to ever be called while the while loop in the run method is still executing and the local variable test is still in scope?
More importantly, is this behaviour defined anywhere? Is there anything by Sun that states that it is implementation-defined?
This is kind of the reverse of the way this question has been asked before on SO where people are mainly concerned with memory leaks. Here we have the GC aggressively GCing a variable we still have an interest in. You might expect that because test is still "in scope" that it would not be GC'd.
For the record, it appears that sometimes the test "works" (i.e. eventually hits an OOM) and sometimes it fails, depending on the JVM implementation.
Not defending the way this code is written BTW, it's just a question that came up at work.

While the object won't be garbage collected if it is still in scope, the JIT compiler might take it out of scope if the variable isn't actually used any further in the code (hence the differing behavior you are seeing) even though when you read the source code the variable still seems to be "in scope."
I don't understand why you care if an object is garbage collected if you don't reference it anymore in code, but if you want to ensure objects stay in memory, the best way is to reference them directly in a field of a class, or even better in a static field. If a static field references the object, it won't get garbage collected.
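A minimal sketch of the static-field approach described above. The KeepAlive class and its retain/release methods are hypothetical names introduced for illustration, not part of the original post. A WeakReference is used only as a probe: it cannot be cleared while the static list still holds a strong reference to the object.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: anything passed to retain() stays strongly reachable
// through a static field until release() is called.
class KeepAlive {
    private static final List<Object> RETAINED = new ArrayList<Object>();

    // Registers obj in the static list and returns it for convenient chaining.
    static <T> T retain(T obj) {
        RETAINED.add(obj);
        return obj;
    }

    static int retainedCount() {
        return RETAINED.size();
    }

    // Drops all retained references so the objects become collectable again.
    static void release() {
        RETAINED.clear();
    }

    public static void main(String[] args) {
        byte[] hog = retain(new byte[1024 * 1024]);
        WeakReference<byte[]> probe = new WeakReference<byte[]>(hog);
        hog = null;   // drop the local; only the static list refers to the array now
        System.gc();  // a hint only, but the array is still strongly reachable
        System.out.println("still reachable: " + (probe.get() != null));
        release();
    }
}
```

For a memory-logging test, the idea would be to call retain() on each store as it is built and release() only after the last measurement, so the result does not depend on which locals the JIT considers live.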
Edit: Here is the explicit documentation you are looking for.
> I'm assuming an object cannot die before a local reference to it has gone out of scope.
This can not be assumed. Neither the Java spec nor the JVM spec guarantees this. Just because a variable is in scope, doesn't mean the object it points to is reachable. Usually it is the case that an object pointed to by an in-scope variable is reachable, but yours is a case where it is not. The compiler can determine at jit time which variables are dead and does not include such variables in the oop-map. Since the object pointed to by "nt" can [sic - should be cannot] be reached from any live variable, it is eligible for collection.

I recommend that you and your co-worker read The Truth About Garbage Collection.
Right at the start, it says this:
The specification for the Java platform makes very few promises about how garbage collection actually works. [elided] While it can seem confusing, the fact that the garbage collection model is not rigidly defined is actually important and useful - a rigidly defined garbage collection model might be impossible to implement on all platforms. Similarly, it might preclude useful optimizations and hurt the performance of the platform in the long term.
In your example, the test variable becomes "invisible" (see A.3.3 of the above) in the while loop. At this point some JVMs will continue to view the variable as containing a "hard reference", and other JVMs will treat it as if the variable has been nulled. Either behaviour is acceptable for a compliant JVM.
Quoting from the JLS edition 3 (section 12.6.1 paragraph 2):
A reachable object is any object that can be accessed in any potential continuing computation from any live thread.
Notice that reachability is not defined in terms of scopes at all. The quoted text continues as follows:
Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner.
(My emphasis added.) This means that an object may be garbage collected, and finalization may occur, earlier or later than you would expect. It is also worth noting that some JVMs take more than one GC cycle before unreachable objects are finalized.
The bottom line is that a program that depends on finalization happening earlier or later is inherently non-portable, and to my mind buggy.

Slightly off-topic, but finalize() should never be used to close() a file. The language does not guarantee that finalize() will ever get called. Always use a try ... finally construct to guarantee file closure, database cleanup, etc.
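As a sketch of that advice, the following reads a file and closes it deterministically rather than relying on finalize(). It uses try-with-resources, which is available from Java 7 and behaves like a try ... finally that calls close(). The ReadAll class and readFully method are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper showing deterministic stream cleanup without finalize().
class ReadAll {
    static byte[] readFully(String path) throws IOException {
        // The stream is closed when this block exits, normally or via an exception.
        try (InputStream in = new FileInputStream(path)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            // read(buf) returns the number of bytes placed in buf, or -1 at end of file.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(readFully(args[0]).length + " bytes read");
        }
    }
}
```

On Java 6 and earlier the same effect needs an explicit finally block that calls close() on the stream.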

What are you observing that you find strange? Each time you execute run(), you create a new instance of Test. Once run completes, that instance of test is out of scope and eligible for garbage collection. Of course "eligible for garbage collection" and "is garbage collected" are not the same thing. I'd expect that if you run this program, you'd see a bunch of finalize messages scroll by as invocations of run complete. As the only console output I see is these messages, I don't see how you would know which instance of Test is being finalized when you see each message. You might get more interesting results if you added a println at the beginning of each invocation of run, and maybe even added a counter to the Test object that gets incremented each time a new one is created, and which is output with the finalize message. Then you could see what was really happening. (Well, maybe you're running this with a debugger, but that could also obscure more.)
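The counter idea could look roughly like this. Tracked is a hypothetical stand-in for the Test class in the question, with an id assigned at construction and echoed from finalize(). Whether and when the finalize lines appear depends on the JVM, so no particular output is promised; note also that finalize() is deprecated in newer Java releases.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the question's Test class, with a per-instance id.
class Tracked {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    // Each instance gets the next id, starting from 1.
    private final int id = NEXT_ID.incrementAndGet();

    int id() {
        return id;
    }

    @Override
    protected void finalize() {
        // Identifies which instance is being finalized, if the JVM runs finalizers at all.
        System.out.println("finalize #" + id);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            Tracked t = new Tracked();
            System.out.println("run #" + t.id() + " started");
        }
        System.gc(); // only a hint; finalizers may run later or never
    }
}
```

Matching the "run" and "finalize" ids in the output shows whether an instance was finalized while its run was still in progress.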

As test is only used once, it can be removed immediately after the call to it. Even if each call to read used a call to getInputStream instead of using the local is variable, use of the object could be optimised away. FileInputStream cannot be finalised prematurely due to its use of locking. Finalisers are difficult.
In any case, your finaliser is pointless. The underlying FileInputStream will close itself on finalisation anyway.

In theory, Test should no longer be in scope, since it is local to the run() method, and local variables should become eligible for garbage collection once you leave the method. However, you are storing the results in a list, and I have read somewhere that lists are prone to storing weak references that are not garbage collected easily (depending on the JVM implementation).
