When can Hotspot allocate objects on the stack? [duplicate] - java

This question already has answers here:
Eligibility for escape analysis / stack allocation with Java 7
(3 answers)
Closed 5 years ago.
Since somewhere around Java 6, the Hotspot JVM can do escape analysis and allocate non-escaping objects on the stack instead of on the garbage collected heap. This results in a speedup of the generated code and reduces pressure on the garbage collector.
What are the rules for when Hotspot is able to stack allocate objects? In other words when can I rely on it to do stack allocation?
edit: This question is a duplicate, but (IMO) the answer below is a better answer than what is available at the original question.

I have done some experimentation in order to see when Hotspot is able to stack allocate. It turns out that its stack allocation is quite a bit more limited than what you might expect based on the available documentation. The referenced paper by Choi "Escape Analysis for Java" suggests that an object that is only ever assigned to local variables can always be stack allocated. But that is not true.
All of these are implementation details of the current Hotspot implementation, so they could change in future versions. This refers to my OpenJDK install, which is version 1.8.0_121 for x86-64.
The short summary, based on quite a bit of experimentation, seems to be:
Hotspot can stack-allocate an object instance if
all its uses are inlined
it is never assigned to any static or object fields, only to local variables
at each point in the program, it must be determinable at JIT time which local variables contain references to the object; this must not depend on any unpredictable conditional control flow.
If the object is an array, its size must be known at JIT time and indexing into it must use JIT-time constants.
To know when these conditions hold you need to know quite a bit about how Hotspot works. Relying on Hotspot to definitely do stack allocation in a certain situation can be risky, as a lot of non-local factors are involved. Especially knowing whether everything is inlined can be difficult to predict.
Practically speaking, simple iterators will usually be stack allocatable if you just use them to iterate. For composite objects only the outer object can ever be stack allocated, so lists and other collections always cause heap allocation.
If you have a HashMap<Integer,Something> and you use it in myHashMap.get(42), the boxed Integer for the 42 may be stack allocated in a test program, but it will not be in a full application, because you can be sure that there will be more than two types of key objects in HashMaps across the entire program, and therefore the hashCode and equals methods on the key won't inline.
Beyond that I don't see any generally applicable rules, and it will depend on the specifics of the code.
Hotspot internals
The first important thing to know is that escape analysis is performed after inlining. This means that Hotspot's escape analysis is in this respect more powerful than the description in the Choi paper, since an object returned from a method but local to the caller method can still be stack allocated. Because of this iterators can nearly always be stack allocated if you do e.g. for(Foo item : myList) {...} (and the implementation of myList.iterator() is simple enough, which they usually are.)
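Here is a minimal sketch (my own illustration, not from the original experiments) of that iterator case; once sum is hot and list.iterator() is inlined, the iterator instance is typically scalar replaced:

import java.util.List;

class IteratorExample {
    static long sum(List<Long> list) {
        long total = 0;
        // Desugars to list.iterator() / hasNext() / next(); with everything
        // inlined, the iterator object never escapes and can be scalar replaced.
        for (Long item : list) {
            total += item;
        }
        return total;
    }
}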
Hotspot only compiles optimized versions of methods once it determines the method is 'hot', so code that is not run a lot of times does not get optimized at all, in which case there is no stack allocation or inlining whatsoever. But for those methods you usually don't care.
Inlining
Inlining decisions are based on profiling data that Hotspot collects first. The declared types do not matter much: even if a method is virtual, Hotspot can inline it based on the types of the objects it sees during profiling. Something similar holds for branches (i.e. if-statements and other control flow constructs): if during profiling Hotspot never sees a certain branch being taken, it will compile and optimize the code based on the assumption that the branch is never taken. In both cases, if Hotspot cannot prove that its assumptions will always be true, it will insert checks in the compiled code known as 'uncommon traps', and if such a trap is hit Hotspot will de-optimize and possibly re-optimize taking the new information into account.
Hotspot will profile which object types occur as receivers at which call sites. If Hotspot only sees a single type or only two distinct types occurring at a call site, it is able to inline the called method. If there are only one or two very common types and other types occur much less often, Hotspot should also still be able to inline the methods of the common types, including a check for which code path it needs to take. (I'm not entirely sure about this last case with one or two common types and more uncommon types though.) If there are more than two common types, Hotspot will not inline the call at all but instead generate machine code for an indirect call.
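As a hedged illustration of such call sites (a sketch of my own, not from the Hotspot sources):

class Shape { long area() { return 0; } }
class Square extends Shape { long area() { return 9; } }
class Circle extends Shape { long area() { return 28; } }

static long totalArea(Shape[] shapes) {
    long sum = 0;
    // If profiling only ever sees Square here, the call inlines with a type
    // check (monomorphic). Square plus Circle can still inline both bodies
    // behind a dispatch check (bimorphic). Three or more common receiver
    // types produce an indirect, megamorphic call that blocks inlining.
    for (Shape s : shapes) sum += s.area();
    return sum;
}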
'Type' here refers to the exact type of an object. Implemented interfaces or shared superclasses are not taken into account. Even if different receiver types occur at a call site but they all inherit the same implementation of a method (e.g. multiple classes that all inherit hashCode from Object), Hotspot will still generate an indirect call and not inline. (So IMO Hotspot is quite stupid in such cases. I hope future versions improve this.)
Hotspot will also only inline methods that are not too big. 'Not too big' is determined by the -XX:MaxInlineSize=n and -XX:FreqInlineSize=n options. Inlinable methods with a JVM bytecode size below MaxInlineSize are always inlined, methods with a JVM bytecode size below FreqInlineSize are inlined if the call is 'hot'. Larger methods are never inlined. By default MaxInlineSize is 35 and FreqInlineSize is platform dependent but for me it is 325. So make sure your methods are not too big if you want them inlined. It can sometimes help to split out the common path from a large method, so that it can be inlined into its callers.
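If you want to verify what actually gets inlined, Hotspot can print its inlining decisions with diagnostic flags (the exact message wording varies between JVM versions); for the test program shown further below:

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Scalarization

Calls reported as too large or not inlinable will also block scalar replacement of any objects flowing through them.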
Profiling
One important thing to know about profiling is that profiling sites are based on the JVM bytecode, which itself is not inlined in any way. So if you have e.g. a static method
static <T,U> List<U> map(List<T> list, Function<T,U> func) {
    List<U> result = new ArrayList<>();
    for (T item : list) { result.add(func.apply(item)); }
    return result;
}
that maps a Function over a list and returns the transformed list, Hotspot will treat the call to func.apply as a single program-wide call site. You might call this map function at several spots in your program, passing a different func at each call site (but the same one for any given call site). In that case you might expect that Hotspot is able to inline map, and then also the call to func.apply, since at every use of map there is only a single func type. If this were so, Hotspot would be able to optimize the loop down very tightly. Unfortunately Hotspot is not smart enough for that. It only keeps a single profile for the func.apply call site, lumping all the func types that you pass to map together. You will probably use more than two different implementations of func, so Hotspot will not be able to inline the call to func.apply. Link for more details, and archived link as the original appears to be gone.
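To make the profile pollution concrete, here is a hedged illustration (words is a hypothetical List<String>):

// Each call site passes exactly one Function implementation, but inside map
// both lambdas feed the single program-wide func.apply call site, so its
// profile sees multiple receiver types and the call cannot be inlined.
List<Integer> lengths = map(words, s -> s.length());
List<String> upper = map(words, s -> s.toUpperCase());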
(As an aside, in Kotlin the equivalent loop can be fully inlined as the Kotlin compiler can do inlining of calls at the bytecode level. So for some uses it could be significantly faster than Java.)
Scalar Replacement
Another important thing to know is that Hotspot does not actually implement stack allocation of objects. Instead it implements scalar replacement, which means that an object is deconstructed into its constituent fields and those fields are stack allocated like normal local variables. This means that there is no object left at all. Scalar replacement only works if there is never a need to create a pointer to the stack-allocated object. Some forms of stack allocation in e.g. C++ or Go would be able to allocate full objects on the stack and then pass references or pointers to them to called functions, but in Hotspot this does not work. Therefore if there is ever a need to pass an object reference to a non-inlined method, even if the reference would not escape the called method, Hotspot will always heap-allocate such an object.
In principle Hotspot could be smarter about this, but right now it is not.
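Conceptually (my own sketch of the effect, not actual compiler output), scalar replacement rewrites a hot allocation like the one in the test program below into plain locals:

// Source level:
//     Scalarization s = new Scalarization();
//     ctr = s.foo(ctr);
// After inlining and scalar replacement no object exists at all:
//     int sField = 0xbd;     // s.field lives on as an ordinary local
//     ctr = ctr * sField;    // the inlined body of s.foo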
Test program
I used the following program and variations to see when Hotspot will do scalar replacement.
// Minimal example for which the JVM does not scalarize the allocation. If field is final, or the second allocation is unconditional, it will.
class Scalarization {
    int field = 0xbd;

    long foo(long i) { return i * field; }

    public static void main(String[] args) {
        long result = 0;
        for (long i = 0; i < 100; i++) {
            result += test();
        }
        System.out.println("Result: " + result);
    }

    static long test() {
        long ctr = 0x5;
        for (long i = 0; i < 0x10000; i++) {
            Scalarization s = new Scalarization();
            ctr = s.foo(ctr);
            if (i == 0) s = new Scalarization();
            ctr = s.foo(ctr);
        }
        return ctr;
    }
}
If you compile and run this program with javac Scalarization.java; java -verbose:gc Scalarization you can see whether scalar replacement worked from the number of garbage collections: if scalar replacement works, no garbage collection happens on my system; if it does not, I see a few garbage collections.
Variants that Hotspot is able to scalarize run significantly faster than versions where it does not. I verified the generated machine code (instructions) to make sure Hotspot was not doing any unexpected optimizations. If Hotspot is able to scalar replace the allocations, it can then also do some additional optimizations on the loop, unrolling it a few iterations and then combining those iterations together. So in the scalarized versions the effective loop count is lower, with each iteration doing the work of multiple source-code-level iterations. So the speed difference is not only due to allocation and garbage collection overhead.
Observations
I tried a number of variations on the above program. One condition for scalar replacement is that the object must never be assigned to an object (or static) field, and presumably also not into an array. So in code like
Foo f = new Foo();
bar.field = f;
the Foo object cannot be scalar replaced. This holds even if bar itself is scalar replaced, and also if you never again use bar.field. So an object can only ever be assigned to local variables.
That alone is not enough; Hotspot must also be able to determine statically, at JIT time, which object instance will be the target of a call. For example, using the following implementations of foo and test and removing field causes heap allocation:
long foo(long i) { return i * 0xbb; }

static long test() {
    long ctr = 0x5;
    for (long i = 0; i < 0x10000; i++) {
        Scalarization s = new Scalarization();
        ctr = s.foo(ctr);
        if (i == 50) s = new Scalarization();
        ctr = s.foo(ctr);
    }
    return ctr;
}
While if you then remove the conditional from the second assignment, no more heap allocation occurs:
static long test() {
    long ctr = 0x5;
    for (long i = 0; i < 0x10000; i++) {
        Scalarization s = new Scalarization();
        ctr = s.foo(ctr);
        s = new Scalarization();
        ctr = s.foo(ctr);
    }
    return ctr;
}
In this case Hotspot can determine statically which instance is the target for each call to s.foo.
On the other hand, even if the second assignment to s is a subclass of Scalarization with a completely different implementation, as long as the assignment is unconditional Hotspot will still scalarize the allocations.
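A hedged sketch of that subclass case (Scalarization2 is my own illustration):

class Scalarization2 extends Scalarization {
    @Override
    long foo(long i) { return i + field; }
}

static long test() {
    long ctr = 0x5;
    for (long i = 0; i < 0x10000; i++) {
        Scalarization s = new Scalarization();
        ctr = s.foo(ctr);
        s = new Scalarization2();  // unconditional, different concrete type
        ctr = s.foo(ctr);          // receiver still statically known here
    }
    return ctr;
}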
Hotspot does not appear to be able to move an object to the heap that was previously scalar replaced (at least not without deoptimizing). Scalar replacement is an all-or-nothing affair. So in the original test method both allocations of Scalarization always happen on the heap.
Conditionals
One important detail is that Hotspot will predict conditionals based on its profiling data. If a conditional assignment is never executed, Hotspot will compile code under that assumption, and then might be able to do scalar replacement. If at a later point in time the condition does get taken, Hotspot will need to recompile the code with this new assumption. The new code will not do scalar replacement, since Hotspot can no longer statically determine the receiver instance of the following calls.
For instance in this variant of test:
static long limit = 0;

static long test() {
    long ctr = 0x5;
    long i = limit;
    limit += 0x10000;
    // Whether scalarization happens in this form is nondeterministic: if the
    // condition is hit before profiling starts, scalarization happens; otherwise not.
    for (; i < limit; i++) {
        Scalarization s = new Scalarization();
        ctr = s.foo(ctr);
        if (i == 0xf9a0) s = new Scalarization();
        ctr = s.foo(ctr);
    }
    return ctr;
}
the conditional assignment is only executed once during the lifetime of the program. If this assignment occurs early enough, before Hotspot starts full profiling of the test method, Hotspot never notices the conditional being taken and compiles code that does scalar replacement. If profiling has already started when the conditional is taken, Hotspot will not do scalar replacement. With the test value of 0xf9a0, whether scalar replacement happens is nondeterministic on my computer, since exactly when profiling starts can vary (e.g. because profiling and optimized code are compiled on background threads). So if I run the above variant it sometimes does a few garbage collections, and sometimes does not.
Hotspot's static code analysis is much more limited than what C/C++ and other static compilers can do, so Hotspot is not as smart in following the control flow in a method through several conditionals and other control structures to determine the instance that a variable refers to, even if it would be statically determinable for the programmer or a smarter compiler. In many cases the profiling information will make up for that, but it is something to be aware of.
Arrays
Arrays can be stack allocated if their size is known at JIT time. However, indexing into an array is not supported unless Hotspot can also statically determine the index value at JIT time. So stack allocated arrays are pretty useless. Since most programs don't use arrays directly but use the standard collections, this is not very relevant: embedded objects, such as the array holding the data inside an ArrayList, need to be heap-allocated anyway because they are referenced from a heap object. I suppose the reasoning for this restriction is that no indexing operation exists on local variables, so supporting it would require additional code generation functionality for a pretty rare use case.
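A hedged sketch (my own, following the same test pattern as above) of an array that could be scalar replaced, and the kind of indexing that defeats it:

static long arrayTest() {
    long ctr = 0x5;
    for (long i = 0; i < 0x10000; i++) {
        long[] pair = new long[2];     // size is a JIT-time constant
        pair[0] = ctr;                 // constant indices: scalarizable
        pair[1] = i;
        ctr = pair[0] + pair[1];
        // pair[(int) (i & 1)] = ctr;  // a variable index like this forces heap allocation
    }
    return ctr;
}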

Are static methods/fields/blocks part of metaspace? Is metaspace a part of the heap, and is it in native memory?

Here in 2021, with let's say Java 13:
Are static methods/fields/blocks part of metaspace? Is metaspace a part of the heap, and is it in native memory?
(I've read many topics here, dating from 2011 in the PermGen days, so I want to know how it is in 2021 and Java 13.)
static methods and blocks aren't a thing in the same way fields are. Thus, you've asked 2 utterly unrelated questions:
Where do methods and other code go, static or not?
Where do (static) fields go?
Where do methods and other code go?
Think about it: A method is just a block of code, and it is static; even a non-static method is the same actual 'content' for any instance. It's just that in a non-static method, any reference to 'a field' is syntax sugared to this.x, and the this ref points at a different object.
There is no functional difference between a and b here:
class Foo {
    int x;

    public void a() {
        System.out.println(this.x);
    }

    public static void b(Foo instance) {
        System.out.println(instance.x);
    }
}
So, all methods and blocks are in this sense 'static': They exist only once in memory no matter how many instances exist, and regardless of whether a method is static or not.
It would be an utter waste of gigantic amounts of memory if e.g. having a few million instances of java.lang.String in memory meant that your computer is holding a few million copies of the toLowerCase() method in memory.
So, that's not how it works. There'll be only one toLowerCase() in memory. Even though that is not a static method.
What's in memory, specifically, is the entire class, as in, the bytecode of it. In addition, more can be in memory: Java has a so-called hotspot compiler, which means that Java keeps continuous track of various statistics about a method: how often it is invoked, for example, or whether it is overridable (not marked final, not private, and not in a final class) but never actually overridden, as in, no loaded class does that. That's all tracked. From time to time the JVM will take a moment and do a fairly intelligent rewrite of a method into optimized machine code, making assumptions based on that bookkeeping. For example, it'll 'hardcode' links to methods that could be overridden but never are, but it will then invalidate these optimized machine code blocks if later on these conclusions cease to be true (for example, now you DO use a class that overrides that method).
The point is: the original bytecode must remain, as a hotspotted take may become invalid later; but the whole point of hotspotting is to keep the optimized machine code (the hotspotted code) around for future executions as well. So now there are 2 separate 'takes' on the same method in memory somewhere: the basic bytecode, and the optimized variant of it.
Where all this goes is not specified. Who knows where it goes - the java language spec and the JVM spec simply do not state it. Note that the command line options of java (the executable) aren't in any spec either. Certainly the -X and the -XX options aren't specced at all. The idea that there is a hotspotted variant isn't specced either; it's just how just about every JVM implementation out there operates.
So where does it go? You'd have to peruse the manual of your JVM implementor. It's not something that fits within the domain of 'a java question'. However, generally, yes, that is precisely what 'metaspace' / 'permgen' are about.
Where do static fields go?
On heap. They do not exist in permgen or metaspace. It's just that they are 'associated' with the instance of the java.lang.Class, effectively (I'm oversimplifying a tad), instead of any particular instance. That Class is never getting unloaded unless you're using dynamic classloading, and therefore, that variable is never eligible for garbage collection as you'd expect. Nevertheless, the ref exists in heap.

Are object initializations in Java "Foo f = new Foo() " essentially the same as using malloc for a pointer in C?

I am trying to understand the actual process behind object creations in Java - and I suppose other programming languages.
Would it be wrong to assume that object initialization in Java is the same as when you use malloc for a structure in C?
Example:
Foo f = new Foo(10);              // Java

typedef struct foo Foo;           /* C */
Foo *f = malloc(sizeof(Foo));
Is this why objects are said to be on the heap rather than the stack? Because they are essentially just pointers to data?
In C, malloc() allocates a region of memory in the heap and returns a pointer to it. That's all you get. Memory is uninitialized and you have no guarantee that it's all zeros or anything else.
In Java, calling new does a heap based allocation just like malloc(), but you also get a ton of additional convenience (or overhead, if you prefer). For example, you don't have to explicitly specify the number of bytes to be allocated. The compiler figures it out for you based on the type of object you're trying to allocate. Additionally, object constructors are called (which you can pass arguments to if you'd like to control how initialization occurs). When new returns, you're guaranteed to have an object that's initialized.
But yes, at the end of the call both the result of malloc() and new are simply pointers to some chunk of heap-based data.
The second part of your question asks about the differences between a stack and a heap. Far more comprehensive answers can be found by taking a course on (or reading a book about) compiler design. A course on operating systems would also be helpful. There are also numerous questions and answers on SO about stacks and heaps.
Having said that, I'll give a general overview I hope isn't too verbose and aims to explain the differences at a fairly high level.
Fundamentally, the main reason to have two memory management systems, i.e. a heap and a stack, is for efficiency. A secondary reason is that each is better at certain types of problems than the other.
Stacks are somewhat easier for me to understand as a concept, so I start with stacks. Let's consider this function in C...
int add(int lhs, int rhs) {
    int result = lhs + rhs;
    return result;
}
The above seems fairly straightforward. We define a function named add() and pass in the left and right addends. The function adds them and returns a result. Please ignore all edge-case stuff such as overflows that might occur, at this point it isn't germane to the discussion.
The add() function's purpose seems pretty straightforward, but what can we tell about its lifecycle? Especially its memory utilization needs?
Most importantly, the compiler knows a priori (i.e. at compile time) how large the data types are and how many will be used. The lhs and rhs arguments are sizeof(int), 4 bytes each. The variable result is also sizeof(int). The compiler can tell that the add() function uses 4 bytes * 3 ints or a total of 12 bytes of memory.
When the add() function is called, a hardware register called the stack pointer will have an address in it that points to the top of the stack. In order to allocate the memory the add() function needs to run, all the function-entry code needs to do is issue one single assembly language instruction to decrement the stack pointer register value by 12. In doing so, it creates storage on the stack for three ints, one each for lhs, rhs, and result. Getting the memory space you need by executing a single instruction is a massive win in terms of speed, because single instructions tend to execute in one clock tick (one billionth of a second on a 1 GHz CPU).
Also, from the compiler's view, it can create a map to the variables that looks an awful lot like indexing an array:
lhs: ((int *)stack_pointer_register)[0]
rhs: ((int *)stack_pointer_register)[1]
result: ((int *)stack_pointer_register)[2]
Again, all of this is very fast.
When the add() function exits it has to clean up. It does this by adding the 12 bytes back to the stack pointer register. It's similar to a call to free(), but it only uses one CPU instruction and it only takes one tick. It's very, very fast.
Now consider a heap-based allocation. This comes into play when we don't know a priori how much memory we're going to need (i.e. we'll only learn about it at runtime).
Consider this function:
#include <stdlib.h>   /* malloc, free, random */

int addRandom(int count) {
    int numberOfBytesToAllocate = sizeof(int) * count;
    int *array = malloc(numberOfBytesToAllocate);
    int result = 0;
    if (array != NULL) {
        for (int i = 0; i < count; ++i) {
            array[i] = (int) random();
            result += array[i];
        }
        free(array);
    }
    return result;
}
Notice that the addRandom() function doesn't know at compile time what the value of the count argument will be. Because of this, it doesn't make sense to try to define array like we would if we were putting it on the stack, like this:
int array[count];
If count is huge it could cause our stack to grow too large and overwrite other program segments. When this stack overflow happens your program crashes (or worse).
So, in cases where we don't know how much memory we'll need until runtime, we use malloc(). Then we can just ask for the number of bytes we need when we need it, and malloc() will go check if it can vend that many bytes. If it can, great, we get it back; if not, we get a NULL pointer, which tells us the call to malloc() failed. Notably though, the program doesn't crash! Of course you as the programmer can decide that your program isn't allowed to run if resource allocation fails, but programmer-initiated termination is different from a spurious crash.
So now we have to come back to look at efficiency. The stack allocator is super fast - one instruction to allocate, one instruction to deallocate, and it's done by the compiler, but remember the stack is meant for things like local variables of a known size so it tends to be fairly small.
The heap allocator on the other hand is several orders of magnitude slower. It has to do a lookup in tables to see if it has enough free memory to be able to vend the amount of memory the user wants. It has to update those tables after it vends the memory to make sure no one else can use that block (this bookkeeping may require the allocator to reserve memory for itself in addition to what it plans to vend). The allocator has to employ locking strategies to make sure it vends memory in a thread-safe way. And when memory is finally free()d, which happens at different times and in no predictable order typically, the allocator has to find contiguous blocks and stitch them back together to repair heap fragmentation. If that sounds like it's going to take more than a single CPU instruction to accomplish all of that, you're right! It's very complicated and it takes a while.
But heaps are big. Much larger than stacks. We can get lots of memory from them and they're great when we don't know at compile time how much memory we'll need. So we trade off speed for a managed memory system that declines us politely instead of crashing when we try to allocate something too large.
I hope that helps answer some of your questions. Please let me know if you'd like clarification on any of the above.

How fast or space-efficient if I don't use local variable with Java

I wonder if these two methods have any difference.
private int getSum(int a, int b) {
    int total = a + b;
    return total;
}

private int getSum2(int a, int b) {
    return a + b;
}
What I have learned is that a compiler or interpreter will automatically optimize this. Is it right?
This difference will exist in the bytecode, and its impact is, of course, implementation dependent.
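For illustration, this is roughly what javap -c prints for the two methods (a sketch; exact output depends on the javac version):

// getSum:  iload_1, iload_2, iadd, istore_3, iload_3, ireturn
// getSum2: iload_1, iload_2, iadd, ireturn

The extra istore_3/iload_3 pair for total is exactly the difference the optimizer later erases.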
Your additional int variable can't use the space of another variable with disjoint scope, so when executed (interpreted) literally, it raises the required size of the stack frame by four bytes. However, environments like the widely used HotSpot JVM pre-allocate the entire stack space of a thread when it is started and do not support resizing, so with this implementation, the local variable has no impact on memory consumption at all.
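Since the whole stack is reserved when the thread starts, its size is fixed at thread creation; on HotSpot you control it with the standard -Xss option (MyApp is a placeholder class name):

java -Xss512k MyApp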
You could be tempted to say that if you add a local variable to a recursive method, the additional required memory may add up, reducing the maximum recursion depth by a few invocations (in interpreted mode), but as discussed in this answer, there is already an indeterminism in the maximum recursion depth larger than that.
That answer also demonstrates the impact of compilation/optimization on the required stack space, reducing it by a factor of six in the example. When your method is a hot spot, the optimizer will have a look at it, and one of the tools in its box is a transformation into SSA form, in which variables exist purely to model the data transfers. In other words, the differences between these variants are already eliminated in the intermediate representation. From there, the values are likely mapped to CPU registers when generating the native code, so the result will not require stack space at all.
Though it is even likelier that such a small method gets inlined into the caller each time, where the operation gets fused with whatever the caller does, resulting in code we can't predict by just looking at the method. In either case, whether you use a temporary local variable or not is irrelevant.

Which code is more CPU/memory efficient when used with a Garbage Collected language?

I have these two dummy pieces of code (let's consider they are written in either Java or C#, all variables are local):
Code 1:
int a;
int b = 0;
for (int i = 1; i < 10; i++)
{
    a = 10;
    b += i;
    // a lot more code that doesn't involve assigning new values to "a"
}
Code 2:
int b = 0;
for (int i = 1; i < 10; i++)
{
    int a = 10;
    b += i;
    // a lot more code that doesn't involve assigning new values to "a"
}
At first glance I would say both codes consume the same amount of memory, but Code 1 is more CPU efficient because it creates and allocates variable a just once.
Then I read that Garbage Collectors are extremely efficient, to the point that Code 2 would be the more memory (and CPU?) efficient: keeping variable a inside the loop makes it belong to Gen0, so it would be garbage collected before variable b.
So, when used with a Garbage Collected language, Code 2 is the more efficient. Am I right?
A few points:
ints (and other primitives) are never allocated on the heap. They live directly on the thread stack; "allocation" and "deallocation" are simple moves of a pointer, and happen once (when the function is entered, and immediately on return), regardless of scope.
primitives, that are accessed often, are usually stored in a register for speed, again, regardless of scope.
in your case a (and possibly b as well, together with the whole loop) will be "optimized away"; the optimizer is smart enough to detect a situation where a variable's value changes but is never read, and skip the redundant operations. Or, if there is code that actually looks at a but does not modify it, a will likely be replaced by the optimizer with the constant value 10, which will just appear inline everywhere a is referenced.
New objects (if you did something like String a = new String("foo") for example, instead of int) are always allocated in the young generation, and only get transferred into the old gen after they survive a few minor collections. This means that, in most cases, when an object is allocated inside a function and never referenced from outside, it will never make it to the old gen regardless of its exact scope, unless your heap structure desperately needs tuning.
As pointed out in the comment, sometimes the VM might decide to allocate a large object directly in the old gen (this is true for Java too, not just .NET), so the point above applies in most cases, but not always. However, in relation to this question, this does not make any difference, because if the decision is made to allocate an object in the old gen, it is made without regard to the scope of its initial reference anyway.
From performance and memory standpoint your two snippets are identical. From the readability perspective though, it is always a good idea to declare all variables in the narrowest possible scope.
Before the code in snippet 2 is actually executed it's going to end up being transformed to look like the code in snippet 1 behind the scenes (whether it be a compiler or runtime). As a result, the performance of the two snippets is going to be identical, as they'll compile into functionally the same code at some point.
Note that for very short lived variables it's actually quite possible for them to not have memory allocated for them at all. They may well be stored entirely in a register, involving 0 memory allocation.

Is unused object available for garbage collection when it's still visible in stack?

In the following example there are two functionally equivalent methods:
public class Question {
    public static String method1() {
        String s = new String("s1");
        // some operations on s1
        s = new String("s2");
        return s;
    }

    public static String method2() {
        final String s1 = new String("s1");
        // some operations on s1
        final String s2 = new String("s2");
        return s2;
    }
}
however in the first of them (method1) the string "s1" is clearly available for garbage collection before the return statement. In the second (method2) the string "s1" is still reachable (though from a code review perspective it's not used anymore).
My question is: is there anything in the JVM spec which says that once a variable is unused for the remainder of the method, the object it references could become available for garbage collection?
EDIT:
Sometimes variables can refer to large objects, like a fully rendered image, and that has an impact on memory.
I'm asking because of practical considerations. I have a large chunk of memory-greedy code in one method and am wondering if I could help the JVM (a bit) just by splitting this method into a few small ones.
I really prefer code where no reassignment is done since it's easier to read and reason about.
UPDATE: per JLS 12.6.1:
Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner
So it looks like it's possible for the GC to reclaim an object which is still visible. I doubt, however, that this optimisation is done during offline compilation (it would screw up debugging); most likely it will be done by the JIT.
No, because your code could conceivably retrieve it and do something with it, and the abstract JVM does not consider what code is coming ahead. However, a very, very, very clever optimizing JVM might analyze the code ahead and find that there is no way s1 could ever be referenced, and garbage collect it. You definitely can't count on this, though.
If you're talking about the interpreter, then in the second case S1 remains "referenced" until the method exits and the stack frame is rolled up. (That is, in the standard interpreter -- it's entirely possible for GC to use liveness info from method verification. And, in addition (and more likely), javac may do its own liveness analysis and "share" interpreter slots based on that.)
In the case of the JITC, however, an even mildly optimizing one might recognize that S1 is unused and recycle that register for S2. Or it might not. The GC will examine register contents, and if S1 has been reused for something else then the old S1 object will be reclaimed (if not otherwise referenced). If the S1 location has not been reused then the S1 object might not be reclaimed.
"Might not" because, depending on the JVM, the JITC may or may not provide the GC with a map of where object references are "live" in the program flow. And this map, if provided, may or may not precisely identify the end of the "live range" (the last point of reference) of S1. Many different possibilities.
Note that this potential variability does not violate any Java principles -- GC is not required to reclaim an object at the earliest possible opportunity, and there's no practical way for a program to be sensitive to precisely when an object is reclaimed.
The VM is free to optimize the code to null out s1 before method exit (as long as it's correct), so s1's object might be eligible for garbage collection earlier.
However that is hardly necessary. Many method invocations must have happened before the next GC; all the stack frames have been cleared anyway, no need to worry about a specific local variable in a specific method invocation.
As far as Java the language is concerned, garbage can live forever without impacting program semantics. That's why the JLS hardly talks about garbage at all.
in first of them string "s1" is clearly available for garbage collection before return statement
It isn't clear at all. I think you are confusing 'unused' with 'unreachable'. They aren't necessarily the same thing.
Formally speaking the variable is live until its enclosing scope terminates, so it isn't available for garbage collection until then.
However "a Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner" JLS #12.6.1.
Basically, stack frames and the static area are considered roots by the GC. So if an object is referenced from any stack frame, it's considered alive. The problem with reclaiming some objects from an active stack frame is that the GC works in parallel with the application (the mutator). How do you think the GC should find out that an object is unused while the method is in progress? That would require synchronization which would be VERY heavy and complex; in fact it would break the idea of the GC working in parallel with the mutator. Every thread might keep variables in processor registers. To implement your logic, those would also have to be added to the GC roots. I can't even imagine how to implement it.
To answer your question: if you have any logic which produces a lot of objects which are unused afterwards, separate it into a distinct method. This is actually a good practice.
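A hedged sketch of that practice (renderPage and countWords are hypothetical helpers):

static int pageWordCount() {
    byte[] bigBuffer = renderPage();  // memory-greedy temporary
    return countWords(bigBuffer);
}
// Once pageWordCount() returns, its whole stack frame is gone, so bigBuffer
// becomes unreachable regardless of any liveness analysis within the frame.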
You should also take into account optimizations by the JVM (like EJP pointed out). There is also escape analysis, which might prevent an object from being heap allocated at all. But relying on these for your code's performance is a bad practice.
