Garbage collection on intern'd strings, String Pool, and perm-space - java

After exploring Java's string internals I've grown confused about what is referred to as the "perm space." My initial understanding was that it held String literals as well as class metadata, as explained in this question.
I've also read about the String.intern() method and that it places Strings into the String Pool, returning a reference to a unique instance. It is my understanding that this is the same string pool holding String literals that exists in the JVM's perm-space. It didn't seem possible to me that the "perm-space" could be modifiable (it is permanent after all, yes?). But then I found this question, where the top-voted comment by EJP on the accepted answer explains that
Intern'd strings have been GC-able for quite some years now.
Implying that the GC runs on the perm-space which doesn't seem very permanent. How does this reconcile? Does the GC check everything in the perm-space? Does the GC check everything in the string pool including string literals from the source? Is there a second string pool for intern'd strings? Does the GC know only to look over intern'd strings when collecting? Or is this comment mistaken and intern'ing a string prevents it from ever being GC'd (which I hope is not the case)?

String literals are interned. As of Java 7, the HotSpot JVM puts interned Strings in the heap, not permgen.
Prior to Java 7, HotSpot put interned Strings in permgen. However, interned Strings in permgen were garbage collected. Apparently, Class objects in permgen are also collectable, so everything in permgen is collectable, though permgen collection might not be enabled by default in some old JVMs.
String literals, being interned, would be a reference held by the declaring Class object to the String object in the intern pool. So the interned literal String would only be collected if the Class object that referred to it were also collected.
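The identity rules described above can be checked with plain reference comparisons. This is a minimal sketch; the class and method names are made up for illustration and are not part of any library.

```java
// Sketch: reference identity of string literals vs. explicitly interned strings.
class InternIdentityDemo {

    // Two occurrences of the same literal resolve to one pooled String object.
    static boolean literalsShareOneInstance() {
        String a = "perm";
        String b = "perm";
        return a == b;
    }

    // new String(...) always creates a separate object with equal content.
    static boolean newStringIsSeparateObject() {
        String literal = "perm";
        String copy = new String("perm");
        return literal != copy && literal.equals(copy);
    }

    // intern() on the copy returns the instance already in the pool (the literal).
    static boolean internReturnsPooledInstance() {
        String literal = "perm";
        String copy = new String("perm");
        return copy.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneInstance());    // true
        System.out.println(newStringIsSeparateObject());   // true
        System.out.println(internReturnsPooledInstance()); // true
    }
}
```

None of this says where the pooled object lives (permgen or heap); it only shows that literals and intern() results share one instance per distinct value.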

Related

Java: does pool of strings saves string literals that [duplicate]

In Java, when an object has got no live reference, it is eligible for garbage collection. Now in case of a string, this is not the case because the string will go into the string pool and JVM will keep the object alive for re-use.
So that means a string once created will 'never' be garbage collected?
First, it is only string literals (see notes) that get automatically interned / added to the string pool. String objects that are created by an application at runtime are not interned ... unless your application explicitly calls String.intern().
Second, in fact the rules for garbage collecting objects in the string pool are the same as for other String objects: indeed all objects. They will be garbage collected if the GC finds them to be unreachable.
In practice, the String objects that correspond to string literals typically do not become candidates for garbage collection. This is because there is an implicit reference to the String object in the code of every method that uses the literal. This means that the String is reachable for as long as the method could be executed.
However, this is not always the case. If a string literal was defined in a class that was dynamically loaded (e.g. using Class.forName(...)), then it is possible to arrange that the class is unloaded. If that happens, then the String object corresponding to the literal may then be unreachable, and may ultimately be GC'ed.
See also: When and how are classes garbage collected in Java?
Notes:
A string literal (JLS 3.10.5) is a string that appears in Java source code; e.g.
"abc" // string literal
new String(...) // not a string literal
A string produced by evaluation of (compile-time) constant expression (JLS 15.28) may also be interned.
"abc" + 123 // this is a constant expression
Strictly speaking, not all String literals are interned:
If a String literal only appears in the source code as a sub-expression of a constant expression, then the literal may not appear in the ".class" file in any form. Such a literal won't be interned because it won't exist at runtime.
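The constant-expression notes above can be illustrated with a small sketch. The class name and the literal values are chosen only for illustration; the behaviour shown follows from the JLS rules on constant expressions and constant variables.

```java
// Sketch: compile-time constant expressions are folded and interned;
// concatenations involving non-constant values produce new objects.
class ConstantExpressionDemo {

    // "abc" + 123 is a constant expression, so the compiler emits the single
    // constant "abc123"; the sub-expression "abc" need not exist at runtime.
    static boolean foldedAtCompileTime() {
        return ("abc" + 123) == "abc123";
    }

    // A final String initialised with a constant is a constant variable,
    // so this concatenation is folded as well.
    static boolean finalVariableIsConstant() {
        final String prefix = "abc";
        return (prefix + 123) == "abc123";
    }

    // n is not declared final, so the concatenation happens at runtime and
    // yields a new String that is not the pooled one until it is interned.
    static boolean nonConstantIsNewObject() {
        int n = 123;
        String s = "abc" + n;
        return s != "abc123" && s.intern() == "abc123";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime());     // true
        System.out.println(finalVariableIsConstant()); // true
        System.out.println(nonConstantIsNewObject());  // true
    }
}
```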
In Java 9+, string concatenations involving literals and values that are not compile time constants may be handled differently. Now, at the option of the bytecode compiler, a string concatenation like the following:
int x = 42; // not a compile time constant
String s = "prefix " + x + " suffix";
may result in a string constant like the following being interned:
"prefix \1 suffix"
At runtime, the above string constant is used as the "recipe" for generating a dynamic concatenation method. The original string literals (i.e. "prefix " and " suffix") would not turn into interned string objects.
Kudos to @Holger for pointing this out. More details are in JEP 280 and the javadoc for StringConcatFactory.
Prior to Java 7, the string pool was in PermGen. For some versions of Java, garbage collection of PermGen was not enabled by default if you selected the CMS collector. But CMS was never the default collector AND there was a flag to enable PermGen collection by CMS. (And nobody should be developing code for Java 6 and earlier anymore.)
You are correct; strings in the intern pool will never be GC'd.
However, most strings are not interned.
String literals are interned, and strings passed to String.intern() are interned, but all other strings are not interned and can be GC'd normally.
String objects which are in the string pool will not be garbage collected. Other String objects will be garbage collected once your program no longer holds a reference to them.
You may ask which string objects go into the string pool. Objects in the string pool are either:
Compile time literals (e.g. String s1 = "123";)
Interned String objects in the runtime (e.g. String s2 = new String("test").intern();)
Both s1 and s2 reference a string object in the string pool.
Any objects which are created at run time and not interned will act as a normal object and reside in heap memory. These objects can be garbage collected.
An example of this would be: String s3 = s1 + s2;
Here, s3 references a string object which resides in heap memory alongside other objects (not in the String pool).
Before Java 7 the string pool resided in Permanent Generation space. So string literals were never garbage collected (which also often led to out-of-memory errors).
From Java 7 onwards, the string pool is placed in heap space, which is garbage collected by the JVM. This also reduces the chance of running out of memory in the JVM.

how long can a String survive in the literal pool [duplicate]

I am reading about garbage collection and I am getting confusing search results when I search for String literal garbage collection.
I need clarification on following points:
1. If a string is defined as a literal at compile time [e.g: String str = "java"] then will it be garbage collected?
2. If I use the intern method [e.g: String str = new String("java").intern()] then will it be garbage collected? Also, will it be treated differently from the String literal in point 1?
3. In some places it is mentioned that literals will be garbage collected only when the String class is unloaded. Does that make sense? I don't think the String class will ever be unloaded.
If a string is defined as literal at compile time [e.g: String str = "java";] then will it be garbage collected?
Probably not. The code objects will contain one or more references to the String objects that represent the literals. So as long as the code objects are reachable, the String objects will be too.
It is possible for code objects to become unreachable, but only if they were dynamically loaded ... and their classloader is destroyed.
If I use the intern method [e.g: String str = new String("java").intern()] then will it be garbage collected?
The object returned by the intern call will be the same object that represents the "java" string literal. (The "java" literal is interned at class loading time. When you then intern the newly constructed String object in your code snippet, it will lookup and return the previously interned "java" string.)
However, interned strings that are not identical with string literals can be garbage collected once they become unreachable. The PermGen space is garbage collected on all recent HotSpot JVMs. (Prior to Java 8 ... which drops PermGen entirely.)
Also will it be treated differently from string literal in point 1.
No ... because it is the same object as the string literal.
And indeed, once you understand what is going on, it is clear that string literals are not treated specially either. It is just an application of the "reachability" rule ...
Some places it is mentioned that literals will be garbage collected only when String class will be unloaded? Does it make sense because I don't think the String class will ever be unloaded.
You are right. It doesn't make sense. The sources that said that are incorrect. (It would be helpful if you posted a URL so that we can read what they are saying for ourselves ...)
Under normal circumstances, string literals and classes are all allocated into the JVM's permanent generation ("PermGen"), and usually won't ever be collected. Strings that are interned (e.g. mystring.intern()) are stored in a memory pool owned by the String class in permgen, and it was once the case that aggressive interning could cause a space leak because the string pool itself held a reference to every string, even if no other references existed. Apparently this is no longer true, at least as of JDK 1.6 (see, e.g., here).
For more on permgen, this is a decent overview of the topic. (Note: that link goes to a blog associated with a product. I don't have any association with the blog, the company, or the product, but the blog entry is useful and doesn't have much to do with the product.)
The literal string will remain in memory as long as the program is in memory.
str will be garbage collected, but the literal it is created from will not.
That makes perfect sense, since the string class is unloaded when the program is unloaded.
The intern() method checks whether the object is available in the String pool. If the object/literal is available, a reference to it is returned. If the literal is not in the pool, the object is loaded into the perm area (String pool) and a reference to it is returned. We have to use the intern() method judiciously.

String Constant Pool memory sector and garbage collection

I read this question on the site, How is the java memory pool divided?, and I was wondering to which of these sectors the "String Constant Pool" belongs.
Also, do the String literals in the pool ever get GCed?
The intern() method returns the base link of the String literal from the pool.
If the pool does get GCed, wouldn't that be counter-productive to the idea of the string pool? New String literals would be created again, nullifying the GC.
(It is assuming that only a specific set of literals exist in the pool, they never go obsolete and sooner or later they will be needed again)
As far as I know String literals end up in the "Perm Gen" part of non-Heap JVM memory. Perm Gen space is only examined during Full GC runs (not Partials).
In early JVMs (and I confess I had to look this up because I wasn't sure), String literals in the String Pool never got GC'ed. In the newer JVMs, WeakReferences are used to reference the Strings in the pool, so interned Strings can actually get GC'ed, but only during Full Garbage collections.
Reading the JavaDoc for String.intern() doesn't give hints to the implementation, but according to this page, the interned strings are held by a weak reference. This means that if the GC detects that there are no references to the interned string except for the repository that holds interned strings then it is allowed to collect them. Of course this is transparent to external code so unless you are using weak references of your own you'll never know about the garbage collection.
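As the answer notes, you only see this collection if you hold a weak reference of your own. The following is a rough probe along those lines, not a guaranteed test: System.gc() is only a hint, so the result can differ between JVMs and collectors. The class and method names are hypothetical.

```java
import java.lang.ref.WeakReference;

// Sketch: track an interned, runtime-built string through a WeakReference
// and see whether the JVM clears it once no strong reference remains.
class InternedStringGcProbe {

    // Builds a string at runtime (so it is not a literal of any class),
    // interns it, and returns only a weak reference to the pooled instance.
    static WeakReference<String> internAndTrack() {
        String pooled = new StringBuilder("runtime-")
                .append(System.nanoTime())
                .toString()
                .intern();
        return new WeakReference<>(pooled);
    }

    // Requests GC up to 'attempts' times; returns true if the reference was cleared.
    static boolean clearedAfterGc(WeakReference<String> ref, int attempts) {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc(); // a request, not a guarantee
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        WeakReference<String> ref = internAndTrack();
        System.out.println("cleared after GC: " + clearedAfterGc(ref, 5));
    }
}
```

If the probe reports that the reference was cleared, the interned string was collected even though it had been placed in the pool, which is what the weak-reference explanation above predicts.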
String pooling
String pooling (sometimes also called as string canonicalisation) is a
process of replacing several String objects with equal value but
different identity with a single shared String object. You can achieve
this goal by keeping your own Map (with possibly soft
or weak references depending on your requirements) and using map
values as canonicalised values. Or you can use String.intern() method
which is provided to you by JDK.
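A minimal sketch of the "own Map with weak references" approach mentioned above, assuming a WeakHashMap whose values are WeakReferences back to the key. ManualStringPool and canonicalize are hypothetical names, not a JDK API.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch: a hand-rolled canonicalising pool as an alternative to String.intern().
class ManualStringPool {

    // Keys are held weakly by WeakHashMap; the value is also only a weak
    // reference to the key, so the map itself never keeps a string alive.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns the canonical instance equal to s, registering s if none exists yet.
    synchronized String canonicalize(String s) {
        WeakReference<String> ref = pool.get(s);
        String existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        pool.put(s, new WeakReference<>(s));
        return s;
    }

    public static void main(String[] args) {
        ManualStringPool pool = new ManualStringPool();
        String first = pool.canonicalize(new String("abc"));
        String second = pool.canonicalize(new String("abc"));
        System.out.println(first == second); // true
    }
}
```

Entries vanish once nothing else references the canonical string, so the pool cannot grow without bound the way a strongly referenced map would.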
At times of Java 6 using String.intern() was forbidden by many
standards due to a high possibility to get an OutOfMemoryError if
pooling went out of control. Oracle Java 7 implementation of string
pooling was changed considerably. You can look for details in
http://bugs.sun.com/view_bug.do?bug_id=6962931 and
http://bugs.sun.com/view_bug.do?bug_id=6962930.
String.intern() in Java 6
In those good old days all interned strings were stored in the PermGen
– the fixed size part of heap mainly used for storing loaded classes
and string pool. Besides explicitly interned strings, PermGen string
pool also contained all literal strings earlier used in your program
(the important word here is used – if a class or method was never
loaded/called, any constants defined in it will not be loaded).
The biggest issue with such string pool in Java 6 was its location –
the PermGen. PermGen has a fixed size and can not be expanded at
runtime. You can set it using -XX:MaxPermSize=96m option. As far as I
know, the default PermGen size varies between 32M and 96M depending on
the platform. You can increase its size, but its size will still be
fixed. Such limitation required very careful usage of String.intern –
you’d better not intern any uncontrolled user input using this method.
That’s why string pooling at times of Java 6 was mostly implemented in
the manually managed maps.
String.intern() in Java 7
Oracle engineers made an extremely important change to the string
pooling logic in Java 7 – the string pool was relocated to the heap.
It means that you are no longer limited by a separate fixed size
memory area. All strings are now located in the heap, as most of other
ordinary objects, which allows you to manage only the heap size while
tuning your application. Technically, this alone could be a sufficient
reason to reconsider using String.intern() in your Java 7 programs.
But there are other reasons.
String pool values are garbage collected
Yes, all strings in the JVM string pool are eligible for garbage
collection if there are no references to them from your program roots.
It applies to all discussed versions of Java. It means that if your
interned string went out of scope and there are no other references to
it – it will be garbage collected from the JVM string pool.
Being eligible for garbage collection and residing in the heap, a JVM
string pool seems to be a right place for all your strings, isn’t it?
In theory it is true – non-used strings will be garbage collected from
the pool, used strings will allow you to save memory in case then you
get an equal string from the input. Seems to be a perfect memory
saving strategy? Nearly so. You must know how the string pool is
implemented before making any decisions.
source.
String literals don't get created into the pool at runtime. I don't know for sure if they get GC'd or not, but I suspect that they do not for two reasons:
It would be immensely complex to detect in the general case when a literal will not be used anymore
There is likely a static code segment where it is stored for performance. The rest of the data is likely built around it, where the boundaries are also static
Strings, even though they are immutable, are still objects like any other in Java. Objects are created on the heap and Strings are no exception. So, Strings that are part of the "String Literal Pool" still live on the heap, but they have references to them from the String Literal Pool.
For more please refer this link
`http://www.javaranch.com/journal/200409/ScjpTipLine-StringsLiterally.html`
Edited Newly :
public class ImmutableStrings
{
    public static void main(String[] args)
    {
        String one = "someString";
        String two = new String("someString");
        one = two = null;
    }
}
Just before the main method ends, how many objects are available for garbage collection? 0? 1? 2?
The answer is 1. Unlike most objects, String literals always have a reference to them from the String Literal Pool. That means that they always have a reference to them and are, therefore, not eligible for garbage collection.
Even though neither of our local variables, one or two, refers to our String object, there is still a reference to it from the String Literal Pool. Therefore, the object is not eligible for garbage collection. The object is always reachable through use of the intern() method.
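The quiz above can be probed with weak references. This is only a sketch with hypothetical names: it assumes a HotSpot-style JVM where the loaded class keeps its literals reachable, and System.gc() is merely a request, so the result for the copy is not guaranteed.

```java
import java.lang.ref.WeakReference;

// Sketch: the literal stays reachable while its class is loaded,
// whereas the object created by new String(...) can be collected.
class LiteralReachabilityProbe {

    // The literal is still referenced by this loaded class,
    // so a weak reference to it should not be cleared.
    static boolean literalSurvivesGc() {
        WeakReference<String> literalRef = new WeakReference<>("someString");
        System.gc();
        return literalRef.get() != null;
    }

    // The copy has no strong reference after construction; it may be
    // cleared once a collection actually runs.
    static boolean copyClearedAfterGc(int attempts) {
        WeakReference<String> copyRef = new WeakReference<>(new String("someString"));
        for (int i = 0; i < attempts && copyRef.get() != null; i++) {
            System.gc();
        }
        return copyRef.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("literal survives: " + literalSurvivesGc());
        System.out.println("copy cleared: " + copyClearedAfterGc(5));
    }
}
```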

Regarding string object pool in PermGen

I heard that the string object pool exists in PermGen and that when a string is interned, the pool is checked first to see if an equivalent string object exists; if it does not, one is created and a reference to the pooled instance is returned.
But here is my first question.
I think that an object is created on the heap, specifically in the young generation first. If it survives a few garbage collections, it moves to the old generation. Can anybody explain how the string object goes to the pool that exists in PermGen?
second question:
String s = "test";
s = "test1";
If I reassign "test1" to a reference s and continue to use "test1", does it mean that "test" (created on the young generation) will be garbage collected?
third question:
How is the string object pool related to the runtime constant pool?
Thanks.
What makes you think the interned String first goes to the young generation? The String#intern() method is a native method. It's certainly very possible for an implementation to move it right into the permgen.
Second question: if there's no other references to that "test" String instance, it's eligible for garbage collection. Same story if it's interned. Even an interned String that no longer has any active references can be garbage collected. This might not have been the case in older JVMs, though. And it can be implementation-specific, I guess.
As for the third question, I do not know. All I know is that String literals from source code are placed into the same pool. If you were to construct a String that's equal to a String constant from source and then intern it, you'd be returned the instance that was used to represent the constant. Think of this as String literals having been interned right away.
EDIT: just read your initial few sentences again and I think I see the reason for the confusion. When you call intern() on a String, and no equal String is in the pool yet, then it's not first gonna construct an equivalent String. It'll just move the instance you called intern() on to the pool rather than returning a new reference. That's how it's stated in the JavaDoc.
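The behaviour described in the edit above can be checked with a short sketch. It assumes a HotSpot JVM from Java 7 onwards, where the pool lives in the heap; older PermGen-based VMs may have returned a copy instead. The class and method names are hypothetical.

```java
// Sketch: when no equal string is pooled yet, intern() adds the receiver
// itself to the pool and returns that same reference.
class InternSameInstanceDemo {

    static boolean internReturnsReceiverWhenAbsent() {
        // Built at runtime with a changing suffix, so an equal string is
        // very unlikely to be in the pool already.
        String fresh = new StringBuilder("never-seen-")
                .append(System.nanoTime())
                .toString();
        return fresh.intern() == fresh;
    }

    public static void main(String[] args) {
        System.out.println(internReturnsReceiverWhenAbsent());
    }
}
```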
Strings go to intern pool in two cases:
you explicitly call intern() method on the String object
you initialize it with a literal (you give the explicit content of the String), since Java automatically interns String literals.
The pool is organized as a table, once a String is interned it is added to the pool if the value is not yet present otherwise a reference to the existing entry is used.
"test" in your case is supposed to go to the pool and not to the young space, anyway cleanup of Strings not referenced anymore is performed there too (I cannot say if it is part of the same GC process used for the heap nor if this behavior is standard)
