String Constant Pool memory sector and garbage collection - java

I read the question "How is the java memory pool divided?" on this site, and I was wondering which of those sectors the "String Constant Pool" belongs to.
Also, do the String literals in the pool ever get GCed?
The intern() method returns a reference to the canonical String instance held in the pool.
If the pool does get GCed, wouldn't that be counter-productive to the idea of a string pool? New String literals would be created again, nullifying the GC.
(This assumes that only a specific set of literals exists in the pool, that they never become obsolete, and that sooner or later they will be needed again.)

As far as I know, String literals end up in the "Perm Gen" part of non-Heap JVM memory. Perm Gen space is only examined during Full GC runs (not partial ones).
In early JVMs (and I confess I had to look this up because I wasn't sure), String literals in the String Pool never got GC'ed. In newer JVMs, WeakReferences are used to reference the Strings in the pool, so interned Strings can actually get GC'ed, but only during Full garbage collections.

Reading the JavaDoc for String.intern() doesn't give hints to the implementation, but according to this page, the interned strings are held by a weak reference. This means that if the GC detects that there are no references to the interned string except for the repository that holds interned strings then it is allowed to collect them. Of course this is transparent to external code so unless you are using weak references of your own you'll never know about the garbage collection.
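To make the "weak references of your own" remark concrete, here is a minimal sketch of such a probe. The class and method names are made up for illustration, and the outcome is deliberately not stated: System.gc() is only a hint, and whether the pooled string is actually collected depends on the JVM version and collector.

```java
import java.lang.ref.WeakReference;

class InternGcProbe {
    // Builds the string at run time, so it is not a compile-time literal
    // that a loaded class would keep reachable.
    static String internDynamic(String prefix, long suffix) {
        return (prefix + suffix).intern();
    }

    public static void main(String[] args) {
        String pooled = internDynamic("probe-", System.nanoTime());
        WeakReference<String> probe = new WeakReference<>(pooled);

        pooled = null; // drop the only strong reference held by this program
        System.gc();   // a hint only; the JVM is free to ignore it

        // Either message can appear, depending on the JVM and its settings.
        System.out.println(probe.get() == null
                ? "pooled string was collected"
                : "pooled string is still in memory");
    }
}
```

While the program still holds a strong reference, repeated calls with the same arguments return the same pooled instance.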

String pooling
String pooling (sometimes also called string canonicalisation) is a
process of replacing several String objects with equal value but
different identity with a single shared String object. You can achieve
this goal by keeping your own Map (with possibly soft
or weak references depending on your requirements) and using map
values as canonicalised values. Or you can use String.intern() method
which is provided to you by JDK.
In the days of Java 6, using String.intern() was forbidden by many
coding standards because of the high risk of an OutOfMemoryError if
pooling went out of control. The Oracle Java 7 implementation of string
pooling was changed considerably. You can look for details in
http://bugs.sun.com/view_bug.do?bug_id=6962931 and
http://bugs.sun.com/view_bug.do?bug_id=6962930.
String.intern() in Java 6
In those good old days all interned strings were stored in the PermGen
– the fixed size part of heap mainly used for storing loaded classes
and string pool. Besides explicitly interned strings, PermGen string
pool also contained all literal strings earlier used in your program
(the important word here is used – if a class or method was never
loaded/called, any constants defined in it will not be loaded).
The biggest issue with such string pool in Java 6 was its location –
the PermGen. PermGen has a fixed size and can not be expanded at
runtime. You can set it using -XX:MaxPermSize=96m option. As far as I
know, the default PermGen size varies between 32M and 96M depending on
the platform. You can increase its size, but its size will still be
fixed. Such limitation required very careful usage of String.intern –
you’d better not intern any uncontrolled user input using this method.
That’s why string pooling at times of Java 6 was mostly implemented in
the manually managed maps.
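A manually managed pool of the kind described above could look roughly like the following. This is only a sketch under assumptions: the class name ManualStringPool and the method canonicalize are invented here, and a WeakHashMap with weakly referenced values is just one of several possible designs (the quoted text also mentions soft references).

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class ManualStringPool {
    // Keys are held weakly by WeakHashMap; values are wrapped in WeakReference
    // so the map itself never keeps a pooled string alive.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns the canonical instance for s, registering s if none exists yet.
    synchronized String canonicalize(String s) {
        WeakReference<String> ref = pool.get(s);
        String existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        pool.put(s, new WeakReference<>(s));
        return s;
    }

    public static void main(String[] args) {
        ManualStringPool pool = new ManualStringPool();
        String a = new String("hello");
        String b = new String("hello");
        System.out.println(a == b);                                       // false
        System.out.println(pool.canonicalize(a) == pool.canonicalize(b)); // true
    }
}
```

Because nothing in the map holds a strong reference, an entry can disappear once the application drops its last reference to the canonical string.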
String.intern() in Java 7
Oracle engineers made an extremely important change to the string
pooling logic in Java 7 – the string pool was relocated to the heap.
It means that you are no longer limited by a separate fixed size
memory area. All strings are now located in the heap, as most of other
ordinary objects, which allows you to manage only the heap size while
tuning your application. Technically, this alone could be a sufficient
reason to reconsider using String.intern() in your Java 7 programs.
But there are other reasons.
String pool values are garbage collected
Yes, all strings in the JVM string pool are eligible for garbage
collection if there are no references to them from your program roots.
It applies to all discussed versions of Java. It means that if your
interned string went out of scope and there are no other references to
it – it will be garbage collected from the JVM string pool.
Being eligible for garbage collection and residing in the heap, the JVM
string pool seems to be the right place for all your strings, doesn't it?
In theory it is true: unused strings will be garbage collected from
the pool, and used strings will let you save memory when you
get an equal string from the input. Seems to be a perfect memory
saving strategy? Nearly so. You must know how the string pool is
implemented before making any decisions.
source.

String literals don't get created into the pool at runtime. I don't know for sure if they get GC'd or not, but I suspect that they do not for two reasons:
It would be immensely complex to detect in the general case when a literal will not be used anymore
There is likely a static code segment where it is stored for performance. The rest of the data is likely built around it, where the boundaries are also static

Strings, even though they are immutable, are still objects like any other in Java. Objects are created on the heap and Strings are no exception. So, Strings that are part of the "String Literal Pool" still live on the heap, but they have references to them from the String Literal Pool.
For more, please refer to this link:
`http://www.javaranch.com/journal/200409/ScjpTipLine-StringsLiterally.html`
Edit (newly added):
public class ImmutableStrings
{
    public static void main(String[] args)
    {
        String one = "someString";
        String two = new String("someString");
        one = two = null;
    }
}
Just before the main method ends, how many objects are available for garbage collection? 0? 1? 2?
The answer is 1: only the object created with new String(...) becomes eligible. Unlike most objects, String literals always have a reference to them from the String Literal Pool. That means that they always have a reference to them and are, therefore, not eligible for garbage collection.
Even though neither of our local variables, one or two, refers to the literal's String object, there is still a reference to it from the String Literal Pool. Therefore, that object is not eligible for garbage collection. The object is always reachable through use of the intern() method.
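The two objects in the example above can be told apart with a reference comparison. The following is a small sketch; the class name and the helper isPooledInstance are made up for illustration.

```java
class LiteralVsNew {
    // True if s is the very object the string pool holds for its value.
    // Note: calling this on a string whose value is not pooled yet adds it to the pool.
    static boolean isPooledInstance(String s) {
        return s == s.intern();
    }

    public static void main(String[] args) {
        String one = "someString";             // the pooled literal
        String two = new String("someString"); // a separate heap object

        System.out.println(one == two);            // false: two distinct objects
        System.out.println(one.equals(two));       // true: same characters
        System.out.println(one == two.intern());   // true: intern() finds the literal
        System.out.println(isPooledInstance(two)); // false
    }
}
```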

Related

Unreachable literal created during "new String(..)"?

So a new String("abc"); creates an object in Heap & a literal "abc" in the String pool as per many of the answers I found. Since the new keyword was used, there should be no references to the String literal in the pool.
Does this mean -
a. The literal will be GC'ed in the next run (assuming no other references were created to the literal later on)?
b. If (the answer to a is) yes, it sounds fairly easy for JVM to free the literal in the pool as soon as the object is created, instead of waiting for GC. Why is this not done?
c. If (the answer to a is) no, what would be the reason for an unreachable literal to not be GC'ed?
Since the new keyword was used, there should be no references to the String literal in the pool.
That is not correct. There is probably [1] a reachable reference to the String object that corresponds to the literal. My recollection is that the reference is stored in the same "frame" that holds the static fields for the class. In practice, this reference will continue to be reachable until the enclosing class is unloaded by the garbage collector. (That typically never happens.)
So the answers are:
a. The literal will be GC'ed in the next run (assuming no other references were created to the literal later on)?
No.
c. If (the answer to a is) no, what would be the reason for an unreachable literal to not be GC'ed?
The String object corresponding to the literal is NOT unreachable. For example, it needs to be reachable if there is any possibility that the new String("abc") statement could be executed again.
Since it is difficult for the JVM runtime to determine that a statement (that was determined to be reachable at compile time) won't be executed more than once at runtime, and since there is little performance benefit in doing that, the runtime assumes that all string literals need to be reachable for the lifetime of the Java classes [2] that define them.
Finally, as @Holger points out, it makes no practical difference when String literal objects become unreachable. We know that they will be present (in some form) if they are needed. That's all that really matters.
1 - The actual behavior is highly implementation dependent. In early JVMs, the String objects for class literals were interned eagerly. Later on this changed to lazy interning. It would even be possible to re-intern a String object every time the string literal is used, though this would be very inefficient in general. Then we need to consider various things that optimizer could do. For example, it could notice that the String object for the literal never escapes and is used in a way that doesn't actually require interning. Or it could notice that the entire expression could be optimized away.
2 - I mean classes. The things that correspond to a Class object. Not instances of those classes.
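The point about a statement being executed again can be illustrated with a method that is called twice. This is only a sketch with invented names; it shows the observable behaviour, not how any particular JVM stores the reference internally.

```java
class RepeatedLiteral {
    // Every call returns the one String object that backs the literal.
    static String literal() {
        return "abc";
    }

    // Every call allocates a new String object with the same contents.
    static String fresh() {
        return new String("abc");
    }

    public static void main(String[] args) {
        System.out.println(literal() == literal());        // true
        System.out.println(fresh() == fresh());            // false
        System.out.println(fresh().intern() == literal()); // true
    }
}
```

Since the second call to literal() has to hand back the same object as the first, the object behind the literal has to stay reachable for as long as the method can still be invoked.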
Since new String("abc") creates an ordinary object that is not interned, it becomes eligible for garbage collection as soon as nothing references it, and it will be reclaimed in a later GC run.
However, the GC won't run immediately just to collect this one String object, for performance reasons and because space may still be available.
Calling System.gc() doesn't guarantee that it will run either (it is only a suggestion to the GC).
A GC run can be triggered for many reasons, for example (this also depends on the VM):
A memory allocation in a specific generation fails.
Heap usage or the number of live objects reaches a threshold, etc.

Does the String Pool in Java behave like an LRU cache?

Strings are immutable and are managed in the String pool. I wish to know how this pool is managed. If a large number of String literals are used in an application (I understand StringBuilder should be used when there are many modifications such as append or replace operations), then the pool enhances the performance of the application by reusing the objects already present in the pool instead of creating new String objects again and again. This is possible because Strings are immutable, so sharing them has no ill effect.
My question is how this String Pool is managed. Suppose some 'k' Strings are used very frequently, a few other String objects are created once and never used again, and newer String literals keep being introduced.
In cases like these, does the String Pool behave like an LRU cache, holding
the references to the most recently used literals and removing the older,
unused strings from the pool?
Does the String pool have a size, and can we control it in our application?
Edit:
Usually we give a size to the custom object pools we implement. I wonder why a feature like LRU is not there for String Pools. It could have been a feature, and large Strings would not have been a problem either. I assume it is simply the way it has been implemented, but I wanted to know why the feature is absent: presumably there is a valid reason, and having it would have caused some ill effects. If someone could throw some light on those ill effects, it would be good.
String pool is not an LRU cache, since entries aren't taken out unless GC'd.
There are 2 ways to get entries in the String pool. String literals go there automatically, and new entries can be added with String.intern() unless the String already exists in the pool, in which case a reference to it is returned.
The values are garbage collected if there are no more references to them, which happens less readily for String literals (i.e. String constants) than for Strings that were intern()ed.
The implementation has changed a lot between Java 6 and Java 8 (and even between minor versions). The default size of the String pool was 1009 in Java 6 and early Java 7 builds and was raised to 60013 in Java 7u40; it can be changed with the -XX:StringTableSize=N parameter (since Java 7). This size is the number of buckets in an internal hash table, so it can be tuned higher if you're using intern() a lot (for String literals alone, the default should be plenty). The size affects only the speed of intern() calls, not the number of Strings you can intern.
Basically unless you're using intern() heavily (presumably for a good reason), there's very little reason to worry about the String pool. Especially since it's no longer stored in PermGen, so it can't cause OutOfMemoryErrors very easily anymore.
Source.
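For contrast, this is roughly what a size-bounded LRU canonicalizer would look like if you built one yourself at application level. It is not how the JVM string pool works; the class name, the capacity parameter and the method names are all assumptions made for this sketch, which relies on LinkedHashMap's access order and removeEldestEntry hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruStringCache {
    private final Map<String, String> cache;

    LruStringCache(final int capacity) {
        // accessOrder = true keeps the least recently accessed entry first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity; // evict once the bound is exceeded
            }
        };
    }

    // Returns the cached instance equal to s, or caches s itself.
    synchronized String canonicalize(String s) {
        String existing = cache.get(s); // a hit also refreshes the entry
        if (existing != null) {
            return existing;
        }
        cache.put(s, s);
        return s;
    }

    synchronized int entryCount() {
        return cache.size();
    }
}
```

One visible drawback of such a design: after an eviction, two equal strings handed out at different times are no longer the same object, so identity comparison stops being reliable. The weakly referenced JVM pool never drops an entry that is still referenced elsewhere, so it does not have this problem.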

Java String Immutability storage when String object is changed

I understand that if a String is initialized with a literal, it is allotted space in the String Pool, and if it is initialized with the new keyword, a String object is created. But I am confused by the case below.
My question is: what if a String is created with the new keyword and its value is then updated with a literal?
E.g.
String s = new String("Value1"); // creates a new object in heap space
Then what if I write the next statement as below?
s = "value2";
So my question is,
1 Will it create a String literal in the String Pool, or will it update the value of that object?
2 If it creates a new literal in the String Pool, what will happen to the existing object? Will it be destroyed, or will it stay there until the garbage collector is called?
This is a small string, but if the string had thousands of characters I would be worried about the space it uses. So my key question is about space.
Will it immediately free the space from the heap after assigning the literal?
Can anyone explain what value goes where from the first statement to the second, and what will happen to the memory area (heap and String Pool)?
Modifying Strings
The value is not updated when running
s = "value2";
In Java, except for the primitive types, all other variables are references to objects. This means that only s is pointing to a new value.
Immutability guarantees that the state of an object cannot change after construction. In other words, there are no means to modify the content of any String object in Java. If you, for instance, write s = s+"a"; you have created a new String object that holds the new text; the old object is left unchanged.
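The difference between changing a reference and changing an object can be seen with a second variable that keeps pointing at the old string. This is a minimal sketch; the class and the helper withSuffix are named here purely for illustration.

```java
class ReassignDemo {
    // Concatenation builds a new String; the argument itself is untouched.
    static String withSuffix(String s, String suffix) {
        return s + suffix;
    }

    public static void main(String[] args) {
        String s = new String("Value1");
        String alias = s;          // second reference to the same heap object
        s = "value2";              // only the variable s is redirected

        System.out.println(alias); // Value1
        System.out.println(s);     // value2

        String t = withSuffix(s, "a");
        System.out.println(s);     // value2
        System.out.println(t);     // value2a
    }
}
```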
Garbage collection
This answer already provides an in-depth answer. Below a short summary if you don't want to read the full answer, but it omits some details.
By default new String(...) objects are not interned and thus the normal rules of garbage collection apply. These are just ordinary objects.
The constant strings in your code, which are interned, are typically never removed, as it is likely that you will eventually refer back to them.
There is however a side-note in the answer that sometimes classes are dynamically (un)loaded, in which case the literals can be removed from the pool.
To answer your additional questions:
Will it immediately free the space from the heap after assigning the literal?
No, that would not be very efficient: the garbage collector needs to analyse which objects can be removed. It is possible that you shared the reference to your old string with other objects, so it is not guaranteed that the object can be recycled. Furthermore, there is not much wrong with keeping data that is no longer useful, as long as you don't need to ask the operating system for additional memory (compare it with your computer: as long as you can store all your data on your hard disk drive, you don't really have to worry about useless files; only when you would have to buy an additional drive will you probably try to remove some files first). The analysis requires some computational effort. In general, a garbage collector only runs when memory is (nearly) exhausted. So you shouldn't worry much about memory.
Can anyone explain what value goes where from the first statement to the second, and what will happen to the memory area (heap and String Pool)?
Your first string:
String s = new String("Value1");
is a reference to an object on the heap. When the statement executes, it allocates space on the heap for the new String.
Now if you call:
s = "value2";
"value2" is an element of the String Pool; it will typically remain there until your program ends.
Since you no longer have a reference to your old string (Value1), that object is a candidate for collection. If the garbage collector later walks by, it will remove the object from the heap and mark the space as free.
If you need to change a string, you can always create a new one that contains
the modifications.
Java defines a peer class of String, called StringBuffer, which allows strings to be altered.
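A short sketch of in-place modification with StringBuffer, as opposed to creating a new String for every change. The class and method names are made up for the example; StringBuilder offers the same operations without synchronization.

```java
class BufferDemo {
    // Appends into one mutable buffer instead of creating a String per step.
    static String repeat(String unit, int times) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < times; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("value");
        sb.append('2');         // modifies the buffer in place
        sb.setCharAt(0, 'V');   // overwrites a single character
        System.out.println(sb); // Value2

        System.out.println(repeat("ab", 3)); // ababab
    }
}
```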

Garbage collection on intern'd strings, String Pool, and perm-space

After exploring java's string internals I've grown confused on what is referred to as the "perm space." My understanding initially of it was that it held String literals as well as class meta data as explained in this question.
I've also read about the String.intern() method and that it places Strings into the String Pool, returning a reference to the unique instance of it. It is my understanding that this is the same string pool holding String literals that exists in the JVM's perm-space. It didn't seem possible to me that the "perm-space" could be modifiable (it is permanent after all, yes?). But then I found this question where the top voted comment by EJP on the accepted answer explains that
Intern'd strings have been GC-able for quite some years now.
Implying that the GC runs on the perm-space which doesn't seem very permanent. How does this reconcile? Does the GC check everything in the perm-space? Does the GC check everything in the string pool including string literals from the source? Is there a second string pool for intern'd strings? Does the GC know only to look over intern'd strings when collecting? Or is this comment mistaken and intern'ing a string prevents it from ever being GC'd (which I hope is not the case)?
String literals are interned. As of Java 7, the HotSpot JVM puts interned Strings in the heap, not permgen.
Prior to Java 7, HotSpot put interned Strings in permgen. However, interned Strings in permgen were garbage collected. Apparently, Class objects in permgen are also collectable, so everything in permgen is collectable, though permgen collection might not be enabled by default in some old JVMs.
String literals, being interned, would be a reference held by the declaring Class object to the String object in the intern pool. So the interned literal String would only be collected if the Class object that referred to it were also collected.

Interning a string

When we intern a string, we are making sure that all uses of that string are referring to the same instance.
I would assume that the underlying string object is in the heap.
However, where is the referring variable stored in the memory?
Does it have the same behaviour as static - wherein the reference gets stored in permgen and makes the string instance available for gc only after the classloader(and application) exits?
Up to JDK 6, interned strings were stored in a memory pool called the Permanent Generation, an area of the JVM that is reserved for non-user objects, like Classes, Methods and other internal JVM objects. The size of this area is limited, and it is usually much smaller than the heap.
From JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
A detailed explanation of this can be found on this answer.
When we intern a string, we are making sure that all uses of that string are referring to the same instance.
Not exactly. When you do this:
String s2 = s1.intern();
what you are doing is ensuring that s2 refers to a String in the string pool. This does not affect the value in s1, or any other String references or variables. If you want other copies of the string to be interned, you need to do that explicitly ... or assign interned string references to the respective variables.
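Doing that explicitly for a whole collection might look like the sketch below. The class name InternAll and the helper internAll are hypothetical; the point is only that the input references stay as they were, while the returned references all point into the pool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InternAll {
    // Returns a new list whose elements are the pooled instances.
    // The strings referenced by the input list are not affected.
    static List<String> internAll(List<String> input) {
        List<String> pooled = new ArrayList<>(input.size());
        for (String s : input) {
            pooled.add(s.intern());
        }
        return pooled;
    }

    public static void main(String[] args) {
        String s1 = new String("dup");
        String s2 = new String("dup");
        List<String> pooled = internAll(Arrays.asList(s1, s2));

        System.out.println(s1 == s2);                       // false: still two objects
        System.out.println(pooled.get(0) == pooled.get(1)); // true: one pooled instance
    }
}
```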
I would assume that the underlying string object is in the heap.
That is correct. It might be in the "permgen" heap or the regular heap, depending on the version of Java you are using. But it is always "in the heap".
However, where is the referring variable stored in the memory?
The "referring variable" ... i.e. the one that holds the reference that you got from calling intern() ... is no different from any other variable. It can be
a local variable or parameter (held in a stack frame),
an instance field (held in a regular heap object),
a static field (held in a permgen heap object) ... or even
a jstring variable or similar in JNI code (held "somewhere else".)
In fact, a typical JVM uses a private hash table to hold the references to interned strings, and it uses the JVM's weak reference mechanism to ensure that interned strings can be garbage collected if nothing else is using them.
Does it have the same behaviour as static - wherein the reference gets stored in permgen and makes the string instance available for gc only after the classloader(and application) exits?
Typically no ... see above.
In most Java platforms, interned Strings can be garbage collected just like other Strings. If the interned Strings are stored in "permgen" space, it may take longer for the object to be garbage collected, because "permgen" is collected infrequently. However the lifetime of an interned String is not tied to the lifetime of a classloader, etc.
