Interning a string - java

When we intern a string, we are making sure that all uses of that string are referring to the same instance.
I would assume that the underlying string object is in the heap.
However, where is the referring variable stored in the memory?
Does it have the same behaviour as static - wherein the reference gets stored in permgen and makes the string instance available for gc only after the classloader(and application) exits?

Up to JDK 6, interned strings are stored in the Permanent Generation, an area of the JVM that is reserved for non-user objects such as classes, methods, and other internal JVM objects. The size of this area is limited, and it is usually much smaller than the heap.
From JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
A detailed explanation of this can be found on this answer.

When we intern a string, we are making sure that all uses of that string are referring to the same instance.
Not exactly. When you do this:
String s2 = s1.intern();
what you are doing is ensuring that s2 refers to a String in the string pool. This does not affect the value in s1, or any other String references or variables. If you want other copies of the string to be interned, you need to do that explicitly ... or assign interned string references to the respective variables.
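The point can be checked with a minimal sketch. The class and method names below (InternDemo, originalIsPooled, internedMatchesLiteral) are hypothetical and chosen only for illustration: intern() hands back the pooled reference, while the original variable keeps pointing at its own object.

```java
public class InternDemo {
    // True only if the original object were itself the pooled instance (it is not).
    public static boolean originalIsPooled() {
        String s1 = new String("hello"); // a fresh heap object, distinct from the pooled literal
        String s2 = s1.intern();         // s2 refers to the pooled "hello"; s1 is unchanged
        return s1 == s2;
    }

    // True because intern() returns the same object that the literal denotes.
    public static boolean internedMatchesLiteral() {
        String s2 = new String("hello").intern();
        return s2 == "hello";
    }

    public static void main(String[] args) {
        System.out.println(originalIsPooled());       // false
        System.out.println(internedMatchesLiteral()); // true
    }
}
```

If you want s1 itself to refer to the pooled instance, you have to assign the result back, as in s1 = s1.intern().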
I would assume that the underlying string object is in the heap.
That is correct. It might be in the "permgen" heap or the regular heap, depending on the version of Java you are using. But it is always "in the heap".
However, where is the referring variable stored in the memory?
The "referring variable" ... i.e. the one that holds the reference that you got from calling intern() ... is no different from any other variable. It can be
a local variable or parameter (held in a stack frame),
an instance field (held in a regular heap object),
a static field (held in a permgen heap object) ... or even
a jstring variable or similar in JNI code (held "somewhere else".)
In fact, a typical JVM uses a private hash table to hold the references to interned strings, and it uses the JVM's weak reference mechanism to ensure that interned strings can be garbage collected if nothing else is using them.
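The idea of a hash table that holds only weak references can be sketched in plain Java. This is an illustration under stated assumptions, not the JVM's actual implementation (HotSpot's table is internal native code); the class name WeakStringPool and its canonicalize method are hypothetical.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public final class WeakStringPool {
    // Keys are held weakly by WeakHashMap. The value is wrapped in a WeakReference
    // so that the map entry does not keep its own key strongly reachable.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns the canonical instance for any string equal to s.
    public synchronized String canonicalize(String s) {
        WeakReference<String> ref = pool.get(s);
        String existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;              // an equal string is already pooled
        }
        pool.put(s, new WeakReference<>(s));
        return s;                         // s becomes the canonical instance
    }

    public static void main(String[] args) {
        WeakStringPool p = new WeakStringPool();
        String a = new String("x");
        String b = new String("x");
        System.out.println(p.canonicalize(a) == a); // true: first one in wins
        System.out.println(p.canonicalize(b) == a); // true: b maps to the pooled a
    }
}
```

Once no variable refers to a pooled string any more, the collector is free to reclaim it and the entry disappears from the map, which is the behaviour described above for interned strings.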
Does it have the same behaviour as static - wherein the reference gets stored in permgen and makes the string instance available for gc only after the classloader(and application) exits?
Typically no ... see above.
In most Java platforms, interned Strings can be garbage collected just like other Strings. If the interned Strings are stored in "permgen" space, it may take longer for the object to be garbage collected, because "permgen" is collected infrequently. However the lifetime of an interned String is not tied to the lifetime of a classloader, etc.

Related

Unreachable literal created during "new String(..)"?

According to many of the answers I found, new String("abc") creates an object in the heap and a literal "abc" in the String pool. Since the new keyword was used, there should be no references to the String literal in the pool.
Does this mean -
a. The literal will be GC'ed in the next run (assuming no other references were created to the literal later on)?
b. If (the answer to a is) yes, it sounds fairly easy for JVM to free the literal in the pool as soon as the object is created, instead of waiting for GC. Why is this not done?
c. If (the answer to a is) no, what would be the reason for an unreachable literal to not be GC'ed?
Since the new keyword was used, there should be no references to the String literal in the pool.
That is not correct. There is probably [1] a reachable reference to the String object that corresponds to the literal. My recollection is that the reference is stored in the same "frame" that holds the static fields for the class. In practice, this reference will continue to be reachable until the enclosing class is unloaded by the garbage collector. (That typically never happens.)
So the answers are:
a. The literal will be GC'ed in the next run (assuming no other references were created to the literal later on)?
No.
c. If (the answer to a is) no, what would be the reason for an unreachable literal to not be GC'ed?
The String object corresponding to the literal is NOT unreachable. For example, it needs to be reachable if there is any possibility that the new String("abc") statement could be executed again.
Since it is difficult for the JVM runtime to determine that a statement (that was determined to be reachable at compile time) won't be executed more than once at runtime, and since there is little performance benefit in doing that, the runtime assumes that all string literals need to be reachable for the lifetime of the Java classes [2] that define them.
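A small sketch of why the literal's String object has to stay reachable: every execution of the same code must see the same object for the literal, while each new String(...) yields a distinct one. The class and method names (LiteralReuse, literal, copy) are hypothetical.

```java
public class LiteralReuse {
    // Every call returns the one String object that represents the literal.
    public static String literal() {
        return "abc";
    }

    // Every call allocates a new String with the same contents.
    public static String copy() {
        return new String("abc");
    }

    public static void main(String[] args) {
        System.out.println(literal() == literal());     // true: same object each time
        System.out.println(copy() == copy());           // false: two distinct objects
        System.out.println(copy().equals(literal()));   // true: equal contents
    }
}
```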
Finally, as @Holger points out, it makes no practical difference when String literal objects become unreachable. We know that they will be present (in some form) if they are needed. That's all that really matters.
[1] - The actual behavior is highly implementation dependent. In early JVMs, the String objects for class literals were interned eagerly. Later on this changed to lazy interning. It would even be possible to re-intern a String object every time the string literal is used, though this would be very inefficient in general. Then we need to consider various things that the optimizer could do. For example, it could notice that the String object for the literal never escapes and is used in a way that doesn't actually require interning. Or it could notice that the entire expression could be optimized away.
[2] - I mean classes. The things that correspond to a Class object. Not instances of those classes.
Since new String("abc") creates an ordinary object that is not interned, it becomes eligible for garbage collection as soon as it is unreachable, and it can be reclaimed in a later GC run.
However, the GC won't run immediately just to collect this one String object, both for performance reasons and because space may still be available.
Calling System.gc() doesn't guarantee that it will run either (it is only a suggestion to the GC).
A GC run can be triggered for many reasons (depending on the VM), for example:
A memory allocation in a specific generation fails.
Heap occupancy or the number of live objects reaches a threshold.

Why are Java Strings allocated memory on heap?

In Java, why are String datatypes allocated memory on the heap?
The reason is simple: all objects are stored on the heap. It is designed that way. String is a class, and its objects are stored on the heap.
Also note that String literals were previously stored in a Heap called the "permgen" heap. Now according to the JVM Specification, the area for storing string literals is in the runtime constant pool.
Only local variables, which hold primitive values or references to objects, are stored on the stack.
Heap memory is used by the Java runtime to allocate memory to objects and
JRE classes. Whenever we create any object, it’s always created in the
heap space. Garbage collection runs on the heap memory to free the
memory used by objects that don’t have any reference. Any object
created in the heap space has global access and can be referenced from
anywhere in the application.
A good point to quote from the JDK 7 release notes:
Area: HotSpot
Synopsis: In JDK 7, interned strings are no longer
allocated in the permanent generation of the Java heap, but are
instead allocated in the main part of the Java heap (known as the
young and old generations), along with the other objects created by
the application. This change will result in more data residing in the
main Java heap, and less data in the permanent generation, and thus
may require heap sizes to be adjusted. Most applications will see only
relatively small differences in heap usage due to this change, but
larger applications that load many classes or make heavy use of the
String.intern() method will see more significant differences. RFE:
6962931
By default all objects are on the heap. String has two objects, the String and the char[] it wraps. It is not unusual to find the most numerous object by type is a char[] even if you create none directly.
What is surprising is that it doesn't always create objects on the heap, but it can place objects on the stack through escape analysis. Note: it can't do this for String literals as they are stored in the String literal pool.
When a user enters a string, its size is dynamic: it may differ on each execution, so the compiler cannot know the exact memory requirement for the String. Even at run time, the size of the string is not known until the user has entered the complete string, so no fixed amount of memory can be reserved on the stack. Hence the stack generally holds only a pointer to the string, which lives on the heap.

how long can a String survive in the literal pool [duplicate]

I am reading about garbage collection, and I am getting confusing search results when I search for garbage collection of String literals.
I need clarification on the following points:
If a string is defined as a literal at compile time [e.g: String str = "java"], will it be garbage collected?
If I use the intern method [e.g: String str = new String("java").intern()], will it be garbage collected? Also, will it be treated differently from the String literal in point 1?
Some places mention that literals will be garbage collected only when the String class is unloaded. Does that make sense? I don't think the String class will ever be unloaded.
If a string is defined as literal at compile time [e.g: String str = "java";] then will it be garbage collected?
Probably not. The code objects will contain one or more references to the String objects that represent the literals. So as long as the code objects are reachable, the String objects will be too.
It is possible for code objects to become unreachable, but only if they were dynamically loaded ... and their classloader is destroyed.
If I use the intern method [e.g: String str = new String("java").intern()] then will it be garbage collected?
The object returned by the intern call will be the same object that represents the "java" string literal. (The "java" literal is interned at class loading time. When you then intern the newly constructed String object in your code snippet, it will look up and return the previously interned "java" string.)
However, interned strings that are not identical with string literals can be garbage collected once they become unreachable. The PermGen space is garbage collected on all recent HotSpot JVMs. (Prior to Java 8 ... which drops PermGen entirely.)
Also will it be treated differently from string literal in point 1.
No ... because it is the same object as the string literal.
And indeed, once you understand what is going on, it is clear that string literals are not treated specially either. It is just an application of the "reachability" rule ...
Some places it is mentioned that literals will be garbage collected only when String class will be unloaded? Does it make sense because I don't think the String class will ever be unloaded.
You are right. It doesn't make sense. The sources that said that are incorrect. (It would be helpful if you posted a URL so that we can read what they are saying for ourselves ...)
Under normal circumstances, string literals and classes are all allocated into the JVM's permanent generation ("PermGen"), and usually won't ever be collected. Strings that are interned (e.g. mystring.intern()) are stored in a memory pool owned by the String class in permgen, and it was once the case that aggressive interning could cause a space leak because the string pool itself held a reference to every string, even if no other references existed. Apparently this is no longer true, at least as of JDK 1.6 (see, e.g., here).
For more on permgen, this is a decent overview of the topic. (Note: that link goes to a blog associated with a product. I don't have any association with the blog, the company, or the product, but the blog entry is useful and doesn't have much to do with the product.)
The literal string will remain in memory as long as the program is in memory.
str will be garbage collected, but the literal it is created from will not.
That makes perfect sense, since the string class is unloaded when the program is unloaded.
The intern() method checks whether an equal string is already in the String pool. If it is, a reference to the pooled string is returned. If it is not, the string is added to the pool (in the perm area on older JVMs) and a reference to it is returned. The intern() method should be used judiciously.
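The lookup-or-add behaviour described above can be observed with strings built at run time. This is a minimal sketch; the class name InternDynamic and the helper build are hypothetical.

```java
public class InternDynamic {
    // Builds a new String object at run time; the result is not a compile-time constant.
    public static String build(int n) {
        return new StringBuilder("user-").append(n).toString();
    }

    public static void main(String[] args) {
        String x = build(42);
        String y = build(42);
        System.out.println(x == y);                   // false: two separate objects
        System.out.println(x.equals(y));              // true: same contents
        System.out.println(x.intern() == y.intern()); // true: both map to one pooled instance
    }
}
```

The first intern() call adds the string to the pool, and the second call finds the pooled entry and returns it, so both results are the same reference.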

String Constant Pool memory sector and garbage collection

I read the question "How is the java memory pool divided?" on this site, and I was wondering which of these sectors the "String Constant Pool" belongs to.
Also, do the String literals in the pool ever get GCed?
The intern() method returns the reference to the String literal from the pool.
If the pool does get GCed, wouldn't that be counter-productive to the idea of the string pool? New String literals would be created again, nullifying the GC.
(This assumes that only a specific set of literals exists in the pool, that they never become obsolete, and that sooner or later they will be needed again.)
As far as I know String literals end up in the "Perm Gen" part of non-Heap JVM memory. Perm Gen space is only examined during Full GC runs (not Partials).
In early JVMs (and I confess I had to look this up because I wasn't sure), String literals in the String Pool never got GC'ed. In the newer JVMs, WeakReferences are used to reference the Strings in the pool, so interned Strings can actually get GC'ed, but only during full garbage collections.
Reading the JavaDoc for String.intern() doesn't give hints to the implementation, but according to this page, the interned strings are held by a weak reference. This means that if the GC detects that there are no references to the interned string except for the repository that holds interned strings then it is allowed to collect them. Of course this is transparent to external code so unless you are using weak references of your own you'll never know about the garbage collection.
String pooling
String pooling (sometimes also called as string canonicalisation) is a
process of replacing several String objects with equal value but
different identity with a single shared String object. You can achieve
this goal by keeping your own Map (with possibly soft
or weak references depending on your requirements) and using map
values as canonicalised values. Or you can use String.intern() method
which is provided to you by JDK.
In the days of Java 6, using String.intern() was forbidden by many
coding standards because of the high risk of an OutOfMemoryError if
pooling went out of control. Oracle Java 7 implementation of string
pooling was changed considerably. You can look for details in
http://bugs.sun.com/view_bug.do?bug_id=6962931 and
http://bugs.sun.com/view_bug.do?bug_id=6962930.
String.intern() in Java 6
In those good old days all interned strings were stored in the PermGen
– the fixed size part of heap mainly used for storing loaded classes
and string pool. Besides explicitly interned strings, PermGen string
pool also contained all literal strings earlier used in your program
(the important word here is used – if a class or method was never
loaded/called, any constants defined in it will not be loaded).
The biggest issue with such string pool in Java 6 was its location –
the PermGen. PermGen has a fixed size and can not be expanded at
runtime. You can set it using -XX:MaxPermSize=96m option. As far as I
know, the default PermGen size varies between 32M and 96M depending on
the platform. You can increase its size, but its size will still be
fixed. Such limitation required very careful usage of String.intern –
you’d better not intern any uncontrolled user input using this method.
That’s why string pooling at times of Java 6 was mostly implemented in
the manually managed maps.
String.intern() in Java 7
Oracle engineers made an extremely important change to the string
pooling logic in Java 7 – the string pool was relocated to the heap.
It means that you are no longer limited by a separate fixed size
memory area. All strings are now located in the heap, as most of other
ordinary objects, which allows you to manage only the heap size while
tuning your application. Technically, this alone could be a sufficient
reason to reconsider using String.intern() in your Java 7 programs.
But there are other reasons.
String pool values are garbage collected
Yes, all strings in the JVM string pool are eligible for garbage
collection if there are no references to them from your program roots.
It applies to all discussed versions of Java. It means that if your
interned string went out of scope and there are no other references to
it – it will be garbage collected from the JVM string pool.
Being eligible for garbage collection and residing in the heap, a JVM
string pool seems to be a right place for all your strings, isn’t it?
In theory it is true – non-used strings will be garbage collected from
the pool, used strings will allow you to save memory in case then you
get an equal string from the input. Seems to be a perfect memory
saving strategy? Nearly so. You must know how the string pool is
implemented before making any decisions.
source.
String literals don't get created into the pool at runtime. I don't know for sure if they get GC'd or not, but I suspect that they do not for two reasons:
It would be immensely complex to detect in the general case when a literal will not be used anymore
There is likely a static code segment where it is stored for performance. The rest of the data is likely built around it, where the boundaries are also static
Strings, even though they are immutable, are still objects like any other in Java. Objects are created on the heap and Strings are no exception. So, Strings that are part of the "String Literal Pool" still live on the heap, but they have references to them from the String Literal Pool.
For more, please refer to this link:
`http://www.javaranch.com/journal/200409/ScjpTipLine-StringsLiterally.html`
Edit:
public class ImmutableStrings
{
    public static void main(String[] args)
    {
        String one = "someString";
        String two = new String("someString");
        one = two = null;
    }
}
Just before the main method ends, how many objects are available for garbage collection? 0? 1? 2?
The answer is 1. Unlike most objects, String literals always have a reference to them from the String Literal Pool. That means that they always have a reference to them and are, therefore, not eligible for garbage collection.
Even though neither of our local variables, one or two, refers to our String object, there is still a reference to it from the String Literal Pool. Therefore, the object is not eligible for garbage collection. The object is always reachable through use of the intern() method.

Where are static methods and static variables stored in Java?

For example:
class A {
    static int i = 0;
    static int j;
    static void method() {
        // static k = 0; can't use static for local variables, only final is permitted
        // static int L;
    }
}
Where will these variables be stored in Java, in heap or in stack memory? How are they stored?
Static methods (in fact all methods) as well as static variables are stored in the PermGen section of the heap, since they are part of the reflection data (class related data, not instance related). As of Java 8, PermGen has been replaced by Metaspace, and as per JEP 122 it only holds metadata while static fields are stored in the heap.
Note that this mostly applies to Oracle's HotSpot JVM and others that are based on it. Not every JVM has PermGen or Metaspace; Eclipse OpenJ9, for example, does not.
Update for clarification:
Note that only the variables and their technical values (primitives or references) are stored in PermGen space.
If your static variable is a reference to an object that object itself is stored in the normal sections of the heap (young/old generation or survivor space). Those objects (unless they are internal objects like classes etc.) are not stored in PermGen space.
Example:
static int i = 1; //the value 1 is stored in the PermGen section
static Object o = new SomeObject(); //the reference(pointer/memory address) is stored in the PermGen section, the object itself is not.
A word on garbage collection:
Do not rely on finalize() as it's not guaranteed to run. It is totally up to the JVM to decide when to run the garbage collector and what to collect, even if an object is eligible for garbage collection.
Of course you can set a static variable to null and thus remove the reference to the object on the heap but that doesn't mean the garbage collector will collect it (even if there are no more references).
Additionally, finalize() is run only once, so you have to make sure it doesn't throw exceptions or otherwise prevent the object from being collected. If you halt finalization through some exception, finalize() won't be invoked on the same object a second time.
A final note: how code, runtime data etc. are stored depends on the JVM which is used, i.e. HotSpot might do it differently than JRockit and this might even differ between versions of the same JVM. The above is based on HotSpot for Java 5 and 6 (those are basically the same) since at the time of answering I'd say that most people used those JVMs. Due to major changes in the memory model as of Java 8, the statements above might not be true for Java 8 HotSpot - and I didn't check the changes of Java 7 HotSpot, so I guess the above is still true for that version, but I'm not sure here.
Prior to Java 8:
The static variables were stored in the PermGen space (also called the method area).
PermGen space was used to store three things:
Class-level data (metadata)
Interned strings
Static variables
From Java 8 onwards
The static variables are stored in the heap itself. From Java 8 onwards, the PermGen space has been removed and a new space named Metaspace has been introduced, which, unlike PermGen, is no longer part of the heap. Metaspace lives in native memory (memory provided by the OS to a particular application for its own use) and now stores only the class metadata.
The interned strings and static variables have been moved into the heap itself.
For official information, refer to JEP 122: Remove the Permanent Generation
Class variables(Static variables) are stored as part of the Class object associated with that class. This Class object can only be created by JVM and is stored in permanent generation.
Also some have answered that it is stored in non heap area which is called Method Area. Even this answer is not wrong. It is just a debatable topic whether Permgen Area is a part of heap or not. Obviously perceptions differ from person to person. In my opinion we provide heap space and permgen space differently in JVM arguments. So it is a good assumption to treat them differently.
Another way to see it
Memory pools are created by JVM memory managers at run time. A memory pool may belong to either heap or non-heap memory. A run-time constant pool is a per-class or per-interface run-time representation of the constant_pool table in a class file. Each run-time constant pool is allocated from the Java virtual machine's method area, and static variables are stored in this method area.
Also, this non-heap area is nothing but the PermGen area. Actually, the method area is part of PermGen. (Reference)
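To see which memory pools a particular JVM actually has, and whether each one counts as heap or non-heap, the standard java.lang.management API can be queried. This is a minimal sketch; the class name MemoryPools and the method describe are hypothetical, and the pool names printed depend on the JVM vendor, version, and garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

public class MemoryPools {
    // Returns one line per memory pool: its name and its type (heap or non-heap).
    public static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            lines.add(pool.getName() + " -> " + pool.getType());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Print every pool reported by the running JVM.
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```

On a Java 6 or 7 HotSpot JVM a permanent generation pool would be expected in the list, and on Java 8 or later a Metaspace pool, but that is an expectation to verify on your own JVM rather than something this code guarantees.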
This is a question with a simple answer and a long-winded answer.
The simple answer is the heap. Classes and all of the data applying to classes (not instance data) is stored in the Permanent Generation section of the heap.
The long answer is already on stack overflow:
There is a thorough description of memory and garbage collection in the JVM as well as an answer that talks more concisely about it.
It is stored in the heap referenced by the class definition. If you think about it, it has nothing to do with stack because there is no scope.
In addition to Thomas's answer: static variables are stored in a non-heap area called the method area.
As static variables are class-level variables, they are stored in the "permanent generation" of heap memory.
Please look into this for more details of the JVM. I hope this is helpful.
static variables are stored in the heap
When we create a static variable or method, it is stored in a special area of the heap: PermGen (Permanent Generation), where it sits alongside all the other data that applies to classes (non-instance data). Starting from Java 8, PermGen was replaced by Metaspace. The difference is that Metaspace grows automatically, while PermGen has a fixed maximum size, and this space is shared among all instances. In addition, Metaspace is part of native memory and not of JVM memory.
You can look into this for more details.
In a real project, the requirements are known in advance, and based on them we decide whether a variable inside a class should be:
1. Local (created and accessed within a block, method, or constructor)
2. Static
3. Instance (every object has its own copy of it)
The static keyword is used for a variable whose value is the same for all objects of a particular class.
For example, in Selenium we declare the WebDriver as static so that we do not need to create it again and again for every test case:
static WebDriver driver;
(With parallel execution this causes problems, but that is another matter.)
Real-world analogy: if India is a class, then the flag and the currency are the same for every Indian, so we might make them static.
Another example: utility methods are usually declared static because they are used in different test cases.
Static members are stored in the CMA (PermGen space). PermGen, which had a fixed size, was replaced by Metaspace from Java 8, which grows dynamically.
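The difference between a static variable (one copy for the class) and an instance variable (one copy per object) described above can be sketched as follows. The class name Visitor and its members are hypothetical.

```java
public class Visitor {
    private static int count = 0; // one copy, shared by every Visitor object
    private final int id;         // a separate copy in each Visitor object

    public Visitor() {
        count++;                  // updates the single shared counter
        id = count;               // remembers this object's own number
    }

    public int id() {
        return id;
    }

    public static int count() {
        return count;
    }

    public static void main(String[] args) {
        Visitor a = new Visitor();
        Visitor b = new Visitor();
        System.out.println(a.id());          // 1
        System.out.println(b.id());          // 2
        System.out.println(Visitor.count()); // 2
    }
}
```

Because the counter is static, it is reached through the class rather than through any particular object, and it is not thread-safe as written, which mirrors the caveat about parallel execution mentioned above.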
As of Java 8, PermGen space is obsolete. Static methods, primitives, and reference variables are stored in Metaspace. The actual objects reside in the Java heap. Since static methods never become unreferenced, they are never garbage collected, from either Metaspace or the heap.
