Strings and Permgen memory - java

I have a map of format Map stored in a file.
This file has over 100,000 records.
The value of each entry is nearly 10k.
I load 1000 records into a map in memory, process them, then clear the map and load the next 1000 records.
My question is:
Since the strings are stored in the String pool, which is in the permgen
memory area, when I clear the map will the Strings be garbage
collected?
In case they are not garbage collected, is there any way to force
them to be garbage collected?
Is there any guarantee that if the program is running out of memory,
the JVM would clean the permgen memory before throwing an
OutOfMemoryError?

Ok.. Let's start....
Since the strings are stored in the String pool, which is in the permgen memory
area, when I clear the map will the Strings be garbage collected?
Not all strings are stored in the String constant pool. Only interned Strings and String literals go into it. There is also no permgen in Java 8; Metaspace has (almost gracefully) replaced it.
If you have Strings read from a file (which are not interned), then yes, your strings will get GCed. If you have String literals (and God save you if you do.. :P), then they will be GCed when the classloader that loaded the class defining these literals gets GCed.
In case they are not garbage collected, is there any way to force
them to be garbage collected?
Well, you could always call System.gc() explicitly (NOT a good idea in a production environment), but that is only a request. If you are using Java 8 (8u20 or later), use the G1 collector and enable String deduplication (-XX:+UseG1GC -XX:+UseStringDeduplication).
Is there any guarantee that if the program is running out of memory,
the JVM would clean the permgen memory before throwing an
OutOfMemoryError?
The GC will try its best to clean up as much as it can. No, there is no guarantee that this would happen.
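As a rough illustration of the batch pattern described in the question, here is a minimal sketch. The readRecord helper is a hypothetical stand-in for reading one record from the file, and the batch size of 1000 comes from the question. Because the keys and values are built at run time and never interned, nothing keeps them alive once the map is cleared.

```java
import java.util.HashMap;
import java.util.Map;

public class BatchProcessor {
    static final int BATCH_SIZE = 1000;

    // Hypothetical stand-in for reading record i from the file.
    // Strings built at run time like this are ordinary heap objects, not pooled.
    static String[] readRecord(int i) {
        return new String[] { "key-" + i, "value-" + i };
    }

    // Placeholder for the real per-batch work; returns the number of entries handled.
    static int process(Map<String, String> batch) {
        return batch.size();
    }

    // Loads records in batches, processes each batch, then clears the map so the
    // previous batch's Strings become unreachable and therefore eligible for GC.
    static int processAll(int totalRecords) {
        Map<String, String> batch = new HashMap<>();
        int processed = 0;
        for (int i = 0; i < totalRecords; i++) {
            String[] record = readRecord(i);
            batch.put(record[0], record[1]);
            if (batch.size() == BATCH_SIZE) {
                processed += process(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            processed += process(batch);
            batch.clear();
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(processAll(100_000));
    }
}
```

Clearing the map only makes the Strings eligible for collection; when the memory is actually reclaimed is still up to the JVM.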

Related

Why are Java Strings allocated memory on heap?

In Java, why are String datatypes allocated memory on the heap?
The reason is simple: all objects are stored on the heap. It is designed like that. String is a class, and its objects are stored on the heap.
Also note that String literals were previously stored in a part of the heap called the "permgen". Now, according to the JVM Specification, the area for storing string literals is the runtime constant pool.
Only local variables of primitive types, and references to objects, are stored on the stack.
Heap memory is used by java runtime to allocate memory to Objects and
JRE classes. Whenever we create any object, it’s always created in the
Heap space. Garbage Collection runs on the heap memory to free the
memory used by objects that don’t have any reference. Any object
created in the heap space has global access and can be referenced from
anywhere of the application.
A good point to quote from the JDK7
Area: HotSpot
Synopsis: In JDK 7, interned strings are no longer
allocated in the permanent generation of the Java heap, but are
instead allocated in the main part of the Java heap (known as the
young and old generations), along with the other objects created by
the application. This change will result in more data residing in the
main Java heap, and less data in the permanent generation, and thus
may require heap sizes to be adjusted. Most applications will see only
relatively small differences in heap usage due to this change, but
larger applications that load many classes or make heavy use of the
String.intern() method will see more significant differences. RFE:
6962931
By default all objects are on the heap. A String consists of two objects: the String itself and the char[] it wraps. It is not unusual to find that the most numerous object by type is char[], even if you create none directly.
What is surprising is that the JVM doesn't always create objects on the heap; through escape analysis it can place objects on the stack. Note: it can't do this for String literals, as they are stored in the String literal pool.
When a user enters a string, its size is dynamic and may differ on each execution, so the compiler cannot know how much memory it needs. Even at run time, the size is not known until the user has entered the complete string, so no memory can be reserved for it on the stack. Hence, a reference is generally stored on the stack, pointing to the string on the heap.

Interning a string

When we intern a string, we are making sure that all uses of that string are referring to the same instance.
I would assume that the underlying string object is in the heap.
However, where is the referring variable stored in the memory?
Does it have the same behaviour as static - wherein the reference gets stored in permgen and makes the string instance available for gc only after the classloader(and application) exits?
Up to JDK 6, interned strings are stored in a memory pool called the Permanent Generation, which is an area of the JVM that is reserved for non-user objects, like Classes, Methods and other internal JVM objects. The size of this area is limited, and is usually much smaller than the heap.
From JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.
A detailed explanation of this can be found in this answer.
When we intern a string, we are making sure that all uses of that string are referring to the same instance.
Not exactly. When you do this:
String s2 = s1.intern();
what you are doing is ensuring that s2 refers to a String in the string pool. This does not affect the value in s1, or any other String references or variables. If you want other copies of the string to be interned, you need to do that explicitly ... or assign interned string references to the respective variables.
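A small sketch of the point above; the class and method names are made up for illustration. It starts from new String("abc"), so the literal "abc" is already in the pool and intern() returns that pooled instance rather than s1.

```java
public class InternDemo {
    // Returns { s1 == s2, s2 == "abc", s1.equals(s2) }.
    static boolean[] check() {
        String s1 = new String("abc"); // a distinct heap object, not the pooled one
        String s2 = s1.intern();       // the pooled instance for "abc"
        return new boolean[] { s1 == s2, s2 == "abc", s1.equals(s2) };
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println("s1 == s2      : " + r[0]); // false: s1 itself is unchanged
        System.out.println("s2 == \"abc\"   : " + r[1]); // true: s2 is the pooled String
        System.out.println("s1.equals(s2) : " + r[2]); // true: same characters
    }
}
```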
I would assume that the underlying string object is in the heap.
That is correct. It might be in the "permgen" heap or the regular heap, depending on the version of Java you are using. But it is always "in the heap".
However, where is the referring variable stored in the memory?
The "referring variable" ... i.e. the one that holds the reference that you got from calling intern() ... is no different from any other variable. It can be
a local variable or parameter (held in a stack frame),
an instance field (held in a regular heap object),
a static field (held in a permgen heap object) ... or even
a jstring variable or similar in JNI code (held "somewhere else".)
In fact, a typical JVM uses a private hash table to hold the references to interned strings, and it uses the JVM's weak reference mechanism to ensure that interned strings can be garbage collected if nothing else is using them.
Does it have the same behaviour as static - wherein the reference gets stored in permgen and makes the string instance available for gc only after the classloader(and application) exits?
Typically no ... see above.
In most Java platforms, interned Strings can be garbage collected just like other Strings. If the interned Strings are stored in "permgen" space, it may take longer for the object to be garbage collected, because "permgen" is collected infrequently. However the lifetime of an interned String is not tied to the lifetime of a classloader, etc.

How to make sure String objects are garbage collected in Java

Here is the code snippet.
void method() {
    String s1 = "abc";             // refers to the pooled literal
    String s2 = new String("abc"); // a separate object on the regular heap
    s1 = null;
    s2 = null;
    // ... rest of the method ...
}
At the end of the method, do the objects that s1 and s2 referred to still exist? How can you make sure these objects are garbage collected?
The objects referenced by s1 and s2 become eligible for garbage collection once s1=null and s2=null have run, provided that no other references to those objects exist, or when the method exits, provided that the objects were referenced only by the local variables. Once created, an object uses some memory, and that memory remains allocated for as long as there are references to the object. When there are no references left, the object is assumed to be no longer needed and the memory it occupies can be reclaimed. In other words, an object becomes eligible for garbage collection when it is no longer reachable from any live thread or any static reference.
There are methods like System.gc() and Runtime.gc() which are used to send a garbage collection request to the JVM, but it is not guaranteed that a collection will happen. Java programmers cannot force garbage collection; it will only trigger if the JVM thinks it needs one. Forcing GC is a sign of bad coding. One should instead always look to minimize the creation of unnecessary objects and references to those objects.
They become eligible for garbage collection after they go out of scope.
Unless you're actually having serious performance issues, I'd stop worrying about it so much and let the garbage collector do its thing.
You should be careful though: some kinds of resources, such as file streams and open sockets, are not managed like that. You have to close those.
If the question is how to make sure, the answer is fairly simple. You can never make sure that any object will be garbage collected. Read this to understand what garbage collection really is and how to reason about it.
If the question is how to hint for a collection, then set all the references of unwanted objects to null and call System.gc(), which will request (not force) a collection. Nothing is guaranteed to be released using this method, but often it's the closest thing you can get.
If you want to do this specifically for strings, because they may contain sensitive data or something along these lines, use a char[] to store that data instead of a String, because you can change the primitive values of the array at will and erase them when you're done.
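A minimal sketch of the char[] approach, assuming the sensitive data arrives as a char[] (for example from Console.readPassword()). The class and method names are illustrative only.

```java
import java.util.Arrays;

public class SecretHolder {
    // Uses the secret, then overwrites every character so the data does not
    // linger in memory until some future garbage collection.
    static int useAndWipe(char[] secret) {
        try {
            return secret.length; // placeholder for the real use of the secret
        } finally {
            Arrays.fill(secret, '\0'); // erase in place; impossible with an immutable String
        }
    }

    public static void main(String[] args) {
        char[] password = { 's', '3', 'c', 'r', '3', 't' };
        useAndWipe(password);
        System.out.println(password[0] == '\0'); // the array has been zeroed
    }
}
```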
The garbage collector runs periodically (exactly when is JVM dependent). The JVM tracks which objects are still reachable; when the last reference to an object is broken (for example by assigning null to it), the object can be reclaimed on a later run of the GC. When that happens is entirely up to the JVM. You can send a request to the JVM to run the GC using the following code (whether the request is honored is once again JVM dependent):
Runtime.getRuntime().gc();
or
System.gc();
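One way to observe whether such a request had any effect is to watch a WeakReference, as in the sketch below. The class name is made up for illustration, and the final line may print either true or false, since the JVM is free to ignore the request.

```java
import java.lang.ref.WeakReference;

public class GcProbe {
    // Wraps the object in a weak reference, which does not keep it alive.
    static WeakReference<String> track(String s) {
        return new WeakReference<>(s);
    }

    public static void main(String[] args) {
        String s = new String("abc"); // ordinary heap object, not the pooled literal
        WeakReference<String> ref = track(s);
        System.out.println("reachable while s is set: " + (ref.get() != null));

        s = null;     // drop the only strong reference
        System.gc();  // a request, not a command

        // No guarantee either way; this only reports what happened on this run.
        System.out.println("cleared after gc request: " + (ref.get() == null));
    }
}
```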
Programmers mostly don't have to worry about running the GC; the JVM handles it. There have been lots of enhancements to the garbage collectors. Recent Java versions come with the G1 (Garbage First) collector, a server-style garbage collector that runs more efficiently. G1 is a great replacement for CMS (the Concurrent Mark-Sweep collector).
If you want to know more about the garbage collector, you should read these pages:
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/gc-ergonomics.html
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/cms-6.html
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/par-compaction-6.html
String s2 = new String ("abc");
Here "abc" will be created in the regular, garbage-collectible heap area.
So as soon as you set s2 to null, this String object is eligible for garbage collection.
This assumes that your program does not have any other reference to this particular String object.
String s1="abc";
In this case, "abc" will be created in a special area of the heap called the literal pool or string pool. Setting s1 to null does not make "abc" eligible for garbage collection, since the JVM will reuse this "abc" in the future.
The bottom line is that the normal garbage collection rules don't apply here.
Hope this helped. :-)

String literal memory utilization

In Java there is the concept of a string literal pool. If I am not creating any strings in my code, this memory pool is wasted for me. How can I use this memory area instead of keeping it for the string literal pool?
There is no "string literal pool"; string literals are interned, but that means they are just normal objects on the heap. They presumably get referenced a lot and in this way save on memory, but fundamentally they are no different than any other object.
If no string literals exist in your program (and you don't ever call String.intern) then the JVM does not allocate heap memory for such. There is no "hidden" memory area involved, and you don't need to do anything to "get access to it".
I don't think it makes any sense. Anyway, theoretically, the string pool is in the permgen area of the Java heap. This is the same memory where the JVM stores classes. By default (at least for the Oracle HotSpot JVM) it is 64 MB. You can try to configure this area with two HotSpot JVM options: -XX:MaxPermSize and -XX:PermSize. The smaller the permgen, the more memory is left for objects.
How can I use this memory area instead of keeping it for the string literal pool?
It is an area reserved by the JVM, which grows (up to some threshold) as the program runs. In short, you can't use it for any other purpose.
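To see which memory areas a particular JVM actually has, the standard java.lang.management API can list them, as in this sketch. The class name is illustrative. Pool names differ between JVM versions and collectors, so the output is not shown here; on HotSpot, a permgen pool would be expected on older versions and a Metaspace pool on Java 8 and later.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

public class MemoryPools {
    // Returns one line per memory pool, e.g. "<type> : <pool name>".
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            lines.add(pool.getType() + " : " + pool.getName());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```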

String Constant Pool memory sector and garbage collection

I read this question on the site, "How is the java memory pool divided?", and I was wondering to which of these sectors the "String Constant Pool" belongs.
Also, do the String literals in the pool ever get GCed?
The intern() method returns the reference to the String literal from the pool.
If the pool does get GCed, wouldn't that be counter-productive to the idea of the string pool? New String literals would be created again, nullifying the GC.
(It is assuming that only a specific set of literals exist in the pool, they never go obsolete and sooner or later they will be needed again)
As far as I know, String literals end up in the "Perm Gen" part of non-heap JVM memory. Perm Gen space is only examined during full GC runs (not partial ones).
In early JVMs (and I confess I had to look this up because I wasn't sure), String literals in the String pool never got GC'ed. In the newer JVMs, WeakReferences are used to reference the Strings in the pool, so interned Strings can actually get GC'ed, but only during full garbage collections.
Reading the JavaDoc for String.intern() doesn't give hints to the implementation, but according to this page, the interned strings are held by a weak reference. This means that if the GC detects that there are no references to the interned string except for the repository that holds interned strings then it is allowed to collect them. Of course this is transparent to external code so unless you are using weak references of your own you'll never know about the garbage collection.
String pooling
String pooling (sometimes also called string canonicalisation) is a
process of replacing several String objects with equal value but
different identity with a single shared String object. You can achieve
this goal by keeping your own Map (with possibly soft
or weak references depending on your requirements) and using map
values as canonicalised values. Or you can use String.intern() method
which is provided to you by JDK.
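The "own Map" option mentioned above could look roughly like the following sketch. The class name and the choice of WeakHashMap with WeakReference values are assumptions made for illustration; the quoted article does not prescribe a particular implementation.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class StringCanonicalizer {
    // Keys are held weakly by WeakHashMap; values are wrapped in WeakReference so
    // the map never holds a strong reference to the canonical String itself.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns the canonical instance for strings equal to s, registering s if
    // no live canonical instance exists yet.
    public synchronized String canonicalize(String s) {
        WeakReference<String> ref = pool.get(s);
        String existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        pool.put(s, new WeakReference<>(s));
        return s;
    }

    public static void main(String[] args) {
        StringCanonicalizer canon = new StringCanonicalizer();
        String a = canon.canonicalize(new String("value"));
        String b = canon.canonicalize(new String("value"));
        System.out.println(a == b); // same instance while a is still strongly held
    }
}
```

Entries disappear on their own once no caller holds the canonical String any more, which is the behaviour the paragraph attributes to weak references.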
At times of Java 6 using String.intern() was forbidden by many
standards due to a high possibility to get an OutOfMemoryException if
pooling went out of control. Oracle Java 7 implementation of string
pooling was changed considerably. You can look for details in
http://bugs.sun.com/view_bug.do?bug_id=6962931 and
http://bugs.sun.com/view_bug.do?bug_id=6962930.
String.intern() in Java 6
In those good old days all interned strings were stored in the PermGen
– the fixed size part of heap mainly used for storing loaded classes
and string pool. Besides explicitly interned strings, PermGen string
pool also contained all literal strings earlier used in your program
(the important word here is used – if a class or method was never
loaded/called, any constants defined in it will not be loaded).
The biggest issue with such string pool in Java 6 was its location –
the PermGen. PermGen has a fixed size and can not be expanded at
runtime. You can set it using -XX:MaxPermSize=96m option. As far as I
know, the default PermGen size varies between 32M and 96M depending on
the platform. You can increase its size, but its size will still be
fixed. Such limitation required very careful usage of String.intern –
you’d better not intern any uncontrolled user input using this method.
That’s why string pooling at times of Java 6 was mostly implemented in
the manually managed maps.
String.intern() in Java 7
Oracle engineers made an extremely important change to the string
pooling logic in Java 7 – the string pool was relocated to the heap.
It means that you are no longer limited by a separate fixed size
memory area. All strings are now located in the heap, as most of other
ordinary objects, which allows you to manage only the heap size while
tuning your application. Technically, this alone could be a sufficient
reason to reconsider using String.intern() in your Java 7 programs.
But there are other reasons.
String pool values are garbage collected
Yes, all strings in the JVM string pool are eligible for garbage
collection if there are no references to them from your program roots.
It applies to all discussed versions of Java. It means that if your
interned string went out of scope and there are no other references to
it – it will be garbage collected from the JVM string pool.
Being eligible for garbage collection and residing in the heap, a JVM
string pool seems to be a right place for all your strings, isn’t it?
In theory it is true – non-used strings will be garbage collected from
the pool, used strings will allow you to save memory in case you
get an equal string from the input. Seems to be a perfect memory
saving strategy? Nearly so. You must know how the string pool is
implemented before making any decisions.
source.
String literals don't get created into the pool at runtime. I don't know for sure if they get GC'd or not, but I suspect that they do not for two reasons:
It would be immensely complex to detect in the general case when a literal will not be used anymore
There is likely a static code segment where it is stored for performance. The rest of the data is likely built around it, where the boundaries are also static
Strings, even though they are immutable, are still objects like any other in Java. Objects are created on the heap and Strings are no exception. So, Strings that are part of the "String Literal Pool" still live on the heap, but they have references to them from the String Literal Pool.
For more, please refer to this link:
http://www.javaranch.com/journal/200409/ScjpTipLine-StringsLiterally.html
Edit:
public class ImmutableStrings
{
    public static void main(String[] args)
    {
        String one = "someString";
        String two = new String("someString");
        one = two = null;
    }
}
Just before the main method ends, how many objects are available for garbage collection? 0? 1? 2?
The answer is 1. Unlike most objects, String literals always have a reference to them from the String Literal Pool. That means that they always have a reference to them and are, therefore, not eligible for garbage collection.
Even though neither of our local variables, one or two, refers to our String object, there is still a reference to it from the String Literal Pool. Therefore, the object is not eligible for garbage collection. The object is always reachable through use of the intern() method.
