How does string interning work in Java 7+? [duplicate] - java

So, I realize the questions I'm about to ask relate to a topic that has been beaten to death time and time again. However, even after reading all of the answers and documentation I could find, I'm still confused about string interning. Perhaps it's due to my lack of understanding of the JVM; perhaps it's due to the changes introduced in Java 7 making many of those answers and documents obsolete. Either way, I'm stuck, and I'm hoping someone can help me understand the concept a bit more clearly...
String a = "text";
String b = new String("text");
In the above example, I understand that two String objects will be created. I also understand that there will be only one char array containing the sequence 't', 'e', 'x', and 't' in memory.
However, where in memory are each of the string objects actually stored?
If I've understood what I've read correctly, the referent of variable a will be stored in the constant pool, whereas the referent of b will be stored in the heap, right?
If that's the case, I'm confused as to how the intern pool maintains interned strings. Does it keep track of both the Strings defined in the constant pool and those that have been manually interned (by invoking .intern()) from the heap? Does the JVM create the string objects defined in the constant pool and load them into the intern pool? I'm confused as to how it all works...
Again, sorry for asking such confusing/asinine questions, it's just that I'm relatively new to the structure and inner-workings of the JVM and a lot of it has left my head spinning. Thanks!

There's a thing called the String Memory Pool in Java. When you declare:
String str1="abc";
It goes to that memory pool rather than being created as a separate object (since Java 7, the pool itself lives in the main heap). But when you write:
String str2=new String("abc");
It creates a full-fledged object on the heap. If you again write:
String str3 = "abc";
It won't create another object in the pool; it checks whether this literal already exists in the pool and, if it does, assigns that existing object to the variable. But writing:
String str4 = new String("abc");
will again create a new object on the heap.
Key point is that:
A new object will always be created on the heap as many times as you keep writing:
new String("abc");
But if you keep assigning Strings directly without using the keyword new, they will just be referenced from the memory pool (or created there if not present).
The intern() method checks whether the string is present in the memory pool; if it is not, it adds it to the pool. Either way, it returns a reference to the pooled string. So after s = s.intern(), your reference points to the canonical object in the String Memory Pool rather than to a separate copy (note that the memory pool only contains unique strings).
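To make the four declarations above concrete, here is a minimal sketch that runs them and compares references with ==. The class name StringPoolDemo is hypothetical; the results rely only on the language rule that equal string literals refer to the same pooled instance while every new creates a distinct object.

```java
import java.util.Arrays;

public class StringPoolDemo {
    // Runs the four declarations from the answer and returns the identity checks in order.
    static boolean[] identityChecks() {
        String str1 = "abc";              // literal: refers to the pooled instance
        String str2 = new String("abc");  // new: always a distinct object on the heap
        String str3 = "abc";              // same literal: the same pooled instance as str1
        String str4 = new String("abc");  // new again: yet another distinct object
        return new boolean[] {
            str1 == str3,           // true
            str1 == str2,           // false
            str2 == str4,           // false
            str2.intern() == str1,  // true: intern() hands back the pooled "abc"
            str2.equals(str4)       // true: contents are equal even though the objects differ
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(identityChecks()));
    }
}
```

Running main should print [true, false, false, true, true].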

When you say new String(), you get a new object reference, so consider:
String a = "text";
String b = new String("text");
System.out.println(a == b);
b = b.intern();
System.out.println(a == b);
The first a == b prints false because they are different references. If we intern b by writing b = b.intern(), we can test again and get true. I hope that helps. The above code has worked the same way in Java since version 1.0 (and it still works this way in Java 8 today).

Related

When exactly is the object created in the string constant pool when we use the new operator?

String s = new String("hello");
Here two objects will be created: one in heap memory and another in the string pool.
So what is the use of the intern() method? The string "hello" will be available in the heap as well as in the string pool after the above statement executes.
First of all, String s = new String("hello"); creates an unnecessary String and should not be used. Next, calling s = s.intern() will ensure that the "hello" already added to the SCP (string constant pool) is returned, and hence the second string that was created on the heap becomes eligible for GC.
intern() adds the string to the SCP if it is not already present. It is usually used when you know that a String is used multiple times but you cannot create it using a literal. So instead of having thousands of Strings with the same value existing simultaneously, you could use intern() to ensure that only one String is put in the SCP and used in all 1000 places (and all the other strings with the same value on the heap become eligible for GC).
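The deduplication idea above can be sketched as follows, assuming strings that are built at run time (so no literal is available). The class and method names are hypothetical. The sketch counts distinct objects by identity, using an IdentityHashMap-backed set, with and without interning.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class InternDedup {
    // Builds `count` equal strings at run time, so none of them is a literal.
    // If `intern` is true, each one is swapped for its canonical pooled instance.
    static List<String> build(int count, boolean intern) {
        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String s = new StringBuilder("status-").append("OK").toString(); // a fresh object every time
            result.add(intern ? s.intern() : s);
        }
        return result;
    }

    // Counts distinct objects by identity (==), not by equals().
    static int distinctObjects(List<String> list) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        seen.addAll(list);
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(build(1000, false))); // 1000
        System.out.println(distinctObjects(build(1000, true)));  // 1
    }
}
```

Without interning, the list holds 1000 separate objects with equal contents; with interning, every entry is the same pooled instance, and the other 999 copies become garbage.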
When exactly is the object created in the string constant pool when we use the new operator?
It isn't. There is considerable confusion here.
The object in the string pool is created by the compiler and classloader in response to the use of a string literal, in this case "hello".
The new operator creates a new object, on the heap.
The intern() method returns a reference to an object in the string pool that either was already there or was created by the intern() call.
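The three points above can be checked with a short sketch. The class name and the helper viaNew are hypothetical. The literal refers to the pooled object, each new produces a separate heap object, and intern() returns the pooled one.

```java
public class InternReturnsPooled {
    static final String LITERAL = "hello"; // a string literal always refers to the pooled instance

    // Each call evaluates `new`, so it returns a fresh object that is not the pooled one.
    static String viaNew() {
        return new String("hello");
    }

    public static void main(String[] args) {
        String s = viaNew();
        System.out.println(s == LITERAL);      // false: s is the heap object created by new
        System.out.println(s.equals(LITERAL)); // true: same contents
        s = s.intern();
        System.out.println(s == LITERAL);      // true: intern() returned the pooled instance
    }
}
```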
An object is created in the string constant pool whenever a string literal (anything written in double quotes) is used and an equal string doesn't already exist in the pool.
As for the intern() method, it returns the canonical representation of the string.
For further understanding, see http://www.javatpoint.com/java-string-intern
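One nuance of "anything written in double quotes": the language treats strings computed from compile-time constant expressions as if they were literals, so they are pooled too. Concatenation that happens at run time creates a new object. This sketch, with a hypothetical class name, shows the difference.

```java
import java.util.Arrays;

public class LiteralPooling {
    static final String HEL = "hel"; // final and initialized with a literal: a constant variable

    static boolean[] checks() {
        String lo = "lo";                 // not final, so not a constant variable
        String constant = "hel" + "lo";   // constant expression, folded at compile time
        String alsoConstant = HEL + "lo"; // still a constant expression because HEL is a constant
        String runtime = "hel" + lo;      // evaluated at run time: a newly created String
        return new boolean[] {
            constant == "hello",          // true
            alsoConstant == "hello",      // true
            runtime == "hello",           // false
            runtime.intern() == "hello"   // true
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(checks()));
    }
}
```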
What is the use of the intern() method?
Interned strings let you compare strings with == (faster) instead of the equals() method, whereas non-interned strings can't reliably be compared with the == operator.
String s = new String("hello");
new allocates memory for s on the heap instead of in the internal set of unique strings maintained by the VM, also known as the SCP. All string literals found in a class are automatically interned (with a strong reference) when the class is loaded, which leads to efficient memory use.
Calling intern() on s returns the reference held in the SCP (adding a weak, short-lived reference to s there if no equal string is present yet), so the GC can free the heap memory consumed by s once nothing else refers to it.
The weak reference is also cleared once the string is no longer used, which again leads to efficient memory management.
When exactly is the object created in the string constant pool?
A string is added to the SCP, possibly only temporarily, either through the direct double-quote syntax (String s = "syntax";) or by calling intern().
When we use the new operator?
Avoid using it with strings as much as you can, or never use it at all.

Why does the JVM create a new String object each time we create a string using the new keyword?

If the JVM creates a string pool for memory optimization, then why does it create a new object each time we create a string using the new keyword, even though the string exists in the string pool?
... why does Java create a new object each time we create a string using the new keyword, even though it exists in the string pool?
Because you explicitly told it to! The new operator always creates a new object. JLS 15.9.4 says:
"The value of a class instance creation expression is a reference to the newly created object of the specified class. Every time the expression is evaluated, a fresh object is created."
For the record, it is nearly always a mistake to call new String(String) ... but in obscure cases it might be useful. It is conceivable that you might want a string for which equals returns true and == gives false. Calling new String(String) will give you that.
For older versions of Java, the substring, trim and possibly other String methods would give you a string that shared backing storage with the original. Under certain circumstances, this could result in a memory leak. Calling new String(str.trim()) for example would prevent that memory leak, at the cost of creating a fresh copy of the trimmed string. The String(String) constructor guarantees to allocate a fresh backing array as well as giving you a new String object.
This behavior of substring and trim changed in Java 7 (update 6), when String stopped sharing backing arrays.
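The "equals returns true but == gives false" property of new String(String) mentioned above is easy to demonstrate. This is a minimal sketch with a hypothetical class and helper name; it checks only object identity and contents, not whether backing arrays are shared.

```java
public class CopyConstructorDemo {
    // new String(String) always yields a new object with the same contents.
    static String freshCopy(String original) {
        return new String(original);
    }

    public static void main(String[] args) {
        String trimmed = "  text  ".trim();
        String copy = freshCopy(trimmed);         // the new String(str.trim()) idiom from the answer
        System.out.println(copy.equals(trimmed)); // true: equals() compares contents
        System.out.println(copy == trimmed);      // false: == compares references
    }
}
```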
The designers introduced String literals to provide a primitive-style declaration and for performance.
But when you use the new keyword, you explicitly create objects on the heap, not in the constant pool.
When objects are created on the heap, there is no way for them to share memory with each other; they become complete strangers, unlike in the constant pool.
String interning helps you break this barrier between the heap and the constant pool.
String interning is a method of storing only one copy of each distinct string value, which must be immutable.
Remember that the constant pool is also a small part of the heap, with the additional benefit that memory can be shared.
When you write
String str = new String("mystring");
then it creates a string object on the heap, just like any other object you create. The string literal "mystring" is stored in the string constant pool.
From the Javadocs:
A pool of strings, initially empty, is maintained privately by the class String.
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
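The last sentence of the Javadoc quote is a checkable contract. Here is a small sketch that tests it for a pair of strings; the class name and the helper contractHolds are hypothetical.

```java
public class InternContract {
    // The Javadoc guarantee: s.intern() == t.intern() is true exactly when s.equals(t) is true.
    static boolean contractHolds(String s, String t) {
        return (s.intern() == t.intern()) == s.equals(t);
    }

    public static void main(String[] args) {
        String a = new String("pool");                              // heap object
        String b = new StringBuilder("po").append("ol").toString(); // built at run time
        String c = "poll";                                          // different contents
        System.out.println(contractHolds(a, b)); // true: equal, so both intern to one instance
        System.out.println(contractHolds(a, c)); // true: not equal, so the interned refs differ
    }
}
```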
To take advantage of string pooling you need to use String#intern instead of new.
The following object will be stored in the String pool:
String s = "hello";
And the following object will be stored in the heap (not in the String pool):
String s = new String("hello");
To allow garbage collection! If you need some String just once, there is no point in keeping it in memory (almost forever, which is the case with Strings in the constant pool). Strings that are not in the constant pool can be GCed like any other object. So you should only keep frequently used Strings in the constant pool (by using literals or interning them).
Strings created as String literals (String s = "string";) are stored in the string pool, but Strings created by invoking the String constructor with new (String s = new String("string");) are not stored in the string pool.

Do two objects of a class refer to the same memory location?

I am starting to learn some Java and I have been reading a lot about how memory is allocated by the JVM and consequently how this memory is freed using the garbage collector.
One thing that I have been unable to find out is if I create two new objects which are exactly the same would they refer to the same location in memory? Similar to how the String Pool works.
The answer is no:
If you create two objects using the new keyword, they will never point to the same memory location.
This holds true for String objects as well: if you create two String objects using new, the two references to these objects will point to two different memory locations.
String literals are a special case. A String literal is stored in the String literal pool. So two String references to a String literal will always point to the same memory location.
There are other special cases in Java, such as the Integer class. For example, Integer a = 100; Integer b = 100; System.out.println(a == b); prints true because Integer values between -128 and 127 are cached by the JVM. (The values between -128 and 127 are cached by all JVM implementations; it is up to the individual JVM implementation whether to cache values beyond this range.)
If you create the new object using the new operator, it's guaranteed to be the new object distinct from any other objects existed before. However when you create objects in indirect way (for example, using factory methods), the object existed before may be reused. A good example is Integer.valueOf(int) method which caches small numbers and returns the same instance:
Integer a = Integer.valueOf(10);
Integer b = Integer.valueOf(10);
// a and b are the same object
Integer c = new Integer(10); // this constructor is deprecated since Java 9
Integer d = new Integer(10);
// c and d are two distinct objects
Note that even if the JVM can determine that two new objects are essentially the same, it simply cannot merge them into a single object, because that would violate the language specification and might break your program later. For example, you may decide to synchronize on both of these objects later. According to the specification, these synchronizations should not interfere, but if the JVM merged the two objects, synchronization on the first one would have to wait for the synchronization on the second one.
No, they are not the same object. You can verify this with the == operator, which checks whether two references refer to the same object.
String s1 = new String("hello");
String s2 = new String("hello");
System.out.println(s1 == s2);
This will print false, which is the exact reason why you normally never want to compare Strings (or really, any object) with == (ref this post). The correct way to check for (content) equality is equals(Object that).

Total Number of String objects created in the process?

String str1="JAVA";
String str2="JAVA";
String str3=new String("JAVA");
String str4=new String("JAVA").intern();
2 objects will be created. str1 and str2 refer to the same object because of the String literal pool concept, str3 points to a new object because it uses the new operator, and str4 points to the same object as str1 and str2 because the intern() method checks the string pool for a string with the same value.
str1=str2=str3=str4=null;
One object will be eligible for GC: the object created through String str3=new String("JAVA"). The first String object is always accessible through the reference stored in the string literal pool.
Is my explanation correct?
Total Number of String objects created in the process?
Three: The one in the intern pool created via the literal and the two you create via new String.
One object will be eligible for GC.
I count two, and possibly even all three under very special circumstances:
The one you created in this line:
String str3=new String("JAVA");
(since you later set str3 to null).
The one you created temporarily in this line:
String str4=new String("JAVA").intern();
That line creates a new String object, calls intern on it, and then saves a reference to the string from the pool. So in theory, it creates a String object that is immediately available for GC. (The JVM may be smart enough not to do that, but that's the theory.)
Possibly, eventually, under the right conditions, even the string in the intern pool. Contrary to popular belief, strings in the intern pool are available for garbage collection, as we can see from the answer to this other question. Being in the permgen (which is no longer where they live if you're using Oracle's JVM 7 or later) doesn't mean they're not GC'd, since the permgen is GC'd too. So the question becomes: when or how is a string literal used in code no longer referenced? I don't know the answer, but I think a reasonable assumption would be: when and if the class using it is unloaded from memory. According to this other answer, that can only happen if both the class and its classloader are unloaded (and may not happen even then). If the class was loaded by the system classloader (the normal case), then presumably it's never unloaded.
So almost certainly just two (#1 and #2 above), but it was fun looking into #3 as well.
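Java code can't directly count allocations, but it can confirm the identity relations the question and answer rely on. This sketch, with a hypothetical class name, reproduces the four declarations and checks which references share an object.

```java
import java.util.Arrays;

public class CountingObjects {
    // Reproduces the four declarations from the question and returns the identity relations.
    static boolean[] relations() {
        String str1 = "JAVA";
        String str2 = "JAVA";
        String str3 = new String("JAVA");
        String str4 = new String("JAVA").intern(); // the temporary from `new` is unreachable after this line
        return new boolean[] {
            str1 == str2, // true: both refer to the pooled literal
            str1 == str3, // false: str3 is the separate object created by new
            str1 == str4  // true: intern() returned the pooled literal
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(relations())); // [true, false, true]
    }
}
```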

Regarding string object pool in PermGen

I heard that the string object pool exists in PermGen, and that when String.intern() is executed, it checks the pool first to see if an equivalent string object exists; if it does not exist, it creates one and returns a reference to the pooled instance.
But here is my first question.
I think that object is created on the heap, specifically in the young generation first. If it survives a few garbage collections, it moves to the old generation. Can anybody explain how the string object gets into the pool that exists in PermGen?
Second question:
String s = "test";
s = "test1";
If I reassign "test1" to the reference s and continue to use "test1", does that mean that "test" (created in the young generation) will be garbage collected?
Third question:
How is the string object pool related to the runtime constant pool?
Thanks.
What makes you think the interned String first goes to the young generation? The String#intern() method is a native method. It's certainly very possible for an implementation to move it right into the permgen.
Second question: if there are no other references to that "test" String instance, it's eligible for garbage collection. The same goes if it's interned: even an interned String that no longer has any active references can be garbage collected. This might not have been the case in older JVMs, though, and it can be implementation-specific, I guess.
As for the third question, I do not know. All I know is that String literals from source code are placed into the same pool. If you were to construct a String that's equal to a String constant from source and then intern it, you'd be returned the instance that was used to represent the constant. Think of this as String literals having been interned right away.
EDIT: I just read your first few sentences again and I think I see the reason for the confusion. When you call intern() on a String and no equal String is in the pool yet, it won't first construct an equivalent String. It'll just move the instance you called intern() on into the pool and return a reference to that same instance. That's how it's stated in the JavaDoc.
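The "move the instance into the pool" behavior can be observed with a string whose contents are almost certainly not pooled yet. This is a sketch under an explicit assumption: it describes HotSpot-based JDK 7 and later, where interned strings live in the main heap. Older JVMs copied the string into PermGen instead, so the first check would print false there. The class name and the UUID-based helper are hypothetical.

```java
import java.util.UUID;

public class InternInPlace {
    // Builds a string at run time whose contents are (almost certainly) not in the pool yet.
    static String uniqueRuntimeString() {
        return "intern-demo-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        String s = uniqueRuntimeString();
        // Assumption: on HotSpot JDK 7+, the pool records s itself, so the same reference comes back.
        System.out.println(s.intern() == s);
        // Any later equal string interns to that same instance.
        System.out.println(new String(s).intern() == s);
    }
}
```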
Strings go to the intern pool in two cases:
you explicitly call intern() method on the String object
you initialize it with a literal (you give the explicit content of the String), since Java automatically interns String literals.
The pool is organized as a table: when a String is interned, it is added to the pool if the value is not yet present; otherwise, a reference to the existing entry is used.
"test" in your case is supposed to go to the pool and not to the young generation. In any case, Strings that are no longer referenced are cleaned up there too (I cannot say whether this is part of the same GC process used for the heap, or whether this behavior is standard).

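The table lookup described in the last answer can be imitated at the application level. This is a hypothetical user-level interner, not the JVM's own string table: it adds a string on first sight and returns the existing entry afterwards. Unlike the JVM pool, this map holds strong references, so its entries are never garbage collected.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical user-level interner; NOT how the JVM implements its own string pool.
public class SimpleInterner {
    private final ConcurrentMap<String, String> table = new ConcurrentHashMap<>();

    // Returns the canonical instance equal to s, adding s itself if no equal entry exists yet.
    public String intern(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public int size() {
        return table.size();
    }

    public static void main(String[] args) {
        SimpleInterner interner = new SimpleInterner();
        String a = interner.intern(new String("test"));
        String b = interner.intern(new String("test"));
        System.out.println(a == b);          // true: the second call returned the first entry
        System.out.println(interner.size()); // 1
    }
}
```

putIfAbsent keeps the check and the insert atomic, so two threads interning equal strings at the same time still agree on one canonical instance.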