What is the purpose of Java's String.intern()? - java

I know there are two ways of creating String in Java:
String a = "aaa";
String b = new String("bbb");
With the first way, Java will definitely create a String object in the string pool and make a reference to it. (Assume "aaa" wasn't in the pool before.)
With the second method, an object will be created in the heap, but will jvm also create an object in the string pool?
In the post "Questions about Java's String pool", @Jesper said:
If you do this:
String s = new String("abc");
then there will be one String object in the pool, the one that represents the literal "abc", and there will be a separate String object, not in the pool, that contains a copy of the content of the pooled object.
If that's true, then every time new String("bbb"); is executed, an object "bbb" is created in the pool, which means that either way above, Java will always create a String object in the pool. Then what is intern() used for? In the docs http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#intern(), it says:
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
That means there are cases where a string is not in the pool. Is that possible? Which one is true?

As you know, String is an immutable class in Java: once constructed, a String cannot be altered. Because of this, the JVM can maintain a literal pool, which reduces memory usage and improves performance. Each time a String literal is used, the JVM checks the literal pool. If the literal is already there, the same reference is returned. If it is not, a new String object is created and added to the literal pool.
This applies when you create a String from a literal, much as you would initialise a primitive.
String str = "bbb";
But when you create a new String object
String str = new String("bbb");
the rules above do not apply and a new instance is always created.
However, the intern method of the String class can be used to obtain the String reference from the literal pool even when the String was created with the new operator. See the example below: although str3 is created with the new operator, calling intern makes the JVM return the reference from the literal pool.
public class StringInternExample {
public static void main(final String args[]) {
final String str = "bbb";
final String str1 = "bbb";
final String str2 = new String("bbb");
final String str3 = new String("bbb").intern();
System.out.println("str == str1 : "+(str == str1));
System.out.println("str == str2 : "+(str == str2));
System.out.println("str == str3 : "+(str == str3));
}
}
Output of above code:
str == str1 : true
str == str2 : false
str == str3 : true
You can have a look: Confusion on string immutability
Source of answer: http://ourownjava.com/java/java-string-immutability-and-intern-method/
Shishir

There are essentially two ways that our String objects can enter the pool:
Using a literal in source code like "bbb".
Using intern.
intern is for when you have a String that's not otherwise from the pool. For example:
String bb = "bbb".substring(1); // substring creates a new object
System.out.println(bb == "bb"); // false
System.out.println(bb.intern() == "bb"); // true
Or slightly different:
System.out.println(new String("bbb").intern() == "bbb"); // true
new String("bbb") does create two objects...
String fromLiteral = "bbb"; // in pool
String fromNewString = new String(fromLiteral); // not in pool
...but it's more like a special case. It creates two objects because "bbb" refers to an object:
A string literal is a reference to an instance of class String [...].
Moreover, a string literal always refers to the same instance of class String.
And new String(...) creates a copy of it.
However, there are many ways String objects are created without using a literal, such as:
All the String methods that return a modified copy. (substring, split, replace, etc.)
Reading a String from some kind of input such as a Scanner or Reader.
Concatenation when at least one operand is not a compile-time constant.
intern lets you add them to the pool or retrieve an existing object if there was one. Under most circumstances interning Strings is unnecessary but it can be used as an optimization because:
It lets you compare with ==.
It can save memory because duplicates can be garbage collected.
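As a rough illustration of both points, the sketch below builds equal strings at run time, interns them, and counts how many distinct objects remain. The class name, the helper methods and the "status-ok" value are made up for this example.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class InternDedupSketch {
    // Counts distinct String objects by identity (==), not by equals().
    public static int distinctObjects(List<String> strings) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String s : strings) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    // Builds an equal string at run time; toString() allocates a new String on each call.
    public static String buildToken() {
        return new StringBuilder("status-").append("ok").toString();
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>();
        List<String> interned = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            String token = buildToken();
            raw.add(token);
            interned.add(token.intern());
        }
        System.out.println(distinctObjects(raw));                 // 1000
        System.out.println(distinctObjects(interned));            // 1
        System.out.println(interned.get(0) == interned.get(999)); // true
    }
}
```

Once the raw list is dropped, the 1000 duplicates are unreachable and can be collected, while the single pooled instance serves every user.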

Yes, new String("abc") will create a new object in memory, and thus it is advised to avoid it. Please have a look at item 5 of Josh Bloch's Effective Java, "Avoid creating unnecessary objects" where it is better explained:
As an extreme example of what not to do, consider this statement:
String s = new String("stringette"); // DON'T DO THIS!
The statement
creates a new String instance each time it is executed, and none of
those object creations is necessary. The argument to the String
constructor ("stringette") is itself a String instance, functionally
identical to all of the objects created by the constructor. If this
usage occurs in a loop or in a frequently invoked method, millions of
String instances can be created needlessly. The improved version is
simply the following:
String s = "stringette";
This version uses a
single String instance, rather than creating a new one each time it is
executed. Furthermore, it is guaranteed that the object will be reused
by any other code running in the same virtual machine that happens to
contain the same string literal [JLS, 3.10.5].
http://uet.vnu.edu.vn/~chauttm/e-books/java/Effective.Java.2nd.Edition.May.2008.3000th.Release.pdf

With the second method, an object will be created in the heap, but will jvm also create an object in the string pool?
Yes, but it is the string literal "bbb" which ensures the interned string [1]. The String constructor creates a new String object which is a copy with the same length and content; the newly created string is not automatically interned.
If that's true, then every time with the new String("bbb");, a object "bbb" is created in the pool, which means by either way above, java will always create a string object in the pool. Then what is intern() used for ?
Only string literals are automatically interned. Other string objects must be manually interned, if such is the desired behavior.
That means there are cases that a string is not in the pool, is that possible ?
With the exception of manual calls to String.intern, only string literals result in interned strings.
While I would recommend using a specialized collection for such cases, interning may be useful where it avoids creating extra duplicate objects. Some use cases where interning can be beneficial, because the same string value can appear many times, are JSON keys and XML element/attribute names.
[1] This is easy to reason about. Consider:
String _b = "bbb"; // string from string literal (this is interned)
String b = new String(_b); // create a NEW string via "copy constructor"
b == _b // -> false (new did NOT return an interned string)
b.equals(_b) // -> true (but it did return an equivalent string)
b.intern() == _b // -> true (which interns to .. the same string object)
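The "specialized collection" mentioned above could look something like the following. This is only a minimal sketch: LocalInterner is a hypothetical class written for this example, not a JDK type, and it keeps its own canonical instances independently of the JVM's string pool.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LocalInterner {
    // Maps each distinct value to the first instance seen with that value.
    private final ConcurrentMap<String, String> canonical = new ConcurrentHashMap<>();

    // Returns the canonical instance for s, registering s itself if none exists yet.
    public String intern(String s) {
        String existing = canonical.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        LocalInterner keys = new LocalInterner();
        String k1 = keys.intern(new String("name"));
        String k2 = keys.intern(new String("name"));
        System.out.println(k1 == k2);     // true: same canonical instance
        System.out.println(k1 == "name"); // false: separate from the JVM's pool
    }
}
```

One reason to prefer something like this is scope: the map can be discarded when, say, a parser finishes, whereas String.intern() shares a single JVM-wide pool.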

Related

Why can't the string 'java' be added to the 'String Pool'

The following code confuses me, could anyone explain why the two tests behave differently? Why does the String comparison in the first test return false while the comparison in the second test returns true?
public class Student {
/**
* Why isn't the string "java" added to the 'String Pool' by the intern() method?
*/
@Test
public void test1() {
String str1 = new String("ja") + new String("va");
str1.intern();
String str2 = "java";
// Result:false
System.out.println("Result:" + (str1 == str2));
}
/**
* Any other strings will be added to 'String Pool' as expected after intern() is invoked.
*/
@Test
public void test2() {
String str1 = new String("ja1") + new String("va");
str1.intern();
String str2 = "ja1va";
// Result:true
System.out.println("Result:" + (str1 == str2));
}
}
You're basically checking whether a string was already in the string pool. The string "java" isn't added to the pool by calling intern in your first piece of code because it's already in the string pool. In each method, your code:
Creates a new string
Calls intern on the newly created string (but ignores the result; almost always a bad idea, and you can detect the existence of a previous value in the string pool easily by using the return value)
Compares the new string with a string literal, which will always use the result that's now in the string pool
Now the call to intern will add the target string to the pool if it doesn't already exist, so your comparison will return true if and only if the new string value was not previously in the string pool. This is equivalent to testing whether intern returns a different reference to the target of the call.
For any given string reference, there are three possibilities:
That exact reference is present in the string pool already. (That can't be the case in your code, because you're creating a new string.)
A reference to an equal string is present in the string pool. In that case, intern() will return the existing reference.
No equal string is present in the string pool. In that case, the target of the call will be added to the string pool, and the same reference returned.
What you're seeing is the result of other code putting things in the string pool - quite possibly as part of loading classes. Here's an example to demonstrate that:
public class Test {
public static void main(String... args) {
checkInterned("ja", "va");
checkInterned("ja", "va.lang");
checkInterned("ja", "va.other");
checkInterned("Int", "eger");
checkInterned("abc", "def");
checkInterned("Te", "st");
checkInterned("Te", "st2");
checkInterned("check", "Interned");
checkInterned("check", "Interned2");
}
public static void checkInterned(String start, String end) {
String x = start + end;
String y = x.intern();
System.out.println(x + " was interned already? " + (x != y));
}
}
Output:
java was interned already? true
java.lang was interned already? true
java.other was interned already? false
Integer was interned already? true
abcdef was interned already? false
Test was interned already? true
Test2 was interned already? false
checkInterned was interned already? true
checkInterned2 was interned already? false
So the interned values are:
java
java.lang
Integer
Test
checkInterned
They're all names that would naturally come up when loading classes (including the one being run).
I suspect that "java" is only a special case here in that there may well be lots of code within the JRE that checks whether a string starts with "java" as a reserved name.
This doesn't indicate anything about "java" being a keyword though - it's just "a string that's already in the string pool". You don't need to treat it any differently.
The first thing to realize is that str1.intern() doesn't change the str1 reference. It returns the interned reference. So if you wanted str1 to now be that reference, you'd have to do:
str1 = str1.intern();
So, why the difference? In a nutshell, because the JVM already has a string "java" in its string pool, because of various internals.
In the first example, str1 starts off as a newly instantiated String (as I think you understand). You then call str1.intern(), which returns the interned reference of a pre-existing String "java", but you don't do anything with that reference. When you then compare str1 == "java", you're comparing the reference to the newly instantiated object with the reference to the interned object, and get false.
In the second example, "ja1va" does not exist in the string pool to start off. When you call str1.intern(), that method puts "ja1va" into the pool, with its current reference (that is, str1) as the canonical reference. When you subsequently refer to the "ja1va" literal string, the JVM looks to see whether it's already in the pool, sees that it is, and uses it. Thus, you get true.
In other words, in the first case, you're creating a new String object and then not actually grabbing its interned equivalent. In the second case, you're creating a new String object, defining that as the interned reference, and then reloading it via a string literal.
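A minimal sketch of the difference between discarding and using the return value of intern() is shown below. The class and method names are invented for this example. Only the second method has a guaranteed result; the first depends on whether "java" was already pooled by the time intern() is called, which varies with the JVM and the classes loaded.

```java
public class InternResultSketch {
    // Mirrors test1 from the question: the result of intern() is thrown away.
    public static boolean ignoringResult() {
        String s = new String("ja") + new String("va");
        s.intern();          // s still refers to the newly created object
        return s == "java";  // outcome depends on what was pooled beforehand
    }

    // Assigns the interned reference back before comparing.
    public static boolean usingResult() {
        String s = new String("ja") + new String("va");
        s = s.intern();      // s now refers to the pooled instance
        return s == "java";  // always true
    }

    public static void main(String[] args) {
        System.out.println("ignoring result: " + ignoringResult());
        System.out.println("using result: " + usingResult());
    }
}
```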

Every time I use String, does it create a new String object?

Let's say that I need to iteratively retrieve a value of the same key from a Java hashmap.
for(int i=0; i<Integer.MAX_VALUE; i++)
map.get("KEY");
In this case, is the "KEY" string created every time I call map.get("KEY")? I was wondering if it's always better to have a String constant, or it doesn't matter.
No. String constants are interned automatically, so any identical string literals all reference the same object in memory.
Some more information on this: http://www.xyzws.com/Javafaq/what-is-string-literal-pool/3
An example of this:
String s1 = "Test";
String s2 = "Test";
String s3 = new String("Test");
s1 == s2;//Evaluates to true, because they are the same object (both created with string literals)
s1 == s3;//Evaluates to false, because they are different objects containing identical data
The yes/no answer depends on how you create String objects. Below are the four scenarios I can think of right now.
Yes Cases
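new String(...) always creates a new object. It is not interned (it doesn't go to the string pool), so you cannot get it back from the pool.
Concatenation at run time (for example a + "b", where a is a non-final variable) creates a new String object that is not interned (it doesn't go to the string pool). A constant expression such as "a" + "b" is the exception: the compiler folds it into the single literal "ab", which is interned.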
No Cases
String a = "aa"; if the string is already in the pool, it is retrieved from there; otherwise a new object is created and interned (it goes to the string pool).
new String().intern() or "aa".intern(); if an equal string is already in the pool, that one is returned; otherwise the string is added to the pool and returned.
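Whether concatenation produces a new object depends on whether the expression is a compile-time constant. The sketch below (class and method names are made up for this example) contrasts the two situations.

```java
public class ConcatSketch {
    // The parameter is not a compile-time constant, so this concatenation happens at run time.
    public static String runtimeConcat(String left) {
        return left + "b";
    }

    public static void main(String[] args) {
        String ab = "ab";

        String folded = "a" + "b";      // constant expression, folded by the compiler
        final String fa = "a";          // final + constant initialiser: a constant variable
        String foldedFinal = fa + "b";  // also a constant expression
        String runtime = runtimeConcat("a");

        System.out.println(folded == ab);           // true
        System.out.println(foldedFinal == ab);      // true
        System.out.println(runtime == ab);          // false: new object, not interned
        System.out.println(runtime.intern() == ab); // true
    }
}
```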
is the "KEY" string created every time I call map.get("KEY")?
No.
Java Strings are immutable, which allows the Java compiler to use a single instance for all string literals.
That is: all identical string literals in your program will reference a single string object.
In the rare cases you need identical strings to be wrapped in two separate objects, you must explicitly
instantiate a String object:
String s1 = "bla";
String s2 = "bla";
// s1 == s2
String s3 = new String ("bla");
// s1 != s3
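For the original map.get("KEY") loop, a small check along these lines shows that every evaluation of the literal yields the same object. The class and helper names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class LiteralReuseSketch {
    // Evaluates the literal "KEY" afresh on every call.
    public static String key() {
        return "KEY";
    }

    // Returns true if every evaluation of the literal yields the same object.
    public static boolean sameObjectEveryTime(int iterations) {
        String first = key();
        for (int i = 0; i < iterations; i++) {
            if (key() != first) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("KEY", 42);
        for (int i = 0; i < 1_000; i++) {
            map.get("KEY"); // the literal resolves to the same pooled String each time
        }
        System.out.println(sameObjectEveryTime(1_000)); // true
    }
}
```

Extracting the literal into a constant is therefore a readability choice rather than a way to avoid allocations.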

string instantiation vs stringbuffer instantiation

I am not able to figure out that if
String ab = "hello"; //straight initialization
String ab_1 = new String ("hello_1"); //initializing using new
both work, but
StringBuffer bfr = new StringBuffer("hi"); //works only with new
works only if created with new.
Why is it that String can be instantiated directly but StringBuffer needs the new operator? Can someone please explain the main reason?
All objects need to be instantiated with new. Only primitives can be instantiated from a literal (int i = 0;).
The only exceptions are:
strings, which allow a special initialisation construct:
String s = "abc"; //can be instantiated from a literal, like primitives
null instantiation: Object o = null;
It is defined in the Java Language Specification #3.10:
A literal is the source code representation of a value of a primitive type, the String type, or the null type.
Note: arrays also have a dedicated initialisation pattern, but that's not a literal:
int[][] a = { { 00, 01 }, { 10, 11 } };
Using String s1 = "hello"; and String s2 = new String("hello"); have a subtle difference.
public static void main(String[] arg ) {
String s1 = "Java";
String s2 = "Java";
String s3 = new String("Java");
System.out.println(s1==s2); //true
System.out.println(s1==s3); //false
StringBuilder sb = new StringBuilder(25); //initial capacity
sb = new StringBuilder(10);
sb.append(s1).append(" uses immutable strings.");
sb.setCharAt(20, 'S');
System.out.println(sb);
}
In the above code, "Java" is known as a String literal. In order to save memory, both times this appears in the code, it is the same String literal, so s1 and s2 actually refer to the same object in memory. While s1.equals(s3) would be true, they do not reference the same object in memory as shown above.
In practice, we always use .equals to compare Strings and they are immutable, so we cannot change the data s1 refers to (at least not easily). But if we were able to change the data referenced by s1, then s2 would change along with it.
StringBuilder does let you modify the underlying data: we often use it to append one String to another as illustrated above. We can be glad that StringBuilder sb2 = "what?" is illegal because in the case of StringBuilders, having two of them reference the same data (meaning sb1==sb2) is more likely to lead to problems where a change in sb1 causes an unexpected change in sb2.
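The aliasing problem described above can be sketched as follows (names are made up for this example): two variables that refer to the same StringBuilder see each other's changes, whereas "changing" a String only rebinds one variable to a new object.

```java
public class AliasSketch {
    // Two references to one mutable builder: a change through sb1 is visible through sb2.
    public static String appendThroughAlias() {
        StringBuilder sb1 = new StringBuilder("Java");
        StringBuilder sb2 = sb1;
        sb1.append(" changed");
        return sb2.toString();
    }

    // Two references to one immutable String: reassigning s1 leaves s2 untouched.
    public static String reassignString() {
        String s1 = "Java";
        String s2 = s1;
        s1 = s1 + " changed"; // creates a new String; the original is not modified
        return s2;
    }

    public static void main(String[] args) {
        System.out.println(appendThroughAlias()); // Java changed
        System.out.println(reassignString());     // Java
    }
}
```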
String ab = "hello"; //straight initialization
String ac = "hello"; // create one more reference ac
String is a special case. When you use the new keyword, a new String object is always created. Note that objects are always on the heap: the string pool is not a memory area separate from the heap. The string pool is like a cache.
It works this way because Strings are used heavily in Java and creating String objects with the new keyword is comparatively expensive, which is why Java introduced the string pool concept.
If you declare another variable ac with the same value, Java will not create a new String object; it will simply refer to the same object ("hello") that is already in the pool.
String ab_1 = new String ("hello_1"); //initializing using new
It will simply create an object in memory and ab_1 will refer to that object.
Strings are quite a special case in Java (this is not really a good thing in my opinion, but that doesn't matter).
Strings, unlike other objects, can be instantiated directly like they were constants.
When you do this, the String constant is added to the String constant pool, and handled like it was a primitive. Let me give an example.
String a = "abc";
String b = "abc";
When you instantiate a as a "primitive" string, it gets added to the pool, when you instantiate b, the same object is returned from the pool, so if you do this:
a == b;
You'll get... true, since it's actually the same object. If you instantiate both with new, you'll get false, since you're comparing the references of two different Objects (new forces the creation of a distinct object).
Strings are handled specially by the Java compiler. When you type a string literal such as "hello", the compiler creates a new String object for you internally.
No such thing is performed for StringBuffers (although Java uses StringBuffers internally for another purpose - for implementing string concatenation).
See Difference between string object and string literal for more details.
Other pointers:
String, StringBuffer, and StringBuilder
+ operator for String in Java
There is also one more difference based on 'where' strings are 'stored' - memory or string constant pool.
To make Java more memory efficient, the JVM sets aside a special area
of memory called the "String constant pool." When the compiler
encounters a String literal, it checks the pool to see if an identical
String already exists. If a match is found, the reference to the new
literal is directed to the existing String, and no new String literal
object is created. (The existing String simply has an additional
reference.)
String s = "abc"; // creates one String object and one reference variable
In this simple case, "abc" will go in the pool and s will refer to it.
String s = new String("abc"); // creates two objects, and one reference variable
In this case, because we used the new keyword, Java will create a new String object
in normal (nonpool) memory, and s will refer to it. In addition, the literal "abc" will
be placed in the pool.
String is an immutable class with built-in constructors that can create a String object from a string literal.
Strings are not really an exception here (compared with initialising a primitive, e.g. int i = 0). A String object is still constructed; the difference is that this happens behind the scenes and is not directly visible:
String str = "ABC";
Here "ABC" also represents a String object, which resides in the string pool. When this statement is executed, no constructor call appears in your code; the JVM makes sure a pooled String object for "ABC" exists and assigns a reference to it to str.
Basically, since Strings are used so much, Java offers a shorthand solution to instantiating a String.
Instead of always using this,
String str = new String ("hello");
Java makes it able to do this:
String str = "hello";

String object creation using new and its comparison with intern method

I read in Kathy Sierra book that when we create String using new operator like String s = new String("abc") In this case, because we used the new keyword, Java will create a new String object in normal (nonpool) memory, and s will refer to it. In addition, literal "abc" will be placed in the pool.
intern() says that if String pool already contains a string then the string from the pool is returned Otherwise, the String object is added to the pool and a reference to this String object is returned.
If creating the string "abc" with new also places the string in the pool, then why does intern() say that the string from the pool is returned if the pool contains the string, and otherwise the String object is added to the pool?
Also I want to know if we create a String using new then actually how many objects get created?
TL;DR: If you ever really need to do new String("abc"), you'll know you need to and you'll know why. It's so rare that it's almost valid to say you never need to. Just use "abc".
The long version:
When you have the code new String("abc") the following things occur at various times:
When the class containing that code is loaded, if a string with the characters "abc" is not already in the intern pool, it's created and put there.
When the new String("abc") code is run:
A reference to the "abc" string from the intern pool is passed into the String constructor.
A new String object is created and initialized by copying the characters from the String passed into the constructor.
The new String object is returned to you.
If string "abc" when created using new also placed the string in the pool, then why does intern() says that string from the pool is returned if String pool contains the string otherwise the string object is added to the pool.
Because that's what intern does. Note that calling intern on a string literal is a no-op; string literals are all interned automatically. E.g.:
String s1 = "abc"; // Get a reference to the string defined by the literal
String s2 = s1.intern(); // No-op
System.out.println(s1 == s2); // "true"
System.out.println(s1 == "abc"); // "true", all literals are interned automatically
Also I want to know if we create a String using new then actually how many objects get created?
You create at least one String object (the new, non-interned one), and possibly two (if the literal wasn't already in the pool; but again, that bit happens earlier, when the class file's literals are loaded):
String s1 = "abc"; // Get a reference to the string defined by the literal
String s2 = new String(s1); // Create a new `String` object (guaranteed)
System.out.println(s1 == s2); // "false"
String s3 = s2.intern(); // Get the interned version of the string with these characters
System.out.println(s1 == s3); // "true"
String Pool is a pool of string references. Objects are created in Heap only.
When using new String("abc").intern() or a literal as in String s = "abc";, the string pool is checked for an existing reference to a string equal to "abc".
If a reference for "abc" already exists in the pool and .intern() is called on a reference to a String object created using new String("abc"), the pooled reference is returned, and the object created by new String("abc") becomes eligible for garbage collection once nothing else refers to it. See the code below for more clarity.
public static void main(String[] args) {
String s = new String("abc");
String a = s;
System.out.println(s==a);// true
String b = "abc";
s = s.intern();
System.out.println(s==a);// false
}

What is String pool in Java? [duplicate]

I am confused about StringPool in Java. I came across this while reading the String chapter in Java. Please help me understand, in layman terms, what StringPool actually does.
This prints true (even though we don't use the equals method, which is the correct way to compare strings):
String s = "a" + "bc";
String t = "ab" + "c";
System.out.println(s == t);
When the compiler evaluates these constant expressions, it sees that s and t have the same value, so only one String object is needed. This is safe because String is immutable in Java.
As a result, both s and t point to the same object and a little memory is saved.
The name 'string pool' comes from the idea that all already-defined strings are stored in a 'pool', and before a new String object is created for a literal, the pool is checked to see whether such a string is already defined.
I don't think it actually does much; it looks like it's just a cache for string literals. If you have multiple Strings whose values come from the same literal, they'll all point to the same String object in the string pool.
String s1 = "Arul"; //case 1
String s2 = "Arul"; //case 2
In case 1, the literal "Arul" is newly created and kept in the pool. In case 2, s2 refers to the same pooled object as s1; no new object is created.
if(s1 == s2) System.out.println("equal"); //Prints equal.
String n1 = new String("Arul");
String n2 = new String("Arul");
if(n1 == n2) System.out.println("equal"); //No output.
http://p2p.wrox.com/java-espanol/29312-string-pooling.html
Let's start with a quote from the virtual machine spec:
Loading of a class or interface that contains a String literal may create a new String object (§2.4.8) to represent that literal. This may not occur if a String object has already been created to represent a previous occurrence of that literal, or if the String.intern method has been invoked on a String object representing the same string as the literal.
This may not occur - this is a hint that there's something special about String objects. Usually, invoking a constructor always creates a new instance of the class. This is not the case with Strings when String objects are 'created' with literals. Those Strings are stored in a global store (pool), or at least references to them are kept in a pool, and whenever a new instance of an already known String is needed, the VM returns a reference to the object from the pool. In pseudocode, it may go like this:
1: a := "one"
--> if (pool[hash("one")] == null) // true, so the entry is created
pool[hash("one")] := "one"
return pool[hash("one")]
2: b := "one"
--> if (pool[hash("one")] == null) // false, "one" is already in the pool
pool[hash("one")] := "one" // skipped
return pool[hash("one")] // same object as in step 1
So in this case, variables a and b hold references to the same object, and we have ((a == b) && a.equals(b)) == true.
This is not the case if we use the constructor:
1: a := "one"
2: b := new String("one")
Again, "one" is created in the pool, but then we create a new instance from the same literal. In this case a.equals(b) is still true, but a == b is false, so ((a == b) && a.equals(b)) == false.
So why do we have a String pool? Strings and especially String literals are widely used in typical Java code. And they are immutable. And being immutable allowed to cache String to save memory and increase performance (less effort for creation, less garbage to be collected).
As programmers we don't have to care much about the String pool, as long as we keep in mind:
(a == b) && (a.equals(b)) may be true or false (always use equals to compare Strings)
Don't use reflection to change the backing char[] of a String (as you don't know who is actually using that String)
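The first point can be sketched with a string built at run time (class and method names are invented for this example): its content equals the literal, but it is a different object, so only equals gives the answer you usually want.

```java
import java.util.Objects;

public class CompareSketch {
    // Returns a String with the same content as s that is a distinct object.
    public static String freshCopy(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String a = "one";
        String b = freshCopy("one");

        System.out.println(a == b);                  // false: different objects
        System.out.println(a.equals(b));             // true: same content
        System.out.println(Objects.equals(a, null)); // false, and safe when either side is null
    }
}
```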
When the JVM loads classes, or otherwise sees a literal string, or some code interns a string, it adds the string to a mostly-hidden lookup table that has one copy of each such string. If another copy is added, the runtime arranges it so that all the literals refer to the same string object. This is called "interning". If you say something like
String s = "test";
return (s == "test");
it'll return true, because the first and second "test" are actually the same object. Comparing interned strings this way can be much, much faster than String.equals, as there's a single reference comparison rather than a bunch of char comparisons.
You can add a string to the pool by calling String.intern(), which will give you back the pooled version of the string (which could be the same string you're interning, but you'd be crazy to rely on that -- you often can't be sure exactly what code has been loaded and run up until now and interned the same string). The pooled version (the string returned from intern) will be equal to any identical literal. For example:
String s1 = "test";
String s2 = new String("test"); // "new String" guarantees a different object
System.out.println(s1 == s2); // should print "false"
s2 = s2.intern();
System.out.println(s1 == s2); // should print "true"
