I have a scenario like this -
String s = "abc", t="abc"; //LINE 1
System.out.println(s==t); // definitely it would return true; //LINE 2
s=s+"d"; t=t+"d"; //LINE 3
System.out.println(s==t); // output would be false; but why??
s=s.intern(); t=t.intern();
System.out.println(s==t); // it would return true;
I want to know why the second print statement returns false. Please point me to any reference that explains this.
When t was created at line 1, intern was called and t pointed to "abc". Why was intern not called at line 3?
Java strings are immutable.
That means that when you do something like s=s+"d" you're actually creating a whole new string and assigning it to s.
On top of that, the compiler does constant detection and allocation, so that when you write s="abc", t="abc" the compiler re-uses the same reference and your code is effectively s=t="abc".
So you start with the exact same string instance (thanks to compiler optimization) and turn it into 2 identical yet different strings, at which point s==t is false (s.equals(t) would have been true, as it compares the contents and not the address in memory).
Next up is intern(). What intern() does is look up an identical string in the string cache and return it. If it doesn't find an identical entry, it places the argument provided into the cache and returns the argument. So s=s.intern() places s into the string cache and returns it (so s is unchanged), but the following call t=t.intern() actually returns s, so that s==t again.
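The sequence described above can be reproduced with a small program. This is only a sketch: the class name InternAfterConcat and the helper compare() are made up for illustration and do not come from the question.

```java
import java.util.Arrays;

final class InternAfterConcat {
    // Returns the results of: s == t for the literals, s == t after
    // concatenation, s.equals(t) after concatenation, s == t after intern().
    static boolean[] compare() {
        String s = "abc", t = "abc";
        boolean sameLiteral = (s == t);       // both refer to the interned literal

        s = s + "d";                          // run-time concatenation: a new object
        t = t + "d";                          // another new object
        boolean sameAfterConcat = (s == t);
        boolean equalContent = s.equals(t);   // contents are still identical

        s = s.intern();                       // canonical "abcd" from the pool
        t = t.intern();                       // the same canonical object
        boolean sameAfterIntern = (s == t);

        return new boolean[] {sameLiteral, sameAfterConcat, equalContent, sameAfterIntern};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // [true, false, true, true]
    }
}
```

Only the second value is false, because s and t are not final, so the compiler cannot fold s + "d" into a constant.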
Strings are "special" Java objects.
The JVM tries to reuse the same instances (that's why String s = "abc", t="abc"; causes s and t to point to the same instance). However, an operation on an instance (like t=t+"d") creates a new instance, so the references are no longer the same.
In order to compare strings you have to use the .equals() method.
intern() returns the canonical representation of the string from the string pool maintained by the String class (http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#intern%28%29).
String s = "abc", t="abc";
s == t is true because Java automatically interns String literals. In this case the String literal "abc" has been interned and both s and t point to that same instance. Hence s == t is true.
s = s + "d"; t = t + "d";
Strings in Java are immutable. Hence what you are assigning to s and t are two new Strings that have been constructed. Therefore they do not point to the same instance. This is why s == t returns false.
s = s.intern(); t = t.intern();
Here you have forcibly interned the string in s.intern(). Since both s and t contain the same string values, the JVM sees that t is the same and makes it point to the same interned-instance as s. Hence s == t is true.
As a general note, establishing the equality of strings should be done via .equals() and not ==; == only compares references for reference-types and not values.
Java Language Specification explicitly covers this particular situation. Here is a quote from chapter 3.10.5. "String Literals":
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
As you can see, only constant expressions are interned. So, first four lines of your code are equivalent to:
String s = "abc".intern(), t="abc".intern();
System.out.println(s==t);
s=s+"d".intern(); t=t+"d".intern();
System.out.println(s==t);
Expressions s+"d" and t+"d" aren't constant and, thus, aren't interned.
JLS even provides an example with useful notes. Here is the relevant part:
package testPackage;
class Test {
public static void main(String[] args) {
String hello = "Hello", lo = "lo";
System.out.print((hello == ("Hel"+lo)));
}
}
Output: false
Note: Strings computed by concatenation at run time are newly created and therefore distinct.
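To make the difference between a constant expression and a run-time concatenation concrete, here is a hedged variation on the JLS example. The class and method names are invented for this sketch. The only change between the first two methods is the final modifier, which turns lo into a constant variable.

```java
final class ConstantConcat {
    static boolean runtimeConcatIsSame() {
        String lo = "lo";                     // not final: "Hel" + lo is computed at run time
        return "Hello" == ("Hel" + lo);
    }

    static boolean constantConcatIsSame() {
        final String lo = "lo";               // final with a constant initializer
        return "Hello" == ("Hel" + lo);       // constant expression, treated like the literal "Hello"
    }

    static boolean internedConcatIsSame() {
        String lo = "lo";
        return "Hello" == ("Hel" + lo).intern(); // explicit interning yields the literal's object
    }

    public static void main(String[] args) {
        System.out.println(runtimeConcatIsSame());   // false
        System.out.println(constantConcatIsSame());  // true
        System.out.println(internedConcatIsSame());  // true
    }
}
```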
Because when you concatenate Strings at run time you get a new object, except when all the operands are String literals (compile-time constants).
Note that interning both Strings makes them refer to the same String object.
Why does this return true?
This seems a little odd since I have two Strings which are separate objects but are said to be aliases of each other.
public boolean stringEquals() {
String tmp1 = "hello";
String tmp2 = "hello";
return tmp1==tmp2;
}
String literals are interned by the JVM, so their reference will be the same.
In other words, each String literal is stored exactly once, so both variables refer to the same object.
Taken from here: https://www.geeksforgeeks.org/interning-of-string/
The == operator compares the references of the 2 objects and not their values. So for 2 separate objects we would expect == to return false, and we would need .equals() to compare their values.
But strings are a special case. In Java, Strings are immutable, meaning their value cannot change. The JVM uses this property to optimize memory. String literals in Java are stored in a separate space in memory called the String Pool. Since these 2 strings are the same and they are immutable, the JVM stores "hello" in the pool once and reuses the same reference for both variables. This is safe because strings are immutable. (If you later assign a different literal to one of the variables, that variable simply refers to a different pooled string; the original "hello" is unchanged.)
At the same time it is interesting to note that this isn't the case when using constructor. If we use the constructor to construct a new string, it always creates a separate object with unique reference regardless of whether the value is same or not.
String a = new String("Hello");
String b = new String("Hello");
return a==b;
Would return false.
The string pool concept applies only when using string literals without the constructor.
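The three cases can be put side by side in a short sketch (class and method names are hypothetical). It also shows that calling intern() on a constructed string gets you back to the pooled object.

```java
final class LiteralVsConstructor {
    static boolean literalsShareObject() {
        String a = "Hello";
        String b = "Hello";
        return a == b;                        // same pooled object
    }

    static boolean constructedShareObject() {
        String a = new String("Hello");
        String b = new String("Hello");
        return a == b;                        // two separate objects
    }

    static boolean internedMatchesLiteral() {
        String a = new String("Hello");
        return a.intern() == "Hello";         // intern() returns the pooled object
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject());     // true
        System.out.println(constructedShareObject());  // false
        System.out.println(internedMatchesLiteral());  // true
    }
}
```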
This is a question that I got in an interview.
I've two strings defined as
String s1="Java";
String s2="Java";
My question is whether these two references point to the same memory location. In general, when we create identical strings (without new keyword), does the content get stored in the memory only once and all the String objects with the same content just refer to the same location, without storing the string "Java" redundantly ? The hash codes of s1 and s2 are the same. But are hashcodes dependent directly on memory location of the object?
The process of combining identical strings is called "interning", and has been done for many years by lots of language compilers, but not always. The answer to the question, especially as expanded by @GennadyVanin--Novosibirsk, depends on the language and the compiler implementation. For Java, all constant strings are interned, as required by the Java Language Specification. But that applies only to constant string expressions. Within one running JVM, equal constant strings refer to the same object even when they come from different classes or packages, because the specification requires them all to be interned. Dynamically created Java strings (e.g., the output of various toString() methods) won't be interned unless the code specifically requests it via String.intern(). And yes, all uses of an interned string will share the same memory location - that's a big part of why strings are interned in the first place.
As to other languages, that's a bigger question, but with all the information in these answers, I'm sure you can research it on the web. Suffice it to say that there is no universal agreement on how this ought to be done.
String s1="Java";
String s2="Java";
My question is whether these two references point to the same memory location
Simply citing §3.10.5 of the Java Language Specification:
A string literal is a reference to an instance of class String
(§4.3.1, §4.3.3).
Moreover, a string literal always refers to the same instance of class
String. This is because string literals - or, more generally, strings
that are the values of constant expressions (§15.28) - are
"interned" so as to share unique instances, using the method
String.intern.
And read the comments to code example there:
This example illustrates six points:
Literal strings within the same class (§8) in the same package (§7) represent references to the same String object (§4.3.1).
Literal strings within different classes in the same package represent references to the same String object.
Literal strings within different classes in different packages likewise represent references to the same String object.
Strings computed by constant expressions (§15.28) are computed at compile time and then treated as if they were literals.
Strings computed by concatenation at run time are newly created and therefore distinct.
The result of explicitly interning a computed string is the same string as any pre-existing literal string with the same contents.
When the compiler optimizes your string literals, it sees that both s1 and s2 have the same value and thus you need only one string object. It's safe because String is immutable in Java.
String s1="Java";
String s2="Java";
System.out.println(s1== s2);
This gives the result true because s1 and s2 point to the same object.
The String Pool is the mechanism by which all already-defined strings are stored in a 'pool'; before a new String object is created for a literal, the pool is checked to see whether such a string is already defined.
Example.
First example
String s1 = "FirstString";
String s2 = "FirstString";
if(s1 == s2) {
//This condition is true because Java doesn't make separate objects for these two strings. Both variables refer to the same object.
}
Second example
String s1= "FirstString";
String s2 = new String("FirstString");
if(s1.equals(s2)) {
//This condition is true because the contents are the same.
}
if(s1 == s2) {
//This condition is false because new makes Java allocate a separate object for s2.
}
Conclusion: Java checks whether the string already exists. If we create the second string using new, Java creates a new object with a different reference, even when the content is the same. If we don't use new and the content is the same, the second variable is assigned the same reference as the first string.
Adding to others:
The new keyword always forces the creation of a new object.
If you declare like below:
String s1 = "some";
String s2 = "some";
Then using String Pooling mechanism, both references s1 and s2 will refer to the same String object with the value "some".
When you have
String str1 = new String("BlaBla"); //In the heap!
String str2 = new String("BlaBla"); //In the heap!
then you're explicitly creating a String object through new operator (and constructor).
In this case you'll have each object pointing to a different storage location.
But if you have:
String str1 = "BlaBla";
String str2 = "BlaBla";
then the construction is implicit.
Two string literals share the same storage if they have the same value, because Java avoids storing the same string more than once.
String s1="Java";
String s2="Java";
Both point to the same object.
String s1="Java";
String s2="Java";
Do they point to the same memory location?
I originally said "no" but in the case above, see the StringPool answer referred to below, it's actually yes..
"when we create identical strings (without new keyword), does the
content get stored in the memory only once and all the String objects
with the same content just refer to the same location"
...kind of see detailed answer in question "Java Strings and StringPool"
"The hash codes of s1 and s2 are the same. But are hashcodes dependent
directly on memory location of the object?"
No the hashcodes depend on the content of the String
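A quick way to convince yourself that the hash code comes from the characters and not from the memory location: two distinct String objects with the same content report the same hashCode(), and that value matches the formula documented for String.hashCode(). The class and the helper documentedHash() below are illustrative names only.

```java
final class HashCodeVsIdentity {
    // Re-implements the formula from the String.hashCode() documentation:
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    static int documentedHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    static boolean distinctObjects() {
        String a = "Java";
        String b = new String("Java");
        return a != b;                        // different objects
    }

    static boolean sameHashCode() {
        String a = "Java";
        String b = new String("Java");
        return a.hashCode() == b.hashCode();  // same content, same hash
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects());                               // true
        System.out.println(sameHashCode());                                  // true
        System.out.println("Java".hashCode() == documentedHash("Java"));    // true
    }
}
```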
Yes.
Andrew Hare answered this on Stack Overflow: https://stackoverflow.com/a/2486195/4835894.
Basically, a string intern pool allows a runtime to save memory by preserving immutable strings in a pool so that areas of the application can reuse instances of common strings instead of creating multiple instances of it.
In Java, if one is to check if two Strings are equal, in the sense that their values are the same, he/she needs to use the equals method. E.g. :
String foo = "foo";
String bar = "bar";
if(foo.equals(bar)) { /* do stuff */ }
And if one wants to check for reference equality, one needs to use the == operator on the two strings.
if( foo == bar ) { /* do stuff */ }
So my question is: does the == operator have its use for the String class? Why would one want to compare String references?
Edit:
What I am not asking : How to compare strings ? How does the == work ? How does the equals method work?
What I am asking is what uses does the == operator have for String class in Java ? What is the justification of not overloading it, so that it does a deep comparison ?
Imagine a thread-safe Queue<String> acting as a communication channel between a producer thread and a consumer thread. It seems perfectly reasonable to use a special String to indicate termination.
// Deliberate use of `new` to make sure JVM does not re-use a cached "EOT".
private static final String EOT = new String("EOT");
...
// Signal we're done.
queue.put(EOT);
// Meanwhile at the consumer end of the queue.
String got = queue.take();
if ( got == EOT ) {
// Tidy shutdown
}
Note that this would be resilient to:
queue.put("EOT");
because "EOT" != EOT even though "EOT".equals(EOT) would be true.
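For completeness, here is a self-contained sketch of the sentinel idea. It assumes a java.util.concurrent.BlockingQueue (the fragment above does not name a concrete queue type) and runs producer and consumer on one thread to keep it short. The class name SentinelQueueDemo and its methods are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class SentinelQueueDemo {
    // Deliberate use of new so that no other "EOT" string is the same object.
    private static final String EOT = new String("EOT");

    // Consumer side: collect items until the sentinel object itself arrives.
    static List<String> drain(BlockingQueue<String> queue) throws InterruptedException {
        List<String> received = new ArrayList<>();
        while (true) {
            String got = queue.take();
            if (got == EOT) {                 // identity check, not equals()
                return received;
            }
            received.add(got);
        }
    }

    // Producer side, then drain; single-threaded for brevity.
    static List<String> run() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        try {
            queue.put("first");
            queue.put("EOT");                 // ordinary payload that merely equals the sentinel
            queue.put("last");
            queue.put(EOT);                   // the real end-of-transmission marker
            return drain(queue);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());            // [first, EOT, last]
    }
}
```

The payload "EOT" passes through as data because it is a different object from the sentinel, even though the two are equal according to equals().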
What use is there for it? Not much in normal practice but you can always write a class that operates on intern()-ed strings, which can then use == to compare them.
Why it isn't overloaded is a simpler question: because there is no operator overloading in Java. (To mess things up a bit, the + operator IS sort of overloaded for strings, which was done to make string operations slightly less cumbersome. But you can argue that's just syntactic sugar and there certainly is no operator overloading in Java on the bytecode level.)
The lack of an overloaded == operator made the use of the operator much less ambiguous, at least for reference types. (That is, until the point autoboxing/unboxing was introduced, which muddies the waters again, but that's another story.) It also allows you to have classes like IdentityHashMap that will behave the same way for every object you put into it.
Having said all that, the decision to avoid operator overloading (where possible) was a fairly arbitrary design choice.
The == operator compares the references of two objects. For example, if String x and String y refer to two different objects, then the == operator will yield false. However, the String.equals() method does not check whether they refer to the same object, but whether the values (e.g. "Hello", "World", etc.) are the same.
// A.java
String foo1 = "foo";
// B.java
String bar1 = "foo";
All String literals realized at compile time are added to the String Constant Pool. So when you have two different String declarations in two different classes, two String objects will not be created, and both foo1 and bar1 refer to the same String instance with the value foo. Now that you have the same String reference in two different variables, you can check whether those two strings are equal just by using ==, which is fast because all it does is compare the bit pattern, whereas the equals() method compares each character and is generally used for two different String instances with the same content.
In fact, if you look at equals() implementation in String class, the first check they do is Reference comparison using == because they might seem as different instances to you, but if they're String literals or if they're interned by someone else already, then all you have is a Single reference in two variables.
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
// remaining code
}
Also, == is not just for Strings, it's used to compare any two bit patterns, be it primitives or references
1. The "==" operator compares whether the values of two variables are equal. For reference-type variables this means whether the two variables hold the same heap address, i.e. whether they refer to the same object.
2. "equals" compares whether the objects the two variables refer to have the same contents.
String s = "string1"; // creates 1 reference and 1 object in the pool
String s1 = "string1"; // creates just 1 reference, which points to what s is pointing to
s == s1 // true
String s2 = new String("string1"); // creates 1 object in the heap and 1 reference; the pooled "string1" already exists
// Since s2 points to a different object:
s2 == s // false
s2 == s1 // false
Problem:
Suppose we want to check how many unique String objects the application has created while it is running.
We can have a singleton object that stores all the String references in an array.
From the previous examples of s, s1 and s2: for s and s1, 1 object is created, and for s2, 1 more object (2 in total).
// If we use the equals method, every comparison gives true:
s.equals(s1) // gives true
s1.equals(s2) // gives true
// so equals cannot tell the objects apart, and simply counting the references
// stored in the singleton's array gives 3, which doesn't match the actual
// count of 2.
We can use == to check for equality of reference: if a reference is equal to one we have already seen, we do not increment our count of unique String objects, and for every non-equal result we increment the count.
Here:
s // count = 1
s1 == s // true, so count remains the same
s2 == s // false, so count = 1 + 1 = 2
// Using ==, we get the total number of unique String objects created.
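The counting idea can be sketched without hand-written == loops by using an identity-based set, which compares elements by reference rather than by equals(). This swaps in java.util.IdentityHashMap for the array-in-a-singleton design described above. UniqueInstanceCounter and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

final class UniqueInstanceCounter {
    // Counts distinct objects: two references count once only if a == b.
    static int countByIdentity(String... refs) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        Collections.addAll(seen, refs);
        return seen.size();
    }

    // Counts distinct values: two references count once if a.equals(b).
    static int countByEquals(String... refs) {
        return new HashSet<>(Arrays.asList(refs)).size();
    }

    static int[] demo() {
        String s = "string1";
        String s1 = "string1";
        String s2 = new String("string1");
        return new int[] {countByIdentity(s, s1, s2), countByEquals(s, s1, s2)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // [2, 1]
    }
}
```

Identity comparison finds 2 objects (the pooled literal and the one made with new), while equals() sees a single value.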
Simple answer...
Why would one want to compare String references ?
Because they want to compare String values in a very fast way.
Strings are not always interned. String constants are, but a string may have been created manually on the heap. Calling intern() on a manually created string allows us to continue using reference comparison on our strings for value comparison.
What is the justification of not overloading it, so that it does a deep comparison ?
Because Java does not have operator overloading as a design decision
Operator '==' is a reference operator always, and equals() is a value method always. In C++ you can change that, but many feel that simply obfuscates the code.
Checking references is Faster compared to checking the entire Strings' equality.
Assume you have large Strings (URLs or DBMS queries) and have multiple references to them. To check if they are equal, either you can check character by character or you can check if they both refer to the same object.
In fact, equals method in java first checks if the references are same and only if not goes ahead and checks character by character.
Java is full of references and hence, you might need a case where you need to check if two variables are referring to the same String/Object rather than both having each copy of the same String so that you can update string at one place and it reflects in all variables.
To do so, equals method does not help as it checks the copies to be equal as well. you need to check if they both refer to the same object and hence == comes into picture.
It seems that this was asked before and received quite a popular answer here:
Why didn't == operator string value comparison make it to Java?
The simple answer is: consistency
I guess it's just consistency, or "principle of least astonishment".
String is an object, so it would be surprising if was treated
differently than other objects.
Although this is not the fundamental reason, a usage could be to improve performances: before executing a heavy computation, "internalize" your Strings (intern()) and use only == for comparisons.
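A minimal sketch of that pattern follows, with invented names. Whether it actually pays off depends on the workload, since intern() has its own lookup cost; the example only shows that the comparison becomes valid, not that it is faster.

```java
import java.util.Arrays;

final class InternedComparison {
    // Builds a fresh String at run time, standing in for data read or computed elsewhere.
    static String fresh(String text) {
        return new String(text.toCharArray());
    }

    static boolean[] demo() {
        String a = fresh("SELECT * FROM users");
        String b = fresh("SELECT * FROM users");
        boolean sameBefore = (a == b);        // two separate objects

        String ia = a.intern();               // canonical instance
        String ib = b.intern();               // the same canonical instance
        boolean sameAfter = (ia == ib);       // == now agrees with equals()

        return new boolean[] {sameBefore, sameAfter};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // [false, true]
    }
}
```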
What I am asking is what uses does the == operator have for String class in Java ?
What is the justification of not overloading it, so that it does a deep comparison ?
== and equals have altogether different uses.
== confirms if there is reference-equality
equals confirms whether the objects' contents are the same.
Example of reference-equality is IdentityHashMap.
There could be a case in which Only the object inserting something to IdentityHashMap has the right to get/remove the object.
overloading reference-equality can lead to unwanted complexity for java.
for example
if (string)
{
do deep equality
}
else
{
do reference-equality
}
/*****************************************************************/
public class IdentityHashMap extends AbstractMap implements Map, Serializable, Cloneable
This class implements the Map interface with a hash table, using reference-equality in place of object-equality when comparing keys (and values). In other words, in an IdentityHashMap, two keys k1 and k2 are considered equal if and only if (k1==k2). (In normal Map implementations (like HashMap) two keys k1 and k2 are considered equal if and only if (k1==null ? k2==null : k1.equals(k2)).)
This class is not a general-purpose Map implementation! While this class implements the Map interface, it intentionally violates Map's general contract, which mandates the use of the equals method when comparing objects. This class is designed for use only in the rare cases wherein reference-equality semantics are required.
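The difference described in the documentation is easy to observe with two String keys that are equal but not identical. The class name and method below are made up for this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityMapDemo {
    static int[] sizes() {
        String k1 = new String("key");
        String k2 = new String("key");        // equals k1, but a different object

        Map<String, Integer> byEquals = new HashMap<>();
        byEquals.put(k1, 1);
        byEquals.put(k2, 2);                  // replaces the entry for k1

        Map<String, Integer> byIdentity = new IdentityHashMap<>();
        byIdentity.put(k1, 1);
        byIdentity.put(k2, 2);                // kept as a second entry

        return new int[] {byEquals.size(), byIdentity.size()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizes())); // [1, 2]
    }
}
```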
Please explain the inner workings of the following code:
System.out.println(new String("ABC").intern()==new String("ABC").intern());
The above code prints "true". But according to Java's rules, the new operator always creates a new object, and the object.intern() method also creates an object in the string pool. So my question is: how many objects are created in the above code?
In my view, 3 new objects will be created: one goes to the String pool, and two anonymous objects are created by the new operator. But I am not sure.
If I am wrong, please explain.
Assuming no cleverness in the optimizer, two objects are created. (A smart enough optimizer could optimize this to just an unconditional true, in which case no objects are created.)
tl;dr version: You were almost right with your answer of 3, except that the string that goes into the String pool is not generated as part of this statement; it's already created.
First, let's get the "ABC" literal out of the way. It's represented in the runtime as a String object, but that lives in permgen and was created once in the whole life of the JVM. If this is the first class that uses that string literal, it was created at class load time (see JLS 12.5, which states that the String was created when the class was loaded, unless it previously existed).
So, the first new String("ABC") creates one String, which simply copies the reference (but does not create a new object) to the chars array and hash from the String that represents the "ABC" literal (which, again, is not created as part of this line). The .intern() method then looks to see whether an equal String is already in permgen. It is (it's just the String that represents the literal to begin with), so that's what that function returns. So, new String("ABC").intern() == "ABC". See JLS 3.10.5, and in particular:
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
Exactly the same thing happens with the second occurrence of new String("ABC").intern(). And, since both intern() calls return the same object as the "ABC" literal, they represent the same value.
Breaking it down a bit:
String a = new String("ABC"); // a != "ABC"
String aInterned = a.intern(); // aInterned == "ABC"
String b = new String("ABC"); // b != "ABC"
String bInterned = b.intern(); // bInterned == "ABC"
System.out.println(new String("ABC").intern()==new String("ABC").intern());
// ... is equivalent to...
System.out.println(aInterned == bInterned); // ...which is equivalent to...
System.out.println("ABC" == "ABC"); // ...which is always true.
When you call the intern() method, the JVM checks whether the given string is already in the string pool. If it is, it returns a reference to the pooled string; otherwise it adds the string to the pool and returns a reference to it.
In your case: System.out.println(new String("ABC").intern()==new String("ABC").intern());
The first new String("ABC").intern() returns the pooled "ABC" (adding it to the pool if it is not already there). When you call new String("ABC").intern() a second time, the JVM returns a reference to that same pooled string. That is why you get true when comparing the two (both point to the same object).
I believe you are right: each new operation creates a new object, so there are 2 anonymous objects, and intern() creates a new string in the string pool only if an equal one is not already there, and returns its reference.
check the following program:
Run it on the Sun Java HotSpot JVM and everything will be "true".
--------updated: got the answer from Stephen and Danie, changed the program to add the String intern method-----------
What happens if B is compiled separately from A? For example, B is compiled and put in a jar, which is put on the class path when running TestStringEqual.
Also, is this a Java compile-time optimization, a Java run-time optimization, or something defined by the Java language specification?
Also, does this program give the same result on different VMs, or is this a feature of just one VM?
thanks
public class TestStringEqual {
public static String HELLO = "hello";
private String m_hello;
public TestStringEqual() {
m_hello = "hello";
}
public static void main(String[] args) {
String a = "hello";
String b = "hello";
System.out.println("string a== string b:" + (a == b));
System.out.println("static memebr ==a:" + (HELLO == a));
System.out.println("instance field ==a:"
+ (new TestStringEqual().getHello() == a));
System.out.println("hello in B ==a:" + (B.B_HELLO == a));
System.out.println("interned new string object in heep==a:"
+ ( new String("hello").intern() == a));
}
public String getHello() {
return this.m_hello;
}
}
class B{
public static final String B_HELLO = "he"+"llo";
}
There is really no mystery about this at all. You just need to know three basic facts about Java:
The '==' operator for object references tests if two object references are the same; i.e if they point to the same object. Reference JLS 15.21.3
All String literals with the same sequence of characters in a Java program will be represented by the same String object. Reference JLS 3.10.5 So (for example) "hello" == "hello" is comparing the same object.
Constant expressions are evaluated at compile time. Reference JLS 15.28. So (for example) "hell" + "o" is evaluated at compile time, and is therefore equivalent to the literal "hello".
These three facts are stated in the Java Language Specification. They are sufficient to explain the "puzzling" aspects of the behaviour of your program, without relying on anything else.
The more detailed explanation involving the string pool, string literals being interned by the class loader, the bytecodes emitted by the compiler, etc, etc ... are just implementation details. You don't need to understand these details if you understand what the JLS is saying, and they don't really help to make the JLS clearer (IMO).
Notes:
The definition of what is and what isn't a constant expression is a little involved. Some things that you might imagine to be constant valued, are in fact not. For instance, "hello".length() is not a constant expression. However, a concatenation of two string literals is a constant expression.
The explanation of equality of string literals in the JLS does in fact mention interning as the mechanism by which this property of literals is implemented.
On the JVM level, the LDC (load constant) instruction is used to push a string literal onto the stack. For performance reasons, the string literal isn't stored in the code itself; it's stored in the constant pool of the class. The constant pool is a table which appears at the beginning of a class file containing string literals, numeric literals, field and method descriptors, and a few other things. LDC is followed by a byte specifying the string's index in the constant pool. (If one byte is not large enough, the compiler will use LDC_W, which is followed by a 16-bit index. Hence the limit of 65,535 on the constant pool size of a class.)
If the same string literal occurs twice in the same class, javac is smart enough to create only one entry in the constant pool. When a class is loaded, the JVM creates actual String objects from the data in the constant pool. LDCs which contain the same offset into the constant pool will thus cause the same String to be pushed onto the stack. Instructions like IF_ACMPEQ (which checks for reference equality as == does) will then recognize the strings as identical.
See the JVMS for more info.
It's an immutable string (unable to be mutated or changed), not an immune one, though I suppose you could argue that it's immune from change :-)
That means you cannot change the underlying string itself, you can only assign a different string to the variable. So:
String a = "Hello";
a = "Goodbye";
doesn't change the memory where "Hello" is stored, it changes a to point to a different memory location where "Goodbye" is stored.
This allows Java to share strings for efficiency. You can even get cases where strings like "deoxyribonucleic acid" and "acid" may share space, where the latter points to a specific location within the former. Again, this is made possible by the immutable nature of such strings.
In any case, == will check to see if the strings refer to the same underlying object, not something that's often useful. If you want to see if the strings are equal, you should be using String.equals() or one of its variations.
It is fairly simple: the compiler will generate a (bytecode) constant for the string "hello" the first time it encounters it. In normal assembler it would be in the .TEXT section.
The subsequent "hello" strings will then point to that same constant, since there is no need to allocate new space or create a new constant. The reason this is so is because strings are immutable and if one is assigned a new value new memory is needed for it anyway.
It will probably not work on input, i.e. if you let a user input "hello" and ==-compare that to the compile-time hello strings you'll likely get false.
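The following sketch imitates that situation without needing a console: the "input" is assembled from individual characters at run time, which is an assumption standing in for text read from a user. Names are hypothetical.

```java
import java.util.Arrays;

final class InputNotInterned {
    // Stands in for user input: a String assembled at run time from characters.
    static String simulateInput() {
        return new String(new char[] {'h', 'e', 'l', 'l', 'o'});
    }

    static boolean[] demo() {
        String typed = simulateInput();
        boolean sameObject = (typed == "hello");             // not the literal's object
        boolean sameContent = typed.equals("hello");         // contents match
        boolean sameOnceInterned = (typed.intern() == "hello");
        return new boolean[] {sameObject, sameContent, sameOnceInterned};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // [false, true, true]
    }
}
```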
As far as a==b goes, it seems the compiler is taking a shortcut and sharing the same string object. When I declare my variables as follows, I get a==b is false.
String a = "hello";
String b = "hell";
String temp = "o";
if (new java.util.Random().nextDouble() < 0.5) b += temp;
else b += "o";
If I do String b = "hell"+"o"; I still get a==b as true.