String equality check by only reference - java

As we know, if we do a check like the one below, the output will be "equal".
String s1 = "stackoverflow";
String s2 = "stackoverflow";
if(s1==s2){
System.out.println("equal");
}
So my question is: if I am not using the new operator in my application to create Strings, and all strings are literals, can I use only reference equality as given above? Thanks in advance.
N.B.: I am writing a crawler, so I need to check whether I have already visited the URL I am currently holding. I am using MurmurHash, which gives me a long for every URL, but there are collisions, so when there is a hash collision I need to compare the contents of the URL strings. Hence, for performance, I am thinking of just comparing the references of the two URL strings. I am using jsoup for HTML parsing.

if I am not using the new operator in my application to create Strings, and all strings are literals, can I use only reference equality as given above?
If you are 100% sure that all the strings you are dealing with are plain string literals or compile-time constant expressions then yes. The Java Language Specification §15.28 mandates that
Compile-time constant expressions of type String are always "interned" so as to share unique instances, using the method String.intern.
But if you get strings from anywhere else (e.g. reading them from a web page retrieved by your crawler, or building them with concatenation expressions that are not compile-time constants), then you must either use .equals to compare them by value rather than by reference, or .intern() them explicitly.
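A minimal sketch of the difference, assuming a URL that is assembled at run time; the StringBuilder stands in for text a crawler would read from a page, and the class name and URL are made up for illustration:

```java
class RuntimeStringDemo {
    // A literal: guaranteed to be the pooled (interned) instance.
    static final String LITERAL = "https://example.com/page";

    // Builds an equal string at run time, the way parsed page text would arrive.
    static String buildAtRuntime() {
        return new StringBuilder("https://example.com/")
                .append("page")
                .toString(); // toString() always allocates a new String object
    }

    public static void main(String[] args) {
        String fromPage = buildAtRuntime();
        System.out.println(fromPage == LITERAL);          // false: different objects
        System.out.println(fromPage.equals(LITERAL));     // true: same characters
        System.out.println(fromPage.intern() == LITERAL); // true: intern() returns the pooled literal
    }
}
```

The last line is the explicit-intern route; .equals is the simpler choice when the strings are only compared once or twice.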
It's not always obvious whether an expression is a compile-time constant or not:
String s1 = "Stack";
String s2 = s1 + "Overflow"; // not a CTC
but
final String s1 = "Stack";
String s2 = s1 + "Overflow"; // _is_ a CTC, because s1 is a "constant variable"
// (final, with an initializer that is itself a CTC)
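The two snippets above can be checked side by side. This is a small sketch (the class and method names are illustrative) that compares each result with the literal "StackOverflow":

```java
class ConstantExpressionDemo {
    // s1 is not final, so s1 + "Overflow" is evaluated at run time
    // and produces a newly created String.
    static String nonConstant() {
        String s1 = "Stack";
        return s1 + "Overflow";
    }

    // s1 is a constant variable (final, initialised with a constant),
    // so s1 + "Overflow" is a compile-time constant and is interned.
    static String constant() {
        final String s1 = "Stack";
        return s1 + "Overflow";
    }

    public static void main(String[] args) {
        System.out.println(nonConstant() == "StackOverflow"); // false
        System.out.println(constant() == "StackOverflow");    // true
        System.out.println(nonConstant().equals(constant())); // true
    }
}
```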

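No, you cannot, at least not for the strings a crawler works with. Shared instances are guaranteed only for literals and compile-time constant expressions; URLs parsed from a page are created at run time. To guarantee reference equality for those, you need to call String#intern().
This, and only this, will guarantee reference equality.
But do a performance test: String#equals() is probably faster. :-)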
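For the crawler in the question, one common alternative (not part of this answer) is to let a HashSet&lt;String&gt; handle the comparison. In OpenJDK, HashMap (which backs HashSet) compares cached hash codes first, checks reference identity, and only then calls equals(), and String.equals() itself starts with a reference check. So reference-equal duplicates are cheap and a real collision costs one equals() call. The class and method names below are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for a crawler: remembers which URLs have been seen.
class VisitedUrls {
    private final Set<String> seen = new HashSet<>();

    // Returns true the first time a URL is offered, false for any later
    // string with the same characters, even if it is a different object.
    boolean markVisited(String url) {
        return seen.add(url); // add() relies on hashCode() and equals(), not ==
    }

    int size() {
        return seen.size();
    }

    public static void main(String[] args) {
        VisitedUrls visited = new VisitedUrls();
        String first = "https://example.com/a";
        String copy = new String("https://example.com/a"); // equal text, distinct object
        System.out.println(visited.markVisited(first)); // true
        System.out.println(visited.markVisited(copy));  // false: already seen by value
        System.out.println(visited.size());             // 1
    }
}
```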

So my question is: if I am not using the new operator in my application to create Strings, and all strings are literals, can I use only reference equality as given above?
Yes, for sure. Since they are resolved at compile time, there are no issues.
But keep in mind the following scenario:
String s3= s2;
String s4= s1+""; //resolved at run time
System.out.println(s3==s4); //false
System.out.println(s3.equals(s4));//true
So, as long as you are sure that none of the strings will later be replaced by a value computed at run time, you can safely use ==.

Related

String literal and object relation in Java [duplicate]

This is a question that I got in an interview.
I have two strings defined as
String s1="Java";
String s2="Java";
My question is whether these two references point to the same memory location. In general, when we create identical strings (without the new keyword), does the content get stored in memory only once, with all the String objects with the same content just referring to the same location, without storing the string "Java" redundantly? The hash codes of s1 and s2 are the same. But are hash codes dependent directly on the memory location of the object?
The process of combining identical strings is called "interning", and it has been done for many years by lots of language compilers, but not always. The answer to the question, especially as expanded by @GennadyVanin--Novosibirsk, depends on the language and the compiler implementation. For Java, all constant strings are interned, as required by the Java Language Specification. But that applies only to constant string expressions. Because the pooling is done by the JVM at run time, it also covers literals in different classes, packages, and even separately compiled JAR files, as long as they are loaded into the same JVM. Dynamically created Java strings (e.g., the output of various toString() methods), on the other hand, won't be interned unless the code specifically requests it via String.intern(). And yes, all uses of an interned string share the same memory location; that's a big part of why strings are interned in the first place.
As to other languages, that's a bigger question, but with all the information in these answers, I'm sure you can research it on the web. Suffice it to say that there is no universal agreement on how this ought to be done.
String s1="Java";
String s2="Java";
My question is whether these two references point to the same memory location
Simply citing §3.10.5 of the Java Language Specification:
A string literal is a reference to an instance of class String (§4.3.1, §4.3.3).
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
And read the comments on the code example there:
This example illustrates six points:
Literal strings within the same class (§8) in the same package (§7) represent references to the same String object (§4.3.1).
Literal strings within different classes in the same package represent references to the same String object.
Literal strings within different classes in different packages likewise represent references to the same String object.
Strings computed by constant expressions (§15.28) are computed at compile time and then treated as if they were literals.
Strings computed by concatenation at run time are newly created and therefore distinct.
The result of explicitly interning a computed string is the same string as any pre-existing literal string with the same contents.
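The JLS example behind these points spans two packages. A single-file sketch can reproduce all but the cross-package point, assuming two classes in the same (default) package; the class names here are illustrative:

```java
import java.util.Arrays;

class LiteralIdentityDemo {
    // Each entry mirrors one of the JLS points, evaluated with ==.
    static boolean[] checks() {
        String hello = "Hello", lo = "lo";
        return new boolean[] {
            hello == "Hello",              // same class: same object
            Other.hello == hello,          // other class, same package: same object
            hello == ("Hel" + "lo"),       // constant expression: treated as a literal
            hello == ("Hel" + lo),         // run-time concatenation: new object
            hello == ("Hel" + lo).intern() // explicit intern(): back to the literal
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(checks())); // [true, true, true, false, true]
    }
}

class Other {
    static String hello = "Hello";
}
```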
When the compiler optimizes your string literals, it sees that s1 and s2 have the same value, so only one String object is needed. This is safe because String is immutable in Java.
String s1="Java";
String s2="Java";
System.out.println(s1== s2);
This prints true because s1 and s2 point to the same object.
The string pool is a mechanism in which every string literal already defined is stored in a 'pool'; before creating a new String object for a literal, the JVM checks whether an identical string is already in the pool.
Example.
First example
String s1 = "FirstString";
String s2 = "FirstString";
if(s1 == s2) {
//This condition is true because Java doesn't create a separate object for each of these two strings. Both variables point to the same object.
}
Second example
String s1= "FirstString";
String s2 = new String("FirstString");
if(s1.equals(s2)) {
//This condition is true because the contents are the same.
}
if(s1 == s2) {
//This condition is false because new forces Java to allocate a separate object for s2
}
Conclusion: Java checks whether an identical string already exists. If we don't use new and the content is the same, the second variable is assigned the same reference as the first. If we create the second string with new, a new object with a different reference is created, even when the content is the same.
Adding to the other answers:
The new keyword always forces the creation of a new object.
If you declare like below:
String s1 = "some";
String s2 = "some";
Then using String Pooling mechanism, both references s1 and s2 will refer to the same String object with the value "some".
When you have
String str1 = new String("BlaBla"); //In the heap!
String str2 = new String("BlaBla"); //In the heap!
then you're explicitly creating String objects through the new operator (and constructor).
In this case each variable points to a different object, in a different storage location.
But if you have:
String str1 = "BlaBla";
String str2 = "BlaBla";
then you have implicit construction.
Two string literals with the same value share the same storage, because Java stores identical strings only once.
String s1="Java";
String s2="Java";
Both point to the same object.
String s1="Java";
String s2="Java";
Do they point to the same memory location?
I originally said "no", but in the case above it's actually "yes" (see the StringPool answer referred to below).
"when we create identical strings (without new keyword), does the
content get stored in the memory only once and all the String objects
with the same content just refer to the same location"
...kind of; see the detailed answer in the question "Java Strings and StringPool".
"The hash codes of s1 and s2 are the same. But are hashcodes dependent
directly on memory location of the object?"
No, String hash codes depend only on the content of the string, not on its memory location.
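A short sketch of that point: two distinct String objects with the same characters have the same hash code, and recomputing the formula documented for String.hashCode() from the characters alone gives the same value (the class and helper names are illustrative):

```java
class HashCodeDemo {
    // Recomputes the formula documented for String.hashCode():
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], using int arithmetic.
    static int contentHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String literal = "Java";
        String copy = new String("Java"); // distinct object, same characters
        System.out.println(literal == copy);                       // false
        System.out.println(literal.hashCode() == copy.hashCode()); // true
        System.out.println(contentHash(copy) == copy.hashCode());  // true
    }
}
```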
Yes.
Andrew Hare answered this on Stack Overflow: https://stackoverflow.com/a/2486195/4835894
Basically, a string intern pool allows a runtime to save memory by preserving immutable strings in a pool so that areas of the application can reuse instances of common strings instead of creating multiple instances of it.

How JVM allocates memory for String in java?

I have a scenario like this -
String s = "abc", t="abc"; //LINE 1
System.out.println(s==t); // definitely it would return true; //LINE 2
s=s+"d"; t=t+"d"; //LINE 3
System.out.println(s==t); // output would be false; but why??
s=s.intern(); t=t.intern();
System.out.println(s==t); // it would return true;
I wanted to know why the second print statement returned false. Please provide me any reference link which explains the same.
When t was created at line 1, intern was called and t pointed to "abc". Why was intern not called at line 3?
Java strings are immutable.
That means that when you do something like s=s+"d", you're actually creating a whole new string and assigning it to s.
On top of that, the compiler does constant detection and allocation, so when you write s="abc", t="abc" the same reference is reused and your code is effectively s=t="abc".
So you start with the exact same string instance (thanks to this optimization) and turn it into two identical yet different strings, at which point s==t is false (s.equals(t) would have been true, as it compares the contents and not the address in memory).
Next up is intern(). What intern() does is look up an identical string in the string pool and return it. If it doesn't find an identical entry, it places the argument into the pool and returns the argument. So s=s.intern() places s into the pool and returns it (so s is unchanged), but the following call t=t.intern() actually returns s, so s==t again.
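The question's steps can be replayed in a small sketch that records each == result (the class and method names are illustrative):

```java
import java.util.Arrays;

class InternScenario {
    // Replays the question's steps and records the comparisons.
    static boolean[] run() {
        String s = "abc", t = "abc";
        boolean literals = (s == t);     // both refer to the pooled "abc"

        s = s + "d";
        t = t + "d";
        boolean concatenated = (s == t); // two new objects built at run time
        boolean sameText = s.equals(t);  // same characters

        s = s.intern();
        t = t.intern();
        boolean interned = (s == t);     // both now refer to the canonical "abcd"

        return new boolean[] {literals, concatenated, sameText, interned};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // [true, false, true, true]
    }
}
```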
Strings are "special" Java objects.
The JVM tries to reuse the same references (that's why String s = "abc", t="abc"; causes s and t to point to the same instance); however, when working on instances (like t=t+"d"), a new instance gets created, so the references are not the same.
In order to compare strings, you have to use the .equals() method.
intern() returns a canonical representation of the string from the string pool maintained by the String class (http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#intern%28%29).
String s = "abc", t="abc";
s == t is true because Java automatically interns String literals. In this case the String literal "abc" has been interned and both s and t point to that same instance. Hence s == t is true.
s = s + "d"; t = t + "d";
Strings in Java are immutable. Hence what you are assigning to s and t are two new Strings that have been constructed. Therefore they do not point to the same instance. This is why s == t returns false.
s = s.intern(); t = t.intern();
Here you have explicitly interned the string with s.intern(). Since s and t contain the same characters, t.intern() returns the instance that s.intern() already placed in the pool, so t points to the same interned instance as s. Hence s == t is true.
As a general note, establishing the equality of strings should be done via .equals() and not ==; == only compares references for reference-types and not values.
The Java Language Specification explicitly covers this particular situation. Here is a quote from section 3.10.5, "String Literals":
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
As you can see, only constant expressions are interned. So, the first four lines of your code are equivalent to:
String s = "abc".intern(), t="abc".intern();
System.out.println(s==t);
s=s+"d".intern(); t=t+"d".intern();
System.out.println(s==t);
Expressions s+"d" and t+"d" aren't constant and, thus, aren't interned.
JLS even provides an example with useful notes. Here is the relevant part:
package testPackage;
class Test {
public static void main(String[] args) {
String hello = "Hello", lo = "lo";
System.out.print((hello == ("Hel"+lo)));
}
}
Output: false
Note: Strings computed by concatenation at run time are newly created and therefore distinct.
Because when you concatenate Strings at run time you get a new object reference, except when the operands are compile-time constants such as literal Strings.
Note that interning both Strings returns the same canonical String object.

Why should I not instantiate a string object with a constructor?

In Java, apparently, String s = "foo" is preferred over String s = new String("foo").
Why? Isn't a new string object created in both cases? Why would the first case preclude calling a constructor?
Why?
Because the second approach results in two string objects (the original due to the string literal, plus an explicit copy).
The first case is a string literal, simply a shorthand the language offers for creating a string. A String object is still created, just without an explicit constructor call, which means less typing and less code clutter.
The second case takes the String object already created by the literal and passes it to a constructor, which copies the content to create a new, separate String object. The literal will still be around because literals are interned.
There is rarely a point to using the String constructor. Pretty much the only case is when you've created a substring of a very large string and want to release the memory used by the rest of it: in older JDKs (before Java 7 update 6), a substring shared the original string's underlying char array, just with a different offset and length. Newer JDKs copy the characters in substring(), so this case no longer applies.
I don't think it's preferable. I assume the only "benefit" you get is that if you wrongly use the "==" operator rather than the equals method, having two different instances of a string makes the comparison fail fast, which prompts you to fix your code (with shared literals, == may "succeed" and then fail unpredictably).
Unless, of course, your code requires you to construct two different instances for whatever reason.
Why? Isn't a new string object created in both cases?
No, the initial form being a string literal will be interned such that only one instance is created:
String s = "foo";
String s2 = "foo";
System.out.println(s == s2); // true

Operator overloading in Java

As far as I know, there is no operator overloading in the Java language.
So why does this code print 'true' twice?
String s1 = "abc";
String s2 = "abc";
System.out.println(s1==s2);
Integer i1 = 1;
Integer i2 = 1;
System.out.println(i1==i2);
== for reference types compares the references; == for primitive types compares values. In your first example, the two object references turn out to be the same because of a concept known as the string pool (the second true comes from a similar cache of small Integer values). Another code snippet you might want to try out:
String s1 = "abc";
String s2 = new String("abc");
System.out.println(s1 == s2);
System.out.println(s1.equals(s2));
As you may have already tried, it prints false and then true. The reason is that using the new keyword creates a completely new string even though a string object with the exact same contents already exists in the string pool. In this case, s1 points to an interned string with the contents "abc" (a string in the string pool), whereas s2 points to a completely new string object (again with the contents "abc"). Hence the false in the first print statement.
In the second print statement, we compare the contents of the String objects rather than their references, which, as it should, prints true.
This is one of the most common mistakes made by beginners in Java: they use == for logical comparison when it actually performs a reference comparison. Read the link posted in one of the answers here for more details about string pooling. On a related note, the String class "overrides" the equals method of the Object class to provide a logical comparison. If a class you write doesn't override equals with a logical implementation, it doesn't matter whether you call equals or use the == operator; the result is the same, i.e. a reference comparison.
For a more in-depth view on equality, read Brian's article; an excellent read.
It's not entirely true that there is no operator overloading in Java. There just isn't any custom operator overloading. For example, + is overloaded to act as both numeric addition and String concatenation. This is defined by the language and can't be modified by the developer.
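A small sketch of that built-in overloading of +: which meaning applies depends on the operand types, and the operator is evaluated left to right (the class and method names are illustrative):

```java
class PlusDemo {
    // + is evaluated left to right: the two ints are added before c is appended.
    static String numbersFirst(int a, int b, String c) {
        return a + b + c;
    }

    // Once the left operand is a String, every following + is concatenation.
    static String stringFirst(String a, int b, int c) {
        return a + b + c;
    }

    public static void main(String[] args) {
        System.out.println(1 + 2);                   // 3
        System.out.println(numbersFirst(1, 2, "3")); // 33
        System.out.println(stringFirst("1", 2, 3));  // 123
    }
}
```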
Your example, however, doesn't use operator overloading anywhere. == on reference types always does the same thing: it returns true when the left side and the right side refer to the exact same object.
In this case s1 and s2 reference the same object and i1 and i2 reference the same object as well.
s1 and s2 reference the same interned String, because string literals are guaranteed to be interned.
i1 and i2 reference the same cached Integer because autoboxing re-uses a fixed pool of Integer objects for small values (at least -128 to 127).
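A sketch of that Integer cache, assuming a standard JVM: the language guarantees shared instances when boxing int values from -128 to 127, while larger values may or may not be shared depending on the JVM's cache configuration, so only equals() is reliable for them (the class and method names are illustrative):

```java
class IntegerCacheDemo {
    // Boxes the same int twice and compares the resulting references.
    static boolean sameBoxedInstance(int value) {
        Integer a = value; // autoboxing; javac compiles this to Integer.valueOf(value)
        Integer b = value;
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(sameBoxedInstance(1));    // true: -128..127 is always cached
        System.out.println(sameBoxedInstance(1000)); // not specified: depends on the JVM's cache size
        Integer x = 1000, y = 1000;
        System.out.println(x.equals(y));             // true: compares values
    }
}
```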
You don't get to overload operators, but that doesn't mean overloading isn't built into the language itself. The obvious counterexample is the plus operator and its different behavior for Strings and numbers.
This is because "All literal strings and string-valued constant expressions are interned."
See http://download.oracle.com/javase/6/docs/api/java/lang/String.html#intern%28%29
Your code looks like it is related to operator overloading, but it is not.
The String "==" operator seems to be "overloaded" differently from the Integer "==" operator. As @Sanjay T. Sharma mentioned in a previous answer, Java has "reference" types and "primitive" types, which handle the "==" operator differently.
Strings in Java are "reference" types, while int is a "primitive" type (its wrapper Integer, used in your example, is a reference type). If you have used pointers and objects in other languages, you will find that in Java a String variable is really a pointer to an object, so the "==" operator compares pointers rather than contents.
