String literal and object relation in Java [duplicate] - java

This is a question that I got in an interview.
I've two strings defined as
String s1="Java";
String s2="Java";
My question is whether these two references point to the same memory location. In general, when we create identical strings (without the new keyword), does the content get stored in memory only once, with all the String references with the same content just referring to the same location, without storing the string "Java" redundantly? The hash codes of s1 and s2 are the same. But do hash codes depend directly on the memory location of the object?

The process of combining identical strings is called "interning". Many language compilers have done it for years, but not all of them do. The answer to the question, especially as expanded by @GennadyVanin--Novosibirsk, depends on the language and the compiler implementation. For Java, all constant strings are interned, as required by the Java Language Specification. But that applies only to constant string expressions. Interning is per running JVM, so literals from separately compiled classes, even ones in different JAR files, refer to the same object once they are loaded into the same JVM; strings in different JVM processes are of course different objects. Dynamically created Java strings (e.g., the output of various toString() methods) won't be interned unless the code specifically requests it via String.intern(). And yes, all uses of an interned string share the same memory location - that's a big part of why strings are interned in the first place.
As to other languages, that's a bigger question, but with all the information in these answers, I'm sure you can research it on the web. Suffice it to say that there is no universal agreement on how this ought to be done.

String s1="Java";
String s2="Java";
My question is whether these two references point to the same memory location
Simply quoting §3.10.5 of the Java Language Specification:
A string literal is a reference to an instance of class String
(§4.3.1, §4.3.3).
Moreover, a string literal always refers to the same instance of class
String. This is because string literals - or, more generally, strings
that are the values of constant expressions (§15.28) - are
"interned" so as to share unique instances, using the method
String.intern.
And read the comments on the code example there:
This example illustrates six points:
Literal strings within the same class (§8) in the same package (§7) represent references to the same String object (§4.3.1).
Literal strings within different classes in the same package represent references to the same String object.
Literal strings within different classes in different packages likewise represent references to the same String object.
Strings computed by constant expressions (§15.28) are computed at compile time and then treated as if they were literals.
Strings computed by concatenation at run time are newly created and therefore distinct.
The result of explicitly interning a computed string is the same string as any pre-existing literal string with the same contents.
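The points quoted above can be checked with a short program. This is only a minimal sketch, not the example from the specification itself; the class name InternPointsDemo and its method names are made up for illustration.

```java
// Hypothetical demo class; names are illustrative and not taken from the JLS example.
final class InternPointsDemo {
    static final String LITERAL = "Java";

    // Two literals with the same content refer to one String object.
    static boolean literalsShared() {
        String s1 = "Java";
        String s2 = "Java";
        return s1 == s2;
    }

    // "Ja" + "va" is a constant expression, so it is computed at compile time.
    static boolean constantExpressionShared() {
        return LITERAL == "Ja" + "va";
    }

    // Concatenation with a non-constant variable happens at run time.
    static boolean runtimeConcatShared() {
        String half = "Ja";          // not final, so not a constant variable
        String built = half + "va";  // newly created object
        return LITERAL == built;
    }

    // Explicitly interning the computed string yields the pooled instance.
    static boolean internedShared() {
        String half = "Ja";
        return LITERAL == (half + "va").intern();
    }

    public static void main(String[] args) {
        System.out.println(literalsShared());           // true
        System.out.println(constantExpressionShared()); // true
        System.out.println(runtimeConcatShared());      // false
        System.out.println(internedShared());           // true
    }
}
```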

When the compiler processes your string literals, it sees that s1 and s2 have the same value, so only one String object is needed. This is safe because String is immutable in Java.
String s1 = "Java";
String s2 = "Java";
System.out.println(s1 == s2);
This prints true because s1 and s2 point to the same object.
The string pool is a mechanism in which already-defined strings are stored in a 'pool'; before a new String object is created for a literal, the JVM checks whether an equal string is already in the pool.

Example.
First example
String s1 = "FirstString";
String s2 = "FirstString";
if (s1 == s2) {
    // True: Java does not create a separate object for these two literals. Both variables refer to the same object.
}
Second example
String s1 = "FirstString";
String s2 = new String("FirstString");
if (s1.equals(s2)) {
    // True: the contents are the same.
}
if (s1 == s2) {
    // False: new allocates a separate object, so the two references differ.
}
Conclusion: for a literal, Java checks whether an equal string already exists in the pool. If it does, the new variable gets the same reference as the first string. If we create the second string using new, Java creates a separate object with a different reference, even when the content is the same.

Adding to the others:
The new keyword always forces the creation of a new object.
If you declare the strings like this:
String s1 = "some";
String s2 = "some";
then, through the string pooling mechanism, both references s1 and s2 refer to the same String object with the value "some".

When you have
String str1 = new String("BlaBla"); //In the heap!
String str2 = new String("BlaBla"); //In the heap!
then you're explicitly creating a String object through the new operator (and the constructor).
In this case each reference points to a different object, at a different storage location.
But if you have:
String str1 = "BlaBla";
String str2 = "BlaBla";
then the construction is implicit.
Two string literals share the same storage if they have the same value, because Java avoids storing identical literal strings more than once.

String s1="Java";
String s2="Java";
Both point to the same object.

String s1="Java";
String s2="Java";
Do they point to the same memory location?
I originally said "no", but in the case above it's actually yes; see the StringPool answer referred to below.
"when we create identical strings (without new keyword), does the
content get stored in the memory only once and all the String objects
with the same content just refer to the same location"
...Kind of. See the detailed answer in the question "Java Strings and StringPool".
"The hash codes of s1 and s2 are the same. But are hashcodes dependent
directly on memory location of the object?"
No, the hash codes depend on the content of the String.
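To illustrate that the hash code comes from the characters and not from the object's location, here is a small sketch that recomputes the formula given in the String.hashCode() documentation, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]. The class name StringHashDemo and the helper contentHash are hypothetical names used only for this illustration.

```java
// Hypothetical demo class; contentHash is an illustrative helper, not a JDK method.
final class StringHashDemo {
    // Recomputes the documented String hash using int arithmetic (overflow wraps).
    static int contentHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String pooled = "Java";
        String copy = new String("Java"); // distinct object, same content

        System.out.println(pooled == copy);                           // false
        System.out.println(pooled.hashCode() == copy.hashCode());     // true
        System.out.println(contentHash("Java") == pooled.hashCode()); // true
    }
}
```

Two distinct objects with equal content report equal hash codes, so equal hash codes say nothing about whether two references point to the same object.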

Yes.
Andrew Hare answered this on Stack Overflow: https://stackoverflow.com/a/2486195/4835894.
Basically, a string intern pool allows a runtime to save memory by preserving immutable strings in a pool, so that different areas of the application can reuse instances of common strings instead of creating multiple instances of them.

Related

How many String objects are created by JVM? [duplicate]

As per my understanding,
String s = new String("abc") will create two objects:
one in the String constant pool (if "abc" is not already in the constant pool)
one in heap memory
Although more than one understanding exists about how many objects will actually be created and where.
Anyway, I have the following questions:
Why do the Javadocs of the String class not cover how many objects will be created and where?
Why is new String(String) provided in the String class at all, given that Strings are immutable? Also, can it be assumed that all strings, created by either String s = "abc" or String s = new String("abc"), will be available in the String constant pool?
Do the String literals used to create, or appended to, a StringBuilder or StringBuffer also go into the String constant pool, or do they remain in heap memory only?
Edit
The Javadoc for the intern() method clearly talks about the constant pool, but
nothing like that is said for String(String).
Why do the Javadocs of the String class not cover how many objects will be created and where?
It is covered in the documentation of String:
The String class represents character strings. All string literals in Java programs, such as "abc", are implemented as instances of this class.
Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared. For example:
String str = "abc";
is equivalent to:
char data[] = {'a', 'b', 'c'};
String str = new String(data);
And from the Java Language Specification:
A String object has a constant (unchanging) value.
String literals (§3.10.5) are references to instances of class String.
And from JLS §3.10.5, String Literals:
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
Why is new String(String) provided in the String class at all, given that Strings are immutable? Also, can it be assumed that all strings, created by either String s = "abc" or String s = new String("abc"), will be available in the String constant pool?
Since String is an object, the ordinary object-style way to declare one is
String s = new String("abc");
The literal form String s = "abc"; exists for other reasons.
The designers of Java decided to retain primitive types in an object-oriented language, instead of making everything an object, so as to improve the performance of the language.
Because String is one of the most heavily used classes, Java's String is designed, for performance reasons, to sit in between a primitive and a class.
Do the String literals used to create, or appended to, a StringBuilder or StringBuffer also go into the String constant pool, or do they remain in heap memory only?
Consider the example
StringBuilder sb = new StringBuilder("abc");
The literal "abc" is available in the constant pool, and the object sb is created on the heap.
You may also want to read my older answer: How can a string be initialized using " "?
String s = new String("abc") will create two objects:
one in String constant pool (if "abc" is not already in constant pool)
No. It is in the constant pool. It was put there by the compiler.
one in Heap memory
Correct.
Although more than one understanding exists about how many objects will actually be created and where.
Many misunderstandings: only one correct understanding.
Why do the Javadocs of the String class not cover how many objects will be created and where?
The premise of the question is not correct. It is covered. Possibly not in the exact form 'how many strings are created', but as the question has been asked millions of times in the last 20 years, the answer isn't exactly a secret. Or shouldn't be.
Why is new String(String) provided in the String class at all, given that Strings are immutable?
So you can create a new one.
Also, can it be assumed that all strings, created by either String s = "abc" or String s = new String("abc"), will be available in the String constant pool?
No. Only string literals and the result of String.intern() are in the constant pool. And that is documented as well.
Do the String literals used to create, or appended to, a StringBuilder or StringBuffer also go into the String constant pool, or do they remain in heap memory only?
All String literals are placed in the constant pool by the compiler.
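The statements above can be tried out with a minimal sketch like the following. The class name NewStringDemo and its method names are hypothetical; the only library calls used are new String(String), String.intern() and StringBuilder.toString().

```java
// Hypothetical demo class; names are illustrative only.
final class NewStringDemo {
    // new String("abc") makes a separate heap object; the literal itself stays pooled.
    static boolean copyIsPooledInstance() {
        String copy = new String("abc");
        return copy == "abc";
    }

    // intern() maps the copy back to the pooled literal instance.
    static boolean internedCopyIsPooledInstance() {
        return new String("abc").intern() == "abc";
    }

    // The literal passed to the builder is pooled, but toString() allocates a new String.
    static boolean builderResultIsPooledInstance() {
        StringBuilder sb = new StringBuilder("abc");
        return sb.toString() == "abc";
    }

    public static void main(String[] args) {
        System.out.println(copyIsPooledInstance());          // false
        System.out.println(internedCopyIsPooledInstance());  // true
        System.out.println(builderResultIsPooledInstance()); // false
        System.out.println(new StringBuilder("abc").toString().equals("abc")); // true
    }
}
```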

How to get address of any primitive data type variables and string variables or reference in Java?

I want to check how immutable string variables share only one address location.
For example,
String s1 = "welcome";
String s2 = "welcome";
then s1 and s2 should reference the same stored string "welcome", at one address.
String objects live on the heap of the Java virtual machine.
Within the heap, the JVM maintains a string constant pool that holds the instances for String constants and literals (plus anything added via intern()). Strings created with new are ordinary heap objects outside that pool.
Consider the following example:
String cPoolStr1 = "Hello";
String cPoolStr2 = "Hello";
String sPoolStr1 = new String("Hello");
String sPoolStr2 = new String("Hello");
To check whether two variables refer to the same object, compare the references with ==: cPoolStr1 == cPoolStr2 is true because both literals refer to the same pooled object, whereas sPoolStr1 == sPoolStr2 is false.
String literals are stored in a common pool, typically a part of the heap.
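Secondly, no, you can't get the address of an object in Java. It can change over time as the GC moves objects.
However, you can compare the identity hash codes of two objects as a rough hint of whether they are the same object at a given point in time. String overrides hashCode() to depend on content, so s1.hashCode() is not suitable for this; use System.identityHashCode() instead. Different objects can still collide, so == remains the reliable check.
System.out.println(System.identityHashCode(s1) + " ** " + System.identityHashCode(s2));
The documentation of Object.hashCode() notes:
(This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
See also the answer to: How can a string be initialized using " "?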
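As a hedged illustration of the difference between object identity and the content-based String hash, here is a minimal sketch. The class name IdentityDemo and the helper sameObject are hypothetical; System.identityHashCode is a standard JDK method, and its value is not an address.

```java
// Hypothetical demo class; sameObject is an illustrative helper.
final class IdentityDemo {
    // Reference comparison: true only if both arguments are the very same object.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String s1 = "welcome";
        String s2 = "welcome";
        String s3 = new String("welcome"); // separate object, same content

        System.out.println(sameObject(s1, s2)); // true
        System.out.println(sameObject(s1, s3)); // false

        // Same object, so the identity hash codes are necessarily equal.
        System.out.println(System.identityHashCode(s1) == System.identityHashCode(s2)); // true

        // String.hashCode() is content-based, so it matches even for distinct objects.
        System.out.println(s1.hashCode() == s3.hashCode()); // true
    }
}
```

Identity hash codes of two different objects usually differ but are allowed to collide, which is why the sketch does not print a comparison for s1 and s3.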

String equality check by only reference

As we know, if we do a check like the one below, the output will be "equal".
String s1 = "stackoverflow";
String s2 = "stackoverflow";
if(s1==s2){
System.out.println("equal");
}
So my question is: if I am not using the new operator in my application to create Strings, and all my strings are literals, can I use only reference equality as shown above? Thanks in advance.
N.B.: I am writing a crawler, so I need to check whether I have already visited the URL that I am currently holding. I am using MurmurHash, which gives me a long for every URL, but there are collisions, so I need to check the content of the URL string if there is a hash collision. Hence, for performance, I am thinking of just comparing the two URL strings by reference. I am using jsoup for HTML parsing.
If I am not using the new operator in my application to create Strings, and all my strings are literals, can I use only reference equality as shown above?
If you are 100% sure that all the strings you are dealing with are plain string literals or compile-time constant expressions then yes. The Java Language Specification §15.28 mandates that
Compile-time constant expressions of type String are always "interned" so as to share unique instances, using the method String.intern.
But if you get strings from anywhere else (e.g. reading them from a web page retrieved by your crawler, or building them using concatenation expressions that are not compile-time constants) then you must use .equals to compare them by value rather than by reference or .intern() them explicitly.
It's not always obvious whether an expression is a compile-time constant or not:
String s1 = "Stack";
String s2 = s1 + "Overflow"; // not a CTC
but
final String s1 = "Stack";
String s2 = s1 + "Overflow"; // _is_ a CTC, because s1 is a "constant variable"
// (final, with an initializer that is itself a CTC)
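The two snippets above can be turned into a runnable check. This is a minimal sketch; the class name ConstantExpressionDemo and its method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
final class ConstantExpressionDemo {
    // s1 is not final, so s1 + "Overflow" is evaluated at run time.
    static boolean nonFinalConcatIsInterned() {
        String s1 = "Stack";
        String s2 = s1 + "Overflow"; // not a compile-time constant
        return s2 == "StackOverflow";
    }

    // s1 is a constant variable, so the whole expression is a compile-time constant.
    static boolean finalConcatIsInterned() {
        final String s1 = "Stack";
        String s2 = s1 + "Overflow"; // compile-time constant, interned
        return s2 == "StackOverflow";
    }

    public static void main(String[] args) {
        System.out.println(nonFinalConcatIsInterned()); // false
        System.out.println(finalConcatIsInterned());    // true
    }
}
```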
No, not in general. Reference equality is guaranteed only for string literals and compile-time constant expressions; for any other string the VM makes no such guarantee. To guarantee this behavior for arbitrary strings, you need to call String#intern().
This and only this will guarantee reference equality.
But do a performance test; String#equals() is probably faster :-)
So my question is: if I am not using the new operator in my application to create Strings, and all my strings are literals, can I use only reference equality as shown above?
Yes, for sure. Since they are resolved at compile time, there are no issues.
But keep in mind the scenario below:
String s3= s2;
String s4= s1+""; //resolved at run time
System.out.println(s3==s4); //false
System.out.println(s3.equals(s4));//true
So as long as you are sure that none of the strings are produced at run time, you can safely use ==.
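For the crawler case from the question, the URLs come from parsed pages and are therefore built at run time, so a value-based check is the safer option. The following is only a sketch under that assumption: it uses a plain java.util.HashSet instead of the MurmurHash scheme described in the question, and the class name VisitedUrls, the method markVisited and the example.com URL are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; it swaps the question's MurmurHash scheme for a plain HashSet.
final class VisitedUrls {
    private final Set<String> seen = new HashSet<>();

    // Returns true the first time a URL is offered, false on repeats.
    // HashSet compares with hashCode() and equals(), i.e. by content.
    boolean markVisited(String url) {
        return seen.add(url);
    }

    public static void main(String[] args) {
        VisitedUrls visited = new VisitedUrls();
        String base = "https://example.com"; // not final, so concatenation happens at run time
        String first = base + "/page";
        String second = base + "/page";      // equal content, different object

        System.out.println(first == second);             // false
        System.out.println(visited.markVisited(first));  // true
        System.out.println(visited.markVisited(second)); // false
    }
}
```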
