about String a= "hello"; String b= "hello" a==b , in java

about String a= "hello"; String b= "hello" a==b , in java - java

check the following program:
Run it in sun java hostspot jvm, everything will be "true".
--------updated: got the answer by Stephen and Danie,changed the program to add string intern method-----------
how it will become, if B is separate compiled not together with A, what will happen???, for example , B is compiled and put in a jar, and put its class path when run TestStringEqual ??
Also, is this java compile time optimization, or java run time optimization, or java language specification defined ??
Also, it this program comes the same result on different VMs, or just one VM feature?
thanks
public class TestStringEqual {
public static String HELLO = "hello";
private String m_hello;
public TestStringEqual() {
m_hello = "hello";
}
public static void main(String[] args) {
String a = "hello";
String b = "hello";
System.out.println("string a== string b:" + (a == b));
System.out.println("static memebr ==a:" + (HELLO == a));
System.out.println("instance field ==a:"
+ (new TestStringEqual().getHello() == a));
System.out.println("hello in B ==a:" + (B.B_HELLO == a));
System.out.println("interned new string object in heep==a:"
+ ( new String("hello").intern() == a));
}
public String getHello() {
return this.m_hello;
}
}
class B{
public static final String B_HELLO = "he"+"llo";
}

There is really no mystery about this at all. You just need to know three basic facts about Java:
The '==' operator for object references tests if two object references are the same; i.e if they point to the same object. Reference JLS 15.21.3
All String literals with the same sequence of characters in a Java program will be represented by the same String object. Reference JLS 3.10.5 So (for example) "hello" == "hello" is comparing the same object.
Constant expressions are evaluated at compile time. Reference JLS 15.28. So (for example) "hell" + "o" is evaluated at compile time, and is therefore equivalent to the literal "hello".
These three facts are stated in the Java Language Specifications. They are sufficient to explain the "puzzling" aspects behaviour of your program, without relying on anything else.
The more detailed explanation involving the string pool, string literals being interned by the class loader, the bytecodes emitted by the compiler, etc, etc ... are just implementation details. You don't need to understand these details if you understand what the JLS is saying, and they don't really help to make the JLS clearer (IMO).
Notes:
The definition of what is and what isn't a constant expression is a little involved. Some things that you might imagine to be constant valued, are in fact not. For instance, "hello".length() is not a constant expression. However, a concatenation of two string literals is a constant expression.
The explanation of equality of string literals in the JLS does in fact mention interning as the mechanism by which this property of literals is implemented.

On the JVM level, the LDC (load constant) instruction is used to push a string literal onto the stack. For performance reasons, the string literal isn't stored in the code itself; it's stored in the constant pool of the class. The constant pool is a table which appears at the beginning of a class file containing string literals, numeric literals, field and method descriptors, and a few other things. LDC is followed by a byte specifying the string's index in the constant pool. (If one byte is not large enough, the compiler will use LDC_W, which is followed by a 16-bit offset. Hence the limit of 65,536 constants.)
If the same string literal occurs twice in the same class, javac is smart enough to create only one entry in the constant pool. When a class is loaded, the JVM creates actual String objects from the data in the constant pool. LDCs which contain the same offset into the constant pool will thus cause the same String to be pushed onto the stack. Instructions like IF_ACMPEQ (which checks for reference equality as == does) will then recognize the strings as identical.
See the JVMS for more info.

It's an immutable string (unable to be mutated or changed), not an immune one, though I suppose you could argue that it's immune from change :-)
That means you cannot change the underlying string itself, you can only assign a different string to the variable. So:
string a = "Hello";
a = "Goodbye";
doesn't change the memory where "Hello" is stored, it changes a to point to a different memory location where "Goodbye" is stored.
This allows Java to share strings for efficiency. You can even get cases where strings like "deoxyribonucleic acid" and "acid" may share space, where the latter points to a specific location within the former. Again, this is made possible by the immutable nature of such strings.
In any case, == will check to see if the strings refer to the same underlying object, not something that's often useful. If you want to see if the strings are equal, you should be using String.equals() or one of its variations.

It is fairly simple: the compiler will generate a (bytecode) constant for the string "hello" the first time it encounters it. In normal assembler it would be in the .TEXT section.
The subsequent "hello" strings will then point to that same constant, since there is no need to allocate new space or create a new constant. The reason this is so is because strings are immutable and if one is assigned a new value new memory is needed for it anyway.
It will probably not work on input, i.e. if you let a user input "hello" and ==-compare that to the compile-time hello strings you'll likely get false.

As far as a==b goes, it seems the compiler is making the shortcuts and sharing the same string object. When I declare my varuiables as follows, I get a==b is false.
String a = "hello";
String b = "hell";
String temp = "o";
if (new java.util.Random().nextDouble() < 0.5) b += temp;
else b += "o";
If I do String b = "hell"+"o"; I still get a==b as true.

Related

after used non-string operand of "+" operator will the resulted string go to String pool?

I am studding and preparing for OCP 1Z0-815 and I read from this excellent preparing book:
Deshmukh, Hanumant. OCP Oracle Certified Professional Java SE 11 Programmer I Exam Fundamentals 1Z0-815: Study guide for passing the OCP Java 11 Developer Certification Part 1 Exam 1Z0-815 (p. 99). Enthuware. Edição do Kindle.
String str = "hello";
for( int i = 0; i < 5; i + +) {
str = str + i;
}
The above creates one String object containing "hello" at the beginning and then two more in each iteration of the loop - a String containing the int value of i and the concatenated String. Thus, overall, the above code creates 1 + 2* 5 = 11 Strings. However, if you are asked how many String will be eligible to be garbage collected, the answer is not that easy. The Java Language Specification mentions in Section 15.8.1 that the non-string operand of the + operator is converted to a String at runtime but it does not clearly tell whether this String goes to the String pool (in which case it will not be garbage collected) or not.
Let me show you another piece of code:
String s = "hello";
int k = 100;
s = s + " 123" + k;
In this case, JLS section 15.8.1 clearly says that a compiler may avoid creating multiple strings altogether by making use of a StringBuilder. Thus, it is not possible to tell how many Strings will be created and how many will be eligible to be garbage collected...
The statement above "...The Java Language Specification mentions in Section 15.8.1 that the non-string operand of the + operator is converted to a String at runtime but it does not clearly tell whether this String goes to the String pool..." drove me to search around but I didn't manage to find an explanation: as far as I don't see "new" I understand that indeed it goes to String pool and so it is not an object. Consequentlly it is not elegiable for garbage collection at all. Am I saying something wrong?
In other words, as far as understand, every loop will result in a constant string in String Pool in Heap (Java 11 in mind instead of Java 7). hello0, hello1, hello2 and so on and there will not be any candidate for Garbage Collection at all. I understand that Garbage Colletion "clean" objects created by new operator and doesn't act on String Pool.
Based on conclusion paragraph, "... section 15.8.1 clearly says that a compiler may avoid creating multiple strings altogether by making use of a StringBuilder. Thus, it is not possible to tell how many Strings will be created and how many will be eligible to be garbage collected..." I assume it is saying that I can't discover how many Strings are created in String Pool in second code example (wihtout loop) because StringBuilder is used behind the scene and we know that StringBuilder manipulate concatenation in memory avoiding creating many String literals. Well, if that is the case, I still can assume in first code thatt fofr each loop will result in a Literal String (hello0, hello1...) that goes to String Pool and will not be eligible for Garbage Collection in pratical terms. Am I wrong?
PS.: You may comment that Garbage Collection acts on String Pool but I understand that in pratical terms, literal string in String Pool resides for a so long time that we can consider it is never eligible for Garbage Collection before program ends (" there is an implicit reference to the String object in the code of every method that uses the literal." and " all string literals in the string pool are reachable until the program is terminated and thus not eligible for garbage collection"

as far as I don't see new I understand that indeed it goes to String pool and so it is not an object.
First of all, the Java Language Specification doesn't mention the string pool specifically. What it actually says is that if two string-valued constant expressions are equal, then they will be the same object. (This covers string literals, but also covers cases like String s = "a" + "b";)
The string pool is >>a<< mechanism for doing that, but the JLS doesn't mandate a specific mechanism.
What about new?
The JLS also says that the new operator always produces a brand new object:
"The value of a class instance creation expression is a reference to the newly created object of the specified class. Every time the expression is evaluated, a fresh object is created." (JLS 15.9.4)
The consequence of this is:
String a1 = new String("a");
System.out.println("a" == a1); // prints "false"
This must be do, because the new String is not a fresh one. Since the string pool is a de-duping mechanism, we can conclude that new String("a") is NOT putting the new String object into the string pool.
Likewise, the + operator creates a new String:
"The result of string concatenation is a reference to a String object that is the concatenation of the two operand strings. The characters of the left-hand operand precede the characters of the right-hand operand in the newly created string." (JLS 15.18.1)
The JLS also says that the compiler can optimize expressions involving +:
"An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.
For primitive types, an implementation may also optimize away the creation of a wrapper object by converting directly from a primitive type to a string." (JLS 15.18.1)
But note that this optimization is only permitted within an expression, not across multiple statements.
The JLS says nothing about other String operations. For their specification we need to refer to the javadocs. In current versions, the only String methods that state that a new string is always created are the two join methods.
But likewise, the only String method that specifically mentions the string pool is the intern method.
That is what the specifications say. What do String implementations actually do?
Well if you examine the standard source code for Java SE implementations going back to Java 1.1, you will find that apart from intern, no String methods put objects into the string pool. None.
You also said this:
I understand that in practical terms, literal string in String Pool resides for a so long time that we can consider it is never eligible for Garbage Collection before program ends.
That is true is most cases. The exception is code that is you can create a classloader and use that to load classes dynamically. If that classloader becomes unreachable, and there are no other references to the classes it loaded, then its string literals may be unreachable ... before the program ends.
This is liable to happen if you use a product (e.g. a web container) where new versions of classes can be hot-loaded.
Finally if we look at the example:
String str = "hello";
for (int i = 0; i < 5; i++) {
str = str + i;
}
In practice:
The first time we run this, a new String object will (probably) be created for the "hello" literal.
On each loop iteration, at least one new String will be created. It is possible that an intermediate String will be created to represent the string value of i, but the compiler is permitted to optimize that away.
None of the intermediate String objects will be interned / added to the string pool.
All but the "hello" literal and the final value will be unreachable.

How JVM allocates memory for String in java?

I have a scenario like this -
String s = "abc", t="abc"; //LINE 1
System.out.println(s==t); // definitely it would return true; //LINE 2
s=s+"d"; t=t+"d"; //LINE 3
System.out.println(s==t); // output would be false; but why??
s=s.intern(); t=t.intern();
System.out.println(s==t); // it would return true;
I wanted to know why the second print statement returned false. Please provide me any reference link which explains the same.
While creating t at line 1; intern was called and it pointed to "abc" why not intern was called at line 3?

java strings are immutable.
that means that when you do something like s=s+"d" youre actually creating a whole new string, and assigning it to s.
on top of that, the compiler does constant detection and allocation, so that when you write s="abc", t="abc" the compiler re-uses the same reference and your code is effectively s=t="abc"
so you start with the exact same string instance (thanks to compiler optimization) and turn it into 2 identical yet different strings, at which point s==t is false (s.equals(t) would have been true, as it compares the contents and not the address in memory).
next up is intern(). what intern() does is looks up an identical string in the string cache and returns it. if it doesnt find an identical entry it places the argument provided into the cache and returns the argument. so s=s.intern() places s into the string cache and returns it (so s is unchanged) but the following call t=t.intern() actually returns s, so that s==t again.

Strings are "special" Java objects.
The JVM tries to reuse the same references (that's why String s = "abc", t="abc"; causes s and t to point to the same instance), however, when working on instances (like t=t+"d") a new instance gets created, thus, the references are not the same
In order to compare strings you have to use the .equals() method.
intern() causes to create a canonical representation out of the string pool inside the String class (
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#intern%28%29)

String s = "abc", t="abc";
s == t is true because Java automatically interns String literals. In this case the String literal "abc" has been interned and both s and t point to that same instance. Hence s == t is true.
s = s + "d"; t = t + "d";
Strings in Java are immutable. Hence what you are assigning to s and t are two new Strings that have been constructed. Therefore they do not point to the same instance. This is why s == t returns false.
s = s.intern(); t = t.intern();
Here you have forcibly interned the string in s.intern(). Since both s and t contain the same string values, the JVM sees that t is the same and makes it point to the same interned-instance as s. Hence s == t is true.
As a general note, establishing the equality of strings should be done via .equals() and not ==; == only compares references for reference-types and not values.

Java Language Specification explicitly covers this particular situation. Here is a quote from chapter 3.10.5. "String Literals":
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
As you can see, only constant expressions are interned. So, first four lines of your code are equivalent to:
String s = "abc".intern(), t="abc".intern();
System.out.println(s==t);
s=s+"d".intern(); t=t+"d".intern();
System.out.println(s==t);
Expressions s+"d" and t+"d" aren't constant and, thus, aren't interned.
JLS even provides an example with useful notes. Here is the relevant part:
package testPackage;
class Test {
public static void main(String[] args) {
String hello = "Hello", lo = "lo";
System.out.print((hello == ("Hel"+lo)));
}
}
Output: false
Note: Strings computed by concatenation at run time are newly created and therefore distinct.

Because when you concatenate Strings you generate a new object reference except when they are literal Strings.
Note that the intern of both Strings point to the same literal String object reference.

Object life cycle in java and memory management?

For the below statement in a program, how many objects will be created in heap memory and in the string constant pool?
I need clarity in object creation. Many sources I've read are not elaborating. I am confused when the object gets destroyed.
String a="MAM"+"BCD"+"EFG"+"GFE";
How many objects will be created?
I am looking for good material about the life cycle of objects, methods and classes and how the JVM handles them when they are dynamically changed and modified.

"MAM"+"BCD"+"EFG"+"GFE" is a compile-time constant expression and it compiles into "MAMBCDEFGGFE" string literal. JVM will create an instance of String from this literal when loading the class containing the above code and will put this String into the string pool. Thus String a = "MAM"+"BCD"+"EFG"+"GFE"; does not create any object, see JLS 15.18.1. String Concatenation Operator +
The String object is newly created (§12.5) unless the expression is a compile-time constant expression (§15.28).
It simply assigns a reference to String object in pool to local var a.

Only one object is created.
string s1 = "java";
string s2 = "ja" + "va";
s.o.p(s1==s2);
The statement yields true.
String s1="java";
string s2 = "ja";
String s3 = s2 +"va";
s.o.p(s1==s3);
The statement yields false.
So minimum one apparent should be permanent, then '+' operator generates new string object (in non constant pool using new()).
So, the question you asked does not have one also permanent. This means it creates only one object.

Exactly one object is created and placed in the constant pool, unless it already exists, in which case the existing object is used. The compiler concatenates string constants together, as specified in JLS 3.10.5 and 15.28.
A long string literal can always be broken up into shorter pieces and written as a (possibly parenthesized) expression using the string concatenation operator +
http://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.10.5

Most answers seem to focus that a) the complete expression is one compile time constant and b) that the line itself does not construct a new object but only a reference to one object.
However noone so far has mentioned, that String itself contains a reference to an internal char[] (which is also in the constant pool).
Summary: There are two objects in the constant pool (String and char[]). The line does neither create nor destroy any object.
And regarding:
I am confused when the object gets destroyed.
No object is destroyed, since stuff in the constant pool will only be destroyed if the class itself would be unloaded. At most you can say, that the reference a will go out of scope eventually.

Only one object will be created since String a will compile into "MAMBCDEFGGFE".

Answers stating a single heap object in your example are correct. However, consider this code:
public class Tester
{
public String a="MAM";
public String b ="BCD";
public String c = "EFG";
public String d ="GFE";
public Tester()
{
String abcd = a + b + c + d;
}
}
In this example, there are 7 strings being created. a,b,c and d are not compiled into a single constant - they are members. 1 string is then created for each + operator - semantically speaking, + is a concatenation but logically it is creating a new string in memory. The first 2 operator strings are discarded immediately and are now eligible for garbage collection but the memory churn still occurs.
Technically there in an 8th object. The instance of Tester.
Edit: This has been proved to be nonsense in the comments

Why should I not instantiate a string object with a constructor?

In Java, apparently, String s = "foo" is preferred over String s = new String("foo").
Why? Isn't a new string object created in both cases? Why would the first case preclude calling a constructor?

Why?
Because the second approach results in two string objects (the original due to the string literal, plus an explicit copy).

The first case is a string literal, simply a shorthand the language offers you to create a string. The String class constructor still gets called, just not explicitly, which means less typing and less code clutter.
The second case takes the String object already created by the literal and passes it to a constructor, which copies the content to create a new, separate String object. The literal will still be around because literals are interned.
There is rarely a point to using the String constructor (pretty much only when you've created a substring of a very large string and want to release the memory used by the rest of the string, because substrings by default use the same underlying char array as the original string, just with a different offset and length.

I don't think it's preferable. I assume the only "benefit" you get is that if you wrongfully use the "==" operator rather than the equals method, have two different instances of a string will fail faster which will prompt you to fix your code. (the == operator may "succeed" and fail unpredictably)
Unless of course your code requires you to construct two different instances for whatever reason

Why? Isn't a new string object created in both cases?
No, the initial form being a string literal will be interned such that only one instance is created:
String s = "foo";
String s2 = "foo";
s == s2 => true

String initialization difference

Firstly apologies if this a very basic question, I'm just curious to know the difference between the following string definitions
String x= "hello";
String y = new String("hello");
I knew that in java String is a Class its neither a primitive nor a Wrapper(Correct me if this a misconception). Consider a class A, I've seen the following declarations for any class so far. i think A c; is valid and A a = new A(); is also valid. I'm confused with A a ="xyz"; this is how we declared a String as in above first type of definition. I'm sure that the above two definitions are absolutely different, like if i say x==y it returns false. I understand that y is the reference to the String object. What is x there, how is it stored in memory, interestingly i found that both x and y can access all the methods of String class.
Then what is the advantage of one over other.Can i know the applicability of each.

From the 2nd edition of Joshua Bloch's "Effective Java":
String s = new String("stringette");// DON'T DO THIS!
The statement creates a new String instance each time it is executed,
and none of those object creations is necessary. The argument to the
String constructor ("stringette") is itself a String instance,
functionally identical to all of the objects created by the
constructor. If this usage occurs in a loop or in a frequently invoked
method, millions of String instances can be created needlessly. The
improved version is simply the following:
String s = "stringette";

The first couple of answers you got are incorrect. There is a difference between the two statements. But first, the TL;DR version: Use String x = "hello"; in 99.99999% of situations.
The full answer:
The chief difference between those two is that in the first case, the string is implicitly interned; in the second case, it is not. This is a very real difference, although it only comes into play in select situations. So in the first case, if you have any other strings with the same series of characters ("hello"), your x will refer to the one shared object that is used in all of those places (this is useful, since String instances are immutable). In the second case, you're explicitly saying you need (for whatever reason) to have a String instance with that sequence of characters which is separate from any others. There are very, very few reasons to do that.
With regard to x==y, in Java you compare strings for equality using equals, not ==. The == operator when used with object references compares the references (e.g., do both variables point to the same object), not the object contents.
So: Prefer the first form to the second unless you have a good reason for doing the second.

String a = "hello"; //this kind of declaration always return "singlton" instance.
String b = "hello";
String c = new String("hello"); //This kind of declaration always return a new instance.
SO
a ==b;
but
a !=c

Very simply, the value of x is a String. A normal java.lang.String.
You're used to the idea that there are literals for primitive types (like, 7, or 3.14159, or true, or 'q'), but it seems what you may not have realised is that there are literal forms for some object types too. The only ones i can think of are java.lang.String, where you can write "hello", and java.lang.Class, where you can write SomeClass.class.

I understand that y is the reference to the String object. What is x there[...]?
Both x and y are references to String objects. The x is a reference to an interned string (as T.J. Crowder already wrote in his answer), the y is a reference to a second String object with the same content. This means:
x == y --> false
x.equals(y) --> true
y.equals(x) --> true
Then what is the advantage of one over other.Can i know the applicability of each.
Always use the first, direct version. The only valid exception which I ever encountered and found to be valid is test code which explicitly want to make sure, that the code to be tested does not use == for string comparisons but uses the proper x.equals(y) method.

One more way to initialize String is
String str =("abc");
It seems at first that this would give an error (because of the brackets) but this doesn't. Surprisingly.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.