String initialization difference

First, apologies if this is a very basic question. I'm just curious to know the difference between the following string definitions:
String x= "hello";
String y = new String("hello");
I know that in Java, String is a class; it is neither a primitive nor a wrapper (correct me if this is a misconception). Consider a class A. So far I've seen the following declarations for any class: I think A c; is valid, and A a = new A(); is also valid. I'm confused by A a = "xyz";, which is how the String is declared in the first definition above. I'm sure that the two definitions are different, because x == y returns false. I understand that y is the reference to the String object. What is x there, and how is it stored in memory? Interestingly, I found that both x and y can access all the methods of the String class.
Then what is the advantage of one over the other? Can I know the applicability of each?

From the 2nd edition of Joshua Bloch's "Effective Java":
String s = new String("stringette");// DON'T DO THIS!
The statement creates a new String instance each time it is executed,
and none of those object creations is necessary. The argument to the
String constructor ("stringette") is itself a String instance,
functionally identical to all of the objects created by the
constructor. If this usage occurs in a loop or in a frequently invoked
method, millions of String instances can be created needlessly. The
improved version is simply the following:
String s = "stringette";

The first couple of answers you got are incorrect. There is a difference between the two statements. But first, the TL;DR version: Use String x = "hello"; in 99.99999% of situations.
The full answer:
The chief difference between those two is that in the first case, the string is implicitly interned; in the second case, it is not. This is a very real difference, although it only comes into play in select situations. So in the first case, if you have any other strings with the same series of characters ("hello"), your x will refer to the one shared object that is used in all of those places (this is useful, since String instances are immutable). In the second case, you're explicitly saying you need (for whatever reason) to have a String instance with that sequence of characters which is separate from any others. There are very, very few reasons to do that.
With regard to x==y, in Java you compare strings for equality using equals, not ==. The == operator when used with object references compares the references (e.g., do both variables point to the same object), not the object contents.
So: Prefer the first form to the second unless you have a good reason for doing the second.
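A minimal sketch of the difference described above, using the same x and y as the question. The class and method names are made up for illustration; only String itself comes from the JDK.

```java
final class LiteralVersusConstructor {
    // Two occurrences of the same literal refer to one shared, interned object.
    static boolean literalsShareOneObject() {
        String x = "hello";
        String other = "hello";
        return x == other;
    }

    // new String(...) always allocates a separate object, so == reports false.
    static boolean constructorCopyIsSameObject() {
        String x = "hello";
        String y = new String("hello");
        return x == y;
    }

    // equals compares the characters, not the references.
    static boolean constructorCopyHasSameContent() {
        String x = "hello";
        String y = new String("hello");
        return x.equals(y);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject());        // true
        System.out.println(constructorCopyIsSameObject());   // false
        System.out.println(constructorCopyHasSameContent()); // true
    }
}
```

The middle result is the reason to compare strings with equals: two references can hold the same characters without being the same object.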

String a = "hello"; // a literal always refers to the same interned ("singleton") instance
String b = "hello";
String c = new String("hello"); // this kind of declaration always returns a new instance
So
a == b
but
a != c

Very simply, the value of x is a String. A normal java.lang.String.
You're used to the idea that there are literals for primitive types (like 7, or 3.14159, or true, or 'q'), but what you may not have realised is that there are literal forms for some object types too. The only ones I can think of are java.lang.String, where you can write "hello", and java.lang.Class, where you can write SomeClass.class.

I understand that y is the reference to the String object. What is x there[...]?
Both x and y are references to String objects. The x is a reference to an interned string (as T.J. Crowder already wrote in his answer), the y is a reference to a second String object with the same content. This means:
x == y --> false
x.equals(y) --> true
y.equals(x) --> true
Then what is the advantage of one over the other? Can I know the applicability of each?
Always use the first, direct version. The only valid exception I have ever encountered is test code that explicitly wants to make sure the code under test does not use == for string comparisons but uses the proper x.equals(y) method.
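The testing use case mentioned above could look roughly like this. It is only a sketch: buggyIsAdmin and fixedIsAdmin are hypothetical stand-ins for code under test, not anything from a real library.

```java
final class EqualityTestSketch {
    // Hypothetical code under test with a bug: it compares references.
    static boolean buggyIsAdmin(String role) {
        return role == "admin";
    }

    // Hypothetical fixed version: it compares contents.
    static boolean fixedIsAdmin(String role) {
        return "admin".equals(role);
    }

    public static void main(String[] args) {
        // A deliberately distinct object with the same characters.
        String distinctCopy = new String("admin");

        System.out.println(buggyIsAdmin("admin"));      // true: interning hides the bug
        System.out.println(buggyIsAdmin(distinctCopy)); // false: the copy exposes the bug
        System.out.println(fixedIsAdmin(distinctCopy)); // true
    }
}
```

Passing only literals to the buggy version would let the test pass by accident, which is why the test constructs a separate instance on purpose.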

One more way to initialize String is
String str =("abc");
At first glance it looks as though this would give an error (because of the parentheses), but it doesn't: the parentheses simply wrap an ordinary expression.

Related

Particularities of strings vs other objects

I've only been using Java for a short time, and I was wondering: Strings are in fact objects, but I heard that when assigning them a value and retrieving it they act quite differently, almost as if they were primitive types. Could someone make this clearer?
What exactly do I have to watch out for when I declare, edit, or access a string compared to other objects?

First of all, Java has string literals. That means you can write String foo = "bar";.
Strings are immutable (once you create one, you can't change it), which lets the JVM use a trick called the "string pool". String literals are stored in the pool, so in the following example both foo and bar point to one instance of String: String foo = "baz"; String bar = "baz";. You could even compare them with ==, but you should never do that. However, the equals() method (which you use to compare strings in Java) may benefit from it, since it does not need to compare the characters if both variables point to the same string.
Please check this topic for more info: What is the Java string pool and how is "s" different from new String("s")?

what do we mean exactly when we say "ints are immutable"

I'm confused on the concept of "being immutable". Our professor is saying "ints are immutable! Strings are immutable" everyday, what does he mean by that exactly?
A more general question, how do we know if a data structure is immutable or not?
Thanks

Some of the other answers here are confusing mutability/immutability with value/reference semantics, so be careful...
Simply put, an entity is mutable if it may be modified after it's been created. In other words, its value may change over time.
First, a counterexample. A Java String object is immutable; there is no method that you can call on a String object that will change its value:
String a = "foo";
a.concat("bar");
System.out.println(a); // foo
You could do this instead:
String a = "foo";
a = a.concat("bar");
System.out.println(a); // foobar
but that works because concat() is creating a new String object, and the reference a is then repointed at it. There are now two String objects; the original has not changed (it's just lost forever). a is mutable, the underlying object isn't.
As for int variables; in C or Java, we can do this:
int x = 3;
x = 4; // Mutates x
x++; // Mutates x
How do we know these really mutate x, rather than simply creating a new integer "object" and "repointing" x at it? (Other than by the fact that the language assures us that primitive types are distinct from object types.) In C, we can somewhat prove it:
int x = 3;
int *p = &x; // Pointer to original entity
x = 4;
printf("%d\n", *p); // 4
AFAIK, there is no equivalent approach in Java. So you could argue that the question of whether integer types are truly mutable in Java is irrelevant.
As for how we know whether a given type is immutable, very often we don't. At least, not without inspecting it, or simply believing a promise we've been told.
In Java, ensuring a user-defined type is immutable involves following a few simple rules (explained here). But it's still just a promise; the language doesn't enforce it.
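As an illustration, here is a sketch of a user-defined type that follows the rules usually cited for immutability (final class, private final fields, no setters, defensive copies of mutable inputs). The Scores class is hypothetical, and these rules are an assumption about what the linked explanation covers.

```java
import java.util.Arrays;

final class Scores {                    // final: no subclass can add mutable state
    private final String owner;         // private final fields, assigned once
    private final int[] values;

    Scores(String owner, int[] values) {
        this.owner = owner;
        // Defensive copy: later changes to the caller's array do not leak in.
        this.values = Arrays.copyOf(values, values.length);
    }

    String owner() {
        return owner;
    }

    int[] values() {
        // Defensive copy: callers cannot reach the internal array.
        return Arrays.copyOf(values, values.length);
    }

    // "Modification" returns a new object instead of changing this one.
    Scores withOwner(String newOwner) {
        return new Scores(newOwner, values);
    }
}
```

Nothing in the language checks these rules; dropping either defensive copy would still compile and would quietly make the class mutable.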

Immutability (of an object or value, not a variable) usually means there's no way to do an in-place change of the value. (One that would propagate to other references to it.) This means that if you have something like the following:
String a = "foo";
There is no operation that you could perform on a that would change its value. I.e. you can't have a hypothetical method append() that would cause the following behaviour:
String a = "foo";
a.append("bar"); // a is not reassigned
System.out.println(a); // prints "foobar"
You can contrast this with mutable objects like collections:
String[] as = new String[] { "foo" };
as[0] = "bar"; // we're changing `as` in-place - not the Strings stored in it
System.out.println(as[0]); // prints "bar"
Primitive types are not a great choice of example for Java, since you can't have multiple references to them, and there's no way to demonstrate the distinction between a mutation and a reassignment.

It's awkward to talk about immutability of ints, because the idea of mutating something that isn't a container doesn't make sense to most of us. So let's talk about strings.
Here's a string, in Python:
s = "abc"
Strings are containers in the sense that they contain some number of individual characters: here a, b, and c. If I want to change the second character to a d, I might try:
s[1] = 'd'
Which will fail with a TypeError. We say strings are immutable in Python because there is no operation that will alter an existing string. Certainly there are plenty of operations that will perform some operation and create a new string, but existing strings are set in stone.
There are a couple advantages here. One is that it allows interning: sometimes when a string needs allocating (and at the discretion of the interpreter), CPython will notice that an identical string has already been allocated and just reuse the same str object. This is easiest when strings are immutable—otherwise, you'd have to do something about problems like this:
s = "abc"
t = "abc" # this reuses the same memory, as an optimization
s[0] = "x" # oops! now t has changed, too!
Interning is particularly useful in Python and similar languages that support runtime reflection: it has to know the name of every function and method at runtime, and a great many methods have builtin names like __init__ (the name of the constructor method), so reusing the same string object for all those identical names saves a good deal of wasted space.
The other advantage is in semantics: you can safely pass strings to arbitrary functions without worrying that they'll be changed in-place behind your back. Functional programmers appreciate this kind of thing.
The disadvantage, of course, is that doing a lot of work with very large strings requires reallocating and rebuilding those large strings many times over, instead of making small edits in-place.
Now, about ints. This is NOT an example of immutability:
x = 3
x = 4
This doesn't involve the actual objects at all; it only assigns a new value to the variable x.
Consider instead:
x = [1, 2, 3]
y = x
x[:] = [4, 5, 6]
print y # [4, 5, 6]
The x[:] = syntax is "slice notation" for replacing the entire contents of a list. Here, x and y are two names for the same list. So when you replace the contents of x, you also see the same effect in y, because... they both name the same list. (This is different from reference variables in other languages: you can assign a new value to either x or y without affecting the other.)
Consider this with numbers. If you could do some hypothetical operation like the above on plain numbers, this would happen:
x = 3
y = x
x[:] = 4
print y # hypothetically, 4
But you can't do that. You can't change the number an existing int represents. So we call them immutable.
Mutating an int is easy in Smalltalk:
3 become: 4
This would change the 3 to a 4, overwriting the memory that previously contained a 3. If ints are interned (as they can be in Python), this could even mean that everywhere 3 appears in your source code, it acts like the number 4.
In C, these distinctions aren't as meaningful, because variables are fixed blocks of memory rather than the transient labels of Python. So when you do this:
int x = 3;
x = 4;
It's hard to say definitively whether this is "mutating" an int. It does overwrite existing memory, but that's also just how C variable assignment works.
Anyway! Mutability is just about whether you're altering an existing object or replacing it with a new one. In Python and Java, you can't alter existing strings, and you can't "alter" numbers, so we call them immutable. You're free to change the contents of lists and arrays in-place without creating new ones, so they're mutable.

What is immutable is highly language-dependent, but an immutable object is simply an object that cannot be changed after it is created.
What this usually means is that:
int x = 4;
x = 5;//not 'allowed'
This is seen in languages where primitives, such as an int, can be immutable (such as functional languages like Scala).
Most objects in OOP are actually pointers to a place in memory. If that object is immutable that location in memory cannot have its contents changed. In the case of a String in Java, we see this happening:
String a = "Hello"; //points to some memory location, lets say '0x00001'
a = a + " World!"; //points to a new locations, lets say '0x00002'
System.out.println(a);//prints the contents of memory location '0x00002'
In this case, a actually points to an entirely different place in memory after line 2. What this means is that another thread with a different scope that was handed a would not see "Hello World!" but instead "Hello":
String a = "Hello";
startThread(a, " Hello!");//starts some thread and passes a to it
startThread(a, " World!");//starts another thread and passes a to it
...
public void methodInThread(String a, String b) {
a = a + b;
System.out.println(a);
}
These two threads will output the following, regardless of the order they're called in:
"Hello Hello!" //thread 1
"Hello World!" //thread 2

An object is considered immutable if its state cannot change after it is constructed.
source : http://docs.oracle.com/javase/tutorial/essential/concurrency/immutable.html

Typically it means you can't call a method on the type (int or whatever) that will change it.
Sometimes people refer to value types as being immutable:
// there's no way for this to be mutable, but this is an example of a value type
int a = 5;
int b = a;
b = 9;
a does not change, unlike with class types such as:
MyClass a = new MyClass();
MyClass b = a;
b.DoSomething();
// a is now changed

An immutable object is something that cannot be modified once it has been instantiated. If you have to modify it, a new object is created and the reference is pointed at it.
And ints are not immutable.

There are some classes in Java which are immutable, like String and all the wrapper classes, i.e. Integer, Float, Long, etc.
For example:
Integer i = 5;
i = 10;
i = 15;
With Integer i = 5, i refers to an Integer object holding 5. In the second statement, i = 10 does not write the value 10 into that object; instead i is pointed at another Integer object holding 10. In the third statement, i = 15 again points i at yet another object.
Note: don't confuse int with Integer. int is a primitive type and Integer is a wrapper class. All primitives are mutable.
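One way to observe this is to keep a second reference to the first Integer and check it after the reassignment. This is a sketch with made-up names. Whether boxing allocates a fresh object or reuses a cached one is up to Integer.valueOf, so the example looks only at values, not identities.

```java
final class IntegerRebindDemo {
    // Returns what the alias sees after i has been reassigned.
    static int aliasAfterReassignment() {
        Integer i = 5;
        Integer alias = i; // second reference to the same Integer object
        i = 10;            // i now refers to a different Integer; the first is untouched
        return alias;      // auto-unboxed
    }

    // Returns what i itself holds after two reassignments.
    static int valueAfterReassignments() {
        Integer i = 5;
        i = 10;
        i = 15;
        return i;
    }

    public static void main(String[] args) {
        System.out.println(aliasAfterReassignment());  // 5
        System.out.println(valueAfterReassignments()); // 15
    }
}
```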

The concepts of mutability and immutability are only relevant for things to which code may hold a reference. If one holds a reference to something, and some immutable aspect of that thing's state is observed to have some value (or state), then as long as that reference exists, that aspect of the thing's state may always be observed to have the same value (state).
The String type in Java may reasonably be described as immutable, because code which has a reference to a string and observes that it contains the characters "Hello" may examine it at any time and will always observe that it contains those characters. By contrast, a char[] might in one moment be observed to contain the letters "Hello" but at some later time be observed to contain the letters "Jello". Thus, a char[] is considered mutable.
Because it is not possible to hold a direct reference to an int in Java, the concepts of mutability and immutability are not really applicable to that type. One can, however, hold a reference to an Integer, for which they are relevant. Any such reference that is observed to have a particular value will always have that same value. Thus, Integer is immutable. Note that while the concepts of mutability and immutability aren't really applicable to value types like int, they do share a useful aspect of immutable types: the state represented by a storage location (variable, field, or array element) of either a primitive type or an immutable type is guaranteed not to change except by overwriting that location with either a new value or a reference to a different immutable object.
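The "Hello" versus "Jello" observation can be reproduced with two references to the same thing. The following is an illustrative sketch; the class and method names are invented.

```java
final class AliasObservationDemo {
    // A char[] seen through a second reference changes when the array is modified.
    static String observeArrayThroughAlias() {
        char[] letters = {'H', 'e', 'l', 'l', 'o'};
        char[] alias = letters;  // same array, second reference
        letters[0] = 'J';        // in-place change
        return new String(alias);
    }

    // A String seen through a second reference never changes.
    static String observeStringThroughAlias() {
        String text = "Hello";
        String alias = text;            // same String, second reference
        text = text.replace('H', 'J');  // produces a new String; text is repointed
        return alias;
    }

    public static void main(String[] args) {
        System.out.println(observeArrayThroughAlias());  // Jello
        System.out.println(observeStringThroughAlias()); // Hello
    }
}
```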

Why should I not instantiate a string object with a constructor?

In Java, apparently, String s = "foo" is preferred over String s = new String("foo").
Why? Isn't a new string object created in both cases? Why would the first case preclude calling a constructor?

Why?
Because the second approach results in two string objects (the original due to the string literal, plus an explicit copy).

The first case is a string literal, simply a shorthand the language offers you to create a string. The String class constructor still gets called, just not explicitly, which means less typing and less code clutter.
The second case takes the String object already created by the literal and passes it to a constructor, which copies the content to create a new, separate String object. The literal will still be around because literals are interned.
There is rarely a point to using the String constructor. Pretty much the only case is when you've created a substring of a very large string and want to release the memory used by the rest of the string, because in older JDKs substrings use the same underlying char array as the original string, just with a different offset and length.

I don't think it's preferable. I assume the only "benefit" you get is that if you wrongly use the == operator rather than the equals method, having two different instances of a string will make the comparison fail sooner, which will prompt you to fix your code. (With literals, the == operator may "succeed" and fail unpredictably.)
Unless, of course, your code requires you to construct two different instances for whatever reason.

Why? Isn't a new string object created in both cases?
No, the initial form being a string literal will be interned such that only one instance is created:
String s = "foo";
String s2 = "foo";
s == s2 // true

about String a= "hello"; String b= "hello" a==b , in java

Check the following program.
Run it on the Sun Java HotSpot JVM, and everything prints "true".
-------- Updated: got the answer from Stephen and Danie; changed the program to add the String intern method --------
What happens if B is compiled separately rather than together with A? For example, B is compiled, put in a jar, and added to the classpath when running TestStringEqual.
Also, is this a Java compile-time optimization, a Java runtime optimization, or behavior defined by the Java Language Specification?
Also, does this program give the same result on different VMs, or is it a feature of just one VM?
Thanks
public class TestStringEqual {
    public static String HELLO = "hello";
    private String m_hello;
    public TestStringEqual() {
        m_hello = "hello";
    }
    public static void main(String[] args) {
        String a = "hello";
        String b = "hello";
        System.out.println("string a== string b:" + (a == b));
        System.out.println("static member ==a:" + (HELLO == a));
        System.out.println("instance field ==a:"
                + (new TestStringEqual().getHello() == a));
        System.out.println("hello in B ==a:" + (B.B_HELLO == a));
        System.out.println("interned new string object in heap==a:"
                + (new String("hello").intern() == a));
    }
    public String getHello() {
        return this.m_hello;
    }
}
class B {
    public static final String B_HELLO = "he" + "llo";
}

There is really no mystery about this at all. You just need to know three basic facts about Java:
The '==' operator for object references tests if two object references are the same; i.e if they point to the same object. Reference JLS 15.21.3
All String literals with the same sequence of characters in a Java program will be represented by the same String object. Reference JLS 3.10.5 So (for example) "hello" == "hello" is comparing the same object.
Constant expressions are evaluated at compile time. Reference JLS 15.28. So (for example) "hell" + "o" is evaluated at compile time, and is therefore equivalent to the literal "hello".
These three facts are stated in the Java Language Specification. They are sufficient to explain the "puzzling" aspects of your program's behaviour, without relying on anything else.
The more detailed explanation involving the string pool, string literals being interned by the class loader, the bytecodes emitted by the compiler, etc, etc ... are just implementation details. You don't need to understand these details if you understand what the JLS is saying, and they don't really help to make the JLS clearer (IMO).
Notes:
The definition of what is and what isn't a constant expression is a little involved. Some things that you might imagine to be constant valued, are in fact not. For instance, "hello".length() is not a constant expression. However, a concatenation of two string literals is a constant expression.
The explanation of equality of string literals in the JLS does in fact mention interning as the mechanism by which this property of literals is implemented.
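To make the constant-expression point concrete, here is a small sketch comparing three concatenations against the literal "hello". The names are invented for the example. The third case relies on the JLS rule that a concatenation which is not a constant expression yields a newly created String.

```java
final class ConstantExpressionDemo {
    // Two literals joined with + form a constant expression, folded at compile time.
    static boolean literalConcatenation() {
        return ("hel" + "lo") == "hello";
    }

    // A final local initialised with a constant is a constant variable,
    // so this concatenation is still a constant expression.
    static boolean finalVariableConcatenation() {
        final String head = "hel";
        return (head + "lo") == "hello";
    }

    // Without final, head is an ordinary variable: the concatenation happens
    // at run time and produces a new String object.
    static boolean nonFinalVariableConcatenation() {
        String head = "hel";
        return (head + "lo") == "hello";
    }

    public static void main(String[] args) {
        System.out.println(literalConcatenation());          // true
        System.out.println(finalVariableConcatenation());    // true
        System.out.println(nonFinalVariableConcatenation()); // false
    }
}
```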

On the JVM level, the LDC (load constant) instruction is used to push a string literal onto the stack. For performance reasons, the string literal isn't stored in the code itself; it's stored in the constant pool of the class. The constant pool is a table which appears at the beginning of a class file containing string literals, numeric literals, field and method descriptors, and a few other things. LDC is followed by a byte specifying the string's index in the constant pool. (If one byte is not large enough, the compiler will use LDC_W, which is followed by a 16-bit offset. Hence the limit of 65,536 constants.)
If the same string literal occurs twice in the same class, javac is smart enough to create only one entry in the constant pool. When a class is loaded, the JVM creates actual String objects from the data in the constant pool. LDCs which contain the same offset into the constant pool will thus cause the same String to be pushed onto the stack. Instructions like IF_ACMPEQ (which checks for reference equality as == does) will then recognize the strings as identical.
See the JVMS for more info.

It's an immutable string (unable to be mutated or changed), not an immune one, though I suppose you could argue that it's immune from change :-)
That means you cannot change the underlying string itself, you can only assign a different string to the variable. So:
String a = "Hello";
a = "Goodbye";
doesn't change the memory where "Hello" is stored, it changes a to point to a different memory location where "Goodbye" is stored.
This allows Java to share strings for efficiency. You can even get cases where strings like "deoxyribonucleic acid" and "acid" may share space, where the latter points to a specific location within the former. Again, this is made possible by the immutable nature of such strings.
In any case, == will check to see if the strings refer to the same underlying object, not something that's often useful. If you want to see if the strings are equal, you should be using String.equals() or one of its variations.

It is fairly simple: the compiler will generate a (bytecode) constant for the string "hello" the first time it encounters it. In normal assembler it would be in the .TEXT section.
The subsequent "hello" strings will then point to that same constant, since there is no need to allocate new space or create a new constant. The reason this is so is because strings are immutable and if one is assigned a new value new memory is needed for it anyway.
It will probably not work on input, i.e. if you let a user input "hello" and ==-compare that to the compile-time hello strings you'll likely get false.
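A sketch of the input case follows. To keep it runnable without a console, simulatedInput() stands in for text typed by a user; it is a hypothetical helper that builds the string at run time with StringBuilder, whose toString() allocates a new String.

```java
final class RuntimeStringDemo {
    // Stand-in for user input: a String assembled at run time, not a literal.
    static String simulatedInput() {
        return new StringBuilder().append("hel").append("lo").toString();
    }

    // Reference comparison against the compile-time literal.
    static boolean inputIsSameObjectAsLiteral() {
        return simulatedInput() == "hello";
    }

    // Content comparison against the compile-time literal.
    static boolean inputEqualsLiteral() {
        return simulatedInput().equals("hello");
    }

    // intern() returns the pooled instance, which is the one the literal uses.
    static boolean internedInputIsSameObjectAsLiteral() {
        return simulatedInput().intern() == "hello";
    }

    public static void main(String[] args) {
        System.out.println(inputIsSameObjectAsLiteral());         // false
        System.out.println(inputEqualsLiteral());                 // true
        System.out.println(internedInputIsSameObjectAsLiteral()); // true
    }
}
```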

As far as a==b goes, it seems the compiler is taking a shortcut and sharing the same string object. When I declare my variables as follows, I get a==b is false.
String a = "hello";
String b = "hell";
String temp = "o";
if (new java.util.Random().nextDouble() < 0.5) b += temp;
else b += "o";
If I do String b = "hell"+"o"; I still get a==b as true.

What is the purpose of the expression "new String(...)" in Java?

While looking at online code samples, I have sometimes come across an assignment of a String constant to a String object via the use of the new operator.
For example:
String s;
...
s = new String("Hello World");
This, of course, compared to
s = "Hello World";
I'm not familiar with this syntax and have no idea what the purpose or effect would be.
Since String constants typically get stored in the constant pool and then in whatever representation the JVM has for dealing with String constants, would anything even be allocated on the heap?

The one place where you may think you want new String(String) is to force a distinct copy of the internal character array, as in
small=new String(huge.substring(10,20))
However, this behavior is unfortunately undocumented and implementation dependent.
I have been burned by this when reading large files (some up to 20 MiB) into a String and carving it into lines after the fact. I ended up with all the strings for the lines referencing the char[] consisting of entire file. Unfortunately, that unintentionally kept a reference to the entire array for the few lines I held on to for a longer time than processing the file - I was forced to use new String() to work around it, since processing 20,000 files very quickly consumed huge amounts of RAM.
The only implementation agnostic way to do this is:
small=new String(huge.substring(10,20).toCharArray());
This unfortunately must copy the array twice, once for toCharArray() and once in the String constructor.
There needs to be a documented way to get a new String by copying the chars of an existing one; or the documentation of String(String) needs to be improved to make it more explicit (there is an implication there, but it's rather vague and open to interpretation).
Pitfall of Assuming what the Doc Doesn't State
In response to the comments, which keep coming in, observe what the Apache Harmony implementation of new String() was:
public String(String string) {
value = string.value;
offset = string.offset;
count = string.count;
}
That's right, no copy of the underlying array there. And yet, it still conforms to the (Java 7) String documentation, in that it:
Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable.
The salient piece being "copy of the argument string"; it does not say "copy of the argument string and the underlying character array supporting the string".
Be careful that you program to the documentation and not one implementation.

The only time I have found this useful is in declaring lock variables:
private final String lock = new String("Database lock");
....
synchronized(lock)
{
// do something
}
In this case, debugging tools like Eclipse will show the string when listing what locks a thread currently holds or is waiting for. You have to use "new String", i.e. allocate a new String object, because otherwise a shared string literal could possibly be locked in some other unrelated code.
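The risk with a literal lock can be shown without any threads, just by checking object identity. In this sketch Cache, Logger and Database are hypothetical, unrelated classes.

```java
final class LockIdentityDemo {
    // Two unrelated (hypothetical) classes that each picked the literal "lock".
    static final class Cache {
        static final String LOCK = "lock";
    }

    static final class Logger {
        static final String LOCK = "lock";
    }

    // A (hypothetical) class that allocates its own String to lock on.
    static final class Database {
        static final String LOCK = new String("Database lock");
    }

    // Both literals are the same interned object, so the classes share a monitor.
    static boolean literalLocksAreShared() {
        return Cache.LOCK == Logger.LOCK;
    }

    // The constructed lock is distinct even from a literal with the same text.
    static boolean constructedLockIsShared() {
        return Database.LOCK == "Database lock";
    }

    public static void main(String[] args) {
        System.out.println(literalLocksAreShared());   // true
        System.out.println(constructedLockIsShared()); // false
        System.out.println(Database.LOCK);             // Database lock
    }
}
```

Code synchronizing on Cache.LOCK and code synchronizing on Logger.LOCK would contend for the same monitor, which is the accidental sharing the answer warns about.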

String s1 = "foo"; The literal goes into the string pool, and s1 refers to it.
String s2 = "foo"; This time the "foo" literal is found to exist in the string pool already, so s2 refers to the same object.
String s3 = new String("foo"); The "foo" literal is placed in the string pool first (if it is not already there). Then, because of the new operator, the String-argument constructor creates a separate String object "foo" on the heap, and s3 refers to that.
String s4 = new String("foo"); Same as s3.
So System.out.println(s1 == s2); // true, because both refer to the same pooled literal
and System.out.println(s3 == s4); // false, because s3 and s4 are distinct objects created at different places in the heap

The sole utility for this constructor described by Software Monkey and Ruggs seems to have disappeared from JDK7.
There is no longer an offset field in class String, and substring always uses
Arrays.copyOfRange(char[] original, int from, int to)
to trim the char array for the copy.

Well, that depends on what the "..." is in the example. If it's a StringBuffer, for example, or a byte array, or something, you'll get a String constructed from the data you're passing.
But if it's just another String, as in new String("Hello World!"), then it should be replaced by simply "Hello World!", in all cases. Strings are immutable, so cloning one serves no purpose -- it's just more verbose and less efficient to create a new String object just to serve as a duplicate of an existing String (whether it be a literal or another String variable you already have).
In fact, Effective Java (which I highly recommend) uses exactly this as one of its examples of "Avoid creating unnecessary objects":
As an extreme example of what not to do, consider this statement:
String s = new String("stringette"); // DON'T DO THIS!
(Effective Java, Second Edition)
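For the first case in this answer, where the argument is something other than a String, the constructor does real work. A brief sketch with invented class and method names; the constructors themselves are standard java.lang.String ones.

```java
import java.nio.charset.StandardCharsets;

final class OtherStringConstructors {
    // Decodes bytes using an explicit charset.
    static String fromBytes() {
        byte[] utf8 = {72, 101, 108, 108, 111}; // "Hello" encoded as UTF-8
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Copies the characters of a char array.
    static String fromChars() {
        return new String(new char[] {'H', 'e', 'l', 'l', 'o'});
    }

    // Takes a snapshot of a StringBuffer's current contents.
    static String fromBuffer() {
        return new String(new StringBuffer("Hel").append("lo"));
    }

    public static void main(String[] args) {
        System.out.println(fromBytes());  // Hello
        System.out.println(fromChars());  // Hello
        System.out.println(fromBuffer()); // Hello
    }
}
```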

Here is a quote from the book Effective Java Third Edition (Item 17: Minimize Mutability):
A consequence of the fact that immutable objects can be shared freely
is that you never have to make defensive copies of them (Item
50). In fact, you never have to make any copies at all because the
copies would be forever equivalent to the originals. Therefore, you
need not and should not provide a clone method or copy constructor
(Item 13) on an immutable class. This was not well understood in the
early days of the Java platform, so the String class does have a copy
constructor, but it should rarely, if ever, be used.
So it was a wrong decision by Java: since the String class is immutable, it should not have been given a copy constructor. In cases where you want to perform costly operations on an immutable class, you can use its public mutable companion classes, which for String are StringBuilder and StringBuffer.

Generally, this indicates someone who isn't comfortable with the new-fashioned C++ style of declaring when initialized.
Back in the C days, it wasn't considered good form to define auto variables in an inner scope; C++ eliminated the parser restriction, and Java extended that.
So you see code that has
int q;
for(q=0;q<MAX;q++){
String s;
int ix;
// other stuff
s = new String("Hello, there!");
// do something with s
}
In the extreme case, all the declarations may be at the top of a function, and not in enclosed scopes like the for loop here.
In general, though, the effect of this is to cause a String ctor to be called once, and the resulting String thrown away. (The desire to avoid this is just what led Stroustrup to allow declarations anywhere in the code.) So you are correct that it's unnecessary and bad style at best, and possibly actually bad.

There are two ways in which Strings can be created in Java. Following are the examples for both the ways:
1) Declare a variable of type String (a class in Java) and assign it a value placed between double quotes. This creates a string in the string pool area of memory.
eg: String str = "JAVA";
2) Use the constructor of the String class and pass a string (within double quotes) as an argument.
eg: String s = new String("JAVA");
This creates a new string JAVA in main memory, and also in the string pool if the string is not already present there.

I guess it will depend on the code samples you're seeing.
Most of the time, code samples use the class constructor "new String()" only to show a very well-known Java class instead of creating a new one.
You should avoid using it most of the time, not only because string literals are interned but mainly because strings are immutable. It doesn't make sense to have two copies that represent the same object.
While the article mentioned by Ruggs is "interesting", it should not be used except in very specific circumstances, because it could do more harm than good. You'll be coding to an implementation rather than a specification, and the same code might not behave the same on, for instance, JRockit, the IBM VM, or others.
