How should I copy Strings in Java?

String s = "hello";
String backup_of_s = s;
s = "bye";
At this point, the backup variable still contains the original value "hello" (this is because of String's immutability, right?).
But is it really safe to copy Strings this way (which is of course not safe for regular mutable objects), or is it better to write this?
String s = "hello";
String backup_of_s = new String(s);
s = "bye";
In other words, what's the difference (if any) between these two snippets?
EDIT - the reason why the first snippet is safe:
Let me just explain things with a little more detail, based on the good answers already provided (which were essentially focused on the question of difference of performance between the 2 snippets):
Strings are immutable in Java, which means that a String object cannot be modified after its construction.
Hence,
String s = "hello"; creates a new String instance and assigns its address to s (s being a reference to the instance/object)
String backup_of_s = s; creates a new variable backup_of_s and initializes it so that it references the object currently referenced by s.
Note: String immutability guarantees that this object will not be modified: our backup is safe
Note 2: Java garbage collection mechanism guarantees that this object will not be destroyed as long as it is referenced by at least one variable (backup_of_s in this case)
Finally, s = "bye"; creates another String instance (because of immutability, it's the only way), and modifies the s variable so that it now references the new object.

Since strings are immutable, both versions are safe. The latter, however, is less efficient (it creates an extra object and in some cases copies the character data).
With this in mind, the first version should be preferred.
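The difference between the two snippets can be checked with a small sketch. The class and method names here (CopyByReferenceDemo, sameObject) are made up for illustration; the sketch only compares object identity after s has been reassigned.

```java
final class CopyByReferenceDemo {
    /** Returns true when both variables refer to the very same String object. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String s = "hello";
        String backupByReference = s;               // copies the reference only
        String backupByConstructor = new String(s); // allocates a second String

        s = "bye"; // rebinds s; the "hello" object itself is untouched

        System.out.println(backupByReference);                        // hello
        System.out.println(backupByConstructor);                      // hello
        System.out.println(sameObject(backupByReference, "hello"));   // true: the pooled literal
        System.out.println(sameObject(backupByConstructor, "hello")); // false: a distinct object
    }
}
```

Both backups keep the value "hello"; the constructor version simply pays for an extra object.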

Strings are immutable objects, so you can copy them just by copying the reference, because the referenced object can't change.
So you can copy as in your first example without any problem:
String s = "hello";
String backup_of_s = s;
s = "bye";

Your second version is less efficient because it creates an extra string object when there is simply no need to do so.
Immutability means that your first version behaves the way you expect and is thus the approach to be preferred.

The second case is also inefficient with respect to the String pool: new String(s) always yields a non-pooled object, so you have to explicitly call intern() on the returned reference to get the pooled instance.

String str1="this is a string";
String str2=str1.clone(); // note: does not compile as written, because clone() is protected in Object and String does not expose it
How about copying like this?
I think getting a new copy is better, so that the data of str1 is not affected when str2 is referenced and modified later.

Related

rely on java String copy on write

My application creates a lot of instances of a class, say class A. Every instance contains a String, and most of them contain the same String:
class A {
String myString;
}
I know that the JVM makes "all equal strings" point to the same String, which is stored just once. If the myString field of one of my A instances is overwritten, the reference to the original string is replaced by the reference to the new String value and everything works as expected, that is, as if each instance had its own copy of the string.
Is this behaviour required of a compliant JVM, or is it an optimization that may differ from one JVM to another, or from version to version?
Another way to put the question: when designing highly redundant (string-based) data structures, should one rely only on the copy-on-write mechanism, or is it advisable to put something in place at the application level?

Another aspect of this is that your Strings will not be the same if they are created dynamically (e.g. allocated by parser). Check out String.intern() if space is a concern:
String a = String.valueOf('a') + "b";
String b = a.intern();
String c = "ab";
// now b == c is true

As @Hot Licks said: strings are immutable, so there is no reason to talk about copy on write. Also, when you are using mutable objects, you have to be aware that "copy on write" may not be available in your client's environment.
Another thing that may be very important when you create a lot of objects: each object contains a few bytes of header, pointers, etc. If I remember correctly, an empty object is about 20 bytes. When we are talking about a lot of objects containing properties, this starts to be significant. Be aware of that, and if you measure that it is causing a problem, then you have to do something at the application level (flyweight design pattern, using a streaming XML parser, etc.).
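One possible shape for "something at the application level" is a small canonicalizing pool in the spirit of the flyweight pattern. This is only a sketch: StringCanonicalizer and its methods are hypothetical names, the pool is not thread-safe, and it never evicts entries.

```java
import java.util.HashMap;
import java.util.Map;

final class StringCanonicalizer {
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the first instance seen with this content, so equal strings share one object. */
    String canonical(String value) {
        String existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    /** Number of distinct string values currently held. */
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        StringCanonicalizer canonicalizer = new StringCanonicalizer();
        // new String(...) simulates values created dynamically, e.g. by a parser
        String first = canonicalizer.canonical(new String("shared value"));
        String second = canonicalizer.canonical(new String("shared value"));
        System.out.println(first == second);      // true
        System.out.println(canonicalizer.size()); // 1
    }
}
```

Whether this is worth it compared with String.intern() depends on measurements in the actual application.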

The fact is that Strings are regular objects.
String a = "test";
String b = a;
Does exactly the same thing as:
StringBuffer a = new StringBuffer("test");
StringBuffer b = a;
That is: in both cases, b is a second reference to the object a refers to, and this is not due to immutability.
Immutability comes into play
So, you always handle two pointers to the same data. Now, if the class is immutable, you can forget about it: nobody will change your data under your feet, not because you have a copy of your own, but because the shared copy is immutable. You can even think that you have a copy of the string, but actually a copy never existed, since String b = a; does what it does for every object: it copies only the reference.
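A short sketch of the point above, using hypothetical helper names (AliasingDemo, appendThroughAlias, concatThroughAlias): the assignment behaves identically for both types, and only mutability decides whether the alias can be used to change what the other variable sees.

```java
final class AliasingDemo {
    /** Mutates a StringBuffer through a second reference and returns what the first one sees. */
    static String appendThroughAlias(String initial, String suffix) {
        StringBuffer a = new StringBuffer(initial);
        StringBuffer b = a;   // second reference to the same buffer
        b.append(suffix);     // mutates the shared object
        return a.toString();  // the change is visible through a
    }

    /** "Changes" a String through a second reference and returns what the first one sees. */
    static String concatThroughAlias(String initial, String suffix) {
        String a = initial;
        String b = a;         // second reference to the same String
        b = b + suffix;       // builds a new String and rebinds b only
        return a;             // a still refers to the unchanged original
    }

    public static void main(String[] args) {
        System.out.println(appendThroughAlias("test", "!")); // test!
        System.out.println(concatThroughAlias("test", "!")); // test
    }
}
```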

Why should I not instantiate a string object with a constructor?

In Java, apparently, String s = "foo" is preferred over String s = new String("foo").
Why? Isn't a new string object created in both cases? Why would the first case preclude calling a constructor?

Why?
Because the second approach results in two string objects (the original due to the string literal, plus an explicit copy).

The first case is a string literal, simply a shorthand the language offers you to create a string. The String class constructor still gets called, just not explicitly, which means less typing and less code clutter.
The second case takes the String object already created by the literal and passes it to a constructor, which copies the content to create a new, separate String object. The literal will still be around because literals are interned.
There is rarely a point to using the String constructor. Pretty much the only case is when you've created a substring of a very large string and want to release the memory used by the rest of the string: in JDK versions before 7u6, substrings share the underlying char array of the original string, just with a different offset and length.

I don't think it's preferable. I assume the only "benefit" you get is that if you wrongly use the "==" operator rather than the equals method, having two different instances of a string will make the comparison fail sooner, which will prompt you to fix your code. (Otherwise the == operator may "succeed" and fail unpredictably.)
Unless, of course, your code requires you to construct two different instances for whatever reason.

Why? Isn't a new string object created in both cases?
No, the initial form being a string literal will be interned such that only one instance is created:
String s = "foo";
String s2 = "foo";
s == s2 // true

Java immutable strings confusion

If Strings are immutable in Java, then how can we write as:
String s = new String();
s = s + "abc";

Strings are immutable.
That means that an instance of String cannot change.
You're changing the s variable to refer to a different (but still immutable) String instance.

Your string variable is NOT the string. It's a REFERENCE to an instance of String.
See for yourself:
String str = "Test String";
System.out.println( System.identityHashCode(str) ); // INSTANCE ID of the string
str = str + "Another value";
System.out.println( System.identityHashCode(str) ); // Whoa, it's a different string!
The instances the str variable points to are individually immutable, BUT the variable can be pointed to any instance of String you want.
If you don't want it to be possible to reassign str to point to a different string instance, declare it final:
final String str = "Test String";
System.out.println( System.identityHashCode(str) ); // INSTANCE ID of the string
str = str + "Another value"; // compile error: cannot assign a value to final variable str

The first answer is absolutely correct. You should mark it as answered.
s = s + "abc" does not append to the s object. It creates a new string that contains the characters from the s object (of which there are none) and "abc".
If String were mutable, it would have methods like append() and the other mutating methods found on StringBuilder and StringBuffer.
Effective Java by Josh Bloch has excellent discussion on immutable objects and their value.

Mutable classes are those whose methods can change their fields, for example:
Foo f = new Foo("a");
f.setField("b"); // Now, you are changing the field of class Foo
but in immutable classes, e.g. String, you cannot change the object once you create it, but of course, you can reassign the reference to another object. For example:
String s = "Hello";
s.substring(0,1); // s is still "Hello"
s = s.substring(0,1); // s is now referring to another object whose value is "H"

String s = new String();
Creates a new, immutable, empty string, variable "s" references it.
s = s+"abc";
Creates a new, immutable, string; the concatenation of the empty string and "abc", variable "s" now references this new object.

Just to clarify, when you say s = s+"abc";
That means, create a new String instance (which is composed of s and "abc") then assign that new String instance to s. So the new reference in s is different from the old.
Remember, a variable is effectively a reference to an object at some specific memory location. The object at that location stays at that location, even if you change the variable to refer to a new object at a different location.

String s = new String();
An empty String object ("") is created. And the variable s refers to that object.
s = s + "abc";
"abc" is a string literal (which is nothing but a String object, implicitly created and kept in a pool of strings) so that it can be reused (since strings are immutable and thus constant). new String() is totally different: you are explicitly creating the object, so it does not end up in the pool. You can put it in the pool through something called interning.
Because s is not a compile-time constant, s + "abc" is evaluated at run time and creates a new String object, even though the result "abc" has the same content as the literal already in the pool. So the variable s finally refers to this new object, which equals "abc" but is not the pooled literal; s.intern() would return the pooled one.
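A quick way to check what s refers to after the concatenation is to compare identities. The sketch below uses hypothetical names (ConcatIdentityDemo, concatAtRuntime) and relies on the Java Language Specification rule that a string concatenation which is not a constant expression yields a newly created String.

```java
final class ConcatIdentityDemo {
    /** Concatenates at run time; prefix is not a compile-time constant. */
    static String concatAtRuntime(String prefix) {
        return prefix + "abc";
    }

    public static void main(String[] args) {
        String s = new String();   // empty, non-pooled String
        s = concatAtRuntime(s);    // content is now "abc"

        System.out.println(s.equals("abc"));     // true: same characters
        System.out.println(s == "abc");          // false: not the pooled literal
        System.out.println(s.intern() == "abc"); // true: intern() returns the pooled instance
    }
}
```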

I believe you are all making this much more complicated than it needs to be, and that simply confuses people who are trying to learn!
The primary benefit of making an object immutable in Java is that a reference to it can be handed around (e.g. passed to another method or assigned using the assignment operator) without having to worry about downstream changes to the object causing issues in the current method or context. (This is very different from any conversation about the thread safety of an object.)
To illustrate, create an application that passes a String as a parameter to a separate method, and "modify" the String in that method. Print the String at the end of the called method and then after control returns to the calling method. The two printouts will differ, because "changing" an immutable String really creates a new String object and points the callee's local variable at it; the caller's variable still refers to the original. Then create an application that does the same things except with StringBuffer, which is not immutable. (For example, you can append to the StringBuffer to modify it.) The printed StringBuffers will have the same value, because (a) Java passes the object reference (by value) to the method, as it does for all objects, so caller and callee share one object, and (b) that object is mutable.
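The two experiments described above could look roughly like this. PassingDemo, reassign and mutate are illustrative names, not taken from any library.

```java
final class PassingDemo {
    /** "Modifies" the String parameter; only the local variable is rebound. */
    static String reassign(String text) {
        text = text + " changed";
        return text; // what the callee would print
    }

    /** Modifies the StringBuffer the caller also refers to. */
    static void mutate(StringBuffer buffer) {
        buffer.append(" changed");
    }

    public static void main(String[] args) {
        String text = "original";
        String seenInCallee = reassign(text);
        System.out.println(seenInCallee); // original changed
        System.out.println(text);         // original

        StringBuffer buffer = new StringBuffer("original");
        mutate(buffer);
        System.out.println(buffer);       // original changed
    }
}
```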
I hope this helps folks who are reading this thread and trying to learn!

what is the advantage of string object as compared to string literal

I want to know where to use a String object (in which scenarios in my Java code).
OK, I understand the difference between a string literal and a String object, but since Java gives us the ability to create String objects explicitly, there must be some reason; at some point explicit String object creation must be useful. So I want to know in which scenarios we should prefer a String object over a string literal.

In most situations, you should use String literals to avoid creating unnecessary objects. This is actually Item 5: Avoid creating unnecessary objects of Effective Java:
Item 5: Avoid creating unnecessary objects
It is often appropriate to reuse a single object instead of creating a new functionally equivalent object each time it is needed. Reuse can be both faster and more stylish. An object can always be reused if it is immutable (Item 15). As an extreme example of what not to do, consider this statement:
String s = new String("stringette"); // DON'T DO THIS!
The statement creates a new String instance each time it is executed, and none of those object creations is necessary. The argument to the String constructor ("stringette") is itself a String instance, functionally identical to all of the objects created by the constructor. If this usage occurs in a loop or in a frequently invoked method, millions of String instances can be created needlessly. The improved version is simply the following:
String s = "stringette";
This version uses a single String instance, rather than creating a new one each time it is executed. Furthermore, it is guaranteed that the object will be reused by any other code running in the same virtual machine that happens to contain the same string literal [JLS, 3.10.5].
There is however one situation where you want to use the new String(String) constructor: when you want to force a substring to copy to a new underlying character array like in:
String tiny = new String(huge.substring(0, 10));
This will allow the big underlying char[] from the original huge String to be recycled by the GC.

Don't use a new String object if you know what the string is. For example:
String str = new String("foo"); // don't do this
You are thus creating an unnecessary object - once you have a String object created from the literal, and then you create another one, taking the first one as constructor argument.

Contrary to your question, there is a DISADVANTAGE of using a String object compared to String literal.
When you declare a String literal, String s = "foo", the JVM looks up "foo" in the string pool and makes s refer to the already existing "foo" instance.
However, if you create a String object, String s = new String("foo"), an entirely new object is created on the heap (even if there is already an existing "foo"). Since Strings are immutable, this is totally unnecessary.
Here is good reference: http://www.javaranch.com/journal/200409/ScjpTipLine-StringsLiterally.html

String a = "ABC";
String b = new String("ABC");
String c = "ABC";
a == b // false
a == c // true
a.equals(b) // true
a.equals(c) // true
The point is that a & c point to the same "ABC" object (JVM magic). Using "new String" creates a new object each time. IMO, using string object is a disadvantage, not an advantage. However, as another poster said, string object is useful for converting byte[], char[], StringBuffer - if you need to do that.

String literals are converted to String objects, and as others pointed out, creating explicit String objects is unnecessary and inefficient, as it defeats String pooling.
However, there is one situation where you want to create new Strings explicitly: if you use just a small part of a very long String. In JDKs where String.substring() shares the original's character array, the substring prevents that array from being garbage collected, so you can save memory when you write
String s = new String(veryLongString.substring(1,3));
instead of
String s = veryLongString.substring(1,3);

Literal strings are objects created in the String pool, and if they have the same value, they refer to the same object.
System.out.println("abc"=="abc"); // the output is true
Meanwhile, String objects created with new are separate objects in memory: each new expression creates a distinct object, so two of them with the same value do not refer to the same object.
String a = new String("abc");
String b = new String("abc");
System.out.println(a==b); // the output is false

What is the purpose of the expression "new String(...)" in Java?

While looking at online code samples, I have sometimes come across an assignment of a String constant to a String object via the use of the new operator.
For example:
String s;
...
s = new String("Hello World");
This, of course, compared to
s = "Hello World";
I'm not familiar with this syntax and have no idea what the purpose or effect would be.
Since String constants typically get stored in the constant pool and then in whatever representation the JVM has for dealing with String constants, would anything even be allocated on the heap?

The one place where you may think you want new String(String) is to force a distinct copy of the internal character array, as in
small=new String(huge.substring(10,20))
However, this behavior is unfortunately undocumented and implementation dependent.
I have been burned by this when reading large files (some up to 20 MiB) into a String and carving it into lines after the fact. I ended up with all the strings for the lines referencing the char[] holding the entire file. Unfortunately, that unintentionally kept a reference to the entire array for the few lines I held on to for longer than it took to process the file. I was forced to use new String() to work around it, since processing 20,000 files very quickly consumed huge amounts of RAM.
The only implementation agnostic way to do this is:
small=new String(huge.substring(10,20).toCharArray());
This unfortunately must copy the array twice, once for toCharArray() and once in the String constructor.
There needs to be a documented way to get a new String by copying the chars of an existing one; or the documentation of String(String) needs to be improved to make it more explicit (there is an implication there, but it's rather vague and open to interpretation).
Pitfall of Assuming what the Doc Doesn't State
In response to the comments, which keep coming in, observe what the Apache Harmony implementation of new String(String) was:
public String(String string) {
value = string.value;
offset = string.offset;
count = string.count;
}
That's right, no copy of the underlying array there. And yet, it still conforms to the (Java 7) String documentation, in that it:
Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable.
The salient piece being "copy of the argument string"; it does not say "copy of the argument string and the underlying character array supporting the string".
Be careful that you program to the documentation and not one implementation.
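The implementation-agnostic variant mentioned above can be wrapped in a small helper. This is a sketch with hypothetical names (ExplicitCopyDemo, copyOfRange); it relies only on the documented behaviour of String(char[]), which copies the contents of the array it is given.

```java
final class ExplicitCopyDemo {
    /** Copies the characters in [begin, end) into a String that has its own character storage. */
    static String copyOfRange(String source, int begin, int end) {
        // toCharArray() returns a fresh array; String(char[]) is documented to copy it again
        return new String(source.substring(begin, end).toCharArray());
    }

    public static void main(String[] args) {
        String huge = "0123456789abcdefghijklmnopqrstuvwxyz"; // stands in for a very large String
        String small = copyOfRange(huge, 10, 20);
        System.out.println(small); // abcdefghij
    }
}
```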

The only time I have found this useful is in declaring lock variables:
private final String lock = new String("Database lock");
....
synchronized(lock)
{
// do something
}
In this case, debugging tools like Eclipse will show the string when listing what locks a thread currently holds or is waiting for. You have to use "new String", i.e. allocate a new String object, because otherwise a shared string literal could possibly be locked in some other unrelated code.
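Why the literal would be risky as a lock can be seen by comparing identities. In this sketch LockIdentityDemo, LITERAL_LOCK, DEDICATED_LOCK and isSharedLiteral are made-up names; the point is only that any code anywhere in the JVM that writes the same literal gets the same object.

```java
final class LockIdentityDemo {
    static final String LITERAL_LOCK = "Database lock";               // pooled, shared JVM-wide
    static final String DEDICATED_LOCK = new String("Database lock"); // object owned by this class

    /** Returns true if the candidate is the very object any "Database lock" literal refers to. */
    static boolean isSharedLiteral(String candidate) {
        return candidate == "Database lock";
    }

    public static void main(String[] args) {
        System.out.println(isSharedLiteral(LITERAL_LOCK));   // true: unrelated code could lock it too
        System.out.println(isSharedLiteral(DEDICATED_LOCK)); // false: only this class can reach it

        synchronized (DEDICATED_LOCK) {
            System.out.println("holding " + DEDICATED_LOCK);
        }
    }
}
```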

String s1 = "foo"; the literal goes into the String pool and s1 refers to it.
String s2 = "foo"; this time the pool is checked for a "foo" literal; since it already exists, s2 refers to the same pooled object.
String s3 = new String("foo"); the "foo" literal is placed in the String pool first (if it is not already there), then the String(String) constructor creates a separate "foo" object on the heap because of the new operator, and s3 refers to that object.
String s4 = new String("foo"); same as s3.
So System.out.println(s1 == s2); // true: both refer to the pooled literal
and System.out.println(s3 == s4); // false: s3 and s4 are distinct objects on the heap

The sole utility for this constructor described by Software Monkey and Ruggs seems to have disappeared from JDK7.
There is no longer an offset field in class String, and substring always uses
Arrays.copyOfRange(char[] original, int from, int to)
to trim the char array for the copy.

Well, that depends on what the "..." is in the example. If it's a StringBuffer, for example, or a byte array, or something, you'll get a String constructed from the data you're passing.
But if it's just another String, as in new String("Hello World!"), then it should be replaced by simply "Hello World!", in all cases. Strings are immutable, so cloning one serves no purpose -- it's just more verbose and less efficient to create a new String object just to serve as a duplicate of an existing String (whether it be a literal or another String variable you already have).
In fact, Effective Java (which I highly recommend) uses exactly this as one of its examples of "Avoid creating unnecessary objects":
As an extreme example of what not to do, consider this statement:
String s = new String("stringette"); // DON'T DO THIS!
(Effective Java, Second Edition)

Here is a quote from the book Effective Java Third Edition (Item 17: Minimize Mutability):
A consequence of the fact that immutable objects can be shared freely
is that you never have to make defensive copies of them (Item
50). In fact, you never have to make any copies at all because the
copies would be forever equivalent to the originals. Therefore, you
need not and should not provide a clone method or copy constructor
(Item 13) on an immutable class. This was not well understood in the
early days of the Java platform, so the String class does have a copy
constructor, but it should rarely, if ever, be used.
So it was a wrong decision in Java's design: since the String class is immutable, a copy constructor should not have been provided for it. In cases where you want to do costly operations on immutable classes, you can use their public mutable companion classes, which are StringBuilder and StringBuffer in the case of String.
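As a rough illustration of the mutable companion class, the sketch below builds a result with StringBuilder instead of creating a new String on every step. BuilderDemo and joinWithCommas are hypothetical names chosen for this example.

```java
final class BuilderDemo {
    /** Joins the parts with ", " using one mutable builder instead of repeated String concatenation. */
    static String joinWithCommas(String[] parts) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                builder.append(", ");
            }
            builder.append(parts[i]);
        }
        return builder.toString(); // a single immutable String is created at the end
    }

    public static void main(String[] args) {
        System.out.println(joinWithCommas(new String[] {"a", "b", "c"})); // a, b, c
    }
}
```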

Generally, this indicates someone who isn't comfortable with the new-fashioned C++ style of declaring when initialized.
Back in the C days, it wasn't considered good form to define auto variables in an inner scope; C++ eliminated the parser restriction, and Java extended that.
So you see code that has
int q;
for(q=0;q<MAX;q++){
String s;
int ix;
// other stuff
s = new String("Hello, there!");
// do something with s
}
In the extreme case, all the declarations may be at the top of a function, and not in enclosed scopes like the for loop here.
In general, though, the effect of this is to cause a String ctor to be called once, and the resulting String thrown away. (The desire to avoid this is just what led Stroustrup to allow declarations anywhere in the code.) So you are correct that it's unnecessary and bad style at best, and possibly actually bad.

There are two ways in which Strings can be created in Java. Following are the examples for both the ways:
1) Declare a variable of type String (a class in Java) and assign it a value enclosed in double quotes. This creates a string in the string pool area of memory.
eg: String str = "JAVA";
2) Use the constructor of the String class and pass a string (within double quotes) as an argument.
eg: String s = new String("JAVA");
This creates a new string JAVA in main memory, and also in the string pool if the string is not already present there.

I guess it will depend on the code samples you're seeing.
Most of the time, the class constructor "new String()" is used in code samples only to show a very well-known Java class instead of creating a new one.
You should avoid using it most of the time, not only because string literals are interned but mainly because strings are immutable. It doesn't make sense to have two copies that represent the same value.
While the article mentioned by Ruggs is "interesting", it should not be used except in very specific circumstances, because it could do more harm than good. You'll be coding to an implementation rather than a specification, and the same code might not behave the same in, for instance, JRockit, the IBM VM, or others.
