effect of changing String using reflection - java

As we all know, String is immutable in java. however, one can change it using reflection, by getting the Field and setting access level. (I know it is unadvised, I am not planning to do so, this question is pure theoretical).
my question: assuming I know what I am doing (and modify all fields as needed), will the program run properly? or does the jvm makes some optimizations that rely on String being immutable? will I suffer performance loss? if so, what assumption does it make? what will go wrong in the program
p.s. String is just an example, I am interested actually in a general answer, in addition to the example.
thanks!

After compilation some strings may refer to the one instance, so, you will edit more than you want and never know what else are you editing.
public static void main(String args[]) throws Exception {
String s1 = "Hello"; // I want to edit it
String s2 = "Hello"; // It may be anywhere and must not be edited
Field f = String.class.getDeclaredField("value");
f.setAccessible(true);
f.set(s1, "Doesn't say hello".toCharArray());
System.out.println(s2);
}
Output:
Doesn't say hello

You are definitely asking for trouble if you do this. Does that mean you will definitely see bugs right away? No. You might get away with it in a lot of cases, depending on what you're doing.
Here are a couple of cases where it would bite you:
You modify a string that happens to have been declared as literal somewhere within the code. For example you have a function and somewhere it is being called like function("Bob"); in this scenario the string "Bob" is changed throughout your app (this will also be true of string constants declared as final).
You modify a string which is used in substring operations, or which is the result of a substring operation. In Java, taking a substring of a string actually uses the same underlying character array as the source string, which means modifications to the source string will affect substrings (and vice versa).
You modify a string that happens to be used as a key in a map somewhere. It will no longer compare equal to its original value, so lookups will fail.
I know this question is about Java, but I wrote a blog post a while back illustrating just how insane your program may behave if you mutate a string in .NET. The situations are really quite similar.

The thing that jumps to mind for me is string interning - literals, anything in the constant pool and anything manually intern()ed points to the same string object. If you start messing around with the contents of an interned string literal, you may well see the exact same alterations on all the other literals using the same underlying object.
I'm not sure whether the above actually happens since I've never tried (in theory it will, I don't know if something happens under the scene to stop it but I doubt it) but it's things like that that could throw up potential problems. Of course, it could also throw up problems at the Java level through just passing multiple references to the same string around and then using a reflection attack to alter the object from one of the references. Most people (me included!) won't explicitly guard against that sort of thing in code, so using that attack with any code that's not your own, or your own code if you haven't guarded against that either, could cause all sorts of bizarre, horrible bugs.
It's an interesting area theoretically, but the more you dig around the more you see why anything along these lines is a bad idea!
Speaking outside of string, there's no performance enhancements I know of for an object being immutable (indeed I don't think the JVM can even tell at the moment whether an object is immutable, reflection attacks aside.) It could throw things like the checker-framework off though or anything that tries to statically analyse the code to guarantee it's immutable.

I'm pretty sure The JVM itself makes no assumptions about the immutability of Strings, as "immutability" in Java is not a language-level construct; it's a trait implied by a class's implementation, but cannot, as you note, be actually guaranteed in the presence of reflection. Thus, it also shouldn't be relevant to performance.
However, pretty much all Java code in existence (including the Standard API implementation) relies on Strings being immutable, and if you break that expectation, you'll see all kinds of bugs.

The private fields in the String class are the char[], the offset and length. Changing any of these should not have any adverse effect on any other object. But if you can somehow change the contents of the char[], then you could probably see some surprising side effects.

public static void main(String args[]){
String a = "test213";
String s = new String("test213");
try {
System.out.println(s);
System.out.println(a);
char[] value = (char[])getFieldValue(s, "value");
value[1] = 'a';
System.out.println(s);
System.out.println(a);
} catch (Exception e) {
e.printStackTrace();
}
}
static Object getFieldValue(String s,String fieldName) throws SecurityException, NoSuchFieldException, IllegalArgumentException, IllegalAccessException {
Object chars = null;
Field innerCharArray = String.class.getDeclaredField(fieldName);
innerCharArray.setAccessible(true);
chars = innerCharArray.get(s);
return chars;
}
Changing value of S will change the literal of a as mentioned in all answers.

To demonstrate how can it screw up a program:
System.out.print("Initial: "); System.out.println(addr);
editIntStr("ADDR_PLACEH", "192.168.1.1");
System.out.print("From var: "); System.out.println(addr);//
System.out.print("Hardcoded: "); System.out.println("ADDR_PLACEH");
System.out.print("Substring: "); System.out.println("ADDR_PLACE" + "H".substring(0));
System.out.print("Equals test: "); System.out.println("ADDR_PLACEH".equals("192.168.1.1"));
System.out.print("Equals test with substring: "); System.out.println(("ADDR_PLACE" + "H".substring(0)).equals("192.168.1.1"));
Output:
Initial: ADDR_PLACEH
From var: 192.168.1.1
Hardcoded: 192.168.1.1
Substring: ADDR_PLACEH
Equals test: true
Equals test with substring: false
The result of the first Equals test is weird, isn't it? You can't expect your fellow programmers to figure out why is Java thinking they are equal...
Full test code: http://pastebin.com/vbstfWX1

Related

"Immutable" strings in Java - actually, it's a lie [duplicate]

We all know that String is immutable in Java, but check the following code:
String s1 = "Hello World";
String s2 = "Hello World";
String s3 = s1.substring(6);
System.out.println(s1); // Hello World
System.out.println(s2); // Hello World
System.out.println(s3); // World
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
char[] value = (char[])field.get(s1);
value[6] = 'J';
value[7] = 'a';
value[8] = 'v';
value[9] = 'a';
value[10] = '!';
System.out.println(s1); // Hello Java!
System.out.println(s2); // Hello Java!
System.out.println(s3); // World
Why does this program operate like this? And why is the value of s1 and s2 changed, but not s3?
String is immutable* but this only means you cannot change it using its public API.
What you are doing here is circumventing the normal API, using reflection. The same way, you can change the values of enums, change the lookup table used in Integer autoboxing etc.
Now, the reason s1 and s2 change value, is that they both refer to the same interned string. The compiler does this (as mentioned by other answers).
The reason s3 does not was actually a bit surprising to me, as I thought it would share the value array (it did in earlier version of Java, before Java 7u6). However, looking at the source code of String, we can see that the value character array for a substring is actually copied (using Arrays.copyOfRange(..)). This is why it goes unchanged.
You can install a SecurityManager, to avoid malicious code to do such things. But keep in mind that some libraries depend on using these kind of reflection tricks (typically ORM tools, AOP libraries etc).
*) I initially wrote that Strings aren't really immutable, just "effective immutable". This might be misleading in the current implementation of String, where the value array is indeed marked private final. It's still worth noting, though, that there is no way to declare an array in Java as immutable, so care must be taken not to expose it outside its class, even with the proper access modifiers.
As this topic seems overwhelmingly popular, here's some suggested further reading: Heinz Kabutz's Reflection Madness talk from JavaZone 2009, which covers a lot of the issues in the OP, along with other reflection... well... madness.
It covers why this is sometimes useful. And why, most of the time, you should avoid it. :-)
In Java, if two string primitive variables are initialized to the same literal, it assigns the same reference to both variables:
String Test1="Hello World";
String Test2="Hello World";
System.out.println(test1==test2); // true
That is the reason the comparison returns true. The third string is created using substring() which makes a new string instead of pointing to the same.
When you access a string using reflection, you get the actual pointer:
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
So change to this will change the string holding a pointer to it, but as s3 is created with a new string due to substring() it would not change.
You are using reflection to circumvent the immutability of String - it's a form of "attack".
There are lots of examples you can create like this (eg you can even instantiate a Void object too), but it doesn't mean that String is not "immutable".
There are use cases where this type of code may be used to your advantage and be "good coding", such as clearing passwords from memory at the earliest possible moment (before GC).
Depending on the security manager, you may not be able to execute your code.
You are using reflection to access the "implementation details" of string object. Immutability is the feature of the public interface of an object.
Visibility modifiers and final (i.e. immutability) are not a measurement against malicious code in Java; they are merely tools to protect against mistakes and to make the code more maintainable (one of the big selling points of the system). That is why you can access internal implementation details like the backing char array for Strings via reflection.
The second effect you see is that all Strings change while it looks like you only change s1. It is a certain property of Java String literals that they are automatically interned, i.e. cached. Two String literals with the same value will actually be the same object. When you create a String with new it will not be interned automatically and you will not see this effect.
#substring until recently (Java 7u6) worked in a similar way, which would have explained the behaviour in the original version of your question. It didn't create a new backing char array but reused the one from the original String; it just created a new String object that used an offset and a length to present only a part of that array. This generally worked as Strings are immutable - unless you circumvent that. This property of #substring also meant that the whole original String couldn't be garbage collected when a shorter substring created from it still existed.
As of current Java and your current version of the question there is no strange behaviour of #substring.
String immutability is from the interface perspective. You are using reflection to bypass the interface and directly modify the internals of the String instances.
s1 and s2 are both changed because they are both assigned to the same "intern" String instance. You can find out a bit more about that part from this article about string equality and interning. You might be surprised to find out that in your sample code, s1 == s2 returns true!
Which version of Java are you using? From Java 1.7.0_06, Oracle has changed the internal representation of String, especially the substring.
Quoting from Oracle Tunes Java's Internal String Representation:
In the new paradigm, the String offset and count fields have been removed, so substrings no longer share the underlying char [] value.
With this change, it may happen without reflection (???).
There are really two questions here:
Are strings really immutable?
Why is s3 not changed?
To point 1: Except for ROM there is no immutable memory in your computer. Nowadays even ROM is sometimes writable. There is always some code somewhere (whether it's the kernel or native code sidestepping your managed environment) that can write to your memory address. So, in "reality", no they are not absolutely immutable.
To point 2: This is because substring is probably allocating a new string instance, which is likely copying the array. It is possible to implement substring in such a way that it won't do a copy, but that doesn't mean it does. There are tradeoffs involved.
For example, should holding a reference to reallyLargeString.substring(reallyLargeString.length - 2) cause a large amount of memory to be held alive, or only a few bytes?
That depends on how substring is implemented. A deep copy will keep less memory alive, but it will run slightly slower. A shallow copy will keep more memory alive, but it will be faster. Using a deep copy can also reduce heap fragmentation, as the string object and its buffer can be allocated in one block, as opposed to 2 separate heap allocations.
In any case, it looks like your JVM chose to use deep copies for substring calls.
To add to the #haraldK's answer - this is a security hack which could lead to a serious impact in the app.
First thing is a modification to a constant string stored in a String Pool. When string is declared as a String s = "Hello World";, it's being places into a special object pool for further potential reusing. The issue is that compiler will place a reference to the modified version at compile time and once the user modifies the string stored in this pool at runtime, all references in code will point to the modified version. This would result into a following bug:
System.out.println("Hello World");
Will print:
Hello Java!
There was another issue I experienced when I was implementing a heavy computation over such risky strings. There was a bug which happened in like 1 out of 1000000 times during the computation which made the result undeterministic. I was able to find the problem by switching off the JIT - I was always getting the same result with JIT turned off. My guess is that the reason was this String security hack which broke some of the JIT optimization contracts.
According to the concept of pooling, all the String variables containing the same value will point to the same memory address. Therefore s1 and s2, both containing the same value of “Hello World”, will point towards the same memory location (say M1).
On the other hand, s3 contains “World”, hence it will point to a different memory allocation (say M2).
So now what's happening is that the value of S1 is being changed (by using the char [ ] value). So the value at the memory location M1 pointed both by s1 and s2 has been changed.
Hence as a result, memory location M1 has been modified which causes change in the value of s1 and s2.
But the value of location M2 remains unaltered, hence s3 contains the same original value.
The reason s3 does not actually change is because in Java when you do a substring the value character array for a substring is internally copied (using Arrays.copyOfRange()).
s1 and s2 are the same because in Java they both refer to the same interned string. It's by design in Java.
String is immutable, but through reflection you're allowed to change the String class. You've just redefined the String class as mutable in real-time. You could redefine methods to be public or private or static if you wanted.
Strings are created in permanent area of the JVM heap memory. So yes, it's really immutable and cannot be changed after being created.
Because in the JVM, there are three types of heap memory:
1. Young generation
2. Old generation
3. Permanent generation.
When any object are created, it goes into the young generation heap area and PermGen area reserved for String pooling.
Here is more detail you can go and grab more information from:
How Garbage Collection works in Java .
[Disclaimer this is a deliberately opinionated style of answer as I feel a more "don't do this at home kids" answer is warranted]
The sin is the line field.setAccessible(true); which says to violate the public api by allowing access to a private field. Thats a giant security hole which can be locked down by configuring a security manager.
The phenomenon in the question are implementation details which you would never see when not using that dangerous line of code to violate the access modifiers via reflection. Clearly two (normally) immutable strings can share the same char array. Whether a substring shares the same array depends on whether it can and whether the developer thought to share it. Normally these are invisible implementation details which you should not have to know unless you shoot the access modifier through the head with that line of code.
It is simply not a good idea to rely upon such details which cannot be experienced without violating the access modifiers using reflection. The owner of that class only supports the normal public API and is free to make implementation changes in the future.
Having said all that the line of code is really very useful when you have a gun held you your head forcing you to do such dangerous things. Using that back door is usually a code smell that you need to upgrade to better library code where you don't have to sin. Another common use of that dangerous line of code is to write a "voodoo framework" (orm, injection container, ...). Many folks get religious about such frameworks (both for and against them) so I will avoid inviting a flame war by saying nothing other than the vast majority of programmers don't have to go there.
String is immutable in nature Because there is no method to modify String object.
That is the reason They introduced StringBuilder and StringBuffer classes
This is a quick guide to everything
// Character array
char[] chr = {'O', 'K', '!'};
// this is String class
String str1 = new String(chr);
// this is concat
str1 = str1.concat("another string's ");
// this is format
System.out.println(String.format(str1 + " %s ", "string"));
// this is equals
System.out.println(str1.equals("another string"));
//this is split
for(String s: str1.split(" ")){
System.out.println(s);
}
// this is length
System.out.println(str1.length());
//gives an score of the total change in the length
System.out.println(str1.compareTo("OK!another string string's"));
// trim
System.out.println(str1.trim());
// intern
System.out.println(str1.intern());
// character at
System.out.println(str1.charAt(5));
// substring
System.out.println(str1.substring(5, 12));
// to uppercase
System.out.println(str1.toUpperCase());
// to lowerCase
System.out.println(str1.toLowerCase());
// replace
System.out.println(str1.replace("another", "hello"));
// output
// OK!another string's string
// false
// OK!another
// string's
// 20
// 7
// OK!another string's
// OK!another string's
// o
// other s
// OK!ANOTHER STRING'S
// ok!another string's
// OK!hello string's

What is the purpose of modifying a string using reflection?

I was reading an article that said that Java strings are not completely immutable. However, in the article's sample code that modifies the string, it makes a call to string.toUpperCase().toCharArray(), which returns a new string. So what's the purpose of going through the process of changing the string if you call toUpperCase() anyway? Here is the code:
public static void toUpperCase(String orig)
{
try
{
Field stringValue = String.class.getDeclaredField("value");
stringValue.setAccessible(true);
stringValue.set(orig, orig.toUpperCase().toCharArray());
}
catch (Exception ex){}
}
Also, I noticed that string.toUpperCase() by itself doesn't work. It needs to be string.toUpperCase().toCharArray(). Is there a reason for this?
What he's doing:
He's acquring some character array that he knows is the right length (such as the uppercase version of the String) and putting it as the backing array of the String. (The backing array is called value inside the String class.)
Why he's doing it:
To illustrate that you could put any char array there you wanted.
Why this is useful:
String is immutable, and this allows you to circumvent the immutability. Of course, this is not recommended to do - EVER. On the contrary, I would not be surprised if he was saying "Watch out, because people could potentially do this to YOUR code. Even if you think your stuff is safe, it might not be!"
The implications of this are wide reaching. Immutable variables are no longer immutable. Final variables are no longer final. Thread safe objects are no longer thread safe. Contracts you thought you could rely upon, you can no longer do so. All because some engineer somewhere had a problem he couldn't fix with normal means, so he delves into reflection to solve it. Don't be 'that guy'.
You'll also note that how the hashCode for that String would now be changed. So, if you've never calculated the hashCode for that String, it's still 0 so you're okay. On the other hand, if you have calculated it, when you go to put it in a HashMap or HashSet, it won't be retrieved.
Consider the following:
import java.util.*;
import java.lang.reflect.*;
class HashTest {
/** Results:
C:\Documents and Settings\glowcoder\My Documents>java HashTest
Orig hash: -804322678
New value: STACKOVERFLOW
Contains orig: true
Contains copy: false
*/
public static void main(String[] args) throws Exception {
Set<String> set = new HashSet<String>();
String str = "StackOverflow";
System.out.println("Orig hash: " + str.hashCode());
set.add(str);
Field stringValue = String.class.getDeclaredField("value");
stringValue.setAccessible(true);
stringValue.set(str, str.toUpperCase().toCharArray()); //
System.out.println("New value: " + str);
String copy = new String(str); // force a copy
System.out.println("Contains orig: " + set.contains(str));
System.out.println("Contains copy: " + set.contains(copy));
}
}
I would bet he is doing this as a warning against bad behavior rather than showing a 'cool' trick.
EDIT: I found the article you're referring to, and the article it is based on. The original article states: "This means that if a class in another package "fiddles" with an interned String, it can cause havoc in your program. Is this a good thing? (You don't need to answer ;-) " I think that makes it quite clear this is more of a protection guide than advice on how to code.
So if you walk away from this thread with only one piece of information, it is that reflection is dangerous, unreliable, and not to be trifled with!
Don't try this at home!
You are subverting String's immutability. There is no good reason to do this.
I think I cannot add to the explanations already provided, so perhaps I can add to the discussion by suggesting how this can be prevented.
You can prevent somebody tampering with your code in these and other unintended ways by means of using a security manager.
public static void main(String args[]){
System.setSecurityManager(new SecurityManager());
String jedi1 = "jedi";
toUpperCase(jedi1);
System.out.println(jedi1);
}
This will generate an exception in the toUpperCase method, provided that you are not granting all privileges to all code bases in the default policy files. (In your current code your exceptions are currently swallowed).
What is the purpose? I'm not sure, ask the one that wrote this stuff. You normally should not do something like this. There is a reason String is immutable.
Here how this method would look if the fields were public, i.e. without reflection:
public static void toUpperCase(String orig) {
orig.value = orig.toUpperCase().toCharArray();
}
As value is of type char[], you can't assign a String to this field - this is why you need the toCharArray call after .toUpperCase(). You will get an exception if you try to do this (I suppose ClassCastException), but the try-catch block there eats it away. (This gets us another lesson: Never use such empty catch blocks.)
Pay attention: This code might not do the correct thing, since the actual data of the original string might not start at the start of the char[]. Since you don't update the offset field, you will get IndexOutOfBoundsExceptions when using such a modified String. Also, the String object caches its hashCode, thus this will be wrong, too.
Here would be a correct way:
public static void toUpperCase(String orig) {
orig.value = orig.toUpperCase().toCharArray();
orig.offset = 0;
orig.hash = 0; // will be recalculated on next `.hashCode()` call.
}
With reflection, it looks like this:
public static void toUpperCase(String orig)
{
try
{
Field stringValue = String.class.getDeclaredField("value");
stringValue.setAccessible(true);
stringValue.set(orig, orig.toUpperCase().toCharArray());
Field stringOffset = String.class.getDeclaredField("offset");
stringOffset.setAccessible(true);
stringOffset.setInt(orig, 0);
Field stringHash = String.class.getDeclaredField("hash");
stringHash.setAccessible(true);
stringHash.setInt(orig, 0);
}
catch (Exception ex){
// at least print the output
ex.printStackTrace();
}
}
1.) Read Bohemian answer.
2.) Strings are internally stored in a char array, that's why you need to call toCharArray to set the field.
By default, String.toUpperCase() leaves the original string intact, whilst returning a new string object.
The function you defined above, edits the contents of the original string object in-place.
You change a final string with reflection for testing. Sometimes that string contains the path to a default location used in the production environment but not suitable for testing. Yet, that variable is referenced by several objects/methods your trigger in your test, and hence during your tests you might want to set it to a particular value.
As others said, it's probably something you don't want to be doing (often/ever).

What are the benefits of using identical String literals instead of a final variable?

I've come across a class that includes multiple uses of a string literal, "foo".
What I'd like to know, is what are the benefits and impact (in terms of object creation, memory usage and speed) of using this approach instead of declaring the String as final and replacing all the literals with the final variable?
For example (although obviously not a real word usage):
private static final String FINAL_STRING = "foo";
public void stringPrinter(){
for(int i=0;i<10;i++){
System.out.println(FINAL_STRING);
}
}
Versus:
public void stringPrinter(){
for(int i=0;i<10;i++){
System.out.println("foo");
}
}
Which is preferable and why (assuming the string value will remain constant)?
Would the above (second) example result in 10 String objects being created or would the JVM realise that only a single literal is actually used, and create a single reference. If so, is there any advantage for declaring the String as final (as in the first example)?
If the interpreted code does replace the string literal with a single reference, does that still apply if the same literal occurs in more than one place:
public void stringPrinter(){
for(int i=0;i<5;i++){
System.out.println("foo"); // first occurence
System.out.println("foo"); // second occurence
}
}
They will be exactly the same. The literal is interned (any compile time constant expression that results in that string shares the same instance as all other constants/literals) in both cases and a smart compiler+runtime should have no trouble reducing both to the most optimized example.
The advantage comes more in maintainability. If you want to change the literal, you would need only change one occurrence with a constant but you would need to search and change every instance if they were included inline.
From the JLS
Compile-time constants of type String are always "interned" so as to share unique instances, using the method String.intern.
So, no, there's gonna be only one string object.
As Mark notes, this is strictly the question of maintainability and not performance.
The advantage is not in performance, but in maintainability and reliability.
Let me take a real example I came across just recently. A programmer created a function that took a String parameter that identified the type of a transaction. Then in the program he did string compares against this type. Like:
if (type.equals("stock"))
{ ... do whatever ... }
Then he called this function, passing it the value "Stock".
Do you notice the difference in capitalization? Neither did the original programmer. It proved to be a fairly subtle bug to figure out, because even looking at both listings, the difference in capitalization didn't strike me.
If instead he had declared a final static, say
final static String stock="stock";
Then the first time he tried to pass in "Stock" instead of "stock", he would have gotten a compile-time error.
Better still in this example would have been to make an enum, but let's assume he actually had to write the string to an output file or something so it had to be a string.
Using final statics gives at least x advantages:
(1) If you mis-spell it, you get a compile-time error, rather than a possibly-subtle run-time error.
(2) A static can assign a meaingful name to a value. Which is more comprehensible:
if (employeeType.equals("R")) ...
or
if (employeeType.equals(EmployeeType.RETIRED)) ...
(3) When there are multiple related values, you can put a group of final statics together at the top of the program, thus informing future readers what all the possible values are. I've had plenty of times when I've seen a function compare a value against two or three literals. And that leaves me wondering: Are there other possible values, or is this it? (Better still is often to have an enum, but that's another story.)
All String literals are kept in a String cache (this is across all classes)
Using a constant can make the code clearer, give the the string some context and make the code easier to maintain esp if the same string appears in multiple places.
Those string literals are internalized, so no new String objects are created in the loop. Using the same literal twice could still be a sign for code smell, though; but not in terms of speed or memory usage.
In the cases you are providing, I believe the biggest reason for having it declared as FINAL_STRING somewhere is to ensure it stays in one centralized location. There will only ever be one instance of that string constant, but the first example is far easier to maintain.

What is the purpose of the expression "new String(...)" in Java?

While looking at online code samples, I have sometimes come across an assignment of a String constant to a String object via the use of the new operator.
For example:
String s;
...
s = new String("Hello World");
This, of course, compared to
s = "Hello World";
I'm not familiar with this syntax and have no idea what the purpose or effect would be.
Since String constants typically get stored in the constant pool and then in whatever representation the JVM has for dealing with String constants, would anything even be allocated on the heap?
The one place where you may think you want new String(String) is to force a distinct copy of the internal character array, as in
small=new String(huge.substring(10,20))
However, this behavior is unfortunately undocumented and implementation dependent.
I have been burned by this when reading large files (some up to 20 MiB) into a String and carving it into lines after the fact. I ended up with all the strings for the lines referencing the char[] consisting of entire file. Unfortunately, that unintentionally kept a reference to the entire array for the few lines I held on to for a longer time than processing the file - I was forced to use new String() to work around it, since processing 20,000 files very quickly consumed huge amounts of RAM.
The only implementation agnostic way to do this is:
small=new String(huge.substring(10,20).toCharArray());
This unfortunately must copy the array twice, once for toCharArray() and once in the String constructor.
There needs to be a documented way to get a new String by copying the chars of an existing one; or the documentation of String(String) needs to be improved to make it more explicit (there is an implication there, but it's rather vague and open to interpretation).
Pitfall of Assuming what the Doc Doesn't State
In response to the comments, which keep coming in, observe what the Apache Harmony implementation of new String() was:
public String(String string) {
value = string.value;
offset = string.offset;
count = string.count;
}
That's right, no copy of the underlying array there. And yet, it still conforms to the (Java 7) String documentation, in that it:
Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable.
The salient piece being "copy of the argument string"; it does not say "copy of the argument string and the underlying character array supporting the string".
Be careful that you program to the documentation and not one implementation.
The only time I have found this useful is in declaring lock variables:
private final String lock = new String("Database lock");
....
synchronized(lock)
{
// do something
}
In this case, debugging tools like Eclipse will show the string when listing what locks a thread currently holds or is waiting for. You have to use "new String", i.e. allocate a new String object, because otherwise a shared string literal could possibly be locked in some other unrelated code.
String s1="foo"; literal will go in StringPool and s1 will refer.
String s2="foo"; this time it will check "foo" literal is already available in StringPool or not as now it exist so s2 will refer the same literal.
String s3=new String("foo"); "foo" literal will be created in StringPool first then through string arg constructor String Object will be created i.e "foo" in the heap due to object creation through new operator then s3 will refer it.
String s4=new String("foo"); same as s3
so System.out.println(s1==s2); //true due to literal comparison.
and System.out.println(s3==s4);// false due to object comparison(s3 and s4 is created at different places in heap)
The sole utility for this constructor described by Software Monkey and Ruggs seems to have disappeared from JDK7.
There is no longer an offset field in class String, and substring always use
Arrays.copyOfRange(char[] original, int from, int to)
to trim the char array for the copy.
Well, that depends on what the "..." is in the example. If it's a StringBuffer, for example, or a byte array, or something, you'll get a String constructed from the data you're passing.
But if it's just another String, as in new String("Hello World!"), then it should be replaced by simply "Hello World!", in all cases. Strings are immutable, so cloning one serves no purpose -- it's just more verbose and less efficient to create a new String object just to serve as a duplicate of an existing String (whether it be a literal or another String variable you already have).
In fact, Effective Java (which I highly recommend) uses exactly this as one of its examples of "Avoid creating unnecessary objects":
As an extreme example of what not to do, consider this statement:
String s = new String("stringette"); **//DON'T DO THIS!**
(Effective Java, Second Edition)
Here is a quote from the book Effective Java Third Edition (Item 17: Minimize Mutability):
A consequence of the fact that immutable objects can be shared freely
is that you never have to make defensive copies of them (Item
50). In fact, you never have to make any copies at all because the
copies would be forever equivalent to the originals. Therefore, you
need not and should not provide a clone method or copy constructor
(Item 13) on an immutable class. This was not well understood in the
early days of the Java platform, so the String class does have a copy
constructor, but it should rarely, if ever, be used.
So It was a wrong decision by Java, since String class is immutable they should not have provided copy constructor for this class, in cases you want to do costly operation on immutable classes, you can use public mutable companion classes which are StringBuilder and StringBuffer in case of String.
Generally, this indicates someone who isn't comfortable with the new-fashioned C++ style of declaring when initialized.
Back in the C days, it wasn't considered good form to define auto variables in an inner scope; C++ eliminated the parser restriction, and Java extended that.
So you see code that has
int q;
for(q=0;q<MAX;q++){
String s;
int ix;
// other stuff
s = new String("Hello, there!");
// do something with s
}
In the extreme case, all the declarations may be at the top of a function, and not in enclosed scopes like the for loop here.
IN general, though, the effect of this is to cause a String ctor to be called once, and the resulting String thrown away. (The desire to avoid this is just what led Stroustrup to allow declarations anywhere in the code.) So you are correct that it's unnecessary and bad style at best, and possibly actually bad.
There are two ways in which Strings can be created in Java. Following are the examples for both the ways:
1) Declare a variable of type String(a class in Java) and assign it to a value which should be put between double quotes. This will create a string in the string pool area of memory.
eg: String str = "JAVA";
2)Use the constructor of String class and pass a string(within double quotes) as an argument.
eg: String s = new String("JAVA");
This will create a new string JAVA in the main memory and also in the string pool if this string is not already present in string pool.
I guess it will depend on the code samples you're seeing.
Most of the times using the class constructor "new String()" in code sample are only to show a very well know java class instead of creating a new one.
You should avoid using it most of the times. Not only because string literals are interned but mainly because string are inmutable. It doesn't make sense have two copies that represent the same object.
While the article mensioned by Ruggs is "interesting" it should not be used unless very specific circumstances, because it could create more damage than good. You'll be coding to an implementation rather than an specification and the same code could not run the same for instance in JRockit, IBM VM, or other.

Why can't strings be mutable in Java and .NET?

Why is it that they decided to make String immutable in Java and .NET (and some other languages)? Why didn't they make it mutable?
According to Effective Java, chapter 4, page 73, 2nd edition:
"There are many good reasons for this: Immutable classes are easier to
design, implement, and use than mutable classes. They are less prone
to error and are more secure.
[...]
"Immutable objects are simple. An immutable object can be in
exactly one state, the state in which it was created. If you make sure
that all constructors establish class invariants, then it is
guaranteed that these invariants will remain true for all time, with
no effort on your part.
[...]
Immutable objects are inherently thread-safe; they require no synchronization. They cannot be corrupted by multiple threads
accessing them concurrently. This is far and away the easiest approach
to achieving thread safety. In fact, no thread can ever observe any
effect of another thread on an immutable object. Therefore,
immutable objects can be shared freely
[...]
Other small points from the same chapter:
Not only can you share immutable objects, but you can share their internals.
[...]
Immutable objects make great building blocks for other objects, whether mutable or immutable.
[...]
The only real disadvantage of immutable classes is that they require a separate object for each distinct value.
There are at least two reasons.
First - security http://www.javafaq.nu/java-article1060.html
The main reason why String made
immutable was security. Look at this
example: We have a file open method
with login check. We pass a String to
this method to process authentication
which is necessary before the call
will be passed to OS. If String was
mutable it was possible somehow to
modify its content after the
authentication check before OS gets
request from program then it is
possible to request any file. So if
you have a right to open text file in
user directory but then on the fly
when somehow you manage to change the
file name you can request to open
"passwd" file or any other. Then a
file can be modified and it will be
possible to login directly to OS.
Second - Memory efficiency http://hikrish.blogspot.com/2006/07/why-string-class-is-immutable.html
JVM internally maintains the "String
Pool". To achive the memory
efficiency, JVM will refer the String
object from pool. It will not create
the new String objects. So, whenever
you create a new string literal, JVM
will check in the pool whether it
already exists or not. If already
present in the pool, just give the
reference to the same object or create
the new object in the pool. There will
be many references point to the same
String objects, if someone changes the
value, it will affect all the
references. So, sun decided to make it
immutable.
Actually, the reasons string are immutable in java doesn't have much to do with security. The two main reasons are the following:
Thead Safety:
Strings are extremely widely used type of object. It is therefore more or less guaranteed to be used in a multi-threaded environment. Strings are immutable to make sure that it is safe to share strings among threads. Having an immutable strings ensures that when passing strings from thread A to another thread B, thread B cannot unexpectedly modify thread A's string.
Not only does this help simplify the already pretty complicated task of multi-threaded programming, but it also helps with performance of multi-threaded applications. Access to mutable objects must somehow be synchronized when they can be accessed from multiple threads, to make sure that one thread doesn't attempt to read the value of your object while it is being modified by another thread. Proper synchronization is both hard to do correctly for the programmer, and expensive at runtime. Immutable objects cannot be modified and therefore do not need synchronization.
Performance:
While String interning has been mentioned, it only represents a small gain in memory efficiency for Java programs. Only string literals are interned. This means that only the strings which are the same in your source code will share the same String Object. If your program dynamically creates string that are the same, they will be represented in different objects.
More importantly, immutable strings allow them to share their internal data. For many string operations, this means that the underlying array of characters does not need to be copied. For example, say you want to take the five first characters of String. In Java, you would calls myString.substring(0,5). In this case, what the substring() method does is simply to create a new String object that shares myString's underlying char[] but who knows that it starts at index 0 and ends at index 5 of that char[]. To put this in graphical form, you would end up with the following:
| myString |
v v
"The quick brown fox jumps over the lazy dog" <-- shared char[]
^ ^
| | myString.substring(0,5)
This makes this kind of operations extremely cheap, and O(1) since the operation neither depends on the length of the original string, nor on the length of the substring we need to extract. This behavior also has some memory benefits, since many strings can share their underlying char[].
Thread safety and performance. If a string cannot be modified it is safe and quick to pass a reference around among multiple threads. If strings were mutable, you would always have to copy all of the bytes of the string to a new instance, or provide synchronization. A typical application will read a string 100 times for every time that string needs to be modified. See wikipedia on immutability.
One should really ask, "why should X be mutable?" It's better to default to immutability, because of the benefits already mentioned by Princess Fluff. It should be an exception that something is mutable.
Unfortunately most of the current programming languages default to mutability, but hopefully in the future the default is more on immutablity (see A Wish List for the Next Mainstream Programming Language).
Wow! I Can't believe the misinformation here. Strings being immutable have nothing with security. If someone already has access to the objects in a running application (which would have to be assumed if you are trying to guard against someone 'hacking' a String in your app), they would certainly be a plenty of other opportunities available for hacking.
It's a quite novel idea that the immutability of String is addressing threading issues. Hmmm ... I have an object that is being changed by two different threads. How do I resolve this? synchronize access to the object? Naawww ... let's not let anyone change the object at all -- that'll fix all of our messy concurrency issues! In fact, let's make all objects immutable, and then we can removed the synchonized contruct from the Java language.
The real reason (pointed out by others above) is memory optimization. It is quite common in any application for the same string literal to be used repeatedly. It is so common, in fact, that decades ago, many compilers made the optimization of storing only a single instance of a String literal. The drawback of this optimization is that runtime code that modifies a String literal introduces a problem because it is modifying the instance for all other code that shares it. For example, it would be not good for a function somewhere in an application to change the String literal "dog" to "cat". A printf("dog") would result in "cat" being written to stdout. For that reason, there needed to be a way of guarding against code that attempts to change String literals (i. e., make them immutable). Some compilers (with support from the OS) would accomplish this by placing String literal into a special readonly memory segment that would cause a memory fault if a write attempt was made.
In Java this is known as interning. The Java compiler here is just following an standard memory optimization done by compilers for decades. And to address the same issue of these String literals being modified at runtime, Java simply makes the String class immutable (i. e, gives you no setters that would allow you to change the String content). Strings would not have to be immutable if interning of String literals did not occur.
String is not a primitive type, yet you normally want to use it with value semantics, i.e. like a value.
A value is something you can trust won't change behind your back.
If you write: String str = someExpr();
You don't want it to change unless YOU do something with str.
String as an Object has naturally pointer semantics, to get value semantics as well it needs to be immutable.
One factor is that, if Strings were mutable, objects storing Strings would have to be careful to store copies, lest their internal data change without notice. Given that Strings are a fairly primitive type like numbers, it is nice when one can treat them as if they were passed by value, even if they are passed by reference (which also helps to save on memory).
I know this is a bump, but...
Are they really immutable?
Consider the following.
public static unsafe void MutableReplaceIndex(string s, char c, int i)
{
fixed (char* ptr = s)
{
*((char*)(ptr + i)) = c;
}
}
...
string s = "abc";
MutableReplaceIndex(s, '1', 0);
MutableReplaceIndex(s, '2', 1);
MutableReplaceIndex(s, '3', 2);
Console.WriteLine(s); // Prints 1 2 3
You could even make it an extension method.
public static class Extensions
{
public static unsafe void MutableReplaceIndex(this string s, char c, int i)
{
fixed (char* ptr = s)
{
*((char*)(ptr + i)) = c;
}
}
}
Which makes the following work
s.MutableReplaceIndex('1', 0);
s.MutableReplaceIndex('2', 1);
s.MutableReplaceIndex('3', 2);
Conclusion: They're in an immutable state which is known by the compiler. Of couse the above only applies to .NET strings as Java doesn't have pointers. However a string can be entirely mutable using pointers in C#. It's not how pointers are intended to be used, has practical usage or is safely used; it's however possible, thus bending the whole "mutable" rule. You can normally not modify an index directly of a string and this is the only way. There is a way that this could be prevented by disallowing pointer instances of strings or making a copy when a string is pointed to, but neither is done, which makes strings in C# not entirely immutable.
For most purposes, a "string" is (used/treated as/thought of/assumed to be) a meaningful atomic unit, just like a number.
Asking why the individual characters of a string are not mutable is therefore like asking why the individual bits of an integer are not mutable.
You should know why. Just think about it.
I hate to say it, but unfortunately we're debating this because our language sucks, and we're trying to using a single word, string, to describe a complex, contextually situated concept or class of object.
We perform calculations and comparisons with "strings" similar to how we do with numbers. If strings (or integers) were mutable, we'd have to write special code to lock their values into immutable local forms in order to perform any kind of calculation reliably. Therefore, it is best to think of a string like a numeric identifier, but instead of being 16, 32, or 64 bits long, it could be hundreds of bits long.
When someone says "string", we all think of different things. Those who think of it simply as a set of characters, with no particular purpose in mind, will of course be appalled that someone just decided that they should not be able to manipulate those characters. But the "string" class isn't just an array of characters. It's a STRING, not a char[]. There are some basic assumptions about the concept we refer to as a "string", and it generally can be described as meaningful, atomic unit of coded data like a number. When people talk about "manipulating strings", perhaps they're really talking about manipulating characters to build strings, and a StringBuilder is great for that. Just think a bit about what the word "string" truly means.
Consider for a moment what it would be like if strings were mutable. The following API function could be tricked into returning information for a different user if the mutable username string is intentionally or unintentionally modified by another thread while this function is using it:
string GetPersonalInfo( string username, string password )
{
string stored_password = DBQuery.GetPasswordFor( username );
if (password == stored_password)
{
//another thread modifies the mutable 'username' string
return DBQuery.GetPersonalInfoFor( username );
}
}
Security isn't just about 'access control', it's also about 'safety' and 'guaranteeing correctness'. If a method can't be easily written and depended upon to perform a simple calculation or comparison reliably, then it's not safe to call it, but it would be safe to call into question the programming language itself.
Immutability is not so closely tied to security. For that, at least in .NET, you get the SecureString class.
Later edit: In Java you will find GuardedString, a similar implementation.
The decision to have string mutable in C++ causes a lot of problems, see this excellent article by Kelvin Henney about Mad COW Disease.
COW = Copy On Write.
It's a trade off. Strings go into the String pool and when you create multiple identical Strings they share the same memory. The designers figured this memory saving technique would work well for the common case, since programs tend to grind over the same strings a lot.
The downside is that concatenations make a lot of extra Strings that are only transitional and just become garbage, actually harming memory performance. You have StringBuffer and StringBuilder (in Java, StringBuilder is also in .NET) to use to preserve memory in these cases.
Strings in Java are not truly immutable, you can change their value's using reflection and or class loading. You should not be depending on that property for security.
For examples see: Magic Trick In Java
Immutability is good. See Effective Java. If you had to copy a String every time you passed it around, then that would be a lot of error-prone code. You also have confusion as to which modifications affect which references. In the same way that Integer has to be immutable to behave like int, Strings have to behave as immutable to act like primitives. In C++ passing strings by value does this without explicit mention in the source code.
There is an exception for nearly almost every rule:
using System;
using System.Runtime.InteropServices;
namespace Guess
{
class Program
{
static void Main(string[] args)
{
const string str = "ABC";
Console.WriteLine(str);
Console.WriteLine(str.GetHashCode());
var handle = GCHandle.Alloc(str, GCHandleType.Pinned);
try
{
Marshal.WriteInt16(handle.AddrOfPinnedObject(), 4, 'Z');
Console.WriteLine(str);
Console.WriteLine(str.GetHashCode());
}
finally
{
handle.Free();
}
}
}
}
It's largely for security reasons. It's much harder to secure a system if you can't trust that your Strings are tamperproof.

Categories

Resources