Can anyone explain me what is state and mutable data? - java

In computer science, functional programming is a programming paradigm
that treats computation as the evaluation of mathematical functions
and avoids state and mutable data.
http://en.wikipedia.org/wiki/Functional_programming
Can anyone explain me what is state and mutable data? Can anyone give me examples in either JAVA or JavaScript.

mutable suggest anything that can change, i.e. an int
int a = 0;
System.out.prtinln(a); //prints 0
a = 2;
System.out.prtinln(a); //now prints 2, so its mutable
In java a string is immutable. you cannot change the string value only its reference.
String s1 = "Hello";
System.out.println(s1); //prints Hello
String s2 = s1;
s1 = "Hi";
System.out.println(s2); //prints "Hello" and not "Hi"
State is something which an instance of a class will have (an Object).
If an Object has certain values for its attributes its in a diffrent state then another Object of the same class with diffrent attribute values

State refers collectively to the data stored in the object that determines the current properties of the object. For example, if you have a BankAccount object, the owner of the account and the amount of money in it represent the state of the account.
However, not all state is bad for functional programming, only the mutable+ one is not acceptable. For example, the characters of which a string is composed is that string's state. If you cannot change the content of the string, it is said to have an immutable state. This fits well with functional programming paradigm.
+ mutable is a fancy word for "changeable".

Mutable state is everything that can make a function return a different value, despite it being called with the same arguments.
Simple example in Java:
public static double badSqrt(double x) {
double r = Math.sqrt(x);
if (System.currentTimeMillis() % 42L == 0L) return r;
return r + 0.000000001;
}
This function computes a slightly incorrect result sometimes. We say that badSqrt is impure, because its result does not solely depend on its argument (and constants).
For the person debugging a program that contains calls to badSqrt() or impure functions in general this is a nightmare. Often, the program seems to work, but once in a while, wrong results are delivered. Unless the function is clearly documented or the source code is available, it'll be hard to track the bug.
In such cases, it is said that the behaviour of the functions depends on mutable state. This is state that could be changed (mutated) by completely unrelated parts of the program, or, like in the example, by another program (the operating system).

A classic example of an immutable object is an instance of the Java String class.
String s = “ABC”;
s.toLowerCase();
System.out.println(s);
Output = ABC
This is because s continues referencing its immutable String. If you want to mutate s, a different approach is needed:
s = s.toLowerCase();
This will create a new reference. Now the String s references a new String object that contains "abc".

suppose
int i=5;
now the state of variable i is that , now it has contained value 5.
suppose now i have set i=7;
now the state of variable i is that , now it has contained value 7, it has replaces 5.
if change in value is possible then it is called as-mutable that means
we can change state here.
if change in value is not possible then it is called as-immutable.

Here's the simplest explanation of the difference between state and mutable state. Almost every language has built in methods and functions that we can call to do different things. These methods and functions maintain its own state and we can manipulate it depending on the parameter(s) we use. Different parameters gives different parameters(arguments) can return different values. Now, lets say we make a constructor(JavaScript) to store our own state. We then write a function to manipulate state based on user input, and after we write it we don't change the parameter(s), even when it's called. BUT the content inside state can be changed using setState, resulting in endless changes from the same function without changing the parameter(s).

State data is static. I.e. hypercard. Mutable data resembles more of a binary assembled artificial movement paradigm. QED -Bryan Meluch Purdue 1997 [verb.atim] Web

Related

Why is String immutable in Java?

I was asked in an interview why String is immutable
I answered like this:
When we create a string in java like String s1="hello"; then an
object will be created in string pool(hello) and s1 will be
pointing to hello.Now if again we do String s2="hello"; then
another object will not be created but s2 will point to hello
because JVM will first check if the same object is present in
string pool or not.If not present then only a new one is created else not.
Now if suppose java allows string mutable then if we change s1 to hello world then s2 value will also be hello world so java String is immutable.
Can any body please tell me if my answer is right or wrong?
String is immutable for several reasons, here is a summary:
Security: parameters are typically represented as String in network connections, database connection urls, usernames/passwords etc. If it were mutable, these parameters could be easily changed.
Synchronization and concurrency: making String immutable automatically makes them thread safe thereby solving the synchronization issues.
Caching: when compiler optimizes your String objects, it sees that if two objects have same value (a="test", and b="test") and thus you need only one string object (for both a and b, these two will point to the same object).
Class loading: String is used as arguments for class loading. If mutable, it could result in wrong class being loaded (because mutable objects change their state).
That being said, immutability of String only means you cannot change it using its public API. You can in fact bypass the normal API using reflection. See the answer here.
In your example, if String was mutable, then consider the following example:
String a="stack";
System.out.println(a);//prints stack
a.setValue("overflow");
System.out.println(a);//if mutable it would print overflow
Java Developers decide Strings are immutable due to the following aspect design, efficiency, and security.
Design
Strings are created in a special memory area in java heap known as "String Intern pool". While you creating new String (Not in the case of using String() constructor or any other String functions which internally use the String() constructor for creating a new String object; String() constructor always create new string constant in the pool unless we call the method intern()) variable it searches the pool to check whether is it already exist.
If it is exist, then return reference of the existing String object.
If the String is not immutable, changing the String with one reference will lead to the wrong value for the other references.
According to this article on DZone:
Security
String is widely used as parameter for many java classes, e.g. network connection, opening files, etc. Were String not immutable, a connection or file would be changed and lead to serious security threat.
Mutable strings could cause security problem in Reflection too, as the parameters are strings.
Efficiency
The hashcode of string is frequently used in Java. For example, in a HashMap. Being immutable guarantees that hashcode will always the same, so that it can be cached without worrying the changes.That means, there is no need to calculate hashcode every time it is used.
We can not be sure of what was Java designers actually thinking while designing String but we can only conclude these reasons based on the advantages we get out of string immutability, Some of which are
1. Existence of String Constant Pool
As discussed in Why String is Stored in String Constant Pool article, every application creates too many string objects and in order to save JVM from first creating lots of string objects and then garbage collecting them. JVM stores all string objects in a separate memory area called String constant pool and reuses objects from that cached pool.
Whenever we create a string literal JVM first sees if that literal is already present in constant pool or not and if it is there, new reference will start pointing to the same object in SCP.
String a = "Naresh";
String b = "Naresh";
String c = "Naresh";
In above example string object with value Naresh will get created in SCP only once and all reference a, b, c will point to the same object but what if we try to make change in a e.g. a.replace("a", "").
Ideally, a should have value Nresh but b, c should remain unchanged because as an end user we are making the change in a only. And we know a, b, c all are pointing the same object so if we make a change in a, others should also reflect the change.
But string immutability saves us from this scenario and due to the immutability of string object string object Naresh will never change. So when we make any change in a instead of change in string object Naresh JVM creates a new object assign it to a and then make change in that object.
So String pool is only possible because of String's immutability and if String would not have been immutable, then caching string objects and reusing them would not have a possibility because any variable woulds have changed the value and corrupted others.
And That's why it is handled by JVM very specially and have been given a special memory area.
2. Thread Safety
An object is called thread-safe when multiple threads are operating on it but none of them is able to corrupt its state and object hold the same state for every thread at any point in time.
As we an immutable object cannot be modified by anyone after its creation which makes every immutable object is thread safe by default. We do not need to apply any thread safety measures to it such as creating synchronized methods.
So due to its immutable nature string object can be shared by multiple threads and even if it is getting manipulated by many threads it will not change its value.
3. Security
In every application, we need to pass several secrets e.g. user's user-name\passwords, connection URLs and in general, all of this information is passed as the string object.
Now suppose if String would not have been immutable in nature then it would cause a serious security threat to the application because these values are allowed to get changed and if it is allowed then these might get changed due to wrongly written code or any other person who have access to our variable references.
4. Class Loading
As discussed in Creating objects through Reflection in Java with Example, we can use Class.forName("class_name") method to load a class in memory which again calls other methods to do so. And even JVM uses these methods to load classes.
But if you see clearly all of these methods accepts the class name as a string object so Strings are used in java class loading and immutability provides security that correct class is getting loaded by ClassLoader.
Suppose if String would not have been immutable and we are trying to load java.lang.Object which get changed to org.theft.OurObject in between and now all of our objects have a behavior which someone can use to unwanted things.
5. HashCode Caching
If we are going to perform any hashing related operations on any object we must override the hashCode() method and try to generate an accurate hashcode by using the state of the object. If an object's state is getting changed which means its hashcode should also change.
Because String is immutable so the value one string object is holding will never get changed which means its hashcode will also not change which gives String class an opportunity to cache its hashcode during object creation.
Yes, String object caches its hashcode at the time of object creation which makes it the great candidate for hashing related operations because hashcode doesn't need to be calculated again which save us some time. This is why String is mostly used as HashMap keys.
Read More on Why String is Immutable and Final in Java.
Most important reason according to this article on DZone:
String Constant Pool
...
If string is mutable, changing the string with one reference will lead to the wrong value for the other references.
Security
String is widely used as parameter for many java classes, e.g. network connection, opening files, etc. Were String not immutable, a connection or file would be changed and lead to serious security threat.
...
Hope it will help you.
IMHO, this is the most important reason:
String is Immutable in Java because String objects are cached in
String pool. Since cached String literals are shared between multiple
clients there is always a risk, where one client's action would affect
all another client.
Ref: Why String is Immutable or Final in Java
You are right. String in java uses concept of String Pool literal. When a string is created and if the string already exists in the pool, the reference of the existing string will be returned, instead of creating a new object and returning its reference.If a string is not immutable, changing the string with one reference will lead to the wrong value for the other references.
I would add one more thing, since String is immutable, it is safe for multi threading and a single String instance can be shared across different threads. This avoid the usage of synchronization for thread safety, Strings are implicitly thread safe.
String is given as immutable by Sun micro systems,because string can used to store as key in map collection.
StringBuffer is mutable .That is the reason,It cannot be used as key in map object
The most important reason of a String being made immutable in Java is Security consideration. Next would be Caching.
I believe other reasons given here, such as efficiency, concurrency, design and string pool follows from the fact that String in made immutable. For eg. String Pool could be created because String was immutable and not the other way around.
Check Gosling interview transcript here
From a strategic point of view, they tend to more often be trouble free. And there are usually things you can do with immutables that you can't do with mutable things, such as cache the result. If you pass a string to a file open method, or if you pass a string to a constructor for a label in a user interface, in some APIs (like in lots of the Windows APIs) you pass in an array of characters. The receiver of that object really has to copy it, because they don't know anything about the storage lifetime of it. And they don't know what's happening to the object, whether it is being changed under their feet.
You end up getting almost forced to replicate the object because you don't know whether or not you get to own it. And one of the nice things about immutable objects is that the answer is, "Yeah, of course you do." Because the question of ownership, who has the right to change it, doesn't exist.
One of the things that forced Strings to be immutable was security. You have a file open method. You pass a String to it. And then it's doing all kind of authentication checks before it gets around to doing the OS call. If you manage to do something that effectively mutated the String, after the security check and before the OS call, then boom, you're in. But Strings are immutable, so that kind of attack doesn't work. That precise example is what really demanded that
Strings be immutable
String class is FINAL it mean you can't create any class to inherit it and change the basic structure and make the Sting mutable.
Another thing instance variable and methods of String class that are provided are such that you can't change String object once created.
The reason what you have added doesn't make the String immutable at all.This all says how the String is stored in heap.Also string pool make the huge difference in performance
In addition to the great answers, I wanted to add a few points. Like Strings, Array holds a reference to the starting of the array so if you create two arrays arr1 and arr2 and did something like arr2 = arr1 this will make the reference of arr2 same as arr1 hence changing value in one of them will result in change of the other one for example
public class Main {
public static void main(String[] args) {
int[] a = {1, 2, 3, 4};
int[] b = a;
a[0] = 8;
b[1] = 7;
System.out.println("A: " + a[0] + ", B: " + b[0]);
System.out.println("A: " + a[1] + ", B: " + b[1]);
//outputs
//A: 8, B: 8
//A: 7, B: 7
}
}
Not only that it would cause bugs in the code it also can(and will) be exploited by malicious user. Suppose if you have a system that changes the admin password. The user have to first enter the newPassword and then the oldPassword if the oldPassword is same as the adminPass the program change the password by adminPass = newPassword. let's say that the new password has the same reference as the admin password so a bad programmer may create a temp variable to hold the admin password before the users inputs data if the oldPassword is equal to temp it changes the password otherwise adminPass = temp. Someone knowing that could easily enter the new password and never enter the old password and abracadabra he has admin access. Another thing I didn't understand when learning about Strings why doesn't JVM create a new string for every object and have a unique place in memory for it and you can just do that using new String("str"); The reason you wouldn't want to always use new is because it's not memory efficient and it is slower in most cases read more.
If HELLO is your String then you can't change HELLO to HILLO. This property is called immutability property.
You can have multiple pointer String variable to point HELLO String.
But if HELLO is char Array then you can change HELLO to HILLO. Eg,
char[] charArr = 'HELLO';
char[1] = 'I'; //you can do this
Answer:
Programming languages have immutable data variables so that it can be used as keys in key, value pair. String variables are used as keys/indices, so they are immutable.
This probably has little to do with security because, very differently, security practices recommend using character arrays for passwords, not strings. This is because an array can be immediately erased when no longer needed. Differently, a string cannot be erased, because it is immutable. It may take long time before it is garbage collected, and even more before the content gets overwritten.
I think that immutability was chosen to allow sharing the strings and they fragments easily. String assignment, picking a substring becomes a constant time operation, and string comparison largely also, because of the reusable hash codes that are part of the string data structure and can be compared first.
From the other side, if the original string is huge (say large XML document), picking few symbols from there may prevent the whole document from being garbage collected. Because of that later Java versions seemed moved away from this immutability. Modern C++ has both mutable (std::string) and from C++17 also immutable (std::string_view) versions.
From the Security point of view we can use this practical example:
DBCursor makeConnection(String IP,String PORT,String USER,String PASS,String TABLE) {
// if strings were mutable IP,PORT,USER,PASS can be changed by validate function
Boolean validated = validate(IP,PORT,USER,PASS);
// here we are not sure if IP, PORT, USER, PASS changed or not ??
if (validated) {
DBConnection conn = doConnection(IP,PORT,USER,PASS);
}
// rest of the code goes here ....
}

what do we mean exactly when we say "ints are immutable"

I'm confused on the concept of "being immutable". Our professor is saying "ints are immutable! Strings are immutable" everyday, what does he mean by that exactly?
A more general question, how do we know if a data structure is immutable or not?
Thanks
Some of the other answers here are confusing mutability/immutability with value/reference semantics, so be careful...
Simply put, an entity is mutable if it may be modified after it's been created. In other words, its value may change over time.
First, a counterexample. A Java String object is immutable; there is no method that you can call on a String object that will change its value:
String a = "foo";
a.concat("bar");
System.out.println(a); // foo
You could do this instead:
String a = "foo";
a = a.concat("bar");
System.out.println(a); // foobar
but that works because concat() is creating a new String object, and the reference a is then repointed at it. There are now two String objects; the original has not changed (it's just lost forever). a is mutable, the underlying object isn't.
As for int variables; in C or Java, we can do this:
int x = 3;
x = 4; // Mutates x
x++; // Mutates x
How do we know these really mutate x, rather than simply creating a new integer "object" and "repointing" x at it? (Other than by the fact that the language assures us that primitive types are distinct from object types.) In C, we can somewhat prove it:
int x = 3;
int *p = x; // Pointer to original entity
x = 4;
printf("%d\n", *p); // 4
AFAIK, there is no equivalent approach in Java. So you could argue that the question of whether integer types are truly mutable in Java is irrelevant.
As for how we know whether a given type is immutable, very often we don't. At least, not without inspecting it, or simply believing a promise we've been told.
In Java, ensuring a user-defined type is immutable involves following a few simple rules (explained here). But it's still just a promise; the language doesn't enforce it.
Immutability (of an object or value, not a variable) usually means there's no way to do an in-place change of the value. (One that would propagate to other references to it.) This means that if you have something like the following:
String a = "foo";
There is no operation that you could perform on a that would change its value. I.e. you can't have a hypothetical method append() that would cause the following behaviour:
String a = "foo";
a.append("bar"); // a is not reassigned
System.out.println(a); // prints "foobar"
You can contrast this with mutable objects like collections:
int[] as = new String[] { "foo" };
as[0] = "bar"; // we're changing `as` in-place - not the Strings stored in it
System.out.println(as[0]); // prints "bar"
Primitive types are not a great choice of example for Java, since you can't have multiple references to them, and there's no way to demonstrate the distinction between a mutation and a reassignment.
It's awkward to talk about immutability of ints, because the idea of mutating something that isn't a container doesn't make sense to most of us. So let's talk about strings.
Here's a string, in Python:
s = "abc"
Strings are containers in the sense that they contain some number of individual characters: here a, b, and c. If I want to change the second character to a d, I might try:
s[1] = 'd'
Which will fail with a TypeError. We say strings are immutable in Python because there is no operation that will alter an existing string. Certainly there are plenty of operations that will perform some operation and create a new string, but existing strings are set in stone.
There are a couple advantages here. One is that it allows interning: sometimes when a string needs allocating (and at the discretion of the interpreter), CPython will notice that an identical string has already been allocated and just reuse the same str object. This is easiest when strings are immutable—otherwise, you'd have to do something about problems like this:
s = "abc"
t = "abc" # this reuses the same memory, as an optimization
s[0] = "x" # oops! now t has changed, too!
Interning is particularly useful in Python and similar languages that support runtime reflection: it has to know the name of every function and method at runtime, and a great many methods have builtin names like __init__ (the name of the constructor method), so reusing the same string object for all those identical names saves a good deal of wasted space.
The other advantage is in semantics: you can safely pass strings to arbitrary functions without worrying that they'll be changed in-place behind your back. Functional programmers appreciate this kind of thing.
The disadvantage, of course, is that doing a lot of work with very large strings requires reallocating and rebuilding those large strings many times over, instead of making small edits in-place.
Now, about ints. This is NOT an example of immutability:
x = 3
x = 4
This doesn't involve the actual objects at all; it only assigns a new value to the variable x.
Consider instead:
x = [1, 2, 3]
y = x
x[:] = [4, 5, 6]
print y # [4, 5, 6]
The x[:] = syntax is "slice notation" for replacing the entire contents of a list. Here, x and y are two names for the same list. So when you replace the contents of x, you also see the same effect in y, because... they both name the same list. (This is different from reference variables in other languages: you can assign a new value to either x or y without affecting the other.)
Consider this with numbers. If you could do some hypothetical operation like the above on plain numbers, this would happen:
x = 3
y = x
x[:] = 4
print y # hypothetically, 4
But you can't do that. You can't change the number an existing int represents. So we call them immutable.
Mutating an int is easy in Smalltalk:
3 become: 4
This would change the 3 to a 4, overwriting the memory that previously contained a 3. If ints are interned (as they can be in Python), this could even mean that everywhere 3 appears in your source code, it acts like the number 4.
In C, these distinctions aren't as meaningful, because variables are fixed blocks of memory rather than the transient labels of Python. So when you do this:
int x = 3;
x = 4;
It's hard to say definitively whether this is "mutating" an int. It does overwrite existing memory, but that's also just how C variable assignment works.
Anyway! Mutability is just about whether you're altering an existing object or replacing it with a new one. In Python and Java, you can't alter existing strings, and you can't "alter" numbers, so we call them immutable. You're free to change the contents of lists and arrays in-place without creating new ones, so they're mutable.
What is immutable is highly language-dependent, but an immutable object is simply an object that cannot be changed after it is created.
What this usually means is that:
int x = 4;
x = 5;//not 'allowed'
This is seen in languages where primitives, such as an int, can be immutable (such as functional languages like Scala).
Most objects in OOP are actually pointers to a place in memory. If that object is immutable that location in memory cannot have its contents changed. In the case of a String in Java, we see this happening:
String a = "Hello"; //points to some memory location, lets say '0x00001'
a = a + " World!"; //points to a new locations, lets say '0x00002'
System.out.println(a);//prints the contents of memory location '0x00002'
In this case, a actually points to an entirely different place in memory after line 2. What this means is that another thread with a different scope that has handed a would not see "Hello World!" but instead "Hello":
String a = "Hello";
startThread(a, " Hello!");//starts some thread and passes a to it
startThread(b, " World!");//starts another thread and passes a to it
...
public void methodInThread(String a, String b) {
a = a + b;
System.out.println(a);
}
These two threads will output the following, regardless of the order they're called in:
"Hello Hello!" //thread 1
"Hello World!" //thread 2
An object is considered immutable if its state cannot change after it is constructed.
source : http://docs.oracle.com/javase/tutorial/essential/concurrency/immutable.html
Typically it means you can't call a method on the type (int or whatever) that will change a
Sometimes people refer to value types as being immutable
//theres no way for this to be mutable but this is an example of a value type
int a = 5
int b = a;
b=9
a does not change unlike class types like
MyClass a = new MyClass
MyClass b = a
b.DoSomething()
//a is now changed
An immutable object is some thing that once instantiated can not be modified. If you have to modify, a new object will be created and pointed to the reference.
And ints are not immutable.
There are some classes in java which are immutable like String, All Wrapper Class ie. Integer, Float, Long etc.
For Example:
Integer i=5;
i=10;
i=15;
When Integer i=5, here a new Integer object is created, then in the 2nd, i=10 rather assigning this value 10 to previously created object, a another new object is created and assign to i, and 3rd i=15 , here again new object is created and again is assigned to i.
Note: don't be confused with int with Integer. int is primitive type and Integer is wrapper class. All primitives are mutable.
The concepts of mutability and immutability are only relevant for things to which code may hold a reference. If one holds a reference to something, and some immutable aspect of that thing's state is observed to have some value (or state), then as long as that reference exists, that aspect of the thing's state may always be observed to have the same value (state).
The String type in Java may reasonably be described as immutable, because code which has a reference to a string and observes that it contains the characters "Hello" may examine it at any time and will always observe that it contain those characters. By contrast, a Char[] might in one moment be observed to contain the letters "Hello" but at some later time be observed to contain the letters "Jello". Thus, a Char[] is considered mutable.
Because it is not possible to hold a direct reference to an int in Java, the concepts of mutability and immutability are not really applicable to that type. One can, however, hold a reference to an Integer, for which they are relevant. Any such reference that is observed to have a particular value will always have that same value. Thus, Integer is immutable. Note that while the concepts of mutability and immutability aren't really applicable to value types like int, they do share a useful aspect of immutable types: the state represented by a storage location (variable, field, or array element) of either a primitive type or an immutable type is guaranteed not to change except by overwriting that location with either a new value or a reference to a different immutable object.

rely on java String copy on write

My application creates a lot of instances of a class, say class A. All instance contains a string, and most of them contain the same String
class A {
String myString;
}
I know that JVM makes "all equal strings" point to the same String that is stored just one time. If myString field of one of my A instances is overwritten, the reference to the original string is replaced by the reference to the new String value and all works as expected, that is as if each instance had a copy of the string all for itself.
Is this behaviour required to a compliant JVM, or is it a sort of improvement of the jvm that may change from a jvm to another, or from version to version?
Another way to put the question: when designing higly redundant (string based) data-structures, should one rely only on the copy on write mechanism or it is adviceable to put in place something at the application level?
Another aspect of this is that your Strings will not be the same if they are created dynamically (e.g. allocated by parser). Check out String.intern() if space is a concern:
String a = String.valueOf('a') + "b";
String b = a.intern();
String c = "ab";
// now b == c is true
as #Hot Licks said: strings are immutable so there is no place to talk about copy on write. also when you are using mutable object you have to be aware that 'copy on write' may not be available on your client's environment.
and another thing that may be very important when you create a lot of objects. each object contains a few bytes of header, pointers etc. if i remember correctly empty object is like 20 bytes or so. when you we are talking about a lot of objects containing properties it starts to be significant. be aware of that and when you measure that it is causing the problem then you have to do something at the application level (lightweight design pattern, using stream xml parser etc).
The fact is that String are regular objects.
String a = "test";
String b = a;
Does exactly the same thing as:
StringBuffer a = new StringBuffer("test");
StringBuffer b = a;
that is: in both cases, b is a second reference to a, and this is not due to the immutability.
Immutability comes into play
So, you always handle two pointers to the same data. Now, if the class is immutable, you can forget about it: nobody will change your data under your shoes not because you have a copy for your own, but because the shared copy is immutable. You can even think that you have a copy of the string, but actually a copy has never existed since String b = a; does what it does for each object: a copy of the only reference.

What is the purpose of the expression "new String(...)" in Java?

While looking at online code samples, I have sometimes come across an assignment of a String constant to a String object via the use of the new operator.
For example:
String s;
...
s = new String("Hello World");
This, of course, compared to
s = "Hello World";
I'm not familiar with this syntax and have no idea what the purpose or effect would be.
Since String constants typically get stored in the constant pool and then in whatever representation the JVM has for dealing with String constants, would anything even be allocated on the heap?
The one place where you may think you want new String(String) is to force a distinct copy of the internal character array, as in
small=new String(huge.substring(10,20))
However, this behavior is unfortunately undocumented and implementation dependent.
I have been burned by this when reading large files (some up to 20 MiB) into a String and carving it into lines after the fact. I ended up with all the strings for the lines referencing the char[] consisting of entire file. Unfortunately, that unintentionally kept a reference to the entire array for the few lines I held on to for a longer time than processing the file - I was forced to use new String() to work around it, since processing 20,000 files very quickly consumed huge amounts of RAM.
The only implementation agnostic way to do this is:
small=new String(huge.substring(10,20).toCharArray());
This unfortunately must copy the array twice, once for toCharArray() and once in the String constructor.
There needs to be a documented way to get a new String by copying the chars of an existing one; or the documentation of String(String) needs to be improved to make it more explicit (there is an implication there, but it's rather vague and open to interpretation).
Pitfall of Assuming what the Doc Doesn't State
In response to the comments, which keep coming in, observe what the Apache Harmony implementation of new String() was:
public String(String string) {
value = string.value;
offset = string.offset;
count = string.count;
}
That's right, no copy of the underlying array there. And yet, it still conforms to the (Java 7) String documentation, in that it:
Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable.
The salient piece being "copy of the argument string"; it does not say "copy of the argument string and the underlying character array supporting the string".
Be careful that you program to the documentation and not one implementation.
The only time I have found this useful is in declaring lock variables:
private final String lock = new String("Database lock");
....
synchronized(lock)
{
// do something
}
In this case, debugging tools like Eclipse will show the string when listing what locks a thread currently holds or is waiting for. You have to use "new String", i.e. allocate a new String object, because otherwise a shared string literal could possibly be locked in some other unrelated code.
String s1="foo"; literal will go in StringPool and s1 will refer.
String s2="foo"; this time it will check "foo" literal is already available in StringPool or not as now it exist so s2 will refer the same literal.
String s3=new String("foo"); "foo" literal will be created in StringPool first then through string arg constructor String Object will be created i.e "foo" in the heap due to object creation through new operator then s3 will refer it.
String s4=new String("foo"); same as s3
so System.out.println(s1==s2); //true due to literal comparison.
and System.out.println(s3==s4);// false due to object comparison(s3 and s4 is created at different places in heap)
The sole utility for this constructor described by Software Monkey and Ruggs seems to have disappeared from JDK7.
There is no longer an offset field in class String, and substring always use
Arrays.copyOfRange(char[] original, int from, int to)
to trim the char array for the copy.
Well, that depends on what the "..." is in the example. If it's a StringBuffer, for example, or a byte array, or something, you'll get a String constructed from the data you're passing.
But if it's just another String, as in new String("Hello World!"), then it should be replaced by simply "Hello World!", in all cases. Strings are immutable, so cloning one serves no purpose -- it's just more verbose and less efficient to create a new String object just to serve as a duplicate of an existing String (whether it be a literal or another String variable you already have).
In fact, Effective Java (which I highly recommend) uses exactly this as one of its examples of "Avoid creating unnecessary objects":
As an extreme example of what not to do, consider this statement:
String s = new String("stringette"); **//DON'T DO THIS!**
(Effective Java, Second Edition)
Here is a quote from the book Effective Java Third Edition (Item 17: Minimize Mutability):
A consequence of the fact that immutable objects can be shared freely
is that you never have to make defensive copies of them (Item
50). In fact, you never have to make any copies at all because the
copies would be forever equivalent to the originals. Therefore, you
need not and should not provide a clone method or copy constructor
(Item 13) on an immutable class. This was not well understood in the
early days of the Java platform, so the String class does have a copy
constructor, but it should rarely, if ever, be used.
So It was a wrong decision by Java, since String class is immutable they should not have provided copy constructor for this class, in cases you want to do costly operation on immutable classes, you can use public mutable companion classes which are StringBuilder and StringBuffer in case of String.
Generally, this indicates someone who isn't comfortable with the new-fashioned C++ style of declaring when initialized.
Back in the C days, it wasn't considered good form to define auto variables in an inner scope; C++ eliminated the parser restriction, and Java extended that.
So you see code that has
int q;
for(q=0;q<MAX;q++){
String s;
int ix;
// other stuff
s = new String("Hello, there!");
// do something with s
}
In the extreme case, all the declarations may be at the top of a function, and not in enclosed scopes like the for loop here.
IN general, though, the effect of this is to cause a String ctor to be called once, and the resulting String thrown away. (The desire to avoid this is just what led Stroustrup to allow declarations anywhere in the code.) So you are correct that it's unnecessary and bad style at best, and possibly actually bad.
There are two ways in which Strings can be created in Java. Following are the examples for both the ways:
1) Declare a variable of type String(a class in Java) and assign it to a value which should be put between double quotes. This will create a string in the string pool area of memory.
eg: String str = "JAVA";
2)Use the constructor of String class and pass a string(within double quotes) as an argument.
eg: String s = new String("JAVA");
This will create a new string JAVA in the main memory and also in the string pool if this string is not already present in string pool.
I guess it will depend on the code samples you're seeing.
Most of the times using the class constructor "new String()" in code sample are only to show a very well know java class instead of creating a new one.
You should avoid using it most of the times. Not only because string literals are interned but mainly because string are inmutable. It doesn't make sense have two copies that represent the same object.
While the article mensioned by Ruggs is "interesting" it should not be used unless very specific circumstances, because it could create more damage than good. You'll be coding to an implementation rather than an specification and the same code could not run the same for instance in JRockit, IBM VM, or other.

Why can't strings be mutable in Java and .NET?

Why is it that they decided to make String immutable in Java and .NET (and some other languages)? Why didn't they make it mutable?
According to Effective Java, chapter 4, page 73, 2nd edition:
"There are many good reasons for this: Immutable classes are easier to
design, implement, and use than mutable classes. They are less prone
to error and are more secure.
[...]
"Immutable objects are simple. An immutable object can be in
exactly one state, the state in which it was created. If you make sure
that all constructors establish class invariants, then it is
guaranteed that these invariants will remain true for all time, with
no effort on your part.
[...]
Immutable objects are inherently thread-safe; they require no synchronization. They cannot be corrupted by multiple threads
accessing them concurrently. This is far and away the easiest approach
to achieving thread safety. In fact, no thread can ever observe any
effect of another thread on an immutable object. Therefore,
immutable objects can be shared freely
[...]
Other small points from the same chapter:
Not only can you share immutable objects, but you can share their internals.
[...]
Immutable objects make great building blocks for other objects, whether mutable or immutable.
[...]
The only real disadvantage of immutable classes is that they require a separate object for each distinct value.
There are at least two reasons.
First - security http://www.javafaq.nu/java-article1060.html
The main reason why String made
immutable was security. Look at this
example: We have a file open method
with login check. We pass a String to
this method to process authentication
which is necessary before the call
will be passed to OS. If String was
mutable it was possible somehow to
modify its content after the
authentication check before OS gets
request from program then it is
possible to request any file. So if
you have a right to open text file in
user directory but then on the fly
when somehow you manage to change the
file name you can request to open
"passwd" file or any other. Then a
file can be modified and it will be
possible to login directly to OS.
Second - Memory efficiency http://hikrish.blogspot.com/2006/07/why-string-class-is-immutable.html
JVM internally maintains the "String
Pool". To achive the memory
efficiency, JVM will refer the String
object from pool. It will not create
the new String objects. So, whenever
you create a new string literal, JVM
will check in the pool whether it
already exists or not. If already
present in the pool, just give the
reference to the same object or create
the new object in the pool. There will
be many references point to the same
String objects, if someone changes the
value, it will affect all the
references. So, sun decided to make it
immutable.
Actually, the reasons string are immutable in java doesn't have much to do with security. The two main reasons are the following:
Thead Safety:
Strings are extremely widely used type of object. It is therefore more or less guaranteed to be used in a multi-threaded environment. Strings are immutable to make sure that it is safe to share strings among threads. Having an immutable strings ensures that when passing strings from thread A to another thread B, thread B cannot unexpectedly modify thread A's string.
Not only does this help simplify the already pretty complicated task of multi-threaded programming, but it also helps with performance of multi-threaded applications. Access to mutable objects must somehow be synchronized when they can be accessed from multiple threads, to make sure that one thread doesn't attempt to read the value of your object while it is being modified by another thread. Proper synchronization is both hard to do correctly for the programmer, and expensive at runtime. Immutable objects cannot be modified and therefore do not need synchronization.
Performance:
While String interning has been mentioned, it only represents a small gain in memory efficiency for Java programs. Only string literals are interned. This means that only the strings which are the same in your source code will share the same String Object. If your program dynamically creates string that are the same, they will be represented in different objects.
More importantly, immutable strings allow them to share their internal data. For many string operations, this means that the underlying array of characters does not need to be copied. For example, say you want to take the five first characters of String. In Java, you would calls myString.substring(0,5). In this case, what the substring() method does is simply to create a new String object that shares myString's underlying char[] but who knows that it starts at index 0 and ends at index 5 of that char[]. To put this in graphical form, you would end up with the following:
| myString |
v v
"The quick brown fox jumps over the lazy dog" <-- shared char[]
^ ^
| | myString.substring(0,5)
This makes this kind of operations extremely cheap, and O(1) since the operation neither depends on the length of the original string, nor on the length of the substring we need to extract. This behavior also has some memory benefits, since many strings can share their underlying char[].
Thread safety and performance. If a string cannot be modified it is safe and quick to pass a reference around among multiple threads. If strings were mutable, you would always have to copy all of the bytes of the string to a new instance, or provide synchronization. A typical application will read a string 100 times for every time that string needs to be modified. See wikipedia on immutability.
One should really ask, "why should X be mutable?" It's better to default to immutability, because of the benefits already mentioned by Princess Fluff. It should be an exception that something is mutable.
Unfortunately most of the current programming languages default to mutability, but hopefully in the future the default is more on immutablity (see A Wish List for the Next Mainstream Programming Language).
Wow! I Can't believe the misinformation here. Strings being immutable have nothing with security. If someone already has access to the objects in a running application (which would have to be assumed if you are trying to guard against someone 'hacking' a String in your app), they would certainly be a plenty of other opportunities available for hacking.
It's a quite novel idea that the immutability of String is addressing threading issues. Hmmm ... I have an object that is being changed by two different threads. How do I resolve this? synchronize access to the object? Naawww ... let's not let anyone change the object at all -- that'll fix all of our messy concurrency issues! In fact, let's make all objects immutable, and then we can removed the synchonized contruct from the Java language.
The real reason (pointed out by others above) is memory optimization. It is quite common in any application for the same string literal to be used repeatedly. It is so common, in fact, that decades ago, many compilers made the optimization of storing only a single instance of a String literal. The drawback of this optimization is that runtime code that modifies a String literal introduces a problem because it is modifying the instance for all other code that shares it. For example, it would be not good for a function somewhere in an application to change the String literal "dog" to "cat". A printf("dog") would result in "cat" being written to stdout. For that reason, there needed to be a way of guarding against code that attempts to change String literals (i. e., make them immutable). Some compilers (with support from the OS) would accomplish this by placing String literal into a special readonly memory segment that would cause a memory fault if a write attempt was made.
In Java this is known as interning. The Java compiler here is just following an standard memory optimization done by compilers for decades. And to address the same issue of these String literals being modified at runtime, Java simply makes the String class immutable (i. e, gives you no setters that would allow you to change the String content). Strings would not have to be immutable if interning of String literals did not occur.
String is not a primitive type, yet you normally want to use it with value semantics, i.e. like a value.
A value is something you can trust won't change behind your back.
If you write: String str = someExpr();
You don't want it to change unless YOU do something with str.
String as an Object has naturally pointer semantics, to get value semantics as well it needs to be immutable.
One factor is that, if Strings were mutable, objects storing Strings would have to be careful to store copies, lest their internal data change without notice. Given that Strings are a fairly primitive type like numbers, it is nice when one can treat them as if they were passed by value, even if they are passed by reference (which also helps to save on memory).
I know this is a bump, but...
Are they really immutable?
Consider the following.
public static unsafe void MutableReplaceIndex(string s, char c, int i)
{
fixed (char* ptr = s)
{
*((char*)(ptr + i)) = c;
}
}
...
string s = "abc";
MutableReplaceIndex(s, '1', 0);
MutableReplaceIndex(s, '2', 1);
MutableReplaceIndex(s, '3', 2);
Console.WriteLine(s); // Prints 1 2 3
You could even make it an extension method.
public static class Extensions
{
public static unsafe void MutableReplaceIndex(this string s, char c, int i)
{
fixed (char* ptr = s)
{
*((char*)(ptr + i)) = c;
}
}
}
Which makes the following work
s.MutableReplaceIndex('1', 0);
s.MutableReplaceIndex('2', 1);
s.MutableReplaceIndex('3', 2);
Conclusion: They're in an immutable state which is known by the compiler. Of couse the above only applies to .NET strings as Java doesn't have pointers. However a string can be entirely mutable using pointers in C#. It's not how pointers are intended to be used, has practical usage or is safely used; it's however possible, thus bending the whole "mutable" rule. You can normally not modify an index directly of a string and this is the only way. There is a way that this could be prevented by disallowing pointer instances of strings or making a copy when a string is pointed to, but neither is done, which makes strings in C# not entirely immutable.
For most purposes, a "string" is (used/treated as/thought of/assumed to be) a meaningful atomic unit, just like a number.
Asking why the individual characters of a string are not mutable is therefore like asking why the individual bits of an integer are not mutable.
You should know why. Just think about it.
I hate to say it, but unfortunately we're debating this because our language sucks, and we're trying to using a single word, string, to describe a complex, contextually situated concept or class of object.
We perform calculations and comparisons with "strings" similar to how we do with numbers. If strings (or integers) were mutable, we'd have to write special code to lock their values into immutable local forms in order to perform any kind of calculation reliably. Therefore, it is best to think of a string like a numeric identifier, but instead of being 16, 32, or 64 bits long, it could be hundreds of bits long.
When someone says "string", we all think of different things. Those who think of it simply as a set of characters, with no particular purpose in mind, will of course be appalled that someone just decided that they should not be able to manipulate those characters. But the "string" class isn't just an array of characters. It's a STRING, not a char[]. There are some basic assumptions about the concept we refer to as a "string", and it generally can be described as meaningful, atomic unit of coded data like a number. When people talk about "manipulating strings", perhaps they're really talking about manipulating characters to build strings, and a StringBuilder is great for that. Just think a bit about what the word "string" truly means.
Consider for a moment what it would be like if strings were mutable. The following API function could be tricked into returning information for a different user if the mutable username string is intentionally or unintentionally modified by another thread while this function is using it:
string GetPersonalInfo( string username, string password )
{
string stored_password = DBQuery.GetPasswordFor( username );
if (password == stored_password)
{
//another thread modifies the mutable 'username' string
return DBQuery.GetPersonalInfoFor( username );
}
}
Security isn't just about 'access control', it's also about 'safety' and 'guaranteeing correctness'. If a method can't be easily written and depended upon to perform a simple calculation or comparison reliably, then it's not safe to call it, but it would be safe to call into question the programming language itself.
Immutability is not so closely tied to security. For that, at least in .NET, you get the SecureString class.
Later edit: In Java you will find GuardedString, a similar implementation.
The decision to have string mutable in C++ causes a lot of problems, see this excellent article by Kelvin Henney about Mad COW Disease.
COW = Copy On Write.
It's a trade off. Strings go into the String pool and when you create multiple identical Strings they share the same memory. The designers figured this memory saving technique would work well for the common case, since programs tend to grind over the same strings a lot.
The downside is that concatenations make a lot of extra Strings that are only transitional and just become garbage, actually harming memory performance. You have StringBuffer and StringBuilder (in Java, StringBuilder is also in .NET) to use to preserve memory in these cases.
Strings in Java are not truly immutable, you can change their value's using reflection and or class loading. You should not be depending on that property for security.
For examples see: Magic Trick In Java
Immutability is good. See Effective Java. If you had to copy a String every time you passed it around, then that would be a lot of error-prone code. You also have confusion as to which modifications affect which references. In the same way that Integer has to be immutable to behave like int, Strings have to behave as immutable to act like primitives. In C++ passing strings by value does this without explicit mention in the source code.
There is an exception for nearly almost every rule:
using System;
using System.Runtime.InteropServices;
namespace Guess
{
class Program
{
static void Main(string[] args)
{
const string str = "ABC";
Console.WriteLine(str);
Console.WriteLine(str.GetHashCode());
var handle = GCHandle.Alloc(str, GCHandleType.Pinned);
try
{
Marshal.WriteInt16(handle.AddrOfPinnedObject(), 4, 'Z');
Console.WriteLine(str);
Console.WriteLine(str.GetHashCode());
}
finally
{
handle.Free();
}
}
}
}
It's largely for security reasons. It's much harder to secure a system if you can't trust that your Strings are tamperproof.

Categories

Resources