Why Wrapper Classes do not have pool Similar to Stringpool? - java

Integer, Character, Double, etc. -- all these are immutable classes like String. String has Stringpool to save memory but why don't these wrappers have similar pools?
I have checked: Integer has a similar pool only up to 127, but not more than that.

Unless someone can find a design document from Gosling, et. al., circa 1994 or so that specifically addresses this, it's impossible to say for certain.
One likely reason is that the complexity and overhead weren't deemed worth the benefit. Strings are A) a lot bigger and B) a lot more common than Integer, Long, and such, as mostly people use primitives whenever they can, only using the wrappers where they can't avoid it.

IMO, String is the most commonly used type in java. As an argument to load a class, a param to connect to DB/network connections, to store (almost) each and every thing - the list is long. Usage scenario for rest other primitives/wrapper types combined together would also be negligible compared to String - in any application.
If used in an un-optimized manner (e.g. implemented without Stringpool), performance would be up for a toss - hence it does make sense to have a pool of (only) String.

Related

Why does Java CharSequence.chars() return an IntStream? [duplicate]

In Java 8, there is a new method String.chars() which returns a stream of ints (IntStream) that represent the character codes. I guess many people would expect a stream of chars here instead. What was the motivation to design the API this way?
As others have already mentioned, the design decision behind this was to prevent the explosion of methods and classes.
Still, personally I think this was a very bad decision, and there should, given they do not want to make CharStream, which is reasonable, different methods instead of chars(), I would think of:
Stream<Character> chars(), that gives a stream of boxes characters, which will have some light performance penalty.
IntStream unboxedChars(), which would to be used for performance code.
However, instead of focusing on why it is done this way currently, I think this answer should focus on showing a way to do it with the API that we have gotten with Java 8.
In Java 7 I would have done it like this:
for (int i = 0; i < hello.length(); i++) {
System.out.println(hello.charAt(i));
}
And I think a reasonable method to do it in Java 8 is the following:
hello.chars()
.mapToObj(i -> (char)i)
.forEach(System.out::println);
Here I obtain an IntStream and map it to an object via the lambda i -> (char)i, this will automatically box it into a Stream<Character>, and then we can do what we want, and still use method references as a plus.
Be aware though that you must do mapToObj, if you forget and use map, then nothing will complain, but you will still end up with an IntStream, and you might be left off wondering why it prints the integer values instead of the strings representing the characters.
Other ugly alternatives for Java 8:
By remaining in an IntStream and wanting to print them ultimately, you cannot use method references anymore for printing:
hello.chars()
.forEach(i -> System.out.println((char)i));
Moreover, using method references to your own method do not work anymore! Consider the following:
private void print(char c) {
System.out.println(c);
}
and then
hello.chars()
.forEach(this::print);
This will give a compile error, as there possibly is a lossy conversion.
Conclusion:
The API was designed this way because of not wanting to add CharStream, I personally think that the method should return a Stream<Character>, and the workaround currently is to use mapToObj(i -> (char)i) on an IntStream to be able to work properly with them.
The answer from skiwi covered many of the major points already. I'll fill in a bit more background.
The design of any API is a series of tradeoffs. In Java, one of the difficult issues is dealing with design decisions that were made long ago.
Primitives have been in Java since 1.0. They make Java an "impure" object-oriented language, since the primitives are not objects. The addition of primitives was, I believe, a pragmatic decision to improve performance at the expense of object-oriented purity.
This is a tradeoff we're still living with today, nearly 20 years later. The autoboxing feature added in Java 5 mostly eliminated the need to clutter source code with boxing and unboxing method calls, but the overhead is still there. In many cases it's not noticeable. However, if you were to perform boxing or unboxing within an inner loop, you'd see that it can impose significant CPU and garbage collection overhead.
When designing the Streams API, it was clear that we had to support primitives. The boxing/unboxing overhead would kill any performance benefit from parallelism. We didn't want to support all of the primitives, though, since that would have added a huge amount of clutter to the API. (Can you really see a use for a ShortStream?) "All" or "none" are comfortable places for a design to be, yet neither was acceptable. So we had to find a reasonable value of "some". We ended up with primitive specializations for int, long, and double. (Personally I would have left out int but that's just me.)
For CharSequence.chars() we considered returning Stream<Character> (an early prototype might have implemented this) but it was rejected because of boxing overhead. Considering that a String has char values as primitives, it would seem to be a mistake to impose boxing unconditionally when the caller would probably just do a bit of processing on the value and unbox it right back into a string.
We also considered a CharStream primitive specialization, but its use would seem to be quite narrow compared to the amount of bulk it would add to the API. It didn't seem worthwhile to add it.
The penalty this imposes on callers is that they have to know that the IntStream contains char values represented as ints and that casting must be done at the proper place. This is doubly confusing because there are overloaded API calls like PrintStream.print(char) and PrintStream.print(int) that differ markedly in their behavior. An additional point of confusion possibly arises because the codePoints() call also returns an IntStream but the values it contains are quite different.
So, this boils down to choosing pragmatically among several alternatives:
We could provide no primitive specializations, resulting in a simple, elegant, consistent API, but which imposes a high performance and GC overhead;
we could provide a complete set of primitive specializations, at the cost of cluttering up the API and imposing a maintenance burden on JDK developers; or
we could provide a subset of primitive specializations, giving a moderately sized, high performing API that imposes a relatively small burden on callers in a fairly narrow range of use cases (char processing).
We chose the last one.

Why is String.chars() a stream of ints in Java 8?

In Java 8, there is a new method String.chars() which returns a stream of ints (IntStream) that represent the character codes. I guess many people would expect a stream of chars here instead. What was the motivation to design the API this way?
As others have already mentioned, the design decision behind this was to prevent the explosion of methods and classes.
Still, personally I think this was a very bad decision, and there should, given they do not want to make CharStream, which is reasonable, different methods instead of chars(), I would think of:
Stream<Character> chars(), that gives a stream of boxes characters, which will have some light performance penalty.
IntStream unboxedChars(), which would to be used for performance code.
However, instead of focusing on why it is done this way currently, I think this answer should focus on showing a way to do it with the API that we have gotten with Java 8.
In Java 7 I would have done it like this:
for (int i = 0; i < hello.length(); i++) {
System.out.println(hello.charAt(i));
}
And I think a reasonable method to do it in Java 8 is the following:
hello.chars()
.mapToObj(i -> (char)i)
.forEach(System.out::println);
Here I obtain an IntStream and map it to an object via the lambda i -> (char)i, this will automatically box it into a Stream<Character>, and then we can do what we want, and still use method references as a plus.
Be aware though that you must do mapToObj, if you forget and use map, then nothing will complain, but you will still end up with an IntStream, and you might be left off wondering why it prints the integer values instead of the strings representing the characters.
Other ugly alternatives for Java 8:
By remaining in an IntStream and wanting to print them ultimately, you cannot use method references anymore for printing:
hello.chars()
.forEach(i -> System.out.println((char)i));
Moreover, using method references to your own method do not work anymore! Consider the following:
private void print(char c) {
System.out.println(c);
}
and then
hello.chars()
.forEach(this::print);
This will give a compile error, as there possibly is a lossy conversion.
Conclusion:
The API was designed this way because of not wanting to add CharStream, I personally think that the method should return a Stream<Character>, and the workaround currently is to use mapToObj(i -> (char)i) on an IntStream to be able to work properly with them.
The answer from skiwi covered many of the major points already. I'll fill in a bit more background.
The design of any API is a series of tradeoffs. In Java, one of the difficult issues is dealing with design decisions that were made long ago.
Primitives have been in Java since 1.0. They make Java an "impure" object-oriented language, since the primitives are not objects. The addition of primitives was, I believe, a pragmatic decision to improve performance at the expense of object-oriented purity.
This is a tradeoff we're still living with today, nearly 20 years later. The autoboxing feature added in Java 5 mostly eliminated the need to clutter source code with boxing and unboxing method calls, but the overhead is still there. In many cases it's not noticeable. However, if you were to perform boxing or unboxing within an inner loop, you'd see that it can impose significant CPU and garbage collection overhead.
When designing the Streams API, it was clear that we had to support primitives. The boxing/unboxing overhead would kill any performance benefit from parallelism. We didn't want to support all of the primitives, though, since that would have added a huge amount of clutter to the API. (Can you really see a use for a ShortStream?) "All" or "none" are comfortable places for a design to be, yet neither was acceptable. So we had to find a reasonable value of "some". We ended up with primitive specializations for int, long, and double. (Personally I would have left out int but that's just me.)
For CharSequence.chars() we considered returning Stream<Character> (an early prototype might have implemented this) but it was rejected because of boxing overhead. Considering that a String has char values as primitives, it would seem to be a mistake to impose boxing unconditionally when the caller would probably just do a bit of processing on the value and unbox it right back into a string.
We also considered a CharStream primitive specialization, but its use would seem to be quite narrow compared to the amount of bulk it would add to the API. It didn't seem worthwhile to add it.
The penalty this imposes on callers is that they have to know that the IntStream contains char values represented as ints and that casting must be done at the proper place. This is doubly confusing because there are overloaded API calls like PrintStream.print(char) and PrintStream.print(int) that differ markedly in their behavior. An additional point of confusion possibly arises because the codePoints() call also returns an IntStream but the values it contains are quite different.
So, this boils down to choosing pragmatically among several alternatives:
We could provide no primitive specializations, resulting in a simple, elegant, consistent API, but which imposes a high performance and GC overhead;
we could provide a complete set of primitive specializations, at the cost of cluttering up the API and imposing a maintenance burden on JDK developers; or
we could provide a subset of primitive specializations, giving a moderately sized, high performing API that imposes a relatively small burden on callers in a fairly narrow range of use cases (char processing).
We chose the last one.

Why Wrapper class like Boolean in java is immutable?

I can't see the reason why the Boolean wrapper classes were made Immutable.
Why the Boolean Wrapper was not implemented like MutableBoolean in Commons lang which actually can be reset.
Does anyone have any idea/understanding about this ? Thanks.
Because 2 is 2. It won't be 3 tomorrow.
Immutable is always preferred as the default, especially in multithreaded situations, and it makes for easier to read and more maintainable code. Case in point: the Java Date API, which is riddled with design flaws. If Date were immutable the API would be very streamlined. I would know Date operations would create new dates and would never have to look for APIs that modify them.
Read Concurrency in Practice to understand the true importance of immutable types.
But also note that if for some reason you want mutable types, use AtomicInteger AtomicBoolean, etc. Why Atomic? Because by introducing mutability you introduced a need for threadsafety. Which you wouldn't have needed if your types stayed immutable, so in using mutable types you also must pay the price of thinking about threadsafety and using types from the concurrent package. Welcome to the wonderful world of concurrent programming.
Also, for Boolean - I challenge you to name a single operation that you might want to perform that cares whether Boolean is mutable. set to true? Use myBool = true. That is a re-assignment, not a mutation. Negate? myBool = !myBool. Same rule. Note that immutability is a feature, not a constraint, so if you can offer it, you should - and in these cases, of course you can.
Note this applies to other types as well. The most subtle thing with integers is count++, but that is just count = count + 1, unless you care about getting the value atomically... in which case use the mutable AtomicInteger.
Wrapper classes in Java are immutable so the runtime can have only two Boolean objects - one for true, one for false - and every variable is a reference to one of those two. And since they can never be changed, you know they'll never be pulled out from under you. Not only does this save memory, it makes your code easier to reason about - since the wrapper classes you're passing around you know will never have their value change, they won't suddenly jump to a new value because they're accidentally a reference to the same value elsewhere.
Similarly, Integer has a cache of all signed byte values - -128 to 127 - so the runtime doesn't have to have extra instances of those common Integer values.
Patashu is the closest. Many of the goofy design choices in Java were because of the limitations of how they implemented a VM. I think originally they tried to make a VM for C or C++ but it was too hard (impossible?) so made this other, similar language. Write one, run everywhere!
Any computer sciency justification like those other dudes spout is just after-the-fact folderal. As you now know, Java and C# are evolving to be as powerful as C. Sure, they were cleaner. Ought to be for languages designed decade(s) later!
Simple trick is to make a "holder" class. Or use a closure nowadays! Maybe Java is evolving into JavaScript. LOL.
Boolean or any other wrapper class is immutable in java. Since wrapper classes are used as variables for storing simple data, those should be safe and data integrity must be maintained to avoid inconsistent or unwanted results. Also, immutability saves lots of memory by avoiding duplicate objects. More can be found in article Why Strings & Wrapper classes are designed immutable in java?

What's the limit to the number of members you can have in a java enum?

Assuming you have a hypothetical enum in java like this (purely for demonstration purposes, this isn't code i'm seriously expecting to use):
enum Example{
FIRST,
SECOND,
THIRD,
...
LAST;
}
What's the maximum number of members you could have inside that enum before the compiler stops you?
Secondly, is there any performance difference at runtime when your code is referencing an enum with say, 10 members as opposed to 100 or 1,000 (other than just the obvious memory overhead required to store the large class)?
The language specification itself doesn't have a limit. Yet, there are many limitations that classfile has that bound the number of enums, with the upper bound being aruond 65,536 (2^16) enums:
Number of Fields
The JVMS 4.1 specifies that ClassFile may have up to 65,536 (2^16) fields. Enums get stored in the classfile as static field, so the maximum number of enum values and enum member fields is 65,536.
Constant Pool
The JVMS also specifies that the Constant Pool may have up to 65,536. Constant Pools store all String literals, type literals, supertype, super interfaces types, method signatures, method names, AND enum value names. So there must be fewer than 2^16 enum values, since the names strings need to share that Constant Pool limit.
Static Method Initialization
The maximum limit for a method is 65,535 bytes (in bytecode). So the static initializer for the Enum has to be smaller than 64Kb. While the compiler may split it into different methods (Look at Bug ID: 4262078) to distribute the initializations into small blocks, the compiler doesn't do that currently.
Long story short, there is no easy answer, and the answer depends not only on the number of enum values there are, but also the number of methods, interfaces, and fields the enums have!
The best way to find out the answer to this type of question is to try it. Start with a little Python script to generate the Java files:
n = input()
print "class A{public static void main(String[] a){}enum B{"
print ','.join("C%d" % x for x in range(n))
print '}}'
Now try with 1,10,100,1000... works fine, then BAM:
A.java:2: code too large
C0,C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,C14,C15,C16,C17,C18,C19,C20,C21,C22,...
Seems like I hit some sort of internal limit. Not sure if it's a documented limit, if it's dependent on the specific version of my compiler, or if its some system dependant limit. But for me the limit was around 3000 and appears to be related to the source code size. Maybe you could write your own compiler to bypass this limit.
The maximum number of enum values will I think be just under the 65536 maximum number of fields/constant pool entries in the class. (As I mentioned in a comment above, the actual values shouldn't take up constant pool entries: they can be "inlined" into the bytecode, but the names will.)
As far as the second question is concerned, there's no direct performance difference, but it's conceivable that there'll be small indirect performance differences, partly because of the class file size as you say. Another thing to bear in mind is that when you use enum collections, there are optimised versions of some of the classes for when all of the enum values fit within a certain range (a byte, as I recall). So yes, there could be a small difference. I woudln't get paranoid, though.
This is an extension of the comments to the original question.
There are multiple problems with having a LOT of enums.
The main reason is that when you have a lot of data it tends to change, or if not you often want to add new items. There are exemptions to this like unit conversions that would never change, but for the most part you want to read data like this from a file into a collection of classes rather than an enum.
To add new items is problematic because since it's an enum, you need to physically modify your code unless you are ALWAYS using the enums as a collection, and if you are ALWAYS using them as a collection, why make them enums at all?
The case where your data doesn't change--like "conversion units" where you are converting feet, inches, etc. You COULD do this as enums and there WOULD be a lot of them, but by coding them as enums you lose the ability to have data drive your program. For instance, a user could select from a pull-down list populated by your "Units", but again, this is not an "ENUM" usage, it's using it as a collection.
The other problem will be repetition around the references to your enum. You will almost certainly have something very repetitive like:
if(userSelectedCard() == cards.HEARTS)
graphic=loadFile("Heart.jpg");
if(userSelectedCard() == cards.SPADES)
graphic=loadFile("Spade.jpg");
Which is just wrong (If you can squint to where you can't read the letters and see this kind of pattern in your code, you KNOW you are doing it wrong).
If the cards were stored in a card collection, it would be easier to just use:
graphic=cards.getGraphicFor(userSelectedCard());
I'm not saying that this can't be done with an enum as well, but I am saying that I can't see how you would use these as enums without having some nasty code-block like the one I posted above.
I'm also not saying that there aren't cases for enums--there are lots of them, but when you get more than a few (7 was a good number), you're probably better off with some other structure.
I guess the exception is when you are modeling real-world stuff that has that many types and each must be addressed with different code, but even then you are probably better off using a data file to bind a name to some code to run and storing them in a hash so you can invoke them with code like: hash.get(nameString).executeCode(). This way, again, your "nameString" is data and not hard-coded, allowing refactoring elsewhere.
If you get in the habit of brutally factoring your code like this, you can reduce many programs by 50% or more in size.
If you have to ask, you're probably doing something wrong. The actual limit is probably fairly high, but an enum with more than 10 or so values would be highly suspect, I think. Break that up into related collections, or a type hierarchy, or something.

Why can't strings be mutable in Java and .NET?

Why is it that they decided to make String immutable in Java and .NET (and some other languages)? Why didn't they make it mutable?
According to Effective Java, chapter 4, page 73, 2nd edition:
"There are many good reasons for this: Immutable classes are easier to
design, implement, and use than mutable classes. They are less prone
to error and are more secure.
[...]
"Immutable objects are simple. An immutable object can be in
exactly one state, the state in which it was created. If you make sure
that all constructors establish class invariants, then it is
guaranteed that these invariants will remain true for all time, with
no effort on your part.
[...]
Immutable objects are inherently thread-safe; they require no synchronization. They cannot be corrupted by multiple threads
accessing them concurrently. This is far and away the easiest approach
to achieving thread safety. In fact, no thread can ever observe any
effect of another thread on an immutable object. Therefore,
immutable objects can be shared freely
[...]
Other small points from the same chapter:
Not only can you share immutable objects, but you can share their internals.
[...]
Immutable objects make great building blocks for other objects, whether mutable or immutable.
[...]
The only real disadvantage of immutable classes is that they require a separate object for each distinct value.
There are at least two reasons.
First - security http://www.javafaq.nu/java-article1060.html
The main reason why String made
immutable was security. Look at this
example: We have a file open method
with login check. We pass a String to
this method to process authentication
which is necessary before the call
will be passed to OS. If String was
mutable it was possible somehow to
modify its content after the
authentication check before OS gets
request from program then it is
possible to request any file. So if
you have a right to open text file in
user directory but then on the fly
when somehow you manage to change the
file name you can request to open
"passwd" file or any other. Then a
file can be modified and it will be
possible to login directly to OS.
Second - Memory efficiency http://hikrish.blogspot.com/2006/07/why-string-class-is-immutable.html
JVM internally maintains the "String
Pool". To achive the memory
efficiency, JVM will refer the String
object from pool. It will not create
the new String objects. So, whenever
you create a new string literal, JVM
will check in the pool whether it
already exists or not. If already
present in the pool, just give the
reference to the same object or create
the new object in the pool. There will
be many references point to the same
String objects, if someone changes the
value, it will affect all the
references. So, sun decided to make it
immutable.
Actually, the reasons string are immutable in java doesn't have much to do with security. The two main reasons are the following:
Thead Safety:
Strings are extremely widely used type of object. It is therefore more or less guaranteed to be used in a multi-threaded environment. Strings are immutable to make sure that it is safe to share strings among threads. Having an immutable strings ensures that when passing strings from thread A to another thread B, thread B cannot unexpectedly modify thread A's string.
Not only does this help simplify the already pretty complicated task of multi-threaded programming, but it also helps with performance of multi-threaded applications. Access to mutable objects must somehow be synchronized when they can be accessed from multiple threads, to make sure that one thread doesn't attempt to read the value of your object while it is being modified by another thread. Proper synchronization is both hard to do correctly for the programmer, and expensive at runtime. Immutable objects cannot be modified and therefore do not need synchronization.
Performance:
While String interning has been mentioned, it only represents a small gain in memory efficiency for Java programs. Only string literals are interned. This means that only the strings which are the same in your source code will share the same String Object. If your program dynamically creates string that are the same, they will be represented in different objects.
More importantly, immutable strings allow them to share their internal data. For many string operations, this means that the underlying array of characters does not need to be copied. For example, say you want to take the five first characters of String. In Java, you would calls myString.substring(0,5). In this case, what the substring() method does is simply to create a new String object that shares myString's underlying char[] but who knows that it starts at index 0 and ends at index 5 of that char[]. To put this in graphical form, you would end up with the following:
| myString |
v v
"The quick brown fox jumps over the lazy dog" <-- shared char[]
^ ^
| | myString.substring(0,5)
This makes this kind of operations extremely cheap, and O(1) since the operation neither depends on the length of the original string, nor on the length of the substring we need to extract. This behavior also has some memory benefits, since many strings can share their underlying char[].
Thread safety and performance. If a string cannot be modified it is safe and quick to pass a reference around among multiple threads. If strings were mutable, you would always have to copy all of the bytes of the string to a new instance, or provide synchronization. A typical application will read a string 100 times for every time that string needs to be modified. See wikipedia on immutability.
One should really ask, "why should X be mutable?" It's better to default to immutability, because of the benefits already mentioned by Princess Fluff. It should be an exception that something is mutable.
Unfortunately most of the current programming languages default to mutability, but hopefully in the future the default is more on immutablity (see A Wish List for the Next Mainstream Programming Language).
Wow! I Can't believe the misinformation here. Strings being immutable have nothing with security. If someone already has access to the objects in a running application (which would have to be assumed if you are trying to guard against someone 'hacking' a String in your app), they would certainly be a plenty of other opportunities available for hacking.
It's a quite novel idea that the immutability of String is addressing threading issues. Hmmm ... I have an object that is being changed by two different threads. How do I resolve this? synchronize access to the object? Naawww ... let's not let anyone change the object at all -- that'll fix all of our messy concurrency issues! In fact, let's make all objects immutable, and then we can removed the synchonized contruct from the Java language.
The real reason (pointed out by others above) is memory optimization. It is quite common in any application for the same string literal to be used repeatedly. It is so common, in fact, that decades ago, many compilers made the optimization of storing only a single instance of a String literal. The drawback of this optimization is that runtime code that modifies a String literal introduces a problem because it is modifying the instance for all other code that shares it. For example, it would be not good for a function somewhere in an application to change the String literal "dog" to "cat". A printf("dog") would result in "cat" being written to stdout. For that reason, there needed to be a way of guarding against code that attempts to change String literals (i. e., make them immutable). Some compilers (with support from the OS) would accomplish this by placing String literal into a special readonly memory segment that would cause a memory fault if a write attempt was made.
In Java this is known as interning. The Java compiler here is just following an standard memory optimization done by compilers for decades. And to address the same issue of these String literals being modified at runtime, Java simply makes the String class immutable (i. e, gives you no setters that would allow you to change the String content). Strings would not have to be immutable if interning of String literals did not occur.
String is not a primitive type, yet you normally want to use it with value semantics, i.e. like a value.
A value is something you can trust won't change behind your back.
If you write: String str = someExpr();
You don't want it to change unless YOU do something with str.
String as an Object has naturally pointer semantics, to get value semantics as well it needs to be immutable.
One factor is that, if Strings were mutable, objects storing Strings would have to be careful to store copies, lest their internal data change without notice. Given that Strings are a fairly primitive type like numbers, it is nice when one can treat them as if they were passed by value, even if they are passed by reference (which also helps to save on memory).
I know this is a bump, but...
Are they really immutable?
Consider the following.
public static unsafe void MutableReplaceIndex(string s, char c, int i)
{
fixed (char* ptr = s)
{
*((char*)(ptr + i)) = c;
}
}
...
string s = "abc";
MutableReplaceIndex(s, '1', 0);
MutableReplaceIndex(s, '2', 1);
MutableReplaceIndex(s, '3', 2);
Console.WriteLine(s); // Prints 1 2 3
You could even make it an extension method.
public static class Extensions
{
public static unsafe void MutableReplaceIndex(this string s, char c, int i)
{
fixed (char* ptr = s)
{
*((char*)(ptr + i)) = c;
}
}
}
Which makes the following work
s.MutableReplaceIndex('1', 0);
s.MutableReplaceIndex('2', 1);
s.MutableReplaceIndex('3', 2);
Conclusion: They're in an immutable state which is known by the compiler. Of couse the above only applies to .NET strings as Java doesn't have pointers. However a string can be entirely mutable using pointers in C#. It's not how pointers are intended to be used, has practical usage or is safely used; it's however possible, thus bending the whole "mutable" rule. You can normally not modify an index directly of a string and this is the only way. There is a way that this could be prevented by disallowing pointer instances of strings or making a copy when a string is pointed to, but neither is done, which makes strings in C# not entirely immutable.
For most purposes, a "string" is (used/treated as/thought of/assumed to be) a meaningful atomic unit, just like a number.
Asking why the individual characters of a string are not mutable is therefore like asking why the individual bits of an integer are not mutable.
You should know why. Just think about it.
I hate to say it, but unfortunately we're debating this because our language sucks, and we're trying to using a single word, string, to describe a complex, contextually situated concept or class of object.
We perform calculations and comparisons with "strings" similar to how we do with numbers. If strings (or integers) were mutable, we'd have to write special code to lock their values into immutable local forms in order to perform any kind of calculation reliably. Therefore, it is best to think of a string like a numeric identifier, but instead of being 16, 32, or 64 bits long, it could be hundreds of bits long.
When someone says "string", we all think of different things. Those who think of it simply as a set of characters, with no particular purpose in mind, will of course be appalled that someone just decided that they should not be able to manipulate those characters. But the "string" class isn't just an array of characters. It's a STRING, not a char[]. There are some basic assumptions about the concept we refer to as a "string", and it generally can be described as meaningful, atomic unit of coded data like a number. When people talk about "manipulating strings", perhaps they're really talking about manipulating characters to build strings, and a StringBuilder is great for that. Just think a bit about what the word "string" truly means.
Consider for a moment what it would be like if strings were mutable. The following API function could be tricked into returning information for a different user if the mutable username string is intentionally or unintentionally modified by another thread while this function is using it:
string GetPersonalInfo( string username, string password )
{
string stored_password = DBQuery.GetPasswordFor( username );
if (password == stored_password)
{
//another thread modifies the mutable 'username' string
return DBQuery.GetPersonalInfoFor( username );
}
}
Security isn't just about 'access control', it's also about 'safety' and 'guaranteeing correctness'. If a method can't be easily written and depended upon to perform a simple calculation or comparison reliably, then it's not safe to call it, but it would be safe to call into question the programming language itself.
Immutability is not so closely tied to security. For that, at least in .NET, you get the SecureString class.
Later edit: In Java you will find GuardedString, a similar implementation.
The decision to have string mutable in C++ causes a lot of problems, see this excellent article by Kelvin Henney about Mad COW Disease.
COW = Copy On Write.
It's a trade off. Strings go into the String pool and when you create multiple identical Strings they share the same memory. The designers figured this memory saving technique would work well for the common case, since programs tend to grind over the same strings a lot.
The downside is that concatenations make a lot of extra Strings that are only transitional and just become garbage, actually harming memory performance. You have StringBuffer and StringBuilder (in Java, StringBuilder is also in .NET) to use to preserve memory in these cases.
Strings in Java are not truly immutable, you can change their value's using reflection and or class loading. You should not be depending on that property for security.
For examples see: Magic Trick In Java
Immutability is good. See Effective Java. If you had to copy a String every time you passed it around, then that would be a lot of error-prone code. You also have confusion as to which modifications affect which references. In the same way that Integer has to be immutable to behave like int, Strings have to behave as immutable to act like primitives. In C++ passing strings by value does this without explicit mention in the source code.
There is an exception for nearly almost every rule:
using System;
using System.Runtime.InteropServices;
namespace Guess
{
class Program
{
static void Main(string[] args)
{
const string str = "ABC";
Console.WriteLine(str);
Console.WriteLine(str.GetHashCode());
var handle = GCHandle.Alloc(str, GCHandleType.Pinned);
try
{
Marshal.WriteInt16(handle.AddrOfPinnedObject(), 4, 'Z');
Console.WriteLine(str);
Console.WriteLine(str.GetHashCode());
}
finally
{
handle.Free();
}
}
}
}
It's largely for security reasons. It's much harder to secure a system if you can't trust that your Strings are tamperproof.

Categories

Resources