Why factory methods for Collections produce immutable instances? [duplicate] - java

I am unable to get what are the scenarios where we need an immutable class.
Have you ever faced any such requirement? or can you please give us any real example where we should use this pattern.

The other answers seem too focused on explaining why immutability is good. It is very good and I use it whenever possible. However, that is not your question. I'll take your question point by point to try to make sure you're getting the answers and examples you need.
I am unable to get what are the scenarios where we need an immutable class.
"Need" is a relative term here. Immutable classes are a design pattern that, like any paradigm/pattern/tool, is there to make constructing software easier. Similarly, plenty of code was written before the OO paradigm came along, but count me among the programmers that "need" OO. Immutable classes, like OO, aren't strictly needed, but I'm going to act like I need them.
Have you ever faced any such requirement?
If you aren't looking at the objects in the problem domain with the right perspective, you may not see a requirement for an immutable object. It might be easy to think that a problem domain doesn't require any immutable classes if you're not familiar with when to use them advantageously.
I often use immutable classes where I think of a given object in my problem domain as a value or fixed instance. This notion is sometimes dependent on perspective or viewpoint, but ideally, it will be easy to switch into the right perspective to identify good candidate objects.
You can get a better sense of where immutable objects are really useful (if not strictly necessary) by making sure you read up on various books/online articles to develop a good sense of how to think about immutable classes. One good article to get you started is Java theory and practice: To mutate or not to mutate?
I'll try to give a couple of examples below of how one can see objects in different perspectives (mutable vs immutable) to clarify what I mean by perspective.
... can you please give us any real example where we should use this pattern.
Since you asked for real examples I'll give you some, but first, let's start with some classic examples.
Classic Value Objects
Strings and integers are often thought of as values. Therefore it's not surprising to find that the String class and the Integer wrapper class (as well as the other wrapper classes) are immutable in Java. A color is usually thought of as a value, thus the immutable Color class.
Counterexample
In contrast, a car is not usually thought of as a value object. Modeling a car usually means creating a class that has changing state (odometer, speed, fuel level, etc). However, there are some domains where a car may be a value object. For example, a car (or specifically a car model) might be thought of as a value object in an app to look up the proper motor oil for a given vehicle.
Playing Cards
Ever write a playing card program? I did. I could have represented a playing card as a mutable object with a mutable suit and rank. A draw-poker hand could be 5 fixed instances where replacing the 5th card in my hand would mean mutating the 5th playing card instance into a new card by changing its suit and rank ivars.
However, I tend to think of a playing card as an immutable object that has a fixed unchanging suit and rank once created. My draw poker hand would be 5 instances, and replacing a card in my hand would involve discarding one of those instances and adding a new random instance to my hand.
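As a rough sketch of the immutable-card view: the names below (Suit, Rank, Card, CardTable) are hypothetical and not taken from the original program. The card's fields are final, and replacing a card swaps one instance for another instead of mutating it.

```java
import java.util.List;

// Hypothetical names; a sketch of the idea, not the original program.
enum Suit { CLUBS, DIAMONDS, HEARTS, SPADES }

enum Rank { TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, JACK, QUEEN, KING, ACE }

// Immutable: rank and suit are fixed at construction and there are no setters.
final class Card {
    private final Rank rank;
    private final Suit suit;

    Card(Rank rank, Suit suit) {
        this.rank = rank;
        this.suit = suit;
    }

    Rank rank() { return rank; }
    Suit suit() { return suit; }

    @Override
    public String toString() { return rank + " of " + suit; }
}

final class CardTable {
    // Puts a different Card instance at the given position and returns the
    // discarded one. The discarded Card itself is never modified.
    static Card replaceCard(List<Card> hand, int index, Card replacement) {
        return hand.set(index, replacement);
    }
}
```

The hand here is an ordinary mutable list; only the cards are immutable, which matches the "discard one instance, add another" description.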
Map Projection
One last example is when I worked on some map code where the map could display itself in various projections. The original code had the map use a fixed, but mutable projection instance (like the mutable playing card above). Changing the map projection meant mutating the map's projection instance's ivars (projection type, center point, zoom, etc).
However, I felt the design was simpler if I thought of a projection as an immutable value or fixed instance. Changing the map projection meant having the map reference a different projection instance rather than mutating the map's fixed projection instance. This also made it simpler to capture named projections such as MERCATOR_WORLD_VIEW.
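A minimal sketch of that design, assuming made-up field names and a hypothetical MapView class (only MERCATOR_WORLD_VIEW comes from the text above). The map swaps in a different Projection instance, and a named projection can be a plain shared constant because nobody can alter it.

```java
// Hypothetical sketch; the field names and values are assumptions.
final class Projection {
    // A named projection can be shared freely because it can never change.
    static final Projection MERCATOR_WORLD_VIEW = new Projection("MERCATOR", 0.0, 0.0, 1);

    private final String type;
    private final double centerLat;
    private final double centerLon;
    private final int zoom;

    Projection(String type, double centerLat, double centerLon, int zoom) {
        this.type = type;
        this.centerLat = centerLat;
        this.centerLon = centerLon;
        this.zoom = zoom;
    }

    String type() { return type; }
    int zoom() { return zoom; }

    // "Changing" the zoom yields a new instance; this one is left untouched.
    Projection withZoom(int newZoom) {
        return new Projection(type, centerLat, centerLon, newZoom);
    }
}

final class MapView {
    private Projection projection = Projection.MERCATOR_WORLD_VIEW;

    Projection projection() { return projection; }

    // The map changes projection by referencing a different instance.
    void setProjection(Projection projection) { this.projection = projection; }
}
```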

Immutable classes are in general much simpler to design, implement and use correctly. An example is String: the implementation of java.lang.String is significantly simpler than that of std::string in C++, mostly due to its immutability.
One particular area where immutability makes an especially big difference is concurrency: immutable objects can safely be shared among multiple threads, whereas mutable objects must be made thread-safe via careful design and implementation - usually this is far from a trivial task.
Update: Effective Java 2nd Edition tackles this issue in detail - see Item 15: Minimize mutability.
See also these related posts:
non-technical benefits of having string-type immutable
Downsides to immutable objects in Java?

Effective Java by Joshua Bloch outlines several reasons to write immutable classes:
Simplicity - each instance is in one state only
Thread Safe - because the state cannot be changed, no synchronization is required
Writing in an immutable style can lead to more robust code. Imagine if Strings weren't immutable: any getter method that returned a String would require the implementation to create a defensive copy before the String was returned, otherwise a client could accidentally or maliciously break the state of the object.
In general it is good practise to make an object immutable unless there are severe performance problems as a result. In such circumstances, mutable builder objects can be used to build immutable objects e.g. StringBuilder

Hashmaps are a classic example. It's imperative that the key to a map be immutable. If the key is not immutable, and you change a value on the key such that hashCode() would result in a new value, the map is now broken (a key is now in the wrong location in the hash table.).
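The failure can be reproduced in a few lines. This is a sketch with a deliberately mutable, hypothetical key class (MutableKey); after the key is changed, the entry is still in the map but can no longer be found.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type whose hashCode depends on a mutable field.
final class MutableKey {
    int id; // deliberately mutable, for demonstration only

    MutableKey(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).id == id;
    }

    @Override
    public int hashCode() { return Integer.hashCode(id); }
}

final class BrokenMapDemo {
    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");

        key.id = 2; // hashCode() now differs from the hash stored in the map

        System.out.println(map.get(key));               // prints "null"
        System.out.println(map.get(new MutableKey(1))); // prints "null"
        System.out.println(map.size());                 // prints "1"
    }
}
```

The lookup with the mutated key fails because the map searches under the new hash, and the lookup with a fresh key of the old value fails because equals() no longer matches the stored key.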

Almost everything in Java is accessed through references. Sometimes an instance is referenced from multiple places. If you change such an instance, the change is visible through all of its references. Sometimes you simply don't want that, for the sake of robustness and thread safety. An immutable class is useful then, because one is forced to create a new instance and reassign it to the current reference. That way the original instance seen through the other references remains untouched.
Imagine what Java would look like if String were mutable.

Let's take an extreme case: integer constants. If I write a statement like "x=x+1" I want to be 100% confident that the number "1" will not somehow become 2, no matter what happens anywhere else in the program.
Now okay, integer constants are not a class, but the concept is the same. Suppose I write:
String customerId=getCustomerId();
String customerName=getCustomerName(customerId);
String customerBalance=getCustomerBalance(customerId);
Looks simple enough. But if Strings were not immutable, then I would have to consider the possibility that getCustomerName could change customerId, so that when I call getCustomerBalance, I am getting the balance for a different customer. Now you might say, "Why in the world would someone writing a getCustomerName function make it change the id? That would make no sense." But that's exactly where you could get in trouble. The person writing the above code might take it as just obvious that the functions would not change the parameter. Then someone comes along who has to modify another use of that function to handle the case where a customer has multiple accounts under the same name. And he says, "Oh, here's this handy getCustomerName function that's already looking up the name. I'll just make that automatically change the id to the next account with the same name, and put it in a loop ..." And then your program starts mysteriously not working. Would that be bad coding style? Probably. But it's precisely a problem in cases where the side effect is NOT obvious.
Immutability simply means that a certain class of objects are constants, and we can treat them as constants.
(Of course the user could assign a different "constant object" to a variable. Someone can write
String s="hello";
and then later write
s="goodbye";
Unless I make the variable final, I can't be sure that it's not being changed within my own block of code. Just like integer constants assure me that "1" is always the same number, but not that "x=1" will never be changed by writing "x=2". But I can be confident that if I have a handle to an immutable object, no function I pass it to can change it on me, and that if I make two copies of it, a change to the variable holding one copy will not change the other. Etc.)
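To make the difference concrete, here is a small sketch with hypothetical method names. A method that receives a String can only rebind its own local variable, while a method that receives a mutable StringBuilder can change the object the caller is still holding.

```java
// Hypothetical helper names, for illustration only.
final class ConstantsDemo {
    // Rebinds the local parameter only; the caller's String is unaffected.
    static void tryToChange(String id) {
        id = id + "-changed";
    }

    // Mutates the shared object; the caller sees the change.
    static void change(StringBuilder id) {
        id.append("-changed");
    }

    public static void main(String[] args) {
        String immutableId = "42";
        tryToChange(immutableId);
        System.out.println(immutableId); // prints "42"

        StringBuilder mutableId = new StringBuilder("42");
        change(mutableId);
        System.out.println(mutableId);   // prints "42-changed"
    }
}
```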

We don't need immutable classes, per se, but they can certainly make some programming tasks easier, especially when multiple threads are involved. You don't have to perform any locking to access an immutable object, and any facts that you've already established about such an object will continue to be true in the future.
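A sketch of lock-free sharing, assuming a hypothetical Settings class: several tasks read the same instance concurrently, and no synchronization is needed because nothing can write to it after construction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical immutable configuration object.
final class Settings {
    private final String host;
    private final int port;

    Settings(String host, int port) {
        this.host = host;
        this.port = port;
    }

    String host() { return host; }
    int port() { return port; }
}

final class SharedReadDemo {
    // Reads the same Settings instance from several concurrent tasks without
    // any locking and returns the sum of the ports they observed.
    static int sumPorts(Settings shared, int tasks) {
        List<CompletableFuture<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            futures.add(CompletableFuture.supplyAsync(shared::port));
        }
        int sum = 0;
        for (CompletableFuture<Integer> future : futures) {
            sum += future.join(); // waits for the task and returns its result
        }
        return sum;
    }
}
```

Calling SharedReadDemo.sumPorts(new Settings("localhost", 8080), 4) always yields 4 * 8080, since every task is guaranteed to see the same port.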

There are various reasons for immutability:
Thread Safety: An immutable object's internal state cannot change, thus there's no need to synchronise access to it.
It also guarantees that whatever I send through (through a network) has to come in the same state as previously sent. It means that nobody (eavesdropper) can come and add random data in my immutable set.
It's also simpler to develop. An immutable class is usually declared final, which guarantees that no subclass can add mutable state, e.g. the String class.
So, if you want to send data through a network service, and you want a sense of guarantee that you will have your result exactly the same as what you sent, set it as immutable.

My 2 cents for future visitors:
2 scenarios where immutable objects are good choices are:
In multi-threading
Concurrency issues in a multi-threaded environment can very well be solved by synchronization, but synchronization is a costly affair (I won't dig into the "why" here). If you use immutable objects, no synchronization is needed to solve the concurrency issue: the state of an immutable object cannot be changed, and if state cannot be changed then all threads can seamlessly access the object. So immutable objects make a great choice for shared objects in a multi-threaded environment.
As key for hash based collections
One of the most important things to note when working with a hash based collection is that a key's hashCode() should always return the same value for the lifetime of the object. If that value changes, the old entry made into the hash based collection using that object can no longer be retrieved, so it causes a memory leak. Since the state of immutable objects cannot be changed, they make a great choice as keys in a hash based collection. So, if you are using an immutable object as the key for a hash based collection, you can be sure that there will not be any memory leak because of that (of course there can still be a memory leak when the object used as key is not referenced from anywhere else, but that's not the point here).

I'm going to attack this from a different perspective. I find immutable objects make life easier for me when reading code.
If I have a mutable object I am never sure what its value is if it's ever used outside of my immediate scope. Let's say I create MyMutableObject in a method's local variables, fill it out with values, then pass it to five other methods. ANY ONE of those methods can change my object's state, so one of two things has to occur:
I have to keep track of the bodies of five additional methods while thinking about my code's logic.
I have to make five wasteful defensive copies of my object to ensure that the right values get passed to each method.
The first makes reasoning about my code difficult. The second makes my code suck in performance -- I'm basically mimicking an immutable object with copy-on-write semantics anyway, but doing it all the time whether or not the called methods actually modify my object's state.
If I instead use MyImmutableObject, I can be assured that what I set is what the values will be for the life of my method. There's no "spooky action at a distance" that will change it out from under me and there's no need for me to make defensive copies of my object before invoking the five other methods. If the other methods want to change things for their purposes they have to make the copy – but they only do this if they really have to make a copy (as opposed to my doing it before each and every external method call). I spare myself the mental resources of keeping track of methods which may not even be in my current source file, and I spare the system the overhead of endlessly making unnecessary defensive copies just in case.
(If I go outside of the Java world and into, say, the C++ world, among others, I can get even trickier. I can make the objects appear as if they're mutable, but behind the scenes make them transparently clone on any kind of state change—that's copy-on-write—with nobody being the wiser.)

Immutable objects are instances whose state does not change once initialized.
The use of such objects is requirement specific.
An immutable class is good for caching purposes and it is thread safe.

By virtue of immutability you can be sure that the behavior/state of the underlying immutable object does not change. With that you get the added advantage of performing additional operations:
You can use multiple cores (concurrent/parallel processing) with ease, as the sequence of operations no longer matters.
You can cache the results of expensive operations, as you are sure of getting the same result.
You can debug with ease, as the history of the run is no longer a concern.

Using the final keyword doesn't necessarily make something immutable:
import java.lang.reflect.Field;

public class Scratchpad {
    public static void main(String[] args) throws Exception {
        SomeData sd = new SomeData("foo");
        System.out.println(sd.data); // prints "foo"
        voodoo(sd, "data", "bar");
        System.out.println(sd.data); // prints "bar"
    }

    // Overwrites the named instance field of obj via reflection, even if it is final.
    private static void voodoo(Object obj, String fieldName, Object value) throws Exception {
        Field f = obj.getClass().getDeclaredField(fieldName);
        // For a non-static final field, setAccessible(true) is enough for Field.set to succeed.
        // The older trick of clearing Modifier.FINAL through Field's private "modifiers"
        // field is not needed here and no longer works on JDK 12 and later.
        f.setAccessible(true);
        f.set(obj, value);
    }
}

class SomeData {
    final String data;

    SomeData(String data) {
        this.data = data;
    }
}
Just an example to demonstrate that the "final" keyword is there to prevent programmer error, and not much more. Whereas reassigning a value lacking a final keyword can easily happen by accident, going to this length to change a value would have to be done intentionally. It's there for documentation and to prevent programmer error.
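A more everyday illustration of the same point needs no reflection: final fixes the reference, not the object it points to. The sketch below uses a hypothetical helper (freeze) that builds a read-only snapshot with Collections.unmodifiableList.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class FinalVersusImmutable {
    // Returns a read-only snapshot: the copy cuts the link to the source list,
    // and the unmodifiable wrapper rejects every mutating call.
    static List<String> freeze(List<String> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        final List<String> names = new ArrayList<>();
        names.add("foo");                 // allowed: final does not freeze the list
        // names = new ArrayList<>();     // would not compile: the reference is final

        List<String> frozen = freeze(names);
        names.add("bar");                 // does not affect the snapshot
        System.out.println(frozen);       // prints "[foo]"
        try {
            frozen.add("baz");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only"); // prints "read-only"
        }
    }
}
```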

Immutable data structures can also help when coding recursive algorithms. For example, say that you're trying to solve a 3SAT problem. One way is to do the following:
Pick an unassigned variable.
Give it the value of TRUE. Simplify the instance by taking out clauses that are now satisfied, and recur to solve the simpler instance.
If the recursion on the TRUE case failed, then assign that variable FALSE instead. Simplify this new instance, and recur to solve it.
If you have a mutable structure to represent the problem, then when you simplify the instance in the TRUE branch, you'll either have to:
Keep track of all changes you make, and undo them all once you realize the problem can't be solved. This has large overhead because your recursion can go pretty deep, and it's tricky to code.
Make a copy of the instance, and then modify the copy. This will be slow because if your recursion is a few dozen levels deep, you'll have to make many many copies of the instance.
However, if you code it in a clever way, you can have an immutable structure, where any operation returns an updated (but still immutable) version of the problem (similar to String.replace: it doesn't modify the string, it just gives you a new one). The naive way to implement this is to have the "immutable" structure copy itself on every modification, which reduces it to the second option above with all of its overhead, but you can do it in a more efficient way.
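One common way to get the efficient version is structural sharing. The sketch below is an assumption about what "more efficient" could look like, not the answer's actual 3SAT representation: a hypothetical PersistentList where each "update" allocates one node and reuses the rest, so both branches of the recursion share the same base and backtracking is just dropping a reference.

```java
// Hypothetical persistent (immutable) singly linked list with structural sharing.
final class PersistentList<T> {
    private static final PersistentList<?> EMPTY = new PersistentList<Object>(null, null);

    private final T head;
    private final PersistentList<T> tail; // null only for the empty list

    private PersistentList(T head, PersistentList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    @SuppressWarnings("unchecked")
    static <T> PersistentList<T> empty() {
        return (PersistentList<T>) EMPTY; // safe: the empty list holds no elements
    }

    // O(1): allocates a single node that points at this list; nothing is copied.
    PersistentList<T> prepend(T value) {
        return new PersistentList<>(value, this);
    }

    boolean isEmpty() { return tail == null; }
    T head() { return head; }
    PersistentList<T> tail() { return tail; }

    int size() {
        int n = 0;
        for (PersistentList<T> p = this; !p.isEmpty(); p = p.tail) {
            n++;
        }
        return n;
    }
}
```

For example, the TRUE and FALSE branches can each call prepend on the same list of assignments made so far; neither branch can disturb the other, and no undo log is needed.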

One of the reasons for the "need" for immutable classes is the combination of passing everything by reference and having no support for read-only views of an object (i.e. C++'s const).
Consider the simple case of a class having support for the observer pattern:
class Person {
public string getName() { ... }
public void registerForNameChange(NameChangedObserver o) { ... }
}
If string were not immutable, it would be impossible for the Person class to implement registerForNameChange() correctly, because someone could write the following, effectively modifying the person's name without triggering any notification.
void foo(Person p) {
p.getName().prepend("Mr. ");
}
In C++, getName() returning a const std::string& has the effect of returning by reference and preventing access to mutators, meaning immutable classes are not necessary in that context.
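The bypass can be reproduced in real Java by letting StringBuilder stand in for the hypothetical mutable string. LeakyPerson and the use of Runnable as the observer type are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: StringBuilder plays the role of a mutable string type.
final class LeakyPerson {
    private final StringBuilder name;
    private final List<Runnable> observers = new ArrayList<>();

    LeakyPerson(String name) {
        this.name = new StringBuilder(name);
    }

    // Leaks the mutable internal state to the caller.
    StringBuilder getName() { return name; }

    // The only path that notifies observers.
    void setName(String newName) {
        name.setLength(0);
        name.append(newName);
        observers.forEach(Runnable::run);
    }

    void registerForNameChange(Runnable observer) {
        observers.add(observer);
    }
}
```

A caller can write p.getName().insert(0, "Mr. ") and the name changes without any observer being told. If getName() returned an immutable String, the only way to change the name would be setName, which does notify.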

They also give us a guarantee. The guarantee of immutability means that we can expand on them and create new patterns for efficiency that are otherwise not possible.
http://en.wikipedia.org/wiki/Singleton_pattern

One feature of immutable classes which hasn't yet been called out: storing a reference to a deeply-immutable class object is an efficient means of storing all of the state contained therein. Suppose I have a mutable object which uses a deeply-immutable object to hold 50K worth of state information. Suppose, further, that on 25 occasions I wish to make a "copy" of my original (mutable) object (e.g. for an "undo" buffer); the state could change between copy operations, but usually doesn't. Making a "copy" of the mutable object would simply require copying a reference to its immutable state, so 25 copies would simply amount to 25 references. By contrast, if the state were held in 50K worth of mutable objects, each of the 25 copy operations would have to produce its own copy of 50K worth of data; holding all 25 copies would require holding over a meg worth of mostly-duplicated data. Even though the first copy operation would produce a copy of the data that will never change, and the other 24 operations could in theory simply refer back to that, in most implementations there would be no way for the second object asking for a copy of the information to know that an immutable copy already exists(*).
(*) One pattern that can sometimes be useful is for mutable objects to have two fields to hold their state--one in mutable form and one in immutable form. Objects can be copied as mutable or immutable, and would begin life with one or the other reference set. As soon as the object wants to change its state, it copies the immutable reference to the mutable one (if it hasn't been done already) and invalidates the immutable one. When the object is copied as immutable, if its immutable reference isn't set, an immutable copy will be created and the immutable reference pointed to that. This approach will require a few more copy operations than would a "full-fledged copy on write" (e.g. asking to copy an object which has been mutated since the last copy would require a copy operation, even if the original object is never again mutated) but it avoids the threading complexities that FFCOW would entail.
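The basic reference-copy idea from the first paragraph (not the two-field variant in the footnote) might look like the sketch below. DocumentState and UndoableEditor are hypothetical names; the point is that a snapshot stores one reference, however large the immutable state is.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical deeply immutable state object.
final class DocumentState {
    private final String text;

    DocumentState(String text) { this.text = text; }

    String text() { return text; }
}

// Mutable editor whose entire state lives behind one immutable reference.
final class UndoableEditor {
    private DocumentState state = new DocumentState("");
    private final Deque<DocumentState> history = new ArrayDeque<>();

    // Taking a snapshot copies a single reference, never the text itself.
    void snapshot() { history.push(state); }

    // A change builds a new state object; earlier snapshots are unaffected.
    void type(String more) { state = new DocumentState(state.text() + more); }

    // Restores the most recent snapshot, if there is one.
    void undo() {
        if (!history.isEmpty()) {
            state = history.pop();
        }
    }

    DocumentState state() { return state; }
}
```

Two snapshots taken without an intervening change refer to the very same DocumentState instance, so they cost two references rather than two copies.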

Why Immutable class?
Once an object is instantiated, its state cannot be changed during its lifetime, which also makes it thread safe.
Examples :
Obviously String, Integer, BigDecimal, etc. Once these values are created, they cannot be changed during their lifetime.
Use-case :
Once a database connection object is created with its configuration values, you might not need to change its state, so you can use an immutable class there.

From Effective Java:
An immutable class is simply a class whose instances cannot be modified. All of the information contained in each instance is provided when it is created and is fixed for the lifetime of the object. The Java platform libraries contain many immutable classes, including String, the boxed primitive classes, and BigInteger and BigDecimal. There are many good reasons for this: Immutable classes are easier to design, implement and use than mutable classes. They are less prone to error and are more secure.

An immutable class is good for caching purposes because you don't have to worry about the value changes. Another benefit of an immutable class is that it is inherently thread-safe, so you don't have to worry about thread safety in case of a multi-threaded environment.

Related

Is chaining method calls to constructors a good practice in Java?

I have always thought that I have to instantiate a class before I can call its non-static methods; however, I came across a solution which had a method like this in it:
public String someStringMethod(){
return new MyClass().toString();
}
So I may be new in development, but is it a good practice? Is this a better way to call a method than the "classic" (see below) way?
public String classicStringMethod(){
MyClass cl = new MyClass();
return cl.toString();
}
Do they have any performance difference? Does the first way have a "special name"?
No significant difference
As the comments explained, both approaches are semantically the same; both ways achieve the exact same result and the choice is really just a stylistic difference.
The second approach assigns the new object to a reference variable. The first approach skips the use of a reference variable. But in both cases the class was used as a definition for instantiating an object, and then the toString method was called on that object.
Semantically, the first (chained/fluent) syntax usually informs you that the created object will be used only for a single chain of operations, and discarded afterwards. Since there's no explicit reference exported, it also signals that the lifetime of the object is limited to that very statement. The second (explicit) one hints that the object is/was/will be used for additional operations, be it other method calls, setting a field to it, returning it, or even just debugging. Still, the general notion of using (or not) temporary helper variables is just a stylistic one.
Keep in mind that the variable is not the object. For example, the line Dog hershey = new Dog( "Australian Shepard" , "red", "Hershey" ); uses two chunks of memory. In one chunk is the new object, holding the state data for the breed and color and name. In the other separate chunk is the reference variable hershey holding a pointer to the memory location of the memory chunk of the Dog object. The reference variable lets us later refer to the object.
Java syntax makes this jump from reference variable to object so seamlessly that we usually think of hershey as the Dog object “Hershey”, but in fact they are separate and distinct.
As for performance, any difference would be insignificant. Indeed, the compiler or JVM may well collapse the second approach’s two lines into the first approach’s single line. I don't know for sure, and I don't really care. Neither should you. Our job is to write clear readable code. The job of the compiler and JVM is to run that code reliably, efficiently, and fast. Attempting micro-optimizations has been shown many times to be futile (or even counter-productive) as the JVM implementations are extremely sophisticated pieces of software engineering, highly-tuned for making such optimizations. You can best assist the compilers and JVMs by writing simple straightforward code without “cleverness”.
Note that the second approach can make debugging easier, because your debugger can inspect the instantiated object by accessing the object via the reference variable, and because you can set a line breakpoint on that particular constructor call explicitly.

Why Wrapper class like Boolean in java is immutable?

I can't see the reason why the Boolean wrapper classes were made Immutable.
Why was the Boolean wrapper not implemented like MutableBoolean in Commons Lang, which actually can be reset?
Does anyone have any idea/understanding about this? Thanks.
Because 2 is 2. It won't be 3 tomorrow.
Immutable is always preferred as the default, especially in multithreaded situations, and it makes for easier to read and more maintainable code. Case in point: the Java Date API, which is riddled with design flaws. If Date were immutable the API would be very streamlined. I would know Date operations would create new dates and would never have to look for APIs that modify them.
Read Java Concurrency in Practice to understand the true importance of immutable types.
But also note that if for some reason you want mutable types, use AtomicInteger, AtomicBoolean, etc. Why Atomic? Because by introducing mutability you introduced a need for thread safety, which you wouldn't have needed if your types stayed immutable. So in using mutable types you also must pay the price of thinking about thread safety and using types from the concurrent package. Welcome to the wonderful world of concurrent programming.
Also, for Boolean - I challenge you to name a single operation that you might want to perform that cares whether Boolean is mutable. set to true? Use myBool = true. That is a re-assignment, not a mutation. Negate? myBool = !myBool. Same rule. Note that immutability is a feature, not a constraint, so if you can offer it, you should - and in these cases, of course you can.
Note this applies to other types as well. The most subtle thing with integers is count++, but that is just count = count + 1, unless you care about getting the value atomically... in which case use the mutable AtomicInteger.
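A short sketch of the distinction, using hypothetical method names: incrementing an Integer rebinds a variable to a different object, while AtomicInteger is mutated in place and the change is visible to every holder of the reference.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CounterDemo {
    // count++ on an Integer is shorthand for count = count + 1: the local
    // variable is rebound to another Integer and the caller's value is untouched.
    static Integer incremented(Integer count) {
        count++;
        return count;
    }

    // AtomicInteger is mutable: the caller observes the new value.
    static void incrementInPlace(AtomicInteger count) {
        count.incrementAndGet();
    }

    public static void main(String[] args) {
        Integer boxed = 5;
        Integer result = incremented(boxed);
        System.out.println(boxed + " " + result); // prints "5 6"

        AtomicInteger atomic = new AtomicInteger(5);
        incrementInPlace(atomic);
        System.out.println(atomic.get());         // prints "6"
    }
}
```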
Wrapper classes in Java are immutable so the runtime needs only two Boolean objects - Boolean.TRUE and Boolean.FALSE - and every value produced by autoboxing or Boolean.valueOf is a reference to one of those two. And since they can never be changed, you know they'll never be pulled out from under you. Not only does this save memory, it makes your code easier to reason about: since you know the wrapper objects you're passing around will never have their value change, they won't suddenly jump to a new value because they happen to be referenced from somewhere else as well.
Similarly, Integer has a cache of all signed byte values - -128 to 127 - so the runtime doesn't have to have extra instances of those common Integer values.
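The sharing can be observed with reference comparisons, shown here purely for demonstration (ordinary code should compare wrappers with equals). Only the range -128 to 127 is checked, since identity outside that range is not guaranteed.

```java
final class WrapperCacheDemo {
    // Boolean.valueOf returns one of the two shared instances.
    static boolean booleanIsShared() {
        return Boolean.valueOf(true) == Boolean.TRUE;
    }

    // Autoboxing a value in -128..127 yields the same cached Integer each time.
    static boolean smallIntegerIsShared() {
        Integer a = 127;
        Integer b = 127;
        return a == b; // reference comparison, for demonstration only
    }

    public static void main(String[] args) {
        System.out.println(booleanIsShared());      // prints "true"
        System.out.println(smallIntegerIsShared()); // prints "true"
    }
}
```

Handing out the same instance to every caller is only safe because no caller can modify it.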
Patashu is the closest. Many of the goofy design choices in Java were because of the limitations of how they implemented a VM. I think originally they tried to make a VM for C or C++ but it was too hard (impossible?) so made this other, similar language. Write once, run anywhere!
Any computer sciency justification like those other dudes spout is just after-the-fact folderol. As you now know, Java and C# are evolving to be as powerful as C. Sure, they were cleaner. Ought to be for languages designed decade(s) later!
Simple trick is to make a "holder" class. Or use a closure nowadays! Maybe Java is evolving into JavaScript. LOL.
Boolean or any other wrapper class is immutable in Java. Since wrapper classes are used as variables for storing simple data, they should be safe, and data integrity must be maintained to avoid inconsistent or unwanted results. Also, immutability saves lots of memory by avoiding duplicate objects. More can be found in the article Why Strings & Wrapper classes are designed immutable in java?

C++ STL datastructures compared to Java

I'm currently learning C++ and trying to get used to the standard data structures that come with it, but they all seem very bare. For example, list doesn't have simple accessors like get(index) that I'm used to in Java. Methods like pop_back and pop_front don't return the object in the list either. So you have to do something like:
Object blah = myList.back();
myList.pop_back();
Instead of something simple like:
Object blah = myList.pop_back();
In Java, just about every data structure returns the object back so you don't have to make these extra calls. Why are the STL containers for C++ designed like this? Are common operations like this that I do in Java not so common for C++?
edit: Sorry, I guess my question was worded very poorly to get all these downvotes, but surely somebody could have edited it. To clarify, I'm wondering why the STL data structures are created like this in comparison to Java. Or am I using the wrong set of data structures to begin with? My point is that these seem like common operations you might use on (in my example) a list and surely everybody does not want to write their own implementation each time.
edit: reworded the question to be more clear.
Quite a few have already answered the specific points you raised, so I'll try to take a look for a second at the larger picture.
One of the most fundamental differences between Java and C++ is that C++ works primarily with values, while Java works primarily with references.
For example, if I have something like:
class X {
// ...
};
// ...
X x;
In Java, x is only a reference to an object of type X. To have an actual object of type X for it to refer to, I normally have something like: X x = new X();. In C++, however, X x;, by itself, defines an object of type X, not just a reference to an object. We can use that object directly, not via a reference (i.e., a pointer in disguise).
Although this may initially seem like a fairly trivial difference, the effects are substantial and pervasive. One effect (probably the most important in this case) is that in Java, returning a value does not involve copying the object itself at all. It just involves copying a reference to the value. This is normally presumed to be extremely inexpensive and (probably more importantly) completely safe -- it can never throw an exception.
In C++, you're dealing directly with values instead. When you return an object, you're not just returning a reference to the existing object, you're returning that object's value, usually in the form of a copy of that object's state. Of course, it's also possible to return a reference (or pointer) if you want, but to make that happen, you have to make it explicit.
The standard containers are (if anything) even more heavily oriented toward working with values rather than references. When you add a value to a collection, what gets added is a copy of the value you passed, and when you get something back out, you get a copy of the value that was in the container itself.
Among other things, this means that while returning a value might be cheap and safe just like in Java, it can also be expensive and/or throw an exception. If the programmer wants to store pointers, s/he can certainly do so -- but the language doesn't require it like Java does. Since returning an object can be expensive and/or throw, the containers in the standard library are generally built around ensuring they can work reasonably well if copying is expensive, and (more importantly) work correctly, even when/if copying throws an exception.
This basic difference in design accounts not only for the differences you've pointed out, but quite a few more as well.
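For contrast, here is the Java side of that comparison as a small sketch (PopDemo is a hypothetical name). Removing an element from a Java collection hands back the same reference that was stored, so no element is copied and no copy constructor can throw, which is why a combined remove-and-return is cheap there.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class PopDemo {
    // Removes and returns the last element. Only a reference is handed back;
    // the element object itself is neither copied nor destroyed.
    static StringBuilder popLast(Deque<StringBuilder> stack) {
        return stack.removeLast();
    }

    public static void main(String[] args) {
        Deque<StringBuilder> stack = new ArrayDeque<>();
        StringBuilder payload = new StringBuilder("payload");
        stack.addLast(payload);

        StringBuilder popped = popLast(stack);
        System.out.println(popped == payload); // prints "true"
        System.out.println(stack.isEmpty());   // prints "true"
    }
}
```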
back() returns a reference to the final element of the vector, which makes it nearly free to call. pop_back() calls the destructor of the final element of the vector.
So clearly pop_back() cannot return a reference to an element that is destroyed. So for your syntax to work, pop_back() would have to return a copy of the element before it is destroyed.
Now, in the case where you do not want that copy, we just needlessly made a copy.
The goal of C++ standard containers is to give you nearly bare-metal performance wrapped up in nice, easy to use dressing. For the most part, they do NOT sacrifice performance for ease of use -- and a pop_back() that returned a copy of the last element would be sacrificing performance for ease of use.
There could be a pop-and-get-back method, but it would duplicate other functionality. And it would be less efficient in many cases than back-and-pop.
As a concrete example,
vector<foo> vec; // with some data in it
foo f = std::move( vec.back() ); // tells the compiler that the copy in vec is going away
vec.pop_back(); // removes the last element
Note that the move had to be done before the element was destroyed to avoid creating an extra temporary copy. A pop_back_and_get_value() would have to destroy the element before it returned, and the assignment would happen after it returned, which is wasteful.
A list doesn't have a get(index) method because accessing a linked list by index is very inefficient. The STL has a philosophy of only providing methods that can be implemented somewhat efficiently. If you want to access a list by index in spite of the inefficiency, it's easy to implement yourself.
The reason that pop_back doesn't return a copy is that the copy constructor of the return value will be called after the function returns (excluding RVO/NRVO). If this copy constructor throws an exception, you have removed the item from the list without properly returning a copy. This means that the method would not be exception-safe. By separating the two operations, the STL encourages programming in an exception-safe manner.
Why are the STL containers for C++ designed like this?
I think Bjarne Stroustrup put it best:
C++ is lean and mean. The underlying principle is that you don't pay for what you don't use.
In the case of a pop() method that would return the item, consider that in order to both remove the item and return it, that item could not be returned by reference. The referent no longer exists because it was just pop()ed. It could be returned by pointer, but only if you make a new copy of the original, and that's wasteful. So it would most likely be returned by value which has the potential to make a deep copy. In many cases it won't make a deep copy (through copy elision), and in other cases that deep copy would be trivial. But in some cases, such as large buffers, that copy could be extremely expensive and in a few, such as resource locks, it might even be impossible.
C++ is intended to be general-purpose, and it is intended to be as fast as possible. General-purpose doesn't necessarily mean "easy to use for simple use cases" but "an appropriate platform for the widest range of applications."
list doesn't even have simple accessors like get(index)
Why should it? A method that lets you access the n-th element from the list would hide the complexity of O(n) of the operation, and that's the reason C++ doesn't offer it. For the same reason, C++'s std::vector doesn't offer a pop_front() function, since that one would also be O(N) in the size of the vector.
Methods like pop_back and pop_front don't return the object in the list either.
The reason is exception safety. Also, since C++ has free functions, it's not hard to write such an extension to the operations of std::list or any standard container.
template<class Cont>
typename Cont::value_type return_pop_back(Cont& c){
    typename Cont::value_type v = c.back();
    c.pop_back();
    return v;
}
It should be noted, though, that the above function is not exception-safe, meaning if the return v; throws, you'll have a changed container and a lost object.
Concerning pop()-like functions, there are two things (at least) to consider:
1) There is no clear and safe action for a returning pop_back() or pop_front() for cases when there is nothing there to return.
2) These functions would return by value. If there were an exception thrown in the copy constructor of the type stored in the container, the item would be removed from the container and lost. I guess this was deemed to be undesirable and unsafe.
Concerning access to list, it is a general design principle of the standard library to avoid providing inefficient operations. std::list is a doubly-linked list, and accessing a list element by index means traversing the list from the beginning or end until you get to the desired position. If you want to do this, you can provide your own helper function. But if you need random access to elements, then you should probably use a structure other than a list.
In Java a pop of a general interface can return a reference to the object popped.
In C++ the corresponding thing is to return by value.
But in the case of non-movable non-POD objects the copy construction might throw an exception. Then, an element would have been removed and yet not have been made accessible to the client code. A convenience return-by-value popper can always be defined in terms of more basic inspector and pure popper, but not vice versa.
This is also a difference in philosophy.
With C++ the standard library only provides basic building blocks, not directly usable functionality (in general). The idea is that you're free to choose from thousands of third party libraries, but that freedom of choice comes at a great cost, in usability, portability, training, etc. In contrast, with Java you have mostly all you need (for typical Java programming) in the standard library, but you're not effectively free to choose (which is another kind of cost).
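To make the Java side of this comparison concrete, here is a minimal sketch showing that a Java pop hands back a reference to the very object that was stored, so nothing is copied and no copy constructor can throw. The class and method names are made up for illustration; only ArrayDeque and its addLast/removeLast methods come from the JDK.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo class; the name is an assumption for illustration only.
final class PopReturnsReference {

    // Stores one object, pops it, and reports whether the popped reference
    // is the same object that was stored (i.e. no copy was made).
    static boolean popYieldsSameObject() {
        Deque<StringBuilder> stack = new ArrayDeque<>();
        StringBuilder original = new StringBuilder("payload");
        stack.addLast(original);

        // removeLast() removes the element and returns it in a single step.
        StringBuilder popped = stack.removeLast();
        return popped == original && stack.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(popYieldsSameObject());
    }
}
```

Because only a reference is returned, the "element removed but the copy failed" problem described for C++ cannot arise here.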

String vs. A new Data class

I overheard two of my colleagues arguing about whether or not to create a new data model class which only contains one string field and a setter and a getter for it. A program will then create a few objects of the class and put them in an array list. The guy who is storing them argues that there should be a new type, while the guy who is getting the data said there is no point going through all this trouble when you can simply store a string.
Personally I prefer creating a new type so we know what's being stored in the array list, but I don't have strong arguments to persuade the 'getting' data guy. Do you?
Sarah
... a new data model class which only contains one string field and a setter and a getter for it.
If it was just a getter, then it is not possible to say in general whether a String or a custom class is better. It depends on things like:
consistency with the rest of your data model,
anticipating whether you might want to change the representation,
anticipating whether you might want to implement validation when creating an instance, add helper methods, etc,
implications for memory usage or persistence (if they are even relevant).
(Personally, I would be inclined to use a plain String by default, and only use a custom class if for example, I knew that it was likely that a future representation change / refinement would be needed. In most situations, it is not a huge problem to change a String into custom class later ... if the need arises.)
However, the fact that there is proposed to be a setter for the field changes things significantly. Instances of the class will be mutable, where instances of String are not. On the one hand this could possibly be useful; e.g. where you actually need mutability. On the other hand, mutability would make the class somewhat risky for use in certain contexts; e.g. in sets and as keys in maps. And in other contexts you may need to copy the instances. (This would be unnecessary for an immutable wrapper class or a bare String.)
(The simple answer is to get rid of the setter, unless you really need it.)
There is also the issue that the semantics of equals will be different for a String and a custom wrapper. You may therefore need to override equals and hashCode to get a more intuitive semantic in the custom wrapper case. (And that relates back to the issue of a setter, and use of the class in collections.)
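As a rough sketch of the risk with a setter plus equals/hashCode, the following shows a mutable wrapper getting "lost" inside a HashSet after it is modified. The Label class is a hypothetical stand-in for the proposed data class, not anything from the question.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical demo; Label stands in for the proposed one-field data class.
final class MutableKeyDemo {

    static final class Label {
        private String text;

        Label(String text) { this.text = text; }

        String getText() { return text; }

        void setText(String text) { this.text = text; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Label && Objects.equals(text, ((Label) o).text);
        }

        @Override
        public int hashCode() { return Objects.hashCode(text); }
    }

    // Mutates an element after it was added, then asks the set for it.
    static boolean foundAfterMutation() {
        Set<Label> labels = new HashSet<>();
        Label label = new Label("alpha");
        labels.add(label);
        label.setText("beta"); // hash code changes while the object sits in the set
        return labels.contains(label);
    }

    // Same lookup without any mutation, for comparison.
    static boolean foundWithoutMutation() {
        Set<Label> labels = new HashSet<>();
        Label label = new Label("alpha");
        labels.add(label);
        return labels.contains(label);
    }

    public static void main(String[] args) {
        System.out.println(foundWithoutMutation()); // true
        System.out.println(foundAfterMutation());   // false
    }
}
```

Dropping the setter (or not overriding equals/hashCode on mutable state) avoids the problem.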
Wrap it in a class, if it matches the rest of your data model's design.
It gives you a label for the string so that you can tell what it represents at run time.
It makes it easier to take your entity and add additional fields and behavior (which can be a likely occurrence).
That said, the key is if it matches the rest of your data model's design... be consistent with what you already have.
Counterpoint to mschaef's answer:
Keep it as a string, if it matches the rest of your data model's design. (See how the opening sounds so important, even if I temper it with a sentence that basically says we don't know the answer?)
If you need a label saying what it is, add a comment. Cost = one line, total. Heck, for that matter, you need a line (or three) to comment your new class, anyway, so what's the class declaration for?
If you need to add additional fields later, you can refactor it then. You can't design for everything, and if you tried, you'd end up with a horrible mess.
As Yegge says, "the worst thing that can happen to a code base is size". Add a class declaration, a getter, a setter, now call those from everywhere that touches it, and you've added size to your code without an actual (i.e., non-hypothetical) purpose.
I disagree with the other answers:
It depends whether there's any real possibility of adding behavior to the type later [Matthew Flaschen]
No, it doesn’t. …
Never hurts to future-proof the design [Alex]
True, but not relevant here …
Personally, I would be inclined to use a plain String by default [Stephen C]
But this isn’t a matter of opinion. It’s a matter of design decisions:
Is the entity you store logically a string, a piece of text? If yes, then store a string (ignoring the setter issue).
If not – then do not store a string. That data may be stored as a string is an implementation detail, it should not be reflected in your code.
For the second point it’s irrelevant whether you might want to add behaviour later on. All that matters is that in a strongly typed language, the data type should describe the logical entity. If you handle things that are not text (but may be represented by text, may contain text …) then use a class that internally stores said text. Do not store the text directly.
This is the whole point of abstraction and strong typing: let the types represent the semantics of your code.
And finally:
As Yegge says, "the worst thing that can happen to a code base is size". [Ken]
Well, this is so ironic. Have you read any of Steve Yegge’s blog posts? I haven’t, they’re just too damn long.
It depends whether there's any real possibility of adding behavior to the type later. Even if the getters and setters are trivial now, a type makes sense if there is a real chance they could do something later. Otherwise, clear variable names should be sufficient.
In the time spent discussing whether to wrap it in a class, it could be wrapped and done with. Never hurts to future-proof the design, especially when it only takes minimal effort.
I see no reason why the String should be wrapped in a class. The premise of the discussion is that what is needed right now is a String object. If it gets augmented later, refactor it then. Why add unnecessary code in the name of future-proofing?
Wrapping it in a class provides you with more type safety - in your model you can then only use instances of the wrapper class, and you can't easily make a mistake where you put a string that contains something different into the model.
However, it does add overhead, extra complexity and verbosity to your code.
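For what it's worth, the wrapper option might look roughly like the sketch below: an immutable class (no setter) that validates on construction and defines value-based equality. ProductCode and its "must not be empty" rule are assumptions made up for the example; the question does not say what the string represents.

```java
import java.util.Objects;

// Hypothetical value type; the name and the validation rule are assumptions.
final class ProductCode {
    private final String value;

    ProductCode(String value) {
        Objects.requireNonNull(value, "value");
        if (value.isEmpty()) {
            throw new IllegalArgumentException("product code must not be empty");
        }
        this.value = value;
    }

    String value() { return value; }

    // Value semantics: two codes with the same text are equal.
    @Override
    public boolean equals(Object o) {
        return o instanceof ProductCode && value.equals(((ProductCode) o).value);
    }

    @Override
    public int hashCode() { return value.hashCode(); }

    @Override
    public String toString() { return "ProductCode(" + value + ")"; }
}
```

A List<ProductCode> then refuses an arbitrary String at compile time, which is the type-safety gain; the cost is the extra twenty-odd lines shown here.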

Why can't strings be mutable in Java and .NET?

Why is it that they decided to make String immutable in Java and .NET (and some other languages)? Why didn't they make it mutable?
According to Effective Java, chapter 4, page 73, 2nd edition:
"There are many good reasons for this: Immutable classes are easier to design, implement, and use than mutable classes. They are less prone to error and are more secure."
[...]
"Immutable objects are simple. An immutable object can be in exactly one state, the state in which it was created. If you make sure that all constructors establish class invariants, then it is guaranteed that these invariants will remain true for all time, with no effort on your part."
[...]
"Immutable objects are inherently thread-safe; they require no synchronization. They cannot be corrupted by multiple threads accessing them concurrently. This is far and away the easiest approach to achieving thread safety. In fact, no thread can ever observe any effect of another thread on an immutable object. Therefore, immutable objects can be shared freely."
[...]
Other small points from the same chapter:
Not only can you share immutable objects, but you can share their internals.
[...]
Immutable objects make great building blocks for other objects, whether mutable or immutable.
[...]
The only real disadvantage of immutable classes is that they require a separate object for each distinct value.
There are at least two reasons.
First - security http://www.javafaq.nu/java-article1060.html
The main reason why String made immutable was security. Look at this example: We have a file open method with login check. We pass a String to this method to process authentication which is necessary before the call will be passed to OS. If String was mutable it was possible somehow to modify its content after the authentication check before OS gets request from program then it is possible to request any file. So if you have a right to open text file in user directory but then on the fly when somehow you manage to change the file name you can request to open "passwd" file or any other. Then a file can be modified and it will be possible to login directly to OS.
Second - Memory efficiency http://hikrish.blogspot.com/2006/07/why-string-class-is-immutable.html
The JVM internally maintains the "String Pool". To achieve memory efficiency, the JVM will refer to the String object from the pool. It will not create new String objects. So, whenever you create a new string literal, the JVM will check in the pool whether it already exists or not. If it is already present in the pool, the JVM just gives the reference to the same object; otherwise it creates the new object in the pool. There will be many references pointing to the same String objects; if someone changes the value, it will affect all the references. So, Sun decided to make it immutable.
Actually, the reasons strings are immutable in Java don't have much to do with security. The two main reasons are the following:
Thread Safety:
Strings are an extremely widely used type of object. They are therefore more or less guaranteed to be used in a multi-threaded environment. Strings are immutable to make sure that it is safe to share strings among threads. Having immutable strings ensures that when passing strings from thread A to another thread B, thread B cannot unexpectedly modify thread A's string.
Not only does this help simplify the already pretty complicated task of multi-threaded programming, but it also helps with performance of multi-threaded applications. Access to mutable objects must somehow be synchronized when they can be accessed from multiple threads, to make sure that one thread doesn't attempt to read the value of your object while it is being modified by another thread. Proper synchronization is both hard to do correctly for the programmer, and expensive at runtime. Immutable objects cannot be modified and therefore do not need synchronization.
Performance:
While String interning has been mentioned, it only represents a small gain in memory efficiency for Java programs. Only string literals are interned automatically (other strings are pooled only if you call intern() on them). This means that only the strings which are the same in your source code will share the same String object. If your program dynamically creates strings that are the same, they will be represented by different objects.
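A small sketch of the interning behavior described above, using reference comparison (==) to see which strings are the same object. The class and method names are made up for the example; String.intern() is the real JDK method.

```java
// Hypothetical demo class; names are assumptions for illustration only.
final class InterningDemo {

    // Two identical literals in source code refer to one pooled object.
    static boolean literalsShareOneObject() {
        String a = "dog";
        String b = "dog";
        return a == b;
    }

    // A string built at run time is a separate object with equal contents.
    static boolean runtimeStringIsDistinct() {
        String literal = "dog";
        String built = new String(new char[] {'d', 'o', 'g'});
        return built != literal && built.equals(literal);
    }

    // Calling intern() explicitly returns the pooled object.
    static boolean internReturnsPooledObject() {
        String literal = "dog";
        String built = new String(new char[] {'d', 'o', 'g'});
        return built.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject());
        System.out.println(runtimeStringIsDistinct());
        System.out.println(internReturnsPooledObject());
    }
}
```

All three methods return true, which matches the point that sharing happens for literals but not for strings created dynamically.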
More importantly, immutability allows strings to share their internal data. For many string operations, this means that the underlying array of characters does not need to be copied. For example, say you want to take the first five characters of a String. In Java, you would call myString.substring(0,5). In this case (in JDK versions before Java 7 update 6; later versions copy the characters instead), what the substring() method does is simply create a new String object that shares myString's underlying char[] but knows that it starts at index 0 and ends at index 5 of that char[]. To put this in graphical form, you would end up with the following:
 |                myString                 |
 v                                         v
"The quick brown fox jumps over the lazy dog"   <-- shared char[]
 ^   ^
 |   |  myString.substring(0,5)
This makes this kind of operation extremely cheap, and O(1), since the operation depends neither on the length of the original string nor on the length of the substring we need to extract. This behavior also has some memory benefits, since many strings can share their underlying char[].
Thread safety and performance. If a string cannot be modified, it is safe and quick to pass a reference around among multiple threads. If strings were mutable, you would always have to copy all of the bytes of the string to a new instance, or provide synchronization. A typical application will read a string 100 times for every time that string needs to be modified. See Wikipedia on immutability.
One should really ask, "why should X be mutable?" It's better to default to immutability, because of the benefits already mentioned by Princess Fluff. It should be an exception that something is mutable.
Unfortunately, most current programming languages default to mutability, but hopefully in the future the default will lean more toward immutability (see A Wish List for the Next Mainstream Programming Language).
Wow! I can't believe the misinformation here. Strings being immutable has nothing to do with security. If someone already has access to the objects in a running application (which would have to be assumed if you are trying to guard against someone 'hacking' a String in your app), there would certainly be plenty of other opportunities available for hacking.
It's quite a novel idea that the immutability of String is addressing threading issues. Hmmm ... I have an object that is being changed by two different threads. How do I resolve this? Synchronize access to the object? Naawww ... let's not let anyone change the object at all -- that'll fix all of our messy concurrency issues! In fact, let's make all objects immutable, and then we can remove the synchronized construct from the Java language.
The real reason (pointed out by others above) is memory optimization. It is quite common in any application for the same string literal to be used repeatedly. It is so common, in fact, that decades ago, many compilers made the optimization of storing only a single instance of a String literal. The drawback of this optimization is that runtime code that modifies a String literal introduces a problem because it is modifying the instance for all other code that shares it. For example, it would not be good for a function somewhere in an application to change the String literal "dog" to "cat". A printf("dog") would result in "cat" being written to stdout. For that reason, there needed to be a way of guarding against code that attempts to change String literals (i.e., make them immutable). Some compilers (with support from the OS) would accomplish this by placing String literals into a special read-only memory segment that would cause a memory fault if a write attempt was made.
In Java this is known as interning. The Java compiler here is just following a standard memory optimization done by compilers for decades. And to address the same issue of these String literals being modified at runtime, Java simply makes the String class immutable (i.e., gives you no setters that would allow you to change the String content). Strings would not have to be immutable if interning of String literals did not occur.
String is not a primitive type, yet you normally want to use it with value semantics, i.e. like a value.
A value is something you can trust won't change behind your back.
If you write: String str = someExpr();
You don't want it to change unless YOU do something with str.
String, as an Object, naturally has pointer semantics; to get value semantics as well, it needs to be immutable.
One factor is that, if Strings were mutable, objects storing Strings would have to be careful to store copies, lest their internal data change without notice. Given that Strings are a fairly primitive type like numbers, it is nice when one can treat them as if they were passed by value, even if they are passed by reference (which also helps to save on memory).
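Here is a minimal sketch of that copying concern, using StringBuilder as a stand-in for what a mutable string would behave like. Both holder classes are hypothetical and exist only for the comparison.

```java
// Hypothetical demo; StringBuilder plays the role of a "mutable string".
final class DefensiveCopyDemo {

    // Stores the caller's mutable object directly, without a defensive copy.
    static final class UnsafeHolder {
        private final StringBuilder name;

        UnsafeHolder(StringBuilder name) { this.name = name; }

        String name() { return name.toString(); }
    }

    // Stores an immutable String, so no copy is needed.
    static final class SafeHolder {
        private final String name;

        SafeHolder(String name) { this.name = name; }

        String name() { return name; }
    }

    // The caller keeps modifying its StringBuilder after handing it over.
    static String unsafeAfterCallerMutation() {
        StringBuilder sb = new StringBuilder("Ann");
        UnsafeHolder holder = new UnsafeHolder(sb);
        sb.append("a"); // silently changes the holder's internal data
        return holder.name();
    }

    // The caller "changes" its String, which really creates a new object.
    static String safeAfterCallerReassignment() {
        String s = "Ann";
        SafeHolder holder = new SafeHolder(s);
        s = s + "a"; // the holder still refers to the original "Ann"
        return holder.name();
    }

    public static void main(String[] args) {
        System.out.println(unsafeAfterCallerMutation());
        System.out.println(safeAfterCallerReassignment());
    }
}
```

With a mutable type, UnsafeHolder would have to copy in its constructor to stay correct; with String the reference can simply be stored.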
I know this is a bump, but...
Are they really immutable?
Consider the following.
// Requires compiling with unsafe code enabled.
public static unsafe void MutableReplaceIndex(string s, char c, int i)
{
    fixed (char* ptr = s)
    {
        *((char*)(ptr + i)) = c;
    }
}
...
string s = "abc";
MutableReplaceIndex(s, '1', 0);
MutableReplaceIndex(s, '2', 1);
MutableReplaceIndex(s, '3', 2);
Console.WriteLine(s); // Prints 123
You could even make it an extension method.
public static class Extensions
{
    public static unsafe void MutableReplaceIndex(this string s, char c, int i)
    {
        fixed (char* ptr = s)
        {
            *((char*)(ptr + i)) = c;
        }
    }
}
Which makes the following work
s.MutableReplaceIndex('1', 0);
s.MutableReplaceIndex('2', 1);
s.MutableReplaceIndex('3', 2);
Conclusion: They're in an immutable state which is known by the compiler. Of course, the above only applies to .NET strings, as Java doesn't have pointers. However, a string can be made entirely mutable using pointers in C#. This is not how pointers are intended to be used, it has no practical use, and it is not safe; it is, however, possible, thus bending the whole "immutable" rule. Normally you cannot modify a string at an index directly, and this is the only way. This could be prevented by disallowing pointers to strings or by making a copy when a string is pointed to, but neither is done, which makes strings in C# not entirely immutable.
For most purposes, a "string" is (used/treated as/thought of/assumed to be) a meaningful atomic unit, just like a number.
Asking why the individual characters of a string are not mutable is therefore like asking why the individual bits of an integer are not mutable.
You should know why. Just think about it.
I hate to say it, but unfortunately we're debating this because our language sucks, and we're trying to use a single word, string, to describe a complex, contextually situated concept or class of object.
We perform calculations and comparisons with "strings" similar to how we do with numbers. If strings (or integers) were mutable, we'd have to write special code to lock their values into immutable local forms in order to perform any kind of calculation reliably. Therefore, it is best to think of a string like a numeric identifier, but instead of being 16, 32, or 64 bits long, it could be hundreds of bits long.
When someone says "string", we all think of different things. Those who think of it simply as a set of characters, with no particular purpose in mind, will of course be appalled that someone just decided that they should not be able to manipulate those characters. But the "string" class isn't just an array of characters. It's a STRING, not a char[]. There are some basic assumptions about the concept we refer to as a "string", and it generally can be described as a meaningful, atomic unit of coded data, like a number. When people talk about "manipulating strings", perhaps they're really talking about manipulating characters to build strings, and a StringBuilder is great for that. Just think a bit about what the word "string" truly means.
Consider for a moment what it would be like if strings were mutable. The following API function could be tricked into returning information for a different user if the mutable username string is intentionally or unintentionally modified by another thread while this function is using it:
string GetPersonalInfo( string username, string password )
{
    string stored_password = DBQuery.GetPasswordFor( username );
    if (password == stored_password)
    {
        // another thread modifies the mutable 'username' string here
        return DBQuery.GetPersonalInfoFor( username );
    }
    return null; // authentication failed
}
Security isn't just about 'access control', it's also about 'safety' and 'guaranteeing correctness'. If a method can't be easily written and depended upon to perform a simple calculation or comparison reliably, then it's not safe to call it, but it would be safe to call into question the programming language itself.
Immutability is not so closely tied to security. For that, at least in .NET, you get the SecureString class.
Later edit: In Java you will find GuardedString, a similar implementation.
The decision to make string mutable in C++ causes a lot of problems; see this excellent article by Kevlin Henney about Mad COW Disease.
COW = Copy On Write.
It's a trade off. Strings go into the String pool and when you create multiple identical Strings they share the same memory. The designers figured this memory saving technique would work well for the common case, since programs tend to grind over the same strings a lot.
The downside is that concatenations make a lot of extra Strings that are only transitional and just become garbage, actually harming memory performance. You have StringBuffer and StringBuilder in Java (StringBuilder also exists in .NET) to preserve memory in these cases.
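As a rough illustration of that trade-off, the two methods below produce the same result, but the first allocates a fresh intermediate String on every loop iteration while the second appends into one buffer. The class and method names are invented for the example.

```java
// Hypothetical demo class; names are assumptions for illustration only.
final class ConcatDemo {

    // Each "result + part" creates a new String; the old one becomes garbage.
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String part : parts) {
            result = result + part;
        }
        return result;
    }

    // A single StringBuilder accumulates the characters; one String is
    // created at the end by toString().
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(joinWithConcat(parts));
        System.out.println(joinWithBuilder(parts));
    }
}
```

For a handful of pieces the difference is negligible; it starts to matter when a loop builds a large string from many fragments.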
Strings in Java are not truly immutable; you can change their values using reflection and/or class loading. You should not depend on that property for security.
For examples see: Magic Trick In Java
Immutability is good. See Effective Java. If you had to copy a String every time you passed it around, then that would be a lot of error-prone code. You also have confusion as to which modifications affect which references. In the same way that Integer has to be immutable to behave like int, Strings have to behave as immutable to act like primitives. In C++ passing strings by value does this without explicit mention in the source code.
There is an exception to nearly every rule:
using System;
using System.Runtime.InteropServices;
namespace Guess
{
    class Program
    {
        static void Main(string[] args)
        {
            const string str = "ABC";
            Console.WriteLine(str);
            Console.WriteLine(str.GetHashCode());
            var handle = GCHandle.Alloc(str, GCHandleType.Pinned);
            try
            {
                Marshal.WriteInt16(handle.AddrOfPinnedObject(), 4, 'Z');
                Console.WriteLine(str);
                Console.WriteLine(str.GetHashCode());
            }
            finally
            {
                handle.Free();
            }
        }
    }
}
It's largely for security reasons. It's much harder to secure a system if you can't trust that your Strings are tamperproof.
