Related
That would be so obviously useful that I am starting to think I am missing a rationale to avoid it, since I am sure Oracle would have made it that way. It would be the most valuable feature on Optional for me.
public class TestOptionals{
public static void main(String[] args) {
test(null);
}
public static void test(Optional<Object> optional){
System.out.println(optional.orElse(new DefaultObject()));
}
}
(This throws a NullPointerException)
Without that feature I see too verbose using Optional for the argument.
I prefer a simple Object optional signature and
checking it by if (null = optional) that creating the object Optional for comparing later. It is not valuable if that doesn't help you checking the null
There was a HUGE discussion of Optional on all the various Java mailing lists, comprising hundreds of messages. Do a web search for
site:mail.openjdk.java.net optional
and you'll get links to lots of them. Of course, I can't even hope to summarize all the issues that were raised. There was a lot of controversy, and there was quite a breadth of opinion about how much "optionality" should be added to the platform. Some people thought that a library solution shouldn't be added at all; some people thought that a library solution was useless without language support; some people thought that a library solution was OK, but there was an enormous amount of quibbling about what should be in it; and so forth. See this message from Brian Goetz on the lambda-dev mailing list for a bit of perspective.
One pragmatic decision made by the lambda team was that any optional-like feature couldn't involve any language changes. The language and compiler team already had its hands full with lambda and default methods. These of course were the main priorities. Practically speaking, the choices were either to add Optional as a library class or not at all.
Certainly people were aware of other languages' type systems that support option types. This would be a big change to Java's type system. The fact is that for most of the past 20 years, reference types have been nullable, and there's been a single, untyped null value. Changing this is a massive undertaking. It might not even be possible to do this in a compatible way. I'm not an expert in this area, but most such discussions have tended to go off into the weeds pretty quickly.
A smaller change that might be more tractable (also mentioned by Marko Topolnik) is to consider the relationship between reference types and Optional as one of boxing, and then bring in the support for autoboxing/autounboxing that's already in the language.
Already this is somewhat problematic. When auto(un)boxing was added in Java 5, it made a large number of cases much nicer, but it added a lot of rough edges to the language. For example, with auto-unboxing, one can now use < and > to compare the values of boxed Integer objects. Unfortunately, using == still compares references instead of values! Boxing also made overload resolution more complicated; it's one of the most complicated areas of the language today.
Now let's consider auto(un)boxing between reference types and Optional types. This would let you do:
Optional<String> os1 = "foo";
Optional<String> os2 = null;
In this code, os1 would end up as a boxed string value, and os2 would end up as an empty Optional. So far, so good. Now the reverse:
String s1 = os1;
String s2 = os2;
Now s1 would get the unboxed string "foo", and s2 would be unboxed to null, I guess. But the point of Optional was to make such unboxing explicit, so that programmers would be confronted with a decision about what to do with an empty Optional instead of having it just turn into null.
Hmmm, so maybe let's just do autoboxing of Optional but not autounboxing. Let's return to the OP's use case:
public static void main(String[] args) {
test(null);
}
public static void test(Optional<Object> optional) {
System.out.println(optional.orElse(new DefaultObject()));
}
If you really want to use Optional, you can manually box it one line:
public static void test(Object arg) {
Optional<Object> optional = Optional.ofNullable(arg);
System.out.println(optional.orElse(new DefaultObject()));
}
Obviously it might be nicer if you didn't have to write this, but it would take an enormous amount of language/compiler work, and compatibility risk, to save this line of code. Is it really worth it?
What seems to be going on is that this would allow the caller to pass null in order to have some specific meaning to the callee, such as "use the default object" instead. In small examples this seems fine, but in general, loading semantics onto null increasingly seems like a bad idea. So this is an additional reason not to add specific language support for boxing of null. The Optional.ofNullable() method mainly is there to bridge the gap between code that uses null and code that uses Optional.
If you are committed to using the Optional class, then see the other answers.
On the other hand, I interpreted your question as, "Would it be a good idea to avoid the syntactic overhead of using Optional while still obtaining a guarantee of no null pointer exceptions in your code?" The answer to this question is a resounding yes. Luckily, Java has a feature, type annotations, that enables this. It does not require use of the the Optional class.
You can obtain the same compile-time guarantees, without adding Optional to your code and while retaining backward compatibility with existing code.
Annotate references that might be null with the #Nullable type
annotation.
Run a compiler plugin such as the Checker Framework's Nullness Checker.
If the plugin issues no errors, then you know that your code always checks for null where it needs to, and therefore your code never issues a null pointer exception exception at run time.
The plugin handles the special cases mentioned by #immibis and more, so your code is much less verbose than code using Optional. The plugin is compatible with normal Java code and does not require use of Java 8 as Optional does. It is in daily use at companies such as Google.
Note that this approach requires you to supply a command-line argument to the compiler to tell it to run the plugin. Also note that this approach does not integrate with Optional; it is an alternate approach.
So I do know that java Strings are immutable.
There are a bunch of methods that replace characters in a string in java.
So every time these methods are called, would it involve creation of a brand new String, therefore increasing the space complexity, or would replacement be done in the original String itself. I'm a little confused on this concept as to whether each of these replace statements in my code would be generating new Strings each time, and thus consuming more memory?
You noted correctly that String objects in Java are immutable. The only case when replacement, substring, etc. methods do not create a new object is when the replacement is a no-op. For example, if you ask to replace all 'x' characters in a "Hello, world!" string, there would be no new String object created. Similarly, there would be no new object when you call str.substring(0), because the entire string is returned. In all other cases, when the return value differs from the original, a new object is created.
Yes. You have noted it correctly. That immutability of String type has some consequences.
That is why the designers of Java have bring another type that should be used when you perform operations with char sequences.
A class called StringBuilder, should be used when you perform lot of operations that involve characters manipulations like replace. Off course it it more robust and requires more attention to details but that is all when you care about the performance.
So the immutability of String type do not increase memory usage. What increase it is wrong usage of String type.
They generate new ones each time; that is the corollary to being immutable.
It's true, in some sense, that it increases the 'space complexity', in that it uses more memory than the most efficient possible algorithms for replacement, but it's not as bad as it sounds; the transient objects created during the replaceAll operation and others like it are garbage collected very quickly; java is very efficient at garbage collecting transient objects. See http://www.infoq.com/articles/Java_Garbage_Collection_Distilled for an interesting writeup on some garbage collection basics.
It is true that it will return a new String , but unless the call is part of some giant loop or recursive function, one need not worry too much.
But if you purposefully wanted to crash your system, I'm sure you can think up some way.
JDK misses mutating operations for character sequences, i.e. StringBuilder for some reason does not implement replacement functionality.
A possible option would be to use third party libraries, i.e. a MutableString. It is available in Maven Central.
I recently had the following thoughts: when you define your object and override toString method, that might be called multiple times during the program execution. I am not sure about how certain UI components refresh themselves (does JTable upon refresh calls each cell-member toString method) or whether debugger calls toString every time when you step on instruction that modifies the object etc. Anyway, I was thinking whether it would be beneficial to define a lazily evaluated String as our to String definition if our structure is IMMUTABLE:
private String toString;
//.. definitions of many components, sets, lists which won't change
public String toString(){
if (toString == null) // instantiate
return toString;
}
Does the above is worth doing?
If the string creation process is long and the created string is often used, you should store it. If not, it depends on how many times you will call the toString() method (always the same old war between memory and CPU time)
If your classe is immutable and you know that the toString() method will be called at least once, you should instanciate the string in the constructor (or in the factory), instead of always checking.
The standard rules of optimisation come into play here.
Don't do it.
(For experts only) Don't do it yet - at least not until you've got profiling information illustrating why it's a problem that needs solving.
You're losing clarity and maintainability for performance reasons, so quantify what those performance reasons are and whether they're worth the cost.
You should not need to worry about this due to Javas String Literal Pool. Assuming that you are not creating lots of unique strings Java will traverse your code on loading of the class file or initialization and find string, adding them to the String Literal Pool.
This is a great article and overview on this with nice diagrams - String Literal Pool Overview
Other suggestions are to use StringBuffers as the independent parts would not added to the buffer only the output string.
Making StringBuffer static would also avoid creating a new object each time, but it really depends on how often you call toString().
If this is really a bottle neck running a profiler as Tom Johnson suggested would be the next direction to take.
I'm looking at some Java code that are maintained by other parts of the company, incidentally some former C and C++ devs. One thing that is ubiquitous is the use of static integer constants, such as
class Engine {
private static int ENGINE_IDLE = 0;
private static int ENGINE_COLLECTING = 1;
...
}
Besides a lacking 'final' qualifier, I'm a bit bothered by this kind of code. What I would have liked to see, being trained primarily in Java from school, would be something more like
class Engine {
private enum State { Idle, Collecting };
...
}
However, the arguments fail me. Why, if at all, is the latter better than the former?
Why, if at all, is the latter better
than the former?
It is much better because it gives you type safety and is self-documenting. With integer constants, you have to look at the API doc to find out what values are valid, and nothing prevents you from using invalid values (or, perhaps worse, integer constants that are completely unrelated). With Enums, the method signature tells you directly what values are valid (IDE autocompletion will work) and it's impossible to use an invalid value.
The "integer constant enums" pattern is unfortunately very common, even in the Java Standard API (and widely copied from there) because Java did not have Enums prior to Java 5.
An excerpt from the official docs, http://java.sun.com/j2se/1.5.0/docs/guide/language/enums.html:
This pattern has many problems, such as:
Not typesafe - Since a season is just an int you can pass in any other int value where a season is required, or add two seasons together (which makes no sense).
No namespace - You must prefix constants of an int enum with a string (in this case SEASON_) to avoid collisions with other int enum types.
Brittleness - Because int enums are compile-time constants, they are compiled into clients that use them. If a new constant is added between two existing constants or the order is changed, clients must be recompiled. If they are not, they will still run, but their behavior will be undefined.
Printed values are uninformative - Because they are just ints, if you print one out all you get is a number, which tells you nothing about what it represents, or even what type it is.
And this just about covers it. A one word argument would be that enums are just more readable and informative.
One more thing is that enums, like classes. can have fields and methods. This gives you the option to encompass some additional information about each type of state in the enum itself.
Because enums provide type safety. In the first case, you can pass any integer and if you use enum you are restricted to Idle and Collecting.
FYI : http://www.javapractices.com/topic/TopicAction.do?Id=1.
By using an int to refer to a constant, you're not forcing someone to actually use that constant. So, for example, you might have a method which takes an engine state, to which someone might happy invoke with:
engine.updateState(1);
Using an enum forces the user to stick with the explanatory label, so it is more legible.
There is one situation when static constance is preferred (rather that the code is legacy with tonne of dependency) and that is when the member of that value are not/may later not be finite.
Imagine if you may later add new state like Collected. The only way to do it with enum is to edit the original code which can be problem if the modification is done when there are already a lot of code manipulating it. Other than this, I personally see no reason why enum is not used.
Just my thought.
Readabiliy - When you use enums and do State.Idle, the reader immediately knows that you are talking about an idle state. Compare this with 4 or 5.
Type Safety - When use enum, even by mistake the user cannot pass a wrong value, as compiler will force him to use one of the pre-declared values in the enum. In case of simple integers, he could even pass -3274.
Maintainability - If you wanted to add a new state Waiting, then it would be very easy to add new state by adding a constant Waiting in your enum State without casuing any confusion.
The reasons from the spec, which Lajcik quotes, are explained in more detail in Josh Bloch's Effective Java, Item 30. If you have access to that book, I'd recommend perusing it. Java Enums are full-fledged classes which is why you get compile-time type safety. You can also give them behavior, giving you better encapsulation.
The former is common in code that started pre-1.5. Actually, another common idiom was to define your constants in an interface, because they didn't have any code.
Enums also give you a great deal of flexibility. Since Enums are essentially classes, you can augment them with useful methods (such as providing an internationalized resource string corresponding to a certain value in the enumeration, converting back and forth between instances of the enum type and other representations that may be required, etc.)
Why is it that they decided to make String immutable in Java and .NET (and some other languages)? Why didn't they make it mutable?
According to Effective Java, chapter 4, page 73, 2nd edition:
"There are many good reasons for this: Immutable classes are easier to
design, implement, and use than mutable classes. They are less prone
to error and are more secure.
[...]
"Immutable objects are simple. An immutable object can be in
exactly one state, the state in which it was created. If you make sure
that all constructors establish class invariants, then it is
guaranteed that these invariants will remain true for all time, with
no effort on your part.
[...]
Immutable objects are inherently thread-safe; they require no synchronization. They cannot be corrupted by multiple threads
accessing them concurrently. This is far and away the easiest approach
to achieving thread safety. In fact, no thread can ever observe any
effect of another thread on an immutable object. Therefore,
immutable objects can be shared freely
[...]
Other small points from the same chapter:
Not only can you share immutable objects, but you can share their internals.
[...]
Immutable objects make great building blocks for other objects, whether mutable or immutable.
[...]
The only real disadvantage of immutable classes is that they require a separate object for each distinct value.
There are at least two reasons.
First - security http://www.javafaq.nu/java-article1060.html
The main reason why String made
immutable was security. Look at this
example: We have a file open method
with login check. We pass a String to
this method to process authentication
which is necessary before the call
will be passed to OS. If String was
mutable it was possible somehow to
modify its content after the
authentication check before OS gets
request from program then it is
possible to request any file. So if
you have a right to open text file in
user directory but then on the fly
when somehow you manage to change the
file name you can request to open
"passwd" file or any other. Then a
file can be modified and it will be
possible to login directly to OS.
Second - Memory efficiency http://hikrish.blogspot.com/2006/07/why-string-class-is-immutable.html
JVM internally maintains the "String
Pool". To achive the memory
efficiency, JVM will refer the String
object from pool. It will not create
the new String objects. So, whenever
you create a new string literal, JVM
will check in the pool whether it
already exists or not. If already
present in the pool, just give the
reference to the same object or create
the new object in the pool. There will
be many references point to the same
String objects, if someone changes the
value, it will affect all the
references. So, sun decided to make it
immutable.
Actually, the reasons string are immutable in java doesn't have much to do with security. The two main reasons are the following:
Thead Safety:
Strings are extremely widely used type of object. It is therefore more or less guaranteed to be used in a multi-threaded environment. Strings are immutable to make sure that it is safe to share strings among threads. Having an immutable strings ensures that when passing strings from thread A to another thread B, thread B cannot unexpectedly modify thread A's string.
Not only does this help simplify the already pretty complicated task of multi-threaded programming, but it also helps with performance of multi-threaded applications. Access to mutable objects must somehow be synchronized when they can be accessed from multiple threads, to make sure that one thread doesn't attempt to read the value of your object while it is being modified by another thread. Proper synchronization is both hard to do correctly for the programmer, and expensive at runtime. Immutable objects cannot be modified and therefore do not need synchronization.
Performance:
While String interning has been mentioned, it only represents a small gain in memory efficiency for Java programs. Only string literals are interned. This means that only the strings which are the same in your source code will share the same String Object. If your program dynamically creates string that are the same, they will be represented in different objects.
More importantly, immutable strings allow them to share their internal data. For many string operations, this means that the underlying array of characters does not need to be copied. For example, say you want to take the five first characters of String. In Java, you would calls myString.substring(0,5). In this case, what the substring() method does is simply to create a new String object that shares myString's underlying char[] but who knows that it starts at index 0 and ends at index 5 of that char[]. To put this in graphical form, you would end up with the following:
| myString |
v v
"The quick brown fox jumps over the lazy dog" <-- shared char[]
^ ^
| | myString.substring(0,5)
This makes this kind of operations extremely cheap, and O(1) since the operation neither depends on the length of the original string, nor on the length of the substring we need to extract. This behavior also has some memory benefits, since many strings can share their underlying char[].
Thread safety and performance. If a string cannot be modified it is safe and quick to pass a reference around among multiple threads. If strings were mutable, you would always have to copy all of the bytes of the string to a new instance, or provide synchronization. A typical application will read a string 100 times for every time that string needs to be modified. See wikipedia on immutability.
One should really ask, "why should X be mutable?" It's better to default to immutability, because of the benefits already mentioned by Princess Fluff. It should be an exception that something is mutable.
Unfortunately most of the current programming languages default to mutability, but hopefully in the future the default is more on immutablity (see A Wish List for the Next Mainstream Programming Language).
Wow! I Can't believe the misinformation here. Strings being immutable have nothing with security. If someone already has access to the objects in a running application (which would have to be assumed if you are trying to guard against someone 'hacking' a String in your app), they would certainly be a plenty of other opportunities available for hacking.
It's a quite novel idea that the immutability of String is addressing threading issues. Hmmm ... I have an object that is being changed by two different threads. How do I resolve this? synchronize access to the object? Naawww ... let's not let anyone change the object at all -- that'll fix all of our messy concurrency issues! In fact, let's make all objects immutable, and then we can removed the synchonized contruct from the Java language.
The real reason (pointed out by others above) is memory optimization. It is quite common in any application for the same string literal to be used repeatedly. It is so common, in fact, that decades ago, many compilers made the optimization of storing only a single instance of a String literal. The drawback of this optimization is that runtime code that modifies a String literal introduces a problem because it is modifying the instance for all other code that shares it. For example, it would be not good for a function somewhere in an application to change the String literal "dog" to "cat". A printf("dog") would result in "cat" being written to stdout. For that reason, there needed to be a way of guarding against code that attempts to change String literals (i. e., make them immutable). Some compilers (with support from the OS) would accomplish this by placing String literal into a special readonly memory segment that would cause a memory fault if a write attempt was made.
In Java this is known as interning. The Java compiler here is just following an standard memory optimization done by compilers for decades. And to address the same issue of these String literals being modified at runtime, Java simply makes the String class immutable (i. e, gives you no setters that would allow you to change the String content). Strings would not have to be immutable if interning of String literals did not occur.
String is not a primitive type, yet you normally want to use it with value semantics, i.e. like a value.
A value is something you can trust won't change behind your back.
If you write: String str = someExpr();
You don't want it to change unless YOU do something with str.
String as an Object has naturally pointer semantics, to get value semantics as well it needs to be immutable.
One factor is that, if Strings were mutable, objects storing Strings would have to be careful to store copies, lest their internal data change without notice. Given that Strings are a fairly primitive type like numbers, it is nice when one can treat them as if they were passed by value, even if they are passed by reference (which also helps to save on memory).
I know this is a bump, but...
Are they really immutable?
Consider the following.
public static unsafe void MutableReplaceIndex(string s, char c, int i)
{
fixed (char* ptr = s)
{
*((char*)(ptr + i)) = c;
}
}
...
string s = "abc";
MutableReplaceIndex(s, '1', 0);
MutableReplaceIndex(s, '2', 1);
MutableReplaceIndex(s, '3', 2);
Console.WriteLine(s); // Prints 1 2 3
You could even make it an extension method.
public static class Extensions
{
public static unsafe void MutableReplaceIndex(this string s, char c, int i)
{
fixed (char* ptr = s)
{
*((char*)(ptr + i)) = c;
}
}
}
Which makes the following work
s.MutableReplaceIndex('1', 0);
s.MutableReplaceIndex('2', 1);
s.MutableReplaceIndex('3', 2);
Conclusion: They're in an immutable state which is known by the compiler. Of couse the above only applies to .NET strings as Java doesn't have pointers. However a string can be entirely mutable using pointers in C#. It's not how pointers are intended to be used, has practical usage or is safely used; it's however possible, thus bending the whole "mutable" rule. You can normally not modify an index directly of a string and this is the only way. There is a way that this could be prevented by disallowing pointer instances of strings or making a copy when a string is pointed to, but neither is done, which makes strings in C# not entirely immutable.
For most purposes, a "string" is (used/treated as/thought of/assumed to be) a meaningful atomic unit, just like a number.
Asking why the individual characters of a string are not mutable is therefore like asking why the individual bits of an integer are not mutable.
You should know why. Just think about it.
I hate to say it, but unfortunately we're debating this because our language sucks, and we're trying to using a single word, string, to describe a complex, contextually situated concept or class of object.
We perform calculations and comparisons with "strings" similar to how we do with numbers. If strings (or integers) were mutable, we'd have to write special code to lock their values into immutable local forms in order to perform any kind of calculation reliably. Therefore, it is best to think of a string like a numeric identifier, but instead of being 16, 32, or 64 bits long, it could be hundreds of bits long.
When someone says "string", we all think of different things. Those who think of it simply as a set of characters, with no particular purpose in mind, will of course be appalled that someone just decided that they should not be able to manipulate those characters. But the "string" class isn't just an array of characters. It's a STRING, not a char[]. There are some basic assumptions about the concept we refer to as a "string", and it generally can be described as meaningful, atomic unit of coded data like a number. When people talk about "manipulating strings", perhaps they're really talking about manipulating characters to build strings, and a StringBuilder is great for that. Just think a bit about what the word "string" truly means.
Consider for a moment what it would be like if strings were mutable. The following API function could be tricked into returning information for a different user if the mutable username string is intentionally or unintentionally modified by another thread while this function is using it:
string GetPersonalInfo( string username, string password )
{
string stored_password = DBQuery.GetPasswordFor( username );
if (password == stored_password)
{
//another thread modifies the mutable 'username' string
return DBQuery.GetPersonalInfoFor( username );
}
}
Security isn't just about 'access control', it's also about 'safety' and 'guaranteeing correctness'. If a method can't be easily written and depended upon to perform a simple calculation or comparison reliably, then it's not safe to call it, but it would be safe to call into question the programming language itself.
Immutability is not so closely tied to security. For that, at least in .NET, you get the SecureString class.
Later edit: In Java you will find GuardedString, a similar implementation.
The decision to have string mutable in C++ causes a lot of problems, see this excellent article by Kelvin Henney about Mad COW Disease.
COW = Copy On Write.
It's a trade off. Strings go into the String pool and when you create multiple identical Strings they share the same memory. The designers figured this memory saving technique would work well for the common case, since programs tend to grind over the same strings a lot.
The downside is that concatenations make a lot of extra Strings that are only transitional and just become garbage, actually harming memory performance. You have StringBuffer and StringBuilder (in Java, StringBuilder is also in .NET) to use to preserve memory in these cases.
Strings in Java are not truly immutable, you can change their value's using reflection and or class loading. You should not be depending on that property for security.
For examples see: Magic Trick In Java
Immutability is good. See Effective Java. If you had to copy a String every time you passed it around, then that would be a lot of error-prone code. You also have confusion as to which modifications affect which references. In the same way that Integer has to be immutable to behave like int, Strings have to behave as immutable to act like primitives. In C++ passing strings by value does this without explicit mention in the source code.
There is an exception for nearly almost every rule:
using System;
using System.Runtime.InteropServices;
namespace Guess
{
class Program
{
static void Main(string[] args)
{
const string str = "ABC";
Console.WriteLine(str);
Console.WriteLine(str.GetHashCode());
var handle = GCHandle.Alloc(str, GCHandleType.Pinned);
try
{
Marshal.WriteInt16(handle.AddrOfPinnedObject(), 4, 'Z');
Console.WriteLine(str);
Console.WriteLine(str.GetHashCode());
}
finally
{
handle.Free();
}
}
}
}
It's largely for security reasons. It's much harder to secure a system if you can't trust that your Strings are tamperproof.