Am I correctly interning my Strings? - java

I want to make sure I don't pummel the permgen space, so I'm carefully interning my strings.
Are these two statements equivalent ?
String s1 = ( "hello" + "world" ).intern();
String s2 = "hello".intern() + "world".intern();
UPDATE
How I framed my question was totally different from the actual application. Here's the method where I am using intern.
public String toAddress( Transport transport )
{
Constraint.NonNullArgument.check( transport, "transport" );
switch( transport )
{
case GOOGLE:
case MSN:
return ( transport.code() + PERIOD + _domain ).intern();
case YAHOO:
default:
return _domain;
}
}
private String _domain; // is initialized during constructor
private static final String PERIOD = ".";

The best advice I can think of is: don't bother. Statically declared String's will be in the constant pool any how so unless you are dynamically creating a String that is...errr no I can't think of a reason.
I've been programming using Java since 97 and I've never actually used String.intern().
EDIT: After seeing your update I really am of the opinion that you shouldn't be using intern(). Your method looks perfectly normal and there there is little or no reason to use intern().
My reason for this is that it is infect an optimisation and potentially a premature one at that, you are second guessing the garbage collector. If the just of you method is short lived then the resulting string will die the young generation very shortly afterwards in the next minor GC and if it isn't it'll be interned (for want of a better word) in the mature generation anyhow.
I guess the only time this could be a good idea is if you spend a bit of time with a profiler and prove that it makes a large difference to the performance of your application.

As jensgram says, the two statements are not equivalent. Two important rules:
Concatenating string literals in code ends up with a string constant, so these two statements are exactly equivalent (they'll produce identical bytecode):
String x = "foo" + "bar":
String x = "foobar";
String constants are interned automatically, you don't need to do it explicitly
Now, this concentrates on literals - are you actually calling intern on literals, or is your real use case somewhat different (e.g. interning values fetched from a database which you'll see frequently)? If so, please give us more details.
EDIT: Okay, based on the question edit: this could save some memory if you end up storing the return value of toAddress() somewhere that it'll stick around for a long time and you'll end up with the same address multiple times. If those aren't the case, interning will actually probably make things worse. I don't know for sure whether interned strings stick around forever, but it's quite possible.
This looks to me like it's unlikely to be a good use of interning, and may well be making things worse instead. You mention trying to save permgen space - why do you believe interning will help there? The concatenated strings won't end up in permgen anyway, unless I'm much mistaken.

No. Adding two interned strings together does not give you an interned string.
That said, it's pretty rare that one needs to "carefully intern one's strings". Unless you're dealing with huge numbers of identical strings, it's more trouble than it's worth.

I would say no. s1 adds "helloworld" to the pool, whereas s2 is made up of the two pooled strings "hello" and "world".

More info will help us to understand your query... Anyway...
If you manually want to intern for HelloWorld then go with first statement as in second statement you interning hello and world separately. Two statements are not identical at all.

You might want to have some form of proof (via profiling perhaps) that you are "pummelling the permgen space" before you write all your code like that.
Otherwise you may just be doing "premature optimisation" which is generally frowned upon.
See http://en.wikipedia.org/wiki/Optimization_(computer_science)#When_to_optimize for more details on why this may be a bad thing.

In many cases, "carefully interning" your strings gives you nothing but some time wasted.
Consider the following case:
void foobar(int x) {
String s1 = someMethod(x).intern();
...
...
}
So s1 is interned, no (heap) space wasted? Wrong! Most likely, the intermediary result of someMethod(x) still exists somewhere on the heap and needs to be garbage collected. That's because someMethod() somehow constructed the string, and (unless it returns a literal) it did that on the heap. But then... better look up what the permgen space is used for. It's used for metadata about classes and (ooops) the String.intern table. By interning all your strings, you are doing exactly what you wanted to avoid: Pummel the permgen space.
More information here: http://www.thesorensens.org/2006/09/09/java-permgen-space-stringintern-xml-parsing/

The amount of strings you're using has no effect on the Permananent Generation of the JVM, since we're still talking about one class.

interning strings is basically a memory leak waiting to happen :(
Unless you have a very, very good reason[1] don't do it, but leave it to the JVM.
[1] As in, "Dear Boss, Please don't fire me. I have this profiling data to support my decision to use intern" :)

Related

Does string constant pool works with singleton annotated variables too?

object Keys {
#Singleton
const val KEY_Q = "question"
#Singleton
const val KEY_ID = "quesid"
}
I am using Singleton annotation with many string variables in my singleton class. I wanted to ask as by default a string variable is stored in the constant string pool, and during the updating process, JVM checks in the string pool if the same variable is available or not, if yes then it returns the reference of the same without creating a new one.
Now I want to ask, does this process works the same when we are using singleton annotation with our string variables. If yes then is there any benefit for me to use such a class with these annotations with different variables. I am a newbie to singletons please describe in detail. Thanks
Annotations make no difference to string pool behavior in Java. If your example was Java, the #Singleton annotations would not save memory1.
There is a very simple rule that covers what goes into the string pool in Java.
If a string is a result of evaluating a compile time constant expression then a single copy is placed in the string pool. The JLS specifies what a compile time constant expression is2.
The only other circumstance in which a string goes into the string pool3 is if some code explicitly calls intern on it.
However ...
In a modern JVM on modern hardware, it is most likely to be irrelevant whether a string goes into the string pool.
The string pool is part of the heap and is garbage collected like the rest of the heap.
The space taken by each individual string literal is most likely to be trivial compared to the rest of your application's memory usage. A few bytes, compared with megabytes ... or gigabytes.
If you think you can intern strings and exploit the special property of strings in the string pool (by using == to compare strings) you are treading on very dangerous grounds. This is a micro-optimization ... and it only works if you can be sure that you have interned all of the strings. (And besides, interning is more expensive that a few string comparisons, so your attempt at optimizing might be a failure.)
Finally, since Java 9, the GC performs automatic string de-duping for strings that have survived a few GC cycles. So if you really do have a lot of string data with a lot of duplicates, the best solution is to let the GC handle it.
1 - I cannot tell you exactly what this means for your example, because you are using syntax that is not valid Java. Java doesn't have a const, val or object keywords. This looks like Kotlin.
2 - Common examples are a string literal or a concatenation of string literals, but there are others; see JLS 15.28.
3 - It is implementation dependent whether the string itself or a copy of the string goes into the pool. But it is really difficult for an application to distinguish the different behaviors.

Will the String passed from outside the java application be saved in String pool?

I read many answers but none of them really answers my question exactly.
If I've a java service running on some port and a client connects to it and calls a method like:
String data = getServiceData("clientKey");
Now my question is, will this key(clientKey) be stored in String literal pool on service side? Generally literals to be stored in constant pools are figured out at compile time but what happens to strings that are passed from outside the JVM or may be while reading a file?
String Object is serialized at your client side and deserialized and is kept in the Heap memory. If you want it to be stored in your String Pool memory, you should use the intern() method.
String value;
String data = (value =getServiceData("clientKey"))==null?null:value.intern();
Most methods which read strings from external sources (especially BufferedReader.getLine() or Java serialisation) will not intern the strings, so the answer is no.
However if you use third party libraries, they might do it: for example there are some XML/Dom parsers known to do that (at least for element names, less often for values). Also some high performance frameworks (servlet containers) to that for certain strings (for example HTTP header names).
But generally it is used very seldom in good(!) implementations as it is much less desirable as one might think. Don't forget: before you can intern a string it must exist as an object which needs to be collected anyway, so from the point of avoiding garbage using intern() does not help. It only might reduce the working set memory if those strings survive long time (which it is not in OLTP) and might speed up equality checks slightly. But typically this only helps if you do thousands of them on the same string object.
You can check yourself if the string is already interned (you should of course not do it in production code as it interns your string and it might not work in all implementations) with:
input == input.intern()?"yes":"no"`
And yes (as asked in a comment), having million instances of the same API key can happen with this. But don't be fooled to think this is a bad thing. Actually interning them would need to search for the value and deal with a growing string pool. This can take longer than processing (and freeing) the string. Especially when the JVM can optimize the string allocation with generational allocation and escape analysis.
BTW: Java 8u20 has a feature (-XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics) to detect duplicate strings in the background while doing the garbage collection in G1. It will combine those string arrays to reduce the memory consumption. (JEP192)

"Immutable" strings in Java - actually, it's a lie [duplicate]

We all know that String is immutable in Java, but check the following code:
String s1 = "Hello World";
String s2 = "Hello World";
String s3 = s1.substring(6);
System.out.println(s1); // Hello World
System.out.println(s2); // Hello World
System.out.println(s3); // World
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
char[] value = (char[])field.get(s1);
value[6] = 'J';
value[7] = 'a';
value[8] = 'v';
value[9] = 'a';
value[10] = '!';
System.out.println(s1); // Hello Java!
System.out.println(s2); // Hello Java!
System.out.println(s3); // World
Why does this program operate like this? And why is the value of s1 and s2 changed, but not s3?
String is immutable* but this only means you cannot change it using its public API.
What you are doing here is circumventing the normal API, using reflection. The same way, you can change the values of enums, change the lookup table used in Integer autoboxing etc.
Now, the reason s1 and s2 change value, is that they both refer to the same interned string. The compiler does this (as mentioned by other answers).
The reason s3 does not was actually a bit surprising to me, as I thought it would share the value array (it did in earlier version of Java, before Java 7u6). However, looking at the source code of String, we can see that the value character array for a substring is actually copied (using Arrays.copyOfRange(..)). This is why it goes unchanged.
You can install a SecurityManager, to avoid malicious code to do such things. But keep in mind that some libraries depend on using these kind of reflection tricks (typically ORM tools, AOP libraries etc).
*) I initially wrote that Strings aren't really immutable, just "effective immutable". This might be misleading in the current implementation of String, where the value array is indeed marked private final. It's still worth noting, though, that there is no way to declare an array in Java as immutable, so care must be taken not to expose it outside its class, even with the proper access modifiers.
As this topic seems overwhelmingly popular, here's some suggested further reading: Heinz Kabutz's Reflection Madness talk from JavaZone 2009, which covers a lot of the issues in the OP, along with other reflection... well... madness.
It covers why this is sometimes useful. And why, most of the time, you should avoid it. :-)
In Java, if two string primitive variables are initialized to the same literal, it assigns the same reference to both variables:
String Test1="Hello World";
String Test2="Hello World";
System.out.println(test1==test2); // true
That is the reason the comparison returns true. The third string is created using substring() which makes a new string instead of pointing to the same.
When you access a string using reflection, you get the actual pointer:
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
So change to this will change the string holding a pointer to it, but as s3 is created with a new string due to substring() it would not change.
You are using reflection to circumvent the immutability of String - it's a form of "attack".
There are lots of examples you can create like this (eg you can even instantiate a Void object too), but it doesn't mean that String is not "immutable".
There are use cases where this type of code may be used to your advantage and be "good coding", such as clearing passwords from memory at the earliest possible moment (before GC).
Depending on the security manager, you may not be able to execute your code.
You are using reflection to access the "implementation details" of string object. Immutability is the feature of the public interface of an object.
Visibility modifiers and final (i.e. immutability) are not a measurement against malicious code in Java; they are merely tools to protect against mistakes and to make the code more maintainable (one of the big selling points of the system). That is why you can access internal implementation details like the backing char array for Strings via reflection.
The second effect you see is that all Strings change while it looks like you only change s1. It is a certain property of Java String literals that they are automatically interned, i.e. cached. Two String literals with the same value will actually be the same object. When you create a String with new it will not be interned automatically and you will not see this effect.
#substring until recently (Java 7u6) worked in a similar way, which would have explained the behaviour in the original version of your question. It didn't create a new backing char array but reused the one from the original String; it just created a new String object that used an offset and a length to present only a part of that array. This generally worked as Strings are immutable - unless you circumvent that. This property of #substring also meant that the whole original String couldn't be garbage collected when a shorter substring created from it still existed.
As of current Java and your current version of the question there is no strange behaviour of #substring.
String immutability is from the interface perspective. You are using reflection to bypass the interface and directly modify the internals of the String instances.
s1 and s2 are both changed because they are both assigned to the same "intern" String instance. You can find out a bit more about that part from this article about string equality and interning. You might be surprised to find out that in your sample code, s1 == s2 returns true!
Which version of Java are you using? From Java 1.7.0_06, Oracle has changed the internal representation of String, especially the substring.
Quoting from Oracle Tunes Java's Internal String Representation:
In the new paradigm, the String offset and count fields have been removed, so substrings no longer share the underlying char [] value.
With this change, it may happen without reflection (???).
There are really two questions here:
Are strings really immutable?
Why is s3 not changed?
To point 1: Except for ROM there is no immutable memory in your computer. Nowadays even ROM is sometimes writable. There is always some code somewhere (whether it's the kernel or native code sidestepping your managed environment) that can write to your memory address. So, in "reality", no they are not absolutely immutable.
To point 2: This is because substring is probably allocating a new string instance, which is likely copying the array. It is possible to implement substring in such a way that it won't do a copy, but that doesn't mean it does. There are tradeoffs involved.
For example, should holding a reference to reallyLargeString.substring(reallyLargeString.length - 2) cause a large amount of memory to be held alive, or only a few bytes?
That depends on how substring is implemented. A deep copy will keep less memory alive, but it will run slightly slower. A shallow copy will keep more memory alive, but it will be faster. Using a deep copy can also reduce heap fragmentation, as the string object and its buffer can be allocated in one block, as opposed to 2 separate heap allocations.
In any case, it looks like your JVM chose to use deep copies for substring calls.
To add to the #haraldK's answer - this is a security hack which could lead to a serious impact in the app.
First thing is a modification to a constant string stored in a String Pool. When string is declared as a String s = "Hello World";, it's being places into a special object pool for further potential reusing. The issue is that compiler will place a reference to the modified version at compile time and once the user modifies the string stored in this pool at runtime, all references in code will point to the modified version. This would result into a following bug:
System.out.println("Hello World");
Will print:
Hello Java!
There was another issue I experienced when I was implementing a heavy computation over such risky strings. There was a bug which happened in like 1 out of 1000000 times during the computation which made the result undeterministic. I was able to find the problem by switching off the JIT - I was always getting the same result with JIT turned off. My guess is that the reason was this String security hack which broke some of the JIT optimization contracts.
According to the concept of pooling, all the String variables containing the same value will point to the same memory address. Therefore s1 and s2, both containing the same value of “Hello World”, will point towards the same memory location (say M1).
On the other hand, s3 contains “World”, hence it will point to a different memory allocation (say M2).
So now what's happening is that the value of S1 is being changed (by using the char [ ] value). So the value at the memory location M1 pointed both by s1 and s2 has been changed.
Hence as a result, memory location M1 has been modified which causes change in the value of s1 and s2.
But the value of location M2 remains unaltered, hence s3 contains the same original value.
The reason s3 does not actually change is because in Java when you do a substring the value character array for a substring is internally copied (using Arrays.copyOfRange()).
s1 and s2 are the same because in Java they both refer to the same interned string. It's by design in Java.
String is immutable, but through reflection you're allowed to change the String class. You've just redefined the String class as mutable in real-time. You could redefine methods to be public or private or static if you wanted.
Strings are created in permanent area of the JVM heap memory. So yes, it's really immutable and cannot be changed after being created.
Because in the JVM, there are three types of heap memory:
1. Young generation
2. Old generation
3. Permanent generation.
When any object are created, it goes into the young generation heap area and PermGen area reserved for String pooling.
Here is more detail you can go and grab more information from:
How Garbage Collection works in Java .
[Disclaimer this is a deliberately opinionated style of answer as I feel a more "don't do this at home kids" answer is warranted]
The sin is the line field.setAccessible(true); which says to violate the public api by allowing access to a private field. Thats a giant security hole which can be locked down by configuring a security manager.
The phenomenon in the question are implementation details which you would never see when not using that dangerous line of code to violate the access modifiers via reflection. Clearly two (normally) immutable strings can share the same char array. Whether a substring shares the same array depends on whether it can and whether the developer thought to share it. Normally these are invisible implementation details which you should not have to know unless you shoot the access modifier through the head with that line of code.
It is simply not a good idea to rely upon such details which cannot be experienced without violating the access modifiers using reflection. The owner of that class only supports the normal public API and is free to make implementation changes in the future.
Having said all that the line of code is really very useful when you have a gun held you your head forcing you to do such dangerous things. Using that back door is usually a code smell that you need to upgrade to better library code where you don't have to sin. Another common use of that dangerous line of code is to write a "voodoo framework" (orm, injection container, ...). Many folks get religious about such frameworks (both for and against them) so I will avoid inviting a flame war by saying nothing other than the vast majority of programmers don't have to go there.
String is immutable in nature Because there is no method to modify String object.
That is the reason They introduced StringBuilder and StringBuffer classes
This is a quick guide to everything
// Character array
char[] chr = {'O', 'K', '!'};
// this is String class
String str1 = new String(chr);
// this is concat
str1 = str1.concat("another string's ");
// this is format
System.out.println(String.format(str1 + " %s ", "string"));
// this is equals
System.out.println(str1.equals("another string"));
//this is split
for(String s: str1.split(" ")){
System.out.println(s);
}
// this is length
System.out.println(str1.length());
//gives an score of the total change in the length
System.out.println(str1.compareTo("OK!another string string's"));
// trim
System.out.println(str1.trim());
// intern
System.out.println(str1.intern());
// character at
System.out.println(str1.charAt(5));
// substring
System.out.println(str1.substring(5, 12));
// to uppercase
System.out.println(str1.toUpperCase());
// to lowerCase
System.out.println(str1.toLowerCase());
// replace
System.out.println(str1.replace("another", "hello"));
// output
// OK!another string's string
// false
// OK!another
// string's
// 20
// 7
// OK!another string's
// OK!another string's
// o
// other s
// OK!ANOTHER STRING'S
// ok!another string's
// OK!hello string's

long static Strings in short-lived objects

This might be a stupid question (or just make me look stupid :)), however I would be interested in how to work with long String objects in the context of short-lived objects.
Think about long SQL queries in cron job or anonymous, command or function-like classes. These are very short-lived classes and even will use these long Strings once in their lifetime for most of the time. What is better? To construct a String inline and let it be collected with the instance, or make it static final anyway and let them sit in the memory useless until the classes next instantiation?
Well, there's only so much control you can have over what happens to the String.
Even if you create it inline, that String will most probably be added to the String constant pool of the JVM, and will be reused when you declare it again, so in practice, you'll probably be reusing the same String object either way.
Unless the String is so huge that it has an impact on your application's performance, I wouldn't worry about it and choose the option that seemed more readable to me.
If that String will be used only in one particular point of the code, inside a method, I would declare it inline, I prefer to have my variables in the smallest scope that I can, but opinions here may vary.
If there is no change whatsoever, and it seems to make sense in your particular use case, by all means declare the String as static, again, I doubt it will affect performance.
String constants go into the constant pool of a class, and cannot be optimized away, i.e. are handled sufficiently well.
Creating long strings one does not do statically. For SQL use prepared statements with a ? place holder. The same holds for strings with placeholders: use MessageFormat.
To be explicit. The following does not cost anything extra:
static final String s = "... long string ...";
When remaining memory is limited JVM will normally do perm gen space cleaning and unload unused/unreferenced classes. So having the long strings as static variable won't do much harm in my opinion
If you feel your Strings can occupy lots of memory then dont make them static or declare them using String literal. As both of these will get stored in permgen space and will almost mever be garbage collected [there is chance but slim, statics may never be garbage collected is you have not created your own classloader].
So create String using new operator so that will be created in heap and can be easily garbage collected i.e.
String str = new String("long string");
EDIT:
How strings are stored: http://www.ntu.edu.sg/home/ehchua/programming/java/J3d_String.html
EDIT:
There has been a long discussion below regarding how new String works. Argument presented is that new String will create 2 objects one in heap and one in pool. THIS IS WRONG, it is not true by default and you can force java to do it by calling intern method. In order to back my argument following is the javadoc from Strin class for intern method:
intern
public String intern() Returns a canonical representation for the
string object. A pool of strings, initially empty, is maintained
privately by the class String.
When the intern method is invoked, if the pool already contains a
string equal to this String object as determined by the equals(Object)
method, then the string from the pool is returned. Otherwise, this
String object is added to the pool and a reference to this String
object is returned.
It follows that for any two strings s and t, s.intern() == t.intern()
is true if and only if s.equals(t) is true.
All literal strings and string-valued constant expressions are
interned. String literals are defined in §3.10.5 of the Java Language
Specification
As can be seen by above doc that if new String always created an object in pool then intern method will be completely useless!! Also logically it doesn't makes any sense.
EDIT:
Also read the answer for this post : String pool creating two string objects for same String in Java

String intern in equals method

Is it a good practise to use String#intern() in equals method of the class. Suppose we have a class:
public class A {
private String field;
private int number;
#Override
public boolean equals(Object obj) {
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
final A other = (A) obj;
if ((this.field == null) ? (other.field != null) : !this.field.equals(other.field)) {
return false;
}
if (this.number != other.number) {
return false;
}
return true;
}
}
Will it be faster to use field.intern() != other.field.intern() instead of !this.field.equals(other.field).
No! Using String.intern() implicitly like this is not a good idea:
It will not be faster. As a matter of fact it will be slower due to the use of a hash table in the background. A get() operation in a hash table contains a final equality check, which is what you want to avoid in the first place. Used like this, intern() will be called each and every time you call equals() for your class.
String.intern() has a lot of memory/GC implications that you should not implicitly force on users of this class.
If you want to avoid full blown equality checks when possible, consider the following avenues:
If you know that the set of strings is limited and you have repeated equality checks, you can use intern() for the field at object creation, so that any subsequent equality checks will come down to an identity comparison.
Use an explicit HashMap or WeakHashMap instead of intern() to avoid storing strings in the GC permanent generation - this was an issue in older JVMs, not sure if it is still a valid concern.
Keep in mind that if the set of strings is unbounded, you will have memory issues.
That said, all this sounds like premature optimization to me. String.equals() is pretty fast in the general case, since it compares the string lengths before comparing the strings themselves. Have you profiled your code?
Good practice : Nope. You're doing something tricky, and that makes for brittle, less readable code. Unless this equals() method needs to be crazy performant (and your performance tests validate that it is in fact faster), it's not worth it.
Faster : Could be. But don't forget that you can have unintended side effects from using the intern() method: http://www.onkarjoshi.com/blog/213/6-things-to-remember-about-saving-memory-with-the-string-intern-method/
Any benefit gained by performing an identity comparison on the interned Strings is likely to be outweighed by the associated cost of interning the Strings.
In the above case you could consider interning the String when you instantiate the class, providing the field is constant (in which case you should also mark it as final). You could also check for null on instantiation to avoid having to check on each call to equals (assuming you disallow null Strings).
However, in general these types of micro-optimisation offer little gain in performance.
Let's go through this one step at a time...
The idea here is that if you use String#intern, you'll be given a canonical representation of that String. A pool of Strings is kept internally and each entry is guaranteed to be unique for that pool with regard to equals. If you call intern() on a String, then either a previously pooled identical String is going to be returned, or the String you called intern on is going to be pooled and returned.
So if we have two Strings s1 and s2 and we assume neither is null, then the following two lines of code are considered idempotent:
s1.equals(s2);
s1.intern() == s2.intern();
Let's investigate two assumptions we've made now:
s1.intern() and s2.intern() really will return the same object if s1.equals(s2) evaluates to true.
Using the == operator on two interned references to the same String will be more efficient than using the equals method.
The first assumption is probably the most dangerous of all. The JavaDoc for the intern method tells us that using this method will return a canonical representation for an internally kept pool of Strings. But it doesn't tell us anything about that pool. Once an entry has been added to the pool, can it ever be removed again? Will the pool keep growing indefinitely or will entries occassionally be culled to make it act as a limited-size cache? You'd have to check the actual specifications of the Java Language and Virtual Machine to get any certainty, if they offer it at all. Having to check specs for a limited optimization is usually a big warning sign. Checking the source code for Sun's JDK 7, I see that intern is specified as a native method. So not only is the implementation likely to be vendor-specific, it might vary across platforms as well for VMs from the same vendor. All bets are off regarding stuff that's not in the spec.
On to our second assumption. Let's consider for a moment what it would take to intern a String... First of all, we'll need to check if the String is already in the pool. We'll assume they've tried to get an O(1) complexity going there to keep this fast by using some hashing scheme. But that's assuming we've got a hash of the String. Since this is a native method, I'm not certain what would be used... Some hash of the native representation or simply what hashCode() returns. I know from the source code of Sun's JDK that a String instance caches its hash code. It'll only be calculated the first time the method is called, and after that the calculated value will be returned. So at the very least, a hash must be calculated at least once if we're to use that. Getting a reliable hash of a String will probably involve arithmetic on each and every character, which can be expensive for lenghty values. Even once we have the hash and thus a set of Strings that are candidates for being matches in the interned pool, we'd still have to verify if one of these really is an exact match which would involve... an equality check. Meaning going through each and every character of the Strings and seeing if they match if trivial cases like inequal length can't be applied first. Worse still, we might have to do this for more than one other String like we'd do with a regular equals, since multiple Strings in the pool might have the same hash or end up in the same hash bucket.
So, that stuff we need to do to find out if a String was already interned or not sounds suspiciously like what equals would need to do. Basically, we've gained nothing and might even have made our equals implementation more expensive. At least, if we're going to call intern each and every time. So maybe we should intern the String right away and simply always use that interned reference. Let's check how class A would look if that were the case. I'm assuming the String field is initialized on construction:
public class A {
private final String field;
public A(final String s) {
field = s.intern();
}
}
That's looking a little more sensible. Any Strings that are passed to the constructor and are equal will end up being the same reference. Now we can safely use == between the field field of A instances for equality checks, right?
Well, it'd be useless. Why? If you check the source for equals in class String, you'll find that any implementation made by someone with half a brain will first do a == check to catch the trivial case where the instance and the argument are the same reference first. That could save a potentially heavy char-by-char comparison. I know the JDK 7 source I'm using for reference does this. So you're still better off using equals because it does that reference check anyway.
The second reason this'd be a bad idea is that first point way up above... We simply don't know if the instances are going to be kept in the pool indefinitely. Check this scenario, which may or may not occur depending on JVM implementation:
String s1 = ... //Somehow gets passed a non-interned "test" value
A a1 = new A(s1);
//Lots of time passes... winter comes and goes and spring returns the land to a lush green...
String s2 = ... //Somehow gets passed a non-interned "test" value
A a2 = new A(s2);
a1.equals(a2); //Totally returns the wrong result
What happened? Well, if it turns out the interned String pool will sometimes be culled of certain entries, then that first construction of an A could have s1 interned, only to see it being removed from the pool, to have it later replaced by that s2 instance. Since s1 and s2 are conceivably different instances, the == check fails. Can this happen? I've got no idea. I certainly won't go check the specs and native code to find out. Will the programmer that's going through your code with a debugger to find out why the hell "test" is not considered the same as "test"?
It's no problem if we're using equals. It'll catch the same instance case early for optimal results, which will benefit us when we've interned our Strings, but we won't have to worry about cases where the instances still end up being different because then equals is gonna do the classic compare work. It just goes to show that it's best not to second-guess the actual runtime implementation or compiler, because these things were made by people who know the specs like the back of their hands and really worry about performance.
So String interning manually can be of benefit when you make sure that...
you're not interning each and every time, but just intern a String once like when intializing a field and then keep using that interned instance;
you still use equals to make sure implementation details won't ruin your day and your code doesn't actually rely on that interning, instead relying on the implementation of the method to catch the trivial cases.
After keeping this in mind, surely it's worth using intern()? Well, we still don't know how expensive intern() is. It's a native method so it might be really fast. But we're not sure unless we check the code for our target platform and JVM implementation. We've also had to make sure we understand exactly what interning does and what assumptions we've made about it. Are you sure the next person reading your code will have the same level of understanding? They might be bewildered about this weird method they've never seen before that dabbles in JVM internals and might spend an hour reading the same gibberish I'm typing right now, instead of getting work done.
That's the problem right there... Before, it was simple. You used equals and were done. Now, you've added another little thing that can nestle itself in your mind and cause you to wake up screaming one night because you've just realized that oh my God you've forgot to take out one of the == uses and that piece of code is used in a routine controlling the killer bots' apprisal of citizen disobedience and you've heard its JVM isn't too solid!
Donald Knuth was famously attributed the quote...
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"
Knuth was clever enough to add in that 97% detail. Sometimes, thoroughly micro-optimizing a small portion of code can make a big difference. Say, if that piece of code takes up 30% of the program's runtime execution. The problem with micro-optimizations is that they tend to work on assumptions. When you start using intern() and believe that from then on it'll be safe to make reference equality checks, you've made a hell of a lot of assumptions. And even if you go down to implementation level to check if they're right, are you sure they will be in the next JRE version?
I myself have used intern() manually. Did it in some piece of code where the same handful of Strings are gonna end up in hundreds if not thousands of object instances as fields. Those fields are gonna be used as keys in HashMaps and are frequently used while doing some validation over those instances. I figured interning was worth it for two purposes: reducing memory overhead by making all those equal Strings one single instance and speeding up the map lookups, since they're using hashCode() and equals. But I've made damn sure that you can take all those intern() calls out of the code and everything will still work fine. The interning is just some icing on the cake in this case, a little extra that may or may not make a bit of difference along the road. But it's not an essential part of my code's correctness.
Long post, eh? Why'd I go through the trouble of typing all of this up? To show you that if you make micro-optimizations, you'd better know damn well what you're doing and willing to document it so thoroughly that you might as well not have bothered.
This is hard to say given that you have not specified hardware. Timing test are difficult to get right and are not universal. Have you done a timing test yourself?
My feeling is that the intern pattern would not be faster as each string would need to be matched to a possible string in a dictionary of all interned strings.

Categories

Resources