I've only been using Java for a short time, and I was wondering: Strings are in fact objects, but I've heard that when you assign them a value and retrieve it they behave quite differently, almost as if they were primitive types. Could someone clarify this?
What exactly do I need to watch out for when I declare, edit, or access a String compared to other objects?
First of all Java has string literals. That means you may write String foo = "bar";.
Strings are immutable (once you create one, you can't change it), which lets the JVM use a trick called the "string pool". String literals are stored in the pool, so in the following example both foo and bar point to the same String instance: String foo = "baz"; String bar = "baz";. You could even compare them with ==, but you should never do that. However, the equals() method (which is what you use to compare strings in Java) can benefit from it, since it does not need to compare the characters if both variables point to the same String.
Please check this topic for more info What is the Java string pool and how is "s" different from new String("s")?
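To make the pool behaviour concrete, here is a minimal sketch (the class and method names are made up for illustration). It only relies on two things: equal literals refer to the same interned instance, and new String(...) always allocates a separate object.

```java
class StringPoolDemo {
    // Equal literals are interned, so both variables refer to one instance.
    static boolean literalsShareInstance() {
        String foo = "baz";
        String bar = "baz";
        return foo == bar;
    }

    // new String(...) creates a distinct object with the same content.
    static boolean copyIsDistinctButEqual() {
        String foo = "baz";
        String copy = new String("baz");
        return foo != copy && foo.equals(copy);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());   // true
        System.out.println(copyIsDistinctButEqual());  // true
    }
}
```

This is why == appears to work with literals and then breaks as soon as one of the strings comes from somewhere else (user input, a file, a constructor call); equals() works in both cases.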
Please consider below 2 lines of Java code :
String s = new String("Hello");
String s1 = s.toUpperCase();
In the above code, how many objects will be created when we call the toUpperCase() method on String s? Will an object be created in the heap as well as in the String constant pool, or will it be created only in the heap?
I know toUpperCase() will create a new object in the heap, but I want to know whether it will also be placed in the String constant pool.
The answer is ... it depends.
The one thing that we can say with certainty is that neither toUpperCase() nor substring() will place a String in the string pool [1]. The only String operation that is specified as (possibly) adding a String to the string pool is intern().
We cannot say for certain that toUpperCase() and substring() will generate a new String object. For example, in Java 11 str.substring(0) returns str itself. This is an implementation detail, but the javadocs for many operations are worded so that it would be valid to return this ... if this satisfied the requirements.
In the cases when a new String is created, it is implementation dependent how many objects are created. A Java string is typically represented as a String object with a reference to a backing char[] or byte[] that represents the string's characters. But in some Java releases, a backing array can be shared between multiple String objects ... under certain circumstances. And indeed substring was one of the methods that would create a String with a shared backing array.
Finally, as a commenter pointed out, this stuff is irrelevant to real world Java programming ... in all but the most extreme circumstances. Normally, you should just let the JVM deal with this kind of stuff.
[1] It is the string pool, not the "constant string pool". The "constant pool" is actually something that exists in class files. But the string pool may contain strings that do not correspond to Java constants.
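A small sketch of the points above, using the two lines from the question (the class name is made up). The only outcomes marked as expected are the ones the specification guarantees; the substring(0) check is printed without an expected value because it is an implementation detail.

```java
class UpperCaseDemo {
    // The two lines from the question, wrapped in a method.
    static String upper() {
        String s = new String("Hello");
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        String s1 = upper();

        // The content is what you expect.
        System.out.println(s1.equals("HELLO"));      // true

        // intern() is the operation that hands back the pooled instance,
        // which is the same object as the literal "HELLO".
        System.out.println(s1.intern() == "HELLO");  // true

        // Whether substring(0) returns the receiver itself is an
        // implementation detail, so no output is promised here.
        String str = "Hello";
        System.out.println(str.substring(0) == str);
    }
}
```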
I have read about intern() and almost everywhere it is written that it will speed up string comparison by using == instead of equals().
String a = new String("1");
String b = new String("1");
If I am right, the code above will create 2 strings in string pool. But after reading about intern, I don't see a reason to save 2 strings with the same value in the pool.
So the question is: why doesn't java use intern() by default?
If possible I would like to get a detailed explanation.
If I am right, code above will create 2 strings in string pool.
Not in the string pool, no. That produces one string in the string pool (because of the string constant "1") and two strings elsewhere in memory (because of the two calls to new String).
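A quick way to see this for yourself (a sketch; the class and method names are made up): the two new String objects are different from each other and from the pooled literal, and only intern() brings them back to the single pooled instance.

```java
class InternDemo {
    // Two constructor calls always give two distinct objects.
    static boolean constructorsGiveDistinctObjects() {
        String a = new String("1");
        String b = new String("1");
        return a != b && a != "1" && b != "1";
    }

    // intern() maps both of them to the one pooled "1".
    static boolean internGivesPooledInstance() {
        String a = new String("1");
        String b = new String("1");
        return a.intern() == b.intern() && a.intern() == "1";
    }

    public static void main(String[] args) {
        System.out.println(constructorsGiveDistinctObjects()); // true
        System.out.println(internGivesPooledInstance());       // true
    }
}
```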
why java doesn't use intern by default?
Because there's no need to do it by default. Interning a string requires work (finding the existing entry if any, adding it if not there). If the strings are transient (as many are), that's unnecessary work. Thus, leave it to the programmer, who will call intern if appropriate.
...almost everywhere it is written that it will speed up string comparison by using == instead of equals()
I have no idea where this "almost everywhere" is, but using == to compare strings is just plain wrong in Java. == compares object references. While it's true (I think) that if you interned each and every string and never once forgot, then == would be reliable (I think), it's not best practice at all. Use equals. That's what it's for. :-) It will, after all, start by doing == to see if the two strings being compared are the same object, and only if that's false go on to compare the actual characters...
Just a side note: There's almost never any reason to explicitly call new String(String) (there are other overloads of the constructor that are definitely useful, for instance when you need to convert an array of bytes into a string using a particular charset).
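For example, this is the kind of constructor call that is genuinely useful (a minimal sketch; the class and method names are made up): decoding raw bytes with an explicit charset instead of relying on the platform default.

```java
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Decode raw bytes as UTF-8 into a String.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "h\u00e9llo" contains one non-ASCII character (e with acute accent),
        // which takes two bytes in UTF-8.
        byte[] raw = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(raw.length);                            // 6
        System.out.println(decodeUtf8(raw).equals("h\u00e9llo"));  // true
    }
}
```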
This might be a stupid question (or just make me look stupid :)), however I would be interested in how to work with long String objects in the context of short-lived objects.
Think about long SQL queries in cron jobs, or in anonymous, command-like, or function-like classes. These classes are very short-lived and, most of the time, use these long Strings only once in their lifetime. Which is better: to construct the String inline and let it be collected with the instance, or to make it static final anyway and let it sit uselessly in memory until the class is next instantiated?
Well, there's only so much control you can have over what happens to the String.
Even if you create it inline, that String will most probably be added to the String constant pool of the JVM, and will be reused when you declare it again, so in practice, you'll probably be reusing the same String object either way.
Unless the String is so huge that it has an impact on your application's performance, I wouldn't worry about it and choose the option that seemed more readable to me.
If that String will be used only at one particular point in the code, inside a method, I would declare it inline. I prefer to keep my variables in the smallest scope I can, but opinions here may vary.
If there is no change whatsoever, and it seems to make sense in your particular use case, by all means declare the String as static, again, I doubt it will affect performance.
String constants go into the constant pool of their class and cannot be optimized away; in other words, they are already handled well enough.
One does not normally assemble long strings statically. For SQL, use prepared statements with ? placeholders. The same holds for other strings with placeholders: use MessageFormat.
To be explicit. The following does not cost anything extra:
static final String s = "... long string ...";
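As a rough sketch of what that can look like (the table, columns, and message text are invented for illustration, and the Connection is assumed to come from wherever your job already gets it): the long texts are static final constants, and the variable parts go in through ? placeholders or MessageFormat arguments.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.text.MessageFormat;

class QueryTemplates {
    // Hypothetical query; the values are bound at run time, not concatenated.
    static final String FIND_ORDERS =
        "SELECT id, total FROM orders WHERE customer_id = ? AND status = ?";

    // Hypothetical message template with MessageFormat placeholders.
    static final String REPORT_TITLE = "Report for {0} generated by {1}";

    // Binds the parameters; the caller is responsible for closing the statement.
    static PreparedStatement prepare(Connection con, long customerId, String status)
            throws SQLException {
        PreparedStatement ps = con.prepareStatement(FIND_ORDERS);
        ps.setLong(1, customerId);
        ps.setString(2, status);
        return ps;
    }

    static String reportTitle(String subject, String author) {
        return MessageFormat.format(REPORT_TITLE, subject, author);
    }

    public static void main(String[] args) {
        System.out.println(reportTitle("sales", "cron")); // Report for sales generated by cron
    }
}
```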
When remaining memory is limited, the JVM will normally clean the perm gen space and unload unused/unreferenced classes. So having the long strings as static variables won't do much harm, in my opinion.
If you feel your Strings can occupy a lot of memory, then don't make them static or declare them using a String literal, as both of these will be stored in the permgen space and will almost never be garbage collected [there is a chance, but a slim one; statics may never be garbage collected if you have not created your own classloader].
So create the String using the new operator, so that it will be created on the heap and can easily be garbage collected, i.e.
String str = new String("long string");
EDIT:
How strings are stored: http://www.ntu.edu.sg/home/ehchua/programming/java/J3d_String.html
EDIT:
There has been a long discussion below regarding how new String works. The argument presented is that new String will create 2 objects, one in the heap and one in the pool. THIS IS WRONG: it is not true by default, and you can force Java to do it by calling the intern method. To back my argument, the following is the javadoc from the String class for the intern method:
intern
public String intern() Returns a canonical representation for the
string object. A pool of strings, initially empty, is maintained
privately by the class String.
When the intern method is invoked, if the pool already contains a
string equal to this String object as determined by the equals(Object)
method, then the string from the pool is returned. Otherwise, this
String object is added to the pool and a reference to this String
object is returned.
It follows that for any two strings s and t, s.intern() == t.intern()
is true if and only if s.equals(t) is true.
All literal strings and string-valued constant expressions are
interned. String literals are defined in §3.10.5 of the Java Language
Specification
As can be seen from the doc above, if new String always created an object in the pool, then the intern method would be completely useless!! Logically it doesn't make any sense either.
EDIT:
Also read the answer for this post : String pool creating two string objects for same String in Java
In Java, apparently, String s = "foo" is preferred over String s = new String("foo").
Why? Isn't a new string object created in both cases? Why would the first case preclude calling a constructor?
Why?
Because the second approach results in two string objects (the original due to the string literal, plus an explicit copy).
The first case is a string literal, simply a shorthand the language offers you to create a string. The String class constructor still gets called, just not explicitly, which means less typing and less code clutter.
The second case takes the String object already created by the literal and passes it to a constructor, which copies the content to create a new, separate String object. The literal will still be around because literals are interned.
There is rarely a point in using the String constructor (pretty much only when you've created a substring of a very large string and want to release the memory used by the rest of the string, because in older Java releases substrings share the same underlying char array as the original string, just with a different offset and length).
I don't think it's preferable. I assume the only "benefit" you get is that if you wrongly use the "==" operator rather than the equals method, having two different instances of a string will fail faster, which will prompt you to fix your code. (Otherwise the == operator may "succeed" and fail unpredictably.)
Unless of course your code requires you to construct two different instances for whatever reason
Why? Isn't a new string object created in both cases?
No, the initial form being a string literal will be interned such that only one instance is created:
String s = "foo";
String s2 = "foo";
s == s2 => true
I want to ask a question about avoiding String duplicates in Java.
The context is: an XML with tags and attributes like this one:
<product id="PROD" name="My Product"...></product>
With JibX, this XML is marshalled/unmarshalled in a class like this:
public class Product{
private String id;
private String name;
// constructor, getters, setters, methods and so on
}
The program is a long-time batch processing, so Product objects are created, used, copied, etc.
Well, the question is:
When I analysed the execution with software like Eclipse memory analyzer (MAT), I found several duplicated Strings. For example, in the id attribute, the PROD value is duplicated around 2000 instances, etc.
How can I avoid this situation? Other attributes in Product class may change their value along the execution, but attrs like id, name... don't change so frequently.
I have read something about the String.intern() method, but I haven't used it yet and I'm not sure it's a solution for this. Could I define the most frequent values of those attributes as static final constants in the class?
I hope I have expressed my question clearly.
Any help or advice is very appreciated. Thanks in advance.
Interning would be the right solution, if you really have a problem. Java stores String literals and a lot of other Strings in an internal pool, and whenever a new String is about to be created, the JVM first checks whether the String is already in the pool. If it is, it will not create a new instance but pass the reference to the interned String object.
There are two ways to control this behaviour:
String interned = aString.intern(); // returns a reference to an interned String
String notInterned = new String(aString); // creates a new String instance (guaranteed)
So maybe, the libraries really create new instances for all xml attribute values. This is possible and you won't be able to change it.
intern has a global effect. An interned String is immediately available "for any object" (this view doesn't really make sense, but it may help in understanding it).
So, lets say we have a line in class Foo, method foolish:
String s = "ABCD";
String literals are interned immediately. The JVM checks whether "ABCD" is already in the pool; if not, "ABCD" is stored in the pool. The JVM then assigns a reference to the interned String to s.
Now, maybe in another class Bar, in method barbar:
String t = "AB"+"CD";
Because "AB"+"CD" is a compile-time constant expression, the compiler folds it into "ABCD", and such constant expressions are interned just like literals. The JVM finds that "ABCD" is already in the pool and assigns the reference to that same interned String to t.
Calling "PROD".intern() may work or fail. Yes, it will intern the String "PROD". But there's a chance, that jibx really creates new Strings for attribute values with
String value = new String(getAttributeValue(attribute));
In that case, value will not have a reference to an interned String (even if "PROD" is in the pool) but a reference to a new String instance on the heap.
And, to the other question in your comment: this happens at runtime only. Compiling simply creates class files; the String pool is a data structure on the object heap that is used by the JVM that executes the application.
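If the library does hand you fresh String instances, one place to intern them is the setter. This is only a sketch based on the Product class from the question; whether JibX goes through the setter or writes the field directly depends on your binding, so treat that part as an assumption.

```java
class Product {
    private String id;
    private String name;

    // Intern on the way in, so equal ids share one String instance.
    void setId(String id) {
        this.id = (id == null) ? null : id.intern();
    }

    String getId() {
        return id;
    }

    // name is left alone here; intern it too only if it repeats as often as id.
    void setName(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    public static void main(String[] args) {
        Product a = new Product();
        Product b = new Product();
        // new String(...) simulates a library creating a fresh instance each time.
        a.setId(new String("PROD"));
        b.setId(new String("PROD"));
        System.out.println(a.getId() == b.getId()); // true
    }
}
```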
While String.intern() could solve that problem by reducing each value to a single unique String instance, it would introduce another problem: every intern()-ed String can survive for a long time in the JVM. If the IDs vary a lot (i.e. they are not part of a limited set, but can be any value), then this can have massive negative effects in the long run.
Edit: I used to claim that intern()-ed Strings can't ever be GCed, but @nanda proved me wrong with this JavaWorld article. While this somewhat reduces the problem introduced by intern(), it's still not entirely removed: the pool provided by intern() can't be controlled and can have unexpected results with regard to garbage collection.
Luckily Guava provides a solution in the form of the Interner interface and its helper class Interners: using Interners.newStrongInterner() you can create an object that can act as a "pool" of unique String objects much in the same way as String.intern() does, except that the pool is bound to that instance, and if you discard the pool, then its content can become eligible for garbage collection as well.
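To illustrate the idea of an instance-bound pool without pulling in a dependency, here is a hand-rolled stand-in built on ConcurrentHashMap. It is not Guava's implementation, and the class name is made up; it just shows the principle that the pool lives and dies with the object that holds it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class LocalStringPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the first instance seen for this value; null is not accepted.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        LocalStringPool ids = new LocalStringPool();
        String a = ids.intern(new String("PROD"));
        String b = ids.intern(new String("PROD"));
        System.out.println(a == b);     // true
        System.out.println(ids.size()); // 1
    }
}
```

Once nothing references the LocalStringPool any more, the map and the strings it holds can be collected like any other objects, which is the property the global intern() pool does not let you control.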
Yes, interning is the correct solution, and you've done your homework (that is, checking with a profiler that this is the problem).
Interning can cause problems if you store too much: the permgen memory needs to be increased. Despite what some people have said, interned Strings are also garbage collected, so if some strings are not used anymore, they will be subject to garbage collection.
Some supporting articles:
My blog: http://blog.firdau.si/2009/01/06/java-tips-memory-optimization-for-string/
Does intern garbage collected?: http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
Busting the 'Busting String.intern() Myths': http://kohlerm.blogspot.com/2009/01/is-javalangstringintern-really-evil.html
An alternative solution:
You could try to define an <xs:enumeration/> restriction on your @id attribute (if your domain model would allow such a thing). If JibX is as intelligent as JAXB or other XML-Java mapping standards, then this could be mapped as a Java enum with constant literals, which can be reused heavily.
I would try that for the ID value, since it kinda looks like an enumeration to me...
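On the Java side the result would look roughly like this (a sketch only: the enum name, the second value, and the fromXml helper are invented, and how JibX would actually map the restriction is not something I have checked). Every Product with the same id then points at the same enum constant instead of its own String.

```java
enum ProductId {
    PROD, DEMO; // hypothetical set of allowed ids

    // Maps the raw attribute text to the shared constant;
    // throws IllegalArgumentException for values outside the enumeration.
    static ProductId fromXml(String raw) {
        return valueOf(raw.trim());
    }

    public static void main(String[] args) {
        ProductId first = fromXml(new String("PROD"));
        ProductId second = fromXml(new String("PROD"));
        System.out.println(first == second); // true
    }
}
```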
As everyone knows, String objects can be created in two ways: by using literals and through the new operator.
If you use a literal like String test = "Sample"; then this will be cached in String object pool. So interning is not required here as by default the string object will be cached.
But if you create a string object like String test = new String("Sample"); then this string object will not be added to the string pool. So here we need to use String test = new String("Sample").intern(); to forcefully push the string object to the string cache.
So it is always advisable to use string literals rather than the new operator.
So in your case private static final String id = "PROD"; is the right solution.