Java/.NET - reusing values - java

Do we reduce memory consumption when storing a String value that we use very frequently?
As far as I know, every time we do a "some text" declaration in code, a new String object is constructed, instead of using the address of an existing one with the same value. Is this correct?
Is there anything that can be done to make the memory more efficient by always addressing the same String rather than creating new ones?

.NET uses a string intern pool to store strings.
The common language runtime conserves string storage by maintaining a table, called the intern pool, that contains a single reference to each unique literal string declared or created programmatically in your program. Consequently, an instance of a literal string with a particular value only exists once in the system.
The example below shows that the intern pool is used for literal strings only (s2 doesn't reference the same string as s1, even though the content is the same):
string s1 = "MyTest";
string s2 = new StringBuilder().Append("My").Append("Test").ToString();
string s3 = String.Intern(s2);
Console.WriteLine((Object)s2==(Object)s1); // Different references.
Console.WriteLine((Object)s3==(Object)s1); // The same reference.
Java does the same thing:
All literal strings and string-valued constant expressions are interned.
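A rough Java counterpart of the C# example above might look like the following. This is only a sketch: the class name InternDemo and the helper build() are made up for illustration.

```java
// Hypothetical demo class; the names are illustrative, not from the original answer.
class InternDemo {
    // StringBuilder.toString() allocates a new String at run time,
    // so the result is not the pooled literal even if the content matches.
    static String build() {
        return new StringBuilder().append("My").append("Test").toString();
    }

    public static void main(String[] args) {
        String s1 = "MyTest";     // literal: interned automatically
        String s2 = build();      // equal content, different object
        String s3 = s2.intern();  // canonical instance from the pool

        System.out.println(s1 == s2);      // false: different references
        System.out.println(s1 == s3);      // true: the same reference
        System.out.println(s1.equals(s2)); // true: same content
    }
}
```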

I wouldn't sweat it unless you can identify a genuine performance / memory issue you are trying to address. If you absolutely must optimize, then Javamex has a useful article showing some tips on how to save memory, such as by using Java's internal string pool.

From the api-doc for the java 2 sdk:
A pool of strings, initially empty, is maintained privately by the class String.
When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
So there is pooling...
The article here makes some good points about it.

In Java, you're totally wrong. Indeed, like in .NET (or at least as @madgnome says), there is a constant string pool. Note that, in addition to all constant strings, you can push a string into this pool by calling String.intern(). But use this method with care, as it may be slow due to the nature of that pool.

Related

Does the string constant pool work with Singleton-annotated variables too?

object Keys {
    @Singleton
    const val KEY_Q = "question"
    @Singleton
    const val KEY_ID = "quesid"
}
I am using the Singleton annotation on many string variables in my singleton class. As I understand it, a string value is stored in the constant string pool by default, and when a string is assigned, the JVM checks whether the same value is already in the pool; if it is, the JVM returns a reference to the existing string instead of creating a new one.
Does this process work the same way when the Singleton annotation is applied to string variables? If so, is there any benefit to using such a class with these annotations on different variables? I am new to singletons, so please describe in detail. Thanks.
Annotations make no difference to string pool behavior in Java. If your example was Java, the @Singleton annotations would not save memory [1].
There is a very simple rule that covers what goes into the string pool in Java.
If a string is a result of evaluating a compile time constant expression then a single copy is placed in the string pool. The JLS specifies what a compile time constant expression is [2].
The only other circumstance in which a string goes into the string pool [3] is if some code explicitly calls intern on it.
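The rule above can be illustrated with a small sketch. The class and method names here are hypothetical; the point is only that concatenating constants is folded at compile time, while concatenation involving an ordinary variable produces a new, non-pooled String.

```java
// Hypothetical demo of compile-time constant expressions vs. run-time concatenation.
class ConstantExpressionDemo {
    // A constant variable: final and initialized with a constant expression.
    static final String PREFIX = "AB";

    // PREFIX + "CD" is a compile-time constant expression, so it is pooled.
    static boolean constantConcatIsPooled() {
        String folded = PREFIX + "CD";
        return folded == "ABCD";
    }

    // The parameter is not a constant, so the concatenation happens at run time
    // and yields a newly created String.
    static boolean runtimeConcatIsPooled(String prefix) {
        String built = prefix + "CD";
        return built == "ABCD";
    }

    // Explicitly interning the run-time result returns the pooled instance.
    static boolean internedConcatIsPooled(String prefix) {
        return (prefix + "CD").intern() == "ABCD";
    }

    public static void main(String[] args) {
        System.out.println(constantConcatIsPooled());     // true
        System.out.println(runtimeConcatIsPooled("AB"));  // false
        System.out.println(internedConcatIsPooled("AB")); // true
    }
}
```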
However ...
In a modern JVM on modern hardware, it is most likely to be irrelevant whether a string goes into the string pool.
The string pool is part of the heap and is garbage collected like the rest of the heap.
The space taken by each individual string literal is most likely to be trivial compared to the rest of your application's memory usage. A few bytes, compared with megabytes ... or gigabytes.
If you think you can intern strings and exploit the special property of strings in the string pool (by using == to compare strings) you are treading on very dangerous ground. This is a micro-optimization ... and it only works if you can be sure that you have interned all of the strings. (And besides, interning is more expensive than a few string comparisons, so your attempt at optimizing might be a failure.)
Finally, recent JVMs can perform automatic string de-duping in the GC for strings that have survived a few GC cycles (in HotSpot this is a G1 feature, available since Java 8u20 and enabled with -XX:+UseStringDeduplication). So if you really do have a lot of string data with a lot of duplicates, the best solution is to let the GC handle it.
1 - I cannot tell you exactly what this means for your example, because you are using syntax that is not valid Java. Java doesn't have val or object keywords, and const is reserved but unused. This looks like Kotlin.
2 - Common examples are a string literal or a concatenation of string literals, but there are others; see JLS 15.28.
3 - It is implementation dependent whether the string itself or a copy of the string goes into the pool. But it is really difficult for an application to distinguish the different behaviors.

How many String objects are created if we call a method like substring() or toUpperCase() on a string in Java

Please consider below 2 lines of Java code :
String s = new String("Hello");
String s1 = s.toUpperCase();
In the above code how many objects will be created when we call toUpperCase() method on String s. Will an object be created in heap as well as String constant pool or will it be created only in heap?
I know toUpperCase() will create a new object in heap. But want to know if it will also be placed in the String constant pool.
The answer is ... it depends.
The one thing that we can say with certainty is that neither toUpperCase() nor substring() will place a String in the string pool [1]. The only String operation that is specified as (possibly) adding a String to the string pool is intern().
We cannot say for certain that toUpperCase() and substring() will generate a new String object. For example, in Java 11 str.substring(0) returns str itself. This is an implementation detail, but the javadocs for many operations are worded so that it would be valid to return this ... if this satisfied the requirements.
In the cases when a new String is created, it is implementation dependent how many objects are created. A Java string is typically represented as a String object with a reference to a backing char[] or byte[] that represents the string's characters. But in some Java releases, a backing array can be shared between multiple String objects ... under certain circumstances. And indeed substring was one of the methods that would create Strings with shared backing arrays.
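The following sketch shows how one could probe these points on a particular JVM. The class name and the upper() helper are invented for illustration, and the two lines without an expected result depend on the implementation, so their output is deliberately not stated.

```java
// Hypothetical probe; results marked "implementation-dependent" may vary between JVMs.
class DerivedStringDemo {
    static String upper(String s) {
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        String s = new String("Hello");
        String upper = upper(s);

        System.out.println(upper.equals("HELLO"));     // true: same content
        System.out.println(upper == s);                // false: content differs, so it cannot be the same object
        System.out.println(upper.intern() == "HELLO"); // true: intern() returns the pooled literal

        // Implementation-dependent: may be the same object (as reported for Java 11).
        System.out.println(s.substring(0) == s);
        // Implementation-dependent in principle; toUpperCase() is not specified to use the pool.
        System.out.println(upper == "HELLO");
    }
}
```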
Finally, as a commenter pointed out, this stuff is irrelevant to real world Java programming ... in all but the most extreme circumstances. Normally, you should just let the JVM deal with this kind of stuff.
1 - It is the string pool not the "constant string pool". The "constant pool" is actually something that exists in class files. But the string pool may contain strings that do not correspond to Java constants.

How many String objects are created by JVM? [duplicate]

As per my understanding
String s = new String("abc") will create two objects-
one in String constant pool (if "abc" is not already in constant pool)
one in Heap memory
Although more than one understanding exists about how many objects will actually be created and where.
Anyway I have following questions -
Why is it not covered in java docs of String class about how many objects will be created and where?
Why is new String(String) provided anyway in the String class, given that Strings are immutable? Also, can it be assumed that all strings, created by either String s = "abc" or String s = new String("abc"), will be available in the String constant pool?
The String literals used in creating or appended in StringBuilder or StringBuffer,do they also go in String constant pool or they remain in heap memory only.
Edit
java docs for intern() method clearly talks about constant pool but
nothing is said like that in String(String).
Why is it not covered in java docs of String class about how many objects will be created and where?
It is covered in Docs of String
The String class represents character strings. All string literals in Java programs, such as "abc", are implemented as instances of this class.
Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared. For example:
String str = "abc";
is equivalent to:
char data[] = {'a', 'b', 'c'};
String str = new String(data);
And from the Java language specification
A String object has a constant (unchanging) value.
String literals (§3.10.5) are references to instances of class String.
And from JLS §3.10.5, String Literals:
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
Why is new String(String) provided anyway in the String class, given that Strings are immutable? Also, can it be assumed that all strings, created by either String s = "abc" or String s = new String("abc"), will be available in the String constant pool?
Since String is an object, the regular way to declare one is
String s = new String("abc");
But String s = "abc"; was designed for other reasons:
The designers of Java decided to retain primitive types in an object-oriented language, instead of making everything an object, so as to improve the performance of the language.
Because String is one of the most heavily used classes, Java's String is designed, for performance reasons, to sit in between a primitive and a class.
The String literals used in creating or appended in StringBuilder or StringBuffer,do they also go in String constant pool or they remain in heap memory only.
Consider the example
StringBuilder sb = new StringBuilder("abc");
The literal "abc" is available in the constant pool, and the object sb is created on the heap.
You may also want to read my old answer: How can a string be initialized using " "?
String s = new String("abc") will create two objects:
one in String constant pool (if "abc" is not already in constant pool)
No. It is in the constant pool. It was put there by the compiler.
one in Heap memory
Correct.
Although more than one understanding exists about how many objects will actually be created and where.
Many misunderstandings: only one correct understanding.
Why is it not covered in java docs of String class about how many objects will be created and where?
The premise of the question is not correct. It is covered. Perhaps not in the exact form 'how many strings are created', but as the question has been asked millions of times in the last 20 years the answer isn't exactly a secret. Or shouldn't be.
Why new String(String) is provided anyway in String class provided Strings are immutable?.
So you can create a new one.
Also, can it be assumed that all strings, created by either String s = "abc" or String s = new String("abc"), will be available in the String constant pool?
No. Only string literals and the result of String.intern() are in the constant pool. And that is documented as well.
The String literals used in creating or appended in StringBuilder or StringBuffer,do they also go in String constant pool or they remain in heap memory only.
All String literals are placed in the constant pool by the compiler.
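The answers above can be checked with a small sketch like the following (the class name and the copyOf() helper are made up). It only shows which references are identical; it does not count objects directly.

```java
// Hypothetical demo: literal vs. new String(String) vs. StringBuilder.
class NewStringDemo {
    // new String(String) always creates a distinct object with the same content.
    static String copyOf(String s) {
        return new String(s);
    }

    public static void main(String[] args) {
        String literal = "abc";
        String sameLiteral = "abc";          // same pooled instance as above
        String copy = copyOf("abc");         // separate object on the heap
        StringBuilder sb = new StringBuilder("abc");

        System.out.println(literal == sameLiteral);   // true
        System.out.println(literal == copy);          // false
        System.out.println(literal == copy.intern()); // true
        System.out.println(literal == sb.toString()); // false: toString() allocates a new String
    }
}
```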

long static Strings in short-lived objects

This might be a stupid question (or just make me look stupid :)), however I would be interested in how to work with long String objects in the context of short-lived objects.
Think about long SQL queries in cron jobs, or in anonymous, command-like or function-like classes. These are very short-lived classes, and most of the time they use these long Strings only once in their lifetime. What is better: to construct the String inline and let it be collected with the instance, or to make it static final anyway and let it sit in memory unused until the class is next instantiated?
Well, there's only so much control you can have over what happens to the String.
Even if you create it inline, that String will most probably be added to the String constant pool of the JVM, and will be reused when you declare it again, so in practice, you'll probably be reusing the same String object either way.
Unless the String is so huge that it has an impact on your application's performance, I wouldn't worry about it and choose the option that seemed more readable to me.
If that String will be used only in one particular point of the code, inside a method, I would declare it inline, I prefer to have my variables in the smallest scope that I can, but opinions here may vary.
If there is no change whatsoever, and it seems to make sense in your particular use case, by all means declare the String as static, again, I doubt it will affect performance.
String constants go into the constant pool of a class, and cannot be optimized away, i.e. are handled sufficiently well.
Long strings should not be assembled statically from pieces. For SQL, use prepared statements with a ? placeholder. The same holds for strings with placeholders: use MessageFormat.
To be explicit. The following does not cost anything extra:
static final String s = "... long string ...";
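As a sketch of the advice above, a constant template can be kept as a static final String and filled in per use. The class name, the SQL text, and the message template below are invented for illustration; the SQL constant is only shown as text (it would be passed to Connection.prepareStatement in real code) and is not executed here.

```java
import java.text.MessageFormat;

// Hypothetical short-lived job class with long constant strings.
class QueryTemplateDemo {
    // Parameterized SQL: the ? is bound later via a PreparedStatement (not shown).
    static final String FIND_BY_ID =
            "SELECT id, name FROM product WHERE id = ?";

    // Message template with MessageFormat placeholders.
    static final String STATUS_TEMPLATE = "Report for {0}: status {1}";

    static String describe(String job, String status) {
        return MessageFormat.format(STATUS_TEMPLATE, job, status);
    }

    public static void main(String[] args) {
        System.out.println(describe("nightly", "OK")); // Report for nightly: status OK
        System.out.println(FIND_BY_ID);
    }
}
```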
When remaining memory is limited JVM will normally do perm gen space cleaning and unload unused/unreferenced classes. So having the long strings as static variable won't do much harm in my opinion
If you feel your Strings can occupy lots of memory, then don't make them static or declare them using a String literal, as both of these will be stored in permgen space and will almost never be garbage collected [there is a chance, but a slim one; statics may never be garbage collected if you have not created your own classloader].
So create String using new operator so that will be created in heap and can be easily garbage collected i.e.
String str = new String("long string");
EDIT:
How strings are stored: http://www.ntu.edu.sg/home/ehchua/programming/java/J3d_String.html
EDIT:
There has been a long discussion below regarding how new String works. The argument presented is that new String will create 2 objects, one in the heap and one in the pool. THIS IS WRONG: it is not true by default, and you can force Java to do it by calling the intern method. To back my argument, the following is the javadoc from the String class for the intern method:
intern
public String intern() Returns a canonical representation for the
string object. A pool of strings, initially empty, is maintained
privately by the class String.
When the intern method is invoked, if the pool already contains a
string equal to this String object as determined by the equals(Object)
method, then the string from the pool is returned. Otherwise, this
String object is added to the pool and a reference to this String
object is returned.
It follows that for any two strings s and t, s.intern() == t.intern()
is true if and only if s.equals(t) is true.
All literal strings and string-valued constant expressions are
interned. String literals are defined in §3.10.5 of the Java Language
Specification
As the doc above shows, if new String always created an object in the pool, then the intern method would be completely useless!! It also doesn't make any sense logically.
EDIT:
Also read the answer for this post : String pool creating two string objects for same String in Java

Avoid duplicate Strings in Java

I want to ask a question about avoiding String duplicates in Java.
The context is: an XML with tags and attributes like this one:
<product id="PROD" name="My Product"...></product>
With JibX, this XML is marshalled/unmarshalled in a class like this:
public class Product{
private String id;
private String name;
// constructor, getters, setters, methods and so on
}
The program is a long-time batch processing, so Product objects are created, used, copied, etc.
Well, the question is:
When I analysed the execution with software like Eclipse memory analyzer (MAT), I found several duplicated Strings. For example, in the id attribute, the PROD value is duplicated around 2000 instances, etc.
How can I avoid this situation? Other attributes in Product class may change their value along the execution, but attrs like id, name... don't change so frequently.
I have read something about the String.intern() method, but I haven't used it yet and I'm not sure it's a solution for this. Could I define the most frequent values of those attributes as static final constants in the class?
I hope I'd have expressed my question in a right way.
Any help or advice is very appreciated. Thanks in advance.
Interning would be the right solution, if you really have a problem. Java stores String literals and explicitly interned Strings in an internal pool; whenever a literal is loaded or intern() is called, the JVM first checks whether an equal String is already in the pool. If so, it does not add a new entry but returns the reference to the pooled String object.
There are two ways to control this behaviour:
String interned = aString.intern(); // returns a reference to an interned String
String notInterned = new String(aString); // creates a new String instance (guaranteed)
So maybe, the libraries really create new instances for all xml attribute values. This is possible and you won't be able to change it.
intern has a global effect. An interned String is immediately available "for any object" (this view doesn't really make sense, but it may help to understand it).
So, lets say we have a line in class Foo, method foolish:
String s = "ABCD";
String literals are interned immediately. The JVM checks whether "ABCD" is already in the pool; if not, "ABCD" is stored in the pool. The JVM assigns a reference to the interned String to s.
Now, maybe in another class Bar, in method barbar:
String t = "AB"+"CD";
Because "AB"+"CD" is a compile-time constant expression, the compiler folds it into "ABCD". The JVM then looks up whether it is interned already (it is) and assigns the reference to the interned String "ABCD" to t.
Calling "PROD".intern() may work or fail. Yes, it will intern the String "PROD". But there's a chance that JibX really creates new Strings for attribute values with
String value = new String(getAttributeValue(attribute));
In that case, value will not have a reference to an interned String (even if "PROD" is in the pool) but a reference to a new String instance on the heap.
And, to the other question in your comment: this happens at runtime only. Compiling simply creates class files; the String pool is a data structure on the object heap that is used by the JVM that executes the application.
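If the binding library does hand over fresh String instances, one option is to canonicalize the value where it enters the object. The sketch below applies this to the Product class from the question. It assumes the values pass through the setter, which depends on how the JibX binding is configured and is not confirmed by the question.

```java
// Sketch of the Product class from the question, with interning in the setter.
// Assumption: the unmarshaller (or calling code) sets id via setId().
class Product {
    private String id;

    void setId(String id) {
        // Replace the incoming instance with the canonical pooled one.
        this.id = (id == null) ? null : id.intern();
    }

    String getId() {
        return id;
    }

    public static void main(String[] args) {
        Product p1 = new Product();
        Product p2 = new Product();
        // new String(...) simulates a library that creates a new instance per attribute value.
        p1.setId(new String("PROD"));
        p2.setId(new String("PROD"));

        System.out.println(p1.getId() == p2.getId()); // true: both share one String instance
    }
}
```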
While String.intern() could solve that problem by reducing each value to a single unique String instance, it would introduce another problem: every intern()-ed String can survive for a long time in the JVM. If the IDs vary a lot (i.e. they are not part of a limited set, but can be any value), then this can have massive negative effects in the long run.
Edit: I used to claim that intern()-ed Strings can't ever be GCed, but @nanda proved me wrong with this JavaWorld article. While this somewhat reduces the problem introduced by intern(), it's still not entirely removed: the pool provided by intern() can't be controlled and can have unexpected results with regard to garbage collection.
Luckily Guava provides a solution in the form of the Interner interface and its helper class Interners: using Interners.newStrongInterner() you can create an object that can act as a "pool" of unique String objects much in the same way as String.intern() does, except that the pool is bound to that instance, and if you discard the pool, then the content can become eligible for garbage collection as well.
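To illustrate the idea of an instance-bound pool without pulling in Guava, here is a minimal stand-in built on a plain HashMap. This is not Guava's implementation: the class LocalInterner is hypothetical, and unlike Guava's interners it is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for an instance-bound interner (not Guava's code).
class LocalInterner {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the first instance seen for this content; later equal strings map to it.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        LocalInterner interner = new LocalInterner();
        String a = interner.intern(new String("PROD"));
        String b = interner.intern(new String("PROD"));

        System.out.println(a == b);          // true: both refer to the first instance
        System.out.println(interner.size()); // 1
        // Dropping the interner makes its pooled strings collectable
        // once nothing else references them.
    }
}
```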
Yes, interning is the correct solution, and you've done your homework (that is, checking with a profiler that this is the problem).
Interning can cause problems if you store too much. The permgen memory needs to be increased. Despite what some people said, interned Strings are also garbage collected, so if some strings are not used anymore, they will be subject to garbage collection.
Some supporting articles:
My blog: http://blog.firdau.si/2009/01/06/java-tips-memory-optimization-for-string/
Does intern garbage collected?: http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
Busting the 'Busting String.intern() Myths': http://kohlerm.blogspot.com/2009/01/is-javalangstringintern-really-evil.html
An alternative solution:
You could try to define an <xs:enumeration/> restriction on your @id attribute (if your domain model would allow such a thing). If JibX is as intelligent as JAXB or other XML-Java mapping standards, then this could be mapped as a Java enum with constant literals, which can be reused heavily.
I would try that for the ID value, since it kinda looks like an enumeration to me...
As everyone knows, String objects can be created in two ways: by using literals and through the new operator.
If you use a literal like String test = "Sample"; then this will be cached in the String object pool. So interning is not required here, as the string object will be cached by default.
But if you create a string object like String test = new String("Sample"); then this string object will not be added to the string pool. So here we need to use String test = new String("Sample").intern(); to forcefully push the string object to the string cache.
So it is always advisable to use string literals rather than the new operator.
So in your case private static final String id = "PROD"; is the right solution.
