String performance and memory - Java

While writing code, I came across an interesting problem.
I have a PersonPOJO class with a String member, name, and its getter and setter:
class PersonPOJO {
private String name;
public void setName(String name) {
this.name = name;
}
public String getName() {
return name;
}
}
Now I am using it in a Test class. There are two approaches to using the String getter there.
Approach 1:
class Test1 {
............
String name = personPojo.getName();
logger.debug("....."+name);
if (name.equals("ABC")) {
....
}
}
Approach 2:
class Test2 {
.............
logger.debug("...."+personPojo.getName());
if (personPojo.getName().equals("ABC")) {
..
}
}
Thus, in the second approach I am not creating an intermediate String variable. Does avoiding that extra variable help performance, e.g. no extra String object creation, less load on the GC, etc.?
Please explain in detail: which approach is better?
Thanks,

There is nothing extra being created anywhere except the "ABC" literal in your if check (and that literal is pooled once, not created on every call). The variable name is just a reference to the same String object that personPojo.getName() returns. So in terms of memory there is no difference.
However, calling personPojo.getName() again and again in the second example does have a (small) performance cost compared to option 1. Local variables live on the stack and are often the fastest to access, compared to dereferencing an object on the heap and then calling a method on it.
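A minimal sketch to check the "no extra object" claim yourself, assuming the corrected PersonPOJO from the question (the ReferenceDemo class and its method name are made up for illustration). The local variable and a second getter call yield the very same object, which == (a reference comparison) confirms:

```java
// Corrected PersonPOJO from the question.
class PersonPOJO {
    private String name;

    public void setName(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

class ReferenceDemo {
    // Returns true when the local variable and a second getter call
    // point to the very same String object on the heap.
    static boolean localIsSameObject() {
        PersonPOJO personPojo = new PersonPOJO();
        personPojo.setName(new String("ABC")); // a fresh, non-pooled String
        String name = personPojo.getName();    // copies the reference only
        return name == personPojo.getName();   // identity, not content, comparison
    }

    public static void main(String[] args) {
        System.out.println(localIsSameObject()); // prints true
    }
}
```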

Since the other answers already answered your question, some additional information regarding strings in Java:
Strings in Java are special. There are two ways of creating Strings:
1. implicit via string literal
String literal = "Some text";
and
2. explicit via new()
String explicit = new String("Some text");
While string literals are kept in the so-called string pool, String objects created via new are kept on the heap like any other object. This means that if you have three separate string literals
String literal1 = "Test";
String literal2 = "Test";
String literal3 = "Test";
each with the same content, they all share the same instance in the string pool.
As I mentioned, this is just additional information, but it is always good to know. :D
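A small sketch of the difference described above (LiteralPoolDemo and its methods are illustrative names): equal literals resolve to one pooled instance, while new always allocates a separate object with the same characters.

```java
class LiteralPoolDemo {
    // Two literals with the same content resolve to one pooled instance.
    static boolean literalsShareInstance() {
        String literal1 = "Test";
        String literal2 = "Test";
        return literal1 == literal2;
    }

    // A constructor call always allocates a new object, even for equal content.
    static boolean constructorMakesNewObject() {
        String literal = "Test";
        String explicit = new String("Test");
        return literal != explicit && literal.equals(explicit);
    }
}
```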

In both approaches you are not creating any new String object. When you write:
String name = personPojo.getName();
name is a reference to the same String object on the heap that personPojo.name points to. It does not create an intermediate string.
Also, the local variable name goes out of scope when control leaves the method, so from then on it no longer keeps the String reachable.
Thus both cases are similar in terms of performance and memory.

The only place where you create an intermediate String object is your debug code (the string concatenation). Everything else is a reference copy, which costs about as much as copying an int:
String myStringRef = personPojo.getName(); // copies a reference, not the characters
And that really doesn't take any significant time to execute, especially since the JVM is likely to remove that extra variable anyway. Hint: you can add final in front of it, although for a local variable this mainly documents intent; the JIT can usually tell on its own that the variable is never reassigned.
If you want to improve it even further, make the PersonPOJO object immutable by declaring name as final and removing the setter.
That said, both suggestions improve speed by such a small amount that even with the best tool you wouldn't be able to notice it.
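A possible shape for the immutable variant suggested above. The class name ImmutablePerson is hypothetical, chosen so it does not clash with the original PersonPOJO:

```java
// Immutable variant: the field is final, assigned once, and there is no setter.
final class ImmutablePerson {
    private final String name;

    ImmutablePerson(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}
```

Usage would be new ImmutablePerson("ABC").getName(); any "change" of name means creating a new ImmutablePerson.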

You are not creating an object in either example. Neither are you altering an object. You are merely creating a local variable reference.
As for the created reference, the difference is not as big as you might think. The Java runtime will eventually optimize this additional reference away, and even if it did not, you would never notice the difference. When writing code, do not think about such things; think about the readability of your code instead.

String name = personPojo.getName();
logger.debug("....."+name);
if (name.equals("ABC"))
versus
logger.debug("...."+personPojo.getName());
if (personPojo.getName().equals("ABC")) {
Both cases are the same as far as performance is concerned, because with String name = personPojo.getName(); you are not creating a new object; you are just creating a new reference to the same String object.
Maybe String name = personPojo.getName(); is a little bit better, because you are not calling the method again and again but using a local variable instead. Here the impact is probably negligible, but it is good practice not to call the same method again and again.

Related

static final String = "something" -- does the static make sense?

static final String = "something";
I was wondering whether such a declaration makes any real sense in Java. I've found it thousands of times in code, but studying and playing with strings I've learnt that it does not matter how many times you declare a String object: if some class before yours declared this string, it is pooled and reused (I'm talking about Strings created without an explicit constructor invocation).
public class StringCompare {
private void toCompare(String h) {
String x = "hello";
System.out.println( x == h);
}
public void compare() {
String h = "hello";
toCompare(h);
}
}
This code, in fact, prints true when calling compare, so the two variables reference the same object. Given that a final variable can't be reassigned, the static keyword seems completely useless in this case. Am I missing the point?
A couple more things:
1 - Why doesn't an explicit call to the String constructor cause the String to be pooled? The same code as above using new String("hello") prints false.
2 - Is the pooling behaviour restricted to Strings? There are some other immutable objects (like BigInteger), but I think these are not pooled... why?
Thanks,
Carlo
There are a few reasons why developers use this pattern, even though your analysis of the runtime behavior and its use of the string pool is correct.
1. It communicates the developer's intent more clearly. Specifically, the string now has a name that is hopefully clearer than the literal itself, and it communicates the runtime behaviour that not everybody is aware of (or that they forget from time to time).
2. If the developer wants to change the value within the source file, and it has been used in multiple places, then there is only one place to make that change.
3. Once the decision to use a variable has been made, the keyword static means that memory usage is lower. That is, there is only one field used to store the reference rather than one field per instance of the object.
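A small sketch of point 3, using a made-up GreetingHolder class: the instance field costs one slot per object and the static field one slot in total, yet every slot refers to the same pooled "hello".

```java
class GreetingHolder {
    // One field for the whole class, initialized once when the class is initialized.
    static final String SHARED = "hello";

    // One field per instance: every object stores its own copy of the reference.
    final String perInstance = "hello";
}

class FieldDemo {
    static boolean allPointToPooledLiteral() {
        GreetingHolder a = new GreetingHolder();
        GreetingHolder b = new GreetingHolder();
        // Two instance slots, but both hold the same pooled String as the static field.
        return a.perInstance == b.perInstance && a.perInstance == GreetingHolder.SHARED;
    }
}
```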
As for your follow-up questions:
Q: Why doesn't an explicit call to the String constructor cause the String to be pooled? The same code as above using new String("hello") prints false.
Because only string literals (and other compile-time constant expressions of type String) are pooled automatically, as described by the JLS. Invoking a constructor is not a string literal, and a new expression always creates a new object.
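To get back from a constructed String to the pooled one, you can call intern() explicitly. A minimal sketch (InternDemo is an illustrative name):

```java
class InternDemo {
    // new bypasses the pool: the result is a separate object.
    static boolean constructorBypassesPool() {
        return new String("hello") != "hello";
    }

    // intern() returns the pooled instance with the same content.
    static boolean internReturnsPooledInstance() {
        String constructed = new String("hello");
        return constructed.intern() == "hello";
    }
}
```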
Q: Is the pooling behaviour reserved to Strings?
The string pool is only used for string literals, but there are of course other caches for different use cases. The most obvious one that comes to mind is the Integer cache used to optimize autoboxing of ints (boxed values from -128 to 127 are always shared).
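A quick sketch of that Integer cache (class and method names are illustrative). The JLS guarantees sharing only for boxed values in the range -128 to 127, so the sketch stays inside it:

```java
class BoxingCacheDemo {
    // Autoboxing 127 twice yields the same cached Integer object.
    static boolean smallValuesShared() {
        Integer a = 127;
        Integer b = 127;
        return a == b;
    }

    // Integer.valueOf uses the same cache, so identical objects come back here too.
    static boolean valueOfShared() {
        return Integer.valueOf(-128) == Integer.valueOf(-128);
    }
}
```

Values outside that range may or may not be shared (on HotSpot the upper bound can be raised with -XX:AutoBoxCacheMax), so == on Integer objects is still not a reliable value comparison.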
Even if static doesn't give you a memory benefit in this case, you can still access the field in a static way, so you don't need an instance. For example, if you frequently use some URIs, you might create the following class:
public class Uris
{
public static final String GOOGLE = "http://google.com";
public static final String BBC = "http://bbc.co.uk";
}
Using static, you can just use Uris.GOOGLE instead of new Uris().GOOGLE.
The static means that the field is initialized only once, when the class is initialized, rather than once for every new object created from that class. This is a small saving in computation.
It is because String x = "hello" and String h = "hello" do not create new objects: both refer to the instance stored in the String constant pool, whereas new String("hello") creates a new object on the heap, so its address (reference) is different.
String x = "hello";
String h = "hello";
are two String literals with the same content. Both are in the common pool (part of the String pool) and refer to the same object, so == (which compares references) returns true.
If you use String x = new String("hello"), it creates a new String object, which of course has a new reference. Now comparing references gives you false.

rely on java String copy on write

My application creates a lot of instances of a class, say class A. Every instance contains a string, and most of them contain the same String:
class A {
String myString;
}
I know that the JVM makes "all equal strings" point to the same String, which is stored just once. If the myString field of one of my A instances is overwritten, the reference to the original string is replaced by a reference to the new String value, and everything works as expected, that is, as if each instance had its own copy of the string.
Is this behaviour required of a compliant JVM, or is it a sort of optimization that may change from one JVM to another, or from version to version?
Another way to put the question: when designing highly redundant (string-based) data structures, should one rely only on the copy-on-write mechanism, or is it advisable to put something in place at the application level?
Another aspect of this is that your Strings will not be the same object if they are created dynamically (e.g. allocated by a parser). Check out String.intern() if space is a concern:
String a = String.valueOf('a') + "b";
String b = a.intern();
String c = "ab";
// now b == c is true
As Hot Licks said: strings are immutable, so there is no point in talking about copy-on-write. Also, when you are using mutable objects, you have to be aware that copy-on-write may not be available in your client's environment.
Another thing may be very important when you create a lot of objects: each object carries a few bytes of header, pointers, etc. If I remember correctly, an empty object takes something like 20 bytes. When we are talking about a lot of objects containing properties, this starts to be significant. Be aware of that, and when you measure that it is causing a problem, do something at the application level (the flyweight design pattern, a streaming XML parser, etc.).
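One application-level option is a small canonicalizing pool of your own. This is a minimal sketch, not a library API: StringPool and canonicalize are hypothetical names. Unlike String.intern(), its contents can be garbage collected once the pool itself is no longer referenced:

```java
import java.util.HashMap;
import java.util.Map;

// Keeps one canonical instance per distinct value (not thread-safe).
class StringPool {
    private final Map<String, String> canonical = new HashMap<>();

    // Returns the first instance seen with this content, storing s if it is new.
    String canonicalize(String s) {
        String existing = canonical.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return canonical.size();
    }
}
```

A parser would pass each attribute value through canonicalize() before storing it; for concurrent use, a ConcurrentHashMap would be the natural replacement for the HashMap.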
The fact is that Strings are regular objects.
String a = "test";
String b = a;
Does exactly the same thing as:
StringBuffer a = new StringBuffer("test");
StringBuffer b = a;
that is: in both cases, b is a second reference to the same object as a, and this has nothing to do with immutability.
Where immutability comes into play: you always handle two pointers to the same data. If the class is immutable, you can forget about that: nobody will change your data under your feet, not because you have your own copy, but because the shared copy is immutable. You can even think of it as having your own copy of the string, but a copy has never actually existed, since String b = a; does what it does for every object: it copies the only reference.
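A minimal sketch contrasting the two cases above (AliasingDemo is an illustrative name): with the mutable StringBuffer, a change through b is visible through a; with String, "changing" b only rebinds b to a new object.

```java
class AliasingDemo {
    // Mutable object: both variables see the change, because they share one object.
    static String mutateThroughAlias() {
        StringBuffer a = new StringBuffer("test");
        StringBuffer b = a;   // copies the reference, not the buffer
        b.append("ing");
        return a.toString();  // "testing"
    }

    // Immutable object: b is pointed at a new String; a is untouched.
    static String reassignAlias() {
        String a = "test";
        String b = a;         // copies the reference, exactly as above
        b = b + "ing";        // creates a new String and rebinds b only
        return a;             // still "test"
    }
}
```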

Avoid duplicate Strings in Java

I want to ask a question about avoiding String duplicates in Java.
The context is an XML document with tags and attributes like this one:
<product id="PROD" name="My Product"...></product>
With JibX, this XML is marshalled/unmarshalled into a class like this:
public class Product{
private String id;
private String name;
// constructor, getters, setters, methods and so on
}
The program is a long-running batch process, so Product objects are created, used, copied, etc.
Well, the question is:
When I analysed the execution with a tool such as Eclipse Memory Analyzer (MAT), I found many duplicated Strings. For example, in the id attribute, the PROD value is duplicated in around 2000 instances.
How can I avoid this situation? Other attributes in the Product class may change their value during the execution, but attributes like id and name don't change so frequently.
I have read something about the String.intern() method, but I haven't used it yet and I'm not sure it's a solution for this. Could I define the most frequent values of those attributes as static final constants in the class?
I hope I have expressed my question clearly.
Any help or advice is very appreciated. Thanks in advance.
Interning would be the right solution, if you really have a problem. Java stores String literals (and any String you explicitly intern()) in an internal pool. When a literal is resolved or intern() is called, the JVM first checks whether an equal String is already in the pool. If so, it does not create a new instance but returns the reference to the pooled String object.
There are two ways to control this behaviour:
String interned = aString.intern(); // returns a reference to the pooled String
String notInterned = new String(aString); // creates a new String instance (guaranteed)
So maybe the library really does create new instances for all XML attribute values. That is possible, and you won't be able to change it.
intern has a global effect. An interned String is immediately available "to any object" (this view doesn't really make sense, but it may help to understand it).
So, let's say we have a line in class Foo, method foolish:
String s = "ABCD";
String literals are interned automatically. The JVM checks whether "ABCD" is already in the pool; if not, "ABCD" is stored in the pool. The JVM then assigns a reference to the interned String to s.
Now, maybe in another class Bar, in method barbar:
String t = "AB"+"CD";
Here the compiler already folds the constant expression "AB"+"CD" into the single literal "ABCD", so no concatenation happens at run time: the JVM looks up "ABCD" in the pool, finds it, and assigns the reference to the interned String "ABCD" to t.
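A minimal sketch of where that folding stops (ConcatDemo is an illustrative name): as soon as one operand is not a compile-time constant, the concatenation runs at run time and, per the JLS, yields a newly created String that is not the pooled one until you intern() it.

```java
class ConcatDemo {
    // "AB" + "CD" is a compile-time constant expression, folded into "ABCD".
    static boolean constantConcatIsPooled() {
        String s = "ABCD";
        String t = "AB" + "CD";
        return s == t;
    }

    // A non-constant operand forces run-time concatenation and a new object.
    static boolean runtimeConcatIsNotPooled() {
        String ab = "AB";     // not final, so not a constant variable
        String t = ab + "CD";
        return t != "ABCD" && t.intern() == "ABCD";
    }
}
```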
Calling "PROD".intern() may or may not help. Yes, it will intern the String "PROD". But there's a chance that JibX really creates new Strings for attribute values with
String value = new String(getAttributeValue(attribute));
In that case, value will not have a reference to an interned String (even if "PROD" is in the pool) but a reference to a new String instance on the heap.
And, to the other question in your comment: this happens at run time only. Compiling simply creates class files; the String pool is a data structure on the heap that is used by the JVM executing the application.
While String.intern() could solve that problem by reducing each value to a single unique String instance, it would introduce another problem: every intern()-ed String can survive for a long time in the JVM. If the IDs vary a lot (i.e. they are not part of a limited set, but can be any value), then this can have massive negative effects in the long run.
Edit: I used to claim that intern()-ed Strings can't ever be GCed, but nanda proved me wrong with a JavaWorld article. While this somewhat reduces the problem introduced by intern(), it's still not entirely removed: the pool provided by intern() can't be controlled and can have unexpected results with regard to garbage collection.
Luckily, Guava provides a solution in the form of the Interner interface and its helper class Interners: using Interners.newStrongInterner() you can create an object that acts as a "pool" of unique String objects, much in the same way as String.intern() does, except that the pool is bound to that instance, and if you discard the pool, its contents can become eligible for garbage collection as well.
Yes, interning is the correct solution, and you've done your homework (that is, checking with a profiler that this is the problem).
Interning can cause problems if you store too much: on older JVMs, where the pool lives in PermGen, the PermGen size needs to be increased. Despite what some people said, interned Strings are also garbage collected, so if some strings are not used anymore, they will be eligible for garbage collection.
Some supporting articles:
My blog: http://blog.firdau.si/2009/01/06/java-tips-memory-optimization-for-string/
Are interned Strings garbage collected?: http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
Busting the 'Busting String.intern() Myths': http://kohlerm.blogspot.com/2009/01/is-javalangstringintern-really-evil.html
An alternative solution:
You could try to define an <xs:enumeration/> restriction on your id attribute (if your domain model allows such a thing). If JibX is as intelligent as JAXB or other XML-Java mapping standards, this could be mapped to a Java enum with constant literals, which can be reused heavily.
I would try that for the ID value, since it kinda looks like an enumeration to me...
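A sketch of what such an enum might look like on the Java side, assuming the id values form a closed set. ProductId and fromAttribute are hypothetical names, and whether JibX generates exactly this is not confirmed here. Each constant is a single shared instance, so thousands of products reuse it:

```java
// Hypothetical enum for the id attribute; add the other known ids as constants.
enum ProductId {
    PROD;

    // Maps a parsed attribute value onto the shared constant.
    // Throws IllegalArgumentException for an id that is not declared above.
    static ProductId fromAttribute(String value) {
        return ProductId.valueOf(value);
    }
}
```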
As everyone knows, String objects can be created in two ways: by using literals and through the new operator.
If you use a literal like String test = "Sample"; then this will be cached in String object pool. So interning is not required here as by default the string object will be cached.
But if you create a string object like String test = new String("Sample"); then this string object will not be added to the string pool. So here we need String test = new String("Sample").intern(); which returns the pooled instance (adding the string to the pool first if it is not there yet).
So it is always advisable to use string literals rather than the new operator.
So in your case private static final String id = "PROD"; is the right solution.

What are the benefits of using identical String literals instead of a final variable?

I've come across a class that includes multiple uses of a string literal, "foo".
What I'd like to know, is what are the benefits and impact (in terms of object creation, memory usage and speed) of using this approach instead of declaring the String as final and replacing all the literals with the final variable?
For example (although obviously not a real-world usage):
private static final String FINAL_STRING = "foo";
public void stringPrinter(){
for(int i=0;i<10;i++){
System.out.println(FINAL_STRING);
}
}
Versus:
public void stringPrinter(){
for(int i=0;i<10;i++){
System.out.println("foo");
}
}
Which is preferable and why (assuming the string value will remain constant)?
Would the above (second) example result in 10 String objects being created or would the JVM realise that only a single literal is actually used, and create a single reference. If so, is there any advantage for declaring the String as final (as in the first example)?
If the interpreted code does replace the string literal with a single reference, does that still apply if the same literal occurs in more than one place:
public void stringPrinter(){
for(int i=0;i<5;i++){
System.out.println("foo"); // first occurrence
System.out.println("foo"); // second occurrence
}
}
They will be exactly the same. The literal is interned (any compile time constant expression that results in that string shares the same instance as all other constants/literals) in both cases and a smart compiler+runtime should have no trouble reducing both to the most optimized example.
The advantage comes more in maintainability. If you want to change the literal, you would need only change one occurrence with a constant but you would need to search and change every instance if they were included inline.
From the JLS
Compile-time constants of type String are always "interned" so as to share unique instances, using the method String.intern.
So, no, there will be only one String object.
As Mark notes, this is strictly the question of maintainability and not performance.
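A minimal sketch that checks the question's loop (LiteralReuseDemo is an illustrative name): every iteration yields the same object, and it is the same object the FINAL_STRING constant holds.

```java
import java.util.ArrayList;
import java.util.List;

class LiteralReuseDemo {
    private static final String FINAL_STRING = "foo";

    // Collects the reference produced by the literal on each loop iteration.
    static List<String> collectLiterals() {
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            seen.add("foo");
        }
        return seen;
    }

    // True if every iteration yielded the constant's object.
    static boolean oneInstanceOnly() {
        for (String s : collectLiterals()) {
            if (s != FINAL_STRING) {
                return false;
            }
        }
        return true;
    }
}
```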
The advantage is not in performance, but in maintainability and reliability.
Let me take a real example I came across just recently. A programmer created a function that took a String parameter that identified the type of a transaction. Then in the program he did string compares against this type. Like:
if (type.equals("stock"))
{ ... do whatever ... }
Then he called this function, passing it the value "Stock".
Do you notice the difference in capitalization? Neither did the original programmer. It proved to be a fairly subtle bug to figure out, because even looking at both listings, the difference in capitalization didn't strike me.
If instead he had declared a final static, say
final static String stock="stock";
Then, if he had mistyped the constant as Stock instead of stock, he would have gotten a compile-time error.
Better still in this example would have been to make an enum, but let's assume he actually had to write the string to an output file or something so it had to be a string.
Using final statics gives at least three advantages:
(1) If you mis-spell it, you get a compile-time error, rather than a possibly-subtle run-time error.
(2) A static can assign a meaningful name to a value. Which is more comprehensible:
if (employeeType.equals("R")) ...
or
if (employeeType.equals(EmployeeType.RETIRED)) ...
(3) When there are multiple related values, you can put a group of final statics together at the top of the program, thus informing future readers what all the possible values are. I've had plenty of times when I've seen a function compare a value against two or three literals. And that leaves me wondering: Are there other possible values, or is this it? (Better still is often to have an enum, but that's another story.)
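A sketch of the constants holder implied by point (2). The class names and the values "R" and "A" are assumptions for illustration; only RETIRED appears in the answer:

```java
// Hypothetical constants holder: all allowed values listed in one place.
final class EmployeeType {
    static final String RETIRED = "R";
    static final String ACTIVE = "A";  // assumed second value

    private EmployeeType() {}          // constants only, no instances
}

class Payroll {
    // A typo such as EmployeeType.RETIRD fails to compile, unlike a typo inside "R".
    // Calling equals on the constant also makes a null argument safe.
    static boolean isRetired(String employeeType) {
        return EmployeeType.RETIRED.equals(employeeType);
    }
}
```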
All String literals are kept in a String pool (this is across all classes).
Using a constant can make the code clearer, give the string some context, and make the code easier to maintain, especially if the same string appears in multiple places.
Those string literals are interned, so no new String objects are created in the loop. Using the same literal twice could still be a code smell, though; but not in terms of speed or memory usage.
In the cases you are providing, I believe the biggest reason for having it declared as FINAL_STRING somewhere is to ensure it stays in one centralized location. There will only ever be one instance of that string constant, but the first example is far easier to maintain.

What is the purpose of the expression "new String(...)" in Java?

While looking at online code samples, I have sometimes come across an assignment of a String constant to a String object via the use of the new operator.
For example:
String s;
...
s = new String("Hello World");
This, of course, compared to
s = "Hello World";
I'm not familiar with this syntax and have no idea what the purpose or effect would be.
Since String constants typically get stored in the constant pool and then in whatever representation the JVM has for dealing with String constants, would anything even be allocated on the heap?
The one place where you may think you want new String(String) is to force a distinct copy of the internal character array, as in
small=new String(huge.substring(10,20))
However, this behavior is unfortunately undocumented and implementation dependent.
I have been burned by this when reading large files (some up to 20 MiB) into a String and carving it into lines after the fact. I ended up with all the strings for the lines referencing the char[] holding the entire file. Unfortunately, that unintentionally kept the entire array alive for as long as I held on to the few lines I kept after processing the file. I was forced to use new String() to work around it, since processing 20,000 files very quickly consumed huge amounts of RAM.
The only implementation agnostic way to do this is:
small=new String(huge.substring(10,20).toCharArray());
This unfortunately must copy the array twice, once for toCharArray() and once in the String constructor.
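The workaround above, wrapped in a helper (SubstringCopy and detachedSubstring are hypothetical names). Because new String(char[]) always copies into a fresh object, the result never shares storage with huge, whatever the JDK's substring does:

```java
class SubstringCopy {
    // toCharArray() copies once; the String constructor copies again into the new object.
    static String detachedSubstring(String huge, int from, int to) {
        return new String(huge.substring(from, to).toCharArray());
    }
}
```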
There needs to be a documented way to get a new String by copying the chars of an existing one; or the documentation of String(String) needs to be improved to make it more explicit (there is an implication there, but it's rather vague and open to interpretation).
Pitfall of Assuming what the Doc Doesn't State
In response to the comments, which keep coming in, observe what the Apache Harmony implementation of new String() was:
public String(String string) {
value = string.value;
offset = string.offset;
count = string.count;
}
That's right, no copy of the underlying array there. And yet, it still conforms to the (Java 7) String documentation, in that it:
Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable.
The salient piece being "copy of the argument string"; it does not say "copy of the argument string and the underlying character array supporting the string".
Be careful that you program to the documentation and not one implementation.
The only time I have found this useful is in declaring lock variables:
private final String lock = new String("Database lock");
....
synchronized(lock)
{
// do something
}
In this case, debugging tools like Eclipse will show the string when listing what locks a thread currently holds or is waiting for. You have to use "new String", i.e. allocate a new String object, because otherwise a shared string literal could possibly be locked in some other unrelated code.
String s1="foo"; the literal goes into the string pool and s1 refers to it.
String s2="foo"; this time the JVM checks whether the "foo" literal is already in the pool; it now exists, so s2 refers to the same object.
String s3=new String("foo"); the "foo" literal is in the pool (created first if needed); then the String constructor creates a new "foo" object on the heap, because of the new operator, and s3 refers to it.
String s4=new String("foo"); same as s3.
So System.out.println(s1==s2); // true, both refer to the same pooled object
and System.out.println(s3==s4); // false, s3 and s4 are distinct objects on the heap
The sole utility for this constructor described by Software Monkey and Ruggs seems to have disappeared from JDK7.
There is no longer an offset field in class String, and substring always uses
Arrays.copyOfRange(char[] original, int from, int to)
to trim the char array for the copy.
Well, that depends on what the "..." is in the example. If it's a StringBuffer, for example, or a byte array, or something, you'll get a String constructed from the data you're passing.
But if it's just another String, as in new String("Hello World!"), then it should be replaced by simply "Hello World!", in all cases. Strings are immutable, so cloning one serves no purpose -- it's just more verbose and less efficient to create a new String object just to serve as a duplicate of an existing String (whether it be a literal or another String variable you already have).
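For contrast, a sketch of the cases where a String constructor genuinely does work: decoding bytes, slicing a char array, and copying a buffer (StringConstructors is an illustrative name; the constructors themselves are standard java.lang.String ones).

```java
import java.nio.charset.StandardCharsets;

class StringConstructors {
    // Decoding bytes: here a constructor is genuinely needed.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Slicing a char array: offset and count select the characters.
    static String fromChars(char[] chars, int offset, int count) {
        return new String(chars, offset, count);
    }

    // Copying a mutable buffer; buffer.toString() is the more idiomatic equivalent.
    static String fromBuffer(StringBuffer buffer) {
        return new String(buffer);
    }
}
```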
In fact, Effective Java (which I highly recommend) uses exactly this as one of its examples of "Avoid creating unnecessary objects":
As an extreme example of what not to do, consider this statement:
String s = new String("stringette"); // DON'T DO THIS!
(Effective Java, Second Edition)
Here is a quote from the book Effective Java Third Edition (Item 17: Minimize Mutability):
A consequence of the fact that immutable objects can be shared freely
is that you never have to make defensive copies of them (Item
50). In fact, you never have to make any copies at all because the
copies would be forever equivalent to the originals. Therefore, you
need not and should not provide a clone method or copy constructor
(Item 13) on an immutable class. This was not well understood in the
early days of the Java platform, so the String class does have a copy
constructor, but it should rarely, if ever, be used.
So it was a wrong decision by Java: since the String class is immutable, they should not have provided a copy constructor for it. In cases where you want to do costly operations on immutable classes, you can use public mutable companion classes, which for String are StringBuilder and StringBuffer.
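A short sketch of that companion-class idea (CompanionDemo and join are illustrative names): build the result in one mutable StringBuilder and produce the immutable String once at the end.

```java
class CompanionDemo {
    // Builds a comma-separated list in one mutable buffer instead of
    // allocating a new intermediate String on every += step.
    static String join(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }
}
```

Since Java 8, String.join(", ", parts) does the same for this simple case.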
Generally, this indicates someone who isn't comfortable with the new-fashioned C++ style of declaring when initialized.
Back in the C days, it wasn't considered good form to define auto variables in an inner scope; C++ eliminated the parser restriction, and Java extended that.
So you see code that has
int q;
for(q=0;q<MAX;q++){
String s;
int ix;
// other stuff
s = new String("Hello, there!");
// do something with s
}
In the extreme case, all the declarations may be at the top of a function, and not in enclosed scopes like the for loop here.
In general, though, the effect of this is to cause a String constructor to be called and the resulting String thrown away. (The desire to avoid this is just what led Stroustrup to allow declarations anywhere in the code.) So you are correct that it's unnecessary and bad style at best, and possibly actually bad.
There are two ways in which Strings can be created in Java. Following are the examples for both the ways:
1) Declare a variable of type String (a class in Java) and assign it a value enclosed in double quotes. This creates a string in the string pool area of memory.
eg: String str = "JAVA";
2) Use the constructor of the String class and pass a string (within double quotes) as an argument.
eg: String s = new String("JAVA");
This will create a new string JAVA in the main memory and also in the string pool if this string is not already present in string pool.
I guess it will depend on the code samples you're seeing.
Most of the time, code samples that use the class constructor "new String()" do so only to show a very well-known Java class instead of creating a new one.
You should avoid using it most of the time. Not only because string literals are interned, but mainly because strings are immutable. It doesn't make sense to have two copies that represent the same value.
While the article mentioned by Ruggs is "interesting", it should not be used except in very specific circumstances, because it could do more harm than good. You'll be coding to an implementation rather than a specification, and the same code might not behave the same on, for instance, JRockit, the IBM VM, or others.
