StringBuilder/StringBuffer vs. "+" Operator - java

I'm reading "Better, Faster, Lighter Java" (by Bruce Tate and Justin Gehtland) and am familiar with the readability requirements in agile type teams, such as what Robert Martin discusses in his clean coding books. On the team I'm on now, I've been told explicitly not to use the + operator because it creates extra (and unnecessary) string objects during runtime.
But this article, Written back in '04 talks about how object allocation is about 10 machine instructions. (essentially free)
It also talks about how the GC also helps to reduce costs in this environment.
What is the actual performance tradeoffs between using +, StringBuilder or StringBuffer? (In my case it is StringBuffer only as we are limited to Java 1.4.2.)
StringBuffer to me results in ugly, less readable code, as a couple of examples in Tate's book demonstrates. And StringBuffer is thread-synchronized which seems to have its own costs that outweigh the "danger" in using the + operator.
Thoughts/Opinions?

Using String concatenation is translated into StringBuilder operations by the compiler.
To see how the compiler is doing I'll take a sample class, compile it and decompile it with jad to see what's the generated bytecode.
Original class:
public void method1() {
System.out.println("The answer is: " + 42);
}
public void method2(int value) {
System.out.println("The answer is: " + value);
}
public void method3(int value) {
String a = "The answer is: " + value;
System.out.println(a + " what is the question ?");
}
The decompiled class:
public void method1()
{
System.out.println("The answer is: 42");
}
public void method2(int value)
{
System.out.println((new StringBuilder("The answer is: ")).append(value).toString());
}
public void method3(int value)
{
String a = (new StringBuilder("The answer is: ")).append(value).toString();
System.out.println((new StringBuilder(String.valueOf(a))).append(" what is the question ?").toString());
}
On method1 the compiler performed the operation at compile time.
On method2 the String concatenation is equivalent to manually use StringBuilder.
On method3 the String concatenation is definitely bad as the compiler is creating a second StringBuilder rather than reusing the previous one.
So my simple rule is that concatenations are good unless you need to concatenate the result again: for instance in loops or when you need to store an intermediate result.

Your team needs to learn about the reasons for avoiding repeated string concatenation.
There certainly are times when it makes sense to use StringBuffer - in particular when you're creating a string in a loop, especially if you aren't sure that there will be few iterations in the loop. Note that it's not just a matter of creating new objects - it's a matter of copying all the text data you've appended already. Also bear in mind that object allocation is only "essentially free" if you don't consider garbage collection. Yes, if there's enough room in the current generation, it's basically a matter of incrementing a pointer... but:
That memory must have been cleared at some point. That's not free.
You're shortening the time until the next GC is required. GC isn't free.
If your object lives into the next generation, it may take longer to be cleaned up - again, not free.
All of these things are reasonably cheap in that it's "usually" not worth bending a design away from elegance to avoid creating objects... but you shouldn't regard them as free.
On the other hand, there is no point in using StringBuffer in cases where you won't need the intermediate strings. For example:
String x = a + b + c + d;
is at least as efficient as:
StringBuffer buffer = new StringBuffer();
buffer.append(a);
buffer.append(b);
buffer.append(c);
buffer.append(d);
String x = buffer.toString();

For small concatenations you can simply use String and + for the sake of readability. Performance is not going to suffer. But if you are doing lots of concatenation operations go for StringBuffer.

Other answers have mentioned that StringBuilder should be used when you are creating a string in a loop. However, most of the loops are over collections and from Java 8 the collections can be transformed to Strings using the joining method from Collectors class.
As an example, in the next code:
String result = Arrays.asList("Apple", "Banana", "Orange").stream()
.collect(Collectors.joining(", ", "<", ">"));
result will be: <Apple, Banana, Orange>

Related

Is this string join implementation really O(n^2)?

I was reading a popular book on algorithm questions and saw an implementation of string join such as the following:
public String joinTheStrings(String[] theStrings){
String joinedString = "";
for(String singleString : theStrings){
joinedString = joinedString + singleString;
}
return joinedString;
}
The author then went on to claim this implementation is O(n^2), and that an optimization would be to use a StringBuffer in place of joinedString, which they claimed would make the algorithm O(n). However, I fail to see how the original algorithm is O(n^2) - It appears to me that for N words there will be N operations (adding the strings together).
Update: Thank you for responses. It looks like I am confused about how the author treats the copying of an array of characters (which I would think to be a constant) as another factor of N in the amortized runtime?
The + operator used to create a new String by copying both of its operands's characters in a new char[] with size firstString.length() + secondString.length(), but for that it had to iterate over the chars of both of them. That's where your hidden inner loop lies.
However, recent versions of JDK optimize string concatenation at compile time by automatically turning it into a StringBuilder's append() operation. The book you mention must be quite old, nowadays StringBuffer is not used in general, because it's synchronized.
Anyway, the compiler might not be smart enough to extract StringBuilder instantiation outside of the loop, creating new string builders for each iteration, thereby reducing its usefulness as a performance optimization. So it's not a bad idea to write this by hand:
public String joinTheStrings(String[] theStrings) {
StringBuilder joinedString = new StringBuilder();
for (String singleString : theStrings)
joinedString.append(singleString);
return joinedString.toString();
}

When to use StringBuilder in Java [duplicate]

This question already has answers here:
StringBuilder vs String concatenation in toString() in Java
(20 answers)
Closed 8 years ago.
It is supposed to be generally preferable to use a StringBuilder for string concatenation in Java. Is this always the case?
What I mean is this: Is the overhead of creating a StringBuilder object, calling the append() method and finally toString() already smaller then concatenating existing strings with the + operator for two strings, or is it only advisable for more (than two) strings?
If there is such a threshold, what does it depend on (perhaps the string length, but in which way)?
And finally, would you trade the readability and conciseness of the + concatenation for the performance of the StringBuilder in smaller cases like two, three or four strings?
Explicit use of StringBuilder for regular concatenations is being mentioned as obsolete at obsolete Java optimization tips as well as at Java urban myths.
If you use String concatenation in a loop, something like this,
String s = "";
for (int i = 0; i < 100; i++) {
s += ", " + i;
}
then you should use a StringBuilder (not StringBuffer) instead of a String, because it is much faster and consumes less memory.
If you have a single statement,
String s = "1, " + "2, " + "3, " + "4, " ...;
then you can use Strings, because the compiler will use StringBuilder automatically.
Ralph's answer is fabulous. I would rather use StringBuilder class to build/decorate the String because the usage of it is more look like Builder pattern.
public String decorateTheString(String orgStr){
StringBuilder builder = new StringBuilder();
builder.append(orgStr);
builder.deleteCharAt(orgStr.length()-1);
builder.insert(0,builder.hashCode());
return builder.toString();
}
It can be use as a helper/builder to build the String, not the String itself.
As a general rule, always use the more readable code and only refactor if performance is an issue. In this specific case, most recent JDK's will actually optimize the code into the StringBuilder version in any case.
You usually only really need to do it manually if you are doing string concatenation in a loop or in some complex code that the compiler can't easily optimize.
Have a look at: http://www.javaspecialists.eu/archive/Issue068.html and http://www.javaspecialists.eu/archive/Issue105.html
Do the same tests in your environment and check if newer JDK or your Java implementation do some type of string operation better with String or better with StringBuilder.
Some compilers may not replace any string concatenations with StringBuilder equivalents. Be sure to consider which compilers your source will use before relying on compile time optimizations.
The + operator uses public String concat(String str) internally. This method copies the characters of the two strings, so it has memory requirements and runtime complexity proportional to the length of the two strings. StringBuilder works more efficent.
However I have read here that the concatination code using the + operater is changed to StringBuilder on post Java 4 compilers. So this might not be an issue at all. (Though I would really check this statement if I depend on it in my code!)
For two strings concat is faster, in other cases StringBuilder is a better choice, see my explanation in concatenation operator (+) vs concat()
The problem with String concatenation is that it leads to copying of the String object with all the associated cost. StringBuilder is not threadsafe and is therefore faster than StringBuffer, which used to be the preferred choice before Java 5. As a rule of thumb, you should not do String concatenation in a loop, which will be called often. I guess doing a few concatenations here and there will not hurt you as long as you are not talking about hundreds and this of course depends on your performance requirements. If you are doing real time stuff, you should be very careful.
The Microsoft certification material addresses this same question. In the .NET world, the overhead for the StringBuilder object makes a simple concatenation of 2 String objects more efficient. I would assume a similar answer for Java strings.

The best alternative for String flyweight implementation in Java

My application is multithreaded with intensive String processing. We are experiencing excessive memory consumption and profiling has demonstrated that this is due to String data. I think that memory consumption would benefit greatly from using some kind of flyweight pattern implementation or even cache (I know for sure that Strings are often duplicated, although I don't have any hard data in that regard).
I have looked at Java Constant Pool and String.intern, but it seems that it can provoke some PermGen problems.
What would be the best alternative for implementing application-wide, multithreaded pool of Strings in java?
EDIT: Also see my previous, related question: How does java implement flyweight pattern for string under the hood?
Note: This answer uses examples that might not be relevant in modern runtime JVM libraries. In particular, the substring example is no longer an issue in OpenJDK/Oracle 7+.
I know it goes against what people often tell you, but sometimes explicitly creating new String instances can be a significant way to reduce your memory.
Because Strings are immutable, several methods leverage that fact and share the backing character array to save memory. However, occasionally this can actually increase the memory by preventing garbage collection of unused parts of those arrays.
For example, assume you were parsing the message IDs of a log file to extract warning IDs. Your code would look something like this:
//Format:
//ID: [WARNING|ERROR|DEBUG] Message...
String testLine = "5AB729: WARNING Some really really really long message";
Matcher matcher = Pattern.compile("([A-Z0-9]*): WARNING.*").matcher(testLine);
if ( matcher.matches() ) {
String id = matcher.group(1);
//...do something with id...
}
But look at the data actually being stored:
//...
String id = matcher.group(1);
Field valueField = String.class.getDeclaredField("value");
valueField.setAccessible(true);
char[] data = ((char[])valueField.get(id));
System.out.println("Actual data stored for string \"" + id + "\": " + Arrays.toString(data) );
It's the whole test line, because the matcher just wraps a new String instance around the same character data. Compare the results when you replace String id = matcher.group(1); with String id = new String(matcher.group(1));.
This is already done at the JVM level. You only need to ensure that you aren't creating new Strings everytime, either explicitly or implicitly.
I.e. don't do:
String s1 = new String("foo");
String s2 = new String("foo");
This would create two instances in the heap. Rather do so:
String s1 = "foo";
String s2 = "foo";
This will create one instance in the heap and both will refer the same (as evidence, s1 == s2 will return true here).
Also don't use += to concatenate strings (in a loop):
String s = "";
for (/* some loop condition */) {
s += "new";
}
The += implicitly creates a new String in the heap everytime. Rather do so
StringBuilder sb = new StringBuilder();
for (/* some loop condition */) {
sb.append("new");
}
String s = sb.toString();
If you can, rather use StringBuilder or its synchronized brother StringBuffer instead of String for "intensive String processing". It offers useful methods for exactly those purposes, such as append(), insert(), delete(), etc. Also see its javadoc.
Java 7/8
If you are doing what the accepted answer says and using Java 7 or newer you are not doing what it says you are.
The implementation of subString() has changed.
Never write code that relies on an implementation that can change drastically and might make things worse if you are relying on the old behavior.
1950 public String substring(int beginIndex, int endIndex) {
1951 if (beginIndex < 0) {
1952 throw new StringIndexOutOfBoundsException(beginIndex);
1953 }
1954 if (endIndex > count) {
1955 throw new StringIndexOutOfBoundsException(endIndex);
1956 }
1957 if (beginIndex > endIndex) {
1958 throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
1959 }
1960 return ((beginIndex == 0) && (endIndex == count)) ? this :
1961 new String(offset + beginIndex, endIndex - beginIndex, value);
1962 }
So if you use the accepted answer with Java 7 or newer you are creating twice as much memory usage and garbage that needs to be collected.
Effeciently pack Strings in memory! I once wrote a hyper memory efficient Set class, where Strings were stored as a tree. If a leaf was reached by traversing the letters, the entry was contained in the set. Fast to work with, too, and ideal to store a large dictionary.
And don't forget that Strings are often the largest part in memory in nearly every app I profiled, so don't care for them if you need them.
Illustration:
You have 3 Strings: Beer, Beans and Blood. You can create a tree structure like this:
B
+-e
+-er
+-ans
+-lood
Very efficient for e.g. a list of street names, this is obviously most reasonable with a fixed dictionary, because insert cannot be done efficiently. In fact the structure should be created once, then serialized and afterwards just loaded.
First, decide how much your application and developers would suffer if you eliminated some of that parsing. A faster application does you no good if you double your employee turnover rate in the process! I think based on your question we can assume you passed this test already.
Second, if you can't eliminate creating an object, then your next goal should be to ensure it doesn't survive Eden collection. And parse-lookup can solve that problem. However, a cache "implemented properly" (I disagree with that basic premise, but I won't bore you with the attendant rant) usually brings thread contention. You'd be replacing one kind of memory pressure for another.
There's a variation of the parse-lookup idiom that suffers less from the sort of collateral damage you usually get from full-on caching, and that's a simple precalculated lookup table (see also "memoization"). The Pattern you usually see for this is the Type Safe Enumeration (TSE). With the TSE, you parse the String, pass it to the TSE to retrieve the associated enumerated type, and then you throw the String away.
Is the text you're processing free-form, or does the input have to follow a rigid specification? If a lot of your text renders down to a fixed set of possible values, then a TSE could help you here, and serves a greater master: Adding context/semantics to your information at the point of creation, instead of at the point of use.

Is there a "fastest way" to construct Strings in Java?

I normally create a String in Java the following way:
String foo = "123456";
However, My lecturer has insisted to me that forming a String using the format method, as so:
String foo = String.format("%s", 123456);
Is much faster.
Also, he says that using the StringBuilder class is even faster.
StringBuilder sb = new StringBuilder();
String foo = sb.append(String.format("%s", 123456)).toString();
Which is the fastest method to create a String, if there even is one?
They could not be 100% accurate as I might not remember them fully.
If there is only one string then:
String foo = "123456";
Is fastest. You'll notice that the String.format line has "%s%" declared in it, so I don't see how the lecturer could possibly think that was faster. Plus you've got a method call on top of it.
However, if you're building a string over time, such as in a for-loop, then you'll want to use a StringBuilder. If you were to just use += then you're building a brand new string every time the += line is called. StringBuilder is much faster since it holds a buffer and appends to that every time you call append.
Slightly off-topic, but I wish that the whole "must-not-use-plus-to-concatenate-strings-in-Java" myth would go away. While it might have been true in early versions of Java that StringBuffer was faster and "+ was evil", it is certainly not true in modern JVMs that are taking care of a lot of optimisations.
For example, which is faster?
String s = "abc" + "def";
or
StringBuffer buf = new StringBuffer();
buf.append("abc");
buf.append("def");
String s = buf.toString();
The answer is the former. The JVM recognises that this is a string constant and will actually put "abcdef" in the string pool, whereas the "optimised stringbuffer" version will cause an extra StringBuffer object to be built.
Another JVM optimisation is
String s = onestring + " concat " + anotherstring;
Where the JVM will work out what the best way of concatenating will be. In JDK 5, this means a StringBuilder will be internally used and it will be faster than using a string buffer.
But as other answers have said, the "123456" constant in your question is certainly the fastest way and your lecturer should go back to being a student :-)
And yes, I've been sad enough to verify this by looking at the Java bytecode...
This whole discussion is moot. Please read this article by Jeff, i.e., the guy who created Stack Overflow.
The Sad Tragedy of Micro-Optimization Theater
Please refer your instructor to this post and ask him to stop ruining his/her student's brains with useless information. Algorithmic optimizations are where your code will live or die, not with what method you use to construct strings. In any case, StringBuilder, and String formatter have to execute ACTUAL CODE with REAL MEMORY, if you just construct a string it gets set aside during compile time and is ready to be used when you need it, in essence, it has 0 run-time cost, while the other options have real cost, since code actually needs to be executed.
String foo = "some string literal";
Is certainly the fastest way to make a String. It's embedded in the .class file and is a simple memory look-up to retrieve.
Using String.format when you have nothing to really format just looks ugly and might cause junior developers to cry.
If the String is going to be modified, then StringBuilder is the best since Strings are immutable.
In your second example, using:
String foo = String.format("%s", 123456);
doesn't buy you anything; 123456 is already a constant value, so why not just assign foo = "123456"? For constant strings, there's no better way.
If you're creating a string from multiple parts being appended together at runtime, use StringBuffer or StringBuilder (the former being thread-safe).
If your string is known at compile-time, then using a literal is best: String foo = "123456";.
If your string is not known at compile-time and is composed of an aggregation of smaller strings, StringBuilder is usually the way to go (but beware thread-safety!).
Using String foo = String.format("%s", 123456); could reduce your .class' size and make class-loading it a tiny bit faster, but that would be extremely aggressive (extreme) memory tuning there ^^.
As has been pointed out, if you're just building a single string with no concatenation, just use String.
For concatenating multiple bits into one big string, StringBuffer is slower than StringBuilder, but StringBuffer is synchronized. If you don't need synchronization, StringBuilder.
Are you 100% certain that the instructor was not talking about something like:
String foo = "" + 123456;
I see my students do that type of thing "all the time" (a handful will do that each term). The reason that they do it is that some book showed them how to do it that way. Shakes head and fist at lazy book writers!
The first example you gave is the fastest and the simplest. Use that.
Each piece of code you added in those examples makes it significantly slower and more difficult to read.
I would suggest example 2 is at least 10-100x slower than example 1 and example 3 is about 2x slower than example 2.
Did your processor provide any justification for this assertion?
BTW: Your first example doesn't construct a String at all (which is why it is fastest), it just hands you a String sitting in the String constant pool.
How about measuring dynamic strings so that VM cannot optimise it:
public static void measureConcats(long lim){
double sum = 0;
long start = System.currentTimeMillis();
for(long a = 0;a<lim;++a){
sum+=Math.random();
}
long end = System.currentTimeMillis();
System.out.println("Sum:" +sum);
System.out.println("Double creations time:" + (end - start));
String res = "";
Double sad = 0.0;
start = System.currentTimeMillis();
for(long b = 0;b<lim;++b){
sad = Math.random();
String sa = sad.toString();
res+=sa;
}
end = System.currentTimeMillis();
System.out.println("Pure string concat time:" + (end - start));
System.out.println("len:"+res.length());
StringBuffer sbf = new StringBuffer();
start = System.currentTimeMillis();
for(long c = 0;c<lim;++c){
sad = Math.random();
String sa = sad.toString();
sbf.append(sa);
}
end = System.currentTimeMillis();
System.out.println("StringBuffer concat time:" + (end - start));
System.out.println("len:"+sbf.length());}
My result for 10000 concats is 364ms for String+=String and 14ms for StringBuffer append.
I was very surprised about this result.

Strings are immutable - that means I should never use += and only StringBuffer?

Strings are immutable, meaning, once they have been created they cannot be changed.
So, does this mean that it would take more memory if you append things with += than if you created a StringBuffer and appended text to that?
If you use +=, you would create a new 'object' each time that has to be saved in the memory, wouldn't you?
Yes, you will create a new object each time with +=. That doesn't mean it's always the wrong thing to do, however. It depends whether you want that value as a string, or whether you're just going to use it to build the string up further.
If you actually want the result of x + y as a string, then you might as well just use string concatenation. However, if you're really going to (say) loop round and append another string, and another, etc - only needing the result as a string at the very end, then StringBuffer/StringBuilder are the way to go. Indeed, looping is really where StringBuilder pays off over string concatenation - the performance difference for 5 or even 10 direct concatenations is going to be quite small, but for thousands it becomes a lot worse - basically because you get O(N2) complexity with concatenation vs O(N) complexity with StringBuilder.
In Java 5 and above, you should basically use StringBuilder - it's unsynchronized, but that's almost always okay; it's very rare to want to share one between threads.
I have an article on all of this which you might find useful.
Rule of thumb is simple:
If you are running concatenations in a loop, don't use +=
If you are not running concatenations in a loop, using += simply does not matter. (Unless a performance critical application
In Java 5 or later, StringBuffer is thread safe, and so has some overhead that you shouldn't pay for unless you need it. StringBuilder has the same API but is not thread safe (i.e. you should only use it internal to a single thread).
Yes, if you are building up large strings, it is more efficient to use StringBuilder. It is probably not worth it to pass StringBuilder or StringBuffer around as part of your API. This is too confusing.
I agree with all the answers posted above, but it will help you a little bit to understand more about the way Java is implemented. The JVM uses StringBuffers internally to compile the String + operator (From the StringBuffer Javadoc):
String buffers are used by the
compiler to implement the binary
string concatenation operator +. For
example, the code:
x = "a" + 4 + "c"
is compiled to the equivalent of:
x = new StringBuffer().append("a").append(4).append("c")
.toString()
Likewise, x += "some new string" is equivalent to x = x + "some new string". Do you see where I'm going with this?
If you are doing a lot of String concatenations, using StringBuffer will increase your performance, but if you're only doing a couple of simple String concatenations, the Java compiler will probably optimize it for you, and you won't notice a difference in performance
Yes. String is immutable. For occasional use, += is OK. If the += operation is intensive, you should turn to StringBuilder.
But the garbage collector will end up freeing the old strings once there are no references to them
Exactly. You should use a StringBuilder though if thread-safety isn't an issue.
As a side note: There might be several String objects using the same backing char[] - for instance whenever you use substring(), no new char[] will be created which makes using it quite efficient.
Additionally, compilers may do some optimization for you. For instance if you do
static final String FOO = "foo";
static final String BAR = "bar";
String getFoobar() {
return FOO + BAR; // no string concatenation at runtime
}
I wouldn't be surprised if the compiler would use StringBuilder internally to optimize String concatenation where possible - if not already maybe in the future.
I think it relies on the GC to collect the memory with the abandoned string.
So doing += with string builder will be definitely faster if you have a lot of operation on string manipulation. But it's shouldn't a problem for most cases.
Yes you would and that is exactly why you should use StringBuffer to concatenate alot of Strings.
Also note that since Java 5 you should also prefer StringBuilder most of the time. It's just some sort of unsynchronized StringBuffer.
You're right that Strings are immutable, so if you're trying to conserve memory while doing a lot of string concatenation, you should use StringBuilder rather than +=.
However, you may not mind. Programs are written for their human readers, so you can go with clarity. If it's important that you optimize, you should profile first. Unless your program is very heavily weighted toward string activity, there will probably be other bottlenecks.
No
It will not use more memory. Yes, new objects are created, but the old ones are recycled. In the end, the amount of memory used is the same.

Categories

Resources