Java: Clearing StringBuffer contents

All,
I was wondering if clearing a StringBuffer contents using the setLength(0) would make sense. i.e. Is it better to do :
while (<some condition>)
{
stringBufferVariable = new StringBuffer(128);
stringBufferVariable.append(<something>)
.append(<more>)
... ;
Append stringBufferVariable.toString() to a file;
stringBufferVariable.setLength(0);
}
My questions:
1 > Will this still have better performance than having a String object to append the contents?
I am not really sure how reinitializing the StringBuffer variable would affect the performance and hence the question.
Please pour in your comments
[edit]: Removed the 2nd question about comparing to StringBuilder, since I have understood from the responses that there is nothing more to look into.

Better than concatenating strings?
If you're asking whether
stringBufferVariable.append("something")
.append("more");
...
will perform better than concatenating with +, then yes, usually. That's the whole reason these classes exist. Object creation is expensive compared to updating the values in a char array.
It appears most if not all compilers now convert string concatenation into using StringBuilder in simple cases such as str = "something" + "more" + "...";. The only performance difference I can then see is that the compiler won't have the advantage of setting the initial size. Benchmarks would tell you whether the difference is enough to matter. Using + would make for more readable code though.
From what I've read, the compiler apparently can't optimize concatenation done in a loop when it's something like
String str = "";
for (int i = 0; i < 10000; i++) {
str = str + i + ",";
}
so in those cases you would still want to explicitly use StringBuilder.
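As a rough sketch of that advice, the loop above could be rewritten with an explicit StringBuilder. The class and method names here are made up for illustration and are not from the question.

```java
// Hypothetical example class; names are illustrative only.
class LoopConcat {
    // Builds "0,1,2,...," with a single StringBuilder instead of repeated String +.
    static String joinWithBuilder(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithBuilder(5)); // prints 0,1,2,3,4,
    }
}
```

Only one String is created at the end, rather than a new intermediate String on every pass.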
StringBuilder vs StringBuffer
StringBuilder is not thread-safe while StringBuffer is, but they are otherwise the same. The synchronization performed in StringBuffer makes it slower, so StringBuilder is faster and should be used unless you need the synchronization.
Should you use setLength?
The way your example is currently written I don't think the call to setLength gets you anything, since you're creating a new StringBuffer on each pass through the loop. What you should really do is
StringBuilder sb = new StringBuilder(128);
while (<some condition>) {
sb.append(<something>)
.append(<more>)
... ;
// Append sb.toString() to a file;
sb.setLength(0);
}
This avoids unnecessary object creation and setLength will only be updating an internal int variable in this case.
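Here is a small runnable version of that pattern, as a sketch. The "file" is replaced by a list so the example is self-contained, and the class and method names are hypothetical. It also checks that setLength(0) leaves the allocated capacity alone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; the list stands in for "append to a file".
class ReuseBuilder {
    // Formats every name with one shared StringBuilder, cleared via setLength(0).
    static List<String> formatAll(List<String> names) {
        List<String> lines = new ArrayList<>();
        StringBuilder sb = new StringBuilder(128);
        for (String name : names) {
            sb.append("name=").append(name).append(';');
            lines.add(sb.toString());
            sb.setLength(0); // resets the length, keeps the backing array
        }
        return lines;
    }

    // Clearing the builder does not shrink its capacity.
    static int capacityAfterClear() {
        StringBuilder sb = new StringBuilder(128);
        sb.append("some text");
        sb.setLength(0);
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(formatAll(List.of("a", "b")));
        System.out.println(capacityAfterClear());
    }
}
```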

I'm just focusing on this part of the question. (The other parts have been asked and answered many times before on SO.)
I was wondering if clearing a StringBuffer contents using the setLength(0) would make sense.
It depends on the Java class libraries you are using. In some older Sun releases of Java, StringBuffer.toString() was implemented on the assumption that a call to sb.toString() is the last thing done with the buffer. The StringBuffer's original backing array becomes part of the String returned by toString(). A subsequent attempt to use the StringBuffer resulted in a new backing array being created and initialized by copying the String contents. Thus, reusing a StringBuffer as your code tries to do would actually make the application slower.
With Java 1.5 and later, a better way to code this is as follows:
bufferedWriter.append(stringBufferVariable);
stringBufferVariable.setLength(0);
This passes the StringBuffer to the writer as a CharSequence, so your code does not need an explicit toString() call; whether a temporary String is still created internally depends on the Writer implementation. Provided that the StringBuffer declaration is outside the loop, setLength(0) then allows you to reuse the buffer.
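A minimal sketch of that loop follows. A StringWriter stands in for the real file so the example runs anywhere, and the class and method names are assumptions of mine. Writer.append(CharSequence) is the call being used.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical example class; StringWriter is a stand-in for a FileWriter.
class AppendToWriter {
    static String writeLines(String[] items) {
        StringWriter target = new StringWriter();
        StringBuffer buf = new StringBuffer(128); // declared outside the loop
        try (BufferedWriter out = new BufferedWriter(target)) {
            for (String item : items) {
                buf.append(item).append('\n');
                out.append(buf);   // Writer.append(CharSequence)
                buf.setLength(0);  // clear the buffer for the next pass
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The try-with-resources block has flushed and closed the writer here.
        return target.toString();
    }

    public static void main(String[] args) {
        System.out.print(writeLines(new String[] {"first", "second"}));
    }
}
```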
Finally, you should only be worrying about all of this if you have evidence that this part of the code is (or is likely to be) a bottleneck. "Premature optimization is the root of all evil" blah, blah.

For question 2, StringBuilder will perform better than StringBuffer. StringBuffer is thread-safe, meaning its methods are synchronized; StringBuilder is not synchronized. So if your code is going to be run by ONE thread, StringBuilder will perform better since it does not have the synchronization overhead.
As camickr suggests, please check out the API for StringBuffer and StringBuilder for more information.
Also you may be interested in this article: The Sad Tragedy of Micro-Optimization Theater

1 > Will this still have better performance than having a String object to append the contents?
Yes, concatenating Strings is slow since you keep creating new String Objects.
2 > If using StringBuilder would perform better than StringBuffer here, then why?
Have you read the API description for StringBuilder and/or StringBuffer? This issue is addressed there.
I am not really sure how reinitializing the StringBuffer variable would affect the performance and hence the question.
Well, create a test program. Create a test that creates a new StringBuffer/StringBuilder every time. Then rerun the test, this time just resetting the length to 0, and compare the times.
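Such a test could look roughly like the sketch below. It uses naive System.nanoTime() timing, so JIT warm-up and garbage collection will distort the numbers; a harness such as JMH would give more trustworthy results. The class name, method names, and iteration count are my own choices.

```java
// Hypothetical timing sketch; treat the printed numbers as rough hints only.
class BufferTiming {
    // Creates a new StringBuilder on every pass; returns total characters built.
    static long runFresh(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            StringBuilder sb = new StringBuilder(128);
            sb.append("item-").append(i).append(';');
            total += sb.length();
        }
        return total;
    }

    // Reuses one StringBuilder and clears it with setLength(0).
    static long runReused(int iterations) {
        long total = 0;
        StringBuilder sb = new StringBuilder(128);
        for (int i = 0; i < iterations; i++) {
            sb.append("item-").append(i).append(';');
            total += sb.length();
            sb.setLength(0);
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long t0 = System.nanoTime();
        long fresh = runFresh(n);
        long t1 = System.nanoTime();
        long reused = runReused(n);
        long t2 = System.nanoTime();
        System.out.println("fresh:  " + (t1 - t0) / 1_000_000 + " ms, chars=" + fresh);
        System.out.println("reused: " + (t2 - t1) / 1_000_000 + " ms, chars=" + reused);
    }
}
```

Both variants return the same character count, which doubles as a sanity check that they do the same work.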

Perhaps I am misunderstanding something in your question... why are you setting the length to 0 at the bottom if you are just creating a new one at the start of each iteration?
Assuming the variable is local to the method, or that it will not be used by multiple threads if it is declared outside of a method (if it is outside of a method your code probably has issues though), make it a StringBuilder.
If you declare the StringBuilder outside of the loop then you don't need to make a new one each time you enter the loop but you would want to set the length to 0 at the end of the loop.
If you declare the StringBuilder inside of the loop then you don't need to set the length to 0 at the end of the loop.
It is likely that declaring it outside of the loop and setting the length to 0 will be faster, but I would measure both and if there isn't a large difference declare the variable inside the loop. It is good practice to limit the scope of variables.

Yup! setLength(0) is a great idea! That's what it's for. The only quicker alternative would be to discard the StringBuffer and make a new one. It's faster, though I can't say anything about it being memory efficient :)

Related

Why do we have String class, if StringBuilder or StringBuffer can do what a String does? [duplicate]

This question already has answers here:
Why can't strings be mutable in Java and .NET?
(17 answers)
Closed 7 years ago.
I've always wondered why Java and C# have a String class (immutable and thread-safe) when they also have StringBuilder (mutable and not thread-safe) or StringBuffer (mutable and thread-safe). Isn't StringBuilder/StringBuffer a superset of the String class? I mean, why should I use String class, if I've option of using StringBuilder/StringBuffer?
For example, Instead of using following,
String str;
Why can't I always use following?
StringBuilder strb; //or
StringBuffer strbu;
In short, my question is: how will my code be affected if I replace String with the StringBuffer class? Also, StringBuffer has the added advantage of mutability.
I mean, why should I use String class, if I've option of using StringBuilder/StringBuffer?
Precisely because it's immutable. Immutability has a whole host of benefits, primarily that it makes it much easier to reason about your code without creating copies of the data everywhere "just in case" something decides to mutate the value. For example:
private readonly String name;
public Person(string name)
{
if (string.IsNullOrEmpty(name)) // Or whatever
{
// Throw some exception
}
this.name = name;
}
// All the rest of the code can rely on name being a non-null
// reference to a non-empty string. Nothing can mutate it, leaving
// evil reflection aside.
Immutability makes sharing simple and efficient. That's particularly useful for multi-threaded code. It makes "modifying" (i.e. creating a new instance with different data) more painful, but in many situations that's absolutely fine, because values pass through the system without ever being modified.
Immutability is particularly useful for "simple" types such as strings, dates, numbers (BigDecimal, BigInteger etc). It allows them to be used within maps more easily, it allows a simple equality definition, etc.
1) StringBuilder and StringBuffer are both mutable, which causes problems when they are used in collections, for example as keys in a HashMap. See this link.
Another example of advantage of immutability will be what Jon has mentioned in his comments. I am just pasting here.
Someone can call Person p = new Person(builder); with a builder which initially passes my validation criteria - and then modify it afterwards, without the Person class having any say in it. In order to avoid that, the Person class would need to copy the validated data.
Immutability ensures this does not happen.
2) As String is the most extensively used object in Java, the string pool allows the same string to be reused, thus saving memory.
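To make the builder scenario from point 1 concrete, here is a hedged Java sketch. The class ValidatedPerson and its methods are hypothetical. The constructor accepts any CharSequence and protects itself by copying the validated data into an immutable String.

```java
// Hypothetical class illustrating the defensive copy described above.
class ValidatedPerson {
    private final String name;

    ValidatedPerson(CharSequence name) {
        if (name == null || name.length() == 0) {
            throw new IllegalArgumentException("name must be non-empty");
        }
        // Copy into an immutable String so later changes to the caller's
        // builder cannot invalidate what was just checked.
        this.name = name.toString();
    }

    String name() {
        return name;
    }

    // The caller mutates its builder after construction; the person is unaffected.
    static String demo() {
        StringBuilder builder = new StringBuilder("Jon");
        ValidatedPerson p = new ValidatedPerson(builder);
        builder.setLength(0); // caller wipes the builder
        return p.name();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints Jon
    }
}
```

If the field were typed as String in the first place, no copy would be needed, which is the point being made about immutability.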
I completely agree with Jon Skeet that immutability is one reason to use String. Another reason (from a C# perspective) is that String is actually lighter weight than StringBuilder. If you look at the reference source for both String and StringBuilder, you will see that StringBuilder actually has a number of String constants in it. As a developer, you should only use what you need, so unless you need the added benefits provided by StringBuilder you should use String.
Many answers have already outlined the shortcomings of using mutable variants such as StringBuilder. To illustrate the problem, one thing that you cannot achieve with StringBuilder is associative memory, i.e. hash tables. Sure, most implementations will allow you to use StringBuilder as a key for hash tables, but they will only find the values for the exact same instance of StringBuilder. However, the typical behavior that you would want is that it does not matter where the string comes from, as only the characters are important, e.g. when you read the string from a database or file (or any other external resource).
However, as far as I understood your question, you were mainly asking about field types. And indeed, I see your point, particularly taking into account that we are doing the exact same thing with collections of other objects, which are usually not immutable objects but mutable collections, such as List or ArrayList in C# or Java, respectively. In the end, a string is only a collection of characters, so why not make it mutable?
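The hash table point can be demonstrated in Java with a short sketch (class and method names are mine). StringBuilder does not override equals() or hashCode(), so a lookup with a different instance holding the same characters fails, while a String lookup succeeds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class comparing String and StringBuilder as map keys.
class BuilderAsKey {
    // String compares by content, so a separate but equal instance is found.
    static boolean stringKeyFound() {
        Map<String, Integer> map = new HashMap<>();
        map.put("key", 1);
        return map.containsKey(new String("key"));
    }

    // StringBuilder compares by identity, so an equal-looking instance is not found.
    static boolean builderKeyFound() {
        Map<StringBuilder, Integer> map = new HashMap<>();
        map.put(new StringBuilder("key"), 1);
        return map.containsKey(new StringBuilder("key"));
    }

    public static void main(String[] args) {
        System.out.println(stringKeyFound());  // prints true
        System.out.println(builderKeyFound()); // prints false
    }
}
```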
The answer I would give here is that the usual behavior of how such a string is changed is very different to usual collections. If you have a collection of subsequent elements, it is very common to only add a single element to the collection, leaving most of the collection untouched, i.e. you would not discard a list to insert an item, at least unless you are programming in Haskell :). For many strings like names, this is different as you typically replace the whole string. Given the importance of a string data type, the platforms usually offer a lot of optimization for strings such as interned strings, making the choice even more biased towards strings.
However, in the end, every program is different and you might have requirements that make it more reasonable to use StringBuilder by default, but for the given reasons, I think that these cases are rather rare.
EDIT: As you were asking for examples. Consider the following code:
stopwatch.Start();
var s = "";
for (int i = 0; i < 100000; i++)
{
s = "." + s;
}
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
stopwatch.Restart();
var s2 = new StringBuilder();
for (int i = 0; i < 100000; i++)
{
s2.Insert(0, ".");
}
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
Technically, both bits are doing a very similar thing, they will insert a character at the first position and shift whatever comes after. Both versions will involve copying the whole string that has been there before. The version with string completes in 1750ms on my machine whereas StringBuilder took 2245ms. However, both versions are reasonably fast, making the performance impact negligible in this case.
I would like to add some differences between String and StringBuilder classes:
Yes, as mentioned above, String is an immutable class and its content cannot be changed after the string has been created. This allows the same string objects to be used from different threads without locking.
If you need to concatenate a lot of strings together, use the StringBuilder class. Using the "+" operator creates a lot of string objects on the managed heap and hurts performance.
StringBuilder is a mutable class. It stores characters in an array and can manipulate them (add, remove, replace, append) without creating a new string object.
If you know the approximate length of the resulting string, you should set the capacity. The default capacity is 16 (.NET 4.5). Setting it improves performance because StringBuilder has an inner array of chars, which is reallocated when the character count exceeds the current capacity.
String:
is immutable (so you can use it in collections)
every modifying operation creates a new instance on the heap (technically, this depends on the code).
For performance and memory consumption purposes it makes sense to use StringBuilder.
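The capacity advice above is stated for .NET. As an assumption on my part that the Java counterpart is of interest here, the sketch below shows the same idea with java.lang.StringBuilder, whose no-argument constructor is documented to start with a capacity of 16 characters. The class and method names are hypothetical.

```java
// Hypothetical example class; this is the Java analogue, not the .NET API.
class BuilderCapacity {
    // The no-argument constructor starts with room for 16 characters.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Passing an initial capacity avoids early reallocations.
    static int presizedCapacity() {
        return new StringBuilder(128).capacity();
    }

    // Appending past the current capacity makes the internal array grow.
    static boolean growsWhenExceeded() {
        StringBuilder sb = new StringBuilder(4);
        sb.append("longer than four");
        return sb.capacity() >= sb.length() && sb.capacity() > 4;
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());
        System.out.println(presizedCapacity());
        System.out.println(growsWhenExceeded());
    }
}
```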

Is a new string created every time replaceAll() is used on a String?

So I do know that java Strings are immutable.
There are a bunch of methods that replace characters in a string in java.
So every time these methods are called, would it involve creation of a brand new String, therefore increasing the space complexity, or would replacement be done in the original String itself. I'm a little confused on this concept as to whether each of these replace statements in my code would be generating new Strings each time, and thus consuming more memory?
You noted correctly that String objects in Java are immutable. The only case when replacement, substring, etc. methods do not create a new object is when the replacement is a no-op. For example, if you ask to replace all 'x' characters in a "Hello, world!" string, there would be no new String object created. Similarly, there would be no new object when you call str.substring(0), because the entire string is returned. In all other cases, when the return value differs from the original, a new object is created.
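A quick sketch of the "new object" case (names are made up for the example): the original String keeps its characters, and the replacement comes back as a separate value.

```java
// Hypothetical example class showing that replace() leaves the original untouched.
class ReplaceMakesNewString {
    // Returns {original, replaced} after replacing every 'o' with '0'.
    static String[] demo() {
        String original = "Hello, world!";
        String replaced = original.replace('o', '0');
        return new String[] {original, replaced};
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println(result[0]); // prints Hello, world!
        System.out.println(result[1]); // prints Hell0, w0rld!
    }
}
```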
Yes, you have noted it correctly. The immutability of the String type has some consequences.
That is why the designers of Java introduced another type that should be used when you perform operations on char sequences.
A class called StringBuilder should be used when you perform a lot of operations that involve character manipulation, like replace. Of course it is more robust and requires more attention to detail, but that is only relevant when you care about performance.
So the immutability of the String type does not increase memory usage. What increases it is wrong usage of the String type.
They generate new ones each time; that is the corollary to being immutable.
It's true, in some sense, that it increases the 'space complexity', in that it uses more memory than the most efficient possible algorithms for replacement, but it's not as bad as it sounds; the transient objects created during the replaceAll operation and others like it are garbage collected very quickly; java is very efficient at garbage collecting transient objects. See http://www.infoq.com/articles/Java_Garbage_Collection_Distilled for an interesting writeup on some garbage collection basics.
It is true that it will return a new String , but unless the call is part of some giant loop or recursive function, one need not worry too much.
But if you purposefully wanted to crash your system, I'm sure you can think up some way.
The JDK lacks some mutating operations for character sequences; for example, StringBuilder offers replace(int start, int end, String str) but, for some reason, no search-and-replace method comparable to String.replaceAll.
A possible option would be to use a third-party library, e.g. a MutableString. It is available in Maven Central.
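Without a third-party library, a literal (non-regex) search-and-replace can be sketched on top of StringBuilder.indexOf and StringBuilder.replace. The helper below is hypothetical, not a JDK method.

```java
// Hypothetical helper; replaces literal occurrences in place, no regex support.
class InPlaceReplace {
    static void replaceAll(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) {
            return; // an empty target would match everywhere and never terminate
        }
        int index = sb.indexOf(target);
        while (index >= 0) {
            sb.replace(index, index + target.length(), replacement);
            // Continue searching after the text that was just inserted.
            index = sb.indexOf(target, index + replacement.length());
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("a-b-c");
        replaceAll(sb, "-", "--");
        System.out.println(sb); // prints a--b--c
    }
}
```

Each replace call may still shift characters inside the builder, so this avoids new String objects rather than all copying.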

Java -- StringBuilder using String & not StringBuilder

I'm using StringBuilder, instead of String, in my code in an effort
to make the code time-efficient during all that parsing & concatenation.
But when I look into its source, the substring() method of
AbstractStringBuilder, and thus StringBuilder, returns a String and not a StringBuilder.
What would be the idea behind this ?
Thanks.
The reason the substring method returns an immutable String is that once you get a part of the string inside your StringBuilder, it must make a copy. It cannot give you a mutable "live view" into the middle of StringBuilder's content, because otherwise you would run into conflicts of which changes to apply to what string.
Since a copy is to be made anyway, it might as well be immutable: you can easily make it mutable if you wish by constructing a StringBuilder around it, with full understanding that it is detached from the mutable original.
To go from one StringBuilder to another containing a segment of the original, you could use:
StringBuilder original = ...;
StringBuilder sub = new StringBuilder().append(original, start, end); // takes start and end indices (end exclusive), not offset and length
This could have been provided as a method of original, but as things stand it isn't.
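A runnable sketch of that idea, with hypothetical class and method names. Note that for a CharSequence argument, append takes a start index and an exclusive end index. The result is detached from the original, so changing one does not affect the other.

```java
// Hypothetical helper wrapping StringBuilder.append(CharSequence, int, int).
class SubBuilder {
    // Copies original[start, end) into a new, independent StringBuilder.
    static StringBuilder slice(StringBuilder original, int start, int end) {
        return new StringBuilder().append(original, start, end);
    }

    public static void main(String[] args) {
        StringBuilder original = new StringBuilder("hello world");
        StringBuilder sub = slice(original, 0, 5);
        sub.append('!');
        System.out.println(sub);      // prints hello!
        System.out.println(original); // prints hello world
    }
}
```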
This aside, you should profile your code before engaging in micro-optimisations of this sort.

What are the best practices for building strings in the given situation with StringBuilder?

We are constructing a large block of text and are using a single instance of StringBuilder. We have broken up the block of text into subsections (5) and assigned each a corresponding method. Each method takes input variables and spits out text.
Is it better to pass in the StringBuilder object to each method, append the data in the method and return void or have each method return a string that we append to the object outside of smaller functions?
What would be some benefits/drawbacks to both ideas.
I'd pass in the StringBuilder, and append directly to it - assuming you don't actually need that intermediate string for any other reason.
The whole point of using StringBuilder is to avoid creating more strings than you need.
The main advantage of just returning strings is that if you ever want to use the same code in a situation where you're not appending to a StringBuilder, it's more convenient and idiomatic. But I'm assuming these are actually private methods just called from the "build the text" method, which makes it less of an issue.
Zim-Zam's point about parallelism is an interesting one, but I wouldn't worry about that unless you're actually planning to parallelize this.
Passing in a StringBuilder is more memory efficient because there's no need to garbage-collect the strings that are being appended. Returning strings is more amenable to parallelization because the strings can be generated concurrently and then appended in sequence.
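The two styles being compared might look like this in Java. This is a sketch with invented method names and section contents; the question does not say what the five subsections contain.

```java
// Hypothetical example class contrasting the two approaches.
class ReportBuilder {
    // Style 1: the section method appends directly to the shared builder.
    static void appendHeader(StringBuilder out, String title) {
        out.append("== ").append(title).append(" ==\n");
    }

    // Style 2: the section method returns a String for the caller to append.
    static String footer(String author) {
        return "-- " + author + "\n";
    }

    static String build(String title, String author) {
        StringBuilder sb = new StringBuilder();
        appendHeader(sb, title);   // no intermediate String for this section
        sb.append(footer(author)); // one intermediate String, but reusable elsewhere
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("Report", "me"));
    }
}
```

The first style saves an allocation per section; the second keeps the method usable in places that have no StringBuilder at hand.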
A concatenation of fixed literals is resolved by the compiler, so a plain String is the better choice here:
String s1="str1"+"str2"+"str3";
When you need to append strings dynamically, as in the following, prefer StringBuilder:
String s1="";
s1+="str1";
s1+="str2";
s1+="str3";
Use StringBuilder's append methods and convert to a String at the end with toString(). When + is used to append later like this, the compiler optimization is not applied and more objects are created.
So use StringBuilder.
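For completeness, a StringBuilder version of the dynamic case could be sketched like this (class and method names are mine):

```java
// Hypothetical example class; appends parts that are only known at run time.
class DynamicAppend {
    static String build(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString(); // single conversion at the end
    }

    public static void main(String[] args) {
        System.out.println(build("str1", "str2", "str3")); // prints str1str2str3
    }
}
```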

Question about reversing a string

I am trying to do a simple string manipulation. The input is "murder", and I want to get "murderredrum".
I tried this:
String str = "murder";
StringBuffer buf = new StringBuffer(str);
// buf is now "murder", so i append the reverse which is "redrum"
buf.append(buf.reverse());
System.out.println(buf);
But now I get "redrumredrum" instead of "murderredrum".
Can someone explain what's wrong with my program? Thank you.
The short answer
The line:
buf.append(buf.reverse());
essentially does the following:
buf.reverse(); // buf is now "redrum"
buf.append(buf);
This is why you get "redrumredrum".
That is, buf.reverse() doesn't return a new StringBuffer which is the reverse of buf. It returns buf, after it had reversed itself!
There are many ways to "fix" this, but the easiest would be to explicitly create a new StringBuffer for the reversal, so something like this:
buf.append(new StringBuffer(str).reverse());
Deeper insight: comparing String and StringBuffer
String in Java is immutable. On the other hand, StringBuffer is mutable (which is why you can, among other things, append things to it).
This is why with String, a transforming method really returns a new String. This is why something like this is "wrong"
String str = "murder";
str.toUpperCase(); // this is "wrong"!!!
System.out.println(str); // still "murder"
Instead you want to do:
String str = "murder";
str = str.toUpperCase(); // YES!!!
System.out.println(str); // now "MURDER"!!!
However, the situation is far from analogous with StringBuffer. Most StringBuffer methods do return StringBuffer, but they return the same instance that it was invoked on! They do NOT return a new StringBuffer instance. In fact, you're free to discard the "result", because these methods have already accomplished what they do through various mutations (i.e. side effects) to the instance it's invoked upon.
These methods could've been declared as void, but the reason why they essentially return this; instead is because it facilitates method chaining, allowing you to write something like:
sb.append(thisThing).append(thatThing).append(oneMoreForGoodMeasure);
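Both points can be checked with a short sketch (class and method names are hypothetical): reverse() hands back the very same instance, and the suggested fix with a second StringBuffer produces the wanted result.

```java
// Hypothetical example class for the reverse/append behaviour discussed above.
class ReverseAppend {
    // reverse() mutates the buffer and returns the same instance, not a copy.
    static boolean reverseReturnsSameInstance() {
        StringBuffer buf = new StringBuffer("murder");
        return buf.reverse() == buf;
    }

    // Appends the reversal of str using a separate buffer for the reversed part.
    static String mirror(String str) {
        StringBuffer buf = new StringBuffer(str);
        buf.append(new StringBuffer(str).reverse());
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseReturnsSameInstance()); // prints true
        System.out.println(mirror("murder"));             // prints murderredrum
    }
}
```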
Related questions
Method chaining - why is it a good practice, or not?
Fluent Interfaces - Method Chaining
Appendix: StringBuffer vs StringBuilder
Instead of StringBuffer, you should generally prefer StringBuilder, which is faster because it's not synchronized. Most of the discussion above also applies to StringBuilder.
From the documentation:
StringBuffer : A thread-safe, mutable sequence of characters. [...] As of JDK 5, this class has been supplemented with an equivalent class designed for use by a single thread, StringBuilder, which should generally be preferred as it supports all of the same operations but faster, as it performs no synchronization.
StringBuilder : A mutable sequence of characters. [...] Instances of StringBuilder are not safe for use by multiple threads. If such synchronization is required then it is recommended that StringBuffer be used.
Related questions
StringBuilder and StringBuffer in Java
Bonus material! Alternative solution!
Here's an alternative "fix" to the problem that is perhaps more readable:
StringBuilder word = new StringBuilder("murder");
StringBuilder worddrow = new StringBuilder(); // starts empty
worddrow.append(word).append(word.reverse());
System.out.println(worddrow); // "murderredrum"
Note that while this should do fine for short strings, it does use an extra buffer which means that it's not the most efficient way to solve the problem.
Related questions
Reverse a string in Java, in O(1)? - as a CharSequence, yes this can be done!
Bonus material again! The last laugh!
StringBuilder sb = new StringBuilder("ha");
sb.append(sb.append(sb));
System.out.println(sb); // "hahahaha"
buf.reverse() gets called first, and it modifies the StringBuffer to "redrum". You are then appending "redrum" to "redrum".
