Is there a "fastest way" to construct Strings in Java? - java

I normally create a String in Java the following way:
String foo = "123456";
However, My lecturer has insisted to me that forming a String using the format method, as so:
String foo = String.format("%s", 123456);
Is much faster.
Also, he says that using the StringBuilder class is even faster.
StringBuilder sb = new StringBuilder();
String foo = sb.append(String.format("%s", 123456)).toString();
Which is the fastest method to create a String, if there even is one?
They could not be 100% accurate as I might not remember them fully.

If there is only one string then:
String foo = "123456";
Is fastest. You'll notice that the String.format line has "%s%" declared in it, so I don't see how the lecturer could possibly think that was faster. Plus you've got a method call on top of it.
However, if you're building a string over time, such as in a for-loop, then you'll want to use a StringBuilder. If you were to just use += then you're building a brand new string every time the += line is called. StringBuilder is much faster since it holds a buffer and appends to that every time you call append.

Slightly off-topic, but I wish that the whole "must-not-use-plus-to-concatenate-strings-in-Java" myth would go away. While it might have been true in early versions of Java that StringBuffer was faster and "+ was evil", it is certainly not true in modern JVMs that are taking care of a lot of optimisations.
For example, which is faster?
String s = "abc" + "def";
or
StringBuffer buf = new StringBuffer();
buf.append("abc");
buf.append("def");
String s = buf.toString();
The answer is the former. The JVM recognises that this is a string constant and will actually put "abcdef" in the string pool, whereas the "optimised stringbuffer" version will cause an extra StringBuffer object to be built.
Another JVM optimisation is
String s = onestring + " concat " + anotherstring;
Where the JVM will work out what the best way of concatenating will be. In JDK 5, this means a StringBuilder will be internally used and it will be faster than using a string buffer.
But as other answers have said, the "123456" constant in your question is certainly the fastest way and your lecturer should go back to being a student :-)
And yes, I've been sad enough to verify this by looking at the Java bytecode...

This whole discussion is moot. Please read this article by Jeff, i.e., the guy who created Stack Overflow.
The Sad Tragedy of Micro-Optimization Theater
Please refer your instructor to this post and ask him to stop ruining his/her student's brains with useless information. Algorithmic optimizations are where your code will live or die, not with what method you use to construct strings. In any case, StringBuilder, and String formatter have to execute ACTUAL CODE with REAL MEMORY, if you just construct a string it gets set aside during compile time and is ready to be used when you need it, in essence, it has 0 run-time cost, while the other options have real cost, since code actually needs to be executed.

String foo = "some string literal";
Is certainly the fastest way to make a String. It's embedded in the .class file and is a simple memory look-up to retrieve.
Using String.format when you have nothing to really format just looks ugly and might cause junior developers to cry.
If the String is going to be modified, then StringBuilder is the best since Strings are immutable.

In your second example, using:
String foo = String.format("%s", 123456);
doesn't buy you anything; 123456 is already a constant value, so why not just assign foo = "123456"? For constant strings, there's no better way.
If you're creating a string from multiple parts being appended together at runtime, use StringBuffer or StringBuilder (the former being thread-safe).

If your string is known at compile-time, then using a literal is best: String foo = "123456";.
If your string is not known at compile-time and is composed of an aggregation of smaller strings, StringBuilder is usually the way to go (but beware thread-safety!).
Using String foo = String.format("%s", 123456); could reduce your .class' size and make class-loading it a tiny bit faster, but that would be extremely aggressive (extreme) memory tuning there ^^.

As has been pointed out, if you're just building a single string with no concatenation, just use String.
For concatenating multiple bits into one big string, StringBuffer is slower than StringBuilder, but StringBuffer is synchronized. If you don't need synchronization, StringBuilder.

Are you 100% certain that the instructor was not talking about something like:
String foo = "" + 123456;
I see my students do that type of thing "all the time" (a handful will do that each term). The reason that they do it is that some book showed them how to do it that way. Shakes head and fist at lazy book writers!

The first example you gave is the fastest and the simplest. Use that.
Each piece of code you added in those examples makes it significantly slower and more difficult to read.
I would suggest example 2 is at least 10-100x slower than example 1 and example 3 is about 2x slower than example 2.
Did your processor provide any justification for this assertion?
BTW: Your first example doesn't construct a String at all (which is why it is fastest), it just hands you a String sitting in the String constant pool.

How about measuring dynamic strings so that VM cannot optimise it:
public static void measureConcats(long lim){
double sum = 0;
long start = System.currentTimeMillis();
for(long a = 0;a<lim;++a){
sum+=Math.random();
}
long end = System.currentTimeMillis();
System.out.println("Sum:" +sum);
System.out.println("Double creations time:" + (end - start));
String res = "";
Double sad = 0.0;
start = System.currentTimeMillis();
for(long b = 0;b<lim;++b){
sad = Math.random();
String sa = sad.toString();
res+=sa;
}
end = System.currentTimeMillis();
System.out.println("Pure string concat time:" + (end - start));
System.out.println("len:"+res.length());
StringBuffer sbf = new StringBuffer();
start = System.currentTimeMillis();
for(long c = 0;c<lim;++c){
sad = Math.random();
String sa = sad.toString();
sbf.append(sa);
}
end = System.currentTimeMillis();
System.out.println("StringBuffer concat time:" + (end - start));
System.out.println("len:"+sbf.length());}
My result for 10000 concats is 364ms for String+=String and 14ms for StringBuffer append.
I was very surprised about this result.

Related

Is chain of StringBuilder.append more efficient than string concatenation?

According to Netbeans hint named Use chain of .append methods instead of string concatenation
Looks for string concatenation in the parameter of an invocation of the append method of StringBuilder or StringBuffer.
Is StringBuilder.append() really more efficient than strings concatenation?
Code sample
StringBuilder sb = new StringBuilder();
sb.append(filename + "/");
vs.
StringBuilder sb = new StringBuilder();
sb.append(filename).append("/");
You have to balance readability with functionality.
Let's say you have the following:
String str = "foo";
str += "bar";
if(baz) str += "baz";
This will create 2 string builders (where you only need 1, really) plus an additional string object for the interim. You would be more efficient if you went:
StringBuilder strBuilder = new StringBuilder("foo");
strBuilder.append("bar");
if(baz) strBuilder.append("baz");
String str = strBuilder.toString();
But as a matter of style, I think the first one looks just fine. The performance benefit of a single object creation seems very minimal to me. Now, if instead of 3 strings, you had 10, or 20, or 100, I would say the performance outweighs the style. If it was in a loop, for sure I'd use the string builder, but I think just a couple strings is fine to do the 'sloppy' way to make the code look cleaner. But... this has a very dangerous trap lurking in it! Read on below (pause to build suspense... dun dun dunnnn)
There are those who say to always use the explicit string builder. One rationale is that your code will continue to grow, and it will usually do so in the same manner as it is already (i.e. they won't take the time to refactor.) So you end up with those 10 or 20 statements each creating their own builder when you don't need to. So to prevent this from the start, they say always use an explicit builder.
So while in your example, it's not going to be particularly faster, when someone in the future decides they want a file extension on the end, or something like that, if they continue to use string concatenation instead of a StringBuilder, they're going to run into performance problems eventually.
We also need to think about the future. Let's say you were making Java code back in JDK 1.1 and you had the following method:
public String concat(String s1, String s2, String s3) {
return s1 + s2 + s3;
}
At that time, it would have been slow because StringBuilder didn't exist.
Then in JDK 1.3 you decided to make it faster by using StringBuffer (StringBuilder still doesn't exist yet). You do this:
public String concat(String s1, String s2, String s3) {
StringBuffer sb = new StringBuffer();
sb.append(s1);
sb.append(s2);
sb.append(s3);
return sb.toString();
}
It gets a lot faster. Awesome!
Now JDK 1.5 comes out, and with it comes StringBuilder (which is faster than StringBuffer) and the automatic transation of
return s1 + s2 + s3;
to
return new StringBuilder().append(s1).append(s2).append(s3).toString();
But you don't get this performance benefit because you're using StringBuffer explicitly. So by being smart, you have caused a performance hit when Java got smarter than you. So you have to keep in mind that there are things out there you won't think of.
Well, your first example is essentially translated by the compiler into something along the lines:
StringBuilder sb = new StringBuilder();
sb.append(new StringBuilder().append(filename).append("/").toString());
so yes, there is a certain inefficiency here. However, whether it really matters in your program is a different question. Aside from being questionable style (hint: subjective), it usually only matters, if you are doing this in a tight loop.
None of the answers so far explicitly address the specific case that hint is for. It's not saying to always use StringBuilder#append instead of concatenation. But, if you're already using a StringBuilder, it doesn't make sense to mix in concatenation, because it creates a redundant StringBuilder (See Dirk's answer) and an unnecessary temporary String instance.
Several answers already discuss why the suggested way is more efficient, but the main point is, if you already have a StringBuilder instance, just call append on it. It's just as readable (in my opinion, and apparently whoever wrote the NetBeans hint) since you're calling append anyway, and it's a little more efficient.
Theoretically, yes. Because String objects are immutable: once constructed they cannot be changed anymore. So using "+" (concatenation) basically creates a new object each time.
Practically no. The compiler is clever enough to replace all your "+" with StringBuilder appendings.
For a more detailed explanation:
http://kaioa.com/node/59
PS: Netbeans??? Come on!
A concat of two strings is faster using this function.
However, if you have multiple strings or different data type, you should use a StringBuilder either explicitly or implicitly. Using a + with Strings is using a StringBuilder implicitly.
It's only more efficient if you are using lots of concatenation and really long strings. For general-use, such as creating a filename in your example, any string concatenation is just fine and more readable.
At any rate, this part of your application is unlikely to be the performance bottleneck.

StringBuilder/StringBuffer vs. "+" Operator

I'm reading "Better, Faster, Lighter Java" (by Bruce Tate and Justin Gehtland) and am familiar with the readability requirements in agile type teams, such as what Robert Martin discusses in his clean coding books. On the team I'm on now, I've been told explicitly not to use the + operator because it creates extra (and unnecessary) string objects during runtime.
But this article, Written back in '04 talks about how object allocation is about 10 machine instructions. (essentially free)
It also talks about how the GC also helps to reduce costs in this environment.
What is the actual performance tradeoffs between using +, StringBuilder or StringBuffer? (In my case it is StringBuffer only as we are limited to Java 1.4.2.)
StringBuffer to me results in ugly, less readable code, as a couple of examples in Tate's book demonstrates. And StringBuffer is thread-synchronized which seems to have its own costs that outweigh the "danger" in using the + operator.
Thoughts/Opinions?
Using String concatenation is translated into StringBuilder operations by the compiler.
To see how the compiler is doing I'll take a sample class, compile it and decompile it with jad to see what's the generated bytecode.
Original class:
public void method1() {
System.out.println("The answer is: " + 42);
}
public void method2(int value) {
System.out.println("The answer is: " + value);
}
public void method3(int value) {
String a = "The answer is: " + value;
System.out.println(a + " what is the question ?");
}
The decompiled class:
public void method1()
{
System.out.println("The answer is: 42");
}
public void method2(int value)
{
System.out.println((new StringBuilder("The answer is: ")).append(value).toString());
}
public void method3(int value)
{
String a = (new StringBuilder("The answer is: ")).append(value).toString();
System.out.println((new StringBuilder(String.valueOf(a))).append(" what is the question ?").toString());
}
On method1 the compiler performed the operation at compile time.
On method2 the String concatenation is equivalent to manually use StringBuilder.
On method3 the String concatenation is definitely bad as the compiler is creating a second StringBuilder rather than reusing the previous one.
So my simple rule is that concatenations are good unless you need to concatenate the result again: for instance in loops or when you need to store an intermediate result.
Your team needs to learn about the reasons for avoiding repeated string concatenation.
There certainly are times when it makes sense to use StringBuffer - in particular when you're creating a string in a loop, especially if you aren't sure that there will be few iterations in the loop. Note that it's not just a matter of creating new objects - it's a matter of copying all the text data you've appended already. Also bear in mind that object allocation is only "essentially free" if you don't consider garbage collection. Yes, if there's enough room in the current generation, it's basically a matter of incrementing a pointer... but:
That memory must have been cleared at some point. That's not free.
You're shortening the time until the next GC is required. GC isn't free.
If your object lives into the next generation, it may take longer to be cleaned up - again, not free.
All of these things are reasonably cheap in that it's "usually" not worth bending a design away from elegance to avoid creating objects... but you shouldn't regard them as free.
On the other hand, there is no point in using StringBuffer in cases where you won't need the intermediate strings. For example:
String x = a + b + c + d;
is at least as efficient as:
StringBuffer buffer = new StringBuffer();
buffer.append(a);
buffer.append(b);
buffer.append(c);
buffer.append(d);
String x = buffer.toString();
For small concatenations you can simply use String and + for the sake of readability. Performance is not going to suffer. But if you are doing lots of concatenation operations go for StringBuffer.
Other answers have mentioned that StringBuilder should be used when you are creating a string in a loop. However, most of the loops are over collections and from Java 8 the collections can be transformed to Strings using the joining method from Collectors class.
As an example, in the next code:
String result = Arrays.asList("Apple", "Banana", "Orange").stream()
.collect(Collectors.joining(", ", "<", ">"));
result will be: <Apple, Banana, Orange>

The best alternative for String flyweight implementation in Java

My application is multithreaded with intensive String processing. We are experiencing excessive memory consumption and profiling has demonstrated that this is due to String data. I think that memory consumption would benefit greatly from using some kind of flyweight pattern implementation or even cache (I know for sure that Strings are often duplicated, although I don't have any hard data in that regard).
I have looked at Java Constant Pool and String.intern, but it seems that it can provoke some PermGen problems.
What would be the best alternative for implementing application-wide, multithreaded pool of Strings in java?
EDIT: Also see my previous, related question: How does java implement flyweight pattern for string under the hood?
Note: This answer uses examples that might not be relevant in modern runtime JVM libraries. In particular, the substring example is no longer an issue in OpenJDK/Oracle 7+.
I know it goes against what people often tell you, but sometimes explicitly creating new String instances can be a significant way to reduce your memory.
Because Strings are immutable, several methods leverage that fact and share the backing character array to save memory. However, occasionally this can actually increase the memory by preventing garbage collection of unused parts of those arrays.
For example, assume you were parsing the message IDs of a log file to extract warning IDs. Your code would look something like this:
//Format:
//ID: [WARNING|ERROR|DEBUG] Message...
String testLine = "5AB729: WARNING Some really really really long message";
Matcher matcher = Pattern.compile("([A-Z0-9]*): WARNING.*").matcher(testLine);
if ( matcher.matches() ) {
String id = matcher.group(1);
//...do something with id...
}
But look at the data actually being stored:
//...
String id = matcher.group(1);
Field valueField = String.class.getDeclaredField("value");
valueField.setAccessible(true);
char[] data = ((char[])valueField.get(id));
System.out.println("Actual data stored for string \"" + id + "\": " + Arrays.toString(data) );
It's the whole test line, because the matcher just wraps a new String instance around the same character data. Compare the results when you replace String id = matcher.group(1); with String id = new String(matcher.group(1));.
This is already done at the JVM level. You only need to ensure that you aren't creating new Strings everytime, either explicitly or implicitly.
I.e. don't do:
String s1 = new String("foo");
String s2 = new String("foo");
This would create two instances in the heap. Rather do so:
String s1 = "foo";
String s2 = "foo";
This will create one instance in the heap and both will refer the same (as evidence, s1 == s2 will return true here).
Also don't use += to concatenate strings (in a loop):
String s = "";
for (/* some loop condition */) {
s += "new";
}
The += implicitly creates a new String in the heap everytime. Rather do so
StringBuilder sb = new StringBuilder();
for (/* some loop condition */) {
sb.append("new");
}
String s = sb.toString();
If you can, rather use StringBuilder or its synchronized brother StringBuffer instead of String for "intensive String processing". It offers useful methods for exactly those purposes, such as append(), insert(), delete(), etc. Also see its javadoc.
Java 7/8
If you are doing what the accepted answer says and using Java 7 or newer you are not doing what it says you are.
The implementation of subString() has changed.
Never write code that relies on an implementation that can change drastically and might make things worse if you are relying on the old behavior.
1950 public String substring(int beginIndex, int endIndex) {
1951 if (beginIndex < 0) {
1952 throw new StringIndexOutOfBoundsException(beginIndex);
1953 }
1954 if (endIndex > count) {
1955 throw new StringIndexOutOfBoundsException(endIndex);
1956 }
1957 if (beginIndex > endIndex) {
1958 throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
1959 }
1960 return ((beginIndex == 0) && (endIndex == count)) ? this :
1961 new String(offset + beginIndex, endIndex - beginIndex, value);
1962 }
So if you use the accepted answer with Java 7 or newer you are creating twice as much memory usage and garbage that needs to be collected.
Effeciently pack Strings in memory! I once wrote a hyper memory efficient Set class, where Strings were stored as a tree. If a leaf was reached by traversing the letters, the entry was contained in the set. Fast to work with, too, and ideal to store a large dictionary.
And don't forget that Strings are often the largest part in memory in nearly every app I profiled, so don't care for them if you need them.
Illustration:
You have 3 Strings: Beer, Beans and Blood. You can create a tree structure like this:
B
+-e
+-er
+-ans
+-lood
Very efficient for e.g. a list of street names, this is obviously most reasonable with a fixed dictionary, because insert cannot be done efficiently. In fact the structure should be created once, then serialized and afterwards just loaded.
First, decide how much your application and developers would suffer if you eliminated some of that parsing. A faster application does you no good if you double your employee turnover rate in the process! I think based on your question we can assume you passed this test already.
Second, if you can't eliminate creating an object, then your next goal should be to ensure it doesn't survive Eden collection. And parse-lookup can solve that problem. However, a cache "implemented properly" (I disagree with that basic premise, but I won't bore you with the attendant rant) usually brings thread contention. You'd be replacing one kind of memory pressure for another.
There's a variation of the parse-lookup idiom that suffers less from the sort of collateral damage you usually get from full-on caching, and that's a simple precalculated lookup table (see also "memoization"). The Pattern you usually see for this is the Type Safe Enumeration (TSE). With the TSE, you parse the String, pass it to the TSE to retrieve the associated enumerated type, and then you throw the String away.
Is the text you're processing free-form, or does the input have to follow a rigid specification? If a lot of your text renders down to a fixed set of possible values, then a TSE could help you here, and serves a greater master: Adding context/semantics to your information at the point of creation, instead of at the point of use.

Speed issue while appending strings

Whenever I try to add the numbers in string like:
String s=new String();
for(int j=0;j<=1000000;j++)
s+=String.valueOf(j);
My program is adding the numbers, but very slowly. But When I altered my program and made it like:
StringBuffer sb=new StringBuffer();
for(int j=0;j<=1000000;j++)
sb.append(String.valueOf(j));
I got the result very quickly. Why is that so?
s+=String.valueOf(j); needs to allocate a new String object every time it is called, and this is expensive. The StringBuffer only needs to grow some internal representation when the contained string is too large, which happens much less often.
It would probably be even faster if you used a StringBuilder, which is a non-synchronized version of a StringBuffer.
One thing to note is that while this does apply to loops and many other cases, it does not necessarily apply to all cases where Strings are concatenated using +:
String helloWorld = getGreeting() + ", " + getUsername() + "!";
Here, the compiler will probably optimize the code in a way that it sees fit, which may or may not be creating a StringBuilder, since that is also an expensive operation.
Because s += "string" creates a new instance. A String is immutable. StringBuffer or StringBuilder adds the String without creating a new instance.
In Java as in .NET Strings are immutable. They cannot be changed after creation. The result is that using the +operator will create a new string and copy the contents of both strings into it.
A StringBuffer will double the allocated space every time it runs out of space to add characters. Thus reducing the amount of memory allocations.
Take a look at this, from the Javaspecialists newsletter by Heinz Kabutz:
http://www.javaspecialists.eu/archive/Issue068.html
and this later article:
http://java.sun.com/developer/technicalArticles/Interviews/community/kabutz_qa.html

Strings are immutable - that means I should never use += and only StringBuffer?

Strings are immutable, meaning, once they have been created they cannot be changed.
So, does this mean that it would take more memory if you append things with += than if you created a StringBuffer and appended text to that?
If you use +=, you would create a new 'object' each time that has to be saved in the memory, wouldn't you?
Yes, you will create a new object each time with +=. That doesn't mean it's always the wrong thing to do, however. It depends whether you want that value as a string, or whether you're just going to use it to build the string up further.
If you actually want the result of x + y as a string, then you might as well just use string concatenation. However, if you're really going to (say) loop round and append another string, and another, etc - only needing the result as a string at the very end, then StringBuffer/StringBuilder are the way to go. Indeed, looping is really where StringBuilder pays off over string concatenation - the performance difference for 5 or even 10 direct concatenations is going to be quite small, but for thousands it becomes a lot worse - basically because you get O(N2) complexity with concatenation vs O(N) complexity with StringBuilder.
In Java 5 and above, you should basically use StringBuilder - it's unsynchronized, but that's almost always okay; it's very rare to want to share one between threads.
I have an article on all of this which you might find useful.
Rule of thumb is simple:
If you are running concatenations in a loop, don't use +=
If you are not running concatenations in a loop, using += simply does not matter. (Unless a performance critical application
In Java 5 or later, StringBuffer is thread safe, and so has some overhead that you shouldn't pay for unless you need it. StringBuilder has the same API but is not thread safe (i.e. you should only use it internal to a single thread).
Yes, if you are building up large strings, it is more efficient to use StringBuilder. It is probably not worth it to pass StringBuilder or StringBuffer around as part of your API. This is too confusing.
I agree with all the answers posted above, but it will help you a little bit to understand more about the way Java is implemented. The JVM uses StringBuffers internally to compile the String + operator (From the StringBuffer Javadoc):
String buffers are used by the
compiler to implement the binary
string concatenation operator +. For
example, the code:
x = "a" + 4 + "c"
is compiled to the equivalent of:
x = new StringBuffer().append("a").append(4).append("c")
.toString()
Likewise, x += "some new string" is equivalent to x = x + "some new string". Do you see where I'm going with this?
If you are doing a lot of String concatenations, using StringBuffer will increase your performance, but if you're only doing a couple of simple String concatenations, the Java compiler will probably optimize it for you, and you won't notice a difference in performance
Yes. String is immutable. For occasional use, += is OK. If the += operation is intensive, you should turn to StringBuilder.
But the garbage collector will end up freeing the old strings once there are no references to them
Exactly. You should use a StringBuilder though if thread-safety isn't an issue.
As a side note: There might be several String objects using the same backing char[] - for instance whenever you use substring(), no new char[] will be created which makes using it quite efficient.
Additionally, compilers may do some optimization for you. For instance if you do
static final String FOO = "foo";
static final String BAR = "bar";
String getFoobar() {
return FOO + BAR; // no string concatenation at runtime
}
I wouldn't be surprised if the compiler would use StringBuilder internally to optimize String concatenation where possible - if not already maybe in the future.
I think it relies on the GC to collect the memory with the abandoned string.
So doing += with string builder will be definitely faster if you have a lot of operation on string manipulation. But it's shouldn't a problem for most cases.
Yes you would and that is exactly why you should use StringBuffer to concatenate alot of Strings.
Also note that since Java 5 you should also prefer StringBuilder most of the time. It's just some sort of unsynchronized StringBuffer.
You're right that Strings are immutable, so if you're trying to conserve memory while doing a lot of string concatenation, you should use StringBuilder rather than +=.
However, you may not mind. Programs are written for their human readers, so you can go with clarity. If it's important that you optimize, you should profile first. Unless your program is very heavily weighted toward string activity, there will probably be other bottlenecks.
No
It will not use more memory. Yes, new objects are created, but the old ones are recycled. In the end, the amount of memory used is the same.

Categories

Resources