Is this string join implementation really O(n^2)?

I was reading a popular book on algorithm questions and saw an implementation of string join such as the following:
public String joinTheStrings(String[] theStrings){
String joinedString = "";
for(String singleString : theStrings){
joinedString = joinedString + singleString;
}
return joinedString;
}
The author then went on to claim this implementation is O(n^2), and that an optimization would be to use a StringBuffer in place of joinedString, which they claimed would make the algorithm O(n). However, I fail to see how the original algorithm is O(n^2) - It appears to me that for N words there will be N operations (adding the strings together).
Update: Thank you for the responses. It looks like my confusion is about how the author treats the copying of an array of characters (which I would have thought of as a constant) as another factor of N in the amortized runtime.

The + operator used to create a new String by copying both of its operands' characters into a new char[] of size firstString.length() + secondString.length(), and to do that it had to iterate over the chars of both of them. That's where your hidden inner loop lies: the i-th iteration copies everything joined so far, so the total work is proportional to 1 + 2 + ... + n, which is O(n^2).
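To make the hidden inner loop concrete, here is a minimal sketch that only counts characters, assuming (as described above) that every step allocates a fresh array and copies the whole accumulated result. `ConcatCost` and `charsCopiedByRepeatedConcat` are hypothetical names used for illustration; no real concatenation is performed.

```java
import java.util.Arrays;

public class ConcatCost {
    // Models result = result + piece: each step writes out the entire new string,
    // i.e. all the old characters plus the new ones.
    static long charsCopiedByRepeatedConcat(String[] pieces) {
        long copied = 0;
        long currentLength = 0;
        for (String piece : pieces) {
            currentLength += piece.length();
            copied += currentLength;
        }
        return copied;
    }

    public static void main(String[] args) {
        String[] pieces = new String[1000];
        Arrays.fill(pieces, "abcde"); // 1000 pieces of 5 characters each
        // 5 * (1 + 2 + ... + 1000) = 5 * 500500 = 2502500
        System.out.println(charsCopiedByRepeatedConcat(pieces));
    }
}
```

For n pieces of length k the total is k * n(n+1)/2, which grows as O(n^2), whereas a single growable buffer copies each character only a constant number of times on average.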
However, starting with JDK 5, javac translates string concatenation into StringBuilder append() calls (JDK 9 and later use an invokedynamic-based strategy instead). The book you mention must be quite old; nowadays StringBuffer is rarely used, because its methods are synchronized.
Even so, javac does not move the StringBuilder instantiation out of the loop: it creates a new builder on every iteration and copies the accumulated string into it, so the loop is still quadratic and the automatic translation does little for performance here. So it's not a bad idea to write this by hand:
public String joinTheStrings(String[] theStrings) {
StringBuilder joinedString = new StringBuilder();
for (String singleString : theStrings)
joinedString.append(singleString);
return joinedString.toString();
}
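One possible refinement, offered as an assumption rather than something the book suggests: since the total length is known before joining, the builder can be created with exactly that capacity through the StringBuilder(int) constructor, so it never has to grow. `PresizedJoin` is a hypothetical class name, and `joinNaive` reproduces the question's code for comparison.

```java
public class PresizedJoin {
    // The question's version: quadratic copying.
    static String joinNaive(String[] theStrings) {
        String joinedString = "";
        for (String singleString : theStrings) {
            joinedString = joinedString + singleString;
        }
        return joinedString;
    }

    // One pass to compute the final length, then a single builder with that capacity.
    static String joinPresized(String[] theStrings) {
        int total = 0;
        for (String s : theStrings) {
            total += s.length();
        }
        StringBuilder joined = new StringBuilder(total);
        for (String s : theStrings) {
            joined.append(s);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String[] words = {"Is", "this", "O(n^2)", "?"};
        System.out.println(joinPresized(words));                          // prints "IsthisO(n^2)?"
        System.out.println(joinPresized(words).equals(joinNaive(words))); // prints "true"
    }
}
```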

Related

Is chain of StringBuilder.append more efficient than string concatenation?

According to the NetBeans hint named "Use chain of .append methods instead of string concatenation", which:
Looks for string concatenation in the parameter of an invocation of the append method of StringBuilder or StringBuffer.
Is StringBuilder.append() really more efficient than strings concatenation?
Code sample
StringBuilder sb = new StringBuilder();
sb.append(filename + "/");
vs.
StringBuilder sb = new StringBuilder();
sb.append(filename).append("/");

You have to balance readability with functionality.
Let's say you have the following:
String str = "foo";
str += "bar";
if(baz) str += "baz";
This will create 2 string builders (where you only need 1, really) plus an interim String object. It would be more efficient to write:
StringBuilder strBuilder = new StringBuilder("foo");
strBuilder.append("bar");
if(baz) strBuilder.append("baz");
String str = strBuilder.toString();
But as a matter of style, I think the first one looks just fine. The performance benefit of a single object creation seems very minimal to me. Now, if instead of 3 strings you had 10, or 20, or 100, I would say the performance outweighs the style. If it was in a loop, I'd certainly use the string builder, but I think a couple of strings is fine to do the 'sloppy' way to make the code look cleaner. But... this has a very dangerous trap lurking in it! Read on below (pause to build suspense... dun dun dunnnn)
There are those who say to always use the explicit string builder. One rationale is that your code will continue to grow, and it will usually do so in the same manner as it already does (i.e. people won't take the time to refactor). So you end up with those 10 or 20 statements each creating their own builder when you don't need to. So to prevent this from the start, they say always use an explicit builder.
So while in your example, it's not going to be particularly faster, when someone in the future decides they want a file extension on the end, or something like that, if they continue to use string concatenation instead of a StringBuilder, they're going to run into performance problems eventually.
We also need to think about the future. Let's say you were making Java code back in JDK 1.1 and you had the following method:
public String concat(String s1, String s2, String s3) {
return s1 + s2 + s3;
}
At that time, it would have been slow because StringBuilder didn't exist.
Then in JDK 1.3 you decided to make it faster by using StringBuffer (StringBuilder still doesn't exist yet). You do this:
public String concat(String s1, String s2, String s3) {
StringBuffer sb = new StringBuffer();
sb.append(s1);
sb.append(s2);
sb.append(s3);
return sb.toString();
}
It gets a lot faster. Awesome!
Now JDK 1.5 comes out, and with it comes StringBuilder (which is faster than StringBuffer) and the automatic translation of
return s1 + s2 + s3;
to
return new StringBuilder().append(s1).append(s2).append(s3).toString();
But you don't get this performance benefit, even when you recompile, because you're using StringBuffer explicitly. So by being smart, you have caused a performance hit when Java got smarter than you. So you have to keep in mind that there are things out there you won't think of.

Well, your first example is essentially translated by the compiler into something along these lines:
StringBuilder sb = new StringBuilder();
sb.append(new StringBuilder().append(filename).append("/").toString());
so yes, there is a certain inefficiency here. However, whether it really matters in your program is a different question. Aside from being questionable style (hint: subjective), it usually only matters if you are doing this in a tight loop.

None of the answers so far explicitly addresses the specific case that hint is for. It's not saying to always use StringBuilder#append instead of concatenation. But if you're already using a StringBuilder, it doesn't make sense to mix in concatenation, because it creates a redundant StringBuilder (see Dirk's answer) and an unnecessary temporary String instance.
Several answers already discuss why the suggested way is more efficient, but the main point is, if you already have a StringBuilder instance, just call append on it. It's just as readable (in my opinion, and apparently whoever wrote the NetBeans hint) since you're calling append anyway, and it's a little more efficient.
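A small sketch of that point: when a StringBuilder already exists, each piece is appended to it directly instead of being concatenated first. `PathBuilder`, `buildPath`, and the sample directory names are hypothetical.

```java
public class PathBuilder {
    // Appends each directory and its "/" straight into the existing builder.
    // Writing sb.append(dir + "/") instead would build a temporary String first.
    static String buildPath(String[] dirs, String filename) {
        StringBuilder sb = new StringBuilder();
        for (String dir : dirs) {
            sb.append(dir).append("/");
        }
        return sb.append(filename).toString();
    }

    public static void main(String[] args) {
        System.out.println(buildPath(new String[] {"home", "user"}, "notes.txt")); // prints "home/user/notes.txt"
    }
}
```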

Theoretically, yes, because String objects are immutable: once constructed they cannot be changed anymore. So using "+" (concatenation) basically creates a new object each time.
Practically, no, in most code. The compiler is clever enough to replace your "+" with StringBuilder appends (though, as other answers point out, it uses a separate temporary builder rather than the one you already have).
For a more detailed explanation:
http://kaioa.com/node/59
PS: Netbeans??? Come on!

Concatenating two strings is faster with String.concat().
However, if you have multiple strings or different data types, you should use a StringBuilder, either explicitly or implicitly. Using + with Strings uses a StringBuilder implicitly.

It's only more efficient if you are using lots of concatenation and really long strings. For general-use, such as creating a filename in your example, any string concatenation is just fine and more readable.
At any rate, this part of your application is unlikely to be the performance bottleneck.

StringBuilder/StringBuffer vs. "+" Operator

I'm reading "Better, Faster, Lighter Java" (by Bruce Tate and Justin Gehtland) and am familiar with the readability requirements in agile-style teams, such as what Robert Martin discusses in his clean coding books. On the team I'm on now, I've been told explicitly not to use the + operator because it creates extra (and unnecessary) string objects at runtime.
But this article, written back in 2004, says that object allocation costs about 10 machine instructions (essentially free).
It also talks about how the GC also helps to reduce costs in this environment.
What is the actual performance tradeoffs between using +, StringBuilder or StringBuffer? (In my case it is StringBuffer only as we are limited to Java 1.4.2.)
StringBuffer to me results in ugly, less readable code, as a couple of examples in Tate's book demonstrate. And StringBuffer is thread-synchronized, which seems to have its own costs that outweigh the "danger" in using the + operator.
Thoughts/Opinions?

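String concatenation is translated into StringBuilder operations by the compiler (this holds for javac from JDK 5 through JDK 8; since JDK 9, javac emits an invokedynamic call instead, so decompiled output from a newer compiler will look different).
To see what the compiler does, I'll take a sample class, compile it and decompile it with jad to see the generated code.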
Original class:
public void method1() {
System.out.println("The answer is: " + 42);
}
public void method2(int value) {
System.out.println("The answer is: " + value);
}
public void method3(int value) {
String a = "The answer is: " + value;
System.out.println(a + " what is the question ?");
}
The decompiled class:
public void method1()
{
System.out.println("The answer is: 42");
}
public void method2(int value)
{
System.out.println((new StringBuilder("The answer is: ")).append(value).toString());
}
public void method3(int value)
{
String a = (new StringBuilder("The answer is: ")).append(value).toString();
System.out.println((new StringBuilder(String.valueOf(a))).append(" what is the question ?").toString());
}
In method1 the compiler performed the concatenation at compile time.
In method2 the String concatenation is equivalent to using StringBuilder manually.
In method3 the String concatenation is definitely bad, as the compiler creates a second StringBuilder rather than reusing the first one.
So my simple rule is that concatenations are good unless you need to concatenate the result again: for instance in loops or when you need to store an intermediate result.
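For method3, one way to avoid the second builder is to keep the intermediate result in a single StringBuilder. In this sketch the method returns the String instead of printing it; that change is an assumption made only so the result can be checked. `SingleBuilder` is a hypothetical class name.

```java
public class SingleBuilder {
    // Same text as method3, but the intermediate result stays in one builder
    // instead of becoming a String that is then copied into a new builder.
    static String method3(int value) {
        StringBuilder sb = new StringBuilder("The answer is: ");
        sb.append(value);
        // ...further work with the intermediate text could go here...
        sb.append(" what is the question ?");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(method3(42)); // prints "The answer is: 42 what is the question ?"
    }
}
```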

Your team needs to learn about the reasons for avoiding repeated string concatenation.
There certainly are times when it makes sense to use StringBuffer - in particular when you're creating a string in a loop, especially if you aren't sure that there will be few iterations in the loop. Note that it's not just a matter of creating new objects - it's a matter of copying all the text data you've appended already. Also bear in mind that object allocation is only "essentially free" if you don't consider garbage collection. Yes, if there's enough room in the current generation, it's basically a matter of incrementing a pointer... but:
That memory must have been cleared at some point. That's not free.
You're shortening the time until the next GC is required. GC isn't free.
If your object lives into the next generation, it may take longer to be cleaned up - again, not free.
All of these things are reasonably cheap in that it's "usually" not worth bending a design away from elegance to avoid creating objects... but you shouldn't regard them as free.
On the other hand, there is no point in using StringBuffer in cases where you won't need the intermediate strings. For example:
String x = a + b + c + d;
is at least as efficient as:
StringBuffer buffer = new StringBuffer();
buffer.append(a);
buffer.append(b);
buffer.append(c);
buffer.append(d);
String x = buffer.toString();
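For the loop case, here is a sketch that stays within Java 1.4.2 (no generics, no enhanced for loop, no StringBuilder), since that is the constraint mentioned in the question. `CsvLine` and `toCsvLine` are hypothetical names.

```java
public class CsvLine {
    // Joins values with commas using one StringBuffer for the whole loop, so the
    // text appended so far is not recopied on every iteration (only occasionally,
    // when the buffer has to grow).
    static String toCsvLine(String[] values) {
        StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                buffer.append(',');
            }
            buffer.append(values[i]);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(new String[] {"a", "b", "c"})); // prints "a,b,c"
    }
}
```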

For small concatenations you can simply use String and + for the sake of readability. Performance is not going to suffer. But if you are doing lots of concatenation operations, go for StringBuffer.

Other answers have mentioned that StringBuilder should be used when you are creating a string in a loop. However, most such loops are over collections, and since Java 8 a collection can be turned into a String using the joining method of the Collectors class.
For example, in the following code:
String result = Arrays.asList("Apple", "Banana", "Orange").stream()
.collect(Collectors.joining(", ", "<", ">"));
result will be: <Apple, Banana, Orange>
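A self-contained version of the snippet above with the imports it needs. It also shows String.join (Java 8 and later), which covers the simpler case where no prefix or suffix is needed. `JoiningDemo`, `bracketed`, and `plain` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JoiningDemo {
    // Delimiter, prefix and suffix via Collectors.joining
    static String bracketed(List<String> items) {
        return items.stream().collect(Collectors.joining(", ", "<", ">"));
    }

    // Delimiter only via String.join
    static String plain(List<String> items) {
        return String.join(", ", items);
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("Apple", "Banana", "Orange");
        System.out.println(bracketed(fruits)); // prints "<Apple, Banana, Orange>"
        System.out.println(plain(fruits));     // prints "Apple, Banana, Orange"
    }
}
```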

When to use StringBuilder in Java [duplicate]

This question already has an answer here: StringBuilder vs String concatenation in toString() in Java.
It is supposed to be generally preferable to use a StringBuilder for string concatenation in Java. Is this always the case?
What I mean is this: Is the overhead of creating a StringBuilder object, calling the append() method and finally toString() already smaller than concatenating existing strings with the + operator for two strings, or is it only advisable for more than two strings?
If there is such a threshold, what does it depend on (perhaps the string length, but in which way)?
And finally, would you trade the readability and conciseness of the + concatenation for the performance of the StringBuilder in smaller cases like two, three or four strings?
Explicit use of StringBuilder for regular concatenations is described as obsolete in "Obsolete Java optimization tips" as well as in "Java urban myths".

If you use String concatenation in a loop, something like this,
String s = "";
for (int i = 0; i < 100; i++) {
s += ", " + i;
}
then you should use a StringBuilder (not StringBuffer) instead of a String, because it is much faster and consumes less memory.
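A sketch of that loop rewritten with a StringBuilder, wrapped in a method so the result can be checked. `LoopConcat` and `buildList` are hypothetical names.

```java
public class LoopConcat {
    // Equivalent of: String s = ""; for (int i = 0; i < count; i++) s += ", " + i;
    static String buildList(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            // Appending the separator and the number separately avoids
            // building a temporary String for ", " + i on each iteration.
            sb.append(", ").append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildList(3)); // prints ", 0, 1, 2"
    }
}
```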
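If you have a single statement,
String s = "1, " + "2, " + "3, " + "4, " ...;
then you can use Strings. When every operand is a literal, as here, the compiler folds the whole expression into a single constant at compile time; when some operands are variables, it generates the StringBuilder code automatically.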

Ralph's answer is fabulous. I would rather use the StringBuilder class to build/decorate the String, because using it looks more like the Builder pattern.
public String decorateTheString(String orgStr){
StringBuilder builder = new StringBuilder();
builder.append(orgStr);
builder.deleteCharAt(orgStr.length()-1);
builder.insert(0,builder.hashCode());
return builder.toString();
}
It can be used as a helper/builder to build the String, not as the String itself.

As a general rule, always use the more readable code and only refactor if performance is an issue. In this specific case, most recent JDKs will actually compile the code into the StringBuilder version anyway.
You usually only really need to do it manually if you are doing string concatenation in a loop or in some complex code that the compiler can't easily optimize.

Have a look at: http://www.javaspecialists.eu/archive/Issue068.html and http://www.javaspecialists.eu/archive/Issue105.html
Run the same tests in your environment and check whether a newer JDK or your Java implementation handles a given kind of string operation better with String or with StringBuilder.

Some compilers may not replace any string concatenations with StringBuilder equivalents. Be sure to consider which compilers your source will use before relying on compile time optimizations.

The + operator behaves like public String concat(String str): it copies the characters of the two strings, so it has memory requirements and runtime complexity proportional to the length of the two strings. StringBuilder works more efficiently.
However, I have read here that concatenation code using the + operator is changed to StringBuilder by post-Java 1.4 compilers. So this might not be an issue at all. (Though I would really check this statement if I depended on it in my code!)

For two strings, concat() is faster; in other cases StringBuilder is a better choice. See my explanation in "concatenation operator (+) vs concat()".

The problem with String concatenation is that it leads to copying of the String object, with all the associated cost. StringBuilder is not thread-safe and is therefore faster than StringBuffer, which used to be the preferred choice before Java 5. As a rule of thumb, you should not do String concatenation in a loop that will be called often. I guess doing a few concatenations here and there will not hurt you, as long as you are not talking about hundreds, and this of course depends on your performance requirements. If you are doing real-time work, you should be very careful.

The Microsoft certification material addresses this same question. In the .NET world, the overhead for the StringBuilder object makes a simple concatenation of 2 String objects more efficient. I would assume a similar answer for Java strings.

The best alternative for String flyweight implementation in Java

My application is multithreaded with intensive String processing. We are experiencing excessive memory consumption and profiling has demonstrated that this is due to String data. I think that memory consumption would benefit greatly from using some kind of flyweight pattern implementation or even cache (I know for sure that Strings are often duplicated, although I don't have any hard data in that regard).
I have looked at Java Constant Pool and String.intern, but it seems that it can provoke some PermGen problems.
What would be the best alternative for implementing application-wide, multithreaded pool of Strings in java?
EDIT: Also see my previous, related question: How does java implement flyweight pattern for string under the hood?

Note: This answer uses examples that might not be relevant in modern runtime JVM libraries. In particular, the substring example is no longer an issue in OpenJDK/Oracle 7+.
I know it goes against what people often tell you, but sometimes explicitly creating new String instances can be a significant way to reduce your memory.
Because Strings are immutable, several methods leverage that fact and share the backing character array to save memory. However, occasionally this can actually increase the memory by preventing garbage collection of unused parts of those arrays.
For example, assume you were parsing the message IDs of a log file to extract warning IDs. Your code would look something like this:
//Format:
//ID: [WARNING|ERROR|DEBUG] Message...
String testLine = "5AB729: WARNING Some really really really long message";
Matcher matcher = Pattern.compile("([A-Z0-9]*): WARNING.*").matcher(testLine);
if ( matcher.matches() ) {
String id = matcher.group(1);
//...do something with id...
}
But look at the data actually being stored:
//...
String id = matcher.group(1);
Field valueField = String.class.getDeclaredField("value");
valueField.setAccessible(true);
char[] data = ((char[])valueField.get(id));
System.out.println("Actual data stored for string \"" + id + "\": " + Arrays.toString(data) );
It's the whole test line, because the matcher just wraps a new String instance around the same character data. Compare the results when you replace String id = matcher.group(1); with String id = new String(matcher.group(1));.
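A minimal sketch of that workaround, wrapped in a method so it can be run. The copy only matters on older JVMs where group() shared the line's character array; as the note at the top of this answer says, on OpenJDK/Oracle 7+ it is redundant. `WarningIdParser` and `extractWarningId` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WarningIdParser {
    // Format: "ID: [WARNING|ERROR|DEBUG] Message..."; compiled once and reused.
    private static final Pattern WARNING_LINE = Pattern.compile("([A-Z0-9]*): WARNING.*");

    // Returns the warning ID, or null if the line is not a warning.
    static String extractWarningId(String line) {
        Matcher matcher = WARNING_LINE.matcher(line);
        if (!matcher.matches()) {
            return null;
        }
        // new String(...) detaches the short ID from the long line on JVMs
        // where group() returned a String sharing the line's char array.
        return new String(matcher.group(1));
    }

    public static void main(String[] args) {
        System.out.println(extractWarningId("5AB729: WARNING Some really really really long message")); // prints "5AB729"
    }
}
```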

This is already done at the JVM level. You only need to ensure that you aren't creating new Strings every time, either explicitly or implicitly.
I.e. don't do:
String s1 = new String("foo");
String s2 = new String("foo");
This would create two instances in the heap. Rather do so:
String s1 = "foo";
String s2 = "foo";
This will create one instance in the heap, and both variables will refer to the same one (as evidence, s1 == s2 will return true here).
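A small sketch of the identity claims above, split into two methods so each can be checked. `LiteralIdentity` and its method names are hypothetical.

```java
public class LiteralIdentity {
    // Two identical literals refer to the same pooled String instance.
    static boolean literalsShareInstance() {
        String s1 = "foo";
        String s2 = "foo";
        return s1 == s2;
    }

    // new String(...) always creates a distinct object, even with equal contents.
    static boolean newStringIsDistinct() {
        String s1 = "foo";
        String s3 = new String("foo");
        return s1 != s3 && s1.equals(s3);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance()); // prints "true"
        System.out.println(newStringIsDistinct());   // prints "true"
    }
}
```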
Also don't use += to concatenate strings (in a loop):
String s = "";
for (/* some loop condition */) {
s += "new";
}
The += implicitly creates a new String in the heap every time. Instead, do this:
StringBuilder sb = new StringBuilder();
for (/* some loop condition */) {
sb.append("new");
}
String s = sb.toString();
If you can, prefer StringBuilder or its synchronized brother StringBuffer over String for "intensive String processing". They offer useful methods for exactly those purposes, such as append(), insert(), delete(), etc. Also see their javadoc.
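A short sketch of append(), insert() and delete() editing one buffer in place instead of producing a new String at each step. `BuilderOps` and `edit` are hypothetical names; the intermediate contents are shown in the comments.

```java
public class BuilderOps {
    static String edit() {
        StringBuilder sb = new StringBuilder("world");
        sb.insert(0, "hello "); // "hello world"
        sb.append("!!");        // "hello world!!"
        sb.delete(5, 11);       // removes " world" (indices 5 to 10) -> "hello!!"
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(edit()); // prints "hello!!"
    }
}
```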

Java 7/8
If you are doing what the accepted answer says and using Java 7 or newer you are not doing what it says you are.
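The implementation of substring() has changed. This is the old (Java 6) implementation, which shares the original character array instead of copying it: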
Never write code that relies on an implementation that can change drastically and might make things worse if you are relying on the old behavior.
public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}
So if you use the accepted answer with Java 7 or newer, you are only creating extra String objects and garbage that needs to be collected.

Efficiently pack Strings in memory! I once wrote a hyper-memory-efficient Set class, where Strings were stored as a tree. If a leaf was reached by traversing the letters, the entry was contained in the set. It was fast to work with, too, and ideal for storing a large dictionary.
And don't forget that Strings are often the largest part of memory in nearly every app I profiled, so don't be careless with them if you need them.
Illustration:
You have 3 Strings: Beer, Beans and Blood. You can create a tree structure like this:
B
+-e
| +-er
| +-ans
+-lood
This is very efficient for, e.g., a list of street names. It is obviously most reasonable with a fixed dictionary, because inserts cannot be done efficiently. In fact, the structure should be created once, then serialized, and afterwards just loaded.
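The idea can be sketched as a simple character trie. This is a simplification under stated assumptions, not the author's class: it stores one character per node (the illustration above merges single-child chains such as "lood" into one edge) and uses a HashMap per node for readability, which gives back much of the memory saving a compact, serialized structure would achieve. `TrieSet` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

public class TrieSet {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    // Shared prefixes such as "Be" in "Beer" and "Beans" are stored only once.
    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    boolean contains(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return false;
            }
        }
        return node.terminal;
    }

    public static void main(String[] args) {
        TrieSet set = new TrieSet();
        set.add("Beer");
        set.add("Beans");
        set.add("Blood");
        System.out.println(set.contains("Beans")); // prints "true"
        System.out.println(set.contains("Bea"));   // prints "false": a prefix, not a stored word
    }
}
```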

First, decide how much your application and developers would suffer if you eliminated some of that parsing. A faster application does you no good if you double your employee turnover rate in the process! I think based on your question we can assume you passed this test already.
Second, if you can't eliminate creating an object, then your next goal should be to ensure it doesn't survive Eden collection. And parse-lookup can solve that problem. However, a cache "implemented properly" (I disagree with that basic premise, but I won't bore you with the attendant rant) usually brings thread contention. You'd be trading one kind of memory pressure for another.
There's a variation of the parse-lookup idiom that suffers less from the sort of collateral damage you usually get from full-on caching, and that's a simple precalculated lookup table (see also "memoization"). The pattern you usually see for this is the Type Safe Enumeration (TSE). With the TSE, you parse the String, pass it to the TSE to retrieve the associated enumerated type, and then you throw the String away.
Is the text you're processing free-form, or does the input have to follow a rigid specification? If a lot of your text renders down to a fixed set of possible values, then a TSE could help you here, and serves a greater master: Adding context/semantics to your information at the point of creation, instead of at the point of use.
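A minimal sketch of the parse-lookup idea using a Java enum as the type-safe enumeration: the incoming text is mapped to a shared constant, and the String can then be discarded. `LogLevel`, its constants, and `parse` are hypothetical, borrowed from the log-file format used earlier on this page. A map lookup is used rather than valueOf so that unknown input returns null instead of throwing.

```java
import java.util.HashMap;
import java.util.Map;

public enum LogLevel {
    WARNING, ERROR, DEBUG;

    // Precalculated lookup table, built once when the enum is initialized.
    private static final Map<String, LogLevel> BY_NAME = new HashMap<>();
    static {
        for (LogLevel level : values()) {
            BY_NAME.put(level.name(), level);
        }
    }

    // Returns the shared constant for the parsed text, or null if unknown.
    // The caller keeps the constant and lets the input String be collected.
    static LogLevel parse(String text) {
        return BY_NAME.get(text.trim());
    }

    public static void main(String[] args) {
        System.out.println(parse(" WARNING ") == LogLevel.WARNING); // prints "true"
    }
}
```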

Is there a "fastest way" to construct Strings in Java?

I normally create a String in Java the following way:
String foo = "123456";
However, my lecturer has insisted to me that forming a String using the format method, as so:
String foo = String.format("%s", 123456);
Is much faster.
Also, he says that using the StringBuilder class is even faster.
StringBuilder sb = new StringBuilder();
String foo = sb.append(String.format("%s", 123456)).toString();
Which is the fastest method to create a String, if there even is one?
These examples may not be 100% accurate, as I might not remember them fully.

If there is only one string then:
String foo = "123456";
Is fastest. You'll notice that the String.format line has a "%s" format string in it, which has to be parsed at runtime, so I don't see how the lecturer could possibly think that was faster. Plus you've got a method call on top of it.
However, if you're building a string over time, such as in a for-loop, then you'll want to use a StringBuilder. If you were to just use += then you're building a brand new string every time the += line is called. StringBuilder is much faster since it holds a buffer and appends to that every time you call append.

Slightly off-topic, but I wish that the whole "must-not-use-plus-to-concatenate-strings-in-Java" myth would go away. While it might have been true in early versions of Java that StringBuffer was faster and "+ was evil", it is certainly not true in modern JVMs that are taking care of a lot of optimisations.
For example, which is faster?
String s = "abc" + "def";
or
StringBuffer buf = new StringBuffer();
buf.append("abc");
buf.append("def");
String s = buf.toString();
The answer is the former. The compiler recognises that this is a string constant and will actually put "abcdef" in the string pool, whereas the "optimised stringbuffer" version will cause an extra StringBuffer object to be built.
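A small sketch of the constant-folding behaviour described above, contrasted with a concatenation that can only happen at run time. `ConstantFolding` and its method names are hypothetical; comparing with == here is deliberate, to show object identity, and is not how strings should normally be compared.

```java
public class ConstantFolding {
    // "abc" + "def" is a compile-time constant expression, so javac folds it
    // into the single literal "abcdef", which is pooled like any other literal.
    static boolean constantIsPooled() {
        String s = "abc" + "def";
        return s == "abcdef";
    }

    // With a non-final variable the concatenation happens at run time and
    // yields a new String; intern() returns the pooled instance.
    static boolean runtimeConcatIsNew() {
        String abc = "abc";
        String s = abc + "def";
        return s != "abcdef" && s.intern() == "abcdef";
    }

    public static void main(String[] args) {
        System.out.println(constantIsPooled());   // prints "true"
        System.out.println(runtimeConcatIsNew()); // prints "true"
    }
}
```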
Another compiler optimisation is
String s = onestring + " concat " + anotherstring;
where the compiler works out the best way of concatenating. In JDK 5, this means a StringBuilder will be used internally, and it will be faster than using a StringBuffer.
But as other answers have said, the "123456" constant in your question is certainly the fastest way and your lecturer should go back to being a student :-)
And yes, I've been sad enough to verify this by looking at the Java bytecode...

This whole discussion is moot. Please read this article by Jeff, i.e., the guy who created Stack Overflow.
The Sad Tragedy of Micro-Optimization Theater
Please refer your instructor to this post and ask him/her to stop ruining students' brains with useless information. Algorithmic optimizations are where your code will live or die, not the method you use to construct strings. In any case, StringBuilder and String.format have to execute ACTUAL CODE with REAL MEMORY. If you just use a string literal, it is set aside at compile time and is ready to be used when you need it; in essence, it has 0 run-time cost, while the other options have real cost, since code actually needs to be executed.

String foo = "some string literal";
Is certainly the fastest way to make a String. It's embedded in the .class file and is a simple memory look-up to retrieve.
Using String.format when you have nothing to really format just looks ugly and might cause junior developers to cry.
If the String is going to be modified, then StringBuilder is the best since Strings are immutable.

In your second example, using:
String foo = String.format("%s", 123456);
doesn't buy you anything; 123456 is already a constant value, so why not just assign foo = "123456"? For constant strings, there's no better way.
If you're creating a string from multiple parts being appended together at runtime, use StringBuffer or StringBuilder (the former being thread-safe).

If your string is known at compile-time, then using a literal is best: String foo = "123456";.
If your string is not known at compile-time and is composed of an aggregation of smaller strings, StringBuilder is usually the way to go (but beware thread-safety!).
Using String foo = String.format("%s", 123456); could reduce your .class file's size and make loading the class a tiny bit faster, but that would be extreme memory tuning ^^.

As has been pointed out, if you're just building a single string with no concatenation, just use String.
For concatenating multiple bits into one big string, StringBuffer is slower than StringBuilder, but StringBuffer is synchronized. If you don't need synchronization, use StringBuilder.

Are you 100% certain that the instructor was not talking about something like:
String foo = "" + 123456;
I see my students do that type of thing "all the time" (a handful will do that each term). The reason that they do it is that some book showed them how to do it that way. Shakes head and fist at lazy book writers!

The first example you gave is the fastest and the simplest. Use that.
Each piece of code you added in those examples makes it significantly slower and more difficult to read.
I would suggest example 2 is at least 10-100x slower than example 1 and example 3 is about 2x slower than example 2.
Did your professor provide any justification for this assertion?
BTW: Your first example doesn't construct a String at all (which is why it is fastest), it just hands you a String sitting in the String constant pool.

How about measuring dynamic strings, so that the VM cannot optimise them away:
public class ConcatBenchmark {
    public static void measureConcats(long lim) {
        // Baseline: cost of generating the random numbers alone
        double sum = 0;
        long start = System.currentTimeMillis();
        for (long a = 0; a < lim; ++a) {
            sum += Math.random();
        }
        long end = System.currentTimeMillis();
        System.out.println("Sum:" + sum);
        System.out.println("Double creations time:" + (end - start));

        // Repeated String += concatenation: copies the whole result each time
        String res = "";
        Double sad = 0.0;
        start = System.currentTimeMillis();
        for (long b = 0; b < lim; ++b) {
            sad = Math.random();
            String sa = sad.toString();
            res += sa;
        }
        end = System.currentTimeMillis();
        System.out.println("Pure string concat time:" + (end - start));
        System.out.println("len:" + res.length());

        // Appending to a single StringBuffer
        StringBuffer sbf = new StringBuffer();
        start = System.currentTimeMillis();
        for (long c = 0; c < lim; ++c) {
            sad = Math.random();
            String sa = sad.toString();
            sbf.append(sa);
        }
        end = System.currentTimeMillis();
        System.out.println("StringBuffer concat time:" + (end - start));
        System.out.println("len:" + sbf.length());
    }

    public static void main(String[] args) {
        measureConcats(10000);
    }
}
My result for 10000 concats is 364ms for String+=String and 14ms for StringBuffer append.
I was very surprised about this result.
