Building long string literals without + operator - java

Hi people. I wonder whether this is possible in Java. I want to log a long string message, for example:
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
But to keep my source code readable, I don't want to write it on one line, since otherwise any reader of the code will have to scroll all the time. So I wrote it like this:
"aaaaaaaaaaa" +
"aaaaaaaaaaaaaaaaaa" +
"aaaaaaaaaaaaaaaaa"
However, since Java creates a new object for each string, and for concatenation it creates even more temporary objects, this kind of notation produces plenty of overhead. The construct is inside a loop in my code, so performance is very important.
Are there any other ways to write the string efficiently? I searched the web but did not find anything except for using StringBuffer.
BR
Ewgenij

If the strings are known at compile time, they will be concatenated automatically, so you don't have any penalties at all.
If the strings are generated at runtime, use a StringBuilder (not a StringBuffer, it's slower because of the synchronization overhead).

StringBuilder builder = new StringBuilder("aaaaaaaaaaa");
builder.append("aaaaaaaaaaaaaaaaaa");
builder.append("aaaaaaaaaaaaaaaaa");
builder.toString(); //This returns "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
However, in your case, with hard-coded strings, writing
String msg = "aaaaaaa" +
"aaaaaaa" +
"aaaaaaa";
will compile into
String msg = "aaaaaaaaaaaaaaaaaaaaa";
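The folding described above can be checked with reference equality. The sketch below (the class and method names are made up for illustration) relies on the JLS rule that constant expressions are interned, while a concatenation that involves a non-constant variable creates a new String at runtime:

```java
class ConstantFoldingDemo {
    // Every operand is a literal, so this is a compile-time constant expression:
    // javac folds it into the single literal "Hello, world".
    static final String FOLDED = "Hello, " +
                                 "world";

    // Constant expressions are interned, so FOLDED is the very same object
    // as the literal "Hello, world" used anywhere else.
    static boolean foldedIsSameObject() {
        return FOLDED == "Hello, world";
    }

    // 'greeting' is a non-final local, so it is not a constant variable;
    // the concatenation is evaluated at runtime and creates a new String.
    static boolean runtimeIsSameObject() {
        String greeting = "Hello, ";
        String built = greeting + "world";
        return built == "Hello, world";
    }

    // The runtime result still has the same characters.
    static boolean runtimeHasSameContent() {
        String greeting = "Hello, ";
        return (greeting + "world").equals("Hello, world");
    }

    public static void main(String[] args) {
        System.out.println(foldedIsSameObject());    // true
        System.out.println(runtimeIsSameObject());   // false
        System.out.println(runtimeHasSameContent()); // true
    }
}
```

So splitting a literal across several lines with + costs nothing at runtime, as long as every piece is a literal or a constant.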

If the string can be evaluated at compile time, the compiler will take care of combining them for you then. If you're building long strings at runtime, use a StringBuilder.
Here's an example of how the compiler deals with concatenation at compile time:
Source
String foo = "asdf" + "fdsa";
Class file
Constant pool:
#1 = Methodref #6.#17 // java/lang/Object."<init>":()V
#2 = String #18 // asdffdsa
...
#18 = Utf8 asdffdsa

Related

How many Strings are formed? [duplicate]

String a="hello";
String b=a+"Bye";
How many Strings are formed?
From my understanding of Java, what happens in this code is:
String a="hello"; // hello is created in string pool
String b=a+"bye"; // new StringBuilder(a).append("bye")
So in total, 2 strings are created, right?
1. Hello
2. HelloBye (in the heap)
Or does Java create 3?
1. Hello
2. Bye
3. HelloBye
If this is the case, does the append method create the appended strings in the string pool?
String a = "hello";
The JVM will create one string in the string pool (FIRST STRING IN POOL).
Now, here comes the tricky part:
b = a + "bye";
Internally, the + operator uses a StringBuilder to concatenate strings:
String b = new StringBuilder(a).append("bye").toString();
(The toString() method of StringBuilder returns a new String, which will definitely be on the heap since it is created with new String(...). The literal "bye" will be the SECOND STRING IN POOL.)
Now,
b = "hellobye" ("hellobye" will be the THIRD STRING IN POOL)
First, the string "hello" is created and added to the string pool.
Next, the string "Bye" is created and added to the string pool.
The concatenation of a and "Bye" results in a new String "helloBye", which is also added to the string pool.
A total of 3 Strings will be created in the pool: "hello", "Bye", and "helloBye".
When you create a new StringBuilder and append a string to it, the resulting string will not be added to the string pool. Instead, a new String object will be created in the heap memory to represent the combined string.
So, the code new StringBuilder(a).append("bye") will create one new String object in the heap memory to represent the combined string and one string in pool for "a".
The only part of your question that can be answered with complete certainty is this:
Does the append method create the appended strings in the string pool?
The answer is No. The result of a string concatenation that is not a constant expression is not placed in the string pool. At least not in any implementation of mainstream Java to date. However, there is no specification that actually guarantees this.
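A quick way to observe this on a mainstream JVM is to compare the concatenation result with the literal by reference. This is a sketch with made-up names, not a specification guarantee; it also shows that String.intern() returns the pooled instance:

```java
class PoolCheck {
    // The result of a non-constant concatenation is a new heap object,
    // distinct from the pooled literal with the same characters.
    static boolean concatResultIsPooledLiteral() {
        String a = "hello";
        String b = a + "bye";
        return b == "hellobye";
    }

    // intern() returns the canonical pooled instance, which is the same
    // object that the literal "hellobye" refers to.
    static boolean internedResultIsPooledLiteral() {
        String a = "hello";
        String b = a + "bye";
        return b.intern() == "hellobye";
    }

    public static void main(String[] args) {
        System.out.println(concatResultIsPooledLiteral());   // false
        System.out.println(internedResultIsPooledLiteral()); // true
    }
}
```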
There are a couple of reasons why we don't know for sure how many strings are "formed".
We don't know when the String objects corresponding to the literals are actually created. In some Java implementations they will be created (and interned) when the code is loaded. In others, the string creation could occur the first time this code is run.
We don't know whether one or both of those literals are used by another class ... and hence whether this code is "forming" them.
Depending on the Java implementation, interning a string (to put it in the string pool) may result in a new String object being created. So you might get a scenario where two String objects get "formed" for each literal.
In short there is enough ambiguity that we cannot be 100% sure of the precise number of strings that are created during the execution of that code.
Does it matter that we don't know for sure?
Frankly, no. It should make zero difference to the way that you write your code [1]. Let the Java compiler and runtime take care of it ... and use a recent version of Java to get the benefit of the work they have done on optimizing this.
[1] But it is still wise to avoid string concatenation in loops. I don't know if they can be optimized.
In your commented version you wrote:
String a = "hello"; // hello is created in string pool
String b = a + "bye"; // new StringBuilder(a).append("bye")
Both of those comments are questionable:
The "hello is created in string pool" comment is questionable for reasons that I gave above.
The new StringBuilder(a).append("bye") pseudo-code is questionable because that is an implementation detail. In Java 9 and later, expressions that involve string concatenation are compiled to an invokedynamic instruction, and the JIT compiler generates native instructions directly. See "How much does Java optimize string concatenation with +?" for more information.

What is the difference between concatenation using StringBuilder append() method and String "+" operator? [duplicate]

Given the following two code snippets:
StringBuilder sb = new StringBuilder("");
for (int i = 0; i < 5; i++) {
    sb.append("hello ");
}
System.out.println(sb.toString());
and
String ss = "";
for (int i = 0; i < 5; i++) {
    ss += "hello ";
}
System.out.println(ss);
I read that string concatenation using + uses StringBuffer's or StringBuilder's append method. As far as I know, StringBuilder is mutable, so any modification should not create a new instance. String is immutable, so I expect concatenation in a loop to create a new instance every time (5 times here). But if + uses StringBuffer's or StringBuilder's append method, is it really creating a new instance of ss each time through the loop? Also, which of the two code snippets is more efficient?
It used to be the case that some Java bytecode compilers would compile the String concatenation operator (+) to operations on a temporary StringBuilder.
However:
This was an implementation detail. Though the JLS (in some versions [1]) said that it may be optimized in that way, it never stated that it shall be.
This is not how more recent OpenJDK Java compilers deal with this. Concatenation expressions are now compiled by the bytecode compiler to invokedynamic calls ... which are then optimized by the JIT compiler.
However, your example involves concatenation in a loop. I don't think that the JIT compiler in current versions of Java can optimize across multiple loop iterations.
But, that may change too. Indeed, JEP 280 hints that the "indifying" of string concatenations by the bytecode compiler will enable further JIT compiler optimization. That could include optimization of loops like the 2nd version of your example. And if it does change, "hand optimization" to use StringBuilder calls in your source code could actually interfere with the JIT compiler's ability to optimize.
My advice: avoid premature / unnecessary optimization. Unless your application-level profiling tells you that a particular concatenation sequence is a significant bottleneck, leave it alone.
[1] For example, the Java 18 JLS says: "An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression." JLS 15.18.1.
I'd always use StringBuilder to concatenate strings in a loop. Doing so has several advantages:
you are using StringBuilder for what it is made to do, so it is easier to read (but less concise);
you can pass a StringBuilder around as a method argument (with an immutable String, you can only pass the result back as a return value);
StringBuilder has additional options to alter the string if that becomes necessary;
it will allocate some buffer space in advance, so that subsequent alterations / concatenations don't need to allocate as much new memory;
it can be assumed to be relatively performant whichever direction the current JVM takes when it comes to string concatenation.
Of course, I'd use new StringBuilder() instead of the new StringBuilder("") which makes no sense - StringBuilder instances will already start with an empty string (and 16 characters of buffer space).
If you know the size of the string in advance then it does make sense to create a buffer large enough to hold it, so it doesn't need to request a larger area to hold the characters. Memory management is relatively expensive.
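As a sketch of that advice (the class and method names are hypothetical), the loop from the question can compute the final length first and pass it to the StringBuilder(int capacity) constructor:

```java
class PresizedBuilder {
    // Repeats 'piece' 'times' times. Because the final length is known up front,
    // the builder is created with exactly that capacity, so append() never has to
    // grow (reallocate and copy) its internal buffer.
    static String repeat(String piece, int times) {
        StringBuilder sb = new StringBuilder(piece.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(piece);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + repeat("hello ", 5) + "]"); // [hello hello hello hello hello ]
    }
}
```

For this particular case, Java 11 and later also offer "hello ".repeat(5), which does the same without an explicit builder.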
With regard to that, note that StringBuilder implements CharSequence so you can actually use it instead of a String for some methods if you want to avoid copies.
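For instance, a few JDK methods accept a CharSequence directly, so a builder can be searched or compared without calling toString() first. This sketch uses Pattern.matcher and String.contentEquals; the class name and regex are only illustrative:

```java
import java.util.regex.Pattern;

class CharSequenceDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Pattern.matcher accepts any CharSequence, so a StringBuilder can be
    // searched directly without first copying it into a String.
    static boolean containsDigits(CharSequence text) {
        return DIGITS.matcher(text).find();
    }

    // String.contentEquals(CharSequence) compares against a builder without copying it.
    static boolean sameContent(String s, StringBuilder sb) {
        return s.contentEquals(sb);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("order ").append(42);
        System.out.println(containsDigits(sb));            // true
        System.out.println(sameContent("order 42", sb));   // true
    }
}
```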
"Concatenate" joins two specific items together, whereas "append" adds what you specify to whatever may already be there.

Is chain of StringBuilder.append more efficient than string concatenation?

According to the NetBeans hint named "Use chain of .append methods instead of string concatenation":
Looks for string concatenation in the parameter of an invocation of the append method of StringBuilder or StringBuffer.
Is StringBuilder.append() really more efficient than strings concatenation?
Code sample
StringBuilder sb = new StringBuilder();
sb.append(filename + "/");
vs.
StringBuilder sb = new StringBuilder();
sb.append(filename).append("/");
You have to balance readability with functionality.
Let's say you have the following:
String str = "foo";
str += "bar";
if(baz) str += "baz";
This will create 2 string builders (where you only need 1, really) plus an additional string object for the interim. You would be more efficient if you went:
StringBuilder strBuilder = new StringBuilder("foo");
strBuilder.append("bar");
if(baz) strBuilder.append("baz");
String str = strBuilder.toString();
But as a matter of style, I think the first one looks just fine. The performance benefit of a single object creation seems very minimal to me. Now, if instead of 3 strings, you had 10, or 20, or 100, I would say the performance outweighs the style. If it was in a loop, for sure I'd use the string builder, but I think just a couple strings is fine to do the 'sloppy' way to make the code look cleaner. But... this has a very dangerous trap lurking in it! Read on below (pause to build suspense... dun dun dunnnn)
There are those who say to always use the explicit string builder. One rationale is that your code will continue to grow, and it will usually do so in the same manner as it is already (i.e. they won't take the time to refactor.) So you end up with those 10 or 20 statements each creating their own builder when you don't need to. So to prevent this from the start, they say always use an explicit builder.
So while in your example, it's not going to be particularly faster, when someone in the future decides they want a file extension on the end, or something like that, if they continue to use string concatenation instead of a StringBuilder, they're going to run into performance problems eventually.
We also need to think about the future. Let's say you were making Java code back in JDK 1.1 and you had the following method:
public String concat(String s1, String s2, String s3) {
    return s1 + s2 + s3;
}
At that time, it would have been slow because StringBuilder didn't exist.
Then in JDK 1.3 you decided to make it faster by using StringBuffer (StringBuilder still doesn't exist yet). You do this:
public String concat(String s1, String s2, String s3) {
    StringBuffer sb = new StringBuffer();
    sb.append(s1);
    sb.append(s2);
    sb.append(s3);
    return sb.toString();
}
It gets a lot faster. Awesome!
Now JDK 1.5 comes out, and with it comes StringBuilder (which is faster than StringBuffer) and the automatic translation of
return s1 + s2 + s3;
to
return new StringBuilder().append(s1).append(s2).append(s3).toString();
But you don't get this performance benefit because you're using StringBuffer explicitly. So by being smart, you have caused a performance hit when Java got smarter than you. So you have to keep in mind that there are things out there you won't think of.
Well, your first example is essentially translated by the compiler into something along the lines of:
StringBuilder sb = new StringBuilder();
sb.append(new StringBuilder().append(filename).append("/").toString());
So yes, there is a certain inefficiency here. However, whether it really matters in your program is a different question. Aside from being questionable style (hint: subjective), it usually only matters if you are doing this in a tight loop.
None of the answers so far explicitly address the specific case that hint is for. It's not saying to always use StringBuilder#append instead of concatenation. But, if you're already using a StringBuilder, it doesn't make sense to mix in concatenation, because it creates a redundant StringBuilder (See Dirk's answer) and an unnecessary temporary String instance.
Several answers already discuss why the suggested way is more efficient, but the main point is, if you already have a StringBuilder instance, just call append on it. It's just as readable (in my opinion, and apparently whoever wrote the NetBeans hint) since you're calling append anyway, and it's a little more efficient.
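The two forms from the question produce the same text; the difference is only whether a temporary String is built for the argument. A small sketch (class and method names are made up):

```java
class AppendChain {
    // Mixing '+' into append(): the argument filename + "/" is built as a
    // separate temporary String first, then copied into sb.
    static String withConcatInside(String filename) {
        StringBuilder sb = new StringBuilder();
        sb.append(filename + "/");
        return sb.toString();
    }

    // What the NetBeans hint suggests: append each piece directly to the
    // builder you already have, so no temporary String is needed.
    static String withChain(String filename) {
        StringBuilder sb = new StringBuilder();
        sb.append(filename).append("/");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withConcatInside("docs")); // docs/
        System.out.println(withChain("docs"));        // docs/
    }
}
```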
Theoretically, yes. Because String objects are immutable: once constructed they cannot be changed anymore. So using "+" (concatenation) basically creates a new object each time.
Practically no. The compiler is clever enough to replace all your "+" with StringBuilder appends.
For a more detailed explanation:
http://kaioa.com/node/59
PS: Netbeans??? Come on!
A concat of two strings is faster using this function.
However, if you have multiple strings or different data type, you should use a StringBuilder either explicitly or implicitly. Using a + with Strings is using a StringBuilder implicitly.
It's only more efficient if you are using lots of concatenation and really long strings. For general-use, such as creating a filename in your example, any string concatenation is just fine and more readable.
At any rate, this part of your application is unlikely to be the performance bottleneck.

When should we change a String to a StringBuilder?

In an application, a String is an often used data type. What we know is that the mutation of a String uses lots of memory. So what we can do is to use a StringBuilder/StringBuffer.
But at what point should we change to StringBuilder?
And what should we do when we have to split it or replace characters in it?
e.g.:
//original:
String[] split = string.split("\\?"); // split() takes a regex, so "?" must be escaped
//better? :
String[] split = stringBuilder.toString().split("\\?");
or
//original:
String replacedString = string.replace("l","st");
//better? :
String replacedString = stringBuilder.toString().replace("l","st");
//or
StringBuilder replacedStringBuilder = new StringBuilder(stringBuilder.toString().replace("l","st"));
In your examples, there are no benefits in using a StringBuilder, since you use the toString method to create an immutable String out of your StringBuilder.
You should only copy the contents of a StringBuilder into a String after you are done appending it (or modifying it in some other way).
The problem with Java's StringBuilder is that it lacks some methods you get when using a plain string (check this thread, for example: How to implement StringBuilder.replace(String, String)).
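The linked thread is not reproduced here, but one way to get a String-style replace that works in place is to loop over StringBuilder.indexOf(String, int) and StringBuilder.replace(int, int, String). The helper name replaceAllInPlace is hypothetical, and this is a sketch rather than a drop-in library method:

```java
class BuilderReplace {
    // Hypothetical helper: replaces every occurrence of 'target' in 'sb' in place,
    // using only StringBuilder.indexOf(String, int) and StringBuilder.replace(int, int, String).
    static void replaceAllInPlace(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        int index = sb.indexOf(target);
        while (index != -1) {
            sb.replace(index, index + target.length(), replacement);
            // Resume searching after the inserted text so the replacement is never re-matched.
            index = sb.indexOf(target, index + replacement.length());
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("hello world");
        replaceAllInPlace(sb, "l", "st");
        System.out.println(sb); // heststo worstd
    }
}
```

When the replacement length differs from the target length, each replace() call shifts the rest of the buffer, so this does not avoid the copying cost described below; it only avoids creating intermediate Strings.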
What we know is that a String uses lots of memory.
Actually, to be precise, a String uses less memory than a StringBuilder with equivalent contents. A StringBuilder class has some additional constant overhead, and usually has a preallocated buffer to store more data than needed at any given moment (to reduce allocations). The issue with Strings is that they are immutable, which means Java needs to create a new instance whenever you need to change its contents.
To conclude, StringBuilder is not designed for the operations you mentioned (split and replace), and it won't yield much better performance in any case. A split method cannot benefit from StringBuilder's mutability, since it creates an array of immutable strings as its output anyway. A replace method still needs to iterate through the entire string, and do a lot of copying if the replacement is not the same length as the string being searched for.
If you need to do a lot of appending, then go for a StringBuilder. Since it uses a "mutable" array of characters under the hood, adding data to the end will be especially efficient.
This article compares the performance of several StringBuilder and String methods (although I would take the Concatenation part with some reservation, because it doesn't mention dynamic string appending at all and concentrates on a single Join operation only).
What we know is that the mutation of a String uses lots of memory.
That is incorrect. Strings cannot be mutated. They are immutable.
What you are actually talking about is building a String from other strings. That can use a lot more memory than is necessary, but it depends how you build the string.
So what we can do is to use a StringBuilder/StringBuffer.
Using a StringBuilder will help in some circumstances:
String res = "";
for (String s : ...) {
    res = res + s;
}
(If the loop iterates many times then optimizing the above to use a StringBuilder could be worthwhile.)
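As a sketch of that optimization, assuming the elided source of strings is a List<String> (the class and method names are illustrative):

```java
import java.util.List;

class LoopJoin {
    // Each iteration builds a new String that copies everything accumulated so far,
    // so the total copying typically grows quadratically with the number of parts.
    static String concatInLoop(List<String> parts) {
        String res = "";
        for (String s : parts) {
            res = res + s;
        }
        return res;
    }

    // One growing buffer: each part is copied in once (plus occasional buffer growth).
    static String builderInLoop(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String s : parts) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c");
        System.out.println(concatInLoop(parts));  // abc
        System.out.println(builderInLoop(parts)); // abc
    }
}
```

If no separator logic or other per-item work is needed, String.join("", parts) does the same job in one call.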
But in other circumstances it is a waste of time:
String res = s1 + s2 + s3 + s4 + s5;
(It is a waste of time to optimize the above to use a StringBuilder because the Java compiler will automatically translate the expression into code that creates and uses a StringBuilder, or, in Java 9 and later, into an invokedynamic call that serves the same purpose.)
You should only ever use a StringBuffer instead of a StringBuilder when the string needs to be accessed and/or updated by more than one thread; i.e. when it needs to be thread-safe.
But at what point should we change to StringBuilder?
The simple answer is to only do it when the profiler tells you that you have a performance problem in your string handling / processing.
Generally speaking, StringBuilders are used for building strings rather than as the primary representation of the strings.
And what should we do, when we have to split it or to replace characters in there?
Then you have to review your decision to use a StringBuilder / StringBuffer as your primary representation at that point. And if it is still warranted you have to figure out how to do the operation using the API you have chosen. (This may entail converting to a String, performing the operation and then creating a new StringBuilder from the result.)
If you frequently modify the string, go with StringBuilder. Otherwise, if it's immutable anyway, go with String.
To answer your question on how to replace characters, check this out: http://download.oracle.com/javase/tutorial/java/data/buffers.html. StringBuilder operations are what you want.
Here's another good write-up on StringBuilder: http://www.yoda.arachsys.com/csharp/stringbuilder.html
If you need to do a lot of alterations on your String, then you can go for StringBuilder. Go for StringBuffer if you are in a multithreaded application.
Both a String and a StringBuilder use about the same amount of memory. Why do you think it is “much”?
If you have measured (for example with jmap -histo:live) that the classes [C and java.lang.String take up most of the memory in the heap, only then should you think further in this direction.
Maybe there are multiple strings with the same value. Then, since Strings are immutable, you could intern the duplicate strings. Don't use String.intern for this, since it has bad performance characteristics; use Google Guava's Interner instead.
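To illustrate the idea of a user-level interner without depending on Guava, here is a minimal JDK-only stand-in built on ConcurrentHashMap.putIfAbsent. It is not Guava's Interner API; the class name SimpleInterner is hypothetical, and it holds strong references, so entries are never evicted (Guava also offers weak interners):

```java
import java.util.concurrent.ConcurrentHashMap;

class SimpleInterner {
    // Strong references: once added, a canonical string stays in this map.
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance for s: the first equal string passed in.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing == null ? s : existing;
    }

    public static void main(String[] args) {
        SimpleInterner interner = new SimpleInterner();
        String x = new String("abc");
        String y = new String("abc");
        System.out.println(x == y);                                   // false
        System.out.println(interner.intern(x) == interner.intern(y)); // true
    }
}
```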

Is there a "fastest way" to construct Strings in Java?

I normally create a String in Java the following way:
String foo = "123456";
However, my lecturer has insisted to me that forming a String using the format method, like so:
String foo = String.format("%s", 123456);
is much faster.
Also, he says that using the StringBuilder class is even faster.
StringBuilder sb = new StringBuilder();
String foo = sb.append(String.format("%s", 123456)).toString();
Which is the fastest method to create a String, if there even is one?
These examples may not be 100% accurate, as I might not remember them exactly.
If there is only one string then:
String foo = "123456";
is fastest. You'll notice that the String.format line has the format string "%s" in it, which has to be parsed at runtime, so I don't see how the lecturer could possibly think that was faster. Plus you've got a method call on top of it.
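For comparison, here are three ways to end up with the text "123456". The class and method names are made up, and String.valueOf is an extra option not mentioned in the question, shown as the direct conversion when the number is only known at runtime:

```java
class NumberToString {
    // A literal: nothing is computed at runtime; the pooled constant is returned.
    static String fromLiteral() {
        return "123456";
    }

    // Direct int-to-String conversion, no format string involved.
    static String fromValueOf(int n) {
        return String.valueOf(n);
    }

    // Creates a Formatter and parses the "%s" format string before converting n.
    static String fromFormat(int n) {
        return String.format("%s", n);
    }

    public static void main(String[] args) {
        System.out.println(fromLiteral());        // 123456
        System.out.println(fromValueOf(123456));  // 123456
        System.out.println(fromFormat(123456));   // 123456
    }
}
```

All three produce equal strings; only the literal involves no runtime work at all.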
However, if you're building a string over time, such as in a for-loop, then you'll want to use a StringBuilder. If you were to just use += then you're building a brand new string every time the += line is called. StringBuilder is much faster since it holds a buffer and appends to that every time you call append.
Slightly off-topic, but I wish that the whole "must-not-use-plus-to-concatenate-strings-in-Java" myth would go away. While it might have been true in early versions of Java that StringBuffer was faster and "+ was evil", it is certainly not true in modern JVMs that are taking care of a lot of optimisations.
For example, which is faster?
String s = "abc" + "def";
or
StringBuffer buf = new StringBuffer();
buf.append("abc");
buf.append("def");
String s = buf.toString();
The answer is the former. The compiler recognises that this is a constant expression and puts "abcdef" into the class file's constant pool, so at runtime it is simply the interned string "abcdef", whereas the "optimised stringbuffer" version will cause an extra StringBuffer object to be built.
Another JVM optimisation is
String s = onestring + " concat " + anotherstring;
Where the JVM will work out what the best way of concatenating will be. In JDK 5, this means a StringBuilder will be internally used and it will be faster than using a string buffer.
But as other answers have said, the "123456" constant in your question is certainly the fastest way and your lecturer should go back to being a student :-)
And yes, I've been sad enough to verify this by looking at the Java bytecode...
This whole discussion is moot. Please read this article by Jeff, i.e., the guy who created Stack Overflow.
The Sad Tragedy of Micro-Optimization Theater
Please refer your instructor to this post and ask him/her to stop ruining students' brains with useless information. Algorithmic optimizations are where your code will live or die, not which method you use to construct strings. In any case, StringBuilder and String.format have to execute ACTUAL CODE with REAL MEMORY. If you just write a string literal, it is set aside at compile time and is ready to be used when you need it; in essence, it has zero run-time cost, while the other options have a real cost, since code actually needs to be executed.
String foo = "some string literal";
Is certainly the fastest way to make a String. It's embedded in the .class file and is a simple memory look-up to retrieve.
Using String.format when you have nothing to really format just looks ugly and might cause junior developers to cry.
If the String is going to be modified, then StringBuilder is the best since Strings are immutable.
In your second example, using:
String foo = String.format("%s", 123456);
doesn't buy you anything; 123456 is already a constant value, so why not just assign foo = "123456"? For constant strings, there's no better way.
If you're creating a string from multiple parts being appended together at runtime, use StringBuffer or StringBuilder (the former being thread-safe).
If your string is known at compile-time, then using a literal is best: String foo = "123456";.
If your string is not known at compile-time and is composed of an aggregation of smaller strings, StringBuilder is usually the way to go (but beware thread-safety!).
Using String foo = String.format("%s", 123456); could reduce your .class file's size and make loading the class a tiny bit faster, but that would be extremely aggressive memory tuning ^^.
As has been pointed out, if you're just building a single string with no concatenation, just use String.
For concatenating multiple bits into one big string, StringBuffer is slower than StringBuilder, but StringBuffer is synchronized. If you don't need synchronization, use StringBuilder.
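A sketch of what that synchronization buys (class and method names are hypothetical): several threads append to one shared StringBuffer, and because each append() call is synchronized, no characters are lost. Note that only individual calls are atomic; a sequence of appends from one thread can still interleave with appends from another.

```java
class SharedBufferDemo {
    // Appends 'perThread' characters from each of 'threads' threads to one shared
    // StringBuffer and returns the final length.
    static int appendFromThreads(int threads, int perThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.append('x'); // synchronized on the StringBuffer instance
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromThreads(4, 10_000)); // 40000
    }
}
```

Swapping in an unsynchronized StringBuilder here could lose characters or throw an exception, which is why it should only be used when a single thread owns the builder.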
Are you 100% certain that the instructor was not talking about something like:
String foo = "" + 123456;
I see my students do that type of thing "all the time" (a handful will do that each term). The reason that they do it is that some book showed them how to do it that way. Shakes head and fist at lazy book writers!
The first example you gave is the fastest and the simplest. Use that.
Each piece of code you added in those examples makes it significantly slower and more difficult to read.
I would suggest example 2 is at least 10-100x slower than example 1 and example 3 is about 2x slower than example 2.
Did your professor provide any justification for this assertion?
BTW: Your first example doesn't construct a String at all (which is why it is fastest), it just hands you a String sitting in the String constant pool.
How about measuring dynamic strings, so that the VM cannot optimise them:
public static void measureConcats(long lim) {
    double sum = 0;
    long start = System.currentTimeMillis();
    for (long a = 0; a < lim; ++a) {
        sum += Math.random();
    }
    long end = System.currentTimeMillis();
    System.out.println("Sum:" + sum);
    System.out.println("Double creations time:" + (end - start));
    String res = "";
    Double sad = 0.0;
    start = System.currentTimeMillis();
    for (long b = 0; b < lim; ++b) {
        sad = Math.random();
        String sa = sad.toString();
        res += sa;
    }
    end = System.currentTimeMillis();
    System.out.println("Pure string concat time:" + (end - start));
    System.out.println("len:" + res.length());
    StringBuffer sbf = new StringBuffer();
    start = System.currentTimeMillis();
    for (long c = 0; c < lim; ++c) {
        sad = Math.random();
        String sa = sad.toString();
        sbf.append(sa);
    }
    end = System.currentTimeMillis();
    System.out.println("StringBuffer concat time:" + (end - start));
    System.out.println("len:" + sbf.length());
}
My result for 10000 concats is 364ms for String+=String and 14ms for StringBuffer append.
I was very surprised about this result.
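A variant of the measurement above, sketched with deterministic input so both approaches can be checked to produce identical text, using StringBuilder and System.nanoTime. The names are made up, and a single run without warm-up is only a rough indication; a harness such as JMH gives more trustworthy numbers:

```java
class ConcatTiming {
    // Repeated += : each iteration creates a new String containing everything so far.
    static String withPlus(int n) {
        String res = "";
        for (int i = 0; i < n; i++) {
            res += Integer.toString(i);
        }
        return res;
    }

    // One StringBuilder: append(int) writes the same decimal text into a growing buffer.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 10_000;
        long start = System.nanoTime();
        String a = withPlus(n);
        long plusNanos = System.nanoTime() - start;

        start = System.nanoTime();
        String b = withBuilder(n);
        long builderNanos = System.nanoTime() - start;

        System.out.println("equal: " + a.equals(b));
        System.out.println("+= loop:       " + plusNanos / 1_000_000 + " ms");
        System.out.println("StringBuilder: " + builderNanos / 1_000_000 + " ms");
    }
}
```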
