Advantages of format over concatenation

I'm using Java, and I was wondering if there is any advantage to using format over simple concatenation.
I would either format like this:
a = String.format("%s/hi", b);
or like this:
a = (b + "/hi");
Is there any advantage (other than cleanliness) of using one over the other?

It is better practice to use String.format over String.concat.
String.format() does more than just concatenate strings. For example, you can display numbers in a specific locale using String.format().
However, if you don't care about localization, functionally, there is no difference. Maybe one is faster than the other, but in most cases it will be negligible.
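To make the locale point concrete, here is a minimal sketch. The class and method names (LocaleFormatDemo, price) are made up for illustration; only String.format(Locale, String, Object...) comes from the JDK.

```java
import java.util.Locale;

final class LocaleFormatDemo {
    // Formats an amount with grouping and two decimals, using the
    // separators of the given locale. Hypothetical helper for illustration.
    static String price(Locale locale, double amount) {
        return String.format(locale, "%,.2f", amount);
    }

    public static void main(String[] args) {
        System.out.println(price(Locale.US, 1234567.891));      // 1,234,567.89
        System.out.println(price(Locale.GERMANY, 1234567.891)); // 1.234.567,89
    }
}
```

Plain concatenation of a double would give neither grouping nor locale-specific separators, which is the functional difference the answer refers to.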

Format can handle quite complex patterns and formatting, if you need them. I would personally go for readability over any perceived "performance" benefits for anything other than code used in a large loop. Who cares if your method creates a couple of objects that get garbage collected soon afterwards anyway?

Only one String will be created in the following line:
a = String.format("%s/hi", b);
More than one String is created in the following line:
a = (b + "/hi");

Related

String.format() vs string concatenation performance

Is there any difference in performance between these two idioms?
String firstStr = "Hello ";
String secStr = "world";
String third = firstStr + secStr;
and
String firstStr = "Hello ";
String secStr = "world";
String third = String.format("%s%s",firstStr , secStr);
I know that concatenation with the + operator is bad for performance, especially if the operation is done many times, but what about String.format()? Is it the same, or can it help to improve performance?

The second one will be even slower (if you look at the source code of String.format() you will see why). It is just because String.format() executes much more code than the simple concatenation. And at the end of the day, both code versions create 3 instances of String. There are other reasons, not performance related, to use String.format(), as others already pointed out.

First of all, let me just put a disclaimer against premature optimization. Unless you are reasonably sure this is going to be a hotspot in your program, just choose the construct that fits your program the best.
If you are reasonably sure, however, and want to have good control over the concatenation, just use a StringBuilder directly. That's what the built-in concatenation operation does anyway, and there's no reason to assume that it's slow. As long as you keep the same StringBuilder and keep appending to it, rather than risking creating several in a row (which would have to be "initialized" with the previously created data), you'll have proper O(n) performance. Especially so if you make sure to initialize the StringBuilder with proper capacity.
That also said, however, a StringBuilder is, as mentioned, what the built-in concatenation operation uses anyway, so if you just keep all your concatenations "inline" -- that is, use A + B + C + D, rather than something like e = A + B followed by f = C + D followed by e + f (this way, the same StringBuilder is used and appended to throughout the entire operation) -- then there's no reason to assume it would be slow.
EDIT: In reply to your comment, I'd say that String.format is always slower. Even if it appends optimally, it can't do it any faster than StringBuilder (and therefore also the concatenation operation) does anyway, but it also has to create a Formatter object, do parsing of the input string, and so on. So there's more to it, but it still can't do the basic operation any faster.
Also, if you look internally in how the Formatter works, you'll find that it also (by default) uses a StringBuilder, just as the concatenation operation does. Therefore, it does the exact same basic operation -- that is, feeding a StringBuilder with the strings you give it. It just does it in a far more roundabout way.
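As a rough sketch of the "single StringBuilder with proper capacity" advice: the helper below (JoinDemo.join is a hypothetical name, not something from the answer) sizes the builder once and only appends to it.

```java
final class JoinDemo {
    // Joins the parts with a separator using one pre-sized StringBuilder,
    // so the internal buffer never has to grow while appending.
    static String join(String[] parts, String sep) {
        int capacity = 0;
        for (String p : parts) {
            capacity += p.length() + sep.length(); // slight over-estimate is fine
        }
        StringBuilder sb = new StringBuilder(capacity);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(sep);
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"usr", "local", "bin"}, "/")); // usr/local/bin
    }
}
```

Every character is copied into the builder once and once more in toString(), which is where the O(n) behaviour comes from.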

As described in this excellent answer, you would rather use String.format, but mainly because of localization concerns.
Suppose you had to supply different text for different languages: with String.format you can just plug in the new languages (using resource files), whereas concatenation leaves messy code.
See: Is it better practice to use String.format over string Concatenation in Java?
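A small sketch of why format strings help with localization. The map below is only a stand-in for per-language resource files, and the keys, patterns and names (GreetingDemo, render) are invented for illustration. The point is that indexed arguments such as %1$s let each language place the values where its grammar needs them.

```java
import java.util.HashMap;
import java.util.Map;

final class GreetingDemo {
    // Stand-in for resource files: one pattern per language (made-up texts).
    private static final Map<String, String> PATTERNS = new HashMap<>();
    static {
        PATTERNS.put("en", "%1$s has %2$d new messages");
        PATTERNS.put("de", "%2$d neue Nachrichten an %1$s");
    }

    // The calling code stays the same; only the pattern changes per language.
    static String render(String lang, String user, int count) {
        return String.format(PATTERNS.get(lang), user, count);
    }

    public static void main(String[] args) {
        System.out.println(render("en", "Ann", 3));
        System.out.println(render("de", "Ann", 3));
    }
}
```

With concatenation, the word order would be baked into the code, so each language would need its own expression.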

Can Java skip .toUpperCase() on literal string constants already in upper case?

I have a .toUpperCase() happening in a tight loop and have profiled and shown it is impacting application performance. Annoying thing is it's being called on strings already in capital letters. I'm considering just dropping the call to .toUpperCase() but this makes my code less safe for future use.
This level of Java performance optimization is past my experience thus far. Is there any way to do a pre-compilation, set an annotation, etc. to skip the call to toUpperCase on already upper case strings?

What you need to do if you can is call .toUpperCase() on the string once, and store it so that when you go through the loop you won't have to do it each time.
I don't believe there is a pre-compilation solution - you can't know in advance what data the code will be handling. If anyone can correct me on this, it'd be pretty awesome.
If you post some example code, I'd be able to help a lot more - it really depends on what kind of access you have to the data before you get to the loop. If your loop is actually doing the data access (e.g., reading from a file) and you don't have control over where those files come from, your hands are a lot more tied than if the data is hardcoded.
In many cases there's an easy answer, but in some, there's not much you can do.
You can try equalsIgnoreCase, too. It doesn't make a new string.
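If the upper-casing is only there to make a comparison case-insensitive, the equalsIgnoreCase suggestion might look like the sketch below. The class and method names are hypothetical, and whether this fits depends on what the loop in the question actually does with the strings.

```java
final class CaseCompareDemo {
    // Counts entries equal to the key, ignoring case, without building an
    // upper-cased copy of every entry inside the loop.
    static int countMatches(String[] entries, String key) {
        int matches = 0;
        for (String entry : entries) {
            if (key.equalsIgnoreCase(entry)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(new String[] {"FOO", "foo", "bar"}, "FOO")); // 2
    }
}
```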

No, you cannot do this using an annotation or pre-compilation, because your input is supplied at runtime while annotations and pre-compilation are compile-time constructs.
If you had known the input in advance, you could simply convert it to upper case before running the application, i.e. before you compile it.
Note that there are many ways to optimize string handling but without more information we cannot give you any tailor made solution.

You can write a simple function isUpperCase(String) and call it before calling toUpperCase():
if (!isUpperCase(s)) {
    s = s.toUpperCase();
}
It might not be significantly faster, but at least this way less garbage is created. If the majority of the strings in your input are already upper case, this is a very valid optimization.
The isUpperCase function will look roughly like this:
// Returns true if s contains no lower-case characters.
boolean isUpperCase(String s) {
    for (int i = 0; i < s.length(); i++) {
        if (Character.isLowerCase(s.charAt(i))) {
            return false;
        }
    }
    return true;
}

You need an if statement that filters out the characters that are already upper case. The idea is good; it just needs a condition. Work with the character codes: cast each char to int and compare it against the ASCII range for upper-case letters (65 to 90, i.e. 'A' to 'Z'). If every charAt(i) falls in that range, skip the conversion; if you only care about specific letters, skip it just for those.
Sorry, it's a rough explanation.
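One way to read the ASCII idea above, as a sketch only: scan for characters in the range 'a' to 'z' and convert only when one is found. The names are hypothetical, and the check assumes the input is plain ASCII; non-ASCII lower-case letters would slip through unconverted.

```java
import java.util.Locale;

final class AsciiUpperDemo {
    // True if the string contains at least one ASCII lower-case letter.
    static boolean hasAsciiLowerCase(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z') {
                return true;
            }
        }
        return false;
    }

    // Returns the same instance when nothing needs converting.
    static String toUpperIfNeeded(String s) {
        return hasAsciiLowerCase(s) ? s.toUpperCase(Locale.ROOT) : s;
    }

    public static void main(String[] args) {
        System.out.println(toUpperIfNeeded("ALREADY UPPER"));
        System.out.println(toUpperIfNeeded("Mixed Case"));
    }
}
```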

Primitive + "" versus Wrapper.toString(primitive)

When needing to convert a primitive to a String, for example to pass to a method that expects a String, there are basically two options.
Using int as an example, given:
int i;
We can do one of:
someStringMethod(Integer.toString(i));
someStringMethod(i + "");
The first one is the "formal" approach, and the second one seems somewhat of a "hack".
It's certainly a lot less code and easier to read the "hack" IMHO.
But is it "good" coding style to use the "hack"?

The best way to convert * to String is to use:
String.valueOf(x)
where x is any kind of primitive or object. It works with primitive types, wrapper classes and any Object that implements toString(). If x is null, it returns the string "null".
The reason why this is best is that using the "+" operator implies String concatenation, plus the "" implies String instantiation. If you decompile a class using ""+something, you'll see the compiler translate that to multiple operations.
"" concatenation result is the same as String.valueOf() but it is a little bit more expensive.
The performance difference is probably negligible, but good programmers don't write '"" + something' to convert something to a String when there is a better way, which happens to be the correct way :).
For arrays, have a look at Arrays.toString() and -better- Arrays.deepToString()
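A short sketch of the calls mentioned so far. The wrapper class and method names are invented; String.valueOf, Arrays.toString and Arrays.deepToString are the actual JDK methods. Note the cast in the null case: a bare String.valueOf(null) resolves to the char[] overload and throws a NullPointerException.

```java
import java.util.Arrays;

final class ValueOfDemo {
    static String ofInt(int i) {
        return String.valueOf(i);
    }

    static String ofObject(Object o) {
        return String.valueOf(o); // yields "null" for a null reference
    }

    static String ofGrid(int[][] grid) {
        return Arrays.deepToString(grid); // handles nested arrays
    }

    public static void main(String[] args) {
        System.out.println(ofInt(42));                            // 42
        System.out.println(ofObject(null));                       // null
        System.out.println(Arrays.toString(new int[] {1, 2, 3})); // [1, 2, 3]
        System.out.println(ofGrid(new int[][] {{1, 2}, {3}}));    // [[1, 2], [3]]
    }
}
```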
But is it "good" coding style to use the "hack"?
Sometimes a syntactic hack makes the code better, but the case above is not really one of those cases.
"" concatenation is not considered good code.
An example of a useful syntactic hack is the double brace instantiation:
List<String> list = new ArrayList<String>() {{
add("foo");
add("bar");
add("baz");
}};
instead of
List<String> list = new ArrayList<String>();
list.add("foo");
list.add("bar");
list.add("baz");
The ""+something idiom is more a code smell than a hack that improves readability; anybody with some experience would take it as the mark of a Java developer who lacks some knowledge of the API and the language.
Other interesting "hacks" are fluent APIs (like Mockito), DSLs, or things like lambda4j, which is quite a hack.

It's not a good idea, but it's not going to be very harmful.
In both examples Integer.toString() ends up being called by the JVM (even if not directly in the second case).
In the second example, the JVM possibly constructs two strings and then produces the result of the concatenation. Depending on optimizations this may result in 0, 1 or 2 extra strings to garbage collect.
From a code writing perspective, the second is shorter, but also possibly less transparent as to what is happening.
The problems with the second approach in readability follow:
int i = 1, j = 2;
someStringMethod(i + j + "" + i + j + "");
Produces:
someStringMethod("312");
and not
someStringMethod("33");

When strings are formed using the + sign, the JVM creates an intermediate String/StringBuffer object to carry out the transformation, which is not the optimal way.
Also, the + sign is left-associative, so an expression like ""+1+2 results in "12" and not "3". So it's better to use String.valueOf() or Type.toString().
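The left-to-right evaluation is easy to check with a few one-liners. This is only an illustration with made-up method names: once a String appears on the left, every following + is a concatenation.

```java
final class PlusAssociativityDemo {
    // "" comes first, so both ints are appended as text.
    static String emptyFirst(int i, int j) {
        return "" + i + j;
    }

    // i + j is evaluated first as integer addition, then turned into text.
    static String sumFirst(int i, int j) {
        return i + j + "";
    }

    // The explicit form leaves no doubt about the intent.
    static String explicit(int i, int j) {
        return String.valueOf(i + j);
    }

    public static void main(String[] args) {
        System.out.println(emptyFirst(1, 2)); // 12
        System.out.println(sumFirst(1, 2));   // 3
        System.out.println(explicit(1, 2));   // 3
    }
}
```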

The concatenation operator in Java is +.
So you convert the result of (i + "") to String because you concat an empty string to a number in your example code.
The two ways are good but I normally use String.valueOf(x);

Java's String.replace() vs. String.replaceFirst() vs. homebrew

I have a class that is doing a lot of text processing. For each string, which is anywhere from 100->2000 characters long, I am performing 30 different string replacements.
Example:
String modified;
for (int i = 0; i < num_strings; i++) {
    modified = runReplacements(strs[i]);
    // do stuff
}
public String runReplacements(String str) {
    str = str.replace("foo", "bar");
    str = str.replace("baz", "beef");
    ....
    return str;
}
'foo', 'baz', and all other "targets" are only expected to appear once and are string literals (no need for an actual regex).
As you can imagine, I am concerned about performance :)
Given this,
replaceFirst() seems a bad choice because it won't use Pattern.LITERAL and will do extra processing that isn't required.
replace() seems a bad choice because it will traverse the entire string looking for multiple instances to be replaced.
Additionally, since my replacement texts are the same every time, it seems to make sense for me to write my own code; otherwise String.replaceFirst() or String.replace() will be doing a Pattern.compile every single time in the background. Thinking that I should write my own code, this is my thought:
Perform a Pattern.compile() only once for each literal replacement desired (no need to recompile every single time) (i.e. p1 - p30)
Then do the following for each pX: p1.matcher(str).replaceFirst(Matcher.quoteReplacement("desiredReplacement"));
This way I abandon ship on the first replacement (instead of traversing the entire string), and I am using literal vs. regex, and I am not doing a re-compile every single iteration.
So, which is the best for performance?
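For reference, the plan described above could be sketched roughly as follows, next to a regex-free alternative based on indexOf. The class, field and method names are made up, and the indexOf variant is an extra option added for comparison, not something the question proposed. Neither is claimed to be the fastest; that needs measuring on real data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplaceDemo {
    // Compiled once, as literals, instead of on every call.
    private static final Pattern FOO = Pattern.compile("foo", Pattern.LITERAL);
    private static final Pattern BAZ = Pattern.compile("baz", Pattern.LITERAL);
    private static final String BAR = Matcher.quoteReplacement("bar");
    private static final String BEEF = Matcher.quoteReplacement("beef");

    // Replaces only the first occurrence of each target.
    static String runReplacements(String str) {
        str = FOO.matcher(str).replaceFirst(BAR);
        str = BAZ.matcher(str).replaceFirst(BEEF);
        return str;
    }

    // Regex-free variant: find the first occurrence and splice in the replacement.
    static String replaceOnce(String s, String target, String replacement) {
        int at = s.indexOf(target);
        if (at < 0) {
            return s; // nothing to replace, no new String created
        }
        return new StringBuilder(s.length() - target.length() + replacement.length())
                .append(s, 0, at)
                .append(replacement)
                .append(s, at + target.length(), s.length())
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(runReplacements("a foo b baz foo"));  // a bar b beef foo
        System.out.println(replaceOnce("xfooyfoo", "foo", "bar")); // xbaryfoo
    }
}
```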

So, which is the best for performance?
Measure it! ;-)
ETA: Since a two word answer sounds irretrievably snarky, I'll elaborate slightly. "Measure it and tell us..." since there may be some general rule of thumb about the performance of the various approaches you cite (good ones, all) but I'm not aware of it. And as a couple of the comments on this answer have mentioned, even so, the different approaches have a high likelihood of being swamped by the application environment. So, measure it in vivo and focus on this if it's a real issue. (And let us know how it goes...)

First, run and profile your entire application with a simple match/replace. This may show you that:
your application already runs fast enough, or
your application is spending most of its time doing something else, so optimizing the match/replace code is not worthwhile.
Assuming that you've determined that match/replace is a bottleneck, write yourself a little benchmarking application that allows you to test the performance and correctness of your candidate algorithms on representative input data. It's also a good idea to include "edge case" input data that is likely to cause problems; e.g. for the substitutions in your example, input data containing the sequence "bazoo" could be an edge case. On the performance side, make sure that you avoid the traps of Java micro-benchmarking; e.g. JVM warmup effects.
Next implement some simple alternatives and try them out. Is one of them good enough? Done!
In addition to your ideas, you could try concatenating the search terms into a single regex (e.g. "(foo|baz)" ), use Matcher.find(int) to find each occurrence, use a HashMap to lookup the replacement strings and a StringBuilder to build the output String from input string substrings and replacements. (OK, this is not entirely trivial, and it depends on Pattern/Matcher handling alternates efficiently ... which I'm not sure is the case. But that's why you should compare the candidates carefully.)
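A minimal sketch of the single-regex idea, assuming the two targets from the question. All names are hypothetical. It uses the no-argument Matcher.find() in a loop rather than find(int), and unlike replaceFirst it rewrites every occurrence of every target in one pass.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SinglePassReplaceDemo {
    private static final Map<String, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put("foo", "bar");
        REPLACEMENTS.put("baz", "beef");
    }

    // One alternation covering all targets, compiled once.
    private static final Pattern TARGETS = Pattern.compile("foo|baz");

    static String replaceAllTargets(String input) {
        Matcher m = TARGETS.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());      // text before the match
            out.append(REPLACEMENTS.get(m.group())); // looked-up replacement
            last = m.end();
        }
        out.append(input, last, input.length());     // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceAllTargets("foo x baz y bazoo")); // bar x beef y beefoo
    }
}
```

The "bazoo" input is the kind of edge case mentioned earlier: the pattern matches the "baz" prefix, so whether that is the desired behaviour has to be decided per application.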
In the (IMO unlikely) event that a simple alternative doesn't cut it, this wikipedia page has some leads which may help you to implement your own efficient match/replacer.

Isn't it frustrating when you ask a question and get a bunch of advice telling you to do a whole lot of work and figure it out for yourself?!
I say use replaceAll();
(I have no idea if it is, indeed, the most efficient, I just don't want you to feel like you wasted your money on this question and got nothing.)
[edit]
PS. After that, you might want to measure it.
[edit 2]
PPS. (and tell us what you found)

Strings are immutable - that means I should never use += and only StringBuffer?

Strings are immutable, meaning, once they have been created they cannot be changed.
So, does this mean that it would take more memory if you append things with += than if you created a StringBuffer and appended text to that?
If you use +=, you would create a new 'object' each time that has to be saved in the memory, wouldn't you?

Yes, you will create a new object each time with +=. That doesn't mean it's always the wrong thing to do, however. It depends whether you want that value as a string, or whether you're just going to use it to build the string up further.
If you actually want the result of x + y as a string, then you might as well just use string concatenation. However, if you're really going to (say) loop round and append another string, and another, etc - only needing the result as a string at the very end, then StringBuffer/StringBuilder are the way to go. Indeed, looping is really where StringBuilder pays off over string concatenation - the performance difference for 5 or even 10 direct concatenations is going to be quite small, but for thousands it becomes a lot worse - basically because you get O(N^2) complexity with concatenation vs O(N) complexity with StringBuilder.
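The two loop styles being compared look roughly like this (an illustrative sketch with made-up names, not a benchmark). Both return the same text; the difference is that += copies everything accumulated so far on each iteration.

```java
final class LoopAppendDemo {
    // Each += builds a new String containing all previous characters,
    // so the total copying grows quadratically with n.
    static String withPlusEquals(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // One builder is appended to throughout; the String is created once at the end.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlusEquals(5)); // 01234
        System.out.println(withBuilder(5));    // 01234
    }
}
```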
In Java 5 and above, you should basically use StringBuilder - it's unsynchronized, but that's almost always okay; it's very rare to want to share one between threads.
I have an article on all of this which you might find useful.

Rule of thumb is simple:
If you are running concatenations in a loop, don't use +=
If you are not running concatenations in a loop, using += simply does not matter (unless it is a performance-critical application).

In Java 5 or later, StringBuffer is thread safe, and so has some overhead that you shouldn't pay for unless you need it. StringBuilder has the same API but is not thread safe (i.e. you should only use it internal to a single thread).
Yes, if you are building up large strings, it is more efficient to use StringBuilder. It is probably not worth it to pass StringBuilder or StringBuffer around as part of your API. This is too confusing.

I agree with all the answers posted above, but it will help you a little bit to understand more about the way Java is implemented. The compiler uses StringBuffers internally to implement the String + operator (from the StringBuffer Javadoc):
String buffers are used by the compiler to implement the binary string concatenation operator +. For example, the code:
x = "a" + 4 + "c"
is compiled to the equivalent of:
x = new StringBuffer().append("a").append(4).append("c").toString()
Likewise, x += "some new string" is equivalent to x = x + "some new string". Do you see where I'm going with this?
If you are doing a lot of String concatenations, using StringBuffer will increase your performance, but if you're only doing a couple of simple String concatenations, the Java compiler will probably optimize it for you, and you won't notice a difference in performance.
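The equivalence quoted from the Javadoc can be checked with a small sketch (names are hypothetical). A parameter is used instead of the literal 4 so the expression is not folded into a constant at compile time. The Javadoc text speaks of StringBuffer; later compilers use StringBuilder, and since Java 9 typically an invokedynamic-based strategy, but the resulting string is the same.

```java
final class ConcatExpansionDemo {
    // What you write.
    static String viaOperator(int n) {
        return "a" + n + "c";
    }

    // Roughly what the compiler has traditionally generated for it.
    static String viaBuilder(int n) {
        return new StringBuilder().append("a").append(n).append("c").toString();
    }

    public static void main(String[] args) {
        System.out.println(viaOperator(4)); // a4c
        System.out.println(viaBuilder(4));  // a4c
    }
}
```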

Yes. String is immutable. For occasional use, += is OK. If the += operation is intensive, you should turn to StringBuilder.

But the garbage collector will end up freeing the old strings once there are no references to them

Exactly. You should use a StringBuilder though if thread-safety isn't an issue.
As a side note: several String objects may share the same backing char[]. For instance, in older JDKs (before Java 7 update 6) substring() did not create a new char[], which made it quite efficient; newer versions copy the characters instead.
Additionally, compilers may do some optimization for you. For instance if you do
static final String FOO = "foo";
static final String BAR = "bar";
String getFoobar() {
return FOO + BAR; // no string concatenation at runtime
}
I wouldn't be surprised if the compiler would use StringBuilder internally to optimize String concatenation where possible - if not already maybe in the future.
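The constant-folding case above can be observed directly. This is a sketch with invented names: because FOO and BAR are compile-time constants, FOO + BAR is itself a constant expression, so it refers to the same interned String as the literal "foobar". The == comparison is used here only to show identity; it is not how strings should normally be compared.

```java
final class ConstantFoldingDemo {
    static final String FOO = "foo";
    static final String BAR = "bar";

    static String getFoobar() {
        return FOO + BAR; // folded into the constant "foobar" by the compiler
    }

    // Identity check: true because both sides are the same interned constant.
    static boolean isSameAsLiteral() {
        return getFoobar() == "foobar";
    }

    // With a non-constant operand the concatenation happens at runtime,
    // producing a distinct String object with equal contents.
    static boolean runtimeConcatIsSameAsLiteral(String suffix) {
        return (FOO + suffix) == "foobar";
    }

    public static void main(String[] args) {
        System.out.println(isSameAsLiteral());
        System.out.println(runtimeConcatIsSameAsLiteral("bar"));
    }
}
```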

I think it relies on the GC to collect the memory with the abandoned string.
So appending with a StringBuilder will definitely be faster than += if you do a lot of string manipulation. But it shouldn't be a problem in most cases.

Yes, you would, and that is exactly why you should use StringBuffer to concatenate a lot of Strings.
Also note that since Java 5 you should also prefer StringBuilder most of the time. It's just some sort of unsynchronized StringBuffer.

You're right that Strings are immutable, so if you're trying to conserve memory while doing a lot of string concatenation, you should use StringBuilder rather than +=.
However, you may not mind. Programs are written for their human readers, so you can go with clarity. If it's important that you optimize, you should profile first. Unless your program is very heavily weighted toward string activity, there will probably be other bottlenecks.

No
It will not use more memory. Yes, new objects are created, but the old ones are recycled. In the end, the amount of memory used is the same.
