I've been searching through SO for a while, and all I can find are references to string-building speed and one or two rather misguided attempts at memory benchmarking.
My situation is that we have a ton of logging messages in our application and we're wondering if there is any measurable MEMORY advantage to using String.format vs. + vs. StringBuilder.
I've got a solid grip on measuring the time each of these is taking and there are plenty of SO posts for that.
Can anyone tell me which one is better for lowering memory consumption?
Example:
if(LOG.isDebugEnabled()) LOG.debug(String.format("Invoice id = %s is waiting for processing", invoice.getId()));
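For reference, here is a minimal sketch of the three variants being compared, all producing the same message. `MessageVariants` and the `long id` parameter are hypothetical stand-ins for `invoice.getId()`, and the presized capacity of 64 is simply a guess large enough for any `long` id, not a measured value.

```java
final class MessageVariants {
    private static final String PREFIX = "Invoice id = ";
    private static final String SUFFIX = " is waiting for processing";

    // String.format parses the format string on every call before producing the result.
    static String withFormat(long id) {
        return String.format("Invoice id = %s is waiting for processing", id);
    }

    // Plain concatenation; javac compiles this to a StringBuilder chain
    // (or, since Java 9, to an invokedynamic-based concatenation).
    static String withConcat(long id) {
        return PREFIX + id + SUFFIX;
    }

    // Explicit StringBuilder, presized so it should not need to grow.
    static String withBuilder(long id) {
        return new StringBuilder(64).append(PREFIX).append(id).append(SUFFIX).toString();
    }
}
```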
Since String.format() is much more complicated (it supports format sequences and data types such as %s and %d), it can be expected to be more expensive in both time and memory. However, I believe the difference may be significant only for very long strings.
I believe String.format would use less memory.
When using StringBuilder, you need to create a new builder object and then append strings to it. Creating that object and repeatedly referencing it (via append or similar methods) would seem to me to be more memory-intensive than returning a straightforward string with String.format.
StringBuilder uses a temporary object before creating the final string.
String.format seems like a more direct way and therefore less memory intensive.
Moreover, a StringBuilder is given a specific capacity when it is initialized (otherwise it defaults to a capacity of 16 characters).
You could compare the default memory allocated for a StringBuilder object with that of a plain old String object.
You could test these two options with a large dataset of strings to be built and assess the time it takes for each approach.
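If you want to measure memory rather than time, one option on HotSpot-based JVMs is the per-thread allocation counter exposed by the JDK-specific `com.sun.management.ThreadMXBean` interface. The sketch below is a rough starting point under that assumption, not a rigorous benchmark: `AllocationProbe` and `bytesAllocated` are hypothetical names, JIT optimizations such as escape analysis can remove allocations in warmed-up code, and a harness like JMH (with its GC profiler) is the more reliable tool.

```java
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

final class AllocationProbe {
    // Results are written here so the measured work is not trivially dead code.
    static volatile Object sink;

    // Bytes allocated by the current thread while calling work.get() 'iterations' times,
    // or -1 if this JVM does not expose per-thread allocation counters.
    static long bytesAllocated(Supplier<?> work, int iterations) {
        java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!(bean instanceof com.sun.management.ThreadMXBean)) {
            return -1;
        }
        com.sun.management.ThreadMXBean hotspot = (com.sun.management.ThreadMXBean) bean;
        if (!hotspot.isThreadAllocatedMemorySupported() || !hotspot.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long threadId = Thread.currentThread().getId();
        long before = hotspot.getThreadAllocatedBytes(threadId);
        for (int i = 0; i < iterations; i++) {
            sink = work.get();
        }
        return hotspot.getThreadAllocatedBytes(threadId) - before;
    }

    public static void main(String[] args) {
        long id = 12345L;
        int n = 100_000;
        long format = bytesAllocated(
                () -> String.format("Invoice id = %s is waiting for processing", id), n);
        long plus = bytesAllocated(
                () -> "Invoice id = " + id + " is waiting for processing", n);
        long builder = bytesAllocated(
                () -> new StringBuilder(64).append("Invoice id = ").append(id)
                        .append(" is waiting for processing").toString(), n);
        System.out.println("String.format: " + format + " bytes");
        System.out.println("+:             " + plus + " bytes");
        System.out.println("StringBuilder: " + builder + " bytes");
    }
}
```

Running each variant once beforehand as a warm-up, and repeating the whole measurement a few times, should give more stable numbers.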
Hope this helps.
Related
I feel strings can replace a character array in all scenarios. Even considering the immutability of Strings, declaring strings in an appropriate scope and Java's garbage collection should help us avoid any memory leaks. I want to know if there is any corner case where a character array should be used instead of a String in Java.
Character arrays have a slight advantage over plain strings when it comes to storing security-sensitive data. There are a lot of resources on that, for example this question: Why is char[] preferred over String for passwords? (with an answer by Jon Skeet himself).
In general it boils down to two things:
You have very little influence over how long a String stays in memory, so you might leak sensitive data through a memory dump.
Accidentally leaking sensitive data as clear text in application logs is much more likely with plain strings.
More reading:
Why we read password from console in char array instead of String
https://www.codebyamir.com/blog/use-character-arrays-to-store-sensitive-data-java
https://www.geeksforgeeks.org/use-char-array-string-storing-passwords-java/amp/
https://www.baeldung.com/java-storing-passwords
https://javarevisited.blogspot.com/2012/03/why-character-array-is-better-than.html
https://javainsider.wordpress.com/2012/12/10/character-array-is-better-than-string-for-storing-password-in-java/amp/
String is a class, not a built-in type. It has traditionally been implemented with a char array underneath (since Java 9 it uses a byte array), but there is no guarantee: "we don't care how it is implemented". It has methods that make sense for strings, like comparing them. Comparing arrays? That doesn't really make sense. You could check whether they are equal, sure, but less than or greater than...
Back to the point. One scenario is when you want to operate on chars, not on a string. For example, you have letters of the alphabet and want to sort them, or grades in an A-F system that you want to sort. Generally, this applies wherever the chars are independent values rather than parts of something with a combined meaning (like a message string or a text message). You would not normally need to sort the chars of a text message, would you? So you use an array.
To sort, you can take advantage of the Arrays.sort() method, for example, whereas I don't think String has an equivalent method. Perhaps third-party libraries do.
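A minimal sketch of the sorting case: `String` has no sort method of its own, so the usual approach is to copy into a `char[]`, sort it with `Arrays.sort`, and build a new `String` only if one is needed. `CharSorting` and `sortChars` are hypothetical names.

```java
import java.util.Arrays;

final class CharSorting {
    // Copies the characters out, sorts the copy in place, and wraps it in a new String.
    static String sortChars(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);   // orders by UTF-16 code unit value
        return new String(chars);
    }

    public static void main(String[] args) {
        char[] grades = {'C', 'A', 'F', 'B', 'A'};
        Arrays.sort(grades);  // no String needed if you only work with the chars
        System.out.println(new String(grades)); // prints AABCF
        System.out.println(sortChars("dcba"));  // prints abcd
    }
}
```

Because the sort works on UTF-16 code units, characters outside the Basic Multilingual Plane (surrogate pairs) would be split apart; for letters and grades that is not an issue.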
On another note (unrelated to the question), you can use StringBuilder if you want to modify strings often. It performs better for that.
You don't have to look much further than at methods in the JDK core API that use char[].
Such as this one (java.io.Reader):
public int read(char[] cbuf)
throws IOException
Reads characters into an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.
Parameters:
cbuf - Destination buffer
Returns:
The number of characters read, or -1 if the end of the stream has been reached
Throws:
IOException - If an I/O error occurs
Instead of returning a String they ask you to pass in a char[] to use as a buffer to write the result into. The reason is efficiency.
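A sketch of how the `read(char[])` contract is typically used: one buffer is allocated up front and reused for every call, so no intermediate `String` is created per chunk. `CharCounter` is a hypothetical name, and the 8192-character buffer size is an arbitrary choice.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class CharCounter {
    // Counts characters from any Reader, reusing a single buffer for every read call.
    static long countChars(Reader reader) throws IOException {
        char[] buffer = new char[8192];
        long total = 0;
        int n;
        // read() returns the number of chars placed into buffer, or -1 at end of stream.
        while ((n = reader.read(buffer)) != -1) {
            total += n;   // only buffer[0..n) holds valid data on this iteration
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = new StringReader("hello world")) {
            System.out.println(countChars(r)); // prints 11
        }
    }
}
```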
You probably know that String is immutable, and how substring() could cause a memory leak in older versions of Java.
Since Strings are immutable in Java, if you store a password as plain text it will stay in memory until the garbage collector clears it. If the String also ends up in the String pool (for example because it was interned), there is a high chance it will remain in memory for a long time, which poses a security threat: anyone who has access to a memory dump can find the password in clear text. And because Strings are immutable, there is no way to change their contents, since any change produces a new String, whereas with a char[] you can still set all of its elements to blank or zero. So storing a password in a character array mitigates the risk of it being stolen.
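A minimal sketch of the wipe-after-use pattern described above. `PasswordCheck` and `checkAndWipe` are hypothetical names; in a real application the `char[]` might come from `java.io.Console.readPassword()` or `JPasswordField.getPassword()`, both of which return `char[]`. Note that `Arrays.equals` is not a constant-time comparison, and wiping this array does not erase copies that other code may already have made.

```java
import java.util.Arrays;

final class PasswordCheck {
    // Compares the attempt against the expected secret, then overwrites the attempt
    // so the clear-text characters do not linger until garbage collection.
    static boolean checkAndWipe(char[] attempt, char[] expected) {
        try {
            return Arrays.equals(attempt, expected);
        } finally {
            Arrays.fill(attempt, '\0');
        }
    }
}
```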
I'm using Nashorn on 1.8u60 to create model objects to pass back to the view tier (Thymeleaf). Part of the model object is a somewhat large string containing HTML (not big enough to cause any issues in plain Java). When trying to convert the object back into Java using ScriptObjectMirror methods, I'm hitting the following exception. Changing the max heap size doesn't seem to have any effect (changed from 900 MB to 1800 MB; same error). I couldn't find much online about this: are there any restrictions that Nashorn places on object sizes? I'm going to try the latest 1.8 JDK now.
java.lang.OutOfMemoryError: Java heap space
at jdk.nashorn.internal.runtime.ConsString.flatten(ConsString.java:105)
at jdk.nashorn.internal.runtime.ConsString.flattened(ConsString.java:98)
at jdk.nashorn.internal.runtime.ConsString.toString(ConsString.java:69)
at jdk.nashorn.api.scripting.ScriptObjectMirror.wrap(ScriptObjectMirror.java:704)
at jdk.nashorn.api.scripting.ScriptObjectMirror.wrapLikeMe(ScriptObjectMirror.java:721)
at jdk.nashorn.api.scripting.ScriptObjectMirror.wrapLikeMe(ScriptObjectMirror.java:730)
at jdk.nashorn.api.scripting.ScriptObjectMirror.access$300(ScriptObjectMirror.java:64)
at jdk.nashorn.api.scripting.ScriptObjectMirror$13.call(ScriptObjectMirror.java:371)
at jdk.nashorn.api.scripting.ScriptObjectMirror$13.call(ScriptObjectMirror.java:364)
at jdk.nashorn.api.scripting.ScriptObjectMirror.inGlobal(ScriptObjectMirror.java:859)
at jdk.nashorn.api.scripting.ScriptObjectMirror.entrySet(ScriptObjectMirror.java:364)
...
Thanks, Adrian
That line (ConsString.java:105) reads
final char[] chars = new char[length];
so it appears there's indeed not enough memory for the final string. Nashorn uses ConsString to amortize concatenation costs by delaying concatenation until the result is used (most JS engines use this optimization; otherwise, for example, concatenating lots of strings in a loop would require O(n^2) time).
This means that the result of many + operators on strings can be a tree of ConsString objects that gets "flattened" all at once. The trade-off for making concatenation linear in time is the need to keep those ConsStrings around, which requires more than twice the memory of the string itself (more than twice because of the ConsString objects' own overhead).
One way to get around this is to periodically invoke str.toString(). It is seemingly a no-op but internally it forces flattening of the concatenation tree. Try introducing it into your code at some point and see whether it helps.
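To make the trade-off concrete, here is a toy model of the idea written in Java. It is not Nashorn's actual `ConsString` code; `ToyConsString` is a hypothetical class. Each `concat` is O(1) because it only allocates a tree node, and `flatten` copies every leaf once into a single buffer of the final length, mirroring the `new char[length]` allocation quoted above. Until then, all the leaves and nodes stay reachable.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ToyConsString {
    private final String leaf;            // non-null only for leaf nodes
    private final ToyConsString left;     // non-null only for concatenation nodes
    private final ToyConsString right;
    private final int length;

    private ToyConsString(String leaf, ToyConsString left, ToyConsString right, int length) {
        this.leaf = leaf;
        this.left = left;
        this.right = right;
        this.length = length;
    }

    static ToyConsString of(String s) {
        return new ToyConsString(s, null, null, s.length());
    }

    // O(1): no characters are copied, only a node referencing both operands is created.
    ToyConsString concat(String s) {
        return new ToyConsString(null, this, of(s), length + s.length());
    }

    int length() {
        return length;
    }

    // Copies every leaf once, in order, into one buffer of the final length.
    // An explicit stack avoids StackOverflowError on the deep trees a loop produces.
    String flatten() {
        StringBuilder out = new StringBuilder(length);
        Deque<ToyConsString> stack = new ArrayDeque<>();
        stack.push(this);
        while (!stack.isEmpty()) {
            ToyConsString node = stack.pop();
            if (node.leaf != null) {
                out.append(node.leaf);
            } else {
                stack.push(node.right);   // pushed first so the left subtree is emitted first
                stack.push(node.left);
            }
        }
        return out.toString();
    }
}
```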
In an application, String is a frequently used data type. What we know is that the mutation of a String uses lots of memory. So what we can do is use a StringBuilder/StringBuffer.
But at what point should we change to StringBuilder?
And what should we do when we have to split it or replace characters in it?
eg:
//original:
String[] split = string.split("\\?"); // "?" is a regex metacharacter, so it must be escaped
//better? :
String[] split = stringBuilder.toString().split("\\?");
or
//original:
String replacedString = string.replace("l","st");
//better? :
String replacedString = stringBuilder.toString().replace("l","st");
//or
StringBuilder replacedStringBuilder = new StringBuilder(stringBuilder.toString().replace("l", "st"));
In your examples, there are no benefits in using a StringBuilder, since you use the toString method to create an immutable String out of your StringBuilder.
You should only copy the contents of a StringBuilder into a String after you are done appending it (or modifying it in some other way).
The problem with Java's StringBuilder is that it lacks some methods you get when using a plain string (check this thread, for example: How to implement StringBuilder.replace(String, String)).
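Since `StringBuilder` only offers `replace(int start, int end, String str)`, a string-based replace has to be built on top of `indexOf`. The helper below is a minimal sketch of that idea; `BuilderReplace.replaceAll` is a hypothetical name, and it rejects an empty target rather than reproducing `String.replace`'s behavior for that case. It still has to shift the tail of the buffer whenever the replacement length differs from the target's, so it is not automatically faster than `String.replace`.

```java
final class BuilderReplace {
    // Replaces every occurrence of target with replacement, in place.
    static StringBuilder replaceAll(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        int from = 0;
        int idx;
        while ((idx = sb.indexOf(target, from)) != -1) {
            sb.replace(idx, idx + target.length(), replacement);
            // Continue after the inserted text so the replacement itself is never re-matched.
            from = idx + replacement.length();
        }
        return sb;
    }
}
```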
What we know, is that a String uses lots of memory.
Actually, to be precise, a String uses less memory than a StringBuilder with equivalent contents. A StringBuilder class has some additional constant overhead, and usually has a preallocated buffer to store more data than needed at any given moment (to reduce allocations). The issue with Strings is that they are immutable, which means Java needs to create a new instance whenever you need to change its contents.
To conclude, StringBuilder is not designed for the operations you mentioned (split and replace), and it won't yield much better performance in any case. A split method cannot benefit from StringBuilder's mutability, since it creates an array of immutable strings as its output anyway. A replace method still needs to iterate through the entire string, and do a lot of copying if the replacement is not the same length as the string being searched for.
If you need to do a lot of appending, then go for a StringBuilder. Since it uses a "mutable" array of characters under the hood, adding data to the end will be especially efficient.
This article compares the performance of several StringBuilder and String methods (although I would take the Concatenation part with some reservation, because it doesn't mention dynamic string appending at all and concentrates on a single Join operation only).
What we know, is that the mutation of a String uses lots of memory.
That is incorrect. Strings cannot be mutated. They are immutable.
What you are actually talking about is building a String from other strings. That can use a lot more memory than is necessary, but it depends how you build the string.
So what we can do is to use a StringBuilder/StringBuffer.
Using a StringBuilder will help in some circumstances:
String res = "";
for (String s : ...) {
res = res + s;
}
(If the loop iterates many times then optimizing the above to use a StringBuilder could be worthwhile.)
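A sketch of that rewrite, assuming the pieces come from a `List<String>` (the original elides the source with `...`). `LoopJoin` and its method names are hypothetical. Both methods return the same string; the first builds a new intermediate `String` on every iteration, while the second appends into one growing buffer.

```java
import java.util.List;

final class LoopJoin {
    // Each iteration copies everything accumulated so far into a brand-new String.
    static String withPlus(List<String> pieces) {
        String res = "";
        for (String s : pieces) {
            res = res + s;
        }
        return res;
    }

    // One buffer is reused; it grows occasionally, and toString() copies once at the end.
    static String withBuilder(List<String> pieces) {
        StringBuilder sb = new StringBuilder();
        for (String s : pieces) {
            sb.append(s);
        }
        return sb.toString();
    }
}
```

For this particular case, `String.join("", pieces)` (Java 8 and later) does the same job without a hand-written loop.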
But in other circumstances it is a waste of time:
String res = s1 + s2 + s3 + s4 + s5;
(It is a waste of time to optimize the above to use a StringBuilder, because the Java compiler will automatically translate the expression into code that creates and uses a StringBuilder, or, since Java 9, into an equivalent invokedynamic-based concatenation.)
You should only ever use a StringBuffer instead of a StringBuilder when the string needs to be accessed and/or updated by more than one thread; i.e. when it needs to be thread-safe.
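A minimal illustration of the thread-safety difference, assuming several threads append to one shared buffer. `SharedAppend` is a hypothetical name. With `StringBuffer`, every `append` is synchronized, so the final length is always `threads * perThread`; doing the same with a shared `StringBuilder` is a data race that can lose characters or throw an exception, so it is deliberately not shown as working code.

```java
import java.util.ArrayList;
import java.util.List;

final class SharedAppend {
    // Appends 'perThread' characters from each of 'threads' threads into one shared StringBuffer.
    static int appendConcurrently(int threads, int perThread) throws InterruptedException {
        StringBuffer shared = new StringBuffer();   // every public method is synchronized
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.append('x');
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join();                               // wait until every worker has finished
        }
        return shared.length();
    }
}
```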
But at what point should we change to StringBuilder?
The simple answer is to only do it when the profiler tells you that you have a performance problem in your string handling / processing.
Generally speaking, StringBuilders are used for building strings rather than as the primary representation of the strings.
And what should we do, when we have to split it or to replace characters in there?
Then you have to review your decision to use a StringBuilder / StringBuffer as your primary representation at that point. And if it is still warranted you have to figure out how to do the operation using the API you have chosen. (This may entail converting to a String, performing the operation and then creating a new StringBuilder from the result.)
If you frequently modify the string, go with StringBuilder. Otherwise, if it's immutable anyway, go with String.
To answer your question on how to replace characters, check this out: http://download.oracle.com/javase/tutorial/java/data/buffers.html. StringBuilder operations are what you want.
Here's another good write-up on StringBuilder: http://www.yoda.arachsys.com/csharp/stringbuilder.html
If you need to do a lot of alterations on your String, then you can go for StringBuilder. Go for StringBuffer if it is shared between threads in a multithreaded application.
Both a String and a StringBuilder use about the same amount of memory. Why do you think it is “much”?
If you have measured (for example with jmap -histo:live) that the classes [C (or [B since Java 9) and java.lang.String take up most of the memory in the heap, only then should you think further in this direction.
Maybe there are multiple strings with the same value. Then, since Strings are immutable, you could intern the duplicates. Don't use String.intern for this, since it has bad performance characteristics; use Google Guava's Interner instead.
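Guava's `Interner` (for example one created with `Interners.newWeakInterner()`) is the tool suggested here. To show the idea without the dependency, the sketch below is a simplified stand-in built on `ConcurrentHashMap`; `SimpleInterner` is a hypothetical class, and unlike Guava's weak interner it holds its strings strongly, so it never releases them.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class SimpleInterner {
    private final ConcurrentMap<String, String> canonical = new ConcurrentHashMap<>();

    // Returns the first String instance seen with this value, so equal values share one object.
    String intern(String s) {
        String existing = canonical.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}
```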
I read somewhere that the Java StringBuilder uses around 1 MB for 500 characters. Is this true, and if so, isn't that a bit extreme? Is the StringBuilder doing some incredible things with this amount of memory? What is the reason, and does this mean I should not make too much use of this class?
No, that's complete rubbish - unless you create a StringBuilder with a mammoth capacity, of course.
Java in general uses 2 bytes per char (since Java 9, strings containing only Latin-1 characters may be stored at 1 byte per char). There's a little bit of overhead in String and StringBuilder for the length and the array itself, but not a lot.
Now 1K for 500 characters is about right... I suspect that was the cause of confusion. (Either you misheard, or the person talking to you was repeating something they'd misheard.)
I have seen two cases where StringBuilders tend to use large amounts of memory:
When the StringBuilder is created with an insane initial capacity.
StringBuilders that were "cached" to "save" object-allocation time.
So in the second case a StringBuilder might consume 1 MB of memory if some code that used it earlier stored a very big string in it. That's because it only grows its internal char array and does not shrink it (unless you explicitly call trimToSize(), which may release the unused space).
Both cases can (and should) easily be avoided.
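A small sketch of the "cached builder" case. `CapacityDemo` and `grownThenCleared` are hypothetical names. The no-argument constructor's capacity of 16 characters is documented; that `setLength(0)` keeps the grown buffer is OpenJDK behavior rather than a Javadoc guarantee, and `trimToSize()` is documented only as an attempt to reduce storage.

```java
final class CapacityDemo {
    // Simulates a "cached" builder that once held a large string and was then reset for reuse.
    static StringBuilder grownThenCleared(int chars) {
        StringBuilder cached = new StringBuilder();   // documented default capacity: 16 chars
        for (int i = 0; i < chars; i++) {
            cached.append('x');                       // the internal buffer grows as needed
        }
        cached.setLength(0);                          // empties the contents; OpenJDK keeps the buffer
        return cached;
    }

    public static void main(String[] args) {
        System.out.println(new StringBuilder().capacity());   // prints 16
        StringBuilder sb = grownThenCleared(1_000_000);
        System.out.println("length " + sb.length() + ", capacity " + sb.capacity());
        sb.trimToSize();                              // asks the builder to drop unused capacity
        System.out.println("after trimToSize: capacity " + sb.capacity());
    }
}
```

The simplest way to avoid the second case is usually to allocate a fresh StringBuilder for each use; short-lived objects are generally cheap for a generational garbage collector.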
This information is erroneous, do you remember what the source of this information was? If so you should correct it. Java normally uses 2 bytes per character.
Because of the doubling reallocation 2K for 500 characters would also be right, but not more. Here is a similar question.
I think StringBuilder is the best choice to use. It is faster and safer too. It depends on the scenario: if you have a String literal that doesn't change frequently, then I would say String is a better choice because it is immutable; otherwise StringBuilder is the right choice. As for the space you are talking about, I haven't heard that anywhere.
Strings are immutable, meaning, once they have been created they cannot be changed.
So, does this mean that it would take more memory if you append things with += than if you created a StringBuffer and appended text to that?
If you use +=, you would create a new 'object' each time that has to be saved in the memory, wouldn't you?
Yes, you will create a new object each time with +=. That doesn't mean it's always the wrong thing to do, however. It depends whether you want that value as a string, or whether you're just going to use it to build the string up further.
If you actually want the result of x + y as a string, then you might as well just use string concatenation. However, if you're really going to (say) loop round and append another string, and another, etc. - only needing the result as a string at the very end - then StringBuffer/StringBuilder are the way to go. Indeed, looping is really where StringBuilder pays off over string concatenation - the performance difference for 5 or even 10 direct concatenations is going to be quite small, but for thousands it becomes a lot worse - basically because you get O(N^2) complexity with concatenation vs O(N) complexity with StringBuilder.
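A rough worked estimate of where the quadratic cost comes from, assuming N pieces of k characters each and counting only the characters copied into each new intermediate result (the code javac generates may copy more than this):

```latex
C_{+=}(N) \;\ge\; \sum_{i=1}^{N} i\,k \;=\; k\,\frac{N(N+1)}{2} \;\in\; O(k N^2),
\qquad
C_{\mathrm{SB}}(N) \;\in\; O(k N)
```

The StringBuilder bound holds because its buffer grows geometrically, so each character is copied only a constant number of times on average. For example, N = 1000 pieces of k = 10 characters means at least 10 × 1000 × 1001 / 2 = 5,005,000 character copies with +=, for a final string of only 10,000 characters.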
In Java 5 and above, you should basically use StringBuilder - it's unsynchronized, but that's almost always okay; it's very rare to want to share one between threads.
I have an article on all of this which you might find useful.
Rule of thumb is simple:
If you are running concatenations in a loop, don't use +=
If you are not running concatenations in a loop, using += simply does not matter (unless it is a performance-critical application).
StringBuffer is thread safe, and so has some synchronization overhead that you shouldn't pay for unless you need it. In Java 5 or later, StringBuilder has the same API but is not thread safe (i.e. you should only use it within a single thread).
Yes, if you are building up large strings, it is more efficient to use StringBuilder. It is probably not worth it to pass StringBuilder or StringBuffer around as part of your API. This is too confusing.
I agree with all the answers posted above, but it will help you a little to understand more about how Java is implemented. The Java compiler (not the JVM) uses StringBuffers internally to implement the String + operator (since Java 5, javac emits StringBuilder instead). From the StringBuffer Javadoc:
String buffers are used by the compiler to implement the binary string concatenation operator +. For example, the code:
x = "a" + 4 + "c"
is compiled to the equivalent of:
x = new StringBuffer().append("a").append(4).append("c").toString()
Likewise, x += "some new string" is equivalent to x = x + "some new string". Do you see where I'm going with this?
If you are doing a lot of String concatenations, using StringBuffer will increase your performance, but if you're only doing a couple of simple String concatenations, the Java compiler will probably optimize it for you, and you won't notice a difference in performance.
Yes. String is immutable. For occasional use, += is OK. If the += operation is intensive, you should turn to StringBuilder.
But the garbage collector will end up freeing the old strings once there are no references to them
Exactly. You should use a StringBuilder though if thread-safety isn't an issue.
As a side note: in Java 6 and early Java 7 releases, several String objects could share the same backing char[]; for instance, substring() did not create a new char[], which made it quite efficient. Since Java 7u6, substring() copies the characters into a new array instead.
Additionally, compilers may do some optimization for you. For instance if you do
static final String FOO = "foo";
static final String BAR = "bar";
String getFoobar() {
return FOO + BAR; // no string concatenation at runtime
}
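The effect of compile-time constant folding can be observed with `==`. The JLS specifies that a constant expression such as `FOO + BAR` is computed at compile time and interned like a literal, while `+` with non-constant operands creates a new `String` at run time (JLS §15.18.1). `ConstantFolding` and its method names are hypothetical, and `==` is used here only to reveal object identity, not as a way to compare string values.

```java
final class ConstantFolding {
    static final String FOO = "foo";
    static final String BAR = "bar";

    // Constant expression: folded by javac into the literal "foobar", so it is the interned instance.
    static String constant() {
        return FOO + BAR;
    }

    // Non-constant operands: a new String object is created at run time.
    static String runtime(String a, String b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(constant() == "foobar");                  // prints true
        System.out.println(runtime("foo", "bar") == "foobar");       // prints false
        System.out.println(runtime("foo", "bar").equals("foobar"));  // prints true
    }
}
```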
In fact, for non-constant expressions javac already uses StringBuilder internally to implement String concatenation (and, since Java 9, an invokedynamic-based strategy).
I think it relies on the GC to collect the memory with the abandoned string.
So using a StringBuilder instead of += will definitely be faster if you do a lot of string manipulation. But it shouldn't be a problem in most cases.
Yes, you would, and that is exactly why you should use StringBuffer to concatenate a lot of Strings.
Also note that since Java 5 you should also prefer StringBuilder most of the time. It's just some sort of unsynchronized StringBuffer.
You're right that Strings are immutable, so if you're trying to conserve memory while doing a lot of string concatenation, you should use StringBuilder rather than +=.
However, you may not mind. Programs are written for their human readers, so you can go with clarity. If it's important that you optimize, you should profile first. Unless your program is very heavily weighted toward string activity, there will probably be other bottlenecks.
No
It will not use more memory. Yes, new objects are created, but the old ones are recycled. In the end, the amount of memory used is the same.